16+ Apache Hadoop Jobs in India
Apply to 16+ Apache Hadoop Jobs on CutShort.io. Find your next job, effortlessly. Browse Apache Hadoop Jobs and apply today!
Roles and Responsibilities:
- Design, develop, and maintain the end-to-end MLOps infrastructure from the ground up, leveraging open-source systems across the entire MLOps landscape.
- Creating pipelines for data ingestion, data transformation, building, testing, and deploying machine learning models, as well as monitoring and maintaining the performance of these models in production.
- Managing the MLOps stack, including version control systems, continuous integration and deployment tools, containerization, orchestration, and monitoring systems.
- Ensure that the MLOps stack is scalable, reliable, and secure.
Skills Required:
- 3-6 years of MLOps experience
- Preferably worked in the startup ecosystem
Primary Skills:
- Experience with E2E MLOps systems like ClearML, Kubeflow, MLFlow etc.
- Technical expertise in MLOps: Should have a deep understanding of the MLOps landscape and be able to leverage open-source systems to build scalable, reliable, and secure MLOps infrastructure.
- Programming skills: Proficient in at least one programming language, such as Python, and have experience with data science libraries, such as TensorFlow, PyTorch, or Scikit-learn.
- DevOps experience: Should have experience with DevOps tools and practices, such as Git, Docker, Kubernetes, and Jenkins.
Secondary Skills:
- Version Control Systems (VCS) tools like Git and Subversion
- Containerization technologies like Docker and Kubernetes
- Cloud Platforms like AWS, Azure, and Google Cloud Platform
- Data Preparation and Management tools like Apache Spark, Apache Hadoop, and SQL databases like PostgreSQL and MySQL
- Machine Learning Frameworks like TensorFlow, PyTorch, and Scikit-learn
- Monitoring and Logging tools like Prometheus, Grafana, and Elasticsearch
- Continuous Integration and Continuous Deployment (CI/CD) tools like Jenkins, GitLab CI, and CircleCI
- Explain ability and Interpretability tools like LIME and SHAP
Have you streamed a program on Disney+, watched your favorite binge-worthy series on Peacock or cheered your favorite team on during the World Cup from one of the 20 top streaming platforms around the globe? If the answer is yes, you’ve already benefitted from Conviva technology, helping the world’s leading streaming publishers deliver exceptional streaming experiences and grow their businesses.
Conviva is the only global streaming analytics platform for big data that collects, standardizes, and puts trillions of cross-screen, streaming data points in context, in real time. The Conviva platform provides comprehensive, continuous, census-level measurement through real-time, server side sessionization at unprecedented scale. If this sounds important, it is! We measure a global footprint of more than 500 million unique viewers in 180 countries watching 220 billion streams per year across 3 billion applications streaming on devices. With Conviva, customers get a unique level of actionability and scale from continuous streaming measurement insights and benchmarking across every stream, every screen, every second.
What you get to do in this role:
Work on extremely high scale RUST web services or backend systems.
Design and develop solutions for highly scalable web and backend systems.
Proactively identify and solve performance issues.
Maintain a high bar on code quality and unit testing.
What you bring to the role:
5+ years of hands-on software development experience.
At least 2+ years of RUST development experience.
Knowledge of cargo packages for kafka, redis etc.
Strong CS fundamentals, including system design, data structures and algorithms.
Expertise in backend and web services development.
Good analytical and troubleshooting skills.
What will help you stand out:
Experience working with large scale web services and applications.
Exposure to Golang, Scala or Java
Exposure to Big data systems like Kafka, Spark, Hadoop etc.
Underpinning the Conviva platform is a rich history of innovation. More than 60 patents represent award-winning technologies and standards, including first-of-its kind-innovations like time-state analytics and AI-automated data modeling, that surfaces actionable insights. By understanding real-world human experiences and having the ability to act within seconds of observation, our customers can solve business-critical issues and focus on growing their business ahead of the competition. Examples of the brands Conviva has helped fuel streaming growth for include: DAZN, Disney+, HBO, Hulu, NBCUniversal, Paramount+, Peacock, Sky, Sling TV, Univision and Warner Bros Discovery.
Privately held, Conviva is headquartered in Silicon Valley, California with offices and people around the globe. For more information, visit us at www.conviva.com. Join us to help extend our leadership position in big data streaming analytics to new audiences and markets!
What you'll do:
Design and development of scalable applications.
Collaborate with tech leads to get maximum understanding of underlying infrastructure.
Contribute to continual improvement by suggesting improvements to the software system.
Ensure high scalability and performance
You will advocate for good, clean, well documented and performing code; follow standards and best practices.
We'd love for you to have:
Education: Bachelor/Master Degree in Computer Science
Experience: 1-3 years of relevant experience in BI/Big-Data with hands-on coding experience
Mandatory Skills
Strong in problem-solving
Good exposure to Big Data technologies, Hive, Hadoop, Impala, Hbase, Kafka, Spark
Strong experience of Data Engineering
Able to comprehend challenges related to Database and Data Warehousing technologies and ability to understand complex design, system architecture
Experience with the software development lifecycle, design, develop, review, debug, document, and deliver (especially in a multi-location organization)
Working knowledge of Java, python
Desired Skills
Experience with reporting tools like Tableau, QlikView
Awareness of CI-CD pipeline
Inclination to work on cloud platform ex:- AWS
Crisp communication skills with team members, Business owners.
Be able to work in a challenging, dynamic environment and meet tight deadlines
- Create and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet business requirements
- Identifying, designing, and implementing internal process improvements including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes
- Work with Data, Analytics & Tech team to extract, arrange and analyze data
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS technologies
- Building analytical tools to utilize the data pipeline, providing actionable insight into key business performance metrics including operational efficiency and customer acquisition
- Works closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.
- Working with stakeholders including data, design, product, and executive teams, and assisting them with data-related technical issues
- Working with stakeholders including the Executive, Product, Data, and Design teams to support their data infrastructure needs while assisting with data-related technical issues.
- SQL
- Ruby or Python(Ruby preferred)
- Apache-Hadoop based analytics
- Data warehousing
- Data architecture
- Schema design
- ML
- Prior experience of 2 to 5 years as a Data Engineer.
- Ability in managing and communicating data warehouse plans to internal teams.
- Experience designing, building, and maintaining data processing systems.
- Ability to perform root cause analysis on external and internal processes and data to identify opportunities for improvement and answer questions.
- Excellent analytic skills associated with working on unstructured datasets.
- Ability to build processes that support data transformation, workload management, data structures, dependency, and metadata.
• Create and maintain data pipeline
• Build and deploy ETL infrastructure for optimal data delivery
• Work with various including product, design and executive team to troubleshoot data
related issues
• Create tools for data analysts and scientists to help them build and optimise the product
• Implement systems and process for data access controls and guarantees
• Distill the knowledge from experts in the field outside the org and optimise internal data
systems
Preferred qualifications/skills:
• 5+ years experience
• Strong analytical skills
____ 04
Freight Commerce Solutions Pvt Ltd.
• Degree in Computer Science, Statistics, Informatics, Information Systems
• Strong project management and organisational skills
• Experience supporting and working with cross-functional teams in a dynamic environment
• SQL guru with hands on experience on various databases
• NoSQL databases like Cassandra, MongoDB
• Experience with Snowflake, Redshift
• Experience with tools like Airflow, Hevo
• Experience with Hadoop, Spark, Kafka, Flink
• Programming experience in Python, Java, Scala
The Platform Data Science team works at the intersection of data science and engineering. Domain experts develop and advance platforms, including the data platforms, machine learning platform, other platforms for Forecasting, Experimentation, Anomaly Detection, Conversational AI, Underwriting of Risk, Portfolio Management, Fraud Detection & Prevention and many more. We also are the Data Science and Analytics partners for Product and provide Behavioural Science insights across Jupiter.
About the role:
We’re looking for strong Software Engineers that can combine EMR, Redshift, Hadoop, Spark, Kafka, Elastic Search, Tensorflow, Pytorch and other technologies to build the next generation Data Platform, ML Platform, Experimentation Platform. If this sounds interesting we’d love to hear from you!
This role will involve designing and developing software products that impact many areas of our business. The individual in this role will have responsibility help define requirements, create software designs, implement code to these specifications, provide thorough unit and integration testing, and support products while deployed and used by our stakeholders.
Key Responsibilities:
Participate, Own & Influence in architecting & designing of systems
Collaborate with other engineers, data scientists, product managers
Build intelligent systems that drive decisions
Build systems that enable us to perform experiments and iterate quickly
Build platforms that enable scientists to train, deploy and monitor models at scale
Build analytical systems that drives better decision making
Required Skills:
Programming experience with at least one modern language such as Java, Scala including object-oriented design
Experience in contributing to the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems
Bachelor’s degree in Computer Science or related field
Computer Science fundamentals in object-oriented design
Computer Science fundamentals in data structures
Computer Science fundamentals in algorithm design, problem solving, and complexity analysis
Experience in databases, analytics, big data systems or business intelligence products:
Data lake, data warehouse, ETL, ML platform
Big data tech like: Hadoop, Apache Spark
-
Responsibilities
- Responsible for implementation and ongoing administration of Hadoop
infrastructure.
- Aligning with the systems engineering team to propose and deploy new
hardware and software environments required for Hadoop and to expand existing
environments.
- Working with data delivery teams to setup new Hadoop users. This job includes
setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig
and MapReduce access for the new users.
- Cluster maintenance as well as creation and removal of nodes using tools like
Ganglia, Nagios, Cloudera Manager Enterprise, Dell Open Manage and other tools
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines
- Screen Hadoop cluster job performances and capacity planning
- Monitor Hadoop cluster connectivity and security
- Manage and review Hadoop log files.
- File system management and monitoring.
- Diligently teaming with the infrastructure, network, database, application and
business intelligence teams to guarantee high data quality and availability
- Collaboration with application teams to install operating system and Hadoop
updates, patches, version upgrades when required.
READ MORE OF THE JOB DESCRIPTION
Qualifications
Qualifications
- Bachelors Degree in Information Technology, Computer Science or other
relevant fields
- General operational expertise such as good troubleshooting skills,
understanding of systems capacity, bottlenecks, basics of memory, CPU, OS,
storage, and networks.
- Hadoop skills like HBase, Hive, Pig, Mahout
- Ability to deploy Hadoop cluster, add and remove nodes, keep track of jobs,
monitor critical parts of the cluster, configure name node high availability, schedule
and configure it and take backups.
- Good knowledge of Linux as Hadoop runs on Linux.
- Familiarity with open source configuration management and deployment tools
such as Puppet or Chef and Linux scripting.
Nice to Have
- Knowledge of Troubleshooting Core Java Applications is a plus.
2. Perform data migration and conversion activities.
3. Develop and integrate software applications using suitable development
methodologies and standards, applying standard architectural patterns, taking
into account critical performance characteristics and security measures.
4. Collaborate with Business Analysts, Architects and Senior Developers to
establish the physical application framework (e.g. libraries, modules, execution
environments).
5. Perform end to end automation of ETL process for various datasets that are
being ingested into the big data platform.
Only a solid grounding in computer engineering, Unix, data structures and algorithms would enable you to meet this challenge. 7+ years of experience architecting, developing, releasing, and maintaining large-scale big data platforms on AWS or GCP Understanding of how Big Data tech and NoSQL stores like MongoDB, HBase/HDFS, ElasticSearch synergize to power applications in analytics, AI and knowledge graphs Understandingof how data processing models, data location patterns, disk IO, network IO, shuffling affect large scale text processing - feature extraction, searching etc Expertise with a variety of data processing systems, including streaming, event, and batch (Spark, Hadoop/MapReduce) 5+ years proficiency in configuring and deploying applications on Linux-based systems 5+ years of experience Spark - especially Pyspark for transforming large non-structured text data, creating highly optimized pipelines Experience with RDBMS, ETL techniques and frameworks (Sqoop, Flume) and big data querying tools (Pig, Hive) Stickler of world class best practices, uncompromising on the quality of engineering, understand standards and reference architectures and deep in Unix philosophy with appreciation of big data design patterns, orthogonal code design and functional computation models |
Good understating or hand's on in Kafka Admin / Apache Kafka Streaming.
Implementing, managing, and administering the overall hadoop infrastructure.
Takes care of the day-to-day running of Hadoop clusters
A hadoop administrator will have to work closely with the database team, network team, BI team, and application teams to make sure that all the big data applications are highly available and performing as expected.
If working with open source Apache Distribution, then hadoop admins have to manually setup all the configurations- Core-Site, HDFS-Site, YARN-Site and Map Red-Site. However, when working with popular hadoop distribution like Hortonworks, Cloudera or MapR the configuration files are setup on startup and the hadoop admin need not configure them manually.
Hadoop admin is responsible for capacity planning and estimating the requirements for lowering or increasing the capacity of the hadoop cluster.
Hadoop admin is also responsible for deciding the size of the hadoop cluster based on the data to be stored in HDFS.
Ensure that the hadoop cluster is up and running all the time.
Monitoring the cluster connectivity and performance.
Manage and review Hadoop log files.
Backup and recovery tasks
Resource and security management
Troubleshooting application errors and ensuring that they do not occur again.
Data Platform engineering at Uber is looking for a strong Technical Lead (Level 5a Engineer) who has built high quality platforms and services that can operate at scale. 5a Engineer at Uber exhibits following qualities:
- Demonstrate tech expertise › Demonstrate technical skills to go very deep or broad in solving classes of problems or creating broadly leverageable solutions.
- Execute large scale projects › Define, plan and execute complex and impactful projects. You communicate the vision to peers and stakeholders.
- Collaborate across teams › Domain resource to engineers outside your team and help them leverage the right solutions. Facilitate technical discussions and drive to a consensus.
- Coach engineers › Coach and mentor less experienced engineers and deeply invest in their learning and success. You give and solicit feedback, both positive and negative, to others you work with to help improve the entire team.
- Tech leadership › Lead the effort to define the best practices in your immediate team, and help the broader organization establish better technical or business processes.
What You’ll Do
- Build a scalable, reliable, operable and performant data analytics platform for Uber’s engineers, data scientists, products and operations teams.
- Work alongside the pioneers of big data systems such as Hive, Yarn, Spark, Presto, Kafka, Flink to build out a highly reliable, performant, easy to use software system for Uber’s planet scale of data.
- Become proficient of multi-tenancy, resource isolation, abuse prevention, self-serve debuggability aspects of a high performant, large scale, service while building these capabilities for Uber's engineers and operation folks.
What You’ll Need
- 7+ years experience in building large scale products, data platforms, distributed systems in a high caliber environment.
- Architecture: Identify and solve major architectural problems by going deep in your field or broad across different teams. Extend, improve, or, when needed, build solutions to address architectural gaps or technical debt.
- Software Engineering/Programming: Create frameworks and abstractions that are reliable and reusable. advanced knowledge of at least one programming language, and are happy to learn more. Our core languages are Java, Python, Go, and Scala.
- Data Engineering: Expertise in one of the big data analytics technologies we currently use such as Apache Hadoop (HDFS and YARN), Apache Hive, Impala, Drill, Spark, Tez, Presto, Calcite, Parquet, Arrow etc. Under the hood experience with similar systems such as Vertica, Apache Impala, Drill, Google Borg, Google BigQuery, Amazon EMR, Amazon RedShift, Docker, Kubernetes, Mesos etc.
- Execution & Results: You tackle large technical projects/problems that are not clearly defined. You anticipate roadblocks and have strategies to de-risk timelines. You orchestrate work that spans multiple teams and keep your stakeholders informed.
- A team player: You believe that you can achieve more on a team that the whole is greater than the sum of its parts. You rely on others’ candid feedback for continuous improvement.
- Business acumen: You understand requirements beyond the written word. Whether you’re working on an API used by other developers, an internal tool consumed by our operation teams, or a feature used by millions of customers, your attention to details leads to a delightful user experience.