Apache Oozie Jobs in Pune

Explore top Apache Oozie Job opportunities in Pune from Top Companies & Startups. All jobs are added by verified employees who can be contacted directly below.
Pune, Bengaluru (Bangalore), Coimbatore, Hyderabad, Gurugram
3 - 10 yrs
₹18L - ₹40L / yr
Apache Kafka
Spark
Hadoop
Apache Hive
Big Data

Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.



You’ll spend time on the following:

  • You will partner with teammates to create complex data processing pipelines in order to solve our clients’ most ambitious challenges
  • You will collaborate with Data Scientists in order to design scalable implementations of their models
  • You will pair to write clean and iterative code based on TDD
  • Leverage various continuous delivery practices to deploy data pipelines
  • Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
  • Develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
  • Create data models and speak to the tradeoffs of different modeling approaches

Here’s what we’re looking for:

 

  • You have a good understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
  • You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (HBase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
  • Hands-on experience with MapR, Cloudera, Hortonworks and/or cloud-based (AWS EMR, Azure HDInsight, Qubole, etc.) Hadoop distributions
  • You are comfortable taking data-driven approaches and applying data security strategy to solve business problems 
  • Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems
  • Strong communication and client-facing skills with the ability to work in a consulting environment

Consulting and implementation services for the Oil & Gas, Mining, and Manufacturing industries

Agency job
via Jobdost by Sathish Kumar
Ahmedabad, Hyderabad, Pune, Delhi
5 - 7 yrs
₹18L - ₹25L / yr
AWS Lambda
AWS Simple Notification Service (SNS)
AWS Simple Queuing Service (SQS)
Python
PySpark
  1. Data Engineer

 Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON

Mandatory Requirements  

  • Experience in AWS Glue
  • Experience in Apache Parquet 
  • Proficient in AWS S3 and data lake 
  • Knowledge of Snowflake
  • Understanding of file-based ingestion best practices.
  • Scripting languages: Python and PySpark

CORE RESPONSIBILITIES 

  • Create and manage cloud resources in AWS 
  • Data ingestion from different data sources that expose data using different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
  • Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
  • Develop automated data quality checks to make sure the right data enters the platform and to verify the results of the calculations
  • Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
  • Define process improvement opportunities to optimize data collection, insights and displays.
  • Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible 
  • Identify and interpret trends and patterns from complex data sets 
  • Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders. 
  • Key participant in regular Scrum ceremonies with the agile teams  
  • Proficient at developing queries, writing reports and presenting findings 
  • Mentor junior members and bring best industry practices 

QUALIFICATIONS 

  • 5-7+ years’ experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
  • Strong background in math, statistics, computer science, data science or related discipline
  • Advanced knowledge of at least one of the following languages: Java, Scala, Python, C#
  • Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
  • Proficient with:
      • Data mining/programming tools (e.g. SAS, SQL, R, Python)
      • Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
      • Data visualization (e.g. Tableau, Looker, MicroStrategy)
  • Comfortable learning about and deploying new technologies and tools. 
  • Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines. 
  • Good written and oral communication skills and ability to present results to non-technical audiences 
  • Knowledge of business intelligence and analytical tools, technologies and techniques.

  

Familiarity and experience in the following is a plus:  

  • AWS certification
  • Spark Streaming 
  • Kafka Streaming / Kafka Connect 
  • ELK Stack 
  • Cassandra / MongoDB 
  • CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
Posted by Indrajeet Deshmukh
Pune
3 - 6 yrs
Best in industry
SQL
Python
JVM
Google Cloud Platform (GCP)
Spark
About DeepIntent:
DeepIntent is a marketing technology company that helps healthcare brands strengthen communication with patients and healthcare professionals by enabling highly effective and performant digital advertising campaigns. Our healthcare technology platform, MarketMatch™, connects advertisers, data providers, and publishers to operate the first unified, programmatic marketplace for healthcare marketers. The platform’s built-in identity solution matches digital IDs with clinical, behavioral, and contextual data in real-time so marketers can qualify 1.6M+ verified HCPs and 225M+ patients to find their most clinically-relevant audiences, and message them on a one-to-one basis in a privacy compliant way. Healthcare marketers use MarketMatch to plan, activate, and measure digital campaigns in ways that best suit their business, from managed service engagements to technical integration or self-service solutions. DeepIntent was founded by Memorial Sloan Kettering alumni in 2016 and acquired by Propel Media, Inc. in 2017. We proudly serve major pharmaceutical and Fortune 500 companies out of our offices in New York, Bosnia and India.

Roles and Responsibilities
  • Establish formal data practice for the organisation.
  • Build & operate scalable and robust data architectures.
  • Create pipelines for the self-service introduction and usage of new data
  • Implement DataOps practices
  • Design, develop, and operate data pipelines that support data scientists and machine learning engineers.
  • Build simple, highly reliable Data storage, ingestion, transformation solutions which are easy to deploy and manage.
  • Collaborate with various business stakeholders, software engineers, machine learning engineers, analysts.
Desired Skills
  • Experience in designing, developing and operating configurable Data pipelines serving high volume and velocity data.
  • Experience working with public clouds like GCP/AWS.
  • Good understanding of software engineering, DataOps, and data architecture, Agile and DevOps methodologies.
  • Experience building Data architectures that optimize performance and cost, whether the components are prepackaged or homegrown
  • Proficient with SQL, Python or a JVM-based language, and Bash.
  • Experience with any of the Apache open source projects such as Spark, Druid, Beam, Airflow, etc., and big data databases like BigQuery, ClickHouse, etc.
  • Good communication skills with the ability to collaborate with both technical and non-technical people.
  • Ability to Think Big, take bets and innovate, Dive Deep, Bias for Action, Hire and Develop the Best, Learn and be Curious.

 

Pune
6 - 10 yrs
Best in industry
Machine Learning (ML)
Data Science
Natural Language Processing (NLP)
Python
SQL
  • 5+ years of professional experience in experiment design and applied machine learning predicting outcomes in large-scale, complex datasets.
  • Proficiency in Python, Azure ML, or other statistics/ML tools.
  • Proficiency in deep neural networks and Python-based frameworks.
  • Proficiency in Azure Databricks, Hive, Spark.
  • Proficiency in deploying models into production (Azure stack).
  • Moderate coding skills. SQL or similar required. C# or other languages strongly preferred.
  • Outstanding communication and collaboration skills. You can learn from and teach others.
  • Strong drive for results. You have a proven record of shepherding experiments to create successful shipping products/services.
  • Experience with prediction in adversarial (energy) environments highly desirable.
  • Understanding of the model development ecosystem across platforms, including development, distribution, and best practices, highly desirable.
  • A Master's or Ph.D. degree with coursework in Statistics, Data Science, Experimentation Design, and Machine Learning highly desirable


In-person interview: Saturday, 24th Sept, Pune office


contract intelligence platform

Agency job
via wrackle by Naveen Taalanki
Pune
12 - 20 yrs
₹50L - ₹100L / yr
Data Science
Natural Language Processing (NLP)
Machine Learning (ML)
Algorithms
Python
Responsibilities
  • Partner with business stakeholders to translate business objectives into clearly defined analytical projects.
  • Identify opportunities for text analytics and NLP to enhance the core product platform, select the best machine learning techniques for the specific business problem and then build the models that solve the problem.
  • Own the end-to-end process, from recognizing the problem to implementing the solution.
  • Define the variables and their inter-relationships and extract the data from our data repositories, leveraging infrastructure including Cloud computing solutions and relational database environments.
  • Build predictive models that are accurate and robust and that help our customers to utilize the core platform to the maximum extent.

Skills and Qualification
  • 12 to 15 yrs of experience.
  • An advanced degree in predictive analytics, machine learning, or artificial intelligence; or a degree in programming and significant experience with text analytics/NLP. The candidate should have a strong background in machine learning (unsupervised and supervised techniques), in particular an excellent understanding of machine learning techniques and algorithms such as k-NN, Naive Bayes, SVM, Decision Forests, logistic regression, MLPs, RNNs, etc.
  • Experience with text mining, parsing, and classification using state-of-the-art techniques.
  • Experience with information retrieval, Natural Language Processing, Natural Language Understanding, and Neural Language Modeling.
  • Ability to evaluate the quality of ML models and to define the right performance metrics for models in accordance with the requirements of the core platform.
  • Experience in the Python data science ecosystem: Pandas, NumPy, SciPy, scikit-learn, NLTK, Gensim, etc.
  • Excellent verbal and written communication skills, particularly possessing the ability to share technical results and recommendations to both technical and non-technical audiences.
  • Ability to perform high-level work both independently and collaboratively as a project member or leader on multiple projects.
Posted by Ketaki Kambale
Remote, Pune
3 - 9 yrs
₹5L - ₹20L / yr
Data Warehouse (DWH)
Informatica
ETL
Python
SQL

We’re hiring a talented Data Engineer and Big Data enthusiast to work on our platform and help ensure that our data quality is flawless. As a company, we have millions of new data points coming into our system every day. You will be working with a passionate team of engineers to solve challenging problems and ensure that we can deliver the best data to our customers, on time. You will be using the latest cloud data warehouse technology to build robust and reliable data pipelines.

Duties/Responsibilities Include:

  • Develop expertise in the different upstream data stores and systems across Numerator.
  • Design, develop and maintain data integration pipelines for Numerator's growing data sets and product offerings.
  • Build testing and QA plans for data pipelines.
  • Build data validation testing frameworks to ensure high data quality and integrity.
  • Write and maintain documentation on data pipelines and schemas
 

Requirements:

  • BS or MS in Computer Science or related field of study
  • 3+ years of experience in the data warehouse space
  • Expert in SQL, including advanced analytical queries
  • Proficiency in Python (data structures, algorithms, object-oriented programming, using APIs)
  • Experience working with a cloud data warehouse (Redshift, Snowflake, Vertica)
  • Experience with a data pipeline scheduling framework (Airflow)
  • Experience with schema design and data modeling

Exceptional candidates will have:

  • Amazon Web Services (EC2, DMS, RDS) experience
  • Terraform and/or Ansible (or similar) for infrastructure deployment
  • Airflow: experience building and monitoring DAGs, developing custom operators, and using script templating solutions.
  • Experience supporting production systems in an on-call environment
Posted by Alex P
Gurugram, Bengaluru (Bangalore), Pune
2 - 15 yrs
₹10L - ₹35L / yr
Scala
PySpark
Data engineering
Big Data
Hadoop

Data Engineer – SQL, RDBMS, PySpark/Scala, Python, Hive, Hadoop, Unix

 

Data engineering services required:

  • Build data products and processes alongside the core engineering and technology team;
  • Collaborate with senior data scientists to curate, wrangle, and prepare data for use in their advanced analytical models;
  • Integrate data from a variety of sources, assuring that they adhere to data quality and accessibility standards;
  • Modify and improve data engineering processes to handle ever larger, more complex, and more types of data sources and pipelines;
  • Use Hadoop architecture and HDFS commands to design and optimize data queries at scale;
  • Evaluate and experiment with novel data engineering tools and advise information technology leads and partners about new capabilities to determine optimal solutions for particular technical problems or designated use cases.

 

Big data engineering skills:

  • Demonstrated ability to perform the engineering necessary to acquire, ingest, cleanse, integrate, and structure massive volumes of data from multiple sources and systems into enterprise analytics platforms;
  • Proven ability to design and optimize queries to build scalable, modular, efficient data pipelines;
  • Ability to work across structured, semi-structured, and unstructured data, extracting information and identifying linkages across disparate data sets;
  • Proven experience delivering production-ready data engineering solutions, including requirements definition, architecture selection, prototype development, debugging, unit-testing, deployment, support, and maintenance;
  • Ability to operate with a variety of data engineering tools and technologies
Pune
3 - 8 yrs
₹3L - ₹20L / yr
Intelligence
Artificial Intelligence (AI)
Deep Learning
Machine Learning (ML)
Data extraction
Responsibilities
● Frame ML / AI use cases that can improve the company’s product
● Implement and develop ML / AI / data-driven, rule-based algorithms as software items
● For example, building a chatbot that answers from the relevant FAQ, and reinforcing the system with a feedback loop so that the bot improves
Must have skills:
● Data extraction and ETL
● Python (NumPy, pandas, comfortable with OOP)
● Django
● Knowledge of basic Machine Learning / Deep Learning / AI algorithms and ability to implement them
● Good understanding of SDLC
● Experience deploying an ML / AI model in a mobile / web product
● Soft skills: strong communication skills and critical thinking ability

Good to have:
● Full stack development experience
Required Qualification:
B.Tech. / B.E. degree in Computer Science or equivalent software engineering
Agency job
via Response Informatics by Anupama Lavanya Uppala
Chennai, Bengaluru (Bangalore), Mumbai, Hyderabad, Pune
3 - 10 yrs
₹10L - ₹25L / yr
PySpark
Python
  • Minimum 1 year of relevant experience in PySpark (mandatory)
  • Hands-on experience developing, testing, deploying, maintaining, and improving data integration pipelines in an AWS cloud environment is an added plus
  • Ability to play a lead role and independently manage a PySpark development team of 3-5 members
  • EMR, Python, and PySpark are mandatory.
  • Knowledge of and experience working with AWS Cloud technologies such as Apache Spark, Glue, Kafka, Kinesis, and Lambda, along with S3, Redshift, and RDS

Cloud infrastructure solutions and support company. (SE1)

Agency job
via Multi Recruit by Ranjini A R
Pune
2 - 6 yrs
₹12L - ₹16L / yr
SQL
ETL
Data engineering
Big Data
Java
  • Design, create, test, and maintain data pipeline architecture in collaboration with the Data Architect.
  • Build the infrastructure required for extraction, transformation, and loading of data from a wide variety of data sources using Java, SQL, and Big Data technologies.
  • Support the translation of data needs into technical system requirements. Support in building complex queries required by the product teams.
  • Build data pipelines that clean, transform, and aggregate data from disparate sources
  • Develop, maintain and optimize ETLs to increase data accuracy, data stability, data availability, and pipeline performance.
  • Engage with Product Management and Business to deploy and monitor products/services on cloud platforms.
  • Stay up-to-date with advances in data persistence and big data technologies and run pilots to design the data architecture to scale with the increased data sets of consumer experience.
  • Handle data integration, consolidation, and reconciliation activities for digital consumer / medical products.

Job Qualifications:

  • Bachelor’s or master's degree in Computer Science, Information management, Statistics or related field
  • 5+ years of experience in the Consumer or Healthcare industry in an analytical role with a focus on building data pipelines, querying data, analyzing, and clearly presenting analyses to members of the data science team.
  • Technical expertise with data models and data mining.
  • Hands-on knowledge of programming languages such as Java, Python, R, and Scala.
  • Strong knowledge of big data tools such as Snowflake, AWS Redshift, Hadoop, MapReduce, etc.
  • Knowledge of tools like AWS Glue, S3, AWS EMR, streaming data pipelines, and Kafka/Kinesis is desirable.
  • Hands-on knowledge of SQL and NoSQL database design.
  • Knowledge of CI/CD for building and hosting the solutions.
  • AWS certification is an added advantage.
  • Strong knowledge of visualization tools like Tableau and QlikView is an added advantage
  • A team player capable of working and integrating across cross-functional teams for implementing project requirements. Experience in technical requirements gathering and documentation.
  • Ability to work effectively and independently in a fast-paced agile environment with tight deadlines
  • A flexible, pragmatic, and collaborative team player with the innate ability to engage with data architects, analysts, and scientists
Posted by Sandeep Chaudhary
Pune
2 - 5 yrs
₹1L - ₹18L / yr
Hadoop
Spark
Apache Hive
Apache Flume
Java
Description
  • Deep experience with and understanding of Apache Hadoop and surrounding technologies required; experience with Spark, Impala, Hive, Flume, Parquet, and MapReduce.
  • Strong understanding of development languages, including Java, Python, Scala, and shell scripting.
  • Expertise in Apache Spark 2.x framework principles and usage.
  • Should be proficient in developing Spark batch and streaming jobs in Python, Scala, or Java.
  • Should have proven experience in performance tuning of Spark applications from both application code and configuration perspectives.
  • Should be proficient in Kafka and its integration with Spark.
  • Should be proficient in Spark SQL and data warehousing techniques using Hive.
  • Should be very proficient in Unix shell scripting and in operating on Linux.
  • Should have knowledge of any cloud-based infrastructure.
  • Good experience in tuning Spark applications and improving performance.
  • Strong understanding of data profiling concepts and ability to operationalize analyses into design and development activities.
  • Experience with software development best practices: version control systems, automated builds, etc.
  • Experienced in and able to lead the following phases of the software development life cycle on any project: feasibility planning, analysis, development, integration, test, and implementation.
  • Capable of working within a team or as an individual.
  • Experience creating technical documentation.
Why apply via Cutshort?
Connect with actual hiring teams and get their fast response. No spam.