Apache Oozie Jobs in Pune

Explore top Apache Oozie job opportunities in Pune at top companies and startups. All jobs are added by verified employees who can be contacted directly.

at Thoughtworks

1 video
36 recruiters
Posted by sabarinath konath
Pune, Bengaluru (Bangalore), Coimbatore, Hyderabad, Gurugram
3 - 10 yrs
₹18L - ₹40L / yr
Apache Kafka
Spark
Hadoop
Apache Hive
Big Data
+5 more

Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.



You’ll spend time on the following:

  • You will partner with teammates to create complex data processing pipelines in order to solve our clients’ most ambitious challenges
  • You will collaborate with Data Scientists in order to design scalable implementations of their models
  • You will pair to write clean and iterative code based on TDD
  • Leverage various continuous delivery practices to deploy data pipelines
  • Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
  • Develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
  • Create data models and speak to the tradeoffs of different modeling approaches

Here’s what we’re looking for:

 

  • You have a good understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
  • You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (HBase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
  • Hands-on experience with MapR, Cloudera, Hortonworks and/or cloud-based Hadoop distributions (AWS EMR, Azure HDInsight, Qubole, etc.)
  • You are comfortable taking data-driven approaches and applying data security strategy to solve business problems 
  • Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems
  • Strong communication and client-facing skills with the ability to work in a consulting environment
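The TDD practice mentioned in the responsibilities above can be sketched in miniature. This is a generic illustration, not part of the posting; `dedupe_events` is a hypothetical pipeline transform, with its test written alongside it:

```python
# Minimal TDD-style sketch of a pipeline transform (hypothetical example):
# keep only the latest record per event id, a common data-pipeline step.
from datetime import datetime

def dedupe_events(events):
    """Return one record per event id, keeping the newest by timestamp."""
    latest = {}
    for event in events:
        eid = event["id"]
        # Replace an earlier record if this one has a newer timestamp.
        if eid not in latest or event["ts"] > latest[eid]["ts"]:
            latest[eid] = event
    return sorted(latest.values(), key=lambda e: e["id"])

# The test, written first in TDD fashion:
events = [
    {"id": 1, "ts": datetime(2022, 1, 1), "value": "old"},
    {"id": 1, "ts": datetime(2022, 1, 2), "value": "new"},
    {"id": 2, "ts": datetime(2022, 1, 1), "value": "only"},
]
result = dedupe_events(events)
assert [e["value"] for e in result] == ["new", "only"]
```

In a real engagement the same red-green loop would apply to Spark or Airflow task logic; the pure-Python core shown here is simply the easiest unit to test.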

contract intelligence platform

Agency job
via wrackle by Naveen Taalanki
Pune
12 - 20 yrs
₹50L - ₹100L / yr
Data Science
Natural Language Processing (NLP)
Machine Learning (ML)
Algorithms
Python
+5 more
Responsibilities
  • Partner with business stakeholders to translate business objectives into clearly defined analytical projects.
  • Identify opportunities for text analytics and NLP to enhance the core product platform, select the best machine learning techniques for the specific business problem and then build the models that solve the problem.
  • Own the end-to-end process, from recognizing the problem to implementing the solution.
  • Define the variables and their inter-relationships and extract the data from our data repositories, leveraging infrastructure including Cloud computing solutions and relational database environments.
  • Build predictive models that are accurate and robust and that help our customers to utilize the core platform to the maximum extent.

Skills and Qualification
  • 12 to 15 yrs of experience.
  • An advanced degree in predictive analytics, machine learning, or artificial intelligence; or a degree in programming and significant experience with text analytics/NLP. You should have a strong background in machine learning (unsupervised and supervised techniques), and in particular an excellent understanding of machine learning techniques and algorithms such as k-NN, Naive Bayes, SVM, decision forests, logistic regression, MLPs, RNNs, etc.
  • Experience with text mining, parsing, and classification using state-of-the-art techniques.
  • Experience with information retrieval, Natural Language Processing, Natural Language Understanding, and neural language modeling.
  • Ability to evaluate the quality of ML models and to define the right performance metrics for models in accordance with the requirements of the core platform.
  • Experience in the Python data science ecosystem: Pandas, NumPy, SciPy, scikit-learn, NLTK, Gensim, etc.
  • Excellent verbal and written communication skills, particularly the ability to present technical results and recommendations to both technical and non-technical audiences.
  • Ability to perform high-level work both independently and collaboratively as a project member or leader on multiple projects.
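The model-evaluation skill listed above — defining the right performance metrics — can be illustrated with a generic sketch (not part of the posting). Precision, recall, and F1 for a binary classifier follow directly from the prediction counts:

```python
# Generic sketch of binary-classification metrics (precision, recall, F1),
# the kind of evaluation this role would define for NLP/ML models.
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Two of three predicted positives are correct, and two of three true
# positives are recovered, so precision = recall = f1 = 2/3 here.
p, r, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Choosing between these metrics (or ranking metrics, calibration, etc.) depends on the business cost of false positives versus false negatives — exactly the judgment the role calls for.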

at Amber

1 recruiter
Posted by Aarti Sharma
Pune
2 - 3 yrs
₹15L - ₹17L / yr
Python
Amazon Web Services (AWS)
Big Data
ETL
Java
+9 more

About Amber (https://amberstudent.com)
Amber is a long-term accommodation booking platform for students (think booking.com for student housing). Amber helps 80M students worldwide find and book full-time accommodation near their universities, without the hassle of negotiation, non-standardized and cumbersome paperwork, and a broken payment process.

We are the leading student housing platform globally, with 1M+ student housing units listed in 6 countries and across 80 cities.

We are growing rapidly and targeting $400M in annual gross bookings value by 2022.
If you are passionate about making international mobility and living seamless and accessible, then join us in building the future of student housing!
We are amongst the fastest-growing companies in Asia-Pacific as per the Financial Times (https://www.ft.com/high-growth-asia-pacific-ranking-2022).

 

Responsibilities
  • In charge of converting raw data into usable information for analytics and business decision-making
  • Setting up accurate data pipelines to structure the data and optimize cost
  • Create and maintain optimal data pipeline architecture
  • Assemble large, complex data sets that meet functional / non-functional business requirements.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability, etc.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS technologies.
  • Work with stakeholders including the Executive, Product, Analytics and Design teams to assist with data-related technical issues and support their data infrastructure needs.

 

Requirements
  • Minimum 2 years of previous experience as a data engineer or in a similar role.
  • Technical expertise in data models, data mining, and segmentation techniques.
  • Knowledge of and hands-on experience with programming languages (e.g., Java, Python, and Scala).
  • Hands-on experience with SQL database design and AWS Lambda functions.
  • Experience with big data tools: Spark and Kafka.
  • Experience with AWS cloud services: Redshift and S3.
  • Experience with ETL frameworks like AWS Glue.
  • Experience in designing data warehousing and streaming processes.

 

What will you get from Amber:
  • Fast-paced growth (can skip intermediate levels)
  • Total freedom and authority (everything under you, just get the job done!)
  • Open and Inclusive Environment
  • Great Compensation (and ESOPs)

at CoffeeBeans Consulting

7 recruiters
Posted by Nelson Xavier
Bengaluru (Bangalore), Pune, Hyderabad
4 - 8 yrs
₹10L - ₹25L / yr
Spark
Hadoop
Big Data
Data engineering
PySpark
+4 more

Job responsibilities

- You will partner with teammates to create complex data processing pipelines in order to solve our clients' most complex challenges

- You will pair to write clean and iterative code based on TDD

- Leverage various continuous delivery practices to deploy, support and operate data pipelines

- Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available

- Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions

- Create data models and speak to the tradeoffs of different modeling approaches

- Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process

- Encouraging open communication and advocating for shared outcomes

 

Technical skills

- You have a good understanding of data modelling and experience with data engineering tools and platforms such as Spark (Scala) and Hadoop

- You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (HBase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting

- Hands-on experience with MapR, Cloudera, Hortonworks and/or cloud-based Hadoop distributions (AWS EMR, Azure HDInsight, Qubole, etc.)

- You are comfortable taking data-driven approaches and applying data security strategy to solve business problems

- Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems

- You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments

 



Professional skills

- You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives

- An interest in coaching, sharing your experience and knowledge with teammates

- You enjoy influencing others and always advocate for technical excellence while being open to change when needed

- Presence in the external tech community: you willingly share your expertise with others via speaking engagements, contributions to open source, blogs and more

Posted by Tanu Chauhan
Pune, Mumbai
2 - 8 yrs
₹5L - ₹15L / yr
Data Science
Machine Learning (ML)
Python
Tableau
R Programming
+1 more

Specialism: Advanced Analytics, Data Science, regression, forecasting, analytics, SQL, R, Python, decision trees, random forests, SAS, clustering, classification

Senior Analytics Consultant- Responsibilities

  • Understand the business problem and requirements by building domain knowledge, and translate them into a data science problem
  • Conceptualize and design a cutting-edge data science solution to solve the problem, applying design-thinking concepts
  • Identify the right algorithms, tech stack, and sample outputs required to efficiently address the end need
  • Prototype and experiment with the solution to successfully demonstrate its value
  • Independently, or with support from the team, execute the conceptualized solution as per plan, following project management guidelines
  • Present the results to internal and client stakeholders in an easy-to-understand manner with strong storytelling, storyboarding, insights, and visualization
  • Help build overall data science capability for eClerx through support in pilots, pre-sales pitches, product development, and practice development initiatives

at TIGI HR Solution Pvt. Ltd.

1 video
31 recruiters
Posted by Rutu Lakhani
Mumbai, Bengaluru (Bangalore), Pune, Hyderabad, Noida
2 - 5 yrs
₹10L - ₹17L / yr
Data engineering
Hadoop
Big Data
Python
SQL
+2 more
Position: Data Engineer
Employee strength: around 600 across India
Working days: 5 days
Working time: Flexible
Salary: 30-40% hike on current CTC
Work from home for now.
 
Job description:
  • Design, implement and support an analytical data infrastructure, providing ad hoc access to large data sets and computing power.
  • Contribute to development of standards and the design and implementation of proactive processes to collect and report data and statistics on assigned systems.
  • Research opportunities for data acquisition and new uses for existing data.
  • Provide technical development expertise for designing, coding, testing, debugging, documenting and supporting data solutions.
  • Experience building data pipelines to connect analytics stacks, client data visualization tools and external data sources.
  • Experience with cloud and distributed systems principles
  • Experience with Azure/AWS/GCP cloud infrastructure
  • Experience with Databricks Clusters and Configuration
  • Experience with Python, R, sh/bash and JVM-based languages including Scala and Java.
  • Experience with Hadoop family languages including Pig and Hive.

at Datametica Solutions Private Limited

1 video
7 recruiters
Posted by Nikita Aher
Pune, Hyderabad
7 - 12 yrs
₹12L - ₹33L / yr
Big Data
Hadoop
Spark
Apache Spark
Apache Hive
+3 more

Job description

Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)

Primary Location : India-Pune, Hyderabad

Experience : 7 - 12 Years

Management Level: 7

Joining Time: Immediate Joiners are preferred


  • Attend requirements gathering workshops, estimation discussions, design meetings and status review meetings
  • Experience in solution design and solution architecture for the data engineering model, to build and implement Big Data projects on-premises and on cloud
  • Align architecture with business requirements and stabilize the developed solution
  • Ability to build prototypes to demonstrate the technical feasibility of your vision
  • Professional experience facilitating and leading solution design, architecture and delivery planning activities for data-intensive and high-throughput platforms and applications
  • Able to benchmark systems, analyze system bottlenecks and propose solutions to eliminate them
  • Able to help programmers and project managers in the design, planning and governance of implementation projects of any kind
  • Develop, construct, test and maintain architectures, and run sprints for the development and rollout of functionalities
  • Data analysis and code development experience, ideally in Big Data: Spark, Hive, Hadoop, Java, Python, PySpark
  • Execute projects of various types, i.e. design, development, implementation and migration of functional analytics models/business logic across architecture approaches
  • Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions for the product
  • Deploy sophisticated analytics programs using cloud applications


Perks and Benefits we Provide!


  • Working with Highly Technical and Passionate, mission-driven people
  • Subsidized Meals & Snacks
  • Flexible Schedule
  • Approachable leadership
  • Access to various learning tools and programs
  • Pet Friendly
  • Certification Reimbursement Policy
  • Check out more about us on our website below!

www.datametica.com

Agency job
via Technogen India Pvt.Ltd by RAHUL BATTA
NCR (Delhi | Gurgaon | Noida), Bengaluru (Bangalore), Mumbai, Pune
7 - 8 yrs
₹15L - ₹16L / yr
Data steward
MDM
Tamr
Reltio
Data engineering
+7 more
  1. Data Steward:

The Data Steward will collaborate and work closely within the group software engineering and business division. The Data Steward has overall accountability for the group's/division's overall data and reporting posture by responsibly managing data assets, data lineage, and data access, supporting sound data analysis. This role requires focus on data strategy, execution, and support for projects, programs, application enhancements, and production data fixes. The Data Steward makes well-thought-out decisions on complex or ambiguous data issues and establishes the data stewardship and information management strategy and direction for the group, and effectively communicates with individuals at various levels of the technical and business communities. This individual will become part of the corporate Data Quality and Data Management/entity resolution team supporting various systems across the board.

 

Primary Responsibilities:

 

  • Responsible for data quality and data accuracy across all group/division delivery initiatives.
  • Responsible for data analysis, data profiling, data modeling, and data mapping capabilities.
  • Responsible for reviewing and governing data queries and DML.
  • Accountable for the assessment, delivery, quality, accuracy, and tracking of any production data fixes.
  • Accountable for the performance, quality, and alignment to requirements for all data query design and development.
  • Responsible for defining standards and best practices for data analysis, modeling, and queries.
  • Responsible for understanding end-to-end data flows and identifying data dependencies in support of delivery, release, and change management.
  • Responsible for the development and maintenance of an enterprise data dictionary that is aligned to data assets and the business glossary for the group.
  • Responsible for the definition and maintenance of the group's data landscape, including overlays with the technology landscape, end-to-end data flow/transformations, and data lineage.
  • Responsible for rationalizing the group's reporting posture through the definition and maintenance of a reporting strategy and roadmap.
  • Partners with the data governance team to ensure data solutions adhere to the organization’s data principles and guidelines.
  • Owns group's data assets including reports, data warehouse, etc.
  • Understand customer business use cases and be able to translate them to technical specifications and vision on how to implement a solution.
  • Accountable for defining the performance tuning needs for all group data assets and managing the implementation of those requirements within the context of group initiatives as well as steady-state production.
  • Partners with others in test data management and masking strategies and the creation of a reusable test data repository.
  • Responsible for solving data-related issues and communicating resolutions with other solution domains.
  • Actively and consistently support all efforts to simplify and enhance the Clinical Trial Prediction use cases.
  • Apply knowledge in analytic and statistical algorithms to help customers explore methods to improve their business.
  • Contribute toward analytical research projects through all stages including concept formulation, determination of appropriate statistical methodology, data manipulation, research evaluation, and final research report.
  • Visualize and report data findings creatively in a variety of visual formats that appropriately provide insight to the stakeholders.
  • Achieve defined project goals within customer deadlines; proactively communicate status and escalate issues as needed.

 

Additional Responsibilities:

 

  • Strong understanding of the Software Development Life Cycle (SDLC) with Agile Methodologies
  • Knowledge and understanding of industry-standard/best practices requirements gathering methodologies.
  • Knowledge and understanding of Information Technology systems and software development.
  • Experience with data modeling and test data management tools.
  • Experience in data integration projects.
  • Good problem-solving & decision-making skills.
  • Good communication skills within the team, site, and with the customer

 

Knowledge, Skills and Abilities

 

  • Technical expertise in data architecture principles and design aspects of various DBMS and reporting concepts.
  • Solid understanding of key DBMS platforms like SQL Server, Azure SQL
  • Results-oriented, diligent, and works with a sense of urgency. Assertive, responsible for his/her own work (self-directed), have a strong affinity for defining work in deliverables, and be willing to commit to deadlines.
  • Experience in MDM tools like MS DQ, SAS DM Studio, Tamr, Profisee, Reltio etc.
  • Experience in Report and Dashboard development
  • Statistical and Machine Learning models
  • Python (sklearn, numpy, pandas, gensim)
  • Nice to Have:
  • 1yr of ETL experience
  • Natural Language Processing
  • Neural networks and Deep learning
  • Experience in the Keras, TensorFlow, spaCy, NLTK, and LightGBM Python libraries

 

Interaction :  Frequently interacts with subordinate supervisors.

Education : Bachelor’s degree, preferably in Computer Science, B.E or other quantitative field related to the area of assignment. Professional certification related to the area of assignment may be required

Experience :  7 years of Pharmaceutical /Biotech/life sciences experience, 5 years of Clinical Trials experience and knowledge, Excellent Documentation, Communication, and Presentation Skills including PowerPoint

 


5-year-old AI startup

Agency job
Pune
2 - 6 yrs
₹12L - ₹18L / yr
Data Science
Machine Learning (ML)
Python
Natural Language Processing (NLP)
Deep Learning
  • 3+ years of experience in Machine Learning
  • Bachelor's/Master's in Computer Engineering/Science.
  • Bachelor's/Master's in Engineering/Mathematics/Statistics with sound knowledge of programming and computer concepts.
  • 10th and 12th academics: 70% and above.

Skills :
 - Strong Python/programming skills
 - Good conceptual understanding of Machine Learning/Deep Learning/Natural Language Processing
 - Strong verbal and written communication skills.
 - Should be able to manage a team, meet project deadlines and interface with clients.
 - Should be able to work across different domains, quickly ramp up on business processes and flows, and translate business problems into data solutions


at IQVIA

6 recruiters
Posted by Nishigandha Wagh
Pune
3 - 6 yrs
₹5L - ₹15L / yr
Data Warehouse (DWH)
Business Intelligence (BI)
Amazon Web Services (AWS)
SQL
MDM
+1 more
Consultants will have the opportunity to :
- Build a team with skills in ETL, reporting, MDM and ad-hoc analytics support
- Build technical solutions using latest open source and cloud based technologies
- Work closely with the offshore senior consultant, onshore team, and client's business and IT teams to gather project requirements
- Assist overall project execution from India - starting from project planning, team formation, system design and development, testing, UAT and deployment
- Build demos and POCs in support of business development for new and existing clients
- Prepare project documents and PowerPoint presentations for client communication
- Conduct training sessions to train associates and help shape their growth

at Saama Technologies

6 recruiters
Posted by Sandeep Chaudhary
Pune
4 - 8 yrs
₹1L - ₹16L / yr
Data Science
Python
Machine Learning (ML)
Natural Language Processing (NLP)
Big Data
+2 more
Description:
  • Must have 4+ years of direct hands-on experience building complex Data Science solutions
  • Must have fundamental knowledge of inferential statistics
  • Should have worked on predictive modelling using Python/R
  • Experience should include: file I/O, data harmonization, data exploration, machine learning techniques (supervised, unsupervised), multi-dimensional array processing, deep learning, NLP, and image processing
  • Prior experience in the healthcare domain is a plus
  • Experience using Big Data is a plus
  • Should have excellent analytical and problem-solving ability, and be able to grasp new concepts quickly
  • Should be familiar with the Agile project management methodology
  • Should have excellent written and verbal communication skills
  • Should be a team player with an open mind