20+ Hadoop Jobs in Mumbai | Hadoop Job openings in Mumbai
Apply to 20+ Hadoop Jobs in Mumbai on CutShort.io. Explore the latest Hadoop Job opportunities across top companies like Google, Amazon & Adobe.
Experience: 12-15 Years
Key Responsibilities:
- Client Engagement & Requirements Gathering: Independently engage with client stakeholders to understand data landscapes and requirements, translating them into functional and technical specifications.
- Data Architecture & Solution Design: Architect and implement Hadoop-based Cloudera CDP solutions, including data integration, data warehousing, and data lakes.
- Data Processes & Governance: Develop data ingestion and ETL/ELT frameworks, ensuring robust data governance and quality practices.
- Performance Optimization: Provide SQL expertise and optimize Hadoop ecosystems (HDFS, Ozone, Kudu, Spark Streaming, etc.) for maximum performance.
- Coding & Development: Hands-on coding in relevant technologies and frameworks, ensuring project deliverables meet stringent quality and performance standards.
- API & Database Management: Integrate APIs and manage databases (e.g., PostgreSQL, Oracle) to support seamless data flows.
- Leadership & Mentoring: Guide and mentor a team of data engineers and analysts, fostering collaboration and technical excellence.
Skills Required:
a. Technical Proficiency:
- Extensive experience with Hadoop ecosystem tools and services (HDFS, YARN, Cloudera Manager, Impala, Kudu, Hive, Spark Streaming, etc.).
- Proficiency in Spark and in programming languages such as Python and Scala, plus a strong grasp of SQL performance tuning.
- ETL tool expertise (e.g., Informatica, Talend, Apache NiFi) and data modelling knowledge.
- API integration skills for effective data flow management.
b. Project Management & Communication:
- Proven ability to lead large-scale data projects and manage project timelines.
- Excellent communication, presentation, and critical thinking skills.
c. Client & Team Leadership:
- Engage effectively with clients and partners, leading onsite and offshore teams.
About the company
DCB Bank is a new generation private sector bank with 442 branches across India. It is a scheduled commercial bank regulated by the Reserve Bank of India. DCB Bank’s business segments are Retail banking, Micro SME, SME, mid-Corporate, Agriculture, Government, Public Sector, Indian Banks, Co-operative Banks and Non-Banking Finance Companies.
Job Description
Department: Risk Analytics
CTC: Max 18 Lacs
Grade: Sr Manager/AVP
Experience: Min 4 years of relevant experience
We are looking for a Data Scientist to join our growing team of Data Science experts and manage the processes and people responsible for accurate data collection, processing, modelling, analysis, implementation, and maintenance.
Responsibilities
- Understand, monitor and maintain existing financial scorecards (ML Based) and make changes to the model when required.
- Perform Statistical analysis in R and assist IT team with deployment of ML model and analytical frameworks in Python.
- Should be able to handle multiple tasks and must know how to prioritize the work.
- Lead cross-functional projects using advanced data modelling and analysis techniques to discover insights that will guide strategic decisions and uncover optimization opportunities.
- Develop clear, concise and actionable solutions and recommendations for the client’s business needs, and actively explore the client’s business to formulate ideas that help cut costs efficiently or reach growth, revenue or profitability targets faster.
- Build, develop and maintain data models, reporting systems, data automation systems, dashboards and performance metrics that support key business decisions.
- Design and build technical processes to address business issues.
- Oversee the design and delivery of reports and insights that analyse business functions and key operations and performance metrics.
- Manage and optimize processes for data intake, validation, mining, and engineering as well as modelling, visualization, and communication deliverables.
- Communicate results and business impacts of insight initiatives to the Management of the company.
Requirements
- Industry knowledge
- 4 years or more of experience in the financial services industry, particularly retail credit, is a must.
- Candidate should have worked either in the banking sector (banks/ HFC/ NBFC) or in consulting organizations serving these clients.
- Experience in credit risk model building such as application scorecards, behaviour scorecards, and/ or collection scorecards.
- Experience in portfolio monitoring, model monitoring, model calibration
- Knowledge of ECL/ Basel preferred.
- Educational qualification: Advanced degree in finance, mathematics, econometrics, or engineering.
- Technical knowledge: Strong data handling skills in databases such as SQL and Hadoop. Knowledge of data visualization tools, such as SAS VI/Tableau/PowerBI, is preferred.
- Expertise in either R or Python; SAS knowledge will be a plus.
Soft skills:
- Ability to quickly adapt to the analytical tools and development approaches used within DCB Bank
- Ability to multi-task, with good communication and teamwork skills.
- Ability to manage day-to-day written and verbal communication with relevant stakeholders.
- Ability to think strategically and make changes to data when required.
Key Responsibilities:
• Install, configure, and maintain Hadoop clusters.
• Monitor cluster performance and ensure high availability.
• Manage Hadoop ecosystem components (HDFS, YARN, Ozone, Spark, Kudu, Hive).
• Perform routine cluster maintenance and troubleshooting.
• Implement and manage security and data governance.
• Monitor systems health and optimize performance.
• Collaborate with cross-functional teams to support big data applications.
• Perform Linux administration tasks and manage system configurations.
• Ensure data integrity and backup procedures.
- Architectural Leadership:
- Design and architect robust, scalable, and high-performance Hadoop solutions.
- Define and implement data architecture strategies, standards, and processes.
- Collaborate with senior leadership to align data strategies with business goals.
- Technical Expertise:
- Develop and maintain complex data processing systems using Hadoop and its ecosystem (HDFS, YARN, MapReduce, Hive, HBase, Pig, etc.).
- Ensure optimal performance and scalability of Hadoop clusters.
- Oversee the integration of Hadoop solutions with existing data systems and third-party applications.
- Strategic Planning:
- Develop long-term plans for data architecture, considering emerging technologies and future trends.
- Evaluate and recommend new technologies and tools to enhance the Hadoop ecosystem.
- Lead the adoption of big data best practices and methodologies.
- Team Leadership and Collaboration:
- Mentor and guide data engineers and developers, fostering a culture of continuous improvement.
- Work closely with data scientists, analysts, and other stakeholders to understand requirements and deliver high-quality solutions.
- Ensure effective communication and collaboration across all teams involved in data projects.
- Project Management:
- Lead large-scale data projects from inception to completion, ensuring timely delivery and high quality.
- Manage project resources, budgets, and timelines effectively.
- Monitor project progress and address any issues or risks promptly.
- Data Governance and Security:
- Implement robust data governance policies and procedures to ensure data quality and compliance.
- Ensure data security and privacy by implementing appropriate measures and controls.
- Conduct regular audits and reviews of data systems to ensure compliance with industry standards and regulations.
Role: Principal Software Engineer
We are looking for a passionate Principal Engineer - Analytics to build data products that extract valuable business insights for efficiency and customer experience. This role will require managing, processing and analyzing large amounts of raw information in scalable databases. This will also involve developing unique data structures and writing algorithms for an entirely new set of products. The candidate will be required to have critical thinking and problem-solving skills. The candidate must be experienced in software development with advanced algorithms and must be able to handle large volumes of data. Exposure to statistics and machine learning algorithms is a big plus. The candidate should have some exposure to cloud environments, continuous integration and agile scrum processes.
Responsibilities:
• Lead projects both as a principal investigator and project manager, responsible for meeting project requirements on schedule
• Software development that creates data-driven intelligence in products that deal with Big Data backends
• Exploratory analysis of the data to be able to come up with efficient data structures and algorithms for given requirements
• The system may or may not involve machine learning models and pipelines but will require advanced algorithm development
• Managing data in large-scale data stores (such as NoSQL DBs, time series DBs, geospatial DBs etc.)
• Creating metrics and evaluating algorithms for better accuracy and recall
• Ensuring efficient access and usage of data through the means of indexing, clustering etc.
• Collaborate with engineering and product development teams.
Requirements:
• Master’s or Bachelor’s degree in Engineering in one of these domains - Computer Science, Information Technology, Information Systems, or related field from top-tier school
• OR Master’s degree or higher in Statistics, Mathematics, with hands on background in software development.
• 8 to 10 years of experience in product development, including algorithmic work
• 5+ years of experience working with large data sets or doing large-scale quantitative analysis
• Understanding of SaaS based products and services.
• Strong algorithmic problem-solving skills
• Able to mentor and manage a team and take responsibility for team deadlines.
Skill set required:
• In-depth knowledge of the Python programming language
• Understanding of software architecture and software design
• Must have fully managed a project with a team
• Having worked with Agile project management practices
• Experience with data processing analytics and visualization tools in Python (such as pandas, matplotlib, Scipy, etc.)
• Strong understanding of SQL and of querying NoSQL databases (e.g., MongoDB, Cassandra, Redis)
Role / Purpose - Lead Developer - API and Microservices
Must have a strong hands-on development track record building integration utilizing a variety of integration products, tools, protocols, technologies, and patterns.
- Must have an in-depth understanding of SOA/EAI/ESB concepts, SOA Governance, Event-Driven Architecture, message-based architectures, file sharing, and exchange platforms, data virtualization and caching strategies, J2EE design patterns, frameworks
- Should possess experience with at least one of the middleware technologies (Application Servers, BPMS, BRMS, ESB & Message Brokers), Programming languages (e.g. Java/J2EE, JavaScript, COBOL, C), Operating Systems (e.g. Windows, Linux, MVS), and Databases (DB2, MySQL, NoSQL databases like MongoDB, Cassandra, Hadoop, etc.)
- Must have experience and advanced skills in implementing API Service architectures (SOAP, REST) using any of the market-leading API Management tools such as Apigee and frameworks such as Spring Boot for Microservices
- Appetite to manage large-scale projects and multiple tracks
- Experience and knowhow of the e-commerce domain and retail experience are preferred
- Good communication & people managerial skills
We are hiring a software developer with good knowledge of Spark, Hadoop and Scala for a Tier 1 MNC.
• Project Planning and Management
o Take end-to-end ownership of multiple projects / project tracks
o Create and maintain project plans and other related documentation for project objectives, scope, schedule and delivery milestones
o Lead and participate across all the phases of software engineering, right from requirements gathering to GO LIVE
o Lead internal team meetings on solution architecture, effort estimation, manpower planning and resource (software/hardware/licensing) planning
o Manage RIDA (Risks, Impediments, Dependencies, Assumptions) for projects by developing effective mitigation plans
• Team Management
o Act as the Scrum Master
o Conduct SCRUM ceremonies like Sprint Planning, Daily Standup, Sprint Retrospective
o Set clear objectives for the project and roles/responsibilities for each team member
o Train and mentor the team on their job responsibilities and SCRUM principles
o Make the team accountable for their tasks and help the team in achieving them
o Identify the requirements and come up with a plan for Skill Development for all team members
• Communication
o Be the Single Point of Contact for the client in terms of day-to-day communication
o Periodically communicate project status to all the stakeholders (internal/external)
• Process Management and Improvement
o Create and document processes across all disciplines of software engineering
o Identify gaps and continuously improve processes within the team
o Encourage team members to contribute towards process improvement
o Develop a culture of quality and efficiency within the team
Must have:
• Minimum 8 years of experience (hands-on as well as leadership) in software / data engineering across multiple job functions like Business Analysis, Development, Solutioning, QA, DevOps and Project Management
• Hands-on as well as leadership experience in Big Data Engineering projects
• Experience developing or managing cloud solutions using Azure or other cloud provider
• Demonstrable knowledge of Hadoop, Hive, Spark, NoSQL DBs, SQL, Data Warehousing, ETL/ELT, DevOps tools
• Strong project management and communication skills
• Strong analytical and problem-solving skills
• Strong systems level critical thinking skills
• Strong collaboration and influencing skills
Good to have:
• Knowledge of PySpark, Azure Data Factory, Azure Data Lake Storage, Synapse Dedicated SQL Pool, Databricks, PowerBI, Machine Learning, Cloud Infrastructure
• Background in BFSI with focus on core banking
• Willingness to travel
Work Environment
• Customer Office (Mumbai) / Remote Work
Education
• UG: B. Tech - Computers / B. E. – Computers / BCA / B.Sc. Computer Science
Job Description
Role: Data/Integration Architect
Experience – 8-10 Years
Notice Period: Under 30 days
Key Responsibilities: Designing and developing frameworks for batch and real-time jobs on Talend; leading the migration of these jobs from Mulesoft to Talend; maintaining best practices for the team; conducting code reviews and demos.
Core Skillsets:
Talend Data Fabric - Application, API Integration, Data Integration. Knowledge of Talend Management Cloud (TMC), and of deploying and scheduling jobs using TMC or Autosys.
Programming Languages - Python/Java
Databases: SQL Server, Other Databases, Hadoop
Should have worked on Agile
Sound communication skills
Should be open to learning new technologies based on business needs on the job
Additional Skills:
Awareness of other data/integration platforms like Mulesoft, Camel
Awareness of Hadoop, Snowflake, S3
delivered.
• You will utilize your configuration management and software release experience, as well as change management concepts, to drive the success of the projects.
• You will partner with senior leaders to understand and communicate the business needs and translate them into IT requirements. Consult with the Customer’s Business Analysts on their data warehouse requirements.
• You will assist the technical team in identification and resolution of Data Quality issues.
• You will manage small to medium-sized projects relating to the delivery of applications or application changes.
• You will use Managed Services or 3rd party resources to meet application support requirements.
• You will interface daily with multi-functional team members within the EDW team and across the enterprise to resolve issues.
• Recommend and advocate different approaches and designs to the requirements
• Write technical design docs
• Execute Data modelling
• Solution inputs for the presentation layer
• You will craft and generate summary, statistical, and presentation reports, as well as provide reporting and metrics for strategic initiatives.
• Performs miscellaneous job-related duties as assigned
Preferred Qualifications
• Strong interpersonal, teamwork, organizational and workload planning skills
• Strong analytical, evaluative, and problem-solving abilities as well as exceptional customer service orientation
• Ability to drive clarity of purpose and goals during release and planning activities
• Excellent organizational skills including ability to prioritize tasks efficiently with high level of attention to detail
• Excited by the opportunity to continually improve processes within a large company
• Healthcare background/ Automobile background.
• Familiarity with major big data solutions and products available in the market.
• Proven ability to drive continuous
upGrad is an online education platform building the careers of tomorrow by offering the most industry-relevant programs in an immersive learning experience. Our mission is to create a new digital-first learning experience to deliver tangible career impact to individuals at scale. upGrad currently offers programs in Data Science, Machine Learning, Product Management, Digital Marketing, and Entrepreneurship, etc. upGrad is looking for people passionate about management and education to help design learning programs for working professionals to stay sharp and stay relevant and help build the careers of tomorrow.
- upGrad was awarded the Best Tech for Education by IAMAI for 2018-19,
- upGrad was also ranked as one of the LinkedIn Top Startups 2018: The 25 most sought-after startups in India.
- upGrad was earlier selected as one of the top ten most innovative companies in India by FastCompany.
- We were also covered by the Financial Times along with other disruptors in Ed-Tech.
- upGrad is the official education partner for Government of India - Startup India program.
- Our program with IIIT B has been ranked #1 program in the country in the domain of Artificial Intelligence and Machine Learning.
About the Role
A highly motivated individual who has experience in architecting end-to-end web-based ecommerce/online/SaaS products and systems, bringing them to production quickly and with high quality. Able to understand expected business results and map architecture to drive business forward. Passionate about building world-class solutions.
Role and Responsibilities
- Work with Product Managers and Business to understand business/product requirements and vision.
- Provide a clear architectural vision in line with business and product vision.
- Lead a team of architects, developers, and data engineers to provide platform services to other engineering teams.
- Provide architectural oversight to engineering teams across the organization.
- Hands on design and development of platform services and features owned by self - this is a hands-on coding role.
- Define guidelines for best practices covering design, unit testing, secure coding etc.
- Ensure quality by reviewing design, code, test plans, load test plans etc. as appropriate.
- Work closely with the QA and Support teams to track quality and proactively identify improvement opportunities.
- Work closely with DevOps and IT to ensure highly secure and cost optimized operations in the cloud.
- Grow technical skills in the team - identify skill gaps with plans to address them, participate in hiring, mentor other architects and engineers.
- Support other engineers in resolving complex technical issues as a go-to person.
Skills/Experience
- 12+ years of experience in design and development of ecommerce scale systems and highly scalable SaaS or enterprise products.
- Extensive experience in developing extensible and scalable web applications with
- Java, Spring Boot, Go
- Web Services - REST, OAuth, OData
- Database/Caching - MySQL, Cassandra, MongoDB, Memcached/Redis
- Queue/Broker services - RabbitMQ/Kafka
- Microservices architecture via Docker on AWS or Azure.
- Experience with web front end technologies - HTML5, CSS3, JavaScript libraries and frameworks such as jQuery, AngularJS, React, Vue.js, Bootstrap etc.
- Extensive experience with cloud based architectures and how to optimize design for cost.
- Expert level understanding of secure application design practices and a working understanding of cloud infrastructure security.
- Experience with CI/CD processes and design for testability.
- Experience working with big data technologies such as Spark/Storm/Hadoop/Data Lake Architectures is a big plus.
- Action and result-oriented problem-solver who works well both independently and as part of a team; able to foster and develop others' ideas as well as his/her own.
- Ability to organize, prioritize and schedule a high workload and multiple parallel projects efficiently.
- Excellent verbal and written communication with stakeholders in a matrixed environment.
- Long term experience with at least one product from inception to completion and evolution of the product over multiple years.
B.Tech/MCA (IT/Computer Science) from a premier institution (IIT/NIT/BITS) and/or a US Master's degree in Computer Science.
At Karza Technologies, we take pride in building one of the most comprehensive digital onboarding & due-diligence platforms by profiling millions of entities and trillions of associations amongst them using data collated from more than 700 publicly available government sources. Primarily in the B2B Fintech Enterprise space, we are headquartered in Mumbai in Lower Parel with a 100+ strong workforce. We are truly furthering the cause of Digital India by providing the entire BFSI ecosystem with tech products and services that aid onboarding customers, automating processes and mitigating risks seamlessly, in real-time and at a fraction of the current cost.
A few recognitions:
- Recognized as one of the Top 25 startups in India to work with in 2019 by LinkedIn
- Winner of HDFC Bank's Digital Innovation Summit 2020
- Super Winners (Won every category) at Tecnoviti 2020 by Banking Frontiers
- Winner of Amazon AI Award 2019 for Fintech
- Winner of FinTech Spot Pitches at Fintegrate Zone 2018 held at BSE
- Winner of FinShare 2018 challenge held by ShareKhan
- Only startup in Yes Bank Global Fintech Accelerator to win the account during the Cohort
- 2nd place Citi India FinTech Challenge 2018 by Citibank
- Top 3 in Viacom18's Startup Engagement Programme VStEP
What your average day would look like:
- Deploy and maintain mission-critical information extraction, analysis, and management systems
- Manage low cost, scalable streaming data pipelines
- Provide direct and responsive support for urgent production issues
- Contribute ideas towards secure and reliable Cloud architecture
- Use open source technologies and tools to accomplish specific use cases encountered within the project
- Use coding languages or scripting methodologies to solve automation problems
- Collaborate with others on the project to brainstorm about the best way to tackle a complex infrastructure, security, or deployment problem
- Identify processes and practices to streamline development & deployment, minimizing both downtime and turnaround time
What you need to work with us:
- Proficiency in at least one of the general-purpose programming languages like Python, Java, etc.
- Experience in managing the IAAS and PAAS components on popular public Cloud Service Providers like AWS, Azure, GCP etc.
- Proficiency in Unix Operating systems and comfortable with Networking concepts
- Experience with developing/deploying a scalable system
- Experience with distributed databases and message queues (like Cassandra, ElasticSearch, MongoDB, Kafka, etc.)
- Experience in managing Hadoop clusters
- Understanding of containers, with experience managing them in production using container orchestration services.
- Solid understanding of data structures and algorithms.
- Applied exposure to continuous delivery pipelines (CI/CD).
- Keen interest and proven track record in automation and cost optimization.
Experience:
- 1-4 years of relevant experience
- BE in Computer Science / Information Technology
Your mission is to help lead the team towards creating solutions that improve the way our business is run. Your knowledge of design, development, coding, testing and application programming will help your team raise their game, meeting your standards as well as satisfying both business and functional requirements. Your expertise in various technology domains will be counted on to set strategic direction and solve complex and mission-critical problems, internally and externally. Your quest to embrace leading-edge technologies and methodologies inspires your team to follow suit.
Responsibilities and Duties :
- As a Data Engineer you will be responsible for the development of data pipelines for numerous applications handling all kinds of data: structured, semi-structured & unstructured. Big data knowledge, especially in Spark & Hive, is highly preferred.
- Work in a team and provide proactive technical oversight; advise development teams, fostering re-use, design for scale, stability, and operational efficiency of data/analytical solutions
Education level :
- Bachelor's degree in Computer Science or equivalent
Experience :
- Minimum 3 years of relevant experience on production-grade projects, with hands-on, end-to-end software development experience
- Expertise in application, data and infrastructure architecture disciplines
- Expert in designing data integrations using ETL and other data integration patterns
- Advanced knowledge of architecture, design and business processes
Proficiency in :
- Modern programming languages like Java, Python, Scala
- Big Data technologies Hadoop, Spark, HIVE, Kafka
- Writing decently optimized SQL queries
- Orchestration and deployment tools like Airflow & Jenkins for CI/CD (Optional)
- Responsible for design and development of integration solutions with Hadoop/HDFS, Real-Time Systems, Data Warehouses, and Analytics solutions
- Knowledge of system development lifecycle methodologies, such as waterfall and AGILE.
- An understanding of data architecture and modeling practices and concepts including entity-relationship diagrams, normalization, abstraction, denormalization, dimensional modeling, and metadata modeling practices.
- Experience generating physical data models and the associated DDL from logical data models.
- Experience developing data models for operational, transactional, and operational reporting, including the development of or interfacing with data analysis, data mapping, and data rationalization artifacts.
- Experience enforcing data modeling standards and procedures.
- Knowledge of web technologies, application programming languages, OLTP/OLAP technologies, data strategy disciplines, relational databases, data warehouse development and Big Data solutions.
- Ability to work collaboratively in teams and develop meaningful relationships to achieve common goals
Skills :
Must Know :
- Core big-data concepts
- Spark - PySpark/Scala
- Data integration tool like Pentaho, Nifi, SSIS, etc (at least 1)
- Handling of various file formats
- Cloud platform - AWS/Azure/GCP
- Orchestration tool - Airflow
Your mission is to help lead the team towards creating solutions that improve the way our business is run. Your knowledge of design, development, coding, testing and application programming will help your team raise their game, meeting your standards as well as satisfying both business and functional requirements. Your expertise in various technology domains will be counted on to set strategic direction and solve complex and mission-critical problems, internally and externally. Your quest to embrace leading-edge technologies and methodologies inspires your team to follow suit.
Responsibilities and Duties :
- As a Data Engineer you will be responsible for the development of data pipelines for numerous applications handling all kinds of data: structured, semi-structured & unstructured. Big data knowledge, especially in Spark & Hive, is highly preferred.
- Work in a team and provide proactive technical oversight; advise development teams, fostering re-use, design for scale, stability, and operational efficiency of data/analytical solutions
Education level :
- Bachelor's degree in Computer Science or equivalent
Experience :
- Minimum 5 years of relevant experience on production-grade projects, with hands-on, end-to-end software development experience
- Expertise in application, data and infrastructure architecture disciplines
- Expert in designing data integrations using ETL and other data integration patterns
- Advanced knowledge of architecture, design and business processes
Proficiency in :
- Modern programming languages like Java, Python, Scala
- Big Data technologies Hadoop, Spark, HIVE, Kafka
- Writing decently optimized SQL queries
- Orchestration and deployment tools like Airflow & Jenkins for CI/CD (Optional)
- Responsible for design and development of integration solutions with Hadoop/HDFS, Real-Time Systems, Data Warehouses, and Analytics solutions
- Knowledge of system development lifecycle methodologies, such as waterfall and AGILE.
- An understanding of data architecture and modeling practices and concepts including entity-relationship diagrams, normalization, abstraction, denormalization, dimensional modeling, and metadata modeling practices.
- Experience generating physical data models and the associated DDL from logical data models.
- Experience developing data models for operational, transactional, and operational reporting, including the development of or interfacing with data analysis, data mapping, and data rationalization artifacts.
- Experience enforcing data modeling standards and procedures.
- Knowledge of web technologies, application programming languages, OLTP/OLAP technologies, data strategy disciplines, relational databases, data warehouse development and Big Data solutions.
- Ability to work collaboratively in teams and develop meaningful relationships to achieve common goals
Skills :
Must Know :
- Core big-data concepts
- Spark - PySpark/Scala
- Data integration tool like Pentaho, Nifi, SSIS, etc (at least 1)
- Handling of various file formats
- Cloud platform - AWS/Azure/GCP
- Orchestration tool - Airflow