20+ PySpark Jobs in Chennai | PySpark Job openings in Chennai
Apply to 20+ PySpark Jobs in Chennai on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.
About Moative
Moative, an Applied AI Services company, designs AI roadmaps, builds co-pilots and predictive AI solutions for companies in energy, utilities, packaging, commerce, and other primary industries. Through Moative Labs, we aspire to build micro-products and launch AI startups in vertical markets.
Our Past: We have built and sold two companies, one of which was an AI company. Our founders and leaders are Math PhDs, Ivy League University Alumni, Ex-Googlers, and successful entrepreneurs.
Work you’ll do
As a Junior ML/AI Engineer, you will help design and develop intelligent software to solve business problems. You will collaborate with senior ML engineers, data scientists and domain experts to incorporate ML and AI technologies into existing or new workflows. You’ll analyze new opportunities and ideas. You’ll train and evaluate ML models, conduct experiments, and help develop PoCs and prototypes.
Responsibilities
- Designing, training, improving & launching machine learning models using tools such as XGBoost, TensorFlow, PyTorch.
- Contributing directly to improving how we evaluate and monitor model and system performance.
- Proposing and implementing ideas that directly impact our operational and strategic metrics.
Who you are
You are an engineer who is passionate about using AI/ML to improve processes and products and to delight customers. You have experience working with less than clean data, developing and tweaking ML models, and are deeply interested in getting these models into production as cost effectively as possible. You thrive on taking initiative, are very comfortable with ambiguity and can passionately defend your decisions.
Requirements and skills
- 3+ years of experience in programming languages such as Python, PySpark, or Scala.
- Proficient knowledge of cloud platforms (e.g., AWS, Azure, GCP), containerization and DevOps tools (Docker, Kubernetes).
- Beginner level knowledge of MLOps practices and platforms like MLflow.
- Strong understanding of ML algorithms and frameworks (e.g., TensorFlow, PyTorch).
- Broad understanding of data structures, data engineering, statistical methodologies and machine learning models.
Working at Moative
Moative is a young company, but we believe strongly in thinking long-term, while acting with urgency. Our ethos is rooted in innovation, efficiency and high-quality outcomes. We believe the future of work is AI-augmented and boundaryless. Here are some of our guiding principles:
- Think in decades. Act in hours. As an independent company, our moat is time. While our decisions are for the long-term horizon, our execution will be fast – measured in hours and days, not weeks and months.
- Own the canvas. Throw yourself in to build, fix or improve – anything that isn’t done right, irrespective of who did it. Be selfish about improving across the organization – because once the rot sets in, we waste years in surgery and recovery.
- Use data or don’t use data. Use data where you ought to but not as a ‘cover-my-back’ political tool. Be capable of making decisions with partial or limited data. Get better at intuition and pattern-matching. Whichever way you go, be mostly right about it.
- Avoid work about work. Process creeps on purpose, unless we constantly question it. We are deliberate about committing to rituals that take time away from the actual work. We truly believe that a meeting that could be an email should be an email, and you don’t need a person with the highest title to say that out loud.
- High revenue per person. We work backwards from this metric. Our default is to automate instead of hiring. We multi-skill our people to own more outcomes than hiring someone who has less to do. We don’t like squatting and hoarding that comes in the form of hiring for growth. High revenue per person comes from high quality work from everyone. We demand it.
If this role and our work is of interest to you, please apply here. We encourage you to apply even if you believe you do not meet all the requirements listed above.
That said, you should demonstrate that you are in the 90th percentile or above. This may mean that you have studied in top-notch institutions, won competitions that are intellectually demanding, built something of your own, or been rated as an outstanding performer by your current or previous employers.
The position is based out of Chennai. Our work currently involves significant in-person collaboration and we expect you to be present in the city. We intend to move to a hybrid model in a few months' time.
Building the machine learning production system (or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-of-the-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable Model tracking, model experimentation, Model automation
- Develop ML pipelines to support
- Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLflow, Kubeflow Model Registry
- Develop MLOps components in Machine learning development life cycle using Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
- Work across all phases of Model development life cycle to build MLOps components
- Build the knowledge base required to deliver increasingly complex MLOps projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
Required Qualifications
- 3-5 years experience building production-quality software.
- B.E/B.Tech/M.Tech in Computer Science or related technical degree OR Equivalent
- Strong experience in System Integration, Application Development or Data Warehouse projects across technologies used in the enterprise space
- Knowledge of MLOps, machine learning and Docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- CI/CD experience (e.g. Jenkins, GitHub Actions)
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
Building the machine learning production system (or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-of-the-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable Model tracking, model experimentation, Model automation
- Develop scalable ML pipelines
- Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLflow, Kubeflow Model Registry
- Develop MLOps components in Machine learning development life cycle using Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
- Work across all phases of Model development life cycle to build MLOps components
- Build the knowledge base required to deliver increasingly complex MLOps projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
Required Qualifications
- 5.5-9 years experience building production-quality software
- B.E/B.Tech/M.Tech in Computer Science or related technical degree OR equivalent
- Strong experience in System Integration, Application Development or Data Warehouse projects across technologies used in the enterprise space
- Expertise in MLOps, machine learning and Docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- Experience developing CI/CD components for production ready ML pipeline.
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Team handling, problem solving, project management and communication skills & creative thinking
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
About Tazapay
Tazapay is a cross-border payment service provider, offering local collections via local payment methods, virtual accounts, and cards in over 70 markets. Merchants can conduct global transactions without creating local entities, while Tazapay ensures compliance with local regulations, resulting in reduced transaction costs, FX transparency, and higher authorization rates. Licensed and backed by top investors, Tazapay provides secure and efficient solutions.
What’s Exciting Waiting for You?
This is an incredible opportunity to join our team before we scale up. You'll be part of a growth story that includes roles across Sales, Software Development, Marketing, HR, and Accounting. Enjoy a unique experience building something from the ground up and gain the satisfaction of seeing your work impact thousands of customers. We foster a culture of openness, innovation, and creating great memories together. Ready for the ride?
About the Role
As a Data Engineer on our team, you’ll design, develop, and maintain ETL processes using AWS Glue, PySpark, Athena, QuickSight, and other data engineering tools. Responsibilities include transforming and loading data into our data lake, optimizing data models for efficient querying, implementing data governance policies, and ensuring data security, privacy, and compliance.
Requirements
Must-Have
- BE / B.Tech / M.Sc / MCA / ME / M.Tech
- 5+ years of expertise in data engineering
- Strong experience in AWS Glue and PySpark for ETL processes
- Proficient with AWS Athena & SQL for data querying
- Expertise in AWS QuickSight for data visualization and dashboarding
- Solid understanding of data lake concepts, data modeling, and best practices
- Strong experience with AWS cloud services, including S3, Lambda, Redshift, Glue, and IAM
- Excellent analytical, problem-solving, written, and verbal communication skills
Nice-to-Have
- Fintech domain knowledge
- AWS certifications
- Experience with large data lake implementations
Technical Skills:
- Ability to understand and translate business requirements into design.
- Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
- Experience in creating ETL jobs using Python/PySpark.
- Proficiency in creating AWS Lambda functions for event-based jobs.
- Knowledge of automating ETL processes using AWS Step Functions.
- Competence in building data warehouses and loading data into them.
Responsibilities:
- Understand business requirements and translate them into design.
- Assess AWS infrastructure needs for development work.
- Develop ETL jobs using Python/PySpark to meet requirements.
- Implement AWS Lambda for event-based tasks.
- Automate ETL processes using AWS Step Functions.
- Build data warehouses and manage data loading.
- Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
5-7 years of experience in Data Engineering with solid experience in design, development and implementation of end-to-end data ingestion and data processing system in AWS platform.
2-3 years of experience in AWS Glue, Lambda, Appflow, EventBridge, Python, PySpark, Lake House, S3, Redshift, Postgres, API Gateway, CloudFormation, Kinesis, Athena, KMS, IAM.
Experience in modern data architecture, Lake House, Enterprise Data Lake, Data Warehouse, API interfaces, solution patterns, standards and optimizing data ingestion.
Experience building data pipelines from source systems like SAP Concur, Veeva Vault, Azure Cost, various social media platforms or similar source systems.
Expertise in analyzing source data and designing a robust and scalable data ingestion framework and pipelines adhering to client Enterprise Data Architecture guidelines.
Proficient in design and development of solutions for real-time (or near real time) stream data processing as well as batch processing on the AWS platform.
Work closely with business analysts, data architects, data engineers, and data analysts to ensure that the data ingestion solutions meet the needs of the business.
Troubleshoot and provide support for issues related to data quality and data ingestion solutions. This may involve debugging data pipeline processes, optimizing queries, or troubleshooting application performance issues.
Experience working with Agile/Scrum methodologies, CI/CD tools and practices, coding standards, code reviews, source management (GitHub), JIRA, JIRA Xray and Confluence.
Experience or exposure to design and development using Full Stack tools.
Strong analytical and problem-solving skills, excellent communication (written and oral), and interpersonal skills.
Bachelor's or master's degree in computer science or related field.
AWS Glue Developer
Work Experience: 6 to 8 Years
Work Location: Noida, Bangalore, Chennai & Hyderabad
Must Have Skills: AWS Glue, DMS, SQL, Python, PySpark, Data integrations and Data Ops
Job Reference ID: BT/F21/IND
Job Description:
Design, build and configure applications to meet business process and application requirements.
Responsibilities:
7 years of work experience with ETL, Data Modelling, and Data Architecture. Proficient in ETL optimization, designing, coding, and tuning big data processes using PySpark. Extensive experience building data platforms on AWS using core AWS services (Step Functions, EMR, Lambda, Glue, Athena, Redshift, Postgres, RDS, etc.) and designing/developing data engineering solutions. Orchestration using Airflow.
Technical Experience:
Hands-on experience developing a data platform and its components: data lake, cloud data warehouse, APIs, and batch and streaming data pipelines. Experience building data pipelines and applications to stream and process large datasets at low latencies.
➢ Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.
➢ Create data pipeline architecture by designing and implementing data ingestion solutions.
➢ Integrate data sets using AWS services such as Glue, Lambda functions/ Airflow.
➢ Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena.
➢ Author ETL processes using Python, PySpark.
➢ Build Redshift Spectrum direct transformations and data modelling using data in S3.
➢ ETL process monitoring using CloudWatch events.
➢ You will be working in collaboration with other teams. Good communication is a must.
➢ Must have experience in using AWS services API, AWS CLI and SDK
Professional Attributes:
➢ Experience operating very large data warehouses or data lakes.
➢ Expert-level skills in writing and optimizing SQL.
➢ Extensive, real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.
➢ Must have 6+ years of big data ETL experience using Python, S3, Lambda, DynamoDB, Athena, Glue in AWS environment.
➢ Expertise in S3, RDS, Redshift, Kinesis, EC2 clusters highly desired.
Qualification:
➢ Degree in Computer Science, Computer Engineering or equivalent.
Salary: Commensurate with experience and demonstrated competence
Analytics Job Description
We are hiring an Analytics Engineer to help drive our Business Intelligence efforts. You will
partner closely with leaders across the organization, working together to understand the how
and why of people, team and company challenges, workflows and culture. The team is
responsible for delivering data and insights that drive decision-making, execution, and
investments for our product initiatives.
You will work cross-functionally with product, marketing, sales, engineering, finance, and our
customer-facing teams enabling them with data and narratives about the customer journey.
You’ll also work closely with other data teams, such as data engineering and product analytics,
to ensure we are creating a strong data culture at Blend that enables our cross-functional partners
to be more data-informed.
Role: Data Engineer
Please find below the JD for the Data Engineer role.
Location: Guindy, Chennai
How you’ll contribute:
• Develop objectives and metrics, ensure priorities are data-driven, and balance short-
term and long-term goals
• Develop deep analytical insights to inform and influence product roadmaps and
business decisions and help improve the consumer experience
• Work closely with GTM and supporting operations teams to author and develop core
data sets that empower analyses
• Deeply understand the business and proactively spot risks and opportunities
• Develop dashboards and define metrics that drive key business decisions
• Build and maintain scalable ETL pipelines via solutions such as Fivetran, Hightouch,
and Workato
• Design our Analytics and Business Intelligence architecture, assessing and
implementing new technologies that fit
• Work with our engineering teams to continually make our data pipelines and tooling
more resilient
Who you are:
• Bachelor’s degree or equivalent required from an accredited institution with a
quantitative focus such as Economics, Operations Research, Statistics, Computer Science OR 1-3 Years of Experience as a Data Analyst, Data Engineer, Data Scientist
• Must have strong SQL and data modeling skills, with experience applying skills to
thoughtfully create data models in a warehouse environment.
• A proven track record of using analysis to drive key decisions and influence change
• Strong storyteller and ability to communicate effectively with managers and
executives
• Demonstrated ability to define metrics for product areas, understand the right
questions to ask and push back on stakeholders in the face of ambiguous, complex
problems, and work with diverse teams with different goals
• A passion for documentation.
• A solution-oriented growth mindset. You’ll need to be a self-starter and thrive in a
dynamic environment.
• A bias towards communication and collaboration with business and technical
stakeholders.
• Quantitative rigor and systems thinking.
• Prior startup experience is preferred, but not required.
• Interest or experience in machine learning techniques (such as clustering, decision
trees, and segmentation)
• Familiarity with a scientific computing language, such as Python, for data wrangling
and statistical analysis
• Experience with a SQL focused data transformation framework such as dbt
• Experience with a Business Intelligence Tool such as Mode/Tableau
Mandatory Skillset:
-Very Strong in SQL
-Spark OR pyspark OR Python
-Shell Scripting
at Altimetrik
Bigdata with cloud:
Experience : 5-10 years
Location : Hyderabad/Chennai
Notice period : 15-20 days Max
1. Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> QuickSight
2. Experience in developing lambda functions with AWS Lambda
3. Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark
4. Should be able to code in Python and Scala.
5. Snowflake experience will be a plus
Skills and requirements
- Experience analyzing complex and varied data in a commercial or academic setting.
- Desire to solve new and complex problems every day.
- Excellent ability to communicate scientific results to both technical and non-technical team members.
Desirable
- A degree in a numerically focused discipline such as Maths, Physics, Chemistry, Engineering or Biological Sciences.
- Hands-on experience with Python, PySpark, SQL
- Hands-on experience building end-to-end data pipelines.
- Hands-on experience with Azure Data Factory, Azure Databricks, Data Lake is an added advantage
- Hands-on experience in building data pipelines.
- Experience with Big Data tools: Hadoop, Hive, Sqoop, Spark, SparkSQL
- Experience with SQL or NoSQL databases for the purposes of data retrieval and management.
- Experience in data warehousing and business intelligence tools, techniques and technology, as well as experience in diving deep on data analysis or technical issues to come up with effective solutions.
- BS degree in math, statistics, computer science or equivalent technical field.
- Experience in data mining structured and unstructured data (SQL, ETL, data warehouse, Machine Learning etc.) in a business environment with large-scale, complex data sets.
- Proven ability to look at solutions in unconventional ways. Sees opportunities to innovate and can lead the way.
- Willing to learn and work on Data Science, ML, AI.
- 5+ years of experience in a Data Engineering role on cloud environment
- Must have good experience in Scala/PySpark (preferably in a Databricks environment)
- Extensive experience with Transact-SQL.
- Experience in Databricks/Spark.
- Strong experience in data warehouse projects
- Expertise in database development projects with ETL processes.
- Manage and maintain data engineering pipelines
- Develop batch processing, streaming and integration solutions
- Experienced in building and operationalizing large-scale enterprise data solutions and applications
- Using one or more of Azure data and analytics services in combination with custom solutions
- Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
- In-depth understanding of data management (e. g. permissions, security, and monitoring).
- Cloud repositories, e.g. Azure GitHub, Git
- Experience in an agile environment (Prefer Azure DevOps).
Good to have
- Manage source data access security
- Automate Azure Data Factory pipelines
- Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
- Experience in implementing and maintaining CICD pipelines
- Power BI understanding, Delta Lakehouse architecture
- Knowledge of software development best practices.
- Excellent analytical and organization skills.
- Effective working in a team as well as working independently.
- Strong written and verbal communication skills.
- Expertise in database development projects and ETL processes.
Lead QA: more than 5 years of experience, including leading a team of more than 5 people on a big data platform; should have experience with test automation frameworks and test process documentation.
A leading global information technology and business process
Python + Data scientist:
• Build data-driven models to understand the characteristics of engineering systems
• Train, tune, validate, and monitor predictive models
• Sound knowledge of Statistics
• Experience in developing data processing tasks using PySpark such as reading, merging, enrichment, loading of data from external systems to target data destinations
• Working knowledge of Big Data and/or Hadoop environments
• Experience creating CI/CD pipelines using Jenkins or similar tools
• Practiced in eXtreme Programming (XP) disciplines
We at Condé Nast are looking for a Level 2 Support Engineer who will be responsible for
monitoring and maintaining the production systems to ensure that business continuity is
maintained. Your responsibilities will also include prompt communication to business
and internal teams about process delays, stability, issues, and resolutions.
Primary Responsibilities
● 5+ years experience in Production support
● The Support Data Engineer is responsible for monitoring of the data pipelines
that are in production.
● Level 3 support activities - Analysing issue, debug programs & Jobs, bug fix
● The position will contribute to the monitoring, rerun or reschedule, code fix
of pipelines for a variety of projects on a daily basis.
● Escalate failures to Data-Team/DevOps in case of infrastructure failures or when unable
to revive the data-pipelines.
● Ensure accurate alerts are raised in case of pipeline failures and corresponding
stakeholders (Business/Data Teams) are notified about the same within the
agreed upon SLAs.
● Prepare and present success/failure metrics by accurately logging the
monitoring stats.
● Able to work in shifts to provide overlap with US Business teams
● Other duties as requested or assigned.
Desired Skills & Qualification
● Have strong working knowledge of PySpark, Informatica, SQL (Presto), batch
handling through schedulers (Databricks or Astronomer will be an
advantage), AWS S3, Airflow and Hive/Presto
● Have basic knowledge of shell scripts and/or Bash commands.
● Able to execute queries in Databases and produce outputs.
● Able to understand and execute the steps provided by Data-Team to
revive data-pipelines.
● Strong verbal, written communication skills and strong interpersonal
skills.
● Graduate/Diploma in computer science or information technology.
About Condé Nast
CONDÉ NAST GLOBAL
Condé Nast is a global media house with over a century of distinguished publishing
history. With a portfolio of iconic brands like Vogue, GQ, Vanity Fair, The New Yorker and
Bon Appétit, we at Condé Nast aim to tell powerful, compelling stories of communities,
culture and the contemporary world. Our operations are headquartered in New York and
London, with colleagues and collaborators in 32 markets across the world, including
France, Germany, India, China, Japan, Spain, Italy, Russia, Mexico, and Latin America.
Condé Nast has been raising the industry standards and setting records for excellence in
the publishing space. Today, our brands reach over 1 billion people in print, online, video,
and social media.
CONDÉ NAST INDIA (DATA)
Over the years, Condé Nast successfully expanded and diversified into digital, TV, and
social platforms - in other words, a staggering amount of user data. Condé Nast made the
right move to invest heavily in understanding this data and formed a whole new Data
team entirely dedicated to data processing, engineering, analytics, and visualization. This
team helps drive engagement, fuel process innovation, further content enrichment, and
increase market revenue. The Data team aimed to create a company culture where data
was the common language and facilitate an environment where insights shared in
real-time could improve performance. The Global Data team operates out of Los Angeles,
New York, Chennai, and London. The team at Condé Nast Chennai works extensively with
data to amplify its brands' digital capabilities and boost online revenue. We are broadly
divided into four groups, Data Intelligence, Data Engineering, Data Science, and
Operations (including Product and Marketing Ops, Client Services) along with Data
Strategy and monetization. The teams built capabilities and products to create
data-driven solutions for better audience engagement.
What we look forward to:
We want to welcome bright, new minds into our midst and work together to create
diverse forms of self-expression. At Condé Nast, we encourage the imaginative and
celebrate the extraordinary. We are a media company for the future, with a remarkable
past. We are Condé Nast, and It Starts Here.
at Virtusa
- Minimum 1 year of relevant experience in PySpark (mandatory)
- Hands-on experience developing, testing, deploying, maintaining and improving data integration pipelines in an AWS cloud environment is an added plus
- Ability to play a lead role and independently manage a PySpark development team of 3-5 members
- EMR, Python and PySpark are mandatory.
- Knowledge and awareness of working with AWS Cloud technologies like Apache Spark, Glue, Kafka, Kinesis, and Lambda in S3, Redshift, RDS
Job Sector: IT, Software
Job Type: Permanent
Location: Chennai
Experience: 10 - 20 Years
Salary: 12 – 40 LPA
Education: Any Graduate
Notice Period: Immediate
Key Skills: Python, Spark, AWS, SQL, PySpark
Contact at triple eight two zero nine four two double seven
Job Description:
Requirements
- Minimum 12 years experience
- In-depth understanding and knowledge of distributed computing with Spark.
- Deep understanding of Spark Architecture and internals
- Proven experience in data ingestion, data integration and data analytics with Spark, preferably PySpark.
- Expertise in ETL processes, data warehousing and data lakes.
- Hands-on with Python for Big Data and analytics.
- Hands on in agile scrum model is an added advantage.
- Knowledge on CI/CD and orchestration tools is desirable.
- AWS S3, Redshift, Lambda knowledge is preferred
- Hands-on experience in Development
- 4-6 years of Hands on experience with Python scripts
- 2-3 years of Hands on experience in PySpark coding, including work with Spark cluster computing technology.
- 3-4 years of Hands on end to end data pipeline experience working on AWS environments
- 3-4 years of Hands on experience working on AWS services – Glue, Lambda, Step Functions, EC2, RDS, SES, SNS, DMS, CloudWatch etc.
- 2-3 years of Hands on experience working on AWS Redshift
- 6+ years of Hands on experience with writing Unix Shell scripts
- Good communication skills
We are looking for an outstanding Big Data Engineer with experience setting up and maintaining Data Warehouses and Data Lakes for an organization. This role will collaborate closely with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.
Roles and Responsibilities:
- Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
- Develop programs in Scala and Python as part of data cleaning and processing.
- Assemble large, complex data sets that meet functional / non-functional business requirements and foster data-driven decision making across the organization.
- Design and develop distributed, high volume, high velocity multi-threaded event processing systems.
- Implement processes and systems to validate data, monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Provide high operational excellence guaranteeing high availability and platform stability.
- Closely collaborate with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.
Skills:
- Experience with Big Data pipeline, Big Data analytics, Data warehousing.
- Experience with SQL/No-SQL, schema design and dimensional data modeling.
- Strong understanding of Hadoop Architecture, HDFS ecosystem and experience with Big Data technology stack such as HBase, Hadoop, Hive, MapReduce.
- Experience in designing systems that process structured as well as unstructured data at large scale.
- Experience in AWS/Spark/Java/Scala/Python development.
- Strong skills in PySpark (Python and Spark). Ability to create, manage and manipulate Spark DataFrames. Expertise in Spark query tuning and performance optimization.
- Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
- Prior exposure to streaming data sources such as Kafka.
- Knowledge of shell scripting and Python scripting.
- High proficiency in database skills (e.g., complex SQL) for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
- Experience with NoSQL databases such as Cassandra / MongoDB.
- Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
- Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
- Experience building and deploying applications on on-premise and cloud-based infrastructure.
- Good understanding of the machine learning landscape and concepts.
Qualifications and Experience:
Engineering and postgraduate candidates, preferably in Computer Science, from premier institutions with 3-5 years of proven work experience as a Big Data Engineer or in a similar role.
Certifications:
Good to have at least one of the Certifications listed here:
AZ 900 - Azure Fundamentals
DP 200, DP 201, DP 203, AZ 204 - Data Engineering
AZ 400 - DevOps Certification
We are looking for an outstanding ML Architect (Deployments) with expertise in deploying Machine Learning solutions/models into production and scaling them to serve millions of customers. The ideal candidate has an adaptable and productive working style that fits a fast-moving environment.
Skills:
- 5+ years deploying Machine Learning pipelines in large enterprise production systems.
- Experience developing end to end ML solutions from business hypothesis to deployment / understanding the entirety of the ML development life cycle.
- Expert in modern software development practices; solid experience with source control management and CI/CD.
- Proficient in designing relevant architecture / microservices to fulfil application integration, model monitoring, training / re-training, model management, model deployment, model experimentation/development, alert mechanisms.
- Experience with public cloud platforms (Azure, AWS, GCP).
- Serverless services like Lambda, Azure Functions, and/or Cloud Functions.
- Orchestration services like Data Factory, Data Pipeline, and/or Dataflow.
- Data science workbench/managed services like Azure Machine Learning, SageMaker, and/or AI Platform.
- Data warehouse services like Snowflake, Redshift, BigQuery, Azure SQL DW.
- Distributed computing services like PySpark, EMR, Databricks.
- Data storage services like Cloud Storage, S3, Blob Storage, S3 Glacier.
- Data visualization tools like Power BI, Tableau, QuickSight, and/or Qlik.
- Proven experience serving up predictive algorithms and analytics through batch and real-time APIs.
- Solid working experience with software engineers, data scientists, product owners, business analysts, project managers, and business stakeholders to design the holistic solution.
- Strong technical acumen around automated testing.
- Extensive background in statistical analysis and modeling (distributions, hypothesis testing, probability theory, etc.)
- Strong hands-on experience with statistical packages and ML libraries (e.g., Python scikit-learn, Spark MLlib, etc.)
- Experience in effective data exploration and visualization (e.g., Excel, Power BI, Tableau, Qlik, etc.)
- Experience in developing and debugging in one or more of the languages Java, Python.
- Ability to work in cross functional teams.
- Apply Machine Learning techniques in production including, but not limited to, neural networks, regression, decision trees, random forests, ensembles, SVM, Bayesian models, K-Means, etc.
Roles and Responsibilities:
- Deploying ML models into production, and scaling them to serve millions of customers.
- Technical solutioning skills with a deep understanding of technical API integrations, AI / Data Science, Big Data and public cloud architectures / deployments in a SaaS environment.
- Strong stakeholder relationship management skills; able to influence and manage the expectations of senior executives.
- Strong networking skills with the ability to build and maintain strong relationships with business, operations and technology teams, both internally and externally.
- Provide software design and programming support to projects.
Qualifications & Experience:
Engineering and postgraduate candidates, preferably in Computer Science, from premier institutions with 5-7 years of proven work experience as a Machine Learning Architect (Deployments) or in a similar role.
- Building and operationalizing large scale enterprise data solutions and applications using one or more of Azure data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hubs/IoT Hub.
- Experience in migrating on-premise data warehouses to data platforms on Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/StreamSets, Informatica
- Experience with pre-sales activities (responding to RFPs, executing quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.