Cutshort logo
PySpark Jobs in Chennai

18+ PySpark Jobs in Chennai | PySpark Job openings in Chennai

Apply to 18+ PySpark Jobs in Chennai on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.

icon
Fractal Analytics

at Fractal Analytics

5 recruiters
Eman Khan
Posted by Eman Khan
Bengaluru (Bangalore), Hyderabad, Gurugram, Noida, Mumbai, Pune, Coimbatore, Chennai
3 - 5.5 yrs
₹18L - ₹25L / yr
MLOps
MLFlow
kubeflow
skill iconMachine Learning (ML)
skill iconPython
+6 more

Building the machine learning production (or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-of-the-art AI solutions for Fractal clients.


Responsibilities

As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.

  • Enable Model tracking, model experimentation, Model automation
  • Develop ML pipelines to support
  • Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLFlow, Kubeflow Model Registry
  • Develop MLOps components in Machine learning development life cycle using Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
  • Work across all phases of Model development life cycle to build MLOPS components
  • Build the knowledge base required to deliver increasingly complex MLOPS projects on Azure
  • Be an integral part of client business development and delivery engagements across multiple domains


Required Qualifications

  • 3-5 years experience building production-quality software.
  • B.E/B.Tech/M.Tech in Computer Science or related technical degree OR Equivalent
  • Strong experience in System Integration, Application Development or Data Warehouse projects across technologies used in the enterprise space
  • Knowledge of MLOps, machine learning and docker
  • Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
  • CI/CD experience( i.e. Jenkins, Git hub action,
  • Database programming using any flavors of SQL
  • Knowledge of Git for Source code management
  • Ability to collaborate effectively with highly technical resources in a fast-paced environment
  • Ability to solve complex challenges/problems and rapidly deliver innovative solutions
  • Foundational Knowledge of Cloud Computing on Azure
  • Hunger and passion for learning new skills
Read more
Fractal Analytics

at Fractal Analytics

5 recruiters
Eman Khan
Posted by Eman Khan
Bengaluru (Bangalore), Gurugram, Mumbai, Hyderabad, Pune, Noida, Coimbatore, Chennai
6 - 9 yrs
₹25L - ₹38L / yr
MLOps
MLFlow
kubeflow
skill iconMachine Learning (ML)
skill iconPython
+6 more

Building the machine learning production System(or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-ofthe-art AI solutions for Fractal clients.


Responsibilities

As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.

  • Enable Model tracking, model experimentation, Model automation
  • Develop scalable ML pipelines
  • Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLFlow, Kubeflow Model Registry
  • Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
  • Work across all phases of Model development life cycle to build MLOPS components
  • Build the knowledge base required to deliver increasingly complex MLOPS projects on Azure
  • Be an integral part of client business development and delivery engagements across multiple domains


Required Qualifications

  • 5.5-9 years experience building production-quality software
  • B.E/B.Tech/M.Tech in Computer Science or related technical degree OR equivalent
  • Strong experience in System Integration, Application Development or Datawarehouse projects across technologies used in the enterprise space
  • Expertise in MLOps, machine learning and docker
  • Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
  • Experience developing CI/CD components for production ready ML pipeline.
  • Database programming using any flavors of SQL
  • Knowledge of Git for Source code management
  • Ability to collaborate effectively with highly technical resources in a fast-paced environment
  • Ability to solve complex challenges/problems and rapidly deliver innovative solutions
  • Team handling, problem solving, project management and communication skills & creative thinking
  • Foundational Knowledge of Cloud Computing on Azure
  • Hunger and passion for learning new skills
Read more
Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai
3 - 7 yrs
₹8L - ₹15L / yr
AWS Lambda
Amazon S3
Amazon VPC
Amazon EC2
Amazon Redshift
+3 more

Technical Skills:


  • Ability to understand and translate business requirements into design.
  • Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
  • Experience in creating ETL jobs using Python/PySpark.
  • Proficiency in creating AWS Lambda functions for event-based jobs.
  • Knowledge of automating ETL processes using AWS Step Functions.
  • Competence in building data warehouses and loading data into them.


Responsibilities:


  • Understand business requirements and translate them into design.
  • Assess AWS infrastructure needs for development work.
  • Develop ETL jobs using Python/PySpark to meet requirements.
  • Implement AWS Lambda for event-based tasks.
  • Automate ETL processes using AWS Step Functions.
  • Build data warehouses and manage data loading.
  • Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
Read more
one-to-one, one-to-many, and many-to-many

one-to-one, one-to-many, and many-to-many

Agency job
via The Hub by Sridevi Viswanathan
Chennai
5 - 10 yrs
₹1L - ₹15L / yr
AWS CloudFormation
skill iconPython
PySpark
AWS Lambda

5-7 years of experience in Data Engineering with solid experience in design, development and implementation of end-to-end data ingestion and data processing system in AWS platform.

2-3 years of experience in AWS Glue, Lambda, Appflow, EventBridge, Python, PySpark, Lake House, S3, Redshift, Postgres, API Gateway, CloudFormation, Kinesis, Athena, KMS, IAM.

Experience in modern data architecture, Lake House, Enterprise Data Lake, Data Warehouse, API interfaces, solution patterns, standards and optimizing data ingestion.

Experience in build of data pipelines from source systems like SAP Concur, Veeva Vault, Azure Cost, various social media platforms or similar source systems.

Expertise in analyzing source data and designing a robust and scalable data ingestion framework and pipelines adhering to client Enterprise Data Architecture guidelines.

Proficient in design and development of solutions for real-time (or near real time) stream data processing as well as batch processing on the AWS platform.

Work closely with business analysts, data architects, data engineers, and data analysts to ensure that the data ingestion solutions meet the needs of the business.

Troubleshoot and provide support for issues related to data quality and data ingestion solutions. This may involve debugging data pipeline processes, optimizing queries, or troubleshooting application performance issues.

Experience in working in Agile/Scrum methodologies, CI/CD tools and practices, coding standards, code reviews, source management (GITHUB), JIRA, JIRA Xray and Confluence.

Experience or exposure to design and development using Full Stack tools.

Strong analytical and problem-solving skills, excellent communication (written and oral), and interpersonal skills.

Bachelor's or master's degree in computer science or related field.

 

 

Read more
A fast growing Big Data company

A fast growing Big Data company

Agency job
via Careerconnects by Kumar Narayanan
Noida, Bengaluru (Bangalore), Chennai, Hyderabad
6 - 8 yrs
₹10L - ₹15L / yr
AWS Glue
SQL
skill iconPython
PySpark
Data engineering
+6 more

AWS Glue Developer 

Work Experience: 6 to 8 Years

Work Location:  Noida, Bangalore, Chennai & Hyderabad

Must Have Skills: AWS Glue, DMS, SQL, Python, PySpark, Data integrations and Data Ops, 

Job Reference ID:BT/F21/IND


Job Description:

Design, build and configure applications to meet business process and application requirements.


Responsibilities:

7 years of work experience with ETL, Data Modelling, and Data Architecture Proficient in ETL optimization, designing, coding, and tuning big data processes using Pyspark Extensive experience to build data platforms on AWS using core AWS services Step function, EMR, Lambda, Glue and Athena, Redshift, Postgres, RDS etc and design/develop data engineering solutions. Orchestrate using Airflow.


Technical Experience:

Hands-on experience on developing Data platform and its components Data Lake, cloud Datawarehouse, APIs, Batch and streaming data pipeline Experience with building data pipelines and applications to stream and process large datasets at low latencies.


➢ Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.

➢ Create data pipeline architecture by designing and implementing data ingestion solutions.

➢ Integrate data sets using AWS services such as Glue, Lambda functions/ Airflow.

➢ Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena.

➢ Author ETL processes using Python, Pyspark.

➢ Build Redshift Spectrum direct transformations and data modelling using data in S3.

➢ ETL process monitoring using CloudWatch events.

➢ You will be working in collaboration with other teams. Good communication must.

➢ Must have experience in using AWS services API, AWS CLI and SDK


Professional Attributes:

➢ Experience operating very large data warehouses or data lakes Expert-level skills in writing and optimizing SQL Extensive, real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.

➢ Must have 6+ years of big data ETL experience using Python, S3, Lambda, Dynamo DB, Athena, Glue in AWS environment.

➢ Expertise in S3, RDS, Redshift, Kinesis, EC2 clusters highly desired.


Qualification:

➢ Degree in Computer Science, Computer Engineering or equivalent.


Salary: Commensurate with experience and demonstrated competence

Read more
A Product Based Client,Chennai

A Product Based Client,Chennai

Agency job
via SangatHR by Anna Poorni
Chennai
4 - 8 yrs
₹10L - ₹15L / yr
Data Warehouse (DWH)
Informatica
ETL
Spark
PySpark
+2 more

Analytics Job Description

We are hiring an Analytics Engineer to help drive our Business Intelligence efforts. You will

partner closely with leaders across the organization, working together to understand the how

and why of people, team and company challenges, workflows and culture. The team is

responsible for delivering data and insights that drive decision-making, execution, and

investments for our product initiatives.

You will work cross-functionally with product, marketing, sales, engineering, finance, and our

customer-facing teams enabling them with data and narratives about the customer journey.

You’ll also work closely with other data teams, such as data engineering and product analytics,

to ensure we are creating a strong data culture at Blend that enables our cross-functional partners

to be more data-informed.


Role : DataEngineer 

Please find below the JD for the DataEngineer Role..

  Location: Guindy,Chennai

How you’ll contribute:

• Develop objectives and metrics, ensure priorities are data-driven, and balance short-

term and long-term goals


• Develop deep analytical insights to inform and influence product roadmaps and

business decisions and help improve the consumer experience

• Work closely with GTM and supporting operations teams to author and develop core

data sets that empower analyses

• Deeply understand the business and proactively spot risks and opportunities

• Develop dashboards and define metrics that drive key business decisions

• Build and maintain scalable ETL pipelines via solutions such as Fivetran, Hightouch,

and Workato

• Design our Analytics and Business Intelligence architecture, assessing and

implementing new technologies that fitting


• Work with our engineering teams to continually make our data pipelines and tooling

more resilient


Who you are:

• Bachelor’s degree or equivalent required from an accredited institution with a

quantitative focus such as Economics, Operations Research, Statistics, Computer Science OR 1-3 Years of Experience as a Data Analyst, Data Engineer, Data Scientist

• Must have strong SQL and data modeling skills, with experience applying skills to

thoughtfully create data models in a warehouse environment.

• A proven track record of using analysis to drive key decisions and influence change

• Strong storyteller and ability to communicate effectively with managers and

executives

• Demonstrated ability to define metrics for product areas, understand the right

questions to ask and push back on stakeholders in the face of ambiguous, complex

problems, and work with diverse teams with different goals

• A passion for documentation.

• A solution-oriented growth mindset. You’ll need to be a self-starter and thrive in a

dynamic environment.

• A bias towards communication and collaboration with business and technical

stakeholders.

• Quantitative rigor and systems thinking.

• Prior startup experience is preferred, but not required.

• Interest or experience in machine learning techniques (such as clustering, decision

tree, and segmentation)

• Familiarity with a scientific computing language, such as Python, for data wrangling

and statistical analysis

• Experience with a SQL focused data transformation framework such as dbt

• Experience with a Business Intelligence Tool such as Mode/Tableau


Mandatory Skillset:


-Very Strong in SQL

-Spark OR pyspark OR Python

-Shell Scripting


Read more
Chennai, Hyderabad
5 - 10 yrs
₹10L - ₹25L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+2 more

Bigdata with cloud:

 

Experience : 5-10 years

 

Location : Hyderabad/Chennai

 

Notice period : 15-20 days Max

 

1.  Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight

2.  Experience in developing lambda functions with AWS Lambda

3.  Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark

4.  Should be able to code in Python and Scala.

5.  Snowflake experience will be a plus

Read more
Aureus Tech Systems

at Aureus Tech Systems

3 recruiters
Naveen Yelleti
Posted by Naveen Yelleti
Kolkata, Hyderabad, Chennai, Bengaluru (Bangalore), Bhubaneswar, Visakhapatnam, Vijayawada, Trichur, Thiruvananthapuram, Mysore, Delhi, Noida, Gurugram, Nagpur
1 - 7 yrs
₹4L - ₹15L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+2 more

Skills and requirements

  • Experience analyzing complex and varied data in a commercial or academic setting.
  • Desire to solve new and complex problems every day.
  • Excellent ability to communicate scientific results to both technical and non-technical team members.


Desirable

  • A degree in a numerically focused discipline such as, Maths, Physics, Chemistry, Engineering or Biological Sciences..
  • Hands on experience on Python, Pyspark, SQL
  • Hands on experience on building End to End Data Pipelines.
  • Hands on Experience on Azure Data Factory, Azure Data Bricks, Data Lake - added advantage
  • Hands on Experience in building data pipelines.
  • Experience with Bigdata Tools, Hadoop, Hive, Sqoop, Spark, SparkSQL
  • Experience with SQL or NoSQL databases for the purposes of data retrieval and management.
  • Experience in data warehousing and business intelligence tools, techniques and technology, as well as experience in diving deep on data analysis or technical issues to come up with effective solutions.
  • BS degree in math, statistics, computer science or equivalent technical field.
  • Experience in data mining structured and unstructured data (SQL, ETL, data warehouse, Machine Learning etc.) in a business environment with large-scale, complex data sets.
  • Proven ability to look at solutions in unconventional ways. Sees opportunities to innovate and can lead the way.
  • Willing to learn and work on Data Science, ML, AI.
Read more
GradMener Technology Pvt. Ltd.
Pune, Chennai
5 - 9 yrs
₹15L - ₹20L / yr
skill iconScala
PySpark
Spark
SQL Azure
Hadoop
+4 more
  • 5+ years of experience in a Data Engineering role on cloud environment
  • Must have good experience in Scala/PySpark (preferably on data-bricks environment)
  • Extensive experience with Transact-SQL.
  • Experience in Data-bricks/Spark.
  • Strong experience in Dataware house projects
  • Expertise in database development projects with ETL processes.
  • Manage and maintain data engineering pipelines
  • Develop batch processing, streaming and integration solutions
  • Experienced in building and operationalizing large-scale enterprise data solutions and applications
  • Using one or more of Azure data and analytics services in combination with custom solutions
  • Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
  • In-depth understanding of data management (e. g. permissions, security, and monitoring).
  • Cloud repositories for e.g. Azure GitHub, Git
  • Experience in an agile environment (Prefer Azure DevOps).

Good to have

  • Manage source data access security
  • Automate Azure Data Factory pipelines
  • Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
  • Experience in implementing and maintaining CICD pipelines
  • Power BI understanding, Delta Lake house architecture
  • Knowledge of software development best practices.
  • Excellent analytical and organization skills.
  • Effective working in a team as well as working independently.
  • Strong written and verbal communication skills.
  • Expertise in database development projects and ETL processes.
Read more
MNC

MNC

Agency job
via Eurka IT SOL by Srikanth a
Chennai
5 - 11 yrs
₹10L - ₹15L / yr
PySpark
SQL
Test Automation (QA)
Big Data
skill iconData Science

Lead QA: more than 5 years experience , led the team of more than 5 people in big data platform, should have experience in Test Automation framework, should have experience of Test process documentation

Read more
A leading global information technology and business process

A leading global information technology and business process

Agency job
via Jobdost by Mamatha A
Chennai
5 - 14 yrs
₹13L - ₹21L / yr
skill iconPython
skill iconJava
PySpark
skill iconJavascript
Hadoop

Python + Data scientist : 
• Hands-on and sound knowledge of Python, Pyspark, Java script

• Build data-driven models to understand the characteristics of engineering systems

• Train, tune, validate, and monitor predictive models

• Sound knowledge on Statistics

• Experience in developing data processing tasks using PySpark such as reading,

merging, enrichment, loading of data from external systems to target data destinations

• Working knowledge on Big Data or/and Hadoop environments

• Experience creating CI/CD Pipelines using Jenkins or like tools

• Practiced in eXtreme Programming (XP) disciplines 

Read more
Amagi Media Labs

at Amagi Media Labs

3 recruiters
Rajesh C
Posted by Rajesh C
Chennai
5 - 7 yrs
₹20L - ₹30L / yr
Technical support
Tech Support
SQL
Informatica
PySpark
+1 more
Job Title: Support Engineer L3 Job Location: Chennai
We CondéNast are looking for a Support engineer Level 2 who would be responsible for
monitoring and maintaining the production systems to ensure the business continuity is
maintained. Your Responsibilities would also include prompt communication to business
and internal teams about process delays, stability, issue, resolutions.
Primary Responsibilities
● 5+ years experience in Production support
● The Support Data Engineer is responsible for monitoring of the data pipelines
that are in production.
● Level 3 support activities - Analysing issue, debug programs & Jobs, bug fix
● The position will contribute to the monitoring, rerun or reschedule, code fix
of pipelines for a variety of projects on a daily basis.
● Escalate failures to Data-Team/DevOps incase of Infrastructure Failures or unable
to revive the data-pipelines.
● Ensure accurate alerts are raised incase of pipeline failures and corresponding
stakeholders (Business/Data Teams) are notified about the same within the
agreed upon SLAs.
● Prepare and present success/failure metrics by accurately logging the
monitoring stats.
● Able to work in shifts to provide overlap with US Business teams
● Other duties as requested or assigned.
Desired Skills & Qualification
● Have Strong working knowledge of Pyspark, Informatica, SQL(PRESTO), Batch
Handling through schedulers(databricks, Astronomer will be an
advantage),AWS-S3, SQL, Airflow and Hive/Presto
● Have basic knowledge on Shell scripts and/or Bash commands.
● Able to execute queries in Databases and produce outputs.
● Able to understand and execute the steps provided by Data-Team to
revive data-pipelines.
● Strong verbal, written communication skills and strong interpersonal
skills.
● Graduate/Diploma in computer science or information technology.
About Condé Nast
CONDÉ NAST GLOBAL
Condé Nast is a global media house with over a century of distinguished publishing
history. With a portfolio of iconic brands like Vogue, GQ, Vanity Fair, The New Yorker and
Bon Appétit, we at Condé Nast aim to tell powerful, compelling stories of communities,
culture and the contemporary world. Our operations are headquartered in New York and
London, with colleagues and collaborators in 32 markets across the world, including
France, Germany, India, China, Japan, Spain, Italy, Russia, Mexico, and Latin America.
Condé Nast has been raising the industry standards and setting records for excellence in
the publishing space. Today, our brands reach over 1 billion people in print, online, video,
and social media.
CONDÉ NAST INDIA (DATA)
Over the years, Condé Nast successfully expanded and diversified into digital, TV, and
social platforms - in other words, a staggering amount of user data. Condé Nast made the
right move to invest heavily in understanding this data and formed a whole new Data
team entirely dedicated to data processing, engineering, analytics, and visualization. This
team helps drive engagement, fuel process innovation, further content enrichment, and
increase market revenue. The Data team aimed to create a company culture where data
was the common language and facilitate an environment where insights shared in
real-time could improve performance. The Global Data team operates out of Los Angeles,
New York, Chennai, and London. The team at Condé Nast Chennai works extensively with
data to amplify its brands' digital capabilities and boost online revenue. We are broadly
divided into four groups, Data Intelligence, Data Engineering, Data Science, and
Operations (including Product and Marketing Ops, Client Services) along with Data
Strategy and monetization. The teams built capabilities and products to create
data-driven solutions for better audience engagement.
What we look forward to:
We want to welcome bright, new minds into our midst and work together to create
diverse forms of self-expression. At Condé Nast, we encourage the imaginative and
celebrate the extraordinary. We are a media company for the future, with a remarkable
past. We are Condé Nast, and It Starts Here.
Read more
Virtusa

at Virtusa

2 recruiters
Agency job
via Response Informatics by Anupama Lavanya Uppala
Chennai, Bengaluru (Bangalore), Mumbai, Hyderabad, Pune
3 - 10 yrs
₹10L - ₹25L / yr
PySpark
skill iconPython
  • Minimum 1 years of relevant experience, in PySpark (mandatory)
  • Hands on experience in development, test, deploy, maintain and improving data integration pipeline in AWS cloud environment is added plus 
  • Ability to play lead role and independently manage 3-5 member of Pyspark development team 
  • EMR ,Python and PYspark mandate.
  • Knowledge and awareness working with AWS Cloud technologies like Apache Spark, , Glue, Kafka, Kinesis, and Lambda in S3, Redshift, RDS
Read more
Agilisium

Agilisium

Agency job
via Recruiting India by Moumita Santra
Chennai
10 - 19 yrs
₹12L - ₹40L / yr
Big Data
Apache Spark
Spark
PySpark
ETL
+1 more

Job Sector: IT, Software

Job Type: Permanent

Location: Chennai

Experience: 10 - 20 Years

Salary: 12 – 40 LPA

Education: Any Graduate

Notice Period: Immediate

Key Skills: Python, Spark, AWS, SQL, PySpark

Contact at triple eight two zero nine four two double seven

 

Job Description:

Requirements

  • Minimum 12 years experience
  • In depth understanding and knowledge on distributed computing with spark.
  • Deep understanding of Spark Architecture and internals
  • Proven experience in data ingestion, data integration and data analytics with spark, preferably PySpark.
  • Expertise in ETL processes, data warehousing and data lakes.
  • Hands on with python for Big data and analytics.
  • Hands on in agile scrum model is an added advantage.
  • Knowledge on CI/CD and orchestration tools is desirable.
  • AWS S3, Redshift, Lambda knowledge is preferred
Thanks
Read more
Virtusa

at Virtusa

2 recruiters
Agency job
via Devenir by Rakesh Kumar
Chennai, Hyderabad
4 - 6 yrs
₹10L - ₹20L / yr
PySpark
skill iconAmazon Web Services (AWS)
skill iconPython
  • Hands-on experience in Development
  • 4-6 years of Hands on experience with Python scripts
  • 2-3 years of Hands on experience in PySpark coding. Worked in spark cluster computing technology.
  • 3-4 years of Hands on end to end data pipeline experience working on AWS environments
  • 3-4 years of Hands on experience working on AWS services – Glue, Lambda, Step Functions, EC2, RDS, SES, SNS, DMS, CloudWatch etc.
  • 2-3 years of Hands on experience working on AWS redshift
  • 6+ years of Hands on experience with writing Unix Shell scripts
  • Good communication skills
Read more
netmedscom

at netmedscom

3 recruiters
Vijay Hemnath
Posted by Vijay Hemnath
Chennai
2 - 5 yrs
₹6L - ₹25L / yr
Big Data
Hadoop
Apache Hive
skill iconScala
Spark
+12 more

We are looking for an outstanding Big Data Engineer with experience setting up and maintaining Data Warehouse and Data Lakes for an Organization. This role would closely collaborate with the Data Science team and assist the team build and deploy machine learning and deep learning models on big data analytics platforms.

Roles and Responsibilities:

  • Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
  • Develop programs in Scala and Python as part of data cleaning and processing.
  • Assemble large, complex data sets that meet functional / non-functional business requirements and fostering data-driven decision making across the organization.  
  • Responsible to design and develop distributed, high volume, high velocity multi-threaded event processing systems.
  • Implement processes and systems to validate data, monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
  • Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Provide high operational excellence guaranteeing high availability and platform stability.
  • Closely collaborate with the Data Science team and assist the team build and deploy machine learning and deep learning models on big data analytics platforms.

Skills:

  • Experience with Big Data pipeline, Big Data analytics, Data warehousing.
  • Experience with SQL/No-SQL, schema design and dimensional data modeling.
  • Strong understanding of Hadoop Architecture, HDFS ecosystem and eexperience with Big Data technology stack such as HBase, Hadoop, Hive, MapReduce.
  • Experience in designing systems that process structured as well as unstructured data at large scale.
  • Experience in AWS/Spark/Java/Scala/Python development.
  • Should have Strong skills in PySpark (Python & SPARK). Ability to create, manage and manipulate Spark Dataframes. Expertise in Spark query tuning and performance optimization.
  • Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
  • Prior exposure to streaming data sources such as Kafka.
  • Should have knowledge on Shell Scripting and Python scripting.
  • High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
  • Experience with NoSQL databases such as Cassandra / MongoDB.
  • Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
  • Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
  • Experience building and deploying applications on on-premise and cloud-based infrastructure.
  • Having a good understanding of machine learning landscape and concepts. 

 

Qualifications and Experience:

Engineering and post graduate candidates, preferably in Computer Science, from premier institutions with proven work experience as a Big Data Engineer or a similar role for 3-5 years.

Certifications:

Good to have at least one of the Certifications listed here:

    AZ 900 - Azure Fundamentals

    DP 200, DP 201, DP 203, AZ 204 - Data Engineering

    AZ 400 - Devops Certification

Read more
netmedscom

at netmedscom

3 recruiters
Vijay Hemnath
Posted by Vijay Hemnath
Chennai
5 - 10 yrs
₹10L - ₹30L / yr
skill iconMachine Learning (ML)
Software deployment
CI/CD
Cloud Computing
Snow flake schema
+19 more

We are looking for an outstanding ML Architect (Deployments) with expertise in deploying Machine Learning solutions/models into production and scaling them to serve millions of customers. A candidate with an adaptable and productive working style which fits in a fast-moving environment.

 

Skills:

- 5+ years deploying Machine Learning pipelines in large enterprise production systems.

- Experience developing end to end ML solutions from business hypothesis to deployment / understanding the entirety of the ML development life cycle.
- Expert in modern software development practices; solid experience using source control management (CI/CD).
- Proficient in designing relevant architecture / microservices to fulfil application integration, model monitoring, training / re-training, model management, model deployment, model experimentation/development, alert mechanisms.
- Experience with public cloud platforms (Azure, AWS, GCP).
- Serverless services like lambda, azure functions, and/or cloud functions.
- Orchestration services like data factory, data pipeline, and/or data flow.
- Data science workbench/managed services like azure machine learning, sagemaker, and/or AI platform.
- Data warehouse services like snowflake, redshift, bigquery, azure sql dw, AWS Redshift.
- Distributed computing services like Pyspark, EMR, Databricks.
- Data storage services like cloud storage, S3, blob, S3 Glacier.
- Data visualization tools like Power BI, Tableau, Quicksight, and/or Qlik.
- Proven experience serving up predictive algorithms and analytics through batch and real-time APIs.
- Solid working experience with software engineers, data scientists, product owners, business analysts, project managers, and business stakeholders to design the holistic solution.
- Strong technical acumen around automated testing.
- Extensive background in statistical analysis and modeling (distributions, hypothesis testing, probability theory, etc.)
- Strong hands-on experience with statistical packages and ML libraries (e.g., Python scikit learn, Spark MLlib, etc.)
- Experience in effective data exploration and visualization (e.g., Excel, Power BI, Tableau, Qlik, etc.)
- Experience in developing and debugging in one or more of the languages Java, Python.
- Ability to work in cross functional teams.
- Apply Machine Learning techniques in production including, but not limited to, neuralnets, regression, decision trees, random forests, ensembles, SVM, Bayesian models, K-Means, etc.

 

Roles and Responsibilities:

Deploying ML models into production, and scaling them to serve millions of customers.

Technical solutioning skills with deep understanding of technical API integrations, AI / Data Science, BigData and public cloud architectures / deployments in a SaaS environment.

Strong stakeholder relationship management skills - able to influence and manage the expectations of senior executives.
Strong networking skills with the ability to build and maintain strong relationships with both business, operations and technology teams internally and externally.

Provide software design and programming support to projects.

 

 Qualifications & Experience:

Engineering and post graduate candidates, preferably in Computer Science, from premier institutions with proven work experience as a Machine Learning Architect (Deployments) or a similar role for 5-7 years.

 

Read more
Fragma Data Systems

at Fragma Data Systems

8 recruiters
Evelyn Charles
Posted by Evelyn Charles
Remote, Bengaluru (Bangalore), Hyderabad, Chennai, Mumbai, Pune
8 - 15 yrs
₹16L - ₹28L / yr
PySpark
SQL Azure
azure synapse
Windows Azure
Azure Data Engineer
+3 more
Technology Skills:
  • Building and operationalizing large scale enterprise data solutions and applications using one or more of AZURE data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsights, Databricks, CosmosDB, EventHub/IOTHub.
  • Experience in migrating on-premise data warehouses to data platforms on AZURE cloud. 
  • Designing and implementing data engineering, ingestion, and transformation functions
Good to Have: 
  • Experience with Azure Analysis Services
  • Experience in Power BI
  • Experience with third-party solutions like Attunity/Stream sets, Informatica
  • Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
  • Capacity Planning and Performance Tuning on Azure Stack and Spark.
Read more
Get to hear about interesting companies hiring right now
Company logo
Company logo
Company logo
Company logo
Company logo
Linkedin iconFollow Cutshort
Why apply via Cutshort?
Connect with actual hiring teams and get their fast response. No spam.
Find more jobs
Get to hear about interesting companies hiring right now
Company logo
Company logo
Company logo
Company logo
Company logo
Linkedin iconFollow Cutshort