Cutshort logo
Molecular Connections logo
Sr. Spark/AWS Developer
Sr. Spark/AWS Developer
Molecular Connections's logo

Sr. Spark/AWS Developer

Molecular Connections's profile picture
Posted by Molecular Connections
8 - 10 yrs
₹15L - ₹20L / yr
Bengaluru (Bangalore)
Skills
PySpark
Data engineering
Big Data
Hadoop
Spark
Apache Kafka
skill iconScala
skill iconPython
Data modeling
  1. Big data developer with 8+ years of professional IT experience with expertise in Hadoop ecosystem components in ingestion, Data modeling, querying, processing, storage, analysis, Data Integration and Implementing enterprise level systems spanning Big Data.
  2. A skilled developer with strong problem solving, debugging and analytical capabilities, who actively engages in understanding customer requirements.
  3. Expertise in Apache Hadoop ecosystem components like Spark, Hadoop Distributed File Systems(HDFS), HiveMapReduce, Hive, Sqoop, HBase, Zookeeper, YARN, Flume, Pig, Nifi, Scala and Oozie.
  4. Hands on experience in creating real - time data streaming solutions using Apache Spark core, Spark SQL & DataFrames, Kafka, Spark streaming and Apache Storm.
  5. Excellent knowledge of Hadoop architecture and daemons of Hadoop clusters, which include Name node,Data node, Resource manager, Node Manager and Job history server.
  6. Worked on both Cloudera and Horton works in Hadoop Distributions. Experience in managing Hadoop clustersusing Cloudera Manager tool.
  7. Well versed in installation, Configuration, Managing of Big Data and underlying infrastructure of Hadoop Cluster.
  8. Hands on experience in coding MapReduce/Yarn Programs using Java, Scala and Python for analyzing Big Data.
  9. Exposure to Cloudera development environment and management using Cloudera Manager.
  10. Extensively worked on Spark using Scala on cluster for computational (analytics), installed it on top of Hadoop performed advanced analytical application by making use of Spark with Hive and SQL/Oracle .
  11. Implemented Spark using PYTHON and utilizing Data frames and Spark SQL API for faster processing of data and handled importing data from different data sources into HDFS using Sqoop and performing transformations using Hive, MapReduce and then loading data into HDFS.
  12. Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
  13. Hands on experience in MLlib from Spark which are used for predictive intelligence, customer segmentation and for smooth maintenance in Spark streaming.
  14. Experience in using Flume to load log files into HDFS and Oozie for workflow design and scheduling.
  15. Experience in optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  16. Working on creating data pipeline for different events of ingestion, aggregation, and load consumer response data into Hive external tables in HDFS location to serve as feed for tableau dashboards.
  17. Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
  18. In-depth Understanding of Oozie to schedule all Hive/Sqoop/HBase jobs.
  19. Hands on expertise in real time analytics with Apache Spark.
  20. Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
  21. Extensive experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS).
  22. Experience in Microsoft cloud and setting cluster in Amazon EC2 & S3 including the automation of setting & extending the clusters in AWS Amazon cloud.
  23. Extensively worked on Spark using Python on cluster for computational (analytics), installed it on top of Hadoop performed advanced analytical application by making use of Spark with Hive and SQL.
  24. Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  25. Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
  26. Experienced in writing Ad Hoc queries using Cloudera Impala, also used Impala analytical functions.
  27. Experience in creating Data frames using PySpark and performing operation on the Data frames using Python.
  28. In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS and MapReduce Programming Paradigm, High Availability and YARN architecture.
  29. Establishing multiple connections to different Redshift clusters (Bank Prod, Card Prod, SBBDA Cluster) and provide the access for pulling the information we need for analysis. 
  30. Generated various kinds of knowledge reports using Power BI based on Business specification. 
  31. Developed interactive Tableau dashboards to provide a clear understanding of industry specific KPIs using quick filters and parameters to handle them more efficiently.
  32. Well Experience in projects using JIRA, Testing, Maven and Jenkins build tools.
  33. Experienced in designing, built, and deploying and utilizing almost all the AWS stack (Including EC2, S3,), focusing on high-availability, fault tolerance, and auto-scaling.
  34. Good experience with use-case development, with Software methodologies like Agile and Waterfall.
  35. Working knowledge of Amazon's Elastic Cloud Compute( EC2 ) infrastructure for computational tasks and Simple Storage Service ( S3 ) as Storage mechanism.
  36. Good working experience in importing data using Sqoop, SFTP from various sources like RDMS, Teradata, Mainframes, Oracle, Netezza to HDFS and performed transformations on it using Hive, Pig and Spark .
  37. Extensive experience in Text Analytics, developing different Statistical Machine Learning solutions to various business problems and generating data visualizations using Python and R.
  38. Proficient in NoSQL databases including HBase, Cassandra, MongoDB and its integration with Hadoop cluster.
  39. Hands on experience in Hadoop Big data technology working on MapReduce, Pig, Hive as Analysis tool, Sqoop and Flume data import/export tools.
Read more
Users love Cutshort
Read about what our users have to say about finding their next opportunity on Cutshort.
Subodh Popalwar's profile image

Subodh Popalwar

Software Engineer, Memorres
For 2 years, I had trouble finding a company with good work culture and a role that will help me grow in my career. Soon after I started using Cutshort, I had access to information about the work culture, compensation and what each company was clearly offering.
Companies hiring on Cutshort
companies logos

About Molecular Connections

Founded :
2001
Type
Size :
1000-5000
Stage :
Profitable
About
N/A
Connect with the team
Profile picture
Molecular Connections
Profile picture
Chendil Kumar
Profile picture
Gurminder kaur
Profile picture
Lokanath Khamari
Company social profiles
N/A

Similar jobs

globe teleservices
deepshikha thapar
Posted by deepshikha thapar
Bengaluru (Bangalore)
4 - 8 yrs
₹10L - ₹15L / yr
skill iconPython
SQL

RESPONSIBILITIES:

 Requirement understanding and elicitation, analyze, data/workflows, contribute to product

project and Proof of concept (POC)

 Contribute to prepare design documents and effort estimations.

 Develop AI/ML Models using best in-class ML models.

 Building, testing, and deploying AI/ML solutions.

 Work with Business Analysts and Product Managers to assist with defining functional user

stories.

 Ensure deliverables across teams are of high quality and clearly documented. 

 Recommend best ML practices/Industry standards for any ML use case.

 Proactively take up R and D and recommend solution options for any ML use case.

REQUIREMENTS:

Required Skills

 Overall experience of 4 to 7 Years working on AI/ML framework development

 Good programming knowledge in Python is must.

 Good Knowledge of R and SAS is desired.

 Good hands on and working knowledge SQL, Data Model, CRISP-DM.

 Proficiency with Uni/multivariate statistics, algorithm design, and predictive AI/ML modelling.

 Strong knowledge of machine learning algorithms, linear regression, logistic regression, KNN,

Random Forest, Support Vector Machines and Natural Language Processing.

 Experience with NLP and deep neural networks using synthetic and artificial data.

 Involved in different phases of SDLC and have good working exposure on different SLDC’s like

Agile Methodologies.

Read more
Mumbai
5 - 10 yrs
₹8L - ₹20L / yr
skill iconData Science
skill iconMachine Learning (ML)
Natural Language Processing (NLP)
Computer Vision
recommendation algorithm
+6 more


Data Scientist – Delivery & New Frontiers Manager 

Job Description:   

We are seeking highly skilled and motivated data scientist to join our Data Science team. The successful candidate will play a pivotal role in our data-driven initiatives and be responsible for designing, developing, and deploying data science solutions that drives business values for stakeholders. This role involves mapping business problems to a formal data science solution, working with wide range of structured and unstructured data, architecture design, creating sophisticated models, setting up operations for the data science product with the support from MLOps team and facilitating business workshops. In a nutshell, this person will represent data science and provide expertise in the full project cycle. Expectation of the successful candidate will be above that of a typical data scientist. Beyond technical expertise, problem solving in complex set-up will be key to the success for this role. 

Responsibilities: 

  • Collaborate with cross-functional teams, including software engineers, product managers, and business stakeholders, to understand business needs and identify data science opportunities. 
  • Map complex business problems to data science problem, design data science solution using GCP/Azure Databricks platform. 
  • Collect, clean, and preprocess large datasets from various internal and external sources.  
  • Streamlining data science process working with Data Engineering, and Technology teams. 
  • Managing multiple analytics projects within a Function to deliver end-to-end data science solutions, creation of insights and identify patterns.  
  • Develop and maintain data pipelines and infrastructure to support the data science projects 
  • Communicate findings and recommendations to stakeholders through data visualizations and presentations. 
  • Stay up to date with the latest data science trends and technologies, specifically for GCP companies 

 

Education / Certifications:  

Bachelor’s or Master’s in Computer Science, Engineering, Computational Statistics, Mathematics. 

Job specific requirements:  

  • Brings 5+ years of deep data science experience 

∙       Strong knowledge of machine learning and statistical modeling techniques in a in a clouds-based environment such as GCP, Azure, Amazon 

  • Experience with programming languages such as Python, R, Spark 
  • Experience with data visualization tools such as Tableau, Power BI, and D3.js 
  • Strong understanding of data structures, algorithms, and software design principles 
  • Experience with GCP platforms and services such as Big Query, Cloud ML Engine, and Cloud Storage 
  • Experience in configuring and setting up the version control on Code, Data, and Machine Learning Models using GitHub. 
  • Self-driven, be able to work with cross-functional teams in a fast-paced environment, adaptability to the changing business needs. 
  • Strong analytical and problem-solving skills 
  • Excellent verbal and written communication skills 
  • Working knowledge with application architecture, data security and compliance team. 


Read more
Curl Analytics
Agency job
via wrackle by Naveen Taalanki
Bengaluru (Bangalore)
5 - 10 yrs
₹15L - ₹30L / yr
ETL
Big Data
Data engineering
Apache Kafka
PySpark
+11 more
What you will do
  • Bring in industry best practices around creating and maintaining robust data pipelines for complex data projects with/without AI component
    • programmatically ingesting data from several static and real-time sources (incl. web scraping)
    • rendering results through dynamic interfaces incl. web / mobile / dashboard with the ability to log usage and granular user feedbacks
    • performance tuning and optimal implementation of complex Python scripts (using SPARK), SQL (using stored procedures, HIVE), and NoSQL queries in a production environment
  • Industrialize ML / DL solutions and deploy and manage production services; proactively handle data issues arising on live apps
  • Perform ETL on large and complex datasets for AI applications - work closely with data scientists on performance optimization of large-scale ML/DL model training
  • Build data tools to facilitate fast data cleaning and statistical analysis
  • Ensure data architecture is secure and compliant
  • Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability
  • Work closely with APAC CDO and coordinate with a fully decentralized team across different locations in APAC and global HQ (Paris).

You should be

  •  Expert in structured and unstructured data in traditional and Big data environments – Oracle / SQLserver, MongoDB, Hive / Pig, BigQuery, and Spark
  • Have excellent knowledge of Python programming both in traditional and distributed models (PySpark)
  • Expert in shell scripting and writing schedulers
  • Hands-on experience with Cloud - deploying complex data solutions in hybrid cloud / on-premise environment both for data extraction/storage and computation
  • Hands-on experience in deploying production apps using large volumes of data with state-of-the-art technologies like Dockers, Kubernetes, and Kafka
  • Strong knowledge of data security best practices
  • 5+ years experience in a data engineering role
  • Science / Engineering graduate from a Tier-1 university in the country
  • And most importantly, you must be a passionate coder who really cares about building apps that can help people do things better, smarter, and faster even when they sleep
Read more
TensorGo Software Private Limited
Deepika Agarwal
Posted by Deepika Agarwal
Remote only
5 - 8 yrs
₹5L - ₹15L / yr
skill iconPython
PySpark
apache airflow
Spark
Hadoop
+4 more

Requirements:

● Understanding our data sets and how to bring them together.

● Working with our engineering team to support custom solutions offered to the product development.

● Filling the gap between development, engineering and data ops.

● Creating, maintaining and documenting scripts to support ongoing custom solutions.

● Excellent organizational skills, including attention to precise details

● Strong multitasking skills and ability to work in a fast-paced environment

● 5+ years experience with Python to develop scripts.

● Know your way around RESTFUL APIs.[Able to integrate not necessary to publish]

● You are familiar with pulling and pushing files from SFTP and AWS S3.

● Experience with any Cloud solutions including GCP / AWS / OCI / Azure.

● Familiarity with SQL programming to query and transform data from relational Databases.

● Familiarity to work with Linux (and Linux work environment).

● Excellent written and verbal communication skills

● Extracting, transforming, and loading data into internal databases and Hadoop

● Optimizing our new and existing data pipelines for speed and reliability

● Deploying product build and product improvements

● Documenting and managing multiple repositories of code

● Experience with SQL and NoSQL databases (Casendra, MySQL)

● Hands-on experience in data pipelining and ETL. (Any of these frameworks/tools: Hadoop, BigQuery,

RedShift, Athena)

● Hands-on experience in AirFlow

● Understanding of best practices, common coding patterns and good practices around

● storing, partitioning, warehousing and indexing of data

● Experience in reading the data from Kafka topic (both live stream and offline)

● Experience in PySpark and Data frames

Responsibilities:

You’ll

● Collaborating across an agile team to continuously design, iterate, and develop big data systems.

● Extracting, transforming, and loading data into internal databases.

● Optimizing our new and existing data pipelines for speed and reliability.

● Deploying new products and product improvements.

● Documenting and managing multiple repositories of code.

Read more
codersbrain
at codersbrain
1 recruiter
Tanuj Uppal
Posted by Tanuj Uppal
Delhi
4 - 8 yrs
₹2L - ₹15L / yr
Spark
Hadoop
Big Data
Data engineering
PySpark
+5 more
  • Mandatory - Hands on experience in Python and PySpark.

 

  • Build pySpark applications using Spark Dataframes in Python using Jupyter notebook and PyCharm(IDE).

 

  • Worked on optimizing spark jobs that processes huge volumes of data.

 

  • Hands on experience in version control tools like Git.

 

  • Worked on Amazon’s Analytics services like Amazon EMR, Lambda function etc

 

  • Worked on Amazon’s Compute services like Amazon Lambda, Amazon EC2 and Amazon’s Storage service like S3 and few other services like SNS.

 

  • Experience/knowledge of bash/shell scripting will be a plus.

 

  • Experience in working with fixed width, delimited , multi record file formats etc.

 

  • Hands on experience in tools like Jenkins to build, test and deploy the applications

 

  • Awareness of Devops concepts and be able to work in an automated release pipeline environment.

 

  • Excellent debugging skills.
Read more
Compile
at Compile
16 recruiters
Sarumathi NH
Posted by Sarumathi NH
Bengaluru (Bangalore)
7 - 10 yrs
Best in industry
Data Warehouse (DWH)
Informatica
ETL
Spark

You will be responsible for designing, building, and maintaining data pipelines that handle Real-world data at Compile. You will be handling both inbound and outbound data deliveries at Compile for datasets including Claims, Remittances, EHR, SDOH, etc.

You will

  • Work on building and maintaining data pipelines (specifically RWD).
  • Build, enhance and maintain existing pipelines in pyspark, python and help build analytical insights and datasets.
  • Scheduling and maintaining pipeline jobs for RWD.
  • Develop, test, and implement data solutions based on the design.
  • Design and implement quality checks on existing and new data pipelines.
  • Ensure adherence to security and compliance that is required for the products.
  • Maintain relationships with various data vendors and track changes and issues across vendors and deliveries.

You have

  • Hands-on experience with ETL process (min of 5 years).
  • Excellent communication skills and ability to work with multiple vendors.
  • High proficiency with Spark, SQL.
  • Proficiency in Data modeling, validation, quality check, and data engineering concepts.
  • Experience in working with big-data processing technologies using - databricks, dbt, S3, Delta lake, Deequ, Griffin, Snowflake, BigQuery.
  • Familiarity with version control technologies, and CI/CD systems.
  • Understanding of scheduling tools like Airflow/Prefect.
  • Min of 3 years of experience managing data warehouses.
  • Familiarity with healthcare datasets is a plus.

Compile embraces diversity and equal opportunity in a serious way. We are committed to building a team of people from many backgrounds, perspectives, and skills. We know the more inclusive we are, the better our work will be.         

Read more
leading pharmacy provider
Agency job
via Econolytics by Jyotsna Econolytics
Noida, NCR (Delhi | Gurgaon | Noida)
4 - 10 yrs
₹18L - ₹24L / yr
skill iconData Science
skill iconPython
skill iconR Programming
Algorithms
Predictive modelling
Job Description:

• Help build a Data Science team which will be engaged in researching, designing,
implementing, and deploying full-stack scalable data analytics vision and machine learning
solutions to challenge various business issues.
• Modelling complex algorithms, discovering insights and identifying business
opportunities through the use of algorithmic, statistical, visualization, and mining techniques
• Translates business requirements into quick prototypes and enable the
development of big data capabilities driving business outcomes
• Responsible for data governance and defining data collection and collation
guidelines.
• Must be able to advice, guide and train other junior data engineers in their job.

Must Have:

• 4+ experience in a leadership role as a Data Scientist
• Preferably from retail, Manufacturing, Healthcare industry(not mandatory)
• Willing to work from scratch and build up a team of Data Scientists
• Open for taking up the challenges with end to end ownership
• Confident with excellent communication skills along with a good decision maker
Read more
UAE Client
Remote, Bengaluru (Bangalore), Hyderabad
6 - 10 yrs
₹15L - ₹22L / yr
Informatica
Big Data
SQL
Hadoop
Apache Spark
+1 more

Skills- Informatica with Big Data Management

 

1.Minimum 6 to 8 years of experience in informatica BDM development
2.Experience working on Spark/SQL
3.Develops informtica mapping/Sql 

4. Should have experience in Hadoop, spark etc
Read more
Prescience Decision Solutions
Shivakumar K
Posted by Shivakumar K
Bengaluru (Bangalore)
3 - 7 yrs
₹10L - ₹20L / yr
Big Data
ETL
Spark
Apache Kafka
Apache Spark
+4 more

The Data Engineer would be responsible for selecting and integrating Big Data tools and frameworks required. Would implement Data Ingestion & ETL/ELT processes

Required Experience, Skills and Qualifications:

  • Hands on experience on Big Data tools/technologies like Spark,  Databricks, Map Reduce, Hive, HDFS.
  • Expertise and excellent understanding of big data toolset such as Sqoop, Spark-streaming, Kafka, NiFi
  • Proficiency in any of the programming language: Python/ Scala/  Java with 4+ years’ experience
  • Experience in Cloud infrastructures like MS Azure, Data lake etc
  • Good working knowledge in NoSQL DB (Mongo, HBase, Casandra)
Read more
Japan Based Leading Company
Bengaluru (Bangalore)
3 - 10 yrs
₹0L - ₹20L / yr
Big Data
skill iconAmazon Web Services (AWS)
skill iconJava
skill iconPython
MySQL
+2 more
A data engineer with AWS Cloud infrastructure experience to join our Big Data Operations team. This role will provide advanced operations support, contribute to automation and system improvements, and work directly with enterprise customers to provide excellent customer service.
The candidate,
1. Must have a very good hands-on technical experience of 3+ years with JAVA or Python
2. Working experience and good understanding of AWS Cloud; Advanced experience with IAM policy and role management
3. Infrastructure Operations: 5+ years supporting systems infrastructure operations, upgrades, deployments using Terraform, and monitoring
4. Hadoop: Experience with Hadoop (Hive, Spark, Sqoop) and / or AWS EMR
5. Knowledge on PostgreSQL/MySQL/Dynamo DB backend operations
6. DevOps: Experience with DevOps automation - Orchestration/Configuration Management and CI/CD tools (Jenkins)
7. Version Control: Working experience with one or more version control platforms like GitHub or GitLab
8. Knowledge on AWS Quick sight reporting
9. Monitoring: Hands on experience with monitoring tools such as AWS CloudWatch, AWS CloudTrail, Datadog and Elastic Search
10. Networking: Working knowledge of TCP/IP networking, SMTP, HTTP, load-balancers (ELB) and high availability architecture
11. Security: Experience implementing role-based security, including AD integration, security policies, and auditing in a Linux/Hadoop/AWS environment. Familiar with penetration testing and scan tools for remediation of security vulnerabilities.
12. Demonstrated successful experience learning new technologies quickly
WHAT WILL BE THE ROLES AND RESPONSIBILITIES?
1. Create procedures/run books for operational and security aspects of AWS platform
2. Improve AWS infrastructure by developing and enhancing automation methods
3. Provide advanced business and engineering support services to end users
4. Lead other admins and platform engineers through design and implementation decisions to achieve balance between strategic design and tactical needs
5. Research and deploy new tools and frameworks to build a sustainable big data platform
6. Assist with creating programs for training and onboarding for new end users
7. Lead Agile/Kanban workflows and team process work
8. Troubleshoot issues to resolve problems
9. Provide status updates to Operations product owner and stakeholders
10. Track all details in the issue tracking system (JIRA)
11. Provide issue review and triage problems for new service/support requests
12. Use DevOps automation tools, including Jenkins build jobs
13. Fulfil any ad-hoc data or report request queries from different functional groups
Read more
Why apply to jobs via Cutshort
people_solving_puzzle
Personalized job matches
Stop wasting time. Get matched with jobs that meet your skills, aspirations and preferences.
people_verifying_people
Verified hiring teams
See actual hiring teams, find common social connections or connect with them directly. No 3rd party agencies here.
ai_chip
Move faster with AI
We use AI to get you faster responses, recommendations and unmatched user experience.
21,01,133
Matches delivered
37,12,187
Network size
15,000
Companies hiring
Did not find a job you were looking for?
icon
Search for relevant jobs from 10000+ companies such as Google, Amazon & Uber actively hiring on Cutshort.
companies logo
companies logo
companies logo
companies logo
companies logo
Get to hear about interesting companies hiring right now
Company logo
Company logo
Company logo
Company logo
Company logo
Linkedin iconFollow Cutshort
Users love Cutshort
Read about what our users have to say about finding their next opportunity on Cutshort.
Subodh Popalwar's profile image

Subodh Popalwar

Software Engineer, Memorres
For 2 years, I had trouble finding a company with good work culture and a role that will help me grow in my career. Soon after I started using Cutshort, I had access to information about the work culture, compensation and what each company was clearly offering.
Companies hiring on Cutshort
companies logos