(Hadoop, HDFS, Kafka, Spark, Hive)
Overall Experience - 8 to 12 years
Relevant Big Data experience - 3+ years in the technologies above
Salary: up to 20 LPA (max)
Job location - Chennai / Bangalore
Notice Period - Immediate joiner / 15 to 20 days max
The Responsibilities of The Senior Data Engineer Are:
- Gather and assess requirements
- Break down complexity and translate requirements into specification artifacts and storyboards to build towards, using a test-driven approach
- Engineer scalable data pipelines using big data technologies including but not limited to Hadoop, HDFS, Kafka, HBase, and Elasticsearch
- Implement the pipelines using execution frameworks including but not limited to MapReduce, Spark, and Hive, with Java/Scala/Python for application design (see the sketch after this list)
- Mentor juniors in a dynamic team setting
- Manage stakeholders with proactive communication, upholding TheDataTeam's brand and values
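As an illustration of the kind of pipeline this role describes, here is a minimal sketch of a Spark Structured Streaming job that reads JSON events from Kafka and lands them on HDFS as Parquet. The broker address, topic name, paths, and event schema are hypothetical placeholders, not anything specified by the posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("events-pipeline").getOrCreate()

# Hypothetical schema for the JSON payloads on the topic
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event", StringType()),
    StructField("ts", LongType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
    # Kafka delivers raw bytes; decode the value and parse the JSON
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Checkpointing makes the stream restartable with exactly-once file output
(events.writeStream.format("parquet")
 .option("path", "hdfs:///data/events")                # hypothetical output path
 .option("checkpointLocation", "hdfs:///checkpoints/events")
 .start()
 .awaitTermination())
```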
A Candidate Must Have the Following Skills:
- Strong problem-solving ability
- Excellent software design and implementation ability
- Exposure and commitment to agile methodologies
- Detail oriented with willingness to proactively own software tasks as well as management tasks, and see them to completion with minimal guidance
- Minimum 8 years of experience
- Should have experience in full life-cycle of one big data application
- Strong understanding of various storage formats (ORC/Parquet/Avro); see the sketch after this list
- Should have hands-on experience with one of the Hadoop distributions (Hortonworks/Cloudera/MapR)
- Experience in at least one cloud environment (GCP/AWS/Azure)
- Should be well versed with at least one database (MySQL/Oracle/MongoDB/Postgres)
- Bachelor's in Computer Science, and preferably a Master's as well
- Should have good code review and debugging skills
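To make the storage-format item concrete, here is a minimal PySpark sketch that writes the same DataFrame as Parquet, ORC, and Avro. The paths are hypothetical, and the Avro writer assumes the external spark-avro package is on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-demo").getOrCreate()

df = spark.createDataFrame([("u1", 42), ("u2", 7)], ["user_id", "score"])

# Columnar formats: good for analytical scans and predicate pushdown
df.write.mode("overwrite").parquet("/tmp/demo/parquet")
df.write.mode("overwrite").orc("/tmp/demo/orc")

# Avro is row-oriented: good for write-heavy and schema-evolution use cases;
# requires the external spark-avro package on the classpath
df.write.mode("overwrite").format("avro").save("/tmp/demo/avro")
```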
Additional skills (Good to have):
- Experience in containerization (Docker/Heroku)
- Exposure to microservices
- Exposure to DevOps practices
- Experience in performance tuning of big data applications
We are hiring for a Tier 1 MNC for a software developer role requiring good knowledge of Spark, Hadoop, and Scala.
Deliver plugins for our Python-based ETL pipelines
Deliver Python microservices for provisioning and managing cloud infrastructure
Implement algorithms to analyse large data sets
Draft design documents that translate requirements into code
Effectively manage challenges associated with handling large volumes of data working to tight deadlines
Manage expectations with internal stakeholders and context-switch in a fast-paced environment
Thrive in an environment that uses AWS and Elasticsearch extensively (see the sketch after this list)
Keep abreast of technology and contribute to the engineering strategy
Champion best development practices and provide mentorship to others
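Since the list above leans on Elasticsearch, here is a minimal sketch of indexing and querying a document with the official Python client (the 8.x API is assumed); the cluster address, index name, and document are hypothetical.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

# Index a hypothetical event document, then query it back
es.index(index="events", document={"user": "u1", "action": "login"})
es.indices.refresh(index="events")  # make the new document searchable now

resp = es.search(index="events", query={"match": {"action": "login"}})
print(resp["hits"]["total"]["value"])  # -> 1
```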
First and foremost, you are a Python developer, experienced with the Python data stack
You love and care about data
Your code is an artistic manifesto, reflecting the elegance you bring to your work
You feel sparks of joy when a new abstraction or pattern arises from your code
You follow the DRY (Don't Repeat Yourself) and KISS (Keep It Short and Simple) principles
You are a continuous learner
You have a natural willingness to automate tasks
You have critical thinking and an eye for detail
Excellent ability and experience working to tight deadlines
Sharp analytical and problem-solving skills
Strong sense of ownership and accountability for your work and delivery
Excellent written and oral communication skills
Mature collaboration and mentoring abilities
We are keen to know your digital footprint (community talks, blog posts, certifications, courses you have taken or plan to take, personal projects, and any contributions to open-source communities)
Delivering complex software, ideally in a FinTech setting
Experience with CI/CD tools such as Jenkins, CircleCI
Experience with code versioning (git / mercurial / subversion)
- Bring in industry best practices around creating and maintaining robust data pipelines for complex data projects, with or without an AI component
- Programmatically ingest data from several static and real-time sources (incl. web scraping)
- Render results through dynamic interfaces (web / mobile / dashboard) with the ability to log usage and granular user feedback
- Performance-tune and optimally implement complex Python scripts (using Spark), SQL (using stored procedures, Hive), and NoSQL queries in a production environment (see the tuning sketch after this list)
- Industrialize ML / DL solutions and deploy and manage production services; proactively handle data issues arising on live apps
- Perform ETL on large and complex datasets for AI applications
- Work closely with data scientists on performance optimization of large-scale ML/DL model training
- Build data tools to facilitate fast data cleaning and statistical analysis
- Ensure data architecture is secure and compliant
- Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability
- Work closely with APAC CDO and coordinate with a fully decentralized team across different locations in APAC and global HQ (Paris).
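As a hint of what production Spark tuning can look like, here is a minimal PySpark sketch of a broadcast join against Hive tables, one of the most common fixes for shuffle-heavy jobs; the database and table names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder.appName("tuning-demo")
    .enableHiveSupport()  # read and write Hive tables directly
    .getOrCreate()
)

# Hypothetical tables: a large fact table and a small dimension table
facts = spark.table("sales.transactions")
stores = spark.table("sales.stores")

# Broadcasting the small table ships it whole to every executor,
# avoiding a full shuffle of the large one
enriched = facts.join(broadcast(stores), "store_id")

# Partitioning output by date lets downstream Hive/Spark queries prune files
(enriched.write.mode("overwrite")
 .partitionBy("txn_date")
 .saveAsTable("sales.transactions_enriched"))
```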
You should:
- Be an expert in structured and unstructured data in traditional and Big Data environments – Oracle / SQL Server, MongoDB, Hive / Pig, BigQuery, and Spark
- Have excellent knowledge of Python programming in both traditional and distributed models (PySpark)
- Be an expert in shell scripting and writing schedulers
- Have hands-on experience with Cloud - deploying complex data solutions in hybrid cloud / on-premise environments for both data extraction/storage and computation
- Have hands-on experience deploying production apps using large volumes of data with state-of-the-art technologies like Docker, Kubernetes, and Kafka
- Have strong knowledge of data security best practices
- Have 5+ years' experience in a data engineering role
- Be a Science / Engineering graduate from a Tier-1 university in the country
- And most importantly, be a passionate coder who really cares about building apps that help people do things better, smarter, and faster, even while they sleep
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.
Experience : 7+ years
Location : Pune / Hyderabad
- Drive and participate in requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Participate in and contribute to solution design and solution architecture for implementing Big Data projects on-premise and in the cloud
- Hands-on technical experience in designing, coding, developing, and managing large Hadoop implementations
- Proficient in SQL, Hive, Pig, Spark SQL, shell scripting, Kafka, Flume, and Sqoop on large Big Data and Data Warehousing projects, with a Java-, Python-, or Scala-based Hadoop programming background
- Proficient with various development methodologies such as waterfall, agile/scrum, and iterative
- Good interpersonal skills and excellent communication skills for US- and UK-based clients
A global leader in Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their Data/Workload/ETL/Analytics to the Cloud, leveraging Automation.
We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, and Greenplum platforms, along with ETLs like Informatica, DataStage, Ab Initio, and others, to cloud-based data warehousing, with further capabilities in data engineering, advanced analytics solutions, data management, data lakes, and cloud optimization.
Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.
We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican - Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.
Why join us!
Datametica is a place to innovate, bring new ideas to life, and learn new things. We believe in building a culture of innovation, growth, and belonging. Our people and their dedication over the years are the key factors in our success.
Benefits we Provide!
Working with highly technical, passionate, mission-driven people
Subsidized Meals & Snacks
Access to various learning tools and programs
Certification Reimbursement Policy
Check out more about us on our website below!
We are looking for a savvy Data Engineer to join our growing team of analytics experts.
The hire will be responsible for:
- Expanding and optimizing our data and data pipeline architecture
- Optimizing data flow and collection for cross-functional teams
- Supporting our software developers, database architects, data analysts, and data scientists on data initiatives, and ensuring that optimal data delivery architecture is consistent throughout ongoing projects
The hire must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products.
- Experience with Azure : ADLS, Databricks, Stream Analytics, SQL DW, COSMOS DB, Analysis Services, Azure Functions, Serverless Architecture, ARM Templates
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with object-oriented/functional scripting languages: Python, Scala, etc., as well as SQL and Spark SQL
Nice to have experience with :
- Big data tools: Hadoop, Spark and Kafka
- Data pipeline and workflow management tools: Azkaban, Luigi, Airflow (a minimal Airflow sketch follows this list)
- Stream-processing systems: Storm
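For flavor, here is a minimal sketch of an Airflow DAG wiring an extract task into a load task (the Airflow 2.x API, 2.4 or later, is assumed); the DAG id, schedule, and task bodies are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    # Hypothetical extract step: pull one day of source data
    print("extracting for", context["ds"])

def load(**context):
    # Hypothetical load step: write the cleaned data to the warehouse
    print("loading for", context["ds"])

with DAG(
    dag_id="daily_events",        # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```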
Database: SQL DB
Programming languages: PL/SQL, Spark SQL
Looking for candidates with Data Warehousing experience, strong domain knowledge & experience working as a Technical lead.
The right candidate will be excited by the prospect of optimizing or even re-designing our company's data architecture to support our next generation of products and data initiatives.
- We are looking for a Data Engineer with 3-5 years' experience in Python, SQL, AWS (EC2, S3, Elastic Beanstalk, API Gateway), and Java.
- The applicant must be able to perform data mapping (data type conversion, schema harmonization) using Python, SQL, and Java.
- The applicant must be familiar with and have programmed ETL interfaces (OAuth, REST APIs, ODBC) using the same languages (see the mapping sketch after this list).
- The company is looking for someone who shows an eagerness to learn and who asks concise questions when communicating with teammates.
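As a concrete illustration of the data-mapping requirement, here is a minimal pandas sketch that renames a hypothetical source feed's columns and coerces the values to a target schema; the column names and types are invented for the example.

```python
import pandas as pd

# Hypothetical source-to-target column mapping
COLUMN_MAP = {"cust_id": "customer_id", "dob": "birth_date", "amt": "amount"}

def harmonize(df: pd.DataFrame) -> pd.DataFrame:
    """Rename source columns and coerce values to the target schema."""
    df = df.rename(columns=COLUMN_MAP)
    df["customer_id"] = df["customer_id"].astype("string")
    df["birth_date"] = pd.to_datetime(df["birth_date"], errors="coerce")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    return df

raw = pd.DataFrame({"cust_id": [101], "dob": ["1990-05-01"], "amt": ["12.50"]})
print(harmonize(raw).dtypes)
```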
SpringML is looking to hire a top-notch Senior Data Engineer who is passionate about working with data and using the latest distributed frameworks to process large datasets. Your primary role will be to design and build data pipelines. You will be focused on helping client projects with data integration, data prep, and implementing machine learning on datasets. In this role, you will work on some of the latest technologies, collaborate with partners on early wins, take a consultative approach with clients, interact daily with executive leadership, and help build a great company. Chosen team members will be part of the core team and play a critical role in scaling up our emerging practice.
- Ability to work as a member of a team assigned to design and implement data integration solutions.
- Build data pipelines using standard frameworks in Hadoop, Apache Beam, and other open-source solutions (a minimal Beam sketch follows this list).
- Learn quickly – ability to understand and rapidly comprehend new areas – functional and technical – and apply detailed and critical thinking to customer solutions.
- Propose design solutions and recommend best practices for large scale data analysis
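To illustrate the Apache Beam item above, here is a minimal sketch of a Python pipeline that reads a text file, filters out bad records, and writes the result; the bucket paths and record layout are hypothetical.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/events.csv")  # hypothetical input
        | "Parse" >> beam.Map(lambda line: line.split(","))
        | "DropErrors" >> beam.Filter(lambda rec: rec[-1] != "error")  # hypothetical status field
        | "Format" >> beam.Map(",".join)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/clean")       # hypothetical output
    )
```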
- B.Tech degree in computer science, mathematics, or another relevant field.
- 4+ years of experience in ETL, Data Warehousing, visualization, and building data pipelines.
- Strong programming skills – experience and expertise in one of the following: Java, Python, Scala, C.
- Proficient in big data/distributed computing frameworks such as Apache Spark and Kafka.
- Experience with Agile implementation methodologies
The programmer should be proficient in Python and able to work fully independently. They should also be skilled at working with databases, with a strong ability to fetch data from various sources, organise the data, and identify useful information through efficient code (see the sketch below).
Familiarity with Python
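As a minimal sketch of the fetch-and-organise workflow described above, here is a Python snippet that pulls JSON records from a REST endpoint and summarizes them with pandas; the URL and field names are hypothetical.

```python
import pandas as pd
import requests

# Hypothetical REST endpoint returning a JSON list of order records
resp = requests.get("https://api.example.com/orders", timeout=30)
resp.raise_for_status()

df = pd.DataFrame(resp.json())

# Organise: parse dates, then surface useful information per customer
df["order_date"] = pd.to_datetime(df["order_date"])
summary = (
    df.groupby("customer_id")["amount"]
      .agg(["count", "sum"])
      .sort_values("sum", ascending=False)
)
print(summary.head())
```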
Some examples of work: