MapReduce Jobs in Pune
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.
Experience : 7+ years
Location : Pune / Hyderabad
- Drive and participate in requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Participate in and contribute to solution design and solution architecture for implementing Big Data projects on-premises and in the cloud
- Hands-on technical experience in designing, coding, developing and managing large Hadoop implementations
- Proficient in SQL, Hive, Pig, Spark SQL, shell scripting, Kafka, Flume and Sqoop on large Big Data and data warehousing projects, with a Java, Python or Scala based Hadoop programming background
- Proficient in development methodologies such as waterfall, Agile/Scrum and iterative
- Good interpersonal skills and excellent communication skills for working with US- and UK-based clients
A global leader in data warehouse migration and modernization to the cloud, we empower businesses by using automation to migrate their data, workloads, ETL and analytics to the cloud.
We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica and Greenplum systems, along with ETL tools such as Informatica, DataStage and Ab Initio, to cloud-based data warehousing. We also have capabilities in data engineering, advanced analytics, data management, data lakes and cloud optimization.
Datametica is a key partner of the major cloud and data platform providers: Google, Microsoft, Amazon and Snowflake.
We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican – Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.
Why join us!
Datametica is a place to innovate, bring new ideas to life and learn new things. We believe in building a culture of innovation, growth and belonging. Our people and their dedication over the years have been key to our success.
Benefits We Provide!
Working with highly technical, passionate, mission-driven people
Subsidized Meals & Snacks
Access to various learning tools and programs
Certification Reimbursement Policy
The Energy Exemplar (EE) data team is looking for an experienced Python Developer (Data Engineer) to join our Pune office. As a dedicated Data Engineer on our Research team, you will apply your data engineering expertise and work closely with the core data team. Together you will identify data sources for specific energy markets and build automated data pipelines. Each pipeline incrementally pulls data from its sources and maintains a dataset that provides value to hundreds of EE customers.
At EE, you’ll have access to vast amounts of energy-related data from our sources. Our data pipelines are curated and supported by engineering teams. We also offer many company-sponsored classes and conferences focused on data engineering and data platforms. There is significant room for growth in data engineering at EE.
- Develop, test and maintain architectures, such as databases and large-scale processing systems using high-performance data pipelines.
- Recommend and implement ways to improve data reliability, efficiency, and quality.
- Identify performant features and make them universally accessible to our teams across EE.
- Work with data analysts and data scientists to wrangle data and provide quality datasets and insights for business-critical decisions.
- Take end-to-end responsibility for the development, quality, testing, and production readiness of the services you build.
- Define and evangelize data engineering standards and best practices to ensure engineering excellence at every stage of the development cycle.
- Act as a resident expert in data engineering, feature engineering and exploratory data analysis.
- Experience with Agile methodologies, including acting as Scrum Master, is a plus.
- 6+ years of professional experience developing data pipelines for large-scale, complex datasets from a variety of data sources.
- Data engineering expertise with strong experience in Python, Beautiful Soup, Selenium, regular expressions and web scraping.
- Knowledge of Python development best practices: docstrings, type hints, unit testing, etc.
- Experience with cloud-based data technologies such as Azure Data Lake, Azure Data Factory and Azure Databricks is desirable but optional.
- Moderate coding skills. SQL or similar required. C# or other languages strongly preferred.
- Outstanding communication and collaboration skills. You can learn from and teach others.
- Strong drive for results. You have a proven record of shepherding experiments to create successful shipping products/services
- A Bachelor's or Master's degree in Computer Science or Engineering with coursework in Python, Big Data and Data Engineering is highly desirable.
- 5+ years of experience in a data engineering role in a cloud environment
- Must have good experience in Scala/PySpark (preferably in a Databricks environment)
- Extensive experience with Transact-SQL.
- Experience with Databricks/Spark.
- Strong experience in data warehouse projects
- Expertise in database development projects with ETL processes.
- Manage and maintain data engineering pipelines
- Develop batch processing, streaming and integration solutions
- Experienced in building and operationalizing large-scale enterprise data solutions and applications
- Using one or more Azure data and analytics services in combination with custom solutions
- Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
- In-depth understanding of data management (e.g. permissions, security and monitoring).
- Cloud repositories (e.g. Azure, GitHub, Git)
- Experience in an Agile environment (Azure DevOps preferred).
Good to have
- Manage source data access security
- Automate Azure Data Factory pipelines
- Continuous integration/continuous deployment (CI/CD) pipelines, source repositories
- Experience implementing and maintaining CI/CD pipelines
- Understanding of Power BI and Delta Lake lakehouse architecture
- Knowledge of software development best practices.
- Excellent analytical and organization skills.
- Effective working in a team as well as working independently.
- Strong written and verbal communication skills.
- Expertise in database development projects and ETL processes.
- Minimum 1 year of relevant experience in PySpark (mandatory)
- Hands-on experience developing, testing, deploying, maintaining and improving data integration pipelines in an AWS cloud environment is a plus
- Ability to play a lead role and independently manage a PySpark development team of 3-5 members
- EMR, Python and PySpark are mandatory.
- Knowledge of and experience with AWS cloud technologies such as Apache Spark, Glue, Kafka, Kinesis and Lambda, as well as S3, Redshift and RDS
Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
You’ll spend time on the following:
- You will partner with teammates to create complex data processing pipelines in order to solve our clients’ most ambitious challenges
- You will collaborate with Data Scientists in order to design scalable implementations of their models
- You will pair program to write clean, iterative code using test-driven development (TDD)
- Leverage various continuous delivery practices to deploy data pipelines
- Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
- Develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
- Create data models and speak to the tradeoffs of different modeling approaches
Here’s what we’re looking for:
- You have a good understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
- You have built large-scale data pipelines and data-centric applications in a production setting, using distributed storage platforms such as HDFS, S3 or NoSQL databases (HBase, Cassandra, etc.) and distributed processing platforms such as Hadoop, Spark, Hive, Oozie and Airflow
- Hands-on experience with MapR, Cloudera, Hortonworks and/or cloud-based Hadoop distributions (AWS EMR, Azure HDInsight, Qubole, etc.)
- You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
- Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems
- Strong communication and client-facing skills with the ability to work in a consulting environment
Easebuzz is a payment solutions (fintech) company that enables online merchants to accept, process and disburse payments through developer-friendly APIs. We focus on building plug-and-play products, including payment infrastructure, that solve complete business problems. Work on payments, lending, subscriptions and eKYC all happens here at the same time.
We have been consistently profitable and are constantly developing innovative new products. As a result, we grew 4x over the past year alone. We are well capitalised and closed a $4M funding round in March 2021 with prominent VC firms and angel investors. The company is based in Pune and has 180 employees in total. Easebuzz’s corporate culture is tied to the vision of building a workplace that encourages open communication and minimal bureaucracy. An equal opportunity employer, we welcome and encourage diversity in the workplace. You can be sure that you will be surrounded by colleagues who are committed to helping each other grow.
Easebuzz Pvt. Ltd. has a presence in Pune, Bangalore and Gurugram.
Salary: As per company standards.
Designation: Data Engineering
Experience with ETL, Data Modeling, and Data Architecture
Design, build and operationalize large scale enterprise data solutions and applications using one or more of AWS data and analytics services in combination with 3rd parties
- Spark, EMR, DynamoDB, RedShift, Kinesis, Lambda, Glue.
Experience with AWS cloud data lake for development of real-time or near real-time use cases
Experience with messaging systems such as Kafka/Kinesis for real time data ingestion and processing
Build data pipeline frameworks to automate high-volume and real-time data delivery
Create prototypes and proof-of-concepts for iterative development.
Experience with NoSQL databases, such as DynamoDB, MongoDB, etc.
Create and maintain optimal data pipeline architecture
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
Evangelize a very high standard of quality, reliability and performance for data models and algorithms that can be streamlined into the engineering and sciences workflow
Build and enhance data pipeline architecture by designing and implementing data ingestion solutions.
As a Senior Engineer - Big Data Analytics, you will help drive the architectural design and development of healthcare platforms, products, services and tools that deliver the company's vision. You will contribute significantly to engineering, technology and platform architecture through innovation and collaboration with engineering teams and related business functions. This is a critical, highly visible role within the company with the potential to drive significant business impact.
The scope of this role will include strong technical contribution in the development and delivery of Big Data Analytics Cloud Platform, Products and Services in collaboration with execution and strategic partners.
- Design, develop, operate and drive a scalable, resilient, cloud-native Big Data analytics platform that addresses business requirements
- Help drive the technology transformation that enables business transformation by creating the Healthcare Analytics Data Cloud, which will help Change establish an industry leadership position in healthcare data and analytics
- Help successfully implement Analytics as a Service
- Ensure Platforms and Services meet SLA requirements
- Be a significant contributor and partner in the development and execution of the Enterprise Technology Strategy
- At least 5 years of software development experience, including at least 2 years in big data analytics and cloud
- Experience working with High Performance Distributed Computing Systems in public and private cloud environments
- Understanding of the big data open-source ecosystem and its players. Contributions to open source are a strong plus
- Experience with Spark, Spark Streaming, Hadoop, AWS/Azure, NoSQL Databases, In-Memory caches, distributed computing, Kafka, OLAP stores, etc.
- A successful track record of creating working Big Data stacks that align with business needs and of delivering enterprise-class products on time
- Experience delivering and managing operating environments at scale
- Experience with Big Data and microservice-based systems, SaaS, PaaS and related architectures
- Experience developing systems in Java, Python and Unix
- BSCS, BSEE or equivalent, MSCS preferred
Datametica is looking for talented SQL engineers who will receive training and the opportunity to work on Cloud and Big Data Analytics.
- Strong in SQL development
- Hands-on experience with at least one scripting language, preferably shell scripting
- Development experience in Data warehouse projects
- Selected candidates will be provided training opportunities in one or more of the following: Google Cloud, AWS, DevOps tools, and Big Data technologies such as Hadoop, Pig, Hive, Spark, Sqoop, Flume and Kafka
- Would get a chance to be part of the enterprise-grade implementation of Cloud and Big Data systems
- Will play an active role in setting up the Modern data platform based on Cloud and Big Data
- Would be part of teams with rich experience in various aspects of distributed systems and computing
Develop complex queries, pipelines and software programs to solve analytics and data mining problems
Interact with other data scientists, product managers and engineers to understand business problems and technical requirements, and deliver predictive, smart data solutions
Prototype new applications or data systems
Lead data investigations to troubleshoot data issues that arise along the data pipelines
Collaborate with different product owners to incorporate data science solutions
Maintain and improve data science platform
BS/MS/PhD in Computer Science, Electrical Engineering or related disciplines
Strong fundamentals: data structures, algorithms, databases
5+ years of software industry experience with 2+ years in analytics, data mining, and/or data warehouse
Fluency with Python
Experience developing web services using REST approaches.
Proficiency with SQL/Unix/Shell
Experience in DevOps (CI/CD, Docker, Kubernetes)
Self-driven, challenge-loving, detail-oriented team player with excellent communication skills and the ability to multitask and manage expectations
Industry experience with big data processing technologies such as Spark and Kafka
Experience with machine learning algorithms and/or R a plus
Experience in Java/Scala a plus
Experience with any MPP analytics engines like Vertica
Experience with data integration tools like Pentaho/SAP Analytics Cloud
Responsibilities for Data Scientist/ NLP Engineer
• Work with customers to identify opportunities for leveraging their data to drive business.
• Develop custom data models and algorithms to apply to data sets.
• Basic data cleaning and annotation for any incoming raw data.
• Use predictive modeling to increase and optimize customer experiences, revenue generation, ad targeting and other business outcomes.
• Develop the company's A/B testing framework and test model quality.
• Deploy ML models to production.
Qualifications for Junior Data Scientist/ NLP Engineer
• BS, MS in Computer Science, Engineering, or related discipline.
• 3+ Years of experience in Data Science/Machine Learning.
• Experience with the Python programming language.
• Familiarity with at least one database query language, such as SQL.
• Knowledge of text classification and clustering, question answering and query understanding, search indexing and fuzzy matching.
• Excellent written and verbal communication skills for coordinating across teams.
• Willing to learn and master new technologies and techniques.
• Knowledge of and experience with statistical and data mining techniques: GLM/regression, random forests, boosting, decision trees, text mining, NLP, etc.
• Experience with chatbots would be a bonus but is not required.