- Hands-on programming expertise in Java OR Python
- Strong production experience with Spark (Minimum of 1-2 years)
- Experience in data pipelines using Big Data technologies (Hadoop, Spark, Kafka, etc.,) on large scale unstructured data sets
- Working experience and good understanding of public cloud environments (AWS OR Azure OR Google Cloud)
- Experience with IAM policy and role management is a plus
As a machine learning engineer on the team, you will
• Help science and product teams innovate in developing and improving end-to-end
solutions to machine learning-based security/privacy control
• Partner with scientists to brainstorm and create new ways to collect/curate data
• Design and build infrastructure critical to solving problems in privacy-preserving machine
• Help team self-organize and follow machine learning best practice.
• 4+ years of experience contributing to the architecture and design (architecture, design
patterns, reliability and scaling) of new and current systems
• 4+ years of programming experience with at least one modern language such as Java,
C++, or C# including object-oriented design
• 4+ years of professional software development experience
• 4+ years of experience as a mentor, tech lead OR leading an engineering team
• 4+ years of professional software development experience in Big Data and Machine
• Knowledge of common ML frameworks such as Tensorflow, PyTorch
• Experience with cloud provider Machine Learning tools such as AWS SageMaker
• Programming experience with at least two modern language such as Python, Java, C++,
or C# including object-oriented design
• 3+ years of experience contributing to the architecture and design (architecture, design
patterns, reliability and scaling) of new and current systems
• Experience in python
• BS in Computer Science or equivalent
Carsome’s Data Department is on the lookout for a Data Scientist/Senior Data Scientist who has a strong passion in building data powered products.
Data Science function under the Data Department has a responsibility for standardisation of methods, mentoring team of data science resources/interns, including code libraries and documentation, quality assurance of outputs, modeling techniques and statistics, leveraging a variety of technologies, open-source languages, and cloud computing platform.
You will get to lead & implement projects such as price optimization/prediction, enabling iconic personalization experiences for our customer, inventory optimization etc.
- Identifying and integrating datasets that can be leveraged through our product and work closely with data engineering team to develop data products.
- Execute analytical experiments methodically to help solve various problems and make a true impact across functions such as operations, finance, logistics, marketing.
- Identify, prioritize, and design testing opportunities that will inform algorithm enhancements.
- Devise and utilize algorithms and models to mine big data stores, perform data and error analysis to improve models and clean and validate data for uniformity and accuracy.
- Unlock insights by analyzing large amounts of complex website traffic and transactional data.
- Implement analytical models into production by collaborating with data analytics engineers.
- Expertise in model design, training, evaluation, and implementation ML Algorithm expertise K-nearest neighbors, Random Forests, Naive Bayes, Regression Models. PyTorch, TensorFlow, Keras, deep learning expertise, tSNE, gradient boosting expertise, regression implementation expertise, Python, Pyspark, SQL, R, AWS Sagemaker /personalize etc.
- Machine Learning / Data Science Certification
Experience & Education
- Bachelor’s in Engineering / Master’s in Data Science / Postgraduate Certificate in Data Science.
Sr. Data Engineer
Bigfoot Retail Solutions [Shiprocket] is a logistics platform which connects Indian eCommerce SMBs with logistics players to enable end to end solutions.
Our innovative data backed platform drives logistics efficiency, helps reduce cost, increases sales throughput by reducing RTO and improves post order customer engagement and experience.
Our vision is to power all logistics for the direct commerce market in India
including first mile, linehaul, last mile, warehousing, cross border and O2O.
Position: Sr.Data Engineer
Team : Business Intelligence
Location: New Delhi
We are looking for a savvy Data Engineer to join our growing team of analytics experts. The hire will be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. The Data Engineer will support our software developers, database architects, data analysts and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems and products. The right candidate will be excited by the prospect of optimizing or even re-designing our company’s data architecture to support our next generation of products and data initiatives.
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centres and AWS regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
Qualifications for Data Engineer
- Advanced working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- A successful history of manipulating, processing and extracting value from large disconnected datasets.
- Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- We are looking for a candidate with 5+ years of experience in a Data Engineer role, who has attained a Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field. They should also have experience using the following software/tools:
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift
- Experience with stream-processing systems: Storm, Spark-Streaming, etc.
- Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
Role: Data Engineer
Location: Bangalore/ Mumbai
Experience : 2-5 yrs
PayU is the payments and fintech business of Prosus, a global consumer internet group and one of the largest technology investors in the world. Operating and investing globally in markets with long-term growth potential, Prosus builds leading consumer internet companies that empower people and enrich communities.
The leading online payment service provider in 36 countries, PayU is dedicated to creating a fast, simple and efficient payment process for merchants and buyers. Focused on empowering people through financial services and creating a world without financial borders where everyone can prosper, PayU is one of the biggest investors in the fintech space globally, with investments totalling $700 million- to date. PayU also specializes in credit products and services for emerging markets across the globe. We are dedicated to removing risks to merchants, allowing consumers to use credit in ways that suit them and enabling a greater number of global citizens to access credit services.
Our local operations in Asia, Central and Eastern Europe, Latin America, the Middle East, Africa and South East Asia enable us to combine the expertise of high growth companies with our own unique local knowledge and technology to ensure that our customers have access to the best financial services.
India is the biggest market for PayU globally and the company has already invested $400 million in this region in last 4 years. PayU in its next phase of growth is developing a full regional fintech ecosystem providing multiple digital financial services in one integrated experience. We are going to do this through 3 mechanisms: build, co-build/partner; select strategic investments.
PayU supports over 350,000+ merchants and millions of consumers making payments online with over 250 payment methods and 1,800+ payment specialists. The markets in which PayU operates represent a potential consumer base of nearly 2.3 billion people and a huge growth potential for merchants.
- Design infrastructure for data, especially for but not limited to consumption in machine learning applications
- Define database architecture needed to combine and link data, and ensure integrity across different sources
- Ensure performance of data systems for machine learning to customer-facing web and mobile applications using cutting-edge open source frameworks, to highly available RESTful services, to back-end Java based systems
- Work with large, fast, complex data sets to solve difficult, non-routine analysis problems, applying advanced data handling techniques if needed
- Build data pipelines, includes implementing, testing, and maintaining infrastructural components related to the data engineering stack.
- Work closely with Data Engineers, ML Engineers and SREs to gather data engineering requirements to prototype, develop, validate and deploy data science and machine learning solutions
Requirements to be successful in this role:
- Strong knowledge and experience in Python, Pandas, Data wrangling, ETL processes, statistics, data visualisation, Data Modelling and Informatica.
- Strong experience with scalable compute solutions such as in Kafka, Snowflake
- Strong experience with workflow management libraries and tools such as Airflow, AWS Step Functions etc.
- Strong experience with data engineering practices (i.e. data ingestion pipelines and ETL)
- A good understanding of machine learning methods, algorithms, pipelines, testing practices and frameworks
- Preferred) MEng/MSc/PhD degree in computer science, engineering, mathematics, physics, or equivalent (preference: DS/ AI)
- Experience with designing and implementing tools that support sharing of data, code, practices across organizations at scale
- 5+ years of experience in software development.
- At least 2 years of relevant work experience on large scale Data applications
- Good attitude, strong problem-solving abilities, analytical skills, ability to take ownership as appropriate
- Should be able to do coding, debugging, performance tuning, and deploying the apps to Prod.
- Should have good working experience Hadoop ecosystem (HDFS, Hive, Yarn, File formats like Avro/Parquet)
- J2EE Frameworks (Spring/Hibernate/REST)
- Spark Streaming or any other streaming technology.
- Java programming language is mandatory.
- Good to have experience with Java
- Ability to work on the sprint stories to completion along with Unit test case coverage.
- Experience working in Agile Methodology
- Excellent communication and coordination skills
- Knowledgeable (and preferred hands-on) - UNIX environments, different continuous integration tools.
- Must be able to integrate quickly into the team and work independently towards team goals
- Take the complete responsibility of the sprint stories’ execution
- Be accountable for the delivery of the tasks in the defined timelines with good quality
- Follow the processes for project execution and delivery.
- Follow agile methodology
- Work with the team lead closely and contribute to the smooth delivery of the project.
- Understand/define the architecture and discuss the pros-cons of the same with the team
- Involve in the brainstorming sessions and suggest improvements in the architecture/design.
- Work with other team leads to get the architecture/design reviewed.
- Work with the clients and counterparts (in US) of the project.
- Keep all the stakeholders updated about the project/task status/risks/issues if there are any.
We are looking for a Data Engineer to join our data team to solve data-driven critical
business problems. The hire will be responsible for expanding and optimizing the existing
end-to-end architecture including the data pipeline architecture. The Data Engineer will
collaborate with software developers, database architects, data analysts, data scientists and platform team on data initiatives and will ensure optimal data delivery architecture is
consistent throughout ongoing projects. The right candidate should have hands on in
developing a hybrid set of data-pipelines depending on the business requirements.
- Develop, construct, test and maintain existing and new data-driven architectures.
- Align architecture with business requirements and provide solutions which fits best
- to solve the business problems.
- Build the infrastructure required for optimal extraction, transformation, and loading
- of data from a wide variety of data sources using SQL and Azure ‘big data’
- Data acquisition from multiple sources across the organization.
- Use programming language and tools efficiently to collate the data.
- Identify ways to improve data reliability, efficiency and quality
- Use data to discover tasks that can be automated.
- Deliver updates to stakeholders based on analytics.
- Set up practices on data reporting and continuous monitoring
Required Technical Skills
- Graduate in Computer Science or in similar quantitative area
- 1+ years of relevant work experience as a Data Engineer or in a similar role.
- Advanced SQL knowledge, Data-Modelling and experience working with relational
- databases, query authoring (SQL) as well as working familiarity with a variety of
- Experience in developing and optimizing ETL pipelines, big data pipelines, and datadriven
- Must have strong big-data core knowledge & experience in programming using Spark - Python/Scala
- Experience with orchestrating tool like Airflow or similar
- Experience with Azure Data Factory is good to have
- Build processes supporting data transformation, data structures, metadata,
- dependency and workload management.
- Experience supporting and working with cross-functional teams in a dynamic
- Good understanding of Git workflow, Test-case driven development and using CICD
- is good to have
- Good to have some understanding of Delta tables It would be advantage if the candidate also have below mentioned experience using
- the following software/tools:
- Experience with big data tools: Hadoop, Spark, Hive, etc.
- Experience with relational SQL and NoSQL databases
- Experience with cloud data services
- Experience with object-oriented/object function scripting languages: Python, Scala, etc.
Senior Big Data Engineer
Note: Notice Period : 45 days
Banyan Data Services (BDS) is a US-based data-focused Company that specializes in comprehensive data solutions and services, headquartered in San Jose, California, USA.
We are looking for a Senior Hadoop Bigdata Engineer who has expertise in solving complex data problems across a big data platform. You will be a part of our development team based out of Bangalore. This team focuses on the most innovative and emerging data infrastructure software and services to support highly scalable and available infrastructure.
It's a once-in-a-lifetime opportunity to join our rocket ship startup run by a world-class executive team. We are looking for candidates that aspire to be a part of the cutting-edge solutions and services we offer that address next-gen data evolution challenges.
· 5+ years of experience working with Java and Spring technologies
· At least 3 years of programming experience working with Spark on big data; including experience with data profiling and building transformations
· Knowledge of microservices architecture is plus
· Experience with any NoSQL databases such as HBase, MongoDB, or Cassandra
· Experience with Kafka or any streaming tools
· Knowledge of Scala would be preferable
· Experience with agile application development
· Exposure of any Cloud Technologies including containers and Kubernetes
· Demonstrated experience of performing DevOps for platforms
· Strong Skillsets in Data Structures & Algorithm in using efficient way of code complexity
· Exposure to Graph databases
· Passion for learning new technologies and the ability to do so quickly
· A Bachelor's degree in a computer-related field or equivalent professional experience is required
· Scope and deliver solutions with the ability to design solutions independently based on high-level architecture
· Design and develop the big data-focused micro-Services
· Involve in big data infrastructure, distributed systems, data modeling, and query processing
· Build software with cutting-edge technologies on cloud
· Willing to learn new technologies and research-orientated projects
· Proven interpersonal skills while contributing to team effort by accomplishing related results as needed
Experience – 3 – 12 yrs
Budget - Open
Location - PAN India (Noida/Bangaluru/Hyderabad/Chennai)
Presto Developer (4)
Understanding of distributed SQL query engine running on Hadoop
Design and develop core components for Presto
Contribute to the ongoing Presto development by implementing new features, bug fixes, and other improvements
Develop new and extend existing Presto connectors to various data sources
Lead complex and technically challenging projects from concept to completion
Write tests and contribute to ongoing automation infrastructure development
Run and analyze software performance metrics
Collaborate with teams globally across multiple time zones and operate in an Agile development environment
Hands-on experience and interest with Hadoop
- Should have good hands-on experience in Informatica MDM Customer 360, Data Integration(ETL) using PowerCenter, Data Quality.
- Must have strong skills in Data Analysis, Data Mapping for ETL processes, and Data Modeling.
- Experience with the SIF framework including real-time integration
- Should have experience in building C360 Insights using Informatica
- Should have good experience in creating performant design using Mapplets, Mappings, Workflows for Data Quality(cleansing), ETL.
- Should have experience in building different data warehouse architecture like Enterprise,
- Federated, and Multi-Tier architecture.
- Should have experience in configuring Informatica Data Director in reference to the Data
- Governance of users, IT Managers, and Data Stewards.
- Should have good knowledge in developing complex PL/SQL queries.
- Should have working experience on UNIX and shell scripting to run the Informatica workflows and to control the ETL flow.
- Should know about Informatica Server installation and knowledge on the Administration console.
- Working experience with Developer with Administration is added knowledge.
- Working experience in Amazon Web Services (AWS) is an added advantage. Particularly on AWS S3, Data pipeline, Lambda, Kinesis, DynamoDB, and EMR.
- Should be responsible for the creation of automated BI solutions, including requirements, design,development, testing, and deployment
- Core Java: advanced level competency, should have worked on projects with core Java development.
- Linux shell : advanced level competency, work experience with Linux shell scripting, knowledge and experience to use important shell commands
- Rdbms, SQL: advanced level competency, Should have expertise in SQL query language syntax, should be well versed with aggregations, joins of SQL query language.
- Data structures and problem solving: should have ability to use appropriate data structure.
- AWS cloud : Good to have experience with aws serverless toolset along with aws infra
- Data Engineering ecosystem : Good to have experience and knowledge of data engineering, ETL, data warehouse (any toolset)
- Hadoop, HDFS, YARN : Should have introduction to internal working of these toolsets
- HIVE, MapReduce, Spark: Good to have experience developing transformations using hive queries, MapReduce job implementation and Spark Job Implementation. Spark implementation in Scala will be plus point.
- Airflow, Oozie, Sqoop, Zookeeper, Kafka: Good to have knowledge about purpose and working of these technology toolsets. Working experience will be a plus point here.