Lead Data Engineer

at Lymbyc

Posted by Venky Thiriveedhi
Bengaluru (Bangalore), Chennai
4 - 8 yrs
₹9L - ₹14L / yr
Full time
Skills
Apache Spark
Apache Kafka
Druid Database
Big Data
Apache Sqoop
RESTful APIs
Elasticsearch
Apache Ranger
Apache Atlas
kappa
Key skill set: Apache NiFi, Kafka Connect (Confluent), Sqoop, Kylo, Spark, Druid, Presto, RESTful services, Lambda / Kappa architectures

Responsibilities:
  • Build a scalable, reliable, operable and performant big data platform for both streaming and batch analytics
  • Design and implement data aggregation, cleansing and transformation layers

Skills:
  • 4+ years of hands-on experience designing and operating large data platforms
  • Experience in big data ingestion, transformation and stream/batch processing technologies such as Apache NiFi, Apache Kafka, Kafka Connect (Confluent), Sqoop, Spark, Storm, Hive, etc.
  • Experience in designing and building streaming data platforms in Lambda and Kappa architectures
  • Working experience in one of the NoSQL or OLAP data stores such as Druid, Cassandra, Elasticsearch or Pinot
  • Experience in one of the data warehousing tools such as Redshift, BigQuery or Azure SQL Data Warehouse
  • Exposure to other data ingestion, data lake and querying frameworks such as Marmaray, Kylo, Drill and Presto
  • Experience in designing and consuming microservices
  • Exposure to security and governance tools such as Apache Ranger and Apache Atlas
  • Contributions to open source projects are a plus
  • Experience in performance benchmarking is a plus

About Lymbyc

Founded: 2012
Stage: Profitable
About
LYMBYC, the world’s first “virtual analyst”, is designed to empower business leaders with contextual insights at the point of decision making. It curates embedded intelligence across all data sources and provides predictive insights driven by its adaptive ML engine.
Connect with the team
Venky Thiriveedhi
Rajakrishna Bharathy Mahadevan

Similar jobs

a secure data and intelligence sharing platform for Enterprises. We believe data security and privacy are paramount for AI and Machine Learning to truly evolve and embed into the world
Agency job
via HyrHub by Shwetha Naik
Bengaluru (Bangalore)
2 - 4 yrs
₹13L - ₹25L / yr
Python
Data Structures
RESTful APIs
Design patterns
Django
As part of early stage sta
Expectations
Good experience writing quality, mature Python code. Familiar with Python
design patterns, OOP, refactoring patterns, and writing async tasks and heavy
background tasks.
Understand authn/authz; ideally have worked on authentication/authorization
mechanisms in Python. Familiarity with Auth0 is preferred.
Understand how to secure API endpoints.
Familiar with AWS concepts: EC2, VPC, RDS, and IAM (or any cloud
equivalent).
Backend Engineer @Eder Labs
Have basic DevOps experience, including engineering and supporting services in a modern
containerized cloud stack.
Experience with and understanding of Docker and docker-compose.
Responsibilities
Own backend design, architecture, implementation and delivery of features and
modules.
Take ownership of the Database. Write migrations, maintain, and manage
Database. (Postgres, MongoDB.)
Collaborate with a generalist team to develop, test and launch new features. Be a
generalist and find ways and functions in which to lift your team, product and
eventually the business.
Refactor when needed, and keep hunting for new tools that can help us as a
business (not just the engineering team).
Develop Data Pipelines, from data sourcing, wrangling (cleaning), transformations,
to eventual use
Develop MLOps systems, to take in data, analyze it, pass it through any models,
and process results. DevOps for Machine Learning.
Follow modern git oriented dev workflows, versioning, CI/CD automation and
testing.
Ideal Candidate will have :
2 years of full time experience working as a data infrastructure / core backend
engineer in a team environment.
Understanding of Machine Learning technologies, frameworks and paradigms
involved there.
Experience with the following tools:
FastAPI / Django
Airflow
Kafka / RabbitMQ
TensorFlow / Pandas / Jupyter Notebook
pytest / asyncio
Experience setting up and managing ELK stack
In-depth understanding of database systems, in terms of scaling compute efficiently.
Good understanding of data streaming services, and the involved networking.
Posted by Shreelakshmi M
Bengaluru (Bangalore)
5 - 8 yrs
Best in industry
ETL
Informatica
Data Warehouse (DWH)
Python
ETL QA
  • Graduate+ in Mathematics, Statistics, Computer Science, Economics, Business, Engineering or equivalent work experience.
  • Total experience of 5+ years with at least 2 years in managing data quality for high scale data platforms.
  • Good knowledge of SQL querying.
  • Strong skill in analysing data and uncovering patterns using SQL or Python.
  • Excellent understanding of data warehouse/big data concepts such as data extraction, data transformation and data loading (the ETL process).
  • Strong background in automation and building automated testing frameworks for data ingestion and transformation jobs.
  • Experience in big data technologies a big plus.
  • Experience in machine learning, especially in data quality applications a big plus.
  • Experience in building data quality automation frameworks a big plus.
  • Strong experience working with an Agile development team with rapid iterations. 
  • Very strong verbal and written communication, and presentation skills.
  • Ability to quickly understand business rules.
  • Ability to work well with others in a geographically distributed team.
  • Keen observation skills to analyse data, highly detail oriented.
  • Excellent judgment, critical-thinking, and decision-making skills; can balance attention to detail with swift execution.
  • Able to identify stakeholders, build relationships, and influence others to get work done.
  • Self-directed and self-motivated individual who takes complete ownership of the product and its outcome.
Posted by Aarti Sharma
Pune
2 - 3 yrs
₹15L - ₹17L / yr
Python
Amazon Web Services (AWS)
Big Data
ETL
Java

About Amber (https://amberstudent.com)
Long-term accommodation booking platform for students (think booking.com for
student housing). Amber helps 80M students worldwide find and book full-time accommodations near their universities, without the hassle of negotiation, non-standardized and cumbersome paperwork, and a broken payment process.

We are the leading student housing platform globally, with 1M+ student housing units listed in 6 countries and across 80 cities.

We are growing rapidly and targeting $400M in annual gross bookings value by 2022.
If you are passionate about making international mobility and living seamless and accessible, then join us in building the future of student housing!
We are amongst the fastest growing companies in Asia-Pacific as per the
Financial Times: https://www.ft.com/high-growth-asia-pacific-ranking-2022

 

Responsibilities
  • In charge of converting raw data into usable information for analytics and business decision-making
  • Setting up accurate data pipelines to structure the Data and optimize the cost
  • Create and maintain optimal data pipeline architecture
  • Assemble large, complex data sets that meet functional / non-functional business requirements.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability, etc.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS technologies.
  • Work with stakeholders including the Executive, Product, Analytics and Design teams to assist with data-related technical issues and support their data infrastructure needs.

 

Requirements
  • Minimum 2 years of previous experience as a data engineer or in a similar role.
  • Technical expertise in data models, data mining, and segmentation techniques.
  • Knowledge of and hands-on experience with programming languages (e.g. Java, Python and Scala)
  • Hands-on experience with SQL database design and AWS Lambda functions.
  • Experience with big data tools: Spark and Kafka.
  • Experience with AWS cloud services: Redshift and S3.
  • Experience in ETL frameworks like AWS Glue.
  • Experience in designing Data warehousing and streaming processes.

 

What you will get from Amber:
  • Fast-paced growth (can skip intermediate levels)
  • Total freedom and authority (everything under you, just get the job done!)
  • Open and Inclusive Environment
  • Great Compensation (and ESOPs)
Pune, Bengaluru (Bangalore)
6 - 8 yrs
₹15L - ₹25L / yr
Data Science
Machine Learning (ML)
Natural Language Processing (NLP)
Computer Vision
Artificial Intelligence (AI)
Company Profile
XpressBees – a logistics company started in 2015 – is amongst the fastest growing companies of its sector. Our
vision to evolve into a strong full-service logistics organization reflects itself in the various lines of business like B2C
logistics 3PL, B2B Xpress, Hyperlocal and Cross border Logistics.
Our strong domain expertise and constant focus on innovation have helped us rapidly evolve as the most trusted
logistics partner of India. XB has progressively carved its way towards best-in-class technology platforms, an
extensive logistics network reach, and a seamless last mile management system.
While on this aggressive growth path, we seek to become the one-stop-shop for end-to-end logistics solutions. Our
big focus areas for the very near future include strengthening our presence as service providers of choice and
leveraging the power of technology to drive supply chain efficiencies.
Job Overview
XpressBees would enrich and scale its end-to-end logistics solutions at a high pace. This is a great opportunity to join
the team working on forming and delivering the operational strategy behind Artificial Intelligence / Machine Learning
and Data Engineering, leading projects and teams of AI Engineers collaborating with Data Scientists. In your role, you
will build high performance AI/ML solutions using groundbreaking AI/ML and BigData technologies. You will need to
understand business requirements and convert them to a solvable data science problem statement. You will be
involved in end to end AI/ML projects, starting from smaller scale POCs all the way to full scale ML pipelines in
production.
Seasoned AI/ML Engineers would own the implementation and productionization of cutting-edge AI driven algorithmic
components for search, recommendation and insights to improve the efficiencies of the logistics supply chain and
serve the customer better.
You will apply innovative ML tools and concepts to deliver value to our teams and customers and make an impact on
the organization while solving challenging problems in the areas of AI, ML, Data Analytics and Computer Science.
Opportunities for application:
- Route Optimization
- Address / Geo-Coding Engine
- Anomaly detection, Computer Vision (e.g. loading / unloading)
- Fraud Detection (fake delivery attempts)
- Promise Recommendation Engine etc.
- Customer & Tech support solutions, e.g. chat bots.
- Breach detection / prediction
An Artificial Intelligence Engineer would apply himself/herself in the areas of:
- Deep Learning, NLP, Reinforcement Learning
- Machine Learning - Logistic Regression, Decision Trees, Random Forests, XGBoost, etc.
- Driving Optimization via LPs, MILPs, Stochastic Programs, and MDPs
- Operations Research, Supply Chain Optimization, and Data Analytics/Visualization
- Computer Vision and OCR technologies
The AI Engineering team enables internal teams to add AI capabilities to their Apps and Workflows easily via APIs
without needing to build AI expertise in each team – Decision Support, NLP, Computer Vision, for Public Clouds and
Enterprise in NLU, Vision and Conversational AI. The candidate is adept at working with large data sets to find
opportunities for product and process optimization and using models to test the effectiveness of different courses of
action. They must have knowledge using a variety of data mining/data analysis methods, using a variety of data tools,
building, and implementing models, using/creating algorithms, and creating/running simulations. They must be
comfortable working with a wide range of stakeholders and functional teams. The right candidate will have a passion
for discovering solutions hidden in large data sets and working with stakeholders to improve business outcomes.

Roles & Responsibilities
● Develop scalable infrastructure, including microservices and backend, that automates training and
deployment of ML models.
● Build cloud services in Decision Support (Anomaly Detection, Time series forecasting, Fraud detection,
Risk prevention, Predictive analytics), computer vision, natural language processing (NLP) and speech that
work out of the box.
● Brainstorm and Design various POCs using ML/DL/NLP solutions for new or existing enterprise problems.
● Work with fellow data scientists/SW engineers to build out other parts of the infrastructure, effectively
communicating your needs and understanding theirs, and address external and internal stakeholders'
product challenges.
● Build core of Artificial Intelligence and AI Services such as Decision Support, Vision, Speech, Text, NLP, NLU,
and others.
● Leverage Cloud technology – AWS, GCP, Azure
● Experiment with ML models in Python using machine learning libraries (PyTorch, TensorFlow), Big Data,
Hadoop, HBase, Spark, etc.
● Work with stakeholders throughout the organization to identify opportunities for leveraging company data to
drive business solutions.
● Mine and analyze data from company databases to drive optimization and improvement of product
development, marketing techniques and business strategies.
● Assess the effectiveness and accuracy of new data sources and data gathering techniques.
● Develop custom data models and algorithms to apply to data sets.
● Use predictive modeling to increase and optimize customer experiences, supply chain metrics and other
business outcomes.
● Develop company A/B testing framework and test model quality.
● Coordinate with different functional teams to implement models and monitor outcomes.
● Develop processes and tools to monitor and analyze model performance and data accuracy.
● Deliver machine learning and data science projects with data science techniques and associated libraries
such as AI/ ML or equivalent NLP (Natural Language Processing) packages. Such techniques include a good
to phenomenal understanding of statistical models, probabilistic algorithms, classification, clustering, deep
learning or related approaches as it applies to financial applications.
● The role will encourage you to learn a wide array of capabilities, toolsets and architectural patterns for
successful delivery.
What is required of you?
You will get an opportunity to build and operate a suite of massive scale, integrated data/ML platforms in a broadly
distributed, multi-tenant cloud environment.
● B.S., M.S., or Ph.D. in Computer Science, Computer Engineering
● Coding knowledge and experience with several languages: C, C++, Java, JavaScript, etc.
● Experience with building high-performance, resilient, scalable, and well-engineered systems
● Experience in CI/CD and development best practices, instrumentation, logging systems
● Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights
from large data sets.
● Experience working with and creating data architectures.
● Good understanding of various machine learning and natural language processing technologies, such as
classification, information retrieval, clustering, knowledge graph, semi-supervised learning and ranking.

● Knowledge and experience in statistical and data mining techniques: GLM/Regression, Random Forest,
Boosting, Trees, text mining, social network analysis, etc.
● Knowledge of using web services: Redshift, S3, Spark, DigitalOcean, etc.
● Knowledge on creating and using advanced machine learning algorithms and statistics: regression,
simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc.
● Knowledge of analyzing data from 3rd party providers: Google Analytics, SiteCatalyst, Coremetrics,
AdWords, Crimson Hexagon, Facebook Insights, etc.
● Knowledge on distributed data/computing tools: Map/Reduce, Hadoop, Hive, Spark, MySQL, Kafka etc.
● Knowledge on visualizing/presenting data for stakeholders using: Quicksight, Periscope, Business Objects,
D3, ggplot, Tableau etc.
● Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural
networks, etc.) and their real-world advantages/drawbacks.
● Knowledge of advanced statistical techniques and concepts (regression, properties of distributions,
statistical tests, and proper usage, etc.) and experience with applications.
● Experience building data pipelines that prep data for Machine learning and complete feedback loops.
● Knowledge of Machine Learning lifecycle and experience working with data scientists
● Experience with Relational databases and NoSQL databases
● Experience with workflow scheduling / orchestration such as Airflow or Oozie
● Working knowledge of current techniques and approaches in machine learning and statistical or
mathematical models
● Strong Data Engineering & ETL skills to build scalable data pipelines. Exposure to data streaming stack (e.g.
Kafka)
● Relevant experience in fine tuning and optimizing ML (especially Deep Learning) models to bring down
serving latency.
● Exposure to ML model productionization stack (e.g. MLflow, Docker)
● Excellent exploratory data analysis skills to slice & dice data at scale using SQL in Redshift/BigQuery.
Pune, Bengaluru (Bangalore), Coimbatore, Hyderabad, Gurugram
3 - 10 yrs
₹18L - ₹40L / yr
Apache Kafka
Spark
Hadoop
Apache Hive
Big Data

Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.



You’ll spend time on the following:

  • You will partner with teammates to create complex data processing pipelines in order to solve our clients’ most ambitious challenges
  • You will collaborate with Data Scientists in order to design scalable implementations of their models
  • You will pair to write clean and iterative code based on TDD
  • Leverage various continuous delivery practices to deploy data pipelines
  • Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
  • Develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
  • Create data models and speak to the tradeoffs of different modeling approaches

Here’s what we’re looking for:

 

  • You have a good understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
  • You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
  • Hands-on experience in MapR, Cloudera, Hortonworks and/or cloud-based (AWS EMR, Azure HDInsight, Qubole, etc.) Hadoop distributions
  • You are comfortable taking data-driven approaches and applying data security strategy to solve business problems 
  • Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems
  • Strong communication and client-facing skills with the ability to work in a consulting environment
Remote only
4 - 10 yrs
₹12L - ₹23L / yr
Informatica
ETL
Big Data
Spark
SQL
Skill: Informatica with Big Data Management

1. Minimum 6 to 8 years of experience in Informatica BDM development
2. Experience working with Spark/SQL
3. Develops Informatica mappings/SQL
4. Should have experience in Hadoop, Spark, etc.

Work days- Sun-Thu
Day shift
 
 
 
Data & Cloud Technology service-based company.
Agency job
via Multi Recruit by Ragul Ragul
Chennai, Coimbatore, Madurai
5 - 10 yrs
₹12L - ₹19L / yr
Apache Spark
HiveQL
Amazon Web Services (AWS)
Data engineering
JSON
  • Must have experience leading teams and driving customer interactions
  • Must have delivered user stories through multiple successful deployments
  • Extensive hands-on experience in Apache Spark along with HiveQL
  • Sound knowledge of Amazon Web Services or any other cloud environment
  • Experienced in data flow orchestration using Apache Airflow
  • Experience with JSON, XML, CSV and Parquet file formats with Snappy compression
  • Experience with file movements between HDFS and AWS S3
  • Experience in shell scripting and scripting to automate report generation and migration of reports to AWS S3
  • Has worked on building a data pipeline using Pandas and the Flask framework
  • Good familiarity with Anaconda and Jupyter Notebook
Posted by Rashmi Poovaiah
Bengaluru (Bangalore), Chennai, Pune
4 - 10 yrs
₹8L - ₹15L / yr
Big Data
Hadoop
Spark
Apache Kafka
HiveQL

Role Summary/Purpose:

We are looking for Developers/Senior Developers to be part of building an advanced analytical platform that leverages Big Data technologies and transforms the legacy systems. This role offers an exciting, fast-paced, constantly changing and challenging work environment, and will play an important role in resolving and influencing high-level decisions.

 

Requirements:

  • The candidate must be a self-starter who can work under general guidelines in a fast-paced environment.
  • Overall minimum of 4 to 8 years of software development experience and 2 years of Data Warehousing domain knowledge
  • Must have 3 years of hands-on working knowledge of Big Data technologies such as Hadoop, Hive, HBase, Spark, Kafka, Spark Streaming, Scala, etc.
  • Excellent knowledge of SQL & Linux shell scripting
  • Bachelor’s/Master’s/Engineering degree from a well-reputed university.
  • Strong communication, interpersonal, learning and organizing skills, matched with the ability to manage stress, time and people effectively
  • Proven experience in co-ordination of many dependencies and multiple demanding stakeholders in a complex, large-scale deployment environment
  • Ability to manage a diverse and challenging stakeholder community
  • Diverse knowledge and experience of working on Agile Deliveries and Scrum teams.

 

Responsibilities

  • Should work as a senior developer or individual contributor, depending on the situation
  • Should be part of Scrum discussions and gather requirements
  • Adhere to Scrum timelines and deliver accordingly
  • Participate in a team environment for design, development and implementation
  • Should take up L3 activities as needed
  • Prepare unit/SIT/UAT test cases and log the results
  • Co-ordinate SIT and UAT testing. Take feedback and provide necessary remediation/recommendations in time.
  • Quality delivery and automation should be a top priority
  • Co-ordinate change and deployment in time
  • Should create healthy harmony within the team
  • Own interaction points with members of the core team (e.g. BA team, testing and business team) and any other relevant stakeholders
Posted by Nagraj Kumar
Bengaluru (Bangalore)
2 - 8 yrs
₹6L - ₹25L / yr
Scala
Apache Spark
Big Data
Preferred skills:
  • Should have a minimum of 3 years of experience in software development
  • Strong experience in Spark Scala development
  • Should have strong experience in AWS cloud platform services
  • Should have good knowledge of and exposure to Amazon EMR and EC2
  • Should be proficient with databases such as DynamoDB and Snowflake
Mumbai
3 - 7 yrs
₹5L - ₹15L / yr
Machine Learning (ML)
Python
Data Science
Big Data
R Programming
Data Scientist - We are looking for a candidate to build great recommendation engines and power an intelligent m.Paani user journey.

Responsibilities:
  • Data mining using methods like associations, correlations, inferences, clustering, graph analysis, etc.
  • Scale the machine learning algorithms that power our platform to support our growing customer base and increasing data volume
  • Design and implement machine learning, information extraction and probabilistic matching algorithms and models
  • Care about designing the full machine learning pipeline
  • Extend the company's data with 3rd party sources
  • Enhance data collection procedures
  • Process, clean and verify the data collected
  • Perform ad hoc analysis of the data and present clear results
  • Create advanced analytics products that provide actionable insights

The Individual: We are looking for a candidate with the following skills, experience and attributes.

Required:
  • 2+ years of work experience in machine learning
  • Educational qualification relevant to the role: degree in Statistics, certificate courses in Big Data, Machine Learning, etc.
  • Knowledge of machine learning techniques and algorithms
  • Knowledge of languages and toolkits like Python, R and NumPy
  • Knowledge of data visualization tools like D3.js and ggplot2
  • Knowledge of query languages like SQL, Hive and Pig
  • Familiarity with Big Data architecture and tools like Hadoop, Spark and MapReduce
  • Familiarity with NoSQL databases like MongoDB, Cassandra and HBase
  • Good applied statistics skills, such as distributions, statistical testing, regression, etc.

Compensation & Logistics: This is a full-time opportunity. Compensation will be in line with startup norms, and will be based on qualifications and experience. The position is based in Mumbai, India, and the candidate must live in Mumbai or be willing to relocate.