About the job: - You will work with data scientists to architect, code and deploy ML models - You will solve problems of storing and analyzing large scale data in milliseconds - architect and develop data processing and warehouse systems - You will code, drink, breathe and live python, sklearn and pandas. It’s good to have experience in these but not a necessity - as long as you’re super comfortable in a language of your choice. - You will develop tools and products that provide analysts ready access to the data About you: - Strong CS fundamentals - You have strong experience in working with production environments - You write code that is clean, readable and tested - Instead of doing it second time, you automate it - You have worked with some of the commonly used databases and computing frameworks (Psql, S3, Hadoop, Hive, Presto, Spark, etc) - It will be great if you have one of the following to share - a kaggle or a github profile - You are an expert in one or more programming languages (Python preferred). Also good to have experience with python-based application development and data science libraries. - Ideally, you have 2+ years of experience in tech and/or data. - Degree in CS/Maths from Tier-1 institutes.
JOB DESCRIPTION: We are looking for a Data Engineer with a solid background in scalable systems to work with our engineering team to improve and optimize our platform. You will have significant input into the team’s architectural approach and execution. We are looking for a hands-on programmer who enjoys designing and optimizing data pipelines for large-scale data. This is NOT a "data scientist" role, so please don't apply if you're looking for that. RESPONSIBILITIES: 1. Build, maintain and test, performant, scalable data pipelines 2. Work with data scientists and application developers to implement scalable pipelines for data ingest, processing, machine learning and visualization 3. Building interfaces for ingest across various data stores MUST-HAVE: 1. A track record of building and deploying data pipelines as a part of work or side projects 2. Ability to work with RDBMS, MySQL or Postgres 3. Ability to deploy over cloud infrastructure, at least AWS 4. Demonstrated ability and hunger to learn GOOD-TO-HAVE: 1. Computer Science degree 2. Expertise in at least one of: Python, Java, Scala 3. Expertise and experience in deploying solutions based on Spark and Kafka 4. Knowledge of container systems like Docker or Kubernetes 5. Experience with NoSQL / graph databases: 6. Knowledge of Machine Learning Kindly apply only if you are skilled in building data pipelines.