JOB REQUIREMENTS:
1. Installation, configuration and administration of Big Data components (including Hadoop/Spark) for batch and real-time analytics and data hubs
2. Capable of processing large sets of structured, semi-structured and unstructured data
3. Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review
4. Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning and advanced data processing
5. Optional: visual communicator, able to convert and present data as easily comprehensible visualizations using tools like D3.js or Tableau
6. Enjoys being challenged and solving complex problems on a daily basis
7. Proficient in executing efficient and robust ETL workflows
8. Able to work in teams and collaborate with others to clarify requirements
9. Able to tune Hadoop solutions to improve performance and end-user experience
10. Strong coordination and project management skills to handle complex projects
11. Engineering background

JOB DESCRIPTION:
We are looking for a Data Engineer with a solid background in scalable systems to work with our engineering team to improve and optimize our platform. You will have significant input into the team’s architectural approach and execution. We are looking for a hands-on programmer who enjoys designing and optimizing data pipelines for large-scale data. This is NOT a "data scientist" role, so please don't apply if you're looking for that.

RESPONSIBILITIES:
1. Build, maintain and test performant, scalable data pipelines
2. Work with data scientists and application developers to implement scalable pipelines for data ingest, processing, machine learning and visualization
3. Build interfaces for ingest across various data stores

MUST-HAVE:
1. A track record of building and deploying data pipelines as part of work or side projects
2. Ability to work with an RDBMS such as MySQL or Postgres
3. Ability to deploy over cloud infrastructure, at least AWS
4. Demonstrated ability and hunger to learn

GOOD-TO-HAVE:
1. Computer Science degree
2. Expertise in at least one of: Python, Java, Scala
3. Expertise and experience in deploying solutions based on Spark and Kafka
4. Knowledge of container systems like Docker or Kubernetes
5. Experience with NoSQL / graph databases
6. Knowledge of Machine Learning

Kindly apply only if you are skilled in building data pipelines.