Responsibilities:
• Build the real-time and batch analytics platform for analytics and machine learning.
• Design, propose and develop solutions with the growing scale and business requirements in mind.
• As an integral part of the Data Engineering team, be involved in the entire development lifecycle, from conceptualisation to architecture to coding to unit testing.
• Help design the data model for our data warehouse and other data engineering solutions.
Requirements:
• Deep understanding of real-time as well as batch big data processing (Spark, Storm, Kafka, KSQL, Flink, MapReduce, YARN, Hive, HDFS, Pig, etc.); a minimal streaming sketch follows this posting.
• Extensive experience developing applications that work with NoSQL stores (e.g. Elasticsearch, HBase, Cassandra, MongoDB).
• Strong understanding of data and fair data modelling experience.
• Proven programming experience in Java or Scala.
• Experience gathering and processing raw data at scale, including writing scripts, web scraping, calling APIs and writing SQL queries.
• Experience with cloud-based data stores such as Redshift and BigQuery is an advantage.
• Previous experience in a high-growth tech startup is an advantage.
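For the real-time half of the stack listed above, here is a minimal sketch, assuming Spark Structured Streaming in Scala reading from Kafka: it counts click events per one-minute window. The broker address, topic name (clicks) and checkpoint path are placeholders rather than anything from the posting, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Sketch only: counts Kafka click events per one-minute window.
object ClickstreamCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-counts")
      .master("local[*]")          // local mode for the sketch; a cluster in practice
      .getOrCreate()

    // Read the raw event stream from a hypothetical "clicks" topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "clicks")
      .load()
      .selectExpr("CAST(value AS STRING) AS click", "timestamp")

    // Windowed count with a watermark so late events are eventually dropped.
    val counts = events
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/clickstream-checkpoint")
      .start()
      .awaitTermination()
  }
}
```

The watermark bounds how late events may arrive before a window is finalised, which is the usual latency-versus-completeness trade-off in this kind of streaming aggregation.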
1. KEY OBJECTIVE OF THE JOB
Work closely with users, the product management team and the tech team to design, develop and strategise the data architecture and multidimensional databases.
2. MAJOR DELIVERABLES:
• Design the end-to-end BI and analytics platform and present it to tech and business stakeholders.
• Evaluate multiple tools and conduct proofs of concept based on requirements and budgets.
• Perform dimensional modelling of multiple data marts and an enterprise data warehouse from scratch.
• Understand complex OLTP (Online Transaction Processing) systems such as order booking, CRM, finance and web, and map their schemas and data dictionaries.
• Understand the business rules around data entities and document them.
• Map the business rules and OLTP entities to a dimensional model spread across multiple data marts and warehouses.
• Design a robust and failsafe ETL (Extract, Transform & Load) process without relying on any tool.
• Operationalise the ETL using shell and SQL scripts, without the need for any tool (a sketch of this approach follows this posting).
• Operationalise the dimensional model and the warehousing architecture on simple standalone databases such as MySQL and Postgres on Linux, or on cloud-based systems such as Redshift.
• Model data lakes for lightly structured but highly voluminous clickstream data using Hadoop and similar technologies.
• Be an extremely hands-on person who loves to create a blueprint as well as write scripts, make presentations and even set up end-to-end PoCs (proofs of concept) on his/her own.
• Coordinate among data scientists, technology partners, business users, analysts etc., and make sure they are able to use the OLAP (Online Analytical Processing) platform in the intended way.
• Understand the pain points of the above stakeholders and continuously iterate on the existing platform, with a completely open mind, to meet their needs.
• Track and continuously tune the data infrastructure for performance and scale.
3. RIGHT PERSON:
Essential Attributes
• Dimensional modelling and schema design for OLAP/BI
• Command over multiple ETL, DW/data mart and BI tools
• Experience with HANA or Talend is an added advantage
• Solution design and documentation
• Big data architecture design (Hadoop and related ecosystem)
• Propensity towards a hands-on/start-up working environment
Desirable Attributes
• Big data and machine learning
• Data science and statistics
• E-commerce or retail domain experience
Profile
An engineer and tech enthusiast with at least 10 years of total experience, including 5 to 6 years in data warehouse architecture, the ability to think logically, the ability to address data migration issues, an understanding of the importance of data dictionaries and a strong desire to establish best practices will fit the bill.
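One possible reading of the "no ETL tool" deliverable above is plain SQL executed against Postgres from a thin script; the sketch below wraps that SQL in a small Scala program purely for illustration (the posting itself talks about shell and SQL scripts). The JDBC URL, credentials, and the staging, dimension and fact table names are all hypothetical, the upsert assumes a UNIQUE constraint on customer_nk, and the Postgres JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

// Sketch of a hand-rolled dimensional load: staging tables -> dim_customer -> fact_orders.
object DimensionalLoad {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/warehouse", "etl_user", "etl_password")
    try {
      val stmt = conn.createStatement()

      // Upsert the customer dimension (SCD type 1 for simplicity);
      // assumes a UNIQUE constraint on customer_nk.
      stmt.executeUpdate(
        """INSERT INTO dim_customer (customer_nk, name, city)
          |SELECT id, name, city FROM staging_customers
          |ON CONFLICT (customer_nk) DO UPDATE SET name = EXCLUDED.name, city = EXCLUDED.city
          |""".stripMargin)

      // Load the order fact, resolving the dimension surrogate key via a join.
      stmt.executeUpdate(
        """INSERT INTO fact_orders (customer_sk, order_date_key, amount)
          |SELECT d.customer_sk, to_char(s.order_date, 'YYYYMMDD')::int, s.amount
          |FROM staging_orders s
          |JOIN dim_customer d ON d.customer_nk = s.customer_id
          |""".stripMargin)
    } finally conn.close()
  }
}
```

Resolving surrogate keys through a join against the dimension, rather than storing natural keys in the fact table, is the standard dimensional-modelling practice the deliverables above describe.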
Job Requirements
• Installation, configuration and administration of big data components (including Hadoop/Spark) for batch and real-time analytics and data hubs
• Capable of processing large sets of structured, semi-structured and unstructured data (a minimal ETL sketch follows this posting)
• Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review
• Familiar with data architecture: data ingestion pipeline design, Hadoop information architecture, data modelling, data mining, machine learning and advanced data processing
• Optional: visual communicator, able to convert and present data as easily comprehensible visualisations using tools like D3.js and Tableau
• Enjoys being challenged and solving complex problems on a daily basis
• Proficient in executing efficient and robust ETL workflows
• Able to work in teams and collaborate with others to clarify requirements
• Able to tune Hadoop solutions to improve performance and the end-user experience
• Strong coordination and project management skills to handle complex projects
• Engineering background
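As an illustration of the semi-structured processing and ETL items above, a minimal Spark batch sketch in Scala: it reads raw JSON events, applies a simple data-quality rule, and writes date-partitioned Parquet. The input/output paths and column names (user_id, event_ts) are assumptions made for the example, not details from the posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Sketch of a small batch ETL step: semi-structured JSON in, partitioned Parquet out.
object JsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-parquet")
      .master("local[*]")
      .getOrCreate()

    // Schema is inferred from the JSON; hypothetical input path.
    val raw = spark.read.json("/data/raw/events/*.json")

    val cleaned = raw
      .filter(col("user_id").isNotNull)                   // basic data-quality rule
      .withColumn("event_date", to_date(col("event_ts"))) // derive a partition column

    cleaned
      .repartition(col("event_date"))                     // avoid many small files per partition
      .write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("/data/curated/events")

    spark.stop()
  }
}
```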
Job Title: Software Developer – Big Data
Responsibilities
We are looking for a Big Data Developer who can drive innovation, take ownership and deliver results.
• Understand business requirements from stakeholders
• Build and own Mintifi's big data applications
• Be heavily involved in every step of the product development process, from ideation to implementation to release
• Design and build systems with automated instrumentation and monitoring
• Write unit and integration tests (a testable Spark sketch follows this posting)
• Collaborate with cross-functional teams to validate and get feedback on the efficacy of results produced by the big data applications, and use the feedback to improve the business logic
• Take a proactive approach to turning ambiguous problem spaces into clear design solutions
Qualifications
• Hands-on programming skills in Apache Spark using Java or Scala
• Good understanding of data structures and algorithms
• Good understanding of relational and non-relational database concepts (MySQL, Hadoop, MongoDB)
• Experience with Hadoop ecosystem components such as YARN and ZooKeeper is a strong plus
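A minimal sketch of hands-on Spark in Scala written with the unit-testing point above in mind: the aggregation is a pure DataFrame-to-DataFrame function, so it can be exercised against a small in-memory DataFrame without any external systems. The loan/borrower column names are invented for the example and are not taken from the posting.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Sketch only: business logic kept as a pure function so it is easy to unit-test.
object LoanAggregates {
  // Total disbursed amount per borrower (hypothetical columns).
  def totalsPerBorrower(loans: DataFrame): DataFrame =
    loans.groupBy("borrower_id").agg(sum("amount").as("total_amount"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("loan-aggregates")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny in-memory sample standing in for a real table.
    val sample = Seq(("b1", 100.0), ("b1", 250.0), ("b2", 75.0)).toDF("borrower_id", "amount")
    totalsPerBorrower(sample).show()   // in a real test, assert on collect() instead

    spark.stop()
  }
}
```

Keeping the business logic in pure functions like this is what makes the "write unit and integration tests" responsibility practical: a test can build a tiny DataFrame, call the function, and assert on the collected rows.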
About the job:
- You will work with data scientists to architect, code and deploy ML models
- You will solve problems of storing and analysing large-scale data in milliseconds
- You will architect and develop data processing and warehouse systems
- You will code, drink, breathe and live Python, sklearn and pandas. Experience with these is good to have but not a necessity, as long as you're super comfortable in a language of your choice
- You will develop tools and products that give analysts ready access to the data
About you:
- Strong CS fundamentals
- You have strong experience working with production environments
- You write code that is clean, readable and tested
- Instead of doing something a second time, you automate it
- You have worked with some of the commonly used databases and computing frameworks (PostgreSQL, S3, Hadoop, Hive, Presto, Spark, etc.)
- It would be great if you have a Kaggle or GitHub profile to share
- You are an expert in one or more programming languages (Python preferred); experience with Python-based application development and data science libraries is also good to have
- Ideally, you have 2+ years of experience in tech and/or data
- Degree in CS/Maths from a Tier-1 institute