Data Engineering role at ThoughtWorks

ThoughtWorks India is looking for talented data engineers passionate about building large-scale data processing systems to help manage the ever-growing information needs of our clients.

Our developers have been contributing code to major organizations and open source projects for over 25 years now. They’ve also been writing books, speaking at conferences, and helping push software development forward -- changing companies and even industries along the way. As Consultants, we work with our clients to ensure we’re delivering the best possible solution. Our Lead Dev plays an important role in leading these projects to success.

You will be responsible for:
• Creating complex data processing pipelines, as part of diverse, high-energy teams
• Designing scalable implementations of the models developed by our Data Scientists
• Hands-on programming based on TDD, usually in a pair programming environment
• Deploying data pipelines in production based on Continuous Delivery practices

Ideally, you should have:
• 2-6 years of overall industry experience
• A minimum of 2 years of experience building and deploying large-scale data processing pipelines in a production environment
• Strong domain modelling and coding experience in Java / Scala / Python
• Experience building data pipelines and data-centric applications using distributed storage platforms like HDFS, S3 and NoSQL databases (HBase, Cassandra, etc.) and distributed processing platforms like Hadoop, Spark, Hive, Oozie, Airflow, Kafka, etc. in a production setting
• Hands-on experience with at least one of MapR, Cloudera, Hortonworks and/or Cloud (AWS EMR, Azure HDInsight, Qubole, etc.)
• Knowledge of software best practices like Test-Driven Development (TDD), Continuous Integration (CI) and Agile development
• Strong communication skills and the ability to work in a consulting environment are essential

And here are some of the perks of being part of a unique organization like ThoughtWorks:
• A real commitment to “changing the face of IT” -- our way of thinking about diversity and inclusion. Over the past ten years, we’ve implemented a lot of initiatives to make ThoughtWorks a place that reflects the world around us, and to make this a welcoming home to technologists of all stripes. We’re not perfect, but we’re actively working towards true gender balance for our business and our industry, and you’ll see that diversity reflected on our project teams and in our offices.
• Continuous learning. You’ll be constantly exposed to new languages, frameworks and ideas from your peers and as you work on different projects -- challenging you to stay at the top of your game.
• Support to grow as a technologist outside of your role at ThoughtWorks. This is why ThoughtWorkers have written over 100 books and can be found speaking at (and, ahem, keynoting) tech conferences all over the world. We love to learn and share knowledge, and you’ll find a community of passionate technologists eager to back your endeavors, whatever they may be. You’ll also receive financial support to attend conferences every year.
• An organizational commitment to social responsibility. ThoughtWorkers challenge each other to be just a little more thoughtful about the world around us, and we believe in using our profits for good. All around the world, you’ll find ThoughtWorks supporting great causes and organizations in both official and unofficial capacities.

If you relish the idea of being part of ThoughtWorks’ Data Practice that extends beyond the work we do for our customers, you may find ThoughtWorks is the right place for you.
If you share our passion for technology and want to help change the world with software, we want to hear from you!
1. KEY OBJECTIVE OF THE JOB
To work closely with various users, the product management team and the tech team to design, develop and strategize Data Architecture and multidimensional databases.

2. MAJOR DELIVERABLES:
• Design an end-to-end BI and Analytics platform and present it to tech and business stakeholders
• Evaluate multiple tools and conduct Proofs-of-Concept for them based on requirements and budgets
• Perform Dimensional Modelling of multiple data marts and an enterprise data warehouse from scratch
• Understand complex OLTP (Online Transaction Processing) systems such as Order Booking, CRM, Finance, Web etc. and map schemas and data dictionaries from them
• Understand business rules around data entities and document them
• Map the business rules and OLTP entities to a dimensional model spread across multiple data marts and warehouses
• Design a robust and failsafe ETL (Extract, Transform & Load) process without relying on any tool
• Operationalise the ETL using shell and SQL scripts without the need for any tool
• Operationalise the dimensional model and the warehousing architecture using simple standalone databases like MySQL and Postgres on Linux, or on Cloud-based systems like Redshift etc.
• Model data lakes for lightly structured but highly voluminous clickstream data using Hadoop and similar technologies
• Be an extremely hands-on person who loves to create a blueprint as well as write scripts, make presentations and even set up end-to-end PoCs (Proofs of Concept) on his/her own
• Coordinate among Data Scientists, Technology Partners, Business Users, Analysts etc., and make sure they are able to use the OLAP (Online Analytical Processing) platform in the intended way
• Understand the pain points of the above stakeholders and continuously iterate on the existing platform with a completely open mind to meet their needs
• Track and continuously tune the data infrastructure for performance and scale

3. RIGHT PERSON:
Essential Attributes
• Dimensional Modelling and Schema Design for OLAP/BI
• Command over multiple ETL, DW/Data mart and BI tools
• Experience with HANA or Talend will be an added advantage
• Solution Design and Documentation
• Big Data Architecture Design (Hadoop and related ecosystem)
• Propensity towards a hands-on, start-up working environment

Desirable Attributes
• Big Data and Machine Learning
• Data Science and Statistics
• Ecommerce or Retail domain experience

Profile
An engineering and tech enthusiast with at least 10 years of total experience, including 5 to 6 years in data warehouse architecture, who can think logically, can address issues related to data migration, understands the importance of data dictionaries and has a strong desire to establish best practices will fit the bill.