Role: Big Data Engineer (AWS + GCP)
Experience: 5-10 years
Key Skills: AWS EMR/Glue, Hive, SQL (very strong), Hadoop, Python, Redshift, GCP

Job Description
Support and enhance existing data pipelines in Google Cloud (BigQuery).
Interface with our Advertising Analytics team, gathering requirements and delivering complete BI solutions.
Model data and metadata to support discovery, ad hoc, and pre-built reporting.
Own the design, development, and maintenance of datasets our BA teams will use to drive key business decisions.
Adopt and define standards and best practices in data engineering, including data integrity, validation, reliability, and documentation.
Tune and ensure query performance using profiling tools and SQL.
Analyze and solve problems at their root, stepping back to understand the broader context.
Learn and understand a broad range of data resources, and know when, how, and which to use and which not to use.
Keep up to date with advances in big data technologies, and run pilots to design a data architecture that scales with increasing data volume on AWS.
Continually improve ongoing reporting and analysis processes, automating or simplifying self-service support for datasets.
Triage many possible courses of action in a high-ambiguity environment, using both quantitative analysis and business judgment.

Qualifications
Bachelor's degree in CS or a related technical field.
7+ years of experience in data modeling, ETL development, and data warehousing.
Experience with distributed technologies such as Hadoop, Hive, and Spark.
Experience with cloud technologies: Google Cloud, BigQuery, S3, RDS, Redshift, and EMR.
Ability to optimize SQL and data pipelines.
Strong organizational and multitasking skills, with the ability to balance competing priorities.
Programming experience in Python, Java, or Scala.
Excellent communication (verbal and written) and interpersonal skills, with the ability to communicate effectively with both business and technical teams.
An ability to work in a fast-paced environment where continuous innovation occurs and ambiguity is the norm.
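The query-tuning responsibility above ("tune and ensure query performance using profiling tools and SQL") can be sketched in miniature. The posting names no specific engine, so this sketch uses SQLite via Python's standard library; the events table and index names are hypothetical.

```python
import sqlite3

# Hypothetical table; schema and names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_ts TEXT, payload TEXT)")

def plan(sql):
    """Return the access-path details from SQLite's query planner."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM events WHERE user_id = 42"
before = plan(query)   # profiling step: the planner reports a full table scan

# The scan on the filter column motivates an index, the most common fix.
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
after = plan(query)    # the planner now searches the index instead

print(before, after)
```

The same loop (profile, read the plan, adjust the physical layout, re-profile) carries over to BigQuery and Redshift, though columnar warehouses expose their plans through their own EXPLAIN variants and rely on clustering, partitioning, or sort keys rather than indexes.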
Responsibilities
· Ensure timely and top-quality product delivery
· Ensure that the end product is fully and correctly defined and documented
· Ensure implementation and continuous improvement of formal processes to support product development activities
· Drive the architecture/design decisions needed to achieve cost-effective and high-performance results
· Conduct feasibility analysis; produce functional and design specifications for proposed new features
· Provide helpful and productive code reviews for peers and junior members of the team
· Troubleshoot complex issues discovered in-house as well as in customer environments
Qualifications
· Strong computer science fundamentals in algorithms, data structures, databases, operating systems, etc.
· Expertise in Java, object-oriented programming, and design patterns
· Experience coding and implementing scalable solutions in a large-scale distributed environment
· Working experience in a Linux/UNIX environment is good to have
· Experience with relational databases and database concepts, preferably MySQL
· Experience with SQL and Java optimization for real-time systems
· Familiarity with version control systems (Git) and build tools (Maven)
· Excellent interpersonal, written, and verbal communication skills
· BE/B.Tech./M.Sc./MCS/MCA in Computers or equivalent
Data Engineering role at ThoughtWorks

ThoughtWorks India is looking for talented data engineers passionate about building large-scale data processing systems to help manage the ever-growing information needs of our clients. Our developers have been contributing code to major organizations and open source projects for over 25 years now. They've also been writing books, speaking at conferences, and helping push software development forward -- changing companies and even industries along the way. As Consultants, we work with our clients to ensure we're delivering the best possible solution. Our Lead Dev plays an important role in leading these projects to success.

You will be responsible for:
Creating complex data processing pipelines as part of diverse, high-energy teams
Designing scalable implementations of the models developed by our Data Scientists
Hands-on programming based on TDD, usually in a pair-programming environment
Deploying data pipelines in production based on Continuous Delivery practices

Ideally, you should have:
2-6 years of overall industry experience
Minimum of 2 years of experience building and deploying large-scale data processing pipelines in a production environment
Strong domain modelling and coding experience in Java, Scala, or Python
Experience building data pipelines and data-centric applications using distributed storage platforms like HDFS, S3, and NoSQL databases (HBase, Cassandra, etc.) and distributed processing platforms like Hadoop, Spark, Hive, Oozie, Airflow, Kafka, etc. in a production setting
Hands-on experience with at least one of MapR, Cloudera, Hortonworks, and/or cloud platforms (AWS EMR, Azure HDInsight, Qubole, etc.)
Knowledge of software best practices like Test-Driven Development (TDD), Continuous Integration (CI), and Agile development
Strong communication skills, with the ability to work in a consulting environment, are essential.

And here are some of the perks of being part of a unique organization like ThoughtWorks:

A real commitment to "changing the face of IT" -- our way of thinking about diversity and inclusion. Over the past ten years, we've implemented many initiatives to make ThoughtWorks a place that reflects the world around us, and a welcoming home to technologists of all stripes. We're not perfect, but we're actively working towards true gender balance for our business and our industry, and you'll see that diversity reflected on our project teams and in our offices.

Continuous learning. You'll be constantly exposed to new languages, frameworks, and ideas from your peers and as you work on different projects -- challenging you to stay at the top of your game.

Support to grow as a technologist outside of your role at ThoughtWorks. This is why ThoughtWorkers have written over 100 books and can be found speaking at (and, ahem, keynoting) tech conferences all over the world. We love to learn and share knowledge, and you'll find a community of passionate technologists eager to back your endeavors, whatever they may be. You'll also receive financial support to attend conferences every year.

An organizational commitment to social responsibility. ThoughtWorkers challenge each other to be just a little more thoughtful about the world around us, and we believe in using our profits for good. All around the world, you'll find ThoughtWorks supporting great causes and organizations in both official and unofficial capacities.

If you relish the idea of being part of a ThoughtWorks Data Practice that extends beyond the work we do for our customers, you may find ThoughtWorks is the right place for you.
If you share our passion for technology and want to help change the world with software, we want to hear from you!
Description
Deep experience with, and understanding of, Apache Hadoop and surrounding technologies is required, including Spark, Impala, Hive, Flume, Parquet, and MapReduce.
Strong command of development languages including Java, Python, Scala, and shell scripting.
Expertise in Apache Spark 2.x framework principles and usage.
Proficient in developing Spark batch and streaming jobs in Python, Scala, or Java.
Proven experience in performance tuning of Spark applications, from both an application-code and a configuration perspective.
Proficient in Kafka and its integration with Spark.
Proficient in Spark SQL and in data warehousing techniques using Hive.
Very proficient in Unix shell scripting and in operating on Linux.
Knowledge of cloud-based infrastructure.
Strong understanding of data profiling concepts and the ability to operationalize analyses into design and development activities.
Experience with software development best practices: version control systems, automated builds, etc.
Experienced in, and able to lead, all phases of the Software Development Life Cycle on any project (feasibility, planning, analysis, development, integration, test, and implementation).
Capable of working within a team or individually.
Able to create technical documentation.
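Spark performance tuning "from a configuration perspective," as the description puts it, typically starts with a handful of well-known settings. A minimal spark-defaults.conf sketch follows; the values are illustrative assumptions, not recommendations for any particular workload:

```
# spark-defaults.conf: illustrative values only, to be sized per workload

# Match shuffle parallelism to data volume; the default of 200 partitions
# is often wrong in both directions.
spark.sql.shuffle.partitions    400

# Executor sizing trades task parallelism against GC pressure.
spark.executor.memory           8g
spark.executor.cores            4

# Kryo serialization is usually faster and more compact than Java serialization.
spark.serializer                org.apache.spark.serializer.KryoSerializer
```

Configuration only goes so far; the application-code side of tuning (avoiding wide shuffles, caching judiciously, pruning columns early) usually matters at least as much.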
Description
Requirements:
Overall experience of 10 years, with a minimum of 6 years in data analysis
MBA in Finance or a similar background
Ability to lead projects and work independently
Must be able to write complex SQL, including cohort analysis, comparative analysis, etc.
Experience working directly with business users to build reports and dashboards and to answer business questions with data
Experience with analysis using Python and Spark is a plus
Experience with MicroStrategy or Tableau is a plus
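The "complex SQL, cohort analysis" requirement above can be sketched with a toy example. It uses SQLite through Python's standard library so it runs anywhere; the orders table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, order_month TEXT);
    INSERT INTO orders VALUES
        (1, '2024-01'), (1, '2024-02'),   -- user 1: joined Jan, retained in Feb
        (2, '2024-01'),                   -- user 2: joined Jan, churned
        (3, '2024-02'), (3, '2024-03');   -- user 3: joined Feb, retained in Mar
""")

# A user's first order month defines their cohort; then count how many
# users from each cohort were active in each month.
rows = conn.execute("""
    WITH cohorts AS (
        SELECT user_id, MIN(order_month) AS cohort_month
        FROM orders
        GROUP BY user_id
    )
    SELECT c.cohort_month, o.order_month, COUNT(DISTINCT o.user_id) AS active
    FROM orders o
    JOIN cohorts c ON c.user_id = o.user_id
    GROUP BY c.cohort_month, o.order_month
    ORDER BY c.cohort_month, o.order_month
""").fetchall()

for cohort, month, active in rows:
    print(cohort, month, active)
```

In a real warehouse the same WITH/GROUP BY shape applies; comparative analysis typically extends it with window functions over the cohort columns.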
We are looking to hire passionate Java techies who will be comfortable learning and working on Java and any open source frameworks and technologies. She/he should be 100% hands-on with technology and interested in solving complex analytics use cases. We are working on a complete-stack platform that has already been adopted by some very large enterprises across the world. Candidates with prior experience of working in a typical R&D environment and/or in product-based companies with a dynamic work environment will have an additional edge. We currently work on some of the latest technologies, like Cassandra, Hadoop, Apache Solr, Spark, and Lucene, as well as core Machine Learning and AI technologies. Prior knowledge of these skills is not mandatory for selection, but you would be expected to learn new skills on the job.
We at InfoVision Labs are passionate about technology and about what our clients would like to accomplish. We continuously strive to understand business challenges, the changing competitive landscape, and how cutting-edge technology can position our clients at the forefront of the competition. We are a fun-loving team of usability experts and software engineers focused on mobile technology, responsive web solutions, and cloud-based solutions.

Job Responsibilities:
◾ Minimum 3 years of experience in Big Data required
◾ Complete life-cycle experience with Big Data is highly preferred
◾ Skills: Hadoop, Spark, R, Hive, Pig, HBase, and Scala
◾ Excellent communication skills
◾ Ability to work independently with no supervision
Ixsight Technologies is an innovative IT company with strong intellectual property. Ixsight is focused on creating customer data value through its solutions for identity management, locational analytics, address science, and customer engagement, and is adapting these solutions to Big Data and the cloud. We are in the process of creating new solutions across platforms. Ixsight has served over 80 clients in India across end-user applications in the traditional BFSI and telecom sectors, and more recently in new-generation verticals such as hospitality and e-commerce. Ixsight has been featured in Gartner's India Technology Hype Cycle and has been recognised by both clients and peers for pioneering and excellent solutions. If you wish to play a direct part in creating new products, building IP, and being part of product creation, Ixsight is the place.