India's No. 1 Loans & Cards Marketplace
Proficiency in Linux.
Must have SQL knowledge and experience working with relational databases,
query authoring (SQL) as well as familiarity with databases including MySQL,
Mongo, Cassandra, and Athena.
Must have experience with Python/Scala.
Must have experience with Big Data technologies like Apache Spark.
Must have experience with Apache Airflow.
Experience with data pipeline and ETL tools like AWS Glue.
Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
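As a minimal illustration of the SQL query-authoring skill set listed above (the table and columns are hypothetical, and SQLite stands in here for MySQL or Athena):

```python
import sqlite3

# In-memory SQLite database standing in for a relational store such as MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO loans VALUES (1, 'alice', 5000.0), (2, 'bob', 12000.0),
                             (3, 'alice', 3000.0);
""")

# Typical query authoring: aggregate loan exposure per customer.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM loans
    GROUP BY customer
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('bob', 12000.0), ('alice', 8000.0)]
```

The same GROUP BY/aggregation pattern carries over directly to MySQL, Cassandra (with CQL limitations), and Athena.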
Please note - This is a 100% remote opportunity and you can work from any location.
About the team:
You will be a part of Cactus Labs which is the R&D Cell of Cactus Communications. Cactus Labs is a high impact cell that works to solve complex technical and business problems that help keep us strategically competitive in the industry. We are a multi-cultural team spread across multiple countries. We work in the domain of AI/ML especially with Text (NLP - Natural Language Processing), Language Understanding, Explainable AI, Big Data, AR/VR etc.
The opportunity: Within Cactus Labs you will work with the Big Data team. This team manages Terabytes of data coming from different sources. We are re-orchestrating data pipelines to handle this data at scale and improve visibility and robustness. We operate across all the three Cloud Platforms and leverage the best of them.
In this role, you will get to own a component end to end. You will also get to work on cloud platforms and learn to design distributed data processing systems that operate at scale.
- Build and maintain robust data processing pipelines at scale
- Collaborate with a team of Big Data Engineers, Big Data and Cloud Architects and Domain SMEs to drive the product ahead
- Follow best practices when building, and optimize existing processes
- Stay up to date with the progress in the domain since we work on cutting-edge technologies and are constantly trying new things out
- Build solutions for massive scale. This requires extensive benchmarking to pick the right approach
- Understand the data inside and out and make sense of it. At times you will need to draw conclusions and present them to business users
- Be independent, self-driven and highly motivated. While you will have the best people to learn from and access to various courses or training materials, we expect you to take charge of your growth and learning.
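The benchmarking mentioned above can be as simple as timing two candidate approaches on representative input before picking one. A minimal sketch (the two functions compared are illustrative, not from any real pipeline):

```python
import timeit

def concat_with_plus(parts):
    # Naive approach: repeated string concatenation.
    out = ""
    for p in parts:
        out += p
    return out

def concat_with_join(parts):
    # Idiomatic approach: a single join over all parts.
    return "".join(parts)

parts = [str(i) for i in range(1000)]

# Both approaches must produce identical output before timings mean anything.
assert concat_with_plus(parts) == concat_with_join(parts)

t_plus = timeit.timeit(lambda: concat_with_plus(parts), number=200)
t_join = timeit.timeit(lambda: concat_with_join(parts), number=200)
print(f"plus: {t_plus:.4f}s  join: {t_join:.4f}s")
```

At real scale the same discipline applies, only with Spark jobs and cluster-level metrics instead of `timeit`.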
Expectations from you:
- 4-7 Years of relevant experience in Big Data with Java
- Highly proficient in distributed computing and Big Data Ecosystem - Hadoop, HDFS, Apache Spark
- Good understanding of data lakes and their importance in a Big Data ecosystem
- Ability to mentor junior team members and review their code
- Experience in working in a Cloud Environment (AWS, Azure or GCP)
- You like to work without a lot of supervision or micromanagement.
- Above all, you get excited by data. You like to dive deep, mine patterns and draw conclusions. You believe in making data-driven decisions and helping the team look for patterns as well.
- Familiarity with search engines like Elasticsearch and Big Data warehouse systems like AWS Athena, Google BigQuery, etc.
- Building data pipelines using Airflow
- Experience working in an AWS cloud environment
- Create and manage cloud resources in AWS
- Data ingestion from different data sources that expose data through different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems; implement this ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the language supported by the base data platform
- Develop automated data quality checks to make sure the right data enters the platform and to verify the results of calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
- 5-7+ years' experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge of one of the following languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
  - Data mining/programming tools (e.g. SAS, SQL, R, Python)
  - Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
  - Data visualization tools (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting languages: Python and PySpark
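The automated data quality checks described above typically validate each ingested record against a set of rules and quarantine failures before they reach the platform. A minimal, stdlib-only sketch (the schema and rules are hypothetical):

```python
# Hypothetical record schema: each ingested row must carry an id, a
# non-negative amount, and a date. Rows failing any rule are quarantined.
REQUIRED_FIELDS = {"id", "amount", "date"}

def check_record(rec):
    """Return a list of rule violations for one record (empty = clean)."""
    errors = []
    missing = REQUIRED_FIELDS - rec.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "amount" in rec and (not isinstance(rec["amount"], (int, float)) or rec["amount"] < 0):
        errors.append("amount must be a non-negative number")
    return errors

def partition_batch(batch):
    """Split a batch into (clean, quarantined) before loading to the platform."""
    clean, quarantined = [], []
    for rec in batch:
        errs = check_record(rec)
        if errs:
            quarantined.append((rec, errs))
        else:
            clean.append(rec)
    return clean, quarantined

batch = [
    {"id": 1, "amount": 10.5, "date": "2024-01-01"},
    {"id": 2, "amount": -3.0, "date": "2024-01-02"},   # fails the amount rule
    {"id": 3, "date": "2024-01-03"},                   # missing amount
]
clean, bad = partition_batch(batch)
print(len(clean), len(bad))  # 1 2
```

In production the same pattern would run as a Spark or Glue step, with quarantined rows written to a dead-letter location for review.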
* Formulates and recommends standards for achieving maximum performance and efficiency of the DW ecosystem.
* Participates in pre-sales activities for solutions for various customers.
* Develop business cases and ROI analyses for customers/clients.
* Interview stakeholders and develop a BI roadmap for the success of a given project.
* Evangelize self-service BI and visual discovery while helping to automate any manual processes at the client site.
* Work closely with the Engineering Manager to ensure appropriate prioritization of work.
* Champion data quality, integrity, and reliability throughout the organization by designing and promoting best practices.
* Help DW/DE team members with issues requiring technical expertise or complex systems and/or programming knowledge.
* Provide on-the-job training for new or less experienced team members.
* Develop a technical excellence team
- Experience designing business intelligence solutions
- Experience with ETL processes and data warehouse architecture
- Experience with Azure Data services, i.e., ADF, ADLS Gen 2, Azure SQL DB, Synapse, Azure Databricks, and Power BI
- Good analytical and problem-solving skills
- Fluent in relational database concepts and flat file processing concepts
- Must be knowledgeable in software development lifecycles/methodologies
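The flat-file processing concepts mentioned above boil down to parsing a delimited file, casting types, and deriving the columns the warehouse expects. A hedged sketch (the file layout and column names are invented; in practice the file would land in ADLS or S3 rather than an in-memory string):

```python
import csv
import io

# Stand-in for a pipe-delimited flat file arriving in a staging area.
raw = io.StringIO(
    "policy_id|premium|currency\n"
    "P-1|100.00|USD\n"
    "P-2|250.50|USD\n"
)

reader = csv.DictReader(raw, delimiter="|")
# Typical ETL transform: cast types and derive a new column.
rows = [
    {"policy_id": r["policy_id"], "premium_cents": int(float(r["premium"]) * 100)}
    for r in reader
]
print(rows)
```

The same parse-cast-derive shape maps onto an ADF mapping data flow or a Databricks notebook at larger scale.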
6 to 8 years of relevant work experience with ETL tools
Good knowledge of AWS cloud databases such as Aurora DB, along with the surrounding ecosystem and tools (e.g. AWS DMS)
Experience migrating databases to the AWS Cloud is mandatory
Sound knowledge of SQL and procedural language.
Possess solid experience of writing complex SQL queries and optimizing SQL query performance
Knowledge of data ingestion patterns: one-off feeds, change data capture, and incremental batch
Additional Skills :
Experience with Unix/Linux systems and writing shell scripts would be nice to have
Java knowledge would be an added advantage
Knowledge of Spark with Python (PySpark) for building ETL pipelines on the cloud would be preferable
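The incremental-batch ingestion pattern mentioned above is usually implemented with a high-watermark column: each run pulls only rows changed since the previous run. A rough sketch (SQLite stands in for Aurora, and the table layout is invented):

```python
import sqlite3

# In-memory SQLite standing in for a source database such as Aurora.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 150), (3, 200);
""")

def incremental_pull(conn, watermark):
    """Fetch only rows changed since the last run; return them and the new watermark."""
    rows = conn.execute(
        "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark

# First run: pull everything after the initial watermark.
rows, wm = incremental_pull(src, 0)
print(rows, wm)  # [(1, 100), (2, 150), (3, 200)] 200

# A change arrives; the next run picks up only the delta.
src.execute("INSERT INTO orders VALUES (4, 250)")
delta, wm = incremental_pull(src, wm)
print(delta, wm)  # [(4, 250)] 250
```

Change data capture (e.g. via AWS DMS) achieves the same goal from the database log instead of a timestamp column, which also captures deletes.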
Client: An IT Services major, hiring for a leading insurance player.
Position: SENIOR CONSULTANT
- Azure Admin, Senior Consultant with HDInsight (Big Data)
Skills and Experience
- Microsoft Azure Administrator certification
- Big Data project experience in the Azure HDInsight stack, with big data processing frameworks such as Spark, Hadoop, Hive, Kafka, or HBase
- Preferred: Insurance or BFSI domain experience
- 5 years of experience is required.
Our Kafka developer has a combination of technical skills, communication skills, and business knowledge. The developer should be able to work on multiple medium to large projects. The successful candidate will have excellent technical skills in Apache/Confluent Kafka and an Enterprise Data Warehouse, preferably GCP BigQuery or an equivalent cloud EDW, and will also be able to take oral and written business requirements and develop efficient code to meet set deliverables.
Must Have Skills
- Participate in the development, enhancement, and maintenance of data applications, both as an individual contributor and as a lead.
- Lead the identification, isolation, resolution, and communication of problems within the production environment.
- Act as a leading developer, applying technical skills in Apache/Confluent Kafka (preferred) or AWS Kinesis (optional), and a cloud Enterprise Data Warehouse: Google BigQuery (preferred), AWS Redshift, or Snowflake (optional)
- Design and recommend the best-suited approach for data movement from different sources to the Cloud EDW using Apache/Confluent Kafka
- Performs independent functional and technical analysis for major projects supporting several corporate initiatives.
- Communicate and work with IT partners and the user community at various levels, from senior management to developers to business SMEs, for project definition.
- Works on multiple platforms and multiple projects concurrently.
- Performs code and unit testing for complex-scope modules and projects
- Provide expertise and hands-on experience working on Kafka Connect using the Schema Registry in a very high-volume environment (~900 million messages)
- Provide expertise in Kafka brokers, ZooKeeper, KSQL, KStreams, and Kafka Control Center.
- Provide expertise and hands-on experience working with AvroConverter, JsonConverter, and StringConverter.
- Provide expertise and hands-on experience working with Kafka connectors such as MQ connectors, Elasticsearch connectors, JDBC connectors, FileStream connectors, and JMS source connectors, along with tasks, workers, converters, and transforms.
- Provide expertise and hands-on experience building custom connectors using Kafka core concepts and the API.
- Working knowledge of the Kafka REST Proxy.
- Ensure optimum performance, high availability and stability of solutions.
- Create topics, set up redundancy clusters, deploy monitoring tools and alerts, and apply best practices.
- Create stubs for producers, consumers, and consumer groups to help onboard applications from different languages/platforms.
- Leverage Hadoop-ecosystem knowledge to design and develop capabilities to deliver our solutions using Spark, Scala, Python, Hive, Kafka, and other Hadoop-ecosystem technologies.
- Use automation tools for provisioning, such as Jenkins, uDeploy, or similar technologies
- Ability to perform data related benchmarking, performance analysis and tuning.
- Strong skills in in-memory applications, database design, and data integration.
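The converters referenced above (AvroConverter, JsonConverter, StringConverter) are the Kafka Connect components that turn records into bytes on the wire and back. The stdlib-only sketch below mimics that idea; it is a simplification for illustration, not the real Connect API:

```python
import json

class StringConverter:
    """Simplified analogue of Connect's StringConverter: UTF-8 text on the wire."""
    def to_bytes(self, value):
        return str(value).encode("utf-8")

    def from_bytes(self, data):
        return data.decode("utf-8")

class JsonConverter:
    """Simplified analogue of Connect's JsonConverter: JSON documents on the wire."""
    def to_bytes(self, value):
        return json.dumps(value, sort_keys=True).encode("utf-8")

    def from_bytes(self, data):
        return json.loads(data.decode("utf-8"))

record = {"order_id": 42, "status": "shipped"}
wire = JsonConverter().to_bytes(record)       # what the broker would store
restored = JsonConverter().from_bytes(wire)   # what a sink connector would see
print(restored == record)  # True
```

In real Connect deployments the converter is configured per connector (e.g. `value.converter`), and AvroConverter additionally registers the record's schema with the Schema Registry.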
• Responsible for developing and maintaining applications with PySpark
Must Have Skills: