Please note - This is a 100% remote opportunity and you can work from any location.
About the team:
You will be a part of Cactus Labs which is the R&D Cell of Cactus Communications. Cactus Labs is a high impact cell that works to solve complex technical and business problems that help keep us strategically competitive in the industry. We are a multi-cultural team spread across multiple countries. We work in the domain of AI/ML especially with Text (NLP - Natural Language Processing), Language Understanding, Explainable AI, Big Data, AR/VR etc.
The opportunity: Within Cactus Labs you will work with the Big Data team. This team manages Terabytes of data coming from different sources. We are re-orchestrating data pipelines to handle this data at scale and improve visibility and robustness. We operate across all the three Cloud Platforms and leverage the best of them.
In this role, you will get to own a component end to end. You will also get to work on cloud platform and learn to design distributed data processing systems to operate at scale.
- Collaborate with a team of Big Data Engineers, Big Data and Cloud Architects and Domain SMEs to drive the product ahead
- Stay up to date with the progress of in the domain since We work on cutting-edge technologies and are constantly trying new things out
- Build solutions for massive scale. This requires extensive benchmarking to pick the right approach
- Understand the data in and out, and make sense of it. You will at times need to draw conclusions and present it to the business users
- Be independent, self-driven and highly motivated. While you will have the best people to learn from and access to various courses or training materials, we expect you to take charge of your growth and learning.
Expectations from you:
- 1-3 Year of relevant experience in Big Data with pySpark
- Hands on experience of distributed computing and Big Data Ecosystem - Hadoop, HDFS, Spark etc
- Good understanding of data lake and their importance in a Big Data Ecosystem
- Experience of working in the Cloud Environment (AWS, Azure or GCP)
- You like to work without a lot of supervision or micromanagement.
- Above all, you get excited by data. You like to dive deep, mine patterns and draw conclusions. You believe in making data driven decisions and helping the team look for the pattern as well.
- Familiarity with search engines like Elasticsearch and Bigdata warehouses systems like AWS Athena, Google Big Query etc
- Building data pipelines using Airflow
- Experience of working in AWS Cloud Environment
- Knowledge of NLP and ML
Purpose of Job:
Responsible for drawing insights from many sources of data to answer important business
questions and help the organization make better use of data in their daily activities.
We are looking for a smart and experienced Data Engineer 1 who can work with a senior
⮚ Build DevOps solutions and CICD pipelines for code deployment
⮚ Build unit test cases for APIs and Code in Python
⮚ Manage AWS resources including EC2, RDS, Cloud Watch, Amazon Aurora etc.
⮚ Build and deliver high quality data architecture and pipelines to support business
and reporting needs
⮚ Deliver on data architecture projects and implementation of next generation BI
⮚ Interface with other teams to extract, transform, and load data from a wide variety
of data sources
Education: MS/MTech/Btech graduates or equivalent with focus on data science and
quantitative fields (CS, Eng, Math, Eco)
Work Experience: Proven 1+ years of experience in data mining (SQL, ETL, data
warehouse, etc.) and using SQL databases
⮚ Proficient in Python and SQL. Familiarity with statistics or analytical techniques
⮚ Data Warehousing Experience with Big Data Technologies (Hadoop, Hive,
Hbase, Pig, Spark, etc.)
⮚ Working knowledge of tools and utilities - AWS, DevOps with Git, Selenium,
Postman, Airflow, PySpark
⮚ Deep Curiosity and Humility
⮚ Excellent storyteller and communicator
⮚ Design Thinking
● Able to contribute to the gathering of functional requirements, developing technical
specifications, and test case planning
● Demonstrating technical expertise, and solving challenging programming and design
● 60% hands-on coding with architecture ownership of one or more products
● Ability to articulate architectural and design options, and educate development teams and
● Resolve defects/bugs during QA testing, pre-production, production, and post-release
● Mentor and guide team members
● Work cross-functionally with various bidgely teams including product management, QA/QE,
various product lines, and/or business units to drive forward results
● BS/MS in computer science or equivalent work experience
● 8-12 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with Big data EcoSystems.
● Past experience with Hadoop,Hdfs,Map Reduce,YARN,AWS Cloud, EMR, S3, Spark, Cassandra,
● Expertise with any of the following Object-Oriented Languages (OOD): Java/J2EE,Scala,
● Ability to lead and mentor technical team members
● Expertise with the entire Software Development Life Cycle (SDLC)
● Excellent communication skills: Demonstrated ability to explain complex technical issues to
both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Business Acumen - strategic thinking & strategy development
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of
concepts (POC's) for various Data Engineering requirements.
● Experience with Agile Development, SCRUM, or Extreme Programming methodologies
Minimum of 4 years’ experience of working on DW/ETL projects and expert hands-on working knowledge of ETL tools.
Experience with Data Management & data warehouse development
Star schemas, Data Vaults, RDBMS, and ODS
Change Data capture
Slowly changing dimensions
Partitioning and tuning
Vertical and horizontal scaling
Spark, Hadoop, MPP, RDBMS
Experience with Dev/OPS architecture, implementation and operation
Hand's on working knowledge of Unix/Linux
Building Complex SQL Queries. Expert SQL and data analysis skills, ability to debug and fix data issue.
Complex ETL program design coding
Experience in Shell Scripting, Batch Scripting.
Good communication (oral & written) and inter-personal skills
Expert SQL and data analysis skill, ability to debug and fix data issue Work closely with business teams to understand their business needs and participate in requirements gathering, while creating artifacts and seek business approval.
Helping business define new requirements, Participating in End user meetings to derive and define the business requirement, propose cost effective solutions for data analytics and familiarize the team with the customer needs, specifications, design targets & techniques to support task performance and delivery.
Propose good design & solutions and adherence to the best Design & Standard practices.
Review & Propose industry best tools & technology for ever changing business rules and data set. Conduct Proof of Concepts (POC) with new tools & technologies to derive convincing benchmarks.
Prepare the plan, design and document the architecture, High-Level Topology Design, Functional Design, and review the same with customer IT managers and provide detailed knowledge to the development team to familiarize them with customer requirements, specifications, design standards and techniques.
Review code developed by other programmers, mentor, guide and monitor their work ensuring adherence to programming and documentation policies.
Work with functional business analysts to ensure that application programs are functioning as defined.
Capture user-feedback/comments on the delivered systems and document it for the client and project manager’s review. Review all deliverables before final delivery to client for quality adherence.
Technologies (Select based on requirement)
Databases - Oracle, Teradata, Postgres, SQL Server, Big Data, Snowflake, or Redshift
Tools – Talend, Informatica, SSIS, Matillion, Glue, or Azure Data Factory
Utilities for bulk loading and extracting
Languages – SQL, PL-SQL, T-SQL, Python, Java, or Scala
Data Virtualization Data services development
Service Delivery - REST, Web Services
Data Virtualization Delivery – Denodo
Cloud certification Azure
Complex SQL Queries
Data Ingestion, Data Modeling (Domain), Consumption(RDMS)
1. Understand the business problem and translate these to data services and
2. Expertise in working on cloud application designs, cloud approval plans, and
systems required to manage cloud storage.
3. Explore new technologies and learn new techniques to solve business problems
4. Collaborate with different teams - engineering and business, to build better data
5. Regularly evaluate cloud applications, hardware, and software.
6. Respond to technical issues in a professional and timely manner.
7. Identify the top cloud architecture solutions to successfully meet the strategic
needs of the company.
8. Offer guidance in infrastructure movement techniques including bulk application
transfers into the cloud.
9. Manage team and handle delivery of 2-3 projects
JD | Data Architect 24-Aug-2021
Solving for better 3 of 7
Is Education overrated? Yes. We believe so. But there is no way to locate you
otherwise. So we might look for at least a computer science, computer engineering,
information technology, or relevant field along with:
1. Over 4-6 years of experience in Data handling
2. Hands-on experience of any one programming language (Python, Java, Scala)
3. Understanding of SQL is must
4. Big data (Hadoop, Hive, Yarn, Sqoop)
5. MPP platforms (Spark, Presto)
6. Data-pipeline & scheduler tool (Ozzie, Airflow, Nifi)
7. Streaming engines (Kafka, Storm, Spark Streaming)
8. Any Relational database or DW experience
9. Any ETL tool experience
10. Hands-on experience in pipeline design, ETL and application development
11. Hands-on experience in cloud platforms like AWS, GCP etc.
12. Good communication skills and strong analytical skills
13. Experience in team handling and project delivery
- We are looking for a Data Engineer to build the next-generation mobile applications for our world-class fintech product.
- The candidate will be responsible for expanding and optimising our data and data pipeline architecture, as well as optimising data flow and collection for cross-functional teams.
- The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimising data systems and building them from the ground up.
- Looking for a person with a strong ability to analyse and provide valuable insights to the product and business team to solve daily business problems.
- You should be able to work in a high-volume environment, have outstanding planning and organisational skills.
Qualifications for Data Engineer
- Working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience building and optimising ‘big data’ data pipelines, architectures, and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets. Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Looking for a candidate with 2-3 years of experience in a Data Engineer role, who is a CS graduate or has an equivalent experience.
What we're looking for?
- Experience with big data tools: Hadoop, Spark, Kafka and other alternate tools.
- Experience with relational SQL and NoSQL databases, including MySql/Postgres and Mongodb.
- Experience with data pipeline and workflow management tools: Luigi, Airflow.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
- Experience with stream-processing systems: Storm, Spark-Streaming.
- Experience with object-oriented/object function scripting languages: Python, Java, Scala.
• 5+ years’ experience developing and maintaining modern ingestion pipeline using
technologies like Spark, Apache Nifi etc).
• 2+ years’ experience with Healthcare Payors (focusing on Membership, Enrollment, Eligibility,
• Claims, Clinical)
• Hands on experience on AWS Cloud and its Native components like S3, Athena, Redshift &
• Jupyter Notebooks
• Strong in Spark Scala & Python pipelines (ETL & Streaming)
• Strong experience in metadata management tools like AWS Glue
• String experience in coding with languages like Java, Python
• Worked on designing ETL & streaming pipelines in Spark Scala / Python
• Good experience in Requirements gathering, Design & Development
• Working with cross-functional teams to meet strategic goals.
• Experience in high volume data environments
• Critical thinking and excellent verbal and written communication skills
• Strong problem-solving and analytical abilities, should be able to work and delivery
• Good-to-have AWS Developer certified, Scala coding experience, Postman-API and Apache
Airflow or similar schedulers experience
• Nice-to-have experience in healthcare messaging standards like HL7, CCDA, EDI, 834, 835, 837
• Good communication skills
(Hadoop, HDFS, Kafka, Spark, Hive)
Overall Experience - 8 to 12 years
Relevant exp on Big data - 3+ years in above
Salary: Max up-to 20LPA
Job location - Chennai / Bangalore /
Notice Period - Immediate joiner / 15-to-20-day Max
The Responsibilities of The Senior Data Engineer Are:
- Requirements gathering and assessment
- Breakdown complexity and translate requirements to specification artifacts and story boards to build towards, using a test-driven approach
- Engineer scalable data pipelines using big data technologies including but not limited to Hadoop, HDFS, Kafka, HBase, Elastic
- Implement the pipelines using execution frameworks including but not limited to MapReduce, Spark, Hive, using Java/Scala/Python for application design.
- Mentoring juniors in a dynamic team setting
- Manage stakeholders with proactive communication upholding TheDataTeam's brand and values
A Candidate Must Have the Following Skills:
- Strong problem-solving ability
- Excellent software design and implementation ability
- Exposure and commitment to agile methodologies
- Detail oriented with willingness to proactively own software tasks as well as management tasks, and see them to completion with minimal guidance
- Minimum 8 years of experience
- Should have experience in full life-cycle of one big data application
- Strong understanding of various storage formats (ORC/Parquet/Avro)
- Should have hands on experience in one of the Hadoop distributions (Hortoworks/Cloudera/MapR)
- Experience in at least one cloud environment (GCP/AWS/Azure)
- Should be well versed with at least one database (MySQL/Oracle/MongoDB/Postgres)
- Bachelor's in Computer Science, and preferably, a Masters as well - Should have good code review and debugging skills
Additional skills (Good to have):
- Experience in Containerization (docker/Heroku)
- Exposure to microservices
- Exposure to DevOps practices - Experience in Performance tuning of big data applications
Roles and responsibilities:
- Responsible for development and maintenance of applications with technologies involving Enterprise Java and Distributed technologies.
- Experience in Hadoop, Kafka, Spark, Elastic Search, SQL, Kibana, Python, experience w/ machine learning and Analytics etc.
- Collaborate with developers, product manager, business analysts and business users in conceptualizing, estimating and developing new software applications and enhancements..
- Collaborate with QA team to define test cases, metrics, and resolve questions about test results.
- Assist in the design and implementation process for new products, research and create POC for possible solutions.
- Develop components based on business and/or application requirements
- Create unit tests in accordance with team policies & procedures
- Advise, and mentor team members in specialized technical areas as well as fulfill administrative duties as defined by support process
- Work with cross-functional teams during crisis to address and resolve complex incidents and problems in addition to assessment, analysis, and resolution of cross-functional issues.