- Design, construct, install, test and maintain data pipeline and data management systems.
- Ensure that all systems meet the business/company requirements as well as industry practices.
- Integrate up-and-coming data management and software engineering technologies into existing data structures.
- Processes for data mining, data modeling, and data production.
- Create custom software components and analytics applications.
- Collaborate with members of your team (eg, Data Architects, the Software team, Data Scientists) on the project's goals.
- Recommend different ways to constantly improve data reliability and quality.
- Experience in a related field with real-world skills and testimonials from former employees.
- Familiar with data warehouses like Redshift, Bigquery and Athena.
- Familiar with data processing systems like flink, spark and storm. Develop set
- Proficiency in Python and SQL. Possible work experience and proof of technical expertise.
- You may also consider a Master's degree in computer engineering or science in order to fine-tune your skills while on the job. (Although a Master's isn't required, it is always appreciated).
- Intellectual curiosity to find new and unusual ways of how to solve data management issues.
- Ability to approach data organization challenges while keeping an eye on what's important.
- Minimal data science knowledge is a Must, should understand a bit of analytics.
We were founded in 2012 by Tushar Vashisht and Sachin Shenoy, and incubated by Microsoft Accelerator.
Today, we happen to be India's largest and most loved health & fitness app with over 4 million users from
220+ cities in India. What makes us unique is our ability to bring together the power of artificial
intelligence powered technology and human empathy to deliver measurable impact in our customers'
lives. We do this through our team of elite nutritionists & trainers working together with the world's first
AI powered virtual nutritionist - "Ria", our proudest creation till date. Ria references data from over 200
million food & workout logs and 14 million conversations to deliver intelligent health & fitness suggestions
to our customers. Ria also happens to be multi-lingual, "she" understands English, French, German, Italian
Recently Russia's Sistema and Samsung's AI focussed fund - NEXT, led a USD 12 Million Series B funding
into our business. We are the most liked app in India across categories, we've been consistently rated as
the no:1 health & fitness app on playstore for 3 years running and received Google's "editor's choice
award" in 2017. Some of the marquee corporates in the country such as Cognizant, Accenture, Deloitte,
Metlife amongst others have also benefited from our employee engagement and wellness programs. Our
global aspirations have taken us to MENA, SEA and LATAM regions with more markets to follow.
Company website www.healthifyme.com
- Designing and implementing fine-tuned production ready data/ML pipelines in Hadoop platform.
- Driving optimization, testing and tooling to improve quality.
- Reviewing and approving high level & amp; detailed design to ensure that the solution delivers to the business needs and aligns to the data & analytics architecture principles and roadmap.
- Understanding business requirements and solution design to develop and implement solutions that adhere to big data architectural guidelines and address business requirements.
- Following proper SDLC (Code review, sprint process).
- Identifying, designing, and implementing internal process improvements: automating manual processes, optimizing data delivery, etc.
- Building robust and scalable data infrastructure (both batch processing and real-time) to support needs from internal and external users.
- Understanding various data security standards and using secure data security tools to apply and adhere to the required data controls for user access in the Hadoop platform.
- Supporting and contributing to development guidelines and standards for data ingestion.
- Working with a data scientist and business analytics team to assist in data ingestion and data related technical issues.
- Designing and documenting the development & deployment flow.
- Experience in developing rest API services using one of the Scala frameworks.
- Ability to troubleshoot and optimize complex queries on the Spark platform
- Expert in building and optimizing ‘big data’ data/ML pipelines, architectures and data sets.
- Knowledge in modelling unstructured to structured data design.
- Experience in Big Data access and storage techniques.
- Experience in doing cost estimation based on the design and development.
- Excellent debugging skills for the technical stack mentioned above which even includes analyzing server logs and application logs.
- Highly organized, self-motivated, proactive, and ability to propose best design solutions.
- Good time management and multitasking skills to work to deadlines by working independently and as a part of a team.
The fastest rising startup in the EdTech space, focussed on Engineering and Government Job Exams and with an eye to capture UPSC, PSC, and international exams. Testbook is poised to revolutionize the industry. With a registered user base of over 2.2 Crore students, more than 450 crore questions solved on the WebApp, and a knockout Android App. Testbook has raced to the front and is ideally placed to capture bigger markets.
Testbook is the perfect incubator for talent. You come, you learn, you conquer. You train under the best mentors and become an expert in your field in your own right. That being said, the flexibility in the projects you choose, how and when you work on them, what you want to add to them is respected in this startup. You are the sole master of your work.
The IIT pedigree of the co-founders has attracted some of the brightest minds in the country to Testbook. A team that is quickly swelling in ranks, it now stands at 500+ in-house employees and hundreds of remote interns and freelancers. And the number is rocketing weekly. Now is the time to join the force.
In this role you will get to:-
- Work with state-of-the-art data frameworks and technologies like Dataflow(Apache Beam), Dataproc(Apache Spark & Hadoop), Apache Kafka, Google PubSub, Apache Airflow, and others.
- You will work cross-functionally with various teams, creating solutions that deal with large volumes of data.
- You will work with the team to set and maintain standards and development practices.
- You will be a keen advocate of quality and continuous improvement.
- You will modernize the current data systems to develop Cloud-enabled Data and Analytics solutions
- Drive the development of cloud-based data lake, hybrid data warehouses & business intelligence platforms
- Improve upon the data ingestion models, ETL jobs, and alerts to maintain data integrity and data availability
- Build Data Pipelines to ingest structured and Unstructured Data.
- Gain hands-on experience with new data platforms and programming languages
- Analyze and provide data-supported recommendations to improve product performance and customer acquisition
- Design, Build and Support resilient production-grade applications and web services
Who you are:-
- 1+ years of work experience in Software Engineering and development.
- Very strong understanding of Python & pandas library.Good understanding of Scala, R, and other related languages
- Experience with data transformation & data analytics in both batch & streaming mode using cloud-native technologies.
- Strong experience with the big data technologies like Hadoop, Spark, BigQuery, DataProc, Dataflow
- Strong analytical and communication skills.
- Experience working with large, disconnected, and/or unstructured datasets.
- Experience building and optimizing data pipelines, architectures, and data sets using cloud-native technologies.
- Hands-on experience with any cloud tech like GCP/AWS is a plus.
GCP Data Analyst profile must have below skills sets :
- Knowledge of programming languages like https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.simplilearn.com%2Ftutorials%2Fsql-tutorial%2Fhow-to-become-sql-developer&data=05%7C01%7Ca_anjali%40hcl.com%7C4ae720b3f3cc45c3e04608da3346b335%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637878675987971859%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=EImfaJAD1KHOyrBQ7FkbaPl1STtfnf4QdQlbjw72%2BmE%3D&reserved=0" target="_blank">SQL, Oracle, R, MATLAB, Java and https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.simplilearn.com%2Fwhy-learn-python-a-guide-to-unlock-your-python-career-article&data=05%7C01%7Ca_anjali%40hcl.com%7C4ae720b3f3cc45c3e04608da3346b335%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637878675987971859%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Z2n1Xy%2F3YN6nQqSweU5T7EfUTa1kPAAjbCMTWxDCh%2FY%3D&reserved=0" target="_blank">Python
- Data cleansing, data visualization, data wrangling
- Data modeling , data warehouse concepts
- Adapt to Big data platform like Hadoop, Spark for stream & batch processing
- GCP (Cloud Dataproc, Cloud Dataflow, Cloud Datalab, Cloud Dataprep, BigQuery, Cloud Datastore, Cloud Datafusion, Auto ML etc)
- Creating, designing and developing data models
- Prepare plans for all ETL (Extract/Transformation/Load) procedures and architectures
- Validating results and creating business reports
- Monitoring and tuning data loads and queries
- Develop and prepare a schedule for a new data warehouse
- Analyze large databases and recommend appropriate optimization for the same
- Administer all requirements and design various functional specifications for data
- Provide support to the Software Development Life cycle
- Prepare various code designs and ensure efficient implementation of the same
- Evaluate all codes and ensure the quality of all project deliverables
- Monitor data warehouse work and provide subject matter expertise
- Hands-on BI practices, data structures, data modeling, SQL skills
5 Years - 10 Years
Must have Skills: SQL
Hard Skills for a Data Warehouse Developer:
Soft Skills for Data Warehouse Developers:
- Strong Python Coding skills and OOP skills
- Should have worked on Big Data product Architecture
- Should have worked with any one of the SQL-based databases like MySQL, PostgreSQL and any one of
- NoSQL-based databases such as Cassandra, Elasticsearch etc.
- Hands on experience on frameworks like Spark RDD, DataFrame, Dataset
- Experience on development of ETL for data product
- Candidate should have working knowledge on performance optimization, optimal resource utilization, Parallelism and tuning of spark jobs
- Working knowledge on file formats: CSV, JSON, XML, PARQUET, ORC, AVRO
- Good to have working knowledge with any one of the Analytical Databases like Druid, MongoDB, Apache Hive etc.
- Experience to handle real-time data feeds (good to have working knowledge on Apache Kafka or similar tool)
- Python and Scala (Optional), Spark / PySpark, Parallel programming
- Architect, Design and Build high performance Search systems for personalization, optimization, and targeting
- Designing systems with Solr, Akka, Cassandra, Kafka
- Algorithmic development with primary focus Machine Learning
- Working with rapid and innovative development methodologies like: Kanban, Continuous Integration and Daily deployments
- Participation in design and code reviews and recommend improvements
- Unit testing with JUnit, Performance testing and tuning
- Coordination with internal and external teams
- Mentoring junior engineers
- Participate in Product roadmap and Prioritization discussions and decisions
- Evangelize the solution with Professional services and Customer Success teams
What's the role?
Your role as a Principal Engineer will involve working with various team. As a principal engineer, will need full knowledge of the software development lifecycle and Agile methodologies. You will demonstrate multi-tasking skills under tight deadlines and constraints. You will regularly contribute to the development of work products (including analyzing, designing, programming, debugging, and documenting software) and may work with customers to resolve challenges and respond to suggestions for improvements and enhancements. You will setup the standard and principal for the product he/she drives.
- Setup coding practice, guidelines & quality of the software delivered.
- Determines operational feasibility by evaluating analysis, problem definition, requirements, solution development, and proposed solutions.
- Documents and demonstrates solutions by developing documentation, flowcharts, layouts, diagrams, charts, code comments and clear code.
- Prepares and installs solutions by determining and designing system specifications, standards, and programming.
- Improves operations by conducting systems analysis; recommending changes in policies and procedures.
- Updates job knowledge by studying state-of-the-art development tools, programming techniques, and computing equipment; participating in educational opportunities; reading professional publications; maintaining personal networks; participating in professional organizations.
- Protects operations by keeping information confidential.
- Develops software solutions by studying information needs; conferring with users; studying systems flow, data usage, and work processes; investigating problem areas; following the software development lifecycle. Who are you? You are a go-getter, with an eye for detail, strong problem-solving and debugging skills, and having a degree in BE/MCA/M.E./ M Tech degree or equivalent degree from reputed college/university.
Essential Skills / Experience:
- 10+ years of engineering experience
- Experience in designing and developing high volume web-services using API protocols and data formats
- Proficient in API modelling languages and annotation
- Proficient in Java programming
- Experience with Scala programming
- Experience with ETL systems
- Experience with Agile methodologies
- Experience with Cloud service & storage
- Proficient in Unix/Linux operating systems
- Excellent oral and written communication skills Preferred:
- Functional programming languages (Scala, etc)
- Scripting languages (bash, Perl, Python, etc)
- Amazon Web Services (Redshift, ECS etc)
Pipelines should be optimised to handle both real time data, batch update data and historical data.
Establish scalable, efficient, automated processes for complex, large scale data analysis.
Write high quality code to gather and manage large data sets (both real time and batch data) from multiple sources, perform ETL and store it in a data warehouse.
Manipulate and analyse complex, high-volume, high-dimensional data from varying sources using a variety of tools and data analysis techniques.
Participate in data pipelines health monitoring and performance optimisations as well as quality documentation.
Interact with end users/clients and translate business language into technical requirements.
Acts independently to expose and resolve problems.
Job Requirements :-
2+ years experience working in software development & data pipeline development for enterprise analytics.
2+ years of working with Python with exposure to various warehousing tools
In-depth working with any of commercial tools like AWS Glue, Ta-lend, Informatica, Data-stage, etc.
Experience with various relational databases like MySQL, MSSql, Oracle etc. is a must.
Experience with analytics and reporting tools (Tableau, Power BI, SSRS, SSAS).
Experience in various DevOps practices helping the client to deploy and scale the systems as per requirement.
Strong verbal and written communication skills with other developers and business client.
Knowledge of Logistics and/or Transportation Domain is a plus.
Hands-on with traditional databases and ERP systems like Sybase and People-soft.