Big Data Engineer
at Altimetrik
Big Data Engineer: 5+ yrs.
Immediate Joiner
- Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> QuickSight
- Experience in developing serverless functions with AWS Lambda
- Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark
- Should be able to code in Python and Scala.
- Snowflake experience will be a plus
- Hadoop and Hive are good to have; a working understanding is sufficient rather than a hard requirement.
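As a rough illustration of the AWS Lambda skill listed above, here is a minimal handler sketch in plain Python. The event shape (`{"records": [...]}`) and the `transform_record` helper are hypothetical; a real Glue/Athena pipeline would typically receive S3 or Kinesis events instead.

```python
import json

def transform_record(record):
    """Hypothetical transformation: lower-case keys and drop empty values."""
    return {k.lower(): v for k, v in record.items() if v not in (None, "")}

def lambda_handler(event, context):
    """Sketch of an AWS Lambda entry point that cleans incoming records.

    Assumes a simple {"records": [...]} payload for illustration only.
    """
    records = event.get("records", [])
    cleaned = [transform_record(r) for r in records]
    return {"statusCode": 200, "body": json.dumps(cleaned)}
```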
Job Description: Data Engineer
We are looking for a curious Data Engineer to join our extremely fast-growing Tech Team at StanPlus
About RED.Health (Formerly Stanplus Technologies)
Get to know the team:
Join our team and help us build the world’s fastest and most reliable emergency response system using cutting-edge technology.
Because every second counts in an emergency, we are building systems and flows with 4 9s of reliability to ensure that our technology is always there when people need it the most. We are looking for distributed systems experts who can help us perfect the architecture behind our key design principles: scalability, reliability, programmability, and resiliency. Our system features a powerful dispatch engine that connects emergency service providers with patients in real-time.
Key Responsibilities
● Build Data ETL Pipelines
● Develop data set processes
● Apply strong analytic skills to work with unstructured datasets
● Evaluate business needs and objectives
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery
● Interpret trends and patterns
● Work with data and analytics experts to strive for greater functionality in our data system
● Build algorithms and prototypes
● Explore ways to enhance data quality and reliability
● Work with the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
● Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
Key Requirements
● At least 3 years of proven experience as a data engineer, software developer, or in a similar role.
● Bachelor's / Master’s degree in data engineering, big data analytics, computer engineering, or related field.
● Experience with big data tools: Hadoop, Spark, Kafka, etc.
● Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
● Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
● Experience with Azure and AWS cloud services: EC2, EMR, RDS, Redshift
● Experience with BigQuery
● Experience with stream-processing systems: Storm, Spark-Streaming, etc.
● Experience with languages: Python, Java, C++, Scala, SQL, R, etc.
● Good hands-on experience with Hive and Presto.
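The stream-processing requirement above (Storm, Spark Streaming) amounts to windowed aggregation over an unbounded event stream. A dependency-free sketch of a tumbling-window count follows; the window size and `(timestamp, key)` event shape are assumed for illustration, and a real job would use Spark Streaming's window operations instead.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count occurrences per key, similar in spirit to Spark Streaming's
    window/countByValue operations."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each event falls into exactly one fixed-size window.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}
```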
Daily and monthly responsibilities
- Review and coordinate with business application teams on data delivery requirements.
- Develop estimations and proposed delivery schedules in coordination with the development team.
- Develop sourcing and data delivery designs.
- Review data model, metadata and delivery criteria for solution.
- Review and coordinate with team on test criteria and performance of testing.
- Contribute to the design, development and completion of project deliverables.
- Complete in-depth data analysis and contribute to strategic efforts.
- Develop a complete understanding of how we manage data, with a focus on improving how data is sourced and managed across multiple business areas.
Basic Qualifications
- Bachelor’s degree.
- 5+ years of data analysis experience working on business data initiatives.
- Knowledge of Structured Query Language (SQL) and use in data access and analysis.
- Proficiency in data management, including data analysis capability.
- Excellent verbal and written communication skills and high attention to detail.
- Experience with Python.
- Presentation skills in demonstrating system design and data analysis solutions.
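The SQL qualification above is about data access and analysis. A self-contained sketch using Python's built-in sqlite3 module is below; the `sales` table and its columns are invented purely for illustration.

```python
import sqlite3

def top_categories(rows, limit=2):
    """Load (category, amount) rows into an in-memory table and return the
    categories with the highest total amount -- a typical analysis query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (category TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    cur = con.execute(
        "SELECT category, SUM(amount) AS total "
        "FROM sales GROUP BY category ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    result = cur.fetchall()
    con.close()
    return result
```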
Requirements:
● Understanding our data sets and how to bring them together.
● Working with our engineering team to support custom solutions offered to the product development.
● Filling the gap between development, engineering and data ops.
● Creating, maintaining and documenting scripts to support ongoing custom solutions.
● Excellent organizational skills, including attention to precise details
● Strong multitasking skills and ability to work in a fast-paced environment
● 5+ years of experience developing scripts in Python.
● Know your way around RESTful APIs (able to integrate; publishing not necessary).
● You are familiar with pulling and pushing files from SFTP and AWS S3.
● Experience with any Cloud solutions including GCP / AWS / OCI / Azure.
● Familiarity with SQL programming to query and transform data from relational Databases.
● Familiarity with Linux (and a Linux work environment).
● Excellent written and verbal communication skills
● Extracting, transforming, and loading data into internal databases and Hadoop
● Optimizing our new and existing data pipelines for speed and reliability
● Deploying product build and product improvements
● Documenting and managing multiple repositories of code
● Experience with SQL and NoSQL databases (Cassandra, MySQL)
● Hands-on experience in data pipelining and ETL (any of these frameworks/tools: Hadoop, BigQuery, Redshift, Athena)
● Hands-on experience in Airflow
● Understanding of best practices and common coding patterns around storing, partitioning, warehousing and indexing of data
● Experience in reading data from Kafka topics (both live stream and offline)
● Experience in PySpark and DataFrames
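The Kafka requirement above boils down to consuming messages and tolerating malformed payloads. A plain-Python sketch of that parsing loop follows; in production it would sit behind an actual consumer (e.g. kafka-python) or a Spark Structured Streaming readStream, and the message shape (`user_id`, `event`) is assumed for illustration.

```python
import json

def process_messages(raw_messages):
    """Parse JSON messages as they might arrive from a Kafka topic,
    keeping only well-formed records and counting the rest."""
    rows, bad = [], 0
    for raw in raw_messages:
        try:
            msg = json.loads(raw)
            rows.append({"user_id": msg["user_id"], "event": msg["event"]})
        except (json.JSONDecodeError, KeyError):
            bad += 1  # malformed payloads are counted, not crashed on
    return rows, bad
```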
Responsibilities:
You’ll be:
● Collaborating across an agile team to continuously design, iterate, and develop big data systems.
● Extracting, transforming, and loading data into internal databases.
● Optimizing our new and existing data pipelines for speed and reliability.
● Deploying new products and product improvements.
● Documenting and managing multiple repositories of code.
- Play a critical role as a member of the leadership team in shaping and supporting our overall company vision, day-to-day operations, and culture.
- Set the technical vision and build the technical product roadmap from launch to scale; including defining long-term goals and strategies
- Define best practices around coding methodologies, software development, and quality assurance
- Define innovative technical requirements and systems while balancing time, feasibility, cost and customer experience
- Build and support production products
- Ensure our internal processes and services comply with privacy and security regulations
- Establish a high performing, inclusive engineering culture focused on innovation, execution, growth and development
- Set a high bar for our overall engineering practices in support of our mission and goals
- Develop goals, roadmaps and delivery dates to help us scale quickly and sustainably
- Collaborate closely with Product, Business, Marketing and Data Science
- Experience with financial and transactional systems
- Experience engineering for large volumes of data at scale
- Experience with financial audit and compliance is a plus
- Experience building successful consumer-facing web and mobile apps at scale
BRIEF DESCRIPTION:
At least 1 year of Python, Spark, SQL, and data engineering experience
Primary Skillset: PySpark, Scala/Python/Spark, Azure Synapse, S3, RedShift/Snowflake
Relevant Experience: Migration of legacy ETL jobs to AWS Glue using Python and Spark
ROLE SCOPE:
Reverse engineer the existing/legacy ETL jobs
Create the workflow diagrams and review the logic diagrams with Tech Leads
Write equivalent logic in Python & Spark
Unit test the Glue jobs and certify the data loads before passing to system testing
Follow best practices and enable appropriate audit & control mechanisms
Be analytically skillful; identify root causes quickly and debug issues efficiently
Take ownership of the deliverables and support the deployments
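The migration flow above (reverse-engineer the legacy job, reimplement in Python & Spark, unit-test before system testing) can be sketched as a pure-Python transformation with a unit check. The field names and the legacy rule being ported are entirely hypothetical.

```python
def migrate_row(row):
    """Hypothetical port of a legacy ETL rule: trim strings, upper-case the
    country code, and default a missing status -- the kind of logic that
    would be certified with unit tests before a Glue job ships."""
    return {
        "name": row["name"].strip(),
        "country": row["country"].strip().upper(),
        "status": row.get("status") or "UNKNOWN",
    }

def test_migrate_row():
    # Unit-level certification of the ported logic before system testing.
    out = migrate_row({"name": " Ada ", "country": "in", "status": None})
    assert out == {"name": "Ada", "country": "IN", "status": "UNKNOWN"}
```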
REQUIREMENTS:
Create data pipelines for data integration into cloud stacks, e.g. Azure Synapse
Code data processing jobs in Azure Synapse Analytics, Python, and Spark
Experience in dealing with structured, semi-structured, and unstructured data in batch and real-time environments.
Should be able to process .json, .parquet and .avro files
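Of the three file formats above, only newline-delimited JSON can be handled with the standard library alone; a minimal parsing sketch is below. For .parquet and .avro the analogous step would typically use pyarrow or fastavro (or `spark.read.parquet` / `spark.read.format("avro")` in Spark), which are not shown here.

```python
import json

def parse_json_lines(lines):
    """Parse newline-delimited JSON records, skipping blank lines.
    A stand-in for the ingestion step of a batch pipeline; parquet/avro
    equivalents would need pyarrow or fastavro."""
    return [json.loads(line) for line in lines if line.strip()]
```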
PREFERRED BACKGROUND:
Tier-1/2 candidates from IITs/NITs/IIITs preferred; however, relevant experience and a learning attitude take precedence
● Understand, apply and extend state-of-the-art NLP research to better serve our customers.
● Work closely with engineering, product, and customers to scientifically frame the business problems and come up with the underlying AI models.
● Design, implement, test, deploy, and maintain innovative data and machine learning solutions to accelerate our business.
● Think creatively to identify new opportunities and contribute to high-quality publications or patents.
Desired Qualifications and Experience
● At least 1 year of professional experience.
● Bachelors in Computer Science or related fields from the top colleges.
● Extensive knowledge and practical experience in one or more of the following areas: machine learning, deep learning, NLP, recommendation systems, information retrieval.
● Experience applying ML to solve complex business problems from scratch.
● Experience with Python and a deep learning framework like PyTorch/TensorFlow.
● Awareness of the state of the art research in the NLP community.
● Excellent verbal and written communication and presentation skills.
As an experienced Data Scientist you’ll join a team of data scientists, analysts, and software engineers working to push the boundaries of data science in health care. We like to experiment, iterate, and innovate with technology, from developing new algorithms specific to health care’s challenges, to bringing the latest machine learning practices and applications developed in other industries into the health care world. We know that algorithms are only valuable when powered by the right data, so we focus on fully understanding the problems we need to solve, and truly understanding the data behind them before launching into solutions – ensuring that the solutions we do land on are impactful and powerful.
Essential functions
• Research, conceptualize, and implement analytical approaches and predictive modeling to evaluate scenarios, predict utilization and clinical outcomes, and recommend actions to impact results.
• Manage and execute the entire model development process, including scope definition, hypothesis formation, data cleaning and preparation, feature selection, model implementation in production, and validation and iteration, using multiple data sources.
• Provide guidance on necessary data and software infrastructure capabilities to deliver a scalable solution across partners and support the implementation of the team’s algorithms and models.
• Contribute to development and publication in major journals and conferences, showcasing leadership in healthcare data science.
• Work closely and collaborate with Data Scientists, Machine Learning Engineers, IT teams, and business stakeholders spread across various locations in the US and India to achieve business goals.
• Provide guidance to other Data Scientists and Machine Learning Engineers.
Job Description:
Roles & Responsibilities:
· You will be involved in every part of the project lifecycle, right from identifying the business problem and proposing a solution, to data collection, cleaning, and preprocessing, to training and optimizing ML/DL models and deploying them to production.
· You will often be required to design and execute proof-of-concept projects that can demonstrate business value and build confidence with CloudMoyo’s clients.
· You will be involved in designing and delivering data visualizations that utilize the ML models to generate insights and intuitively deliver business value to CXOs.
Desired Skill Set:
· Candidates should have strong Python coding skills and be comfortable working with various ML/DL frameworks and libraries.
· Hands-on skills and industry experience in one or more of the following areas is necessary:
1) Deep Learning (CNNs/RNNs, Reinforcement Learning, VAEs/GANs)
2) Machine Learning (Regression, Random Forests, SVMs, K-means, ensemble methods)
3) Natural Language Processing
4) Graph Databases (Neo4j, Apache Giraph)
5) Azure Bot Service
6) Azure ML Studio / Azure Cognitive Services
7) Log Analytics with NLP/ML/DL
· Previous experience with data visualization, C# or Azure Cloud platform and services will be a plus.
· Candidates should have excellent communication skills and be highly technical, with the ability to discuss ideas at any level from executive to developer.
· Creative problem-solving, unconventional approaches, and a hacker mindset are highly desired.
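Of the classical ML methods in the skill list above, K-means is compact enough to sketch without any framework. The following is a dependency-free illustration with caller-supplied initial centroids for determinism; real work would use scikit-learn's KMeans or an equivalent library implementation.

```python
def kmeans(points, centroids, iters=10):
    """Plain-Python K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster. `points` and
    `centroids` are lists of equal-length tuples."""
    def dist2(p, q):
        # Squared Euclidean distance (square root not needed for argmin).
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            best = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[best].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```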