WHAT YOU WILL DO:
● Create and maintain optimal data pipeline architecture.
● Assemble large, complex data sets that meet functional and non-functional business requirements.
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
● Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using Spark, Hadoop, and AWS 'big data' technologies (EC2, EMR, S3, Athena).
● Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
● Work with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
● Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
● Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
● Work with data and analytics experts to strive for greater functionality in our data systems.
REQUIRED SKILLS & QUALIFICATIONS:
● 5+ years of experience in a Data Engineer role.
● Advanced working SQL knowledge and experience working with relational databases and query authoring (SQL), as well as working familiarity with a variety of databases.
● Experience building and optimizing 'big data' data pipelines, architectures, and data sets.
● Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
● Strong analytic skills related to working with unstructured datasets.
● Experience building processes that support data transformation, data structures, metadata, dependency, and workload management.
● A successful history of manipulating, processing, and extracting value from large disconnected datasets.
● Working knowledge of message queuing, stream processing, and highly scalable 'big data' data stores.
● Strong project management and organizational skills.
● Experience supporting and working with cross-functional teams in a dynamic environment.
● Experience with big data tools: Hadoop, Spark, Pig, Vertica, etc.
● Experience with AWS cloud services: EC2, EMR, S3, Athena.
● Experience with Linux.
● Experience with object-oriented/object function scripting languages: Python, Java, Shell, Scala, etc.
PREFERRED SKILLS & QUALIFICATIONS:
● Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field.
About Astegic
delivered.
• You will utilize your configuration management and software release experience, as well as change management concepts, to drive the success of the projects.
• You will partner with senior leaders to understand and communicate the business needs and translate them into IT requirements, and consult with the customer's Business Analysts on their data warehouse requirements.
• You will assist the technical team in identification and resolution of Data Quality issues.
• You will manage small to medium-sized projects relating to the delivery of applications or
application changes.
• You will use Managed Services or 3rd party resources to meet application support requirements.
• You will interface daily with multi-functional team members within the EDW team and across the
enterprise to resolve issues.
• Recommend and advocate different approaches and designs to meet the requirements.
• Write technical design docs.
• Perform data modelling.
• Provide solution inputs for the presentation layer.
• You will craft and generate summary, statistical, and presentation reports, and provide reporting and metrics for strategic initiatives.
• Perform miscellaneous job-related duties as assigned.
Preferred Qualifications
• Strong interpersonal, teamwork, organizational and workload planning skills
• Strong analytical, evaluative, and problem-solving abilities as well as exceptional customer service orientation
• Ability to drive clarity of purpose and goals during release and planning activities
• Excellent organizational skills including ability to prioritize tasks efficiently with high level of attention to detail
• Excited by the opportunity to continually improve processes within a large company
• Healthcare or automobile industry background.
• Familiarity with major big data solutions and products available in the market.
• Proven ability to drive continuous
5-7 years of experience in Data Engineering, with solid experience in the design, development, and implementation of end-to-end data ingestion and data processing systems on the AWS platform.
2-3 years of experience in AWS Glue, Lambda, AppFlow, EventBridge, Python, PySpark, Lake House, S3, Redshift, Postgres, API Gateway, CloudFormation, Kinesis, Athena, KMS, IAM.
Experience in modern data architecture, Lake House, Enterprise Data Lake, Data Warehouse, API interfaces, solution patterns, standards, and optimizing data ingestion.
Experience building data pipelines from source systems like SAP Concur, Veeva Vault, Azure Cost, various social media platforms, or similar source systems.
Expertise in analyzing source data and designing a robust and scalable data ingestion framework and pipelines adhering to client Enterprise Data Architecture guidelines.
Proficient in design and development of solutions for real-time (or near real time) stream data processing as well as batch processing on the AWS platform.
Work closely with business analysts, data architects, data engineers, and data analysts to ensure that the data ingestion solutions meet the needs of the business.
Troubleshoot and provide support for issues related to data quality and data ingestion solutions. This may involve debugging data pipeline processes, optimizing queries, or troubleshooting application performance issues.
Experience working with Agile/Scrum methodologies, CI/CD tools and practices, coding standards, code reviews, source management (GitHub), JIRA, JIRA Xray, and Confluence.
Experience or exposure to design and development using Full Stack tools.
Strong analytical and problem-solving skills, excellent communication (written and oral), and interpersonal skills.
Bachelor's or master's degree in computer science or related field.
- Creating and managing ETL/ELT pipelines based on requirements
- Build Power BI dashboards and manage the datasets they need.
- Work with stakeholders to identify the data structures needed in the future and perform any transformations, including aggregations.
- Build data cubes for real-time visualisation needs and CXO dashboards.
Required Tech Skills
- Microsoft Power BI & DAX
- Python, Pandas, PyArrow, Jupyter Notebooks, Apache Spark
- Azure Synapse, Azure Databricks, Azure HDInsight, Azure Data Factory
Senior Data Engineer
Responsibilities:
● Clean, prepare and optimize data at scale for ingestion and consumption by machine learning models
● Drive the implementation of new data management projects and re-structure of the current data architecture
● Implement complex automated workflows and routines using workflow scheduling tools
● Build continuous integration, test-driven development and production deployment frameworks
● Drive collaborative reviews of design, code, test plans and dataset implementation performed by other data engineers in support of maintaining data engineering standards
● Anticipate, identify and solve issues concerning data management to improve data quality
● Design and build reusable components, frameworks and libraries at scale to support machine learning products
● Design and implement product features in collaboration with business and Technology stakeholders
● Analyze and profile data for the purpose of designing scalable solutions
● Troubleshoot complex data issues and perform root cause analysis to proactively resolve product and operational issues
● Mentor and develop other data engineers in adopting best practices
● Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders
Qualifications:
● 8+ years of experience developing scalable Big Data applications or solutions on distributed platforms
● Experience with Google Cloud Platform (GCP); experience with other cloud platform tools is good to have
● Experience working with Data warehousing tools, including DynamoDB, SQL, and Snowflake
● Experience architecting data products in Streaming, Serverless and Microservices Architecture and platform.
● Experience with Spark (Scala/Python/Java) and Kafka
● Work experience using Databricks (Data Engineering and Delta Lake components)
● Experience working with Big Data platforms, including Dataproc, Databricks, etc.
● Experience working with distributed technology tools including Spark, Presto, Databricks, Airflow
● Working knowledge of Data warehousing, Data modeling
● Experience working in Agile and Scrum development process
● Bachelor's degree in Computer Science, Information Systems, Business, or other relevant subject area
Role: Senior Data Engineer
Total No. of Years: 8+ years of relevant experience
To be onboarded by: Immediate
Notice Period:
Skills | Mandatory / Desirable | Min years (Project Exp) | Max years (Project Exp)
GCP Exposure | Mandatory | 3 | 7
BigQuery, Dataflow, Dataproc, AI Building Blocks, Looker, Cloud Data Fusion, Dataprep, Spark and PySpark | Mandatory | 5 | 9
Relational SQL | Mandatory | 4 | 8
Shell scripting language | Mandatory | 4 | 8
Python / Scala language | Mandatory | 4 | 8
Airflow/Kubeflow workflow scheduling tool | Mandatory | 3 | 7
Kubernetes | Desirable | 1 | 6
Scala | Mandatory | 2 | 6
Databricks | Desirable | 1 | 6
Google Cloud Functions | Mandatory | 2 | 6
GitHub source control tool | Mandatory | 4 | 8
Machine Learning | Desirable | 1 | 6
Deep Learning | Desirable | 1 | 6
Data structures and algorithms | Mandatory | 4 | 8
JOB DESCRIPTION
Product Analyst
About Us:-
"Slack for Construction"
Early-stage startup co-founded by IIT Roorkee alumni and backed by industry experts. We are building a mobile-based operating system for managing construction and architectural projects. Today, material details and all other project information are shared over WhatsApp; our mobile app brings all of this into one place, almost like a Slack for construction. The product is a mobile app plus a SaaS platform for administering and managing the process, with 150,000 users and subscription-based pricing. It helps construction project owners and contractors track on-site progress in real time to finish projects on time and within budget. We aim to bring the speed of software development to infrastructure development, and we are on a mission to help construction, the second largest industry in India, make the transition from pen and paper to digital.
About the team
As a productivity app startup, we value productivity and ownership most. That helps raise our own bar and the bar of the people we hire. We follow agile and scrum approaches for product development and use best-in-class tools and practices. Measuring our progress on a weekly basis and iterating fast enables us to build breakthrough modules and features rapidly. If you join us, you will be constantly thrown into challenging situations. The decisions you make will directly impact our clients and sales. That's how we learn.
Techstack -
- Prior experience in any data-driven decision-making field.
- Working knowledge of querying data using SQL.
- Familiarity with customer and business data analytics tools like Segment, Mixpanel, Google Analytics, Smartlook, etc.
- Familiarity with data visualisation tools like Tableau, Power BI, etc.
Responsibility -
"All things data"
- Ability to synthesize complex data into actionable goals.
- Critical thinking skills to recommend original and productive ideas
- Ability to visualise user stories and create user funnels
- Perform user test sessions and market surveys to inform product development teams
- Excellent writing skills to prepare detailed product specification and analytic reports
- Help define Product strategy / Roadmaps with scalable architecture
- Interpersonal skills to work collaboratively with various stakeholders who may have competing interests
Job Description
We are looking for an experienced engineer with superb technical skills. You will primarily be responsible for architecting and building large-scale data pipelines that deliver AI and analytical solutions to our customers. The right candidate will enthusiastically take ownership of developing and managing continuously improving, robust, scalable software solutions.
Although your primary responsibilities will be around back-end work, we prize individuals who are willing to step in and contribute to other areas including automation, tooling, and management applications. Experience with, or a desire to learn, Machine Learning is a plus.
Skills
- Bachelor's/Master's/PhD in CS or equivalent industry experience
- Demonstrated expertise in building and shipping cloud-native applications
- 5+ years of industry experience in administering (including setting up, managing, and monitoring) data processing pipelines (both streaming and batch) using frameworks such as Kafka Streams and PySpark, and streaming databases such as Druid or equivalents such as Hive
- Strong industry expertise with containerization technologies including Kubernetes (EKS/AKS) and Kubeflow
- Experience with cloud platform services such as AWS, Azure, or GCP, especially with EKS and Managed Kafka
- 5+ years of industry experience in Python
- Experience with popular modern web frameworks such as Spring Boot, Play Framework, or Django
- Experience with scripting languages. Python experience highly desirable. Experience in API development using Swagger
- Implementing automated testing platforms and unit tests
- Proficient understanding of code versioning tools, such as Git
- Familiarity with continuous integration, Jenkins
Responsibilities
- Architect, design, and implement large-scale data processing pipelines using Kafka Streams, PySpark, Fluentd, and Druid
- Create custom Operators for Kubernetes, Kubeflow
- Develop data ingestion processes and ETLs
- Assist in DevOps operations
- Design and implement APIs
- Identify performance bottlenecks and bugs, and devise solutions to these problems
- Help maintain code quality, organization, and documentation
- Communicate with stakeholders regarding various aspects of the solution
- Mentor team members on best practices
- We are looking for a Data Engineer with 3-5 years of experience in Python, SQL, AWS (EC2, S3, Elastic Beanstalk, API Gateway), and Java.
- The applicant must be able to perform Data Mapping (data type conversion, schema harmonization) using Python, SQL, and Java.
- The applicant must be familiar with and have programmed ETL interfaces (OAuth, REST API, ODBC) using the same languages.
- The company is looking for someone who shows an eagerness to learn and who asks concise questions when communicating with teammates.
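For candidates wondering what "Data Mapping (data type conversion, schema harmonization)" can look like in practice, here is a minimal, illustrative Python sketch. Every name in it (the source systems `crm` and `billing`, the column names, and the target fields) is a hypothetical assumption made up for this example and does not describe the company's actual systems.

```python
from datetime import date, datetime

# Hypothetical target schema: target field name -> converter for its type.
TARGET_SCHEMA = {
    "customer_id": int,
    "signup_date": lambda v: datetime.strptime(v, "%Y-%m-%d").date(),
    "lifetime_value": float,
}

# Hypothetical per-source column names mapped onto the target field names.
COLUMN_ALIASES = {
    "crm": {"CustID": "customer_id", "SignupDt": "signup_date", "LTV": "lifetime_value"},
    "billing": {"customer": "customer_id", "created": "signup_date", "total_spend": "lifetime_value"},
}


def harmonize(record, source):
    """Rename a source record's columns to the target schema and convert types."""
    aliases = COLUMN_ALIASES[source]
    # Schema harmonization: keep only known columns, under their target names.
    renamed = {aliases[k]: v for k, v in record.items() if k in aliases}
    out = {}
    for field, convert in TARGET_SCHEMA.items():
        raw = renamed.get(field)
        # Data type conversion: missing or empty values become None.
        out[field] = convert(raw) if raw not in (None, "") else None
    return out


crm_row = harmonize({"CustID": "42", "SignupDt": "2023-01-15", "LTV": "199.5"}, "crm")
billing_row = harmonize({"customer": 7, "created": "2022-12-01"}, "billing")
```

Both rows end up with the same three fields and types regardless of which source they came from, which is the point of harmonizing before loading.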