Data Engineer
- Data Engineer
Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting language - Python & pyspark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
- Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge one of language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence other related tools
About consulting & implementation services in the area of Oil & Gas, Mining and Manufacturing Industry
Similar jobs
About the company:
VakilSearch is a technology-driven platform, offering services that cover the legal needs of startups and established businesses. Some of our services include incorporation, government registrations & filings, accounting, documentation and annual compliances. In addition, we offer a wide range of services to individuals, such as property agreements and tax filings. Our mission is to provide one-click access to individuals and businesses for all their legal and professional needs.
You can learn more about us at https://vakilsearch.com/">vakilsearch.com.
About the role:
A successful data analyst needs to have a combination of technical as well leadership skills. A background in Mathematics, Statistics, Computer Science, Information Management can serve as a solid foundation to build your career as a data analyst at VakilSearch.
Why to join Vakilsearch:
- Unlimited opportunities to grow
- Flat hierarchy
- Encouraging environment to unleash your out of box thinking skills
Responsibilities:
- Preparing reports for the stakeholders and the management, enabling them to take important decisions based on various facts and trends.
- Using automated tools to extract data from primary and secondary sources
- Identify and recommend the right product metrics to be analysed and tracked for every feature/problem statement.
- Using statistical tools to identify, analyze, and interpret patterns and trends in complex data sets that could be helpful for the diagnosis and prediction
- Working with programmers, engineers, and management heads to identify process improvement opportunities, propose system modifications, and devise data governance strategies.
Required skills:
- Bachelor’s degree from an accredited university or college in computer science or graduate from data science related program
- Minimum of 0 - 2 years experience in analysing
Data Engineer JD:
- Designing, developing, constructing, installing, testing and maintaining the complete data management & processing systems.
- Building highly scalable, robust, fault-tolerant, & secure user data platform adhering to data protection laws.
- Taking care of the complete ETL (Extract, Transform & Load) process.
- Ensuring architecture is planned in such a way that it meets all the business requirements.
- Exploring new ways of using existing data, to provide more insights out of it.
- Proposing ways to improve data quality, reliability & efficiency of the whole system.
- Creating data models to reduce system complexity and hence increase efficiency & reduce cost.
- Introducing new data management tools & technologies into the existing system to make it more efficient.
- Setting up monitoring and alarming on data pipeline jobs to detect failures and anomalies
What do we expect from you?
- BS/MS in Computer Science or equivalent experience
- 5 years of recent experience in Big Data Engineering.
- Good experience in working with Hadoop and Big Data technologies like HDFS, Pig, Hive, Zookeeper, Storm, Spark, Airflow and NoSQL systems
- Excellent programming and debugging skills in Java or Python.
- Apache spark, python, hands on experience in deploying ML models
- Has worked on streaming and realtime pipelines
- Experience with Apache Kafka or has worked with any of Spark Streaming, Flume or Storm
Focus Area:
R1 |
Data structure & Algorithms |
R2 |
Problem solving + Coding |
R3 |
Design (LLD) |
2. hands on experience using python, sql, tablaue
3. Data Analyst
About Amagi & Growth
Amagi Corporation is a next-generation media technology company that provides cloud broadcast and targeted advertising solutions to broadcast TV and streaming TV platforms. Amagi enables content owners to launch, distribute and monetize live linear channels on Free-Ad-Supported TV and video services platforms. Amagi also offers 24x7 cloud managed services bringing simplicity, advanced automation, and transparency to the entire broadcast operations. Overall, Amagi supports 500+ channels on its platform for linear channel creation, distribution, and monetization with deployments in over 40 countries. Amagi has offices in New York (Corporate office), Los Angeles, and London, broadcast operations in New Delhi, and our Development & Innovation center in Bangalore. Amagi is also expanding in Singapore, Canada and other countries.
Amagi has seen phenomenal growth as a global organization over the last 3 years. Amagi has been a profitable firm for the last 2 years, and is now looking at investing in multiple new areas. Amagi has been backed by 4 investors - Emerald, Premji Invest, Nadathur and Mayfield. As of the fiscal year ending March 31, 2021, the company witnessed stellar growth in the areas of channel creation, distribution, and monetization, enabling customers to extend distribution and earn advertising dollars while saving up to 40% in cost of operations compared to traditional delivery models. Some key highlights of this include:
· Annual revenue growth of 136%
· 44% increase in customers
· 50+ Free Ad Supported Streaming TV (FAST) platform partnerships and 100+ platform partnerships globally
· 250+ channels added to its cloud platform taking the overall tally to more than 500
· Approximately 2 billion ad opportunities every month supporting OTT ad-insertion for 1000+ channels
· 60% increase in workforce in the US, UK, and India to support strong customer growth (current headcount being 360 full-time employees + Contractors)
· 5-10x growth in ad impressions among top customers
Responsibilities
- Understanding the business requirements so as to formulate the problems to solve and restrict the slice of data to be explored.
- Collecting data from various sources.
- Performing cleansing, processing, and validation on the data subject to analyze, in order to ensure its quality.
- Exploring and visualizing data.
- Performing statistical analysis and experiments to derive business insights.
- Clearly communicating the findings from the analysis to turn information into something actionable through reports, dashboards, and/or presentations.
Skills
- Experience solving problems in the project’s business domain.
- Experience with data integration from multiple sources
- Proficiency in at least one query language, especially SQL.
- Working experience with NoSQL databases, such as MongoDB and Elasticsearch.
- Working experience with popular statistical and machine learning techniques, such as clustering, linear regression, KNN, decision trees, etc.
- Good scripting skills using Python, R or any other relevant language
- Proficiency in at least one data visualization tool, such as Matplotlib, Plotly, D3.js, ggplot, etc.
- Great communication skills.
Preferred Education & Experience:
-
Bachelor’s or master’s degree in Computer Engineering, Computer Science, Computer Applications, Mathematics, Statistics or related technical field or equivalent practical experience. Relevant experience of at least 3 years in lieu of above if from a different stream of education.
-
Well-versed in and 5+ years of hands-on demonstrable experience with:
▪ Data Analysis & Data Modeling
▪ Database Design & Implementation
▪ Database Performance Tuning & Optimization
▪ PL/pgSQL & SQL -
5+ years of hands-on development experience in Relational Database (PostgreSQL/SQL Server/Oracle).
-
5+ years of hands-on development experience in SQL, PL/PgSQL, including stored procedures, functions, triggers, and views.
-
Hands-on experience with demonstrable working experience in Database Design Principles, SQL Query Optimization Techniques, Index Management, Integrity Checks, Statistics, and Isolation levels
-
Hands-on experience with demonstrable working experience in Database Read & Write Performance Tuning & Optimization.
-
Knowledge and Experience working in Domain Driven Design (DDD) Concepts, Object Oriented Programming System (OOPS) Concepts, Cloud Architecture Concepts, NoSQL Database Concepts are added values
-
Knowledge and working experience in Oil & Gas, Financial, & Automotive Domains is a plus
-
Hands-on development experience in one or more NoSQL data stores such as Cassandra, HBase, MongoDB, DynamoDB, Elastic Search, Neo4J, etc. a plus.
bachelor’s degree or equivalent experience
● Knowledge of database fundamentals and fluency in advanced SQL, including concepts
such as windowing functions
● Knowledge of popular scripting languages for data processing such as Python, as well as
familiarity with common frameworks such as Pandas
● Experience building streaming ETL pipelines with tools such as Apache Flink, Apache
Beam, Google Cloud Dataflow, DBT and equivalents
● Experience building batch ETL pipelines with tools such as Apache Airflow, Spark, DBT, or
custom scripts
● Experience working with messaging systems such as Apache Kafka (and hosted
equivalents such as Amazon MSK), Apache Pulsar
● Familiarity with BI applications such as Tableau, Looker, or Superset
● Hands on coding experience in Java or Scala
Scala/Akka, Microservices, Akka streams, Play, Kafka, NoSQL. Java/J2EE, Spring, Architecture & Design, Enterprise integration ML/Data analytics, BigData technologies, Python/R programming, ML Algorithms/data models, ML tools(H2O/Tensorflow/Prediction.io), Realtime dashboards & batch, cluster compute (Spark/Akka), Distributed Data platforms (HDFS, Elastic search, Splunk, Casandra), Kafka with cross DC replication, NoSQL & In-memory DB