Big Data Engineer
at Altimetrik
-Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> QuickSight.
-Experience in developing lambda functions with AWS Lambda.
-Expertise with Spark/PySpark: candidate should be hands-on with PySpark code and able to do transformations with Spark.
-Should be able to code in Python and Scala.
-Snowflake experience will be a plus.
RESPONSIBILITIES:
Understand and elicit requirements, analyze data and workflows, and contribute to product
projects and proofs of concept (POC).
Contribute to preparing design documents and effort estimations.
Develop AI/ML models using best-in-class ML models.
Build, test, and deploy AI/ML solutions.
Work with Business Analysts and Product Managers to assist with defining functional user
stories.
Ensure deliverables across teams are of high quality and clearly documented.
Recommend best ML practices and industry standards for any ML use case.
Proactively take up R&D and recommend solution options for any ML use case.
REQUIREMENTS:
Required Skills
Overall experience of 4 to 7 years working on AI/ML framework development.
Good programming knowledge in Python is a must.
Good knowledge of R and SAS is desired.
Good hands-on working knowledge of SQL, data modelling, and CRISP-DM.
Proficiency with Uni/multivariate statistics, algorithm design, and predictive AI/ML modelling.
Strong knowledge of machine learning algorithms, linear regression, logistic regression, KNN,
Random Forest, Support Vector Machines and Natural Language Processing.
Experience with NLP and deep neural networks using synthetic and artificial data.
Involved in different phases of the SDLC, with good working exposure to SDLC models such as
Agile methodologies.
- Deliver plugins for our Python-based ETL pipelines
- Deliver Python microservices for provisioning and managing cloud infrastructure
- Implement algorithms to analyse large data sets
- Draft design documents that translate requirements into code
- Effectively manage the challenges of handling large volumes of data while working to tight deadlines
- Manage expectations with internal stakeholders and context-switch in a fast-paced environment
- Thrive in an environment that uses AWS and Elasticsearch extensively
- Keep abreast of technology and contribute to the engineering strategy
- Champion best development practices and provide mentorship to others
- First and foremost you are a Python developer, experienced with the Python data stack
- You love and care about data
- Your code is an artistic manifest reflecting how elegant you are in what you do
- You feel sparks of joy when a new abstraction or pattern arises from your code
- You support the DRY (Don’t Repeat Yourself) and KISS (Keep It Short and Simple) principles
- You are a continuous learner
- You have a natural willingness to automate tasks
- You have critical thinking and an eye for detail
- Excellent ability and experience of working to tight deadlines
- Sharp analytical and problem-solving skills
- Strong sense of ownership and accountability for your work and delivery
- Excellent written and oral communication skills
- Mature collaboration and mentoring abilities
- We are keen to know your digital footprint: community talks, blog posts, certifications, courses you have taken or are keen to take, personal projects, and any contributions to open-source communities
- Delivering complex software, ideally in a FinTech setting
- Experience with CI/CD tools such as Jenkins, CircleCI
- Experience with code versioning (Git / Mercurial / Subversion)
Intuitive Cloud (http://www.intuitive.cloud) is one of the fastest-growing top-tier cloud solutions and SDx engineering solution and services companies, supporting 80+ global enterprise customers across the Americas, Europe, and the Middle East.
Intuitive is a recognized professional and managed services partner for core superpowers in cloud (public/hybrid), security, GRC, DevSecOps, SRE, application modernization/containers/K8s-as-a-service, and cloud application delivery.
Data Engineering:
- 9+ years’ experience as a data engineer.
- Must have 4+ years implementing data engineering solutions with Databricks.
- This is a hands-on role building data pipelines using Databricks. Hands-on technical experience with Apache Spark.
- Must have deep expertise in one of the programming languages for data processing (Python, Scala). Experience with Python, PySpark, Hadoop, Hive and/or Spark to write data pipelines and data processing layers.
- Must have worked with relational databases like Snowflake. Good SQL experience for writing complex SQL transformations.
- Performance tuning of Spark SQL running on S3/Data Lake/Delta Lake storage, and strong knowledge of Databricks and cluster configurations.
- Hands-on architectural experience.
- Nice to have: Databricks administration, including security and infrastructure features of Databricks.
Strong knowledge in statistical and data mining techniques: GLM/Regression, Random Forest, Boosting, Trees, text mining, etc.
Sound knowledge of querying databases and using statistical computer languages: R, Python, SQL, etc.
Strong understanding of creating and using advanced machine learning algorithms and statistics: regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc.
We are currently looking for a Junior Data Scientist to join our growing Data Science team in Panchkula. As a Jr. Data Scientist, you will work closely with the Head of Data Science and a variety of cross-functional teams to identify opportunities to enhance the customer journey, reduce churn, improve user retention, and drive revenue.
Experience Required
- Medium to Expert level proficiency in either R or Python.
- Expert level proficiency in SQL scripting for RDBMS and NoSQL DBs (especially MongoDB)
- Tracking and insights on key metrics around User Journey, User Retention, Churn Modelling and Prediction, etc.
- Medium-to-highly skilled in data structures and ML algorithms, with the ability to create efficient solutions to complex problems.
- Experience of working on an end-to-end data science pipeline: problem scoping, data gathering, EDA, modeling, insights, visualizations, monitoring and maintenance.
- Medium-to-Proficient in creating beautiful Tableau dashboards.
- Problem-solving: Ability to break the problem into small parts and apply relevant techniques to drive the required outcomes.
- Intermediate to advanced knowledge of machine learning, probability theory, statistics, and algorithms. You will be required to discuss and use various algorithms and approaches on a daily basis.
- Proficient in at least a few of the following: regression, Bayesian methods, tree-based learners, SVM, RF, XGBoost, time series modelling, GLM, GLMM, clustering, deep learning, etc.
Good to Have
- Experience in one of the upcoming technologies like deep learning, recommender systems, etc.
- Experience of working in the Gaming domain
- Marketing analytics, cross-sell, up-sell, campaign analytics, fraud detection
- Experience in building and maintaining Data Warehouses in AWS would be a big plus!
Benefits
- PF and gratuity
- Working 5 days a week
- Paid leaves (CL, SL, EL, ML) and holidays
- Parties, festivals, birthday celebrations, etc.
- Equability: absence of favouritism in hiring & promotion
along with metrics to track their progress
Managing available resources such as hardware, data, and personnel so that deadlines
are met
Analysing the ML algorithms that could be used to solve a given problem and ranking
them by their success probability
Exploring and visualizing data to gain an understanding of it, then identifying
differences in data distribution that could affect performance when deploying the model
in the real world
Verifying data quality, and/or ensuring it via data cleaning
Supervising the data acquisition process if more data is needed
Defining validation strategies
Defining the pre-processing or feature engineering to be done on a given dataset
Defining data augmentation pipelines
Training models and tuning their hyperparameters
Analysing the errors of the model and designing strategies to overcome them
Deploying models to production