BRIEF DESCRIPTION:
At least 1 year of experience with Python, Spark, SQL, and data engineering
Primary Skillset: PySpark, Scala/Python/Spark, Azure Synapse, S3, Redshift/Snowflake
Relevant Experience: Migration of legacy ETL jobs to AWS Glue using Python and Spark
ROLE SCOPE:
Reverse-engineer the existing/legacy ETL jobs
Create workflow diagrams and review the logic diagrams with Tech Leads
Write equivalent logic in Python & Spark
Unit test the Glue jobs and certify the data loads before passing to system testing (a minimal Glue job sketch follows this list)
Follow best practices and enable appropriate audit and control mechanisms
Apply strong analytical skills to identify root causes quickly and debug issues efficiently
Take ownership of the deliverables and support the deployments
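For illustration, a minimal sketch of what such a migrated Glue job can look like in Python & Spark; the database, table, and S3 path names are hypothetical placeholders, and the real legacy logic would replace the single filter shown here:

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job bootstrap.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the source table from the Glue Data Catalog (hypothetical names).
    source = glue_context.create_dynamic_frame.from_catalog(
        database="legacy_db", table_name="orders"
    )

    # Re-implement the legacy ETL logic as Spark DataFrame operations.
    df = source.toDF().filter("order_status = 'COMPLETE'")

    # Write the certified load to S3 in Parquet (hypothetical path).
    df.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")

    job.commit()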
REQUIREMENTS:
Create data pipelines for data integration into cloud stacks, e.g. Azure Synapse
Code data processing jobs in Azure Synapse Analytics, Python, and Spark
Experience in dealing with structured, semi-structured, and unstructured data in batch and real-time environments.
Should be able to process .json, .parquet and .avro files
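As a brief illustration of the file-format requirement, a hedged sketch that reads each format with Spark; the paths are hypothetical, and Avro support assumes the spark-avro package is on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("file-format-demo").getOrCreate()

    # Each reader returns a DataFrame; paths are placeholders.
    json_df = spark.read.json("s3://example-bucket/raw/events.json")
    parquet_df = spark.read.parquet("s3://example-bucket/raw/events.parquet")
    # Requires the external spark-avro package (org.apache.spark:spark-avro).
    avro_df = spark.read.format("avro").load("s3://example-bucket/raw/events.avro")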
PREFERRED BACKGROUND:
Tier 1/2 candidates from IITs/NITs/IIITs
However, relevant experience and a learning attitude take precedence
Machine Learning Developer
1. Working on supervised and unsupervised learning algorithms
2. Developing deep learning and machine learning algorithms
3. Working on live data analytics projects
Survey Analytics Analyst
at Leading Management Consulting Firm
We are looking for candidates who have demonstrated both a strong business sense and a deep understanding of the quantitative foundations of modelling.
• Excellent analytical and problem-solving skills, including the ability to disaggregate issues, identify root causes and recommend solutions
• Experience with statistical programming software such as SPSS and comfort working with large data sets
• R, Python, SAS & SQL are preferred but not mandatory
• Excellent time management skills
• Good written and verbal communication skills; understanding of both written and spoken English
• Strong interpersonal skills
• Ability to act autonomously, bringing structure and organization to work
• Creative and action-oriented mindset
• Ability to interact in a fluid, demanding and unstructured environment where priorities evolve constantly, and methodologies are regularly challenged
• Ability to work under pressure and deliver on tight deadlines
Qualifications and Experience:
• Graduate degree in Statistics/Economics/Econometrics/Computer Science/Engineering/Mathematics/MBA (with a strong quantitative background) or equivalent
• Strong track record of work experience in business intelligence, market research, and/or advanced analytics
• Knowledge of data collection methods (focus groups, surveys, etc.)
• Knowledge of statistical packages (SPSS, SAS, R, Python, or similar), databases, and MS Office (Excel, PowerPoint, Word)
• Strong analytical and critical thinking skills
• Industry experience in Consumer Experience/Healthcare is a plus
a. 4+ years of experience in Azure development using PySpark (Databricks) and Synapse.
b. Real-world project experience using ADF pipelines to bring data from on-premises applications into Azure.
c. Strong working experience transforming data with PySpark on Databricks (see the sketch after this list).
d. Experience with Synapse database and transformations within Synapse
e. Strong knowledge of SQL.
f. Experience in working with multiple kinds of source systems (e.g. HANA, Teradata, MS SQL Server, flat files, JSON, etc.)
g. Strong communication skills.
h. Experience working in Agile environments
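A minimal sketch of the kind of PySpark transformation meant in item c, assuming a Databricks notebook where spark is predefined and the data has already been landed in ADLS by an ADF pipeline; the storage path and table names are hypothetical:

    from pyspark.sql import functions as F

    # Read raw data landed by ADF (hypothetical ADLS path).
    raw = spark.read.parquet("abfss://raw@examplestorage.dfs.core.windows.net/sales/")

    # Typical cleansing/transformation step expressed in PySpark.
    curated = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_ts"))
           .filter(F.col("amount") > 0)
    )

    # Persist as a Delta table for downstream consumption, e.g. a load into Synapse.
    curated.write.format("delta").mode("overwrite").saveAsTable("curated.sales")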
● Proficiency in Linux.
● Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
● Must have SQL knowledge and experience working with relational databases and query authoring (SQL), as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena.
● Must have experience with Python/Scala.
● Must have experience with Big Data technologies like Apache Spark.
● Must have experience with Apache Airflow.
● Experience with data pipelines and ETL tools like AWS Glue.
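To illustrate the Airflow and Glue items together, a minimal, hedged DAG sketch that triggers a pre-existing Glue job via boto3; the job name and schedule are hypothetical:

    from datetime import datetime

    import boto3
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_glue_job():
        # Start an existing Glue job by name (hypothetical job name).
        boto3.client("glue").start_job_run(JobName="nightly-orders-etl")

    with DAG(
        dag_id="nightly_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_glue_job", python_callable=run_glue_job)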
Job roles and responsibilities:
- Minimum 3 to 4 years of hands-on experience designing, building, and operationalizing large-scale enterprise data solutions and applications using GCP data and analytics services such as Cloud Dataproc, Cloud Dataflow, BigQuery, Cloud Pub/Sub, and Cloud Functions.
- Hands-on experience analyzing, re-architecting, and re-platforming on-premises data warehouses to data platforms on GCP using GCP and third-party services.
- Experience designing and building data pipelines within a hybrid big data architecture using Java, Python, Scala, and GCP-native tools.
- Hands-on experience orchestrating and scheduling data pipelines using Cloud Composer/Apache Airflow.
- Experience performing detailed assessments of current-state data platforms and creating an appropriate transition path to GCP.
Technical Skills Required:
- Strong experience with GCP data and analytics services (a minimal BigQuery example follows this list)
- Working knowledge of the big data ecosystem: Hadoop, Spark, HBase, Hive, Scala, etc.
- Experience in building and optimizing data pipelines in Spark
- Strong skills in orchestrating workflows with Cloud Composer/Apache Airflow
- Good knowledge of object-oriented scripting languages: Python (must have) and Java or C++
- Good to have: knowledge of building CI/CD pipelines with GCP Cloud Build and native GCP services
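A minimal sketch of working with one of these services (BigQuery) from Python; the project, dataset, and table names are hypothetical, and credentials are assumed to come from the application-default mechanism:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials

    query = """
        SELECT order_date, SUM(amount) AS daily_revenue
        FROM `example-project.sales.orders`
        GROUP BY order_date
        ORDER BY order_date
    """

    # Run the query and iterate over the result rows.
    for row in client.query(query).result():
        print(row.order_date, row.daily_revenue)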
- The ideal candidate is adept at using large data sets to find opportunities for product and process optimization and using models to test the effectiveness of different courses of action.
- Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies.
- Assess the effectiveness and accuracy of new data sources and data gathering techniques.
- Develop custom data models and algorithms to apply to data sets.
- Use predictive modeling to increase and optimize customer experiences, revenue generation, ad targeting and other business outcomes.
- Develop the company's A/B testing framework and test model quality (a minimal A/B test sketch follows this list).
- Develop processes and tools to monitor and analyze model performance and data accuracy.
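As a small illustration of the A/B testing item, a hedged sketch of a two-proportion z-test on invented conversion counts:

    from statsmodels.stats.proportion import proportions_ztest

    conversions = [430, 502]    # conversions in control and treatment (invented numbers)
    exposures = [10000, 10000]  # users exposed to each variant (invented numbers)

    # Test whether the conversion rates of the two variants differ.
    z_stat, p_value = proportions_ztest(conversions, exposures)
    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")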
Roles & Responsibilities
- Experience using statistical languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets.
- Experience working with and creating data architectures.
- Looking for someone with 3-7 years of experience manipulating data sets and building statistical models
- Has a Bachelor's or Master's degree in Computer Science or another quantitative field
- Knowledge and experience in statistical and data mining techniques: GLM/regression, random forest, boosting, trees, text mining, social network analysis, etc.
- Experience querying databases and using statistical computer languages: R, Python, SQL, etc.
- Experience creating and using advanced machine learning algorithms and statistics: regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc. (a short scikit-learn sketch follows this list)
- Experience with distributed data/computing tools: Map/Reduce, Hadoop, Hive, Spark, Gurobi, MySQL, etc.
- Experience visualizing/presenting data for stakeholders using: Periscope, Business Objects, D3, ggplot, etc.
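A short scikit-learn sketch of one of the listed techniques (a random forest classifier) on synthetic data, purely for illustration:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic data standing in for a real modeling dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))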
- Participate in the full machine learning lifecycle, from data collection, cleaning, and preprocessing to training models and deploying them to production.
- Discover data sources, get access to them, ingest them, clean them up, and make them “machine learning ready”.
- Work with data scientists to create and refine features from the underlying data and build pipelines to train and deploy models (a minimal training-pipeline sketch follows this list).
- Partner with data scientists to understand and implement machine learning algorithms.
- Support A/B tests, gather data, perform analysis, draw conclusions on the impact of your models.
- Work cross-functionally with product managers, data scientists, and product engineers, and communicate results to peers and leaders.
- Mentor junior team members
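A minimal, hedged sketch of the training half of such a pipeline, bundling preprocessing and the model so the same fitted object can be deployed; the input file and column names are hypothetical:

    import joblib
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Load a cleaned, "machine learning ready" dataset (hypothetical file and label column).
    df = pd.read_csv("features.csv")
    X, y = df.drop(columns=["label"]), df["label"]

    # Preprocessing and model live in one pipeline so training and serving stay consistent.
    pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
    pipeline.fit(X, y)

    # Persist the fitted pipeline for deployment to production.
    joblib.dump(pipeline, "model.joblib")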
Who we have in mind:
- Graduate in Computer Science or related field, or equivalent practical experience.
- 4+ years of experience in software engineering with 2+ years of direct experience in the machine learning field.
- Proficiency with SQL, Python, Spark, and basic libraries such as Scikit-learn, NumPy, Pandas.
- Familiarity with deep learning frameworks such as TensorFlow or Keras (a short Keras sketch follows this list)
- Experience with Computer Vision (OpenCV) and NLP frameworks (NLTK, spaCy, BERT).
- Basic knowledge of machine learning techniques (e.g. classification, regression, and clustering).
- Understand machine learning principles (training, validation, etc.)
- Strong hands-on knowledge of data query and data processing tools (e.g. SQL)
- Software engineering fundamentals: version control systems (e.g. Git, GitHub) and workflows, and the ability to write production-ready code.
- Experience deploying highly scalable software supporting millions or more users
- Experience building applications on cloud (AWS or Azure)
- Experience working in scrum teams with Agile tools like JIRA
- Strong oral and written communication skills. Ability to explain complex concepts and technical material to non-technical users
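For the deep learning item above, a small hedged Keras sketch of a binary classifier; the feature dimension and data are random placeholders:

    import numpy as np
    import tensorflow as tf

    X = np.random.rand(512, 30).astype("float32")  # placeholder features
    y = np.random.randint(0, 2, size=(512,))       # placeholder binary labels

    # Tiny fully connected network for binary classification.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(30,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)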
- 6+ months of proven experience as a Data Scientist or Data Analyst
- Understanding of machine-learning and operations research
- Extensive knowledge of R, SQL and Excel
- Analytical mind and business acumen
- Strong Statistical understanding
- Problem-solving aptitude
- BSc/BA in Computer Science, Engineering or relevant field; graduate degree in Data Science or other quantitative field is preferred
● Bachelor's degree or equivalent experience
● Knowledge of database fundamentals and fluency in advanced SQL, including concepts such as windowing functions (a short windowing example follows this list)
● Knowledge of popular scripting languages for data processing such as Python, as well as familiarity with common frameworks such as Pandas
● Experience building streaming ETL pipelines with tools such as Apache Flink, Apache Beam, Google Cloud Dataflow, DBT, and equivalents
● Experience building batch ETL pipelines with tools such as Apache Airflow, Spark, DBT, or custom scripts
● Experience working with messaging systems such as Apache Kafka (and hosted equivalents such as Amazon MSK) and Apache Pulsar
● Familiarity with BI applications such as Tableau, Looker, or Superset
● Hands-on coding experience in Java or Scala
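A short example of the windowing-function concept mentioned above, run through Spark SQL so it stays in Python; the in-memory orders table is a hypothetical stand-in:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("windowing-demo").getOrCreate()

    # Tiny in-memory table standing in for a real orders table.
    spark.createDataFrame(
        [("c1", "2024-01-01", 10.0), ("c1", "2024-01-05", 25.0), ("c2", "2024-01-02", 40.0)],
        ["customer_id", "order_date", "amount"],
    ).createOrReplaceTempView("orders")

    # Running total and recency rank per customer via window functions.
    spark.sql("""
        SELECT customer_id, order_date, amount,
               SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS recency_rank
        FROM orders
    """).show()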
Big Data Developer
at Intelliswift