AWS Data Engineer
- Have a desire to explore new technology and break new ground.
- Are passionate about open-source technology, continuous learning, and innovation.
- Have the problem-solving skills, grit, and commitment to complete challenging work assignments and meet deadlines.
Qualifications
- Engineer enterprise-class, large-scale deployments and deliver cloud-based serverless solutions to our customers.
- You will work in a fast-paced environment with leading microservice and cloud technologies, and continue to develop your all-around technical skills.
- Participate in code reviews and provide meaningful feedback to other team members.
- Create technical documentation.
- Develop thorough Unit Tests to ensure code quality.
Skills and Experience
- Advanced skills in troubleshooting and tuning AWS Lambda functions developed with Java and/or Python.
- Experience with event-driven architecture design patterns and practices
- Experience in database design and architecture principles and strong SQL abilities
- Message brokers like Kafka and Kinesis
- Experience with Hadoop, Hive, and Spark (either PySpark or Scala)
- Demonstrated experience owning enterprise-class applications and delivering highly available, distributed, fault-tolerant, globally accessible services at scale.
- Good understanding of distributed systems.
- Candidates will be self-motivated and display initiative, ownership, and flexibility.
Preferred Qualifications
- AWS Lambda function development experience with Java and/or Python.
- Lambda triggers such as SNS, SES, or cron.
- Databricks
- Cloud development experience with AWS services, including:
  - IAM
  - S3
  - EC2
  - AWS CLI
  - API Gateway
  - ECR
  - CloudWatch
  - Glue
  - Kinesis
  - DynamoDB
- Java 8 or higher
- ETL data pipeline building
- Data Lake Experience
- Python
- Docker
- MongoDB or similar NoSQL DB.
- Relational Databases (e.g., MySQL, PostgreSQL, Oracle, etc.).
- Gradle and/or Maven.
- JUnit
- Git
- Scrum
- Experience with Unix and/or macOS.
- Immediate Joiners
Nice to have:
- AWS / GCP / Azure Certification.
- Cloud development experience with Google Cloud or Azure
About Advanced Technology to Solve Business Problems (A1)
● Knowledge of Excel, SQL, and writing code in Python.
● Experience with reporting and business intelligence tools like Tableau and Metabase.
● Exposure to distributed analytics processing technologies is desired (e.g., Hive, Spark).
● Experience with CleverTap, Mixpanel, Amplitude, etc.
● Excellent communication skills.
● Background in market research and project management.
● Attention to detail.
● Problem-solving aptitude.
good exposure to concepts and/or technology across the broader spectrum. Enterprise Risk Technology covers a variety of existing systems and green-field projects.
Full-stack Hadoop development experience with Scala.
Full-stack Java development experience covering Core Java (including JDK 1.8) and a good understanding of design patterns.
Requirements:
• Strong hands-on development in Java technologies.
• Strong hands-on development in Hadoop technologies like Spark and Scala, and experience with Avro.
• Participation in product feature design and documentation
• Requirement break-up, ownership, and implementation.
• Product BAU deliveries and Level 3 production defect fixes.
Qualifications & Experience
• Degree in a numerate subject
• Hands-on experience with Hadoop, Spark, Scala, Impala, Avro, and messaging systems like Kafka
• Experience across a core compiled language – Java
• Proficiency in Java-related frameworks like Spring, Hibernate, and JPA
• Hands-on experience in JDK 1.8 and a strong skillset covering Collections and Multithreading, with experience working on distributed applications.
For internal use only
• Strong hands-on development track record with end-to-end development cycle involvement
• Good exposure to computational concepts
• Good communication and interpersonal skills
• Working knowledge of risk and derivatives pricing (optional)
• Proficiency in SQL (PL/SQL), data modelling.
• Understanding of Hadoop architecture and the Scala programming language is good to have.
BRIEF DESCRIPTION:
At least 1 year of Python, Spark, SQL, and data engineering experience
Primary Skillset: PySpark, Scala/Python/Spark, Azure Synapse, S3, Redshift/Snowflake
Relevant Experience: Legacy ETL job migration to AWS Glue / Python and Spark combination
ROLE SCOPE:
Reverse engineer the existing/legacy ETL jobs
Create the workflow diagrams and review the logic diagrams with Tech Leads
Write equivalent logic in Python & Spark
Unit test the Glue jobs and certify the data loads before passing to system testing
Follow best practices and enable appropriate audit and control mechanisms
Apply strong analytical skills to identify root causes quickly and debug issues efficiently
Take ownership of the deliverables and support the deployments
REQUIREMENTS:
Create data pipelines for data integration into cloud stacks, e.g., Azure Synapse
Code data processing jobs in Azure Synapse Analytics, Python, and Spark
Experience in dealing with structured, semi-structured, and unstructured data in batch and real-time environments.
Should be able to process .json, .parquet and .avro files
PREFERRED BACKGROUND:
Tier 1/2 candidates from IITs/NITs/IIITs
However, relevant experience and a learning attitude take precedence
Job Title: Analyst / Sr. Analyst – Data Science Developer - Python
Experience: 2 to 5 years
Location: Bangalore / Hyderabad / Chennai
Notice period: Candidates should join within 2 months (maximum); immediate joiners preferred.
About the role:
We are looking for an Analyst / Senior Analyst who works in the analytics domain and has a strong Python background.
Desired Skills, Competencies & Experience:
• 2-4 years of experience working in the analytics domain with a strong Python background.
• Visualization skills in Python with plotly, matplotlib, seaborn, etc. Ability to create customized plots using such tools.
• Ability to write effective, scalable, and modular code. Should be able to understand, test, and debug existing Python project modules quickly and contribute to them.
• Should be familiar with Git workflows.
Good to Have:
• Familiarity with cloud platforms like AWS, AzureML, Databricks, GCP, etc.
• Understanding of shell scripting and Python package development.
• Experience with Python data science packages like Pandas, NumPy, sklearn, etc.
• ML model building and evaluation experience using sklearn.
Good Python developers / Data Engineers / DevOps engineers
Experience: 1-8 years
Work location: Chennai / Remote support
Object-oriented languages (e.g., Python, PySpark, Java, C#, C++) and frameworks (e.g., J2EE or .NET)
- Must have 5-8 years of experience in handling data
- Must have the ability to interpret large amounts of data and to multi-task
- Must have strong knowledge of and experience with programming (Python), Linux/Bash scripting, and databases (SQL, etc.)
- Must have strong analytical and critical thinking to resolve business problems using data and tech
- Must have domain familiarity with and interest in cloud technologies (GCP, Microsoft Azure, AWS), open-source technologies, and enterprise technologies
- Must have the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy.
- Must have good communication skills
- Working knowledge of or exposure to Elasticsearch, PostgreSQL, Athena, PrestoDB, and Jupyter Notebook
● Working hand in hand with application developers and data scientists to help build software that scales in terms of performance and stability
Skills
● 3+ years of experience managing large-scale data infrastructure and building data pipelines/data products.
● Proficiency in any data engineering technologies; proficiency in AWS data engineering technologies is a plus.
● Languages: Python, Scala, or Go
● Experience working with real-time streaming systems
● Experience handling millions of events per day
● Experience developing and deploying data models on the cloud
● Bachelor's/Master's in Computer Science or equivalent experience
● Ability to learn and use skills in new technologies
About us
DataWeave provides Retailers and Brands with “Competitive Intelligence as a Service” that enables them to take key decisions that impact their revenue. Powered by AI, we provide easily consumable and actionable competitive intelligence by aggregating and analyzing billions of publicly available data points on the Web to help businesses develop data-driven strategies and make smarter decisions.
Data Science@DataWeave
We, the Data Science team at DataWeave (called Semantics internally), build the core machine learning backend and structured domain knowledge needed to deliver insights through our data products. Our underpinnings are innovation, business awareness, long-term thinking, and pushing the envelope. We are a fast-paced lab within the organization, applying the latest research in Computer Vision, Natural Language Processing, and Deep Learning to hard problems in different domains.
How we work?
It's hard to tell what we love more, problems or solutions! Every day, we choose to address some of the hardest data problems that there are. We are in the business of making sense of messy public data on the web. At serious scale!
What do we offer?
● Some of the most challenging research problems in NLP and Computer Vision. Huge text and image datasets that you can play with!
● Ability to see the impact of your work and the value you're adding to our customers almost immediately.
● Opportunity to work on different problems and explore a wide variety of tools to figure out what really excites you.
● A culture of openness. Fun work environment. A flat hierarchy. Organization-wide visibility. Flexible working hours.
● Learning opportunities with courses and tech conferences. Mentorship from seniors in the team.
● Last but not least, competitive salary packages and fast-paced growth opportunities.
Who are we looking for?
The ideal candidate is a strong software developer or researcher with experience building and shipping production-grade data science applications at scale. Such a candidate has a keen interest in liaising with the business and product teams to understand a business problem and translate it into a data science problem.
You are also expected to develop capabilities that open up new business productization opportunities.
We are looking for someone with a Master's degree and 1+ years of experience working on problems in NLP or Computer Vision.
If you have 4+ years of relevant experience with a Master's degree (PhD preferred), you will be considered for a senior role.
Key problem areas
● Preprocessing and feature extraction from noisy and unstructured data, both text and images.
● Keyphrase extraction, sequence labeling, entity relationship mining from texts in different domains.
● Document clustering, attribute tagging, data normalization, classification, summarization, sentiment analysis.
● Image-based clustering and classification, segmentation, object detection, extracting text from images, generative models, recommender systems.
● Ensemble approaches for all the above problems using multiple text- and image-based techniques.
Relevant set of skills
● Have a strong grasp of concepts in computer science, probability and statistics, linear algebra, calculus, optimization, algorithms and complexity.
● Background in one or more of information retrieval, data mining, statistical techniques, natural language processing, and computer vision.
● Excellent coding skills in multiple programming languages with experience building production-grade systems. Prior experience with Python is a bonus.
● Experience building and shipping machine learning models that solve real-world engineering problems. Prior experience with deep learning is a bonus.
● Experience building robust clustering and classification models on unstructured data (text, images, etc.). Experience working with Retail domain data is a bonus.
● Ability to process noisy and unstructured data to enrich it and extract meaningful relationships.
● Experience working with a variety of tools and libraries for machine learning and visualization, including NumPy, matplotlib, scikit-learn, Keras, PyTorch, and TensorFlow.
● Use the command line like a pro. Be proficient in Git and other essential software development tools.
● Working knowledge of large-scale computational models such as MapReduce and Spark is a bonus.
● Be a self-starter: someone who thrives in fast-paced environments with minimal ‘management’.
● It's a huge bonus if you have some personal projects (including open source contributions) that you work on during your spare time. Show off some of the projects you have hosted on GitHub.
Role and responsibilities
● Understand the business problems we are solving. Build data science capabilities that align with our product strategy.
● Conduct research. Do experiments. Quickly build throwaway prototypes to solve problems pertaining to the Retail domain.
● Build robust clustering and classification models in an iterative manner that can be used in production.
● Constantly think scale, think automation. Measure everything. Optimize proactively.
● Take end-to-end ownership of the projects you are working on. Work with minimal supervision.
● Help scale our delivery, customer success, and data quality teams with constant algorithmic improvements and automation.
● Take the initiative to build new capabilities. Develop business awareness. Explore productization opportunities.
● Be a tech thought leader. Add passion and vibrancy to the team. Push the envelope. Be a mentor to junior members of the team.
● Stay on top of the latest research in deep learning, NLP, Computer Vision, and other relevant areas.