Data Engineer
Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting languages: Python and PySpark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Ingest data from different sources that expose data through different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems. Implement data ingestion and processing with Big Data technologies
- Process and transform data using technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it in the language supported by the base data platform
- Develop automated data quality checks to make sure the right data enters the platform and to verify the results of calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge of one of the following languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
  - Data mining/programming tools (e.g. SAS, SQL, R, Python)
  - Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
  - Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
About consulting & implementation services in the area of Oil & Gas, Mining and Manufacturing Industry
Similar jobs
Job Title: Data Analyst with Python
Experience: 4+ years
Location: Mumbai
Working Mode: Onsite
Primary Skills: Python, Data Analysis of RDs, FDs, Saving Accounts, Banking domain, Risk Consultant, Car Loan Model, Cross-sell model
Qualification: Any graduation
Job Description
1. 4 to 5 years of banking analytics experience
2. Data Analyst profile with good domain understanding
3. Good with Python and has worked on Liabilities (Mandatory)
4. For Liabilities, should have worked on Savings A/c, FD, and RD (Mandatory)
5. Good with stakeholder management and requirement gathering
6. Only from banking industry
engineering
2. Preferably should have done some project or internship related to the field
3. Knowledge of SQL is a plus
4. A deep desire to learn new things and be a part of a vibrant start-up.
5. You will have a lot of free rein, and this comes with immense responsibility, so it is expected that you will be willing to master new things that come along!
Job Description:
1. Design and build a pipeline to train models for NLP problems like Classification, NER
2. Develop APIs that showcase our models' capabilities and enable third-party integrations
3. Work across a microservices architecture that processes thousands of documents per day.
We are hiring for a Tier 1 MNC for a software developer with good knowledge of Spark, Hadoop, and Scala
TOP 3 SKILLS
Python (Language)
Spark Framework
Spark Streaming
Docker/Jenkins/Spinnaker
AWS
Hive Queries
The candidate should be a good coder.
Preferred: Airflow
Must-have experience:
Python
Spark framework and streaming
Exposure to the Machine Learning lifecycle is mandatory.
Project:
This is a search-domain project. For any search activity that happens on the website, this team creates the model: they build sorting/scoring models for any search. This is done by the data scientists. This team works more on the streaming side of data; the candidate would work extensively on Spark Streaming, and there will be a lot of work in Machine Learning.
INTERVIEW INFORMATION
3-4 rounds.
1st round based on data engineering batching experience.
2nd round based on data engineering streaming experience.
3rd round based on ML lifecycle (the 3rd round can be a techno-functional round based on previous feedback; otherwise a 4th, functional round will be held if required).
Senior Data Scientist
Your goal: To improve the education process and improve the student experience through data.
The organization: Data Science for Learning Services
Data Science and Machine Learning are core to Chegg. As a Student Hub, we want to ensure that students discover the full breadth of learning solutions we have to offer to get full value on their learning time with us. To create the most relevant and engaging interactions, we are solving a multitude of machine learning problems so that we can better model student behavior, link various types of content, optimize workflows, and provide a personalized experience.
The Role: Senior Data Scientist
As a Senior Data Scientist, you will focus on conducting research and development in NLP and ML. You will be responsible for writing production-quality code for data product solutions at Chegg. You will lead the identification and implementation of key projects for data processing and knowledge discovery.
Responsibilities:
• Translate product requirements into AI/ML and NLP solutions
• Think outside the box and design novel solutions for the problem at hand
• Write production-quality code
• Be able to design data and annotation collection strategies
• Identify key evaluation metrics and release requirements for data products
• Integrate new data and design workflows
• Innovate, share, and educate team members and community
Requirements:
• Working experience in machine learning, NLP, recommendation systems, experimentation, or related fields, with a specialization in NLP
• Working experience with large language models that cater to multiple tasks such as text generation, Q&A, summarization, translation, etc., is highly preferred
• Knowledge of MLOps and deployment pipelines is a must
• Expertise in supervised, unsupervised, and reinforcement ML algorithms.
• Strong programming skills in Python
• Top data wrangling skills using SQL or NoSQL queries
• Experience using containers to deploy real-time prediction services
• Passion for using technology to help students
• Excellent communication skills
• Good team player and a self-starter
• Outstanding analytical and problem-solving skills
• Experience working with ML pipeline products such as AWS SageMaker, Google ML, or Databricks is a plus.
Why do we exist?
Students are working harder than ever before to stabilize their future. Our recent research study called State of the Student shows that nearly 3 out of 4 students are working to support themselves through college and 1 in 3 students feel pressure to spend more than they can afford. We founded our business on providing affordable textbook rental options to address these issues. Since then, we’ve expanded our offerings to supplement many facets of higher educational learning through Chegg Study, Chegg Math, Chegg Writing, Chegg Internships, Thinkful Online Learning, and more, to support students beyond their college experience. These offerings lower financial concerns for students by modernizing their learning experience. We exist so students everywhere have a smarter, faster, more affordable way to student.
Video Shorts
Life at Chegg: https://jobs.chegg.com/Video-Shorts-Chegg-Services
Certified Great Place to Work!: http://reviews.greatplacetowork.com/chegg
Chegg India: http://www.cheggindia.com/
Chegg Israel: http://insider.geektime.co.il/organizations/chegg
Thinkful (a Chegg Online Learning Service): https://www.thinkful.com/about/#careers
Chegg out our culture and benefits!
http://www.chegg.com/jobs/benefits
https://www.youtube.com/watch?v=YYHnkwiD7Oo
Chegg is an equal-opportunity employer
Required Experience
· 3+ years of relevant technical experience in a data analyst role
· Intermediate / expert skills with SQL and basic statistics
· Experience in Advanced SQL
· Python programming is an added advantage
· Strong problem solving and structuring skills
· Automation in connecting various sources to the data and representing it through various dashboards
· Excellent with numbers and able to communicate data points through various reports/templates
· Ability to communicate effectively both within and outside the Data Analytics team
· Proactively take up work responsibilities and take on ad hoc requests as and when needed
· Ability and desire to take ownership of and initiative for analysis; from requirements clarification to deliverable
· Strong technical communication skills; both written and verbal
· Ability to understand and articulate the "big picture" and simplify complex ideas
· Ability to identify and learn applicable new techniques independently as needed
· Must have worked with various Databases (Relational and Non-Relational) and ETL processes
· Must have experience in handling large volumes of data and adhering to optimization and performance standards
· Should have the ability to analyse and provide relationship views of the data from different angles
· Must have excellent Communication skills (written and oral).
· Knowing Data Science is an added advantage
Required Skills
MySQL, Advanced Excel, Tableau, Reporting and dashboards, MS Office, VBA, Analytical skills
Preferred Experience
· Strong understanding of relational databases such as MySQL
· Prior experience working remotely full-time
· Prior experience working in Advanced SQL
· Experience with one or more BI tools, such as Superset, Tableau etc.
· High level of logical and mathematical ability in Problem Solving
Introduction
Synapsica (http://www.synapsica.com/) is a series-A funded (https://yourstory.com/2021/06/funding-alert-synapsica-healthcare-ivycap-ventures-endiya-partners/) HealthTech startup founded by alumni from IIT Kharagpur, AIIMS New Delhi, and IIM Ahmedabad. We believe healthcare needs to be transparent and objective while being affordable. Every patient has the right to know exactly what is happening in their body and should not have to rely on cryptic two-liners given to them as a diagnosis.
Towards this aim, we are building an artificial-intelligence-enabled, cloud-based platform to analyse medical images and create v2.0 of advanced radiology reporting. We are backed by IvyCap, Endiya Partners, Y Combinator and other investors from India, US, and Japan. We are proud to have GE and The Spinal Kinetics as our partners. Here’s a small sample of what we’re building: https://www.youtube.com/watch?v=FR6a94Tqqls
Your Roles and Responsibilities
Synapsica is looking for a Principal AI Researcher to lead and drive AI-based research and development efforts. The ideal candidate should have extensive experience in Computer Vision and AI Research, either through studies or industrial R&D projects, and should be excited to work on advanced exploratory research and development projects in computer vision and machine learning to create the next generation of advanced radiology solutions.
The role involves computer vision tasks, including development, customization, and training of Convolutional Neural Networks (CNNs); application of ML techniques (SVM, regression, clustering, etc.); and traditional image processing (OpenCV, etc.). The role is research-focused and involves reading and implementing existing research papers, deep-dive problem analysis, frequent review of results, generating new ideas, building new models from scratch, publishing papers, and automating and optimizing key processes. The role spans from real-world data handling to the most advanced methods such as transfer learning, generative models, and reinforcement learning, with a focus on understanding quickly and experimenting even faster. The suitable candidate will collaborate closely with the medical research team, software developers, and AI research scientists. The candidate must be creative, ask questions, and be comfortable challenging the status quo. The position is based in our Bangalore office.
Primary Responsibilities
- Interface between product managers and engineers to design, build, and deliver AI models and capabilities for our spine products.
- Formulate and design AI capabilities of our stack with special focus on computer vision.
- Strategize end-to-end model training flow including data annotation, model experiments, model optimizations, model deployment and relevant automations
- Lead teams, engineers, and scientists to envision and build new research capabilities and ensure delivery of our product roadmap.
- Organize regular reviews and discussions.
- Keep the team up-to-date with latest industrial and research updates.
- Publish research and clinical validation papers
Requirements
- 6+ years of relevant experience in solving complex real-world problems at scale using computer vision-based deep learning.
- Prior experience in leading and managing a team.
- Strong problem-solving ability
- Prior experience with Python, cuDNN, TensorFlow, PyTorch, Keras, Caffe (or similar Deep Learning frameworks).
- Extensive understanding of computer vision/image processing applications like object classification, segmentation, object detection, etc.
- Ability to write custom Convolutional Neural Network architectures in PyTorch (or similar)
- Background in publishing research papers and/or patents
- Computer Vision and AI Research background in medical domain will be a plus
- Experience of GPU/DSP/other Multi-core architecture programming
- Effective communication with other project members and project stakeholders
- Detail-oriented, eager to learn, acquire new skills
- Prior Project Management and Team Leadership experience
- Ability to plan work and meet the deadline
Role:
- Understand and translate statistics and analytics to address business problems
- Responsible for helping in data preparation and data pull, which is the first step in machine learning
- Should be able to slice and dice data to extract interesting insights
- Model development for better customer engagement and retention
- Working on strategy development to increase business revenue
Requirements:
- Hands-on experience with relevant tools such as SQL (expert), Excel, R/Python
- Statistics: Strong knowledge of statistics
- Should be able to do data scraping and data mining
- Be self-driven, and show ability to deliver on ambiguous projects
- An ability and interest in working in a fast-paced, ambiguous and rapidly-changing environment
- Should have worked on business projects for an organization, e.g. customer acquisition, customer retention.
- We are looking for an experienced data engineer to join our team.
- The preprocessing involves ETL tasks using PySpark, AWS Glue, and Athena, with data staged in Parquet format on S3
To succeed in this data engineering position, you should care about well-documented, testable code and data integrity. We have DevOps engineers who can help with AWS permissions.
We would like to build up a consistent data lake with staged, ready-to-use data, and to build up various scripts that will serve as blueprints for various additional data ingestion and transforms.
If you enjoy setting up something which many others will rely on, and have the relevant ETL expertise, we’d like to work with you.
Responsibilities
- Analyze and organize raw data
- Build data pipelines
- Prepare data for predictive modeling
- Explore ways to enhance data quality and reliability
- Potentially, collaborate with data scientists to support various experiments
Requirements
- Previous experience as a data engineer with the above technologies