Big Data Engineer + Spark
Roles and Responsibilities
About Multinational Company providing energy & Automation digital
Similar jobs
Lead Data Engineer
Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
Job responsibilities
· You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems
· You will partner with teammates to create complex data processing pipelines in order to solve our clients' most ambitious challenges
· You will collaborate with Data Scientists in order to design scalable implementations of their models
· You will pair to write clean and iterative code based on TDD
· Leverage various continuous delivery practices to deploy, support and operate data pipelines
· Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
· Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
· Create data models and speak to the tradeoffs of different modeling approaches
· On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product
· Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
· Assure effective collaboration between Thoughtworks' and the client's teams, encouraging open communication and advocating for shared outcomes
Job qualifications Technical skills
· You are equally happy coding and leading a team to implement a solution
· You have a track record of innovation and expertise in Data Engineering
· You're passionate about craftsmanship and have applied your expertise across a range of industries and organizations
· You have a deep understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
· You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
· Hands on experience in MapR, Cloudera, Hortonworks and/or cloud (AWS EMR, Azure HDInsights, Qubole etc.) based Hadoop distributions
· You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
· You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
· Working with data excites you: you have created Big data architecture, you can build and operate data pipelines, and maintain data storage, all within distributed systems
Professional skills
· Advocate your data engineering expertise to the broader tech community outside of Thoughtworks, speaking at conferences and acting as a mentor for more junior-level data engineers
· You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
· An interest in coaching others, sharing your experience and knowledge with teammates
· You enjoy influencing others and always advocate for technical excellence while being open to change when needed
Senior Data Scientist
Your goal: To improve the education process and improve the student experience through data.
The organization: Data Science for Learning Services Data Science and Machine Learning are core to Chegg. As a Student Hub, we want to ensure that students discover the full breadth of learning solutions we have to offer to get full value on their learning time with us. To create the most relevant and engaging interactions, we are solving a multitude of machine learning problems so that we can better model student behavior, link various types of content, optimize workflows, and provide a personalized experience.
The Role: Senior Data Scientist
As a Senior Data Scientist, you will focus on conducting research and development in NLP and ML. You will be responsible for writing production-quality code for data product solutions at Chegg. You will lead in identification and implementation of key projects to process data and knowledge discovery.
Responsibilities:
• Translate product requirements into AIML/NLP solutions
• Be able to think out of the box and be able to design novel solutions for the problem at hand
• Write production-quality code
• Be able to design data and annotation collection strategies
• Identify key evaluation metrics and release requirements for data products
• Integrate new data and design workflows
• Innovate, share, and educate team members and community
Requirements:
• Working experience in machine learning, NLP, recommendation systems, experimentation, or related fields, with a specialization in NLP • Working experience on large language models that cater to multiple tasks such as text generation, Q&A, summarization, translation etc is highly preferred
• Knowledge on MLOPs and deployment pipelines is a must
• Expertise on supervised, unsupervised and reinforcement ML algorithms.
• Strong programming skills in Python
• Top data wrangling skills using SQL or NOSQL queries
• Experience using containers to deploy real-time prediction services
• Passion for using technology to help students
• Excellent communication skills
• Good team player and a self-starter
• Outstanding analytical and problem-solving skills
• Experience working with ML pipeline products such as AWS Sagemaker, Google ML, or Databricks a plus.
Why do we exist?
Students are working harder than ever before to stabilize their future. Our recent research study called State of the Student shows that nearly 3 out of 4 students are working to support themselves through college and 1 in 3 students feel pressure to spend more than they can afford. We founded our business on provided affordable textbook rental options to address these issues. Since then, we’ve expanded our offerings to supplement many facets of higher educational learning through Chegg Study, Chegg Math, Chegg Writing, Chegg Internships, Thinkful Online Learning, and more, to support students beyond their college experience. These offerings lower financial concerns for students by modernizing their learning experience. We exist so students everywhere have a smarter, faster, more affordable way to student.
Video Shorts
Life at Chegg: https://jobs.chegg.com/Video-Shorts-Chegg-Services
Certified Great Place to Work!: http://reviews.greatplacetowork.com/chegg
Chegg India: http://www.cheggindia.com/
Chegg Israel: http://insider.geektime.co.il/organizations/chegg
Thinkful (a Chegg Online Learning Service): https://www.thinkful.com/about/#careers
Chegg out our culture and benefits!
http://www.chegg.com/jobs/benefits
https://www.youtube.com/watch?v=YYHnkwiD7Oo
Chegg is an equal-opportunity employer
- We are looking for : Data engineer
- Sprak
- Scala
- Hadoop
N.p - 15 days to 30 Days
Location : Bangalore / Noida
1. Working on supervised and unsupervised learning algorithms
2. Developing deep learning and machine learning algorithms
3. Working on live projects on data analytics
Work Days-
- Building and operationalizing large scale enterprise data solutions and applications using one or more of AZURE data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsights, Databricks, CosmosDB, EventHub/IOTHub.
- Experience in migrating on-premise data warehouses to data platforms on AZURE cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/Stream sets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.
Responsibilities Description:
Responsible for the development and implementation of machine learning algorithms and techniques to solve business problems and optimize member experiences. Primary duties may include are but not limited to: Design machine learning projects to address specific business problems determined by consultation with business partners. Work with data-sets of varying degrees of size and complexity including both structured and unstructured data. Piping and processing massive data-streams in distributed computing environments such as Hadoop to facilitate analysis. Implements batch and real-time model scoring to drive actions. Develops machine learning algorithms to build customized solutions that go beyond standard industry tools and lead to innovative solutions. Develop sophisticated visualization of analysis output for business users.
Experience Requirements:
BS/MA/MS/PhD in Statistics, Computer Science, Mathematics, Machine Learning, Econometrics, Physics, Biostatistics or related Quantitative disciplines. 2-4 years of experience in predictive analytics and advanced expertise with software such as Python, or any combination of education and experience which would provide an equivalent background. Experience in the healthcare sector. Experience in Deep Learning strongly preferred.
Required Technical Skill Set:
- Full cycle of building machine learning solutions,
o Understanding of wide range of algorithms and their corresponding problems to solve
o Data preparation and analysis
o Model training and validation
o Model application to the problem
- Experience using the full open source programming tools and utilities
- Experience in working in end-to-end data science project implementation.
- 2+ years of experience with development and deployment of Machine Learning applications
- 2+ years of experience with NLP approaches in a production setting
- Experience in building models using bagging and boosting algorithms
- Exposure/experience in building Deep Learning models for NLP/Computer Vision use cases preferred
- Ability to write efficient code with good understanding of core Data Structures/algorithms is critical
- Strong python skills following software engineering best practices
- Experience in using code versioning tools like GIT, bit bucket
- Experience in working in Agile projects
- Comfort & familiarity with SQL and Hadoop ecosystem of tools including spark
- Experience managing big data with efficient query program good to have
- Good to have experience in training ML models in tools like Sage Maker, Kubeflow etc.
- Good to have experience in frameworks to depict interpretability of models using libraries like Lime, Shap etc.
- Experience with Health care sector is preferred
- MS/M.Tech or PhD is a plus
We’re looking to hire someone to help scale Machine Learning and NLP efforts at Episource. You’ll work with the team that develops the models powering Episource’s product focused on NLP driven medical coding. Some of the problems include improving our ICD code recommendations , clinical named entity recognition and information extraction from clinical notes.
This is a role for highly technical machine learning & data engineers who combine outstanding oral and written communication skills, and the ability to code up prototypes and productionalize using a large range of tools, algorithms, and languages. Most importantly they need to have the ability to autonomously plan and organize their work assignments based on high-level team goals.
You will be responsible for setting an agenda to develop and ship machine learning models that positively impact the business, working with partners across the company including operations and engineering. You will use research results to shape strategy for the company, and help build a foundation of tools and practices used by quantitative staff across the company.
What you will achieve:
-
Define the research vision for data science, and oversee planning, staffing, and prioritization to make sure the team is advancing that roadmap
-
Invest in your team’s skills, tools, and processes to improve their velocity, including working with engineering counterparts to shape the roadmap for machine learning needs
-
Hire, retain, and develop talented and diverse staff through ownership of our data science hiring processes, brand, and functional leadership of data scientists
-
Evangelise machine learning and AI internally and externally, including attending conferences and being a thought leader in the space
-
Partner with the executive team and other business leaders to deliver cross-functional research work and models
Required Skills:
-
Strong background in classical machine learning and machine learning deployments is a must and preferably with 4-8 years of experience
-
Knowledge of deep learning & NLP
-
Hands-on experience in TensorFlow/PyTorch, Scikit-Learn, Python, Apache Spark & Big Data platforms to manipulate large-scale structured and unstructured datasets.
-
Experience with GPU computing is a plus.
-
Professional experience as a data science leader, setting the vision for how to most effectively use data in your organization. This could be through technical leadership with ownership over a research agenda, or developing a team as a personnel manager in a new area at a larger company.
-
Expert-level experience with a wide range of quantitative methods that can be applied to business problems.
-
Evidence you’ve successfully been able to scope, deliver and sell your own research in a way that shifts the agenda of a large organization.
-
Excellent written and verbal communication skills on quantitative topics for a variety of audiences: product managers, designers, engineers, and business leaders.
-
Fluent in data fundamentals: SQL, data manipulation using a procedural language, statistics, experimentation, and modeling
Qualifications
-
Professional experience as a data science leader, setting the vision for how to most effectively use data in your organization
-
Expert-level experience with machine learning that can be applied to business problems
-
Evidence you’ve successfully been able to scope, deliver and sell your own work in a way that shifts the agenda of a large organization
-
Fluent in data fundamentals: SQL, data manipulation using a procedural language, statistics, experimentation, and modeling
-
Degree in a field that has very applicable use of data science / statistics techniques (e.g. statistics, applied math, computer science, OR a science field with direct statistics application)
-
5+ years of industry experience in data science and machine learning, preferably at a software product company
-
3+ years of experience managing data science teams, incl. managing/grooming managers beneath you
-
3+ years of experience partnering with executive staff on data topics