Title: Data Engineer (Azure) (Location: Gurgaon/Hyderabad)
Salary: Competitive as per Industry Standard
We are expanding our Data Engineering Team and hiring passionate professionals with extensive
knowledge and experience in building and managing large enterprise data and analytics platforms. We
are looking for creative individuals with strong programming skills, who can understand complex
business and architectural problems and develop solutions. The individual will work closely with the rest
of our data engineering and data science team in implementing and managing Scalable Smart Data
Lakes, Data Ingestion Platforms, Machine Learning and NLP based Analytics Platforms, Hyper-Scale
Processing Clusters, Data Mining and Search Engines.
What You’ll Need:
- 3+ years of industry experience creating and managing end-to-end data solutions, optimal data processing pipelines, and architectures for large-volume big data sets of varied data types.
- Proficiency in Python, Linux and shell scripting.
- Strong knowledge of working with PySpark and pandas DataFrames for writing efficient pre-processing and other data manipulation tasks.
- Strong experience in developing the infrastructure required for data ingestion and for optimal extraction, transformation, and loading of data from a wide variety of sources, using tools like Azure Data Factory and Azure Databricks (or Jupyter notebooks, Google Colab, or other similar tools).
- Working knowledge of GitHub or other version control tools.
- Experience with creating RESTful web services and API platforms.
- Ability to work with data science and infrastructure team members to implement practical machine learning solutions and pipelines in production.
- Experience with cloud providers like Azure/AWS/GCP.
- Experience with SQL and NoSQL databases, e.g. MySQL, Azure Cosmos DB, HBase, MongoDB, Elasticsearch.
- Experience with stream-processing systems such as Spark Streaming and Kafka, and working experience with event-driven architectures.
- Strong analytical skills related to working with unstructured datasets.
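As an illustration of the PySpark/pandas pre-processing work the list above describes, here is a minimal pandas sketch. The column names, data, and cleaning rules are hypothetical, invented for illustration only:

```python
import pandas as pd

# Hypothetical raw records; in this role the data would typically arrive
# from a data lake via Azure Data Factory or PySpark, not an in-memory literal.
raw = pd.DataFrame({
    "user_id": [1, 1, 2, 2, None],
    "amount": ["10.5", "3.0", None, "7.25", "1.0"],
    "country": ["IN", "IN", "US", "US", "US"],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, fix dtypes, and aggregate spend per user/country."""
    clean = df.dropna(subset=["user_id", "amount"]).copy()
    clean["amount"] = clean["amount"].astype(float)
    clean["user_id"] = clean["user_id"].astype(int)
    return clean.groupby(["user_id", "country"], as_index=False)["amount"].sum()

summary = preprocess(raw)
```

The same dropna/cast/groupby pattern maps almost one-to-one onto PySpark DataFrame operations (`dropna`, `withColumn` + `cast`, `groupBy().agg()`), which is why pandas is a common prototyping step for Spark pipelines.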
Good to have:
- Experience with testing libraries such as pytest for writing unit tests for developed code.
- Knowledge of Machine Learning algorithms and libraries is good to have; implementation experience is an added advantage.
- Knowledge and experience of data lakes, Docker, and Kubernetes would be good to have.
- Knowledge of Azure Functions, Elasticsearch, etc. would be good to have.
- Experience with model versioning (e.g. MLflow) and data versioning would be beneficial.
- Experience with microservices libraries, or with Python libraries such as Flask, for hosting ML services and models would be great.
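To make the last point concrete, here is a minimal sketch of hosting a model behind a Flask endpoint. The scoring function, route, and payload shape are illustrative assumptions, not a prescribed design:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def score(features: list[float]) -> float:
    """Stand-in for a trained model: a fixed linear scorer with made-up weights."""
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [1.0, 2.0]}.
    payload = request.get_json(force=True)
    return jsonify({"prediction": score(payload["features"])})

# In practice the model would be loaded once at startup (e.g. from an MLflow
# registry) and the service containerized for deployment.
```

A request such as `POST /predict` with body `{"features": [1.0, 2.0]}` would return a JSON prediction; the same pattern extends to loading a pickled or MLflow-registered model in place of `score`.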
About Scry AI
Scry AI invents, designs, and develops cutting-edge technology-based Enterprise solutions powered by Machine Learning, Natural Language Processing, Big Data, and Computer Vision.
Scry AI is an R&D organization leading innovation in business automation technology and has been helping companies and businesses transform how they work.
Catering to core industries like Fintech, Healthcare, Communication, Mobility, and Smart Cities, Scry has invested heavily in R&D to build cutting-edge product suites that address challenges and roadblocks that plague traditional business environments.
Who You Are:
- In-depth and strong knowledge of SQL.
- Basic knowledge of Java.
- Basic scripting knowledge.
- Strong analytical skills.
- Excellent debugging and problem-solving skills.
What You’ll Do:
- Comfortable working across EST and IST time zones.
- Troubleshoot complex issues discovered in-house as well as in customer environments.
- Replicate customer environments/issues on the platform and data, and work to identify the root cause or provide an interim workaround as needed.
- Debug SQL queries associated with data pipelines.
- Monitor and debug ETL jobs on a daily basis.
- Provide Technical Action plans to take a customer/product issue from start to resolution.
- Capture and document any data incidents identified on the platform, and maintain a history of such issues along with their resolutions.
- Identify product bugs and improvements based on customer environments and work to close them.
- Ensure implementation/continuous improvement of formal processes to support product development activities.
- Communicate effectively, both externally and internally, across stakeholders.
Data Scientist – Program Embedded
Job Description:
We are seeking a highly skilled and motivated senior data scientist to support a big data program. The successful candidate will play a pivotal role in supporting multiple projects in this program, covering traditional tasks from revenue management, demand forecasting, and improving customer experience, to testing and using new tools/platforms such as Copilot and Fabric for different purposes. The expected candidate will have deep expertise in machine learning methodology and applications, and should have completed multiple large-scale data science projects (full cycle, from ideation to BAU). Beyond technical expertise, problem solving in complex set-ups will be key to success in this role. This is a data science role embedded directly into the program/projects, so stakeholder management and collaboration with partners are crucial to success in this role (on top of the deep expertise).
What we are looking for:
- Highly proficient in Python/PySpark/R.
- Understanding of MLOps concepts and working experience in product industrialization (from a data science point of view), including building products for live deployment with continuous integration and continuous development.
- Familiarity with cloud platforms such as Azure and GCP and the data management systems on such platforms; familiarity with Databricks and product deployment on Databricks.
- Experience in ML projects involving techniques such as regression, time series, clustering, classification, dimension reduction, and anomaly detection, with both traditional ML and DL approaches.
- Solid background in statistics: probability distributions, A/B testing validation, univariate/multivariate analysis, hypothesis testing for different purposes, data augmentation, etc.
- Familiarity with designing testing frameworks for different modelling practices/projects based on business needs.
- Exposure to Gen AI tools, enthusiasm for experimenting, and new ideas on what can be done.
- Experience improving an internal company process using an AI tool would be great (e.g. process simplification, manual task automation, automated emails).
- Ideally, 10+ years of experience, including independent, business-facing roles.
- Experience in CPG or retail as a data scientist would be nice, but is not the top priority, especially for candidates who have navigated multiple industries.
- Being proactive and collaborative is essential.
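The hypothesis-testing and A/B-testing validation skills listed above can be illustrated with a minimal two-sample test. The metric values below are made up purely for the sketch:

```python
from scipy import stats

# Hypothetical A/B test: a conversion-like metric for control vs. variant,
# eight observations per arm (illustrative numbers, not real data).
control = [0.52, 0.48, 0.50, 0.49, 0.51, 0.50, 0.47, 0.53]
variant = [0.60, 0.62, 0.58, 0.61, 0.59, 0.63, 0.60, 0.57]

# Two-sample t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(control, variant)
significant = p_value < 0.05
```

In a real project the test choice (t-test vs. non-parametric alternatives), sample-size calculation, and multiple-comparison corrections would all follow from the experiment design rather than being fixed up front.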
Some project examples within the program:
- Test new tools/platforms such as Copilot and Fabric for commercial reporting: testing, validation, and building trust.
- Build algorithms for predicting trends in categories and consumption to support dashboards.
- Revenue Growth Management: create/understand the algorithms behind the tools (which can be built by third parties) that we need to maintain or choose to improve; prioritize and build the product roadmap; design new solutions and articulate/quantify their limitations.
- Demand forecasting: create localized forecasts to improve in-store availability, with proper model monitoring for early detection of potential issues in the forecast, focusing particularly on improving the end-user experience.
- Work closely with different Front Office and Support Function stakeholders, including but not restricted to Business Management, Accounts, Regulatory Reporting, Operations, Risk, Compliance, and HR, on all data collection and reporting use cases.
- Collaborate with Business and Technology teams to understand enterprise data, and create an innovative narrative to explain, engage, and enlighten regular staff members as well as executive leadership through data-driven storytelling.
- Solve data consumption and visualization through a data-as-a-service distribution model.
- Articulate findings clearly and concisely for different target use cases, including through presentations, design solutions, and visualizations.
- Perform ad hoc / automated report generation tasks using Power BI, Oracle BI, and Informatica.
- Perform data access/transfer and ETL automation tasks using Python, SQL, OLAP/OLTP, RESTful APIs, and IT tools (CFT, MQ-Series, Control-M, etc.).
- Provide support and maintain the availability of BI applications irrespective of the hosting location.
- Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability, and provide incident-related communications promptly.
- Work with strict deadlines on high-priority regulatory reports.
- Serve as a liaison between business and technology to ensure that data-related business requirements for protecting sensitive data are clearly defined, communicated, well understood, and considered as part of operational prioritization and planning.
- Work for the APAC Chief Data Office and coordinate with a fully decentralized team across different locations in APAC and the global HQ (Paris).
General Skills:
- Excellent knowledge of RDBMS and hands-on experience with complex SQL is a must; some experience in NoSQL and big data technologies like Hive and Spark would be a plus.
- Experience with industrialized reporting on BI tools like Power BI and Informatica.
- Knowledge of data-related industry best practices in the highly regulated CIB industry; experience with regulatory report generation for financial institutions.
- Knowledge of industry-leading data access, data security, Master Data and Reference Data Management, and establishing data lineage.
- 5+ years of experience in Data Visualization / Business Intelligence / ETL developer roles.
- Ability to multi-task and manage various projects simultaneously.
- Attention to detail.
- Ability to present to Senior Management and ExCo; excellent written and verbal communication skills.
PLSQL Developer
Experience: 4 to 6 years
Skills: MS SQL Server and Oracle; AWS or Azure
• Experience in setting up RDS service in cloud technologies such as AWS or Azure
• Strong proficiency with SQL and its variation among popular databases
• Should be well-versed in writing stored procedures, functions, and packages, and in using collections.
• Skilled at optimizing large, complicated SQL statements.
• Should have worked in migration projects.
• Should have worked on creating reports.
• Should be able to distinguish between normalized and de-normalized data modelling designs and use cases.
• Knowledge of best practices when dealing with relational databases
• Capable of troubleshooting common database issues
• Familiar with tools that can aid with profiling server resource usage and optimizing it.
• Proficient understanding of code versioning tools such as Git and SVN
Senior Data Scientist
Your goal: To improve the education process and improve the student experience through data.
The organization: Data Science for Learning Services Data Science and Machine Learning are core to Chegg. As a Student Hub, we want to ensure that students discover the full breadth of learning solutions we have to offer to get full value on their learning time with us. To create the most relevant and engaging interactions, we are solving a multitude of machine learning problems so that we can better model student behavior, link various types of content, optimize workflows, and provide a personalized experience.
The Role: Senior Data Scientist
As a Senior Data Scientist, you will focus on conducting research and development in NLP and ML. You will be responsible for writing production-quality code for data product solutions at Chegg. You will lead the identification and implementation of key projects in data processing and knowledge discovery.
Responsibilities:
• Translate product requirements into AIML/NLP solutions
• Think outside the box and design novel solutions for the problem at hand
• Write production-quality code
• Design data and annotation collection strategies
• Identify key evaluation metrics and release requirements for data products
• Integrate new data and design workflows
• Innovate, share, and educate team members and community
Requirements:
• Working experience in machine learning, NLP, recommendation systems, experimentation, or related fields, with a specialization in NLP
• Working experience with large language models that cater to multiple tasks such as text generation, Q&A, summarization, translation, etc. is highly preferred
• Knowledge of MLOps and deployment pipelines is a must
• Expertise in supervised, unsupervised, and reinforcement ML algorithms
• Strong programming skills in Python
• Strong data wrangling skills using SQL or NoSQL queries
• Experience using containers to deploy real-time prediction services
• Passion for using technology to help students
• Excellent communication skills
• Good team player and a self-starter
• Outstanding analytical and problem-solving skills
• Experience working with ML pipeline products such as AWS SageMaker, Google ML, or Databricks is a plus.
Why do we exist?
Students are working harder than ever before to stabilize their future. Our recent research study called State of the Student shows that nearly 3 out of 4 students are working to support themselves through college and 1 in 3 students feel pressure to spend more than they can afford. We founded our business on providing affordable textbook rental options to address these issues. Since then, we’ve expanded our offerings to supplement many facets of higher educational learning through Chegg Study, Chegg Math, Chegg Writing, Chegg Internships, Thinkful Online Learning, and more, to support students beyond their college experience. These offerings lower financial concerns for students by modernizing their learning experience. We exist so students everywhere have a smarter, faster, more affordable way to student.
Video Shorts
Life at Chegg: https://jobs.chegg.com/Video-Shorts-Chegg-Services
Certified Great Place to Work!: http://reviews.greatplacetowork.com/chegg
Chegg India: http://www.cheggindia.com/
Chegg Israel: http://insider.geektime.co.il/organizations/chegg
Thinkful (a Chegg Online Learning Service): https://www.thinkful.com/about/#careers
Chegg out our culture and benefits!
http://www.chegg.com/jobs/benefits
https://www.youtube.com/watch?v=YYHnkwiD7Oo
Chegg is an equal-opportunity employer.
Job Responsibilities:
- Identify valuable data sources and automate collection processes
- Undertake preprocessing of structured and unstructured data.
- Analyze large amounts of information to discover trends and patterns
- Help develop reports and analyses.
- Present information using data visualization techniques.
- Assess tests and implement new or upgraded software, and assist with strategic decisions on new systems.
- Evaluate changes and updates to source production systems.
- Develop, implement, and maintain leading-edge analytic systems, taking complicated problems and building simple frameworks
- Provide technical expertise in data storage structures, data mining, and data cleansing.
- Propose solutions and strategies to business challenges
Desired Skills and Experience:
- At least 1 year of experience in Data Analysis
- Solid understanding of Operations Research, Data Modelling, ML, and AI concepts.
- Knowledge of Python is mandatory; familiarity with MySQL, SQL, Scala, Java, or C++ is an asset
- Experience using visualization tools (e.g. Jupyter Notebook) and data frameworks (e.g. Hadoop)
- Analytical mind and business acumen
- Strong math skills (e.g. statistics, algebra)
- Problem-solving aptitude
- Excellent communication and presentation skills.
- Bachelor’s / Master's Degree in Computer Science, Engineering, Data Science or other quantitative or relevant field is preferred
Data Engineer
- Highly skilled and proficient in Azure data engineering tech stacks (ADF, Databricks).
- Well experienced in the design and development of big data integration platforms (Kafka, Hadoop).
- Highly skilled and experienced in building medium to complex data integration pipelines for data at rest and streaming data using Spark.
- Strong knowledge of R/Python.
- Advanced proficiency in solution design and implementation with Azure Data Lake and SQL and NoSQL databases.
- Strong in data warehousing concepts.
- Expertise in SQL, SQL tuning, data management (data security), schema design, Python, and ETL processes.
- Highly motivated, a self-starter, and a quick learner.
- Good knowledge of data modelling and understanding of data analytics.
- Exposure to statistical procedures, experiments, and machine learning techniques is an added advantage.
- Experience in leading a small team of 6-7 data engineers.
- Excellent written and verbal communication skills.
Work shift: Daytime
- Strong problem-solving skills with an emphasis on product development.
- Ability to draw insights from large data sets.
• Experience in building ML pipelines with Apache Spark, Python
• Proficiency in implementing the end-to-end data science life cycle
• Experience in Model fine-tuning and advanced grid search techniques
• Experience working with and creating data architectures.
• Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages/drawbacks.
• Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and their proper usage, etc.) and experience with applications.
• Excellent written and verbal communication skills for coordinating across teams.
• A drive to learn and master new technologies and techniques.
• Assess the effectiveness and accuracy of new data sources and data gathering techniques.
• Develop custom data models and algorithms to apply to data sets.
• Use predictive modeling to increase and optimize customer experiences, revenue generation, ad targeting, and other business outcomes.
• Develop company A/B testing framework and test model quality.
• Coordinate with different functional teams to implement models and monitor outcomes.
• Develop processes and tools to monitor and analyze model performance and data accuracy.
Key skills:
● Strong knowledge in Data Science pipelines with Python
● Object-oriented programming
● A/B testing framework and model fine-tuning
● Proficiency in using the scikit-learn, NumPy, and pandas packages in Python
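The key skills above (scikit-learn pipelines, model fine-tuning, grid search) can be sketched in a few lines. The toy dataset and parameter grid here are illustrative assumptions, not a prescribed setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy, linearly separable data standing in for real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# A Pipeline keeps scaling inside cross-validation folds, avoiding leakage.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])

# Grid search over the regularization strength (hypothetical grid).
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
```

Putting preprocessing and model into one `Pipeline` object is what makes the grid search honest: each fold re-fits the scaler only on its training split.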
Nice to have:
● Ability to work with containerized solutions: Docker/Compose/Swarm/Kubernetes
● Unit testing, Test-driven development practice
● DevOps, Continuous integration/ continuous deployment experience
● Agile development environment experience, familiarity with SCRUM
● Deep learning knowledge
JD for IoT DE:
The role requires experience in core Azure technologies: IoT Hub / Event Hub, Stream Analytics, IoT Central, Azure Data Lake Storage, Azure Cosmos DB, Azure Data Factory, Azure SQL Database, Azure HDInsight / Databricks, and SQL Data Warehouse.
You Have:
- Minimum 2 years of software development experience
- Minimum 2 years of experience in IoT/streaming data pipelines solution development
- Bachelor's and/or Master’s degree in Computer Science
- Strong Consulting skills in data management including data governance, data quality, security, data integration, processing, and provisioning
- Delivered data management projects with real-time/near real-time data insights delivery on Azure Cloud
- Translated complex analytical requirements into the technical design including data models, ETLs, and Dashboards / Reports
- Experience deploying dashboards and self-service analytics solutions on both relational and non-relational databases
- Experience with different computing paradigms in databases such as In-Memory, Distributed, Massively Parallel Processing
- Successfully delivered large-scale IoT data management initiatives covering Plan, Design, Build, and Deploy phases, leveraging different delivery methodologies including Agile
- Experience in handling telemetry data with Spark Streaming, Kafka, Flink, Scala, PySpark, and Spark SQL
- Hands-on experience with containers and Docker
- Exposure to streaming protocols like MQTT and AMQP
- Knowledge of OT network protocols like OPC UA, CAN Bus, and similar protocols
- Strong knowledge of continuous integration, static code analysis, and test-driven development
- Experience in delivering projects in a highly collaborative delivery model with teams at onsite and offshore
- Must have excellent analytical and problem-solving skills
- Delivered change management initiatives focused on driving data platforms adoption across the enterprise
- Strong verbal and written communications skills are a must, as well as the ability to work effectively across internal and external organizations
Roles & Responsibilities
You Will:
- Translate functional requirements into technical design
- Interact with clients and internal stakeholders to understand the data and platform requirements in detail and determine core Azure services needed to fulfill the technical design
- Design, Develop and Deliver data integration interfaces in ADF and Azure Databricks
- Design, Develop and Deliver data provisioning interfaces to fulfill consumption needs
- Deliver data models on the Azure platform, whether on Azure Cosmos DB, SQL DW / Synapse, or SQL
- Advise clients on ML Engineering and deploying ML Ops at Scale on AKS
- Automate core activities to minimize the delivery lead times and improve the overall quality
- Optimize platform cost by selecting the right platform services and architecting the solution in a cost-effective manner
- Deploy Azure DevOps and CI/CD processes
- Deploy logging and monitoring across the different integration points for critical alerts