We are actively seeking a Senior Data Engineer experienced in building data pipelines and integrations from third-party data sources by writing custom, automated ETL jobs in Python. The role will work in partnership with other members of the Business Analytics team to support the development and implementation of new and existing data warehouse solutions for our clients. This includes designing the database import/export processes used to generate client data warehouse deliverables.
- 2+ years of experience as an ETL developer, with strong data architecture knowledge of data warehousing concepts, SQL development and optimization, and operational support models.
- Experience using Python to automate ETL and data-processing jobs.
- Design and develop ETL and data processing solutions using data integration tools, Python scripts, and AWS, Azure, or on-premises environments.
- Experience with, or willingness to learn, AWS Glue, AWS Data Pipeline, or Azure Data Factory for data integration.
- Develop transformation queries, views, and stored procedures for ETL processes and process automation.
- Document data mappings, data dictionaries, processes, programs, and solutions according to established data governance standards.
- Work with the data analytics team to assess and troubleshoot potential data quality issues at key intake points (for example, by validating control totals at intake and again after transformation), and transparently build lessons learned into future data quality assessments.
- Solid experience with data modeling, business logic, and RESTful APIs.
- Solid experience in the Linux environment.
- Experience with NoSQL and PostgreSQL databases preferred.
- Experience working with databases such as MySQL, PostgreSQL, and NoSQL stores, and with enterprise-level connectivity (such as connecting over TLS and through proxies).
- Experience with NGINX and SSL/TLS.
- Performance-tune data processes and SQL queries, and recommend and implement process optimization and query-tuning techniques.
About NeenOpal Intelligent Solutions Private Limited
Experience in data cleansing and preparation concepts and tools.
Hands-on NLP experience
Programming languages: hands-on experience with Python and Java.
Strong background in relational databases, data modelling and
Experience working with statistical libraries (e.g., scikit-learn) and frameworks for predictive analytics.
Knowledge of Hadoop and UNIX will be a plus.
Expertise in probability, statistics, and time-series analysis, as well as experience with statistical and machine learning methods such as linear regression, correlation analysis, and significance testing.
Experience in data visualization concepts and tools
Knowledge of Kibana and/or other reporting tools
Knowledge of data quality controls
Desired Skills and Experience
1. Working knowledge of GCP (Cloud Storage, Cloud Functions, Firestore, Airflow DAGs/Cloud Composer, Python, Apache Beam, BigQuery)
2. Knowledge of Google BigQuery and dbt
3. Python scripting knowledge for data engineering is good to have
4. Terraform knowledge will be an added advantage
5. Knowledge of data warehousing is a must
6. Data analysis; knowledge of Teradata (BTEQ, MLoad)
7. ETL or ELT processes
8. Building CI/CD pipelines, containerization, etc.
9. Agile ways of working
Team Lead and Process:
1. Ensuring adherence to schedule and quality of activities related to design, build, testing and implementation of deliverables.
2. Participate in requirements elicitation, architecture validation, and design creation and review.
3. Support the team by providing pseudocode, coordinating with architects to resolve blockers, and assigning and reviewing tasks to ensure quality and timelines are met.
Skills: Machine Learning, Deep Learning, Artificial Intelligence, Python.
Domain knowledge: Data cleaning, modelling, analytics, statistics, machine learning, AI
· To be part of Digital Manufacturing and Industrie 4.0 projects across the Saint-Gobain group of companies
· Design and develop AI/ML models to be deployed across SG factories
· Knowledge on Hadoop, Apache Spark, MapReduce, Scala, Python programming, SQL and NoSQL databases is required
· Should be strong in statistics, data analysis, data modelling, machine learning techniques, and neural networks
· Prior experience in developing AI and ML models is required
· Experience with data from the Manufacturing Industry would be a plus
Roles and Responsibilities:
· Develop AI and ML models for the Manufacturing Industry with a focus on Energy, Asset Performance Optimization and Logistics
· Ability to multitask and good communication skills are necessary
· Entrepreneurial attitude.
The Sr. Data Scientist will be based in Pune, India (or an alternative location) and will work closely with our Analytics teams in New York City, India, and Bosnia. The role will be part of our Clinical Insights line of analytics, supporting internal and external business partners in generating analyses and insights for the Outcomes product (measurement of campaign outcomes/script lift), as well as the broader Deep Intent product suite. Activities in this position include conducting exploratory data analysis and discovery; creating and scoring audiences; reading campaign results by analyzing medical claims, clinical, demographic, and clickstream data; performing analysis and creating actionable insights; and summarizing and presenting results and recommended actions to internal stakeholders and external clients, as needed. This role will report directly to the Sr. Director of Outcomes Insights.
- Time-series modeling and forecasting
- Predictive modeling (e.g., XGBoost, deep learning) on large datasets
- Building data ingestion pipelines and transforming data into metrics useful for analytics and modeling
- Hypothesis Testing, Experimental Design & A/B Testing
- Writing production-level code in Python and SQL (BigQuery/Spark), with Git experience
- Support business development and client analytics and insights process, under supervision of the director / sr. data scientist, utilizing consumer demographic, clickstream and clinical data (claims and medications)
- Core activities to include: Campaign audience sizing estimates, generating lookalike & campaign audiences, generating standardized reporting deliverables on media performance, and packaging insights into relevant client stories
- Extract, explore, visualize and analyze large healthcare claims data, consumer demographic, prospecting and clickstream data using SQL, Python or R libraries.
- Generate scripts for audience creation using SQL, Python / R and API call infrastructure.
- Understand the objectives of client campaigns, including audience selection (diagnostics), creative, and channel.
- Support internal product development of data tools, dashboards and forecasts, as needed.
- You have a working understanding of the ad-tech / digital marketing and advertising data and campaigns, and interest (and aptitude) for learning US healthcare patient and provider systems (e.g. medical claims, medications etc.).
- Desire to work in a rapidly growing and scaling startup, with a strong culture of fast-paced cross functional collaboration.
- Hands-on predictive modeling experience (decision trees, boosting algorithms and regression models).
- Orientation and interest in translating complex quantitative results into meaningful findings and interpretable deliverables, and in communicating them to less technical audiences.
- Hypothesis-oriented curiosity and tenacity in obtaining meaningful results through iterative data analysis and data prep.
- “Can do” attitude, outstanding technical troubleshooting and problem-solving abilities, aptitude to rapidly develop working knowledge of new tools, open source libraries, data sources etc.
- Ability to meet deadlines and flexibility to work constructively with shifting priorities.
- You have strong communication & presentation skills, backed by strong critical thinking.
- Bachelor’s degree in a STEM field, such as Statistics, Mathematics, Engineering, Biostatistics, Econometrics, Economics, Finance, or Data Science.
- Minimum of 5 years of working experience as a Data Analyst, Engineer, Data Scientist, or Researcher in digital marketing, consumer advertising, telecom, healthcare, or other areas requiring customer-level predictive analytics.
- Proficiency in performing statistical analysis in R or Python, including relevant libraries is required. Prior experience in using these tools in analytical R&D strongly preferred.
- Advanced ability to use relevant technology/software to wrangle data, perform analytics, and visualize for consumption is required.
- Experience with SQL is required.
- Advanced experience with the Microsoft Office suite (Excel, PowerPoint) is required.
- Familiarity with medical and healthcare data preferred (medical claims, Rx, etc.).
- Experience with cloud technologies such as AWS or Google Cloud is required.
- Exposure to big data tools (Hadoop, PySpark) is preferred.
- Experience with Git/version control and Jira/ticketing system is strongly preferred.
- Experience with a visualization tool such as Looker and/or Tableau is preferred.
Hypersonix.ai is seeking a Data Evangelist who can work closely with customers to understand the data sources, acquire data and drive product success by delivering insights based on customer needs.
Primary Responsibilities:
- Lead and deliver complete application lifecycle design, development, deployment, and support for actionable BI and Advanced Analytics solutions
- Design and develop data models and ETL processes for structured and unstructured data distributed across multiple cloud platforms
- Develop and deliver solutions with data streaming capabilities for large volumes of data
- Design, code and maintain parts of the product and drive customer adoption
- Build data acquisition strategy to onboard customer data with speed and accuracy
- Work both independently and with team members to develop, refine, implement, and scale ETL processes
- Ongoing support and maintenance of live clients' data and analytics needs
- Defining the data automation architecture to drive self-service data load capabilities
Required Qualifications:
- Bachelor's, Master's, or Ph.D. in Computer Science, Information Systems, Data Science, Artificial Intelligence, Machine Learning, or related disciplines
- 10+ years of experience guiding the development and implementation of Data architecture in structured, unstructured, and semi-structured data environments.
- Highly proficient in Big Data, data architecture, data modeling, data warehousing, data wrangling, data integration, data testing and application performance tuning
- Experience with data engineering tools and platforms such as Kafka, Spark, Databricks, Flink, Storm, Druid and Hadoop
- Strong with hands-on programming and scripting for the Big Data ecosystem (Python, Scala, Spark, etc.)
- Experience building batch and streaming ETL data pipelines using workflow management tools like Airflow, Luigi, NiFi, Talend, etc.
- Familiarity with cloud-based platforms like AWS, Azure or GCP
- Experience with cloud data warehouses like Redshift and Snowflake
- Proficient in writing complex SQL queries.
- Excellent communication skills and prior experience working closely with customers
- Data-savvy, with a love of understanding large-scale data trends and an obsession with data analysis
- Desire to learn about, explore, and invent new tools for solving real-world problems using data
Desired Qualifications:
- Cloud computing experience, Amazon Web Services (AWS)
- Prior experience in Data Warehousing concepts, multi-dimensional data models
- Full command of Analytics concepts including Dimension, KPI, Reports & Dashboards
- Prior experience in managing client implementation of Analytics projects
- Knowledge and prior experience of using machine learning tools
Expertise in handling large amounts of data through Python or PySpark
Conduct data assessments, perform data quality checks, and transform data using SQL and ETL tools
Experience deploying ETL/data pipelines and workflows on cloud technologies and architectures such as Azure and Amazon Web Services will be valued
Comfort with data modelling principles (e.g., database structure, entity relationships, UIDs, etc.) and software development principles (e.g., modularization, testing, refactoring, etc.)
A thoughtful and comfortable communicator (verbal and written) with the ability to facilitate discussions and conduct training
Track record of strong problem-solving, requirement gathering, and leading by example
Ability to thrive in a flexible and collaborative environment
Track record of completing projects successfully on time, within budget and as per scope
SQL development for our Enterprise Resource Planning (ERP) product offered to SMEs: regular creation, modification, and validation (with testing) of stored procedures, views, and functions on MS SQL Server.
Responsibilities and Duties
Understanding the ERP software and its use cases.
Regular creation, modification, and testing of:
- Stored Procedures
- Nested Queries
- Table and Schema Designs
Qualifications and Skills
- Procedural Language
- Design, create, test, and maintain data pipeline architecture in collaboration with the Data Architect.
- Build the infrastructure required for extraction, transformation, and loading of data from a wide variety of data sources using Java, SQL, and Big Data technologies.
- Support the translation of data needs into technical system requirements. Support in building complex queries required by the product teams.
- Build data pipelines that clean, transform, and aggregate data from disparate sources
- Develop, maintain and optimize ETLs to increase data accuracy, data stability, data availability, and pipeline performance.
- Engage with Product Management and Business to deploy and monitor products/services on cloud platforms.
- Stay up to date with advances in data persistence and big data technologies, and run pilots to design a data architecture that scales with the growing consumer-experience data sets.
- Handle data integration, consolidation, and reconciliation activities for digital consumer / medical products.
- Bachelor’s or master's degree in Computer Science, Information Management, Statistics, or a related field
- 5+ years of experience in the Consumer or Healthcare industry in an analytical role, with a focus on building data pipelines, querying and analyzing data, and clearly presenting analyses to members of the data science team.
- Technical expertise with data models and data mining.
- Hands-on knowledge of programming languages such as Java, Python, R, and Scala.
- Strong knowledge of big data tools such as Snowflake, AWS Redshift, Hadoop, MapReduce, etc.
- Knowledge of tools such as AWS Glue, S3, and AWS EMR, and of streaming data pipelines (Kafka/Kinesis), is desirable.
- Hands-on knowledge of SQL and NoSQL database design.
- Knowledge of CI/CD for building and hosting solutions.
- AWS certification is an added advantage.
- Strong knowledge of visualization tools such as Tableau or QlikView is an added advantage.
- A team player capable of working and integrating across cross-functional teams for implementing project requirements. Experience in technical requirements gathering and documentation.
- Ability to work effectively and independently in a fast-paced agile environment with tight deadlines
- A flexible, pragmatic, and collaborative team player with the innate ability to engage with data architects, analysts, and scientists
- Managing and designing the reporting environment, including data sources, security, and metadata.
- Preparing reports for executive leadership that effectively communicate trends, patterns, and predictions using relevant data
- Establish KPIs to measure the effectiveness of business decisions.
- Work with management to prioritize business and information needs.
- Provide data solutions, tools, and capabilities to enable self-service frameworks for data consumers
- Provide expertise in translating business needs into designs, and develop tools, techniques, metrics, and dashboards for insights and data visualization.
- Responsible for developing and executing tools to monitor and report on data quality.
- Responsible for promoting appreciation of and adherence to the principles of data quality management, including metadata, lineage, and business definitions.
- Provide support to Tech teams in managing security mechanisms and data access governance
- Provide technical support, mentoring, and training to less senior analysts.
- Derive insights through A/B tests, funnel analysis, and user segmentation
- 3+ years in a data analyst position, preferably working as a Data Analyst in a fast-paced and dynamic business setting.
- Strong skills in SQL-based query languages (MySQL, PostgreSQL) and Excel, with the ability to learn other analytic tools.
- Skilled in statistical and econometric modeling, quantitative analysis, and data mining and analysis techniques.
- This role requires a mixture of data schema knowledge and technical writing activities, paired with hands-on, collaborative work with Systems Analysts. Technical exposure through requirements, QA, or development software lifecycles is also a plus.
- Demonstrated analytical skills. Ability to work with large amounts of data: facts, figures, and number crunching. Ability to see through the data and analyze it to find conclusions.
- Excellent attention to detail. Data needs to be precise. Conclusions drawn from data analysis will drive critical client decisions.
- Domain knowledge in the Internet of Things is a plus.
- Managing a junior team of analysts. Exceptional written and verbal communication skills are crucial for performing job duties and managing others.
- B.E./B.Tech./M.E./M.Tech. from any recognized university in India.
- Minimum 60% in Graduation or Post-Graduation.
- SQL knowledge and hands-on experience is a must.
- Great interpersonal and communication skills
Pipelines should be optimised to handle real-time data, batch-update data, and historical data.
Establish scalable, efficient, automated processes for complex, large scale data analysis.
Write high-quality code to gather and manage large data sets (both real-time and batch data) from multiple sources, perform ETL, and store the results in a data warehouse.
Manipulate and analyse complex, high-volume, high-dimensional data from varying sources using a variety of tools and data analysis techniques.
Participate in data pipeline health monitoring and performance optimisation, as well as quality documentation.
Interact with end users/clients and translate business language into technical requirements.
Act independently to expose and resolve problems.
Job Requirements:
2+ years of experience working in software development and data pipeline development for enterprise analytics.
2+ years of experience working with Python, with exposure to various warehousing tools.
In-depth experience with at least one commercial tool such as AWS Glue, Talend, Informatica, DataStage, etc.
Experience with various relational databases such as MySQL, MS SQL Server, Oracle, etc. is a must.
Experience with analytics and reporting tools (Tableau, Power BI, SSRS, SSAS).
Experience with DevOps practices that help clients deploy and scale systems as required.
Strong verbal and written communication skills for working with other developers and business clients.
Knowledge of Logistics and/or Transportation Domain is a plus.
Hands-on experience with traditional databases and ERP systems such as Sybase and PeopleSoft.