11+ Data mining Jobs in Pune | Data mining Job openings in Pune
Apply to 11+ Data mining Jobs in Pune on CutShort.io. Explore the latest Data mining Job opportunities across top companies like Google, Amazon & Adobe.
4-6 years of total experience in data warehousing and business intelligence
3+ years of solid Power BI experience (Power Query, M-Query, DAX, Aggregates)
2 years’ experience building Power BI using cloud data (Snowflake, Azure Synapse, SQL DB, data lake)
Strong experience building visually appealing UI/UX in Power BI
Understand how to design Power BI solutions for performance (composite models, incremental refresh, analysis services)
Experience building Power BI using large data in direct query mode
Expert SQL background (query building, stored procedure, optimizing performance)
at DeepIntent
Who We Are:
DeepIntent is leading the healthcare advertising industry with data-driven solutions built for the future. From day one, our mission has been to improve patient outcomes through the artful use of advertising, data science, and real-world clinical data.
What You’ll Do:
We are looking for a Senior Software Engineer based in Pune, India who can master both DeepIntent’s data architectures and pharma research and analytics methodologies to make significant contributions to how health media is analyzed by our clients. This role requires an Engineer who not only understands DBA functions but also how they impact research objectives and can work with researchers and data scientists to achieve impactful results.
This role will be in the Analytics Organization and will require integration and partnership with the Engineering Organization. The ideal candidate is a self-starter who is inquisitive who is not afraid to take on and learn from challenges and will constantly seek to improve the facets of the business they manage. The ideal candidate will also need to demonstrate the ability to collaborate and partner with others.
- Serve as the Engineering interface between Analytics and Engineering teams
- Develop and standardized all interface points for analysts to retrieve and analyze data with a focus on research methodologies and data based decisioning
- Optimize queries and data access efficiencies, serve as expert in how to most efficiently attain desired data points
- Build “mastered” versions of the data for Analytics specific querying use cases
- Help with data ETL, table performance optimization
- Establish formal data practice for the Analytics practice in conjunction with rest of DeepIntent
- Build & operate scalable and robust data architectures
- Interpret analytics methodology requirements and apply to data architecture to create standardized queries and operations for use by analytics teams
- Implement DataOps practices
- Master existing and new Data Pipelines and develop appropriate queries to meet analytics specific objectives
- Collaborate with various business stakeholders, software engineers, machine learning engineers, analysts
- Operate between Engineers and Analysts to unify both practices for analytics insight creation
Who You Are:
- Adept in market research methodologies and using data to deliver representative insights
- Inquisitive, curious, understands how to query complicated data sets, move and combine data between databases
- Deep SQL experience is a must
- Exceptional communication skills with ability to collaborate and translate with between technical and non technical needs
- English Language Fluency and proven success working with teams in the U.S.
- Experience in designing, developing and operating configurable Data pipelines serving high volume and velocity data
- Experience working with public clouds like GCP/AWS
- Good understanding of software engineering, DataOps, and data architecture, Agile and DevOps methodologies
- Experience building Data architectures that optimize performance and cost, whether the components are prepackaged or homegrown
- Proficient with SQL,Python or JVM based language, Bash
- Experience with any of Apache open source projects such as Spark, Druid, Beam, Airflow etc.and big data databases like BigQuery, Clickhouse, etc
- Ability to think big, take bets and innovate, dive deep, hire and develop the best talent, learn and be curious
- Comfortable to work in EST Time Zone
Requirements
XpressBees – a logistics company started in 2015 – is amongst the fastest growing
companies of its sector. While we started off rather humbly in the space of
ecommerce B2C logistics, the last 5 years have seen us steadily progress towards
expanding our presence. Our vision to evolve into a strong full-service logistics
organization reflects itself in our new lines of business like 3PL, B2B Xpress and cross
border operations. Our strong domain expertise and constant focus on meaningful
innovation have helped us rapidly evolve as the most trusted logistics partner of
India. We have progressively carved our way towards best-in-class technology
platforms, an extensive network reach, and a seamless last mile management
system. While on this aggressive growth path, we seek to become the one-stop-shop
for end-to-end logistics solutions. Our big focus areas for the very near future
include strengthening our presence as service providers of choice and leveraging the
power of technology to improve efficiencies for our clients.
Job Profile
As a Lead Data Engineer in the Data Platform Team at XpressBees, you will build the data platform
and infrastructure to support high quality and agile decision-making in our supply chain and logistics
workflows.
You will define the way we collect and operationalize data (structured / unstructured), and
build production pipelines for our machine learning models, and (RT, NRT, Batch) reporting &
dashboarding requirements. As a Senior Data Engineer in the XB Data Platform Team, you will use
your experience with modern cloud and data frameworks to build products (with storage and serving
systems)
that drive optimisation and resilience in the supply chain via data visibility, intelligent decision making,
insights, anomaly detection and prediction.
What You Will Do
• Design and develop data platform and data pipelines for reporting, dashboarding and
machine learning models. These pipelines would productionize machine learning models
and integrate with agent review tools.
• Meet the data completeness, correction and freshness requirements.
• Evaluate and identify the data store and data streaming technology choices.
• Lead the design of the logical model and implement the physical model to support
business needs. Come up with logical and physical database design across platforms (MPP,
MR, Hive/PIG) which are optimal physical designs for different use cases (structured/semi
structured). Envision & implement the optimal data modelling, physical design,
performance optimization technique/approach required for the problem.
• Support your colleagues by reviewing code and designs.
• Diagnose and solve issues in our existing data pipelines and envision and build their
successors.
Qualifications & Experience relevant for the role
• A bachelor's degree in Computer Science or related field with 6 to 9 years of technology
experience.
• Knowledge of Relational and NoSQL data stores, stream processing and micro-batching to
make technology & design choices.
• Strong experience in System Integration, Application Development, ETL, Data-Platform
projects. Talented across technologies used in the enterprise space.
• Software development experience using:
• Expertise in relational and dimensional modelling
• Exposure across all the SDLC process
• Experience in cloud architecture (AWS)
• Proven track record in keeping existing technical skills and developing new ones, so that
you can make strong contributions to deep architecture discussions around systems and
applications in the cloud ( AWS).
• Characteristics of a forward thinker and self-starter that flourishes with new challenges
and adapts quickly to learning new knowledge
• Ability to work with a cross functional teams of consulting professionals across multiple
projects.
• Knack for helping an organization to understand application architectures and integration
approaches, to architect advanced cloud-based solutions, and to help launch the build-out
of those systems
• Passion for educating, training, designing, and building end-to-end systems.
- 5+ years of experience in a Data Engineering role on cloud environment
- Must have good experience in Scala/PySpark (preferably on data-bricks environment)
- Extensive experience with Transact-SQL.
- Experience in Data-bricks/Spark.
- Strong experience in Dataware house projects
- Expertise in database development projects with ETL processes.
- Manage and maintain data engineering pipelines
- Develop batch processing, streaming and integration solutions
- Experienced in building and operationalizing large-scale enterprise data solutions and applications
- Using one or more of Azure data and analytics services in combination with custom solutions
- Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
- In-depth understanding of data management (e. g. permissions, security, and monitoring).
- Cloud repositories for e.g. Azure GitHub, Git
- Experience in an agile environment (Prefer Azure DevOps).
Good to have
- Manage source data access security
- Automate Azure Data Factory pipelines
- Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
- Experience in implementing and maintaining CICD pipelines
- Power BI understanding, Delta Lake house architecture
- Knowledge of software development best practices.
- Excellent analytical and organization skills.
- Effective working in a team as well as working independently.
- Strong written and verbal communication skills.
- Expertise in database development projects and ETL processes.
Company Profile:
Easebuzz is a payment solutions (fintech organisation) company which enables online merchants to accept, process and disburse payments through developer friendly APIs. We are focusing on building plug n play products including the payment infrastructure to solve complete business problems. Definitely a wonderful place where all the actions related to payments, lending, subscription, eKYC is happening at the same time.
We have been consistently profitable and are constantly developing new innovative products, as a result, we are able to grow 4x over the past year alone. We are well capitalised and have recently closed a fundraise of $4M in March, 2021 from prominent VC firms and angel investors. The company is based out of Pune and has a total strength of 180 employees. Easebuzz’s corporate culture is tied into the vision of building a workplace which breeds open communication and minimal bureaucracy. An equal opportunity employer, we welcome and encourage diversity in the workplace. One thing you can be sure of is that you will be surrounded by colleagues who are committed to helping each other grow.
Easebuzz Pvt. Ltd. has its presence in Pune, Bangalore, Gurugram.
Salary: As per company standards.
Designation: Data Engineering
Location: Pune
Experience with ETL, Data Modeling, and Data Architecture
Design, build and operationalize large scale enterprise data solutions and applications using one or more of AWS data and analytics services in combination with 3rd parties
- Spark, EMR, DynamoDB, RedShift, Kinesis, Lambda, Glue.
Experience with AWS cloud data lake for development of real-time or near real-time use cases
Experience with messaging systems such as Kafka/Kinesis for real time data ingestion and processing
Build data pipeline frameworks to automate high-volume and real-time data delivery
Create prototypes and proof-of-concepts for iterative development.
Experience with NoSQL databases, such as DynamoDB, MongoDB etc
Create and maintain optimal data pipeline architecture,
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
Evangelize a very high standard of quality, reliability and performance for data models and algorithms that can be streamlined into the engineering and sciences workflow
Build and enhance data pipeline architecture by designing and implementing data ingestion solutions.
Employment Type
Full-time
Datametica is Hiring for Datastage Developer
- Must have 3 to 8 years of experience in ETL Design and Development using IBM Datastage Components.
- Should have extensive knowledge in Unix shell scripting.
- Understanding of DW principles (Fact, Dimension tables, Dimensional Modelling and Data warehousing concepts).
- Research, development, document and modification of ETL processes as per data architecture and modeling requirements.
- Ensure appropriate documentation for all new development and modifications of the ETL processes and jobs.
- Should be good in writing complex SQL queries.
About Us!
A global Leader in the Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their Data/Workload/ETL/Analytics to the Cloud by leveraging Automation.
We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, Greenplum along with ETLs like Informatica, Datastage, AbInitio & others, to cloud-based data warehousing with other capabilities in data engineering, advanced analytics solutions, data management, data lake and cloud optimization.
Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.
We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican - Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.
Why join us!
Datametica is a place to innovate, bring new ideas to live and learn new things. We believe in building a culture of innovation, growth and belonging. Our people and their dedication over these years are the key factors in achieving our success.
Benefits we Provide!
Working with Highly Technical and Passionate, mission-driven people
Subsidized Meals & Snacks
Flexible Schedule
Approachable leadership
Access to various learning tools and programs
Pet Friendly
Certification Reimbursement Policy
Check out more about us on our website below!
www.datametica.com
- 6+ years of recent hands-on Java development
- Developing data pipelines in AWS or Google Cloud
- Java, Python, JavaScript programming languages
- Great understanding of designing for performance, scalability, and reliability of data intensive application
- Hadoop MapReduce, Spark, Pig. Understanding of database fundamentals and advanced SQL knowledge.
- In-depth understanding of object oriented programming concepts and design patterns
- Ability to communicate clearly to technical and non-technical audiences, verbally and in writing
- Understanding of full software development life cycle, agile development and continuous integration
- Experience in Agile methodologies including Scrum and Kanban
- Design, create, test, and maintain data pipeline architecture in collaboration with the Data Architect.
- Build the infrastructure required for extraction, transformation, and loading of data from a wide variety of data sources using Java, SQL, and Big Data technologies.
- Support the translation of data needs into technical system requirements. Support in building complex queries required by the product teams.
- Build data pipelines that clean, transform, and aggregate data from disparate sources
- Develop, maintain and optimize ETLs to increase data accuracy, data stability, data availability, and pipeline performance.
- Engage with Product Management and Business to deploy and monitor products/services on cloud platforms.
- Stay up-to-date with advances in data persistence and big data technologies and run pilots to design the data architecture to scale with the increased data sets of consumer experience.
- Handle data integration, consolidation, and reconciliation activities for digital consumer / medical products.
Job Qualifications:
- Bachelor’s or master's degree in Computer Science, Information management, Statistics or related field
- 5+ years of experience in the Consumer or Healthcare industry in an analytical role with a focus on building on data pipelines, querying data, analyzing, and clearly presenting analyses to members of the data science team.
- Technical expertise with data models, data mining.
- Hands-on Knowledge of programming languages in Java, Python, R, and Scala.
- Strong knowledge in Big data tools like the snowflake, AWS Redshift, Hadoop, map-reduce, etc.
- Having knowledge in tools like AWS Glue, S3, AWS EMR, Streaming data pipelines, Kafka/Kinesis is desirable.
- Hands-on knowledge in SQL and No-SQL database design.
- Having knowledge in CI/CD for the building and hosting of the solutions.
- Having AWS certification is an added advantage.
- Having Strong knowledge in visualization tools like Tableau, QlikView is an added advantage
- A team player capable of working and integrating across cross-functional teams for implementing project requirements. Experience in technical requirements gathering and documentation.
- Ability to work effectively and independently in a fast-paced agile environment with tight deadlines
- A flexible, pragmatic, and collaborative team player with the innate ability to engage with data architects, analysts, and scientists
- Expertise in designing and implementing enterprise scale database (OLTP) and Data warehouse solutions.
- Hands on experience in implementing Azure SQL Database, Azure SQL Date warehouse (Azure Synapse Analytics) and big data processing using Azure Databricks and Azure HD Insight.
- Expert in writing T-SQL programming for complex stored procedures, functions, views and query optimization.
- Should be aware of Database development for both on-premise and SAAS Applications using SQL Server and PostgreSQL.
- Experience in ETL and ELT implementations using Azure Data Factory V2 and SSIS.
- Experience and expertise in building machine learning models using Logistic and linear regression, Decision tree and Random forest Algorithms.
- PolyBase queries for exporting and importing data into Azure Data Lake.
- Building data models both tabular and multidimensional using SQL Server data tools.
- Writing data preparation, cleaning and processing steps using Python, SCALA, and R.
- Programming experience using python libraries NumPy, Pandas and Matplotlib.
- Implementing NOSQL databases and writing queries using cypher.
- Designing end user visualizations using Power BI, QlikView and Tableau.
- Experience working with all versions of SQL Server 2005/2008/2008R2/2012/2014/2016/2017/2019
- Experience using the expression languages MDX and DAX.
- Experience in migrating on-premise SQL server database to Microsoft Azure.
- Hands on experience in using Azure blob storage, Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2.
- Performance tuning complex SQL queries, hands on experience using SQL Extended events.
- Data modeling using Power BI for Adhoc reporting.
- Raw data load automation using T-SQL and SSIS
- Expert in migrating existing on-premise database to SQL Azure.
- Experience in using U-SQL for Azure Data Lake Analytics.
- Hands on experience in generating SSRS reports using MDX.
- Experience in designing predictive models using Python and SQL Server.
- Developing machine learning models using Azure Databricks and SQL Server
Job Description:
Roles & Responsibilities:
· You will be involved in every part of the project lifecycle, right from identifying the business problem and proposing a solution, to data collection, cleaning, and preprocessing, to training and optimizing ML/DL models and deploying them to production.
· You will often be required to design and execute proof-of-concept projects that can demonstrate business value and build confidence with CloudMoyo’s clients.
· You will be involved in designing and delivering data visualizations that utilize the ML models to generate insights and intuitively deliver business value to CXOs.
Desired Skill Set:
· Candidates should have strong Python coding skills and be comfortable working with various ML/DL frameworks and libraries.
· Hands-on skills and industry experience in one or more of the following areas is necessary:
1) Deep Learning (CNNs/RNNs, Reinforcement Learning, VAEs/GANs)
2) Machine Learning (Regression, Random Forests, SVMs, K-means, ensemble methods)
3) Natural Language Processing
4) Graph Databases (Neo4j, Apache Giraph)
5) Azure Bot Service
6) Azure ML Studio / Azure Cognitive Services
7) Log Analytics with NLP/ML/DL
· Previous experience with data visualization, C# or Azure Cloud platform and services will be a plus.
· Candidates should have excellent communication skills and be highly technical, with the ability to discuss ideas at any level from executive to developer.
· Creative problem-solving, unconventional approaches and a hacker mindset is highly desired.