Design, implement, and improve the analytics platform
Implement and simplify self-service data query and analysis capabilities of the BI platform
Develop and improve the current BI architecture, emphasizing data security, data quality
and timeliness, scalability, and extensibility
Deploy and use various big data technologies and run pilots to design low latency
data architectures at scale
Collaborate with business analysts, data scientists, product managers, software development engineers,
and other BI teams to develop, implement, and validate KPIs, statistical analyses, data profiling, prediction,
forecasting, clustering, and machine learning algorithms
Educational
At Ganit we are building an elite team, ergo we are seeking candidates who possess the
following backgrounds:
7+ years relevant experience
Expert level skills writing and optimizing complex SQL
Knowledge of data warehousing concepts
Experience in data mining, profiling, and analysis
Experience with complex data modelling, ETL design, and using large databases
in a business environment
Proficiency with Linux command line and systems administration
Experience with languages like Python/Java/Scala
Experience with Big Data technologies such as Hive/Spark
Proven ability to develop unconventional solutions, see opportunities to
innovate, and lead the way
Good experience working on cloud platforms like AWS, GCP, and Azure, including
projects involving the creation of a data lake or data warehouse
Excellent verbal and written communication.
Proven interpersonal skills and ability to convey key insights from complex analyses in
summarized business terms. Ability to effectively communicate with multiple teams
Good to have
AWS/GCP/Azure Data Engineer Certification
Science)
Have 2 to 6 years of experience working in a similar role in a startup environment
SQL and Excel have no secrets for you
You love visualizing data with Tableau
Any experience with product analytics tools (Mixpanel, Clevertap) is a plus
You solve math puzzles for fun
A strong analytical mindset with a problem-solving attitude
Comfortable with being critical and speaking your mind
You can easily switch between coding (R or Python) and having a business
discussion
Be a team player who thrives in a fast-paced and constantly changing environment
Job Description: Data Engineer
We are looking for a curious Data Engineer to join our extremely fast-growing Tech Team at StanPlus
About RED.Health (Formerly Stanplus Technologies)
Get to know the team:
Join our team and help us build the world’s fastest and most reliable emergency response system using cutting-edge technology.
Because every second counts in an emergency, we are building systems and flows with 4 9s of reliability to ensure that our technology is always there when people need it the most. We are looking for distributed systems experts who can help us perfect the architecture behind our key design principles: scalability, reliability, programmability, and resiliency. Our system features a powerful dispatch engine that connects emergency service providers with patients in real time.
Key Responsibilities
● Build Data ETL Pipelines
● Develop data set processes
● Strong analytic skills related to working with unstructured datasets
● Evaluate business needs and objectives
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery
● Interpret trends and patterns
● Work with data and analytics experts to strive for greater functionality in our data system
● Build algorithms and prototypes
● Explore ways to enhance data quality and reliability
● Work with the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
● Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
Key Requirements
● At least 3 years of proven experience as a data engineer, software developer, or in a similar role.
● Bachelor's / Master’s degree in data engineering, big data analytics, computer engineering, or related field.
● Experience with big data tools: Hadoop, Spark, Kafka, etc.
● Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
● Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
● Experience with Azure and AWS cloud services: EC2, EMR, RDS, Redshift
● Experience with BigQuery
● Experience with stream-processing systems: Storm, Spark-Streaming, etc.
● Experience with languages: Python, Java, C++, Scala, SQL, R, etc.
● Good hands-on experience with Hive and Presto.
Greetings!
We are looking for a data engineer for one of our premium clients for their Chennai and Tirunelveli locations
Required Education/Experience
● Bachelor’s degree in Computer Science or related field
● 5-7 years’ experience in the following:
● Snowflake, Databricks management
● Python and AWS Lambda
● Scala and/or Java
● Data integration services, SQL, and Extract, Transform, Load (ETL)
● Azure or AWS for development and deployment
● Jira or similar tool during SDLC
● Experience managing codebase using Code repository in Git/GitHub or Bitbucket
● Experience working with a data warehouse.
● Familiarity with structured and semi-structured data formats including JSON, Avro, ORC, Parquet, or XML
● Exposure to working in an agile work environment
5-7 years of experience in Data Engineering with solid experience in the design, development, and implementation of end-to-end data ingestion and data processing systems on the AWS platform.
2-3 years of experience in AWS Glue, Lambda, Appflow, EventBridge, Python, PySpark, Lake House, S3, Redshift, Postgres, API Gateway, CloudFormation, Kinesis, Athena, KMS, IAM.
Experience in modern data architecture, Lake House, Enterprise Data Lake, Data Warehouse, API interfaces, solution patterns, standards and optimizing data ingestion.
Experience building data pipelines from source systems like SAP Concur, Veeva Vault, Azure Cost, various social media platforms, or similar source systems.
Expertise in analyzing source data and designing a robust and scalable data ingestion framework and pipelines adhering to client Enterprise Data Architecture guidelines.
Proficient in design and development of solutions for real-time (or near real time) stream data processing as well as batch processing on the AWS platform.
Work closely with business analysts, data architects, data engineers, and data analysts to ensure that the data ingestion solutions meet the needs of the business.
Troubleshoot and provide support for issues related to data quality and data ingestion solutions. This may involve debugging data pipeline processes, optimizing queries, or troubleshooting application performance issues.
Experience working with Agile/Scrum methodologies, CI/CD tools and practices, coding standards, code reviews, source management (GitHub), Jira, Jira Xray, and Confluence.
Experience or exposure to design and development using Full Stack tools.
Strong analytical and problem-solving skills, excellent communication (written and oral), and interpersonal skills.
Bachelor's or master's degree in computer science or related field.
Role : Sr Data Scientist / Tech Lead – Data Science
Number of positions : 8
Responsibilities
- Lead a team of data scientists, machine learning engineers and big data specialists
- Be the main point of contact for the customers
- Lead data mining and collection procedures
- Ensure data quality and integrity
- Interpret and analyze data problems
- Conceive, plan and prioritize data projects
- Build analytic systems and predictive models
- Test performance of data-driven products
- Visualize data and create reports
- Experiment with new models and techniques
- Align data projects with organizational goals
Requirements (please read carefully)
- Very strong in statistics fundamentals. Not all data is Big Data. The candidate should be able to derive statistical insights from very few data points if required, using traditional statistical methods.
- MSc Statistics / PhD Statistics
- Education – no bar, but preferably from a Statistics academic background (e.g. MSc Stats, MSc Econometrics, etc.), given the first point
- Strong expertise in Python (any other statistical languages/tools like R, SAS, SPSS etc are just optional, but Python is absolutely essential). If the person is very strong in Python, but has almost nil knowledge in the other statistical tools, he/she will still be considered a good candidate for this role.
- Proven experience as a Data Scientist or similar role, for about 7-8 years
- Solid understanding of machine learning and AI concepts, especially with respect to choosing apt candidate algorithms for a use case, and model evaluation.
- Good expertise in writing SQL queries (should not be dependent upon anyone else for pulling in data, joining them, data wrangling etc)
- Knowledge of data management and visualization techniques --- more from a Data Science perspective.
- Should be able to grasp business problems, ask the right questions to better understand the problem breadthwise /depthwise, design apt solutions, and explain that to the business stakeholders.
- Again, the last point above is extremely important --- should be able to identify solutions that can be explained to stakeholders, and furthermore, be able to present them in simple, direct language.
http://www.altimetrik.com
https://www.youtube.com/watch?v=3nUs4YxppNE&feature=emb_rel_end
https://www.youtube.com/watch?v=e40r6kJdC8c
- 5+ years of industry experience in administering (including setting up, managing, and monitoring) data processing pipelines (both streaming and batch) using frameworks such as Kafka Streams and PySpark, and streaming databases like Druid or equivalents like Hive
- Strong industry expertise with containerization technologies including Kubernetes (EKS/AKS) and Kubeflow
- Experience with cloud platform services such as AWS, Azure, or GCP, especially with EKS and Managed Kafka
- 5+ years of industry experience in Python
- Experience with popular modern web frameworks such as Spring Boot, Play Framework, or Django
- Experience with scripting languages. Python experience highly desirable. Experience in API development using Swagger
- Implementing automated testing platforms and unit tests
- Proficient understanding of code versioning tools, such as Git
- Familiarity with continuous integration, Jenkins
Responsibilities
- Architect, Design and Implement Large scale data processing pipelines using Kafka Streams, PySpark, Fluentd and Druid
- Create custom Operators for Kubernetes, Kubeflow
- Develop data ingestion processes and ETLs
- Assist in DevOps operations
- Design and Implement APIs
- Identify performance bottlenecks and bugs, and devise solutions to these problems
- Help maintain code quality, organization, and documentation
- Communicate with stakeholders regarding various aspects of solution.
- Mentor team members on best practices
- The key responsibility is to design, develop, and maintain efficient data models for the organization that ensure optimal query performance for the consumption layer.
- Developing, Deploying & maintaining a repository of UDXs written in Java / Python.
- Develop optimal data model designs, analyze complex distributed data deployments, and make recommendations to optimize performance based on data consumption patterns, performance expectations, the queries executed on the tables/databases, etc.
- Periodic Database health check and maintenance
- Designing collections in a no-SQL Database for efficient performance
- Document & maintain data dictionary from various sources to enable data governance
- Coordinate with business teams, IT, and other stakeholders to provide best-in-class data pipeline solutions, exposing data via APIs, loading into downstream systems, No-SQL databases, etc.
- Data Governance Process Implementation and ensuring data security
Requirements
- Extensive working experience in Designing & Implementing Data models in OLAP Data Warehousing solutions (Redshift, Synapse, Snowflake, Teradata, Vertica, etc).
- Programming experience using Python / Java.
- Working knowledge in developing & deploying User-defined Functions (UDXs) using Java / Python.
- Strong understanding & extensive working experience in OLAP Data Warehousing (Redshift, Synapse, Snowflake, Teradata, Vertica, etc) architecture and cloud-native Data Lake (S3, ADLS, BigQuery, etc) Architecture.
- Strong knowledge in Design, Development & Performance tuning of 3NF/Flat/Hybrid Data Model.
- Extensive technical experience in SQL including code optimization techniques.
- Strong knowledge of database performance tuning and troubleshooting.
- Knowledge of collection design in any No-SQL DB (DynamoDB, MongoDB, CosmosDB, etc), along with implementation of best practices.
- Ability to understand business functionality, processes, and flows.
- Good combination of technical and interpersonal skills with strong written and verbal communication; detail-oriented with the ability to work independently.
- Any OLAP DWH DBA experience and user management will be an added advantage.
- Knowledge of financial industry-specific data models such as FSLDM, IBM Financial Data Model, etc. will be an added advantage.
- Experience in Snowflake will be an added advantage.
- Working experience in BFSI/NBFC and an understanding of Loan/Mortgage data will be an added advantage.
Functional knowledge
- Data Governance & Quality Assurance
- Modern OLAP Database Architecture & Design
- Linux
- Data structures, algorithm & data modeling techniques
- No-SQL database architecture
- Data Security
• Help build a Data Science team which will be engaged in researching, designing,
implementing, and deploying full-stack scalable data analytics vision and machine learning
solutions to address various business issues.
• Modelling complex algorithms, discovering insights and identifying business
opportunities through the use of algorithmic, statistical, visualization, and mining techniques
• Translate business requirements into quick prototypes and enable the
development of big data capabilities driving business outcomes
• Responsible for data governance and defining data collection and collation
guidelines.
• Must be able to advise, guide, and train junior data engineers in their jobs.
Must Have:
• 4+ years of experience in a leadership role as a Data Scientist
• Preferably from the retail, manufacturing, or healthcare industry (not mandatory)
• Willing to work from scratch and build up a team of Data Scientists
• Open for taking up the challenges with end to end ownership
• Confident, with excellent communication skills, and a good decision maker
The candidate must have expertise in ADF (Azure Data Factory) and be well versed in Python.
Performance optimization of scripts (code) and Productionizing of code (SQL, Pandas, Python or PySpark, etc.)
Required skills:
Bachelor's degree in Computer Science, Data Science, Computer Engineering, IT, or equivalent
Fluency in Python (Pandas), PySpark, SQL, or similar
Azure Data Factory experience (min 12 months)
Able to write efficient code using traditional, OO concepts, modular programming following the SDLC process.
Experience in production optimization and end-to-end performance tracing (technical root cause analysis)
Ability to work independently with demonstrated experience in project or program management
Azure experience, with the ability to translate data scientists' Python code and make it efficient (production-ready) for cloud deployment