Datawarehousing Jobs in Bangalore (Bengaluru)
We are looking for a Senior Data Engineer to join the Customer Innovation team, who will be responsible for acquiring, transforming, and integrating customer data onto our Data Activation Platform from customers’ clinical, claims, and other data sources. You will work closely with customers to build data and analytics solutions that support their business needs, and be the engine that powers the partnership we build with them by delivering high-fidelity data assets.
In this role, you will work closely with our Product Managers, Data Scientists, and Software Engineers to build the solution architecture that will support customer objectives. You'll work with some of the brightest minds in the industry and one of the richest healthcare data sets in the world, use cutting-edge technology, and see your efforts affect products and people on a regular basis. The ideal candidate is someone who
- Has healthcare experience and is passionate about helping heal people,
- Loves working with data,
- Has an obsessive focus on data quality,
- Is comfortable with ambiguity and making decisions based on available data and reasonable assumptions,
- Has strong data interrogation and analysis skills,
- Defaults to written communication and delivers clean documentation, and
- Enjoys working with customers and solving problems for them.
A day in the life at Innovaccer:
- Define the end-to-end solution architecture for projects by mapping customers’ business and technical requirements against the suite of Innovaccer products and solutions.
- Measure and communicate impact to our customers.
- Enable customers to activate data themselves using SQL, BI tools, or APIs, so they can answer their questions at speed (a minimal sketch of this follows below).
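To give a flavour of what "activating data with SQL" can look like, here is a minimal, hypothetical sketch in Python. It uses the standard-library sqlite3 module as a stand-in for the warehouse connection, and the table and column names (claims, patient_id, amount) are invented for the example:

```python
import sqlite3

# Stand-in for a warehouse connection; a real deployment would connect
# to the customer's data platform instead of an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (patient_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [("P1", 1200.0), ("P1", 300.0), ("P2", 850.0)],
)

# The kind of question a customer might answer themselves with SQL:
# total claim spend per patient.
for row in conn.execute(
    "SELECT patient_id, SUM(amount) AS total FROM claims GROUP BY patient_id"
):
    print(row)
```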
What You Need:
- 4+ years of experience in a Data Engineering role, and a graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field.
- 4+ years of experience working with relational databases like Snowflake, Redshift, or Postgres.
- Intermediate to advanced level SQL programming skills.
- Data analytics and visualization skills (using tools like Power BI).
- The ability to engage with both the business and technical teams of a client, and to document and explain technical problems or concepts clearly and concisely.
- Ability to work in a fast-paced and agile environment.
- Ability to adapt easily and learn new things, whether it’s a new library, framework, process, or visual design concept.
What we offer:
- Industry certifications: We want you to be a subject matter expert in what you do. So, whether it’s our product or our domain, we’ll help you dive in and get certified.
- Quarterly rewards and recognition programs: We foster learning and encourage people to take risks. We recognize and reward your hard work.
- Health benefits: We cover health insurance for you and your loved ones.
- Sabbatical policy: We encourage people to take time off and rejuvenate, learn new skills, and pursue their interests so they can generate new ideas with Innovaccer.
- Pet-friendly office and open floor plan: No boring cubicles.
Mandatory (Minimum 4 years of working experience)
- 3+ years of experience leading data warehouse implementations covering technical architecture, ETL/ELT, reporting/analytic tools, and scripting (end-to-end implementation)
- Experienced in Microsoft Azure (Azure SQL Managed Instance, Data Factory, Azure Synapse, Azure Monitor, Azure DevOps, Event Hubs, Azure AD security)
- Deep experience in using BI tools such as Power BI, Tableau, QlikView, or SAP BO
- Experienced in ETL tools such as SSIS, Talend, Informatica, or Pentaho
- Expertise in using RDBMSs like Oracle and SQL Server as source or target, and in online analytical processing (OLAP)
- Experienced in SQL/T-SQL: DML/DDL statements, stored procedures, functions, triggers, indexes, and cursors
- Expertise in building and organizing advanced DAX calculations and SSAS cubes
- Experience in data/dimensional modelling, analysis, design, testing, development, and implementation
- Experienced in advanced data warehouse concepts using structured, semi-structured, and unstructured data
- Experienced with real-time ingestion, change data capture, and real-time and batch processing (a change-data-capture sketch follows this list)
- Good knowledge of metadata management and data governance
- Great problem-solving skills, with a strong bias for quality and design excellence
- Experienced in developing dashboards with a focus on usability, performance, flexibility, and testability
- Familiarity with development in cloud environments like AWS, Azure, or Google Cloud
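As a concrete illustration of the change-data-capture item above, here is a minimal, hedged sketch in Python using pandas of an SCD Type 1 upsert (overwrite changed rows, append new ones). The table and column names (dim_patient, patient_id, etc.) are hypothetical; a production pipeline would typically use a warehouse MERGE statement or an ETL tool instead:

```python
import pandas as pd

# Current state of a (hypothetical) dimension table in the warehouse.
dim_patient = pd.DataFrame(
    {"patient_id": [1, 2, 3],
     "name": ["Asha", "Ravi", "Meena"],
     "city": ["Bengaluru", "Pune", "Chennai"]}
)

# A CDC feed: rows changed or added in the source since the last load.
cdc_batch = pd.DataFrame(
    {"patient_id": [2, 4],
     "name": ["Ravi K", "Sunil"],
     "city": ["Mumbai", "Delhi"]}
)

# SCD Type 1 upsert: the CDC row wins for any patient_id present in both.
merged = (
    pd.concat([dim_patient, cdc_batch])
      .drop_duplicates(subset="patient_id", keep="last")
      .sort_values("patient_id")
      .reset_index(drop=True)
)
print(merged)
```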
Good To Have (1+ years of working experience)
- Experience working with Snowflake or Amazon Redshift
- Good verbal and written communication skills
- Ability to collaborate and work effectively in a team
- Excellent analytical and logical skills
We are looking for a savvy Data Engineer to join our growing team of analytics experts.
The hire will be responsible for:
- Expanding and optimizing our data and data pipeline architecture
- Optimizing data flow and collection for cross-functional teams.
- Supporting our software developers, database architects, data analysts, and data scientists on data initiatives, and ensuring that the optimal data delivery architecture is consistent throughout ongoing projects.
- Being self-directed and comfortable supporting the data needs of multiple teams, systems, and products.
- Experience with Azure: ADLS, Databricks, Stream Analytics, SQL DW, Cosmos DB, Analysis Services, Azure Functions, serverless architecture, ARM templates
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with scripting and query languages: Python, SQL, Scala, Spark SQL, etc.
Nice to have experience with:
- Big data tools: Hadoop, Spark and Kafka
- Data pipeline and workflow management tools: Azkaban, Luigi, Airflow (see the Airflow sketch after this list)
- Stream-processing systems: Storm
Database: SQL DB
Programming languages: PL/SQL, Spark SQL
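As a small illustration of the workflow-management tools named above, here is a minimal, hypothetical Airflow DAG in Python (assuming Airflow 2.4+). The DAG name and the extract/load functions are invented for the example:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull rows from a source system.
    print("extracting source data")


def load():
    # Placeholder: write transformed rows to the warehouse.
    print("loading into the warehouse")


with DAG(
    dag_id="example_daily_etl",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load.
    extract_task >> load_task
```

The value of a tool like Airflow here is that scheduling, retries, and dependency ordering live in the DAG definition rather than in ad-hoc cron scripts.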
Looking for candidates with data warehousing experience, strong domain knowledge, and experience working as a technical lead.
The right candidate will be excited by the prospect of optimizing or even re-designing our company's data architecture to support our next generation of products and data initiatives.
Objectives of this Role:
• Design and develop creative and innovative frameworks/components for data platforms, as we continue to experience dramatic growth in the usage and visibility of our products
• Work closely with data scientists and product owners to find better design and development approaches so the application and platform can scale and serve their needs
• Examine existing systems, identifying flaws and creating solutions to improve service uptime and time-to-resolve through monitoring and automated remediation
• Plan and execute full software development life cycles (SDLC) for each assigned project, adhering to company standards and expectations
Daily and Monthly Responsibilities:
• Design and build tools/frameworks/scripts to automate the development, testing, deployment, management, and monitoring of the company’s 24x7 services and products
• Plan and scale distributed software and applications, applying synchronous and asynchronous design patterns, write code, and deliver with urgency and quality
• Collaborate with a global team, producing project work plans and analyzing the efficiency and feasibility of project operations, while leveraging the global technology stack and making localized improvements
• Manage large volumes of data and process them in real time and in batches as needed
• Track, document, and maintain software system functionality, both internally and externally, leveraging opportunities to improve engineering productivity
• Perform code reviews, Git operations, and CI/CD; mentor junior team members and assign tasks to them
• Writing reusable, testable, and efficient code
• Design and implementation of low-latency, high-availability, and performant applications
• Integration of user-facing elements developed by front-end developers with server-side logic
• Implementation of security and data protection
• Integration of data storage solutions
Skills and Qualifications
• Bachelor’s degree in software engineering or information technology
• 5-7 years’ experience engineering software and networking platforms
• 5+ years of professional experience with Python, Java, or Scala.
• Strong experience in API development and API integration.
• Proven knowledge of data migration, platform migration, CI/CD processes, and orchestration workflows like Airflow, Luigi, or Azkaban.
• Experience with data engineering tools and platforms such as Kafka, Spark, Databricks, Hadoop, and NoSQL platforms
• Prior experience in data warehouse and OLAP design and deployment.
• Proven ability to document design processes, including development, tests, analytics, and troubleshooting
• Experience with rapid development cycles in a web-based/multi-cloud environment
• Strong scripting and test automation abilities
Good to Have Qualifications
• Working knowledge of relational databases as well as ORM and SQL technologies
• Proficiency with multi-OS environments, Docker, and Kubernetes
• Proven experience designing interactive applications and large-scale platforms
• Desire to continue to grow professional capabilities with ongoing training and educational opportunities.
- 4-8 years of experience in BI/DW
- 3+ years of experience with MicroStrategy schema, design, and development
- Experience with MicroStrategy Cloud for Azure and connecting to Azure Synapse as a data source
- Extensive experience in developing reports, dashboards, and cubes in MicroStrategy
- Advanced SQL coding skills
- Hands on development in BI reporting and performance tuning
- Should be able to prepare unit test cases and execute unit testing
• Responsible for developing and maintaining applications with PySpark (a minimal sketch follows below)
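To make the PySpark requirement concrete, here is a minimal, hedged sketch of the kind of batch aggregation such an application might perform; the data and column names (patient_id, claim_date, amount) are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_rollup").getOrCreate()

# Hypothetical input: one row per claim. In practice this would be read
# from a source such as Parquet files or a warehouse table.
claims = spark.createDataFrame(
    [("P1", "2024-01-03", 1200.0),
     ("P1", "2024-02-11", 300.0),
     ("P2", "2024-01-20", 850.0)],
    ["patient_id", "claim_date", "amount"],
)

# Aggregate total claim amount per patient.
totals = claims.groupBy("patient_id").agg(F.sum("amount").alias("total_amount"))
totals.show()

spark.stop()
```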
Must Have Skills:
This leads to a very interesting and challenging use case in the emerging field of large-scale distributed HTAP, which is not yet mature enough to provide an out-of-the-box solution that works for our scale and SLAs. So we are building a solution that can handle the complexity of our use case and scale to several trillions of rows. As a Database Engineer, you will evolve, architect, build, and scale the core data warehouse that sits at the heart of Clarisights, enabling large-scale distributed, interactive analytics on near-realtime data.
What you'll do
- Understand and gain expertise in the existing data warehouse.
- Use that knowledge to identify gaps in the current system and formulate strategies for filling them.
- Establish KPIs around the data warehouse.
- Find solutions to evolve and scale the data warehouse. This will involve a lot of technical research, benchmarking, and testing of existing and candidate replacement systems.
- Build from scratch all or parts of the data warehouse to improve the KPIs.
- Ensure the SLAs and SLOs of the data warehouse, which will require assuming ownership and being on-call for it.
- Gain a deep understanding of Linux and the concepts that drive performance characteristics, such as IO scheduling, paging, process scheduling, and CPU instruction pipelining.
- Adopt/build tooling and tune the systems to extract maximum performance out of the underlying hardware.
- Build wrappers/microservices for improving visibility, control, adoption and ease of use for the data warehouse.
- Build tooling and automation for monitoring, debugging and deployment of the warehouse.
- Contribute to the open source database technologies that we use or that are potential candidates for use.
What you bring
We are looking for engineers with a strong passion for solving challenging engineering problems and a burning desire to learn and grow in a fast-growing startup. This is not an easy gig; it will require strong technical chops and an insatiable curiosity to make things better. We need passionate and mature engineers who can do wonders with some mentoring and don't need to be managed.
- Distributed systems: You have a good understanding of general patterns of scaling and fault-tolerance in large scale distributed systems.
- Databases: You have a good understanding of database concepts like query optimization, indexing, transactions, sharding, replication etc.
- Data pipelines: You have a working knowledge of distributed data processing systems.
- Engineer at heart: You thrive on writing great code and have a strong appreciation for modular, testable, and maintainable code, and you make sure to document it. You have the ability to take new initiatives and question the status quo.
- Passion and drive to learn and excel: You believe in our vision. You drive the product for the better, always looking to improve things, and soon become the go-to person on the things you have mastered along the way. You love dabbling in your own side projects and learning new skills that are not necessarily part of your normal day job.
- Inquisitiveness: You are curious to know how different modules on our platform work. You are not afraid to venture into unknown territories of code. You ask questions.
- Ownership: You are your own manager. You have the ability to implement engineering tasks on your own without a need for micro-management and take responsibility for any task that has been assigned to you.
- Teamwork: You should be helpful and work well with teams. You’re probably someone who enjoys sharing knowledge with teammates and asking for help when you need it.
- Open Source Contribution: Bonus.
We are looking for a Data Engineer who will be responsible for collecting, storing, processing, and analyzing huge sets of data coming from different sources.
- Work with big data tools and frameworks to provide requested capabilities
- Identify development needs in order to improve and streamline operations
- Develop and manage BI solutions
- Implement ETL processes and data warehousing
- Monitor performance and manage infrastructure
- Proficient understanding of distributed computing principles
- Proficiency with Hadoop and Spark
- Experience building stream-processing systems using solutions such as Kafka and Spark Streaming (a minimal sketch follows this list)
- Good knowledge of data querying tools such as SQL and Hive
- Knowledge of various ETL techniques and frameworks
- Experience with Python/Java/Scala (at least one)
- Experience with cloud services such as AWS or GCP
- Experience with NoSQL databases such as DynamoDB or MongoDB will be an advantage
- Excellent written and verbal communication skills
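To make the stream-processing requirement concrete, here is a minimal, hedged Spark Structured Streaming sketch in Python. The broker address and topic name are assumptions, and running it requires the spark-sql-kafka connector package on the Spark classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("event_counts").getOrCreate()

# Read a stream of events from Kafka (broker and topic are hypothetical).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Count events per key over the stream.
counts = (
    events.select(F.col("key").cast("string"))
    .groupBy("key")
    .count()
)

# Write running counts to the console; in production this would target a
# sink such as a warehouse table or another Kafka topic.
query = (
    counts.writeStream.outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```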
SQL, Python, NumPy, Pandas; knowledge of Hive and data warehousing concepts will be a plus.
- Strong analytical skills, with the ability to collect, organise, analyse, and interpret trends or patterns in complex data sets and provide reports and visualisations.
- Work with management to prioritise business KPIs and information needs.
- Locate and define new process improvement opportunities.
- Technical expertise with data models, database design and development, data mining and segmentation techniques
- Proven success in a collaborative, team-oriented environment
- Working experience with geospatial data will be a plus.