Data Engineer- Senior
Cubera is a data company revolutionizing big data analytics and Adtech through data share value principles wherein the users entrust their data to us. We refine the art of understanding, processing, extracting, and evaluating the data that is entrusted to us. We are a gateway for brands to increase their lead efficiency as the world moves towards web3.
What are you going to do?
Design & Develop high performance and scalable solutions that meet the needs of our customers.
Closely work with the Product Management, Architects and cross functional teams.
Build and deploy large-scale systems in Java/Python.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing their algorithms.
Follow best practices that can be adopted in Bigdata stack.
Use your engineering experience and technical skills to drive the features and mentor the engineers.
What are we looking for ( Competencies) :
Bachelor’s degree in computer science, computer engineering, or related technical discipline.
Overall 5 to 8 years of programming experience in Java, Python including object-oriented design.
Data handling frameworks: Should have a working knowledge of one or more data handling frameworks like- Hive, Spark, Storm, Flink, Beam, Airflow, Nifi etc.
Data Infrastructure: Should have experience in building, deploying and maintaining applications on popular cloud infrastructure like AWS, GCP etc.
Data Store: Must have expertise in one of general-purpose No-SQL data stores like Elasticsearch, MongoDB, Redis, RedShift, etc.
Strong sense of ownership, focus on quality, responsiveness, efficiency, and innovation.
Ability to work with distributed teams in a collaborative and productive manner.
Benefits:
Competitive Salary Packages and benefits.
Collaborative, lively and an upbeat work environment with young professionals.
Job Category: Development
Job Type: Full Time
Job Location: Bangalore
Similar jobs
Team:- We are a team of 9 data scientists working on Video Analytics Projects, Data Analytics projects for internal AI requirements of Reliance Industries as well for the external business. At a time, we make progress on multiple projects(atleast 4) in Video Analytics or Data Analytics.
JD Code: SHI-LDE-01
Version#: 1.0
Date of JD Creation: 27-March-2023
Position Title: Lead Data Engineer
Reporting to: Technical Director
Location: Bangalore Urban, India (on-site)
SmartHub.ai (www.smarthub.ai) is a fast-growing Startup headquartered in Palo Alto, CA, and with offices in Seattle and Bangalore. We operate at the intersection of AI, IoT & Edge Computing. With strategic investments from leaders in infrastructure & data management, SmartHub.ai is redefining the Edge IoT space. Our “Software Defined Edge” products help enterprises rapidly accelerate their Edge Infrastructure Management & Intelligence. We empower enterprises to leverage their Edge environment to increase revenue, efficiency of operations, manage safety and digital risks by using Edge and AI technologies.
SmartHub is an equal opportunity employer and will always be committed to nurture a workplace culture that supports, inspires and respects all individuals, encourages employees to bring their best selves to work, laugh and share. We seek builders who hail from a variety of backgrounds, perspectives and skills to join our team.
Summary
This role requires the candidate to translate business and product requirements to build, maintain, optimize data systems which can be relational or non-relational in nature. The candidate is expected to tune and analyse the data including from a short and long-term trend analysis and reporting, AI/ML uses cases.
We are looking for a talented technical professional with at least 8 years of proven experience in owning, architecting, designing, operating and optimising databases that are used for large scale analytics and reports.
Responsibilities
- Provide technical & architectural leadership for the next generation of product development.
- Innovate, Research & Evaluate new technologies and tools for a quality output.
- Architect, Design and Implement ensuring scalability, performance and security.
- Code and implement new algorithms to solve complex problems.
- Analyze complex data, develop, optimize and transform large data sets both structured and unstructured.
- Ability to deploy and administrator the database and continuously tuning for performance especially container orchestration stacks such as Kubernetes
- Develop analytical models and solutions Mentor Junior members technically in Architecture, Designing and robust Coding.
- Work in an Agile development environment while continuously evaluating and improvising engineering processes
Required
- At least 8 years of experience with significant depth in designing and building scalable distributed database systems for enterprise class products, experience of working in product development companies.
- Should have been feature/component lead for several complex features involving large datasets.
- Strong background in relational and non-relational database like Postgres, MongoDB, Hadoop etl.
- Deep exp database optimization, tuning ertise in SQL, Time Series Databases, Apache Drill, HDFS, Spark are good to have
- Excellent analytical and problem-solving skill sets.
- Experience in for high throughput is highly desirable
- Exposure to database provisioning in Kubernetes/non-Kubernetes environments, configuration and tuning in a highly available mode.
- Demonstrated ability to provide technical leadership and mentoring to the team
- Design the architecture of our big data platform
- Perform and oversee tasks such as writing scripts, calling APIs, web scraping, and writing SQL queries
- Design and implement data stores that support the scalable processing and storage of our high-frequency data
- Maintain our data pipeline
- Customize and oversee integration tools, warehouses, databases, and analytical systems
- Configure and provide availability for data-access tools used by all data scientists
The Merck Data Engineering Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on Merck’s data management and global analytics platform (Palantir Foundry, Hadoop, AWS and other components).
The Foundry platform comprises multiple different technology stacks, which are hosted on Amazon Web Services (AWS) infrastructure or on-premise Merck’s own data centers. Developing pipelines and applications on Foundry requires:
• Proficiency in SQL / Java / Python (Python required; all 3 not necessary)
• Proficiency in PySpark for distributed computation
• Familiarity with Postgres and ElasticSearch
• Familiarity with HTML, CSS, and JavaScript and basic design/visual competency
• Familiarity with common databases (e.g. JDBC, mySQL, Microsoft SQL). Not all types required
This position will be project based and may work across multiple smaller projects or a single large project utilizing an agile project methodology.
Roles & Responsibilities:
• Develop data pipelines by ingesting various data sources – structured and un-structured – into Palantir Foundry
• Participate in end to end project lifecycle, from requirements analysis to go-live and operations of an application
• Acts as business analyst for developing requirements for Foundry pipelines
• Review code developed by other data engineers and check against platform-specific standards, cross-cutting concerns, coding and configuration standards and functional specification of the pipeline
• Document technical work in a professional and transparent way. Create high quality technical documentation
• Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
• Deploy applications on Foundry platform infrastructure with clearly defined checks
• Implementation of changes and bug fixes via Merck's change management framework and according to system engineering practices (additional training will be provided)
• DevOps project setup following Agile principles (e.g. Scrum)
• Besides working on projects, act as third level support for critical applications; analyze and resolve complex incidents/problems. Debug problems across a full stack of Foundry and code based on Python, Pyspark, and Java
• Work closely with business users, data scientists/analysts to design physical data models
We are looking for a Senior Data Engineer to join the Customer Innovation team, who will be responsible for acquiring, transforming, and integrating customer data onto our Data Activation Platform from customers’ clinical, claims, and other data sources. You will work closely with customers to build data and analytics solutions to support their business needs, and be the engine that powers the partnership that we build with them by delivering high-fidelity data assets.
In this role, you will work closely with our Product Managers, Data Scientists, and Software Engineers to build the solution architecture that will support customer objectives. You'll work with some of the brightest minds in the industry, work with one of the richest healthcare data sets in the world, use cutting-edge technology, and see your efforts affect products and people on a regular basis. The ideal candidate is someone that
- Has healthcare experience and is passionate about helping heal people,
- Loves working with data,
- Has an obsessive focus on data quality,
- Is comfortable with ambiguity and making decisions based on available data and reasonable assumptions,
- Has strong data interrogation and analysis skills,
- Defaults to written communication and delivers clean documentation, and,
- Enjoys working with customers and problem solving for them.
A day in the life at Innovaccer:
- Define the end-to-end solution architecture for projects by mapping customers’ business and technical requirements against the suite of Innovaccer products and Solutions.
- Measure and communicate impact to our customers.
- Enabling customers on how to activate data themselves using SQL, BI tools, or APIs to solve questions they have at speed.
What You Need:
- 4+ years of experience in a Data Engineering role, a Graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field.
- 4+ years of experience working with relational databases like Snowflake, Redshift, or Postgres.
- Intermediate to advanced level SQL programming skills.
- Data Analytics and Visualization (using tools like PowerBI)
- The ability to engage with both the business and technical teams of a client - to document and explain technical problems or concepts in a clear and concise way.
- Ability to work in a fast-paced and agile environment.
- Easily adapt and learn new things whether it’s a new library, framework, process, or visual design concept.
What we offer:
- Industry certifications: We want you to be a subject matter expert in what you do. So, whether it’s our product or our domain, we’ll help you dive in and get certified.
- Quarterly rewards and recognition programs: We foster learning and encourage people to take risks. We recognize and reward your hard work.
- Health benefits: We cover health insurance for you and your loved ones.
- Sabbatical policy: We encourage people to take time off and rejuvenate, learn new skills, and pursue their interests so they can generate new ideas with Innovaccer.
- Pet-friendly office and open floor plan: No boring cubicles.
Responsibilities for Data Scientist/ NLP Engineer
Work with customers to identify opportunities for leveraging their data to drive business
solutions.
• Develop custom data models and algorithms to apply to data sets.
• Basic data cleaning and annotation for any incoming raw data.
• Use predictive modeling to increase and optimize customer experiences, revenue
generation, ad targeting and other business outcomes.
• Develop company A/B testing framework and test model quality.
• Deployment of ML model in production.
Qualifications for Junior Data Scientist/ NLP Engineer
• BS, MS in Computer Science, Engineering, or related discipline.
• 3+ Years of experience in Data Science/Machine Learning.
• Experience with programming language Python.
• Familiar with at least one database query language, such as SQL
• Knowledge of Text Classification & Clustering, Question Answering & Query Understanding,
Search Indexing & Fuzzy Matching.
• Excellent written and verbal communication skills for coordinating acrossteams.
• Willing to learn and master new technologies and techniques.
• Knowledge and experience in statistical and data mining techniques:
GLM/Regression, Random Forest, Boosting, Trees, text mining, NLP, etc.
• Experience with chatbots would be bonus but not required