About DeepIntent:
DeepIntent is a marketing technology company that helps healthcare brands strengthen communication with patients and healthcare professionals by enabling highly effective and performant digital advertising campaigns. Our healthcare technology platform, MarketMatch™, connects advertisers, data providers, and publishers to operate the first unified, programmatic marketplace for healthcare marketers. The platform’s built-in identity solution matches digital IDs with clinical, behavioural, and contextual data in real time so marketers can qualify 1.6M+ verified HCPs and 225M+ patients to find their most clinically relevant audiences and message them on a one-to-one basis in a privacy-compliant way. Healthcare marketers use MarketMatch to plan, activate, and measure digital campaigns in ways that best suit their business, from managed service engagements to technical integration or self-service solutions. DeepIntent was founded by Memorial Sloan Kettering alumni in 2016 and acquired by Propel Media, Inc. in 2017. We proudly serve major pharmaceutical and Fortune 500 companies out of our offices in New York, Bosnia, and India.
What You’ll Do:
- Establish a formal data practice for the organisation.
- Build and operate scalable, robust data architectures.
- Create pipelines for the self-service introduction and usage of new data.
- Implement DataOps practices.
- Design, develop, and operate data pipelines that support data scientists and machine learning engineers.
- Build simple, highly reliable data storage, ingestion, and transformation solutions that are easy to deploy and manage.
- Collaborate with business stakeholders, software engineers, machine learning engineers, and analysts.
Who You Are:
- Experience designing, developing, and operating configurable data pipelines serving high-volume, high-velocity data.
- Experience working with public clouds such as GCP and AWS.
- Good understanding of software engineering, DataOps, data architecture, and Agile and DevOps methodologies.
- Experience building data architectures that optimise performance and cost, whether the components are prepackaged or homegrown.
- Proficient in SQL, Java, Spring Boot, Python or a JVM-based language, and Bash.
- Experience with Apache open-source projects such as Spark, Druid, Beam, and Airflow, and with big data databases such as BigQuery and ClickHouse (see the orchestration sketch after this list).
- Good communication skills, with the ability to collaborate with both technical and non-technical people.
- Ability to Think Big, take bets and innovate, Dive Deep, Bias for Action, Hire and Develop the Best, and Learn and Be Curious.
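Since the role centres on configurable pipelines orchestrated with tools such as Airflow, here is a minimal sketch of a daily ingestion DAG, assuming Airflow 2.x. The DAG id, table list, and the extract/load helpers are hypothetical placeholders, not details from the posting.

```python
# A minimal, hypothetical Airflow DAG: one extract -> load chain per table.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(table: str, **context) -> None:
    # Placeholder: pull the day's partition of `table` from the source system.
    print(f"extracting {table} for {context['ds']}")


def load(table: str, **context) -> None:
    # Placeholder: write the transformed partition to the warehouse.
    print(f"loading {table} for {context['ds']}")


with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    for table in ["events", "users"]:  # in practice this list comes from config
        extract_task = PythonOperator(
            task_id=f"extract_{table}",
            python_callable=extract,
            op_kwargs={"table": table},
        )
        load_task = PythonOperator(
            task_id=f"load_{table}",
            python_callable=load,
            op_kwargs={"table": table},
        )
        extract_task >> load_task
```

Driving the task pairs from a table list keeps the pipeline configurable: adding a new data source becomes a config change rather than new DAG code.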
1. ROLE AND RESPONSIBILITIES
1.1. Implement next-generation intelligent data platform solutions that help build high-performance distributed systems.
1.2. Proactively diagnose problems and envisage the long-term life of the product, focusing on reusable, extensible components.
1.3. Ensure agile delivery processes.
1.4. Work collaboratively with stakeholders, including product and engineering teams.
1.5. Build best practices within the engineering team.
2. PRIMARY SKILL REQUIRED
2.1. 2-6 years of core software product development experience.
2.2. Experience working on data-intensive projects with a variety of technology stacks, including different programming languages (Java, Python, Scala).
2.3. Experience building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources so that other teams can run pipelines, jobs, reports, etc. (see the ETL sketch after this section).
2.4. Experience with the open-source stack.
2.5. Experience working with RDBMS and NoSQL databases.
2.6. Knowledge of enterprise data lakes, data analytics, reporting, in-memory data handling, etc.
2.7. A core computer science academic background.
2.8. An aspiration to continue pursuing a career in the technical stream.
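Item 2.3 above asks for ETL infrastructure experience; the following is a minimal, self-contained sketch of that extract-transform-load pattern. The database file, table, and columns are hypothetical, and writing Parquet via pandas assumes pyarrow is installed.

```python
# Hypothetical ETL sketch: RDBMS -> light transform -> columnar file.
import sqlite3

import pandas as pd

# Extract: read from an RDBMS (sqlite stands in for any SQL database).
with sqlite3.connect("app.db") as conn:
    df = pd.read_sql("SELECT id, amount, created_at FROM orders", conn)

# Transform: normalise types and drop obviously invalid rows.
df["created_at"] = pd.to_datetime(df["created_at"])
df = df[df["amount"] >= 0]

# Load: write a columnar file downstream pipelines/jobs/reports can consume.
df.to_parquet("orders.parquet", index=False)  # requires pyarrow
```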
3. OPTIONAL SKILLS REQUIRED:
3.1. Understanding of Big Data technologies and Machine learning/Deep learning
3.2. Understanding of diverse set of databases like MongoDB, Cassandra, Redshift, Postgres, etc.
3.3. Understanding of Cloud Platform: AWS, Azure, GCP, etc.
3.4. Experience in BFSI domain is a plus.
4. PREFERRED SKILLS
4.1. A startup mentality: comfort with ambiguity and a willingness to test, learn, and improve rapidly.
Roles and Responsibilities
Seeking an AWS Cloud Engineer / Data Warehouse Developer for our Data CoE team to help us configure and develop new AWS environments for our Enterprise Data Lake and migrate on-premises traditional workloads to the cloud. Candidates must have a sound understanding of BI best practices, relational structures, dimensional data modelling, structured query language (SQL), data warehousing, and reporting techniques.
- Extensive experience providing AWS Cloud solutions for a variety of business use cases.
- Creating star-schema data models, performing ETL, and validating results with business representatives (see the star-schema sketch after this list).
- Supporting implemented BI solutions by monitoring and tuning queries and data loads, addressing user questions concerning data integrity, monitoring performance, and communicating functional and technical issues.
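The star-schema work above boils down to joining a fact table to its dimensions; here is a minimal Spark SQL sketch of the kind of validation query involved. The table and column names (fact_sales, dim_date, dim_product) are hypothetical, and the sketch assumes those tables are already registered in the Spark catalog.

```python
# Hypothetical star-schema validation: roll revenue up by month and category.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star_schema_check").getOrCreate()

revenue_by_month = spark.sql("""
    SELECT d.month, p.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
""")
revenue_by_month.show()  # compare totals with business representatives
```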
Job Description:
This position is responsible for the successful delivery of business intelligence information to the entire organization and requires experience in BI development and implementation, data architecture, and data warehousing.
Requisite Qualification
- Essential: AWS Certified Database - Specialty or AWS Certified Data Analytics - Specialty
- Preferred: Any other data engineer certification
Requisite Experience
- Essential: 4-7 yrs of experience
- Preferred: 2+ yrs of experience in ETL & data pipelines
Skills Required
- AWS: S3, DMS, Redshift, EC2, VPC, Lambda, Delta Lake, CloudWatch, etc.
- Big data: Databricks, Spark, Glue, and Athena (see the Athena sketch after this list)
- Expertise in Lake Formation, Python programming, Spark, and shell scripting
- Minimum Bachelor's degree with 5+ years of experience designing, building, and maintaining AWS data components
- 3+ years of experience in data component configuration and related roles and access setup
- Expertise in Python programming
- Knowledge of all aspects of DevOps (source control, continuous integration, deployments, etc.)
- Comfortable working with DevOps tooling: Jenkins, Bitbucket, CI/CD
- Hands-on ETL development experience, preferably using SSIS
- SQL Server experience required
- Strong analytical skills to solve and model complex business requirements
- Sound understanding of BI best practices/methodologies, relational structures, dimensional data modelling, structured query language (SQL), data warehousing, and reporting techniques
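Because the stack above includes Athena, here is a minimal boto3 sketch of kicking off an Athena query from Python. The region, database, table, and S3 output location are hypothetical placeholders.

```python
# Hypothetical example: run an Athena query and print its execution id.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

resp = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) AS n FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
# Athena is asynchronous: poll get_query_execution() until the state is
# SUCCEEDED, then fetch rows with get_query_results().
print(resp["QueryExecutionId"])
```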
Preferred Skills
- Experience working in a Scrum environment
- Experience in administration (Windows/Unix/Network/...) is a plus
- Experience in SQL Server, SSIS, SSAS, SSRS
- Comfortable creating data models and visualizations using Power BI
- Hands-on experience in relational and multi-dimensional data modelling, including multiple source systems from databases and flat files, and the use of standard data modelling tools
- Ability to collaborate on a team with infrastructure, BI report development, and business analyst resources, and to clearly communicate solutions to both technical and non-technical team members
We are looking for an outstanding Big Data Engineer with experience setting up and maintaining data warehouses and data lakes for an organization. This role collaborates closely with the Data Science team, helping it build and deploy machine learning and deep learning models on big data analytics platforms.
Roles and Responsibilities:
- Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
- Develop programs in Scala and Python as part of data cleaning and processing.
- Assemble large, complex data sets that meet functional/non-functional business requirements and foster data-driven decision making across the organization.
- Design and develop distributed, high-volume, high-velocity, multi-threaded event processing systems.
- Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for the key stakeholders and business processes that depend on it.
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Provide high operational excellence, guaranteeing high availability and platform stability.
- Collaborate closely with the Data Science team and help it build and deploy machine learning and deep learning models on big data analytics platforms.
Skills:
- Experience with big data pipelines, big data analytics, and data warehousing.
- Experience with SQL/NoSQL, schema design, and dimensional data modeling.
- Strong understanding of Hadoop architecture and the HDFS ecosystem, and experience with a big data technology stack such as HBase, Hadoop, Hive, and MapReduce.
- Experience in designing systems that process structured as well as unstructured data at large scale.
- Experience in AWS/Spark/Java/Scala/Python development.
- Strong skills in PySpark (Python & Spark): the ability to create, manage, and manipulate Spark DataFrames, and expertise in Spark query tuning and performance optimization (see the PySpark sketch after this list).
- Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
- Prior exposure to streaming data sources such as Kafka.
- Knowledge of shell scripting and Python scripting.
- High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
- Experience with NoSQL databases such as Cassandra / MongoDB.
- Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
- Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
- Experience building and deploying applications on on-premise and cloud-based infrastructure.
- A good understanding of the machine learning landscape and concepts.
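To ground the PySpark expectations above, here is a minimal sketch of creating, manipulating, and lightly tuning a Spark DataFrame. The input path, column names, and output path are hypothetical.

```python
# Hypothetical PySpark job: filter, aggregate by day, cache, inspect the plan.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_daily").getOrCreate()

events = spark.read.parquet("s3://bucket/events/")  # hypothetical input

daily = (
    events
    .filter(F.col("status") == "ok")
    .withColumn("day", F.to_date("ts"))
    .groupBy("day")
    .agg(F.count("*").alias("n_events"))
)

daily.cache()    # reused for both the plan check and the write below
daily.explain()  # first stop when tuning: inspect the physical plan

daily.coalesce(1).write.mode("overwrite").parquet("s3://bucket/daily/")
```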
Qualifications and Experience:
Engineering graduates and postgraduates, preferably in Computer Science from premier institutions, with 3-5 years of proven work experience as a Big Data Engineer or in a similar role.
Certifications:
Good to have at least one of the certifications listed below:
AZ-900 - Azure Fundamentals
DP-200, DP-201, DP-203, AZ-204 - Data Engineering
AZ-400 - DevOps Certification
Strong experience in Scala/Spark
End client: Sapient
Mode of Hiring: FTE
Notice period should be less than 30 days
JD:
Required Skills:
- Intermediate to expert-level hands-on programming in one of the following languages: Java, Python, PySpark, or Scala.
- Strong practical knowledge of SQL.
- Hands-on experience with Spark/Spark SQL.
- Data structures and algorithms.
- Hands-on experience as an individual contributor in the design, development, testing, and deployment of applications built on big data technologies.
- Experience with big data tools such as Hadoop, MapReduce, Spark, etc.
- Experience with NoSQL databases such as HBase, etc.
- Experience with Linux OS environment (Shell script, AWK, SED)
- Intermediate RDBMS skills: able to write SQL queries with complex relations on top of a large RDBMS (100+ tables); see the Spark SQL sketch below.
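As an illustration of the multi-table SQL this role calls for, here is a minimal sketch run through Spark SQL (matching the Spark/Spark SQL requirement above). The customers/orders tables, columns, and ranking cutoff are hypothetical, and the sketch assumes both tables are registered in the Spark catalog.

```python
# Hypothetical multi-table query: rank customers by lifetime value per region.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("complex_sql").getOrCreate()

top_customers = spark.sql("""
    SELECT c.customer_id,
           c.region,
           SUM(o.total) AS lifetime_value,
           RANK() OVER (PARTITION BY c.region
                        ORDER BY SUM(o.total) DESC) AS rnk
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
""")
top_customers.where("rnk <= 10").show()  # top 10 customers per region
```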