- Develop process workflows for data preparations, modeling, and mining Manage configurations to build reliable datasets for analysis Troubleshooting services, system bottlenecks, and application integration.
- Designing, integrating, and documenting technical components, and dependencies of big data platform Ensuring best practices that can be adopted in the Big Data stack and shared across teams.
- Design and Development of Data pipeline on AWS Cloud
- Data Pipeline development using Pyspark, AWS, and Python.
- Developing Pyspark streaming applications
- Hands-on experience in Spark, Python, and Cloud
- Highly analytical and data-oriented
- Good to have - Databricks
About Persistent Systems
- Mandatory - Hands on experience in Python and PySpark.
- Build pySpark applications using Spark Dataframes in Python using Jupyter notebook and PyCharm(IDE).
- Worked on optimizing spark jobs that processes huge volumes of data.
- Hands on experience in version control tools like Git.
- Worked on Amazon’s Analytics services like Amazon EMR, Lambda function etc
- Worked on Amazon’s Compute services like Amazon Lambda, Amazon EC2 and Amazon’s Storage service like S3 and few other services like SNS.
- Experience/knowledge of bash/shell scripting will be a plus.
- Experience in working with fixed width, delimited , multi record file formats etc.
- Hands on experience in tools like Jenkins to build, test and deploy the applications
- Awareness of Devops concepts and be able to work in an automated release pipeline environment.
- Excellent debugging skills.
The Data Engineer will support our software developers, database architects, data analysts and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems and products.
Responsibilities for Data Engineer
• Create and maintain optimal data pipeline architecture,
• Assemble large, complex data sets that meet functional / non-functional business requirements.
• Identify, design, and implement internal process improvements: automating manual processes,
optimizing data delivery, re-designing infrastructure for greater scalability, etc.
• Build the infrastructure required for optimal extraction, transformation, and loading of data
from a wide variety of data sources using SQL and AWS big data technologies.
• Build analytics tools that utilize the data pipeline to provide actionable insights into customer
acquisition, operational efficiency and other key business performance metrics.
• Work with stakeholders including the Executive, Product, Data and Design teams to assist with
data-related technical issues and support their data infrastructure needs.
• Create data tools for analytics and data scientist team members that assist them in building and
optimizing our product into an innovative industry leader.
• Work with data and analytics experts to strive for greater functionality in our data systems.
Qualifications for Data Engineer
• Experience building and optimizing big data ETL pipelines, architectures and data sets.
• Advanced working SQL knowledge and experience working with relational databases, query
authoring (SQL) as well as working familiarity with a variety of databases.
• Experience performing root cause analysis on internal and external data and processes to
answer specific business questions and identify opportunities for improvement.
• Strong analytic skills related to working with unstructured datasets.
• Build processes supporting data transformation, data structures, metadata, dependency and
• A successful history of manipulating, processing and extracting value from large disconnected
Technical must haves:
● Extensive exposure to at least one Business Intelligence Platform (if possible, QlikView/Qlik
Sense) – if not Qlik, ETL tool knowledge, ex- Informatica/Talend
● At least 1 Data Query language – SQL/Python
● Experience in creating breakthrough visualizations
● Understanding of RDMS, Data Architecture/Schemas, Data Integrations, Data Models and Data Flows is a must
● A technical degree like BE/B. Tech a must
Technical Ideal to have:
● Exposure to our tech stack – PHP
● Microsoft workflows knowledge
Behavioural Pen Portrait:
● Must Have: Enthusiastic, aggressive, vigorous, high achievement orientation, strong command
over spoken and written English
● Ideal: Ability to Collaborate
Preferred location is Ahmedabad, however, if we find exemplary talent then we are open to remote working model- can be discussed.
Big Data Engineer/Data Engineer
What we are solving
Welcome to today’s business data world where:
• Unification of all customer data into one platform is a challenge
• Extraction is expensive
• Business users do not have the time/skill to write queries
• High dependency on tech team for written queries
These facts may look scary but there are solutions with real-time self-serve analytics:
• Fully automated data integration from any kind of a data source into a universal schema
• Analytics database that streamlines data indexing, query and analysis into a single platform.
• Start generating value from Day 1 through deep dives, root cause analysis and micro segmentation
At Propellor.ai, this is what we do.
• We help our clients reduce effort and increase effectiveness quickly
• By clearly defining the scope of Projects
• Using Dependable, scalable, future proof technology solution like Big Data Solutions and Cloud Platforms
• Engaging with Data Scientists and Data Engineers to provide End to End Solutions leading to industrialisation of Data Science Model Development and Deployment
What we have achieved so far
Since we started in 2016,
• We have worked across 9 countries with 25+ global brands and 75+ projects
• We have 50+ clients, 100+ Data Sources and 20TB+ data processed daily
Work culture at Propellor.ai
We are a small, remote team that believes in
• Working with a few, but only with highest quality team members who want to become the very best in their fields.
• With each member's belief and faith in what we are solving, we collectively see the Big Picture
• No hierarchy leads us to believe in reaching the decision maker without any hesitation so that our actions can have fruitful and aligned outcomes.
• Each one is a CEO of their domain.So, the criteria while making a choice is so our employees and clients can succeed together!
To read more about us click here:
About the role
We are building an exceptional team of Data engineers who are passionate developers and wants to push the boundaries to solve complex business problems using the latest tech stack. As a Big Data Engineer, you will work with various Technology and Business teams to deliver our Data Engineering offerings to our clients across the globe.
• The role would involve big data pre-processing & reporting workflows including collecting, parsing, managing, analysing, and visualizing large sets of data to turn information into business insights
• Develop the software and systems needed for end-to-end execution on large projects
• Work across all phases of SDLC, and use Software Engineering principles to build scalable solutions
• Build the knowledge base required to deliver increasingly complex technology projects
• The role would also involve testing various machine learning models on Big Data and deploying learned models for ongoing scoring and prediction.
Education & Experience
• B.Tech. or Equivalent degree in CS/CE/IT/ECE/EEE 3+ years of experience designing technological solutions to complex data problems, developing & testing modular, reusable, efficient and scalable code to implement those solutions.
Must have (hands-on) experience
• Python and SQL expertise
• Distributed computing frameworks (Hadoop Ecosystem & Spark components)
• Must be proficient in any Cloud computing platforms (AWS/Azure/GCP) • Experience in in any cloud platform would be preferred - GCP (Big Query/Bigtable, Pub sub, Data Flow, App engine )/ AWS/ Azure
• Linux environment, SQL and Shell scripting Desirable
• Statistical or machine learning DSL like R
• Distributed and low latency (streaming) application architecture
• Row store distributed DBMSs such as Cassandra, CouchDB, MongoDB, etc
. • Familiarity with API design
1. One phone screening round to gauge your interest and knowledge of fundamentals
2. An assignment to test your skills and ability to come up with solutions in a certain time
3. Interview 1 with our Data Engineer lead
4. Final Interview with our Data Engineer Lead and the Business Teams
Preferred Immediate Joiners
- Apply advanced predictive modeling and statistical techniques to design, build, maintain, and improve upon multiple real-time decision systems.
- Visualize and show complex data-sets via multidimensional visualization tools.
- Perform data cleansing, transformation & feature engineering.
- Design scalable automated data mining, modelling and validation processes.
- Produce scalable, reusable, efficient feature code to be implemented on clusters and standalone data servers.
- Contribute to the development/ deployment of machine learning algorithms, operational research, semantic analysis, and statistical methods for finding structure in large data sets.
- An absolute minimum of 1-3 years of relevant data science experience.
- Have an Engineering or comparable math / physics degree.
- Knowledge & working proficiency in Excel, Python, R
- Expert proficiency in at least 2 structured programming languages.
- Deep understanding about statistical and analytical models.
- Bias for action - Ability to move quickly while taking time out to review the details.
- Clear communicator - Ability to synthesise and clearly articulate complex information, highlighting key takeaways and actionable insights.
- Team player - Working mostly autonomously, yet being a team player keeping your crews looped-in.
- Mindset - Ability to take responsibility for your life and that of your people and projects.
- Mindfulness - Ability to maintain practices that keep you grounded.
Novelship is seeking a Data Engineer to be based in India or Remote in South East Asia to join our Tech Team.
Brief Description of the Role:
As a Data Engineer, you will be responsible for Building & Maintaining our Analytics Infrastructure, Data Taxononmy, Data Ingestion and aggregation to provide Business Intelligence to different teams and support Data Dependent tools like ERP and CRM.
In this role you will:
- Analyze and design ETL solutions to store/fetch data from multiple systems like Postgres, Airtable, Google Analytics and Mixpanel.
- Drive the implementation of new data management projects such as Finance ERP and re-structure of the current data architecture.
- Participate in the building of a single source of Data Sytems and Data Taxonomy projects.
- Engage in problem definition and resolution and collaborate with a diverse group of engineers and business owners from across the company.
- Work with stakeholders including the Strategy, Product and Marketing teams to assist with data-related technical issues, support their data analytics needs and work on data collection and aggregation solutions.
- Act as a technical resource for the Data team and be involved in creating and implementing current and future Analytics projects like data lake design and data warehouse design.
- Ensure quality and consistency of the data in the Data warehouse and follow best data governance practices.
- Analyze large amounts of information to discover trends and patterns to provide Business Intelligence.
- Mine and analyse data from databases to drive optimization and improvement of product development, marketing techniques and business strategies.
- Design and build reusable components, frameworks and libraries at scale to support analytics data products
- Build and maintain optimal data pipeline architecture and data systems. Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimising data delivery, re-designing infrastructure for greater scalability, etc.
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems
- 2 to 4 years of professional experience as a Data Engineer.
- Proficiency in either Python, Scala or R.
- Proficiency in SQL, Relational & Non-Relational Databases.
- Excellent analytical and problem-solving skills.
- Experience with Business Intelligence tools like Data Studio, Power BI and Tableau.
- Experience in Data Cleaning, Creating Data Pipelines, Data Modelling, Storytelling and Dashboarding.
- Bachelors or Masters's education in Computer Science
Experienced in writing complex SQL select queries (window functions & CTE’s) with advanced SQL experience
Should be an individual contributor for initial few months based on project movement team will be aligned
Strong in querying logic and data interpretation
Solid communication and articulation skills
Able to handle stakeholders independently with less interventions of reporting manager
Develop strategies to solve problems in logical yet creative ways
Create custom reports and presentations accompanied by strong data visualization and storytelling
- 5+ years of experience in a Data Engineering role on cloud environment
- Must have good experience in Scala/PySpark (preferably on data-bricks environment)
- Extensive experience with Transact-SQL.
- Experience in Data-bricks/Spark.
- Strong experience in Dataware house projects
- Expertise in database development projects with ETL processes.
- Manage and maintain data engineering pipelines
- Develop batch processing, streaming and integration solutions
- Experienced in building and operationalizing large-scale enterprise data solutions and applications
- Using one or more of Azure data and analytics services in combination with custom solutions
- Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
- In-depth understanding of data management (e. g. permissions, security, and monitoring).
- Cloud repositories for e.g. Azure GitHub, Git
- Experience in an agile environment (Prefer Azure DevOps).
Good to have
- Manage source data access security
- Automate Azure Data Factory pipelines
- Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
- Experience in implementing and maintaining CICD pipelines
- Power BI understanding, Delta Lake house architecture
- Knowledge of software development best practices.
- Excellent analytical and organization skills.
- Effective working in a team as well as working independently.
- Strong written and verbal communication skills.
- Expertise in database development projects and ETL processes.
As a Data Science Lead, you will be working on creating industry first analytical and propensity models to
help discover the information hidden in vast amounts of data, and make smarter decisions to deliver
even better customer experience. Your primary focus will be in applying data mining techniques, doing
statistical analysis, and building high quality prediction systems integrated with our products.
➢ Working with business and leadership teams to gathering and analyse structured and unstructured data
➢ Data mining using state-of-the-art methods
➢ Enhancing data collection procedures to include information that is relevant for building analytic
➢ Processing, cleansing, and verifying the integrity of data used for analysis
➢ Doing ad-hoc analysis and presenting results in a clear manner
➢ Creating automated anomaly detection systems and constant tracking of its performance
➢ Creation and evolution of an efficient BI pipeline into a multi-faceted pipeline to support various
What we are looking for:
➢ 5-8 years of relevant experience, preferably in financial services industry.
➢ A bachelors / master’s degree in the field of Statistics, Mathematics, Computer Science or
Management from Tier 1 Institutes.
➢ Data warehousing experience will be a plus.
➢ Good conceptual understanding of statistics and probability.
➢ Experience in developing dashboards and reports using BI tools.