50+ Apache Spark Jobs in India
Responsibilities:
• Build customer-facing solutions for a Data Observability product that monitors data pipelines
• Work on POCs to build new data pipeline monitoring capabilities.
• Build next-generation scalable, reliable, flexible, high-performance data pipeline capabilities for ingesting data from multiple sources containing complex datasets.
• Continuously improve the services you own, making them more performant and resource-efficient.
• Collaborate closely with the engineering, data science, and product teams to propose an optimal solution for a given problem statement
• Work closely with the DevOps team on performance monitoring and MLOps
Required Skills:
• 3+ years of experience with data-related technologies
• Good understanding of distributed computing principles
• Experience in Apache Spark
• Hands on programming with Python
• Knowledge of Hadoop v2, Map Reduce, HDFS
• Experience building stream-processing systems using technologies such as Apache Storm, Spark Streaming, or Flink (see the sketch after this list)
• Experience with messaging systems, such as Kafka or RabbitMQ
• Good understanding of Big Data querying tools, such as Hive
• Experience with integration of data from multiple data sources
• Good understanding of SQL queries, joins, stored procedures, relational schemas
• Experience with NoSQL databases, such as HBase, Cassandra/Scylla, MongoDB
• Knowledge of ETL techniques and frameworks
• Performance tuning of Spark Jobs
• A general understanding of data quality is a plus
• Experience with Databricks, Snowflake, BigQuery, or similar lakehouse platforms would be a big plus
• Some knowledge of DevOps is nice to have
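For illustration of the stream-processing and Kafka skills above, here is a minimal PySpark Structured Streaming sketch; the broker address, topic name, and windowing are invented placeholders, and running it requires the spark-sql-kafka connector package.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical broker and topic names, used purely for illustration.
KAFKA_BROKERS = "localhost:9092"
TOPIC = "pipeline-events"

spark = (SparkSession.builder
         .appName("pipeline-monitoring-sketch")
         .getOrCreate())

# Read a stream of raw Kafka records (key/value arrive as binary columns).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", KAFKA_BROKERS)
          .option("subscribe", TOPIC)
          .load())

# Count events per source in 1-minute windows as a stand-in monitoring metric.
counts = (events
          .selectExpr("CAST(value AS STRING) AS source", "timestamp")
          .groupBy(F.window("timestamp", "1 minute"), "source")
          .count())

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```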
We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks, with a thorough understanding of Delta Lake and Lake-house Architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.
Key Responsibilities:
- Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
- Exhibit proficiency in various RDBMS databases such as MySQL and SQL-Server, emphasizing their integration in applications and pipeline development.
- Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
- Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
- Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance (see the sketch after this list).
- Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
- Collaborate with diverse teams to convert business requirements into precise technical specifications.
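As a loose sketch of the Parquet/Delta Lake optimization mentioned above (assuming a Databricks or Delta-enabled Spark environment, with made-up paths and column names), a job might write a partitioned Delta table and compact it:

```python
from pyspark.sql import SparkSession

# Assumes a Databricks / Delta-enabled Spark session; paths and column
# names below are illustrative only.
spark = SparkSession.builder.appName("delta-storage-sketch").getOrCreate()

orders = spark.read.parquet("/mnt/raw/orders")  # hypothetical source path

# Write as a Delta table partitioned by date for efficient pruning.
(orders.write
       .format("delta")
       .mode("overwrite")
       .partitionBy("order_date")
       .save("/mnt/curated/orders_delta"))

# Compact small files; Z-ORDER clusters frequently filtered columns.
# (OPTIMIZE / ZORDER are available on Databricks and recent Delta Lake releases.)
spark.sql("OPTIMIZE delta.`/mnt/curated/orders_delta` ZORDER BY (customer_id)")
```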
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related discipline.
- Demonstrated hands-on experience with Azure cloud services and Databricks.
- Proficient programming skills in Python, Scala, and PHP.
- In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
- Familiarity with distributed data processing and external table management.
- Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
- Exceptional problem-solving acumen and meticulous attention to detail.
Additional Qualifications :
- Acquaintance with data security and privacy standards.
- Experience in CI/CD pipelines and version control systems, notably Git.
- Familiarity with Agile methodologies and DevOps practices.
- Competence in technical writing for comprehensive documentation.
Publicis Sapient Overview:
As a Senior Associate L1 in Data Engineering, you will translate client requirements into technical designs and implement components of data engineering solutions. You will apply a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and you will independently drive design discussions to ensure the necessary health of the overall solution.
Job Summary:
As a Senior Associate L2 in Data Engineering, you will translate client requirements into technical designs and implement components of data engineering solutions. You will apply a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and you will independently drive design discussions to ensure the necessary health of the overall solution.
The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python, experience in data ingestion, integration, wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms is also required.
Role & Responsibilities:
Your role is focused on Design, Development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode (a batch-ingestion sketch follows this list)
• Build functionality for data analytics, search and aggregation
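Purely as an illustration of batch ingestion from heterogeneous sources, the following PySpark sketch pulls from a relational database over JDBC and from a CSV drop zone, then lands both in a staging area; the connection details and paths are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-ingestion-sketch").getOrCreate()

# Hypothetical relational source pulled over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/sales")
          .option("dbtable", "public.orders")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

# Hypothetical flat-file source dropped by an upstream system.
returns = (spark.read
           .option("header", "true")
           .csv("s3a://landing-zone/returns/*.csv"))

# Land both datasets in a common staging area as Parquet.
orders.write.mode("overwrite").parquet("s3a://staging/orders/")
returns.write.mode("overwrite").parquet("s3a://staging/returns/")
```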
Experience Guidelines:
Mandatory Experience and Competencies:
1. Overall 5+ years of IT experience, with 3+ years in data-related technologies
2. Minimum 2.5 years of experience in Big Data technologies and working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)
3. Hands-on experience with the Hadoop stack (HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow) and other components required to build end-to-end data pipelines
4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred
5. Hands-on working knowledge of NoSQL and MPP data platforms such as HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
6. Well-versed, working knowledge of data-platform-related services on at least one cloud platform, including IAM and data security
Preferred Experience and Knowledge (Good to Have):
1. Good knowledge of and hands-on experience with traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres)
2. Knowledge of data governance processes (security, lineage, catalog) and tools such as Collibra, Alation, etc.
3. Knowledge of distributed messaging frameworks such as ActiveMQ, RabbitMQ, or Solace, search and indexing, and microservices architectures
4. Performance tuning and optimization of data pipelines
5. CI/CD: infrastructure provisioning on cloud, automated build and deployment pipelines, and code quality
6. Cloud data specialty and other related Big Data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
A LEADING US BASED MNC
Data Engineering : Senior Engineer / Manager
As a Senior Engineer / Manager in Data Engineering, you will translate client requirements into technical designs and implement components of data engineering solutions. You will apply a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and you will independently drive design discussions to ensure the necessary health of the overall solution.
Must Have skills :
1. GCP
2. Spark Streaming: live data streaming experience is desired.
3. Any one coding language: Java / Python / Scala
Skills & Experience :
- Overall experience of a minimum of 5 years, with at least 4 years of relevant experience in Big Data technologies
- Hands-on experience with the Hadoop stack (HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow) and other components required to build end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.
- Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred
- Hands-on working knowledge of NoSQL and MPP data platforms such as HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
- Well-versed and working knowledge with data platform related services on GCP
- Bachelor's degree and 6 to 12 years of work experience, or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position
Your Impact :
- Data Ingestion, Integration and Transformation
- Data Storage and Computation Frameworks, Performance Optimizations
- Analytics & Visualizations
- Infrastructure & Cloud Computing
- Data Management Platforms
- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time (see the BigQuery ingestion sketch after this list)
- Build functionality for data analytics, search and aggregation
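As a small, hedged example of GCP-flavoured ingestion (assuming the spark-bigquery and GCS connectors are available, as on Dataproc, and using made-up project, dataset, and bucket names):

```python
from pyspark.sql import SparkSession

# Assumes the spark-bigquery connector is on the classpath; the project,
# dataset, table, and bucket names below are placeholders.
spark = SparkSession.builder.appName("gcp-ingestion-sketch").getOrCreate()

shipments = (spark.read
             .format("bigquery")
             .option("table", "my-project.logistics.shipments")
             .load())

# A trivial transformation before landing the data in Cloud Storage.
recent = shipments.filter("ship_date >= date_sub(current_date(), 30)")
recent.write.mode("overwrite").parquet("gs://my-bucket/curated/shipments/")
```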
🚀 Exciting Opportunity: Data Engineer Position in Gurugram 🌐
Hello
We are actively seeking a talented and experienced Data Engineer to join our dynamic team at Reality Motivational Venture in Gurugram (Gurgaon). If you're passionate about data, thrive in a collaborative environment, and possess the skills we're looking for, we want to hear from you!
Position: Data Engineer
Location: Gurugram (Gurgaon)
Experience: 5+ years
Key Skills:
- Python
- Spark, Pyspark
- Data Governance
- Cloud (AWS/Azure/GCP)
Main Responsibilities:
- Define and set up analytics environments for "Big Data" applications in collaboration with domain experts.
- Implement ETL processes for telemetry-based and stationary test data.
- Support in defining data governance, including data lifecycle management.
- Develop large-scale data processing engines and real-time search and analytics based on time-series data (a short telemetry ETL sketch follows this list).
- Ensure technical, methodological, and quality aspects.
- Support CI/CD processes.
- Foster know-how development and transfer, continuous improvement of leading technologies within Data Engineering.
- Collaborate with solution architects on the development of complex on-premise, hybrid, and cloud solution architectures.
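For flavour, a minimal pandas sketch of the kind of telemetry ETL and time-series processing described above; the file, column names, and resampling interval are assumptions rather than details from the posting, and writing Parquet requires pyarrow.

```python
import pandas as pd

# Hypothetical telemetry export with a timestamp, signal name, and value.
raw = pd.read_csv("telemetry_export.csv", parse_dates=["timestamp"])

# Basic cleansing: drop unreadable rows and obvious sensor glitches.
clean = raw.dropna(subset=["timestamp", "value"])
clean = clean[clean["value"].between(-1e6, 1e6)]

# Resample each signal to 1-second means so signals share a common time grid.
aligned = (clean
           .set_index("timestamp")
           .groupby("signal")["value"]
           .resample("1s")
           .mean()
           .unstack("signal"))

# Persist the curated frame for downstream analytics (Parquet preserves dtypes).
aligned.to_parquet("telemetry_curated.parquet")
```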
Qualification Requirements:
- BSc, MSc, MEng, or PhD in Computer Science, Informatics/Telematics, Mathematics/Statistics, or a comparable engineering degree.
- Proficiency in Python and the PyData stack (Pandas/Numpy).
- Experience in high-level programming languages (C#/C++/Java).
- Familiarity with scalable processing environments like Dask (or Spark).
- Proficient in Linux and scripting languages (Bash Scripts).
- Experience in containerization and orchestration of containerized services (Kubernetes).
- Grounding in database technologies (SQL/OLAP and NoSQL).
- Interest in Big Data storage technologies (Elastic, ClickHouse).
- Familiarity with Cloud technologies (Azure, AWS, GCP).
- Fluent English communication skills (speaking and writing).
- Ability to work constructively with a global team.
- Willingness to travel for business trips during development projects.
Preferable:
- Working knowledge of vehicle architectures, communication, and components.
- Experience in additional programming languages (C#/C++/Java, R, Scala, MATLAB).
- Experience in time-series processing.
How to Apply:
Interested candidates, please share your updated CV/resume with me.
Thank you for considering this exciting opportunity.
About Us:
6sense is a Predictive Intelligence Engine that is reimagining how B2B companies do
sales and marketing. It works with big data at scale, advanced machine learning and
predictive modelling to find buyers and predict what they will purchase, when and
how much.
6sense helps B2B marketing and sales organizations fully understand the complex ABM
buyer journey. By combining intent signals from every channel with the industry’s most
advanced AI predictive capabilities, it is finally possible to predict account demand and
optimize demand generation in an ABM world. Equipped with the power of AI and the
6sense Demand Platform™, marketing and sales professionals can uncover, prioritize,
and engage buyers to drive more revenue.
6sense is seeking a Staff Software Engineer, Data, to become part of a team
designing, developing, and deploying its customer-centric applications.
We’ve more than doubled our revenue in the past five years and completed our Series
E funding of $200M last year, giving us a stable foundation for growth.
Responsibilities:
1. Own critical datasets and data pipelines for product & business, and work
towards direct business goals of increased data coverage, data match rates, data
quality, data freshness
2. Create more value from various datasets with creative solutions, unlock more
value from existing data, and help build a data moat for the company
3. Design, develop, test, deploy and maintain optimal data pipelines, and assemble
large, complex data sets that meet functional and non-functional business
requirements
4. Improve our current data pipelines, i.e. improve their performance and SLAs,
remove redundancies, and figure out a way to test before vs. after rollout
5. Identify, design, and implement process improvements in data flow across
multiple stages, collaborating with multiple cross-functional teams, e.g.
automating manual processes, optimising data delivery, improving hand-off processes, etc.
6. Work with cross-functional stakeholders, including the Product, Data Analytics,
and Customer Support teams, to enable data access and related goals
7. Build for security, privacy, scalability, reliability and compliance
8. Mentor and coach other team members on scalable and extensible solutions
design, and best coding standards
9. Help build a team and cultivate innovation by driving cross-collaboration and
execution of projects across multiple teams
Requirements:
8-10+ years of overall work experience as a Data Engineer
Excellent analytical and problem-solving skills
Strong experience with Big Data technologies like Apache Spark. Experience with
Hadoop, Hive, and Presto would be a plus
Strong experience in writing complex, optimized SQL queries across large data
sets. Experience with optimizing queries and underlying storage
Experience with Python/ Scala
Experience with Apache Airflow or other orchestration tools (a minimal Airflow DAG sketch follows this list)
Experience with writing Hive / Presto UDFs in Java
Experience working on AWS cloud platform and services.
Experience with Key Value stores or NoSQL databases would be a plus.
Comfortable with Unix / Linux command line
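Purely as an illustration of the Airflow orchestration experience asked for above, a minimal DAG might chain an extract step and a Spark submission; the task names, schedule, and script paths are invented, and on Airflow versions before 2.4 the schedule parameter is called schedule_interval.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily pipeline: extract raw data, then run a Spark job on it.
with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_raw_events",
        bash_command="python /opt/jobs/extract_events.py",
    )
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /opt/jobs/transform_events.py",
    )

    extract >> transform  # run the Spark step only after extraction succeeds
```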
Interpersonal Skills:
You can work independently as well as part of a team.
You take ownership of projects and drive them to conclusion.
You’re a good communicator and are capable of not just doing the work, but also
teaching others and explaining the “why” behind complicated technical
decisions.
You aren’t afraid to roll up your sleeves: This role will evolve over time, and we’ll
want you to evolve with it
Lead Data Engineer
Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
Job responsibilities
· You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems
· You will partner with teammates to create complex data processing pipelines in order to solve our clients' most ambitious challenges
· You will collaborate with Data Scientists in order to design scalable implementations of their models
· You will pair to write clean and iterative code based on TDD
· Leverage various continuous delivery practices to deploy, support and operate data pipelines
· Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
· Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
· Create data models and speak to the tradeoffs of different modeling approaches
· On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product
· Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
· Assure effective collaboration between Thoughtworks' and the client's teams, encouraging open communication and advocating for shared outcomes
Job qualifications Technical skills
· You are equally happy coding and leading a team to implement a solution
· You have a track record of innovation and expertise in Data Engineering
· You're passionate about craftsmanship and have applied your expertise across a range of industries and organizations
· You have a deep understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
· You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
· Hands-on experience with MapR, Cloudera, Hortonworks, and/or cloud-based Hadoop distributions (AWS EMR, Azure HDInsight, Qubole, etc.)
· You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
· You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
· Working with data excites you: you have created Big data architecture, you can build and operate data pipelines, and maintain data storage, all within distributed systems
Professional skills
· Advocate your data engineering expertise to the broader tech community outside of Thoughtworks, speaking at conferences and acting as a mentor for more junior-level data engineers
· You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
· An interest in coaching others, sharing your experience and knowledge with teammates
· You enjoy influencing others and always advocate for technical excellence while being open to change when needed
TOP 3 SKILLS
Python (Language)
Spark Framework
Spark Streaming
Docker / Jenkins / Spinnaker
AWS
Hive Queries (see the Spark SQL sketch after this list)
The candidate should be a good coder.
Preferred: Airflow
Must-have experience:
Python
Spark framework and streaming
Exposure to the machine learning lifecycle is mandatory.
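To illustrate the Hive query skill listed above, here is a small PySpark sketch that runs a Hive-metastore query through Spark SQL; the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark read tables registered in the Hive metastore;
# the database and table names here are invented for the example.
spark = (SparkSession.builder
         .appName("hive-query-sketch")
         .enableHiveSupport()
         .getOrCreate())

daily_searches = spark.sql("""
    SELECT search_date, COUNT(*) AS searches
    FROM analytics.search_events
    WHERE search_date >= date_sub(current_date(), 7)
    GROUP BY search_date
    ORDER BY search_date
""")

daily_searches.show()
```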
Project:
This is a search-domain project. For any search activity happening on the website, this team creates the corresponding models, building sorting/scoring models for each search; this is done by the data scientists. The team works mostly on the streaming side of data, so the candidate would work extensively on Spark Streaming, and there will be a lot of work in Machine Learning.
INTERVIEW INFORMATION
3-4 rounds.
1st round based on data engineering batching experience.
2nd round based on data engineering streaming experience.
3rd round based on the ML lifecycle (the 3rd round can be a techno-functional round based on previous feedback; otherwise a 4th round will be a functional round, if required).
Roles and Responsibilities:
- Design, develop, and maintain the end-to-end MLOps infrastructure from the ground up, leveraging open-source systems across the entire MLOps landscape.
- Create pipelines for data ingestion, data transformation, building, testing, and deploying machine learning models, as well as monitoring and maintaining the performance of these models in production.
- Manage the MLOps stack, including version control systems, continuous integration and deployment tools, containerization, orchestration, and monitoring systems.
- Ensure that the MLOps stack is scalable, reliable, and secure.
Skills Required:
- 3-6 years of MLOps experience
- Preferably worked in the startup ecosystem
Primary Skills:
- Experience with end-to-end MLOps systems such as ClearML, Kubeflow, MLflow, etc. (a minimal MLflow tracking sketch follows the skills lists below)
- Technical expertise in MLOps: Should have a deep understanding of the MLOps landscape and be able to leverage open-source systems to build scalable, reliable, and secure MLOps infrastructure.
- Programming skills: Proficient in at least one programming language, such as Python, and have experience with data science libraries, such as TensorFlow, PyTorch, or Scikit-learn.
- DevOps experience: Should have experience with DevOps tools and practices, such as Git, Docker, Kubernetes, and Jenkins.
Secondary Skills:
- Version Control Systems (VCS) tools like Git and Subversion
- Containerization technologies like Docker and Kubernetes
- Cloud Platforms like AWS, Azure, and Google Cloud Platform
- Data Preparation and Management tools like Apache Spark, Apache Hadoop, and SQL databases like PostgreSQL and MySQL
- Machine Learning Frameworks like TensorFlow, PyTorch, and Scikit-learn
- Monitoring and Logging tools like Prometheus, Grafana, and Elasticsearch
- Continuous Integration and Continuous Deployment (CI/CD) tools like Jenkins, GitLab CI, and CircleCI
- Explainability and interpretability tools like LIME and SHAP
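To make the MLOps tooling above concrete, here is a minimal, illustrative MLflow tracking sketch; the experiment name, toy data, and model are placeholders, and a real setup would also configure a tracking server and model registry.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for a real training set.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("demo-churn-model")  # hypothetical experiment name

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    # Log parameters, metrics, and the model artifact for later comparison.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```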
- Creating and managing ETL/ELT pipelines based on requirements
- Build Power BI dashboards and manage the datasets needed.
- Work with stakeholders to identify data structures needed for the future and perform any transformations, including aggregations.
- Build data cubes for real-time visualisation needs and CXO dashboards.
Required Tech Skills
- Microsoft PowerBI & DAX
- Python, Pandas, PyArrow, Jupyter Notebooks, Apache Spark
- Azure Synapse, Azure DataBricks, Azure HDInsight, Azure Data Factory
Have you streamed a program on Disney+, watched your favorite binge-worthy series on Peacock or cheered your favorite team on during the World Cup from one of the 20 top streaming platforms around the globe? If the answer is yes, you’ve already benefitted from Conviva technology, helping the world’s leading streaming publishers deliver exceptional streaming experiences and grow their businesses.
Conviva is the only global streaming analytics platform for big data that collects, standardizes, and puts trillions of cross-screen, streaming data points in context, in real time. The Conviva platform provides comprehensive, continuous, census-level measurement through real-time, server side sessionization at unprecedented scale. If this sounds important, it is! We measure a global footprint of more than 500 million unique viewers in 180 countries watching 220 billion streams per year across 3 billion applications streaming on devices. With Conviva, customers get a unique level of actionability and scale from continuous streaming measurement insights and benchmarking across every stream, every screen, every second.
What you get to do in this role:
Work on extremely high-scale Rust web services or backend systems.
Design and develop solutions for highly scalable web and backend systems.
Proactively identify and solve performance issues.
Maintain a high bar on code quality and unit testing.
What you bring to the role:
5+ years of hands-on software development experience.
At least 2 years of Rust development experience.
Knowledge of Cargo crates for Kafka, Redis, etc.
Strong CS fundamentals, including system design, data structures and algorithms.
Expertise in backend and web services development.
Good analytical and troubleshooting skills.
What will help you stand out:
Experience working with large scale web services and applications.
Exposure to Golang, Scala or Java
Exposure to Big data systems like Kafka, Spark, Hadoop etc.
Underpinning the Conviva platform is a rich history of innovation. More than 60 patents represent award-winning technologies and standards, including first-of-its-kind innovations like time-state analytics and AI-automated data modeling that surface actionable insights. By understanding real-world human experiences and having the ability to act within seconds of observation, our customers can solve business-critical issues and focus on growing their business ahead of the competition. Examples of the brands Conviva has helped fuel streaming growth for include: DAZN, Disney+, HBO, Hulu, NBCUniversal, Paramount+, Peacock, Sky, Sling TV, Univision and Warner Bros Discovery.
Privately held, Conviva is headquartered in Silicon Valley, California with offices and people around the globe. For more information, visit us at www.conviva.com. Join us to help extend our leadership position in big data streaming analytics to new audiences and markets!
As Conviva is expanding, we are building products providing deep insights into end user experience for our customers.
Platform and TLB Team
The vision for the TLB team is to build data processing software that works on terabytes of streaming data in real time. Engineer the next-gen Spark-like system for in-memory computation of large time-series datasets, covering both the Spark-like backend infrastructure and a library-based programming model. Build a horizontally and vertically scalable system that analyses trillions of events per day within sub-second latencies. Utilize the latest and greatest of big data technologies to build solutions for use cases across multiple verticals. Lead technology innovation and advancement that will have a big business impact for years to come. Be part of a worldwide team building software using the latest technologies and the best of software development tools and processes.
What You’ll Do
This is an individual contributor position. Expectations will be on the below lines:
- Design, build and maintain the stream processing, and time-series analysis system which is at the heart of Conviva's products
- Responsible for the architecture of the Conviva platform
- Build features, enhancements, new services, and bug fixing in Scala and Java on a Jenkins-based pipeline to be deployed as Docker containers on Kubernetes
- Own the entire lifecycle of your microservice including early specs, design, technology choice, development, unit-testing, integration-testing, documentation, deployment, troubleshooting, enhancements etc.
- Lead a team to develop a feature or parts of the product
- Adhere to the Agile model of software development to plan, estimate, and ship per business priority
What you need to succeed
- 9+ years of work experience in software development of data processing products.
- Engineering degree in software or equivalent from a premier institute.
- Excellent knowledge of fundamentals of Computer Science like algorithms and data structures. Hands-on with functional programming and know-how of its concepts
- Excellent programming and debugging skills on the JVM. Proficient in writing code in Scala/Java/Rust/Haskell/Erlang that is reliable, maintainable, secure, and performant
- Experience with big data technologies like Spark, Flink, Kafka, Druid, HDFS, etc.
- Deep understanding of distributed systems concepts and scalability challenges including multi-threading, concurrency, sharding, partitioning, etc.
- Experience/knowledge of Akka/Lagom framework and/or stream processing technologies like RxJava or Project Reactor will be a big plus. Knowledge of design patterns like event-streaming, CQRS and DDD to build large microservice architectures will be a big plus
- Excellent communication skills. Willingness to work under pressure. Hunger to learn and succeed. Comfortable with ambiguity. Comfortable with complexity
Job Title: Data Engineer
Job Summary: As a Data Engineer, you will be responsible for designing, building, and maintaining the infrastructure and tools necessary for data collection, storage, processing, and analysis. You will work closely with data scientists and analysts to ensure that data is available, accessible, and in a format that can be easily consumed for business insights.
Responsibilities:
- Design, build, and maintain data pipelines to collect, store, and process data from various sources.
- Create and manage data warehousing and data lake solutions.
- Develop and maintain data processing and data integration tools.
- Collaborate with data scientists and analysts to design and implement data models and algorithms for data analysis.
- Optimize and scale existing data infrastructure to ensure it meets the needs of the business.
- Ensure data quality and integrity across all data sources.
- Develop and implement best practices for data governance, security, and privacy.
- Monitor data pipeline performance and errors, and troubleshoot issues as needed.
- Stay up-to-date with emerging data technologies and best practices.
Requirements:
Bachelor's degree in Computer Science, Information Systems, or a related field.
Experience with ETL tools such as Matillion, SSIS, or Informatica
Experience with SQL and relational databases such as SQL Server, MySQL, PostgreSQL, or Oracle.
Experience in writing complex SQL queries
Strong programming skills in languages such as Python, Java, or Scala.
Experience with data modeling, data warehousing, and data integration.
Strong problem-solving skills and ability to work independently.
Excellent communication and collaboration skills.
Familiarity with big data technologies such as Hadoop, Spark, or Kafka.
Familiarity with data warehouse / data lake technologies like Snowflake or Databricks.
Familiarity with cloud computing platforms such as AWS, Azure, or GCP.
Familiarity with reporting tools.
Teamwork/ growth contribution
- Help the team conduct interviews and identify the right candidates
- Adhere to timelines
- Provide timely status communication and upfront communication of any risks
- Teach, train, and share knowledge with peers.
- Good Communication skills
- Proven abilities to take initiative and be innovative
- Analytical mind with a problem-solving aptitude
Good to have :
Master's degree in Computer Science, Information Systems, or a related field.
Experience with NoSQL databases such as MongoDB or Cassandra.
Familiarity with data visualization and business intelligence tools such as Tableau or Power BI.
Knowledge of machine learning and statistical modeling techniques.
If you are passionate about data and want to work with a dynamic team of data scientists and analysts, we encourage you to apply for this position.
About Kloud9:
Kloud9 exists with the sole purpose of providing cloud expertise to the retail industry. Our team of cloud architects, engineers and developers help retailers launch a successful cloud initiative so you can quickly realise the benefits of cloud technology. Our standardised, proven cloud adoption methodologies reduce the cloud adoption time and effort so you can directly benefit from lower migration costs.
Kloud9 was founded with the vision of bridging the gap between E-commerce and cloud. The E-commerce of any industry is limiting and poses a huge challenge in terms of the finances spent on physical data structures.
At Kloud9, we know migrating to the cloud is the single most significant technology shift your company faces today. We are your trusted advisors in transformation and are determined to build a deep partnership along the way. Our cloud and retail experts will ease your transition to the cloud.
Our sole focus is to provide cloud expertise to retail industry giving our clients the empowerment that will take their business to the next level. Our team of proficient architects, engineers and developers have been designing, building and implementing solutions for retailers for an average of more than 20 years.
We are a cloud vendor that is both platform and technology independent. Our vendor independence not only gives us a unique perspective on the cloud market but also ensures that we deliver the cloud solutions that best meet our clients' requirements.
What we are looking for:
● 3+ years’ experience developing Data & Analytic solutions
● Experience building data lake solutions leveraging one or more of the following: AWS, EMR, S3, Hive & Spark
● Experience with relational SQL
● Experience with scripting languages such as Shell, Python
● Experience with source control tools such as GitHub and related dev process
● Experience with workflow scheduling tools such as Airflow
● In-depth knowledge of scalable cloud
● Has a passion for data solutions
● Strong understanding of data structures and algorithms
● Strong understanding of solution and technical design
● Has a strong problem-solving and analytical mindset
● Experience working with Agile Teams.
● Able to influence and communicate effectively, both verbally and written, with team members and business stakeholders
● Able to quickly pick up new programming languages, technologies, and frameworks
● Bachelor’s Degree in computer science
Why Explore a Career at Kloud9:
With job opportunities in prime locations of the US, London, Poland and Bengaluru, we help build your career path in cutting-edge technologies of AI, Machine Learning and Data Science. Be part of an inclusive and diverse workforce that's changing the face of retail technology with their creativity and innovative solutions. Our vested interest in our employees translates into delivering the best products and solutions to our customers.
Data Engineer- Senior
Cubera is a data company revolutionizing big data analytics and Adtech through data share value principles wherein the users entrust their data to us. We refine the art of understanding, processing, extracting, and evaluating the data that is entrusted to us. We are a gateway for brands to increase their lead efficiency as the world moves towards web3.
What are you going to do?
Design & Develop high performance and scalable solutions that meet the needs of our customers.
Work closely with Product Management, Architects, and cross-functional teams.
Build and deploy large-scale systems in Java/Python.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing their algorithms.
Follow best practices that can be adopted in the Big Data stack.
Use your engineering experience and technical skills to drive the features and mentor the engineers.
What are we looking for ( Competencies) :
Bachelor’s degree in computer science, computer engineering, or related technical discipline.
Overall 5 to 8 years of programming experience in Java, Python including object-oriented design.
Data handling frameworks: Should have a working knowledge of one or more data handling frameworks like- Hive, Spark, Storm, Flink, Beam, Airflow, Nifi etc.
Data Infrastructure: Should have experience in building, deploying and maintaining applications on popular cloud infrastructure like AWS, GCP etc.
Data Store: Must have expertise in one of general-purpose No-SQL data stores like Elasticsearch, MongoDB, Redis, RedShift, etc.
Strong sense of ownership, focus on quality, responsiveness, efficiency, and innovation.
Ability to work with distributed teams in a collaborative and productive manner.
Benefits:
Competitive Salary Packages and benefits.
Collaborative, lively and an upbeat work environment with young professionals.
Job Category: Development
Job Type: Full Time
Job Location: Bangalore
The thrill of working at a start-up that is starting to scale massively is something else. Simpl (FinTech startup of the year - 2020) was formed in 2015 by Nitya Sharma, an investment banker from Wall Street and Chaitra Chidanand, a tech executive from the Valley, when they teamed up with a very clear mission - to make money simple so that people can live well and do amazing things. Simpl is the payment platform for the mobile-first world, and we’re backed by some of the best names in fintech globally (folks who have invested in Visa, Square and Transferwise), and
has Joe Saunders, ex-Chairman and CEO of Visa, as a board member.
Everyone at Simpl is an internal entrepreneur who is given a lot of bandwidth and resources to create the next breakthrough towards the long term vision of “making money Simpl”. Our first product is a payment platform that lets people buy instantly, anywhere online, and pay later. In
the background, Simpl uses big data for credit underwriting, risk and fraud modelling, all without any paperwork, and enables Banks and Non-Bank Financial Companies to access a whole new consumer market.
In place of traditional forms of identification and authentication, Simpl integrates deeply into merchant apps via SDKs and APIs. This allows for more sophisticated forms of authentication that take full advantage of smartphone data and processing power
Skillset:
Workflow manager/scheduler like Airflow, Luigi, Oozie
Good handle on Python
ETL Experience
Batch processing frameworks like Spark, MR/PIG
File formats: Parquet, JSON, XML, Thrift, Avro, Protobuf
Rule engine (drools - business rule management system)
Distributed file/object storage systems like HDFS, NFS, AWS S3, and equivalents
Built/configured dashboards
Nice to have:
Data platform experience, e.g. building data lakes, working with near-real-time applications/frameworks like Storm, Flink, Spark
AWS
File encoding types: Thrift, Avro, Protobuf, Parquet, JSON, XML (see the file-format sketch after this list)
Hive, HBase
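As a small sketch of working with these file formats in PySpark (paths and columns are made up, and reading Avro additionally requires the spark-avro package), the snippet below converts a JSON drop into partitioned Parquet:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("file-format-sketch").getOrCreate()

# Hypothetical JSON event dump, one record per line.
events = spark.read.json("s3a://landing/events/*.json")

# Derive a date column so the Parquet output can be partition-pruned.
events = events.withColumn("event_date", F.to_date("event_ts"))

(events.write
       .mode("overwrite")
       .partitionBy("event_date")
       .parquet("s3a://warehouse/events_parquet/"))

# Reading Avro would look similar but needs the spark-avro package:
# spark.read.format("avro").load("s3a://landing/events_avro/")
```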
XpressBees – a logistics company started in 2015 – is amongst the fastest growing
companies of its sector. While we started off rather humbly in the space of
ecommerce B2C logistics, the last 5 years have seen us steadily progress towards
expanding our presence. Our vision to evolve into a strong full-service logistics
organization reflects itself in our new lines of business like 3PL, B2B Xpress and cross
border operations. Our strong domain expertise and constant focus on meaningful
innovation have helped us rapidly evolve as the most trusted logistics partner of
India. We have progressively carved our way towards best-in-class technology
platforms, an extensive network reach, and a seamless last mile management
system. While on this aggressive growth path, we seek to become the one-stop-shop
for end-to-end logistics solutions. Our big focus areas for the very near future
include strengthening our presence as service providers of choice and leveraging the
power of technology to improve efficiencies for our clients.
Job Profile
As a Lead Data Engineer in the Data Platform Team at XpressBees, you will build the data platform
and infrastructure to support high quality and agile decision-making in our supply chain and logistics
workflows.
You will define the way we collect and operationalize data (structured / unstructured), and
build production pipelines for our machine learning models, and (RT, NRT, Batch) reporting &
dashboarding requirements. As a Senior Data Engineer in the XB Data Platform Team, you will use
your experience with modern cloud and data frameworks to build products (with storage and serving
systems)
that drive optimisation and resilience in the supply chain via data visibility, intelligent decision making,
insights, anomaly detection and prediction.
What You Will Do
• Design and develop data platform and data pipelines for reporting, dashboarding and
machine learning models. These pipelines would productionize machine learning models
and integrate with agent review tools.
• Meet data completeness, correctness, and freshness requirements.
• Evaluate and identify the data store and data streaming technology choices.
• Lead the design of the logical model and implement the physical model to support
business needs. Come up with logical and physical database design across platforms (MPP,
MR, Hive/PIG) which are optimal physical designs for different use cases (structured/semi
structured). Envision & implement the optimal data modelling, physical design,
performance optimization technique/approach required for the problem.
• Support your colleagues by reviewing code and designs.
• Diagnose and solve issues in our existing data pipelines and envision and build their
successors.
Qualifications & Experience relevant for the role
• A bachelor's degree in Computer Science or related field with 6 to 9 years of technology
experience.
• Knowledge of Relational and NoSQL data stores, stream processing and micro-batching to
make technology & design choices.
• Strong experience in System Integration, Application Development, ETL, Data-Platform
projects. Talented across technologies used in the enterprise space.
• Software development experience using:
• Expertise in relational and dimensional modelling
• Exposure across all the SDLC process
• Experience in cloud architecture (AWS)
• Proven track record in keeping existing technical skills and developing new ones, so that
you can make strong contributions to deep architecture discussions around systems and
applications in the cloud ( AWS).
• Characteristics of a forward thinker and self-starter that flourishes with new challenges
and adapts quickly to learning new knowledge
• Ability to work with cross-functional teams of consulting professionals across multiple
projects.
• Knack for helping an organization to understand application architectures and integration
approaches, to architect advanced cloud-based solutions, and to help launch the build-out
of those systems
• Passion for educating, training, designing, and building end-to-end systems.
• Project Planning and Management
o Take end-to-end ownership of multiple projects / project tracks
o Create and maintain project plans and other related documentation for project
objectives, scope, schedule and delivery milestones
o Lead and participate across all the phases of software engineering, right from
requirements gathering to GO LIVE
o Lead internal team meetings on solution architecture, effort estimation, manpower
planning and resource (software/hardware/licensing) planning
o Manage RIDA (Risks, Impediments, Dependencies, Assumptions) for projects by
developing effective mitigation plans
• Team Management
o Act as the Scrum Master
o Conduct SCRUM ceremonies like Sprint Planning, Daily Standup, Sprint Retrospective
o Set clear objectives for the project and roles/responsibilities for each team member
o Train and mentor the team on their job responsibilities and SCRUM principles
o Make the team accountable for their tasks and help the team in achieving them
o Identify the requirements and come up with a plan for Skill Development for all team
members
• Communication
o Be the Single Point of Contact for the client in terms of day-to-day communication
o Periodically communicate project status to all the stakeholders (internal/external)
• Process Management and Improvement
o Create and document processes across all disciplines of software engineering
o Identify gaps and continuously improve processes within the team
o Encourage team members to contribute towards process improvement
o Develop a culture of quality and efficiency within the team
Must have:
• Minimum 8 years of experience (hands-on as well as leadership) in software / data engineering
across multiple job functions like Business Analysis, Development, Solutioning, QA, DevOps and
Project Management
• Hands-on as well as leadership experience in Big Data Engineering projects
• Experience developing or managing cloud solutions using Azure or other cloud provider
• Demonstrable knowledge on Hadoop, Hive, Spark, NoSQL DBs, SQL, Data Warehousing, ETL/ELT,
DevOps tools
• Strong project management and communication skills
• Strong analytical and problem-solving skills
• Strong systems level critical thinking skills
• Strong collaboration and influencing skills
Good to have:
• Knowledge on PySpark, Azure Data Factory, Azure Data Lake Storage, Synapse Dedicated SQL
Pool, Databricks, PowerBI, Machine Learning, Cloud Infrastructure
• Background in BFSI with focus on core banking
• Willingness to travel
Work Environment
• Customer Office (Mumbai) / Remote Work
Education
• UG: B. Tech - Computers / B. E. – Computers / BCA / B.Sc. Computer Science
- 5+ years of experience in software development.
- At least 2 years of relevant work experience on large scale Data applications
- Good attitude, strong problem-solving abilities, analytical skills, ability to take ownership as appropriate
- Should be able to do coding, debugging, performance tuning, and deploying the apps to Prod.
- Should have good working experience with the Hadoop ecosystem (HDFS, Hive, YARN, file formats like Avro/Parquet)
- Kafka
- J2EE Frameworks (Spring/Hibernate/REST)
- Spark Streaming or any other streaming technology.
- Java programming language is mandatory.
- Good to have experience with Java
- Ability to work on the sprint stories to completion along with Unit test case coverage.
- Experience working in Agile Methodology
- Excellent communication and coordination skills
- Knowledgeable in (and preferably hands-on with) UNIX environments and different continuous integration tools.
- Must be able to integrate quickly into the team and work independently towards team goals
- Take the complete responsibility of the sprint stories’ execution
- Be accountable for the delivery of the tasks in the defined timelines with good quality
- Follow the processes for project execution and delivery.
- Follow agile methodology
- Work with the team lead closely and contribute to the smooth delivery of the project.
- Understand/define the architecture and discuss its pros and cons with the team
- Involve in the brainstorming sessions and suggest improvements in the architecture/design.
- Work with other team leads to get the architecture/design reviewed.
- Work with the clients and counterparts (in US) of the project.
- Keep all the stakeholders updated about the project/task status/risks/issues if there are any.
About Slintel (a 6sense company) :
Slintel, a 6sense company, the leader in capturing technographics-powered buying intent, helps companies uncover the 3% of active buyers in their target market. Slintel evaluates over 100 billion data points and analyzes factors such as buyer journeys, technology adoption patterns, and other digital footprints to deliver market & sales intelligence.
Slintel's customers have access to the buying patterns and contact information of more than 17 million companies and 250 million decision makers across the world.
Slintel is a fast growing B2B SaaS company in the sales and marketing tech space. We are funded by top tier VCs, and going after a billion dollar opportunity. At Slintel, we are building a sales development automation platform that can significantly improve outcomes for sales teams, while reducing the number of hours spent on research and outreach.
We are a big data company and perform deep analysis on technology buying patterns and buyer pain points to understand where buyers are in their journey. Over 100 billion data points are analyzed every week to derive recommendations on where companies should focus their marketing and sales efforts. Third-party intent signals are then clubbed with first-party data from CRMs to derive meaningful recommendations on whom to target on any given day.
6sense is headquartered in San Francisco, CA and has 8 office locations across 4 countries.
6sense, an account engagement platform, secured $200 million in a Series E funding round, bringing its total valuation to $5.2 billion 10 months after its $125 million Series D round. The investment was co-led by Blue Owl and MSD Partners, among other new and existing investors.
LinkedIn (Slintel): https://www.linkedin.com/company/slintel/
Industry : Software Development
Company size : 51-200 employees (189 on LinkedIn)
Headquarters : Mountain View, California
Founded : 2016
Specialties : Technographics, lead intelligence, Sales Intelligence, Company Data, and Lead Data.
Website (Slintel): https://www.slintel.com/slintel
LinkedIn (6sense): https://www.linkedin.com/company/6sense/
Industry : Software Development
Company size : 501-1,000 employees (937 on LinkedIn)
Headquarters : San Francisco, California
Founded : 2013
Specialties : Predictive intelligence, Predictive marketing, B2B marketing, and Predictive sales
Website (6sense): https://6sense.com/
Acquisition News :
https://inc42.com/buzz/us-based-based-6sense-acquires-b2b-buyer-intelligence-startup-slintel/
Funding Details & News :
Slintel funding: https://www.crunchbase.com/organization/slintel
6sense funding: https://www.crunchbase.com/organization/6sense
https://www.nasdaq.com/articles/ai-software-firm-6sense-valued-at-%245.2-bln-after-softbank-joins-funding-round
https://www.bloomberg.com/news/articles/2022-01-20/6sense-reaches-5-2-billion-value-with-softbank-joining-round
https://xipometer.com/en/company/6sense
Slintel & 6sense Customers :
https://www.featuredcustomers.com/vendor/slintel/customers
https://www.featuredcustomers.com/vendor/6sense/customers
About the job
Responsibilities
- Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse
- Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs
- Assemble large, complex data sets from third-party vendors to meet business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimising data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology (a simple PySpark ETL sketch appears at the end of this posting)
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems
Requirements
- 3+ years of experience in a Data Engineer role
- Proficiency in Linux
- Must have SQL knowledge and experience working with relational databases and query authoring (SQL), as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena
- Must have experience with Python/ Scala
- Must have experience with Big Data technologies like Apache Spark
- Must have experience with Apache Airflow
- Experience with data pipeline and ETL tools like AWS Glue
- Experience working with AWS cloud services (EC2, S3, RDS, Redshift) and other data solutions, e.g. Databricks, Snowflake
Desired Skills and Experience
Python, SQL, Scala, Spark, ETL
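As a loose, illustrative PySpark ETL sketch in the spirit of the stack above; the bucket names, keys, and columns are assumptions rather than details from the posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: hypothetical raw CSV landed in S3 by an upstream system.
raw = (spark.read
       .option("header", "true")
       .csv("s3a://raw-bucket/accounts/*.csv"))

# Transform: deduplicate on a key and stamp the load time.
cleaned = (raw
           .dropDuplicates(["account_id"])
           .withColumn("loaded_at", F.current_timestamp()))

# Load: write Parquet for downstream warehouses (e.g. Redshift Spectrum
# or Athena could query this location directly).
(cleaned.write
        .mode("overwrite")
        .parquet("s3a://curated-bucket/accounts/"))
```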
• Java 1.8+ or OpenJDK 11+
• Core Spring Framework
• Spring Boot
• Spring REST services
• Any RDBMS
• Gradle or Maven
• JPA / Hibernate
• Git / Bitbucket
• Apache Spark
What you'll do:
Design and development of scalable applications.
Collaborate with tech leads to get maximum understanding of underlying infrastructure.
Contribute to continual improvement by suggesting improvements to the software system.
Ensure high scalability and performance
You will advocate for good, clean, well documented and performing code; follow standards and best practices.
We'd love for you to have:
Education: Bachelor/Master Degree in Computer Science
Experience: 1-3 years of relevant experience in BI/Big-Data with hands-on coding experience
Mandatory Skills
Strong in problem-solving
Good exposure to Big Data technologies, Hive, Hadoop, Impala, Hbase, Kafka, Spark
Strong experience of Data Engineering
Able to comprehend challenges related to database and data warehousing technologies, and able to understand complex designs and system architecture
Experience with the software development lifecycle, design, develop, review, debug, document, and deliver (especially in a multi-location organization)
Working knowledge of Java, python
Desired Skills
Experience with reporting tools like Tableau, QlikView
Awareness of CI-CD pipeline
Inclination to work on cloud platform ex:- AWS
Crisp communication skills with team members and business owners.
Be able to work in a challenging, dynamic environment and meet tight deadlines
Job Description: Data Scientist
At Propellor.ai, we derive insights that allow our clients to make scientific decisions. We believe in demanding more from the fields of Mathematics, Computer Science, and Business Logic. Combine these and we show our clients a 360-degree view of their business. In this role, the Data Scientist will be expected to work on Procurement problems along with a team-based across the globe.
We are a Remote-First Company.
Read more about us here: https://www.propellor.ai/consulting" target="_blank">https://www.propellor.ai/consulting
What will help you be successful in this role
- Articulate
- High Energy
- Passion to learn
- High sense of ownership
- Ability to work in a fast-paced and deadline-driven environment
- Loves technology
- Highly skilled at Data Interpretation
- Problem solver
- Ability to narrate the story to the business stakeholders
- Generate insights and the ability to turn them into actions and decisions
Skills to work in a challenging, complex project environment
- Need you to be naturally curious and have a passion for understanding consumer behavior
- A high level of motivation, passion, and high sense of ownership
- Excellent communication skills needed to manage an incredibly diverse slate of work, clients, and team personalities
- Flexibility to work on multiple projects and deadline-driven fast-paced environment
- Ability to work in ambiguity and manage the chaos
Key Responsibilities
- Analyze data to unlock insights: Ability to identify relevant insights and actions from data. Use regression, cluster analysis, time series, etc. to explore relationships and trends in response to stakeholder questions and business challenges.
- Bring in experience for AI and ML: Bring in Industry experience and apply the same to build efficient and optimal Machine Learning solutions.
- Exploratory Data Analysis (EDA) and Generate Insights: Analyse internal and external datasets using analytical techniques, tools, and visualization methods. Ensure pre-processing/cleansing of data and evaluate data points across the enterprise landscape and/or external data points that can be leveraged in machine learning models to generate insights.
- DS and ML Model Identification and Training: Identify, test, and train machine learning models that need to be leveraged for business use cases. Evaluate models based on interpretability, performance, and accuracy as required. Experiment with and identify features from datasets that will help influence model outputs. Determine which models will need to be deployed and which data points need to be fed into them, and aid in the deployment and maintenance of models.
Technical Skills
An enthusiastic individual with the following skills. Please do not hesitate to apply if you do not match all of them. We are open to promising candidates who are passionate about their work, fast learners and are team players.
- Strong experience with machine learning and AI including regression, forecasting, time series, cluster analysis, classification, Image recognition, NLP, Text Analytics and Computer Vision.
- Strong experience with advanced analytics tools and object-oriented/functional scripting using languages such as Python or similar.
- Strong experience with popular database programming languages including SQL.
- Strong experience in Spark/PySpark
- Experience working in Databricks (a brief modelling sketch follows this list)
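As a rough illustration of the modelling work described above, here is a minimal PySpark MLlib sketch that clusters a toy dataset with k-means; the column names and values are hypothetical and only show the shape of the workflow.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("segmentation-sketch").getOrCreate()

# Hypothetical customer metrics; in practice these would come from a governed table.
df = spark.createDataFrame(
    [(1, 120.0, 4), (2, 350.5, 12), (3, 80.0, 2), (4, 410.0, 15)],
    ["customer_id", "monthly_spend", "orders"],
)

# Assemble the raw columns into the single feature vector MLlib expects.
assembler = VectorAssembler(inputCols=["monthly_spend", "orders"], outputCol="features")
assembled = assembler.transform(df)

# Fit a two-segment k-means model and inspect the cluster assignments.
model = KMeans(k=2, seed=42, featuresCol="features").fit(assembled)
model.transform(assembled).select("customer_id", "prediction").show()
```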
What are the company benefits you get when you join us?
- Permanent Work from Home Opportunity
- Opportunity to work with Business Decision Makers and an internationally based team
- A work environment that offers limitless learning
- A culture free of bureaucracy and hierarchy
- A culture of being open, direct, and with mutual respect
- A fun, high-caliber team that trusts you and provides the support and mentorship to help you grow
- The opportunity to work on high-impact business problems that are already defining the future of Marketing and improving real lives
To know more about how we work: https://bit.ly/3Oy6WlE
Whom will you work with?
You will closely work with other Senior Data Scientists and Data Engineers.
Immediate to 15-day Joiners will be preferred.
We are looking for an exceptionally talented Lead Data Engineer who has exposure to implementing AWS services to build data pipelines, API integration and designing data warehouses. A candidate with both hands-on and leadership capabilities will be ideal for this position.
Qualification: At least a bachelor's degree in Science, Engineering, or Applied Mathematics; a Master's degree is preferred.
Job Responsibilities:
• Total 6+ years of experience as a Data Engineer and 2+ years of experience in managing a team
• Minimum 3 years of AWS Cloud experience.
• Well versed in languages such as Python, PySpark, SQL, NodeJS, etc.
• Extensive experience in the Spark ecosystem; has worked on both real-time and batch processing
• Experience with AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step Functions, Airflow, RDS, Aurora, etc.
• Experience with modern database systems such as Redshift, Presto, Hive, etc.
• Worked on building data lakes in the past on S3 or Apache Hudi
• Solid understanding of Data Warehousing Concepts
• Good to have experience on tools such as Kafka or Kinesis
• Good to have AWS Developer Associate or Solutions Architect Associate Certification
• Have experience in managing a team
● Create and maintain optimal data pipeline architecture.
● Assemble large, complex data sets that meet functional / non-functional business requirements.
● Building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Maintain, organize & automate data processes for various use cases.
● Identifying trends, doing follow-up analysis, preparing visualizations.
● Creating daily, weekly and monthly reports of product KPIs.
● Create informative, actionable and repeatable reporting that highlights relevant business trends and opportunities for improvement.
Required Skills And Experience:
● 2-5 years of work experience in data analytics, including analyzing large data sets.
● BTech in Mathematics/Computer Science
● Strong analytical, quantitative and data interpretation skills.
● Hands-on experience with Python, Apache Spark, Hadoop, NoSQL databases (MongoDB preferred), and Linux is a must.
● Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Experience with Google Cloud Data Analytics Products such as BigQuery, Dataflow, Dataproc etc. (or similar cloud-based platforms).
● Experience working within a Linux computing environment and use of command-line tools, including knowledge of shell/Python scripting for automating common tasks.
● Previous experience working at startups and/or in fast-paced environments.
● Previous experience as a data engineer or in a similar role.
Requirements:
- Overall 3 to 5 years of experience in designing and implementing complex, large-scale software.
- Strong proficiency in Python is a must.
- Experience in Apache Spark, Scala, Java and Delta Lake
- Experience in designing and implementing templated ETL/ELT data pipelines
- Expert-level experience in data pipeline orchestration using Apache Airflow for large-scale production deployment (see the DAG sketch below)
- Experience in visualizing data from various tasks in the data pipeline using Apache Zeppelin/Plotly or any other visualization library.
- Log management and log monitoring using ELK/Grafana
- GitHub integration
Technology Stack: Apache Spark, Apache Airflow, Python, AWS, EC2, S3, Kubernetes, ELK, Grafana, Apache Arrow, Java
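For orientation only, the sketch below shows a minimal, hypothetical Airflow DAG of the kind of orchestration described above; the task names, schedule, and spark-submit command are illustrative assumptions, not an actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily pipeline: ingest raw data, transform it with Spark, then publish.
with DAG(
    dag_id="example_etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw_data",
        bash_command="python ingest.py --date {{ ds }}",  # illustrative script
    )
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit --master yarn transform_job.py --date {{ ds }}",
    )
    publish = BashOperator(
        task_id="publish_reports",
        bash_command="python publish.py --date {{ ds }}",  # illustrative script
    )

    # Linear dependency chain: ingest -> transform -> publish.
    ingest >> transform >> publish
```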
Location: Chennai
Education: BE/BTech
Experience: Minimum 3+ years of experience as a Data Scientist/Data Engineer
Domain knowledge: Data cleaning, modelling, analytics, statistics, machine learning, AI
Requirements:
- To be part of Digital Manufacturing and Industrie 4.0 projects across client group of companies
- Design and develop AI/ML models to be deployed across factories
- Knowledge on Hadoop, Apache Spark, MapReduce, Scala, Python programming, SQL and NoSQL databases is required
- Should be strong in statistics, data analysis, data modelling, machine learning techniques and Neural Networks
- Prior experience in developing AI and ML models is required
- Experience with data from the Manufacturing Industry would be a plus
Roles and Responsibilities:
- Develop AI and ML models for the Manufacturing Industry with a focus on Energy, Asset Performance Optimization and Logistics
- Multitasking and good communication skills are necessary
- Entrepreneurial attitude
Additional Information:
- Travel: Must be willing to travel on shorter duration within India and abroad
- Job Location: Chennai
- Reporting to: Team Leader, Energy Management System
Collaborate with the CIO on application architecture and design of our ETL (Extract, Transform, Load) and other aspects of data pipelines. Our stack is built on top of the well-known Spark ecosystem (e.g. Scala, Python, etc.)
Periodically evaluate the architectural landscape for efficiencies in our data pipelines, and define current-state and target-state architecture along with transition plans and roadmaps to achieve the desired architectural state
Conduct/lead and implement proofs of concept to prove new technologies in support of the architecture vision and guiding principles (e.g. Flink)
Assist in the ideation and execution of architectural principles, guidelines and technology standards that can be leveraged across the team and organization, especially around ETL & data pipelines
Promote consistency between all applications leveraging enterprise automation capabilities
Provide architectural consultation, support, mentoring, and guidance to project teams, e.g. architects, data scientists, developers, etc.
Collaborate with the DevOps Lead on technical features
Define and manage work items using Agile methodologies (Kanban, Azure Boards, etc.)
Lead Data Engineering efforts (e.g. Scala Spark, PySpark, etc.)
Knowledge & Experience
Experienced with Spark, Delta Lake, and Scala to work with petabytes of data (batch and streaming flows)
Knowledge of a wide variety of open source technologies including but not limited to: NiFi, Kubernetes, Docker, Hive, Oozie, YARN, Zookeeper, PostgreSQL, RabbitMQ, Elasticsearch
A strong understanding of AWS/Azure and/or technology as a service (IaaS, SaaS, PaaS)
Strong verbal and written communication skills are a must, as well as the ability to work effectively across internal and external organizations and virtual teams
Appreciation of building high-volume, low-latency systems for the API flow
Core dev skills (SOLID principles, IoC, 12-factor app, CI/CD, Git)
Messaging, microservice architecture, caching (Redis), containerization, performance and load testing, REST APIs
Knowledge of HTML, JavaScript frameworks (preferably Angular 2+), TypeScript
Appreciation of Python and C# .NET Core or Java
Appreciation of global data privacy requirements and cryptography
Experience in system testing and experience with automated testing, e.g. unit tests, integration tests, mocking/stubbing
Relevant industry and other professional qualifications
Tertiary qualifications (degree level)
We are an inclusive employer and welcome applicants from all backgrounds. We pride ourselves on our commitment to Equality and Diversity and are committed to removing barriers throughout our hiring process.
Key Requirements
Extensive data engineering development experience (e.g., ETL), using well-known stacks (e.g., Scala Spark)
Experience in technical leadership positions (or looking to gain experience)
Background in software engineering
The ability to write technical documentation
Solid understanding of virtualization and/or cloud computing technologies (e.g., Docker, Kubernetes)
Experience in designing software solutions and enjoys UML and the odd sequence diagram
Experience operating within an Agile environment
Ability to work independently and with minimum supervision
Strong project development management skills, with the ability to successfully manage and prioritize numerous time-pressured analytical projects/work tasks simultaneously
Able to pivot quickly and make rapid decisions based on changing needs in a fast-paced environment
Works constructively with teams and acts with high integrity
Passionate team player with an inquisitive, creative mindset and ability to think outside the box.
- Bachelor's or Master's degree in Computer Engineering, Computer Science, Computer Applications, Mathematics, Statistics, or a related technical field. Relevant experience of at least 3 years in lieu of the above if from a different stream of education.
- Well-versed in and 3+ years of hands-on demonstrable experience with:
▪ Stream & batch Big Data pipeline processing using Apache Spark and/or Apache Flink
▪ Distributed cloud-native computing, including serverless functions
▪ Relational, object store, document, graph, etc. database design & implementation
▪ Microservices architecture, API modeling, design, & programming
- 3+ years of hands-on development experience in Apache Spark using Scala and/or Java.
- Ability to write executable code for services using Spark RDD, Spark SQL, Structured Streaming, Spark MLlib, etc., with a deep technical understanding of the Spark processing framework (a brief sketch follows this list).
- In-depth knowledge of standard programming languages such as Scala and/or Java.
- 3+ years of hands-on development experience in one or more libraries & frameworks such as Apache Kafka, Akka, Apache Storm, Apache NiFi, Zookeeper, the Hadoop ecosystem (i.e., HDFS, YARN, MapReduce, Oozie & Hive), etc.; extra points if you can demonstrate your knowledge with working examples.
- 3+ years of hands-on development experience in one or more relational and NoSQL datastores such as PostgreSQL, Cassandra, HBase, MongoDB, DynamoDB, Elasticsearch, Neo4j, etc.
- Practical knowledge of distributed systems involving partitioning, bucketing, the CAP theorem, replication, horizontal scaling, etc.
- Passion for distilling large volumes of data and analyzing performance, scalability, and capacity issues in Big Data platforms.
- Ability to clearly distinguish system and Spark job performance, and to perform Spark performance tuning and resource optimization.
- Perform benchmarking/stress tests and document best practices for different applications.
- Proactively work with tenants on improving overall performance and ensure the system is resilient and scalable.
- Good understanding of virtualization & containerization; must demonstrate experience in technologies such as Kubernetes, Istio, Docker, OpenShift, Anthos, Oracle VirtualBox, Vagrant, etc.
- Well-versed with demonstrable working experience in API management, API gateways, service mesh, identity & access management, and data protection & encryption.
- Hands-on, demonstrable working experience with DevOps tools and platforms viz., Jira, Git, Jenkins, code quality & security plugins, Maven, Artifactory, Terraform, Ansible/Chef/Puppet, Spinnaker, etc.
- Well-versed in AWS, Azure and/or Google Cloud; must demonstrate experience in at least FIVE (5) services offered under AWS, Azure and/or Google Cloud in any of these categories: Compute or Storage, Database, Networking & Content Delivery, Management & Governance, Analytics, Security, Identity, & Compliance (or equivalent demonstrable cloud platform experience).
- Good understanding of storage, networks, and storage networking basics, which will enable you to work in a cloud environment.
- Good understanding of network, data, and application security basics, which will enable you to work in a cloud as well as business applications / API services environment.
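As a small, hypothetical illustration of the Spark SQL and tuning skills listed above, the sketch below registers a DataFrame as a SQL view, runs an aggregation, and sets two common tuning knobs; the configuration values are illustrative, not recommendations.

```python
from pyspark.sql import SparkSession

# Illustrative tuning knobs; real values depend on cluster size and data volume.
spark = (
    SparkSession.builder.appName("spark-sql-tuning-sketch")
    .config("spark.sql.shuffle.partitions", "64")   # fewer shuffle partitions for a small job
    .config("spark.sql.adaptive.enabled", "true")   # let AQE coalesce partitions at runtime
    .getOrCreate()
)

# Hypothetical sales data; in practice this would be read from Parquet/Delta.
sales = spark.createDataFrame(
    [("IN", 120.0), ("IN", 80.0), ("US", 200.0), ("US", 50.0)],
    ["country", "amount"],
)

# Expose the DataFrame to Spark SQL and run an aggregation.
sales.createOrReplaceTempView("sales")
totals = spark.sql("SELECT country, SUM(amount) AS total FROM sales GROUP BY country")

# Inspect the physical plan as a first step in performance analysis, then show results.
totals.explain()
totals.show()
```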
American Multinational Retail Corp
Should have a passion to learn and adapt to new technologies, an understanding of solving/troubleshooting issues and risks, the ability to make informed decisions, and the ability to lead projects.
Your Qualifications
- 2-5 Years’ Experience with functional programming
- Experience with functional programming using Scala with the Spark framework.
- Strong understanding of object-oriented programming, data structures and algorithms
- Good experience with any of the cloud platforms (Azure, AWS, GCP), etc.
- Experience with distributed (multi-tiered) systems, relational databases and NoSQL storage solutions
- Desire to learn new technologies and languages
- Participation in software design, development, and code reviews
- High level of proficiency with Computer Science/Software Engineering knowledge and contribution to the technical skills growth of other team members
Your Responsibility
- Design, build and configure applications to meet business process and application requirements
- Proactively identify and communicate potential issues and concerns and recommend/implement alternative solutions as appropriate.
- Troubleshooting & Optimization of existing solution
- Provide advice on technical design to ensure solutions are forward-looking and flexible for potential future requirements and business needs.
Leading StartUp Focused On Employee Growth
● Proficiency in Linux.
● Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
● Must have SQL knowledge and experience working with relational databases, query authoring (SQL) as well as familiarity with databases including MySQL, Mongo, Cassandra, and Athena.
● Must have experience with Python/Scala.
● Must have experience with Big Data technologies like Apache Spark.
● Must have experience with Apache Airflow.
● Experience with data pipelines and ETL tools like AWS Glue.
AI-powered cloud-based SaaS solution provider
● Able to contribute to the gathering of functional requirements, developing technical specifications, and test case planning
● Demonstrating technical expertise, and solving challenging programming and design problems
● 60% hands-on coding with architecture ownership of one or more products
● Ability to articulate architectural and design options, and educate development teams and business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release patches
● Mentor and guide team members
● Work cross-functionally with various Bidgely teams including product management, QA/QE, various product lines, and/or business units to drive forward results
Requirements
● BS/MS in computer science or equivalent work experience
● 8-12 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with Big Data ecosystems.
● Past experience with Hadoop, HDFS, MapReduce, YARN, AWS Cloud, EMR, S3, Spark, Cassandra, Kafka, Zookeeper
● Expertise with any of the following object-oriented languages (OOD): Java/J2EE, Scala, Python
● Ability to lead and mentor technical team members
● Expertise with the entire Software Development Life Cycle (SDLC)
● Excellent communication skills: Demonstrated ability to explain complex technical issues to both technical and non-technical audiences
● Expertise in the software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Business acumen - strategic thinking & strategy development
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of concept (POCs) for various Data Engineering requirements.
● Experience with Agile Development, SCRUM, or Extreme Programming methodologies
world’s fastest growing consumer internet company
Data Engineer JD:
- Designing, developing, constructing, installing, testing and maintaining the complete data management & processing systems.
- Building highly scalable, robust, fault-tolerant, & secure user data platform adhering to data protection laws.
- Taking care of the complete ETL (Extract, Transform & Load) process.
- Ensuring architecture is planned in such a way that it meets all the business requirements.
- Exploring new ways of using existing data, to provide more insights out of it.
- Proposing ways to improve data quality, reliability & efficiency of the whole system.
- Creating data models to reduce system complexity and hence increase efficiency & reduce cost.
- Introducing new data management tools & technologies into the existing system to make it more efficient.
- Setting up monitoring and alarming on data pipeline jobs to detect failures and anomalies
What do we expect from you?
- BS/MS in Computer Science or equivalent experience
- 5 years of recent experience in Big Data Engineering.
- Good experience in working with Hadoop and Big Data technologies like HDFS, Pig, Hive, Zookeeper, Storm, Spark, Airflow and NoSQL systems
- Excellent programming and debugging skills in Java or Python.
- Apache Spark, Python; hands-on experience in deploying ML models
- Has worked on streaming and real-time pipelines (see the sketch below)
- Experience with Apache Kafka or has worked with any of Spark Streaming, Flume or Storm
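To illustrate the streaming experience mentioned above, here is a minimal, hypothetical Spark Structured Streaming sketch that consumes JSON events from Kafka and maintains a running count per event type; the topic, brokers, and schema are illustrative assumptions (and the spark-sql-kafka package must be on the classpath).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("streaming-count-sketch").getOrCreate()

# Hypothetical event schema; a real job would match the producer's contract.
schema = StructType([
    StructField("event_type", StringType()),
    StructField("user_id", StringType()),
])

# Read a Kafka topic as an unbounded DataFrame and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # illustrative brokers
    .option("subscribe", "events")                         # illustrative topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Maintain a running count per event type and print each micro-batch update.
counts = events.groupBy("event_type").count()
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```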
Focus Area:
R1 | Data Structures & Algorithms
R2 | Problem Solving + Coding
R3 | Design (LLD)
- Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse.
- Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs.
- Assemble large, complex data sets from third-party vendors to meet business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology.
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems.
Requirements
- 5+ years of experience in a Data Engineer role.
- Proficiency in Linux.
- Must have SQL knowledge and experience working with relational databases, query authoring (SQL) as well as familiarity with databases including MySQL, Mongo, Cassandra, and Athena.
- Must have experience with Python/Scala.
- Must have experience with Big Data technologies like Apache Spark.
- Must have experience with Apache Airflow.
- Experience with data pipeline and ETL tools like AWS Glue.
- Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
at Simplifai Cognitive Solutions Pvt Ltd
We are looking for a skilled Senior/Lead Big Data Engineer to join our team. The role is part of the research and development team, where, with enthusiasm and knowledge, you will be our technical evangelist for the development of our inspection technology and products.
At Elop we are developing product lines for sustainable infrastructure management using our own patented technology for ultrasound scanners, and combine this with other sources to get a holistic overview of the concrete structure. At Elop we will provide you with world-class colleagues highly motivated to position the company as an international standard of structural health monitoring. With the right character you will be professionally challenged and developed.
This position requires travel to Norway.
Elop is a sister company of Simplifai and the two are co-located in all geographic locations.
Roles and Responsibilities
- Define technical scope and objectives through research and participation in requirements gathering and definition of processes
- Ingest and Process data from data sources (Elop Scanner) in raw format into Big Data ecosystem
- Real-time data feed processing using the Big Data ecosystem
- Design, review, implement and optimize data transformation processes in Big Data ecosystem
- Test and prototype new data integration/processing tools, techniques and methodologies
- Conversion of MATLAB code into Python/C/C++.
- Participate in overall test planning for the application integrations, functional areas and projects.
- Work with cross functional teams in an Agile/Scrum environment to ensure a quality product is delivered.
Desired Candidate Profile
- Bachelor's degree in Statistics, Computer Science, or equivalent
- 7+ years of experience in Big Data ecosystem, especially Spark, Kafka, Hadoop, HBase.
- 7+ years of hands-on experience in Python/Scala is a must.
- Experience in architecting the big data application is needed.
- Excellent analytical and problem-solving skills
- Strong understanding of data analytics and data visualization, and must be able to help the development team with visualization of data.
- Experience with signal processing is a plus.
- Experience in working on client-server architecture is a plus.
- Knowledge about database technologies like RDBMS, Graph DB, Document DB, Apache Cassandra, OpenTSDB
- Good communication skills, written and oral, in English
We can Offer
- An everyday life with exciting and challenging tasks, developing socially beneficial solutions
- Be a part of the company's Research and Development team to create unique and innovative products
- Colleagues with world-class expertise, and an organization that has ambitions and is highly motivated to position the company as an international player in maintenance support and monitoring of critical infrastructure!
- Good working environment with skilled and committed colleagues, and an organization with short decision paths.
- Professional challenges and development
cloud Transformation products, frameworks and services
- Experience with cloud-native data tools/services such as AWS Athena, AWS Glue, Redshift Spectrum, AWS EMR, AWS Aurora, BigQuery, Bigtable, S3, etc.
- Strong programming skills in at least one of the following languages: Java, Scala, C++.
- Familiarity with a scripting language like Python as well as Unix/Linux shells.
- Comfortable with multiple AWS components including RDS, AWS Lambda, AWS Glue, AWS Athena, EMR. Equivalent tools in the GCP stack will also suffice.
- Strong analytical skills and advanced SQL knowledge, indexing, query optimization techniques.
- Experience implementing software around data processing, metadata management, and ETL pipeline tools like Airflow.
Experience with the following software/tools is highly desired:
- Apache Spark, Kafka, Hive, etc.
- SQL and NoSQL databases like MySQL, Postgres, DynamoDB.
- Workflow management tools like Airflow.
- AWS cloud services: RDS, AWS Lambda, AWS Glue, AWS Athena, EMR.
- Familiarity with Spark programming paradigms (batch and stream-processing).
- RESTful API services.
upGrad is an online education platform building the careers of tomorrow by offering the most industry-relevant programs in an immersive learning experience. Our mission is to create a new digital-first learning experience to deliver tangible career impact to individuals at scale. upGrad currently offers programs in Data Science, Machine Learning, Product Management, Digital Marketing, and Entrepreneurship, etc. upGrad is looking for people passionate about management and education to help design learning programs for working professionals to stay sharp and stay relevant and help build the careers of tomorrow.
- upGrad was awarded the Best Tech for Education by IAMAI for 2018-19,
- upGrad was also ranked as one of the LinkedIn Top Startups 2018: The 25 most sought-after startups in India.
- upGrad was earlier selected as one of the top ten most innovative companies in India by FastCompany.
- We were also covered by the Financial Times along with other disruptors in Ed-Tech.
- upGrad is the official education partner for Government of India - Startup India program.
- Our program with IIIT B has been ranked #1 program in the country in the domain of Artificial Intelligence and Machine Learning.
About the Role
A highly motivated individual who has experience in architecting end-to-end web-based ecommerce/online/SaaS products and systems, bringing them to production quickly and with high quality. Able to understand expected business results and map architecture to drive business forward. Passionate about building world-class solutions.
Role and Responsibilities
- Work with Product Managers and Business to understand business/product requirements and vision.
- Provide a clear architectural vision in line with business and product vision.
- Lead a team of architects, developers, and data engineers to provide platform services to other engineering teams.
- Provide architectural oversight to engineering teams across the organization.
- Hands on design and development of platform services and features owned by self - this is a hands-on coding role.
- Define guidelines for best practices covering design, unit testing, secure coding etc.
- Ensure quality by reviewing design, code, test plans, load test plans etc. as appropriate.
- Work closely with the QA and Support teams to track quality and proactively identify improvement opportunities.
- Work closely with DevOps and IT to ensure highly secure and cost optimized operations in the cloud.
- Grow technical skills in the team - identify skill gaps with plans to address them, participate in hiring, mentor other architects and engineers.
- Support other engineers in resolving complex technical issues as a go-to person.
Skills/Experience
- 12+ years of experience in design and development of ecommerce scale systems and highly scalable SaaS or enterprise products.
- Extensive experience in developing extensible and scalable web applications with
- Java, Spring Boot, Go
- Web Services - REST, OAuth, OData
- Database/Caching - MySQL, Cassandra, MongoDB, Memcached/Redis
- Queue/Broker services - RabbitMQ/Kafka
- Microservices architecture via Docker on AWS or Azure.
- Experience with web front end technologies - HTML5, CSS3, JavaScript libraries and frameworks such as jQuery, AngularJS, React, Vue.js, Bootstrap etc.
- Extensive experience with cloud based architectures and how to optimize design for cost.
- Expert level understanding of secure application design practices and a working understanding of cloud infrastructure security.
- Experience with CI/CD processes and design for testability.
- Experience working with big data technologies such as Spark/Storm/Hadoop/Data Lake Architectures is a big plus.
- Action and result-oriented problem-solver who works well both independently and as part of a team; able to foster and develop others' ideas as well as his/her own.
- Ability to organize, prioritize and schedule a high workload and multiple parallel projects efficiently.
- Excellent verbal and written communication with stakeholders in a matrixed environment.
- Long term experience with at least one product from inception to completion and evolution of the product over multiple years.
B.Tech/MCA (IT/Computer Science) from a premier institution (IIT/NIT/BITS) and/or a US Master's degree in Computer Science.
at Happymonk AI labs
Work across the full stack, building highly scalable distributed solutions that enable positive user experiences and measurable business growth
Develop new features and infrastructure in support of rapidly emerging business and project requirements
Assume leadership of new projects from conceptualization to deployment
Ensure application performance, uptime, and scale, maintaining high standards of code quality and thoughtful application design
Work with agile development methodologies, adhering to best practices and pursuing continued learning opportunities
Visualize, design, and develop creative and innovative software platforms, as we continue to experience dramatic growth in the usage and visibility of our products
Create scalable software platforms and applications, and efficient networking solutions that are unit tested, code reviewed and checked regularly for continuous integration
Examine existing systems, identifying flaws and creating solutions to improve service uptime and time-to-resolve through monitoring and automated remediation
Plan and execute full software development life cycles (SDLC) for each assigned project, adhering to company standards and expectations
Special Skills Required
Bachelor's degree in software engineering or information technology
2+ years of experience engineering software and networking platforms
2+ years of experience in building large-scale software applications
Proven ability to document design processes, including development, tests, analytics, and troubleshooting
Experience with rapid development cycles in a web-based environment
Strong scripting and test automation abilities, and the ability to drive a Test-Driven Development model
Working knowledge of relational databases as well as ORMs, PostgreSQL, and other SQL technologies
Proficiency with JavaScript, TypeScript, React.js, Babylon, Node.js, HTML5, CSS3, and order management systems
Proven experience designing interactive applications and networking platforms
Web application development experience with multiple frameworks and technologies, including Blockchain, Hyperledger, Spark, Kafka, Elasticsearch, Neo4j, GraphQL
Desire to continue to grow professional capabilities with ongoing training and educational opportunities
Additional knowledge of computer vision, embedded, and blockchain technologies is a plus
Experience designing and integrating RESTful APIs
Excellent debugging and optimization skills
Unit/integration testing experience
Interest in learning new tools, languages, workflows, and philosophies to grow
Professional certifications
Location
Bengaluru, Karnataka, India - 560072
- Performs analytics to extract insights from the organization's raw historical data.
- Generates usable training datasets for any/all MV projects with the help of Annotators, if needed.
- Analyses user trends, and identifies their biggest bottlenecks in the Hammoq Workflow.
- Tests the short/long-term impact of productized MV models on those trends.
- Skills - NumPy, Pandas, Apache Spark, PySpark, ETL (mandatory).
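Since the skill list above mixes Pandas and PySpark, here is a minimal, hypothetical sketch of moving between the two for an ETL-style transform; the column names and threshold are illustrative.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("pandas-spark-sketch").getOrCreate()

# Small, hypothetical listing data prepared in pandas.
pdf = pd.DataFrame({"listing_id": [1, 2, 3], "price": [19.99, 5.0, 42.5]})

# Promote it to a distributed Spark DataFrame for larger-scale ETL.
sdf = spark.createDataFrame(pdf)

# A simple transform: keep listings above a price threshold.
filtered = sdf.filter(col("price") > 10.0)

# Collect the small result back into pandas for quick analysis or plotting.
print(filtered.toPandas())
```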
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.
Job Description
Experience : 7+ years
Location : Pune / Hyderabad
Skills :
- Drive and participate in requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Participate and contribute in Solution Design and Solution Architecture for implementing Big Data Projects on-premise and on cloud
- Hands-on technical experience in the design, coding, development and management of large Hadoop implementations
- Proficient in SQL, Hive, Pig, Spark SQL, Shell Scripting, Kafka, Flume, Sqoop on large Big Data and Data Warehousing projects, with a Java, Python or Scala based Hadoop programming background
- Proficient with various development methodologies like waterfall, agile/scrum and iterative
- Good Interpersonal skills and excellent communication skills for US and UK based clients
About Us!
A global Leader in the Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their Data/Workload/ETL/Analytics to the Cloud by leveraging Automation.
We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, Greenplum along with ETLs like Informatica, Datastage, AbInitio & others, to cloud-based data warehousing with other capabilities in data engineering, advanced analytics solutions, data management, data lake and cloud optimization.
Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.
We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican - Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.
Why join us!
Datametica is a place to innovate, bring new ideas to life and learn new things. We believe in building a culture of innovation, growth and belonging. Our people and their dedication over these years are the key factors in achieving our success.
Benefits we Provide!
Working with Highly Technical and Passionate, mission-driven people
Subsidized Meals & Snacks
Flexible Schedule
Approachable leadership
Access to various learning tools and programs
Pet Friendly
Certification Reimbursement Policy
Check out more about us on our website below!
www.datametica.com
• Create and maintain data pipeline
• Build and deploy ETL infrastructure for optimal data delivery
• Work with various teams including product, design and executive teams to troubleshoot data related issues
• Create tools for data analysts and scientists to help them build and optimise the product
• Implement systems and processes for data access controls and guarantees
• Distill the knowledge from experts in the field outside the org and optimise internal data systems
Preferred qualifications/skills:
• 5+ years experience
• Strong analytical skills
Freight Commerce Solutions Pvt Ltd.
• Degree in Computer Science, Statistics, Informatics, Information Systems
• Strong project management and organisational skills
• Experience supporting and working with cross-functional teams in a dynamic environment
• SQL guru with hands on experience on various databases
• NoSQL databases like Cassandra, MongoDB
• Experience with Snowflake, Redshift
• Experience with tools like Airflow, Hevo
• Experience with Hadoop, Spark, Kafka, Flink
• Programming experience in Python, Java, Scala
- We are looking for an experienced data engineer to join our team.
- The preprocessing involves ETL tasks using PySpark and AWS Glue, staging data in Parquet format on S3, and querying it with Athena (a brief staging sketch follows this section).
To succeed in this data engineering position, you should care about well-documented, testable code and data integrity. We have devops who can help with AWS permissions.
We would like to build up a consistent data lake with staged, ready-to-use data, and to build up various scripts that will serve as blueprints for various additional data ingestion and transforms.
If you enjoy setting up something which many others will rely on, and have the relevant ETL expertise, we’d like to work with you.
Responsibilities
- Analyze and organize raw data
- Build data pipelines
- Prepare data for predictive modeling
- Explore ways to enhance data quality and reliability
- Potentially, collaborate with data scientists to support various experiments
Requirements
- Previous experience as a data engineer with the above technologies
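As a rough sketch of the staging pattern described above (PySpark writing partitioned Parquet to S3 for Athena to query), assuming hypothetical bucket names, columns, and schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("parquet-staging-sketch").getOrCreate()

# Hypothetical raw CSV drop location and curated lake location on S3.
raw_path = "s3://example-raw-bucket/orders/*.csv"
lake_path = "s3://example-lake-bucket/orders_parquet/"

# Read the raw files, normalise types, and derive a partition column.
orders = (
    spark.read.option("header", True).csv(raw_path)
    .withColumn("amount", col("amount").cast("double"))
    .withColumn("order_date", to_date(col("order_ts")))
)

# Stage as Parquet partitioned by date so Athena can prune partitions at query time.
orders.write.mode("overwrite").partitionBy("order_date").parquet(lake_path)
```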
Recko Inc. is looking for data engineers to join our kick-ass engineering team. We are looking for smart, dynamic individuals to connect all the pieces of the data ecosystem.
What are we looking for:
- 3+ years of development experience in at least one of MySQL, Oracle, PostgreSQL or MSSQL, and experience working with Big Data frameworks/platforms/data stores like Hadoop, HDFS, Spark, Oozie, Hue, EMR, Scala, Hive, Glue, Kerberos, etc.
- Strong experience setting up data warehouses, data modeling, data wrangling and dataflow architecture on the cloud
- 2+ years of experience with public cloud services such as AWS, Azure, or GCP and languages like Java/Python, etc.
- 2+ years of development experience in Amazon Redshift, Google BigQuery or Azure data warehouse platforms preferred
- Knowledge of statistical analysis tools like R, SAS, etc.
- Familiarity with any data visualization software
- A growth mindset, a passion for building things from the ground up, and, most importantly, you should be fun to work with
As a data engineer at Recko, you will:
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
About Recko:
Recko was founded in 2017 to organise the world’s transactional information and provide intelligent applications to finance and product teams to make sense of the vast amount of data available. With the proliferation of digital transactions over the past two decades, enterprises, banks and financial institutions are finding it difficult to keep track of the money flowing across their systems. With the Recko Platform, businesses can build, integrate and adapt innovative and complex financial use cases within the organization and across external payment ecosystems with agility, confidence and at scale. Today, customer-obsessed brands such as Deliveroo, Meesho, Grofers, Dunzo, Acommerce, etc. use Recko so their finance teams can optimize resources with automation and prioritize growth over repetitive and time-consuming tasks around day-to-day operations.
Recko is a Series A funded startup, backed by marquee investors like Vertex Ventures, Prime Venture Partners and Locus Ventures. Traditionally enterprise software is always built around functionality. We believe software is an extension of one’s capability, and it should be delightful and fun to use.
Working at Recko:
We believe that great companies are built by amazing people. At Recko, We are a group of young Engineers, Product Managers, Analysts and Business folks who are on a mission to bring consumer tech DNA to enterprise fintech applications. The current team at Recko is 60+ members strong with stellar experience across fintech, e-commerce, digital domains at companies like Flipkart, PhonePe, Ola Money, Belong, Razorpay, Grofers, Jio, Oracle etc. We are growing aggressively across verticals.
The Platform Data Science team works at the intersection of data science and engineering. Domain experts develop and advance platforms, including the data platforms, machine learning platform, other platforms for Forecasting, Experimentation, Anomaly Detection, Conversational AI, Underwriting of Risk, Portfolio Management, Fraud Detection & Prevention and many more. We also are the Data Science and Analytics partners for Product and provide Behavioural Science insights across Jupiter.
About the role:
We’re looking for strong Software Engineers that can combine EMR, Redshift, Hadoop, Spark, Kafka, Elastic Search, Tensorflow, Pytorch and other technologies to build the next generation Data Platform, ML Platform, Experimentation Platform. If this sounds interesting we’d love to hear from you!
This role will involve designing and developing software products that impact many areas of our business. The individual in this role will be responsible for helping define requirements, creating software designs, implementing code to these specifications, providing thorough unit and integration testing, and supporting products while deployed and used by our stakeholders.
Key Responsibilities:
Participate in, own, and influence the architecture and design of systems
Collaborate with other engineers, data scientists, product managers
Build intelligent systems that drive decisions
Build systems that enable us to perform experiments and iterate quickly
Build platforms that enable scientists to train, deploy and monitor models at scale
Build analytical systems that drive better decision making
Required Skills:
Programming experience with at least one modern language such as Java or Scala, including object-oriented design
Experience in contributing to the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems
Bachelor’s degree in Computer Science or related field
Computer Science fundamentals in object-oriented design
Computer Science fundamentals in data structures
Computer Science fundamentals in algorithm design, problem solving, and complexity analysis
Experience in databases, analytics, big data systems or business intelligence products:
Data lake, data warehouse, ETL, ML platform
Big data tech like: Hadoop, Apache Spark
SpringML is looking to hire a top-notch Senior Data Engineer who is passionate about working with data and using the latest distributed frameworks to process large datasets. As an Associate Data Engineer, your primary role will be to design and build data pipelines. You will be focused on helping client projects with data integration, data prep and implementing machine learning on datasets. In this role, you will work on some of the latest technologies, collaborate with partners on early wins, take a consultative approach with clients, interact daily with executive leadership, and help build a great company. Chosen team members will be part of the core team and play a critical role in scaling up our emerging practice.
RESPONSIBILITIES:
- Ability to work as a member of a team assigned to design and implement data integration solutions.
- Build Data pipelines using standard frameworks in Hadoop, Apache Beam and other open-source solutions.
- Learn quickly – ability to understand and rapidly comprehend new areas – functional and technical – and apply detailed and critical thinking to customer solutions.
- Propose design solutions and recommend best practices for large scale data analysis
SKILLS:
- B.Tech degree in Computer Science, Mathematics or other relevant fields.
- 4+ years of experience in ETL, Data Warehouse, Visualization and building data pipelines.
- Strong programming skills – experience and expertise in one of the following: Java, Python, Scala, C.
- Proficient in big data/distributed computing frameworks such as Apache Spark, Kafka, etc.
- Experience with Agile implementation methodologies
Understand various raw data input formats, build consumers on Kafka/ksqlDB for them, and ingest large amounts of raw data into Flink and Spark.
Conduct complex data analysis and report on results.
Build various aggregation streams for data and convert raw data into various logical processing streams (a brief sketch follows this list).
Build algorithms to integrate multiple sources of data and create a unified data model from all the sources.
Build a unified data model on both SQL and NoSQL databases to act as a data sink.
Communicate the designs effectively with the fullstack engineering team for development.
Explore machine learning models that can be fitted on top of the data pipelines.
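For illustration only (the mandatory stack here is Scala/Java), a minimal PySpark sketch of the aggregation-stream idea above: raw Kafka events rolled up into a windowed logical stream, with a hypothetical topic, schema, and window size.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("aggregation-stream-sketch").getOrCreate()

# Hypothetical raw reading schema from an upstream Kafka topic.
schema = StructType([
    StructField("source", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # illustrative brokers
    .option("subscribe", "raw-readings")                   # illustrative topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Roll raw readings up into 5-minute averages per source: one "aggregation stream".
aggregated = (
    raw.withWatermark("event_time", "10 minutes")
    .groupBy(window("event_time", "5 minutes"), "source")
    .avg("reading")
)

# Write the logical stream out; a real job would target the SQL/NoSQL data sink.
query = aggregated.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```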
Mandatory Qualifications Skills:
Deep knowledge of Scala and Java programming languages is mandatory
Strong background in streaming data frameworks (Apache Flink, Apache Spark) is mandatory
Good understanding and hands on skills on streaming messaging platforms such as Kafka
Familiarity with R, C and Python is an asset
Analytical mind and business acumen with strong math skills (e.g. statistics, algebra)
Problem-solving aptitude
Excellent communication and presentation skills
Series 'A' funded Silicon Valley based BI startup
It helps companies uncover the 3% of active buyers in their target market. It evaluates over 100 billion data points and analyzes factors such as buyer journeys, technology adoption patterns, and other digital footprints to deliver market & sales intelligence. Its customers have access to the buying patterns and contact information of more than 17 million companies and 70 million decision makers across the world.
Role – Data Engineer
Responsibilities
Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse.
Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs.
Assemble large, complex data sets from third-party vendors to meet business requirements.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology.
Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems.
Requirements
5+ years of experience in a Data Engineer role.
Proficiency in Linux.
Must have SQL knowledge and experience working with relational databases, query authoring (SQL) as well as familiarity with databases including MySQL, Mongo, Cassandra, and Athena.
Must have experience with Python/Scala.
Must have experience with Big Data technologies like Apache Spark.
Must have experience with Apache Airflow.
Experience with data pipeline and ETL tools like AWS Glue.
Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
The Data Engineer would be responsible for selecting and integrating the Big Data tools and frameworks required, and would implement Data Ingestion & ETL/ELT processes.
Required Experience, Skills and Qualifications:
- Hands-on experience with Big Data tools/technologies like Spark, Databricks, MapReduce, Hive, HDFS.
- Expertise and excellent understanding of big data toolsets such as Sqoop, Spark Streaming, Kafka, NiFi
- Proficiency in any of the programming languages Python/Scala/Java, with 4+ years' experience
- Experience with Cloud infrastructures like MS Azure, Data Lake, etc.
- Good working knowledge of NoSQL DBs (Mongo, HBase, Cassandra)
Proficiency in Linux.
Must have SQL knowledge and experience working with relational databases, query authoring (SQL) as well as familiarity with databases including MySQL, Mongo, Cassandra, and Athena.
Must have experience with Python/Scala.
Must have experience with Big Data technologies like Apache Spark.
Must have experience with Apache Airflow.
Experience with data pipeline and ETL tools like AWS Glue.
Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
- Owns the end-to-end implementation of the assigned data processing components/product features, i.e. design, development, deployment, and testing of the data processing components and associated flows, conforming to best coding practices
- Creation and optimization of data engineering pipelines for analytics projects
- Support data and cloud transformation initiatives
- Contribute to our cloud strategy based on prior experience
- Independently work with all stakeholders across the organization to deliver enhanced functionalities
- Create and maintain automated ETL processes with a special focus on data flow, error recovery, and exception handling and reporting (see the sketch after this list)
- Gather and understand data requirements, work in the team to achieve high-quality data ingestion, and build systems that can process and transform the data
- Be able to comprehend the application of database indexes and transactions
- Involve in the design and development of a Big Data predictive analytics SaaS-based customer data platform using object-oriented analysis, design and programming skills, and design patterns
- Implement ETL workflows for data matching, data cleansing, data integration, and management
- Maintain existing data pipelines, and develop new data pipelines using big data technologies
- Responsible for leading the effort of continuously improving the reliability, scalability, and stability of microservices and the platform
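As a small, hypothetical illustration of the error-recovery and reporting emphasis above, the sketch below wraps a PySpark read in retries with structured logging; the paths and retry policy are illustrative assumptions.

```python
import logging
import time

from pyspark.sql import SparkSession

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

spark = SparkSession.builder.appName("etl-error-handling-sketch").getOrCreate()

def load_with_retry(path: str, retries: int = 3, backoff_s: int = 30):
    """Read a Parquet dataset, retrying transient failures with a fixed backoff."""
    for attempt in range(1, retries + 1):
        try:
            df = spark.read.parquet(path)
            log.info("Loaded %s on attempt %d (%d columns)", path, attempt, len(df.columns))
            return df
        except Exception as exc:  # a real job would narrow this to transient errors
            log.warning("Attempt %d failed for %s: %s", attempt, path, exc)
            if attempt == retries:
                log.error("Giving up on %s after %d attempts", path, retries)
                raise
            time.sleep(backoff_s)

# Hypothetical usage: fail loudly (and visibly in the logs) if the source never appears.
events = load_with_retry("s3://example-raw/events/")
events.write.mode("append").parquet("s3://example-curated/events/")
```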