41+ Apache Spark Jobs in Bangalore (Bengaluru)
Apply to 41+ Apache Spark jobs in Bangalore (Bengaluru) on CutShort.io. Explore the latest Apache Spark job opportunities across top companies like Google, Amazon & Adobe.
Responsibilities:
• Build customer-facing solutions for a Data Observability product that monitors data pipelines.
• Work on POCs to build new data pipeline monitoring capabilities.
• Build next-generation scalable, reliable, flexible, high-performance data pipeline capabilities for ingesting data from multiple sources containing complex datasets.
• Continuously improve services you own, making them more performant and using resources in the most optimised way.
• Collaborate closely with the engineering, data science, and product teams to propose an optimal solution for a given problem statement.
• Work closely with the DevOps team on performance monitoring and MLOps.
Required Skills:
• 3+ years of experience with data-related technologies
• Good understanding of distributed computing principles
• Experience in Apache Spark
• Hands-on programming with Python
• Knowledge of Hadoop v2, MapReduce, HDFS
• Experience building stream-processing systems using technologies such as Apache Storm, Spark Streaming, or Flink (see the sketch after this list)
• Experience with messaging systems, such as Kafka or RabbitMQ
• Good understanding of Big Data querying tools, such as Hive
• Experience with integration of data from multiple data sources
• Good understanding of SQL queries, joins, stored procedures, relational schemas
• Experience with NoSQL databases, such as HBase, Cassandra/Scylla, MongoDB
• Knowledge of ETL techniques and frameworks
• Performance tuning of Spark Jobs
• A general understanding of data quality is a plus
• Experience with Databricks, Snowflake, BigQuery, or similar lakehouse platforms would be a big plus
• Some knowledge of DevOps is nice to have
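As a hedged illustration of the stream-processing requirement above, the sketch below consumes events from Kafka with Spark Structured Streaming and lands them as Parquet. This is not the employer's actual stack or code; the broker address, topic, and paths are placeholders, and the spark-sql-kafka connector package is assumed to be available.

```python
# Minimal sketch only: read a Kafka topic with Spark Structured Streaming and
# persist micro-batches as Parquet. Broker, topic, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-monitoring-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "pipeline-events")            # placeholder topic
    .load()
    .select(
        col("key").cast("string"),
        col("value").cast("string"),
        col("timestamp"),
    )
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/pipeline-events")               # placeholder output path
    .option("checkpointLocation", "/chk/pipeline-events")  # enables recovery on restart
    .outputMode("append")
    .start()
)
query.awaitTermination()
```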
We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks and a thorough understanding of Delta Lake and Lakehouse architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.
Key Responsibilities:
- Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
- Exhibit proficiency in various RDBMS databases such as MySQL and SQL Server, emphasizing their integration in applications and pipeline development.
- Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
- Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
- Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance (see the sketch after this list).
- Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
- Collaborate with diverse teams to convert business requirements into precise technical specifications.
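As a brief, hedged sketch of the Parquet/Delta Lake point referenced above, the snippet below shows one way a partitioned Delta table might be written in a Databricks-style Spark environment. The paths, table name, and partition column are assumptions invented for this example, not part of the posting.

```python
# Illustrative sketch, assuming a Spark environment with Delta Lake available
# (e.g. Databricks). Paths, the table name, and the order_date column are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

# Raw CSV landed by an upstream pipeline (placeholder location).
raw = spark.read.option("header", "true").csv("/mnt/raw/orders")

# Store it as a partitioned Delta table for reliable, efficient downstream reads.
(
    raw.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")  # assumes such a column exists in the data
    .save("/mnt/curated/orders")
)

# Register the location as a table and compact small files.
spark.sql(
    "CREATE TABLE IF NOT EXISTS orders USING DELTA LOCATION '/mnt/curated/orders'"
)
spark.sql("OPTIMIZE orders")  # supported on Databricks and recent Delta releases
```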
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related discipline.
- Demonstrated hands-on experience with Azure cloud services and Databricks.
- Proficient programming skills in Python, Scala, and PHP.
- In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
- Familiarity with distributed data processing and external table management.
- Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
- Exceptional problem-solving acumen and meticulous attention to detail.
Additional Qualifications :
- Acquaintance with data security and privacy standards.
- Experience in CI/CD pipelines and version control systems, notably Git.
- Familiarity with Agile methodologies and DevOps practices.
- Competence in technical writing for comprehensive documentation.
Publicis Sapient Overview:
As a Senior Associate L1 in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Job Summary:
As a Senior Associate L2 in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
The role requires a hands-on technologist with a strong programming background in Java / Scala / Python, experience in data ingestion, integration, data wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms.
Role & Responsibilities:
Your role is focused on the design, development, and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
1. Overall 5+ years of IT experience, with 3+ years in data-related technologies
2. Minimum 2.5 years of experience in Big Data technologies and working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)
3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow, and other components required to build end-to-end data pipelines
4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred
5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
6. Well-versed, working knowledge of data-platform-related services on at least one cloud platform, IAM, and data security
Preferred Experience and Knowledge (Good to Have):
1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres), with hands-on experience
2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.
3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures
4. Performance tuning and optimization of data pipelines
5. CI/CD – infrastructure provisioning on cloud, automated build & deployment pipelines, code quality
6. Cloud data specialty and other related Big Data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
A LEADING US BASED MNC
Data Engineering : Senior Engineer / Manager
As a Senior Engineer / Manager in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Must Have skills :
1. GCP
2. Spark Streaming: live data streaming experience is desired.
3. Any one coding language: Java / Python / Scala
Skills & Experience :
- Overall experience of a minimum of 5+ years, with at least 4 years of relevant experience in Big Data technologies
- Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow, and other components required to build end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.
- Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred
- Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
- Well-versed and working knowledge with data platform related services on GCP
- Bachelor's degree and 6 to 12 years of work experience, or any combination of education, training, and/or experience that demonstrates the ability to perform the duties of the position
Your Impact :
- Data Ingestion, Integration and Transformation
- Data Storage and Computation Frameworks, Performance Optimizations
- Analytics & Visualizations
- Infrastructure & Cloud Computing
- Data Management Platforms
- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
- Build functionality for data analytics, search and aggregation
Lead Data Engineer
Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
Job responsibilities
· You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems
· You will partner with teammates to create complex data processing pipelines in order to solve our clients' most ambitious challenges
· You will collaborate with Data Scientists in order to design scalable implementations of their models
· You will pair to write clean and iterative code based on TDD
· Leverage various continuous delivery practices to deploy, support and operate data pipelines
· Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
· Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
· Create data models and speak to the tradeoffs of different modeling approaches
· On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product
· Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
· Assure effective collaboration between Thoughtworks' and the client's teams, encouraging open communication and advocating for shared outcomes
Job qualifications Technical skills
· You are equally happy coding and leading a team to implement a solution
· You have a track record of innovation and expertise in Data Engineering
· You're passionate about craftsmanship and have applied your expertise across a range of industries and organizations
· You have a deep understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
· You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
· Hands-on experience with MapR, Cloudera, Hortonworks and/or cloud-based (AWS EMR, Azure HDInsight, Qubole etc.) Hadoop distributions
· You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
· You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
· Working with data excites you: you have created Big data architecture, you can build and operate data pipelines, and maintain data storage, all within distributed systems
Professional skills
· Advocate your data engineering expertise to the broader tech community outside of Thoughtworks, speaking at conferences and acting as a mentor for more junior-level data engineers
· You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
· An interest in coaching others, sharing your experience and knowledge with teammates
· You enjoy influencing others and always advocate for technical excellence while being open to change when needed
Roles and Responsibilities:
- Design, develop, and maintain the end-to-end MLOps infrastructure from the ground up, leveraging open-source systems across the entire MLOps landscape.
- Create pipelines for data ingestion, data transformation, and building, testing, and deploying machine learning models, as well as monitoring and maintaining the performance of these models in production (a brief sketch follows this list).
- Manage the MLOps stack, including version control systems, continuous integration and deployment tools, containerization, orchestration, and monitoring systems.
- Ensure that the MLOps stack is scalable, reliable, and secure.
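As a hedged sketch of the experiment-tracking slice of this work (referenced in the pipeline item above), the snippet below logs a model run with MLflow, one of the open-source MLOps tools named under Primary Skills. The experiment name, model, and metric are placeholders, not the team's actual setup.

```python
# Illustrative only: track a training run with MLflow. The experiment name,
# model choice, and logged metric are placeholders for this sketch.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

mlflow.set_experiment("demo-experiment")  # placeholder experiment name

with mlflow.start_run():
    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X, y)

    # Log hyperparameters, a metric, and the trained model artifact.
    mlflow.log_param("n_estimators", 50)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```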
Skills Required:
- 3-6 years of MLOps experience
- Preferably worked in the startup ecosystem
Primary Skills:
- Experience with E2E MLOps systems like ClearML, Kubeflow, MLflow, etc.
- Technical expertise in MLOps: Should have a deep understanding of the MLOps landscape and be able to leverage open-source systems to build scalable, reliable, and secure MLOps infrastructure.
- Programming skills: Proficient in at least one programming language, such as Python, and have experience with data science libraries, such as TensorFlow, PyTorch, or Scikit-learn.
- DevOps experience: Should have experience with DevOps tools and practices, such as Git, Docker, Kubernetes, and Jenkins.
Secondary Skills:
- Version Control Systems (VCS) tools like Git and Subversion
- Containerization technologies like Docker and Kubernetes
- Cloud Platforms like AWS, Azure, and Google Cloud Platform
- Data Preparation and Management tools like Apache Spark, Apache Hadoop, and SQL databases like PostgreSQL and MySQL
- Machine Learning Frameworks like TensorFlow, PyTorch, and Scikit-learn
- Monitoring and Logging tools like Prometheus, Grafana, and Elasticsearch
- Continuous Integration and Continuous Deployment (CI/CD) tools like Jenkins, GitLab CI, and CircleCI
- Explainability and interpretability tools like LIME and SHAP
Have you streamed a program on Disney+, watched your favorite binge-worthy series on Peacock or cheered your favorite team on during the World Cup from one of the 20 top streaming platforms around the globe? If the answer is yes, you’ve already benefitted from Conviva technology, helping the world’s leading streaming publishers deliver exceptional streaming experiences and grow their businesses.
Conviva is the only global streaming analytics platform for big data that collects, standardizes, and puts trillions of cross-screen, streaming data points in context, in real time. The Conviva platform provides comprehensive, continuous, census-level measurement through real-time, server side sessionization at unprecedented scale. If this sounds important, it is! We measure a global footprint of more than 500 million unique viewers in 180 countries watching 220 billion streams per year across 3 billion applications streaming on devices. With Conviva, customers get a unique level of actionability and scale from continuous streaming measurement insights and benchmarking across every stream, every screen, every second.
What you get to do in this role:
Work on extremely high-scale Rust web services or backend systems.
Design and develop solutions for highly scalable web and backend systems.
Proactively identify and solve performance issues.
Maintain a high bar on code quality and unit testing.
What you bring to the role:
5+ years of hands-on software development experience.
At least 2+ years of Rust development experience.
Knowledge of Cargo packages (crates) for Kafka, Redis, etc.
Strong CS fundamentals, including system design, data structures and algorithms.
Expertise in backend and web services development.
Good analytical and troubleshooting skills.
What will help you stand out:
Experience working with large scale web services and applications.
Exposure to Golang, Scala or Java
Exposure to Big data systems like Kafka, Spark, Hadoop etc.
Underpinning the Conviva platform is a rich history of innovation. More than 60 patents represent award-winning technologies and standards, including first-of-its kind-innovations like time-state analytics and AI-automated data modeling, that surfaces actionable insights. By understanding real-world human experiences and having the ability to act within seconds of observation, our customers can solve business-critical issues and focus on growing their business ahead of the competition. Examples of the brands Conviva has helped fuel streaming growth for include: DAZN, Disney+, HBO, Hulu, NBCUniversal, Paramount+, Peacock, Sky, Sling TV, Univision and Warner Bros Discovery.
Privately held, Conviva is headquartered in Silicon Valley, California with offices and people around the globe. For more information, visit us at www.conviva.com. Join us to help extend our leadership position in big data streaming analytics to new audiences and markets!
As Conviva is expanding, we are building products providing deep insights into end user experience for our customers.
Platform and TLB Team
The vision for the TLB team is to build data processing software that works on terabytes of streaming data in real time. Engineer the next-gen Spark-like system for in-memory computation of large time-series datasets – both the Spark-like backend infrastructure and a library-based programming model. Build horizontally and vertically scalable systems that analyse trillions of events per day within sub-second latencies. Utilize the latest and greatest big data technologies to build solutions for use cases across multiple verticals. Lead technology innovation and advancement that will have big business impact for years to come. Be part of a worldwide team building software using the latest technologies and the best software development tools and processes.
What You’ll Do
This is an individual contributor position. Expectations will be on the below lines:
- Design, build and maintain the stream processing, and time-series analysis system which is at the heart of Conviva's products
- Responsible for the architecture of the Conviva platform
- Build features, enhancements, new services, and bug fixing in Scala and Java on a Jenkins-based pipeline to be deployed as Docker containers on Kubernetes
- Own the entire lifecycle of your microservice including early specs, design, technology choice, development, unit-testing, integration-testing, documentation, deployment, troubleshooting, enhancements etc.
- Lead a team to develop a feature or parts of the product
- Adhere to the Agile model of software development to plan, estimate, and ship per business priority
What you need to succeed
- 9+ years of work experience in software development of data processing products.
- Engineering degree in software or equivalent from a premier institute.
- Excellent knowledge of fundamentals of Computer Science like algorithms and data structures. Hands-on with functional programming and know-how of its concepts
- Excellent programming and debugging skills on the JVM. Proficient in writing code in Scala/Java/Rust/Haskell/Erlang that is reliable, maintainable, secure, and performant
- Experience with big data technologies like Spark, Flink, Kafka, Druid, HDFS, etc.
- Deep understanding of distributed systems concepts and scalability challenges including multi-threading, concurrency, sharding, partitioning, etc.
- Experience/knowledge of Akka/Lagom framework and/or stream processing technologies like RxJava or Project Reactor will be a big plus. Knowledge of design patterns like event-streaming, CQRS and DDD to build large microservice architectures will be a big plus
- Excellent communication skills. Willingness to work under pressure. Hunger to learn and succeed. Comfortable with ambiguity. Comfortable with complexity
Job Title: Data Engineer
Job Summary: As a Data Engineer, you will be responsible for designing, building, and maintaining the infrastructure and tools necessary for data collection, storage, processing, and analysis. You will work closely with data scientists and analysts to ensure that data is available, accessible, and in a format that can be easily consumed for business insights.
Responsibilities:
- Design, build, and maintain data pipelines to collect, store, and process data from various sources.
- Create and manage data warehousing and data lake solutions.
- Develop and maintain data processing and data integration tools.
- Collaborate with data scientists and analysts to design and implement data models and algorithms for data analysis.
- Optimize and scale existing data infrastructure to ensure it meets the needs of the business.
- Ensure data quality and integrity across all data sources.
- Develop and implement best practices for data governance, security, and privacy.
- Monitor data pipeline performance and errors, and troubleshoot issues as needed.
- Stay up-to-date with emerging data technologies and best practices.
Requirements:
Bachelor's degree in Computer Science, Information Systems, or a related field.
Experience with ETL tools like Matillion, SSIS, Informatica
Experience with SQL and relational databases such as SQL Server, MySQL, PostgreSQL, or Oracle
Experience writing complex SQL queries
Strong programming skills in languages such as Python, Java, or Scala.
Experience with data modeling, data warehousing, and data integration.
Strong problem-solving skills and ability to work independently.
Excellent communication and collaboration skills.
Familiarity with big data technologies such as Hadoop, Spark, or Kafka.
Familiarity with data warehouse / data lake technologies like Snowflake or Databricks
Familiarity with cloud computing platforms such as AWS, Azure, or GCP.
Familiarity with Reporting tools
Teamwork / growth contribution
- Helping the team conduct interviews and identify the right candidates
- Adhering to timelines
- Timely status communication and upfront communication of any risks
- Teach, train, and share knowledge with peers
- Good Communication skills
- Proven abilities to take initiative and be innovative
- Analytical mind with a problem-solving aptitude
Good to have :
Master's degree in Computer Science, Information Systems, or a related field.
Experience with NoSQL databases such as MongoDB or Cassandra.
Familiarity with data visualization and business intelligence tools such as Tableau or Power BI.
Knowledge of machine learning and statistical modeling techniques.
If you are passionate about data and want to work with a dynamic team of data scientists and analysts, we encourage you to apply for this position.
About Kloud9:
Kloud9 exists with the sole purpose of providing cloud expertise to the retail industry. Our team of cloud architects, engineers and developers help retailers launch a successful cloud initiative so you can quickly realise the benefits of cloud technology. Our standardised, proven cloud adoption methodologies reduce the cloud adoption time and effort so you can directly benefit from lower migration costs.
Kloud9 was founded with the vision of bridging the gap between E-commerce and cloud. The E-commerce of any industry is limiting and poses a huge challenge in terms of the finances spent on physical data structures.
At Kloud9, we know migrating to the cloud is the single most significant technology shift your company faces today. We are your trusted advisors in transformation and are determined to build a deep partnership along the way. Our cloud and retail experts will ease your transition to the cloud.
Our sole focus is to provide cloud expertise to the retail industry, giving our clients the empowerment that will take their business to the next level. Our team of proficient architects, engineers, and developers has been designing, building, and implementing solutions for retailers for an average of more than 20 years.
We are a cloud vendor that is both platform and technology independent. Our vendor independence not only provides us with a unique perspective into the cloud market but also ensures that we deliver the available cloud solutions that best meet our clients' requirements.
What we are looking for:
● 3+ years’ experience developing Data & Analytic solutions
● Experience building data lake solutions leveraging one or more of the following: AWS, EMR, S3, Hive & Spark
● Experience with relational SQL
● Experience with scripting languages such as Shell, Python
● Experience with source control tools such as GitHub and related dev process
● Experience with workflow scheduling tools such as Airflow
● In-depth knowledge of scalable cloud
● Has a passion for data solutions
● Strong understanding of data structures and algorithms
● Strong understanding of solution and technical design
● Has a strong problem-solving and analytical mindset
● Experience working with Agile Teams.
● Able to influence and communicate effectively, both verbally and written, with team members and business stakeholders
● Able to quickly pick up new programming languages, technologies, and frameworks
● Bachelor’s Degree in computer science
Why Explore a Career at Kloud9:
With job opportunities in prime locations in the US, London, Poland, and Bengaluru, we help build your career path in cutting-edge technologies of AI, Machine Learning, and Data Science. Be part of an inclusive and diverse workforce that's changing the face of retail technology with its creativity and innovative solutions. Our vested interest in our employees translates into delivering the best products and solutions to our customers.
Data Engineer- Senior
Cubera is a data company revolutionizing big data analytics and Adtech through data share value principles wherein the users entrust their data to us. We refine the art of understanding, processing, extracting, and evaluating the data that is entrusted to us. We are a gateway for brands to increase their lead efficiency as the world moves towards web3.
What are you going to do?
Design & develop high-performance and scalable solutions that meet the needs of our customers.
Work closely with Product Management, Architects, and cross-functional teams.
Build and deploy large-scale systems in Java/Python.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing their algorithms.
Follow best practices that can be adopted across the Big Data stack.
Use your engineering experience and technical skills to drive the features and mentor the engineers.
What are we looking for ( Competencies) :
Bachelor’s degree in computer science, computer engineering, or related technical discipline.
Overall 5 to 8 years of programming experience in Java or Python, including object-oriented design.
Data handling frameworks: should have a working knowledge of one or more data handling frameworks like Hive, Spark, Storm, Flink, Beam, Airflow, NiFi, etc.
Data infrastructure: should have experience in building, deploying, and maintaining applications on popular cloud infrastructure like AWS, GCP, etc.
Data stores: must have expertise in one of the general-purpose NoSQL data stores like Elasticsearch, MongoDB, Redis, Redshift, etc.
Strong sense of ownership, focus on quality, responsiveness, efficiency, and innovation.
Ability to work with distributed teams in a collaborative and productive manner.
Benefits:
Competitive Salary Packages and benefits.
A collaborative, lively, and upbeat work environment with young professionals.
Job Category: Development
Job Type: Full Time
Job Location: Bangalore
The thrill of working at a start-up that is starting to scale massively is something else. Simpl (FinTech startup of the year - 2020) was formed in 2015 by Nitya Sharma, an investment banker from Wall Street, and Chaitra Chidanand, a tech executive from the Valley, when they teamed up with a very clear mission - to make money simple so that people can live well and do amazing things. Simpl is the payment platform for the mobile-first world. We're backed by some of the best names in fintech globally (folks who have invested in Visa, Square and Transferwise), and Joe Saunders, ex-Chairman and CEO of Visa, is a board member.
Everyone at Simpl is an internal entrepreneur who is given a lot of bandwidth and resources to create the next breakthrough towards the long-term vision of "making money Simpl". Our first product is a payment platform that lets people buy instantly, anywhere online, and pay later. In the background, Simpl uses big data for credit underwriting, risk and fraud modelling, all without any paperwork, and enables Banks and Non-Bank Financial Companies to access a whole new consumer market.
Skillset:
Workflow manager/scheduler such as Airflow, Luigi, or Oozie (a brief Airflow sketch follows the lists below)
Good handle on Python
ETL Experience
Batch processing frameworks like Spark, MapReduce/Pig
File formats: Parquet, JSON, XML, Thrift, Avro, Protobuf
Rule engines (Drools – business rule management system)
Distributed file systems like HDFS, NFS, AWS S3 and equivalents
Experience building/configuring dashboards
Nice to have:
Data platform experience, e.g. building data lakes, working with near-real-time applications/frameworks like Storm, Flink, Spark.
AWS
File encoding types: Thrift, Avro, Protobuf, Parquet, JSON, XML
Hive, HBase
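The workflow scheduler item above references a sketch; below is a minimal Apache Airflow DAG (assuming Airflow 2.4+), with placeholder task callables, showing the kind of extract-transform-load scheduling this skillset implies. It is an illustration only, not part of the posting.

```python
# Minimal Airflow sketch (assumes Airflow 2.4+); the DAG id, schedule, and
# task bodies are placeholders for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data from the source")    # placeholder extract step


def transform():
    print("clean and reshape the batch")      # placeholder transform step


def load():
    print("write results to the warehouse")   # placeholder load step


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Classic extract -> transform -> load dependency chain.
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
```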
XpressBees – a logistics company started in 2015 – is amongst the fastest growing companies of its sector. While we started off rather humbly in the space of ecommerce B2C logistics, the last 5 years have seen us steadily progress towards expanding our presence. Our vision to evolve into a strong full-service logistics organization reflects itself in our new lines of business like 3PL, B2B Xpress and cross-border operations. Our strong domain expertise and constant focus on meaningful innovation have helped us rapidly evolve as the most trusted logistics partner of India. We have progressively carved our way towards best-in-class technology platforms, an extensive network reach, and a seamless last mile management system. While on this aggressive growth path, we seek to become the one-stop-shop for end-to-end logistics solutions. Our big focus areas for the very near future include strengthening our presence as service providers of choice and leveraging the power of technology to improve efficiencies for our clients.
Job Profile
As a Lead Data Engineer in the Data Platform Team at XpressBees, you will build the data platform and infrastructure to support high-quality and agile decision-making in our supply chain and logistics workflows. You will define the way we collect and operationalize data (structured / unstructured), and build production pipelines for our machine learning models and (RT, NRT, Batch) reporting & dashboarding requirements. As a Senior Data Engineer in the XB Data Platform Team, you will use your experience with modern cloud and data frameworks to build products (with storage and serving systems) that drive optimisation and resilience in the supply chain via data visibility, intelligent decision making, insights, anomaly detection and prediction.
What You Will Do
• Design and develop the data platform and data pipelines for reporting, dashboarding and machine learning models. These pipelines would productionize machine learning models and integrate with agent review tools.
• Meet data completeness, correctness and freshness requirements.
• Evaluate and identify the data store and data streaming technology choices.
• Lead the design of the logical model and implement the physical model to support business needs. Come up with logical and physical database designs across platforms (MPP, MR, Hive/Pig) that are optimal for different use cases (structured / semi-structured). Envision & implement the optimal data modelling, physical design and performance optimization technique/approach required for the problem.
• Support your colleagues by reviewing code and designs.
• Diagnose and solve issues in our existing data pipelines, and envision and build their successors.
Qualifications & Experience relevant for the role
• A bachelor's degree in Computer Science or a related field with 6 to 9 years of technology experience.
• Knowledge of relational and NoSQL data stores, stream processing and micro-batching, to make technology & design choices.
• Strong experience in System Integration, Application Development, ETL and Data-Platform projects. Talented across technologies used in the enterprise space.
• Software development experience using:
• Expertise in relational and dimensional modelling
• Exposure across the entire SDLC process
• Experience in cloud architecture (AWS)
• Proven track record of keeping existing technical skills current and developing new ones, so that you can make strong contributions to deep architecture discussions around systems and applications in the cloud (AWS).
• Characteristics of a forward thinker and self-starter who flourishes with new challenges and adapts quickly to learning new knowledge.
• Ability to work with cross-functional teams of consulting professionals across multiple projects.
• A knack for helping an organization understand application architectures and integration approaches, architect advanced cloud-based solutions, and help launch the build-out of those systems.
• Passion for educating, training, designing, and building end-to-end systems.
What you'll do:
Design and development of scalable applications.
Collaborate with tech leads to gain maximum understanding of the underlying infrastructure.
Contribute to continual improvement by suggesting improvements to the software system.
Ensure high scalability and performance
You will advocate for good, clean, well-documented and performant code; follow standards and best practices.
We'd love for you to have:
Education: Bachelor/Master Degree in Computer Science
Experience: 1-3 years of relevant experience in BI/Big-Data with hands-on coding experience
Mandatory Skills
Strong in problem-solving
Good exposure to Big Data technologies: Hive, Hadoop, Impala, HBase, Kafka, Spark
Strong experience in Data Engineering
Able to comprehend challenges related to database and data warehousing technologies, and to understand complex designs and system architectures
Experience with the software development lifecycle, design, develop, review, debug, document, and deliver (especially in a multi-location organization)
Working knowledge of Java, Python
Desired Skills
Experience with reporting tools like Tableau, QlikView
Awareness of CI/CD pipelines
Inclination to work on cloud platforms, e.g. AWS
Crisp communication skills with team members and business owners
Be able to work in a challenging, dynamic environment and meet tight deadlines
We are looking for an exceptionally talented Lead Data Engineer who has exposure to implementing AWS services to build data pipelines, API integrations and data warehouse designs. A candidate with both hands-on and leadership capabilities will be ideal for this position.
Qualification: At least a bachelor's degree in Science, Engineering or Applied Mathematics; a master's degree is preferred.
Job Responsibilities:
• Total 6+ years of experience as a Data Engineer, including 2+ years of experience managing a team
• A minimum of 3 years of AWS cloud experience
• Well versed in languages such as Python, PySpark, SQL, NodeJS, etc.
• Extensive experience in the Spark ecosystem, having worked on both real-time and batch processing
• Experience with AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step Functions, Airflow, RDS, Aurora, etc.
• Experience with modern database systems such as Redshift, Presto, Hive, etc.
• Worked on building data lakes in the past on S3 or Apache Hudi
• Solid understanding of Data Warehousing Concepts
• Good to have: experience with tools such as Kafka or Kinesis
• Good to have: AWS Developer Associate or Solutions Architect Associate certification
• Have experience in managing a team
Leading StartUp Focused On Employee Growth
● Proficiency in Linux.
● Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
● Must have SQL knowledge and experience working with relational databases, query authoring (SQL), as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena.
● Must have experience with Python/Scala.
● Must have experience with Big Data technologies like Apache Spark.
● Must have experience with Apache Airflow.
● Experience with data pipelines and ETL tools like AWS Glue.
AI-powered cloud-based SaaS solution provider
● Able to contribute to gathering functional requirements, developing technical specifications, and test case planning
● Demonstrate technical expertise, solving challenging programming and design problems
● 60% hands-on coding with architecture ownership of one or more products
● Ability to articulate architectural and design options, and educate development teams and business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release patches
● Mentor and guide team members
● Work cross-functionally with various bidgely teams including product management, QA/QE, various product lines, and/or business units to drive forward results
Requirements
● BS/MS in computer science or equivalent work experience
● 8-12 years' experience designing and developing applications in Data Engineering
● Hands-on experience with Big Data ecosystems
● Past experience with Hadoop, HDFS, MapReduce, YARN, AWS Cloud, EMR, S3, Spark, Cassandra, Kafka, Zookeeper
● Expertise with any of the following object-oriented languages (OOD): Java/J2EE, Scala, Python
● Ability to lead and mentor technical team members
● Expertise with the entire Software Development Life Cycle (SDLC)
● Excellent communication skills: demonstrated ability to explain complex technical issues to both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Business Acumen - strategic thinking & strategy development
● Experience on Cloud or AWS is preferable
● A good understanding of, and the ability to develop, software, prototypes, or proofs of concept (POCs) for various Data Engineering requirements
● Experience with Agile Development, SCRUM, or Extreme Programming methodologies
World's fastest growing consumer internet company
Data Engineer JD:
- Designing, developing, constructing, installing, testing and maintaining complete data management & processing systems.
- Building a highly scalable, robust, fault-tolerant & secure user data platform that adheres to data protection laws.
- Taking care of the complete ETL (Extract, Transform & Load) process.
- Ensuring architecture is planned in such a way that it meets all the business requirements.
- Exploring new ways of using existing data to derive more insights from it.
- Proposing ways to improve data quality, reliability & efficiency of the whole system.
- Creating data models to reduce system complexity and hence increase efficiency & reduce cost.
- Introducing new data management tools & technologies into the existing system to make it more efficient.
- Setting up monitoring and alerting on data pipeline jobs to detect failures and anomalies (a brief sketch follows this list)
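The monitoring and alerting item above references a sketch; below is a hedged, stack-agnostic example of a data-freshness check that flags a pipeline whose output has stopped updating. The output path, threshold, and alert sink are placeholders, not this company's actual tooling.

```python
# Hedged sketch: alert when a pipeline's output directory has not been updated
# recently. The path, threshold, and alert sink are placeholders.
import time
from pathlib import Path


def latest_modification_ts(output_dir: str) -> float:
    """Return the most recent modification time of any file under output_dir."""
    files = (f for f in Path(output_dir).rglob("*") if f.is_file())
    return max((f.stat().st_mtime for f in files), default=0.0)


def send_alert(message: str) -> None:
    # In a real pipeline this would page on-call or post to a chat channel.
    print(f"ALERT: {message}")


def check_freshness(output_dir: str, max_lag_seconds: int = 3600) -> None:
    lag = time.time() - latest_modification_ts(output_dir)
    if lag > max_lag_seconds:
        send_alert(f"{output_dir} is stale: no update for {int(lag)} seconds")


if __name__ == "__main__":
    check_freshness("/data/warehouse/orders", max_lag_seconds=6 * 3600)
```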
What do we expect from you?
- BS/MS in Computer Science or equivalent experience
- 5 years of recent experience in Big Data Engineering.
- Good experience in working with Hadoop and Big Data technologies like HDFS, Pig, Hive, Zookeeper, Storm, Spark, Airflow and NoSQL systems
- Excellent programming and debugging skills in Java or Python.
- Apache Spark, Python, and hands-on experience in deploying ML models
- Has worked on streaming and real-time pipelines
- Experience with Apache Kafka, or has worked with any of Spark Streaming, Flume or Storm
Focus Area:
- R1: Data Structures & Algorithms
- R2: Problem Solving + Coding
- R3: Design (LLD)
- Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse.
- Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs.
- Assemble large, complex data sets from third-party vendors to meet business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology.
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems.
Requirements
- 5+ years of experience in a Data Engineer role.
- Proficiency in Linux.
- Must have SQL knowledge and experience working with relational databases, query authoring (SQL), as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena.
- Must have experience with Python/Scala.
- Must have experience with Big Data technologies like Apache Spark.
- Must have experience with Apache Airflow.
- Experience with data pipeline and ETL tools like AWS Glue.
- Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
upGrad is an online education platform building the careers of tomorrow by offering the most industry-relevant programs in an immersive learning experience. Our mission is to create a new digital-first learning experience to deliver tangible career impact to individuals at scale. upGrad currently offers programs in Data Science, Machine Learning, Product Management, Digital Marketing, and Entrepreneurship, etc. upGrad is looking for people passionate about management and education to help design learning programs for working professionals to stay sharp and stay relevant and help build the careers of tomorrow.
- upGrad was awarded the Best Tech for Education by IAMAI for 2018-19,
- upGrad was also ranked as one of the LinkedIn Top Startups 2018: The 25 most sought-after startups in India.
- upGrad was earlier selected as one of the top ten most innovative companies in India by FastCompany.
- We were also covered by the Financial Times along with other disruptors in Ed-Tech.
- upGrad is the official education partner for Government of India - Startup India program.
- Our program with IIIT B has been ranked #1 program in the country in the domain of Artificial Intelligence and Machine Learning.
About the Role
A highly motivated individual who has experience in architecting end-to-end web-based ecommerce/online/SaaS products and systems, bringing them to production quickly and with high quality. Able to understand expected business results and map architecture to drive the business forward. Passionate about building world-class solutions.
Role and Responsibilities
- Work with Product Managers and Business to understand business/product requirements and vision.
- Provide a clear architectural vision in line with business and product vision.
- Lead a team of architects, developers, and data engineers to provide platform services to other engineering teams.
- Provide architectural oversight to engineering teams across the organization.
- Hands on design and development of platform services and features owned by self - this is a hands-on coding role.
- Define guidelines for best practices covering design, unit testing, secure coding etc.
- Ensure quality by reviewing design, code, test plans, load test plans etc. as appropriate.
- Work closely with the QA and Support teams to track quality and proactively identify improvement opportunities.
- Work closely with DevOps and IT to ensure highly secure and cost optimized operations in the cloud.
- Grow technical skills in the team - identify skill gaps with plans to address them, participate in hiring, mentor other architects and engineers.
- Support other engineers in resolving complex technical issues as a go-to person.
Skills/Experience
- 12+ years of experience in design and development of ecommerce scale systems and highly scalable SaaS or enterprise products.
- Extensive experience in developing extensible and scalable web applications with
- Java, Spring Boot, Go
- Web Services - REST, OAuth, OData
- Database/Caching - MySQL, Cassandra, MongoDB, Memcached/Redis
- Queue/Broker services - RabbitMQ/Kafka
- Microservices architecture via Docker on AWS or Azure.
- Experience with web front end technologies - HTML5, CSS3, JavaScript libraries and frameworks such as jQuery, AngularJS, React, Vue.js, Bootstrap etc.
- Extensive experience with cloud based architectures and how to optimize design for cost.
- Expert level understanding of secure application design practices and a working understanding of cloud infrastructure security.
- Experience with CI/CD processes and design for testability.
- Experience working with big data technologies such as Spark/Storm/Hadoop/Data Lake Architectures is a big plus.
- Action and result-oriented problem-solver who works well both independently and as part of a team; able to foster and develop others' ideas as well as his/her own.
- Ability to organize, prioritize and schedule a high workload and multiple parallel projects efficiently.
- Excellent verbal and written communication with stakeholders in a matrixed environment.
- Long term experience with at least one product from inception to completion and evolution of the product over multiple years.
B.Tech/MCA (IT/Computer Science) from a premier institution (IIT/NIT/BITS) and/or a US Master's degree in Computer Science.
at Happymonk AI labs
Work across the full stack, building highly scalable distributed solutions that enable positive user experiences and measurable business growth
Develop new features and infrastructure in support of rapidly emerging business and project requirements
Assume leadership of new projects from conceptualization to deployment
Ensure application performance, uptime, and scale, maintaining high standards of code quality and thoughtful application design
Work with agile development methodologies, adhering to best practices and pursuing continued learning opportunities
Visualize, design, and develop creative and innovative software platforms, as we continue to experience dramatic growth in the usage and visibility of our products
Create scalable software platforms and applications, and efficient networking solutions, that are unit tested, code reviewed and checked regularly for continuous integration
Examine existing systems, identifying flaws and creating solutions to improve service uptime and time-to-resolve through monitoring and automated remediation
Plan and execute full software development life cycles (SDLC) for each assigned project, adhering to company standards and expectations
Special Skills Required
Bachelor's degree in software engineering or information technology
2+ years of experience engineering software and networking platforms
2+ years of experience in building large-scale software applications
Proven ability to document design processes, including development, tests, analytics, and troubleshooting
Experience with rapid development cycles in a web-based environment
Strong scripting and test automation abilities; ability to drive a Test-Driven Development model
Working knowledge of relational databases as well as ORMs, Postgres, and other SQL technologies
Proficiency with JavaScript, TypeScript, React.js, Babylon, Node.js, HTML5, CSS3, and order management systems
Proven experience designing interactive applications and networking platforms
Web application development experience with multiple frameworks, including Blockchain, Hyperledger, Spark, Kafka, Elasticsearch, Neo4j, GraphQL
Desire to continue to grow professional capabilities with ongoing training and educational opportunities
Additional knowledge of computer vision, embedded, or blockchain technologies is a plus
Experience designing and integrating RESTful APIs
Excellent debugging and optimization skills
Unit/integration testing experience
Interest in learning new tools, languages, workflows, and philosophies to grow
Professional certifications
Location: Bengaluru, Karnataka, India - 560072
- Performs analytics to extract insights from the organization's raw historical data.
- Generates usable training datasets for any/all MV projects with the help of annotators, if needed.
- Analyses user trends and identifies their biggest bottlenecks in the Hammoq workflow.
- Tests the short/long-term impact of productized MV models on those trends.
- Skills: NumPy, Pandas, Apache Spark (PySpark), and ETL are mandatory (a brief sketch follows this list).
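The skills line above references a sketch; here is a small, illustrative Pandas pass over raw historical event data of the kind this role analyzes. The file name and column names are assumptions invented for the example, not part of the posting.

```python
# Illustrative only: find the slowest workflow steps in raw historical events.
# The CSV file and the workflow_step / duration_seconds columns are placeholders.
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["event_time"])

# Average time spent per workflow step; the slowest steps are candidate bottlenecks.
step_latency = (
    events.groupby("workflow_step")["duration_seconds"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(step_latency.head(10))
```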
• Create and maintain data pipeline
• Build and deploy ETL infrastructure for optimal data delivery
• Work with various teams, including the product, design and executive teams, to troubleshoot data-related issues
• Create tools for data analysts and scientists to help them build and optimise the product
• Implement systems and processes for data access controls and guarantees
• Distill knowledge from experts in the field outside the org and optimise internal data systems
Preferred qualifications/skills:
• 5+ years of experience
• Strong analytical skills
Freight Commerce Solutions Pvt Ltd.
• Degree in Computer Science, Statistics, Informatics, Information Systems
• Strong project management and organisational skills
• Experience supporting and working with cross-functional teams in a dynamic environment
• SQL guru with hands-on experience across various databases
• NoSQL databases like Cassandra, MongoDB
• Experience with Snowflake, Redshift
• Experience with tools like Airflow, Hevo
• Experience with Hadoop, Spark, Kafka, Flink
• Programming experience in Python, Java, Scala
Recko Inc. is looking for data engineers to join our kick-ass engineering team. We are looking for smart, dynamic individuals to connect all the pieces of the data ecosystem.
What are we looking for:
- 3+ years of development experience in at least one of MySQL, Oracle, PostgreSQL or MSSQL, and experience working with Big Data frameworks/platforms/data stores like Hadoop, HDFS, Spark, Oozie, Hue, EMR, Scala, Hive, Glue, Kerberos, etc.
- Strong experience setting up data warehouses, data modeling, data wrangling and dataflow architecture on the cloud
- 2+ years of experience with public cloud services such as AWS, Azure, or GCP and languages like Java / Python, etc.
- 2+ years of development experience with Amazon Redshift, Google BigQuery or Azure data warehouse platforms preferred
- Knowledge of statistical analysis tools like R, SAS, etc.
- Familiarity with any data visualization software
- A growth mindset and a passion for building things from the ground up, and most importantly, you should be fun to work with
As a data engineer at Recko, you will:
- Create and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet functional / non-functional business requirements
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS 'big data' technologies
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs
- Keep our data separated and secure across national boundaries through multiple data centers and AWS regions
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader
- Work with data and analytics experts to strive for greater functionality in our data systems
About Recko:
Recko was founded in 2017 to organise the world's transactional information and provide intelligent applications to finance and product teams to make sense of the vast amount of data available. With the proliferation of digital transactions over the past two decades, enterprises, banks and financial institutions are finding it difficult to keep track of the money flowing across their systems. With the Recko Platform, businesses can build, integrate and adapt innovative and complex financial use cases within the organization and across external payment ecosystems with agility, confidence and at scale. Today, customer-obsessed brands such as Deliveroo, Meesho, Grofers, Dunzo, Acommerce, etc. use Recko so their finance teams can optimize resources with automation and prioritize growth over repetitive and time-consuming tasks around day-to-day operations.
Recko is a Series A funded startup, backed by marquee investors like Vertex Ventures, Prime Venture Partners and Locus Ventures. Traditionally enterprise software is always built around functionality. We believe software is an extension of one’s capability, and it should be delightful and fun to use.
Working at Recko:
We believe that great companies are built by amazing people. At Recko, we are a group of young Engineers, Product Managers, Analysts and Business folks on a mission to bring consumer-tech DNA to enterprise fintech applications. The current team at Recko is 60+ members strong, with stellar experience across fintech, e-commerce and digital domains at companies like Flipkart, PhonePe, Ola Money, Belong, Razorpay, Grofers, Jio, Oracle, etc. We are growing aggressively across verticals.
The Platform Data Science team works at the intersection of data science and engineering. Domain experts develop and advance platforms, including the data platform, the machine learning platform, and other platforms for Forecasting, Experimentation, Anomaly Detection, Conversational AI, Underwriting of Risk, Portfolio Management, Fraud Detection & Prevention and many more. We are also the Data Science and Analytics partners for Product and provide Behavioural Science insights across Jupiter.
About the role:
We’re looking for strong Software Engineers who can combine EMR, Redshift, Hadoop, Spark, Kafka, Elasticsearch, TensorFlow, PyTorch and other technologies to build the next-generation Data Platform, ML Platform and Experimentation Platform. If this sounds interesting, we’d love to hear from you!
This role will involve designing and developing software products that impact many areas of our business. The individual in this role will be responsible for helping define requirements, creating software designs, implementing code to these specifications, providing thorough unit and integration testing, and supporting products while they are deployed and used by our stakeholders.
Key Responsibilities:
Participate in, own and influence the architecture and design of systems
Collaborate with other engineers, data scientists and product managers
Build intelligent systems that drive decisions
Build systems that enable us to perform experiments and iterate quickly
Build platforms that enable scientists to train, deploy and monitor models at scale
Build analytical systems that drive better decision making
Required Skills:
Programming experience with at least one modern language such as Java or Scala, including object-oriented design
Experience in contributing to the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems
Bachelor’s degree in Computer Science or related field
Computer Science fundamentals in object-oriented design
Computer Science fundamentals in data structures
Computer Science fundamentals in algorithm design, problem solving, and complexity analysis
Experience in databases, analytics, big data systems or business intelligence products:
Data lake, data warehouse, ETL, ML platform
Big data technologies like Hadoop and Apache Spark (a short Spark sketch follows below)
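As a short Spark sketch of the kind of work described above, the snippet below shows two everyday patterns on an experimentation platform: broadcasting a small dimension table into a join and caching a DataFrame that several aggregations reuse. The paths, column names and table shapes are assumptions for illustration only.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("experiment-metrics").getOrCreate()

events = spark.read.parquet("/data/experiment_events")       # large fact table (placeholder path)
variants = spark.read.parquet("/data/experiment_variants")   # small dimension table (placeholder path)

# Broadcasting the small side avoids shuffling the large events table.
joined = events.join(broadcast(variants), "experiment_id")

# Cache once, reuse across several downstream aggregations.
joined.cache()
joined.groupBy("variant").count().show()
joined.groupBy("variant").avg("metric_value").show()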
A Series A funded, Silicon Valley based BI startup that helps companies uncover the 3% of active buyers in their target market. It evaluates over 100 billion data points and analyzes factors such as buyer journeys, technology adoption patterns, and other digital footprints to deliver market & sales intelligence. Its customers have access to the buying patterns and contact information of more than 17 million companies and 70 million decision makers across the world.
Role – Data Engineer
Responsibilities
- Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse.
- Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs.
- Assemble large, complex data sets from third-party vendors to meet business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology.
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems.
Requirements
- 5+ years of experience in a Data Engineer role.
- Proficiency in Linux.
- Must have SQL knowledge and experience working with relational databases, query authoring (SQL), as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena.
- Must have experience with Python/Scala.
- Must have experience with Big Data technologies like Apache Spark.
- Must have experience with Apache Airflow (an illustrative DAG sketch follows this list).
- Experience with data pipeline and ETL tools like AWS Glue.
- Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
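For the Apache Airflow item above, here is an illustrative Airflow 2.x style DAG sketch: a daily pipeline with a Python extract step followed by a spark-submit transform. The DAG id, schedule, callable and job path are hypothetical.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

def extract(**context):
    # Placeholder extract step; real code would pull from an API or database.
    print("extracting data for", context["ds"])

with DAG(
    dag_id="daily_orders_pipeline",     # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /opt/jobs/transform_orders.py {{ ds }}",  # placeholder job path
    )
    extract_task >> transform_task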
The Data Engineer would be responsible for selecting and integrating the required Big Data tools and frameworks, and would implement Data Ingestion and ETL/ELT processes.
Required Experience, Skills and Qualifications:
- Hands-on experience with Big Data tools/technologies like Spark, Databricks, MapReduce, Hive, HDFS.
- Expertise and excellent understanding of the big data toolset, such as Sqoop, Spark Streaming, Kafka, NiFi.
- Proficiency in any of the programming languages Python/Scala/Java, with 4+ years’ experience.
- Experience with cloud infrastructure like MS Azure, Data Lake, etc.
- Good working knowledge of NoSQL DBs (MongoDB, HBase, Cassandra).
- Owns the end-to-end implementation of the assigned data processing components/product features, i.e. design, development, deployment, and testing of the data processing components and associated flows, conforming to best coding practices
- Creation and optimization of data engineering pipelines for analytics projects
- Support data and cloud transformation initiatives
- Contribute to our cloud strategy based on prior experience
- Independently work with all stakeholders across the organization to deliver enhanced functionalities
- Create and maintain automated ETL processes with a special focus on data flow, error recovery, and exception handling and reporting (see the sketch after this list)
- Gather and understand data requirements, work with the team to achieve high-quality data ingestion, and build systems that can process and transform the data
- Be able to comprehend the application of database indexes and transactions
- Be involved in the design and development of a Big Data predictive analytics SaaS-based customer data platform using object-oriented analysis, design and programming skills, and design patterns
- Implement ETL workflows for data matching, data cleansing, data integration, and management
- Maintain existing data pipelines, and develop new data pipelines using big data technologies
- Responsible for leading the effort of continuously improving the reliability, scalability, and stability of microservices and the platform
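To make the error-recovery and exception-reporting bullet concrete, here is a generic Python sketch of a retry wrapper around an ETL step, with logging and a failure notification hook; the function names and the alerting call are hypothetical and stand in for whatever tooling the team actually uses.

import logging
import time

logger = logging.getLogger("etl")

def notify_on_call(step_name):
    # Placeholder alerting hook; in practice this might post to Slack or PagerDuty.
    logger.error("alert: ETL step %s exhausted its retries", step_name)

def run_with_retries(step_fn, max_attempts=3, backoff_seconds=30):
    """Run one ETL step, retrying on failure and reporting the final error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step_fn()
        except Exception:
            logger.exception("step %s failed (attempt %d/%d)",
                             step_fn.__name__, attempt, max_attempts)
            if attempt == max_attempts:
                notify_on_call(step_fn.__name__)
                raise
            time.sleep(backoff_seconds * attempt)  # simple linear backoff between retries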
Roles & Responsibilities
- Proven experience deploying and tuning open source components into enterprise-ready production tooling
- Experience with data centre (Metal as a Service – MAAS) and cloud deployment technologies (AWS or GCP Architect certificates required)
- Deep understanding of Linux, from kernel mechanisms through user-space management
- Experience with CI/CD (Continuous Integration and Deployment) system solutions (Jenkins)
- Experience using monitoring tools (local and on public cloud platforms) such as Nagios, Prometheus, Sensu, ELK, CloudWatch, Splunk, New Relic, etc. to trigger instant alerts, reports and dashboards
- Work closely with the development and infrastructure teams to analyze and design solutions with four-nines (99.99%) up-time across globally distributed, clustered, production and non-production virtualized infrastructure
- Wide understanding of IP networking as well as data centre infrastructure
Skills
- Expert with software development tools and source code management: understanding and managing issues and code changes, and grouping them into deployment releases in a stable and measurable way to maximize production
- Must be an expert at developing and using Ansible roles and configuring deployment templates with Jinja2
- Solid understanding of data collection tools like Flume, Filebeat, Metricbeat, JMX Exporter agents
- Extensive experience operating and tuning the Kafka streaming data platform, specifically as a message queue for big data processing
- Strong understanding of, and hands-on experience with:
  - the Apache Spark framework, specifically Spark Core and Spark Streaming
  - orchestration platforms: Mesos and Kubernetes
  - data storage platforms: Elastic Stack, Carbon, ClickHouse, Cassandra, Ceph, HDFS
  - core presentation technologies: Kibana and Grafana
- Excellent scripting and programming skills (Bash, Python, Java, Go, Rust). Must have previous experience with Rust in order to support and improve in-house developed products
Certification
Red Hat Certified Architect certificate or equivalent required. CCNA certificate required. 3-5 years of experience running open source big data platforms.
We are looking for candidates who have good BI/DW experience of 3-6 years, with Spark, Scala and SQL expertise, and Azure. An Azure background is needed.
* Spark hands on : Must have
* Scala hands on : Must have
* SQL expertise : Expert
* Azure background : Must have
* Python hands on : Good to have
* ADF, Databricks: Good to have
* Should be able to communicate effectively and deliver technology implementation end to end
Looking for candidates who can join within 15 to 30 days, or who are available immediately.
Regards
Gayatri P
Fragma Data Systems
- Expert software implementation and automated testing
- Promoting development standards, code reviews, mentoring, knowledge sharing
- Improving our Agile methodology maturity
- Product and feature design, scrum story writing
- Build, release, and deployment automation
- Product support & troubleshooting
Who we have in mind:
- Demonstrated experience as a Java developer
- Should have a deep understanding of Enterprise/Distributed Architecture patterns and should be able to demonstrate the relevant usage of the same
- Turn high-level project requirements into application-level architecture and collaborate with the team members to implement the solution
- Strong experience and knowledge in Spring boot framework and microservice architecture
- Experience in working with Apache Spark
- Solid demonstrated object-oriented software development experience with Java, SQL, Maven, relational/NoSQL databases and testing frameworks
- Strong working experience with developing RESTful services
- Should have experience working on Application frameworks such as Spring, Spring Boot, AOP
- Exposure to tools – Jira, Bamboo, Git, Confluence would be an added advantage
- Excellent grasp of the current technology landscape, trends and emerging technologies
Job Description
This role requires experience with AWS as well as programming experience in Python and Spark.
Roles & Responsibilities
You Will:
- Translate functional requirements into technical design
- Interact with clients and internal stakeholders to understand the data and platform requirements in detail and determine core cloud services needed to fulfil the technical design
- Design, develop and deliver data integration interfaces in AWS
- Design, develop and deliver data provisioning interfaces to fulfil consumption needs
- Deliver data models on the cloud platform; this could be on AWS Redshift, SQL
- Design, develop and deliver data integration interfaces at scale using Python/Spark (an illustrative sketch follows this list)
- Automate core activities to minimize delivery lead times and improve overall quality
- Optimize platform cost by selecting the right platform services and architecting the solution in a cost-effective manner
- Manage code and deploy DevOps and CI/CD processes
- Deploy logging and monitoring across the different integration points for critical alerts
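As an illustrative sketch of the integration and provisioning interfaces above, the snippet below lands curated Parquet on S3 with Spark and then loads it into Redshift with a COPY statement; the bucket, table, IAM role and cluster connection details are placeholders.

import psycopg2
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("provision-customers").getOrCreate()

# Export the curated dataset to S3 as Parquet (placeholder paths and bucket).
curated = spark.read.parquet("/data/curated/customers")
curated.write.mode("overwrite").parquet("s3a://example-bucket/exports/customers/")

# Load the exported files into Redshift via COPY (placeholder cluster, table and IAM role).
conn = psycopg2.connect(host="redshift-host", port=5439,
                        dbname="analytics", user="loader", password="***")
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY analytics.customers
        FROM 's3://example-bucket/exports/customers/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-loader'
        FORMAT AS PARQUET
    """)
conn.close()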
You Have:
- Minimum 5 years of software development experience
- Bachelor's and/or Master’s degree in computer science
- Strong Consulting skills in data management including data governance, data quality, security, data integration, processing and provisioning
- Delivered data management projects in any of the AWS
- Translated complex analytical requirements into technical design including data models, ETLs and Dashboards / Reports
- Experience deploying dashboards and self-service analytics solutions on both relational and non-relational databases
- Experience with different computing paradigms in databases such as In-Memory, Distributed, Massively Parallel Processing
- Successfully delivered large scale data management initiatives covering Plan, Design, Build and Deploy phases leveraging different delivery methodologies including Agile
- Strong knowledge of continuous integration, static code analysis and test-driven development
- Experience in delivering projects in a highly collaborative delivery model with teams at onsite and offshore
- Excellent analytical and problem-solving skills are a must
- Delivered change management initiatives focused on driving data platforms adoption across the enterprise
- Strong verbal and written communications skills are a must, as well as the ability to work effectively across internal and external organizations
Knowledge of Hadoop ecosystem installation, initial-configuration and performance tuning.
Expert with Apache Ambari, Spark, Unix Shell scripting, Kubernetes and Docker
Knowledge of Python would be desirable.
Experience with HDP Manager/clients and various dashboards.
Understanding of Hadoop security (Kerberos, Ranger and Knox), encryption and data masking.
Experience with automation/configuration management using Chef, Ansible or an equivalent.
Strong experience with any Linux distribution.
Basic understanding of network technologies, CPU, memory and storage.
Database administration is a plus.
Qualifications and Education Requirements
2 to 4 years of experience with, and detailed knowledge of, Core Hadoop Components solutions and dashboards running on Big Data technologies such as Hadoop/Spark.
Bachelor's degree or equivalent in Computer Science, Information Technology or related fields.
Responsibilities for Data Architect
- Research and properly evaluate sources of information to determine possible limitations in reliability or usability
- Apply sampling techniques to effectively determine and define ideal categories to be questioned
- Compare and analyze provided statistical information to identify patterns, relationships and problems
- Define and utilize statistical methods to solve industry-specific problems in varying fields, such as economics and engineering
- Prepare detailed reports for management and other departments by analyzing and interpreting data
- Train assistants and other members of the team how to properly organize findings and read data collected
- Design computer code using various languages to improve and update software and applications
- Refer to previous instances and findings to determine the ideal method for gathering data
Data Platform engineering at Uber is looking for a strong Technical Lead (Level 5a Engineer) who has built high quality platforms and services that can operate at scale. 5a Engineer at Uber exhibits following qualities:
- Demonstrate tech expertise: Demonstrate technical skills to go very deep or broad in solving classes of problems or creating broadly leverageable solutions.
- Execute large scale projects: Define, plan and execute complex and impactful projects. You communicate the vision to peers and stakeholders.
- Collaborate across teams: Act as a domain resource to engineers outside your team and help them leverage the right solutions. Facilitate technical discussions and drive to a consensus.
- Coach engineers: Coach and mentor less experienced engineers and deeply invest in their learning and success. You give and solicit feedback, both positive and negative, to others you work with to help improve the entire team.
- Tech leadership: Lead the effort to define the best practices in your immediate team, and help the broader organization establish better technical or business processes.
What You’ll Do
- Build a scalable, reliable, operable and performant data analytics platform for Uber’s engineers, data scientists, products and operations teams.
- Work alongside the pioneers of big data systems such as Hive, Yarn, Spark, Presto, Kafka, Flink to build out a highly reliable, performant, easy to use software system for Uber’s planet scale of data.
- Become proficient in the multi-tenancy, resource isolation, abuse prevention, and self-serve debuggability aspects of a high-performance, large-scale service while building these capabilities for Uber's engineers and operations folks.
What You’ll Need
- 7+ years experience in building large scale products, data platforms, distributed systems in a high caliber environment.
- Architecture: Identify and solve major architectural problems by going deep in your field or broad across different teams. Extend, improve, or, when needed, build solutions to address architectural gaps or technical debt.
- Software Engineering/Programming: Create frameworks and abstractions that are reliable and reusable. You have advanced knowledge of at least one programming language and are happy to learn more. Our core languages are Java, Python, Go, and Scala.
- Data Engineering: Expertise in one of the big data analytics technologies we currently use such as Apache Hadoop (HDFS and YARN), Apache Hive, Impala, Drill, Spark, Tez, Presto, Calcite, Parquet, Arrow etc. Under the hood experience with similar systems such as Vertica, Apache Impala, Drill, Google Borg, Google BigQuery, Amazon EMR, Amazon RedShift, Docker, Kubernetes, Mesos etc.
- Execution & Results: You tackle large technical projects/problems that are not clearly defined. You anticipate roadblocks and have strategies to de-risk timelines. You orchestrate work that spans multiple teams and keep your stakeholders informed.
- A team player: You believe that you can achieve more on a team, that the whole is greater than the sum of its parts. You rely on others’ candid feedback for continuous improvement.
- Business acumen: You understand requirements beyond the written word. Whether you’re working on an API used by other developers, an internal tool consumed by our operation teams, or a feature used by millions of customers, your attention to details leads to a delightful user experience.
Spark / Scala experience should be more than 2 years.
A combination of Java & Scala is fine, or we are even fine with a Big Data Developer with strong Core Java concepts - Scala / Spark Developer.
Strong proficiency in Scala on Spark (Hadoop); Scala + Java is also preferred.
Complete SDLC process and Agile Methodology (Scrum)
Version control / Git
We are looking for BE/BTech graduates (2018/2019 pass out) who want to build their career as Data Engineers covering technologies like Hadoop, NoSQL, RDBMS, Spark, Kafka, Hive, ETL, MDM & Data Quality. You should be willing to learn, explore, experiment and develop POCs/solutions using these technologies with guidance and support from highly experienced industry leaders. You should be passionate about your work and willing to go the extra mile to achieve results.
We are looking for candidates who believe in commitment and in building strong relationships. We need people who are passionate about solving problems through software and are flexible.
Required Experience, Skills and Qualifications
Passionate to learn and explore new technologies
Any RDBMS experience (SQL Server/Oracle/MySQL)
Any ETL tool experience (Informatica/Talend/Kettle/SSIS)
Understanding of Big Data technologies
Good Communication Skills
Excellent Mathematical / Logical / Reasoning Skills