



Location: Bangalore/Pune/Hyderabad/Nagpur
4-5 years of overall experience in software development.
- Experience on Hadoop (Apache/Cloudera/Hortonworks) and/or other Map Reduce Platforms
- Experience on Hive, Pig, Sqoop, Flume and/or Mahout
- Experience on NO-SQL – HBase, Cassandra, MongoDB
- Hands on experience with Spark development, Knowledge of Storm, Kafka, Scala
- Good knowledge of Java
- Good background of Configuration Management/Ticketing systems like Maven/Ant/JIRA etc.
- Knowledge around any Data Integration and/or EDW tools is plus
- Good to have knowledge of using Python/Perl/Shell
Please note - Hbase hive and spark are must.

Similar jobs
- Build campaign generation services which can send app notifications at a speed of 10 million a minute
- Dashboards to show Real time key performance indicators to clients
- Develop complex user segmentation engines which creates segments on Terabytes of data within few seconds
- Building highly available & horizontally scalable platform services for ever growing data
- Use cloud based services like AWS Lambda for blazing fast throughput & auto scalability
- Work on complex analytics on terabytes of data like building Cohorts, Funnels, User path analysis, Recency Frequency & Monetary analysis at blazing speed
- You will build backend services and APIs to create scalable engineering systems.
- As an individual contributor, you will tackle some of our broadest technical challenges that requires deep technical knowledge, hands-on software development and seamless collaboration with all functions.
- You will envision and develop features that are highly reliable and fault tolerant to deliver a superior customer experience.
- Collaborating various highly-functional teams in the company to meet deliverables throughout the software development lifecycle.
- Identify and improvise areas of improvement through data insights and research.
- 2-5 years of experience in backend development and must have worked on Java/shell/Perl/python scripting.
- Solid understanding of engineering best practices, continuous integration, and incremental delivery.
- Strong analytical skills, debugging and troubleshooting skills, product line analysis.
- Follower of agile methodology (Sprint planning, working on JIRA, retrospective etc).
- Proficiency in usage of tools like Docker, Maven, Jenkins and knowledge on frameworks in Java like spring, spring boot, hibernate, JPA.
- Ability to design application modules using various concepts like object oriented, multi-threading, synchronization, caching, fault tolerance, sockets, various IPCs, database interfaces etc.
- Hands on experience on Redis, MySQL and streaming technologies like Kafka producer consumers and NoSQL databases like mongo dB/Cassandra.
- Knowledge about versioning like Git and deployment processes like CICD.

Requirements
- 3+ years work experience with production-grade python. Contribution to open source repos is preferred
- Experience writing concurrent and distributed programs, AWS lambda, Kubernetes, Docker, Spark is preferred.
- Experience with one relational & 1 non-relational DB is preferred
- Prior work in the ML domain will be a big boost
What You’ll Do
- Help realize the product vision: Production-ready machine learning models with monitoring within moments, not months.
- Help companies deploy their machine learning models at scale across a wide range of use-cases and sectors.
- Build integrations with other platforms to make it easy for our customers to use our product without changing their workflow.
- Write maintainable, scalable performant python code
- Building gRPC, rest API servers
- Working with Thrift, Protobufs, etc.
● Design and deliver scalable web services, APIs, and backend data modules.
Understand requirements and develop reusable code using design patterns &
component architecture and write unit test cases.
● Collaborate with product management and engineering teams to elicit &
understand their requirements & challenges and develop potential solutions
● Stay current with the latest tools, technology ideas, and methodologies; share
knowledge by clearly articulating results and ideas to key decision-makers.
Requirements
● 3-6 years of strong experience in developing highly scalable backend and
middle tier. BS/MS in Computer Science or equivalent from premier institutes
Strong in problem-solving, data structures, and algorithm design. Strong
experience in system architecture, Web services development, highly scalable
distributed applications.
● Good in large data systems such as Hadoop, Map Reduce, NoSQL Cassandra, etc.. Fluency in Java, Spring, Hibernate, J2EE, REST Services Ability to deliver code
quickly from given scenarios in a fast-paced start-up environment.
● Attention to detail. Strong communication and collaboration skills.


We have urgent requirement of Data Engineer/Sr Data Engineer for reputed MNC company.
Exp: 4-9yrs
Location: Pune/Bangalore/Hyderabad
Skills: We need candidate either Python AWS or Pyspark AWS or Spark Scala

Our company is seeking to hire a skilled software developer to help with the development of our AI/ML platform.
Your duties will primarily revolve around building Platform by writing code in Scala, as well as modifying platform
to fix errors, work on distributed computing, adapt it to new cloud services, improve its performance, or upgrade
interfaces. To be successful in this role, you will need extensive knowledge of programming languages and the
software development life-cycle.
Responsibilities:
Analyze, design develop, troubleshoot and debug Platform
Writes code and guides other team membersfor best practices and performs testing and debugging of
applications.
Specify, design and implementminor changes to existing software architecture. Build highly complex
enhancements and resolve complex bugs. Build and execute unit tests and unit plans.
Duties and tasks are varied and complex, needing independent judgment. Fully competent in own area of
expertise
Experience:
The candidate should have about 2+ years of experience with design and development in Java/Scala. Experience in
algorithm, Distributed System, Data-structure, database and architectures of distributed System is mandatory.
Required Skills:
1. In-depth knowledge of Hadoop, Spark architecture and its componentssuch as HDFS, YARN and executor, cores and memory param
2. Knowledge of Scala/Java.
3. Extensive experience in developing spark job. Should possess good Oops knowledge and be aware of
enterprise application design patterns.
4. Good knowledge of Unix/Linux.
5. Experience working on large-scale software projects
6. Keep an eye out for technological trends, open-source projects that can be used.
7. Knows common programming languages Frameworks
-
Bachelor’s or master’s degree in Computer Engineering, Computer Science, Computer Applications, Mathematics, Statistics, or related technical field. Relevant experience of at least 3 years in lieu of above if from a different stream of education.
-
Well-versed in and 3+ hands-on demonstrable experience with: ▪ Stream & Batch Big Data Pipeline Processing using Apache Spark and/or Apache Flink.
▪ Distributed Cloud Native Computing including Server less Functions
▪ Relational, Object Store, Document, Graph, etc. Database Design & Implementation
▪ Micro services Architecture, API Modeling, Design, & Programming -
3+ years of hands-on development experience in Apache Spark using Scala and/or Java.
-
Ability to write executable code for Services using Spark RDD, Spark SQL, Structured Streaming, Spark MLLib, etc. with deep technical understanding of Spark Processing Framework.
-
In-depth knowledge of standard programming languages such as Scala and/or Java.
-
3+ years of hands-on development experience in one or more libraries & frameworks such as Apache Kafka, Akka, Apache Storm, Apache Nifi, Zookeeper, Hadoop ecosystem (i.e., HDFS, YARN, MapReduce, Oozie & Hive), etc.; extra points if you can demonstrate your knowledge with working examples.
-
3+ years of hands-on development experience in one or more Relational and NoSQL datastores such as PostgreSQL, Cassandra, HBase, MongoDB, DynamoDB, Elastic Search, Neo4J, etc.
-
Practical knowledge of distributed systems involving partitioning, bucketing, CAP theorem, replication, horizontal scaling, etc.
-
Passion for distilling large volumes of data, analyze performance, scalability, and capacity performance issues in Big Data Platforms.
-
Ability to clearly distinguish system and Spark Job performances and perform spark performance tuning and resource optimization.
-
Perform benchmarking/stress tests and document the best practices for different applications.
-
Proactively work with tenants on improving the overall performance and ensure the system is resilient, and scalable.
-
Good understanding of Virtualization & Containerization; must demonstrate experience in technologies such as Kubernetes, Istio, Docker, OpenShift, Anthos, Oracle VirtualBox, Vagrant, etc.
-
Well-versed with demonstrable working experience with API Management, API Gateway, Service Mesh, Identity & Access Management, Data Protection & Encryption.
Hands-on experience with demonstrable working experience with DevOps tools and platforms viz., Jira, GIT, Jenkins, Code Quality & Security Plugins, Maven, Artifactory, Terraform, Ansible/Chef/Puppet, Spinnaker, etc.
-
Well-versed in AWS and/or Azure or and/or Google Cloud; must demonstrate experience in at least FIVE (5) services offered under AWS and/or Azure or and/or Google Cloud in any categories: Compute or Storage, Database, Networking & Content Delivery, Management & Governance, Analytics, Security, Identity, & Compliance (or) equivalent demonstrable Cloud Platform experience.
-
Good understanding of Storage, Networks and Storage Networking basics which will enable you to work in a Cloud environment.
-
Good understanding of Network, Data, and Application Security basics which will enable you to work in a Cloud as well as Business Applications / API services environment.

Be Part Of Building The Future
Dremio is the Data Lake Engine company. Our mission is to reshape the world of analytics to deliver on the promise of data with a fundamentally new architecture, purpose-built for the exploding trend towards cloud data lake storage such as AWS S3 and Microsoft ADLS. We dramatically reduce and even eliminate the need for the complex and expensive workarounds that have been in use for decades, such as data warehouses (whether on-premise or cloud-native), structural data prep, ETL, cubes, and extracts. We do this by enabling lightning-fast queries directly against data lake storage, combined with full self-service for data users and full governance and control for IT. The results for enterprises are extremely compelling: 100X faster time to insight; 10X greater efficiency; zero data copies; and game-changing simplicity. And equally compelling is the market opportunity for Dremio, as we are well on our way to disrupting a $25BN+ market.
About the Role
The Dremio India team owns the DataLake Engine along with Cloud Infrastructure and services that power it. With focus on next generation data analytics supporting modern table formats like Iceberg, Deltalake, and open source initiatives such as Apache Arrow, Project Nessie and hybrid-cloud infrastructure, this team provides various opportunities to learn, deliver, and grow in career. We are looking for innovative minds with experience in leading and building high quality distributed systems at massive scale and solving complex problems.
Responsibilities & ownership
- Lead, build, deliver and ensure customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product.
- Work on distributed systems for data processing with efficient protocols and communication, locking and consensus, schedulers, resource management, low latency access to distributed storage, auto scaling, and self healing.
- Understand and reason about concurrency and parallelization to deliver scalability and performance in a multithreaded and distributed environment.
- Lead the team to solve complex and unknown problems
- Solve technical problems and customer issues with technical expertise
- Design and deliver architectures that run optimally on public clouds like GCP, AWS, and Azure
- Mentor other team members for high quality and design
- Collaborate with Product Management to deliver on customer requirements and innovation
- Collaborate with Support and field teams to ensure that customers are successful with Dremio
Requirements
- B.S./M.S/Equivalent in Computer Science or a related technical field or equivalent experience
- Fluency in Java/C++ with 8+ years of experience developing production-level software
- Strong foundation in data structures, algorithms, multi-threaded and asynchronous programming models, and their use in developing distributed and scalable systems
- 5+ years experience in developing complex and scalable distributed systems and delivering, deploying, and managing microservices successfully
- Hands-on experience in query processing or optimization, distributed systems, concurrency control, data replication, code generation, networking, and storage systems
- Passion for quality, zero downtime upgrades, availability, resiliency, and uptime of the platform
- Passion for learning and delivering using latest technologies
- Ability to solve ambiguous, unexplored, and cross-team problems effectively
- Hands on experience of working projects on AWS, Azure, and Google Cloud Platform
- Experience with containers and Kubernetes for orchestration and container management in private and public clouds (AWS, Azure, and Google Cloud)
- Understanding of distributed file systems such as S3, ADLS, or HDFS
- Excellent communication skills and affinity for collaboration and teamwork
- Ability to work individually and collaboratively with other team members
- Ability to scope and plan solution for big problems and mentors others on the same
- Interested and motivated to be part of a fast-moving startup with a fun and accomplished team





