Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
- You will partner with teammates to create complex data processing pipelines to solve our clients' most complex challenges
- You will collaborate with Data Scientists to design scalable implementations of their models
- You will pair to write clean and iterative code based on TDD
- Leverage various continuous delivery practices to deploy, support and operate data pipelines
- Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
- Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
- Create data models and speak to the tradeoffs of different modeling approaches
- Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
- Assure effective collaboration between Thoughtworks' and the client's teams, encouraging open communication and advocating for shared outcomes
- You have a good understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
- You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
- Hands-on experience in MapR, Cloudera, Hortonworks and/or cloud (AWS EMR, Azure HDInsights, Qubole etc.) based Hadoop distributions
- You are comfortable taking data-driven approaches and applying data security strategies to solve business problems
- Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems
- You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
- You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
- An interest in coaching, sharing your experience and knowledge with teammates
- You enjoy influencing others and always advocate for technical excellence while being open to change when needed
- Presence in the external tech community: you willingly share your expertise with others via speaking engagements, contributions to open source, blogs and more
Other things to know
Learning & Development
There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.
Thoughtworks is a global technology consultancy that integrates strategy, design and engineering to drive digital innovation. For over 30 years, our clients have trusted our autonomous teams to build solutions that look past the obvious. Here, computer science grads come together with seasoned technologists, self-taught developers, midlife career changers and more to learn from and challenge each other. Career journeys flourish with the strength of our cultivation culture, which has won numerous awards around the world.
Join Thoughtworks and thrive. Together, our extra curiosity, innovation, passion and dedication overcomes ordinary.
Lead/Senior Big Data Engineer
Experience: 6 - 10 Years
Digit88 empowers digital transformation for innovative and high growth B2B and B2C SaaS companies as their trusted offshore software product engineering partner!
We are a lean mid-stage software company, with a team of 75+ fantastic technologists, backed by executives with deep understanding of and extensive experience in consumer and enterprise product development across large corporations and startups. We build highly efficient and effective engineering teams that solve real and complex problems for our partners.
With more than 50+ years of collective experience in areas ranging from B2B and B2C SaaS, web and mobile apps, e-commerce platforms and solutions, custom enterprise SaaS platforms and domains spread across Conversational AI, Chatbots, IoT, Health-tech, ESG/Energy Analytics, Data Engineering, the founding team thrives in a fast paced and challenging environment that allows us to showcase our best.
The Vision: To be the most trusted technology partner to innovative software product companies world-wide
Digit88 development team is establishing a new offshore product development team for its partner , that is building next-generation Big Data, Cloud-Based Business Operation Support technology for utilities, retail energy suppliers and Community Choice Aggregators (CCA). The candidate would be joining an existing team of outstanding data engineers in the US and help us expand the data engineering team and work on different products and on different layers of the infrastructure.
Digit88 is looking for a Big Data Engineer who will work on building, and managing Big Data Pipelines for us to deal with the huge structured data sets that we use as an input to accurately generate analytics at scale for our valued Customers. The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining,implementing, and monitoring them. You will also be responsible for integrating them with the architecture used across the company. Applicants must have a passion for engineering with accuracy and efficiency, be highly motivated and organized, able to work as part of a team, and also possess the ability to work independently with minimal supervision.
To be successful in this role, you should possess
- Collaborate closely with Product Management and Engineering leadership to devise and build the right solution.
- Participate in Design discussions and brainstorming sessions to select, integrate, and maintain Big Data tools and frameworks required to solve Big Data problems at scale.
- Design and implement systems to cleanse, process, and analyze large data sets using distributed processing tools like Akka and Spark.
- Understanding and critically reviewing existing data pipelines, and coming up with ideas in collaboration with Technical Leaders and Architects to improve upon current bottlenecks
- Take initiatives, and show the drive to pick up new stuff proactively, and work as a Senior Individual contributor on the multiple products and features we have.
- 7+ years of experience in developing highly scalable Big Data pipelines.
- In-depth understanding of the Big Data ecosystem including processing frameworks like Spark, Akka, Storm, and Hadoop, and the file types they deal with.
- Experience with ETL and Data pipeline tools like Apache NiFi, Airflow etc.
- Excellent coding skills in Java or Scala, including the understanding to apply appropriate Design Patterns when required.
- Experience with Git and build tools like Gradle/Maven/SBT.
- Strong understanding of object-oriented design, data structures, algorithms, profiling, and optimization.
- Have elegant, readable, maintainable and extensible code style.
You are someone who would easily be able to
- Work closely with the US and India engineering teams to help build the Java/Scala based data pipelines
- Lead the India engineering team in technical excellence and ownership of critical modules; own the development of new modules and features
- Troubleshoot live production server issues.
- Handle client coordination and be able to work as a part of a team, be able to contribute independently and drive the team to exceptional contributions with minimal team supervision
- Follow Agile methodology, JIRA for work planning, issue management/tracking.
Additional Project/Soft Skills:
- Should be able to work independently with India & US based team members.
- Strong verbal and written communication with ability to articulate problems and solutions over phone and emails.
- Strong sense of urgency, with a passion for accuracy and timeliness.
- Ability to work calmly in high pressure situations and manage multiple projects/tasks.
- Ability to work independently and possess superior skills in issue resolution.
- Should have the passion to learn and implement, analyse and troubleshoot issues
We are looking for computer science/engineering final year students/ fresh graduates that have solid understanding of computer science fundamentals (algorithms, data structures, object oriented programming) and strong java. programming skills. You will get to work on machine learning algorithms as applied to online advertising or do data analytics. You will learn how to collaborate in small, agile teams, do rapid development, testing and get to taste the invigorating feel of a start-up company.
-Solid foundation in computer science, with strong competencies in data structures, algorithms, and software design
-Java / Python programming
-MYSQL, Relational Databases
-MVC Framework, ReactJS
-Familiarity with online advertising, web technologies
-Familiarity with Hadoop, Spark, Scala
UG - B.Tech/B.E. - Computers; PG - M.Tech - Computers
- Data Engineering spectrum (Java/Spark)
- Spark Scala / Kafka Streaming
- Confluent Kafka components
- Basic understanding of Hadoop
Telstra is Australia’s leading telecommunications and technology company, with operations in more than 20 countries, including In India where we’re building a new Innovation and Capability Centre (ICC) in Bangalore.
We’re growing, fast, and for you that means many exciting opportunities to develop your career at Telstra. Join us on this exciting journey, and together, we’ll reimagine the future.
- We're an iconic Australian company with a rich heritage that's been built over 100 years. Telstra is Australia's leading Telecommunications and Technology Company. We've been operating internationally for more than 70 years.
- International presence spanning over 20 countries.
- We are one of the 20 largest telecommunications providers globally
- At Telstra, the work is complex and stimulating, but with that comes a great sense of achievement. We are shaping the tomorrow's modes of communication with our innovation driven teams.
Telstra offers an opportunity to make a difference to lives of millions of people by providing the choice of flexibility in work and a rewarding career that you will be proud of!
About the team
Being part of Networks & IT means you'll be part of a team that focuses on extending our network superiority to enable the continued execution of our digital strategy.
With us, you'll be working with world-leading technology and change the way we do IT to ensure business needs drive priorities, accelerating our digitisation programme.
Focus of the role
Any new engineer who comes into data chapter would be mostly into developing reusable data processing and storage frameworks that can be used across data platform.
To be successful in the role, you'll bring skills and experience in:-
- Hands-on experience in Spark Core, Spark SQL, SQL/Hive/Impala, Git/SVN/Any other VCS and Data warehousing
- Skilled in the Hadoop Ecosystem(HDP/Cloudera/MapR/EMR etc)
- Azure data factory/Airflow/control-M/Luigi
- Exposure to NOSQL(Hbase/Cassandra/GraphDB(Neo4J)/MongoDB)
- File formats (Parquet/ORC/AVRO/Delta/Hudi etc.)
Experience and knowledgeable on the following:
- Spark Streaming
- Cloud exposure (Azure/AWS/GCP)
- Azure data offerings - ADF, ADLS2, Azure Databricks, Azure Synapse, Eventhubs, CosmosDB etc.
- Azure DevOps
- Jenkins/ Bamboo/Any similar build tools
- Power BI
- Prior experience in building or working in team building reusable frameworks,
- Data modelling.
- Data Architecture and design principles. (Delta/Kappa/Lambda architecture)
- Exposure to CI/CD
- Code Quality - Static and Dynamic code scans
- Agile SDLC
If you've got a passion to innovate, succeed as part of a great team, and looking for the next step in your career, we'd welcome you to apply!
We’re committed to building a diverse and inclusive workforce in all its forms. We encourage applicants from diverse gender, cultural and linguistic backgrounds and applicants who may be living with a disability. We also offer flexibility in all our roles, to ensure everyone can participate.
To learn more about how we support our people, including accessibility adjustments we can provide you through the recruitment process, visit tel.st/thrive.
good exposure to concepts and/or technology across the broader spectrum. Enterprise Risk Technology
covers a variety of existing systems and green-field projects.
A Full stack Hadoop development experience with Scala development
A Full stack Java development experience covering Core Java (including JDK 1.8) and good understanding
of design patterns.
• Strong hands-on development in Java technologies.
• Strong hands-on development in Hadoop technologies like Spark, Scala and experience on Avro.
• Participation in product feature design and documentation
• Requirement break-up, ownership and implantation.
• Product BAU deliveries and Level 3 production defects fixes.
Qualifications & Experience
• Degree holder in numerate subject
• Hands on Experience on Hadoop, Spark, Scala, Impala, Avro and messaging like Kafka
• Experience across a core compiled language – Java
• Proficiency in Java related frameworks like Springs, Hibernate, JPA
• Hands on experience in JDK 1.8 and strong skillset covering Collections, Multithreading with
For internal use only
For internal use only
experience working on Distributed applications.
• Strong hands-on development track record with end-to-end development cycle involvement
• Good exposure to computational concepts
• Good communication and interpersonal skills
• Working knowledge of risk and derivatives pricing (optional)
• Proficiency in SQL (PL/SQL), data modelling.
• Understanding of Hadoop architecture and Scala program language is a good to have.
XpressBees – a logistics company started in 2015 – is amongst the fastest growing
companies of its sector. While we started off rather humbly in the space of
ecommerce B2C logistics, the last 5 years have seen us steadily progress towards
expanding our presence. Our vision to evolve into a strong full-service logistics
organization reflects itself in our new lines of business like 3PL, B2B Xpress and cross
border operations. Our strong domain expertise and constant focus on meaningful
innovation have helped us rapidly evolve as the most trusted logistics partner of
India. We have progressively carved our way towards best-in-class technology
platforms, an extensive network reach, and a seamless last mile management
system. While on this aggressive growth path, we seek to become the one-stop-shop
for end-to-end logistics solutions. Our big focus areas for the very near future
include strengthening our presence as service providers of choice and leveraging the
power of technology to improve efficiencies for our clients.
As a Lead Data Engineer in the Data Platform Team at XpressBees, you will build the data platform
and infrastructure to support high quality and agile decision-making in our supply chain and logistics
You will define the way we collect and operationalize data (structured / unstructured), and
build production pipelines for our machine learning models, and (RT, NRT, Batch) reporting &
dashboarding requirements. As a Senior Data Engineer in the XB Data Platform Team, you will use
your experience with modern cloud and data frameworks to build products (with storage and serving
that drive optimisation and resilience in the supply chain via data visibility, intelligent decision making,
insights, anomaly detection and prediction.
What You Will Do
• Design and develop data platform and data pipelines for reporting, dashboarding and
machine learning models. These pipelines would productionize machine learning models
and integrate with agent review tools.
• Meet the data completeness, correction and freshness requirements.
• Evaluate and identify the data store and data streaming technology choices.
• Lead the design of the logical model and implement the physical model to support
business needs. Come up with logical and physical database design across platforms (MPP,
MR, Hive/PIG) which are optimal physical designs for different use cases (structured/semi
structured). Envision & implement the optimal data modelling, physical design,
performance optimization technique/approach required for the problem.
• Support your colleagues by reviewing code and designs.
• Diagnose and solve issues in our existing data pipelines and envision and build their
Qualifications & Experience relevant for the role
• A bachelor's degree in Computer Science or related field with 6 to 9 years of technology
• Knowledge of Relational and NoSQL data stores, stream processing and micro-batching to
make technology & design choices.
• Strong experience in System Integration, Application Development, ETL, Data-Platform
projects. Talented across technologies used in the enterprise space.
• Software development experience using:
• Expertise in relational and dimensional modelling
• Exposure across all the SDLC process
• Experience in cloud architecture (AWS)
• Proven track record in keeping existing technical skills and developing new ones, so that
you can make strong contributions to deep architecture discussions around systems and
applications in the cloud ( AWS).
• Characteristics of a forward thinker and self-starter that flourishes with new challenges
and adapts quickly to learning new knowledge
• Ability to work with a cross functional teams of consulting professionals across multiple
• Knack for helping an organization to understand application architectures and integration
approaches, to architect advanced cloud-based solutions, and to help launch the build-out
of those systems
• Passion for educating, training, designing, and building end-to-end systems.
- Minimum 3 yrs of exp in Informatica Big data Developer(BDM) in Hadoop environment.
- Have knowledge of informatica Power exchange (PWX).
- Minimum 3 yrs of exp in big data querying tool like Hive and Impala.
- Ability to designing/development of complex mappings using informatica Big data Developer.
- Create and manage Informatica power exchange and CDC real time implementation
- Strong Unix knowledge skills for writing shell scripts and troubleshoot of existing scripts.
- Good knowledge of big data platforms and its framework.
- Good to have an experience in cloudera data platform (CDP)
- Experience with building stream processing systems using Kafka and spark
- Excellent SQL knowledge
Soft skills :
- Ability to work independently
- Strong analytical and problem solving skills
- Attitude of learning new technology
- Regular interaction with vendors, partners and stakeholders
We are hiring for Tier 1 MNC for the software developer with good knowledge in Spark,Hadoop and Scala
• Project Planning and Management
o Take end-to-end ownership of multiple projects / project tracks
o Create and maintain project plans and other related documentation for project
objectives, scope, schedule and delivery milestones
o Lead and participate across all the phases of software engineering, right from
requirements gathering to GO LIVE
o Lead internal team meetings on solution architecture, effort estimation, manpower
planning and resource (software/hardware/licensing) planning
o Manage RIDA (Risks, Impediments, Dependencies, Assumptions) for projects by
developing effective mitigation plans
• Team Management
o Act as the Scrum Master
o Conduct SCRUM ceremonies like Sprint Planning, Daily Standup, Sprint Retrospective
o Set clear objectives for the project and roles/responsibilities for each team member
o Train and mentor the team on their job responsibilities and SCRUM principles
o Make the team accountable for their tasks and help the team in achieving them
o Identify the requirements and come up with a plan for Skill Development for all team
o Be the Single Point of Contact for the client in terms of day-to-day communication
o Periodically communicate project status to all the stakeholders (internal/external)
• Process Management and Improvement
o Create and document processes across all disciplines of software engineering
o Identify gaps and continuously improve processes within the team
o Encourage team members to contribute towards process improvement
o Develop a culture of quality and efficiency within the team
• Minimum 08 years of experience (hands-on as well as leadership) in software / data engineering
across multiple job functions like Business Analysis, Development, Solutioning, QA, DevOps and
• Hands-on as well as leadership experience in Big Data Engineering projects
• Experience developing or managing cloud solutions using Azure or other cloud provider
• Demonstrable knowledge on Hadoop, Hive, Spark, NoSQL DBs, SQL, Data Warehousing, ETL/ELT,
• Strong project management and communication skills
• Strong analytical and problem-solving skills
• Strong systems level critical thinking skills
• Strong collaboration and influencing skills
Good to have:
• Knowledge on PySpark, Azure Data Factory, Azure Data Lake Storage, Synapse Dedicated SQL
Pool, Databricks, PowerBI, Machine Learning, Cloud Infrastructure
• Background in BFSI with focus on core banking
• Willingness to travel
• Customer Office (Mumbai) / Remote Work
• UG: B. Tech - Computers / B. E. – Computers / BCA / B.Sc. Computer Science
4-5 years of overall experience in software development.
- Experience on Hadoop (Apache/Cloudera/Hortonworks) and/or other Map Reduce Platforms
- Experience on Hive, Pig, Sqoop, Flume and/or Mahout
- Experience on NO-SQL – HBase, Cassandra, MongoDB
- Hands on experience with Spark development, Knowledge of Storm, Kafka, Scala
- Good knowledge of Java
- Good background of Configuration Management/Ticketing systems like Maven/Ant/JIRA etc.
- Knowledge around any Data Integration and/or EDW tools is plus
- Good to have knowledge of using Python/Perl/Shell
Please note - Hbase hive and spark are must.
Key Responsibilities : ( Data Developer Python, Spark)
Exp : 2 to 9 Yrs
Development of data platforms, integration frameworks, processes, and code.
Develop and deliver APIs in Python or Scala for Business Intelligence applications build using a range of web languages
Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.
Elaborate stories in a collaborative agile environment (SCRUM or Kanban)
Familiarity with cloud platforms like GCP, AWS or Azure.
Experience with large data volumes.
Familiarity with writing rest-based services.
Experience with distributed processing and systems
Experience with Hadoop / Spark toolsets
Experience with relational database management systems (RDBMS)
Experience with Data Flow development
Knowledge of Agile and associated development techniques including:
- Design, implement and support an analytical data infrastructure, providing ad hoc access to large data sets and computing power.
- Contribute to development of standards and the design and implementation of proactive processes to collect and report data and statistics on assigned systems.
- Research opportunities for data acquisition and new uses for existing data.
- Provide technical development expertise for designing, coding, testing, debugging, documenting and supporting data solutions.
- Experience building data pipelines to connect analytics stacks, client data visualization tools and external data sources.
- Experience with cloud and distributed systems principles
- Experience with Azure/AWS/GCP cloud infrastructure
- Experience with Databricks Clusters and Configuration
- Experience with Python, R, sh/bash and JVM-based languages including Scala and Java.
- Experience with Hadoop family languages including Pig and Hive.
- 6+ years of recent hands-on Java development
- Developing data pipelines in AWS or Google Cloud
- Great understanding of designing for performance, scalability, and reliability of data intensive application
- Hadoop MapReduce, Spark, Pig. Understanding of database fundamentals and advanced SQL knowledge.
- In-depth understanding of object oriented programming concepts and design patterns
- Ability to communicate clearly to technical and non-technical audiences, verbally and in writing
- Understanding of full software development life cycle, agile development and continuous integration
- Experience in Agile methodologies including Scrum and Kanban
Job Description for :
Role: Data/Integration Architect
Experience – 8-10 Years
Notice Period: Under 30 days
Key Responsibilities: Designing, Developing frameworks for batch and real time jobs on Talend. Leading migration of these jobs from Mulesoft to Talend, maintaining best practices for the team, conducting code reviews and demos.
Talend Data Fabric - Application, API Integration, Data Integration. Knowledge on Talend Management Cloud, deployment and scheduling of jobs using TMC or Autosys.
Programming Languages - Python/Java
Databases: SQL Server, Other Databases, Hadoop
Should have worked on Agile
Sound communication skills
Should be open to learning new technologies based on business needs on the job
Awareness of other data/integration platforms like Mulesoft, Camel
Awareness Hadoop, Snowflake, S3
We are looking for a skilled Senior/Lead Bigdata Engineer to join our team. The role is part of the research and development team, where you with enthusiasm and knowledge are going to be our technical evangelist for the development of our inspection technology and products.
At Elop we are developing product lines for sustainable infrastructure management using our own patented technology for ultrasound scanners and combine this with other sources to see holistic overview of the concrete structure. At Elop we will provide you with world-class colleagues highly motivated to position the company as an international standard of structural health monitoring. With the right character you will be professionally challenged and developed.
This position requires travel to Norway.
Elop is sister company of Simplifai and co-located together in all geographic locations.
Roles and Responsibilities
- Define technical scope and objectives through research and participation in requirements gathering and definition of processes
- Ingest and Process data from data sources (Elop Scanner) in raw format into Big Data ecosystem
- Realtime data feed processing using Big Data ecosystem
- Design, review, implement and optimize data transformation processes in Big Data ecosystem
- Test and prototype new data integration/processing tools, techniques and methodologies
- Conversion of MATLAB code into Python/C/C++.
- Participate in overall test planning for the application integrations, functional areas and projects.
- Work with cross functional teams in an Agile/Scrum environment to ensure a quality product is delivered.
Desired Candidate Profile
- Bachelor's degree in Statistics, Computer or equivalent
- 7+ years of experience in Big Data ecosystem, especially Spark, Kafka, Hadoop, HBase.
- 7+ years of hands-on experience in Python/Scala is a must.
- Experience in architecting the big data application is needed.
- Excellent analytical and problem solving skills
- Strong understanding of data analytics and data visualization, and must be able to help development team with visualization of data.
- Experience with signal processing is plus.
- Experience in working on client server architecture is plus.
- Knowledge about database technologies like RDBMS, Graph DB, Document DB, Apache Cassandra, OpenTSDB
- Good communication skills, written and oral, in English
We can Offer
- An everyday life with exciting and challenging tasks with the development of socially beneficial solutions
- Be a part of companys research and Development team to create unique and innovative products
- Colleagues with world-class expertise, and an organization that has ambitions and is highly motivated to position the company as an international player in maintenance support and monitoring of critical infrastructure!
- Good working environment with skilled and committed colleagues an organization with short decision paths.
- Professional challenges and development
Advanced degree in computer science, math, statistics or a related discipline ( Must have master degree )
Extensive data modeling and data architecture skills
Programming experience in Python, R
Background in machine learning frameworks such as TensorFlow or Keras
Knowledge of Hadoop or another distributed computing systems
Experience working in an Agile environment
Advanced math skills (Linear algebra
Differential equations (ODEs and numerical)
Theory of statistics 1
Numerical analysis 1 (numerical linear algebra) and 2 (quadrature)
Intermediate analysis (point set topology)) ( important )
Strong written and verbal communications
Hands on experience on NLP and NLG
Experience in advanced statistical techniques and concepts. ( GLM/regression, Random forest, boosting, trees, text mining ) and experience with application.
Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)
Primary Location : India-Pune, Hyderabad
Experience : 7 - 12 Years
Management Level: 7
Joining Time: Immediate Joiners are preferred
- Attend requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Experience of Solution Design and Solution Architecture for the data engineer model to build and implement Big Data Projects on-premises and on cloud.
- Align architecture with business requirements and stabilizing the developed solution
- Ability to build prototypes to demonstrate the technical feasibility of your vision
- Professional experience facilitating and leading solution design, architecture and delivery planning activities for data intensive and high throughput platforms and applications
- To be able to benchmark systems, analyses system bottlenecks and propose solutions to eliminate them
- Able to help programmers and project managers in the design, planning and governance of implementing projects of any kind.
- Develop, construct, test and maintain architectures and run Sprints for development and rollout of functionalities
- Data Analysis, Code development experience, ideally in Big Data Spark, Hive, Hadoop, Java, Python, PySpark,
- Execute projects of various types i.e. Design, development, Implementation and migration of functional analytics Models/Business logic across architecture approaches
- Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions of the product
- Deployment sophisticated analytics program of code using any of cloud application.
Perks and Benefits we Provide!
- Working with Highly Technical and Passionate, mission-driven people
- Subsidized Meals & Snacks
- Flexible Schedule
- Approachable leadership
- Access to various learning tools and programs
- Pet Friendly
- Certification Reimbursement Policy
- Check out more about us on our website below!
Our Kafka developer has a combination of technical skills, communication skills and business knowledge. The developer should be able to work on multiple medium to large projects. The successful candidate will have excellent technical skills of Apache/Confluent Kafka, Enterprise Data WareHouse preferable GCP BigQuery or any equivalent Cloud EDW and also will be able to take oral and written business requirements and develop efficient code to meet set deliverables.
Must Have Skills
- Participate in the development, enhancement and maintenance of data applications both as an individual contributor and as a lead.
- Leading in the identification, isolation, resolution and communication of problems within the production environment.
- Leading developer and applying technical skills Apache/Confluent Kafka (Preferred) AWS Kinesis (Optional), Cloud Enterprise Data Warehouse Google BigQuery (Preferred) or AWS RedShift or SnowFlakes (Optional)
- Design recommending best approach suited for data movement from different sources to Cloud EDW using Apache/Confluent Kafka
- Performs independent functional and technical analysis for major projects supporting several corporate initiatives.
- Communicate and Work with IT partners and user community with various levels from Sr Management to detailed developer to business SME for project definition .
- Works on multiple platforms and multiple projects concurrently.
- Performs code and unit testing for complex scope modules, and projects
- Provide expertise and hands on experience working on Kafka connect using schema registry in a very high volume environment (~900 Million messages)
- Provide expertise in Kafka brokers, zookeepers, KSQL, KStream and Kafka Control center.
- Provide expertise and hands on experience working on AvroConverters, JsonConverters, and StringConverters.
- Provide expertise and hands on experience working on Kafka connectors such as MQ connectors, Elastic Search connectors, JDBC connectors, File stream connector, JMS source connectors, Tasks, Workers, converters, Transforms.
- Provide expertise and hands on experience on custom connectors using the Kafka core concepts and API.
- Working knowledge on Kafka Rest proxy.
- Ensure optimum performance, high availability and stability of solutions.
- Create topics, setup redundancy cluster, deploy monitoring tools, alerts and has good knowledge of best practices.
- Create stubs for producers, consumers and consumer groups for helping onboard applications from different languages/platforms. Leverage Hadoop ecosystem knowledge to design, and develop capabilities to deliver our solutions using Spark, Scala, Python, Hive, Kafka and other things in the Hadoop ecosystem.
- Use automation tools like provisioning using Jenkins, Udeploy or relevant technologies
- Ability to perform data related benchmarking, performance analysis and tuning.
- Strong skills in In-memory applications, Database Design, Data Integration.
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.
Experience : 7+ years
Location : Pune / Hyderabad
- Drive and participate in requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Participate and contribute in Solution Design and Solution Architecture for implementing Big Data Projects on-premise and on cloud
- Technical Hands on experience in design, coding, development and managing Large Hadoop implementation
- Proficient in SQL, Hive, PIG, Spark SQL, Shell Scripting, Kafka, Flume, Scoop with large Big Data and Data Warehousing projects with either Java, Python or Scala based Hadoop programming background
- Proficient with various development methodologies like waterfall, agile/scrum and iterative
- Good Interpersonal skills and excellent communication skills for US and UK based clients
A global Leader in the Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their Data/Workload/ETL/Analytics to the Cloud by leveraging Automation.
We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, Greenplum along with ETLs like Informatica, Datastage, AbInitio & others, to cloud-based data warehousing with other capabilities in data engineering, advanced analytics solutions, data management, data lake and cloud optimization.
Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.
We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican - Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.
Why join us!
Datametica is a place to innovate, bring new ideas to live and learn new things. We believe in building a culture of innovation, growth and belonging. Our people and their dedication over these years are the key factors in achieving our success.
Benefits we Provide!
Working with Highly Technical and Passionate, mission-driven people
Subsidized Meals & Snacks
Access to various learning tools and programs
Certification Reimbursement Policy
Check out more about us on our website below!
- Core Java: advanced level competency, should have worked on projects with core Java development.
- Linux shell : advanced level competency, work experience with Linux shell scripting, knowledge and experience to use important shell commands
- Rdbms, SQL: advanced level competency, Should have expertise in SQL query language syntax, should be well versed with aggregations, joins of SQL query language.
- Data structures and problem solving: should have ability to use appropriate data structure.
- AWS cloud : Good to have experience with aws serverless toolset along with aws infra
- Data Engineering ecosystem : Good to have experience and knowledge of data engineering, ETL, data warehouse (any toolset)
- Hadoop, HDFS, YARN : Should have introduction to internal working of these toolsets
- HIVE, MapReduce, Spark: Good to have experience developing transformations using hive queries, MapReduce job implementation and Spark Job Implementation. Spark implementation in Scala will be plus point.
- Airflow, Oozie, Sqoop, Zookeeper, Kafka: Good to have knowledge about purpose and working of these technology toolsets. Working experience will be a plus point here.
Datametica is looking for talented SQL engineers who would get training & the opportunity to work on Cloud and Big Data Analytics.
- Strong in SQL development
- Hands-on at least one scripting language - preferably shell scripting
- Development experience in Data warehouse projects
- Selected candidates will be provided training opportunities on one or more of the following: Google Cloud, AWS, DevOps Tools, Big Data technologies like Hadoop, Pig, Hive, Spark, Sqoop, Flume, and KafkaWould get a chance to be part of the enterprise-grade implementation of Cloud and Big Data systems
- Will play an active role in setting up the Modern data platform based on Cloud and Big Data
- Would be part of teams with rich experience in various aspects of distributed systems and computing
ears of Exp: 3-6+ Years
Skills: Scala, Python, Hive, Airflow, Spark
Languages: Java, Python, Shell Scripting
GCP: BigTable, DataProc, BigQuery, GCS, Pubsub
AWS: Athena, Glue, EMR, S3, Redshift
MongoDB, MySQL, Kafka
Platforms: Cloudera / Hortonworks
AdTech domain experience is a plus.
Job Type - Full Time
- 9 years and above of total experience preferably in bigdata space.
- Creating spark applications using Scala to process data.
- Experience in scheduling and troubleshooting/debugging Spark jobs in steps.
- Experience in spark job performance tuning and optimizations.
- Should have experience in processing data using Kafka/Pyhton.
- Individual should have experience and understanding in configuring Kafka topics to optimize the performance.
- Should be proficient in writing SQL queries to process data in Data Warehouse.
- Hands on experience in working with Linux commands to troubleshoot/debug issues and creating shell scripts to automate tasks.
- Experience on AWS services like EMR.
- Working knowledge and hands-on experience of Big Data / Hadoop tools and technologies.
- Experience of working in Pig, Hive, Flume, Sqoop, Kafka etc.
- Database development experience with a solid understanding of core database concepts, relational database design, ODS & DWH.
- Expert level knowledge of SQL and scripting preferably UNIX shell scripting, Perl scripting.
- Working knowledge of Data integration solution and well-versed with any ETL tool (Informatica / Datastage / Abinitio/Pentaho etc).
- Strong problem solving and logical reasoning ability.
- Excellent understanding of all aspects of the Software Development Lifecycle.
- Excellent written and verbal communication skills.
- Experience in Java will be an added advantage
- Knowledge of object oriented programming concepts
- Exposure to ISMS policies and procedures.
We are looking for a Developer/Senior Developers to be a part of building advanced analytical platform leveraging Big Data technologies and transform the legacy systems. This role is an exciting, fast-paced, constantly changing and challenging work environment, and will play an important role in resolving and influencing high-level decisions.
- The candidate must be a self-starter, who can work under general guidelines in a fast-spaced environment.
- Overall minimum of 4 to 8 year of software development experience and 2 years in Data Warehousing domain knowledge
- Must have 3 years of hands-on working knowledge on Big Data technologies such as Hadoop, Hive, Hbase, Spark, Kafka, Spark Streaming, SCALA etc…
- Excellent knowledge in SQL & Linux Shell scripting
- Bachelors/Master’s/Engineering Degree from a well-reputed university.
- Strong communication, Interpersonal, Learning and organizing skills matched with the ability to manage stress, Time, and People effectively
- Proven experience in co-ordination of many dependencies and multiple demanding stakeholders in a complex, large-scale deployment environment
- Ability to manage a diverse and challenging stakeholder community
- Diverse knowledge and experience of working on Agile Deliveries and Scrum teams.
- Should works as a senior developer/individual contributor based on situations
- Should be part of SCRUM discussions and to take requirements
- Adhere to SCRUM timeline and deliver accordingly
- Participate in a team environment for the design, development and implementation
- Should take L3 activities on need basis
- Prepare Unit/SIT/UAT testcase and log the results
- Co-ordinate SIT and UAT Testing. Take feedbacks and provide necessary remediation/recommendation in time.
- Quality delivery and automation should be a top priority
- Co-ordinate change and deployment in time
- Should create healthy harmony within the team
- Owns interaction points with members of core team (e.g.BA team, Testing and business team) and any other relevant stakeholders
Mid / Senior Big Data Engineer
Role: Big Data EngineerNumber of open positions: 5Location: PuneAt Clairvoyant, we're building a thriving big data practice to help enterprises enable and accelerate the adoption of Big data and cloud services. In the big data space, we lead and serve as innovators, troubleshooters, and enablers. Big data practice at Clairvoyant, focuses on solving our customer's business problems by delivering products designed with best in class engineering practices and a commitment to keep the total cost of ownership to a minimum.
- 4-10 years of experience in software development.
- At least 2 years of relevant work experience on large scale Data applications.
- Strong coding experience in Java is mandatory
- Good aptitude, strong problem solving abilities, and analytical skills, ability to take ownership as appropriate
- Should be able to do coding, debugging, performance tuning and deploying the apps to Prod.
- Should have good working experience on
- o Hadoop ecosystem (HDFS, Hive, Yarn, File formats like Avro/Parquet)
- o Kafka
- o J2EE Frameworks (Spring/Hibernate/REST)
- o Spark Streaming or any other streaming technology.
- Strong coding experience in Java is mandatory
- Ability to work on the sprint stories to completion along with Unit test case coverage.
- Experience working in Agile Methodology
- Excellent communication and coordination skills
- Knowledgeable (and preferred hands on) - UNIX environments, different continuous integration tools.
- Must be able to integrate quickly into the team and work independently towards team goals
- Take the complete responsibility of the sprint stories' execution
- Be accountable for the delivery of the tasks in the defined timelines with good quality.
- Follow the processes for project execution and delivery.
- Follow agile methodology
- Work with the team lead closely and contribute to the smooth delivery of the project.
- Understand/define the architecture and discuss the pros-cons of the same with the team
- Involve in the brainstorming sessions and suggest improvements in the architecture/design.
- Work with other team leads to get the architecture/design reviewed.
- Work with the clients and counter-parts (in US) of the project.
- Keep all the stakeholders updated about the project/task status/risks/issues if there are any.
Experience: 4 to 9 years
Keywords: java, scala, spark, software development, hadoop, hive
- You will be responsible for design, development and testing of Products
- Contributing in all phases of the development lifecycle
- Writing well designed, testable, efficient code
- Ensure designs are in compliance with specifications
- Prepare and produce releases of software components
- Support continuous improvement by investigating alternatives and technologies and presenting these for architectural review
- Some of the technologies you will be working on: Core Java, Solr, Hadoop, Spark, Elastic search, Clustering, Text Mining, NLP, Mahout and Lucene etc.
This will include:
The verticals included are: