Cognologix Technologies

Data Engineer - Java/Scala/Spark
Posted by Priyal Wagh
4 - 9 yrs
₹10L - ₹30L / yr
Remote, Pune
Skills
PySpark
Data engineering
Big Data
Hadoop
Spark
Apache Kafka

You will work on: 

 

We help many of our clients make sense of their large investments in data – be it building analytics solutions or machine learning applications. You will work on cutting-edge cloud-native technologies to crunch terabytes of data into meaningful insights. 

 

What you will do (Responsibilities):

 

Collaborate with product management & engineering to build highly efficient data pipelines. 

You will be responsible for:

 

  • Dealing with large customer data and building highly efficient pipelines (see the sketch below)
  • Building insights dashboards
  • Troubleshooting data loss, data inconsistency, and other data-related issues
  • Working in a product development environment, delivering stories in a scaled Agile methodology
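
A minimal PySpark sketch of the kind of pipeline this role involves, assuming hypothetical S3 paths and column names (customer_id): it deduplicates customer records and reports how many rows were dropped, the sort of check that helps when troubleshooting data loss or inconsistency.

```python
# Hedged sketch: paths and columns (customer_id) are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customer-pipeline").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/customers/")  # hypothetical input

cleaned = (
    raw.dropDuplicates(["customer_id"])            # collapse duplicate customers
       .filter(F.col("customer_id").isNotNull())   # drop records missing the key
)

# Counting rows lost in cleaning gives a first signal when investigating
# data loss or inconsistency reports.
print(f"Dropped {raw.count() - cleaned.count()} duplicate or invalid rows")

cleaned.write.mode("overwrite").parquet("s3://example-bucket/clean/customers/")
```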

 

What you bring (Skills):

 

5+ years of experience in hands-on data engineering & large-scale distributed applications

 

  • Extensive experience in object-oriented programming languages such as Java or Scala
  • Extensive experience in RDBMS such as MySQL, Oracle, SQLServer, etc.
  • Experience in languages with strong functional programming support, such as JavaScript, Scala, or Python
  • Experience in developing and deploying applications in Linux OS
  • Experience in big data processing technologies such as Hadoop, Spark, Kafka, Databricks, etc.
  • Experience in Cloud-based services such as Amazon AWS, Microsoft Azure, or Google Cloud Platform
  • Experience with Scrum and/or other Agile development processes
  • Strong analytical and problem-solving skills

 

Great if you know (Skills):

 

  • Some exposure to containerization technologies such as Docker, Kubernetes, or Amazon ECS/EKS
  • Some exposure to microservices frameworks such as Spring Boot, Eclipse Vert.x, etc.
  • Some exposure to NoSQL data stores such as Couchbase, Solr, etc.
  • Some exposure to Perl or shell scripting
  • Ability to lead R&D and POC efforts
  • Ability to learn new technologies on their own
  • Team player with self-drive to work independently
  • Strong communication and interpersonal skills

Advantage Cognologix:

  •  A higher degree of autonomy, startup culture & small teams
  •  Opportunities to become an expert in emerging technologies
  •  Remote working options for the right maturity level
  •  Competitive salary & family benefits
  •  Performance-based career advancement


About Cognologix: 

 

Cognologix helps companies disrupt by reimagining their business models and innovating like a startup. We are at the forefront of digital disruption and take a business-first approach to help meet our clients' strategic goals.

We are a data-focused organization helping our clients deliver their next generation of products in the most efficient, modern, and cloud-native way.

Benefits Working With Us:

  • Health & Wellbeing
  • Learn & Grow
  • Evangelize
  • Celebrate Achievements
  • Financial Wellbeing
  • Medical and Accidental cover
  • Flexible Working Hours
  • Sports Club & much more

About Cognologix Technologies

Founded: 2016
Stage: Bootstrapped
About
Cognologix is a technology and business development firm with a focus on emerging decentralized business models and innovative technologies related to Blockchain, Machine Learning, Conversational Bots, Big Data and Search. We help enterprises – both large and medium – disrupt by re-imagining their business models and innovating like a start-up. The Cognologix team excels at the ideation, architecture, prototyping and development of cutting-edge products.
Connect with the team: Omkar Kale, Ravishankar Munukutla, Payal Jain, Vaibhav Natu, Rupa Kadam, Rakesh Zingade, Roopali Kadam, Maheshwar Reddy, Chetan Chaudhari, Priyal Wagh, Ajay Bhosale, Muthupandy Arunagiri, Priya Barde
Company social profiles: LinkedIn

Similar jobs

India's best Short Video App
Bengaluru (Bangalore)
4 - 12 yrs
₹25L - ₹50L / yr
Data engineering
Big Data
Spark
Apache Kafka
Apache Hive
+26 more
What Makes You a Great Fit for The Role?

You’re awesome at and will be responsible for
 
Extensive programming experience with cross-platform development of one of the following Java/SpringBoot, Javascript/Node.js, Express.js or Python
3-4 years of experience in big data analytics technologies like Storm, Spark/Spark streaming, Flink, AWS Kinesis, Kafka streaming, Hive, Druid, Presto, Elasticsearch, Airflow, etc.
3-4 years of experience in building high performance RPC services using different high performance paradigms: multi-threading, multi-processing, asynchronous programming (nonblocking IO), reactive programming,
3-4 years of experience working high throughput low latency databases and cache layers like MongoDB, Hbase, Cassandra, DynamoDB,, Elasticache ( Redis + Memcache )
Experience with designing and building high scale app backends and micro-services leveraging cloud native services on AWS like proxies, caches, CDNs, messaging systems, Serverless compute(e.g. lambda), monitoring and telemetry.
Strong understanding of distributed systems fundamentals around scalability, elasticity, availability, fault-tolerance.
Experience in analysing and improving the efficiency, scalability, and stability of distributed systems and backend micro services.
5-7 years of strong design/development experience in building massively large scale, high throughput low latency distributed internet systems and products.
Good experience in working with Hadoop and Big Data technologies like HDFS, Pig, Hive, Storm, HBase, Scribe, Zookeeper and NoSQL systems etc.
Agile methodologies, Sprint management, Roadmap, Mentoring, Documenting, Software architecture.
Liaison with Product Management, DevOps, QA, Client and other teams
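
For the streaming items above, a minimal Spark Structured Streaming sketch that consumes from Kafka and counts events per minute; the broker address, topic name, and console sink are hypothetical placeholders, not the company's actual setup.

```python
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
         .option("subscribe", "user-events")                # hypothetical topic
         .load()
)

# Kafka delivers raw bytes; cast the payload, then count events per minute.
counts = (
    events.selectExpr("CAST(value AS STRING) AS body", "timestamp")
          .groupBy(F.window("timestamp", "1 minute"))
          .count()
)

# Console sink is for illustration; a real job would write to a store or topic.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```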
 
Your Experience Across The Years in the Roles You’ve Played
 
Have a total of 5-7 or more years of experience, with 2-3 years in a startup.
Have a B.Tech, M.Tech, or equivalent academic qualification from a premier institute.
Experience in product companies working on internet-scale applications is preferred.
Thoroughly aware of cloud computing infrastructure on AWS, leveraging cloud-native services and infrastructure services to design solutions.
Follow the Cloud Native Computing Foundation, leveraging mature open-source projects, including an understanding of containerisation/Kubernetes.
 
You are passionate about learning or growing your expertise in some or all of the following
Data Pipelines
Data Warehousing
Statistics
Metrics Development
 
We Value Engineers Who Are
 
Customer-focused: We believe that doing what’s right for the creator is ultimately what will drive our business forward.
Obsessed with Quality: Your Production code just works & scales linearly
Team players. You believe that more can be achieved together. You listen to feedback and also provide supportive feedback to help others grow/improve.
Pragmatic: We do things quickly to learn what our creators desire. You know when it’s appropriate to take shortcuts that don’t sacrifice quality or maintainability.
Owners: Engineers at Chingari know how to positively impact the business.
SteelEye
Agency job via wrackle by Naveen Taalanki
Bengaluru (Bangalore)
1 - 8 yrs
₹10L - ₹40L / yr
Python
ETL
Jenkins
CI/CD
pandas
+6 more
Roles & Responsibilities
Expectations of the role
This role reports to the Technical Lead (Support). You will be expected to resolve bugs in the platform that are identified by customers and internal teams. This role will progress towards SDE-2 in 12-15 months, where the developer will work on solving complex problems around scale and building out new features.
 
What will you do?
  • Fix issues with plugins for our Python-based ETL pipelines (see the sketch after this list)
  • Help with automation of standard workflow
  • Deliver Python microservices for provisioning and managing cloud infrastructure
  • Responsible for any refactoring of code
  • Effectively manage challenges associated with handling large volumes of data working to tight deadlines
  • Manage expectations with internal stakeholders and context-switch in a fast-paced environment
  • Thrive in an environment that uses AWS and Elasticsearch extensively
  • Keep abreast of technology and contribute to the engineering strategy
  • Champion best development practices and provide mentorship to others
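
A minimal sketch of what a testable step in a Python-based ETL pipeline can look like; the record shape and normalisation rules are hypothetical, and the point is a pure function that is easy to unit-test and reuse.

```python
from typing import Iterable, Iterator

def normalise_records(records: Iterable[dict]) -> Iterator[dict]:
    """Drop malformed records and normalise fields (pure, so trivially testable)."""
    for rec in records:
        if rec.get("id") is None:
            continue  # skip malformed input instead of failing the whole batch
        yield {"id": rec["id"], "name": (rec.get("name") or "").strip().lower()}

# Tiny unit check of the hypothetical rules above:
assert list(normalise_records([{"id": 1, "name": " Ada "}, {"name": "no id"}])) == [
    {"id": 1, "name": "ada"}
]
```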
What are we looking for?
  • First and foremost you are a Python developer, experienced with the Python Data stack
  • You love and care about data
  • Your code is an artistic manifesto reflecting how elegant you are in what you do
  • You feel sparks of joy when a new abstraction or pattern arises from your code
  • You embrace the DRY (Don't Repeat Yourself) and KISS (Keep It Short and Simple) principles
  • You are a continuous learner
  • You have a natural willingness to automate tasks
  • You have critical thinking and an eye for detail
  • Excellent ability and experience of working to tight deadlines
  • Sharp analytical and problem-solving skills
  • Strong sense of ownership and accountability for your work and delivery
  • Excellent written and oral communication skills
  • Mature collaboration and mentoring abilities
  • We are keen to know your digital footprint (community talks, blog posts, certifications, courses you have participated in or you are keen to, your personal projects as well as any kind of contributions to the open-source communities if any)
Nice to have:
  • Delivering complex software, ideally in a FinTech setting
  • Experience with CI/CD tools such as Jenkins, CircleCI
  • Experience with code versioning (git / mercurial / subversion)
Curl Analytics
Agency job
via wrackle by Naveen Taalanki
Bengaluru (Bangalore)
5 - 10 yrs
₹15L - ₹30L / yr
ETL
Big Data
Data engineering
Apache Kafka
PySpark
+11 more
What you will do
  • Bring in industry best practices around creating and maintaining robust data pipelines for complex data projects with/without AI component
    • programmatically ingesting data from several static and real-time sources (incl. web scraping)
    • rendering results through dynamic interfaces, incl. web/mobile/dashboards, with the ability to log usage and granular user feedback
    • performance tuning and optimal implementation of complex Python scripts (using Spark), SQL (using stored procedures, Hive), and NoSQL queries in a production environment (see the tuning sketch after this list)
  • Industrialize ML / DL solutions and deploy and manage production services; proactively handle data issues arising on live apps
  • Perform ETL on large and complex datasets for AI applications - work closely with data scientists on performance optimization of large-scale ML/DL model training
  • Build data tools to facilitate fast data cleaning and statistical analysis
  • Ensure data architecture is secure and compliant
  • Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability
  • Work closely with APAC CDO and coordinate with a fully decentralized team across different locations in APAC and global HQ (Paris).
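
A minimal PySpark sketch of the performance-tuning work mentioned above, assuming hypothetical fact/dimension datasets: broadcasting the small table avoids a shuffle in the join, and caching a DataFrame that feeds several aggregations avoids recomputing it.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

orders = spark.read.parquet("s3://example/orders/")        # large fact table (hypothetical)
countries = spark.read.parquet("s3://example/countries/")  # small dimension (hypothetical)

# Broadcast join: ship the small table to every executor instead of shuffling both sides.
enriched = orders.join(broadcast(countries), "country_code").cache()

enriched.groupBy("country_name").count().show()        # first action materialises the cache
enriched.groupBy("country_name").sum("amount").show()  # reuses the cached result
```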

You should be

  •  Expert in structured and unstructured data in traditional and Big Data environments – Oracle/SQL Server, MongoDB, Hive/Pig, BigQuery, and Spark
  • Have excellent knowledge of Python programming both in traditional and distributed models (PySpark)
  • Expert in shell scripting and writing schedulers
  • Hands-on experience with Cloud - deploying complex data solutions in hybrid cloud / on-premise environment both for data extraction/storage and computation
  • Hands-on experience in deploying production apps using large volumes of data with state-of-the-art technologies like Docker, Kubernetes, and Kafka
  • Strong knowledge of data security best practices
  • 5+ years experience in a data engineering role
  • Science / Engineering graduate from a Tier-1 university in the country
  • And most importantly, you must be a passionate coder who really cares about building apps that can help people do things better, smarter, and faster even when they sleep
Molecular Connections
Posted by Molecular Connections
Bengaluru (Bangalore)
8 - 10 yrs
₹15L - ₹20L / yr
Spark
Hadoop
Big Data
Data engineering
PySpark
+4 more
  1. Big Data developer with 8+ years of professional IT experience, with expertise in Hadoop ecosystem components for ingestion, data modeling, querying, processing, storage, analysis, data integration, and implementing enterprise-level systems spanning Big Data.
  2. A skilled developer with strong problem-solving, debugging and analytical capabilities, who actively engages in understanding customer requirements.
  3. Expertise in Apache Hadoop ecosystem components like Spark, Hadoop Distributed File System (HDFS), MapReduce, Hive, Sqoop, HBase, ZooKeeper, YARN, Flume, Pig, NiFi, Scala and Oozie.
  4. Hands-on experience in creating real-time data streaming solutions using Apache Spark core, Spark SQL & DataFrames, Kafka, Spark Streaming and Apache Storm.
  5. Excellent knowledge of Hadoop architecture and the daemons of Hadoop clusters, which include NameNode, DataNode, Resource Manager, Node Manager and Job History Server.
  6. Worked on both Cloudera and Hortonworks Hadoop distributions. Experience in managing Hadoop clusters using the Cloudera Manager tool.
  7. Well versed in installation, configuration and management of Big Data and the underlying infrastructure of a Hadoop cluster.
  8. Hands-on experience in coding MapReduce/YARN programs using Java, Scala and Python for analyzing Big Data.
  9. Exposure to the Cloudera development environment and management using Cloudera Manager.
  10. Extensively worked on Spark using Scala on clusters for computational analytics; installed it on top of Hadoop and performed advanced analytical applications by making use of Spark with Hive and SQL/Oracle.
  11. Implemented Spark using Python, utilizing DataFrames and the Spark SQL API for faster processing of data; handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading data into HDFS.
  12. Used the Spark DataFrames API over the Cloudera platform to perform analytics on Hive data.
  13. Hands-on experience with Spark MLlib, used for predictive intelligence, customer segmentation and smooth maintenance in Spark Streaming.
  14. Experience in using Flume to load log files into HDFS, and Oozie for workflow design and scheduling.
  15. Experience in optimizing MapReduce jobs to use HDFS efficiently via various compression mechanisms.
  16. Worked on creating data pipelines for different events of ingestion and aggregation, loading consumer response data into Hive external tables in HDFS to serve as feeds for Tableau dashboards.
  17. Hands-on experience in using Sqoop to import data into HDFS from RDBMS and vice versa.
  18. In-depth understanding of Oozie for scheduling all Hive/Sqoop/HBase jobs.
  19. Hands-on expertise in real-time analytics with Apache Spark.
  20. Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python (see the sketch after this list).
  21. Extensive experience in working with different ETL tool environments like SSIS and Informatica, and reporting tool environments like SQL Server Reporting Services (SSRS).
  22. Experience in the Microsoft cloud and setting up clusters in Amazon EC2 & S3, including automation of setting up and extending the clusters in the AWS Amazon cloud.
  23. Extensively worked on Spark using Python on clusters for computational analytics; installed it on top of Hadoop and performed advanced analytical applications by making use of Spark with Hive and SQL.
  24. Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka and Flume.
  25. Knowledge of installation, configuration, support and management of Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
  26. Experienced in writing ad-hoc queries using Cloudera Impala, including Impala analytical functions.
  27. Experience in creating DataFrames using PySpark and performing operations on the DataFrames using Python.
  28. In-depth understanding of Hadoop architecture and various components such as HDFS and the MapReduce programming paradigm, high availability and YARN architecture.
  29. Established multiple connections to different Redshift clusters (Bank Prod, Card Prod, SBBDA Cluster) and provided access for pulling the information needed for analysis.
  30. Generated various kinds of knowledge reports using Power BI based on business specifications.
  31. Developed interactive Tableau dashboards to provide a clear understanding of industry-specific KPIs, using quick filters and parameters to handle them more efficiently.
  32. Well experienced in projects using JIRA, testing, and Maven and Jenkins build tools.
  33. Experienced in designing, building, deploying and utilizing almost all of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance and auto-scaling.
  34. Good experience with use-case development and software methodologies like Agile and Waterfall.
  35. Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  36. Good working experience in importing data using Sqoop and SFTP from various sources like RDBMS, Teradata, Mainframes, Oracle and Netezza to HDFS, and performing transformations on it using Hive, Pig and Spark.
  37. Extensive experience in text analytics, developing different statistical machine learning solutions to various business problems and generating data visualizations using Python and R.
  38. Proficient in NoSQL databases including HBase, Cassandra and MongoDB, and their integration with a Hadoop cluster.
  39. Hands-on experience in Hadoop Big Data technology, working on MapReduce, Pig and Hive as analysis tools, and Sqoop and Flume as data import/export tools.
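
As a minimal illustration of item 20 above, the same aggregation expressed once as a Hive SQL query and once as DataFrame transformations; the sales table and its columns are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sql-to-df").enableHiveSupport().getOrCreate()

# Hive SQL version of the query.
sql_result = spark.sql("SELECT region, COUNT(*) AS orders FROM sales GROUP BY region")

# The equivalent DataFrame transformation chain.
df_result = (
    spark.table("sales")
         .groupBy("region")
         .agg(F.count("*").alias("orders"))
)

sql_result.show()
df_result.show()
```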
Kloud9 Technologies
Remote only
8 - 14 yrs
₹35L - ₹45L / yr
Google Cloud Platform (GCP)
Agile/Scrum
SQL
Python
Apache Kafka
+1 more

Senior Data Engineer


Responsibilities:

●      Clean, prepare and optimize data at scale for ingestion and consumption by machine learning models

●      Drive the implementation of new data management projects and re-structure of the current data architecture

●      Implement complex automated workflows and routines using workflow scheduling tools (see the Airflow sketch below)

●      Build continuous integration, test-driven development and production deployment frameworks

●      Drive collaborative reviews of design, code, test plans and dataset implementation performed by other data engineers in support of maintaining data engineering standards 

●      Anticipate, identify and solve issues concerning data management to improve data quality

●      Design and build reusable components, frameworks and libraries at scale to support machine learning products 

●      Design and implement product features in collaboration with business and Technology stakeholders 

●      Analyze and profile data for the purpose of designing scalable solutions 

●      Troubleshoot complex data issues and perform root cause analysis to proactively resolve product and operational issues

●      Mentor and develop other data engineers in adopting best practices 

●      Able to influence and communicate effectively, both verbally and written, with team members and business stakeholders
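
A minimal Apache Airflow sketch of the workflow-scheduling responsibility above; the DAG name, schedule, and task callables are hypothetical placeholders rather than an actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data")      # placeholder for a real extraction step

def transform():
    print("clean and reshape")  # placeholder for a real transform step

with DAG(
    dag_id="daily_ingest",              # hypothetical pipeline name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task      # transform runs only after extract succeeds
```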






Qualifications:

●      8+ years of experience developing scalable Big Data applications or solutions on distributed platforms

●      Experience in Google Cloud Platform (GCP); experience with other cloud platform tools is good to have

●      Experience working with Data warehousing tools, including DynamoDB, SQL, and Snowflake

●      Experience architecting data products in Streaming, Serverless and Microservices Architecture and platform.

●      Experience with Spark (Scala/Python/Java) and Kafka

●      Work experience using Databricks (Data Engineering and Delta Lake components)

●      Experience working with Big Data platforms, including Dataproc, Databricks, etc.

●      Experience working with distributed technology tools including Spark, Presto, Databricks, and Airflow

●      Working knowledge of Data warehousing, Data modeling


●      Experience working in Agile and Scrum development processes

●      Bachelor's degree in Computer Science, Information Systems, Business, or other relevant subject area


Role: Senior Data Engineer

Total No. of Years: 8+ years of relevant experience

To be onboarded by: Immediate

Notice Period:

Skills (Mandatory / Desirable, with minimum and maximum years of project experience):

  • GCP exposure – Mandatory – 3 to 7
  • BigQuery, Dataflow, Dataproc, AI Building Blocks, Looker, Cloud Data Fusion, Dataprep, Spark and PySpark – Mandatory – 5 to 9
  • Relational SQL – Mandatory – 4 to 8
  • Shell scripting language – Mandatory – 4 to 8
  • Python/Scala language – Mandatory – 4 to 8
  • Airflow/Kubeflow workflow scheduling tool – Mandatory – 3 to 7
  • Kubernetes – Desirable – 1 to 6
  • Scala – Mandatory – 2 to 6
  • Databricks – Desirable – 1 to 6
  • Google Cloud Functions – Mandatory – 2 to 6
  • GitHub source control tool – Mandatory – 4 to 8
  • Machine Learning – Desirable – 1 to 6
  • Deep Learning – Desirable – 1 to 6
  • Data structures and algorithms – Mandatory – 4 to 8

ERUDITUS Executive Education
Payal Thakker
Posted by Payal Thakker
Remote only
3 - 10 yrs
₹15L - ₹45L / yr
Docker
Kubernetes
Apache Kafka
Apache Beam
Python
+2 more
Emeritus is committed to teaching the skills of the future by making high-quality education accessible and affordable to individuals, companies, and governments around the world. It does this by collaborating with more than 50 top-tier universities across the United States, Europe, Latin America, Southeast Asia, India and China. Emeritus’ short courses, degree programs, professional certificates, and senior executive programs help individuals learn new skills and transform their lives, companies and organizations. Its unique model of state-of-the-art technology, curriculum innovation, and hands-on instruction from senior faculty, mentors and coaches has educated more than 250,000 individuals across 80+ countries. Founded in 2015, Emeritus, part of Eruditus Group, has more than 2,000 employees globally and offices in Mumbai, New Delhi, Shanghai, Singapore, Palo Alto, Mexico City, New York, Boston, London, and Dubai. Following its $650 million Series E funding round in August 2021, the Company is valued at $3.2 billion, and is backed by Accel, SoftBank Vision Fund 2, the Chan Zuckerberg Initiative, Leeds Illuminate, Prosus Ventures, Sequoia Capital India, and Bertelsmann.
 
 
As a data engineer at Emeritus, you'll be working on a wide variety of data problems. At this fast-paced company, you will frequently have to balance achieving an immediate goal with building sustainable and scalable architecture. The ideal candidate gets excited about streaming data, protocol buffers and microservices, and wants to develop and maintain a centralized data platform that provides accurate, comprehensive, and timely data to a growing organization.

Role & responsibilities:

    • Developing ETL pipelines for data replication
    • Analyze, query and manipulate data according to defined business rules and procedures
    • Manage very large-scale data from a multitude of sources into appropriate sets for research and development for data science and analysts across the company
    • Convert prototypes into production data engineering solutions through rigorous software engineering practices and modern deployment pipelines
    • Resolve internal and external data exceptions in a timely and accurate manner
    • Improve multi-environment data flow quality, security, and performance 

Skills & qualifications:

    • Must have experience with:
    • virtualization, containers, and orchestration (Docker, Kubernetes)
    • creating log ingestion pipelines (Apache Beam) with both batch and streaming processing (Pub/Sub, Kafka) – see the sketch after this list
    • workflow orchestration tools (Argo, Airflow)
    • supporting machine learning models in production
    • Have a desire to continually keep up with advancements in data engineering practices
    • Strong Python programming and exploratory data analysis skills
    • Ability to work independently and with team members from different backgrounds
    • At least a bachelor's degree in an analytical or technical field. This could be applied mathematics, statistics, computer science, operations research, economics, etc. Higher education is welcome and encouraged.
    • 3+ years of work in software/data engineering.
    • Superior interpersonal, independent judgment, complex problem-solving skills
    • Global orientation, experience working across countries, regions and time zones
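
A minimal Apache Beam sketch of the log-ingestion item above, shown in batch form (the same transforms can run against streaming sources such as Pub/Sub or Kafka); the input file and parse rule are hypothetical.

```python
import apache_beam as beam

def parse_line(line: str):
    level = line.split(" ", 1)[0]  # assume the log level is the first token
    return (level, 1)

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read logs" >> beam.io.ReadFromText("logs.txt")   # hypothetical input
        | "Parse" >> beam.Map(parse_line)
        | "Count per level" >> beam.CombinePerKey(sum)
        | "Write" >> beam.io.WriteToText("level_counts")    # hypothetical output
    )
```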
Emeritus provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
Discite Analytics Private Limited
Uma Sravya B
Posted by Uma Sravya B
Ahmedabad
4 - 7 yrs
₹12L - ₹20L / yr
Hadoop
Big Data
Data engineering
Spark
Apache Beam
+13 more
Responsibilities:
1. Communicate with the clients and understand their business requirements.
2. Build, train, and manage your own team of junior data engineers.
3. Assemble large, complex data sets that meet the client’s business requirements.
4. Identify, design and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
5. Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources, including the cloud.
6. Assist clients with data-related technical issues and support their data infrastructure requirements.
7. Work with data scientists and analytics experts to strive for greater functionality.

Skills required (experience with most of these is expected):
1. Experience with Big Data tools-Hadoop, Spark, Apache Beam, Kafka etc.
2. Experience with object-oriented and functional scripting languages: Python, Java, C++, Scala, etc.
3. Experience in ETL and Data Warehousing.
4. Experience and firm understanding of relational and non-relational databases like MySQL, MS SQL Server, Postgres, MongoDB, Cassandra etc.
5. Experience with cloud platforms like AWS, GCP and Azure.
6. Experience with workflow management using tools like Apache Airflow.
NCR (Delhi | Gurgaon | Noida)
2 - 12 yrs
₹25L - ₹40L / yr
Data governance
DevOps
Data integration
Data engineering
Python
+14 more
Data Platforms (Data Integration) is responsible for envisioning, building and operating the Bank's data integration platforms. The successful candidate will work out of Gurgaon as part of a high-performing team distributed across our two development centers – Copenhagen and Gurugram. The individual must be driven, passionate about technology, and display a level of customer service that is second to none.

Roles & Responsibilities

  • Designing and delivering a best-in-class, highly scalable data governance platform
  • Improving processes and applying best practices
  • Contributing to all scrum ceremonies; assuming the role of 'scrum master' on a rotational basis
  • Development, management and operation of our infrastructure to ensure it is easy to deploy, scalable, secure and fault-tolerant
  • Flexible on working hours as per business needs
GitHub
Posted by Nataliia Mediana
Remote only
3 - 8 yrs
$24K - $60K / yr
ETL
PySpark
Data engineering
Data engineer
Athena
+9 more
We are a nascent quant hedge fund; we need to stage financial data and make it easy to run and re-run various preprocessing and ML jobs on that data.
- We are looking for an experienced data engineer to join our team.
- The preprocessing involves ETL tasks using PySpark and AWS Glue, staging data in Parquet format on S3, and querying it with Athena (see the sketch below).
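
A minimal sketch of that staging step, assuming hypothetical bucket names and columns: read raw prices with PySpark, derive a date partition, and write Parquet to S3 where Athena (via the Glue catalog) can query it.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stage-prices").getOrCreate()

raw = spark.read.csv("s3://example-raw/prices/", header=True, inferSchema=True)

# Derive a partition column; "timestamp" is a hypothetical source column.
staged = raw.withColumn("trade_date", F.to_date("timestamp"))

# Date-partitioned Parquet keeps Athena scans (and costs) small for typical queries.
(staged.write.mode("overwrite")
       .partitionBy("trade_date")
       .parquet("s3://example-lake/prices/"))
```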

To succeed in this data engineering position, you should care about well-documented, testable code and data integrity. We have DevOps engineers who can help with AWS permissions.
We would like to build up a consistent data lake with staged, ready-to-use data, and various scripts that will serve as blueprints for additional data ingestion and transforms.

If you enjoy setting up something which many others will rely on, and have the relevant ETL expertise, we’d like to work with you.

Responsibilities
- Analyze and organize raw data
- Build data pipelines
- Prepare data for predictive modeling
- Explore ways to enhance data quality and reliability
- Potentially, collaborate with data scientists to support various experiments

Requirements
- Previous experience as a data engineer with the above technologies
MNC
Agency job via Fragma Data Systems by Geeti Gaurav Mohanty
Bengaluru (Bangalore)
3 - 5 yrs
₹6L - ₹12L / yr
Spark
Big Data
Data engineering
Hadoop
Apache Kafka
+5 more
Data Engineer

• Drive the data engineering implementation
• Strong experience in building data pipelines
• AWS stack experience is a must
• Deliver conceptual, logical and physical data models for the implementation teams
• Strong SQL skills are a must: advanced SQL working knowledge and experience working with a variety of relational databases, including SQL query authoring
• AWS cloud data pipeline experience is a must: data pipelines and data-centric applications using distributed storage platforms like S3 and distributed processing platforms like Spark, Airflow and Kafka
• Working knowledge of AWS technologies such as S3, EC2, EMR, RDS, Lambda and Elasticsearch
• Ability to use a major programming language (e.g. Python or Java) to process data for modelling