32+ Big data Jobs in Delhi, NCR and Gurgaon | Big data Job openings in Delhi, NCR and Gurgaon
Apply to 32+ Big data Jobs in Delhi, NCR and Gurgaon on CutShort.io. Explore the latest Big data Job opportunities across top companies like Google, Amazon & Adobe.
Skills:
Experience with Cassandra, including installing, configuring, and monitoring a Cassandra cluster.
Experience with Cassandra data modeling and CQL scripting. Experience with DataStax Enterprise Graph.
Experience with both Windows and Linux operating systems. Knowledge of the Microsoft .NET Framework (C#, .NET Core).
Ability to perform effectively in a team-oriented environment
- KSQL
- Data Engineering spectrum (Java/Spark)
- Spark Scala / Kafka Streaming
- Confluent Kafka components
- Basic understanding of Hadoop
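As a rough, hedged sketch of the streaming skills above: the tumbling-window aggregation that KSQL's `WINDOW TUMBLING` and Spark/Kafka streaming provide can be simulated in plain Python (the event values here are illustrative, not tied to any real topic):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms):
    """Group (timestamp_ms, key) events into tumbling windows and count
    occurrences per key, mimicking a KSQL `WINDOW TUMBLING (SIZE ...)`
    aggregation on a stream."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_ms)  # align to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1000, "click"), (1500, "click"), (2500, "view"), (3200, "click")]
print(tumbling_window_counts(events, 1000))
# {(1000, 'click'): 2, (2000, 'view'): 1, (3000, 'click'): 1}
```

A real deployment would express this as a KSQL query or a Kafka Streams topology; the window-alignment arithmetic is the same.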
About us
Classplus is India's largest B2B ed-tech start-up, enabling 1 Lac+ educators and content creators to create their digital identity with their own branded apps. Founded in 2018, we have grown more than 10x in the last year into India's fastest-growing video learning platform.
Over the years, marquee investors like Tiger Global, Surge, GSV Ventures, Blume, Falcon Capital, RTP Global, and Chimera Ventures have supported our vision. Thanks to our awesome and dedicated team, we achieved a major milestone in March this year when we secured our Series-D funding.
Now as we go global, we are super excited to have new folks on board who can take the rocketship higher🚀. Do you think you have what it takes to help us achieve this? Find Out Below!
What will you do?
· Define the overall process, which includes building a team for DevOps activities and ensuring that infrastructure changes are reviewed from an architecture and security perspective
· Create standardized tooling and templates for development teams to create CI/CD pipelines
· Ensure infrastructure is created and maintained using Terraform
· Work with various stakeholders to design and implement infrastructure changes to support new feature sets in various product lines.
· Maintain transparency and clear visibility of costs associated with various product verticals, environments and work with stakeholders to plan for optimization and implementation
· Spearhead continuous experimenting and innovating initiatives to optimize the infrastructure in terms of uptime, availability, latency and costs
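As one hedged illustration of the Terraform responsibility above (all names, zones, and module paths are placeholders, not Classplus's actual setup), a minimal GCP resource definition might look like:

```hcl
# Hypothetical layout for illustration only.
module "service_vpc" {
  source      = "./modules/vpc"
  cidr_block  = "10.0.0.0/16"
  environment = var.environment   # e.g. "staging" or "production"
}

resource "google_compute_instance" "app" {
  name         = "app-${var.environment}"
  machine_type = "e2-medium"
  zone         = "asia-south1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }
}
```

Keeping such definitions in version-controlled modules is what makes infrastructure reviewable from the architecture and security perspective described above.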
You should apply, if you
1. Are a seasoned Veteran: Have managed infrastructure at scale running web apps, microservices, and data pipelines using tools and languages like JavaScript(NodeJS), Go, Python, Java, Erlang, Elixir, C++ or Ruby (experience in any one of them is enough)
2. Are a Mr. Perfectionist: You have a strong bias for automation and take the time to think about the right way to solve a problem rather than reaching for quick fixes or band-aids.
3. Bring your A-Game: Have hands-on experience and ability to design/implement infrastructure with GCP services like Compute, Database, Storage, Load Balancers, API Gateway, Service Mesh, Firewalls, Message Brokers, Monitoring, Logging and experience in setting up backups, patching and DR planning
4. Are up with the times: Have expertise in one or more cloud platforms (Amazon Web Services, Google Cloud Platform or Microsoft Azure), and have experience in creating and managing infrastructure entirely through a tool like Terraform
5. Have it all on your fingertips: Have experience building CI/CD pipelines using Jenkins and Docker for applications majorly running on Kubernetes, and hands-on experience in managing and troubleshooting applications running on K8s
6. Have nailed the data storage game: Good knowledge of relational and NoSQL databases (MySQL, Mongo, BigQuery, Cassandra…)
7. Bring that extra zing: Have the ability to program/script and strong fundamentals in Linux and networking.
8. Know your toys: Have a good understanding of microservices architecture and Big Data technologies, plus experience with highly available distributed systems, scaling data store technologies, and creating multi-tenant and self-hosted environments (that's a plus)
Being Part of the Clan
At Classplus, you’re not an “employee” but a part of our “Clan”. So, you can forget about being bound by the clock as long as you’re crushing it workwise😎. Add to that some passionate people working with and around you, and what you get is the perfect work vibe you’ve been looking for!
It doesn’t matter how long your journey has been or your position in the hierarchy (we don’t do Sirs and Ma’ams); you’ll be heard, appreciated, and rewarded. One can say, we have a special place in our hearts for the Doers! ✊🏼❤️
Are you a go-getter with the chops to nail what you do? Then this is the place for you.
Consulting & implementation services in the areas of the Oil & Gas, Mining and Manufacturing industries
- Data Engineer
Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting languages - Python & PySpark
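To illustrate the file-based ingestion best practices implied above: one common convention is Hive-style partition paths in S3, which AWS Glue crawlers and Athena recognise as table partitions. A minimal sketch (the bucket prefix and table names are hypothetical):

```python
from datetime import datetime, timezone

def partitioned_key(prefix: str, table: str, ts: datetime, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (year=/month=/day=), the layout
    Glue crawlers and Athena treat as partition columns."""
    return (
        f"{prefix}/{table}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/{filename}"
    )

ts = datetime(2024, 3, 7, tzinfo=timezone.utc)
print(partitioned_key("raw", "orders", ts, "part-0001.parquet"))
# raw/orders/year=2024/month=03/day=07/part-0001.parquet
```

Writing Parquet files under such keys lets downstream queries prune partitions instead of scanning the whole data lake.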
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources that expose data using different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems; implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the language supported by the base data platform
- Develop automated data quality checks to make sure the right data enters the platform, and verify the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring in industry best practices
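The automated data quality checks in the responsibilities above can be sketched, under illustrative assumptions about row shape, as a simple gate that splits rows into accepted and rejected sets:

```python
def run_quality_checks(rows, required, non_null):
    """Minimal row-level data quality gate: verify required columns exist and
    listed columns are non-null; return (good_rows, rejected_rows)."""
    good, rejects = [], []
    for row in rows:
        missing = [c for c in required if c not in row]
        nulls = [c for c in non_null if row.get(c) is None]
        (rejects if missing or nulls else good).append(row)
    return good, rejects

rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},   # fails the non-null check
    {"amount": 5.0},             # missing a required column
]
good, bad = run_quality_checks(rows, required=["id", "amount"], non_null=["amount"])
print(len(good), len(bad))  # 1 2
```

In a real pipeline the rejected rows would typically land in a quarantine table or dead-letter location for inspection rather than being silently dropped.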
QUALIFICATIONS
- 5-7+ years' experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge of one of these languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence and other related tools
• Project Planning and Management
o Take end-to-end ownership of multiple projects / project tracks
o Create and maintain project plans and other related documentation for project objectives, scope, schedule and delivery milestones
o Lead and participate across all the phases of software engineering, right from requirements gathering to go-live
o Lead internal team meetings on solution architecture, effort estimation, manpower planning and resource (software/hardware/licensing) planning
o Manage RIDA (Risks, Impediments, Dependencies, Assumptions) for projects by developing effective mitigation plans
• Team Management
o Act as the Scrum Master
o Conduct SCRUM ceremonies like Sprint Planning, Daily Standup, Sprint Retrospective
o Set clear objectives for the project and roles/responsibilities for each team member
o Train and mentor the team on their job responsibilities and SCRUM principles
o Make the team accountable for their tasks and help the team in achieving them
o Identify the requirements and come up with a plan for skill development for all team members
• Communication
o Be the Single Point of Contact for the client in terms of day-to-day communication
o Periodically communicate project status to all the stakeholders (internal/external)
• Process Management and Improvement
o Create and document processes across all disciplines of software engineering
o Identify gaps and continuously improve processes within the team
o Encourage team members to contribute towards process improvement
o Develop a culture of quality and efficiency within the team
Must have:
• Minimum 8 years of experience (hands-on as well as leadership) in software / data engineering across multiple job functions like Business Analysis, Development, Solutioning, QA, DevOps and Project Management
• Hands-on as well as leadership experience in Big Data Engineering projects
• Experience developing or managing cloud solutions using Azure or other cloud provider
• Demonstrable knowledge of Hadoop, Hive, Spark, NoSQL DBs, SQL, Data Warehousing, ETL/ELT, and DevOps tools
• Strong project management and communication skills
• Strong analytical and problem-solving skills
• Strong systems level critical thinking skills
• Strong collaboration and influencing skills
Good to have:
• Knowledge of PySpark, Azure Data Factory, Azure Data Lake Storage, Synapse Dedicated SQL Pool, Databricks, PowerBI, Machine Learning, Cloud Infrastructure
• Background in BFSI with focus on core banking
• Willingness to travel
Work Environment
• Customer Office (Mumbai) / Remote Work
Education
• UG: B. Tech - Computers / B. E. – Computers / BCA / B.Sc. Computer Science
A GCP Data Analyst profile must have the below skill sets:
- Knowledge of programming languages like SQL, Oracle, R, MATLAB, Java and Python
- Data cleansing, data visualization, data wrangling
- Data modeling, data warehouse concepts
- Ability to adapt to Big Data platforms like Hadoop, Spark for stream & batch processing
- GCP (Cloud Dataproc, Cloud Dataflow, Cloud Datalab, Cloud Dataprep, BigQuery, Cloud Datastore, Cloud Datafusion, Auto ML etc)
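As a hedged example of the data cleansing and wrangling skills listed above (the record and column names are made up for illustration), a pre-load normalisation step might look like:

```python
import re

def clean_record(rec):
    """Typical cleansing steps before loading to a warehouse such as BigQuery:
    trim whitespace, normalise empty strings to None, coerce numeric strings."""
    out = {}
    for key, val in rec.items():
        if isinstance(val, str):
            val = val.strip()
            if val == "":
                val = None
            elif re.fullmatch(r"-?\d+(\.\d+)?", val):
                val = float(val) if "." in val else int(val)
        out[key] = val
    return out

print(clean_record({"city": "  Delhi ", "rides": "42", "fare": "99.5", "note": ""}))
# {'city': 'Delhi', 'rides': 42, 'fare': 99.5, 'note': None}
```

At scale the same transformations would run inside Dataflow or Dataprep rather than a Python loop, but the per-record logic is the same.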
A.P.T Portfolio is a high-frequency trading firm that specialises in quantitative trading & investment strategies. Founded in November 2009, it has been a major liquidity provider in global stock markets.
As a manager, you would be in charge of managing the DevOps team, and your remit shall include the following:
- Private Cloud - Design & maintain a high performance and reliable network architecture to support HPC applications
- Scheduling Tool - Implement and maintain an HPC scheduling technology like Kubernetes, Hadoop YARN, Mesos, HTCondor or Nomad for processing & scheduling analytical jobs. Implement controls which allow analytical jobs to seamlessly utilize idle capacity on the private cloud.
- Security - Implementing best security practices and implementing data isolation policy between different divisions internally.
- Capacity Sizing - Monitor private cloud usage and share details with different teams. Plan capacity enhancements on a quarterly basis.
- Storage solution - Optimize storage solutions like NetApp, EMC, Quobyte for analytical jobs. Monitor their performance on a daily basis to identify issues early.
- NFS - Implement and optimize latest version of NFS for our use case.
- Public Cloud - Drive AWS/Google-Cloud utilization in the firm for increasing efficiency, improving collaboration and for reducing cost. Maintain the environment for our existing use cases. Further explore potential areas of using public cloud within the firm.
- Backups - Identify and automate backup of all crucial data/binaries/code etc. in a secured manner, at such frequency as warranted by the use case. Ensure that recovery from backup is tested and seamless.
- Access Control - Maintain password-less access control and improve security over time. Minimize failures of automated jobs due to unsuccessful logins.
- Operating System - Plan, test and roll out new operating systems for all production, simulation and desktop environments. Work closely with developers to highlight new performance-enhancement capabilities of new versions.
- Configuration management - Work closely with the DevOps/development team to freeze configurations/playbooks for various teams & internal applications. Deploy and maintain standard tools such as Ansible, Puppet, Chef, etc. for the same.
- Data Storage & Security Planning - Maintain tight control of root access on various devices. Ensure root access is rolled back as soon as the desired objective is achieved.
- Audit access logs on devices. Use third party tools to put in a monitoring mechanism for early detection of any suspicious activity.
- Maintaining all third-party tools used for development and collaboration - This shall include maintaining a fault-tolerant environment for Git/Perforce, productivity tools such as Slack/Microsoft Teams, and build tools like Jenkins/Bamboo
Qualifications
- Bachelors or Masters Level Degree, preferably in CSE/IT
- 10+ years of relevant experience in sys-admin function
- Must have strong knowledge of IT infrastructure, Linux, networking and grid computing.
- Must have a strong grasp of automation & data management tools.
- Proficient in scripting languages, including Python
Desirables
- Professional attitude; a co-operative and mature approach to work; must be focused, structured and well-considered, with good troubleshooting skills.
- Exhibit a high level of individual initiative and ownership, effectively collaborate with other team members.
APT Portfolio is an equal opportunity employer
Job Title: Product Manager
Job Description
Bachelor or master’s degree in computer science or equivalent experience.
Worked as Product Owner before and took responsibility for a product or project delivery.
Well-versed with data warehouse modernization to Big Data and Cloud environments.
Good knowledge* of any of the Cloud (AWS/Azure/GCP) – Must Have
Practical experience with continuous integration and continuous delivery workflows.
Self-motivated with strong organizational/prioritization skills and ability to multi-task with close attention to detail.
Good communication skills
Experience in working within a distributed agile team
Experience in handling migration projects – Good to Have
*Data Ingestion, Processing, and Orchestration knowledge
Roles & Responsibilities
Responsible for coming up with innovative and novel ideas for the product.
Define product releases, features, and roadmap.
Collaborate with product teams on defining product objectives, including creating a product roadmap, delivery, market research, customer feedback, and stakeholder inputs.
Work with the Engineering teams to communicate release goals and be a part of the product lifecycle. Work closely with the UX and UI team to create the best user experience for the end customer.
Work with the Marketing team to define GTM activities.
Interface with Sales & Customer teams to identify customer needs and product gaps
Market and competition analysis activities.
Participate in the Agile ceremonies with the team, define epics, user stories, acceptance criteria
Ensure product usability from the end-user perspective
Mandatory Skills
Product Management, DWH, Big Data
Senior Software Engineer - Data Team
We are seeking a highly motivated Senior Software Engineer with hands-on experience to build scalable, extensible data solutions, identify and address performance bottlenecks, collaborate with other team members, and implement best practices for data engineering. Our engineering process is fully agile and has a really fast release cycle, which keeps our environment very energetic and fun.
What you'll do:
Design and development of scalable applications.
Work with Product Management teams to get maximum value out of existing data.
Contribute to continual improvement by suggesting improvements to the software system.
Ensure high scalability and performance
You will advocate for good, clean, well documented and performing code; follow standards and best practices.
We'd love for you to have:
Education: Bachelor/Master Degree in Computer Science.
Experience: 3-5 years of relevant experience in BI/DW with hands-on coding experience.
Mandatory Skills
Strong in problem-solving
Strong experience with Big Data technologies, Hive, Hadoop, Impala, Hbase, Kafka, Spark
Strong experience with orchestration frameworks like Apache Oozie, Airflow
Strong experience of Data Engineering
Strong experience with Database and Data Warehousing technologies and ability to understand complex design, system architecture
Experience with the full software development lifecycle, design, develop, review, debug, document, and deliver (especially in a multi-location organization)
Good knowledge of Java
Desired Skills
Experience with Python
Experience with reporting tools like Tableau, QlikView
Experience of Git and CI-CD pipeline
Awareness of cloud platform ex:- AWS
Excellent communication skills with team members, Business owners, across teams
Be able to work in a challenging, dynamic environment and meet tight deadlines
Greetings! We are looking for a Product Manager for our data modernization product. We need a resource with good knowledge of Big Data/DWH, who should have strong stakeholder management and presentation skills.
Hiring for a gaming company
Job Description:
● At least 4 to 8 years of experience in software quality assurance manual testing
● Proven record of test strategy and planning, STLC, deployment, defect tracking, and test harness.
● High degree of initiative with a passion for learning technology.
● Must have strong hands-on experience with any programming language: PHP, React Native, JavaScript, HTML, CSS.
● Must be competent enough on API testing using Rest Assured and Postman.
● Performance testing experience with relevant automation and monitoring tools (e.g., JMeter, LoadRunner).
● Responsible for coming up with and executing scale tests, performance, and longevity tests.
● Must have testing experience on cloud-based platforms and working knowledge on integration testing using CI/CD pipelines.
● Should have working experience with MySQL and at least one NoSQL DB, viz. Cosmos DB, Cassandra, MongoDB.
● Should have strong fundamental understanding of network topology and the TCP/IP stack and working knowledge of load balancing methods and deployment architecture
● Experience with GitHub, Maven, HP ALM, Jira, Jenkins, code coverage tools and CI/CD
● Exposure to sizing (estimating) projects and defining timelines
● Good to have knowledge of Platforms like Windows / UNIX / Mainframe.
● Work under the direction of the Game QA Lead to provide test support for two dozen experienced game designers, artists, and engineers.
● Perform daily smoke tests of game builds, using the build computer and the install scripts.
● Run through and write test plans for existing and new features. Become familiar with the project design by closely reading project documentation and reaching out regularly to our team of game developers.
● Verify that identified bugs are fixed in the database.
● Identify and log new issues that occur in the HMD and tablet interface.
● Build on your own fundamental QA skills by learning the specifics of game development QA for mobile devices, VR headsets, and experimental technology.
● Engage in continuous improvement of the QA process.
● Maintain quality service by establishing and enforcing organization standards.
● Maintain professional and technical knowledge by attending educational workshops, reviewing professional publications, establishing personal networks, benchmarking state-of-the-art practices, and participating in professional societies.
● Contribute to the team effort by accomplishing related results as needed
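For the load-balancing methods mentioned in the requirements above, a toy round-robin picker illustrates the core idea (a real deployment would use nginx, HAProxy or a cloud load balancer, not anything like this):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin load balancer: hand out backends in rotation so
    each receives an equal share of requests."""
    def __init__(self, backends):
        self._pool = cycle(backends)

    def pick(self):
        return next(self._pool)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(5)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']
```

Other common methods the question probes (least-connections, IP hash, weighted round-robin) differ only in how `pick` chooses the next backend.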
DevOps Engineer
Our engineering team is looking for Big-Data DevOps engineers to join the engineering team and help us automate the build, release, packaging and infrastructure provisioning and support processes. The candidate is expected to own the full life-cycle of provisioning, configuration management, monitoring, maintenance and support for cloud as well as on premise deployments.
Responsibilities
- 3-plus years of DevOps experience managing the Big Data application stack including HDFS, YARN, Spark, Hive and Hbase
- Deeper understanding of all the configurations required for installing and maintaining the infrastructure in the long run
- Experience setting up high availability, configuring resource allocation, setting up capacity schedulers, handling data recovery tasks
- Experience with middle-layer technologies including web servers (httpd, nginx), application servers (JBoss, Tomcat) and database systems (PostgreSQL, MySQL)
- Experience setting up enterprise security solutions including setting up active directories, firewalls, SSL certificates, Kerberos KDC servers, etc.
- Experience maintaining and hardening the infrastructure by regularly applying required security packages and patches
- Experience supporting on-premise solutions as well as on AWS cloud
- Experience working with and supporting Spark-based applications on YARN
- Experience with one or more automation tools such as Ansible, Terraform, etc
- Experience working with CI/CD tools like Jenkins and various test report and coverage Plugins
- Experience defining and automating the build, versioning and release processes for complex enterprise products
- Experience supporting clients remotely and on-site
- Experience working with and supporting Java- and Python-based tech stacks would be a Plus
Responsibilities:
- Designing and implementing fine-tuned production ready data/ML pipelines in Hadoop platform.
- Driving optimization, testing and tooling to improve quality.
- Reviewing and approving high-level & detailed designs to ensure that the solution delivers to the business needs and aligns to the data & analytics architecture principles and roadmap.
- Understanding business requirements and solution design to develop and implement solutions that adhere to big data architectural guidelines and address business requirements.
- Following proper SDLC (Code review, sprint process).
- Identifying, designing, and implementing internal process improvements: automating manual processes, optimizing data delivery, etc.
- Building robust and scalable data infrastructure (both batch processing and real-time) to support needs from internal and external users.
- Understanding various data security standards and using secure data security tools to apply and adhere to the required data controls for user access in the Hadoop platform.
- Supporting and contributing to development guidelines and standards for data ingestion.
- Working with a data scientist and business analytics team to assist in data ingestion and data related technical issues.
- Designing and documenting the development & deployment flow.
Requirements:
- Experience in developing REST API services using one of the Scala frameworks.
- Ability to troubleshoot and optimize complex queries on the Spark platform
- Expert in building and optimizing ‘big data’ data/ML pipelines, architectures and data sets.
- Knowledge in modelling unstructured to structured data design.
- Experience in Big Data access and storage techniques.
- Experience in doing cost estimation based on the design and development.
- Excellent debugging skills for the technical stack mentioned above which even includes analyzing server logs and application logs.
- Highly organized, self-motivated, proactive, and ability to propose best design solutions.
- Good time management and multitasking skills to work to deadlines by working independently and as a part of a team.
Job Description:
The data science team is responsible for solving business problems with complex data. Data complexity could be characterized in terms of volume, dimensionality and multiple touchpoints/sources. We understand the data, ask fundamental first-principles questions, and apply our analytical and machine learning skills to solve the problem in the best way possible.
Our ideal candidate
The role would be a client facing one, hence good communication skills are a must.
The candidate should have the ability to communicate complex models and analysis in a clear and precise manner.
The candidate would be responsible for:
- Comprehending business problems properly - what to predict, how to build DV, what value addition he/she is bringing to the client, etc.
- Understanding and analyzing large, complex, multi-dimensional datasets and build features relevant for business
- Understanding the math behind algorithms and choosing one over another
- Understanding approaches like stacking, ensemble and applying them correctly to increase accuracy
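The ensemble idea above can be illustrated with a minimal hard-voting combiner in plain Python (the per-model predictions are fabricated for the example; real stacking would instead train a meta-model on out-of-fold predictions):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: combine each model's predicted labels by
    majority vote per sample (ties resolved by first-seen label)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1]
```

Voting helps only when the base models make reasonably uncorrelated errors, which is part of the "choosing one algorithm over another" judgment the role calls for.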
Desired technical requirements
- Proficiency with Python and the ability to write production-ready code.
- Experience in pyspark, machine learning and deep learning
- Big data experience, e.g. familiarity with Spark, Hadoop, is highly preferred
- Familiarity with SQL or other databases.
Hiring for one of the MNC for India location
Key Responsibilities : ( Data Developer Python, Spark)
Exp : 2 to 9 Yrs
Development of data platforms, integration frameworks, processes, and code.
Develop and deliver APIs in Python or Scala for Business Intelligence applications built using a range of web languages
Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.
Elaborate stories in a collaborative agile environment (SCRUM or Kanban)
Familiarity with cloud platforms like GCP, AWS or Azure.
Experience with large data volumes.
Familiarity with writing rest-based services.
Experience with distributed processing and systems
Experience with Hadoop / Spark toolsets
Experience with relational database management systems (RDBMS)
Experience with Data Flow development
Knowledge of Agile and associated development techniques
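A tiny sketch of the automated-testing practice described above, using pytest-style plain assertions (the function under test is invented purely for illustration):

```python
def parse_amount(raw: str) -> float:
    """Function under test: parse '1,234.50' style amount strings into floats."""
    return float(raw.replace(",", ""))

# pytest-style unit tests: plain assert statements picked up by the test runner
def test_plain():
    assert parse_amount("42") == 42.0

def test_thousands_separator():
    assert parse_amount("1,234.50") == 1234.5
```

End-to-end and performance tests follow the same arrange/act/assert shape, just exercising a deployed service instead of a single function.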
- Experience of providing technical leadership in the Big Data space (Hadoop stack: Spark, M/R, HDFS, Pig, Hive, HBase, Flume, Sqoop, etc.). Should have contributed to open-source Big Data technologies.
- Expert-level proficiency in Python
- Experience in visualizing and evangelizing next-generation infrastructure in Big Data space (Batch, Near Real-time, Real-time technologies).
- Passionate for continuous learning, experimenting, applying, and contributing towards cutting edge open source technologies and software paradigms
- Strong understanding and experience in distributed computing frameworks, particularly Apache Hadoop 2.0 (YARN; MR & HDFS) and associated technologies.
- Hands-on experience with Apache Spark and its components (Streaming, SQL, MLlib)
- Operating knowledge of cloud computing platforms (AWS, especially EMR, EC2, S3, SWF services, and the AWS CLI)
- Experience working within a Linux computing environment and use of command-line tools, including knowledge of shell/Python scripting for automating common tasks
- Sr. Data Engineer:
Core Skills – Data Engineering, Big Data, PySpark, Spark SQL and Python
Candidate with prior Palantir Cloud Foundry OR Clinical Trial Data Model background is preferred
Major accountabilities:
- Responsible for Data Engineering, Foundry Data Pipeline Creation, Foundry Analysis & Reporting, Slate Application development, re-usable code development & management and Integrating Internal or External System with Foundry for data ingestion with high quality.
- Have a good understanding of the Foundry Platform landscape and its capabilities
- Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues.
- Defines company data assets (data models) and PySpark / Spark SQL jobs to populate the data models.
- Designs data integrations and data quality framework.
- Design & Implement integration with Internal, External Systems, F1 AWS platform using Foundry Data Connector or Magritte Agent
- Collaboration with data scientists, data analysts and technology teams to document and leverage their understanding of the Foundry integration with different data sources
- Actively participate in agile work practices
- Coordinating with Quality Engineers to ensure that all quality controls, naming conventions & best practices have been followed
Desired Candidate Profile :
- Strong data engineering background
- Experience with Clinical Data Model is preferred
- Experience in
- SQL Server ,Postgres, Cassandra, Hadoop, and Spark for distributed data storage and parallel computing
- Java and Groovy for our back-end applications and data integration tools
- Python for data processing and analysis
- Cloud infrastructure based on AWS EC2 and S3
- 7+ years IT experience, 2+ years’ experience in Palantir Foundry Platform, 4+ years’ experience in Big Data platform
- 5+ years of Python and PySpark development experience
- Strong troubleshooting and problem solving skills
- BTech or master's degree in computer science or a related technical field
- Experience designing, building, and maintaining big data pipeline systems
- Hands-on experience on Palantir Foundry Platform and Foundry custom Apps development
- Able to design and implement data integration between Palantir Foundry and external Apps based on Foundry data connector framework
- Hands-on in programming languages primarily Python, R, Java, Unix shell scripts
- Hands-on experience in AWS / Azure cloud platforms and stacks
- Strong in API-based architecture and concepts; able to do a quick PoC using API integration and development
- Knowledge of machine learning and AI
- Skill and comfort working in a rapidly changing environment with dynamic objectives and iteration with users.
Demonstrated ability to continuously learn, work independently, and make decisions with minimal supervision
Company Description
At Bungee Tech, we help retailers and brands meet customers everywhere, on every occasion they are in. We believe that accurate, high-quality data matched with compelling market insights empowers retailers and brands to keep their customers at the center of all the innovation and value they are delivering.
We provide a clear and complete omnichannel picture of their competitive landscape to retailers and brands. We collect billions of data points every day and multiple times in a day from publicly available sources. Using high-quality extraction, we uncover detailed information on products or services, which we automatically match, and then proactively track for price, promotion, and availability. Plus, anything we do not match helps to identify a new assortment opportunity.
Empowered with this unrivalled intelligence, we unlock compelling analytics and insights that, once blended with verified partner data from trusted sources such as Nielsen, paint a complete, consolidated picture of the competitive landscape.
We are looking for a Big Data Engineer who will work on the collecting, storing, processing, and analyzing of huge sets of data. The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining, implementing, and monitoring them.
You will also be responsible for integrating them with the architecture used in the company.
We're working on the future. If you are seeking an environment where you can drive innovation, apply state-of-the-art software technologies to solve real-world problems, and enjoy the satisfaction of providing visible benefit to end users in an iterative, fast-paced environment, this is your opportunity.
Responsibilities
As an experienced member of the team, in this role, you will:
- Contribute to evolving the technical direction of analytical systems and play a critical role in their design and development
- Research, design, code, troubleshoot, and support. What you create is also what you own.
- Develop the next generation of automation tools for monitoring and measuring data quality, with associated user interfaces.
- Be able to broaden your technical skills and work in an environment that thrives on creativity, efficient execution, and product innovation.
BASIC QUALIFICATIONS
- Bachelor’s degree or higher in an analytical area such as Computer Science, Physics, Mathematics, Statistics, Engineering or similar.
- 5+ years of relevant professional experience in Data Engineering and Business Intelligence
- 5+ years with advanced SQL (analytical functions), ETL, and data warehousing
- Strong knowledge of data warehousing concepts, including data warehouse technical architectures, infrastructure components, ETL/ELT and reporting/analytic tools and environments, data structures, data modeling, and performance tuning.
- Ability to effectively communicate with both business and technical teams.
- Excellent coding skills in Java, Python, C++, or equivalent object-oriented programming language
- Understanding of relational and non-relational databases and basic SQL
- Proficiency with at least one of these scripting languages: Perl / Python / Ruby / shell script
PREFERRED QUALIFICATIONS
- Experience with building data pipelines from application databases.
- Experience with AWS services - S3, Redshift, Spectrum, EMR, Glue, Athena, ELK etc.
- Experience working with Data Lakes.
- Experience providing technical leadership and mentoring other engineers on best practices in the data engineering space
- Sharp problem-solving skills and the ability to resolve ambiguous requirements
- Experience working with Big Data
- Knowledge of and experience working with Hive and the Hadoop ecosystem
- Knowledge of Spark
- Experience working with Data Science teams
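As an illustrative sketch of the "advanced SQL (analytical functions)" the qualifications above call for, here is a window-function query runnable against Python's built-in `sqlite3` (which supports window functions in SQLite 3.25+). The table and data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month TEXT, revenue REAL);
INSERT INTO sales VALUES
  ('north', '2023-01', 100), ('north', '2023-02', 150),
  ('south', '2023-01', 80),  ('south', '2023-02', 120);
""")

# Rank months within each region by revenue: a typical analytical
# (window) function used in warehousing and BI workloads.
rows = conn.execute("""
SELECT region, month, revenue,
       RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk
FROM sales
ORDER BY region, rnk
""").fetchall()
```

The same `RANK() OVER (PARTITION BY ...)` pattern carries over to Redshift, Athena, and Spark SQL, all named in the preferred qualifications.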
Position: Big Data Engineer
What You'll Do
Punchh is seeking to hire a Big Data Engineer at either a senior or tech lead level. Reporting to the Director of Big Data, this person will play a critical role in leading Punchh's big data innovations. By leveraging prior industry experience in big data, they will help create cutting-edge data and analytics products for Punchh's business partners.
This role requires close collaboration with the data, engineering, and product organizations. Job functions include:
- Work with large data sets and implement sophisticated data pipelines over both structured and unstructured data.
- Collaborate with stakeholders to design scalable solutions.
- Manage and optimize our internal data pipeline that supports marketing, customer success and data science to name a few.
- Act as a technical leader of Punchh’s big data platform that supports AI and BI products.
- Work with the infra and operations teams to monitor and optimize existing infrastructure.
- Occasional business travel is required.
What You'll Need
- 5+ years of experience as a Big Data engineering professional, developing scalable big data solutions.
- Advanced degree in computer science, engineering or other related fields.
- Demonstrated strength in data modeling, data warehousing and SQL.
- Extensive knowledge of cloud technologies, e.g. AWS and Azure.
- Excellent software engineering background, with strong familiarity with the software development life cycle and with tools such as GitHub and Airflow.
- Advanced knowledge of big data technologies: programming languages (Python, Java), relational databases (Postgres, MySQL), NoSQL (MongoDB), Hadoop (EMR), and streaming (Kafka, Spark).
- Strong problem-solving skills with demonstrated rigor in building and maintaining complex data pipelines.
- Exceptional communication skills and ability to articulate a complex concept with thoughtful, actionable recommendations.
Responsibilities:
- Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world.
- Verifying data quality, and/or ensuring it via data cleaning.
- Adapting and working fast to produce output that improves stakeholders' decision-making using ML.
- Designing and developing Machine Learning systems and schemes.
- Performing statistical analysis and fine-tuning models using test results.
- Training and retraining ML systems and models as and when necessary.
- Deploying ML models to production and managing the cost of cloud infrastructure.
- Developing Machine Learning apps according to client and data scientist requirements.
- Analysing the problem-solving capabilities and use cases of ML algorithms and ranking them by how well they meet the objective.
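The data-quality verification mentioned in the responsibilities above can be sketched as a simple record validator. This is a hypothetical example (the field names, ranges, and records are invented for illustration), not a production cleaning framework:

```python
def validate_record(rec):
    """Return a list of data-quality issues found in one record."""
    issues = []
    if rec.get("user_id") in (None, ""):
        issues.append("missing user_id")
    age = rec.get("age")
    if age is not None and not (0 <= age <= 120):
        issues.append(f"implausible age: {age}")
    return issues

# Hypothetical batch: record 1 lacks an ID, record 2 has an impossible age.
records = [
    {"user_id": "u1", "age": 34},
    {"user_id": "", "age": 34},
    {"user_id": "u3", "age": 999},
]
bad = {i: validate_record(r) for i, r in enumerate(records) if validate_record(r)}
```

In practice the flagged rows would either be cleaned (imputed, clipped) or dropped before the dataset feeds model training.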
Technical Knowledge:
- Experience solving real-time problems using ML and deep learning models deployed in production, with strong projects to showcase.
- Proficiency in Python and experience working with the Jupyter ecosystem, Google Colab, and cloud-hosted notebooks such as AWS SageMaker, Databricks, etc.
- Proficiency with scikit-learn, TensorFlow, OpenCV, PySpark, pandas, NumPy, and related libraries.
- Expert in visualising and manipulating complex datasets.
- Proficiency in working with visualisation libraries such as seaborn, plotly, matplotlib etc.
- Proficiency in Linear Algebra, statistics and probability required for Machine Learning.
- Proficiency in ML algorithms, for example gradient boosting, stacked models, classification algorithms, and deep learning algorithms. Hands-on experience tuning hyperparameters across various models and comparing the resulting algorithm performance is required.
- Big data Technologies such as Hadoop stack and Spark.
- Basic use of cloud compute (VMs, e.g. EC2).
- Brownie points for Kubernetes and Task Queues.
- Strong written and verbal communications.
- Experience working in an Agile environment.
- Experience: 3+ years
- Must be very strong in Java (2.5+ years)
- At least 1 year of working experience with one of Neo4j, Cassandra, or Elasticsearch
- Should have good DevOps working knowledge; knowledge of AWS, Ansible, etc. is a necessity
- Experience in TDD/BDD is required
- Minimum 1 year of working experience with Samza and Kafka
- Knowledge of Azure is an added advantage
- Understanding of Akka and the Play framework
- along with metrics to track their progress
- Managing available resources such as hardware, data, and personnel so that deadlines are met
- Analysing the ML algorithms that could be used to solve a given problem and ranking them by their success probability
- Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world
- Verifying data quality, and/or ensuring it via data cleaning
- Supervising the data acquisition process if more data is needed
- Defining validation strategies
- Defining the pre-processing or feature engineering to be done on a given dataset
- Defining data augmentation pipelines
- Training models and tuning their hyperparameters
- Analysing the errors of the model and designing strategies to overcome them
- Deploying models to production
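The "training models and tuning their hyperparameters" step above can be sketched as a tiny grid search. Everything here is a toy illustration: the model is just a threshold rule and the validation data is invented, but the select-by-validation-score loop is the same shape used with real models and libraries:

```python
# Toy model: predict positive when x >= threshold.
def accuracy(threshold, data):
    """Fraction of (x, label) pairs the threshold rule classifies correctly."""
    return sum((x >= threshold) == label for x, label in data) / len(data)

# Hypothetical validation set of (feature, label) pairs.
val = [(0.2, False), (0.4, False), (0.6, True), (0.9, True)]

# Grid search: evaluate each candidate hyperparameter on held-out data
# and keep the one with the best validation score.
grid = [0.1, 0.3, 0.5, 0.7]
best = max(grid, key=lambda t: accuracy(t, val))
```

With a real model the inner call would be "fit on train, score on validation" (e.g. scikit-learn's `GridSearchCV` automates exactly this loop with cross-validation).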
Hadoop Developer at Object Technology Solutions Inc. (OTSI)
Dear Candidate,
Greetings of the day!
As discussed, please find the job description below.
Job Title: Hadoop Developer
Experience: 3+ years
Job Location: New Delhi
Job Type: Permanent
Knowledge and Skills Required:
Brief Skills:
Hadoop, Spark, Scala and Spark SQL
Main Skills:
- Strong experience in Hadoop development
- Experience in Spark
- Experience in Scala
- Experience in Spark SQL
Why OTSI!
Working with OTSi gives you the assurance of a successful, fast-paced career.
Exposure to infinite opportunities to learn and grow, familiarization with cutting-edge technologies, cross-domain experience and a harmonious environment are some of the prime attractions for a career-driven workforce.
Join us today, as we assure you 2000+ friends and a great career; happiness begins at a great workplace!
Feel free to refer this opportunity to your friends and associates.
About OTSI (CMMI Level 3): Founded in 1999 and headquartered in Overland Park, Kansas, OTSI offers global reach and local delivery to companies of all sizes, from start-ups to Fortune 500s. Through offices across the US and around the world, we provide universal access to exceptional talent and innovative solutions in a variety of delivery models to reduce overall risk while optimizing outcomes and enabling our customers to thrive in a global economy.
OTSI's global presence, scalable and sustainable world-class infrastructure, business continuity processes, and ISO 9001:2000 and CMMI Level 3 certifications make us a preferred service provider for our clients. OTSI has expertise in different technologies, enhanced by our partnerships and alliances with industry giants like HP, Microsoft, IBM, Oracle, and SAP. Object Technology Solutions India Pvt Ltd is a leading global Information Technology (IT) services and solutions company offering a wide array of solutions for a range of key verticals. The company is headquartered in Overland Park, Kansas, and has a strong presence in the US, Europe, and Asia-Pacific, with a Global Delivery Center based in India. OTSI offers a broad range of IT application solutions and services, including e-Business solutions, Enterprise Resource Planning (ERP) implementation and post-implementation support, application development, application maintenance, and software customization services.
OTSI Partners & Practices
- SAP Partner
- Microsoft Silver Partner
- Oracle Gold Partner
- Microsoft CoE
- DevOps Consulting
- Cloud
- Mobile & IoT
- Digital Transformation
- Big data & Analytics
- Testing Solutions
OTSI Honor’s & Awards:
- #91 in the Inc. 5000
- Among the fastest-growing IT companies in the Inc. 5000
JD:
Required Skills:
- Intermediate to expert-level hands-on programming in one of the following languages: Java, Python, PySpark, or Scala.
- Strong practical knowledge of SQL.
- Hands-on experience with Spark/Spark SQL
- Data structures and algorithms
- Hands-on experience as an individual contributor in the design, development, testing, and deployment of applications based on Big Data technologies
- Experience with Big Data tools such as Hadoop, MapReduce, Spark, etc.
- Experience with NoSQL databases like HBase
- Experience with the Linux OS environment (shell scripting, AWK, SED)
- Intermediate RDBMS skills; able to write SQL queries with complex relations on top of a large RDBMS (100+ tables)
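The MapReduce model named in the requirements above can be sketched in a few lines of plain Python. This is a toy word count on invented documents, shown only to illustrate the map (emit key-value pairs) and reduce (aggregate per key) phases that Hadoop and Spark execute across a cluster:

```python
from collections import Counter
from itertools import chain

# Hypothetical "documents" standing in for files in HDFS.
docs = ["big data spark", "spark streaming", "big data hadoop"]

# Map phase: emit a (word, 1) pair for every word in every document.
mapped = chain.from_iterable(((w, 1) for w in d.split()) for d in docs)

# Reduce phase: sum the counts per word (what the shuffle+reduce does at scale).
counts = Counter()
for word, n in mapped:
    counts[word] += n
```

In Spark the same computation is a `flatMap` followed by `reduceByKey`; the single-process version above just makes the two phases explicit.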