Site Reliability Engineer/DevOps

Agency job
3 - 4 yrs
₹15L - ₹30L / yr
Bengaluru (Bangalore)
Skills
DevOps
Python
CI/CD
Linux/Unix
Git
SQL
Amazon Web Services (AWS)
Ansible
MySQL
Kubernetes
Terraform
We are a digital B2B platform that offers loans, working capital, and payment services to small businesses.

Candidates MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.

What you will do (or learn):

1. Build our application stack on AWS using infrastructure as code (read: Terraform).
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build a monitoring and alerting system.

What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (such as Bash, Python, Go, etc.).
5. Good understanding of Linux systems.
6. Strong debugging and troubleshooting skills.
7. Experience in tooling around monitoring, CI/CD, and log management systems.
Users love Cutshort
Read about what our users have to say about finding their next opportunity on Cutshort.
Subodh Popalwar

Software Engineer, Memorres
For 2 years, I had trouble finding a company with good work culture and a role that will help me grow in my career. Soon after I started using Cutshort, I had access to information about the work culture, compensation and what each company was clearly offering.


Similar jobs

Coinfantasy
Posted by Indira Priyadharshini
Chennai
5 - 12 yrs
₹25L - ₹45L / yr
Amazon Web Services (AWS)
Windows Azure
Blockchain
Web3js
DevOps

CoinFantasy is looking for a tech enthusiast working primarily on blockchain technology to be part of the core blockchain team at CoinFantasy. You would be a part of the Roadmap team that is working on the architecture, design, development, and deployment of our decentralised platform.


Your primary responsibilities would be analysing requirements, designing blockchain technology around a certain business model, and writing smart contracts.

  

Job Responsibilities


  • Administer our blockchain, database, and DevOps infrastructure.
  • Collaborate across teams to coordinate safe, efficient releases.
  • Build complex pipelines for databases, messaging, storage, and compute in AWS.
  • Build deployment pipelines with GitHub CI (Actions).
  • Build tools to reduce occurrences of errors and improve our protocols.
  • Develop software to integrate with internal back-end systems.
  • Perform root cause analysis for production errors.
  • Investigate and resolve technical issues.
  • Design procedures for system troubleshooting and maintenance.


Requirements


  • 8+ years of experience working in DevOps, infrastructure, site reliability, or cloud engineering
  • Understanding of the entire tech stack of blockchain dApps
  • Strong experience working with any configuration management tools
  • Languages: any modern programming language
  • Experience working with some of the major public clouds, e.g. AWS, Azure
  • Competent with the “basics”, e.g. computer networking
  • Self-motivated individual with enthusiasm for learning and building things
  • Collaborative, communicative, and confident in their abilities to work well with all team members at all seniority and skill levels
  • Hands-on experience with Rust/Substrate and contributions to open-source blockchain projects are an added advantage






About Us

CoinFantasy is a Play to Invest platform that brings the world of investment to users through engaging games. With multiple categories of games, it aims to make investing fun, intuitive, and enjoyable for users.

It features a sandbox environment in which users are exposed to the end-to-end investment journey without risking financial losses.


Website: https://www.coinfantasy.io/

Benefits

  • Competitive Salary
  • An opportunity to be part of the Core team in a fast-growing company
  • A fulfilling, challenging and flexible work experience
  • Practically unlimited professional and career growth opportunities


CodeCraft Technologies Private Limited
Posted by Priyanka Praveen
Bengaluru (Bangalore), Mangalore
7 - 12 yrs
Best in industry
CI/CD
GitHub
DevOps

Position: SRE/ DevOps

Experience: 6-10 Years

Location: Bengaluru/Mangalore

 

CodeCraft Technologies is a multi-award-winning creative engineering company offering design and technology solutions on mobile, web and cloud platforms.

 

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our systems and applications. You will work closely with the development team to build and maintain scalable infrastructure, implement best practices in CI/CD, and contribute to the overall stability of our technology stack.

 

 

Roles and Responsibilities:

·       CI/CD and DevOps:

o  Implement and maintain robust Continuous Integration/Continuous Deployment (CI/CD) pipelines to ensure efficient and reliable software delivery.

o  Collaborate with development teams to integrate DevOps principles into the software development lifecycle.

o  Experience with pipeline tools such as GitHub Actions, GitLab, Azure DevOps, and CircleCI is a plus.

·       Test Automation:

o  Develop and maintain automated testing frameworks to validate system functionality, performance, and reliability.

o  Collaborate with QA teams to enhance test coverage and improve overall testing efficiency.

·       Logging/Monitoring:

o  Design, implement, and manage logging and monitoring solutions to proactively identify and address potential issues.

o  Respond to incidents and alerts to ensure system uptime and performance.

·       Infrastructure as Code (IaC):

o  Utilize Terraform (or other tools) to define and manage infrastructure as code, ensuring scalability, security, and consistency across environments.

·       Elastic Stack:

o  Implement and manage Elastic Stack (ELK) for log and data analysis to gain insights into system performance and troubleshoot issues effectively.

·       Cloud Platforms:

o  Work with cloud platforms such as AWS, GCP, and Azure to deploy and manage scalable and resilient infrastructure.

o  Optimize cloud resources for cost efficiency and performance.

·       Vulnerability Management:

o  Conduct regular vulnerability assessments and implement measures to address and remediate identified vulnerabilities.

o  Collaborate with security teams to ensure a robust security posture.

·       Security Assessment:

o  Perform security assessments and audits to identify and address potential security risks.

o  Implement security best practices and stay current with industry trends and emerging threats.

o  Experience with tools such as GCP Security Command Center and AWS Security Hub is a plus.

·       Third-Party Hardware Providers:

o  Collaborate with third-party hardware providers to integrate and support hardware components within the infrastructure.


Desired Profile:

·       The candidate should be willing to work in the EST time zone, i.e. from 6 PM to 2 AM.

·       Excellent communication and interpersonal skills

·       Bachelor’s Degree

·       Certifications related to this field shall be an added advantage.


Pune
4 - 8 yrs
₹15L - ₹15L / yr
Amazon Web Services (AWS)
Kubernetes
Ansible
Prometheus
Grafana

Position: Site Reliability Engineer

Location: Pune (currently WFH; relocation required post-pandemic)

 

About the Organization:

A funded product development company, headquartered in Singapore, with offices in Australia, the United States, Germany, the United Kingdom, and India. You will gain work experience in a global environment.

 

Job Description:

We are looking for an experienced DevOps / Site Reliability engineer to join our team and be instrumental in taking our products to the next level.

 

In this role, you will be working on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.

 

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of the role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring, and scaling of components, and optimizing the infrastructure for cost and performance.

 

Day-to-day responsibilities

 

  • Ensure the operational integrity of the global infrastructure
  • Design repeatable continuous integration and delivery systems
  • Test and measure new methods, applications and frameworks
  • Analyze and leverage various AWS-native functionality
  • Support and build out an on-premise data center footprint
  • Support other teams and diagnose issues related to our infrastructure
  • Participate in a 24/7 on-call rotation (if required)

 

Candidate's Profile:

 

 

  • Expert-level administrator of Linux-based systems
  • Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus
  • Experience with production deployments of Kubernetes clusters
  • Experience in automating provisioning and managing hybrid-cloud infrastructure (AWS, GCP, and on-prem) at scale
  • Knowledge of monitoring platforms (Prometheus, Grafana, Graphite)
  • Experience in distributed storage systems such as Ceph or GlusterFS
  • Experience in virtualisation with KVM, oVirt, and OpenStack
  • Hands-on experience with configuration management systems such as Terraform and Ansible
  • Bash and Python scripting expertise
  • Network troubleshooting experience (TCP, DNS, IPv6, and tcpdump)
  • Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
  • Experience managing hundreds to thousands of servers globally
  • Enjoy automating tasks, rather than repeating them
  • Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
  • Strong verbal and written communication skills
  • Ability to adapt to a rapidly changing environment
  • Comfortable collaborating and supporting a diverse team of engineers
  • Ability to troubleshoot problems in complex systems
  • Flexible working hours and ability to participate in 24/7 on-call support with other team members whenever required

Note: We are looking for people from product organizations who can join at the earliest.
Smarsh
Posted by Nichell Dsouza
Bengaluru (Bangalore)
9 - 15 yrs
₹40L - ₹50L / yr
Reliability engineering
Kubernetes
IT infrastructure

Company Description

Smarsh is the leader in communications compliance, archiving, and analytics. We provide compliance across the broadest set of communications channels with insights on what’s being captured. Smarsh customers manage over 500 million daily conversations across 80 channels and growing. Customers include the top 10 U.S., top 8 European, top 5 Canadian, and top 3 Asian banks. The Smarsh advantage is customers stay ahead of compliance and uncover patterns and relationships hidden within their data.

At Smarsh, we’ve been helping our customers manage new forms of communication since 1998. We work closely with regulators including the SEC, FINRA, IIROC, and the PRA and FCA, and with our customers, to ensure that they understand the capabilities of today’s technology and that our platform meets their most stringent requirements. Our products include Connected Capture, Connected Archive, Web Archive & Business Solutions.

 

About the team

Are you an SRE with excellent Observability, Containerization and Orchestration skills? As a Site Reliability Engineer (SRE) in the Smarsh SaaS Operations team, you'll be part of a team who measures and improves production performance reliability through sustainable engineering practices for our suite of applications. Toil will be your number one enemy, observability your closest friend and your mission will be to drive operational burden as close to zero as you can.

Responsibilities

  • Responsible for technical direction at the platform solutions level. Is able to weigh the pros and cons of various solutions and credibly argue for the best path
  • Work closely with Product Management and the rest of the engineering team to define features and their implementations with careful attention to quality, scalability, and maintainability
  • Can break down complex technical solutions into abstractions that the rest of the team can understand
  • Can investigate and solve complex bugs, performance, and scalability issues
  • Collaborates with multiple agile teams to ensure their solutions integrate effectively
  • Track work in ticketing system (JIRA)
  • Participate in Pull Request reviews. Provide and receive feedback to continuously improve.
  • Other duties as assigned.

Desired skills & experience

  • A minimum of 10 years of industry experience
  • Master's in CS or equivalent
  • Must have experience in Azure or AWS, either running some large-scale app there or migrating to Azure/AWS. 
  • Experience operating Cloud Foundry in production environments 
  • Experience managing CI/CD systems (Concourse, Jenkins, TravisCI etc.) 
  • Experience deploying and/or operating ELK stack 
  • Experience with container technologies and orchestration platforms (Docker, Kubernetes, Cloud Foundry) 
  • Experience working with monitoring and observability tools (We use Datadog and New Relic) 
  • Familiarity with working with PostgreSQL and MongoDB 
  • Background working in a multi-platform environment (Linux, Windows) 
  • Experience with running on a cloud platform, AWS preferred (S3, RDS, SQS) 
  • Familiarity with Agile/Scrum/Kanban methodologies 
  • Familiarity with programming/scripting languages (e.g. Python, Bash, PowerShell, Go)

Additional Skills

  • Expert programming skills in relevant languages
  • Exceptional analytical and problem-solving skills
  • Strong communication and collaboration skills
  • Deep understanding of modern software architecture
  • Deep domain knowledge of the industry, platform, and existing processes
  • Fault-tolerant design & maintenance
  • Knowledge and understanding of modern software programming/engineering.
  • Product delivery lifecycle - requirement refinement through ops

 

Why Smarsh?

Ready to join a thriving tech company that’s redefining digital archiving and business intelligence?

Smarsh is the leading comprehensive archiving platform. Recognized as one of today’s fastest growing companies in the U.S., Smarsh delivers innovative cloud-based solutions that help organizations manage and enforce flexible and secure records retention and compliance strategies for electronic communications, including social media and enterprise social networks (Yammer, Chatter, Facebook, LinkedIn and more).

Our motto is ‘People First. Inspire Confidence. Embrace the Impossible.’ We hire lifelong learners who have a passion for their discipline and a track record of excellence. To learn more about us, visit www.smarsh.com/careers

 


Coredgeio
Posted by Abhimanyu Bhatter
Remote, Noida, Bengaluru (Bangalore), NCR (Delhi | Gurgaon | Noida)
6 - 11 yrs
₹16L - ₹25L / yr
Reliability engineering
Docker
Kubernetes
DevOps
Site reliability
What are we looking for:
● Research, propose and evaluate with a 5-year vision, the architecture, design, technologies, processes and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap of the network to transform it to Telco Cloud and be prepared for 5G.
● Using your deep technical expertise, you will provide detailed feedback to Product Management and Engineering, as well as contribute directly to the platform code base to enhance both the Customer experience of the service, as well as the SRE quality of life.
● The individual must be aware of trends in network infrastructure as well as within the network engineering and OSS community. What technologies are being developed or launched?
● The individual should stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the Engineering of Lab and Production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science, or equivalent experience
● Years of experience in an SRE, DevOps, development, and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / RedHat / CentOS / Ubuntu
● Deep knowledge of networking layers 1-4
● Cloud / virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● In-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure, and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus)
● PaaS-level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, and DBaaS (such as MySQL, Cassandra)
About the Organisation:
The team at Coredge.io is a combination of experienced and young professionals with many years of experience in edge computing, telecom application development, and Kubernetes. The company has continuously collaborated with the open source community, universities, and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence, with offices in the US and New Delhi, India.
Nike
Remote only
5 - 10 yrs
₹20L - ₹30L / yr
Splunk
Site reliability
SRE
DevOps
Amazon Web Services (AWS)
CORE - Site Reliability Engineer with Splunk
 
Within Site Reliability Engineering, our goal is to provide technical solutions to complex production problems, with a focus on reducing incident and problem toil, speeding detection and recovery of critical incidents through observability, and continuous improvement through operational health measurement and sharing.
What You Will Work On
A Site Reliability Engineer’s responsibilities in this role include, but are not limited to:
  • Drive reliability throughout the Engineering Organizations through observability, informed architectural improvements, and automation.
  • Collaborate closely with Engineering teams to build cohesive service operation solutions into the overall service design.
  • Build and enhance the DevOps process, environment, and tool chains for high service reliability and availability.
  • Exercise and optimize the service operation process to support the whole service with all partner teams. Mitigate and recover from live site incidents efficiently.
Qualifications
  • Bachelor’s degree in Computer Science, Engineering, Math, Science, or another technical field
  • 2+ years of working experience in the IT industry supporting large-scale applications/services on platforms like Azure/AWS/GCP
  • 3+ years of experience in software development automating business processes using Java, Node, or Python on a cloud platform
  • Experience in supporting highly available and scalable systems, with the ability to debug/troubleshoot live systems
  • Adaptive and flexible in managing multiple tasks with changing priorities
  • Hands-on experience with observability tools like Splunk, New Relic, Azure Monitor, or CloudWatch
  • 2+ years of experience in incident and problem management processes using tools like ServiceNow
Remote, Bengaluru (Bangalore)
3 - 8 yrs
₹15L - ₹30L / yr
Python
Amazon Web Services (AWS)
Ansible
Terraform
Docker
What you’ll do

• Develop and maintain IaC using Terraform and Ansible.
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast-paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.

What we’re looking for

• An AWS Certified Engineer with strong skills in
o Terraform
o Ansible
o *nix and shell scripting
• Preferably with experience in:
o Elasticsearch
o CircleCI
o CloudFormation
o Python
o Packer
o Docker
o Prometheus and Grafana
o Challenges of scale
o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.
OJAS
Hyderabad
5 - 11 yrs
₹10L - ₹20L / yr
site reliability
cloudformation
Terraform
Ansible
Cloud Automation
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
ScienceLogic
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
AWS CloudFormation
cloud automation
site reliability
cloudformation
Ansible
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Shuttl
Posted by Tanika Monga
NCR (Delhi | Gurgaon | Noida)
3 - 6 yrs
₹10L - ₹21L / yr
Terraform
Kubernetes
Ansible
WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of the Site Reliability Engineering team.

HOW DO WE WORK?

Our engineering process is a five-step process consisting of phases for planning, developing, testing & profiling, releasing, and monitoring. The planning phase consists of documenting the feature or task to be done, followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices, and best practice adoption. The development and testing phases coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews, and QA testing. This phase is punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase, where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates with a retrospective.

SREs get involved in this process as early as possible to provide general guidance and recommendations, and to help with designing the application to be in compliance with community standards such as CNCF and 12 Factor. SRE involvement and influence tends to increase during the mid to final stages of development, where the application is primed for beta evaluation and all the tooling and instrumentation is finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix-like systems, access control, networking nuances, process isolation by means of kernel-provided features, distributed applications and algorithms, job schedulers, and secret management, among other things. At Shuttl we are big proponents of immutable infrastructure. All our infrastructure is hosted with Amazon Web Services, and we use HashiCorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definite plus. Since SREs are expected to write a lot of code, you are also expected to be skillful in a programming language, preferably Python or Go.