Senior Engineer - Cloud Reliability

at Searce Inc

Posted by Reena Bandekar
Pune
5 - 8 yrs
₹10L - ₹17L / yr
Full time
Skills
DevOps
Terraform
Ansible
Puppet
Reliability engineering
Docker
Software deployment
Application server
IT infrastructure
Technical support
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
Experience :
● 4-8 years experience in Cloud Infrastructure and Operations domains
● Experience with Linux systems and/or Windows servers
● Specialize in one or two cloud deployment platforms: AWS, GCP, Azure
● Hands-on experience with AWS services (EKS, ECS, EC2, VPC, RDS, Lambda) and GCP services (GKE, Compute Engine)
● Experience with one or more programming languages (Python, JavaScript, Ruby, Java, .NET)
● Good understanding of Apache Web Server, Nginx, MySQL, MongoDB, Nagios
● Logging and Monitoring tools (ELK, Stackdriver, CloudWatch)
● DevOps Technologies
● Knowledge of configuration management tools such as Ansible, Terraform, Puppet, Chef
● Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos)

About Searce Inc

Searce is a cloud, automation & analytics led process improvement company helping futurify businesses. Searce is a premier partner for Google Cloud for all products and services. Searce is the largest Cloud Systems Integrator for enterprises, with the largest number of enterprise Google Cloud clients in India.

 

Searce specializes in helping businesses move to the cloud, build on the next-generation cloud, and adopt SaaS, helping them reimagine the ‘why’ and redefine ‘what’s next’ for workflows, automation, machine learning and related futuristic use cases. Searce was recognized by Google as one of its top partners for 2015 and 2016.

 

Searce's organizational culture encourages making mistakes and questioning the status quo, which allows us to specialize in simplifying complex business processes and to use a technology-agnostic approach to create, improve and deliver.

 

Founded
2004
Type
Products & Services
Size
100-1000 employees
Stage
Profitable

Similar jobs

Site Reliability Engineer/DevOps

at Digital B2B Platform

Agency job
via Jobdost
DevOps
Python
CI/CD
Linux/Unix
Git
SQL
Amazon Web Services (AWS)
Ansible
MySQL
Kubernetes
Terraform
Bengaluru (Bangalore)
3 - 4 yrs
₹15L - ₹30L / yr
We are a digital B2B platform that offers loans, working capital, and payment services to small businesses.

Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.

What you will do (or learn) : 

1. Build our application stack on AWS. Infrastructure as code (read Terraform)
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build a monitoring and alerting system.

What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (like Bash, Python, Go, etc.)
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems. 
Job posted by
Shalaka ZawarRathi

Senior Site Reliability Engineer

at One of the largest Equity broking House in India

Agency job
via HyrHub
Reliability engineering
SRE
DevOps
Amazon Web Services (AWS)
Ansible
Terraform
Kubernetes
Git
helm
Mumbai, Bengaluru (Bangalore)
4 - 8 yrs
₹15L - ₹20L / yr
Common roles and responsibilities:
● Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers.
● Run the production environment by monitoring availability and taking a holistic view of system health.
● Build and implement services to make IT and support better at their jobs.
● Improve reliability, quality, and time-to-market of our suite of software solutions.
● Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
● Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
● Experience working in an agile development environment.
● Participate in system design consulting, platform management, and capacity planning.
● Balance feature development speed and reliability with well-defined service level objectives.
Required Skills and Qualifications:
● 3+ years of experience working within DevOps or SRE teams.
● 3+ years of experience with AWS Cloud.
● Ability to program (structured and OO) in one or more high-level languages, such as Python, Go, Java, and JavaScript.
● Must have experience with Ansible, Helm, Terraform and Kubernetes.
● Document every action so your findings turn into repeatable actions, and then into automation.
● Hands-on experience with a distributed version control system such as Git, AWS CodeCommit or equivalent.
● Know your way around Linux and the Unix shell.
● Experience or familiarity with the ELK stack.
● Ability to use Azure DevOps.
● Experience with distributed storage technologies like NFS, Ceph, S3 as well as dynamic resource management frameworks (Mesos, Kubernetes).
● A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Job posted by
Ashwitha Naik

Site Reliability Engineer

at Vonage (An Ericsson Company)

Agency job
via AVI Consulting LLP
Terraform
Chef
Ansible
Docker
Kubernetes
CI/CD
KMS
HashiCorp Vault
Grafana
ELK
Datadog
Remote only
4 - 12 yrs
₹15L - ₹25L / yr
http://www.vonage.com

Site Reliability Engineer (SRE)
Vonage Engineering Mission: Vonage is the emerging leader in the $100B+ cloud communications platform (CPaaS) market.

Customers like Airbnb, Viber, WhatsApp, Snapchat, and many others depend on our APIs and SDKs to connect with their customers all over the world. As businesses continue to shift to a real-time, customer-centric communications model, we are experiencing a time of impressive growth.

Why this role matters:
Vonage, a leader in cloud communications, is looking to build a new SRE team in Bangalore.

We believe that there shouldn’t be walls between operations and development and we have embraced the DevOps movement.

As a Site Reliability Engineer, you will work as part of the development team to build automation and tools to deploy, monitor and maintain the platform's health and meet targeted SLOs and SLAs.

What you'll do
● Lead the effort in ensuring reliability of the platform.
● Create software and tooling that improves performance, stability, and reliability of the platform.
● Ability to work as part of a Development Team.
● Monitor Application Metrics to help with improving software performance.
● Build solutions that are highly resilient, scalable, and secure.
● Have a wide breadth of knowledge from software, infrastructure, and security.
● Adopt best practices and champion an engineering culture emphasizing Agile.
What's required for application
● Proven experience building, supporting, and architecting high-availability cloud infrastructure.
● Experience working on monitoring, logging, and alerting solutions and the tools they use.
● Experience with tooling such as Terraform, Ansible, Docker, Kubernetes, and Chef.
● Fluent and comfortable working with Cloud Infrastructure.
● Ability to read, write, and troubleshoot software code.
● Good understanding of CI/CD tools.
● Champion of DevSecOps using tools such as HashiCorp Vault, KMS, and Secrets Manager.
● Experience with software development, algorithms, data structures, and systems design.
● Understand monitoring tools such as Datadog, ELK, and Grafana.
● Bachelor's degree (or higher) in Computer Science and/or related work experience.

www.vonage.com

Nice to have, but not required
● Working knowledge of other AWS services like Glacier, Elastic Container Service (ECS), Elastic MapReduce (EMR), DynamoDB, etc.
● Automation and Orchestration tools such as Jenkins
● Ruby or Java development skills
● Data Pipeline knowledge, especially with tools like MapReduce, Kafka and ELK stack
Job posted by
Ashesh Shah

Site Reliability Engineer

at A startup company providing AI based software platforms

Agency job
via zyoin
Site Reliability
DevOps
Docker
Kubernetes
Python
Amazon Web Services (AWS)
Reliability engineering
Remote, Bengaluru (Bangalore)
3 - 7 yrs
₹10L - ₹30L / yr

Who You Are

  • Creative thinker and strong problem solver with meticulous attention to detail
  • Highly organized, creative, motivated, and passionate about achieving results
  • Able to balance multiple tasks and projects effectively and quickly adapt to new situations and technologies
  • Able to work both independently and as part of a team
  • Systematic problem-solver, coupled with a strong sense of ownership and drive

 

What you need

  • 3-7 years of experience as a Site Reliability Engineer or in a combined software engineering and DevOps role.
  • Strong hands-on knowledge of Linux fundamentals, System administration scripting, performance tuning/scalability, troubleshooting.
  • Write great quality code using SOLID principles including unit and integration tests.
  • Hands-on development experience in an object-orientated programming language like Python.
  • Hands-on experience developing task automations
  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
  • Familiarity with software development tools: source code management (SCM systems), code review systems, issue tracking tools, build tools, test frameworks, code quality tools.
  • Experience implementing open-source observability and alerting tools, like Prometheus, Grafana, Cortex, Thanos, Alertmanager, etc.
  • Decent knowledge of networking (VPC, VNet, DNS, etc.) and of the TCP/IP stack, internet routing and load balancing.
  • Experience with log and configuration management tools
  • Prior experience of working with AWS, Azure, GCP is a plus
  • Prior experience of working with Kubernetes, Docker and containers is a plus
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Documenting your work should be in your DNA

 

What you get

  • A chance to develop and build something (probably from scratch) which you can be proud of
  • Build and Implement modern systems observability solutions including monitoring, alerting, metrics, logging, and APM & distributed tracing.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
  • Closely work with the software engineering team to ensure accurate monitoring and metrics are being built into applications before going to production.
  • Develop and maintain software modules for use and re-use in cloud and on-premise systems automation.
  • Identify process gaps and implement process improvements to increase operational reliability
  • Drive standardization efforts across the services, infrastructure, systems, and practices
  • Develop systems and tools to help the development team uphold reliability principles
Job posted by
RAKESH RANJAN
DevOps
Terraform
Ansible
CI/CD
Linux administration
Kubernetes
Amazon Web Services (AWS)
Puppet
Chef
Python
Java
Go Programming (Golang)
Bengaluru (Bangalore)
6 - 11 yrs
₹20L - ₹38L / yr

 

Roles and Responsibilities

  • Managing Availability, Performance, Capacity of infrastructure and applications.
  • Building and implementing observability for applications health/performance/capacity.
  • Optimizing On-call rotations and processes.
  • Documenting “tribal” knowledge.
  • Managing Infra-platforms like Mesos/Kubernetes, CI/CD, Observability (Prometheus/New Relic/ELK), Cloud Platforms (AWS/Azure), Databases, Data Platforms Infrastructure
  • Providing help in onboarding new services with production readiness review process.
  • Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
  • Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
  • Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
  • Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
  • Managing outages, doing detailed RCA with developers, and identifying ways to avoid a recurrence.
  • Managing/automating upgrades of the infrastructure services.
  • Automating toil.

Experience & Skills

  • 6+ years of total experience
  • Experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
  • A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.
  • A deep understanding of computer science, software development, and networking principles.
  • Demonstrated experience with languages such as Python, Java, Golang, etc.
  • Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
  • Extensive experience in DNS, TCP/IP, UDP, gRPC, routing and load balancing.
  • Expertise in GitOps, Infrastructure as Code tools such as Terraform, and configuration management tools such as Chef, Puppet, SaltStack and Ansible.
  • Expertise in Amazon Web Services (AWS) and/or other relevant cloud infrastructure solutions like Microsoft Azure or Google Cloud.
  • Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.
  • Experience in managing and deploying containerized environments using Docker, Mesos/Kubernetes is a plus.

Job posted by
RAKESH RANJAN

Site Reliability Engineer

at Market Pulse

Founded 2016  •  Product  •  20-100 employees  •  Profitable
Linux administration
Troubleshooting
Distributed Systems
Network Security
Infrastructure management
Google Cloud Platform (GCP)
Remote only
3 - 10 yrs
₹10L - ₹50L / yr

We are looking for site reliability engineers or system admins who have experience and expertise in managing production servers at scale.

You will be

‣ Helping us meet 99.999% availability across all our systems

‣ Working on Linux, Google Cloud, Netmagic and occasionally on Windows

‣ Required to understand or pick up distributed systems, security, networking, Linux and infrastructure automation

‣ Designing, configuring, securing, monitoring, troubleshooting and maintaining our core production infrastructure

‣ Working closely with developers and automation engineers


That’s it.

Job posted by
Prabu Selvan

Site Reliability Engineer - Product

at A listed product development organization

Agency job
via RS Consultants
Amazon Web Services (AWS)
Kubernetes
Ansible
Prometheus
Grafana
Pagerduty
EKS
Pune
4 - 8 yrs
₹15L - ₹15L / yr

Position: Site Reliability Engineer

Location: Pune (Currently WFH, post pandemic you need to relocate)

 

About the Organization:

A funded product development company, headquartered in Singapore with offices in Australia, United States, Germany, United Kingdom, and India. You will gain work experience in a global environment.

 

Job Description:

We are looking for an experienced DevOps / Site Reliability engineer to join our team and be instrumental in taking our products to the next level.

 

In this role, you will be working on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.

 

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of the role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring and scaling of components, and optimizing the infrastructure for cost and performance.

 

Day-to-day responsibilities

 

  • Ensure the operational integrity of the global infrastructure
  • Design repeatable continuous integration and delivery systems
  • Test and measure new methods, applications and frameworks
  • Analyze and leverage various AWS-native functionality
  • Support and build out an on-premise data center footprint
  • Provide support to other teams and diagnose issues related to our infrastructure
  • Participate in 24/7 on-call rotation (If Required)

 

Candidate's Profile:

 

 

  • Expert-level administrator of Linux-based systems
  • Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus.
  • Experience with production deployments of Kubernetes clusters
  • Experience in automating provisioning and managing Hybrid-Cloud infrastructure (AWS, GCP and On-Prem) at scale.
  • Knowledge of monitoring platforms (Prometheus, Grafana, Graphite).
  • Experience in distributed storage systems such as Ceph or GlusterFS.
  • Experience in virtualisation with KVM, oVirt and OpenStack.
  • Hands-on experience with configuration management systems such as Terraform and Ansible
  • Bash and Python Scripting Expertise
  • Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
  • Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
  • Experience managing hundreds to thousands of servers globally
  • Enjoy automating tasks, rather than repeating them
  • Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
  • Strong verbal and written communication skills
  • Ability to adapt to a rapidly changing environment
  • Comfortable collaborating and supporting a diverse team of engineers
  • Ability to troubleshoot problems in complex systems
  • Flexible working hours and ability to participate in 24/7 on call support with other team members whenever required.
***** Looking for people from product organizations, who can join at the earliest.
Job posted by
Biswadeep RS

Senior DevOps Engineer

at Biostrap

Founded 2016  •  Products & Services  •  20-100 employees  •  Bootstrapped
Amazon Web Services (AWS)
DevOps
Terraform
Kubernetes
Python
Go Programming (Golang)
Shell Scripting
Javascript
Docker
Ansible
System Administration
Elastic Search
Monitoring
Amazon RDS
MySQL
SQL
Prometheus
ELK
Grafana
Remote only
4 - 10 yrs
₹12L - ₹30L / yr

Hey there!

 

Biostrap is based in Los Angeles, California with our team working remotely in several countries around the globe. This is a remote position, you’ll need a computer and a high speed internet connection.

 

We are looking for the tough kind, the warrior kind: always-learning Sr. DevOps Engineers to take care of our infrastructure and site reliability at Biostrap. As an engineer at Biostrap, you will be part of a lean but extremely passionate team of engineers, working towards making and keeping Biostrap the go-to health platform.

 

Responsibilities: What would the job be like?

  • Work closely with the engineering team to deploy and maintain the infrastructure.
  • Add automation at every part of the development and deployment lifecycle.
  • Analyze and help in Infrastructure cost optimizations.
  • Build and work with CI + CD workflows.
  • Build a robust observability system for system monitoring and tracing.
  • Architect scalable logging servers.
  • Add extensive alerting systems for various important issues and events using monitoring and logging services.
  • Work with other engineers in developing architecture that is scalable and resilient to changes in product requirements and usage in an agile environment.
  • Security Hardening of cloud infrastructure against known/unknown vulnerabilities
  • Write Infrastructure as Code for most of the cloud.
  • Suggest and implement pragmatic changes to infrastructure to increase performance, resilience and availability, and to foolproof the infrastructure for the future.
  • Build auditing systems for various resource accesses and have a breach detection notification system.
  • Do periodic security reviews and implement improvements.
  • Be in charge of and manage deployments of various services.
  • Work with AWS resources, containers and systems like Ansible/EKS/Kubernetes.

 

Qualifications: Who should apply for this role?

  • You have 3+ years of working in small to medium size teams building and shipping products.
  • Strong grasp of at least one scripting or systems language such as Python, JavaScript, Golang, etc.
  • Good experience managing various AWS resources.
  • Well equipped with Linux and Bash/Shell scripting
  • Working knowledge of Docker or container management.
  • Have some development experience with Kubernetes.
  • You spin out containers as if it's your fantasy war ground. 
  • Understand deployment tools like Ansible or similar.
  • Built and worked with CI+CD systems like GitLab CI, Jenkins, CircleCI, Travis, etc.
  • Working knowledge of Git for version control.
  • Experience with database management and security.
  • Experience with Terraform for Infrastructure as Code.
  • Knowledge of configuration management and secrets/keys management services like AWS KMS, Vault etc.
  • Required to be proficient in English (both speaking and writing).

 

 

Brownie Points for (:D):

  • You already use Biostrap and have plenty of feedback to provide.
  • You can lecture developers on scalable infrastructures.
  • You have built or worked with Prometheus, Grafana, ELK systems.
  • You have a story to tell about how you managed a failure or were part of a disaster recovery.
  • You contribute to Open Source projects or have a good GitHub/GitLab presence to showcase your past projects.
  • You have sent your code to Space and it runs “a” Rover on Mars. :P
Job posted by
Anirban Das
site reliability
cloudformation
Terraform
Ansible
Cloud Automation
Software Development
AWS CloudFormation
Algorithms
Data Structures
Python
Powershell
DynamoDB
MySQL
Hyderabad
5 - 11 yrs
₹10L - ₹20L / yr
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows, Linux administration skills
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Job posted by
Pradeep Kumar Burra
AWS CloudFormation
cloud automation
site reliability
cloudformation
Ansible
Terraform
Cloudformation
Amazon Web Services (AWS)
Python
JIRA
Perl
Powershell
Bash
Groovy
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows, Linux administration skills
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Job posted by
Mohammad Farooq Shaik