Roles and Responsibilities
- Managing the availability, performance, and capacity of infrastructure and applications.
- Building and implementing observability for application health, performance, and capacity.
- Optimizing On-call rotations and processes.
- Documenting “tribal” knowledge.
- Managing infrastructure platforms such as Mesos/Kubernetes, CI/CD, observability (Prometheus/New Relic/ELK), cloud platforms (AWS/Azure), databases, and data platform infrastructure.
- Helping onboard new services through the production readiness review process.
- Providing reports on service SLOs, error budgets, alerts, and operational overhead.
- Working with Dev and Product teams to define SLOs, error budgets, and alerts.
- Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
- Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
- Managing outages, doing detailed RCAs with developers, and identifying ways to prevent recurrence.
- Managing/Automating upgrades of the infrastructure services.
- Automating toil.
Experience & Skills
- 6+ years of total experience
- Experience as an SRE/DevOps/Infrastructure Engineer on large-scale microservices and infrastructure.
- A collaborative spirit with the ability to work across disciplines to influence, learn, and
- A deep understanding of computer science, software development, and networking principles.
- Demonstrated experience with languages such as Python, Java, Golang, etc.
- Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
- Extensive experience with DNS, TCP/IP, UDP, gRPC, routing, and load balancing.
- Expertise in GitOps, Infrastructure as Code tools such as Terraform, and configuration management tools such as Chef, Puppet, SaltStack, and Ansible.
- Expertise in Amazon Web Services (AWS) and/or other relevant cloud infrastructure solutions such as Microsoft Azure or Google Cloud.
- Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker,
- Experience managing and deploying containerized environments using Docker and Mesos/Kubernetes is a plus.
• Run the production environment by monitoring availability and taking a holistic view of
• Build software and systems to manage platform infrastructure and applications
• Improve reliability, quality, and time-to-market of our suite of software solutions
• Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
• Provide primary operational support and engineering for multiple large distributed
• Drive cross-team alignment across development teams around reliability initiatives
The ideal candidate must have:
• A Bachelor’s degree in computer science or another highly technical, scientific discipline
• Ability to program (structured and OO) with one or more high level languages, such as
• Good experience with microservices architecture and serverless technologies
• Exposure to event driven architecture and state machines
• A proactive approach to spotting problems, areas for improvement, and performance
Site Reliability Engineer (SRE)
Vonage Engineering Mission: Vonage is the emerging leader in the $100B+ cloud communications platform (CPaaS) market.
Customers like Airbnb, Viber, WhatsApp, Snapchat, and many others depend on our APIs and SDKs to connect with their customers all over the world. As businesses continue to shift to a real-time, customer-centric communications model, we are experiencing a time of impressive growth.
Why this role matters:
Vonage, a leader in cloud communications, is looking to build a new SRE team in Bangalore.
We believe that there shouldn’t be walls between operations and development and we have embraced the DevOps movement.
As a Site Reliability Engineer, you will work as part of the development team to build automation and tools to deploy, monitor, and maintain the platform's health and meet its targeted SLOs and SLAs.
What you'll do
● Lead the effort in ensuring reliability of the platform.
● Create Software and Tooling that improves performance, stability, and reliability of the
● Work as part of a development team.
● Monitor Application Metrics to help with improving software performance.
● Build solutions that are highly resilient, scalable, and secure.
● Have a wide breadth of knowledge across software, infrastructure, and security.
● Adopt best practices and champion an engineering culture emphasizing Agile.
What's required for application
● Proven experience building, supporting, and architecting high-availability cloud
● Experience working on monitoring, logging, and alerting solutions and their associated tools.
● Experience with tooling such as Terraform, Ansible, Docker, Kubernetes, and Chef.
● Fluent and comfortable working with Cloud Infrastructure.
● Ability to read, write, and troubleshoot software code.
● Good understanding of CI/CD tools.
● Champion of DevSecOps using tools such as HashiCorp Vault, KMS, Secrets Manager,
● Experience with software development, algorithms, data structures, and systems design.
● Understanding of monitoring tools such as Datadog, ELK, and Grafana.
● Bachelor's degree (or higher) in Computer Science and/or a related field
Nice to have, but not required
● Working knowledge of other AWS services such as Glacier, Elastic Container Service (ECS), Elastic MapReduce (EMR), DynamoDB, etc.
● Automation and Orchestration tools such as Jenkins
● Ruby or Java development skills
● Data Pipeline knowledge, especially with tools like MapReduce, Kafka and ELK stack
A network of the world's best developers: full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their cloud offerings and capitalize on the cloud. We have our own IoT/AI platform, and we provide professional services on that platform to build custom clouds for our clients' IoT devices. We also build mobile apps and run 24x7 DevOps/site reliability engineering for our clients.
We are looking for a friendly, very hands-on, technical, and dependable professional with plenty of experience as a backend and cloud engineer to provide site reliability services to our internal teams and end customers. We expect you to deliver top quality at high speed. You must have experience developing and designing amazing UI screens.
This person MUST have:
- BE in Computer Science or equivalent
- Cloud app development experience.
- Strong Troubleshooting and debugging skills
- A strong passion for writing simple, clean, and efficient code.
- 3 years of experience with the Django framework and other backend technologies.
- Knowledge of Node.js
- Experience with building, modifying, and extending API endpoints (REST or GraphQL) for data retrieval and persistence.
- Understanding of how to use a database such as Postgres (preferred), SQLite, MongoDB, or MySQL.
- Experience creating high-performance applications.
- Experience with messaging and broker tools such as RabbitMQ and MQTT
- Experience with SQL and NoSQL databases
- Experience with the full software development life cycle, including requirements collection, design, implementation, testing, and operational support.
- Knowledge of web services
- Proficient understanding of code versioning tools such as Git.
- Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
- Experience managing AWS infrastructure.
- Hands-on experience in Linux environment.
- Basic understanding of Kubernetes/Docker orchestration.
- Manage existing infrastructure, pipelines, and engineering tools (on-prem or AWS) for the engineering team (build servers, Jenkins nodes, etc.)
- Experience with scrum or other agile software development methodology.
- Excellent verbal and written communication, teamwork, decision making and influencing skills.
- Handle customer calls/emails regarding technical issues for end-users.
- Strong communication skills
- Attention to detail.
- Minimum 3 years of experience
- Ahmedabad office or work from home
- 40 hours a week with a rotational shift every month.
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives, etc.
- We don't believe in locking people in with long notice periods. You will stay here because you love the company. We have only a 30-day notice period.
Founded by a passionate team of serial entrepreneurs and alumni of IIT Delhi, UC Berkeley, and well-known tech companies such as Uber and Zomato.
Sourcewiz is on a mission to increase India’s export GDP. This is a unique opportunity to join a funded early-stage startup and have a massive impact on our product, culture, and direction. It's a lot of work and a roller coaster ride. But if you are up for it, you can join us in replacing the tiresome and slow sales process for importers and exporters and have a significant impact on our customers. We are not a company that believes engineers should be hidden away from decisions, churning out code for features decided upon from on high. Instead, our engineers form strong bonds with cross-functional peers in Product Management, Product Design, and other functions to become experts in their product domain.
We’re looking for people who have a strong interest in building successful products or systems, are comfortable dealing with lots of moving pieces, have exquisite attention to detail, and are comfortable learning new technologies and systems.
As a Site Reliability Engineer at Sourcewiz, you will...
• Own and improve the scalability and reliability of our products
• Work directly with the product engineering team
• Work with RDBMS, search, caching, and queuing
• Contribute expertise towards architectural planning and ensure the company builds sustainable services that meet our customers' expectations while leveraging appropriate tools and frameworks.
• Ongoing participation in the review and testing
JD: Site Reliability Engineers
Location: Pune, Remote
Sarvaha would like to welcome experienced SRE specialists with a minimum of 5 years of professional experience in Google Cloud Platform or AWS-based deployments and automation. Sarvaha is a niche software development company that works with some of the best-funded startups and established companies across the globe. You will be expected to work with a globally distributed team and contribute independently as well as lead a team of engineers. This is a hands-on position that requires you to be responsible for production software deployments across global availability zones.
- Design, write and run services that provide visibility into a leading IoT platform & underlying services
- Automate deployments, diagnostic and debugging tools
- Participate in on-call rotations
- Adhere to industry-standard security best practices
- Work with other teams in troubleshooting and keeping the systems up and running
- Minimum Bachelor’s Degree in Computer Science or related degree
- Minimum 5+ years of total experience, with at least 4 years of experience in an SRE, DevOps, or similar role. More experience is highly desired
- 4+ years of hands-on experience with one of AWS/Azure/GCP is a must for this position
- 1+ years of experience debugging code written in Python, Java or any strongly typed language
- 3+ years of experience with Kubernetes, Prometheus, ELK, Grafana, Nagios
- 2+ years of experience with Jenkins or similar build and deploy orchestration tool
- 2+ years of experience with RDBMS and NoSQL databases (MySQL, Oracle, Cassandra, CDH)
- 1+ years of experience writing infrastructure as code using Terraform
- Excellent verbal and written communication and strong interpersonal skills are requisite for success in this position
- Strong listening and interpersonal skills and attention to detail are highly desired
- Top-notch remuneration with non-linear growth
- Work with the industry's best cloud architects, DevOps team, and developers
- Excellent, no-nonsense work environment with the very best people to work with
- Cutting-edge work with Fortune 500 businesses, learning from high-visibility, public-facing, high-traffic systems
The Data Lead role is responsible for handling the data journey in a product, including data security, data acquisition/retrieval, data massaging, etc.
How You Will Make an Impact:
Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
Ensuring that Innovapptive products are data-rich and data-efficient.
What You Bring to the Team:
A seasoned data engineer with a solid understanding of how data-rich SAAS products retrieve and consume data.
To be successful in this role, we believe that you need to possess the following attributes.
- Bachelor's degree in IT or Computer Engineering, or an equivalent degree in Computer Science
- 7-12 years of relevant experience
- This position addresses cloud data operations and classical database developer needs.
- Cloud Data Operations: hands-on experience with cloud data services on AWS (AWS RDS for MySQL and SQL Server) and knowledge of the latest cloud database services such as Aurora Serverless.
- Hands-on experience in designing stable, reliable, and effective databases, including:
- Provisioning cloud (AWS) DB services.
- Installing DB servers on AWS (IAAS model).
- Object, block, and file storage (S3, EBS, EFS, etc.)
- Optimizing DB services.
- Performance tuning, DB service optimization.
- Building fault-tolerant cloud data services.
- Experience with NoSQL technologies (e.g., DocumentDB): creating, maintaining, and consuming them on the cloud (AWS)
- Cloud Data security
- Hands-on experience with handling large data sets/transactions and operations.
- Exposure to data analytics and associated tools (e.g., Athena)
- Experience in handling data strategies and data life cycles in SaaS products.
- Exposure to cloud (AWS) networking.
- Query planning and optimization.
- Knowledge of GDPR, physical/logical/conceptual data segregation in multi-tenant applications.
- Data Modeling
- Enforcing appropriate security compliance in customer environments, as agreed with the client’s Information Security Council
- Excellent verbal and written communication skills
(deployment, troubleshooting, maintenance, Helm charts) and deployment and administration of one or more of: ELK stack, Kafka, Prometheus, or Grafana, with working knowledge of at least one cloud platform (GCP, AWS, or Azure) and some configuration management system (such as Salt or Ansible). A good understanding of networking concepts (architecture, components, protocols) and a solid understanding of OS concepts and Linux internals are a must.
Biostrap is based in Los Angeles, California, with our team working remotely in several countries around the globe. This is a remote position; you’ll need a computer and a high-speed internet connection.
We are looking for tough, warrior-spirited, always-learning Sr. DevOps Engineers to take care of our infrastructure and site reliability at Biostrap. As an engineer at Biostrap, you will be part of a lean but extremely passionate team of engineers and work towards making and keeping Biostrap the go-to health platform.
Responsibilities: What would the job be like?
- Work closely with the engineering team to deploy and maintain the infrastructure.
- Add automation at every part of the development and deployment lifecycle.
- Analyze and help in Infrastructure cost optimizations.
- Build and work with CI/CD workflows.
- Build a robust observability system for system monitoring and tracing.
- Architect scalable logging servers.
- Add extensive alerting for important issues and events using monitoring and logging services.
- Work with other engineers in developing architecture that is scalable and resilient to changes in product requirements and usage in an agile environment.
- Security hardening of cloud infrastructure against known and unknown vulnerabilities.
- Write Infrastructure as Code for most of the cloud.
- Suggest and implement pragmatic changes to infrastructure to increase performance, resilience, and availability, and to future-proof the infrastructure.
- Build auditing systems for resource access, along with a breach detection and notification system.
- Do periodic security reviews and implement improvements.
- Be in charge of and manage deployments of various services.
- Work with AWS resources, containers, and systems such as Ansible, EKS, and Kubernetes.
Qualifications: Who should apply for this role?
- You have 3+ years of experience working in small to medium-size teams building and shipping products.
- Good experience managing various AWS resources.
- Well equipped with Linux and Bash/Shell scripting
- Working knowledge of Docker or container management.
- Have some development experience with Kubernetes.
- You spin up containers as if it's your fantasy war ground.
- Understand deployment tools like Ansible or similar.
- You have built and worked with CI/CD systems such as GitLab CI, Jenkins, CircleCI, Travis, etc.
- Working knowledge of Git for version control.
- Experience with database management and security.
- Experience with Terraform for Infrastructure as Code.
- Knowledge of configuration management and secrets/keys management services like AWS KMS, Vault etc.
- Required to be proficient in English (both speaking and writing).
Brownie Points for (:D):
- You already use Biostrap and have plenty of feedback to provide.
- You can lecture developers on scalable infrastructures.
- You have built or worked with Prometheus, Grafana, ELK systems.
- You have a story to tell about how you managed a failure or were part of a disaster recovery.
- You contribute to Open Source projects or have a good Github/GitLab presence to showcase your past projects.
- You have sent your code to Space and it runs “a” Rover on Mars. :P
- We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
- A strong sense of Being an Owner and Wearing the Customer Shoes, with the ability to Empower Others, demonstrated through clear communication and collaboration.
- Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
- Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs towards high availability, reliability, and DR goals.
- DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
- Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
- Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
- 5+ years of experience in a Software and/or Site Reliability Engineering role
- Experience writing automation code in GoLang, Python or Java
- Experience developing and operating large scale distributed systems with Kubernetes and Docker
- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
- Experience running public cloud environments on AWS
- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
- Bachelor's degree in Engineering, Computer Science, or equivalent experience
- The ability to lead, partner, and collaborate cross-functionally across an engineering organization
• Develop and maintain IaC using Terraform and Ansible
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast-paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.
What we’re looking for
• An AWS Certified Engineer with strong skills in
o *nix and shell scripting
• Preferably with experience in:
o CircleCI
o Prometheus and Grafana
o Challenges of scale
o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.