Experience automating systems engineering tasks.
Experience in fast-paced and dynamic SRE or Production Support engineering teams.
A proven track record of managing successful complex internet-based product platforms/architectures.
Experience building metrics and monitoring platforms and defining alerting strategies.
Strong analytical ability with a focus on making data driven decisions.
Capable of technical deep-dives, yet verbally and cognitively agile enough to hold their own in a strategy discussion with senior technical or executive leadership.
Experience working in a managed services environment.
Good communication skills, both written and oral.
Solid understanding of Engineering, DevOps and cloud computing fundamentals.
Good understanding of cloud services including AWS.
Strong automation and CI / CD experience.
Solid experience with containerized applications/orchestration and serverless functions.
Experience with GitHub and CI/CD tools.
If I asked your previous team members about you, they would say you were a great leader and they would very much welcome an opportunity to work for you once again.
Experience in high SLA environments.
A degree in Computer Science, Engineering, or the Sciences, or equivalent work experience, is required.
Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.
What you will do (or learn):
1. Build our application stack on AWS using infrastructure as code (read: Terraform).
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build the monitoring and alerting system.
What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and of managing other distributed systems.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (such as Bash, Python, Go, etc.).
5. Good understanding of Linux systems.
6. Strong debugging and troubleshooting skills.
7. Experience with tooling for monitoring, CI/CD, and log management systems.
Sourcewiz was founded by a passionate team of serial entrepreneurs and alumni of IIT Delhi, UC Berkeley, and well-known tech companies such as Uber and Zomato.
Sourcewiz is on a mission to increase India’s export GDP. This is a unique opportunity to join a funded early-stage startup and have a massive impact on our product, culture, and direction. It's a lot of work and a roller coaster ride. But if you are up for it, you can join us in replacing the tiresome and slow sales process for importers and exporters and have a significant impact on our customers. We are not a company that believes engineers should be hidden away from decisions, churning out code for features decided from on high. Instead, our engineers form strong bonds with cross-functional peers in Product Management, Product Design, and other teams to become experts in their product domain.
We’re looking for people who have a strong interest in building successful products or systems, are comfortable dealing with lots of moving pieces, have exquisite attention to detail, and are comfortable learning new technologies and systems.
As a Site Reliability Engineer at Sourcewiz, you will...
• Own and improve the scalability and reliability of our products
• Work directly with the product engineering team
• Work with RDBMS, search, caching, and queuing systems
• Contribute expertise to architectural planning and ensure the company builds sustainable services that meet our customers' expectations while leveraging appropriate tools and frameworks.
• Participate in ongoing review and testing
Nvizion Solutions is hiring for the position of Site Reliability Engineer.
If interested, please share your resume along with your contact details.
Title: Site Reliability Engineer
No. of job openings: 2
Location: Gurgaon/Hyderabad/Bengaluru/Mumbai/Chennai (remote)
Remuneration: Best in the industry
· Experience required: 2 to 4 years in the industry
· Ensuring overall system reliability
· Adding automation and alerting to the system
· Providing troubleshooting support
· Cross-team communication: working closely with the Product and Customer Success teams
· Proactive support to ensure the system returns to a healthy state
· R&D on new tools/technologies to support the product and support teams
· Good verbal/written communication skills to connect with clients
· Good team player with a zeal to learn new technologies
· The candidate will be part of the team responsible for 24x7 monitoring of a distributed global platform.
- Linux Scripting
- CI/CD knowledge (Jenkins/Bitbucket Pipelines/GitOps)
- Version Control
- Cloud platform knowledge (GCP/AWS/Azure/DigitalOcean)
- Docker, Kubernetes
SRE - Tech Lead (DevOps):
Location: Permanent work-from-home option
Notice period: Candidates with a notice period of 30 days or less are preferred.
SRE-DevOps- Tech Lead - JD:
Srijan is hiring for Site Reliability Engineering (SRE). We are looking for an SRE/DevOps Tech Lead or Sr. Tech Lead with strong automation skills and a good understanding of how to build and run secure, reliable platforms for cloud-native applications. Please go through the detailed job description below for reference:
Minimum Experience: 6+ years in DevOps/SRE
Permanent WFH option
The focus of this role is to build scalable, resilient, secure infrastructure for cloud-native applications while automating every mundane task you can think of, building observability dashboards, setting up alerts, etc., to provide visibility to relevant stakeholders. In a nutshell: “You are the keepers of production environments.” You must be a problem solver with the ability to multitask, and you must bring strong collaboration and communication skills.
Proactively monitor and review application performance
Handle on-call and emergency support
Ensure software has good logging and diagnostics
Create and maintain operational runbooks
Contribute to solution design and the evaluation of technical debt
Set the right practices for a well-defined architecture and to minimize toil.
Own SLI and SLO configuration in line with the error budget
Maintain production services through measuring and monitoring availability, latency, and overall system health.
Practice sustainable incident response and blameless postmortems.
Don't be afraid to contribute changes back to the software engineering team to improve the systems.
Manage the delivery pipeline into production.
Able to mentor junior members on a regular basis
Troubleshooting issues with web applications
Understanding of security principles and best practices
Ensuring that critical data is backed up
Configuration of monitoring systems including infrastructure monitoring and Application Performance Monitoring systems such as New Relic.
Ensuring that web application infrastructure is built
Ability to act as Customer Technical Advocate and negotiate well with peers on technical fronts.
Flexible enough to work different shifts as business requirements demand
Ability to handle multiple global clients on the tech front and generate the reports needed to represent the health of SRE delivery.
A key skill of an SRE Tech Lead is deep knowledge of the application, the code, and how it runs, is configured, and scales. That knowledge is what makes them so valuable at monitoring and supporting it as site reliability engineers.
System administration, security, and networking
The SRE Tech Lead is expected to have a good understanding of system administration (Linux or Windows) and networking.
User and Group Management
Knowledge of networking concepts (DNS, TCP/IP, and Firewalls)
Good grasp of fundamental security concepts
Good understanding of infrastructure as code principles.
Knowledge of a scripting language such as Bash
Ability to configure infrastructure using a Configuration Management technology such as Puppet, Chef, or Ansible.
Familiarity with Jenkins or any other CI/CD tool
Proficiency in a high-level programming language such as Python or Go.
Understanding of container technologies such as Docker, Kubernetes
2+ years of hands-on experience with container orchestration technologies such as ECS, EKS, AKS, or Kubernetes would be beneficial.
Use Terraform and other IaC tools to deploy cloud infrastructure.
Experience designing available, cost-efficient, fault-tolerant, and scalable distributed systems on AWS/Azure
Hands-on experience using compute, networking, storage, and database AWS/Azure services
4+ years of hands-on experience with AWS/Azure deployment and management services
Ability to identify and define technical requirements for an AWS/Azure-based application
Ability to identify which AWS/Azure services meet a given technical requirement
Knowledge of recommended best practices for building secure and reliable applications on the AWS/Azure platform
An understanding of the AWS/Azure global infrastructure
An understanding of network technologies as they relate to AWS/Azure
An understanding of security features and tools that AWS/Azure provides and how they relate to traditional services
A network of the world's best developers: full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their cloud offerings and capitalize on the cloud. We have our own IoT/AI platform, and we provide professional services on that platform to build custom clouds for our clients' IoT devices. We also build mobile apps and run 24x7 DevOps/site reliability engineering for our clients.
We are looking for a friendly, very hands-on, technical, and dependable professional with plenty of experience as a backend and cloud engineer to provide site reliability services to our internal teams and end customers. We expect you to deliver with top quality and high speed. You must have experience developing and designing amazing UI screens.
This person MUST have:
- BE in Computer Science or equivalent
- Cloud app development experience.
- Strong Troubleshooting and debugging skills
- A strong passion for writing simple, clean, and efficient code.
- 3 years of experience with the Django framework and other backend technologies.
- Knowledge of Node.js
- Experience with building, modifying, and extending API endpoints (REST or GraphQL) for data retrieval and persistence.
- Understanding of how to use a database such as Postgres (preferred), SQLite, MongoDB, or MySQL.
- Experience creating high-performance applications.
- Experience with messaging and broker tools: RabbitMQ, MQTT
- Experience with SQL and NoSQL databases
- Experience with the full software development life cycle, including requirements collection, design, implementation, testing, and operational support.
- Knowledge of web services
- Proficient understanding of code versioning tools such as Git.
- Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
- Experience managing AWS infrastructure.
- Hands-on experience in Linux environment.
- Basic understanding of Kubernetes/Docker orchestration.
- Manage existing infrastructure/pipelines/engineering tools (on-prem or AWS) for the engineering team (build servers, Jenkins nodes, etc.)
- Experience with Scrum or another agile software development methodology.
- Excellent verbal and written communication, teamwork, decision making and influencing skills.
- Handle customer calls/emails regarding technical issues for end-users.
- Strong communication skills
- Attention to detail.
- Minimum 3 years of experience
- Ahmedabad office, or
- Work from home
- 40 hours a week with a rotational shift every month.
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives, etc.
- We don't believe in locking people in with long notice periods. You will stay here because you love the company. We have only a 30-day notice period.
Biostrap is based in Los Angeles, California, with our team working remotely in several countries around the globe. This is a remote position; you’ll need a computer and a high-speed internet connection.
We are looking for the tough kind, the warrior kind: always-learning Sr. DevOps Engineers to take care of our infrastructure and site reliability at Biostrap. As an engineer at Biostrap, you will be part of a lean but extremely passionate team of engineers and work towards making Biostrap, and keeping it, the go-to health platform.
Responsibilities: What would the job be like?
- Work closely with the engineering team to deploy and maintain the infrastructure.
- Add automation at every part of the development and deployment lifecycle.
- Analyze and help with infrastructure cost optimization.
- Build and work with CI/CD workflows.
- Build a robust observability system for system monitoring and tracing.
- Architect scalable logging servers.
- Add extensive alerting for important issues and events using monitoring and logging services.
- Work with other engineers in developing architecture that is scalable and resilient to changes in product requirements and usage in an agile environment.
- Harden the security of cloud infrastructure against known and unknown vulnerabilities.
- Write infrastructure as code for most of the cloud.
- Suggest and implement pragmatic changes to infrastructure to increase performance, resilience, and availability, and to future-proof the infrastructure.
- Build auditing systems for resource access, along with a breach detection notification system.
- Do periodic security reviews and implement improvements.
- Be in charge of and manage deployments of various services.
- Work with AWS resources, containers, and systems like Ansible/EKS/Kubernetes.
Qualifications: Who should apply for this role?
- You have 3+ years of experience working in small to medium-sized teams building and shipping products.
- Good experience managing various AWS resources.
- Well equipped with Linux and Bash/Shell scripting
- Working knowledge of Docker or container management.
- Have some development experience with Kubernetes.
- You spin up containers as if it's your fantasy war ground.
- Understand deployment tools like Ansible or similar.
- You have built and worked with CI/CD systems like GitLab CI, Jenkins, CircleCI, Travis, etc.
- Working knowledge of Git for version control.
- Experience with database management and security.
- Experience with Terraform for Infrastructure as Code.
- Knowledge of configuration management and secrets/keys management services like AWS KMS, Vault etc.
- Required to be proficient in English (both speaking and writing).
Brownie Points for (:D):
- You already use Biostrap and have plenty of feedback to provide.
- You can lecture developers on scalable infrastructures.
- You have built or worked with Prometheus, Grafana, ELK systems.
- You have a story to tell about how you managed a failure or were part of a disaster recovery.
- You contribute to Open Source projects or have a good Github/GitLab presence to showcase your past projects.
- You have sent your code to space and it runs a rover on Mars. :P
- We are looking for a Senior SRE with a proven track record of success leading complex hybrid-cloud environments. You will have:
- A strong sense of Being an Owner and Wearing the Customer's Shoes, with the ability to Empower Others, demonstrated through clear communication and collaboration.
- Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
- Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs for high-availability, reliability, and DR goals.
- DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
- Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
- Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
- 5+ years of experience in a Software and/or Site Reliability Engineering role
- Experience writing automation code in Go, Python, or Java
- Experience developing and operating large scale distributed systems with Kubernetes and Docker
- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
- Experience running public cloud environments on AWS
- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
- Bachelor's degree in Engineering, Computer Science, or equivalent experience
- The ability to lead, partner, and collaborate cross-functionally across an engineering organization
● Research, propose, and evaluate, with a 5-year vision, the architecture, design, technologies, processes, and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap to transform the network to Telco Cloud and prepare it for 5G.
● Using your deep technical expertise, provide detailed feedback to Product Management and Engineering, and contribute directly to the platform code base to enhance both the customer experience of the service and the SRE quality of life.
● The individual must be aware of trends in network infrastructure as well as within the network engineering and OSS community: what technologies are being developed or launched?
● The individual should stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the engineering of Lab and Production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development, and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / Red Hat / CentOS / Ubuntu
● You have deep knowledge in Networking Layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● You have in-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure, and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus).
● PaaS Level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL,
About the Organisation:
The team at Coredge.io is a combination of experienced and young professionals alike, with many years of experience working with edge computing, telecom application development, and Kubernetes. The company has continuously collaborated with the open source community, universities, and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence, with offices in the US and New Delhi, India.
- 5+ years of software development or site reliability engineering or equivalent experience
- Skilled at problem solving, algorithms, and data structures
- Building tools and scripting frameworks from scratch
- Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
- Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
- Configuration automation using Ansible or equivalent tools
- Exposure to Windows and Linux administration
- Project management tools like Jira, Trello
- Prior experience with datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
- Familiarity with basic networking, security and cloud engineering concepts
- Team player who is eager to help others to succeed through mentoring and leading by example
- Highly collaborative with effective written and verbal communication skills
- Ability to design solutions for and deliver all Operations/SRE services and processes, including managing L2 environment support
- 5-12 years of overall environment support experience with 5+ years of experience as support / SRE engineer
- Experience implementing monitoring solutions using APM tools (for example: AppDynamics, Graylog, Dynatrace, Datadog, etc.), and setting up and testing proactive monitoring alerts
- Have a broad knowledge profile and really excel in some areas, such as HTTP/TLS, DNS, networking or containerization
- Comfortable with large scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- Interest in designing, analyzing and troubleshooting large-scale distributed systems.
- Practice sustainable incident response and blameless postmortems.
- Proven ability in developing relationships with stakeholders, communicating project/program status, and understanding detailed business requirements across multiple project initiatives
- This role requires candidates to work in rotational shifts (24x7 support).
WHY ZYCUS?
- Be a part of one of the fastest-growing product companies in India
- Come join a young, dynamic & enterprising team
- Work on the latest technologies
- Flexible working hours (As per business requirement).
Zycus Global Leader Procurement: https://www.zycus.com/newsroom/press-releases.html