Observability Systems Engineer

at Top Global Hedge Fund

Agency job
Gurugram, Delhi, Noida, Ghaziabad, Faridabad
3 - 8 yrs
₹4L - ₹15L / yr
Full time
Skills
Kubernetes
Apache Kafka
Prometheus
ELK Stack
Amazon Web Services (AWS)
Linux/Unix
Ansible
Systems analysis and design
Experience in Kubernetes as a systems engineer (deployment, troubleshooting, maintenance, Helm charts), and in deploying and administering one or more of the ELK stack, Kafka, Prometheus, or Grafana. Working knowledge of at least one cloud platform (GCP, AWS, or Azure) and a configuration management system (such as Salt or Ansible). A good understanding of networking concepts (architecture, components, protocols) and a solid understanding of OS concepts and Linux internals are a must.

Similar jobs

Digital B2B Platform
Agency job
via Jobdost by Sathish Kumar
Bengaluru (Bangalore)
3 - 4 yrs
₹15L - ₹30L / yr
DevOps
Python
CI/CD
Linux/Unix
Git
We are a digital B2B platform that offers loans, working capital, and payment services to small businesses.

Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.

What you will do (or learn):

1. Build our application stack on AWS using infrastructure as code (read: Terraform).
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build monitoring and alerting systems.

What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (like Bash, Python, Go, etc.)
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems. 
Posted by Swetha Venugopal
Remote only
5 - 7 yrs
₹20L - ₹40L / yr
Kubernetes
Cloud Native
DevOps
Infrastructure
Amazon Web Services (AWS)

Role: Platform and Infrastructure Engineer SDE3

Location: We are open to candidates working from anywhere in India/across the globe. We are fully remote.

About Us:

Lummo (formerly Bukukas) is a SaaS startup seeking to empower entrepreneurs and brands in SEA to accelerate their growth and to serve their customers by giving them the best technology and partner solutions. Lummo offers localized solutions made for SEA, thereby shining the spotlight on entrepreneurs and brands, enabling them to discover all possibilities to grow their business. Lummo was founded as BukuKas in 2019 by serial entrepreneurs Krishnan Menon and Lorenzo Peracchione.


Our Products

The journey started with BukuKas, an app to digitize the physical record-keeping books by enabling micro and small enterprises to record their sales, expenses, and cash transactions at ease using their smartphone.

Lummo's flagship product, LummoSHOP (formerly Tokko), helps growth-oriented entrepreneurs and brands unlock their full potential by helping them build a strong relationship with their consumers by selling to them directly (D2C), maximize operational efficiency across multiple channels & build their own brand online.


Funding:

Backed by top venture capital firms including Sequoia Capital, Tiger Global, CapitalG (Google’s venture fund), Credit Saison, Speedinvest, and other prominent investors and entrepreneurs like Gokul Rajaram (DoorDash), Taavet Hinrikus (Founder, TransferWise), Sandeep Tandon (FreeCharge), Santiago Sosa (Founder, Nuvemshop), Nipun Mehra (Ula, Sequoia), and Amrish Rao (Pinelabs, Citrus pay). 

Having raised more than $150 Million in funding with the backing of marquee global investors, Lummo has built a world-class team with top talent from across the world and is well poised to become a legendary SaaS company that will last beyond our lifetimes.

We received Series C funding in January 2022.


Requirements / Responsibilities

  • You have 7-8 years of experience building high-performance consumer-facing mobile applications at product companies of a decent scale.
  • You have experience developing products on Kubernetes and cloud providers like GCP and AWS.
  • You know and have worked on service meshes like Istio or Linkerd.
  • You can write code and have experience writing platform-level components (e.g., in Golang or Python).
  • You have experience with debugging production issues and writing RCAs.
  • You have demonstrable stories of being on-call and how outages have been handled.
  • You understand change management in-depth and are opinionated on the steps to push the change to production.
  • You have worked with Cloud Native (CNCF) technologies.
  • You have worked on Distributed Systems.
  • You are an excellent collaborator & communicator. You know that start-ups are a team sport. You listen to others, aren’t afraid to speak your mind and always try to ask the right questions.
  • You are excited by the prospect of working in a distributed team and company.


What do we offer?

  • The ability for you to make an impact and lay a foundation for the upcoming fin-tech innovations
  • A multicultural and diverse team of colleagues from all over the globe
  • Mission-driven and fast-paced, entrepreneurial environment
  • Competitive salary and flexible leave policy
  • A collaborative and flat company culture


What’s in it for you?

Do you truly want to make a difference and revolutionize the lives of millions of business owners? Do you thrive in an environment where moving at light speed and embracing new challenges every day is essential? If yes, Lummo is the perfect place for you!

Posted by Komal Samudrala
Hyderabad, Bengaluru (Bangalore), Pune, Pondicherry
4 - 6 yrs
₹15L - ₹20L / yr
DevOps
AWS CloudFormation
Kubernetes
Amazon Web Services (AWS)
Shell Scripting

Job Title: Site Reliability Engineer

 

Job Summary:

  • 4+ years of overall experience, with 2+ years in an SRE role handling Kubernetes
  • SRE with strong experience in monitoring, troubleshooting, and supporting Kubernetes container platforms
  • Support rapid development and engineering productivity via release engineering, CI/CD automation, and build tools.
  • Perform health checks on apps and infrastructure to identify and proactively pre-empt issues (verification, alerts, etc.).
  • Work closely with engineering or DevOps teams to debug and fix issues as they arise.
  • Work on development tasks and tools for infrastructure, deployment, monitoring, etc.
  • Participate in on-call rotations and be responsible for infrastructure and platform level escalations.
  • Work with the DevOps team on planning and implementation of infrastructure capacity planning, upgrades, and monitoring.
  • Participate in Daily (Standup) Production Reviews
  • Contribute to the design and improvement of deployment architecture of new and existing applications based on the principles of reliability, high availability, efficiency, and observability.
  • Research, learn, adapt, customize, and create tools to improve the observability, resilience, and usability of applications in scope
  • Create and maintain SRE-related documentation (solution repository, Root Cause Analysis Reports etc)

 Key Skills:

  • Expertise in one or more Cloud/Infrastructure tools – Rancher, JFrog Artifactory, Sysdig, Portworx, Calico, Hashicorp Vault
  • Expertise working on at least 1 public cloud platform – AWS, Azure
  • Expertise in monitoring tools like Datadog
  • Experience in IaC tools like CloudFormation templates (CFT) and Terraform
  • Strong expertise in Cloud concepts like Infrastructure as Code, Cloud Computing, Containerization, and SRE.
  • Experience working with automation tools like Docker, Rancher and Kubernetes to implement End-to-End Automation.
  • Experience in Docker hub, creating Docker images and handling multiple containers as a cluster
  • Expertise in alert & monitoring scripts for applications & servers using Python and/or shell scripts.
  • Experience in cluster configuration (pods, nodes) using Kubernetes
  • Experience in migrating and implementing multiple applications from on-premise to cloud using AWS services like EC2, S3, Glacier, VPC, RDS, CloudTrail, and EKS.
  • Good communication skills (written and verbal)
Posted by Saakshi Bhartiya
Bengaluru (Bangalore)
1 - 5 yrs
₹5L - ₹20L / yr
Docker
Terraform
Amazon Web Services (AWS)
DevOps
At Sourcewiz, we are building tools to help exporters grow their businesses. Our first product is a vertical sales software built for exporters, which allows them to market their unique creations to more buyers, generate more inquiries and increase their sales conversion.

Sourcewiz was founded by a passionate team of serial entrepreneurs and alumni of IIT Delhi, UC Berkeley, and well-known tech companies such as Uber and Zomato.

Sourcewiz is on a mission to increase India’s export GDP. This is a unique opportunity to join a funded early-stage startup and have a massive impact on our product, culture, and direction. It's a lot of work and a roller coaster ride. But, if you are up for it, you can join us in replacing the tiresome and slow sales process for importers and exporters and have a significant impact on our customers. We are not a company that believes engineers should be hidden away from decisions, churning out code for features decided from on high. Instead, our engineers form strong bonds with cross-functional peers in Product Management, Product Design and others to become experts in their product domain.

We’re looking for people who have a strong interest in building successful products or systems, are comfortable dealing with lots of moving pieces, have exquisite attention to detail, and are comfortable learning new technologies and systems.

As a Site Reliability Engineer at Sourcewiz, you will...
• Own and improve the scalability and reliability of our products
• Work directly with the product engineering team
• Work with RDBMS, search, caching, and queuing
• Contribute expertise towards architectural planning and ensure the company builds sustainable services that meet our customers' expectations while leveraging appropriate tools and frameworks.
• Ongoing participation in the review and testing
Bengaluru (Bangalore)
5 - 10 yrs
₹25L - ₹40L / yr
SRE
Site Reliability Engineer
Reliability engineering
DevOps
Kubernetes
Your Responsibilities
We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
  • A strong sense of Being an Owner and Wearing the Customer Shoes, with the ability to Empower Others, demonstrated through clear communication and collaboration.
  • Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
  • Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs towards high availability, reliability, and DR goals.
  • DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
  • Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
  • Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
Required Skills
  • 5+ years of experience in a Software and/or Site Reliability Engineering role
  • Experience writing automation code in GoLang, Python or Java
  • Experience developing and operating large-scale distributed systems with Kubernetes and Docker
  • Experience in running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
  • Experience running public cloud environments on AWS
  • Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
  • Bachelor's degree in Engineering, Computer Science or equivalent experience
  • The ability to lead, partner, and collaborate cross-functionally across an engineering organization
Posted by Abhimanyu Bhatter
Remote, Noida, Bengaluru (Bangalore), NCR (Delhi | Gurgaon | Noida)
6 - 11 yrs
₹16L - ₹25L / yr
Reliability engineering
Docker
Kubernetes
DevOps
Site reliability
What are we looking for:
● Research, propose, and evaluate, with a 5-year vision, the architecture, design, technologies, processes and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap of the network to transform it to Telco Cloud and be prepared for 5G.
● Using your deep technical expertise, you will provide detailed feedback to Product Management and Engineering, as well as contribute directly to the platform code base to enhance both the customer experience of the service and the SRE quality of life.
● The individual must be aware of trends in network infrastructure as well as within the network engineering and OSS community, including which technologies are being developed or launched.
● The individual should stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the engineering of Lab and Production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development, and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning : Linux / RedHat / CentOS / Ubuntu
● You have deep knowledge in Networking Layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● You have in-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus).
● PaaS-level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL, Cassandra)
About the Organisation:
The team at Coredge.io combines experienced and young professionals with many years of experience in Edge computing, Telecom application development, and Kubernetes. The company has continuously collaborated with the open source community, universities and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence, with offices in the US and New Delhi, India.
SteelEye is a fast-growing FinTech company based in London
Agency job
via Beiing by Divya R
Remote, Bengaluru (Bangalore)
3 - 8 yrs
₹15L - ₹30L / yr
Python
Amazon Web Services (AWS)
Ansible
Terraform
Docker
What you’ll do

• Develop and maintain IaC using Terraform and Ansible.
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast-paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.

What we’re looking for

• An AWS Certified Engineer with strong skills in
o Terraform
o Ansible
o *nix and shell scripting
• Preferably with experience in:
o Elasticsearch
o CircleCI
o CloudFormation
o Python
o Packer
o Docker
o Prometheus and Grafana
o Challenges of scale
o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.
Hyderabad
5 - 11 yrs
₹10L - ₹20L / yr
site reliability
cloudformation
Terraform
Ansible
Cloud Automation
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl, etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
ScienceLogic
Agency job
via Ojas Innovative Technologies by Mohammad Farooq Shaik
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
AWS CloudFormation
cloud automation
site reliability
cloudformation
Ansible
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl, etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Posted by Tanika Monga
NCR (Delhi | Gurgaon | Noida)
3 - 6 yrs
₹10L - ₹21L / yr
Terraform
Kubernetes
Ansible
WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE Team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of the Site Reliability Engineering Team.

HOW DO WE WORK?

Our engineering process is a five-step process that consists of phases for planning, developing, testing & profiling, releasing, and monitoring. The planning phase consists of documenting the feature or task to be done, followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices, and best practice adoption. The development and testing phases coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews, and QA testing. This phase is punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase, where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates with a retrospective.

SREs get involved in this process as early as possible to provide general guidance and recommendations, and to help design the application in compliance with community standards such as CNCF and 12 Factor. SRE involvement and influence tend to increase during the mid to final stages of development, when the application is primed for beta evaluation and all the tooling and instrumentation is finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix-like systems, access control, networking nuances, process isolation by means of kernel-provided features, distributed applications and algorithms, job schedulers, and secret management, among other things. At Shuttl we are big proponents of immutable infrastructure. All our infrastructure is hosted with Amazon Web Services, and we use HashiCorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definite plus. Since SREs are expected to write a lot of code, you are also expected to be skillful in a programming language, preferably Python or Go.