- We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
- Strong sense of Being an Owner, Wearing the Customer Shoes, with the ability to Empower Others, demonstrated through clear communication and collaboration.
- Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
- Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs towards high availability, reliability, and DR goals.
- DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
- Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
- Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
- 5+ years of experience in a Software and/or Site Reliability Engineering role
- Experience writing automation code in GoLang, Python or Java
- Experience developing and operating large scale distributed systems with Kubernetes and Docker
- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
- Experience running public cloud environments on AWS
- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
- Bachelor's degree in Engineering, Computer Science, or equivalent experience
- The ability to lead, partner, and collaborate cross-functionally across an engineering organization
About Uniphore Software Systems
• Run the production environment by monitoring availability and taking a holistic view of system health
• Build software and systems to manage platform infrastructure and applications
• Improve reliability, quality, and time-to-market of our suite of software solutions
• Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
• Provide primary operational support and engineering for multiple large distributed software applications
• Drive cross-team alignment across development teams around reliability initiatives
The ideal candidate must have:
• Bachelor’s degree in computer science or another highly technical, scientific discipline
• Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
• Good experience with microservices architecture and serverless technologies
• Exposure to event-driven architecture and state machines
• A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
A network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering and Capitalize on Cloud. We have our own IoT/AI platform, and we provide professional services on that platform to build custom clouds for our clients' IoT devices. We also build mobile apps and run 24x7 DevOps/site reliability engineering for our clients.
We are looking for a friendly, very hands-on, technical, and dependable professional with plenty of experience as a backend and cloud engineer to provide site reliability services to our internal teams and end customers. We expect you to deliver top quality at high speed. You must have experience developing and designing amazing UI screens.
This person MUST have:
- BE Computer Science or equivalent
- Cloud app development experience.
- Strong troubleshooting and debugging skills
- A strong passion for writing simple, clean, and efficient code.
- 3 years of experience with the Django framework and other backend technologies.
- Knowledge of NodeJS
- Experience with building, modifying, and extending API endpoints (REST or GraphQL) for data retrieval and persistence.
- Understanding of how to use a database such as Postgres (preferred choice), SQLite, MongoDB, or MySQL.
- Experience creating high-performance applications.
- Experience with messaging and broker tools such as RabbitMQ and MQTT
- Experience with SQL and NoSQL databases
- Experience with the full software development life cycle, including requirements collection, design, implementation, testing, and operational support.
- Knowledge of web services
- Proficient understanding of code versioning tools such as Git.
- Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
- Experience managing AWS infrastructure.
- Hands-on experience in Linux environment.
- Basic understanding of Kubernetes/Docker orchestration.
- Manage existing infrastructure, pipelines, and engineering tools (on-prem or AWS) for the engineering team (build servers, Jenkins nodes, etc.)
- Experience with scrum or other agile software development methodology.
- Excellent verbal and written communication, teamwork, decision making and influencing skills.
- Handle customer calls/emails regarding technical issues for end-users.
- Strong communication skills
- Attention to detail.
Experience:
- Minimum 3 years of experience
Location:
- Ahmedabad office, or
- Work from home
Timings:
- 40 hours a week with a rotational shift every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives, etc.
- We don't believe in locking people in with long notice periods. You will stay here because you love the company. We have only a 30-day notice period.
Site Reliability Engineer (SRE)
Vonage Engineering Mission: Vonage is the emerging leader in the $100B+ cloud communications platform (CPaaS) market.
Customers like Airbnb, Viber, WhatsApp, Snapchat, and many others depend on our APIs and SDKs to connect with their customers all over the world. As businesses continue to shift to a real-time, customer-centric communications model, we are experiencing a time of impressive growth.
Why this role matters:
Vonage, a leader in cloud communications, is looking to build a new SRE team in Bangalore.
We believe that there shouldn’t be walls between operations and development and we have embraced the DevOps movement.
As a Site Reliability Engineer, you will work as part of the development team to build automation and tools that deploy and monitor the platform, maintain its health, and meet targeted SLOs and SLAs.
What you'll do
● Lead the effort in ensuring reliability of the platform.
● Create software and tooling that improves performance, stability, and reliability of the platform.
● Work as part of a development team.
● Monitor Application Metrics to help with improving software performance.
● Build solutions that are highly resilient, scalable, and secure.
● Have a wide breadth of knowledge from software, infrastructure, and security.
● Adopt best practices and champion an engineering culture emphasizing Agile.
What's required for application
● Proven experience building, supporting, and architecting high-availability cloud infrastructure.
● Experience working on monitoring, logging, and alerting solutions and related tools.
● Experience with tooling such as Terraform, Ansible, Docker, Kubernetes, and Chef.
● Fluent and comfortable working with Cloud Infrastructure.
● Ability to read, write, and troubleshoot software code.
● Good understanding of CI/CD tools.
● Champion of DevSecOps using tools such as HashiCorp Vault, KMS, and Secrets Manager.
● Experience with software development, algorithms, data structures, and systems design.
● Understand monitoring tools such as DataDog, ELK, and Grafana.
● Bachelor's degree (or higher) in Computer Science and/or related work experience.
www.vonage.com
Nice to have, but not required
● Working knowledge of other AWS services like Glacier, Elastic Container Service (ECS), Elastic MapReduce (EMR), DynamoDB, etc.
● Automation and Orchestration tools such as Jenkins
● Ruby or Java development skills
● Data Pipeline knowledge, especially with tools like MapReduce, Kafka and ELK stack
Senior Infrastructure Consultant - Site Reliability Engineer
at Thoughtworks
As consultants, we work with our clients to ensure the sustenance of their business-critical applications, evolving their technology and empowering adaptive mindsets to meet their business goals. You could influence the digital strategy of a retail giant, Build and Run a bold new mobile application for a bank, redesign platforms using event sourcing and intelligent data pipelines or influence the lifecycle of a legacy or a modernized application. You will use the latest Lean and Agile thinking, create pragmatic solutions to solve mission-critical problems and contribute to revolutionizing the way operations are executed by evolving the run to be highly automated and intelligence driven, thus challenging yourself each day.
Infrastructure Consultants take a multifaceted approach to helping clients achieve technical excellence by approaching challenges from both a technical and operational perspective. As consummate ‘bringers of knowledge,’ they take extra care to ensure their team and client understand operational requirements and take a shared responsibility for designing and implementing infrastructure that delivers and runs software services. They also help customers adopt DevOps approaches, breaking away from rigid, more traditional ways of working and pivoting to a more customer-focused and agile approach.
You’ll spend time on the following:
- You will evolve and revolutionize projects through analysis, evaluations, and hands-on implementations, and drive improvements to existing infrastructure;
- You will listen to a client’s needs and formulate a technical roadmap and impactful solution that will support their ambitious business goals;
- Help shape and build Thoughtworks’ Digital operations offering through collaboration with business development, marketing, and capabilities development teams;
- Build and manage the controls and processes for continuous delivery of applications, considering all stages of the process and its automation;
- You will assist in preparing Root Cause Analyses (RCAs) for high-priority incidents that clearly identify the underlying problems, and will work on permanent fixes as needed;
- Monitor and ensure that technical expectations of deliverables are consistently met on projects;
- Act as a thought leader—at client sites and at Thoughtworks—on DevOps, cloud, and infrastructure engineering;
- Adjust and suggest innovative solutions to current constraints and business policies;
- Develop your career outside of the confinements of a traditional career path by focusing on what you’re passionate about rather than a predetermined one-size-fits-all plan.
Here’s who we’re looking for:
- You genuinely enjoy interacting with teammates from across the business and have a knack for communicating technical concepts to nontechnical audiences
- You are passionate about understanding the current infrastructure architecture and evolving it into a more robust, scalable, flexible, and relevant solution that will help transform clients' businesses
- You are passionate about identifying and establishing new practices and tools to improve the different aspects of reliability engineering: observability and monitoring, test strategy, rollout, and optimizing resource usage (RAM, CPU, disk, network)
- You are keen on working with monitoring systems for stress and performance testing, and with observability patterns: Distributed Tracing/OpenTracing, Log Aggregation, Audit Logging, Exception Tracking, Health Check API, Application Metrics, Self-Healing/Multi-Cloud
- You have a keen eye to look for and identify automation opportunities in the current system architecture
- You have a deep understanding of cloud and virtualization platforms, infrastructure automation, and application hosting technologies
- You regularly apply DevOps philosophy, Agile methods, Infrastructure as Code to your work and lead infrastructure and operations with these approaches
- You have a history of working with server virtualisation, IaaS and PaaS cloud, infrastructure provisioning, and configuration management tools
- You can write scripts using at least one scripting language and are comfortable building Linux and/or Windows server systems
- Experience with continuous integration tools with different tech stacks, web or mobile
- You are willing to be part of a 24x7 availability team
Here are the skills we are looking for :
- Proficiency in one of the following programming languages: Java, Python, Golang, or JavaScript
- Hands-on experience and proficiency with one of the CI/CD tools like Jenkins, BuildKite, Azure Pipelines
- Hands-on experience implementing IaC practices using tooling like Terraform/CloudFormation, Ansible, Puppet, or Chef
- Hands-on experience and proficiency in one or more of the Cloud Service Platforms like AWS, GCP or Azure
- Hands-on experience with containerization and orchestration using Docker, Kubernetes, or Helm
- Hands-on experience with one or more of the observability and monitoring tools like Splunk, ELK stack, DataDog, Prometheus and Grafana
- Understanding of API lifecycle management and message bus technologies like Apigee, Kafka, Pulsar, RabbitMQ
- Experience in the Networking domain - Load Balancing, Network Security and understanding of standard networking protocols and configurations
- Experience working with one or more of these tools: ManageEngine, JIRA, PagerDuty, and Slack
- Bonus points if you have experience with unit testing and automated testing tools
- Good to have experience working with database products like Postgres, MongoDB.
Role: Platform and Infrastructure Engineer SDE3
Location: We are open to candidates working from anywhere in India/across the globe. We are fully remote.
About Us:
Lummo (formerly BukuKas) is a SaaS startup seeking to empower entrepreneurs and brands in SEA to accelerate their growth and to serve their customers by giving them the best technology and partner solutions. Lummo offers localized solutions made for SEA, thereby shining the spotlight on entrepreneurs and brands, enabling them to discover all possibilities to grow their business. Lummo was founded as BukuKas in 2019 by serial entrepreneurs Krishnan Menon and Lorenzo Peracchione.
Our Products
The journey started with BukuKas, an app that digitizes physical record-keeping books by enabling micro and small enterprises to record their sales, expenses, and cash transactions with ease using their smartphones.
Lummo's flagship product, LummoSHOP (formerly Tokko), helps growth-oriented entrepreneurs and brands unlock their full potential by helping them build strong relationships with their consumers by selling to them directly (D2C), maximize operational efficiency across multiple channels, and build their own brand online.
Funding:
Backed by top venture capital firms including Sequoia Capital, Tiger Global, CapitalG (Google’s venture fund), Credit Saison, Speedinvest, and other prominent investors and entrepreneurs like Gokul Rajaram (DoorDash), Taavet Hinrikus (Founder, TransferWise), Sandeep Tandon (FreeCharge), Santiago Sosa (Founder, Nuvemshop), Nipun Mehra (Ula, Sequoia), and Amrish Rao (Pinelabs, Citrus pay).
Having raised more than $150 million in funding with the backing of marquee global investors, Lummo has built a world-class team with top talent from across the world and is well poised to become a legendary SaaS company that will last beyond our lifetimes.
We received Series C funding in January 2022.
Requirements / Responsibilities
- You have 7-8 years of experience building high-performance consumer-facing mobile applications at product companies of a decent scale.
- You have experience developing products on Kubernetes and cloud providers like GCP and AWS.
- You know and have worked on service meshes like Istio and Linkerd.
- You can write code and have experience writing platform-level components (e.g., in Golang or Python).
- You have experience with debugging production issues and writing RCAs.
- You have demonstrable stories of being on-call and how outages have been handled.
- You understand change management in-depth and are opinionated on the steps to push the change to production.
- You have worked with Cloud Native (CNCF) technologies.
- You have worked on Distributed Systems.
- You are an excellent collaborator & communicator. You know that start-ups are a team sport. You listen to others, aren’t afraid to speak your mind and always try to ask the right questions.
- You are excited by the prospect of working in a distributed team and company.
What do we offer?
- The ability for you to make an impact and lay a foundation for the upcoming fin-tech innovations
- A multicultural and diverse team of colleagues from all over the globe
- Mission-driven and fast-paced, entrepreneurial environment
- Competitive salary and flexible leave policy
- A collaborative and flat company culture
What’s in it for you?
Do you truly want to make a difference and revolutionize the lives of millions of business owners? Do you thrive in an environment where moving at light speed and embracing new challenges every day is essential? If yes, Lummo is the perfect place for you!
Site Reliability Engineer - Product
at A listed product development organization
Position: Site Reliability Engineer
Location: Pune (Currently WFH, post pandemic you need to relocate)
About the Organization:
A funded product development company, headquartered in Singapore with offices in Australia, the United States, Germany, the United Kingdom, and India. You will gain work experience in a global environment.
Job Description:
We are looking for an experienced DevOps / Site Reliability engineer to join our team and be instrumental in taking our products to the next level.
In this role, you will be working on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.
You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.
As part of the role, you will be responsible for researching new technologies; managing a large fleet of active services and their underlying servers; automating the deployment, monitoring, and scaling of components; and optimizing the infrastructure for cost and performance.
Day-to-day responsibilities
- Ensure the operational integrity of the global infrastructure
- Design repeatable continuous integration and delivery systems
- Test and measure new methods, applications and frameworks
- Analyze and leverage various AWS-native functionality
- Support and build out an on-premise data center footprint
- Provide support to other teams and diagnose issues related to our infrastructure
- Participate in 24/7 on-call rotation (If Required)
- Expert-level administrator of Linux-based systems
- Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus.
- Experience with production deployments of Kubernetes clusters
- Experience in automating provisioning and managing Hybrid-Cloud infrastructure (AWS, GCP and On-Prem) at scale.
- Knowledge of monitoring platforms (Prometheus, Grafana, Graphite).
- Experience with distributed storage systems such as Ceph or GlusterFS.
- Experience in virtualisation with KVM, oVirt, and OpenStack.
- Hands-on experience with configuration management systems such as Terraform and Ansible
- Bash and Python Scripting Expertise
- Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
- Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
- Experience managing hundreds to thousands of servers globally
- Enjoy automating tasks, rather than repeating them
- Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
- Strong verbal and written communication skills
- Ability to adapt to a rapidly changing environment
- Comfortable collaborating and supporting a diverse team of engineers
- Ability to troubleshoot problems in complex systems
- Flexible working hours and ability to participate in 24/7 on call support with other team members whenever required.
Senior DevOps Engineer
at Biostrap
Hey there!
Biostrap is based in Los Angeles, California, with our team working remotely in several countries around the globe. This is a remote position; you’ll need a computer and a high-speed internet connection.
We are looking for the tough kind, the warrior kind: always-learning Sr. DevOps Engineers to take care of our infrastructure and site reliability @ Biostrap. As an engineer at Biostrap, you will be part of a lean but extremely passionate team of engineers and work towards making and keeping Biostrap the go-to health platform.
Responsibilities: What would the job be like?
- Work closely with the engineering team to deploy and maintain the infrastructure.
- Add automation at every part of the development and deployment lifecycle.
- Analyze and help with infrastructure cost optimizations.
- Build and work with CI + CD workflows.
- Build a robust observability system for system monitoring and tracing.
- Architect scalable logging servers.
- Add extensive alerting for important issues and events using monitoring and logging services.
- Work with other engineers in developing architecture that is scalable and resilient to changes in product requirements and usage in an agile environment.
- Security Hardening of cloud infrastructure against known/unknown vulnerabilities
- Write Infrastructure as Code for most of the cloud.
- Suggest and implement pragmatic changes to infrastructure to increase performance, resilience, and availability, and to future-proof the infrastructure.
- Build auditing systems for access to various resources and set up a breach detection and notification system.
- Do periodic security reviews and implement improvements.
- Be in charge of and manage deployments of various services.
- Work with AWS resources, containers, and systems like Ansible/EKS/Kubernetes.
Qualifications: Who should apply for this role?
- You have 3+ years of experience working in small to medium-sized teams building and shipping products.
- Strong grasp of at least one scripting or systems language like Python, JavaScript, Golang, etc.
- Good experience managing various AWS resources.
- Well equipped with Linux and Bash/Shell scripting
- Working knowledge of Docker or container management.
- Have some development experience with Kubernetes.
- You spin out containers as if it's your fantasy war ground.
- Understand deployment tools like Ansible or similar.
- Built and worked with CI+CD systems like GitLab CI, Jenkins, CircleCI, Travis, etc.
- Working knowledge of Git for version control.
- Experience with database management and security.
- Experience with Terraform for Infrastructure as Code.
- Knowledge of configuration management and secrets/keys management services like AWS KMS, Vault, etc.
- Required to be proficient in English (both speaking and writing).
Brownie Points for (:D):
- You already use Biostrap and have plenty of feedback to provide.
- You can lecture developers on scalable infrastructures.
- You have built or worked with Prometheus, Grafana, ELK systems.
- You have a story to tell about how you managed a failure or were part of a disaster recovery.
- You contribute to Open Source projects or have a good GitHub/GitLab presence to showcase your past projects.
- You have sent your code to Space and it runs “a” Rover on Mars. :P
Roles and Responsibilities
- Managing Availability, Performance, Capacity of infrastructure and applications.
- Building and implementing observability for applications health/performance/capacity.
- Optimizing On-call rotations and processes.
- Documenting “tribal” knowledge.
- Managing infra platforms like Mesos/Kubernetes, CI/CD, Observability (Prometheus/New Relic/ELK), Cloud Platforms (AWS/Azure), Databases, and Data Platforms Infrastructure.
- Providing help in onboarding new services with production readiness review process.
- Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
- Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
- Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
- Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
- Managing outages, doing detailed RCAs with developers, and identifying ways to avoid recurrence.
- Managing/Automating upgrades of the infrastructure services.
- Automating toil.
Experience & Skills
- 6+ years of total experience
- Experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
- A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.
- A deep understanding of computer science, software development, and networking principles.
- Demonstrated experience with languages such as Python, Java, Golang, etc.
- Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
- Extensive experience in DNS, TCP/IP, UDP, gRPC, Routing and Load Balancing.
- Expertise in GitOps, Infrastructure as Code tools such as Terraform, and Configuration Management tools such as Chef, Puppet, SaltStack, Ansible.
- Expertise in Amazon Web Services (AWS) and/or other relevant Cloud Infrastructure solutions like Microsoft Azure or Google Cloud.
- Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.
- Experience in managing and deploying containerized environments using Docker, Mesos/Kubernetes is a plus.
● Research, propose, and evaluate, with a 5-year vision, the architecture, design, technologies, processes, and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap of the network to transform it to Telco Cloud and be prepared for 5G.
● Using your deep technical expertise, provide detailed feedback to Product Management and Engineering, and contribute directly to the platform code base to enhance both the customer experience of the service and the SRE quality of life.
● Be aware of trends in network infrastructure as well as within the network engineering and OSS community. What technologies are being developed or launched?
● Stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the engineering of Lab and Production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development, and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / RedHat / CentOS / Ubuntu
● Deep knowledge of Networking Layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● In-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus).
● PaaS-level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL, Cassandra)
About the Organisation:
The team at Coredge.io is a mix of experienced and young professionals with many years of experience working with Edge computing, Telecom application development, and Kubernetes. The company has continuously collaborated with the open source community, universities, and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence with offices in the US and New Delhi, India.
Site Reliability Engineer
at SteelEye, a fast-growing FinTech company based in London
• Develop and maintain IaC using Terraform and Ansible
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast-paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.
What we’re looking for
• An AWS Certified Engineer with strong skills in
o Terraform
o Ansible
o *nix and shell scripting
• Preferably with experience in:
o Elasticsearch
o CircleCI
o CloudFormation
o Python
o Packer
o Docker
o Prometheus and Grafana
o Challenges of scale
o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.