We are looking for a DevOps Lead to join our team.
Responsibilities
• A technology Professional who understands software development and can solve IT Operational and deployment challenges using software engineering tools and processes. This position requires an understanding of both Software development (Dev) and deployment
Operations (Ops)
• Identity manual processes and automate them using various DevOps automation tools
• Maintain the organization’s growing cloud infrastructure
• Monitor and maintain DevOps environment stability
• Collaborate with distributed Agile teams to define technical requirements and resolve technical design issues
• Orchestrating builds and test setups using Docker and Kubernetes.
• Participate in designing and building Kubernetes, Cloud, and on-prem environments for maximum performance, reliability and scalability
• Share business and technical learnings with the broader engineering and product organization, while adapting approaches for different audiences
Requirements
• Candidates working for this position should possess at least 5 years of work experience as a DevOps Engineer.
• Candidate should have experience in ELK stack, Kubernetes, and Docker.
• Solid experience in the AWS environment.
• Should have experience in monitoring tools like DataDog or Newrelic.
• Minimum of 5 years experience with code repository management, code merge and quality checks, continuous integration, and automated deployment & management using tools like Jenkins, SVN, Git, Sonar, and Selenium.
• Candidates must possess ample knowledge and experience in system automation, deployment, and implementation.
• Candidates must possess experience in using Linux, Jenkins, and ample experience in configuring and automating the monitoring tools.
• The candidates should also possess experience in the software development process and tools and languages like SaaS, Python, Java, MongoDB, Shell scripting, Python, PostgreSQL, and Git.
• Candidates should demonstrate knowledge in handling distributed data systems.
Examples: Elastisearch, Cassandra, Hadoop, and others.
• Should have experience in GitLab- CIRoles and Responsibilities
About Webtiga Private limited
About
Experts in Web 3 Technologies -Metaverse, Generative AI, Web3,Cybersecurity and DeFi spoken
For more details - please check out our website-https://www.webtiga.com/solutions/
Tech stack
Similar jobs
ABOUT THE TEAM:
The production engineering team is responsible for the key operational pillars (Reliability, Observability, Elasticity, Security, and Governance) of the Cloud infrastructure at Swiggy. We thrive to excel & continuously improve on these key operational pillars. We design, build, and operate Swiggy’s cloud infrastructure and developer platforms, to provide a seamless experience to our internal and external consumers.
What qualities are we looking for:
10+ years of professional experience in infrastructure, production engineering
Strong design, debugging, and problem-solving skills
Proficiency in at least one programming language like Python, GoLang or Java.
B Tech/M Tech in Computer Science or equivalent from a reputed college.
Hands-on experience with AWS and Kubernetes or similar cloud/infrastructure platforms
Hands-on with DevOps principles and practices ( Everything-as-a-code, CI/CD, Test everything, proactive monitoring etc)
Deep understanding of OS/virtualization/Containerization, network protocols & concepts
Exposure to modern-day infrastructure technologies, expertise in building and operating distributed systems.
Hands-on coding on any of the languages like Python or GoLang.
Familiarity with software engineering practices including unit testing, code reviews, and design documentation.
Technically mentor and lead the team towards engineering and operational excellence
Act like an owner, strive for excellence.
What will you get to do here?
Be part of a Culture where Customer Obsession, Ownership, Teamwork, Bias for Action and Insist on High standards are a way of life
Coming up with the best practices to help the team achieve their technical tasks and continually thrive in improving the technology of the team
Be a hands-on engineer, ensure frameworks/infrastructure built is well designed, scalable & are of high quality.
Build and/or operate platforms that are highly available, elastic, scalable, operable and observable
Experiment with new & relevant technologies and tools, and drive adoption while measuring yourself on the impact you can create.
Implementation of long-term technology vision for the team.
Build/Adapt and implement tools that empower the Swiggy engineering teams to self-manage the infrastructure and services owned by them.
You will identify, articulate, and lead various long-term tech vision, strategies, cross-cutting initiatives and architecture redesigns.
Design systems and make decisions that will keep pace with the rapid growth of Swiggy. Document your work and decision-making processes, and lead presentations and discussions in a way that is easy for others to understand.
Creating architectures & designs for new solutions around existing/new areas. Decide technology & tool choices for the team.
You will be responsible for:
- Managing all DevOps and infrastructure for Sizzle
- We have both cloud and on-premise servers
- Work closely with all AI and backend engineers on processing requirements and managing both development and production requirements
- Optimize the pipeline to ensure ultra fast processing
- Work closely with management team on infrastructure upgrades
You should have the following qualities:
- 3+ years of experience in DevOps, and CI/CD
- Deep experience in: Gitlab, Gitops, Ansible, Docker, Grafana, Prometheus
- Strong background in Linux system administration
- Deep expertise with AI/ML pipeline processing, especially with GPU processing. This doesn’t need to include model training, data gathering, etc. We’re looking more for experience on model deployment, and inferencing tasks at scale
- Deep expertise in Python including multiprocessing / multithreaded applications
- Performance profiling including memory, CPU, GPU profiling
- Error handling and building robust scripts that will be expected to run for weeks to months at a time
- Deploying to production servers and monitoring and maintaining the scripts
- DB integration including pymongo and sqlalchemy (we have MongoDB and PostgreSQL databases on our backend)
- Expertise in Docker-based virtualization including - creating & maintaining custom Docker images, deployment of Docker images on cloud and on-premise services, monitoring of production Docker images with robust error handling
- Expertise in AWS infrastructure, networking, availability
Optional but beneficial to have:
- Experience with running Nvidia GPU / CUDA-based tasks
- Experience with image processing in python (e.g. openCV, Pillow, etc)
- Experience with PostgreSQL and MongoDB (Or SQL familiarity)
- Excited about working in a fast-changing startup environment
- Willingness to learn rapidly on the job, try different things, and deliver results
- Bachelors or Masters degree in computer science or related field
- Ideally a gamer or someone interested in watching gaming content online
Skills:
DevOps, Ansible, CI/CD, GitLab, GitOps, Docker, Python, AWS, GCP, Grafana, Prometheus, python, sqlalchemy, Linux / Ubuntu system administration
Seniority: We are looking for a mid to senior level engineer
Salary: Will be commensurate with experience.
Who Should Apply:
If you have the right experience, regardless of your seniority, please apply.
Work Experience: 3 years to 6 years
- Minimum 3+ yrs of Experience in DevOps with AWS Platform
- • Strong AWS knowledge and experience
- • Experience in using CI/CD automation tools (Git, Jenkins, Configuration deployment tools ( Puppet/Chef/Ansible)
- • Experience with IAC tools Terraform
- • Excellent experience in operating a container orchestration cluster (Kubernetes, Docker)
- • Significant experience with Linux operating system environments
- • Experience with infrastructure scripting solutions such as Python/Shell scripting
- • Must have experience in designing Infrastructure automation framework.
- • Good experience in any of the Setting up Monitoring tools and Dashboards ( Grafana/kafka)
- • Excellent problem-solving, Log Analysis and troubleshooting skills
- • Experience in setting up centralized logging for system (EKS, EC2) and application
- • Process-oriented with great documentation skills
- • Ability to work effectively within a team and with minimal supervision
Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building a restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as predictions and capacity planning.
About this roll* (Responsibilities)
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability with well-defined service level objectives
Troubleshooting and Supporting Escalations:
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Diagnose performance bottlenecks and implement optimizations across infrastructure, databases, web, and mobile applications
- Implement strategies to increase system reliability and performance through on-call rotation and process optimization
- Perform and run blameless RCAs on incidents and outages aggressively, looking for answers that will prevent the incident from ever happening again
Do you have the right ingredients? (Requirements)
- Extensive industry experience with at least 7+ years in SRE and/or DevOps roles
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, and Docker
- Experience developing software or software projects in at least four languages, ideally including two of Go, Python, and Java
- Experience with cloud computing technologies ( AWS cloud provider preferred)
Bread puns are encouraged but not required
Kutumb is the first and largest communities platform for Bharat. We are growing at an exponential trajectory. More than 1 Crore users use Kutumb to connect with their community. We are backed by world-class VCs and angel investors. We are growing and looking for exceptional Infrastructure Engineers to join our Engineering team.
More on this here - https://kutumbapp.com/why-join-us.html">https://kutumbapp.com/why-join-us.html
We’re excited if you have:
- Recent experience designing and building unified observability platforms that enable companies to use the sometimes-overwhelming amount of available data (metrics, logs, and traces) to determine quickly if their application or service is operating as desired
- Expertise in deploying and using open-source observability tools in large-scale environments, including Prometheus, Grafana, ELK (ElasticSearch + Logstash + Kibana), Jaeger, Kiali, and/or Loki
- Familiarity with open standards like OpenTelemetry, OpenTracing, and OpenMetrics
- Familiarity with Kubernetes and Istio as the architecture on which the observability platform runs, and how they integrate and scale. Additionally, the ability to contribute improvements back to the joint platform for the benefit of all teams
- Demonstrated customer engagement and collaboration skills to curate custom dashboards and views, and identify and deploy new tools, to meet their requirements
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment
- Using CICD tools to automatically perform canary analysis and roll out changes after passing automated gates (think Argo & keptn)
- Hands-on experience working with AWS
- Bonus points for knowledge of ETL pipelines and Big data architecture
- Great problem-solving skills & takes pride in your work
- Enjoys building scalable and resilient systems, with a focus on systems that are robust by design and suitably monitored
- Abstracting all of the above into as simple of an interface as possible (like Knative) so developers don't need to know about it unless they choose to open the escape hatch
What you’ll be doing:
- Design and build automation around the chosen tools to make onboarding new services easy for developers (dashboards, alerts, traces, etc)
- Demonstrate great communication skills in working with technical and non-technical audiences
- Contribute new open-source tools and/or improvements to existing open-source tools back to the CNCF ecosystem
Tools we use:
Kops, Argo, Prometheus/ Loki/ Grafana, Kubernetes, AWS, MySQL/ PostgreSQL, Apache Druid, Cassandra, Fluentd, Redis, OpenVPN, MongoDB, ELK
What we offer:
- High pace of learning
- Opportunity to build the product from scratch
- High autonomy and ownership
- A great and ambitious team to work with
- Opportunity to work on something that really matters
- Top of the class market salary and meaningful ESOP ownership
-
Pixuate is a deep-tech AI start-up enabling businesses make smarter decisions with our edge-based video analytics platform and offer innovative solutions across traffic management, industrial digital transformation, and smart surveillance. We aim to serve enterprises globally as a preferred partner for digitization of visual information.
Job Description
We at Pixuate are looking for highly motivated and talented Senior DevOps Engineers to support building the next generation, innovative, deep-tech AI based products. If you are someone who has a passion for building a great software, has analytical mindset and enjoys solving complex problems, thrives in a challenging environment, self-driven, constantly exploring & learning new technologies, have ability to succeed on one’s own merits and fast-track your career growth we would love to talk!
What do we expect from this role?
- This role’s key area of focus is to co-ordinate and manage the product from development through deployment, working with rest of the engineering team to ensure smooth functioning.
- Work closely with the Head of Engineering in building out the infrastructure required to deploy, monitor and scale the services and systems.
- Act as the technical expert, innovator, and strategic thought leader within the Cloud Native Development, DevOps and CI/CD pipeline technology engineering discipline.
- Should be able to understand how technology works and how various structures fall in place, with a high-level understanding of working with various operating systems and their implications.
- Troubleshoots basic software or DevOps stack issues
You would be great at this job, if you have below mentioned competencies
- Tech /M.Tech/MCA/ BSc / MSc/ BCA preferably in Computer Science
- 5+ years of relevant work experience
- https://www.edureka.co/blog/devops-skills#knowledge">Knowledge on Various DevOps Tools and Technologies
- Should have worked on tools like Docker, Kubernetes, Ansible in a production environment for data intensive systems.
- Experience in developing https://www.edureka.co/blog/continuous-delivery/">Continuous Integration/ Continuous Delivery pipelines (CI/ CD) preferably using Jenkins, scripting (Shell / Python) and https://www.edureka.co/blog/what-is-git/">Git and Git workflows
- Experience implementing role based security, including AD integration, security policies, and auditing in a Linux/Hadoop/AWS environment.
- Experience with the design and implementation of big data backup/recovery solutions.
- Strong Linux fundamentals and scripting; experience as Linux Admin is good to have.
- Working knowledge in Python is a plus
- Working knowledge of TCP/IP networking, SMTP, HTTP, load-balancers (ELB, HAProxy) and high availability architecture is a plus
- Strong interpersonal and communication skills
- Proven ability to complete projects according to outlined scope and timeline
- Willingness to travel within India and internationally whenever required
- Demonstrated leadership qualities in past roles
More about Pixuate:
Pixuate, owned by Cocoslabs Innovative Solutions Pvt. Ltd., is a leading AI startup building the most advanced Edge-based video analytics products. We are recognized for our cutting-edge R&D in deep learning, computer vision and AI and we are solving some of the most challenging problems faced by enterprises. Pixuate’s plug-and-play platform revolutionizes monitoring, compliance to safety, and efficiency improvement for Industries, Banks & Enterprises by providing actionable real-time insights leveraging CCTV cameras.
We have enabled our customers such as Hindustan Unilever, Godrej, Secuira, L&T, Bigbasket, Microlabs, Karnataka Bank etc and rapidly expanding our business to cater to the needs of Manufacturing & Logistics, Oil and Gas sector.
Rewards & Recognitions:
- Winner of Elevate by Startup Karnataka (https://pixuate.ai/thermal-analytics/">https://pixuate.ai/thermal-analytics/).
- Winner of Manufacturing Innovation Challenge in the 2nd edition of Fusion 4.0’s MIC2020 organized by the NASSCOM Centre of Excellence in IoT & AI in 2021
- Winner of SASACT program organized by MEITY in 2021
Why join us?
You will get an opportunity to work with the founders and be part of 0 to 1 journey& get coached and guided. You will also get an opportunity to excel your skills by being innovative and contributing to the area of your personal interest. Our culture encourages innovation, freedom and rewards high performers with faster growth and recognition.
Where to find us?
Website: http://pixuate.com/">http://pixuate.com/
Linked in: https://www.linkedin.com/company/pixuate-ai
Work from Office – BengaluruPlace of Work:
Striim (pronounced “stream” with two i’s for integration and intelligence) was founded in 2012 with a simple goal of helping companies make data useful the instant it’s born.
Striim’s enterprise-grade, streaming integration with intelligence platform makes it easy to build continuous, streaming data pipelines – including change data capture (CDC) – to power real-time cloud integration, log correlation, edge processing, and streaming analytics
2 - 5 Years of Experience in any Programming any language (Polyglot Preferred ) & System Operations • Awareness of Devops & Agile Methodologies • Proficient in leveraging CI and CD tools to automate testing and deployment . • Experience in working in an agile and fast paced environment . • Hands on knowledge of at least one cloud platform (AWS / GCP / Azure). • Cloud networking knowledge: should understand VPC, NATs, and routers. • Contributions to open source is a plus. • Good written communication skills are a must. Contributions to technical blogs / whitepapers will be an added advantage.
- Demonstrated experience with AWS
- Knowledge of servers, networks, storage, client-server systems, and firewalls
- Strong expertise in Windows and/or Linux operating systems, including system architecture and design, as well as experience supporting and troubleshooting stability and performance issues
- Thorough understanding of and experience with virtualization technologies (e.g., VMWare/Hyper-V)
- Knowledge of core network services such as DHCP, DNS, IP routing, VLANs, layer 2/3 routing, and load balancing is required
- Experience in reading, writing or modifying PowerShell, Bash scripts & Python code.Experience using git
- Working know-how of software-defined lifecycles, product packaging, and deployments
- POSTGRESSQL or Oracle database administration (Backup, Restore, Tuning, Monitoring, Management)
- At least 2 from AWS Associate Solutions Architect, DevOps, or SysOps
- At least 1 from AWS Professional Solutions Architect, DevOps
- AWS: S3, Redshift, DynamoDB, EC2, VPC, Lambda, CloudWatch etc.
- Bigdata: Databricks, Cloudera, Glue and Athena
- DevOps: Jenkins, Bitbucket
- Automation: Terraform, Cloud Formation, Python, Shell scripting Experience in automating AWS infrastructure with Terraform.
- Experience in database technologies is a plus.
- Knowledge in all aspects of DevOps (source control, continuous integration, deployments, etc.)
- Proficiency in security implementation best practices on IAM policies, KMS encryption, Secrets Management, Network Security Groups etc.
- Experience working in the SCRUM Environment
What you do :
- Developing automation for the various deployments core to our business
- Documenting run books for various processes / improving knowledge bases
- Identifying technical issues, communicating and recommending solutions
- Miscellaneous support (user account, VPN, network, etc)
- Develop continuous integration / deployment strategies
- Production systems deployment/monitoring/optimization
-
Management of staging/development environments
What you know :
- Ability to work with a wide variety of open source technologies and tools
- Ability to code/script (Python, Ruby, Bash)
- Experience with systems and IT operations
- Comfortable with frequent incremental code testing and deployment
- Strong grasp of automation tools (Chef, Packer, Ansible, or others)
- Experience with cloud infrastructure and bare-metal systems
- Experience optimizing infrastructure for high availability and low latencies
- Experience with instrumenting systems for monitoring and reporting purposes
- Well versed in software configuration management systems (git, others)
- Experience with cloud providers (AWS or other) and tailoring apps for cloud deployment
-
Data management skills
Education :
- Degree in Computer Engineering or Computer Science
- 1-3 years of equivalent experience in DevOps roles.
- Work conducted is focused on business outcomes
- Can work in an environment with a high level of autonomy (at the individual and team level)
-
Comfortable working in an open, collaborative environment, reaching across functional.
Our Offering :
- True start-up experience - no bureaucracy and a ton of tough decisions that have a real impact on the business from day one.
-
The camaraderie of an amazingly talented team that is working tirelessly to build a great OS for India and surrounding markets.
Perks :
- Awesome benefits, social gatherings, etc.
- Work with intelligent, fun and interesting people in a dynamic start-up environment.
What we are looking for
Work closely with product & engineering groups to identify and document
infrastructure requirements.
Design infrastructure solutions balancing requirements, operational
constraints and architecture guidelines.
Implement infrastructure including network connectivity, virtual machines
and monitoring.
Implement and follow security guidelines, both policy and technical to
protect our customers.
Resolve incidents as escalated from monitoring solutions and lower tiers.
Identify root cause for issues and develop long term solutions to fix recurring
issues.
Ability to automate recurring tasks to increase velocity and quality.
Partner with the engineering team to build software tolerance for
infrastructure failure or issues.
Research emerging technologies, trends and methodologies and enhance
existing systems and processes.
Qualifications
Master’s/Bachelors degree in Computer Science, Computer Engineering,
Electrical Engineering, or related technical field, and two years of experience
in software/systems or related.
5+ years overall experience.
Work experience must have included:
Proven track record in deploying, configuring and maintaining Ubuntu server
systems on premise and in the cloud.
Minimum of 4 years’ experience designing, implementing and troubleshooting
TCP/IP networks, VPN, Load Balancers & Firewalls.
Minimum 3 years of experience working in public clouds like AWS & Azure.
Hands on experience in any of the configuration management tools like Anisble,
Chef & Puppet.
Strong in performing production operation activities.
Experience with Container & Container Orchestrator tools like Kubernetes, Docker
Swarm is plus.
Good at source code management tools like Bitbucket, GIT.
Configuring and utilizing monitoring and alerting tools.
Scripting to automate infrastructure and operational processes.
Hands on work to secure networks and systems.
Sound problem resolution, judgment, negotiating and decision making skills
Ability to manage and deliver multiple project phases at the same time
Strong analytical and organizational skills
Excellent written and verbal communication skills
Interview focus areas
Networks, systems, monitoring
AWS (EC2, S3, VPC)
Problem solving, scripting, network design, systems administration and
troubleshooting scenarios
Culture fit, agility, bias for action, ownership, communication