
DevOps and Cloud Engineer
at Consulting and Product Engineering Company
Job Description (8–12 years of experience):
○ Develop best practices for the team and own the architecture, solutions, and operations documentation needed to meet the engineering department's quality standards
○ Participate in production outages, handle complex issues, and work towards resolution
○ Develop custom tools and integrations with existing tools to increase engineering productivity
Required Experience and Expertise
○ Deep understanding of kernel, networking, and OS fundamentals
○ Strong experience in writing Helm charts
○ Deep understanding of Kubernetes (K8s)
○ Good knowledge of service meshes
○ Good understanding of databases
Notice Period: 30 days maximum

We are looking for a highly skilled and experienced Senior AIOps / MLOps Engineer with strong expertise in Azure Cloud, automation, platform engineering, CI/CD, observability, and enterprise-scale cloud operations.
The ideal candidate should have hands-on experience in designing, implementing, and managing modern cloud-native platforms, with a focus on AI/ML operationalization, DevOps automation, monitoring, reliability, and infrastructure modernization.
Required Experience
- 6–10 years of overall IT experience
- Strong experience in AIOps / MLOps / DevOps engineering
- Hands-on enterprise experience in Azure Cloud platform engineering
Key Responsibilities
AIOps / MLOps
- Design and implement scalable enterprise-grade AIOps and MLOps platforms across cloud environments.
- Ensure AI platform reliability, governance, security, and model performance optimization.
- Implement LLM/AI model versioning, experiment tracking, drift detection, observability, and operational health monitoring frameworks.
- Collaborate with Data Science, DevOps, Cloud, and Application teams to accelerate AI/ML adoption and platform modernization.
- Develop automation frameworks for AI/ML pipelines, infrastructure provisioning, and operational workflows.
- Lead continuous improvement, automation, and standardization efforts across AI/ML operational ecosystems.
- Mentor engineering teams and promote AIOps/MLOps best practices, innovation, and engineering excellence.
- Strong knowledge of embeddings, tokenization, vector databases, and AI/ML model training concepts.
Preferred Skills
- Python, MLflow, Model Registry, Experiment Tracking
- Azure DevOps & Azure Cloud
- Azure Machine Learning
- LLMOps / Generative AI operationalization
- AI model deployment and lifecycle management
- AI Gateway and Model Serving architectures
- Azure OpenAI & Azure AI Foundry
- MCP Server implementation and configuration
- CI/CD Automation & AKS
Soft Skills
- Strong communication and stakeholder management
- Good troubleshooting and problem-solving skills
- Ability to work independently and drive ownership
- Strong collaboration and documentation skills

Location: Bangalore
Experience: 2–5 years
Type: Full-time | On-site
Start: Immediate
Why this role exists
Most systems don’t fail because of one big outage.
They fail because reliability is treated as an afterthought.
Right now, uptime depends too much on individual heroics.
That doesn’t scale.
This role exists to build a reliability system where:
- Uptime is predictable
- Failures are contained
- Escalations don’t depend on leadership
What you’ll do
You will not just monitor systems.
You will own reliability as a product.
1. Drive uptime to production-grade reliability
- Improve system uptime to meet a 99.9% customer-facing SLA within 4 months
- Define and track:
  - SLAs / SLOs / error budgets
- Ensure reliability is measured from the customer’s perspective, not internal metrics
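As a rough illustration of what the 99.9% target implies, assuming a 30-day measurement window (the window length is an assumption; the posting does not state one), the allowed downtime per window works out to:

```latex
\text{error budget} = (1 - 0.999) \times (30 \times 24 \times 60)\ \text{min}
                    = 0.001 \times 43{,}200\ \text{min}
                    = 43.2\ \text{min per 30 days}
```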
2. Build incident response as a system
- Set up a 24/7 incident response rotation across 3 engineers
- Eliminate dependency on leadership (no single escalation point)
- Define:
  - Incident severity levels
  - Response playbooks
  - Escalation protocols
- Ensure fast detection → containment → resolution
3. Contain and fix erratic system behavior
- Identify and resolve:
  - Latency spikes
  - Downtime incidents
  - Integration failures
- Build guardrails to prevent recurrence
- Focus on root cause elimination, not temporary fixes
4. Create continuous reliability feedback loops
- Work closely with engineering teams to:
  - Surface recurring failure patterns
  - Improve build quality
  - Reduce production bugs
- Ensure learnings from incidents directly improve future releases
5. Improve observability and monitoring
- Build dashboards and alerts for:
  - System health
  - Performance metrics
  - Failure signals
- Ensure issues are detected before customers report them
6. Reduce operational fragility
- Remove single points of failure (people, systems, workflows)
- Improve system resilience across:
  - Deployments
  - Integrations
  - Runtime environments
What success looks like
- Uptime reaches 99.9%+ reliably
- Incidents are:
  - Detected early
  - Contained quickly
  - Resolved permanently
- No dependency on a single individual for escalation
- System behavior becomes predictable and stable
- Engineering teams ship with higher reliability confidence
Who you are
- You have 2-5 years of experience in SRE / DevOps / backend systems
- You have worked on production systems with real uptime expectations
- You think in:
  - Systems
  - Failure modes
  - Trade-offs
- You are comfortable debugging in live, high-pressure environments
What will make you stand out
- Experience with:
  - Distributed systems
  - Cloud infrastructure (AWS / Azure / GCP)
  - Monitoring & alerting tools
- A track record of building or improving:
  - Incident response systems
  - Reliability frameworks
- Strong debugging skills across:
  - Infra
  - Application
  - Integrations
Compensation
₹60,000/month (fixed)
(Aligned with role scope and impact expectations)
Why join
- You will define reliability standards for a production AI platform
- Your work directly impacts:
  - Customer trust
  - Product performance
  - Enterprise readiness
- You will move the system from reactive → predictable
What this role is not
- Not just monitoring dashboards
- Not limited to handling tickets
- Not dependent on escalation to leadership
What this role is
- A builder of reliability systems
- A guardian of uptime and performance
- A multiplier of engineering quality
One question to self-evaluate
Can you build a system where downtime is rare, predictable, and never dependent on a single person?
REVIEW CRITERIA:
MANDATORY:
- Strong Hands-On AWS Cloud Engineering / DevOps Profile
- Mandatory (Experience 1): Must have 12+ years of experience in AWS Cloud Engineering / Cloud Operations / Application Support
- Mandatory (Experience 2): Must have strong hands-on experience supporting AWS production environments (EC2, VPC, IAM, S3, ALB, CloudWatch)
- Mandatory (Infrastructure as Code): Must have hands-on Infrastructure as Code experience using Terraform in production environments
- Mandatory (AWS Networking): Strong understanding of AWS networking and connectivity (VPC design, routing, NAT, load balancers, hybrid connectivity basics)
- Mandatory (Cost Optimization): Exposure to cost optimization and usage tracking in AWS environments
- Mandatory (Core Skills): Experience handling monitoring, alerts, incident management, and root cause analysis
- Mandatory (Soft Skills): Strong communication skills and stakeholder coordination skills
ROLE & RESPONSIBILITIES:
We are looking for a hands-on AWS Cloud Engineer to support day-to-day cloud operations, automation, and reliability of AWS environments. This role works closely with the Cloud Operations Lead, DevOps, Security, and Application teams to ensure stable, secure, and cost-effective cloud platforms.
KEY RESPONSIBILITIES:
- Operate and support AWS production environments across multiple accounts
- Manage infrastructure using Terraform and support CI/CD pipelines
- Support Amazon EKS clusters, upgrades, scaling, and troubleshooting
- Build and manage Docker images and push to Amazon ECR
- Monitor systems using CloudWatch and third-party tools; respond to incidents
- Support AWS networking (VPCs, NAT, Transit Gateway, VPN/DX)
- Assist with cost optimization, tagging, and governance standards
- Automate operational tasks using Python, Lambda, and Systems Manager
IDEAL CANDIDATE:
- Strong hands-on AWS experience (EC2, VPC, IAM, S3, ALB, CloudWatch)
- Experience with Terraform and Git-based workflows
- Hands-on experience with Kubernetes / EKS
- Experience with CI/CD tools (GitHub Actions, Jenkins, etc.)
- Scripting experience in Python or Bash
- Understanding of monitoring, incident management, and cloud security basics
NICE TO HAVE:
- AWS Associate-level certifications
- Experience with Karpenter, Prometheus, New Relic
- Exposure to FinOps and cost optimization practices
Job Description:
We are looking to recruit engineers with a zeal to learn cloud solutions using Amazon Web Services (AWS). We prefer an engineer who is passionate about AWS Cloud technology, about helping customers succeed, and about quality, and who truly enjoys what they do. The qualified candidate for the AWS Cloud Engineer position has a can-do attitude and is an innovative thinker.
- Be hands-on, with responsibility for the installation, configuration, and ongoing management of Linux-based solutions on AWS for our clients.
- Responsible for creating and managing auto-scaling EC2 instances using VPCs, Elastic Load Balancers, and other services across multiple availability zones to build resilient, scalable, and fail-safe cloud solutions.
- Familiarity with other AWS services such as CloudFront, ALB, EC2, RDS, and Route 53 is desirable.
- Working knowledge of RDS, DynamoDB, GuardDuty, WAF, and multi-tier architecture.
- Proficient with Git, CI/CD pipelines, AWS DevOps, Bitbucket, and Ansible.
- Proficient with Docker Engine, containers, and Kubernetes.
- Expertise in migrating workloads to AWS from other cloud providers.
- Should be a versatile problem solver who can resolve complex issues ranging from OS and application faults to creatively improving solution design.
- Should be ready to work in a 24x7 rotation and be available on call at other times due to the critical nature of the role.
- Fault finding, analysis, and logging of information for reporting performance exceptions.
- Deployment, automation, management, and maintenance of AWS cloud-based production system.
- Ensuring availability, performance, security, and scalability of AWS production systems.
- Management of creation, release, and configuration of production systems.
- Evaluation of new technology alternatives and vendor products.
- System troubleshooting and problem resolution across various application domains and platforms.
- Pre-production acceptance testing for quality assurance.
- Provision of critical system security by leveraging best practices and proven cloud security solutions.
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on AWS platform.
- Design, maintenance, and management of tools for automating various operational processes.
Desired Candidate Profile
o Customer-oriented personality with good communication skills, able to communicate very effectively both verbally and in writing.
o Be a team player that collaborates and shares experience and expertise with the rest of the team.
o Understands database systems such as MSSQL, MongoDB, MySQL, MariaDB, DynamoDB, and RDS.
o Understands web servers such as Apache and Nginx.
o Must be RHEL certified.
o In-depth knowledge of Linux commands and services.
o Able to efficiently manage internet applications, including FTP, SFTP, Nginx, Apache, MySQL, and PHP.
o Good communication skills.
o 3–7 years of experience in AWS and DevOps.
Company Profile:
i2k2 Networks is a trusted name in the IT cloud hosting services industry. We help enterprises with cloud migration, cost optimization, support, and fully managed services that help them move faster and scale with lower IT costs. i2k2 Networks offers a complete range of cutting-edge solutions that drive Internet-powered business models. We excel in:
- Managed IT Services
- Dedicated Web Servers Hosting
- Cloud Solutions
- Email Solutions
- Enterprise Services
- Round the clock Technical Support
https://www.i2k2.com/
Regards
Nidhi Kohli
i2k2 Networks Pvt Ltd.
AM - Talent Acquisition
Company - Apptware Solutions
Location - Baner, Pune
Team Size - 130+
Job Description -
Cloud Engineer with 8+ years of experience
Roles and Responsibilities
● Have 8+ years of strong experience in deployment, management and maintenance of large systems on-premise or cloud
● Experience maintaining and deploying highly-available, fault-tolerant systems at scale
● A drive towards automating repetitive tasks (e.g. scripting via Bash, Python, Ruby, etc)
● Practical experience with Docker containerization and clustering (Kubernetes/ECS)
● Expertise with AWS (e.g. IAM, EC2, VPC, ELB, ALB, Autoscaling, Lambda, VPN)
● Version control system experience (e.g. Git)
● Experience implementing CI/CD (e.g. Jenkins, TravisCI, CodePipeline)
● Operational (e.g. HA/backups) experience with NoSQL (e.g. MongoDB, Redis) and SQL (e.g. MySQL) databases
● Experience with configuration management tools (e.g. Ansible, Chef)
● Experience with infrastructure-as-code (e.g. Terraform, CloudFormation)
● Bachelor's or master’s degree in CS, or equivalent practical experience
● Effective communication skills
● Hands-on experience with other cloud providers such as Microsoft Azure and GCP
● A sense of ownership and ability to operate independently
● Experience with Jira and one or more Agile SDLC methodologies
● Nice to Have:
○ Sensu and Graphite
○ Ruby or Java
○ Python or Groovy
○ Java Performance Analysis
Role: Cloud Engineer
Industry Type: IT-Software, Software Services
Functional Area: IT Software - Application Programming, Maintenance
Employment Type: Full Time, Permanent
Role Category: Programming & Design
Implementation Engineer
Implementation Engineer Duties and Responsibilities
- Understand requirements from internal consumers regarding program functionality.
- Perform UAT on applications using test cases, prepare the corresponding documentation, coordinate with the team to resolve issues within the required timeframe, and inform management of any delays.
- Collaborate with the development team to design new programs for client implementation activities, manage communication with the department to resolve issues, and assist the implementation analyst in managing production data.
- Research client issues, document findings, and carry out technical activities using JIRA.
- Assist internal teams in monitoring the software implementation lifecycle and in tracking client-specific software customizations.
- Train technical staff on OS and software issues, identify issues in processes, and provide solutions. Train other team members on processes, procedures, API functionality, and development specifications.
- Supervise/support cross-functional teams in designing, testing, and deploying to achieve on-time project completion.
- Implement, configure, and debug MySQL, Java, Redis, PHP, Node, and ActiveMQ setups.
- Monitor and troubleshoot infrastructure utilizing SYSLOG, SNMP and other monitoring software.
- Install, configure, monitor and upgrade applications during installation/upgrade activities.
- Assist the team in identifying network issues and resolving them.
- Utilize JIRA for issue reporting, status, activity planning, tracking and updating project defects and tasks.
- Manage JIRA, track tickets to closure, and follow up with team members.
- Troubleshoot software issues.
- Provide on-call support as necessary.
Implementation Engineer Requirements and Qualifications
- Bachelor’s degree in computer science, software engineering, or a related field
- Experience working with:
  - Linux and Windows operating systems
  - Shell and batch (.bat) scripts
  - SIP/ISUP-based solutions
  - Deploying and debugging Java- and C++-based solutions
  - MySQL (installing, backing up, updating, and retrieving data)
  - Front-end or back-end software development for Linux
  - Database management and security (a plus)
- Very good debugging and analytical skills
- Good communication skills
Role
We are looking for an experienced DevOps engineer that will help our team establish DevOps practice. You will work closely with the technical lead to identify and establish DevOps practices in the company.
You will also help us build scalable, efficient cloud infrastructure. You’ll implement monitoring for automated system health checks. Lastly, you’ll build our CI pipeline, and train and guide the team in DevOps practices.
This is a hybrid role: you will also be expected to do some application-level programming in your downtime.
Responsibilities
- Deployment, automation, management, and maintenance of production systems.
- Ensuring availability, performance, security, and scalability of production systems.
- Evaluation of new technology alternatives and vendor products.
- System troubleshooting and problem resolution across various application domains and platforms.
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on AWS platform.
- Manage the establishment and configuration of SaaS infrastructure in an agile way by storing infrastructure as code and employing automated configuration management tools with a goal to be able to re-provision environments at any point in time.
- Be accountable for proper backup and disaster recovery procedures.
- Drive operational cost reductions through service optimizations and demand based auto scaling.
- Have on call responsibilities.
- Perform root cause analysis for production errors
- Use open-source technologies and tools to address specific use cases encountered within the project.
- Use coding or scripting languages to solve problems with custom workflows.
Requirements
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Prior experience as a software developer in a couple of high-level programming languages.
- Extensive experience with a JavaScript-based framework, since we will be deploying services to Node.js on AWS Lambda (serverless).
- Extensive experience with web servers such as Nginx/Apache
- Strong Linux system administration background.
- Ability to present and communicate the architecture in a visual form.
- Strong knowledge of AWS (e.g. IAM, EC2, VPC, ELB, ALB, Autoscaling, Lambda, NAT gateway, DynamoDB)
- Experience maintaining and deploying highly available, fault-tolerant systems at scale (~1 lakh, i.e. 100,000, users a day)
- A drive towards automating repetitive tasks (e.g. scripting via Bash, Python, Ruby, etc)
- Expertise with Git
- Experience implementing CI/CD (e.g. Jenkins, TravisCI)
- Strong experience with databases and data stores such as MySQL, MongoDB and other NoSQL stores, Elasticsearch, and Redis.
- Stellar troubleshooting skills with the ability to spot issues before they become problems.
- Current with industry trends, IT ops and industry best practices, and able to identify the ones we should implement.
- Time and project management skills, with the capability to prioritize and multitask as needed.
Job Location: Jaipur
Experience Required: Minimum 3 years
About the role:
As a DevOps Engineer for Punchh, you will be working with our developers, SRE, and DevOps teams implementing our next-generation infrastructure. We are looking for a self-motivated, responsible team player who loves designing systems that scale. Punchh provides a rich engineering environment where you can be creative, learn new technologies, and solve engineering problems, all while delivering business objectives. The DevOps culture here is one of immense trust and responsibility. You will be given the opportunity to make an impact, as there are no silos here.
Responsibilities:
- Deliver SLA and business objectives through whole-lifecycle design of services, from inception to implementation.
- Ensuring availability, performance, security, and scalability of AWS production systems
- Scale our systems and services through continuous integration, infrastructure as code, and gradual refactoring in an agile environment.
- Maintain services once a project is live by monitoring and measuring availability, latency, and overall system and application health.
- Write and maintain software that runs the infrastructure that powers the Loyalty and Data platform for some of the world’s largest brands.
- Be on call 24x7 in shifts for Level 2 and higher escalations
- Respond to incidents and write blameless RCAs/postmortems
- Implement and practice proper security controls and processes
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on the platform.
Must have:
- Minimum 3 Years of Experience in DevOps.
- BS degree in Computer Science, Mathematics, Engineering, or equivalent practical experience.
- Strong inter-personal skills.
- Must have experience in CI/CD tooling such as Jenkins, CircleCI, TravisCI
- Must have experience in Docker, Kubernetes, Amazon ECS or Mesos
- Experience in code development in at least one high-level programming language from this list: Python, Ruby, Go, Groovy
- Proficient in shell scripting, and most importantly, know when to stop scripting and start developing.
- Experience creating highly automated infrastructure with configuration management or infrastructure-as-code tools such as Terraform, CloudFormation, or Ansible.
- In-depth knowledge of the Linux operating system and administration.
- Production experience with a major cloud provider such as Amazon Web Services (AWS).
- Knowledge of web server technologies such as Nginx or Apache.
- Knowledge of Redis, Memcache, or one of the many in-memory data stores.
- Experience with various load balancing technologies such as Amazon ALB/ELB, HAProxy, and F5.
- Comfortable with large-scale, highly-available distributed systems.
Good to have:
- Understanding of Web Standards (REST, SOAP APIs, OWASP, HTTP, TLS)
- Production experience with Hashicorp products such as Vault or Consul
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Experience in a PCI environment
- Experience with Big Data distributions from Cloudera, MapR, or Hortonworks
- Experience maintaining and scaling database applications
- Knowledge of fundamental systems engineering principles such as CAP Theorem, Concurrency Control, etc.
- Understanding of network fundamentals: OSI, TCP/IP, topologies, etc.
- Understanding of infrastructure auditing and the ability to help the organization control infrastructure costs.
- Experience in Kafka, RabbitMQ or any messaging bus.










