- Must have a minimum of 3 years of experience in managing AWS resources and automating CI/CD pipelines.
- Strong scripting skills in PowerShell, Python or Bash be able to build and administer CI/CD pipelines.
- Knowledge of infrastructure tools like Cloud Formation, Terraform, Ansible.
- Experience with microservices and/or event-driven architecture.
- Experience using containerization technologies (Docker, ECS, Kubernetes, Mesos or Vagrant).
- Strong practical Windows and Linux system administration skills in the cloud.
- Understanding of DNS, NFS, TCP/IP and other protocols.
- Knowledge of secure SDLC, OWASP top 10 and CWE/SANS top 25.
- Deep understanding of Web Sockets and their functioning. Hands on experience of ElasticCache, Redis, ECS or EKS. Installation, configuration and management of Apache or Nginx web server, Apache/Tomcat Application Server, configure SSL certificates, setup reverse proxy.
- Exposure to RDBMS (MySQL, SQL Server, Aurora, etc.) is a plus.
- Exposure to programming languages like JAVA, PHP, SQL is a plus.
- AWS Developer or AWS SysOps Administrator certification is a plus.
- AWS Solutions Architect Certification experience is a plus.
- Experience building Blue/Green, Canary or other zero down time deployment strategies, advanced understanding of VPC, EC2 Route53 IAM, Lambda is a plus.
Similar jobs
- 2+ years work experience in a DevOps or similar role
- Knowledge of OO programming and concepts (Java, C++, C#, Python)
- A drive towards automating repetitive tasks (e.g., scripting via Bash, Python, etc)
- Fluency in one or more scripting languages such as Python or Ruby.
- Familiarity with Microservice-based architectures
- Practical experience with Docker containerization and clustering (Kubernetes/ECS)
- In-depth, hands-on experience with Linux, networking, server, and cloud architectures.
- Experience with CI/CD tools Azure DevOps, AWS cloud formation, Lamda functions, Jenkins, and Ansible
- Experience with AWS, Azure, or another cloud PaaS provider.
- Solid understanding of configuration, deployment, management, and maintenance of large cloud-hosted systems; including auto-scaling, monitoring, performance tuning, troubleshooting, and disaster recovery
- Proficiency with source control, continuous integration, and testing pipelines
- Effective communication skills
Job Responsibilities:
- Deploy and maintain critical applications on cloud-native microservices architecture.
- Implement automation, effective monitoring, and infrastructure-as-code.
- Deploy and maintain CI/CD pipelines across multiple environments.
- Streamline the software development lifecycle by identifying pain points and productivity barriers and determining ways to resolve them.
- Analyze how customers are using the platform and help drive continuous improvement.
- Support and work alongside a cross-functional engineering team on the latest technologies.
- Iterate on best practices to increase the quality & velocity of deployments.
- Sustain and improve the process of knowledge sharing throughout the engineering team
- Identification and prioritization of technical debt that risks instability or creates wasteful operational toil.
- Own daily operational goals with the team.
Requirement
- 1 to 7 years of experience with relative experience in managing development operations
- Hands-on experience with AWS
- Thorough knowledge of setting up release pipelines, and managing multiple environments like Beta, Staging, UAT, and Production
- Thorough knowledge of best cloud practices and architecture
- Hands-on with benchmarking and performance monitoring
- Identifying various bottlenecks and taking pre-emptive measures to avoid downtime
- Hands-on knowledge with at least one toolset Chef/Puppet/Ansible
- Hands-on with CloudFormation / Terraform or other Infrastructure as code is a plus.
- Thorough experience with Shell Scripting and should not know to shy away from learning new technologies or programming languages
- Experience with other cloud providers like Azure and GCP is a plus
- Should be open to R&D for creative ways to improve performance while keeping costs low
What do we want the person to do?
- Manage, Monitor and Provision Infrastructure - Majorly on AWS
- Will be responsible for maintaining 100% uptime on production servers (Site Reliability)
- Setting up a release pipeline for current releases. Automating releases for Beta, Staging & Production
- Maintaining near-production replica environments on Beta and Staging
- Automating Releases and Versioning of Static Assets (Experience with Chef/Puppet/Ansible)
- Should have hands-on experience with Build Tools like Jenkins, GitHub Actions, AWS CodeBuild etc
- Identify performance gaps and ways to fix them.
- Weekly meetings with Engineering Team to discuss the changes/upgrades. Can be related to code issues/architecture bottlenecks.
- Creative Ways of Reducing Costs of Cloud Computing
- Convert Infrastructure Deployment / Provision to Infrastructure as Code for reusability and scaling.
Job description
The ideal candidate is a self-motivated, multi-tasker, and demonstrated team player. You will be a lead developer responsible for the development of new software security policies and enhancements to security on existing products. You should excel in working with large-scale applications and frameworks and have outstanding communication and leadership skills.
Responsibilities
- Consulting with management on the operational requirements of software solutions.
- Contributing expertise on information system options, risk, and operational impact.
- Mentoring junior software developers in gaining experience and assuming DevOps responsibilities.
- Managing the installation and configuration of solutions.
- Collaborating with developers on software requirements, as well as interpreting test stage data.
- Developing interface simulators and designing automated module deployments.
- Completing code and script updates, as well as resolving product implementation errors.
- Overseeing routine maintenance procedures and performing diagnostic tests.
- Documenting processes and monitoring performance metrics.
- Conforming to best practices in network administration and cybersecurity.
Qualifications
- Minimum of 2 years of hands-on experience in software development and DevOps, specifically managing AWS Infrastructure such as EC2s, RDS, Elastic cache, S3, IAM, cloud trail and other services provided by AWS.
- Experience Building a multi-region highly available auto-scaling infrastructure that optimises performance and cost. plan for future infrastructure as well as Maintain & optimise existing infrastructure.
- Conceptualise, architect and build automated deployment pipelines in a CI/CD environment like Jenkins.
- Conceptualise, architect and build a containerised infrastructure using Docker, Mesosphere or similar SaaS platforms.
- Conceptualise, architect and build a secured network utilising VPCs with inputs from the security team.
- Work with developers & QA to institute a policy of Continuous Integration with Automated testing Architect, build and manage dashboards to provide visibility into delivery, production application functional and performance status.
- Work with developers to institute systems, policies and workflows which allow for rollback of deployments Triage release of applications to production environment on a daily basis.
- Interface with developers and triage SQL queries that need to be executed in production environments.
- Assist the developers and on calls for other teams with post mortem, follow up and review of issues affecting production availability.
- Minimum 2 years’ experience in Ansible.
- Must have written playbook to automate provisioning of AWS infrastructure as well as automation of routine maintenance tasks.
- Must have had prior experience automating deployments to production and lower environments.
- Experience with APM tools like New Relic and log management tools.
- Our entire platform is hosted on AWS, comprising of web applications, webservices, RDS, Redis and Elastic Search clusters and several other AWS resources like EC2, S3, Cloud front, Route53 and SNS.
- Essential Functions System Architecture Process Design and Implementation
- Minimum of 2 years scripting experience in Ruby/Python (Preferable) and Shell Web Application Deployment Systems Continuous Integration tools (Ansible)Establishing and enforcing Network Security Policy (AWS VPC, Security Group) & ACLs.
- Establishing and enforcing systems monitoring tools and standards
- Establishing and enforcing Risk Assessment policies and standards
- Establishing and enforcing Escalation policies and standards
A network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering, and Capitalize on Cloud. We have our own IOT/AI platform and we provide professional services on that platform to build custom clouds for their IOT devices. We also build mobile apps, run 24x7 devops/site reliability engineering for our clients.
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2 or Build & Deploy..
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
- Cloud and virtualization-based technologies (Amazon Web Services (AWS), VMWare).
- Java Application Server Administration (Weblogic, WidlFfy, JBoss, Tomcat).
- Docker and Kubernetes (EKS)
- Linux/UNIX Administration (Amazon Linux and RedHat).
- Developing and supporting cloud infrastructure designs and implementations and guiding application development teams.
- Configuration Management tools (Chef or Puppet or ansible).
- Log aggregations tools such as Elastic and/or Splunk.
- Automate infrastructure and application deployment-related tasks using terraform.
- Automate repetitive tasks required to maintain a secure and up-to-date operational environment.
Responsibilities
- Build and support always-available private/public cloud-based software-as-a-service (SaaS) applications.
- Build AWS or other public cloud infrastructure using Terraform.
- Deploy and manage Kubernetes (EKS) based docker applications in AWS.
- Create custom OS images using Packer.
- Create and revise infrastructure and architectural designs and implementation plans and guide the implementation with operations.
- Liaison between application development, infrastructure support, and tools (IT Services) teams.
- Development and documentation of Chef recipes and/or ansible scripts. Support throughout the entire deployment lifecycle (development, quality assurance, and production).
- Help developers leverage infrastructure, application, and cloud platform features and functionality participate in code and design reviews, and support developers by building CI/CD pipelines using Bamboo, Jenkins, or Spinnaker.
- Create knowledge-sharing presentations and documentation to help developers and operations teams understand and leverage the system's capabilities.
- Learn on the job and explore new technologies with little supervision.
- Leverage scripting (BASH, Perl, Ruby, Python) to build required automation and tools on an ad-hoc basis.
Who we have in mind:
- Solid experience in building a solution on AWS or other public cloud services using Terraform.
- Excellent problem-solving skills with a desire to take on responsibility.
- Extensive knowledge in containerized application and deployment in Kubernetes
- Extensive knowledge of the Linux operating system, RHEL preferred.
- Proficiency with shell scripting.
- Experience with Java application servers.
- Experience with GiT and Subversion.
- Excellent written and verbal communication skills with the ability to communicate technical issues to non-technical and technical audiences.
- Experience working in a large-scale operational environment.
- Internet and operating system security fundamentals.
- Extensive knowledge of massively scalable systems. Linux operating system/application development desirable.
- Programming in scripting languages such as Python. Other object-oriented languages (C++, Java) are a plus.
- Experience with Configuration Management Automation tools (chef or puppet).
- Experience with virtualization, preferably on multiple hypervisors.
- BS/MS in Computer Science or equivalent experience.
- Excellent written and verbal skills.
Education or Equivalent Experience:
- Bachelor's degree or equivalent education in related fields
- Certificates of training in associated fields/equipment’s
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
Improvement of monitoring systems
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Incident analysis and fixing
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience of working with git
Preferred experience
Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience administering Atlassian products
- Provide consultation and review all outgoing critical customer communications.
- Apply DevOps thinking in bringing the development and IT Ops process, people, and tools together within the company in order to increase the speed, efficiency, and quality.
- Perform architecture and security reviews for different projects, work with leads to develop strategy and roadmap for the client requirements. Involve in designing of the overall architecture of the system with another leads/architect.
- Develop and grow engineers in DevOps technology to meet the incoming requirements from the business team.
- Work with senior technical team to bring in new technologies/tools being used within the company. Develop and promote best practices and emerging concepts for DevSecOps and secure CI/CD. Participate in Solution Strategy, innovation areas, and technology roadmap.
Key Skills:
- Deals positively with high levels of uncertainty, ambiguity, and shifting priorities.
- Ability to influence stakeholders as a trusted advisor across all levels, including teams outside of shared services.
- Ability to think outside of the box and be innovative by keeping abreast of new trends, identifying opportunities to bring in change for business benefit.
- Implementing CI (Continuous Integration) and CD (Continuous Deployment). Have Good exposure to CI & Build Management tools like Jenkins Azure DevOps GitHub Actions Maven Gradle and etc
- Deployment and provisioning tools (Chef/Ansible/Terraform/AWS CDK etc)
- Docker Orchestration tools like Kubernetes/Swarm etc
- Good hands-on knowledge of automation scripting Python Shell Ruby etc
- Version Control for Source Code Management (SCM) tool: GIT/Bitbucket and etc
- Expertise in Linux based systems like Unix Linux Ubuntu and also manage security systems Linux file system permission etc
- Container Orchestration tool: Kubernetes Swarm Meso Marathon Docker Writing Docker file Docker compose
- Expertise in managing Cloud resources and good exposure to Docker
- Public/Private/Hybrid cloud: AWS /Microsoft Azure/ Google Cloud Platform etc
- Extensive experience with cloud services elastic capacity administration and cloud deployment and migration.
- Good to have knowledge of tools like Splunk, New Relic, PagerDuty, VictorOps
- Familiarity with Network protocols and elements - TCP/IP HTTP(S) SSL DNS Firewall router load balancers proxy.
- Excellent in creating new and improve existing workflows within the agile software development lifecycle.
- Familiar with incident and change management processes.
- Ability to effectively priorities work with fast-changing requirements.
- Troubleshoot and debug infrastructure Network and operating system issues.
- Resolve complex issues in scenarios like resource consumptions server performance backup strategy Scaling.
- Investigate and perform Root Cause Analysis on users' reported issues and provide a workaround before implementing a final fix.
- Monitor servers and applications to ensure the smooth running of IT Architecture (Applications Services Schedulers Server Performance etc)
Design Skills:
- Interpret and implement the designs of others adhering to standards and guidelines
- Design solutions within their area of expertise using technologies that already exist within Tesco
- Understand the roadmaps for their area of Technology Design secure solutions
- Design solutions that can be consumed in a self-service manner by the engineering teams
- Understand the impact of technologies at an enterprise-scale innovation
- Demonstrate knowledge of the latest technology trends related to Infrastructure
- Understand how Industry trends impact their own area
- identify opportunities to automate work and deliver against them
Engineering group to plan ongoing feature development, product maintenance.
• Familiar with Virtualization, Containers - Kubernetes, Core Networking, Cloud Native
Development, Platform as a Service – Cloud Foundry, Infrastructure as a Service, Distributed
Systems etc
• Implementing tools and processes for deployment, monitoring, alerting, automation, scalability,
and ensuring maximum availability of server infrastructure
• Should be able to manage distributed big data systems such as hadoop, storm, mongoDB,
elastic search and cassandra etc.,
• Troubleshooting multiple deployment servers, Software installation, Managing licensing etc,.
• Plan, coordinate, and implement network security measures in order to protect data, software, and
hardware.
• Monitor the performance of computer systems and networks, and to coordinate computer network
access and use.
• Design, configure and test computer hardware, networking software, and operating system
software.
• Recommend changes to improve systems and network configurations, and determine hardware or
software requirements related to such changes.