In-depth knowledge of, and hands-on experience with, AWS services and similar services from other cloud providers
Strong knowledge of core architectural concepts, including distributed computing, scalability, availability, and performance, to recommend the best backend solutions for our products
Preferred AWS Certifications:
- AWS Solutions Architect Professional/Associate
- AWS DevOps Engineer Professional
- AWS SysOps Administrator Associate
- AWS Developer Associate
- ITCAN is looking for an AWS Solution Architect who will be responsible for developing scalable, optimized, and reliable backend solutions using AWS services for all our products. You will ensure that our products consume AWS services in the most effective way. A commitment to collaborative problem solving, sophisticated design, and product quality is therefore important.
- Analyse requirements and devise innovative, efficient, and cost-effective architectures using AWS components and services that ensure scalability, availability, and high performance.
- Develop automation and deployment utilities using Ruby, Bash, and shell scripting.
- Implement CI/CD pipelines using Jenkins, Git, CodeDeploy, CodePipeline, CodeCommit, etc., to ensure seamless deployments with no downtime.
- Redesign architectures end-to-end seamlessly, working through major software upgrades such as Apache upgrades.
- Ensure an always-on network by setting up redundant DNS systems with failover capabilities.
- Ensure that the AWS services consumed follow best practices for high availability and security while keeping costs optimal.
- Using AWS-managed services, implement ELK systems end-to-end.
● Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers.
● Run the production environment by monitoring availability and taking a holistic view of system health.
● Build and implement services that help IT and support teams do their jobs better.
● Improve reliability, quality, and time-to-market of our suite of software solutions
● Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
● Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
● Experience working in an agile development environment
● Participate in system design consulting, platform management, and capacity planning
● Balance feature development speed and reliability with well-defined service level objectives
Required Skills and Qualifications:
● 3+ years of experience working within DevOps or SRE teams.
● 3+ years of experience with AWS Cloud
● Ability to program (structured and object-oriented) in one or more high-level languages
● Must have experience with Ansible, Helm, Terraform and Kubernetes.
● Document every action so your findings turn into repeatable actions, and then into automation
● Hands-on experience with a distributed version control system such as Git, AWS CodeCommit, or equivalent
● Know your way around Linux and the Unix Shell.
● Experience or familiarity with ELK stack
● Ability to use Azure DevOps
● Experience with distributed storage technologies such as NFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes)
● A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Work with the development team to plan, execute and monitor deployments
- Capacity planning for product deployments
- Adopt best practices for deployment and monitoring systems
- Ensure that performance and uptime SLAs are met
- Constantly monitor systems, suggest changes to improve performance and decrease costs.
- Ensure the highest standards of security
Key Competencies (Functional):
- Proficiency in at least one scripting language (Bash, Python, etc.)
- Has personally managed a fleet of servers (> 15)
- Understands the different environments: production, deployment, and staging
- Has worked with microservice / service-oriented architecture systems
- Has worked with automated deployment systems – Ansible / Chef / Puppet.
- Can write MySQL queries
• Support software build and release efforts:
• Create, set up, and maintain builds
• Review build results and resolve build problems
• Create and Maintain build servers
• Plan, manage, and control product releases
• Validate, archive, and escrow product releases
• Maintain and administer configuration management tools, including source control, defect management, project management, and other systems.
• Develop scripts and programs to automate processes and integrate tools.
• Resolve help desk requests from worldwide product development staff.
• Participate in team and process improvement projects.
• Interact with product development teams to plan and implement tool and build improvements.
• Perform other duties as assigned.
While the job description describes what is anticipated as the requirements of the position, the job requirements are subject to change based upon any changing needs and requirements of the business.
• TFS 2017 vNext builds or Azure DevOps build process
• Must have PowerShell 3.0+ scripting knowledge
• Exposure to build tools such as MSBuild, NAnt, and Xcode
• Exposure to creating and maintaining VMware vCenter/vSphere 6.5
• Hands-on experience with Windows Server 2012 (Win2k12) or later, and basic knowledge of macOS
• Shell or batch scripting (good to have, optional)
Candidates for this position should hold the following qualifications to be considered as a suitable applicant. Please note that except where specified as “preferred,” or as a “plus,” all points listed below are considered minimum requirements.
• Bachelor's degree in a related discipline is strongly preferred
• 3 or more years of experience with Software Configuration Management tools, concepts, and processes.
• Exposure to source control systems such as TFS, Git, or Subversion (optional)
• Familiarity with object-oriented concepts and programming in C# and PowerShell scripting.
• Experience with Azure DevOps builds, vNext builds, or Jenkins builds
• Experience working with developers to resolve development issues related to source control systems.
What You’ll Do:
- Building solutions to scale our services and applications reliably for high availability
- Setting up and maintaining robust CI/CD pipelines for automated build, test, and deployment of several microservices across different environments
- Setting up efficient monitoring, metrics, logging and tracing of cloud services
- Owning the performance and uptime SLAs for the applications and guiding developers on meeting them
Skills we are looking for:
- 3-5 years of overall experience
- Deep knowledge of cloud platforms such as AWS, GCP, etc.
- Good knowledge of the container ecosystem, including Docker, Kubernetes, ECS, etc.
- Experience setting up tools such as Nagios, Grafana, Prometheus, and Kibana
- Enthusiasm for implementing infrastructure as code using technologies such as Terraform
- Understanding of configuration management systems such as Puppet or Ansible
- Basic understanding of networking, DNS, TCP/IP, load balancing, subnet masking, firewall configuration, DDoS, etc.
- Good scripting skills in languages such as Python, shell, PHP, Perl, etc.
JD: Site Reliability Engineers
Location: Pune, Remote
Sarvaha would like to welcome experienced SRE specialists with a minimum of 5 years of professional experience in deployments and automation on Google Cloud Platform or AWS. Sarvaha is a niche software development company that works with some of the best-funded startups and established companies across the globe. You will be expected to work with a globally distributed team, contributing independently as well as leading a team of engineers. This is a hands-on position in which you will be responsible for production software deployments across global availability zones.
- Design, write and run services that provide visibility into a leading IoT platform & underlying services
- Automate deployments and build diagnostic and debugging tools
- Participate in on-call rotations
- Adhere to industry-standard security best practices
- Work with other teams in troubleshooting and keeping the systems up and running
- Minimum of a Bachelor's degree in Computer Science or a related field
- Minimum of 5 years of total experience, with at least 4 years in SRE, DevOps, or a similar role. More experience is highly desired
- 4+ years of hands-on experience with AWS, Azure, or GCP is a must-have for this position
- 1+ years of experience debugging code written in Python, Java or any strongly typed language
- 3+ years of experience with Kubernetes, Prometheus, ELK, Grafana, Nagios
- 2+ years of experience with Jenkins or similar build and deploy orchestration tool
- 2+ years of experience with RDBMSs and NoSQL databases (MySQL, Oracle, Cassandra, CDH)
- 1+ years of experience writing infrastructure as code using Terraform
- Excellent verbal and written communication and strong interpersonal skills are required to succeed in this position
- Strong listening and interpersonal skills and attention to detail are highly desired
- Top-notch remuneration with non-linear growth
- Work with the industry's best cloud architects, DevOps engineers, and developers
- Excellent, no-nonsense work environment with the very best people to work with
- Cutting-edge work with Fortune 500 businesses, learning from high-visibility, public-facing, high-traffic systems
Who You Are
- Creative thinker and strong problem solver with meticulous attention to detail
- Highly organized, creative, motivated, and passionate about achieving results
- Able to balance multiple tasks and projects effectively and quickly adapt to new situations and technologies
- Able to work both independently and as part of a team
- Systematic problem-solver, coupled with a strong sense of ownership and drive
What you need
- 3-7 years of experience as a Site Reliability Engineer, or in a role combining software engineering and DevOps.
- Strong hands-on knowledge of Linux fundamentals, system administration scripting, performance tuning/scalability, and troubleshooting.
- Ability to write high-quality code following SOLID principles, including unit and integration tests.
- Hands-on development experience in an object-oriented programming language such as Python.
- Hands-on experience developing task automations
- Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
- Familiarity with software development tools: source code management (SCM systems), code review systems, issue tracking tools, build tools, test frameworks, code quality tools.
- Experience implementing open-source observability and alerting tools such as Prometheus, Grafana, Cortex, Thanos, Alertmanager, etc.
- Decent knowledge of networking (VPC, VNet, DNS, etc.), the TCP/IP stack, internet routing, and load balancing.
- Experience with log and configuration management tools
- Prior experience working with AWS, Azure, or GCP is a plus
- Prior experience working with Kubernetes, Docker, and containers is a plus
- Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
- Documenting your work should be in your DNA
What you get
- A chance to develop and build something (probably from scratch) which you can be proud of
- Build and Implement modern systems observability solutions including monitoring, alerting, metrics, logging, and APM & distributed tracing.
- Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
- Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
- Closely work with the software engineering team to ensure accurate monitoring and metrics are being built into applications before going to production.
- Develop and maintain software modules for use and re-use in cloud and on-premise systems automation.
- Identify process gaps and implement process improvements to increase operational reliability
- Drive standardization efforts across the services, infrastructure, systems, and practices
- Develop systems and tools that help the development team uphold reliability principles
As a SaaS DevOps Engineer you will be responsible for providing automated tooling and process enhancements for SaaS deployment, application and infrastructure upgrades and production monitoring.
Development of automation scripts and pipelines for deployment and monitoring of new production environments.
Development of automation scripts for upgrades, hot fixes deployments and maintenances.
Work closely with Scrum teams and product groups to support the quality and growth of the SaaS services.
Collaborate closely with SaaS Operations team to handle day to day production activities - handling alerts and incidents.
Assist the SaaS Operations team with customer-focused projects such as migrations and feature enablement.
Write knowledge articles to document known issues and best practices.
Conduct regression tests to validate solutions or workarounds.
Work in a globally distributed team.
What achievements should you have so far?
Bachelor's or master’s degree in Computer Science, Information Systems, or equivalent.
Experience with containerization, deployment, and operations.
Strong knowledge about CI/CD process (Git, Jenkins, Pipelines).
Good experience with Linux systems and Shell scripting.
Basic cloud experience, preferably with Microsoft Azure.
Basic knowledge about containerized solutions (Helm, Kubernetes, Docker).
Good Networking skills and experience.
Having Terraform or CloudFormation knowledge will be considered a plus.
Ability to analyze a task from a system perspective.
Excellent problem solving and troubleshooting skills.
Excellent written and verbal communication skills; mastery of English and the local language.
Must be organized, thorough, autonomous, committed, flexible, customer-focused and productive.
Strong knowledge and experience of cloud infrastructure (AWS, Azure or GCP), systems, network design, and cloud migration projects.
Strong knowledge and understanding of CI/CD processes and tools (Jenkins/Azure DevOps) is a must.
Strong knowledge and understanding of Docker & Kubernetes is a must.
Strong knowledge of Python, along with one more language (Shell, Groovy, or Java).
Strong prior experience using automation tools like Ansible, Terraform.
Architect systems, infrastructure & platforms using Cloud Services.
Strong communication skills. Should have demonstrated the ability to collaborate across teams and organizations.
Benefits of working with OpsTree Solutions:
Opportunity to work on the latest cutting-edge tools/technologies in DevOps
Knowledge focused work culture
Collaboration with very enthusiastic DevOps experts
High growth trajectory
Opportunity to work with big shots in the IT industry
- Deploy the company's application on customers' public clouds and on-premise data centers
- Building Kubernetes based workflows for wide variety of use cases
- Document and Automate the deployment process for internal and external deployments
- Interact with customers over calls for deployment and debugging
- Deployment and Product Support
Desired Skills and Experience
- 4-6 years of experience in infrastructure development, or development and operations.
- Minimum 2+ years of experience in docker and kubernetes.
- Experience working with Docker and Kubernetes, with awareness of Kubernetes internals, networking, etc. Experience with Linux infrastructure tools.
- Good interpersonal skills and communication with all levels of management.
- Extensive experience in setting up Kubernetes on AWS, Azure etc.
Good to Have
- Familiarity with Big Data Tools like Hadoop, Spark.
- Experience with Java Application Debugging.
- Experience in monitoring tools like Prometheus, Grafana etc
Experience: 4-7 years
- Any scripting language: Python, Scala, shell, or Bash
- Cloud: AWS
- Database: relational (SQL) and non-relational (NoSQL)
- CI/CD tools and version control