Job Title:
DevOps Engineer / Site Reliability Engineer (0-4 years)
Job Description:
We are seeking a highly skilled and experienced DevOps Engineer to join our team with a focus on documentation. The successful candidate will be responsible for maintaining and improving our existing infrastructure, as well as designing and implementing new systems and processes to support our growing business. Additionally, this role requires focus on documenting all changes to our infrastructure, codebase, and processes.
Key Responsibilities:
- Design, implement, and maintain scalable and highly available systems
- Continuously improve and automate our deployment processes
- Monitor and troubleshoot production issues
- Collaborate with development teams to ensure smooth and efficient releases
- Implement security best practices throughout our infrastructure
- Stay up-to-date with the latest industry trends and technologies
- Document all changes to the codebase, infrastructure, and processes in a centralized location
- Ensure all team members have access to and are aware of the documentation
Qualifications:
- Strong experience with Linux/Unix administration
- Experience with containerization (Docker, Kubernetes)
- Experience with cloud infrastructure (AWS, Azure, or GCP)
- Strong knowledge of configuration management tools (Ansible, Puppet, Chef)
- Experience with CI/CD pipeline tools (Jenkins, Travis, or CircleCI)
- Strong scripting skills in Python or Bash
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK)
- Strong understanding of network protocols and architecture
- Strong problem-solving skills
- Strong communication and collaboration skills
- Strong documentation skills
- BSc/BA in Computer Science, Engineering, or a related field is preferred.
This is an exciting opportunity to work with cutting-edge technologies and to be part of a team that is passionate about delivering high-quality, scalable, and reliable solutions. If you are a self-motivated and driven individual with a desire to continuously learn and improve, and with strong documentation skills, we would love to hear from you!
About NonStop io Technologies Pvt Ltd
Similar jobs
About us
Classplus is India's largest B2B ed-tech start-up, enabling 1 Lac+ educators and content creators to create their digital identity with their own branded apps. Starting in 2018, we have grown more than 10x in the last year, into India's fastest-growing video learning platform.
Over the years, marquee investors like Tiger Global, Surge, GSV Ventures, Blume, Falcon, Capital, RTP Global, and Chimera Ventures have supported our vision. Thanks to our awesome and dedicated team, we achieved a major milestone in March this year when we secured a “Series-D” funding.
Now as we go global, we are super excited to have new folks on board who can take the rocketship higher🚀. Do you think you have what it takes to help us achieve this? Find Out Below!
What will you do?
· Define the overall process, which includes building a team for DevOps activities and ensuring that infrastructure changes are reviewed from an architecture and security perspective
· Create standardized tooling and templates for development teams to create CI/CD pipelines
· Ensure infrastructure is created and maintained using terraform
· Work with various stakeholders to design and implement infrastructure changes to support new feature sets in various product lines.
· Maintain transparency and clear visibility of costs associated with various product verticals, environments and work with stakeholders to plan for optimization and implementation
· Spearhead continuous experimenting and innovating initiatives to optimize the infrastructure in terms of uptime, availability, latency and costs
You should apply, if you
1. Are a seasoned Veteran: Have managed infrastructure at scale running web apps, microservices, and data pipelines using tools and languages like JavaScript(NodeJS), Go, Python, Java, Erlang, Elixir, C++ or Ruby (experience in any one of them is enough)
2. Are a Mr. Perfectionist: You have a strong bias for automation and taking the time to think about the right way to solve a problem versus quick fixes or band-aids.
3. Bring your A-Game: Have hands-on experience and ability to design/implement infrastructure with GCP services like Compute, Database, Storage, Load Balancers, API Gateway, Service Mesh, Firewalls, Message Brokers, Monitoring, Logging and experience in setting up backups, patching and DR planning
4. Are up with the times: Have expertise in one or more cloud platforms (Amazon WebServices or Google Cloud Platform or Microsoft Azure), and have experience in creating and managing infrastructure completely through Terraform kind of tool
5. Have it all on your fingertips: Have experience building CI/CD pipeline using Jenkins, Docker for applications majorly running on Kubernetes. Hands-on experience in managing and troubleshooting applications running on K8s
6. Have nailed the data storage game: Good knowledge of Relational and NoSQL databases (MySQL,Mongo, BigQuery, Cassandra…)
7. Bring that extra zing: Have the ability to program/script is and strong fundamentals in Linux and Networking.
8. Know your toys: Have a good understanding of Microservices architecture, Big Data technologies and experience with highly available distributed systems, scaling data store technologies, and creating multi-tenant and self hosted environments, that’s a plus
Being Part of the Clan
At Classplus, you’re not an “employee” but a part of our “Clan”. So, you can forget about being bound by the clock as long as you’re crushing it workwise😎. Add to that some passionate people working with and around you, and what you get is the perfect work vibe you’ve been looking for!
It doesn’t matter how long your journey has been or your position in the hierarchy (we don’t do Sirs and Ma’ams); you’ll be heard, appreciated, and rewarded. One can say, we have a special place in our hearts for the Doers! ✊🏼❤️
Are you a go-getter with the chops to nail what you do? Then this is the place for you.
We are looking for an experienced DevOps engineer that will help our team establish DevOps
practice. You will work closely with the technical lead to identify and establish DevOps practices in the company.You will also help us build scalable, efficient cloud infrastructure. You’ll implement monitoring for automated system health checks. Lastly, you’ll build our CI pipeline, and train and guide the team in DevOps practices. This would be a hybrid role and the person would be expected to also do some application-level programming in their downtime.
Responsibilities
- Deployment, automation, management, and maintenance of production systems.
- Ensuring availability, performance, security, and scalability of production systems.
- Evaluation of new technology alternatives and vendor products.
- System troubleshooting and problem resolution across various application domains and
platforms.
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on AWS
platform.
- Manage the establishment and configuration of SaaS infrastructure in an agile way
by storing infrastructure as code and employing automated configuration
management tools with a goal to be able to re-provision environments at any point in
time.
- Be accountable for proper backup and disaster recovery procedures.
- Drive operational cost reductions through service optimizations and demand based
auto scaling.
- Have on call responsibilities.
- Perform root cause analysis for production errors
- Uses open source technologies and tools to accomplish specific use cases encountered
within the project.
- Uses coding languages or scripting methodologies to solve a problem with a custom
workflow.
Requirements
- Systematic problem-solving approach, coupled with strong communication skills and a
sense of ownership and drive.
- Prior experience as a software developer in a couple of high level programming
languages.
- Extensive experience in any Javascript based framework since we will be deploying
services to NodeJS on AWS Lambda (Serverless)
- Extensive experience with web servers such as Nginx/Apache
- Strong Linux system administration background.
- Ability to present and communicate the architecture in a visual form.
- Strong knowledge of AWS (e.g. IAM, EC2, VPC, ELB, ALB, Autoscaling, Lambda, NAT
gateway, DynamoDB)
- Experience maintaining and deploying highly-available, fault-tolerant systems at scale (~
1 Lakh users a day)
- A drive towards automating repetitive tasks (e.g. scripting via Bash, Python, Ruby, etc)
- Expertise with Git
- Experience implementing CI/CD (e.g. Jenkins, TravisCI)
- Strong experience with databases such as MySQL, NoSQL, Elasticsearch, Redis and/or
Mongo.
- Stellar troubleshooting skills with the ability to spot issues before they become problems.
- Current with industry trends, IT ops and industry best practices, and able to identify the
ones we should implement.
- Time and project management skills, with the capability to prioritize and multitask as
needed.
Required Skills
• Automation is a part of your daily functions, so thorough familiarity with Unix Bourne shell scripting and Python is a critical survival skill.
• Integration and maintenance of automated tools
• Strong analytical and problem-solving skills
• Working experience in source control tools such as GIT/Github/Gitlab/TFS
• Have experience with modern virtualization technologies (Docker, KVM, AWS, OpenStack, or any orchestration platforms)
• Automation of deployment, customization, upgrades, and monitoring through modern DevOps tools (Ansible, Kubernetes, OpenShift, etc) • Advanced Linux admin experience
• Using Jenkins or similar tools
• Deep understanding of Container orchestration(Preferably Kubernetes )
• Strong knowledge of Object Storage(Preferably Cept on Rook)
• Experience in installing, managing & tuning microservices environments using Kubernetes & Docker both on-premise and on the cloud.
• Experience in deploying and managing spring boot applications.
• Experience in deploying and managing Python applications using Django, FastAPI, Flask.
• Experience in deploying machine learning pipelines/data pipelines using Airflow/Kubeflow /Mlflow.
• Experience in web server and reverse Proxy like Nginx, Apache Server, HAproxy
• Experience in monitoring tools like Prometheus, Grafana.
• Experience in provisioning & maintaining SQL/NoSQL databases.
Desired Skills
• Configuration software: Ansible
• Excellent communication and collaboration skills
• Good experience on Networking Technologies like a Load balancer, ACL, Firewall, VIP, DNS
• Programmatic experience with AWS, DO, or GCP storage & machine images
• Experience on various Linux distributions
• Knowledge of Azure DevOps Server
• Docker management and troubleshooting
• Familiarity with micro-services and RESTful systems
• AWS / GCP / Azure certification
• Interact with the Engineering for supporting/maintaining/designing backend infrastructure for product support
• Create fully automated global cloud infrastructure that spans multiple regions.
• Great learning attitude to the newest technology and a Team player
- Develop and Deploy Software:
- Architect and create an effective build and release process using industry best practices and tools
- Create and manage build scripts to deploy software in a multi-cloud environment
- Look for opportunities to automate as much of the deployment process as possible to provide for repeatability, auditability, scalability and build in process enforcement
- Manage Release Schedule:
- Act as a “gate keeper” for all releases into production
- Work closely with business stakeholders, development managers and developers to prepare a release schedule
- Help prioritize deployment requests for version upgrades, patches and hot-fixes
- Continuous Delivery of Software:
- Implement Continuous Integration (CI) practices to drive development teams to implement smaller changes and commit code to the version control repo frequently
- Implement Continuous Development (CD) practices that automates deployment of the application to several environments – Dev, Test and Production
- Implement Continuous Testing (functional and non-functional) to execute tests in the CI/CD pipeline
- Manage Version Control:
- Define and implement branching policies to efficiently manage source-code
- Implement business rules as a part of source control standards
- Resolve Software Issues:
- Assist technical support and development teams to troubleshoot issues and identify areas that need improvement
- Address deployment related issues
- Maintain Release Documentation:
- Maintain release notes (features available in stable versions and known issues) and other documents for both internal and external end users
ABOUT RIA
RIA is an InsurTech company on a mission to actively partner with our customers to improve their health and health outcomes. We are taking a very differentiated approach which is backed by how we use health data and digital health to keep our customers healthier.
To enable our mission, we are building our own core InsurTech platform in-house. This platform is cloud-native and on a microservices architecture. We’re building all our core components internally, such as - insurance APIs, an AI/intelligence layer, our own risk models, a health data platform, and a low-code insurance workflow automation platform.
WHY JOIN US
We’re building the foundation of our team right now, and are looking for ambitious team members to join us and grow rapidly with us. The work environment is fast-paced and you would make a clear impact!
You’ll partner with a stellar leadership team that comes from IIT Kanpur, Kellogg, MIT Sloan, IIT Guwahati, UChicago Booth, IIT Madras & University of Michigan. The team has also worked at companies such as McKinsey, Goldman Sachs, Max Bupa, Swiss Re, ICICI Lombard, ICICI Prudential, etc.
We’re a well-funded start-up, which has raised capital from top VCs, global insurers, and prominent Indian family offices.
WHAT YOU WILL DO
RIA’s vision is to build the gold standard for healthcare technology systems and healthcare data platform in India. The core technology we are building is a data platform on the cloud bringing together vast volumes and types of health data. A data analytics layer will sit on top and allow us to create AI and ML models in the near future. Additionally, we are working on an API ecosystem where we integrate with a number of partners for insurance distribution and business partnerships. And to tie it all together a cutting-edge user experience to deliver better healthcare and insurance experience to customers.
If you’re looking to design and architect a cutting edge, serverless, cloud based infrastructure solution on AWS this is the right place for you. Over the next few years you will get to work with building CI/CD deployment pipelines for microservices, architecting data lakes/warehouses, securing sensitive user data on the cloud, and setting up RIA to scale to millions of users seamlessly. This is a great opportunity to be part of something new and cutting edge.
REQUIREMENTS
Must Have
- 4-5 years of experience working as a DevOps Engineer
- Should demonstrate experience in automating CI/CD pipelines using Jenkins or Gitlab CI.
- Should demonstrate experience in managing deployments of microservices with expertise in container(Docker) based workloads.
- Should have experience in managing and optimizing cloud infrastructure management with a preference for AWS.
- Additional experience of Terraform or CloudFormation would be a plus.
- Additional experience on deploying workloads on AWS Fargate, Kubernetes (AWS EKS) would be a plus
This role requires a balance between hands-on infrastructure-as-code deployments as well as involvement in operational architecture and technology advocacy initiatives across the Numerator portfolio.
Responsibilities
|
|
Technical Skills
|
Roles and Responsibilities
- 5 - 8 years of experience in Infrastructure setup on Cloud, Build/Release Engineering, Continuous Integration and Delivery, Configuration/Change Management.
- Good experience with Linux/Unix administration and moderate to significant experience administering relational databases such as PostgreSQL, etc.
- Experience with Docker and related tools (Cassandra, Rancher, Kubernetes etc.)
- Experience of working in Config management tools (Ansible, Chef, Puppet, Terraform etc.) is a plus.
- Experience with cloud technologies like Azure
- Experience with monitoring and alerting (TICK, ELK, Nagios, PagerDuty)
- Experience with distributed systems and related technologies (NSQ, RabbitMQ, SQS, etc.) is a plus
- Experience with scaling data store technologies is a plus (PostgreSQL, Scylla, Redis) is a plus
- Experience with SSH Certificate Authorities and Identity Management (Netflix BLESS) is a plus
- Experience with multi-domain SSL certs and provisioning a plus (Let's Encrypt) is a plus
- Experience with chaos or similar methodologies is a plus
Job Dsecription:
○ Develop best practices for team and also responsible for the architecture
○ solutions and documentation operations in order to meet the engineering departments quality and standards
○ Participate in production outage and handle complex issues and works towards Resolution
○ Develop custom tools and integration with existing tools to increase engineering Productivity
Required Experience and Expertise
○ Having a good knowledge of Terraform + someone who has worked on large TF code bases.
○ Deep understanding of Terraform with best practices & writing TF modules.
○ Hands-on experience of GCP and AWS and knowledge on AWS Services like VPC and VPC related services like (route tables, vpc endpoints, privatelinks) EKS, S3, IAM. Cost aware mindset towards Cloud services.
○ Deep understanding of Kernel, Networking and OS fundamentals
NOTICE PERIOD - Max - 30 days