We are looking for a DevOps Engineer (individual contributor) to maintain and build upon our next-generation infrastructure. We aim to keep our systems secure, reliable and high-performing, and to achieve best-in-class infrastructure and security, by:
- Leveraging a variety of tools to ensure all configuration is codified (using tools like Terraform and Flux) and applied in a secure, repeatable way (via CI)
- Routinely identifying new technologies and processes that enable us to streamline our operations and improve overall security
- Holistically monitoring our overall DevOps setup and health to ensure our roadmap constantly delivers high-impact improvements
- Eliminating toil by automating as many operational aspects of our day-to-day work as possible using internally created, third party and/or open-source tools
- Maintaining a culture of empowerment and self-service by minimizing friction for developers to understand and use our infrastructure through a combination of innovative tools, excellent documentation and teamwork
Tech stack: Microservices primarily written in JavaScript, Kotlin, Scala, and Python. The majority of our infrastructure runs on EKS on AWS, using Istio. We use Terraform and Helm/Flux when working with AWS and EKS (k8s). Deployments are managed with a combination of Jenkins and Flux. We rely heavily on Kafka, Cassandra, MongoDB and Postgres and are increasingly leveraging AWS-managed services (e.g. RDS, Lambda).
About StashAway
At BigThinkCode, our technology solves complex problems. We are looking for a talented Cloud DevOps Engineer to join our Cloud Infrastructure team in Chennai.
Our ideal candidate will have expert knowledge of software development processes, OS, troubleshooting, infrastructure environment set-up and problem-solving skills. This is an opportunity to join a growing team and make a substantial impact at BigThinkCode Technologies.
Please find our job description below. If interested, apply or reply with your profile so we can connect and discuss.
Company: BigThinkCode Technologies
URL: https://www.bigthinkcode.com/
Experience: 2.8 – 4 years
Level: DevOps Engineer, Senior
Location: Chennai (Work from Office)
Joining time: Immediate to 4 weeks.
Responsibilities of DevOps / Cloud Engineer include:
· Understanding customer requirements and project KPIs.
· Setting up tools and required infrastructure.
· Defining and setting development, test, release, update, and support processes for DevOps operations.
· Reviewing, verifying, and validating the DevOps-related implementation in the project.
· Applying troubleshooting techniques and fixing bugs.
· Monitoring processes throughout the entire lifecycle for adherence, and updating existing processes or creating new ones to drive improvement and minimize waste.
· Encouraging and building automated processes wherever possible.
· Identifying and deploying cybersecurity measures by continuously performing vulnerability assessment and risk management.
· Incident management and root cause analysis.
· Coordination and communication within the team and with customers.
· Selecting and deploying appropriate CI/CD tools.
· Striving for continuous improvement and building continuous integration and continuous delivery/deployment (CI/CD) pipelines.
· Monitoring and measuring customer experience and KPIs.
· Managing periodic progress reporting to management and the customer.
Required skills:
· 4+ years of experience in provisioning, operations, and management of cloud and on-prem environments.
· Demonstrated competency with at least one cloud platform, such as AWS or GCP.
· Demonstrated competency with On-Prem deployment techniques.
· Experience developing code in at least one scripting language, such as Shell, PowerShell, or Python.
· Knowledge of operating system administration.
· Experience in creation of highly automated infrastructures.
· Comprehensive knowledge regarding contemporary processes and methodologies for development and operations.
· Strong understanding of how to secure cloud (AWS or GCP) environments and meet compliance requirements.
· Experience designing and deploying cross-region disaster recovery on AWS or GCP is a plus.
· Experience with multi-tier architectures: load balancers, caching, web servers, application servers, databases, and networking.
· Clear written and verbal communication.
· Ability to manage your own time and work well both independently and as part of a team.
· Understanding of REST APIs, big data processing, and rules engines that orchestrate calls to REST APIs and other data sources such as Kafka, Snowflake, and AWS S3.
· Ability to maintain organizational standards in line with ISO ISMS principles.
· Desired tooling: experience with Ansible, Chef, Terraform, or CloudFormation.
· Developing our release management and upgrade infrastructure
· Developing configuration and integration tools with customer IT systems
· Playing a key role in defining and implementing security practices
· Understand how to translate business requirements and “user needs” into code.
· You take pride in designing solutions that will outlive the problem.
Requirements:
- Good knowledge of Ubuntu Linux.
- Knowledge of general networking practices, protocols, and administrative tasks.
- Adding, removing, or updating user account information, resetting passwords, etc.
- Scripting to ensure that operations automation and data gathering are accomplished seamlessly.
- Ability to work cooperatively with software developers, testers, and database administrators.
- Experience with a version control system (Git) and CI.
- Knowledge of web servers such as Apache and Nginx.
- Knowledge of email servers based on Postfix and Dovecot.
- Understanding of Docker containers and Kubernetes.
- Background in IT hardware, Linux server administration, and system administration.
Highlights:
- Working 5 days a week.
- Group Health Insurance for our employees.
- Work with a team of 300+ excellent engineers.
- Extra Compensation for Night Shifts.
- Additional Salary for an extra day spent in the office.
- Lunch buffets for all our employees.
- Fantastic Friday meals to dine in for employees.
- Yearly and quarterly cash awards, birthday celebrations, dinner coupons, etc.
- Team dinners on project completion.
- Festival and month-end celebrations.
Job Brief
The role is to coordinate strategies for defining, deploying, and designing a next-generation, cloud-based unified communications platform. This includes managing all engineering projects for VoIP initiatives, planning technology roadmaps, and configuring and optimizing all products and services, both internally and those integrated with Internet-based services.
Responsibilities:
- Provide ongoing support for the Stage and Prod environments hosted in public clouds;
- Improve observability of the Product and the infrastructure it runs on;
- Support integrations with other Products and collaborate with Teams owning them;
- Write high-quality documentation;
- Improve the deployment process: CI/CD pipelines, automation, and so on.
Requirements:
Technical Experience:
- Confident Linux administrator with general experience administering services used by customers (internal or external);
- Monitoring systems and observability tools: Prometheus and Grafana; ELK;
- CI/CD experience: Git and GitLab, Bazel, or Jenkins;
- Understanding of DevOps and SRE practices, including common toolsets, approaches, deployment strategies, etc.;
- IaC: HashiCorp Terraform, CloudFormation;
- Public clouds: networking, containers, DNS, and other common services such as compute, storage, billing, user management, and role-based access control (AWS);
- Docker and Kubernetes: close to CKA level;
- Networks: TCP/IP, NAT/PAT, HTTP(S), DNS;
- Basic experience with database administration (MySQL or PostgreSQL);
- Automation: Python for Linux administration;
- Understanding of change and incident management processes
The candidates should have:
· Strong knowledge of Windows and Linux OS
· Experience working with version control systems such as Git
· Hands-on experience with tools such as Docker, SonarQube, Ansible, Kubernetes, and ELK.
· Basic understanding of SQL commands
· Experience with DevOps on Azure Cloud
- Good knowledge of at least one language (C#, Java, Python, Go, PHP, Node.js)
- Sufficient experience with application and infrastructure architectures
- Design and plan cloud solution architecture
- Design for security, networking, and compliance
- Analyze and optimize technical and business processes
- Ensure solution and operational reliability
- Manage and provision cloud infrastructure
- Manage IaaS, PaaS, and SaaS solutions
- Design strategies around cloud governance, migration, cloud operations, and DevOps
- Design highly scalable, available, and reliable cloud applications
- Build and test applications
- Deploy applications on cloud
- Integrate with cloud services
Certification:
- Architect-level certification in any cloud (AWS, GCP, Azure)
About BootLabs
https://www.bootlabs.in/
- We are a boutique tech consulting partner specializing in cloud-native solutions.
- We are obsessed with anything "CLOUD". Our goal is to seamlessly automate the development lifecycle and modernize infrastructure and its associated applications.
- With a product mindset, we enable start-ups and enterprises in cloud transformation, cloud migration, end-to-end automation, and managed cloud services.
- We are eager to research, discover, automate, adapt, empower, and deliver quality solutions on time.
- We are passionate about customer success. With the right blend of experience and exuberant youth in our in-house team, we have significantly impacted customers.
Technical Skills:
• Expertise in any one hyperscaler (AWS/Azure/GCP), including basic services such as networking, data, and workload management.
- AWS
Networking: VPC, VPC Peering, Transit Gateway, Route Tables, Security Groups, etc.
Data: RDS, DynamoDB, Elasticsearch
Workload: EC2, EKS, Lambda, etc.
- Azure
Data: Azure MySQL, Azure MSSQL, etc.
Workload: AKS, Virtual Machines, Azure Functions
- GCP
Data: Cloud Storage, Dataflow, Cloud SQL, Firestore, Bigtable, BigQuery
Workload: GKE, Instances, App Engine, Batch, etc.
• Experience with any one of the CI/CD tools (GitLab/GitHub/Jenkins), including runner setup, templating, and configuration.
• Kubernetes experience (EKS/AKS/GKE), covering basics such as pods, deployments, networking, and service mesh, or Ansible experience. Has used a package manager such as Helm.
• Scripting experience (Bash/Python) for pipeline automation when required and for system services.
• Infrastructure automation (Terraform/Pulumi/CloudFormation), including writing modules, setting up pipelines, and versioning the code.
Optional:
• Experience in any programming language is not required but is appreciated.
• Good experience in Git, SVN, or any other code management tool is required.
• DevSecOps tools (e.g., Qualys/SonarQube/Black Duck) for security scanning of artifacts, infrastructure, and code.
• Observability tools (open source: Prometheus, Elasticsearch, OpenTelemetry; paid: Datadog, 24/7, etc.)
This company is a network of the world's best developers, offering full-time, long-term remote software jobs with better compensation and career growth. We help our clients accelerate their cloud offerings and capitalize on the cloud. We have our own IoT/AI platform and provide professional services on it to build custom clouds for clients' IoT devices. We also build mobile apps and run 24x7 DevOps/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of a team responsible for designing and implementing automation from scratch for medium- to large-scale cloud infrastructure and providing 24x7 services to our North American and European customers. This also includes ensuring ~100% uptime for 50+ internal sites.
This person MUST have:
- B.E. in Computer Science or equivalent
- 2+ years of hands-on experience troubleshooting and setting up Linux environments, including writing shell scripts for any given requirement.
- 1+ years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ years of hands-on experience setting up/configuring Kubernetes and EKS and ensuring high availability of container orchestration.
- 1+ years of hands-on experience setting up CI/CD from SCRATCH in Jenkins and GitLab.
- Experience configuring/maintaining at least one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications (AWS, GCP, CKA, etc.) will be preferred.
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Minimum 3 years of experience as an SRE automation engineer building, running, and maintaining production sites. We are not looking for candidates whose experience is only in L1/L2 support or Build & Deploy.
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO days per year, annual increments, a Diwali bonus, spot bonuses, and other incentives.
- We don't believe in locking people in with long notice periods. You will stay here because you love the company. We have only a 15-day notice period.
About Us
Our revenue has grown over 1400% in the last year.
Interface.ai provides an Intelligent Virtual Assistant (IVA) to financial institutions (FIs) to automate calls and customer inquiries across multiple channels and to engage their customers with financial insights and upsell/cross-sell.
Our IVA is transforming financial institutions’ call centers from a cost to a revenue center.
Our core technology is built 100% in-house, with several breakthroughs in Natural Language Understanding. Our parser is built on zero-shot learning, which helps us launch industry-specific IVAs that can achieve over 90% accuracy on Day 1.
We are 45 people strong, with employees spread across India and US locations. Many of them come from ML teams at Apple, Microsoft, and Salesforce in the US, along with enterprise architects with 20+ years of experience building large-scale systems. Our India team consists of people from ISB and the IIMs, and many who were previously part of early-stage startups.
We are a fully remote team.
Founders come from Banking and Enterprise Technology backgrounds with previous experience scaling companies from scratch to $50M+ in revenues.
As a Site Reliability Engineer, you will be in charge of:
- Designing, analyzing and troubleshooting large-scale distributed systems
- Engaging in cross-functional team discussions on design, deployment, operation, and maintenance, in a fast-moving, collaborative set up
- Building automation scripts to validate the stability, scalability, and reliability of interface.ai’s products & services as well as enhance interface.ai’s employees’ productivity
- Debugging and optimizing code and automating routine tasks
- Troubleshooting and diagnosing issues (hardware or software), and proposing and implementing solutions that reduce how often they occur
- Performing periodic on-call duty covering the security, availability, and reliability of interface.ai's products
- Writing good code and following solid engineering practices
Requirements
You could be a great fit if you match the following:
- Extremely self-motivated
- Ability to learn quickly
- Growth mindset (read this if you don't know what it means: https://www.amazon.com/Mindset-Psychology-Carol-S-Dweck/dp/0345472322)
- Emotional maturity (read this if you don't know what it means: https://medium.com/@krisgage/15-signs-of-emotional-maturity-38b1a2ab9766)
- Passionate about the possibilities at the intersection of AI + Banking
- Experience working at a startup of 5 to 30 employees
- Developer with a strong interest in systems design. You will be building, maintaining, and scaling our cloud infrastructure through software tooling and automation.
- 4-8 years of industry experience developing and troubleshooting large-scale infrastructure on the cloud
- Have a solid understanding of system availability, latency, and performance
- Strong programming skills in at least one major programming language and the ability to learn new languages as needed
- Strong System/network debugging skills
- Experience with management/automation tools such as Terraform/Puppet/Chef/Salt
- Experience with setting up production-level monitoring and telemetry
- Expertise in container management and AWS
- Experience with Kubernetes is a plus
- Experience building CI/CD pipelines
- Experience working with WebSockets, Redis, Postgres, Elasticsearch, and Logstash
- Experience working in an agile team environment and proficient understanding of code versioning tools, such as Git.
- Ability to effectively articulate technical challenges and solutions.
- Proactive outlook for ways to make our systems more reliable
DevOps Engineer responsibilities include deploying product updates, identifying production issues, and implementing integrations that meet customer needs. If you have a solid background working with cloud technologies, can set up efficient deployment processes, and are motivated to work with diverse and talented teams, we'd like to meet you.
Ultimately, you will execute and automate operational processes fast, accurately, and securely.
Skills and Experience
- 4+ years of experience building infrastructure on cloud providers (AWS, Azure, GCP)
- Experience deploying containerized applications built on Node.js/PHP/Python to Kubernetes clusters.
- Experience monitoring production workloads with relevant metrics and dashboards.
- Experience writing automation scripts using Shell, Python, Terraform, etc.
- Experience following security practices while setting up infrastructure.
- Self-motivated, able, and willing to help where help is needed
- Able to build relationships and be culturally sensitive, with goal alignment and learning agility
Roles and Responsibilities
- Manage various resources across different cloud providers (Azure, AWS, and GCP).
- Monitor and optimize infrastructure cost.
- Manage various Kubernetes clusters with appropriate monitoring and alerting setup.
- Build CI/CD pipelines to orchestrate provisioning and deployment of various services into Kubernetes infrastructure.
- Work closely with the development team on upcoming features to determine the correct infrastructure and related tools.
- Assist the support team with escalated customer issues.
- Develop, improve, and thoroughly document operational practices and procedures.
- Responsible for setting up good security practices across various clouds.
Engineering group to plan ongoing feature development, product maintenance.
• Familiar with virtualization, containers (Kubernetes), core networking, cloud-native development, Platform as a Service (Cloud Foundry), Infrastructure as a Service, distributed systems, etc.
• Implementing tools and processes for deployment, monitoring, alerting, automation, and scalability, and ensuring maximum availability of server infrastructure
• Should be able to manage distributed big data systems such as Hadoop, Storm, MongoDB, Elasticsearch, and Cassandra.
• Troubleshooting multiple deployment servers, installing software, managing licensing, etc.
• Plan, coordinate, and implement network security measures to protect data, software, and hardware.
• Monitor the performance of computer systems and networks, and coordinate computer network access and use.
• Design, configure, and test computer hardware, networking software, and operating system software.
• Recommend changes to improve system and network configurations, and determine the hardware or software requirements related to such changes.