Your Role:
- Serve as a primary point responsible for the overall health, performance, and capacity of one or more of our Internet-facing services
- Gain deep knowledge of our complex applications
- Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
- Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale UNIX environment
- Work closely with development teams to ensure that platforms are designed with "operability" in mind.
- Function well in a fast-paced, rapidly-changing environment
- Should be able to lead a team of smart engineers
- Should be able to strategically guide the team to greater automation adoption
Must Have:
- Experience Building/managing DevOps/SRE teams
- Strong in troubleshooting/debugging Systems, Network and Applications
- Strong in Unix/Linux operating systems and Networking
- Working knowledge of Open source technologies in Monitoring, Deployment and incident management
Good to Have:
- Minimum 3+ years of team management experience
- Experience in Containers and orchestration layers like Kubernetes, Mesos/Marathon
- Proven experience in programming & diagnostics in any languages like Go, Python, Java
- Experience in NoSQL/SQL technologies like Cassandra/MySQL/CouchBase etc.
- Experience in BigData technologies like Kafka/Hadoop/Airflow/Spark
- Is a die-hard sports fan
About www.thecatalystiq.com
Similar jobs
We are hiring for a Lead DevOps Engineer in Cloud domain with hands on experience in Azure / GCP.
- Expertise in managing Cloud / VMWare resources and good exposure on Dockers/Kubernetes
- Working knowledge of operating systems( Unix, Linux, IBM AIX)
- Experience in installation, configuration and managing apache webserver, Tomcat/Jboss
- Good understanding of JVM, troubleshooting and performance tuning through thread dump and log analysis
-Strong expertise in Dev Ops tools:
- Deployment (Chef/Puppet/Ansible /Nebula/Nolio)
- SCM (TFS, GIT, ClearCase)
- Build tools (Ant,Maven, Make, Gradle)
- Artifact repositories (Nexes, JFrog ArtiFactory)
- CI tools (Jenkins, TeamCity),
- Experienced in scripting languages: Python, Ant, Bash and Shell
What will be required of you?
- Responsible for implementation and support of application/web server infrastructure for complex business applications
- Server configuration management, release management, deployments, automation & troubleshooting
- Set-up and configure Development, Staging, UAT and Production server environment for projects and install/configure all dependencies using the industry best practices
- Manage Code Repositories
- Manage, Document, Control and Innovate Development and Release procedure.
- Configure automated deployment on multiple environment
- Hands-on working experience of Azure or GCP.
- Knowledge Transfer the implementation to support team and until such time support any production issues
- Develop and Deploy Software:
- Architect and create an effective build and release process using industry best practices and tools
- Create and manage build scripts to deploy software in a multi-cloud environment
- Look for opportunities to automate as much of the deployment process as possible to provide for repeatability, auditability, scalability and build in process enforcement
- Manage Release Schedule:
- Act as a “gate keeper” for all releases into production
- Work closely with business stakeholders, development managers and developers to prepare a release schedule
- Help prioritize deployment requests for version upgrades, patches and hot-fixes
- Continuous Delivery of Software:
- Implement Continuous Integration (CI) practices to drive development teams to implement smaller changes and commit code to the version control repo frequently
- Implement Continuous Development (CD) practices that automates deployment of the application to several environments – Dev, Test and Production
- Implement Continuous Testing (functional and non-functional) to execute tests in the CI/CD pipeline
- Manage Version Control:
- Define and implement branching policies to efficiently manage source-code
- Implement business rules as a part of source control standards
- Resolve Software Issues:
- Assist technical support and development teams to troubleshoot issues and identify areas that need improvement
- Address deployment related issues
- Maintain Release Documentation:
- Maintain release notes (features available in stable versions and known issues) and other documents for both internal and external end users
This company is a network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering, and Capitalize on Cloud. We have our own IOT/AI platform and we provide professional services on that platform to build custom clouds for their IOT devices. We also build mobile apps, run 24x7 devops/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of team that is responsible for designing & implementing automation from scratch for medium to large scale cloud infrastructure and providing 24x7 services to our North American / European customers. This also includes ensuring ~100% uptime for almost 50+ internal sites. The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2 or Build & Deploy..
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
- Working on scalability, maintainability and reliability of company's products.
- Working with clients to solve their day-to-day challenges, moving manual processes to automation.
- Keeping systems reliable and gauging the effort it takes to reach there.
- Understanding Juxtapose tools and technologies to choose x over y.
- Understanding Infrastructure as a Code and applying software design principles to it.
- Automating tedious work using your favourite scripting languages.
- Taking code from the local system to production by implementing Continuous Integration and Delivery principles.
What you need to have:
- Worked with any one of the programming languages like Go, Python, Java, Ruby.
- Work experience with public cloud providers like AWS, GCP or Azure.
- Understanding of Linux systems and Containers
- Meticulous in creating and following runbooks and checklists
- Microservices experience and use of orchestration tools like Kubernetes/Nomad.
- Understanding of Computer Networking fundamentals like TCP, UDP.
- Strong bash scripting skills.
We are looking for an experienced DevOps (Development and Operations) professional to join our growing organization. In this position, you will be responsible for finding and reporting bugs in web and mobile apps & assist Sr DevOps to manage infrastructure projects and processes. Keen attention to detail, problem-solving abilities, and a solid knowledge base are essential.
As a DevOps, you will work in a Kubernetes based microservices environment.
Experience in Microsoft Azure cloud and Kubernetes is preferred, not mandatory.
Ultimately, you will ensure that our products, applications and systems work correctly.
Responsibilities:
- Detect and track software defects and inconsistencies
- Apply quality engineering principals throughout the Agile product lifecycle
- Handle code deployments in all environments
- Monitor metrics and develop ways to improve
- Consult with peers for feedback during testing stages
- Build, maintain, and monitor configuration standards
- Maintain day-to-day management and administration of projects
- Manage CI and CD tools with team
- Follow all best practices and procedures as established by the company
- Provide support and documentation
Required Technical and Professional Expertise
- Minimum 2+ years if DevOps
- Have experience in SaaS infrastructure development and Web Apps
- Experience in delivering microservices at scale; designing microservices solutions
- Proven Cloud experience/delivery of applications on Azure
- Proficient in configuration Management tools such as Ansible or any of Terraform Puppet, Chef, Salt, etc
- Hands-on experience in Networking/network configuration, Application performance monitoring, Container performance, and security.
- Understanding of Kubernetes, Python along with scripting languages like bash/shell
- Good to have experience in Linux internals, Linux packaging, Release Engineering (Branching, versioning, tagging), Artifact repository, Artifactory, Nexus, and CI/CD tooling (Concourse CI, Travis, Jenkins)
- Must be a proactive person
- You love collaborative environments that use agile methodologies to encourage creative design thinking and find innovative ways to develop with cutting edge technologies
- An ambitious individual who can work under their own direction towards agreed targets/goals and with a creative approach to work.
- An intuitive individual with an ability to manage change and proven time management
- Proven interpersonal skills while contributing to team effort by accomplishing related results as needed.
● Building and managing multiple application environments on AWS using automation tools like Terraform or
Cloudformation etc.
● Deploy applications with zero downtime via automation with configuration management tools such as Ansible.
● Setting up Infrastructure monitoring tools such as Prometheus, Grafana
● Setting up centralised logging using tools such as ELK.
● Containerisation of applications/microservices.
● Ensure application availability to 99.9% with highly available infrastructure.
● Monitoring performance of applications and databases.
● Ensuring that systems are safe and secure against cyber security threats.
● Working with software developers to ensure that release cycle and deployment processes are followed.
● Evaluating existing applications and platforms, give recommendations for enhancing performance via gap analysis,
identifying the most practical alternative solutions and assisting with modifications.
Skills -
● Strong knowledge of AWS Managed Services such as EC2, RDS, ECS, ECR, S3, Cloudfront, SES, Redshift, Elastic Cache,
AMQP etc.
● Experience in handling production workloads.
● Experience with Nginx web server.
● Experience with NoSql and Sql Databases such as MongoDB, Postgresql etc.
● Experience with Containerisation of applications/micro services using Docker.
● Understanding of system administration in Linux environments.
● Strong Knowledge of Infrastructure as a Code such as Terraform, Cloudformation etc.
● Strong knowledge of configuration management tools such as Ansible, Chef etc.
● Familiarity with tools such as GitLab, Jenkins, Vercel, JIRA etc.
● Proficiency in scripting languages including Bash, Python etc.
● Full understanding of software development lifecycle best practices and agile methodology
● Strong communication and documentation skills.
● An ability to drive to goals and milestones while valuing and maintaining a strong attention to detail
● Excellent judgment, analytical thinking, and problem-solving skills
● Self-motivated individual that possesses excellent time management and organizational skills
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
Improvement of monitoring systems
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Incident analysis and fixing
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience of working with git
Preferred experience
Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience administering Atlassian products
Roles and Responsibilities
- 5 - 8 years of experience in Infrastructure setup on Cloud, Build/Release Engineering, Continuous Integration and Delivery, Configuration/Change Management.
- Good experience with Linux/Unix administration and moderate to significant experience administering relational databases such as PostgreSQL, etc.
- Experience with Docker and related tools (Cassandra, Rancher, Kubernetes etc.)
- Experience of working in Config management tools (Ansible, Chef, Puppet, Terraform etc.) is a plus.
- Experience with cloud technologies like Azure
- Experience with monitoring and alerting (TICK, ELK, Nagios, PagerDuty)
- Experience with distributed systems and related technologies (NSQ, RabbitMQ, SQS, etc.) is a plus
- Experience with scaling data store technologies is a plus (PostgreSQL, Scylla, Redis) is a plus
- Experience with SSH Certificate Authorities and Identity Management (Netflix BLESS) is a plus
- Experience with multi-domain SSL certs and provisioning a plus (Let's Encrypt) is a plus
- Experience with chaos or similar methodologies is a plus
Objectives of this Role
Improve reliability, quality, and time-to-market of our suite of software solutions
- Run the production environment by monitoring availability and taking a holistic view of system health
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer - needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Participate in system design consulting, platform management, and capacity planning
- Languages: Python, Java, Ruby DSL, Bash
- Databases : MySQL, Cassandra , Elastic Search
- Deployment: AWS CloudFormation
Essential Criteria:
- 8 or more years administrating production Linux systems in a 24x7 environment
- 3 or more years’ experience in a DevOps/ SRE role as an engineer or technical lead
- At least 1 year of team leadership experience
- Significant knowledge of Amazon Web Services (CLI/APIs, EC2, EBS, S3, VPCs, IAM, AWS Lambda)
- Experience deploying services into containerized orchestration environments such as Kubernetes
- Experience with infrastructure automation tools like CloudFormation, Terraform, etc.
- Experience with at least one of Python, Bash, Ruby, or equivalent
- Experience creating and managing CI/CD pipeline like Jenkins or Spinnaker
- Familiar with version control using Git
- Solid understanding of common security principles
Nice to Have:
- Preference for hands on experience with Serverless Architecture, Kubernetes and Docker
- Strong experience with open-source configuration management tools
- Managing distributed systems spanning multiple AWS regions / data-centers
- Experience with bootstrapping solutions
- Open source contributor
- We’re committed to client success: There are over 6,200 brand and retail websites in the Bazaarvoice network. Our clients represent some of the world’s leading companies across a wide range of industries including retail, apparel, automotive, consumer electronics and travel.
- We’re leaders in consumer-generated content: Each month, more than one billion consumers view and share authentic consumer-generated content, such as ratings and reviews, curated photos, social posts and videos, about products in our network. Thousands upon thousands or reviews are added to the Bazaarvoice network everyday.
- Our network delivers: Network analytics provide insights that help marketers and advertisers provide more engaging experiences that drive brand awareness, consideration, sales, and loyalty.
- We’re a great place to work: We pride ourselves on our unique culture. Join a company that values passion, innovation, authenticity, generosity, respect, teamwork, and performance.