
Interested candidates are requested to email their resumes with the subject line "Application for [Job Title]".
Only applications received via email will be reviewed. Applications through other channels will not be considered.
Job Description
The client’s department DPS (Digital People Solutions) offers a sophisticated portfolio of IT applications that provides a strong foundation for professional and efficient People & Organization (P&O) and Business Management, both globally and locally. The client is a well-known German company listed on the DAX-40 index, which includes the 40 largest and most liquid companies on the Frankfurt Stock Exchange.
We are seeking talented DevOps Engineers with a focus on the Elastic Stack (ELK) to join our dynamic DPS team. In this role, you will be responsible for refining, and advising on the further development of, an existing monitoring solution based on the Elastic Stack. You will independently handle tasks related to architecture, setup, technical migration, and documentation.
The current application landscape features multiple Java web services running on JEE application servers, primarily hosted on AWS and integrated with various systems such as SAP, other services, and external partners. DPS is committed to delivering the best digital work experience for the client’s employees and customers alike.
Responsibilities:
Install, set up, and automate rollouts using Ansible/CloudFormation for all stages (Dev, QA, Prod) in the AWS Cloud for components such as Elasticsearch, Kibana, Metricbeat, APM Server, APM agents, and interface configuration.
Create and develop standard "Default Dashboards" for visualizing metrics from various sources such as Apache Webserver, application servers, and databases.
Improve installation and automation routines and fix bugs in them.
Monitor CPU usage, security findings, and AWS alerts.
Develop and extend "Default Alerting" for issues like OOM errors, datasource issues, and LDAP errors.
Monitor storage space and create concepts for expanding the Elastic landscape in AWS Cloud and Elastic Cloud Enterprise (ECE).
Implement machine learning, uptime monitoring (including SLA monitoring), JIRA integration, security analysis, anomaly detection, and other useful ELK Stack features.
Integrate data from AWS CloudWatch.
Document all relevant information and train the personnel involved in the technologies used.
Requirements:
Experience with Elastic Stack (ELK) components and related technologies.
Proficiency in automation tools like Ansible and CloudFormation.
Strong knowledge of AWS Cloud services.
Experience in creating and managing dashboards and alerts.
Familiarity with IAM roles and rights management.
Ability to document processes and train team members.
Excellent problem-solving skills and attention to detail.
Skills & Requirements
Elastic Stack (ELK), Elasticsearch, Kibana, Logstash, Beats, APM, Ansible, CloudFormation, AWS Cloud, AWS CloudWatch, IAM roles, AWS security, Automation, Monitoring, Dashboard creation, Alerting, Anomaly detection, Machine learning integration, Uptime monitoring, JIRA integration, Apache Webserver, JEE application servers, SAP integration, Database monitoring, Troubleshooting, Performance optimization, Documentation, Training, Problem-solving, Security analysis.

Similar jobs
Roles and Responsibilities:
- AWS Cloud Management: Design, deploy, and manage AWS cloud infrastructure. Optimize and maintain cloud resources for performance and cost efficiency. Monitor and ensure the security of cloud-based systems.
- Automated Provisioning: Develop and implement automated provisioning processes for infrastructure deployment. Utilize tools like Terraform and Packer to automate and streamline the provisioning of resources.
- Infrastructure as Code (IaC): Champion the use of Infrastructure as Code principles. Collaborate with development and operations teams to define and maintain IaC scripts for infrastructure deployment and configuration.
- Collaboration and Communication: Work closely with cross-functional teams to understand project requirements and provide DevOps expertise. Communicate effectively with team members and stakeholders regarding infrastructure changes, updates, and improvements.
- Continuous Integration/Continuous Deployment (CI/CD): Implement and maintain CI/CD pipelines to automate software delivery processes. Ensure reliable and efficient deployment of applications through the development lifecycle.
- Performance Monitoring and Optimization: Implement monitoring solutions to track system performance, troubleshoot issues, and optimize resource utilization. Proactively identify opportunities for system and process improvements.
Mandatory Skills:
- Proven experience as a DevOps Engineer or similar role, with a focus on AWS.
- Strong proficiency in automated provisioning and cloud management.
- Experience with Infrastructure as Code tools, particularly Terraform and Packer.
- Solid understanding of CI/CD pipelines and version control systems.
- Strong scripting skills (e.g., Python, Bash) for automation tasks.
- Excellent problem-solving and troubleshooting skills.
- Good interpersonal and communication skills for effective collaboration.
Secondary Skills:
- AWS certifications (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Knowledge of microservices architecture and serverless computing.
- Familiarity with monitoring and logging tools (e.g., CloudWatch, ELK stack).
- Designing DevOps strategies: Recommending strategies for migrating and consolidating DevOps tools, designing an Agile work management approach, and creating a secure development process
- Implementing DevOps development processes: Designing version control strategies, integrating source control, and managing build infrastructure
- Managing application configuration and secrets: Ensuring system and infrastructure availability, stability, scalability, and performance
- Automating processes: Overseeing code releases and deployments with an emphasis on continuous integration and delivery
- Collaborating with teams: Working with architects and developers to ensure smooth code integration, and collaborating with development and operations teams to define pipelines.
- Documentation: Producing a detailed development architecture design, setting up the DevOps tools, and working with the CI/CD specialist to integrate the automated CI and CD pipelines with those tools
- Ensuring security and compliance/DevSecOps: Managing code quality and security policies
- Troubleshooting issues: Investigating issues and responding to customer queries
- Core Skills: An Azure DevOps engineer should have a deep understanding of container principles and hands-on experience with Docker. They should also be able to set up and manage clusters using Azure Kubernetes Service (AKS). Additionally, they need an understanding of API management, Azure Key Vaults, ACR, and networking concepts such as virtual networks, subnets, NSGs, and route tables. Awareness of at least one API management product such as Apigee, Kong, or APIM in Azure is a must. Strong experience with IaC and related tooling such as Terraform, ARM/Bicep templates, GitHub Pipelines, Sonar, etc.
- Additional Skills: A self-starter able to execute tasks on time; excellent communication skills; the ability to come up with multiple solutions to a problem; the ability to interact with client-side experts and resolve issues by providing correct pointers; excellent debugging skills; and the ability to break tasks down into smaller steps.
Job Role - DevOps Infra Lead Engineer
About LenDenClub
LenDenClub is a leading peer-to-peer lending platform that offers investors (lenders) looking for high returns an alternative investment opportunity by connecting them with creditworthy borrowers looking for short-term personal loans. With a total of 8 million users and 2 million+ investors on board, LenDenClub has become a go-to platform for earning returns in the range of 10%-12%. LenDenClub offers investors a convenient medium to browse thousands of borrower profiles and achieve better returns than traditional asset classes. Moreover, LenDenClub is safeguarded against market volatility and inflation. LenDenClub provides a great way to diversify one’s investment portfolio.
LenDenClub has raised US $10 million in a Series A round from a group of investors. In that round, LenDenClub was valued at more than US $51 million, and it has grown multifold since then.
Why work at LenDenClub
LenDenClub is a certified great place to work. The certification comes from the Great Place to Work Institute, Inc., a globally renowned firm dedicated to evaluating companies on employee satisfaction, based on a high-trust, high-performance workplace culture.
As a LenDenite, you will be part of an enthusiastic and passionate group of individuals who own and love what they do. At LenDenClub, we believe in creating leaders, and when you come on board you get complete freedom to chase your ultimate career goals without any inhibitions.
Website - https://www.lendenclub.com
Location - Mumbai (Goregaon)
Responsibilities of a DevOps Infra Lead Engineer:
● Responsible for creating the software deployment strategies that are essential for successfully deploying software in the work environment. Identify and implement data storage methods such as clustering to improve the performance of the team.
● Responsible for developing solutions that manage a vast number of documents in real time and enable quick search and analysis. Identify issues in production systems and implement monitoring solutions to overcome them.
● Stay abreast of industry trends and best practices. Research, test, and implement new techniques that can be reused and applied to software development projects.
● Accountable for designing, building, and optimizing automation systems that help run business web and data infrastructure platforms.
● Create technology infrastructure and automation tools, and maintain configuration management.
● Implement lifecycle infrastructure solutions and documentation operations that meet the engineering department’s quality standards.
● Implement and maintain CI/CD pipelines.
● Containerisation of applications
● Build and improve infrastructure security
● Infrastructure as Code
● Maintaining Environments
● NAT and ACLs
● Set up ECS and ELB for high availability (HA)
● WAF, firewall, and DMZ
● Deployment strategies for high uptime
● Set up monitoring and policies for infrastructure and applications
Required Skills
● Communication Skills
● Interpersonal Skills
● Infrastructure
● Familiarity with technologies such as Python, MySQL, and MongoDB.
● Sound knowledge of cloud infrastructure.
● Knowledge of fundamental Unix/Linux monitoring, editing, and command-line tools is essential.
● Versed in scripting languages such as Ruby and Shell
● Google Cloud Platform, Hadoop, NoSQL databases, and big data clusters.
● Knowledge of open source technologies
Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building a restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as predictions and capacity planning.
About this roll* (Responsibilities)
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability with well-defined service level objectives
Troubleshooting and Supporting Escalations:
- Diagnose performance bottlenecks and implement optimizations across infrastructure, databases, web, and mobile applications
- Implement strategies to increase system reliability and performance through on-call rotation and process optimization
- Run blameless RCAs on incidents and outages aggressively, looking for answers that will prevent the incident from ever happening again
Do you have the right ingredients? (Requirements)
- Extensive industry experience, with at least 7 years in SRE and/or DevOps roles
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, and Docker
- Experience developing software or software projects in at least four languages, ideally including two of Go, Python, and Java
- Experience with cloud computing technologies (AWS preferred)
Bread puns are encouraged but not required
- Responsible for the entire infrastructure, including production (both bare metal and AWS).
- Manage and maintain the production systems and operations, including sysadmin and DB activities.
- Improve tools and processes, automate manual efforts, and maintain the health of the system.
- Champion best practices, CI/CD, and metrics-driven development
- Optimise the company's computing architecture
- Conduct systems tests for security, performance, and availability
- Maintain security of the system
- Develop and maintain design and troubleshooting documentation
- 7+ years of experience in DevOps/Technical Operations
- Extensive experience with scripting languages such as shell, Python, etc.
- Experience in developing and maintaining CI/CD process for SaaS applications using tools such as Jenkins
- Hands-on experience using configuration management tools such as Puppet, SaltStack, Ansible, etc.
- Hands-on experience building and managing VMs and containers using tools such as Kubernetes, Docker, etc.
- Hands-on experience designing, building, and maintaining cloud-based applications on AWS, Azure, GCP, etc.
- Knowledge of Databases (MySQL, NoSQL)
- Knowledge of security/ethical hacking
- Have experience with Elasticsearch, Kibana, Logstash
- Have experience with Cassandra, Hadoop, or Spark
- Have experience with MongoDB, Hive
Understanding of at least one scripting language.
Configuring and managing databases such as MySQL
Working knowledge of various tools, open-source technologies, and cloud services (AWS)
Implementing automation tools (Ansible, Jenkins) for deployment and provisioning of IT infrastructure
Excellent troubleshooting skills for cloud systems.
Awareness of core DevOps principles.
Objectives of this Role
Improve reliability, quality, and time-to-market of our suite of software solutions
- Run the production environment by monitoring availability and taking a holistic view of system health
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Participate in system design consulting, platform management, and capacity planning
- Languages: Python, Java, Ruby DSL, Bash
- Databases: MySQL, Cassandra, Elasticsearch
- Deployment: AWS CloudFormation
Essential Criteria:
- 8 or more years administering production Linux systems in a 24x7 environment
- 3 or more years’ experience in a DevOps/ SRE role as an engineer or technical lead
- At least 1 year of team leadership experience
- Significant knowledge of Amazon Web Services (CLI/APIs, EC2, EBS, S3, VPCs, IAM, AWS Lambda)
- Experience deploying services into containerized orchestration environments such as Kubernetes
- Experience with infrastructure automation tools like CloudFormation, Terraform, etc.
- Experience with at least one of Python, Bash, Ruby, or equivalent
- Experience creating and managing CI/CD pipelines with tools like Jenkins or Spinnaker
- Familiar with version control using Git
- Solid understanding of common security principles
Nice to Have:
- Preference for hands-on experience with Serverless Architecture, Kubernetes, and Docker
- Strong experience with open-source configuration management tools
- Managing distributed systems spanning multiple AWS regions / data-centers
- Experience with bootstrapping solutions
- Open source contributor
- We’re committed to client success: There are over 6,200 brand and retail websites in the Bazaarvoice network. Our clients represent some of the world’s leading companies across a wide range of industries including retail, apparel, automotive, consumer electronics and travel.
- We’re leaders in consumer-generated content: Each month, more than one billion consumers view and share authentic consumer-generated content, such as ratings and reviews, curated photos, social posts and videos, about products in our network. Thousands upon thousands of reviews are added to the Bazaarvoice network every day.
- Our network delivers: Network analytics provide insights that help marketers and advertisers provide more engaging experiences that drive brand awareness, consideration, sales, and loyalty.
- We’re a great place to work: We pride ourselves on our unique culture. Join a company that values passion, innovation, authenticity, generosity, respect, teamwork, and performance.
We are looking for an experienced software engineer with a strong background in DevOps and handling traffic & infrastructure at scale.
Responsibilities:
Work closely with product engineers to implement scalable and highly reliable systems.
Scale existing backend systems to handle ever-increasing amounts of traffic and new product requirements.
Collaborate with other developers to understand and set up the tooling needed for Continuous Integration/Delivery.
Build and operate infrastructure to support the website, backend cluster, and ML projects across the organization.
Monitor and track the performance and reliability of our services and software to meet promised SLAs.
2+ years of experience working on distributed systems and shipping high-quality product features on schedule
Intimate knowledge of the whole web stack (front end, APIs, databases, networks, etc.)
Ability to build highly scalable, robust, and fault-tolerant services and stay up-to-date with the latest architectural trends
Experience with container based deployment, microservices, in-memory caches, relational databases, key-value stores
Hands-on experience with cloud infrastructure provisioning, deployment, and monitoring (we are on AWS and use ECS, RDS, ELB, EC2, ElastiCache, Elasticsearch, S3, CloudWatch)