
About Khati Solutions Pvt. Ltd.
About
Connect with the team
Company social profiles
Similar jobs
Job Title: Sr Dev Ops Engineer
Location: Bengaluru- India (Hybrid work type)
Reports to: Sr Engineer manager
About Our Client :
We are a solution-based, fast-paced tech company with a team that thrives on collaboration and innovative thinking. Our Client's IoT solutions provide real-time visibility and actionable insights for logistics and supply chain management. Cloud-based, AI-enhanced metrics coupled with patented hardware optimize processes, inform strategic decision making and enable intelligent supply chains without the costly infrastructure
About the role : We're looking for a passionate DevOps Engineer to optimize our software delivery and infrastructure. You'll build and maintain CI/CD pipelines for our microservices, automate infrastructure, and ensure our systems are reliable, scalable, and secure. If you thrive on enhancing performance and fostering operational excellence, this role is for you.
What You'll Do 🛠️
- Cloud Platform Management: Administer and optimize AWS resources, ensuring efficient billing and cost management.
- Billing & Cost Optimization: Monitor and optimize cloud spending.
- Containerization & Orchestration: Deploy and manage applications and orchestrate them.
- Database Management: Deploy, manage, and optimize database instances and their lifecycles.
- Authentication Solutions: Implement and manage authentication systems.
- Backup & Recovery: Implement robust backup and disaster recovery strategies, for Kubernetes cluster and database backups.
- Monitoring & Alerting: Set up and maintain robust systems using tools for application and infrastructure health and integrate with billing dashboards.
- Automation & Scripting: Automate repetitive tasks and infrastructure provisioning.
- Security & Reliability: Implement best practices and ensure system performance and security across all deployments.
- Collaboration & Support: Work closely with development teams, providing DevOps expertise and support for their various application stacks.
What You'll Bring 💼
- Minimum of 4 years of experience in a DevOps or SRE role.
- Strong proficiency in AWS Cloud, including services like Lambda, IoT Core, ElastiCache, CloudFront, and S3.
- Solid understanding of Linux fundamentals and command-line tools.
- Extensive experience with CI/CD tools, GitLab CI.
- Hands-on experience with Docker and Kubernetes, specifically AWS EKS.
- Proven experience deploying and managing microservices.
- Expertise in database deployment, optimization, and lifecycle management (MongoDB, PostgreSQL, and Redis).
- Experience with Identity and Access management solutions like Keycloak.
- Experience implementing backup and recovery solutions.
- Familiarity with optimizing scaling, ideally with Karpenter.
- Proficiency in scripting (Python, Bash).
- Experience with monitoring tools such as Prometheus, Grafana, AWS CloudWatch, Elastic Stack.
- Excellent problem-solving and communication skills.
Bonus Points ➕
- Basic understanding of MQTT or general IoT concepts and protocols.
- Direct experience optimizing React.js (Next.js), Node.js (Express.js, Nest.js) or Python (Flask) deployments in a containerized environment.
- Knowledge of specific AWS services relevant to application stacks.
- Contributions to open-source projects related to Kubernetes, MongoDB, or any of the mentioned frameworks.
- AWS Certifications (AWS Certified DevOps Engineer, AWS Certified Solutions Architect, AWS Certified SysOps Administrator, AWS Certified Advanced Networking).
Why this role:
•You will help build the company from the ground up—shaping our culture and having an impact from Day 1 as part of the foundational team.
- At least 5 year of experience in Cloud technologies-AWS and Azure and developing.
- Experience in implementing DevOps practices and DevOps-tools in areas like CI/CD using Jenkins environment automation, and release automation, virtualization, infra as a code or metrics tracking.
- Hands on experience in DevOps tools configuration in different environments.
- Strong knowledge of working with DevOps design patterns, processes and best practices
- Hand-on experience in Setting up Build pipelines.
- Prior working experience in system administration or architecture in Windows or Linux.
- Must have experience in GIT (BitBucket, GitHub, GitLab)
- Hands-on experience on Jenkins pipeline scripting.
- Hands-on knowledge in one scripting language (Nant, Perl, Python, Shell or PowerShell)
- Configuration level skills in tools like SonarQube (or similar tools) and Artifactory.
- Expertise on Virtual Infrastructure (VMWare or VirtualBox or QEMU or KVM or Vagrant) and environment automation/provisioning using SaltStack/Ansible/Puppet/Chef
- Deploying, automating, maintaining and managing Azure cloud based production systems including monitoring capacity.
- Good to have experience in migrating code repositories from one source control to another.
- Hands-on experience in Docker container and orchestration based deployments like Kubernetes, Service Fabric, Docker swarm.
- Must have good communication skills and problem solving skills
- Develop and Deploy Software:
- Architect and create an effective build and release process using industry best practices and tools
- Create and manage build scripts to deploy software in a multi-cloud environment
- Look for opportunities to automate as much of the deployment process as possible to provide for repeatability, auditability, scalability and build in process enforcement
- Manage Release Schedule:
- Act as a “gate keeper” for all releases into production
- Work closely with business stakeholders, development managers and developers to prepare a release schedule
- Help prioritize deployment requests for version upgrades, patches and hot-fixes
- Continuous Delivery of Software:
- Implement Continuous Integration (CI) practices to drive development teams to implement smaller changes and commit code to the version control repo frequently
- Implement Continuous Development (CD) practices that automates deployment of the application to several environments – Dev, Test and Production
- Implement Continuous Testing (functional and non-functional) to execute tests in the CI/CD pipeline
- Manage Version Control:
- Define and implement branching policies to efficiently manage source-code
- Implement business rules as a part of source control standards
- Resolve Software Issues:
- Assist technical support and development teams to troubleshoot issues and identify areas that need improvement
- Address deployment related issues
- Maintain Release Documentation:
- Maintain release notes (features available in stable versions and known issues) and other documents for both internal and external end users
About the job
Our goal
We are reinventing the future of MLOps. Censius Observability platform enables businesses to gain greater visibility into how their AI makes decisions to understand it better. We enable explanations of predictions, continuous monitoring of drifts, and assessing fairness in the real world. (TLDR build the best ML monitoring tool)
The culture
We believe in constantly iterating and improving our team culture, just like our product. We have found a good balance between async and sync work default is still Notion docs over meetings, but at the same time, we recognize that as an early-stage startup brainstorming together over calls leads to results faster. If you enjoy taking ownership, moving quickly, and writing docs, you will fit right in.
The role:
Our engineering team is growing and we are looking to bring on board a senior software engineer who can help us transition to the next phase of the company. As we roll out our platform to customers, you will be pivotal in refining our system architecture, ensuring the various tech stacks play well with each other, and smoothening the DevOps process.
On the platform, we use Python (ML-related jobs), Golang (core infrastructure), and NodeJS (user-facing). The platform is 100% cloud-native and we use Envoy as a proxy (eventually will lead to service-mesh architecture).
By joining our team, you will get the exposure to working across a swath of modern technologies while building an enterprise-grade ML platform in the most promising area.
Responsibilities
- Be the bridge between engineering and product teams. Understand long-term product roadmap and architect a system design that will scale with our plans.
- Take ownership of converting product insights into detailed engineering requirements. Break these down into smaller tasks and work with the team to plan and execute sprints.
- Author high-quality, highly-performance, and unit-tested code running on a distributed environment using containers.
- Continually evaluate and improve DevOps processes for a cloud-native codebase.
- Review PRs, mentor others and proactively take initiatives to improve our team's shipping velocity.
- Leverage your industry experience to champion engineering best practices within the organization.
Qualifications
Work Experience
- 3+ years of industry experience (2+ years in a senior engineering role) preferably with some exposure in leading remote development teams in the past.
- Proven track record building large-scale, high-throughput, low-latency production systems with at least 3+ years working with customers, architecting solutions, and delivering end-to-end products.
- Fluency in writing production-grade Go or Python in a microservice architecture with containers/VMs for over 3+ years.
- 3+ years of DevOps experience (Kubernetes, Docker, Helm and public cloud APIs)
- Worked with relational (SQL) as well as non-relational databases (Mongo or Couch) in a production environment.
- (Bonus: worked with big data in data lakes/warehouses).
- (Bonus: built an end-to-end ML pipeline)
Skills
- Strong documentation skills. As a remote team, we heavily rely on elaborate documentation for everything we are working on.
- Ability to motivate, mentor, and lead others (we have a flat team structure, but the team would rely upon you to make important decisions)
- Strong independent contributor as well as a team player.
- Working knowledge of ML and familiarity with concepts of MLOps
Benefits
- Competitive Salary
- Work Remotely
- Health insurance
- Unlimited Time Off
- Support for continual learning (free books and online courses)
- Reimbursement for streaming services (think Netflix)
- Reimbursement for gym or physical activity of your choice
- Flex hours
- Leveling Up Opportunities
You will excel in this role if
- You have a product mindset. You understand, care about, and can relate to our customers.
- You take ownership, collaborate, and follow through to the very end.
- You love solving difficult problems, stand your ground, and get what you want from engineers.
- Resonate with our core values of innovation, curiosity, accountability, trust, fun, and social good.
Experienced with Azure DevOps, CI/CD and Jenkins.
Experience is needed in Kubernetes (AKS), Ansible, Terraform, Docker.
Good understanding in Azure Networking, Azure Application Gateway, and other Azure components.
Experienced Azure DevOps Engineer ready for a Senior role or already at a Senior level.
Demonstrable experience with the following technologies:
Microsoft Azure Platform As A Service (PaaS) product such as Azure SQL, AppServices, Logic Apps, Functions and other Serverless services.
Understanding of Microsoft Identity and Access Management products such including Azure AD or AD B2C.
Microsoft Azure Operational and Monitoring tools, including Azure Monitor, App Insights and Log Analytics.
Knowledge of PowerShell, GitHub, ARM templates, version controls/hotfix strategy and deployment automation.
Ability and desire to quickly pick up new technologies, languages, and tools
Excellent communication skills and Good team player.
Passionate about code quality and best practices is an absolute must
Must show evidence of your passion for technology and continuous learning
We are looking for people with programming skills in Python, SQL, Cloud Computing. Candidate should have experience in at least one of the major cloud-computing platforms - AWS/Azure/GCP. He should professioanl experience in handling applications and databases in the cloud using VMs and Docker images. He should have ability to design and develop applications for the cloud.
You will be responsible for
- Leading the DevOps strategy and development of SAAS Product Deployments
- Leading and mentoring other computer programmers.
- Evaluating student work and providing guidance in the online courses in programming and cloud computing.
Desired experience/skills
Qualifications: Graduate degree in Computer Science or related field, or equivalent experience.
Skills:
- Strong programming skills in Python, SQL,
- Cloud Computing
Experience:
2+ years of programming experience including Python, SQL, and Cloud Computing. Familiarity with command line working environment.
Note: A strong programming background, in any language and cloud computing platform is required. We are flexible about the degree of familiarity needed for the specific environments Python, SQL. If you have extensive experience in one of the cloud computing platforms and less in others you should still, consider applying.
Soft Skills:
- Good interpersonal, written, and verbal communication skills; including the ability to explain the concepts to others.
- A strong understanding of algorithms and data structures, and their performance characteristics.
- Awareness of and sensitivity to the educational goals of a multicultural population would also be desirable.
- Detail oriented and well organized.
Roles and Responsibilities
● Managing Availability, Performance, Capacity of infrastructure and applications.
● Building and implementing observability for applications health/performance/capacity.
● Optimizing On-call rotations and processes.
● Documenting “tribal” knowledge.
● Managing Infra-platforms like
- Mesos/Kubernetes
- CICD
- Observability(Prometheus/New Relic/ELK)
- Cloud Platforms ( AWS/ Azure )
- Databases
- Data Platforms Infrastructure
● Providing help in onboarding new services with the production readiness review process.
● Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
● Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
● Working with the Dev team to have an in-depth understanding of the application architecture and its bottlenecks.
● Identifying observability gaps in product services, infrastructure and working with stake owners to fix it.
● Managing Outages and doing detailed RCA with developers and identifying ways to avoid that situation.
● Managing/Automating upgrades of the infrastructure services.
● Automate toil work.
Experience & Skills
● 3+ Years of experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
● A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.
● A deep understanding of computer science, software development, and networking principles.
● Demonstrated experience with languages, such as Python, Java, Golang etc.
● Extensive experience with Linux administration and good understanding of the various linux kernel subsystems (memory, storage, network etc).
● Extensive experience in DNS, TCP/IP, UDP, GRPC, Routing and Load Balancing.
● Expertise in GitOps, Infrastructure as a Code tools such as Terraform etc.. and Configuration Management Tools such as Chef, Puppet, Saltstack, Ansible.
● Expertise of Amazon Web Services (AWS) and/or other relevant Cloud Infrastructure solutions like Microsoft Azure or Google Cloud.
● Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo etc.
● Experience in managing and deploying containerized environments using Docker,
Mesos/Kubernetes is a plus.
● Experience with multiple datastores is a plus (MySQL, PostgreSQL, Aerospike,
Couchbase, Scylla, Cassandra, Elasticsearch).
● Experience with data platforms tech stacks like Hadoop, Hive, Presto etc is a plus
Job Dsecription: (8-12 years)
○ Develop best practices for team and also responsible for the architecture
○ solutions and documentation operations in order to meet the engineering departments quality and standards
○ Participate in production outage and handle complex issues and works towards Resolution
○ Develop custom tools and integration with existing tools to increase engineering Productivity
Required Experience and Expertise
○ Deep understanding of Kernel, Networking and OS fundamentals
○ Strong experience in writing helm charts.
○ Deep understanding of K8s.
○ Good knowledge in service mesh.
○ Good Database understanding
Notice Period: 30 day max
- You will manage all elements of the post-sale program relationship with your customers, starting with customer on-boarding and continuing throughout the customer relationship.
- As the primary customer interface, you engage with customer teams to educate, identify needs, develop designs, set goals, manage and execute on plans that unlock continuous, incremental value from their investments in the CloudPassage Halo platform.
- You are hands-on during execution and thoroughly enjoy seeing your security projects come to life and supporting them afterwards. You are a trusted adviser.
Responsibilities :
- Manage a portfolio of 5+ Enterprise customer accounts with complex needs (typical enterprise customers invest between $500k and $4m+ per year with CloudPassage, have hundreds to tens of thousands of individual public cloud infrastructure deployments, and protect hundreds of thousands of cloud infrastructure assets with Halo).
- Provide level-3 technical support on your customer's most complex issues
- Lead implementation of low-level security controls in Cloud environments, for services, server and containers
- Remotely diagnose & resolve DevSecOps issues in customer environments - able to resolve their DevOps issues that may be interfering with CloudPassage processing.
- Interact with CloudPassage Engineering team by providing customer issue reproduction and data capture, technical diagnostics and validating fixes. QA experience preferred.
- Establish and program manage proactive, value-driven, high-touch relationships with your customers to understand, document and align customer strategies, business objectives, designs, processes and projects with Halo platform capabilities and broader CloudPassage services.
- Develop a trusted advisor relationship by building and maintaining appropriate relationships at all levels with your customer accounts, creating a premium and high-caliber experience.
- Ensure continued satisfaction, identify & confirm unaddressed customer needs that can be value-add opportunities for up-sell and cross-sell, and communicate those needs to the CloudPassage sales team. Identify any early CSAT issues and renewal risks and work with the internal team to remediate and ensure strong CSAT and a successful renewal.
- Be a strong customer advocate within CloudPassage and identify and support areas for improvement in the customer experience, both in our product and processes.
- Be team-oriented, but with a bias towards action to get things done for your customers.
Requirements : Strong cloud security knowledge & experience including :
- End-to-end enterprise security processes
- Cloud security - cloud migrations & shift in security requirements, tooling & approach
- Hands-on DevOps, DevSecOps architecture & automation (critical)
- 4+ years experience in security consulting and project/program management serving cybersecurity customers.
- Complex, level 3 technical support
- Remotely diagnosing & resolving DevSecOps issues in customer environments
- Interacting with CloudPassage Engineering team with customer issue reproduction
- Experience working in a security SaaS company in a startup environment.
- Experience working with Executive and C-Level teams.
- Ability to build and maintain strong relationships with internal and external constituents.
- Excellent organization, project management, time management, and communication skills.
- Understand and document customer requirements, map to product, track & report metrics, identify up-sell and cross-sell opportunities.
- Analytical both quantitatively and qualitatively.
- Excellent verbal and written communication skills.
- Security certifications (Security +, CISSP, etc.).
Expert Technical Skills :
- Consulting and project management : documenting project charters, project plans, executing delivery management, status reporting. Executive-level presentation skills.
- Security best practices expertise : software vulnerabilities, configuration management, intrusion detection, file integrity.
- System administration (including Linux and Windows) of cloud environments : AWS, Azure, GCP; strong networking/proxy skills.
Proficient Technical Skills :
- Configuration/Orchestration (Chef, Puppet, Ansible, SaltStack, CloudFormation, Terraform).
- CI/CD processes and environments.
Familiar Technical Skills & Knowledge : Python scripting & REST API's, Docker containers, Zendesk & JIRA.









