Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.
What you will do (or learn):
1. Build our application stack on AWS using infrastructure as code (read: Terraform).
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build a monitoring and alerting system.
What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (like Bash, Python, Go, etc.)
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems.
About Digital B2B Platform
CoinFantasy is looking for a tech enthusiast working primarily on blockchain technology to be part of the core blockchain team at CoinFantasy. You would be a part of the Roadmap team that is working on the architecture, design, development, and deployment of our decentralised platform.
Your primary responsibilities would be analysing requirements, designing blockchain technology around a certain business model, and writing smart contracts.
Job Responsibilities
- Administer our blockchain, database, and DevOps infrastructure.
- Collaborate across teams to coordinate safe, efficient releases.
- Build complex pipelines for databases, messaging, storage, and compute in AWS.
- Build deployment pipelines with GitHub CI (Actions).
- Build tools to reduce occurrences of errors and improve our protocols.
- Develop software to integrate with internal back-end systems.
- Perform root cause analysis for production errors.
- Investigate and resolve technical issues.
- Design procedures for system troubleshooting and maintenance.
Requirements
- 8+ years of experience working with DevOps, infrastructure, site reliability, or cloud engineering
- Understanding of the entire tech stack of blockchain dApps
- Strong experience working with any configuration management tools
- Languages: Any modern programming language
- Experience working with some of the major public clouds, e.g. AWS, Azure
- Competent with the “basics”, e.g. computer networking
- Self-motivated individual with enthusiasm for learning and building things
- Collaborative, communicative, and confident in their abilities to work well with all team members at all seniority and skill levels
- Hands-on experience with Rust/Substrate and contributions to open-source blockchain projects are an added advantage
About Us
CoinFantasy is a Play to Invest platform that brings the world of investment to users through engaging games. With multiple categories of games, it aims to make investing fun, intuitive, and enjoyable for users.
It features a sandbox environment in which users are exposed to the end-to-end investment journey without risking financial losses.
Website: https://www.coinfantasy.io/
Benefits
- Competitive Salary
- An opportunity to be part of the Core team in a fast-growing company
- A fulfilling, challenging and flexible work experience
- Practically unlimited professional and career growth opportunities
Position: SRE/DevOps
Experience: 6-10 Years
Location: Bengaluru/Mangalore
CodeCraft Technologies is a multi-award-winning creative engineering company offering design and technology solutions on mobile, web and cloud platforms.
We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our systems and applications. You will work closely with the development team to build and maintain scalable infrastructure, implement best practices in CI/CD, and contribute to the overall stability of our technology stack.
Roles and Responsibilities:
· CI/CD and DevOps:
o Implement and maintain robust Continuous Integration/Continuous Deployment (CI/CD) pipelines to ensure efficient and reliable software delivery.
o Collaborate with development teams to integrate DevOps principles into the software development lifecycle.
o Experience with pipeline tools such as GitHub Actions, GitLab, Azure DevOps, and CircleCI is a plus.
· Test Automation:
o Develop and maintain automated testing frameworks to validate system functionality, performance, and reliability.
o Collaborate with QA teams to enhance test coverage and improve overall testing efficiency.
· Logging/Monitoring:
o Design, implement, and manage logging and monitoring solutions to proactively identify and address potential issues.
o Respond to incidents and alerts to ensure system uptime and performance.
· Infrastructure as Code (IaC):
o Utilize Terraform (or other tools) to define and manage infrastructure as code, ensuring scalability, security, and consistency across environments.
· Elastic Stack:
o Implement and manage Elastic Stack (ELK) for log and data analysis to gain insights into system performance and troubleshoot issues effectively.
· Cloud Platforms:
o Work with cloud platforms such as AWS, GCP, and Azure to deploy and manage scalable and resilient infrastructure.
o Optimize cloud resources for cost efficiency and performance.
· Vulnerability Management:
o Conduct regular vulnerability assessments and implement measures to address and remediate identified vulnerabilities.
o Collaborate with security teams to ensure a robust security posture.
· Security Assessment:
o Perform security assessments and audits to identify and address potential security risks.
o Implement security best practices and stay current with industry trends and emerging threats.
o Experience with tools such as GCP Security Command Center and AWS Security Hub is a plus.
· Third-Party Hardware Providers:
o Collaborate with third-party hardware providers to integrate and support hardware components within the infrastructure.
Desired Profile:
· The candidate should be willing to work in the EST time zone, i.e. from 6 PM to 2 AM.
· Excellent communication and interpersonal skills
· Bachelor’s Degree
· Certifications related to this field shall be an added advantage.
Position: Site Reliability Engineer
Location: Pune (currently WFH; you will need to relocate post-pandemic)
About the Organization:
A funded product development company, headquartered in Singapore with offices in Australia, the United States, Germany, the United Kingdom, and India. You will gain work experience in a global environment.
Job Description:
We are looking for an experienced DevOps / Site Reliability engineer to join our team and be instrumental in taking our products to the next level.
In this role, you will be working on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.
You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.
As part of the role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring, and scaling of components, and optimizing the infrastructure for cost and performance.
Day-to-day responsibilities
- Ensure the operational integrity of the global infrastructure
- Design repeatable continuous integration and delivery systems
- Test and measure new methods, applications and frameworks
- Analyze and leverage various AWS-native functionality
- Support and build out an on-premise data center footprint
- Support other teams and diagnose issues related to our infrastructure
- Participate in 24/7 on-call rotation (if required)
Requirements
- Expert-level administrator of Linux-based systems
- Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus.
- Experience with production deployments of Kubernetes clusters
- Experience in automating provisioning and managing Hybrid-Cloud infrastructure (AWS, GCP and On-Prem) at scale.
- Knowledge of monitoring platforms (Prometheus, Grafana, Graphite).
- Experience in distributed storage systems such as Ceph or GlusterFS.
- Experience in virtualisation with KVM, oVirt, and OpenStack.
- Hands-on experience with configuration management systems such as Terraform and Ansible
- Bash and Python scripting expertise
- Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
- Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
- Experience managing hundreds to thousands of servers globally
- Enjoy automating tasks, rather than repeating them
- Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
- Strong verbal and written communication skills
- Ability to adapt to a rapidly changing environment
- Comfortable collaborating and supporting a diverse team of engineers
- Ability to troubleshoot problems in complex systems
- Flexible working hours and ability to participate in 24/7 on-call support with other team members whenever required.
Company Description
Smarsh is the leader in communications compliance, archiving, and analytics. We provide compliance across the broadest set of communications channels with insights on what’s being captured. Smarsh customers manage over 500 million daily conversations across 80 channels and growing. Customers include the top 10 U.S., top 8 European, top 5 Canadian, and top 3 Asian banks. The Smarsh advantage is customers stay ahead of compliance and uncover patterns and relationships hidden within their data.
At Smarsh, we’ve been helping our customers manage new forms of communication since 1998. We work closely with regulators including the SEC, FINRA, IIROC, and the PRA and FCA, and with our customers, to ensure that they understand the capabilities of today’s technology and that our platform meets their most stringent requirements. Our products include Connected Capture, Connected Archive, Web Archive & Business Solutions.
About the team
Are you an SRE with excellent Observability, Containerization and Orchestration skills? As a Site Reliability Engineer (SRE) in the Smarsh SaaS Operations team, you'll be part of a team who measures and improves production performance reliability through sustainable engineering practices for our suite of applications. Toil will be your number one enemy, observability your closest friend and your mission will be to drive operational burden as close to zero as you can.
Responsibilities
- Set technical direction at the platform solutions level; weigh the pros and cons of various solutions and credibly argue for the best path
- Work closely with Product Management and the rest of the engineering team to define features and their implementations with careful attention to quality, scalability, and maintainability
- Can break down complex technical solutions into abstractions that the rest of the team can understand
- Can investigate and solve complex bugs, performance, and scalability issues
- Collaborates with multiple agile teams to ensure their solutions integrate effectively
- Track work in ticketing system (JIRA)
- Participate in Pull Request reviews. Provide and receive feedback to continuously improve.
- Other duties as assigned.
Desired skills & experience
- A minimum of 10 years of industry experience
- Master's in CS or equivalent
- Must have experience in Azure or AWS, either running some large-scale app there or migrating to Azure/AWS.
- Experience operating Cloud Foundry in production environments
- Experience managing CI/CD systems (Concourse, Jenkins, TravisCI etc.)
- Experience deploying and/or operating ELK stack
- Experience with container technologies and orchestration platforms (Docker, Kubernetes, Cloud Foundry)
- Experience working with monitoring and observability tools (We use Datadog and New Relic)
- Familiarity with working with PostgreSQL and MongoDB
- Background working in a multi-platform environment (Linux, Windows)
- Experience with running on a cloud platform, AWS preferred (S3, RDS, SQS)
- Familiarity with Agile/Scrum/Kanban methodologies
- Familiarity with programming/scripting languages (e.g. Python, Bash, PowerShell, Go)
Additional Skills
- Expert programming skills in relevant languages
- Exceptional analytical and problem-solving skills
- Strong communication and collaboration skills
- Deep understanding of modern software architecture
- Deep domain knowledge of the industry, platform, and existing processes
- Fault-tolerant design & maintenance
- Knowledge and understanding of modern software programming/engineering.
- Product delivery lifecycle - requirement refinement through ops
Why Smarsh?
Ready to join a thriving tech company that’s redefining digital archiving and business intelligence?
Smarsh is the leading comprehensive archiving platform. Recognized as one of today’s fastest growing companies in the U.S., Smarsh delivers innovative cloud-based solutions that help organizations manage and enforce flexible and secure records retention and compliance strategies for electronic communications, including social media and enterprise social networks (Yammer, Chatter, Facebook, LinkedIn and more).
Our motto is ‘People First. Inspire Confidence. Embrace the Impossible.’ We hire lifelong learners who have a passion for their discipline and a track record of excellence. To learn more about us, visit www.smarsh.com/careers
● Research, propose, and evaluate, with a 5-year vision, the architecture, design, technologies, processes, and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap of the network to transform it to Telco Cloud and be prepared for 5G.
● Using your deep technical expertise, provide detailed feedback to Product Management and Engineering, and contribute directly to the platform code base to enhance both the customer experience of the service and the SRE quality of life.
● Be aware of trends in network infrastructure as well as within the network engineering and OSS community: what technologies are being developed or launched?
● Stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the engineering of lab and production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development, and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / RedHat / CentOS / Ubuntu
● You have deep knowledge in Networking Layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● You have in-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus).
● PaaS-level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL, Cassandra)
About the Organisation:
The team at Coredge.io combines experienced and young professionals with many years of experience in edge computing, telecom application development, and Kubernetes. The company has continuously collaborated with the open source community, universities, and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence, with offices in the US and New Delhi, India.
• Develop and maintain IaC using Terraform and Ansible.
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast-paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.
What we’re looking for
• An AWS Certified Engineer with strong skills in
o Terraform
o Ansible
o *nix and shell scripting
• Preferably with experience in:
o Elasticsearch
o CircleCI
o CloudFormation
o Python
o Packer
o Docker
o Prometheus and Grafana
o Challenges of scale
o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.
- 5+ years of software development or site reliability engineering or equivalent experience
- Skilled at problem solving, algorithms, and data structures
- Building tools and scripting frameworks from scratch
- Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
- Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
- Configuration automation using Ansible or equivalent tools
- Exposure to Windows and Linux administration
- Project management tools like Jira, Trello
- Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
- Familiarity with basic networking, security and cloud engineering concepts
- Team player who is eager to help others to succeed through mentoring and leading by example
- Highly collaborative with effective written and verbal communication skills