A network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their cloud offering and capitalize on the cloud. We have our own IoT/AI platform, and we provide professional services on that platform to build custom clouds for our clients' IoT devices. We also build mobile apps and run 24x7 DevOps/site reliability engineering for our clients.
We are looking for a friendly, dependable, and very hands-on technical professional with plenty of experience as a backend and cloud engineer to provide site reliability services to our internal teams and end customers. We expect you to deliver top quality at high speed. You must have experience developing and designing amazing UI screens.
This person MUST have:
- BE Computer Science or equivalent
- Cloud app development experience.
- Strong Troubleshooting and debugging skills
- A strong passion for writing simple, clean, and efficient code.
- 3 years of experience with the Django framework and other backend technologies.
- Knowledge of NodeJS
- Experience with building, modifying, and extending API endpoints (REST or GraphQL) for data retrieval and persistence.
- Understand how to use a database like Postgres (preferred choice), SQLite, MongoDB, MySQL.
- Experience creating high-performance applications.
- Experience with messaging and broker tools such as RabbitMQ and MQTT
- Experience with SQL and NoSQL databases
- Experience with the full software development life cycle, including requirements collection, design, implementation, testing, and operational support.
- Knowledge of web services
- Proficient understanding of code versioning tools such as Git.
- Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
- Experience managing AWS infrastructure.
- Hands-on experience in Linux environment.
- Basic understanding of Kubernetes/Docker orchestration.
- Manage existing infrastructure, pipelines, and engineering tools (on-prem or AWS) for the engineering team (build servers, Jenkins nodes, etc.)
- Experience with scrum or other agile software development methodology.
- Excellent verbal and written communication, teamwork, decision making and influencing skills.
- Handle customer calls/emails regarding technical issues for end-users.
- Strong communication skills
- Attention to detail.
Experience:
- Minimum 3 years of experience
Location:
- Ahmedabad office, or
- Work from home
Timings:
- 40 hours a week with a rotational shift every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives, etc.
- We don't believe in locking people in with long notice periods. You will stay here because you love the company. We have only a 30-day notice period.
About: a US-based firm offering permanent WFH
Similar jobs
With a core belief that advertising technology can measurably improve the lives of patients, DeepIntent is leading the healthcare advertising industry into the future. Built purposefully for the healthcare industry, the DeepIntent Healthcare Advertising Platform is proven to drive higher audience quality and script performance with patented technology and the industry’s most comprehensive health data. DeepIntent is trusted by 600+ pharmaceutical brands and all the leading healthcare agencies to reach the most relevant healthcare provider and patient audiences across all channels and devices. For more information, visit DeepIntent.com or find us on LinkedIn.
We are seeking a skilled and experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a minimum of 3 years of hands-on experience in managing and maintaining production systems, with a focus on reliability, scalability, and performance. As an SRE at DeepIntent, you will play a crucial role in ensuring the stability and efficiency of our infrastructure, as well as contributing to the development of automation and monitoring tools.
Responsibilities:
- Deploy, configure, and maintain Kubernetes clusters for our microservices architecture.
- Utilize Git and Helm for version control and deployment management.
- Implement and manage monitoring solutions using Prometheus and Grafana.
- Work on continuous integration and continuous deployment (CI/CD) pipelines.
- Containerize applications using Docker and manage orchestration.
- Manage and optimize AWS services, including but not limited to EC2, S3, RDS, and AWS CDN.
- Maintain and optimize MySQL databases, Airflow, and Redis instances.
- Write automation scripts in Bash or Python for system administration tasks.
- Perform Linux administration tasks and troubleshoot system issues.
- Utilize Ansible and Terraform for configuration management and infrastructure as code.
- Demonstrate knowledge of networking and load-balancing principles.
- Collaborate with development teams to ensure applications meet reliability and performance standards.
Additional Skills (Good to Know):
- Familiarity with ClickHouse and Druid for data storage and analytics.
- Experience with Jenkins for continuous integration.
- Basic understanding of Google Cloud Platform (GCP) and data center operations.
Qualifications:
- Minimum 3 years of experience in a Site Reliability Engineer role or similar.
- Proven experience with Kubernetes, Git, Helm, Prometheus, Grafana, CI/CD, Docker, and microservices architecture.
- Strong knowledge of AWS services, MySQL, Airflow, Redis, AWS CDN.
- Proficient in scripting languages such as Bash or Python.
- Hands-on experience with Linux administration.
- Familiarity with Ansible and Terraform for infrastructure management.
- Understanding of networking principles and load balancing.
Education:
Bachelor's degree in Computer Science, Information Technology, or a related field.
DeepIntent is committed to bringing together individuals from different backgrounds and perspectives. We strive to create an inclusive environment where everyone can thrive, feel a sense of belonging, and do great work together.
DeepIntent is an Equal Opportunity Employer, providing equal employment and advancement opportunities to all individuals. We recruit, hire and promote into all job levels the most qualified applicants without regard to race, color, creed, national origin, religion, sex (including pregnancy, childbirth and related medical conditions), parental status, age, disability, genetic information, citizenship status, veteran status, gender identity or expression, transgender status, sexual orientation, marital, family or partnership status, political affiliation or activities, military service, immigration status, or any other status protected under applicable federal, state and local laws. If you have a disability or special need that requires accommodation, please let us know in advance.
DeepIntent’s commitment to providing equal employment opportunities extends to all aspects of employment, including job assignment, compensation, discipline and access to benefits and training.
Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.
What you will do (or learn):
1. Build our application stack on AWS. Infrastructure as code (read Terraform)
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build a monitoring and alerting system.
What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (such as Bash, Python, Go, etc.)
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems.
Nvizion Solutions is looking for the position of Site Reliability Engineer.
If interested, kindly share your resume along with contact details.
Title: Site Reliability Engineer
No. of job openings: 2
Location: Gurgaon/Hyderabad/Bengaluru/Mumbai/Chennai (remote location)
Remuneration: Best in the industry
· Experience required: 2 to 4 yrs in the industry
· Ensuring the overall reliability of the system
· Adding automation and alerting to the system
· Providing troubleshooting support
· Cross-team communication, working closely with the Product team and the Customer Success team.
· Proactive support to ensure the system returns to a healthy state
· R&D on new tools/technologies to support the product and support teams
· Good verbal/written communication to connect with the client.
· Good team player with a zeal to learn new technologies.
· The candidate will be part of the team responsible for 24x7 monitoring of a distributed global platform.
- Linux Scripting
- CI/CD knowledge (Jenkins/Bitbucket Pipelines/GitOps)
- Version Control
- Cloud platform knowledge (GCP/AWS/Azure/Digital Ocean)
- Docker, Kubernetes
Company Description
Smarsh is the leader in communications compliance, archiving, and analytics. We provide compliance across the broadest set of communications channels with insights on what’s being captured. Smarsh customers manage over 500 million daily conversations across 80 channels and growing. Customers include the top 10 U.S., top 8 European, top 5 Canadian, and top 3 Asian banks. The Smarsh advantage is customers stay ahead of compliance and uncover patterns and relationships hidden within their data.
At Smarsh, we’ve been helping our customers manage new forms of communication since 1998. We work closely with regulators including the SEC, FINRA, IIROC, and the PRA and FCA, and with our customers, to ensure that they understand the capabilities of today’s technology and that our platform meets their most stringent requirements. Our products include Connected Capture, Connected Archive, Web Archive & Business Solutions.
About the team
Are you an SRE with excellent observability, containerization, and orchestration skills? As a Site Reliability Engineer (SRE) in the Smarsh SaaS Operations team, you'll be part of a team that measures and improves production performance and reliability through sustainable engineering practices for our suite of applications. Toil will be your number one enemy, observability your closest friend, and your mission will be to drive operational burden as close to zero as you can.
Responsibilities
- Responsible for technical direction at the platform solutions level. Is able to weigh the pros and cons of various solutions and credibly argue for the best path
- Work closely with Product Management and the rest of the engineering team to define features and their implementations with careful attention to quality, scalability, and maintainability
- Can break down complex technical solutions into abstractions that the rest of the team can understand
- Can investigate and solve complex bugs, performance, and scalability issues
- Collaborates with multiple agile teams to ensure their solutions integrate effectively
- Track work in ticketing system (JIRA)
- Participate in Pull Request reviews. Provide and receive feedback to continuously improve.
- Other duties as assigned.
Desired skills & experience
- A minimum of 10 years of industry experience
- Master's in CS or equivalent
- Must have experience in Azure or AWS, either running some large-scale app there or migrating to Azure/AWS.
- Experience operating Cloud Foundry in production environments
- Experience managing CI/CD systems (Concourse, Jenkins, TravisCI etc.)
- Experience deploying and/or operating ELK stack
- Experience with container technologies and orchestration platforms (Docker, Kubernetes, Cloud Foundry)
- Experience working with monitoring and observability tools (We use Datadog and New Relic)
- Familiarity with working with PostgreSQL and MongoDB
- Background working in a multi-platform environment (Linux, Windows)
- Experience with running on a cloud platform, AWS preferred (S3, RDS, SQS)
- Familiarity with Agile/Scrum/Kanban methodologies
- Familiarity with programming/scripting languages (e.g., Python, Bash, PowerShell, Go)
Additional Skills
- Expert programming skills in relevant languages
- Exceptional analytical and problem-solving skills
- Strong communication and collaboration skills
- Deep understanding of modern software architecture
- Deep domain knowledge of the industry, platform, and existing processes
- Fault-tolerant design & maintenance
- Knowledge and understanding of modern software programming/engineering.
- Product delivery lifecycle - requirement refinement through ops
Why Smarsh?
Ready to join a thriving tech company that’s redefining digital archiving and business intelligence?
Smarsh is the leading comprehensive archiving platform. Recognized as one of today’s fastest growing companies in the U.S., Smarsh delivers innovative cloud-based solutions that help organizations manage and enforce flexible and secure records retention and compliance strategies for electronic communications, including social media and enterprise social networks (Yammer, Chatter, Facebook, LinkedIn and more).
Our motto is ‘People First. Inspire Confidence. Embrace the Impossible.’ We hire lifelong learners who have a passion for their discipline and a track record of excellence. To learn more about us, visit www.smarsh.com/careers
Senior Cloud Engineer / Jr. Cloud Solutions Architect
Roles and Responsibilities
- Define, implement, deploy, and maintain development, QA, and production environments for cloud-based Azure architecture.
- Create a strategy for establishing a secure and well-managed enterprise environment in Azure.
- Define and implement security architecture for production; ensure data security at all levels.
- Provision infrastructure as code using Azure CLI, PowerShell, ARM templates, and/or Terraform, with Ansible or other tools.
- Develop scripts to automate the deployment of resource stacks and associated configurations.
- Extend MLP standard systems management processes into the cloud, including change, incident, and problem management.
- Establish and implement monitoring and management infrastructure for both availability and performance management.
- Implement observability patterns using Azure Monitor, Azure Application Insights, and Log Analytics Workspace.
- Provide internal training to the team.
Primary Skills/Requirements
- 5+ years of experience in IT and infrastructure
- 3+ years of experience in Azure design, support, and management for a large-scale organization
- Experience in the design and implementation of high-availability architecture.
- Strong experience in Azure CLI, PowerShell, ARM templates, and Terraform.
- Strong understanding of IT security and related audits
- Experience with deploying applications on Linux (Ubuntu)
- Should know Azure offerings (Storage, OS instances, Availability zones, DR, Load balancers, VPN tunnel, Application Gateway, etc.)
- Cloud monitoring experience with Azure Log Analytics and Azure Monitor.
- Experience with log collection tools and analysis, as well as infrastructure performance monitoring tools and optimization practices
- Microsoft Azure certification (MCSE: Cloud Platform and Infrastructure) or an equivalent certification would be an added advantage
- Experience with PostgreSQL databases
Behavioural
- Positive work ethics
- Ability to adapt to dynamic environment
- Time Management
- Team Player
- Communication skills
- Ability to work independently
Experience automating systems engineering tasks.
Experience in fast-paced and dynamic SRE or Production Support engineering teams
A proven track record of managing successful complex internet-based product platforms/architectures.
Experience building metrics and monitoring platforms and defining alerting strategies.
Strong analytical ability with a focus on making data driven decisions.
Capable of technical deep-dives, yet verbally and cognitively agile enough to hold their own in a strategy discussion with senior technical or executive leadership
Experience working in a managed services environment.
Good communication skills, both written and oral.
Solid understanding of Engineering, DevOps and cloud computing fundamentals.
Good understanding of cloud services including AWS.
Strong automation and CI / CD experience.
Solid experience with containerized applications/orchestration and serverless functions.
Experience with GitHub and CI/CD tools.
If I asked your previous team members about you, they would say you were a great leader and they would very much welcome an opportunity to work for you once again.
Experience in high SLA environments.
Computer Science, Engineering or Sciences degree required or equivalent work experience.
Roles and Responsibilities
- Managing Availability, Performance, Capacity of infrastructure and applications.
- Building and implementing observability for applications health/performance/capacity.
- Optimizing On-call rotations and processes.
- Documenting “tribal” knowledge.
- Managing infra-platforms like Mesos/Kubernetes, CI/CD, observability (Prometheus/New Relic/ELK), cloud platforms (AWS/Azure), databases, and data platform infrastructure.
- Providing help in onboarding new services with production readiness review process.
- Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
- Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
- Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
- Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
- Managing outages, doing detailed RCA with developers, and identifying ways to avoid a recurrence.
- Managing/Automating upgrades of the infrastructure services.
- Automate toil work.
Experience & Skills
- 6+ years of total experience
- Experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
- A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.
- A deep understanding of computer science, software development, and networking principles.
- Demonstrated experience with languages, such as Python, Java, Golang etc.
- Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
- Extensive experience in DNS, TCP/IP, UDP, gRPC, routing, and load balancing.
- Expertise in GitOps, Infrastructure as Code tools such as Terraform, and configuration management tools such as Chef, Puppet, SaltStack, and Ansible.
- Expertise in Amazon Web Services (AWS) and/or other relevant cloud infrastructure solutions like Microsoft Azure or Google Cloud.
- Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.
- Experience in managing and deploying containerized environments using Docker; Mesos/Kubernetes is a plus.
We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
- Strong sense of Being an Owner, Wearing the Customer Shoes, with the ability to Empower Others demonstrated through clear communication and collaboration.
- Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
- Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs towards high availability, reliability, and DR goals.
- DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
- Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
- Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
- 5+ years experience in a Software and/or Site Reliability Engineering role
- Experience writing automation code in GoLang, Python or Java
- Experience developing and operating large scale distributed systems with Kubernetes and Docker
- Experience in running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
- Experience running public cloud environments on AWS
- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
- Bachelor's degree in Engineering, Computer Science or equivalent experience
- The ability to lead, partner, and collaborate cross functionally across an engineering organization
About the Role
Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast growth startup and help build a cloud service from the ground up.
Responsibilities and Ownership
- Ability to debug and optimize code and automate routine tasks.
- Evangelize and advocate for reliability practices across our organization.
- Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
- Analyze and optimize our core product by developing and implementing reliability and performance practices.
- Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
- Be on-call for services that the SRE team owns.
- Practice sustainable incident response and blameless postmortems.
Qualifications
- 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
- Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
- You have moderate-to-advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
- You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
- You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- You have a great ability to debug and optimize code and automate routine tasks.
- You have a solid background in software development and architecting resilient and reliable applications.