Detailed Responsibilities & Skills:
· Installation, configuration, and tuning of the following AppDynamics servers: Controller, Event Service Cluster, End User Monitoring, ADA, ADRUM
· Reviews system design and works to continuously improve stability and efficiency
· Provides system backup and recovery methodology and makes recommendations regarding enhancements and improvements
· Formulates policies, procedures, and standards relating to system management, and monitors system resource utilization
· Responsible for reducing operational downtime for critical, scheduled, and unscheduled maintenance by accelerating deployment of approved changes, fixes, updates, and solutions, and by automating manual maintenance, deployment, diagnostic health checks, validation, and reporting
· Responsible for creating proactive and reactive monitoring methods and generating customer alerts within the Enterprise Event Management and Monitoring capability
· Skilled at user requirement gathering and can work independently to craft efficient monitoring and alarming solutions and dashboards
· Understands the Agile process
· Ability to operationally support the underlying database as necessary
· Hands-on Java and/or .NET development
· IT operations and application support
· Application and systems performance management, measurement, and analysis
· Deployment and configuration of complex enterprise software
· Solid understanding of operating systems (Linux/Windows)
· Experience with J2EE/LAMP/Microsoft stack
· Cloud and containerization experience
· Strong understanding of built-in OS monitoring and performance tools
· Working with a wide variety of platforms and application stacks
· Ability to understand new application frameworks in customer environments quickly
· Works with minimal direction as a seasoned resource
· Supports customer initiatives in their transition towards modernization
· Tracks own work and backlog; familiar with Agile methodology
· Prioritizes own work in accordance with user priorities and stakeholder expectations
· Communicates efficiently and effectively, both in writing and verbally
Mandatory:
· Knowledge of APM tools (New Relic / AppDynamics / Datadog / OpenTracing)
· .NET/Java
· Linux/Windows
· SQL
· Reverse proxy administration (e.g., IIS)
· API
· Elastic knowledge
· Fault and Performance Monitoring Tools Administration
Good to have: Grafana / Python / GoLang / Bash / PowerShell
We have done it again: we are growing. Are you a leader who has the unique ability to build, motivate, and mobilize his or her team to achieve customer excellence while also having fun doing it? Here at Metallic, we are breaking new ground in Cloud Data Management technologies and looking for a dynamic leader who can bring his or her SaaS platform management, operations, DevOps, Azure, and team leadership experience to build and manage a team of superstar Site Reliability Engineers (SRE) and provide our customers with a delightful experience.

Skill:
- 5+ years of experience maintaining highly scalable SaaS platforms
- 5+ years of experience managing, administering, and operating Azure-based infrastructure
- 5+ years of experience leading 24/7 SRE teams
- Process-oriented execution, yet able to improvise on the fly when necessary
- Strong scripting experience with languages such as Bash, PowerShell, or Python
- Strong cloud networking skills
- Strong security skills in cloud networking and Azure-based infrastructure
- Code-first attitude when it comes to infrastructure: strong infrastructure-as-code practitioner
- Experience using ITSM tools and processes to automate hybrid-cloud monitoring and reporting
- Excellent problem-solving and root-cause analysis skills
- Cross-organization and cross-discipline communication skills, with the ability to translate technical jargon into business impact
- Thorough understanding of SRE Service Level Objectives for SaaS-based platforms, with the ability to develop, manage, and maintain SLOs based on customer needs
- Must have a good understanding of native Azure tools such as Azure Security Center and other Azure services such as Azure Functions, Azure SQL, and Azure API service, and how these tools can improve the quality of our SaaS operations
- A good understanding of open-source, developer community engagement processes is a plus

Responsibility:
- Maintain 24x7 uptime for our business-critical SaaS platform
- Make sure monitoring is a first-class product of the SRE team: participate in continuous improvement of reviewing, planning, creating, and operating monitoring solutions
- Create and maintain enterprise-class secure Cloud Dev/SecOps best practices
- Work with R&D and Product Management to plan and create secure, scalable, observable, and maintainable Azure-based infrastructure to run the applications that meet our customers' needs
- Install and update SaaS-based applications and required infrastructure services
- Create and maintain documentation on all SRE functions, including provisioning, monitoring, cost and usage, knowledge base, product updates, infrastructure updates, backup/restore, all exercises and dry runs, security and operational incident management, security and compliance certifications, etc.
- Plan, exercise, and document Site Backup/Restore and Failover/Failback to ensure we are meeting our SLAs and SLOs with our stakeholders
- Make sure the team follows all pre-defined and documented processes and procedures, and look for opportunities to improve ineffective or inefficient processes
- Maintain a healthy, constructive, innovative, fail-fast, and fun team culture
- Coach and mentor team members to take calculated risks, take ownership, and be leaders in their own right

Education and Related Technical Training/Certifications:
- Master's degree in computer science, engineering, or a related field, or equivalent experience
- Applicable cloud technology certifications such as Azure Administrator Associate, Azure Security Engineer Associate, Azure Solutions Architect Expert
LeadSquared is a leading customer acquisition SaaS platform used by over 25,000 users across 25 countries to run their sales and marketing processes. Our goal is to have a million+ users on our platform in the next 5 years, which is an extraordinary and exciting challenge for the Engineering team to work on.

The Role:
Our entire platform is hosted on AWS and comprises web applications, web services, APIs, databases, analytics systems, storage, and networking systems. We primarily use Windows Servers to deploy customer-facing web applications and backend services and applications. We are looking for an energetic IT professional with expertise in Windows Servers and networking to help us with these responsibilities, and eventually grow into a full-fledged DevOps professional working on monitoring, operating, and administering our cloud platform:
1. Help with build and release across test and staging environments
2. Help set up Windows Server and IIS hardening guidelines
3. Help set up the AV and security patch process on production servers
4. Write scripts to automate DevOps jobs

Requirements:
- Hands-on experience in Windows Server administration
- Strong understanding of networking fundamentals: NAT, VPN, DNS
- Knowledge of web servers and IIS configuration
- Working experience in PowerShell scripting
- Understanding of Amazon Web Services and cloud in general