We are seeking a highly qualified Server Manager to join our Enhanced Operation Service (EOS) team, which is part of the Enterprise Cloud Services Delivery organization. In this role, you will act as a trusted advisor, responsible for safeguarding and optimizing end-to-end service delivery for strategic customers throughout their cloud transformation journey.
You will join a global project as part of a team that operates in a 24x7 environment for a Tech Mahindra client. This is a remote position with an expected minimum assignment of one year. Initial working hours are Monday to Friday, 9:00 AM to 6:00 PM; however, the schedule may change during the project to meet operational demands.
Key Responsibilities:
Your role will involve a mixed workload, with a strong focus on maintaining system stability and performance:
Incident and Problem Management: Actively participate in the resolution of critical incidents (Major Incidents), resolve service request failures, and conduct root cause analysis (RCA) for outages or performance issues.
Service and Change Requests: Execute complex service and change requests, managing extended downtime windows and long-running incidents.
Performance Optimization: Identify and lead proactive initiatives to improve system operation, stability, and standardization for customer environments.
Specialized Technical Support: Provide expert-level technical support for Linux and infrastructure, including strong troubleshooting of disk, server, and network connectivity issues.
Continuous Improvement: Optimize Standard Operating Procedures (SOPs) through automation and define corrective action plans to achieve established KPIs.
Orchestration and Collaboration: Coordinate work across multiple internal and external cloud service units to ensure seamless service delivery.
Core Technical Requirements:
Linux Expertise (Mandatory): Solid hands-on experience administering SUSE, Red Hat, or Ubuntu systems. Proven real-world experience in system, disk, and performance troubleshooting is essential.
Clustering & High Availability (Mandatory – Flexible): 2 to 3 years of experience with High Availability (HA) configurations. Knowledge of Pacemaker is preferred, but other clustering solutions (such as Red Hat HA) are also acceptable.
Cloud Proficiency (Mandatory): Practical experience with at least one major public cloud provider: AWS, Azure, or GCP. Experience in multi-cloud environments is considered a plus.
Networking: Advanced ability to diagnose network issues in Linux and cloud environments, including TCP/IP, DNS, LDAP, NAT, firewalls, and connectivity analysis.
Automation: Experience with scripting languages (Shell, Python, Go, etc.) and server automation tools such as Ansible or Chef.
Qualifications and Experience:
Professional Experience: 8 to 10 years of experience in IT infrastructure and server operations.
Education: Bachelor’s degree in Computer Science, Engineering, IT Management, or a related field.
Language: Fluent English is mandatory, as all interactions with the global team, documentation, and customer support will be conducted in English.
Soft Skills: Strong customer focus, analytical and solution-oriented mindset, and the ability to independently and proactively acquire new knowledge.