2+ GPU computing Jobs in Hyderabad | GPU computing Job openings in Hyderabad
You will be at the forefront of Byteridge's AI infrastructure capabilities, helping customers unlock the full potential of foundation models through expert-level deployment on GPU infrastructure.
This highly technical role requires deep expertise in machine learning infrastructure, GPU optimization, and production ML systems, combined with the ability to translate complex technical concepts into customer success.
What You'll Do
Model Deployment & Optimization
• Lead end-to-end deployments of large language models on AWS infrastructure for strategic customers
• Design and implement training, fine-tuning, and inference pipelines using Amazon SageMaker AI
• Optimize model performance through GPU-level tuning, kernel optimization, and infrastructure configuration
• Deploy models on diverse accelerator architectures, including NVIDIA GPUs and AWS custom silicon (Trainium, Inferentia)
Infrastructure Architecture & Performance
• Architect scalable ML infrastructure using SageMaker AI Inference, HyperPod, and distributed training frameworks
• Implement CUDA-level optimizations and custom kernels for improved model performance
• Design storage and networking architectures optimized for high-throughput ML workloads
• Troubleshoot and resolve complex performance bottlenecks at the GPU driver and kernel level
Customer Engagement & Technical Leadership
• Partner with AWS AI Specialist Solution Architects and customer ML teams to understand model requirements and deployment constraints
• Provide technical guidance on model selection, fine-tuning strategies, and production best practices
• Conduct performance benchmarking and cost optimization analysis for ML workloads
• Share field insights with AWS product teams to influence infrastructure and service roadmaps
What We're Looking For
Core Qualifications
• Bachelor's degree in Computer Science, Engineering, or equivalent practical experience (Master's or PhD preferred)
• 5+ years of experience in machine learning infrastructure, model deployment, or GPU computing
• Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow, JAX)
• Deep understanding of LLM architectures, training methodologies, and inference optimization
Technical Expertise (High-Level Alignment)
• Hands-on experience training, fine-tuning, or deploying large language models in production
• Proficiency with GPU programming, CUDA, and kernel-level optimization techniques
• Experience with distributed training frameworks and multi-GPU/multi-node orchestration
• Strong knowledge of AWS core services: EC2 (GPU instances), S3, EFS, VPC, and networking
Preferred Experience
• Direct experience with Amazon SageMaker AI (Training, Inference, HyperPod) or equivalent ML platforms
• Understanding of GPU architectures (NVIDIA A100, H100) and AWS custom silicon (Trainium, Inferentia)
• Experience with model compression techniques (quantization, pruning, distillation)
• Knowledge of MLOps practices, model monitoring, and production ML system design
• Background in high-performance computing, distributed systems, or systems programming
Essential Attributes
• Ability to dive deep into technical problems and debug complex infrastructure issues
• Strong analytical skills with a data-driven approach to optimization
• Excellent communication skills to explain complex technical concepts to diverse audiences
• Comfortable working in ambiguous, fast-paced environments with evolving requirements
• Ownership mindset with the ability to drive projects from architecture to production
We are seeking an experienced AI Architect to design, build, and scale production-ready AI voice conversation agents deployed locally (on-prem / edge / private cloud) and optimized for GPU-accelerated, high-throughput environments.
You will own the end-to-end architecture of real-time voice systems, including speech recognition, LLM orchestration, dialog management, speech synthesis, and low-latency streaming pipelines, all designed for reliability, scalability, and cost efficiency.
This role is highly hands-on and strategic, bridging research, engineering, and production infrastructure.
Key Responsibilities
Architecture & System Design
- Design low-latency, real-time voice agent architectures for local/on-prem deployment
- Define scalable architectures for ASR → LLM → TTS pipelines
- Optimize systems for GPU utilization, concurrency, and throughput
- Architect fault-tolerant, production-grade voice systems (HA, monitoring, recovery)
Voice & Conversational AI
- Design and integrate:
  - Automatic Speech Recognition (ASR)
  - Natural Language Understanding / LLMs
  - Dialogue management & conversation state
  - Text-to-Speech (TTS)
- Build streaming voice pipelines with sub-second response times
- Enable multi-turn, interruptible, natural conversations
Model & Inference Engineering
- Deploy and optimize local LLMs and speech models (quantization, batching, caching)
- Select and fine-tune open-source models for voice use cases
- Implement efficient inference using TensorRT, ONNX, CUDA, vLLM, Triton, or similar
Infrastructure & Production
- Design GPU-based inference clusters (bare metal or Kubernetes)
- Implement autoscaling, load balancing, and GPU scheduling
- Establish monitoring, logging, and performance metrics for voice agents
- Ensure security, privacy, and data isolation for local deployments
Leadership & Collaboration
- Set architectural standards and best practices
- Mentor ML and platform engineers
- Collaborate with product, infra, and applied research teams
- Drive decisions from prototype → production → scale
Required Qualifications
Technical Skills
- 7+ years in software / ML systems engineering
- 3+ years designing production AI systems
- Strong experience with real-time voice or conversational AI systems
- Deep understanding of LLMs, ASR, and TTS pipelines
- Hands-on experience with GPU inference optimization
- Strong Python and/or C++ background
- Experience with Linux, Docker, Kubernetes
AI & ML Expertise
- Experience deploying open-source LLMs locally
- Knowledge of model optimization:
  - Quantization
  - Batching
  - Streaming inference
- Familiarity with voice models (e.g., Whisper-like ASR, neural TTS)
Systems & Scaling
- Experience with high-QPS, low-latency systems
- Knowledge of distributed systems and microservices
- Understanding of edge or on-prem AI deployments
Preferred Qualifications
- Experience building AI voice agents or call automation systems
- Background in speech processing or audio ML
- Experience with telephony, WebRTC, SIP, or streaming audio
- Familiarity with Triton Inference Server / vLLM
- Prior experience as Tech Lead or Principal Engineer
What We Offer
- Opportunity to architect state-of-the-art AI voice systems
- Work on real-world, high-scale production deployments
- Competitive compensation and equity (if applicable)
- High ownership and technical influence
- Collaboration with top-tier AI and infrastructure talent

