Senior Manager, GPU Machine Learning Production
Company: Google
Location: Sunnyvale
Posted on: April 1, 2026
Job Description:
Minimum qualifications:
Bachelor’s degree, or equivalent practical experience.
8 years of experience programming in C++, Java, Python, Kotlin or Go.
5 years of experience in a technical leadership role.
5 years of experience in a people management or team leadership role.
5 years of experience with embedded systems.
5 years of experience with software architecture.

Preferred qualifications:
Experience in steering hardware platforms through the NPI phase, stabilizing them, and successfully transitioning them into GA.
Experience in organizational design, consolidating fragmented, project-funded task forces into a centralized and sustainably funded operational engineering team.
Experience designing or integrating automated diagnostic frameworks and health management systems to rapidly root-cause hardware and software faults across a fleet.
Experience bridging hardware, systems software, and site reliability teams to advocate for "design for supportability" in future hardware iterations.
Experience operating within a cloud or enterprise environment, partnering directly with customers to translate their operational pain points into upstream infrastructure improvements.

About the job
Like Google's own
ambitions, the work of a Software Engineer goes beyond just Search.
Software Engineering Managers have not only the technical expertise
to take on and provide technical leadership to major projects, but
also manage a team of Engineers. You not only optimize your own
code but make sure Engineers are able to optimize theirs. As a
Software Engineering Manager you manage your project goals,
contribute to product strategy and help develop your team. Teams
work all across the company, in areas such as information
retrieval, artificial intelligence, natural language processing,
distributed computing, large-scale system design, networking,
security, data compression, user interface design; the list goes on
and is growing every day. Operating with scale and speed, our
exceptional software engineers are just getting started, and as a
manager, you guide the way. With technical and leadership
expertise, you manage engineers across multiple teams and
locations, manage a large product budget, and oversee the
deployment of large-scale projects across multiple sites
internationally.

The GPU Platform Software team is responsible for building a
bleeding-edge platform that will power Google services and the
world's ML. GPU compute platforms enable Google services like
Search, Ads, Google Cloud, DeepMind, etc. The team develops the
system software, firmware, tools, and tests to bring GPUs to
Google's compute infrastructure.

In this role, you will be at the forefront of scaling our most
critical Artificial Intelligence (AI) infrastructure. Our rapidly
expanding GPU fleet presents unique, high-impact organizational
challenges, chief among them transforming distributed, New Product
Introduction (NPI)-focused task forces into a centralized,
cohesive, and standing MLPS organization.

Google Cloud accelerates
every organization’s ability to digitally transform its business
and industry. We deliver enterprise-grade solutions that leverage
Google’s cutting-edge technology, and tools that help developers
build more sustainably. Customers in more than 200 countries and
territories turn to Google Cloud as their trusted partner to enable
growth and solve their most critical business problems.

The US base salary range for this full-time position is
$262,000-$365,000 + bonus + equity + benefits. Our salary ranges
are determined by role, level,
and location. Within the range, individual pay is determined by
work location and additional factors, including job-related skills,
experience, and relevant education or training. Your recruiter can
share more about the specific salary range for your preferred
location during the hiring process. Please note that the
compensation details listed in US role postings reflect the base
salary only, and do not include bonus, equity, or benefits. Learn
more about benefits at Google.

Responsibilities
Guide the transition of GPU production support from fragmented, NPI-specific task forces into a centralized, standing organization.
Develop and execute a comprehensive, multi-year roadmap for GPU fleet stability, capacity turn-up, and automated health management, aligning with broader Cloud and AI infrastructure goals.
Anticipate the support needs for next-generation GPU architectures and proactively build the capabilities required for seamless NPI to General Availability (GA) transitions.
Sponsor and drive the development of advanced telemetry, debugging tooling, and automated remediation systems to scale fleet management and significantly reduce operational toil.
Serve as the executive escalation point for critical, systemic hardware and software issues on GPU Superpods.
Keywords: Google, Daly City, Senior Manager, GPU Machine Learning Production, IT / Software / Systems, Sunnyvale, California