Machine Learning Engineer Job at Evolve Group, San Mateo, CA

  • Evolve Group
  • San Mateo, CA

Job Description

Machine Learning Engineer

Tech start-up

San Francisco based

We’ve partnered with one of the most ambitious and technically rigorous AI research labs in the world. Based in San Francisco, this team is building foundation models entirely from scratch.

They are now hiring ML Infrastructure Engineers to design and scale the systems that power large-scale, distributed model training. If you’ve built infrastructure that runs across hundreds of GPUs, thrive under technical complexity, and want to work side-by-side with elite AI researchers, this is the role.

Key Responsibilities:

  • Build and scale distributed systems for large-scale model training across LLMs, vision, and robotics.
  • Set up and run large-scale training across many GPUs using tools like Kubernetes, DeepSpeed, and FSDP.
  • Troubleshoot system issues (GPU errors, network problems) and build tools to monitor and recover from failures.
  • Optimize PyTorch pipelines, sharding, and sampling strategies.
  • Collaborate closely with researchers to support novel model training at scale.

Requirements:

  • 3–15 years in ML infrastructure, systems, or research engineering roles.
  • Proven experience scaling distributed training for large models.
  • Strong with PyTorch, CUDA, NCCL, Kubernetes.
  • Familiar with setting up distributed training clusters.
  • Deep understanding of PyTorch dataloaders, data sharding, and sampling.
  • Strong communicator with a collaborative, mission-driven mindset.

This is a fully in-person role based in San Francisco. It's ideal for engineers excited to build at the edge of what's possible in AI.

Job Tags

Immediate start

Similar Jobs

Bradleys Inc

Electric Motor Mechanic I Job at Bradleys Inc

 ...who enjoys working with their hands and prides themselves on being a quick learner and enjoys working with a team. As an electro-motor mechanic, you will closely work within the mechanic team following our documented processes that ensure we repair every customer's motor... 

STUDIO SUPERETTE

Junior Architect Job at STUDIO SUPERETTE

 ...Hello! We're Studio Superette, a small and growing team based in NYC focusing on the design and project management of small to large-scale architecture and interiors projects. If you have proficiency in Revit and experience managing projects from start to finish, we... 

AWH Logistics

Class C Driver Job at AWH Logistics

 ...PLEASE FOLLOW THIS LINK TO FILL OUT A MOTOR VEHICLE REPORT: AWH Logistics, LLC is looking for experienced Class C Box Truck drivers for lift gate route delivery in Leetsdale, PA. You are the key to keeping our commitment to exceed customer expectations and ensuring... 

PermitFlow

Customer Success Manager (SF) (San Francisco) Job at PermitFlow

 ..., HubSpot, Procore, Yelp, Brex, and more. Our team is remote-first and consists of architects, structural engineers, permitting... 

Ganahl Lumber Co

Truck Driver Class - C Job at Ganahl Lumber Co

 ...Truck Driver Class - C Job Summary: This position requires the ability to drive a truck with a capacity of at least 26,001 pounds Gross Vehicle Weight Rating (GVWR). May load and unload vehicle as required. Requires commercial driver's license. Supervisory Responsibilities...