Senior Machine Learning Engineer - Model Inference

Cupertino, California, United States

Posted today

Location

Cupertino, California, United States

About the Role

Join Apple Maps to help build the best map in the world. In this role on ML Platform, you will help bring advanced deep learning and large language models into high-volume, low-latency, highly available production serving, improving search quality and powering experiences across Maps. You will partner closely with research and product teams, take end-to-end ownership, and deliver measurable results at global scale.

Responsibilities

As a Software Engineer on the Apple Maps team, you will:

Lead the design and implementation of large-scale, high-performance inference services that support a wide range of models used across Maps, including deep learning and large language models
Collaborate closely with research and product partners to bring models into production, with a strong focus on efficiency, reliability, and scalability
Span the full server stack, including onboarding new use cases, optimizing inference across heterogeneous accelerated compute hardware, deploying services on Kubernetes, building and integrating inference engines and control-plane components, and ensuring seamless integration with Maps infrastructure

Minimum Qualifications

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
5+ years in software engineering focused directly on ML inference, GPU acceleration, and large-scale systems
Expertise in deploying and optimizing LLMs for high-performance, production-scale inference
Proficiency in Python, Java, or C++
Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers
Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM)
Experience with optimization techniques like attention fusion, quantization, and speculative decoding
Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks
Skilled in cloud technologies like Kubernetes, Ingress, and HAProxy for scalable deployment

Preferred Qualifications

Master's or PhD in Computer Science, Machine Learning, or a related field
Understanding of MLOps practices, continuous integration, and deployment pipelines for machine learning models
Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed
Strong understanding of distributed systems, multi-GPU/multi-node parallelism, and system-level optimization for large-scale inference

About Apple

Apple Inc. is an American multinational technology company that designs, manufactures, and markets smartphones, personal computers, tablets, and wearable devices, and offers related software applications, accessories, and online services. Its product portfolio includes iPhone, Mac, iPad, Apple Watch, and Apple TV, along with software and services such as iOS, macOS, the App Store, and Apple News.

Industry

Computers and Electronics Manufacturing

Head office

Cupertino, California, United States

Company size

10,001+ employees

Founded

1976

Smartphones (iPhone)
Personal computers (Mac)
Tablets (iPad)
Wearable devices (Apple Watch)
Software and operating systems (iOS, macOS)
Online services and digital content (App Store, Apple News)
Machine learning and health sensing
