TL;DR
Over the past 12+ years, I've built the frameworks that run and deploy on-device AI — at Google, Meta, and Apple.
At Apple, I founded and architected Core AI, and I currently lead the project and the on-device infrastructure team. At Meta, I was Tech Lead for PyTorch, where I founded and architected ExecuTorch, the framework used to deploy AI across Meta's family of apps on Android and iOS and to power AI on Meta's wearables. At Google, I was part of Google Brain, served as Tech Lead in TensorFlow, co-founded TensorFlow Lite (now LiteRT), and built the internal frameworks running on-device AI for Google's early AI products, including Google Assistant (now Gemini).
I got into AI frameworks by trying to deploy my own team's speech recognition research at Google. What followed was over a decade of building infrastructure and keeping pace with research — particularly around deep learning optimization.
Along the way, I've played every role: data gathering, applied research, framework development, hardware optimization, and influencing hardware roadmaps.
Selected Projects
Apple's purpose-built framework for running AI and ML models entirely on-device across iOS, macOS, visionOS, and iPadOS. Powering on-device Apple Intelligence, it gives developers a Swift-native pipeline to load, optimize, and deploy models while keeping user data private, with zero server or token costs.
Core AI is a full-lifecycle toolset: from Python model creation and AI optimization techniques, through compiler and runtime optimizations, all the way to on-device execution with Xcode integration and a built-in debugger.
Lead the infrastructure to create and deploy Apple Intelligence and the revamped Siri AI across supported Apple platforms, including state-of-the-art LLMs that make efficient use of Apple's memory architecture.
An end-to-end solution for on-device inference across mobile, wearables, embedded devices, and microcontrollers. Part of the PyTorch Edge ecosystem, enabling efficient deployment of vision, speech, and Generative AI models. Powers Meta's family of apps on Android and iOS, and AI on Meta's wearables.
- Created the vision and strategy for ExecuTorch (🏆 Best Paper, MLSys '26)
- Redefined the strategy for PyTorch's on-device ML suite: "PyTorch Edge"
- Defined the ExecuTorch technical architecture
- Defined the relationship and integration of PyTorch Core, IRs, and the new 2.0 APIs
- Key contributor to defining PyTorch's torch.export and Edge IRs
- Redefined the strategy and architecture for ML optimization around torch.export
- First embedded use of ExecuTorch within Meta's smart glasses
A suite of tools for optimizing ML models for deployment and execution, via easy-to-use and consistent APIs implementing powerful optimization techniques — including quantization, pruning, and weight clustering.
- Introducing the Model Optimization Toolkit for TensorFlow
- TensorFlow Model Optimization Toolkit — Pruning API
- Post-Training Integer Quantization
- Post-training reduced-precision fp16 quantization
- Quantization Aware Training API
- EfficientNet-EdgeTPU: Accelerator-aware neural network design with AutoML
- Weight Clustering API
Google's open-source deep learning framework for on-device ML. Billions of installs across mobile phones, smart displays, speakers, cars, and wearables — powering Google's and other companies' products. Now known as LiteRT.
Brought speech and related technologies to run entirely on-device. Part of the team that developed the very-low-power "Hey Google" capabilities, building the first end-to-end system and the latest iteration of the ML model. Also built the pre-TensorFlow ML inference engine that powered a new generation of on-device speech recognizers, text-to-speech generators, and keyboard technology.
Experience
Founded and architected Core AI. Lead the on-device infrastructure team, developing state-of-the-art infrastructure to deploy machine learning across Apple's products and devices, as well as third-party applications, and playing a key role in the rollout of Apple Intelligence and Siri AI.
Championed PyTorch 2.0 technology. Founded and led the architecture of ExecuTorch — PyTorch's end-to-end solution for enabling on-device inference across mobile and edge devices — now the deployment backbone for Meta's app family and AI on Meta's wearables.
Served as Tech Lead in TensorFlow, co-founded TensorFlow Lite (now LiteRT), and founded the TensorFlow Model Optimization Toolkit. Earlier, as a member of the Speech team, focused on on-device recognition, developing the technology behind "Hey Google" and building the pre-TensorFlow ML inference engine that powered Google Assistant's early on-device AI.
Led a number of engineering projects, most significantly co-authoring the SAIL (Self-Assembling Interface Layer) technology — the foundational technology underpinning Appian's low-code platform.
Patents & Publications
A selection of patents:
For academic publications, view the full list on Google Scholar.
Education
Research assistant. Competed in the ACM Collegiate Programming Contest 🎈 and represented the university at the RoboCup World Cup ⚽.