# LiteRT
<p align="center">
<img src="./docs/sources/litert_logo.png" alt="LiteRT Logo" width="250"/>
</p>
Google's on-device framework for high-performance ML & GenAI deployment on edge
platforms, via efficient conversion, runtime, and optimization.
[Get Started](#installation) | [Contributing](#contributing) |
[License](#license) | [Security Policy](SECURITY.md) |
[Documentation](#getting-help)
## Description
LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance
runtime for on-device AI.
LiteRT V2 (also known as LiteRT Next, announced at Google I/O '25) introduces a
new set of APIs featuring advanced GPU and NPU acceleration, delivering superior
performance and making on-device ML inference easier than ever.
### Status: Alpha
- LiteRT V2 is an alpha release and under active development.
- Join **LiteRT NPU Early access program**:
[g.co/ai/LiteRT-NPU-EAP](https://g.co/ai/LiteRT-NPU-EAP)
### What's New
- **New LiteRT v2 API**: Streamline development with automated accelerator
  selection, true async execution, and efficient I/O buffer handling.
  - Automated accelerator selection instead of explicit delegate creation
  - Async execution for faster overall execution time
  - Easy NPU runtime and model distribution
  - Efficient I/O buffer handling
- **Unified NPU Acceleration**: Seamless access to NPUs from major chipset
  providers with a consistent developer experience. LiteRT NPU acceleration is
  available through an Early Access Program.
- **Best-in-class GPU Performance**: Use state-of-the-art GPU acceleration for
  on-device ML. The new buffer interoperability enables zero-copy data sharing
  and minimizes latency across various GPU buffer types.
- **Superior Generative AI inference**: The simplest integration with the best
  performance for GenAI models.
## Platforms Supported
LiteRT is designed for cross-platform deployment on a wide range of hardware.
| Platform | CPU Support | GPU Support           | NPU Support                                            |
| -------- | ----------- | --------------------- | ------------------------------------------------------ |
| Android  | ✅          | ✅ OpenCL<br>WebGPU\*  | Google Tensor\*<br>✅ Qualcomm<br>✅ MediaTek<br>S.LSI\* |
| iOS      | ✅          | Metal\*               | ANE\*                                                   |
| Linux    | ✅          | WebGPU\*              | N/A                                                     |
| macOS    | ✅          | Metal\*               | ANE\*                                                   |
| Windows  | ✅          | WebGPU\*              | Intel\*                                                 |
| Web      | Coming soon | Coming soon           | Coming soon                                             |
| Embedded |             |                       | Broadcom\*<br>Raspberry Pi\*                            |
*\*Coming soon*
## Model Coverage and Performance
Coming soon...
## Installation
For a comprehensive guide to setting up your application with LiteRT Next, see
the [Get Started guide](https://ai.google.dev/edge/litert).
You can build LiteRT from source:
1. Start a Docker daemon.
1. Run `build_with_docker.sh` under `docker_build/`.
The script automatically creates a Linux Docker image, which allows you to build
artifacts for Linux and Android (through cross compilation). See build
instructions in [BUILD_INSTRUCTIONS.md](./BUILD_INSTRUCTIONS.md) for more
information on how to build runtime libraries with the docker container.
For more information about using the Docker interactive shell or building
different targets, refer to `docker_build/README.md`.
## Choose Your Adventure
Every developer's path is different. Here are a few common journeys to help you
get started based on your goals:
### 1. I have a PyTorch model...
- **Goal**: Convert a model from PyTorch to run on LiteRT.
- **Path 1 (classic models)**: Use the
  [AI Edge Torch Converter](https://github.com/google-ai-edge/ai-edge-torch) to
  transform your PyTorch model into the `.tflite` format, and use the AI Edge
  Quantizer to optimize the model for resource-constrained deployment. From
  there, you can deploy it using the standard LiteRT runtime (see the sketch
  after this list).
- **Path 2 (LLMs)**: Use the
  [Torch Generative API](https://github.com/google-ai-edge/ai-edge-torch) to
  reauthor and convert your PyTorch LLMs into the LiteRT (`.tflite`) format, and
  deploy them using [LiteRT LM](https://github.com/google-ai-edge/litert-lm).
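For Path 1, conversion itself is usually only a few lines of Python. The sketch
below is illustrative, not canonical: it assumes the `ai-edge-torch`, `torch`,
and `torchvision` packages are installed, and the MobileNetV3 model and output
filename are placeholders; exact APIs may differ between releases.

```python
# Illustrative sketch: convert a PyTorch vision model to .tflite with ai-edge-torch.
# Assumes `pip install ai-edge-torch torch torchvision`; APIs may vary by release.
import torch
import torchvision
import ai_edge_torch

# Any eval-mode PyTorch model works; MobileNetV3-Small is just a placeholder.
model = torchvision.models.mobilenet_v3_small(weights=None).eval()

# Sample inputs define the traced signature: a batch of one 224x224 RGB image.
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert to an edge model and export it as a .tflite flatbuffer.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("mobilenet_v3_small.tflite")
```

The exported `.tflite` file can then be optimized with the AI Edge Quantizer
and loaded by the LiteRT runtime on-device.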
### 2. I'm new to on-device ML...
- **Goal**: Run a pre-trained model (like image segmentation) in a mobile app
for the first time.
- **Path 1 (Beginner dev)**: Follow step-by-step instructions via Android Studio
  to create a
  [real-time segmentation app](https://developers.google.com/codelabs/litert-image-segmentation-android#0)
  for CPU/GPU/NPU inference. Source code
  [link](https://github.com/google-ai-edge/litert-samples/tree/main/v2/image_segmentation).
- **Path 2 (Experienced dev)**: Start with the
  [Get Started guide](https://ai.google.dev/edge/litert/next/get_started), find
  a pre-trained `.tflite` model on [Kaggle Models](https://www.kaggle.com/models),
  and use the standard LiteRT runtime to integrate it into your Android or iOS
  app. A quick Python sanity check is sketched below.
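Before wiring a downloaded model into a mobile app, it can help to sanity-check
it on a desktop machine with the LiteRT Python interpreter. A minimal sketch,
assuming the `ai-edge-litert` pip package and a placeholder `model.tflite`
whose first input is a float tensor:

```python
# Minimal sanity check of a .tflite model with the LiteRT Python interpreter.
# Assumes `pip install ai-edge-litert numpy`; "model.tflite" is a placeholder path.
import numpy as np
from ai_edge_litert.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed random data matching the model's first input shape and dtype.
dummy_input = np.random.rand(*input_details[0]["shape"]).astype(
    input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])
print("Output shape:", output.shape)
```

On-device, the same model file is loaded through the LiteRT runtime for Android
or iOS as described in the Get Started guide.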
### 3. I need to maximize performance...
- **Goal**: Accelerate an existing model to run faster and more efficiently
on-device.
- **Path**:
  - Explore the [LiteRT API](https://ai.google.dev/edge/litert/next/overview) to
    easily leverage hardware acceleration. Learn how to enable GPU or NPU
    acceleration (NPU EAP:
    [g.co/ai/LiteRT-NPU-EAP](https://g.co/ai/LiteRT-NPU-EAP)).
  - **For working with Generative AI**: Dive into
    [LiteRT LM](https://github.com/google-ai-edge/litert-lm), our specialized
    solution for running GenAI models.
### 4. I'm working with Generative AI...
- **Goal**: Deploy a large language model (LLM) or diffusion model on a mobile
device.
- **Path**: Dive into [LiteRT LM](https://github.com/google-ai-edge/litert-lm),
  our specialized solution for running GenAI models. You'll focus on model
  quantization and optimizations specific to large model architectures.
## Roadmap
Where Next:
**Beta by Dec 2025:**
- Achieve feature parity with TensorFlow Lite
- Upgrade GPU acceleration to the ML SDK, Metal, and more advanced versions
- Simplify Android development with Maven, Android Studio, and Google Tensor
- Proactively increase ML and GenAI model coverage
- Enable Certain support
- Broader LiteRT Runtime/Converter upgrades from TensorFlow Lite
**General Availability by Google I/O, May 2026**
Our commitment is to make LiteRT the best runtime for any on-device ML
deployment. The above roadmap is defined based on the following product
strategy:
- **Expanding Hardware Acceleration**: Broadening our support for NPUs and
improving performance across all major hardware accelerators.
- **Generative AI Optimizations**: Introducing new optimizations and features
specifically for the next wave of on-device generative AI models.
- **Improving Developer Tools**: Building better tools for debugging, profiling,
and optimizing models.
- **Platform Support**: Enhancing support for core platforms and exploring new
ones.
Going forward, LiteRT will establish a release cadence with a minor release
every 4-6 weeks.
This roadmap is subject to change. We encourage community feedback; please open
an issue to discuss proposals or ideas!
## Contributing
We welcome contributions to LiteRT. Please see the
[CONTRIBUTING.md](CONTRIBUTING.md) file for more information on how to
contribute.
## Getting Help
We encourage you to reach out if you need help.
- **GitHub Issues**: For bug reports and feature requests, please file a new
issue on our [GitHub Issues](https://github.com/google/litert/issues) page.
- **GitHub Discussions**: For questions, general discussions, and community
support, please visit our
[GitHub Discussions](https://github.com/google/litert/discussions).
## Related Products
LiteRT is part of a larger ecosystem of tools for on-device machine learning.
Check out these other projects from Google:
- **[LiteRT Samples](https://github.com/google-ai-edge/litert-samples)**: A
collection of LiteRT sample apps.
- **[AI Edge Torch Converter](https://github.com/google-ai-edge/ai-edge-torch)**:
  A tool in the LiteRT ecosystem that converts PyTorch models into the LiteRT
  (`.tflite`) format for on-device deployment.
- **[Torch Generative API](https://github.com/google-ai-edge/ai-edge-torch)**: A
  library in the LiteRT ecosystem for reauthoring LLMs for efficient conversion
  and on-device inference.
- **[LiteRT-LM](https://github.com/google-ai-edge/litert-lm)**: A library to
efficiently run Large Language Models (LLMs) across edge platforms, built on
top of LiteRT.
- **[XNNPACK](https://github.com/google/XNNPACK)**: A highly optimized library
of neural network inference operators for ARM, x86, and WebAssembly
architectures that provides high-performance CPU acceleration for LiteRT.
- **V2 GPU Delegate**: Coming soon
- **[MediaPipe](https://github.com/google-ai-edge/mediapipe)**: A framework for
building cross-platform, customizable ML solutions for live and streaming
media.
## Code of Conduct
This project is dedicated to fostering an open and welcoming environment. Please
read our [Code of Conduct](CODE_OF_CONDUCT.md) to understand the standards of
behavior we expect from all participants in our community.
## License
LiteRT is licensed under the [Apache-2.0 License](LICENSE).
", Assign "at most 3 tags" to the expected json: {"id":"13839","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"