Local AI Chat
Chat with browser-ready local LLMs like Qwen 3 and SmolLM using Transformers.js. Keep prompts local, stream replies progressively, save sessions to reopen later, and manage the downloaded model cache yourself.
Private browser LLM chat with local models
Local AI Chat lets you run curated browser-ready ONNX models such as Qwen 3 and SmolLM directly on your device with Transformers.js. It is built for private prompting, progressive streaming, user-managed model caching, and lightweight session persistence, without routing your messages through a hosted chatbot.
Frequently asked questions
Does this tool send my prompts to a server?
No. Prompts stay in your browser. The only network activity is downloading the selected model files from Hugging Face. Your messages and saved sessions remain on your device.
Which local models can I use here?
This page focuses on curated browser-ready ONNX models that run well under Transformers.js in modern browsers, including Qwen 3 and SmolLM variants chosen for practical local chat.
Can replies stream while the model is still generating?
Yes. The chat streams generated text progressively so you can start reading before the local model has finished its full answer.
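The progressive-streaming flow above can be sketched with a plain async generator: chunks are rendered as they arrive rather than after generation completes. This is an illustrative stand-in, not the app's actual code; in practice the chunks would come from Transformers.js token callbacks rather than the hypothetical `generateTokens` helper below.

```javascript
// Minimal sketch of progressive streaming: an async generator yields
// text chunks as they are "generated", and the consumer updates the
// visible message immediately instead of waiting for the full reply.
// generateTokens is a stand-in for the model's token stream.
async function* generateTokens(reply) {
  for (const word of reply.split(" ")) {
    // In the real flow, each chunk arrives as the model decodes it.
    yield word + " ";
  }
}

async function streamReply(reply, onChunk) {
  let shown = "";
  for await (const chunk of generateTokens(reply)) {
    shown += chunk;
    onChunk(shown); // re-render the partial message on every chunk
  }
  return shown.trimEnd();
}
```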
What is reasoning mode?
Some models support a reasoning mode that emits an internal thinking trace. In this interface it is shown in a muted expandable block, collapsed by default, while the final answer remains clearly visible underneath.
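Separating the thinking trace from the final answer can be sketched as a small parser. This assumes the model wraps its trace in `<think>...</think>` tags, the convention used by Qwen-style reasoning models; the function name and shape are illustrative, not the interface's actual code.

```javascript
// Split a raw model reply into a reasoning trace and a final answer.
// Assumes the <think>...</think> tag convention (an assumption about
// the model's output format, common in Qwen-style reasoning models).
function splitReasoning(raw) {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) return { reasoning: null, answer: raw.trim() };
  return {
    reasoning: match[1].trim(),               // shown muted, collapsed by default
    answer: raw.replace(match[0], "").trim(), // stays clearly visible underneath
  };
}
```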
Can I unload or delete downloaded models later?
Yes. You can unload the current model from memory and delete its browser cache when you are done, which helps free local resources and remove downloaded files.
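The unload-and-delete step can be sketched as two small helpers, assuming model files live behind a CacheStorage-like interface (such as the browser's `caches` global) and that the loaded session may expose an explicit `dispose` method. All names here are hypothetical, not the app's actual API.

```javascript
// Delete the browser cache holding downloaded model files.
// cacheStorage is assumed to be CacheStorage-like (e.g. the `caches`
// global); resolves to true if a cache with that name was removed.
async function deleteModelCache(cacheStorage, cacheName) {
  return cacheStorage.delete(cacheName);
}

// Unload the current model from memory. Dropping the only reference
// lets the engine reclaim its memory; when the backend exposes an
// explicit dispose step (an assumption), we call it first.
async function unloadModel(session) {
  if (session && typeof session.dispose === "function") {
    await session.dispose();
  }
  return null; // caller assigns back: model = await unloadModel(model)
}
```

A usage note: the cache name is whatever the app chose when storing the model files; deleting it forces a fresh download next time that model is selected.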