Local AI Chat

Chat with browser-ready local LLMs like Qwen 3 and SmolLM using Transformers.js. Keep prompts local, stream replies progressively, save sessions to reopen later, and manage the downloaded model cache yourself.

Private prompts · On-device LLM · Experimental
How privacy works here: your chat messages stay in the browser. The only network activity is downloading the selected model from Hugging Face. This is a local-inference tool, not a cloud chatbot.
Chat history: switch between saved sessions, export them, or detach the chat panel.
Model: select a browser-ready ONNX model; the runtime is chosen automatically, and the default model is roughly a 2.2 GB download.
Advanced (optional) controls: the generation-length default stays high to avoid truncating longer answers, especially on reasoning-capable models. Reduce it only if you want faster or lighter local generations.

Private browser LLM chat with local models

Local AI Chat lets you run curated browser-ready ONNX models such as Qwen 3 and SmolLM directly on your device with Transformers.js. It is built for private prompting, progressive streaming, a user-managed model cache, and lightweight session persistence, without routing your messages through a hosted chatbot.
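The session-persistence part can be sketched as plain JSON serialization of the message list. A minimal sketch, with an in-memory stand-in for the browser's storage; the key name and message shape are illustrative assumptions, not this tool's actual schema:

```javascript
// Minimal sketch of lightweight chat-session persistence. In a real
// browser the storage object would be window.localStorage; a Map keeps
// the sketch self-contained.
const storage = new Map(); // stand-in for localStorage
const SESSION_KEY = 'local-ai-chat/session'; // assumed key name

function saveSession(messages) {
  storage.set(SESSION_KEY, JSON.stringify({ savedAt: Date.now(), messages }));
}

function loadSession() {
  const raw = storage.get(SESSION_KEY);
  return raw ? JSON.parse(raw).messages : [];
}

saveSession([
  { role: 'user', content: 'Hello' },
  { role: 'assistant', content: 'Hi! Running entirely in your browser.' },
]);
console.log(loadSession().length); // 2
```

Because everything round-trips through JSON in local storage, reopening the browser later restores the same transcript without any server involvement.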

Frequently asked questions

Does this tool send my prompts to a server?

No. Prompts stay in your browser. The only network activity is downloading the selected model files from Hugging Face. Your messages and saved sessions remain on your device.

Which local models can I use here?

This page focuses on curated browser-ready ONNX models that run well with Transformers.js in modern browsers, including Qwen 3 and SmolLM variants chosen for practical local chat.

Can replies stream while the model is still generating?

Yes. The chat streams generated text progressively so you can start reading before the local model has finished its full answer.
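Progressive streaming of this kind can be sketched as an async iterable that the UI consumes token by token. In the sketch below the generator merely simulates a model emitting tokens, and the function names are illustrative, not this tool's API:

```javascript
// Sketch: the UI consumes tokens as an async iterable, updating the
// transcript before generation has finished. generateTokens simulates
// a local model decoding one token at a time.
async function* generateTokens(text) {
  for (const token of text.split(/(?<=\s)/)) {
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated decode latency
    yield token;
  }
}

async function streamReply(prompt, onToken) {
  let reply = '';
  for await (const token of generateTokens(`Echo: ${prompt}`)) {
    reply += token;
    onToken(reply); // render the partial reply immediately
  }
  return reply;
}

streamReply('hello local model', (partial) => {
  // In the real UI this callback would update the chat bubble in place.
  console.log(partial);
});
```

The same consume-as-you-go shape is what lets you start reading a long answer while the model is still decoding.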

What is reasoning mode?

Some models support a reasoning mode that emits an internal thinking trace. In this interface it is shown in a muted expandable block, collapsed by default, while the final answer remains clearly visible underneath.
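Separating the trace from the answer can be sketched as a simple split on delimiter tags. Qwen-style models commonly wrap the trace in `<think>…</think>`, but the exact markers are model-dependent, so treat them as an assumption:

```javascript
// Sketch: split a model reply into an internal thinking trace (shown
// collapsed and muted) and the visible answer. The <think> tag pair is
// an assumed, model-dependent delimiter.
function splitReasoning(raw) {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  return {
    thinking: match ? match[1].trim() : null, // collapsed-by-default block
    answer: raw.replace(/<think>[\s\S]*?<\/think>/, '').trim(), // always visible
  };
}

const { thinking, answer } = splitReasoning(
  '<think>User wants a greeting; keep it short.</think>Hello!'
);
console.log(thinking); // "User wants a greeting; keep it short."
console.log(answer);   // "Hello!"
```

Replies without a trace simply come back with `thinking: null`, so non-reasoning models render unchanged.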

Can I unload or delete downloaded models later?

Yes. You can unload the current model from memory and delete its browser cache when you are done, which helps free local resources and remove downloaded files.
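In the browser, Transformers.js keeps downloaded weights in Cache Storage, so deleting them can be sketched with the Cache API. The cache name `transformers-cache` matches the library's default but should be treated as an assumption:

```javascript
// Sketch: remove downloaded model files from the browser cache.
// Dropping all references to the loaded pipeline lets it be garbage-
// collected; caches.delete removes the files on disk.
// 'transformers-cache' is assumed to be the library's default cache name.
async function deleteModelCache(cacheName = 'transformers-cache') {
  if (typeof caches === 'undefined') {
    return false; // Cache Storage only exists in browsers and workers
  }
  return caches.delete(cacheName); // true if a cache was actually removed
}

deleteModelCache().then((removed) => {
  console.log(removed ? 'model files deleted' : 'no cache to delete');
});
```

`caches.delete` resolves to `true` only when a matching cache existed, which makes it easy to report to the user whether anything was actually freed.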