Local AI Chat
Chat with browser-ready local LLMs like Qwen 3 and SmolLM using Transformers.js. Keep prompts local, stream replies progressively, save sessions to reopen later, and manage the downloaded model cache yourself.
Private browser LLM chat with local models
Local AI Chat lets you run curated browser-ready ONNX models such as Qwen 3 and SmolLM directly on your device with Transformers.js. It is built for private prompting, progressive streaming, user-managed model caching, and lightweight session persistence, without routing your messages through a hosted chatbot.
Frequently asked questions
Does this tool send my prompts to a server?
No. Prompts stay in your browser. The only network activity is downloading the selected model files from Hugging Face. Your messages and saved sessions remain on your device.
Which local models can I use here?
This page focuses on curated browser-ready ONNX models that run well under Transformers.js in modern browsers, including Qwen 3 and SmolLM variants chosen for practical local chat.
Can replies stream while the model is still generating?
Yes. The chat streams generated text progressively so you can start reading before the local model has finished its full answer.
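The progressive-streaming flow above can be sketched with a plain async generator: chunks are rendered as they arrive rather than after generation completes. This is an illustrative stand-in, not the app's actual code; in practice the chunks would come from Transformers.js token callbacks rather than the hypothetical `generateTokens` helper below.

```javascript
// Minimal sketch of progressive streaming: an async generator yields
// text chunks as they are "generated", and the consumer updates the
// visible message immediately instead of waiting for the full reply.
// generateTokens is a stand-in for the model's token stream.
async function* generateTokens(reply) {
  for (const word of reply.split(" ")) {
    // In the real flow, each chunk arrives as the model decodes it.
    yield word + " ";
  }
}

async function streamReply(reply, onChunk) {
  let shown = "";
  for await (const chunk of generateTokens(reply)) {
    shown += chunk;
    onChunk(shown); // re-render the partial message on every chunk
  }
  return shown.trimEnd();
}
```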
What is reasoning mode?
Some models support a reasoning mode that emits an internal thinking trace. In this interface it is shown in a muted expandable block, collapsed by default, while the final answer remains clearly visible underneath.
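Separating the thinking trace from the final answer can be sketched as a small parser. This assumes the model wraps its trace in `<think>...</think>` tags, the convention used by Qwen-style reasoning models; the function name and shape are illustrative, not the interface's actual code.

```javascript
// Split a raw model reply into a reasoning trace and a final answer.
// Assumes the <think>...</think> tag convention (an assumption about
// the model's output format, common in Qwen-style reasoning models).
function splitReasoning(raw) {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) return { reasoning: null, answer: raw.trim() };
  return {
    reasoning: match[1].trim(),               // shown muted, collapsed by default
    answer: raw.replace(match[0], "").trim(), // stays clearly visible underneath
  };
}
```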
Can I unload or delete downloaded models later?
Yes. You can unload the current model from memory and delete its browser cache when you are done, which helps free local resources and remove downloaded files.
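The unload-and-delete step can be sketched as two small helpers, assuming model files live behind a CacheStorage-like interface (such as the browser's `caches` global) and that the loaded session may expose an explicit `dispose` method. All names here are hypothetical, not the app's actual API.

```javascript
// Delete the browser cache holding downloaded model files.
// cacheStorage is assumed to be CacheStorage-like (e.g. the `caches`
// global); resolves to true if a cache with that name was removed.
async function deleteModelCache(cacheStorage, cacheName) {
  return cacheStorage.delete(cacheName);
}

// Unload the current model from memory. Dropping the only reference
// lets the engine reclaim its memory; when the backend exposes an
// explicit dispose step (an assumption), we call it first.
async function unloadModel(session) {
  if (session && typeof session.dispose === "function") {
    await session.dispose();
  }
  return null; // caller assigns back: model = await unloadModel(model)
}
```

A usage note: the cache name is whatever the app chose when storing the model files; deleting it forces a fresh download next time that model is selected.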