AI Integration Guide 2026: Complete Model Setup & Best Practices
Every tool on Tools-Online.app includes AI assistance for writing, coding, diagram generation, and content analysis. This guide covers the three AI integration options available in 2026, how to set each one up, and which models work best for different tasks.
How to Use AI Features
The AI workflow is consistent across all tools — Notepad, code editors, Mermaid diagrams, and more.
Step 1: Click the settings icon at the bottom left of your browser window. This opens the AI configuration panel.

Step 2: Click the Model dropdown to see available options, organized by provider:
- AI-ML API — Cloud models (Claude, GPT, Gemini, image/video generation)
- OpenRouter — 100+ models from multiple providers via single API key
- WebLLM — Privacy-first models running entirely in your browser
Step 3: Select your model and enter an API key if required. WebLLM models need no API key.
What You Need to Know Before Starting
Hardware Requirements
- Cloud Models (AI-ML API, OpenRouter): No local hardware constraints — just stable internet
- WebLLM Browser Models: Minimum 16 GB RAM (32 GB recommended). Models range from 0.9 GB to 5.7 GB in size. Processing runs locally on your CPU/GPU via WebAssembly and WebGPU
Costs
- AI-ML API: Pay-per-use pricing. GPT-4.1 Nano is the most cost-effective for daily tasks
- OpenRouter: Pay-per-use with many free models available (Llama 3.3 70B, DeepSeek V3, Qwen3)
- WebLLM: Completely free after download — no API costs
Privacy
- Cloud models: Data processed on provider servers with HTTPS encryption
- WebLLM: Complete local processing — data never leaves your device
AI-ML API Models (Recommended for Most Users)
Our primary cloud provider offering state-of-the-art AI models through a unified API.
Text & Reasoning Models
- GPT-4.1 Nano — Top pick for daily tasks. Efficient text generation and analysis ($ — most cost-effective)
- Claude 4.0 Sonnet/Opus — Advanced reasoning and writing ($$ Sonnet / $$$$ Opus)
- o4-mini — Specialized reasoning model ($)
- Gemini 2.5 Flash Preview — Multimodal with vision support ($$)
- Qwen 2.5 Coder 32B — Specialized coding assistance ($$$)
- Llama 3.1 70B — Open-source high-performance model ($$$)
Multimodal Models
- Llama 3.2 90B/11B Vision — Image understanding and analysis ($$ 11B / $$$ 90B)
- Claude 4.0 — Vision capabilities for document analysis ($$ Sonnet / $$$$ Opus)
Image & Video Generation
- FLUX Schnell — Fast image generation ($)
- Stable Diffusion 3.5 Large — High-quality image creation ($$)
- Minimax Video (video-01) — Professional video generation ($$$)
- Google Veo3 — State-of-the-art video creation ($$$$)
The system automatically routes requests to appropriate specialized models based on task type (text, image, code, multimodal).
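Under the hood, requests to cloud providers of this kind typically follow the OpenAI chat completions schema. The Python sketch below assembles such a request; note that the base URL and model identifier here are illustrative assumptions, not confirmed AI-ML API endpoints — the tools on this site handle this wiring for you once your key is saved.

```python
import json

# Hypothetical endpoint -- check the AI-ML API dashboard for the real base URL.
AIML_CHAT_URL = "https://api.aimlapi.com/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat completion request for a cloud model."""
    return {
        "url": AIML_CHAT_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }),
    }

request = build_chat_request("YOUR_API_KEY", "gpt-4.1-nano", "Summarize this note.")
print(request["url"])
```

The same request shape works for text, code, and multimodal models; only the `model` field and message content change, which is what makes automatic routing by task type possible.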
OpenRouter Models (Multi-Provider Access)
OpenRouter provides a unified gateway to 100+ AI models from multiple providers — all accessible through a single API key. This is the fastest way to experiment with diverse models including free options.
Top Models on OpenRouter
| Model | Provider | Context / Parameters | Strengths |
| --- | --- | --- | --- |
| Claude 4 Sonnet | Anthropic | 200K tokens | Hybrid reasoning, chain of thought, agents |
| Gemini 2.0 Flash | Google | 1M tokens | Low-latency SEO, summarization |
| Gemini 2.5 Pro | Google | 1M tokens, thinking mode | Deep reasoning, coding, science |
| GPT-4o-mini | OpenAI | 128K tokens | Vision support, highly cost-effective |
| DeepSeek V3 0324 | DeepSeek | 685B MoE, 163K tokens | Free, open source, top logic performance |
Top Free Models on OpenRouter
| Model | Provider | Parameters / Context | Use Case |
| --- | --- | --- | --- |
| Llama 4 "Maverick" | Meta | 400B MoE (17B active), 128K | Multimodal, vision + text tasks |
| Llama 3.3 70B Instruct | Meta | 70B, 131K tokens | Chat, reasoning, multilingual |
| DeepSeek V3 0324 | DeepSeek | 685B MoE, 163K tokens | Research, logic, general purpose |
| Qwen3-30B-A3B | Alibaba | 30.5B (3.3B active), 131K | Fast and intelligent dialogue + code |
| Mistral Small 3 | Mistral | 24B, 32K tokens | High-quality open model, low latency |
Model rankings update frequently, so check the OpenRouter model listings for current options.
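OpenRouter's API follows the OpenAI chat completions schema, served from openrouter.ai/api/v1, and free models carry a `:free` suffix in their identifier. As a rough illustration (the tools on this site do this for you), the Python sketch below prepares a request without sending it:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def make_openrouter_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style chat completion request
    for OpenRouter. Free-tier models use a ':free' suffix in the model id."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it: urllib.request.urlopen(make_openrouter_request(...))
req = make_openrouter_request("YOUR_API_KEY", "meta-llama/llama-3.3-70b-instruct:free", "Hello")
print(req.full_url)
```

Because every model sits behind the same endpoint and key, switching providers is just a change to the `model` string.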
WebLLM Browser Models (Privacy-First)
Privacy-first AI processing that runs entirely in your browser. After the initial download, these models work completely offline.
Ultra-Lightweight Options
- Llama 3.2 1B (~0.9 GB) — Instruction-following with enhanced efficiency
- Gemma 2 2B (~1.9 GB) — Google's optimized model with improved memory usage
Recommended Models (Best Performance-to-Size Ratio)
- Qwen3 4B (~3.2 GB) — Top pick. Excellent multilingual capabilities with strong reasoning
- Phi-3.5 Mini (~3.7 GB) — Best for coding. Microsoft's specialized programming model
- Llama 3.2 3B (~2.3 GB) — Enhanced reasoning with better memory efficiency
High-Performance Options
- Qwen3 8B (~5.7 GB) — High-performance multilingual model with coding expertise
- Hermes-3 Llama 3.1 8B (~4.9 GB) — Function-calling and instruction-following
Models are cached permanently after download until manually cleared.
Step-by-Step Setup Instructions
Setting Up AI-ML API
- Navigate to any AI-enabled tool (Notepad, Code Editors, Mermaid, etc.)
- Click the AI settings icon in the bottom left
- Select an AI-ML model from the dropdown menu
- Enter your AI-ML API key when prompted
- Start using AI assistance immediately
Get your API key: AI-ML API Dashboard
Setting Up OpenRouter
- Visit OpenRouter and create an account
- Generate an API key from your dashboard
- In your tool's AI settings, select your preferred OpenRouter model
- Enter your API key in the provided text box
- Start using any of 100+ available models
Documentation: OpenRouter Docs
Setting Up WebLLM Models
- Enable GPU acceleration in your browser for optimal performance:
  - Chrome/Edge: Visit `chrome://flags/#enable-webgpu` and enable WebGPU
  - Firefox: Visit `about:config` and set `dom.webgpu.enabled` to `true`
- Open AI settings and select a WebLLM model
- Choose based on your system resources:
  - 8–16 GB RAM: Llama 3.2 1B or Gemma 2 2B
  - 16–24 GB RAM: Qwen3 4B (recommended), Llama 3.2 3B, or Phi-3.5 Mini (coding)
  - 32 GB+ RAM: Any model including Qwen3 8B or Hermes-3 8B
- First selection triggers automatic download
- Wait for model initialization (one-time process)
- Enjoy offline AI assistance
Pro Tip: Download models on fast WiFi — they're cached permanently until manually cleared.
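The RAM guidance above can be expressed as a small lookup. This is a hypothetical helper, not part of the WebLLM API; the model names, download sizes, and RAM thresholds come from this guide.

```python
# (model name, download size in GB, minimum recommended system RAM in GB)
WEBLLM_MODELS = [
    ("Llama 3.2 1B", 0.9, 8),
    ("Gemma 2 2B", 1.9, 8),
    ("Llama 3.2 3B", 2.3, 16),
    ("Qwen3 4B", 3.2, 16),
    ("Phi-3.5 Mini", 3.7, 16),
    ("Hermes-3 Llama 3.1 8B", 4.9, 32),
    ("Qwen3 8B", 5.7, 32),
]

def models_for_ram(system_ram_gb: int) -> list[str]:
    """Return the models this guide recommends for a given amount of RAM."""
    return [name for name, _size, min_ram in WEBLLM_MODELS
            if system_ram_gb >= min_ram]

print(models_for_ram(16))
```

On a 16 GB machine this yields everything up through Phi-3.5 Mini, matching the tiers in step 2 above.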
Best Practices
Model Selection Strategy
| Task Type | Recommended Model | Provider | Cost |
| --- | --- | --- | --- |
| Daily tasks | GPT-4.1 Nano | AI-ML API | $ |
| Complex reasoning | Claude 4.0 Sonnet | AI-ML API | $$ |
| Coding assistance | Phi-3.5 Mini or Qwen 2.5 Coder | WebLLM / AI-ML | Free / $$$ |
| Privacy-sensitive | Qwen3 4B | WebLLM | Free |
| Experimentation | Llama 3.3 70B or DeepSeek V3 | OpenRouter | Free |
| Image generation | FLUX Schnell | AI-ML API | $ |
| Video generation | Google Veo3 | AI-ML API | $$$$ |
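The table above amounts to a task-to-model mapping, which can be sketched as a simple lookup. The task labels here are invented shorthand for illustration; where the table lists two options, the first is used.

```python
# Default picks per task, mirroring the selection table in this guide.
MODEL_BY_TASK = {
    "daily": ("GPT-4.1 Nano", "AI-ML API"),
    "reasoning": ("Claude 4.0 Sonnet", "AI-ML API"),
    "coding": ("Phi-3.5 Mini", "WebLLM"),
    "privacy": ("Qwen3 4B", "WebLLM"),
    "experimentation": ("Llama 3.3 70B", "OpenRouter"),
    "image": ("FLUX Schnell", "AI-ML API"),
    "video": ("Google Veo3", "AI-ML API"),
}

def pick_model(task: str) -> tuple[str, str]:
    """Return (model, provider) for a task, falling back to the daily driver."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["daily"])

print(pick_model("privacy"))
```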
Resource Management
- Download WebLLM models on WiFi to conserve mobile data
- Clear unused models if storage becomes limited
- Monitor system performance when running multiple browser models
Troubleshooting
WebLLM model won't download: Ensure sufficient disk space and stable internet connection.
WebLLM models running slowly: Enable GPU acceleration in your browser:
- Chrome/Edge: Visit `chrome://flags/#enable-webgpu` and enable WebGPU
- Firefox: Visit `about:config` and set `dom.webgpu.enabled` to `true`
AI-ML API errors: Verify your API key is correctly entered in settings.
OpenRouter API errors: Check your API key and account balance at openrouter.ai.
Browser crashes with WebLLM: Switch to a smaller model, or close other tabs and applications to free up memory.
Security & Privacy
Cloud Providers (AI-ML API, OpenRouter)
- Data processed on secure cloud infrastructure with HTTPS encryption
- Check provider-specific privacy policies for data retention details
- Requires active internet connection for all operations
WebLLM (Browser-Based)
- Complete local processing — data never leaves your device
- Models run in browser sandbox environment
- No internet communication after initial model download
- Ideal for sensitive documents, personal notes, and confidential code
Deprecated: Ollama/LM Studio Integration
In January 2025, we removed support for Ollama and LM Studio integration. Browser security policies prevent HTTPS production sites from accessing localhost HTTP endpoints, making these integrations unreliable in deployment.
Migration paths:
- For privacy-first local AI → Use WebLLM models (Qwen3 4B recommended)
- For superior performance → Use AI-ML API or OpenRouter
- Your workflow remains the same — only the underlying AI provider changes
Conclusion
Whether you prioritize privacy with WebLLM, cutting-edge capabilities with AI-ML API, or multi-provider flexibility with OpenRouter, every tool on Tools-Online.app gives you access to powerful AI assistance directly in your browser.
Start with the AI-powered Notepad to try text generation, explore the Mermaid diagram editor for AI-generated diagrams, or use any code editor with AI coding assistance.
