AI Integration Guide 2026: Complete Model Setup & Best Practices
Every tool on Tools-Online.app includes AI assistance for writing, coding, diagram generation, and content analysis. This guide covers the three AI integration options available in 2026, how to set each one up, and which models work best for different tasks.
How to Use AI Features
The AI workflow is consistent across all tools — Notepad, code editors, Mermaid diagrams, and more.
Step 1: Click the settings icon at the bottom left of your browser window. This opens the AI configuration panel.

Step 2: Click the Model dropdown to see available options, organized by provider:
- AI-ML API — Cloud models (Claude, GPT, Gemini, image/video generation)
- OpenRouter — 100+ models from multiple providers via single API key
- WebLLM — Privacy-first models running entirely in your browser
Step 3: Select your model and enter an API key if required. WebLLM models need no API key.
What You Need to Know Before Starting
Hardware Requirements
- Cloud Models (AI-ML API, OpenRouter): No local hardware constraints — just stable internet
- WebLLM Browser Models: Minimum 16 GB RAM (32 GB recommended). Models range from 0.9 GB to 5.7 GB in size. Processing runs locally on your CPU/GPU via WebAssembly and WebGPU
Costs
- AI-ML API: Pay-per-use pricing. GPT-4.1 Nano is the most cost-effective for daily tasks
- OpenRouter: Pay-per-use with many free models available (Llama 3.3 70B, DeepSeek V3, Qwen3)
- WebLLM: Completely free after download — no API costs
Privacy
- Cloud models: Data processed on provider servers with HTTPS encryption
- WebLLM: Complete local processing — data never leaves your device
AI-ML API Models (Recommended for Most Users)
Our primary cloud provider offering state-of-the-art AI models through a unified API.
Text & Reasoning Models
- GPT-4.1 Nano — Top pick for daily tasks. Efficient text generation and analysis ($ — most cost-effective)
- Claude 4.0 Sonnet/Opus — Advanced reasoning and writing ($$ Sonnet / $$$$ Opus)
- o4-mini — Specialized reasoning model ($)
- Gemini 2.5 Flash Preview — Multimodal with vision support ($$)
- Qwen 2.5 Coder 32B — Specialized coding assistance ($$$)
- Llama 3.1 70B — Open-source high-performance model ($$$)
Multimodal Models
- Llama 3.2 90B/11B Vision — Image understanding and analysis ($$ 11B / $$$ 90B)
- Claude 4.0 — Vision capabilities for document analysis ($$ Sonnet / $$$$ Opus)
Image & Video Generation
- FLUX Schnell — Fast image generation ($)
- Stable Diffusion 3.5 Large — High-quality image creation ($$)
- Minimax Video (video-01) — Professional video generation ($$$)
- Google Veo3 — State-of-the-art video creation ($$$$)
The system automatically routes requests to appropriate specialized models based on task type (text, image, code, multimodal).
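Under the hood, requests to cloud providers of this kind typically follow the OpenAI chat completions schema. The Python sketch below assembles such a request; note that the base URL and model identifier here are illustrative assumptions, not confirmed AI-ML API endpoints — the tools on this site handle this wiring for you once your key is saved.

```python
import json

# Hypothetical endpoint -- check the AI-ML API dashboard for the real base URL.
AIML_CHAT_URL = "https://api.aimlapi.com/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat completion request for a cloud model."""
    return {
        "url": AIML_CHAT_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }),
    }

request = build_chat_request("YOUR_API_KEY", "gpt-4.1-nano", "Summarize this note.")
print(request["url"])
```

The same request shape works for text, code, and multimodal models; only the `model` field and message content change, which is what makes automatic routing by task type possible.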
OpenRouter Models (Multi-Provider Access)
OpenRouter provides a unified gateway to 100+ AI models from multiple providers — all accessible through a single API key. This is the fastest way to experiment with diverse models including free options.
Top Models on OpenRouter
| Model | Provider | Context / Parameters | Strengths |
| --- | --- | --- | --- |
| Claude 4 Sonnet | Anthropic | 200K tokens | Hybrid reasoning, chain of thought, agents |
| Gemini 2.0 Flash | Google | 1M tokens | Low-latency SEO, summarization |
| Gemini 2.5 Pro | Google | 1M tokens, thinking mode | Deep reasoning, coding, science |
| GPT-4o-mini | OpenAI | 128K tokens | Vision support, highly cost-effective |
| DeepSeek V3 0324 | DeepSeek | 685B MoE, 163K tokens | Free, open source, top logic performance |
Top Free Models on OpenRouter
| Model | Provider | Parameters / Context | Use Case |
| --- | --- | --- | --- |
| Llama 4 "Maverick" | Meta | 400B MoE (17B active), 128K | Multimodal, vision + text tasks |
| Llama 3.3 70B Instruct | Meta | 70B, 131K tokens | Chat, reasoning, multilingual |
| DeepSeek V3 0324 | DeepSeek | 685B MoE, 163K tokens | Research, logic, general purpose |
| Qwen3-30B-A3B | Alibaba | 30.5B (3.3B active), 131K | Fast and intelligent dialogue + code |
| Mistral Small 3 | Mistral | 24B, 32K tokens | High-quality open model, low latency |
Model rankings update frequently, so check the OpenRouter model listings for current options.
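OpenRouter's API follows the OpenAI chat completions schema, served from openrouter.ai/api/v1, and free models carry a `:free` suffix in their identifier. As a rough illustration (the tools on this site do this for you), the Python sketch below prepares a request without sending it:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def make_openrouter_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style chat completion request
    for OpenRouter. Free-tier models use a ':free' suffix in the model id."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it: urllib.request.urlopen(make_openrouter_request(...))
req = make_openrouter_request("YOUR_API_KEY", "meta-llama/llama-3.3-70b-instruct:free", "Hello")
print(req.full_url)
```

Because every model sits behind the same endpoint and key, switching providers is just a change to the `model` string.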
WebLLM Browser Models (Privacy-First)
Privacy-first AI processing that runs entirely in your browser. After the initial download, these models work completely offline.
Ultra-Lightweight Options
- Llama 3.2 1B (~0.9 GB) — Instruction-following with enhanced efficiency
- Gemma 2 2B (~1.9 GB) — Google's optimized model with improved memory usage
Recommended Models (Best Performance-to-Size Ratio)
- Qwen3 4B (~3.2 GB) — Top pick. Excellent multilingual capabilities with strong reasoning
- Phi-3.5 Mini (~3.7 GB) — Best for coding. Microsoft's specialized programming model
- Llama 3.2 3B (~2.3 GB) — Enhanced reasoning with better memory efficiency
High-Performance Options
- Qwen3 8B (~5.7 GB) — High-performance multilingual model with coding expertise
- Hermes-3 Llama 3.1 8B (~4.9 GB) — Function-calling and instruction-following
Models are cached permanently after download until manually cleared.
Step-by-Step Setup Instructions
Setting Up AI-ML API
- Navigate to any AI-enabled tool (Notepad, Code Editors, Mermaid, etc.)
- Click the AI settings icon in the bottom left
- Select an AI-ML model from the dropdown menu
- Enter your AI-ML API key when prompted
- Start using AI assistance immediately
Get your API key: AI-ML API Dashboard
Setting Up OpenRouter
- Visit OpenRouter and create an account
- Generate an API key from your dashboard
- In your tool's AI settings, select your preferred OpenRouter model
- Enter your API key in the provided text box
- Start using any of 100+ available models
Documentation: OpenRouter Docs
Setting Up WebLLM Models
- Enable GPU acceleration in your browser for optimal performance:
  - Chrome/Edge: Visit `chrome://flags/#enable-webgpu` and enable WebGPU
  - Firefox: Visit `about:config` and set `dom.webgpu.enabled` to `true`
- Open AI settings and select a WebLLM model
- Choose based on your system resources:
  - 8–16 GB RAM: Llama 3.2 1B or Gemma 2 2B
  - 16–24 GB RAM: Qwen3 4B (recommended), Llama 3.2 3B, or Phi-3.5 Mini (coding)
  - 32 GB+ RAM: Any model including Qwen3 8B or Hermes-3 8B
- First selection triggers automatic download
- Wait for model initialization (one-time process)
- Enjoy offline AI assistance
Pro Tip: Download models on fast WiFi — they're cached permanently until manually cleared.
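The RAM guidance above can be expressed as a small lookup. This is a hypothetical helper, not part of the WebLLM API; the model names, download sizes, and RAM thresholds come from this guide.

```python
# (model name, download size in GB, minimum recommended system RAM in GB)
WEBLLM_MODELS = [
    ("Llama 3.2 1B", 0.9, 8),
    ("Gemma 2 2B", 1.9, 8),
    ("Llama 3.2 3B", 2.3, 16),
    ("Qwen3 4B", 3.2, 16),
    ("Phi-3.5 Mini", 3.7, 16),
    ("Hermes-3 Llama 3.1 8B", 4.9, 32),
    ("Qwen3 8B", 5.7, 32),
]

def models_for_ram(system_ram_gb: int) -> list[str]:
    """Return the models this guide recommends for a given amount of RAM."""
    return [name for name, _size, min_ram in WEBLLM_MODELS
            if system_ram_gb >= min_ram]

print(models_for_ram(16))
```

On a 16 GB machine this yields everything up through Phi-3.5 Mini, matching the tiers in step 2 above.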
Best Practices
Model Selection Strategy
| Task Type | Recommended Model | Provider | Cost |
| --- | --- | --- | --- |
| Daily tasks | GPT-4.1 Nano | AI-ML API | $ |
| Complex reasoning | Claude 4.0 Sonnet | AI-ML API | $$ |
| Coding assistance | Phi-3.5 Mini or Qwen 2.5 Coder | WebLLM / AI-ML | Free / $$$ |
| Privacy-sensitive | Qwen3 4B | WebLLM | Free |
| Experimentation | Llama 3.3 70B or DeepSeek V3 | OpenRouter | Free |
| Image generation | FLUX Schnell | AI-ML API | $ |
| Video generation | Google Veo3 | AI-ML API | $$$$ |
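The table above amounts to a task-to-model mapping, which can be sketched as a simple lookup. The task labels here are invented shorthand for illustration; where the table lists two options, the first is used.

```python
# Default picks per task, mirroring the selection table in this guide.
MODEL_BY_TASK = {
    "daily": ("GPT-4.1 Nano", "AI-ML API"),
    "reasoning": ("Claude 4.0 Sonnet", "AI-ML API"),
    "coding": ("Phi-3.5 Mini", "WebLLM"),
    "privacy": ("Qwen3 4B", "WebLLM"),
    "experimentation": ("Llama 3.3 70B", "OpenRouter"),
    "image": ("FLUX Schnell", "AI-ML API"),
    "video": ("Google Veo3", "AI-ML API"),
}

def pick_model(task: str) -> tuple[str, str]:
    """Return (model, provider) for a task, falling back to the daily driver."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["daily"])

print(pick_model("privacy"))
```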
Resource Management
- Download WebLLM models on WiFi to conserve mobile data
- Clear unused models if storage becomes limited
- Monitor system performance when running multiple browser models
Troubleshooting
WebLLM model won't download: Ensure sufficient disk space and stable internet connection.
WebLLM models running slowly: Enable GPU acceleration in your browser:
- Chrome/Edge: Visit `chrome://flags/#enable-webgpu` and enable WebGPU
- Firefox: Visit `about:config` and set `dom.webgpu.enabled` to `true`
AI-ML API errors: Verify your API key is correctly entered in settings.
OpenRouter API errors: Check your API key and account balance at openrouter.ai.
Browser crashes with WebLLM: Switch to a smaller model, or close other tabs and applications to free up memory.
Security & Privacy
Cloud Providers (AI-ML API, OpenRouter)
- Data processed on secure cloud infrastructure with HTTPS encryption
- Check provider-specific privacy policies for data retention details
- Requires active internet connection for all operations
WebLLM (Browser-Based)
- Complete local processing — data never leaves your device
- Models run in browser sandbox environment
- No internet communication after initial model download
- Ideal for sensitive documents, personal notes, and confidential code
Deprecated: Ollama/LM Studio Integration
In January 2025, we removed support for Ollama and LM Studio integration. Browser security policies prevent HTTPS production sites from accessing localhost HTTP endpoints, making these integrations unreliable in deployment.
Migration paths:
- For privacy-first local AI → Use WebLLM models (Qwen3 4B recommended)
- For superior performance → Use AI-ML API or OpenRouter
- Your workflow remains the same — only the underlying AI provider changes
Conclusion
Whether you prioritize privacy with WebLLM, cutting-edge capabilities with AI-ML API, or multi-provider flexibility with OpenRouter, every tool on Tools-Online.app gives you access to powerful AI assistance directly in your browser.
Start with the AI-powered Notepad to try text generation, explore the Mermaid diagram editor for AI-generated diagrams, or use any code editor with AI coding assistance.
