Why Model Choice Matters
Most AI applications lock you into a single model provider. ChatGPT uses OpenAI. Copilot uses OpenAI. Gemini uses Google. You get whatever model the company chose, at whatever price they set, with whatever privacy policy they enforce. If the model is too expensive or too slow, or if it sends your data to servers you do not trust, your only option is to switch to an entirely different product.
Nemo takes a fundamentally different approach. It is a model-agnostic AI agent that supports five LLM providers out of the box. You choose which model powers your agent. You can change it at any time. You can even use different providers for different tasks simultaneously. This is not a theoretical feature — it is a practical necessity because different models genuinely excel at different things.
Claude is exceptional at nuanced writing and safety-conscious reasoning. GPT-4 has the broadest general capabilities and strong vision support. Ollama models run entirely on your hardware for complete privacy. OpenRouter gives you access to over 100 models, including free tiers for experimentation. And custom endpoints let enterprises use their own self-hosted models behind corporate firewalls.
The model you choose affects three things: quality (how well the agent performs tasks), cost (how much you pay per task), and privacy (where your data goes). This guide helps you make an informed decision across all three dimensions.
Anthropic (Claude)
Anthropic's Claude is our recommended model for users who prioritize quality. Claude is known for nuanced, thoughtful responses that follow instructions precisely. It excels at understanding complex multi-step tasks, producing well-structured output, and behaving predictably with safety-related instructions.
Strengths
- Instruction following: Claude is exceptionally good at following detailed system prompts, which is critical for skill-based AI agents. When Nemo's system prompt says "use the batch tool instead of calling tools one at a time," Claude consistently follows that instruction. This reduces LLM roundtrips and makes tasks faster.
- Nuanced writing: For email composition, document drafting, and any task involving natural language output, Claude produces text that sounds human, matches the appropriate tone, and avoids the "AI-generated" feel that plagues many models.
- Safety awareness: Claude is trained to be cautious with sensitive operations. When a task involves potentially destructive actions, Claude tends to ask for confirmation rather than proceeding blindly. This complements Sentinel's safety layer.
- Long context: Claude supports 200K token context windows, allowing Nemo to process large documents, long email threads, and complex multi-turn conversations without truncation.
Setup
To use Claude with Nemo, you need an Anthropic API key. Sign up at console.anthropic.com, create an API key, and enter it in Nemo's Settings under LLM Provider. The key is stored in Nemo's encrypted vault — never in plain text, never sent anywhere except Anthropic's API endpoint.
Best for
Email triage and composition, document writing, complex multi-step tasks that require precise instruction following, any task where output quality matters more than cost.
OpenAI (GPT-4)
OpenAI's GPT-4 family is the most widely used LLM ecosystem. GPT-4 and its variants (GPT-4 Turbo, GPT-4o) offer broad capabilities across virtually every task category. If you are already paying for OpenAI API access, using GPT-4 with Nemo is a natural choice.
Strengths
- Broad capabilities: GPT-4 performs well across a wide range of tasks without notable weak spots. From code generation to creative writing to data analysis, it delivers consistently good results.
- Vision support: GPT-4o includes multimodal vision capabilities, allowing Nemo to process screenshots and images. This is particularly useful for desktop automation tasks where the agent needs to understand what is on screen.
- Function calling: OpenAI pioneered the function calling format that Nemo's tool system is based on. GPT-4 has excellent support for structured tool calls, reducing parsing errors and improving reliability.
- Ecosystem: OpenAI has the largest model ecosystem, with frequent updates, new model variants, and extensive documentation. Compatibility issues are rare.
Setup
Sign up at platform.openai.com, generate an API key, and enter it in Nemo's Settings. OpenAI requires a paid account with API credits. There is no permanent free tier for API access, though new accounts typically receive a small starter credit.
Best for
General-purpose tasks, desktop automation with screenshot analysis (vision), users already in the OpenAI ecosystem, tasks that benefit from the latest model updates.
Ollama (Fully Local)
Ollama is an open-source tool that lets you run large language models entirely on your local hardware. No API keys, no cloud services, no usage fees, no data leaving your machine. It is the only provider option that makes Nemo truly cost-free and completely private.
How it works
Ollama downloads and manages open-source models on your computer. It provides a local API endpoint (typically http://localhost:11434) that is compatible with the same interface Nemo uses for cloud providers. From Nemo's perspective, Ollama is just another LLM provider — the fact that it is running on your local GPU or CPU is transparent.
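To make the "just another provider" point concrete, here is a minimal sketch of what an OpenAI-style chat request to a local Ollama server looks like. The endpoint path and payload shape follow the OpenAI chat completions format that Ollama mirrors; the helper names (build_chat_request, ask) are illustrative, not part of Nemo or Ollama.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"

def build_chat_request(base_url: str, model: str, prompt: str):
    """Return the URL and JSON payload for a chat completion call."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

def ask(base_url: str, model: str, prompt: str) -> str:
    """POST the request to the local server (requires Ollama running)."""
    url, payload = build_chat_request(base_url, model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swap the base URL for a cloud provider's and the same request shape works, which is exactly why a model-agnostic client can treat Ollama like any other backend.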
Recommended models
- Llama 3 (8B) — Meta's latest open-source model. Excellent general performance. Requires about 5GB of RAM. Good balance of speed and capability for most Nemo skills.
- Llama 3 (70B) — The full-size version. Near-GPT-4 quality on many benchmarks. Requires 40GB+ RAM or a GPU with 40GB+ VRAM. For users with powerful hardware.
- Mistral (7B) — French AI lab's model. Fast, efficient, and strong at structured tasks. Great for form filling and data extraction. Requires about 4GB RAM.
- Phi-3 (3.8B) — Microsoft's compact model. Surprisingly capable for its size. Runs on low-end hardware. Good for simple tasks like file management and basic email categorization.
- CodeLlama (7B/13B) — Specialized for code-related tasks. If you use Nemo for development workflows, this is the best local option.
Performance on consumer hardware
Running LLMs locally is more practical than most people expect. On a modern laptop with 16GB RAM, Llama 3 8B generates about 20–30 tokens per second on CPU. With a discrete GPU (even a mid-range NVIDIA RTX 3060), speeds jump to 40–80 tokens per second. For comparison, cloud APIs typically deliver 30–60 tokens per second. The experience is comparable for most tasks.
The main limitation is the initial model download (4–40GB depending on the model) and the first-load time (10–30 seconds to load the model into memory). Once loaded, the model stays in memory and subsequent queries are fast. Ollama handles model management, caching, and memory optimization automatically.
The privacy advantage
When you use Ollama, your data never leaves your computer. Not for inference, not for safety screening (Sentinel already runs locally), not for anything. Your emails, documents, form data, and browsing activity stay on your hardware. No API provider sees your data. No cloud service stores your queries. This is the strongest privacy guarantee any AI system can offer.
Setup
Download Ollama from ollama.com, install it, and run ollama pull llama3 to download your first model. In Nemo's Settings, select Ollama as your provider and enter the local URL (http://localhost:11434). No API key needed.
Best for
Privacy-focused users, offline operation, zero-cost AI, users with decent hardware (16GB+ RAM), document summarization, desktop automation, any task where data sensitivity is paramount.
OpenRouter
OpenRouter is an API aggregator that provides access to over 100 models from multiple providers through a single, unified API. It uses the OpenAI-compatible API format, making it seamless to use with Nemo. OpenRouter is the best option for users who want to experiment with different models or access free-tier models without signing up for multiple providers.
How it works
OpenRouter acts as a proxy between Nemo and various model providers. You send your request to OpenRouter, which forwards it to the appropriate model provider and returns the response. The API format is identical to OpenAI's, so Nemo's existing integration works without modification. OpenRouter requires two custom headers (HTTP-Referer and X-Title) for attribution, which Nemo sends automatically.
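A sketch of the headers such a request carries. The bearer token and the two attribution headers (HTTP-Referer, X-Title) come from OpenRouter's API conventions; the helper name and the example app values are illustrative, not Nemo's actual internals.

```python
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

def openrouter_headers(api_key: str, app_url: str, app_title: str) -> dict:
    """Headers for an OpenRouter chat completions request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "HTTP-Referer": app_url,    # attribution: which app made the request
        "X-Title": app_title,       # attribution: human-readable app name
        "Content-Type": "application/json",
    }
```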
Free tier models
OpenRouter hosts several models with free tiers, including variants of Llama, Mistral, and other open-source models. These free tiers have rate limits (typically 10–20 requests per minute) but no per-token costs. Nemo can discover available free models automatically using the OpenRouter API, so you always know what is available without checking the website.
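As a sketch of how that discovery can work: OpenRouter publishes its catalog via a model-listing endpoint, and free variants are conventionally marked with a ":free" suffix on the model id. The filtering helper and the sample data below are illustrative, not Nemo's actual code.

```python
def free_models(models: list[dict]) -> list[str]:
    """Return ids of models published with the ':free' suffix."""
    return [m["id"] for m in models if m["id"].endswith(":free")]
```

Feeding this the parsed JSON from the model-listing endpoint yields the currently available free-tier ids without ever checking the website by hand.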
Best for
Experimentation with multiple models, budget-conscious users who want to find the cheapest capable model, accessing newer or niche models that are not available directly through major providers, fallback provider when your primary provider is down.
Custom Endpoints
For enterprises and advanced users who host their own models, Nemo supports custom API endpoints. Any server that exposes an OpenAI-compatible chat completions API can be used as a provider. This includes self-hosted deployments of vLLM, text-generation-inference, LocalAI, and LiteLLM.
Use cases
- Corporate compliance: Companies that cannot send data to external APIs can host models on internal servers and point Nemo at those endpoints.
- Fine-tuned models: If you have fine-tuned a model for your specific domain (legal, medical, financial), you can run it on your infrastructure and use it with Nemo.
- GPU clusters: Organizations with dedicated GPU infrastructure can run the largest models (70B+, 180B) at speeds that exceed consumer hardware.
- Air-gapped environments: For maximum security, Nemo can operate in fully air-gapped environments where the custom endpoint is on the same local network with no internet connectivity at all.
Setup
In Nemo's Settings, select Custom as your provider. Enter the base URL of your OpenAI-compatible endpoint (e.g., http://your-server:8000/v1). Enter any required API key or leave blank if your endpoint does not require authentication. Nemo will test the connection and confirm that the endpoint responds correctly.
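An illustrative connection check in the spirit of the test Nemo runs when you save a custom provider: probe the model-listing route that OpenAI-compatible servers such as vLLM, LocalAI, and LiteLLM expose, and require a valid JSON response. Function names and the health-check strategy are assumptions for illustration.

```python
import json
import urllib.request

def models_probe_url(base_url: str) -> str:
    """Build the URL of the model-listing route used as a health check."""
    return base_url.rstrip("/") + "/models"

def check_endpoint(base_url: str, api_key: str = "") -> bool:
    """True if the endpoint answers the probe with valid JSON."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    req = urllib.request.Request(models_probe_url(base_url), headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            json.load(resp)          # must parse as JSON
            return resp.status == 200
    except Exception:
        return False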
Smart Routing Explained
Using a top-tier model like Claude or GPT-4 for every task is like driving a Ferrari to the grocery store. It works, but it is unnecessarily expensive. Many tasks that Nemo performs — reading a file listing, categorizing a simple email, extracting text from a document — do not require the reasoning capabilities of a frontier model. A smaller, cheaper model handles them just as well.
Smart routing is Nemo's automatic model selection system. When you configure it (in Settings under LLM Provider > Routing), you specify a primary model and a secondary model. Nemo then analyzes the complexity of each task and routes it to the appropriate model:
- Simple tasks — Routed to the secondary (cheaper) model. Examples: listing files, reading a web page, categorizing an email as urgent or not, extracting structured data from a form.
- Complex tasks — Routed to the primary (more capable) model. Examples: composing a nuanced email reply, filling a multi-field form with derived values, multi-step desktop automation sequences, summarizing a complex legal document.
The complexity classification uses a lightweight analysis of the task description, the number of tools available, and the expected number of tool calls. It does not require a separate LLM call — the classification itself is rule-based and adds no latency.
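A hypothetical sketch of such a rule-based classifier: a few cheap signals (task length, tool count, expected tool calls), no LLM call, no added latency. The thresholds and keyword list are invented for illustration; Nemo's actual rules are not documented here.

```python
# Keywords that hint at tasks needing the more capable model (illustrative).
COMPLEX_HINTS = ("compose", "summarize", "draft", "automate", "multi-step")

def classify(task: str, num_tools: int, expected_calls: int) -> str:
    """Return 'primary' for complex tasks, 'secondary' for simple ones."""
    score = 0
    if len(task.split()) > 30:      # long task descriptions tend to be complex
        score += 1
    if num_tools > 5:               # many tools in scope
        score += 1
    if expected_calls > 3:          # long tool-call chains
        score += 1
    if any(hint in task.lower() for hint in COMPLEX_HINTS):
        score += 1
    return "primary" if score >= 2 else "secondary"
```

A quick check against the examples above: "list files in Downloads" trips no rule and routes to the secondary model, while "compose a nuanced reply to this legal thread" with several tools in play routes to the primary.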
Smart routing also supports a judge model for the Sentinel safety layer. Instead of using your expensive primary model to run safety checks, you can assign a cheaper model specifically for Sentinel's screening decisions. Since safety screening is a simpler classification task (safe vs. unsafe, PII vs. no PII), a smaller model handles it effectively. This can reduce the total cost of Sentinel's overhead to near zero.
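The three roles can be pictured as a small configuration like the one below. The key names and model ids are examples only, not Nemo's actual settings schema.

```python
# Illustrative routing configuration: a capable primary, a cheap local
# secondary, and a small judge model reserved for Sentinel's screening.
routing_config = {
    "primary":   {"provider": "anthropic", "model": "claude-3-opus"},
    "secondary": {"provider": "ollama",    "model": "llama3"},
    "judge":     {"provider": "ollama",    "model": "phi3"},
}

def model_for(role: str) -> str:
    """Resolve a routing role to a provider/model identifier."""
    cfg = routing_config[role]
    return f'{cfg["provider"]}/{cfg["model"]}'
```

With a local judge like this, every Sentinel check runs at zero marginal cost while complex user-facing tasks still reach the frontier model.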
Typical savings
In our testing, smart routing reduces LLM costs by 40–60% compared to using a single frontier model for everything. For a user spending $10/month on Claude, enabling smart routing with Ollama as the secondary model can bring costs down to $4–6/month with no noticeable quality difference in task completion.
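The arithmetic behind that estimate can be worked through under the stated assumption that the secondary model (local Ollama) costs nothing: the bill simply scales by the fraction of calls that still hit the paid primary model.

```python
def blended_cost(monthly_cost: float, fraction_to_secondary: float,
                 secondary_cost_ratio: float = 0.0) -> float:
    """Monthly cost after routing a fraction of calls to the secondary.

    secondary_cost_ratio is the secondary's per-call cost relative to
    the primary's (0.0 for a free local model).
    """
    primary_share = 1.0 - fraction_to_secondary
    return monthly_cost * (primary_share
                           + fraction_to_secondary * secondary_cost_ratio)
```

Routing 40 to 60 percent of calls to a free secondary turns a $10/month bill into $6 or $4, matching the range above.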
Cost Comparison Table
Here is what each provider costs for typical Nemo usage patterns, based on February 2026 pricing: