Why AI Safety Matters
AI agents are no longer theoretical. They are clicking buttons, sending emails, filling forms, and controlling desktop applications on your behalf. This is an extraordinary leap in productivity — and an extraordinary leap in risk. When you give an AI the ability to take real-world actions, the consequences of mistakes are real too. A misclassified email gets sent to the wrong person. A form filler leaks your Social Security number. A desktop automation script closes an unsaved document. These are not hypothetical scenarios. They are the daily reality of using AI agents without proper safety guardrails.
The AI industry has largely focused on content safety — preventing language models from generating harmful text, hate speech, or misinformation. But action safety is a fundamentally different problem. When an AI agent can interact with your computer, your browser, your email, and your files, the question is not just "what did the model say?" but "what did the model do?" Content moderation APIs do not help when an AI agent is about to paste your credit card number into a public form.
This is why we built Sentinel. Nemo is designed to be an autonomous AI agent that handles real tasks on your computer. That power demands a safety system that is just as sophisticated as the agent itself — one that understands the difference between reading a file and deleting it, between drafting an email and sending it, between filling a form field and submitting the entire form.
What Is the Sentinel Safety Layer?
Sentinel is Nemo's built-in safety system that sits between the AI agent's decisions and their execution. Every single action the agent attempts — every tool call, every keystroke, every API request — passes through Sentinel before it reaches the outside world. Think of it as a security checkpoint that inspects every package leaving the building.
At its core, Sentinel uses SmolLM2-360M, a lightweight language model with 360 million parameters. This model runs entirely on your local machine. It never sends data to external servers. It never phones home. The model was chosen specifically because it is small enough to run on consumer hardware with minimal latency (typically 50–150 milliseconds per screening) while being capable enough to detect sensitive patterns in text and understand the intent behind actions.
Sentinel operates on three principles:
- Every action is screened. There are no exceptions, no fast paths, no trusted actions. Even read-only operations are logged. Write operations are screened for PII and compared against the skill's configured consent level.
- Safety is local. The safety model runs on your hardware. The audit log is stored on your machine. Your data never leaves your computer for safety screening purposes. This is a fundamental architectural decision — we believe that a safety layer that requires sending your data to the cloud defeats its own purpose.
- Defaults are strict. Every new skill starts with the most restrictive safety policies. PII types like SSN and credit card numbers are blocked by default. Write actions default to draft consent (user must approve). Dangerous key combinations are hardcoded as blocked. You can relax these policies, but you have to do it intentionally.
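A minimal sketch of the screening flow these three principles describe (all names here are hypothetical illustrations, not Nemo's internal API):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

# Strict defaults: these PII types are blocked unless a skill opts out.
BLOCKED_BY_DEFAULT = {"ssn", "credit_card"}

def screen_action(tool: str, payload: str, pii_types_found: set[str],
                  consent_level: str) -> Verdict:
    """Every action passes through here before execution; nothing is exempt."""
    # Blocked PII types stop the action outright.
    blocked = pii_types_found & BLOCKED_BY_DEFAULT
    if blocked:
        return Verdict(False, f"PII blocked: {', '.join(sorted(blocked))}")
    # Write operations default to draft consent: queued for user approval.
    if consent_level == "draft":
        return Verdict(False, "queued for user approval")
    return Verdict(True)
```

The essential property is that the function is total: every tool call produces a verdict, and only an explicit `allowed=True` lets the action reach the outside world.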
PII Detection and Protection
Personally identifiable information is the most common category of data that AI agents accidentally expose. An email triage skill reads your inbox and encounters a message containing your bank account number. A form filler reads your profile and has access to your Social Security number. A document summarizer processes a contract with confidential financial terms. In each case, the AI model sees this sensitive data and might include it in its output, pass it to a tool, or send it somewhere it should not go.
Sentinel uses a combination of pattern matching and the SmolLM2-360M model to detect PII in every action's inputs and outputs. It recognizes the following sensitive data types:
- Social Security numbers — 9-digit patterns with or without dashes (e.g., 123-45-6789)
- Credit card numbers — 13–19 digit patterns with Luhn checksum validation
- API keys and tokens — high-entropy alphanumeric strings matching known key formats (AWS, Stripe, GitHub, etc.)
- Email addresses — standard email pattern matching
- Phone numbers — domestic and international format detection
- Physical addresses — street address pattern recognition
- IP addresses — IPv4 and IPv6 pattern matching
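The Luhn checksum mentioned for card numbers is a standard public algorithm, which keeps ordinary 16-digit runs (order numbers, tracking codes) from triggering false positives. A sketch, with deliberately simplified regexes that stand in for Sentinel's actual patterns:

```python
import re

# Simplified illustrative patterns; the real detectors are more thorough.
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
CARD_RE = re.compile(r"\b\d{13,19}\b")

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    # A digit run only counts as a card number if its checksum passes.
    return [m for m in CARD_RE.findall(text) if luhn_valid(m)]
```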
When PII is detected, Sentinel does not simply block the action. Instead, it applies the PII policy configured for that specific skill. There are four policy levels, each appropriate for different situations:
- Block — The action is stopped entirely. The agent receives an error message explaining that PII was detected and the action was prevented. This is the default for SSN and credit card numbers in most skills.
- Redact — The PII is replaced with a placeholder before the action proceeds. For example, a document summary that contained "Contact John at john@example.com" would become "Contact John at [REDACTED]". The action still completes, but the sensitive data never reaches the destination.
- Mask — The PII is partially hidden while preserving enough information for context. A credit card number becomes "****-****-****-4242". A phone number becomes "(***) ***-5678". This is useful when you need to reference sensitive data without fully exposing it.
- Pass — The PII is allowed through without modification. This policy exists because some skills legitimately need to handle sensitive data. An email composer must be able to include email addresses. A form filler needs to enter phone numbers. The pass policy is applied selectively to specific PII types that a skill requires for its core function.
The key insight in Sentinel's PII system is that policies are per-skill, not global. The email composer skill has a pass policy for email addresses (because it needs to send emails) but a block policy for SSN and credit card numbers. The form filler skill has a pass policy for phone numbers and addresses (because it fills form fields) but a redact policy for API keys. This granularity ensures that each skill gets exactly the access it needs and nothing more.
The Consent Model Explained
PII detection handles what data leaves your machine. The consent model handles what actions the AI agent can take. Together, they form Sentinel's two core safety mechanisms.
Nemo's consent model has three tiers:
- Execute — The action runs automatically without asking for your approval. This level is reserved for actions that are inherently safe: reading a file, listing directory contents, querying a local database, taking a screenshot. These are read-only operations that cannot modify, delete, or send data. The agent proceeds immediately, and you see the result in the chat.
- Draft — The action is prepared and queued for your review before execution. You see exactly what the agent wants to do — the email it wants to send, the form it wants to submit, the file it wants to write — and you explicitly approve or reject it. This is the default for any write operation that affects external systems: sending emails, submitting forms, posting to social media, modifying files.
- Observe — The action is logged and displayed but never executed. This mode is useful during initial setup when you want to see what the agent would do without any risk. It is also used for administrative auditing and for skills that are in testing. The agent proceeds as if the action completed, but nothing actually happens in the real world.
The consent level is configured per-skill and can even vary per-tool within a skill. For example, the email triage skill uses execute consent for reading and categorizing emails (safe, read-only operations) but draft consent for moving emails to folders or marking them as read (write operations that modify your inbox state). The form filler skill uses execute consent for scanning a web page (reading) but draft consent for submitting a form (writing to an external system).
This tiered approach exists because asking for approval on every action would make Nemo unusable. If you had to confirm every file read, every screenshot, every page scan, the agent would require more interaction than doing the task yourself. The consent model strikes a balance: safe actions flow automatically while risky actions pause for your review. The result is an agent that feels autonomous for routine work but always defers to you for consequential decisions.
Nemo also applies a concept we call READ_ONLY_PREFIXES. Tool names that start with certain prefixes (such as "browser.read", "vault.read", "desktop.screenshot") are automatically classified as read-only regardless of the skill's default consent setting. This ensures that read operations are never accidentally configured as draft or observe, which would break the agent's ability to gather information.
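The prefix classification above amounts to a short-circuit ahead of the skill's configured default. A sketch, assuming prefix names like those mentioned (the function name is hypothetical):

```python
# Tools whose names start with these prefixes are always read-only.
READ_ONLY_PREFIXES = ("browser.read", "vault.read", "desktop.screenshot")

def consent_for(tool_name: str, skill_default: str) -> str:
    """Read-only tools run at 'execute' regardless of the skill's default."""
    if tool_name.startswith(READ_ONLY_PREFIXES):  # startswith accepts a tuple
        return "execute"
    return skill_default
```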
Dangerous Command Blocking
Desktop automation gives Nemo the ability to control your computer through keyboard shortcuts, mouse clicks, and application interactions. This power is what makes Nemo genuinely useful for automating desktop workflows. It is also the most dangerous capability Nemo has.
Consider what happens if an AI model decides, for whatever reason, to press Alt+F4. On Windows, that closes the active application. If that application has unsaved work, it is gone. Or consider Ctrl+Alt+Delete, which on Windows opens the security screen and could lead to system lock or shutdown. Or Win+L, which locks the workstation. These are not actions that any AI agent should ever perform autonomously.
Sentinel maintains a DANGEROUS_HOTKEYS frozenset — a hardcoded, immutable list of key combinations that are blocked at the system level. No configuration change, no skill policy, no user override can enable these combinations. They are:
- Alt+F4 — Closes the active application
- Ctrl+Alt+Delete — Opens Windows security screen
- Win+L — Locks the workstation
- Ctrl+W — Closes the active tab or window
- Alt+Tab rapid sequences — Could be used to switch to unintended applications
- Ctrl+Shift+Delete — Clears browser data in most browsers
- Win+R — Opens Run dialog, potential for arbitrary command execution
The decision to hardcode these blocks rather than make them configurable was deliberate. There is no legitimate automation use case where an AI agent needs to press Ctrl+Alt+Delete. Making it configurable would only create risk with no benefit.
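The hardcoded blocklist approach can be sketched as follows (the set contents mirror the list above; the check function and its placement are illustrative, not Nemo's actual code):

```python
# Hardcoded and immutable: no configuration path can modify a frozenset.
DANGEROUS_HOTKEYS: frozenset[tuple[str, ...]] = frozenset({
    ("alt", "f4"),
    ("ctrl", "alt", "delete"),
    ("win", "l"),
    ("ctrl", "w"),
    ("ctrl", "shift", "delete"),
    ("win", "r"),
})

def check_hotkey(keys: list[str]) -> None:
    """Raise before the combination ever reaches the keyboard simulator."""
    combo = tuple(k.lower() for k in keys)
    if combo in DANGEROUS_HOTKEYS:
        raise PermissionError(
            f"Blocked: {'+'.join(keys)} is a restricted key combination.")
```

Using an exception rather than a boolean means a caller cannot forget to check the result; a blocked hotkey simply never executes.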
Beyond hotkey blocking, Sentinel also enforces velocity limits on write actions in desktop automation. Click, type, drag, and hotkey actions are rate-limited to prevent the agent from executing too many write operations in a short window. This protects against scenarios where the agent enters a loop and starts clicking or typing at machine speed, potentially causing cascading damage before you can intervene.
Encrypted Audit Trail
Every action the AI agent takes — whether it was executed, drafted, observed, or blocked — is recorded in an append-only encrypted audit log stored locally on your machine. This log cannot be modified or deleted by the agent, by any skill, or by any external process. It is a permanent, tamper-resistant record of everything Nemo has done.
The audit trail records:
- Timestamp — When the action was attempted, to the millisecond
- Skill ID — Which skill initiated the action
- Tool name — The specific tool function that was called
- Action parameters — What arguments were passed to the tool (with PII redacted in the log itself)
- Consent decision — Whether the action was executed, approved by the user (draft), observed, or blocked
- Sentinel verdict — What Sentinel's safety screening found (PII types detected, risk level assessed)
- Result summary — Whether the action succeeded or failed, and a brief description of the outcome
- Token usage — How many tokens the LLM used for this decision, enabling cost tracking per action
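One standard way to make an append-only log tamper-evident is to chain each record to its predecessor's hash, so that altering any entry breaks every hash after it. A sketch of a record with the fields above (field names are illustrative, and the encryption layer is omitted for brevity):

```python
import hashlib
import json
import time

def append_audit_record(log: list[dict], *, skill_id: str, tool: str,
                        consent: str, verdict: str, result: str,
                        tokens: int, params: dict) -> dict:
    """Append a tamper-evident record: each entry hashes its predecessor."""
    record = {
        "ts_ms": int(time.time() * 1000),   # millisecond timestamp
        "skill_id": skill_id,
        "tool": tool,
        "params": params,                   # PII is redacted upstream
        "consent": consent,
        "verdict": verdict,
        "result": result,
        "tokens": tokens,
        "prev_hash": log[-1]["hash"] if log else "0" * 64,
    }
    body = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(body).hexdigest()
    log.append(record)
    return record
```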
Why does this matter? Accountability. If Nemo sends an email you did not authorize, the audit trail proves exactly what happened: which skill was active, what the agent's reasoning was, what Sentinel's screening found, and whether the consent system was properly invoked. If a form filler enters incorrect data, you can trace back to see exactly what the agent saw, what it decided, and what it typed.
The audit trail is also the foundation for Nemo's Collective Intelligence system. When you opt in, anonymized patterns from your audit trail (with all PII stripped) contribute to a shared knowledge base that helps all Nemo agents learn from common errors and platform quirks. The anonymization is thorough: SSN patterns, credit card numbers, API keys, file paths, IP addresses, and email addresses are all stripped before any data leaves your machine.
How Sentinel Differs from Other Safety Approaches
Most AI platforms either have no safety layer at all or rely on cloud-based content moderation. Neither approach is adequate for AI agents that take real-world actions.
OpenAI Moderation API (cloud-based content safety)
OpenAI's Moderation API is designed to classify text for harmful content categories: hate speech, violence, self-harm, sexual content. It is excellent at what it does, but it was designed for chatbot conversations, not for AI agent actions. It does not understand the difference between "read this file" and "delete this file." It does not detect SSN or credit card patterns. It does not enforce consent levels. And critically, it requires sending all your data to OpenAI's servers for screening, which defeats the purpose of local-first privacy.
Zapier and Make (no safety layer)
Traditional automation platforms like Zapier and Make have no AI safety layer because they are not AI agents. They execute predefined workflows exactly as designed. The safety model is that you, the human, verified the workflow before enabling it. This works for static workflows but completely breaks down when you introduce AI decision-making. If an AI model is choosing which tools to call and what parameters to pass, the "human verified the workflow" assumption no longer holds. Zapier added AI features (natural language workflow creation) without adding safety guardrails for AI-selected actions.
n8n and open-source alternatives (no built-in safety)
Open-source automation tools like n8n, Activepieces, and Node-RED have no built-in AI safety layer. Some offer basic error handling and retry logic, but none screen actions for PII, enforce consent models, or maintain encrypted audit trails. When these platforms add AI nodes (n8n has LLM nodes), the AI's output flows directly into the next workflow step without any safety screening.
Sentinel is purpose-built for the AI agent paradigm. It understands that an AI model is making decisions about what actions to take, and it interposes itself between those decisions and their execution. This is a fundamentally different approach from content moderation (which screens text after generation) and workflow validation (which checks a static workflow before deployment).
Real-World Examples
Abstract safety principles are best understood through concrete scenarios. Here are three real situations where Sentinel protects you:
Scenario 1: Draft email containing a Social Security number
You ask Nemo to compose a reply to an email from your accountant. The original email thread contains your SSN. The AI model, trying to be helpful, includes the SSN in the draft reply because it was referenced in the conversation. Sentinel's PII scanner detects the SSN pattern in the outgoing email body. Because the email composer skill has a block policy for SSN, the action is stopped entirely. You see a notification: "Blocked: SSN detected in draft email body. The email was not sent." You can review the draft, remove the SSN, and ask Nemo to try again.
Scenario 2: Desktop automation attempting Alt+F4
You ask Nemo to close a specific application that is not responding. The AI model decides to use the Alt+F4 keyboard shortcut to close it. Sentinel's DANGEROUS_HOTKEYS check catches the combination before it reaches the keyboard simulator. The action is blocked with a message: "Blocked: Alt+F4 is a restricted key combination." The agent then falls back to alternative approaches, such as using pywinauto's window management functions to close the application gracefully, or suggesting that you manually end the task.
Scenario 3: Form filler exposing credit card data
You ask Nemo to fill a government form using your saved profile. Your profile contains your credit card number for payment forms. The government form has a text field that the AI model mistakenly identifies as a payment field. It attempts to enter your credit card number. Sentinel's PII scanner detects the credit card pattern in the tool call's parameters. Because the form filler skill uses a draft policy for credit card numbers in payment fields (and blocks them everywhere else), the action is paused for your review rather than executed. You see the pending action: "Form filler wants to enter ****-****-****-4242 into field 'Additional Information'. Approve or reject?" You reject it, and the agent skips that field.
In all three cases, the user's data was protected without any manual configuration. Sentinel's strict defaults caught the problem before any damage occurred. This is the core design philosophy: safety should work out of the box, not require setup.
Customizing Safety Policies
While Sentinel's defaults are strict, Nemo is designed for power users who may need to adjust safety policies for specific workflows. Every skill's safety configuration can be customized through its skill.json file or through the Nemo settings interface.
Per-skill PII policies
Each skill defines a pii_policy object that maps PII types to actions. You can modify these policies to match your comfort level and use case. For example, if you are using a form filler skill exclusively on internal company forms where no sensitive data is involved, you could relax the PII policy to mask rather than block. However, we strongly recommend keeping block as the default for SSN and credit card numbers in all skills.
Consent level adjustments
The consent_defaults object in each skill's configuration defines which actions require approval. You can change a skill's write actions from draft to execute if you fully trust its behavior after extensive use. Conversely, you can tighten a skill's read actions from execute to draft if you want to see every piece of information the agent accesses. The flexibility is there, but again, the defaults are strict for good reason.
Budget templates
Each skill can define a budget_template that limits the total number of actions, API calls, or tokens the skill can consume in a single task. This prevents runaway agent behavior. If a skill enters a loop and starts making hundreds of API calls, the budget cap stops it. Budget limits are enforced by Sentinel independently of the agent's decision-making, so even a confused AI model cannot exceed them.
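Taken together, the three customization surfaces look roughly like this. The dictionary below is a Python mirror of the kind of per-skill configuration described above; the field names follow the terms used in this section but are illustrative, not Nemo's actual skill.json schema:

```python
# Illustrative per-skill configuration (not the real skill.json schema).
SKILL_CONFIG = {
    "pii_policy": {"ssn": "block", "credit_card": "block",
                   "email": "pass", "api_key": "redact"},
    "consent_defaults": {"read": "execute", "write": "draft"},
    "budget_template": {"max_actions": 100, "max_api_calls": 50,
                        "max_tokens": 200_000},
}

class BudgetExceeded(Exception):
    pass

def charge(usage: dict, budget: dict, kind: str, amount: int = 1) -> None:
    """Budget caps are enforced by the safety layer, independently of the
    agent's decisions, so a confused model cannot exceed them."""
    usage[kind] = usage.get(kind, 0) + amount
    if usage[kind] > budget[f"max_{kind}"]:
        raise BudgetExceeded(f"{kind} budget exhausted")
```

The key design point is that `charge` is called by the screening pipeline, not by the agent, so exceeding a budget raises regardless of what the model wants to do next.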
The customization options are powerful, but they are deliberately secondary to the defaults. Nemo ships safe. You can make it less restrictive. You cannot make it unsafe — the hardcoded DANGEROUS_HOTKEYS, the encrypted audit trail, and the core screening pipeline cannot be disabled.
The Future of AI Safety
Sentinel represents the first generation of local AI safety for autonomous agents. As the AI agent ecosystem matures, safety systems will need to evolve alongside it. Here is where we see the future heading:
Better local safety models
SmolLM2-360M is capable but limited by its size. As model architectures improve and hardware gets faster, future Sentinel versions will use more sophisticated local models that can understand context better, catch more subtle PII patterns, and predict the downstream consequences of actions. The goal is a safety model that understands not just "this action contains a credit card number" but "this sequence of three actions, taken together, would expose confidential financial data."
Trust scoring through Collective Intelligence
Nemo's Collective Intelligence system already assigns trust scores to AI agents based on their behavior. Agents that consistently trigger safety blocks or produce rejected drafts receive lower trust scores. In the future, Sentinel will use trust scores to dynamically adjust safety thresholds. A high-trust agent that has completed thousands of tasks without incident might earn relaxed screening for routine operations, while a new or low-trust agent receives more rigorous checks. This is similar to how credit scoring works — earned trust leads to earned autonomy.
Cross-agent safety coordination
As AI agents become more common, they will increasingly interact with each other. Your Nemo agent might delegate a subtask to a marketplace skill created by a third party. Sentinel will need to enforce safety boundaries across these agent interactions, ensuring that a trusted skill cannot be used as a conduit to bypass safety controls through a less trusted one.
The fundamental principle will not change: every action is screened, safety is local, and defaults are strict. How those principles are implemented will get more sophisticated over time, but the commitment to user safety and data privacy will remain the foundation of everything Nemo does.
An AI agent without safety guardrails is a tool that can hurt you. An AI agent with Sentinel is a tool you can trust. The difference is not the AI's capability — it is the system's commitment to keeping you in control.