SILMARIL HACKED

Problem
Attacks on AI are compounding in complexity faster than defenses can adapt, and the gap is widening.
Six months ago, a prompt injection was a single hidden instruction to fool the model. If the guardrail or the model caught the pattern, the attack failed.
Today's attacks are chains that manipulate agents. A poisoned input, such as a calendar invite, triggers agent behavior that exfiltrates data, escalates privileges, and causes real damage. Guardrails are too static to block these chains.
This complexity used to be theoretical because no human would manually sequence these chains at scale. Now attackers use AI to create thousands of multi-step attack paths and converge on the ones that work, causing $30B in damages in 2025 alone.
ShareLeak and PipeLeak turned public form fields into agent hijack paths across Microsoft Copilot Studio and Salesforce Agentforce, exfiltrating SharePoint and CRM data through legitimate Outlook and email actions. CVE-2026-21520 · CVSS 7.5
CurseChain showed hidden README comments in Cursor can steal developer SSH keys, then poison future unrelated projects with regenerated exfiltration code even when the attacker ships no malicious code.
Forcepoint and Google found indirect prompt injection deployed across live websites, with payloads for API-key theft, financial fraud, data destruction, and agent denial of service. Google measured a 32% rise in malicious prompt-injection content.
Solution
Silmaril wraps your inference calls to evaluate whether an execution sequence is heading toward a harmful outcome.
Existing guardrails filter inputs. Silmaril's multihead classifier inspects user intent, application context, and execution states together to detect harmful outcomes before they materialize. The model retrains continuously on exploits our threat hunting agents discover in your environment. Your defense gets stronger before attackers even have a chance to probe it.
Five lines of code, zero overhead. Silmaril operates at the application layer and supports every major agentic SDK and inference provider. It is available as a managed SaaS for teams that want to move quickly, or as a self-hosted deployment for environments with strict data residency and compliance requirements. Blocking is configurable by workflow node type.
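To make the integration model concrete, here is a minimal sketch of wrapping an inference call behind a firewall check. Silmaril's SDK is not public, so every name here (`classify`, `Verdict`, `BlockedExecution`, `guarded_call`) is a hypothetical illustration, and the classifier is stubbed with a toy pattern check so the example is self-contained.

```python
# Hypothetical sketch: guard an inference call with a pre-execution verdict.
# Names are illustrative assumptions, not the real Silmaril API; the real
# classifier evaluates the full execution sequence, not a keyword match.
from dataclasses import dataclass


@dataclass
class Verdict:
    blocked: bool
    reason: str = ""


def classify(sequence: list[str]) -> Verdict:
    # Stub: a real deployment would call the hosted multihead classifier.
    if any("IGNORE PREVIOUS INSTRUCTIONS" in step.upper() for step in sequence):
        return Verdict(blocked=True, reason="injection pattern in execution sequence")
    return Verdict(blocked=False)


class BlockedExecution(Exception):
    """Raised to the orchestration layer when the firewall blocks."""


def guarded_call(model_fn, sequence: list[str]) -> str:
    verdict = classify(sequence)
    if verdict.blocked:
        raise BlockedExecution(verdict.reason)
    return model_fn(sequence[-1])


# Usage: a benign call passes through; a poisoned sequence raises.
echo = lambda prompt: f"model says: {prompt}"
print(guarded_call(echo, ["summarize my inbox"]))
```

The shape matters more than the stub: the wrapper sits at the application layer, so swapping inference providers does not change the guard.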
Approach
Finding vulnerabilities before attackers do
Autonomous agents probe your system through the UI, mapping trust boundaries and attacking from first principles. They chain AI risks such as prompt injection, tool abuse, context poisoning, and more. Silmaril has found chains resulting in self-replicating worms and cross-user remote code execution.
Blocking attacks in real time
The classifier model in the firewall is based on a ModernBERT variant with Flash Attention. It is trained on execution traces from autonomous threat research runs against your application, so it learns your specific decision boundaries, tool calls, and data flows. At inference time, it evaluates the current execution sequence (user intent, application context, and accumulated state) and outputs a binary pass/block decision with a p90 latency of 20ms.
When the classifier blocks, it throws an error to the orchestration layer, allowing you to determine threat handling.
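The error-on-block contract can be sketched as follows. This is a hypothetical illustration, not the published SDK: `BlockedExecution` and `run_step` are assumed names, and the firewall decision is stubbed. The point is that a block surfaces as an exception, so the orchestrator, not the firewall, decides whether to halt, retry, or escalate.

```python
# Hypothetical sketch: orchestration-layer handling of a firewall block.
# The firewall raises rather than silently dropping, so threat handling
# stays under the application's control.
class BlockedExecution(Exception):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


def run_step(step: str) -> str:
    # Stub for a guarded inference call; raises when the firewall blocks.
    if "exfiltrate" in step:
        raise BlockedExecution("sequence heading toward data exfiltration")
    return f"ok: {step}"


def orchestrate(steps: list[str]) -> list[str]:
    results = []
    for step in steps:
        try:
            results.append(run_step(step))
        except BlockedExecution as exc:
            # Orchestrator's policy choice: record the reason and halt the chain.
            results.append(f"blocked: {exc.reason}")
            break
    return results
```

Halting the whole chain on a block is one policy; an orchestrator could instead reroute to a human review node, which is where per-node blocking configuration comes in.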
Turning every attack into a deployed defense
Every attack discovered generates synthetic training data. The firewall retrains and deploys updated weights automatically, from discovery to active defense in under an hour. When a novel technique is blocked at one deployment, it is anonymized and propagated to every other firewall deployment.
Performance
Accuracy Metrics
| System | Precision | Recall | F1 | Latency |
|---|---|---|---|---|
| Silmaril Firewall | 0.932 | 0.979 | 0.955 | 20ms |
| Lakera Guard | 0.699 | 0.807 | 0.749 | 114ms |
| BrowseSafe | 0.909 | 0.626 | 0.741 | 102ms |
| GPT Safeguard | 1.000 | 0.261 | 0.413 | 537ms |
| Model Armor | 0.778 | 0.265 | 0.395 | 220ms |
Threats Blocked
15 critical vulnerabilities disclosed to OpenAI, Anthropic, Google, and Microsoft in two weeks.
#1 AI-native productivity app
Silmaril found the exploits, retrained the firewall, and prevented $68M in damages, spanning:
- Self-replicating worm propagation via document poisoning
- Agent-to-agent supply chain compromise
- Sandbox credential theft leading to cross-user remote code execution
- Zero-click data exfiltration through calendar injection
- Silent document and message harvesting via email injection
#1 AI-native analytics platform
Silmaril found the exploits, retrained the firewall, and prevented $20M in damages, spanning:
- Entity injection via feedback fields into agent context
- Unauthorized workflow execution through tool-manipulation payloads
OpenAI
Silmaril hacked the ChatGPT agent by chaining a prompt injection into escalated root access, then moved laterally across containers to access internal source code and secrets. The exploit took under 5 minutes to execute and under 5 hours to ideate with our agents.
Microsoft
Silmaril found critical prompt injection vulnerabilities using email as the entry vector, achieving data exfiltration through SSRF in Copilot. Microsoft patched the vulnerability for millions of users.