Novee just launched an autonomous AI agent that pen tests your LLM-powered applications, and based on their Cursor vulnerability disclosure alone, these people know what they're doing.
The company introduced AI Red Teaming for LLM Applications at RSAC 2026 in San Francisco this week. It's currently in beta, with live demos running at booth S-0262. The pitch: point an AI agent at your chatbot, copilot, or autonomous agent workflow, and it will chain together multi-step attacks to find vulnerabilities that static scanners and one-shot prompt testing miss.
What it actually does
Novee's agent doesn't just fire payloads at your app and check for errors. Before running any tests, it gathers context on the target: reads documentation, queries APIs, and builds an internal model of how the application works. Then it tailors its attack strategies to that specific environment.
Gon Chalamish, co-founder and CPO at Novee, described an example where the agent maps an application's role-based access control structure, then probes whether a lower-privileged user can access data restricted to a higher-privileged one. That's not a canned test. That's reconnaissance followed by targeted exploitation, which is how actual attackers operate.
The agent tests for prompt injection, indirect prompt injection, jailbreak attempts, data exfiltration, tool abuse, and agent manipulation. It works with apps built on OpenAI, Anthropic, or open-source models. And it plugs into CI/CD pipelines, so you can run security tests as part of your standard deployment process.
Why traditional pen testing falls short here
Here's the core problem Novee is solving: most enterprise security teams test each application once a year, maybe less. According to Novee, a security team managing 500 applications simply cannot keep pace with manual testing. Meanwhile, LLM applications change continuously. Model updates alter behavior even when no code is deployed. Your annual pen test is stale before the report is finished.
Human pen testers face two constraints. First, they're expensive and scarce. Second, most of them specialize in web and infrastructure testing. Prompt injection and indirect prompt injection aren't part of the standard pen tester toolkit. The attack surface is fundamentally different.
"Attackers are already adapting their techniques for AI systems," Chalamish said. "Security teams need a way to test those systems the same way adversaries attack them."
Ido Geffen, CEO and co-founder, put it more bluntly: "The window between vulnerability and exploitation can shrink to minutes. Defending against that requires continuous testing, not periodic assessments."



