Agentic AI Use Cases in Product Security
Intro:
This is the blogpost version of the talk: “Agentic AI in AppSec.”
This blogpost started as an internal lunch-and-learn presentation. We repurposed it into a lightning talk when invited to sponsor an OWASP meetup, and finally developed it into a full talk for the excellent AI Security Collective meetup.
The interest in the topic is strong enough to warrant a blogpost version.
My social media feeds this year are full of the phrase: “2025 is the year of Agentic AI”.
It is a very catchy marketing term; it makes me read the article more often than not.
But in the end, it’s not clear what Agentic AI is and, most importantly, what you can do with it.
What is Agentic AI
Imagine a software system that doesn’t just execute commands but takes proactive steps to achieve a goal—this is Agentic AI.
We understand this as a system that has the capability to non-deterministically select between executing one or more external functions based on input.
There is an important distinction here: Agentic AI uses AI Agents, but not every AI Agent is Agentic AI.
For example: most leading chatbot implementations allow calling multiple external functions based on input (e.g. generate an image, search the web, run Python). That’s Agentic AI: the system has the agency to choose how it affects its environment based on input.
On the contrary, the OpenCRE RAG chatbot is deterministic code that goes to an LLM to summarize a resource and provide an answer. That is an agent that uses AI to perform some tasks, but its output is mostly deterministic: if you ask what XSS is twice, you should get similar answers.
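To make the distinction concrete, here is a minimal Python sketch of the two; the call_llm helper and the example tools are placeholders, not any specific vendor API:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (OpenAI, Anthropic, local model, ...)."""
    raise NotImplementedError

# --- Agentic: the model chooses which external function to run ---
TOOLS = {
    "search_web": lambda q: f"search results for {q}",
    "run_python": lambda code: f"output of {code}",
}

def agentic_answer(user_input: str) -> str:
    # The model decides, non-deterministically, which tool (if any) to call.
    choice = call_llm(
        f"Pick one tool from {list(TOOLS)} to answer: {user_input}. "
        "Reply with the tool name only, or 'none'."
    ).strip()
    if choice in TOOLS:
        return TOOLS[choice](user_input)
    return call_llm(user_input)

# --- Non-agentic agent: a deterministic pipeline that merely uses an LLM ---
def rag_answer(user_input: str, retrieve) -> str:
    # Fixed flow: retrieve a resource, then summarise it. The model makes no choices.
    resource = retrieve(user_input)
    return call_llm(f"Summarise this resource to answer '{user_input}':\n{resource}")
```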
General Use Cases
Looking at YC cohorts, the latest funding updates and Product Hunt launches, there are several small and large players actively using agentic workflows or networks of agents to serve several use cases. Some of them are the following.
Knowledge Augmentation
There are several companies (and some open source projects) trying to build AI knowledge aggregators the agentic way. Imagine a chatbot that receives the question: “Who is the developer on-call for our Login functionality? Tell them I found a bug that is XYZ.” The chatbot has access to the on-call portal (so it can find the on-call teams), the developer portal (so it can find the on-call dev) and, as a backup, the employee portal, so it can find Slack handles, bug-tracking systems or emails. It can then try to ping a human with a summary, or reply with the relevant link which you can use to file a bug.
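As a hypothetical sketch, the flow could look like the code below; none of these portal lookups are real clients, they stand in for your on-call, developer and employee systems. An agentic setup would let the model choose and order these calls instead of hard-coding them.

```python
def lookup_oncall_team(service: str) -> str:
    return "identity-team"            # would query the on-call portal

def lookup_oncall_dev(team: str) -> str:
    return "alice"                    # would query the developer portal

def find_contact(person: str) -> dict:
    # would query the employee portal for Slack, email and bug tracker links
    return {"slack": "@alice", "bug_tracker": "https://bugs.example/new"}

def handle_bug_report(service: str, summary: str) -> str:
    # A deterministic version of the flow; the agentic version gives the model
    # these three functions as tools and lets it plan the lookups itself.
    team = lookup_oncall_team(service)
    dev = lookup_oncall_dev(team)
    contact = find_contact(dev)
    return f"Pinging {contact['slack']} with: {summary} (or file at {contact['bug_tracker']})"
```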
Marketing
If done right this will be a game changer; prepare for a few years of it not being done right.
Don’t you hate it when you receive a cold email that is some marketeer’s generic pitch?
Several people thought: you know what this means? Humans need more spam but tailored to them!
This is an advertising story as old as the internet; where agents come in is the ability to tailor not to a generic user profile, but to an individual user.
Prepare for an onslaught of vibe-coded data scrapers that read your entire BlueSky or Reddit history and send it to the cheapest model possible with the prompt: “Which of the products you have in your knowledge base would this person be interested in? Generate a short marketing video and a text that match this person’s interests.”
The advertising equivalent of the annoying club-promoter.
Customer Service
This is a pretty great use case that, surprisingly, we’ve already seen in the wild.
When you talk to a customer service chatbot, you really don’t need a human if you only want an invoice. The same goes for any well-defined workflow that just involves interacting with multiple systems ("what is my latest bill", "you overcharged me", "I didn’t sign up for this service", etc.).
But it’s also important to know when a human is needed because the conversation with the AI agent is going nowhere.
Product Security Use Cases
Now that we understand what Agentic AI is likely being used for, what is being done in security?
Continuous Regulatory Compliance
Both OWASP and ISACA forums have teams working on CRC.
What we’ve seen implementation-wise is systems that crawl Confluence, Jira and other internal systems, summarise pages and link them dynamically to popular frameworks. They then access CNAPPs, ASPMs, etc. to gather evidence and flag policy violations.
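As a rough sketch, assuming hypothetical summarise() and map_to_controls() helpers (each wrapping an LLM call) and illustrative control IDs, that loop could look like this:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    control_id: str       # e.g. a framework control such as "AC-2"
    source: str           # Confluence/Jira page, CNAPP finding, ASPM result...
    satisfied: bool

def summarise(page_text: str) -> str:
    """Placeholder for an LLM summarisation call."""
    return page_text[:200]

def map_to_controls(summary: str) -> list[str]:
    """Placeholder: ask a model which framework controls a page relates to."""
    return ["AC-2"]

def check_compliance(pages: dict[str, str], scanner_findings: dict[str, bool]) -> list[Evidence]:
    evidence = []
    for source, text in pages.items():
        for control in map_to_controls(summarise(text)):
            # Evidence from security tooling (CNAPP/ASPM) decides whether the
            # documented policy is actually enforced.
            evidence.append(Evidence(control, source, scanner_findings.get(control, False)))
    return [e for e in evidence if not e.satisfied]   # policy violations to flag
```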
Smithy supports the evidence-gathering and policy-violation part, while also helping index existing policy systems via external plugins. Talk to us for a demo.
Tooling Configuration
We are actively experimenting with this.
Security tools are useful, but boy, their command line arguments and configurations don’t make any sense!
“It’d be awesome if we could train a model to understand the docs of all the tools we support and then have an agent run until we get an end to end test to pass.” — Team pub night, February 2025
It turns out this is very possible, and it helps produce customised configs that reduce noise and increase detection.
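A minimal sketch of that loop; propose_config, run_tool and e2e_test_passes are hypothetical helpers wrapping the model (which has read the tool’s docs) and your pipeline:

```python
def propose_config(tool: str, docs: str, last_error: str | None) -> str:
    """Ask the model for a candidate config, optionally learning from the last failure."""
    raise NotImplementedError

def run_tool(tool: str, config: str) -> tuple[bool, str]:
    """Run the tool with the candidate config; return (success, output_or_error)."""
    raise NotImplementedError

def e2e_test_passes(output: str) -> bool:
    raise NotImplementedError

def configure(tool: str, docs: str, max_attempts: int = 5) -> str | None:
    error = None
    for _ in range(max_attempts):
        config = propose_config(tool, docs, error)
        ok, output = run_tool(tool, config)
        if ok and e2e_test_passes(output):
            return config          # a config that produced a passing end-to-end run
        error = output             # feed the failure back into the next attempt
    return None
```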
Product Security workflow creation
AKA: what to do after automated tool config
That’s a natural next step after a single tool has been configured. Knowing the average performance of the tool, you can then add post-processing modules dynamically to improve it.
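As a sketch, with made-up module names and thresholds, dynamically chosen post-processing could look something like this:

```python
# Each post-processor takes a list of finding dicts and returns a filtered list.
POST_PROCESSORS = {
    "dedupe": lambda findings: list({f["fingerprint"]: f for f in findings}.values()),
    "drop_test_code": lambda findings: [f for f in findings if "/test/" not in f["path"]],
}

def build_pipeline(tool_stats: dict) -> list:
    # Pick post-processing steps based on the tool's measured behaviour.
    steps = []
    if tool_stats.get("duplicate_rate", 0) > 0.2:
        steps.append(POST_PROCESSORS["dedupe"])
    if tool_stats.get("test_code_noise", 0) > 0.1:
        steps.append(POST_PROCESSORS["drop_test_code"])
    return steps

def run_pipeline(findings: list, steps: list) -> list:
    for step in steps:
        findings = step(findings)
    return findings
```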
Remediation (with testing)
Pick any big AppSec player: they’re either working on this or already have it.
However, most of the time it’s either a model over-fitted to the problem space (awesome approach btw), or a knowledge graph with an LLM component.
The next step here, and our bet is on GitHub solving this problem first, is an agentic workflow that fixes, runs tests and learns in a loop until either the fix is correct or the tests need adjusting.
This is likely to produce more accurate remediation that also has a better chance of not breaking functionality.
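A sketch of that fix-test-learn loop, assuming hypothetical propose_fix, apply_patch and run_tests helpers:

```python
def propose_fix(finding: dict, code: str, previous_failures: list[str]) -> str:
    raise NotImplementedError   # LLM call: return a candidate patch

def apply_patch(code: str, patch: str) -> str:
    raise NotImplementedError   # apply the patch to the codebase

def run_tests(code: str) -> tuple[bool, str]:
    raise NotImplementedError   # return (passed, failure_log)

def remediate(finding: dict, code: str, max_rounds: int = 3):
    failures: list[str] = []
    for _ in range(max_rounds):
        patch = propose_fix(finding, code, failures)
        candidate = apply_patch(code, patch)
        passed, log = run_tests(candidate)
        if passed:
            return candidate          # a fix that keeps the existing tests green
        failures.append(log)          # feed the failure back into the next round
    return None                       # escalate: maybe the tests themselves need adjusting
```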
False Positive Detection
A lightweight version of this can be done with a single LLM and prompting.
An agent network orchestrated by a single model with agentic capabilities could (see the sketch after this list):
- receive code & finding
- receive context of what the code and team are supposed to be doing
- generate a plan on how to validate the finding against context with multiple agents
- receive context validation, figure out the consensus based on weights
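A rough sketch of the weighted-consensus step; the agent names, weights and the ask_agent helper are placeholders:

```python
def ask_agent(agent: str, finding: dict, code: str, context: str) -> bool:
    """Placeholder: each agent returns True if it believes the finding is real."""
    raise NotImplementedError

AGENT_WEIGHTS = {
    "dataflow_checker": 0.5,
    "business_context_checker": 0.3,
    "exploitability_checker": 0.2,
}

def is_false_positive(finding: dict, code: str, context: str, threshold: float = 0.5) -> bool:
    # Weighted sum of the agents that think the finding is real.
    score = sum(
        weight
        for agent, weight in AGENT_WEIGHTS.items()
        if ask_agent(agent, finding, code, context)
    )
    # Below the threshold, not enough weighted agents believe the finding.
    return score < threshold
```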
False positives are the reason our entire enricher ecosystem exists; we are looking at this use case.
Here be dragons — Security Considerations
Building agentic systems, we have messed up in multiple ways. Here are our lessons, and what can happen if you don’t follow them.
Accuracy decreases with the number of available options.
An LLM is still a statistical model; it’s important to limit the number of available options to the problem domain. We faced this when experimenting with tooling configuration: if you want to configure e.g. Semgrep, you should really only give it access to the Semgrep-related functions.
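A sketch of what that scoping can look like; the tool and domain names are illustrative:

```python
ALL_TOOLS = {
    "semgrep_set_rules": lambda *a: None,
    "semgrep_set_paths": lambda *a: None,
    "zap_set_target": lambda *a: None,
    "trivy_set_scanners": lambda *a: None,
}

TOOLS_BY_DOMAIN = {
    "semgrep": ["semgrep_set_rules", "semgrep_set_paths"],
}

def tools_for(task: str) -> dict:
    # Only expose the functions relevant to this task, instead of the full registry.
    return {name: ALL_TOOLS[name] for name in TOOLS_BY_DOMAIN.get(task, [])}
```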
Permissions
An agentic system is a trust boundary. If the system needs to retrieve employee information, it should not have access to call the “delete users” method.
Funds Denial
This is our favourite one. Your AI use case is epic, until you run out of money.
We’ve seen this many times on social media with propaganda bots: if you message them enough, sometimes they run out of money and post error messages.
We faced this when we gave users an LLM-proxy component; we found out how easy it is to template MAX_TOKENS into a prompt 20k times in a row in a single workflow.
If you’re wondering why we removed the LLM component, that’s why (you know who you are).
A fix for this is delegation, long-term caching and pre-processing.
- Delegation: Not all prompts are equal; some don’t need the big, expensive model.
- Caching: A lot of the input is linguistically similar; if similar input arrives, you can likely serve a slightly modified cached response.
- Pre-Processing: Those 20k findings had a 95% false positive rate, on average. Even some lightweight detection would have reduced our LLM traffic significantly.
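All three mitigations can sit in one request path; here is a sketch with illustrative helpers (embed, call_model), model names and thresholds:

```python
CACHE: dict[str, str] = {}

def embed(text: str) -> str:
    # Placeholder for a real embedding; here we just normalise the text so that
    # linguistically identical prompts hit the cache.
    return " ".join(text.lower().split())

def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError   # the actual (metered) LLM call

def likely_false_positive(finding: dict) -> bool:
    return finding.get("severity") == "info"     # cheap pre-processing rule

def triage(finding: dict, prompt: str) -> str | None:
    if likely_false_positive(finding):
        return None                              # pre-processing: never hits an LLM
    key = embed(prompt)
    if key in CACHE:
        return CACHE[key]                        # caching: serve a previous answer
    model = "small-cheap-model" if len(prompt) < 500 else "big-expensive-model"
    CACHE[key] = call_model(model, prompt)       # delegation: route by prompt size
    return CACHE[key]
```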
Resources
No blog post is complete without awesome resources.
Here are some links to places we follow, are passionate about and/or that have awesome content:
- OWASP AI Exchange: written by the incredible Rob van der Veer, it’s the best resource for AI security.
- Chip Huyen’s post on Agents : Chip wrote the book on foundational AI models.
- Anthropic’s post on Agents
Smithy is a big user of AI, both in training and using models.
- We generate embeddings and do extensive NLP to match security tooling output to customer-internal company resources through our standards, training and knowledge base correlation enrichers.
- We map company resources to CREs using prompting and human in the loop validation.
- The team building Smithy are the same people who brought you the OpenCRE chatbot and the first/only open source cybersecurity RAG framework.
- Look at the use cases above to get a glimpse of what else we are currently experimenting with.