Agentic AI Use Cases in Product Security
Intro:
This is the blogpost version of the talk: “Agentic AI in AppSec.”
This blogpost started as an internal lunch-and-learn presentation. We repurposed it into a lightning talk when invited to sponsor an OWASP meetup, and finally developed it into a full talk for the excellent AI Security Collective meetup.
The interest in the topic is strong enough to warrant a blogpost version.
My social media feeds this year are full of the phrase: “2025 is the year of Agentic AI”.
It is a very catchy marketing term; it makes me read the article more often than not.
But in the end, it’s not clear what Agentic AI is and, most importantly, what you can do with it.
What is Agentic AI
Imagine a software system that doesn’t just execute commands but takes proactive steps to achieve a goal—this is Agentic AI.
We understand this as a system that has the capability to non-deterministically select between executing one or more external functions based on input.
There is an important distinction here: Agentic AI uses AI Agents, but not every AI Agent is Agentic AI.
For example: most leading chatbot implementations allow calling multiple external functions based on input (e.g. generate an image, search the web, run Python). That’s Agentic AI: the system has the agency to choose how it affects its environment based on input.
On the contrary, the OpenCRE RAG chatbot is deterministic code that goes to an LLM to summarize a resource and provide an answer. That is an agent that uses AI to perform some tasks, but its output is mostly deterministic: if you ask what XSS is twice, you should get similar answers.
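To make the distinction concrete, here is a minimal Python sketch of the two; the call_llm helper and the example tools are placeholders, not any specific vendor API:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (OpenAI, Anthropic, local model, ...)."""
    raise NotImplementedError

# --- Agentic: the model chooses which external function to run ---
TOOLS = {
    "search_web": lambda q: f"search results for {q}",
    "run_python": lambda code: f"output of {code}",
}

def agentic_answer(user_input: str) -> str:
    # The model decides, non-deterministically, which tool (if any) to call.
    choice = call_llm(
        f"Pick one tool from {list(TOOLS)} to answer: {user_input}. "
        "Reply with the tool name only, or 'none'."
    ).strip()
    if choice in TOOLS:
        return TOOLS[choice](user_input)
    return call_llm(user_input)

# --- Non-agentic agent: a deterministic pipeline that merely uses an LLM ---
def rag_answer(user_input: str, retrieve) -> str:
    # Fixed flow: retrieve a resource, then summarise it. The model makes no choices.
    resource = retrieve(user_input)
    return call_llm(f"Summarise this resource to answer '{user_input}':\n{resource}")
```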
General Use Cases
Looking at YC cohorts, the latest funding updates and Product Hunt launches, there are several small and large players actively using agentic workflows or networks of agents to serve several use cases. Some of them are the following.
Knowledge Augmentation
There are several companies (and some open source projects) trying to build AI knowledge aggregators the agentic way. Imagine a chatbot that receives the question: “Who is the developer on-call for our Login functionality? Tell them I found a bug that is XYZ.” The chatbot has access to the on-call portal (so it can find the on-call teams), the developer portal (so it can find the on-call dev) and, as a backup, the employee portal, so it can find Slack handles, bug-tracking systems or emails. It can then try to ping a human with a summary, or reply with the relevant link which you can use to file a bug.
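As a hypothetical sketch, the flow could look like the code below; none of these portal lookups are real clients, they stand in for your on-call, developer and employee systems. An agentic setup would let the model choose and order these calls instead of hard-coding them.

```python
def lookup_oncall_team(service: str) -> str:
    return "identity-team"            # would query the on-call portal

def lookup_oncall_dev(team: str) -> str:
    return "alice"                    # would query the developer portal

def find_contact(person: str) -> dict:
    # would query the employee portal for Slack, email and bug tracker links
    return {"slack": "@alice", "bug_tracker": "https://bugs.example/new"}

def handle_bug_report(service: str, summary: str) -> str:
    # A deterministic version of the flow; the agentic version gives the model
    # these three functions as tools and lets it plan the lookups itself.
    team = lookup_oncall_team(service)
    dev = lookup_oncall_dev(team)
    contact = find_contact(dev)
    return f"Pinging {contact['slack']} with: {summary} (or file at {contact['bug_tracker']})"
```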
Marketing
If done right this will be a game changer; prepare for a few years of it not being done right.
Don’t you hate it when you receive a cold email that is some marketeer’s generic pitch?
Several people thought: you know what this means? Humans need more spam but tailored to them!
This is an advertising story as old as the internet; where agents come in is the ability to tailor not to a generic user profile, but to an individual user.
Prepare for an onslaught of vibe-coded data scrapers that read your entire BlueSky or Reddit history and send it to the cheapest model possible with the prompt: “Which of the products you have in your knowledge base would this person be interested in? Generate a short marketing video and a text that match this person’s interests.”
The advertising equivalent of the annoying club-promoter.
Customer Service
This is a pretty great use case that, surprisingly, we’ve already seen in the wild.
When you talk to a customer service chatbot, you really don’t need a human if you only want an invoice. The same goes for any well-defined workflow that just involves interacting with multiple systems ("what is my latest bill", "you overcharged me", "I didn’t sign up for this service", etc.).
But it’s also important to know when a human is needed because the conversation with the AI agent is going nowhere.
Product Security Use Cases
Now that we understand what Agentic AI is likely being used for, what is being done in security?
Continuous Regulatory Compliance
Both OWASP and ISACA forums have teams working on CRC.
What we’ve seen implementation-wise is systems that crawl Confluence, Jira and other internal systems, summarise pages and link them dynamically to popular frameworks. They then access CNAPPs, ASPMs, etc. to gather evidence and flag policy violations.
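As a rough sketch, assuming hypothetical summarise() and map_to_controls() helpers (each wrapping an LLM call) and illustrative control IDs, that loop could look like this:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    control_id: str       # e.g. a framework control such as "AC-2"
    source: str           # Confluence/Jira page, CNAPP finding, ASPM result...
    satisfied: bool

def summarise(page_text: str) -> str:
    """Placeholder for an LLM summarisation call."""
    return page_text[:200]

def map_to_controls(summary: str) -> list[str]:
    """Placeholder: ask a model which framework controls a page relates to."""
    return ["AC-2"]

def check_compliance(pages: dict[str, str], scanner_findings: dict[str, bool]) -> list[Evidence]:
    evidence = []
    for source, text in pages.items():
        for control in map_to_controls(summarise(text)):
            # Evidence from security tooling (CNAPP/ASPM) decides whether the
            # documented policy is actually enforced.
            evidence.append(Evidence(control, source, scanner_findings.get(control, False)))
    return [e for e in evidence if not e.satisfied]   # policy violations to flag
```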
Smithy supports the evidence-gathering and policy-violation part, while also helping index existing policy systems via external plugins. Talk to us for a demo.
Tooling Configuration
We are actively experimenting with this.
Security tools are useful, but boy, their command line arguments and configurations don’t make any sense!
“It’d be awesome if we could train a model to understand the docs of all the tools we support and then have an agent run until we get an end to end test to pass.” — Team pub night, February 2025
It turns out this is very possible, and it helps produce customised configs that reduce noise and increase detection.
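A minimal sketch of that loop; propose_config, run_tool and e2e_test_passes are hypothetical helpers wrapping the model (which has read the tool’s docs) and your pipeline:

```python
def propose_config(tool: str, docs: str, last_error: str | None) -> str:
    """Ask the model for a candidate config, optionally learning from the last failure."""
    raise NotImplementedError

def run_tool(tool: str, config: str) -> tuple[bool, str]:
    """Run the tool with the candidate config; return (success, output_or_error)."""
    raise NotImplementedError

def e2e_test_passes(output: str) -> bool:
    raise NotImplementedError

def configure(tool: str, docs: str, max_attempts: int = 5) -> str | None:
    error = None
    for _ in range(max_attempts):
        config = propose_config(tool, docs, error)
        ok, output = run_tool(tool, config)
        if ok and e2e_test_passes(output):
            return config          # a config that produced a passing end-to-end run
        error = output             # feed the failure back into the next attempt
    return None
```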
Product Security workflow creation
AKA: what to do after automated tool config
That’s a natural next step after a single tool has been configured. Knowing the average performance of the tool, you can then add post-processing modules dynamically to improve it.
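As a sketch, with made-up module names and thresholds, dynamically chosen post-processing could look something like this:

```python
# Each post-processor takes a list of finding dicts and returns a filtered list.
POST_PROCESSORS = {
    "dedupe": lambda findings: list({f["fingerprint"]: f for f in findings}.values()),
    "drop_test_code": lambda findings: [f for f in findings if "/test/" not in f["path"]],
}

def build_pipeline(tool_stats: dict) -> list:
    # Pick post-processing steps based on the tool's measured behaviour.
    steps = []
    if tool_stats.get("duplicate_rate", 0) > 0.2:
        steps.append(POST_PROCESSORS["dedupe"])
    if tool_stats.get("test_code_noise", 0) > 0.1:
        steps.append(POST_PROCESSORS["drop_test_code"])
    return steps

def run_pipeline(findings: list, steps: list) -> list:
    for step in steps:
        findings = step(findings)
    return findings
```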
Remediation (with testing)
Pick any big AppSec player: they’re either working on this or already have it.
However, most of the time it’s either a model over-fitted to the problem space (awesome approach btw), or a knowledge graph with an LLM component.
The next step here, and our bet is on GitHub solving this problem first, is an agentic workflow that fixes, runs tests and learns in a loop until either the fix is correct or the tests need adjusting.
This is likely to produce more accurate remediation that also has a better chance of not breaking functionality.
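A sketch of that fix-test-learn loop, assuming hypothetical propose_fix, apply_patch and run_tests helpers:

```python
def propose_fix(finding: dict, code: str, previous_failures: list[str]) -> str:
    raise NotImplementedError   # LLM call: return a candidate patch

def apply_patch(code: str, patch: str) -> str:
    raise NotImplementedError   # apply the patch to the codebase

def run_tests(code: str) -> tuple[bool, str]:
    raise NotImplementedError   # return (passed, failure_log)

def remediate(finding: dict, code: str, max_rounds: int = 3):
    failures: list[str] = []
    for _ in range(max_rounds):
        patch = propose_fix(finding, code, failures)
        candidate = apply_patch(code, patch)
        passed, log = run_tests(candidate)
        if passed:
            return candidate          # a fix that keeps the existing tests green
        failures.append(log)          # feed the failure back into the next round
    return None                       # escalate: maybe the tests themselves need adjusting
```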
False Positive Detection
A lightweight version of this can be done with a single LLM and prompting.
An agent network orchestrated by a single model with agentic capabilities could (see the sketch after this list):
- receive code & finding
- receive context of what the code and team are supposed to be doing
- generate a plan on how to validate the finding against context with multiple agents
- receive context validation, figure out the consensus based on weights
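A rough sketch of the weighted-consensus step; the agent names, weights and the ask_agent helper are placeholders:

```python
def ask_agent(agent: str, finding: dict, code: str, context: str) -> bool:
    """Placeholder: each agent returns True if it believes the finding is real."""
    raise NotImplementedError

AGENT_WEIGHTS = {
    "dataflow_checker": 0.5,
    "business_context_checker": 0.3,
    "exploitability_checker": 0.2,
}

def is_false_positive(finding: dict, code: str, context: str, threshold: float = 0.5) -> bool:
    # Weighted sum of the agents that think the finding is real.
    score = sum(
        weight
        for agent, weight in AGENT_WEIGHTS.items()
        if ask_agent(agent, finding, code, context)
    )
    # Below the threshold, not enough weighted agents believe the finding.
    return score < threshold
```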
False positives are the reason our entire enricher ecosystem exists; we are looking at this use case.
Here be dragons — Security Considerations
Building agentic systems, we have messed up in multiple ways. Here are our lessons, and what can happen if you don’t follow them.
Accuracy decreases with the number of available options.
An LLM is still a statistical model; it’s important to limit the number of available options to the problem domain. We faced this when experimenting with tooling configuration: if you want to configure e.g. Semgrep, you should really only give it access to the Semgrep-related functions.
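A sketch of what that scoping can look like; the tool and domain names are illustrative:

```python
ALL_TOOLS = {
    "semgrep_set_rules": lambda *a: None,
    "semgrep_set_paths": lambda *a: None,
    "zap_set_target": lambda *a: None,
    "trivy_set_scanners": lambda *a: None,
}

TOOLS_BY_DOMAIN = {
    "semgrep": ["semgrep_set_rules", "semgrep_set_paths"],
}

def tools_for(task: str) -> dict:
    # Only expose the functions relevant to this task, instead of the full registry.
    return {name: ALL_TOOLS[name] for name in TOOLS_BY_DOMAIN.get(task, [])}
```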
Permissions
An agentic system is a trust boundary. If the system needs to retrieve employee information, it should not have access to call the “delete users” method.
Funds Denial
This is our favourite one. Your AI use case is epic, until you run out of money.
We’ve seen this many times on social media with propaganda bots: if you message them enough, sometimes they run out of money and post error messages.
We faced this when we gave users an LLM-proxy component; we found out how easy it is to template MAX_TOKENS into a prompt 20k times in a row in a single workflow.
If you’re wondering why we removed the LLM component, that’s why (you know who you are).
A fix for this is delegation, long-term caching and pre-processing.
- Delegation: Not all prompts are equal; some don’t need the big, expensive model.
- Caching: A lot of the input is linguistically similar; if similar input arrives, you can likely serve a slightly modified cached response.
- Pre-Processing: Those 20k findings had a 95% false positive rate, on average. Even some lightweight detection would have reduced our LLM traffic significantly.
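All three mitigations can sit in one request path; here is a sketch with illustrative helpers (embed, call_model), model names and thresholds:

```python
CACHE: dict[str, str] = {}

def embed(text: str) -> str:
    # Placeholder for a real embedding; here we just normalise the text so that
    # linguistically identical prompts hit the cache.
    return " ".join(text.lower().split())

def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError   # the actual (metered) LLM call

def likely_false_positive(finding: dict) -> bool:
    return finding.get("severity") == "info"     # cheap pre-processing rule

def triage(finding: dict, prompt: str) -> str | None:
    if likely_false_positive(finding):
        return None                              # pre-processing: never hits an LLM
    key = embed(prompt)
    if key in CACHE:
        return CACHE[key]                        # caching: serve a previous answer
    model = "small-cheap-model" if len(prompt) < 500 else "big-expensive-model"
    CACHE[key] = call_model(model, prompt)       # delegation: route by prompt size
    return CACHE[key]
```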
Resources
No blog post is complete without awesome resources.
Here are some links to places we follow, are passionate about and/or that have awesome content:
- OWASP AI Exchange: written by the incredible Rob van der Veer, it’s the best resource for AI security.
- Chip Huyen’s post on Agents : Chip wrote the book on foundational AI models.
- Anthropic’s post on Agents
Smithy is a big user of AI, both in training and using models.
- We generate embeddings and do extensive NLP to match security tooling output to customer-internal company resources through our standards, training and knowledge base correlation enrichers.
- We map company resources to CREs using prompting and human in the loop validation.
- The team building Smithy are the same people who brought you the OpenCRE chatbot and the first/only open source cybersecurity RAG framework.
- Look at the use cases above to get a glimpse of what else we are currently experimenting with.