Evolving by focusing on Time To Value

How we extracted the core DNA of our Open Source project to dramatically reduce time to value and create a more accessible developer experience.

When I was interviewing for my current role at Smithy, I prepared by exploring the Interactive Demos and the Smithy Open Source Project.

The former impressed me! A platform to orchestrate all security tooling in the same way, with Open Source at the center! I hadn’t seen this done before.

As for the latter, well… I struggled to get the project up and running and to actually understand what was going on with it. I was asking myself:

  • What does this project actually do and what do I get out of it?
  • How do I run it?
  • How do I build custom integrations?
  • How is this related to the amazing demos?

After interviewing with my future colleagues I had great answers to all these questions! I understood how the Open Source project was playing an important role in the SaaS offering, backing up the demos.

Before I even accepted the offer, I was already thinking of how amazing it would be to focus on decreasing the Time To Value (TTV) of the existing Open Source offering, and how this would also impact the SaaS.

Little did I know that when I joined, I would be tasked with working on exactly this.

Join me on this Jurassic journey to see how we evolved our project to decrease TTV drastically and answer the questions that first-time users might have when they encounter Smithy.

You might ask yourself: how is this blog post connected with dinosaurs? The whole journey felt like cloning extinct giants and letting them roam free in a fancy park… And I’m a massive dino nerd!

When Dinosaurs Ruled the Earth

Smithy started as an internal project in a big fintech unicorn company, and it was designed to run security workflows to make sure that the company was compliant with different regulations.

It could orchestrate SCA, SAST, SBOM and other kinds of tools, and use their findings to notify security teams on different channels about new vulnerabilities found in critical software and artifacts.

The fundamental difference between Smithy and the competition was (and still is) the fact that it could:

  • run any security tool
  • be deployed on Kubernetes on premises - removing CI pressure - and keeping everything in house
  • be extended with custom integrations as well as using Open Source ones

The workflow engine was originally built on top of the technologies available internally:

  • Kubernetes: to orchestrate the engine and its parts.
  • Tekton: a powerful and flexible open-source framework for creating pipelines, mainly for CI/CD.
  • Kustomize: for configuration.

Workflows were Tekton Pipelines, and each stage in a workflow, like running a scanner or a notifier, was a Tekton Task in that Pipeline. Each Task could have multiple steps, such as running a scanner and then a small binary to analyse its findings; these were Tekton Steps. Data was shared between Tasks and Steps using mounted Kubernetes Persistent Volumes.

Each step in a Task was (and still is) known as an integration: a binary written in any programming language that orchestrates security tools or interprets their findings. Integrations were built and maintained by the core team as well, and were executed as containers in Kubernetes Pods.

This setup made it possible to run security workflows with built-in resiliency, completely on clients’ premises. It automated security engineers’ duties and let them focus on making decisions, rather than running multiple tools themselves.

Dinos of different species were roaming free.

Extinction

When we tried to build a sellable product on top of the existing system, a few problems emerged.

Not answering the fundamental question: Engineers couldn’t quickly evaluate our solution, leading to lost opportunities.

It required very technical knowledge as well as a spare ~3h to get started.

Low Development Productivity: Building, testing and deploying new integrations was becoming increasingly complicated and our SaaS offering had to handle a lot of complexity due to Tekton pipelines and Tasks.

The existing system was a great place for bugs to develop. We were often chasing them and moving slower than expected to innovate with new features and improvements, leading to customers moving away.

Results’ Entropy: Security tools report security findings in different ways. Some of them, like Snyk, can be configured to report their findings in various formats, including SARIF.

These tools, often scanners, are great at doing their job, but when it comes to combining them, it’s very difficult to build any kind of datalake that makes sense of their findings. For example, it might be useful to run a specialised scanner like GoSec together with Semgrep on all Go repositories and then check whether the two are reporting the same issues. Since they don’t use the same lingo, this is a non-trivial ask.

In our world, this translates into maintaining many different parsers that try to make sense of these different outputs and build a smart datalake with them. A lot of development work goes in this direction, unfortunately.

This is a very common challenge that a lot of security teams have to face.
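
To make the problem concrete, here is a minimal sketch of what such parsers boil down to. The two report shapes below are invented for illustration and are not the real GoSec or Semgrep output formats; the `Finding` type is a deliberately tiny stand-in for a common schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A tiny common shape; a real schema carries far more fields."""
    file: str
    line: int
    cwe: str


def from_tool_a(record: dict) -> Finding:
    # Hypothetical shape: {"path": ..., "line": "12", "cwe_id": "CWE-89"}
    return Finding(file=record["path"], line=int(record["line"]), cwe=record["cwe_id"])


def from_tool_b(record: dict) -> Finding:
    # Hypothetical shape: {"location": {"file": ..., "start_line": 12}, "cwe": 89}
    location = record["location"]
    return Finding(file=location["file"], line=location["start_line"], cwe=f"CWE-{record['cwe']}")


# Invented sample reports from two tools that describe the same issue differently.
tool_a_report = [
    {"path": "db/query.go", "line": "12", "cwe_id": "CWE-89"},
    {"path": "cmd/main.go", "line": "40", "cwe_id": "CWE-78"},
]
tool_b_report = [
    {"location": {"file": "db/query.go", "start_line": 12}, "cwe": 89},
]

findings_a = {from_tool_a(r) for r in tool_a_report}
findings_b = {from_tool_b(r) for r in tool_b_report}

# Once both reports share one shape, finding the overlap is a set intersection.
shared = findings_a & findings_b
```

The comparison itself is trivial; the cost is in writing and maintaining one `from_tool_*` function per tool and per output format.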

The Asteroid Impact: Ultimately, this led to a general sense of slowness and frustration, paralysing development.

The TTV was too long even for ourselves, let alone for our customers. This is an extremely important metric when it comes to building a platform like ours: a complex but easy-to-run system for getting more value from the available security tools.

When it comes to our Open Source project, the TTV is the time that a customer takes to run any security workflow and get relevant findings out of it, without noise such as duplicates.

The Amber

We didn’t want to give in to the circumstances, so we decided to take on a real challenge and tackle our problems with a centralised solution that would enable us to move faster and reduce TTV by a huge margin by:

Having quicker iterations: Being able to work on integrations in isolation, and to build, test and distribute them individually. Taking care of data persistence and observability out of the box, so that developers can focus only on the business logic of an integration.

Enabling fast execution.

Simplifying the platform: Removing the dependencies on Kubernetes, Tekton and Kustomize to get started. Our target customers shouldn’t need to be DevSecOps experts to understand or run our product.

They should be able to run any workflow in minutes, perhaps with a single backbone application!

A smarter datalake: Leveraging the Open Cybersecurity Schema Framework (OCSF) from the Linux Foundation to normalise the outputs of the different security tools orchestrated by Smithy and build a predictable datalake. OCSF can represent many kinds of security findings, not only those from Static Analysis, and can serve as a foundation for platforms built on top of it.

This allows smart post-processing of the data and lets us build features that reduce noise, enhance the development experience and support better compliance workflows.

Enabling Dogfooding: Setting the product up for ourselves and starting to use it daily to improve our own productivity. In this way, we could focus and work on what really mattered and spot those nasty bugs early on.

Our bet was that by focusing on TTV we would have direct and indirect impact on all the above.

Extracting the essence of what we did well and enabling success.

Cloning Dinosaurs

To enhance the development experience of building, testing and deploying integrations, we created an SDK that takes care of:

Storage interactions: writing, updating and reading security findings in OCSF format out-of-the-box. The SDK does the heavy lifting by parsing reports into OCSF and knowing how to pass findings in this format between different integrations.

Instrumentation: logging, with metrics and traces coming soon. This enables us to debug our integrations with ease, using Open Source tools.

Isolation: executing and testing the integrations in isolation, natively or in containers. Each integration has its own dependencies and versioning, and is distributed and deployed individually.

Integrations are then configured as described in the specification.
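
The division of labour can be sketched roughly as follows. This is not the Smithy SDK API: `run_integration`, the JSON file used as a store and the finding fields are all hypothetical, chosen only to show the wrapper owning storage and logging while each integration supplies a single function of business logic.

```python
import json
import logging
import tempfile
from pathlib import Path
from typing import Callable, Iterable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("integration")


def run_integration(
    name: str,
    logic: Callable[[list[dict]], Iterable[dict]],
    store: Path,
) -> list[dict]:
    """Hypothetical wrapper: load findings, apply the logic, persist the result."""
    previous = json.loads(store.read_text()) if store.exists() else []
    log.info("%s: loaded %d findings", name, len(previous))
    updated = list(logic(previous))
    store.write_text(json.dumps(updated))
    log.info("%s: stored %d findings", name, len(updated))
    return updated


def scanner(_: list[dict]) -> list[dict]:
    # Stand-in for running a real tool and parsing its report.
    return [{"id": "F-1", "severity": "low"}, {"id": "F-2", "severity": "high"}]


def keep_high(findings: list[dict]) -> list[dict]:
    # Business logic of a filtering integration: nothing but the rule itself.
    return [f for f in findings if f["severity"] == "high"]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "findings.json"
    run_integration("scanner", scanner, store)
    result = run_integration("filter", keep_high, store)
```

Because each integration is just a function handed to the wrapper, it can be unit-tested on its own with a list of findings and no storage at all.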

Life Finds a Way

In order to dramatically reduce how long it would take to run workflows locally to understand how the product works, we created Smithyctl, a CLI that can be used to build/distribute integrations and run workflows like they would run in our Demos.

Instead of requiring you to deploy a Kubernetes cluster and run Tekton workflows on it, Smithyctl runs containerised integrations on top of Docker, with no prior knowledge required.

To run a workflow, you write a configuration file as described in the specification, then simply run a command.

Welcome to Jurassic Park

With these solid foundations, we could finally set our SaaS up properly for Dogfooding and for our customers.

Both the SDK and Smithyctl allow us to run workflows locally in the same way as we do on SaaS, and to move much more quickly. Thanks to this setup, we can focus on creating the integrations and workflows that we actually need to use ourselves daily, enhancing our own productivity, as we need to be compliant too.

Recently, in just a week, we were able to create all the integrations necessary to scan every new image in a registry with Trivy and report only new vulnerabilities as Jira issues.

We shipped this successfully because we were able to run the workflow locally with Smithyctl and build production-ready integrations on top of our SDK.
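
The "only new vulnerabilities" part of such a workflow can be sketched as below. This is an illustration and not the integration we shipped: the finding fields, the choice of deduplication key and the CVE identifiers are all placeholders.

```python
def new_findings(current: list[dict], already_reported: set[tuple]) -> list[dict]:
    """Return findings whose key has not been reported before, preserving order."""
    seen = set(already_reported)
    fresh = []
    for finding in current:
        # Assumed identity of a finding: same image, same vulnerability, same package.
        key = (finding["image"], finding["vulnerability_id"], finding["package"])
        if key not in seen:
            seen.add(key)  # also drops duplicates within the current scan
            fresh.append(finding)
    return fresh


# Placeholder data: one finding was already reported in an earlier run.
current = [
    {"image": "registry.example/api:1.2", "vulnerability_id": "CVE-0000-0001", "package": "openssl"},
    {"image": "registry.example/api:1.2", "vulnerability_id": "CVE-0000-0002", "package": "zlib"},
    {"image": "registry.example/api:1.2", "vulnerability_id": "CVE-0000-0002", "package": "zlib"},
]
already_reported = {("registry.example/api:1.2", "CVE-0000-0001", "openssl")}

to_report = new_findings(current, already_reported)
```

Only the entries in `to_report` would go on to become issues, so a finding that was reported once does not open a second ticket on the next scan.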

Spared No Expense

By focusing on reducing TTV, we were able to:

Bring down the time to implement and deploy a new integration from one week to a few business days. No need to understand how to build and distribute a whole new application - just focus on the business logic, the rest is covered.

Lower the time to run a workflow to get vulnerabilities on a specific repository from ~3h with Kubernetes/Tekton experience down to ~3 minutes.

Create solid foundations for contributors to write their own integrations on top of Smithy in a way that makes sense.

Standardise how workflows run, whether locally, wherever you want, or on SaaS - boosting Dogfooding capabilities.

Build a smart data lake, powered by OCSF, on which more complex features can be built.

Below we compare the time to get vulnerability findings by running a sample workflow before our evolution and after, even with pre-existing technical knowledge and using the platform daily:

[Figure: time to get vulnerability findings from a sample workflow, before and after]

Give back time to security teams, including ourselves, to focus on what really matters.

The Lost World: What’s Coming Next

Our ecosystem is not mature yet. We’re already planning updates, to be rolled out soon, that make developing integrations even easier and more transparent.

We’re also rolling out multiple features built on top of our smart datalake to help you focus on what really matters:

  • surfacing only reachable and relevant vulnerabilities (code to cloud)
  • dismissing vulnerabilities
  • automatically resolving vulnerabilities

Credits

We’re planning to deep dive into more technical bits of how we built both the SDK and Smithyctl, as well as how we integrated this into our SaaS for a smooth transition. We’ll also cover why we chose OCSF and how we leverage it to make our lives (and yours) easier.

Follow us on LinkedIn to learn more.

If you have any questions or want to get a tailored demo up and running:

Finally, you can get started with Smithy by following our docs.