What led me to capability security

Hessra started out with a dose of naivety. I strongly believe in zero trust as the best way to make networks secure, but felt that our tools were lacking. VPNs are too coarse, gateways and proxies too cumbersome, and OAuth feels clunky. So I created Hessra to be the identity and authorization service that gives everything a way to be zero trust. Present your identity, receive an authorization token, use it. Simple, right?

Then I started using what I had built to authorize different things. The waitlist signup microservice? Easy peasy. A dalliance with Postgres row-level security? A deep rabbit hole, but it fit the simple authorization model cleanly. An AI agent that I delegate my identity to so it can read the waitlist on my behalf? Wow, this is going great.

Next up: a simple SaaS-style webapp with users that belong to tenants. Something done millions of times across the internet. It turned out to be a veritable quagmire. All of a sudden, the thus-far gently evolving Hessra authorization service exploded in complexity. New, specialized API endpoints. Tons of optional arguments. The startling realization of "oh, I see why people just use OAuth now." I was shook. Things got out of hand.

The problem was that this webapp, like many others, has dynamic components that specify a resource. If a user needs to get resource foo, they aren't actually getting foo; they are getting tenant123/foo. It's easy enough to write a policy on the authorization service saying userX can access foo, but writing one that makes sure it's the right foo and avoids the confused deputy problem was not easy without the previously mentioned complexity.
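To make the failure concrete, here is a minimal Python sketch of a policy keyed on the bare resource name. The policy table, the data store, and the function names are hypothetical illustrations, not Hessra's API.

```python
# Hypothetical policy held by the authorization service: it only knows "foo".
POLICY = {("userX", "foo"): {"read"}}

# Hypothetical webapp data: the real objects are tenant-qualified.
STORE = {
    "tenant123/foo": "userX's tenant data",
    "tenant456/foo": "someone else's tenant data",
}


def is_allowed(user: str, resource: str, action: str) -> bool:
    """Policy check that sees only the bare resource name."""
    return action in POLICY.get((user, resource), set())


def webapp_read(user: str, tenant: str, resource: str) -> str:
    """The webapp acts as a deputy: it asks about "foo" but fetches tenant/foo."""
    if not is_allowed(user, resource, "read"):
        raise PermissionError(f"{user} may not read {resource}")
    # The tenant never took part in the authorization decision.
    return STORE[f"{tenant}/{resource}"]


print(webapp_read("userX", "tenant123", "foo"))  # the intended access
print(webapp_read("userX", "tenant456", "foo"))  # also succeeds: the wrong foo
```

The second call is the confused deputy: the policy answered a question about foo, and the webapp used that answer to hand out a different object.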

Up until this point, I thought I understood capability security. I would use the terms "authorization token" and "capability token" interchangeably to describe the tokens issued by the Hessra service. Simultaneously, in the back of my mind, I knew I had long punted on the question of "what is a resource, and who exactly is in charge of naming them?" I figured it was fine to hand-wave it away and say future users would manage naming however they saw fit and keep the Hessra authorization service updated with that state. What a disaster that would have been.

I tried countless variations to resolve this. One idea was a control-plane API to update the authorization service with the resources. This would mean that any new user or tenant on the webapp would first need to be written to the authorization service before it could be used. Also, what if I decided to change the webapp and add another layer, like projects? Both the authorization service and the webapp would need to be updated at the same time. Good lord, no.

Another idea was to add some kind of wildcard or pattern-matching component to a resource's name. But that doesn't solve anything, because you still need to distinguish between tenant123 and tenant456. Plus, there would still be the migration problem AND there'd now be a DSL or regex-type thing in the policy and/or the authorization token itself. Yuck.

The next idea was to craft the authorization token such that it is useless until the webapp fills in pertinent information. As an aside, it is awesome that you can do such things using biscuits. The token would say something like "userX can access resource foo restricted to a mystery tenant" and stay incomplete until the webapp specified which tenant. This was a bit better, but still extremely complex, because it relies on the authorization service understanding the webapp's structure and knowing that there is a concept of a tenant rather than an organization or something else entirely. I could get around needing to know specific names, but I'd still need to understand the application's naming conventions.

If I were building an authorization service bespoke to this specific application, there would be no problem with this. But Hessra is meant to be a generic service for any machine to talk to any other machine.

I was starting to get to the crux of the issue: naming. Why is naming so dang hard?

At this point, I took a step back. I read more deeply about security models and finally accepted the fact I didn't understand capability security at all. I went searching for answers and find them I did.

I came across this Hacker News comment thread from 2015 by user kentonv, which led me to this blog post by Mark Miller on his site for the E programming language. That post was a holy revelation.

In Miller's post, he talks about various myths that most people (at the time) accepted as limitations of capability security. In addressing these, he compares capability security to access control lists (ACLs) and, in the process, paints a very descriptive picture of what capability security is.

There is no better feeling than finding a blog post from the past that explains how to fix your shit (RIP stackoverflow). But this felt deeper. Like stumbling upon a scroll of the old magic.

You should really read Miller's post, but to summarize:

A capability is made up of two fundamental elements: authority and designation. The authority is the provable ability to perform an action. The designation is the complete, unambiguous name of the object being acted upon. Designation matters just as much as authority: naming is hard, but it is not optional. And I had been punting on it. Without authority and designation coupled together, confused deputy problems arise easily. With them coupled, there are no confused deputies.
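As a rough illustration of that coupling, the Python sketch below models a capability as a single value that carries both the action and the full name of the object. The `Capability` class, the `invoke` function, and the store are hypothetical names of my own, and the sketch ignores unforgeability, which a real token gets from cryptography.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    action: str       # authority: what the holder may do
    designation: str  # the complete, unambiguous name of the object


# Hypothetical object store, keyed by fully qualified names.
STORE = {"tenant123/foo": "tenant123 data", "tenant456/foo": "tenant456 data"}


def invoke(cap: Capability, action: str) -> str:
    """Act only on the object that the capability itself names."""
    if action != cap.action:
        raise PermissionError(f"capability does not grant {action}")
    # There is no separate name parameter, so a caller cannot redirect
    # the deputy to an object the capability does not designate.
    return STORE[cap.designation]


cap = Capability(action="read", designation="tenant123/foo")
print(invoke(cap, "read"))
```

Because the object's name travels inside the capability, there is no second channel through which a different foo can be slipped in.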

Applying this model to Hessra and the webapp gives the following: the authorization server becomes the root authority, able to grant capabilities to objects in the system underneath it. The root authority can issue fully designated capabilities, but for something like the webapp use case, the root authority no longer needs to. Naming can live with the app, the thing that comes up with the names. The webapp simply adds designations by attenuating the biscuit token.

Put another way, the webapp is requesting a capability for itself for foo. The webapp owns all the foos. They are the webapp's foos. Then, when a user wants to access their specific foo found at tenant123/foo, the webapp adds the designations tenant123 and userX, thereby fully naming the object.
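Here is one way that flow could look in miniature. Biscuits use public-key signatures and Datalog checks; this Python sketch is a simplified stand-in that uses a macaroon-style HMAC chain so it runs on the standard library alone. The token layout, the `key=value` restriction format, and every function name here are assumptions made for illustration, not Hessra's or the biscuit library's API.

```python
import hashlib
import hmac


def _mac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def issue(root_key: bytes, grant: str) -> dict:
    """Root authority: grant a capability that names no tenant or user."""
    return {"grant": grant, "restrictions": [], "sig": _mac(root_key, grant)}


def attenuate(token: dict, restriction: str) -> dict:
    """Holder: append a restriction. Needs only the token, not the root key."""
    return {
        "grant": token["grant"],
        "restrictions": token["restrictions"] + [restriction],
        "sig": _mac(token["sig"], restriction),
    }


def verify(root_key: bytes, token: dict, request: dict) -> bool:
    """Verifier: recompute the chain, then match every restriction to the request."""
    sig = _mac(root_key, token["grant"])
    for restriction in token["restrictions"]:
        sig = _mac(sig, restriction)
    if not hmac.compare_digest(sig, token["sig"]):
        return False  # a restriction was removed or altered
    if token["grant"] != f"{request['action']}:{request['resource']}":
        return False
    for restriction in token["restrictions"]:
        key, _, value = restriction.partition("=")
        if request.get(key) != value:
            return False
    return True


ROOT_KEY = b"demo-root-key"  # hypothetical; never hard-code a real key

# The authorization service knows only "foo"; the webapp adds the names.
webapp_cap = issue(ROOT_KEY, "read:foo")
user_cap = attenuate(attenuate(webapp_cap, "tenant=tenant123"), "user=userX")

base = {"action": "read", "resource": "foo", "user": "userX"}
ok = verify(ROOT_KEY, user_cap, {**base, "tenant": "tenant123"})
wrong = verify(ROOT_KEY, user_cap, {**base, "tenant": "tenant456"})
print(ok, wrong)  # True False
```

The root authority never learns that tenants exist. The webapp can only narrow what it was given, and dropping a restriction breaks the signature chain.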

Finally my problem is solved without turning my core service into a trash heap of APIs, args, and edge cases.

There was a hard-to-swallow pill unearthed in all this. At the outset, I envisioned a very centralized system so that I could have (and therefore provide) complete visibility into every identity and authorization decision. Embracing capability security means something more decentralized and compositional. There are still lots of ways to get the benefits that centralization gave so cheaply, but it was a hard shift for someone who spent twelve years working on products whose value proposition was centralization (Meraki cloud-managed networks).

The benefits of capability security helped make my reformation easy.

Embracing capability security and solving the naming problem means that all the work to authorize a request is done up front. This puts very little burden on verifiers and leaves far less room to grant access by mistake.

Confused deputies are no more.

And the cherry on top: if the webapp's naming changes, only the webapp needs to care. There's no nightmarish multi-system update tango, only sweet dreams of secure systems left in charge of their own naming.

This is what Hessra is now built on. If you've run into similar walls or are finding the usual tools too clunky, capability security might be the model that makes the pieces start to fit. Check out our open source capability engine.