#zero-trust #security #network #sase #mesh #zanzibar #authorization

High level approaches to Zero Trust

Exploring three conceptual approaches to implementing Zero Trust: simulated solutions, shrinking perimeters, and request-scoped security.

by Jacob Valentic

Networked systems evolved organically but carried a serious flaw: ambient trust. The perimeter model emerged to contain the resulting security risks, yet it created a false sense of safety and left networks vulnerable. Today we have pushed that model to its limits, and Zero Trust is needed to fix the underlying problem by treating every connection as untrusted, verifying identity at every step, and allowing only actions permitted by policy.

If you have not already, read my last post “Your network was never safe,” or, for a more detailed history, the book Zero Trust Networks: Building Secure Systems in Untrusted Networks.

So starting from where we are today, how should we solve this problem? I break the possibilities into three conceptual buckets.

> Simulated Zero Trust

The first bucket I think of as simulated zero trust. These solutions try to take the tools we already have and orchestrate them in a way that looks like a Zero Trust system. If you can pull in enough telemetry and have hooks across the system, you can dynamically alter its configuration so that vetted operations succeed and malicious ones fail.

For example, if you can access traffic data from access points, switches, and firewalls, you can build a picture of what a connection is doing. If you have an agent on client devices, you can classify traffic even more deeply. Based on that classification, you could then reconfigure firewalls, routing, and VPNs in real time to allow or block that traffic.

In the 2010s, this was called intent-based networking. The hope was that administrators would express high-level intent such as: employees can access internal tools and public resources needed for their jobs, nothing else. Vendors with enough products deployed across the environment could use that intent to shape the network accordingly.

This is enormously complex. It might work if the entire network were made up of virtualized functions from a single vendor, but real networks are much messier. Devices from many vendors behave differently. Traffic classification is a losing game in a world where so much is hosted on CDNs or public clouds. Defining “Google Search” as a set of IP addresses becomes an endless chase.

This idea has since evolved into Secure Access Service Edge (SASE). If you cannot embed enough control in the network, have all traffic sent to you instead. By proxying the traffic, you can inspect it and control it. It is the ultimate middlebox.

But it suffers from the same middlebox problems. You cannot deeply inspect or manipulate encrypted traffic without explicit cooperation from at least one endpoint. And you have broadened your attack surface to include vendor infrastructure that now sees all of your raw traffic. You are trading one kind of implicit trust (trusting location) for another (trusting a single entity with wide access to your data).

Worse, in trying to build a Zero Trust system based on identity and policy, you break the principle of least privilege by creating a superuser that can do almost anything: the SASE vendor.

Current examples: Zscaler, Cisco, Cloudflare

> Shrink the perimeter

Shrinking the perimeter began naturally. NAT and firewalls separated the internet from local networks. VLANs then split local networks into smaller segments.

What if we kept shrinking? Instead of using firewalls to block traffic based on location, we could assign identities to smaller and smaller perimeters and apply policy directly to those identities.

The perimeter used to be the firewall or gateway. Then it moved to VLANs, subnets, or VPCs. Now it can be machines, services, or even individual workloads. Communication between them happens through encrypted channels like VPNs, with access controlled at each endpoint.

This feels decentralized and peer-to-peer, but it does not fully eliminate implicit trust. It just moves it. These approaches are much better than before, but still incomplete.

Identity in this system is straightforward: building a VPN or encrypted channel requires keys. Give each endpoint its own public and private key pair, and that key pair is its identity.
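
As a rough sketch of that idea (using Ed25519 for illustration; WireGuard, for example, treats a peer's Curve25519 public key as its identity in the same spirit), the identity is literally just the public key:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

func main() {
	// Each endpoint generates its own key pair once; the public key
	// doubles as the endpoint's identity on the network.
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	identity := base64.StdEncoding.EncodeToString(pub)
	fmt.Println("endpoint identity:", identity)

	// The private key never leaves the endpoint; it is only used to
	// authenticate the encrypted channel (and, later, to sign requests).
	_ = priv
}
```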

But when identity is attached to the tunnel, that identity ends where the tunnel terminates. You need a plan for securely propagating the identity, and the requests associated with it, from that point to the final destination, or you have simply introduced another perimeter and more implicit trust into your system.

Authorization is more complex. You can enforce policy at the tunnel entry point, but if that machine is compromised, the attacker can rewrite the rules. You can enforce policy at the tunnel exit, but encrypted traffic (for example HTTPS) must terminate there or your gatekeeper will have no visibility. Proxies are often used for this, which again introduces trust in an intermediary.

You could push the tunnel termination deeper into the system, but now you are inserting heavy encryption, decryption, and key management into parts of the system that may not be designed for it.

The two main issues:

  1. The identity of the original entity is easily severed.
  2. Maintaining the identity binding deep in the system requires either expensive encryption and key management or a separate mechanism, such as request signing, applied at the right point.

VPNs and other encrypted channels have great connectivity features like hole-punching and relays, but encryption alone is not enough. It is table stakes. Adding true identity and authorization to what is essentially a connectivity technology is very hard.

Current examples: Tailscale, NetFoundry

> Request-Scoped Security

The third approach is to attach identity and authorization to every request.

In the tunnel-based approaches, you have to ensure that identity is propagated along with each request, from its associated tunnel all the way through to the end of the system. What if you removed the tunnel and focused on that propagation instead?

This is how most internet-facing applications work. But even here you can easily reintroduce implicit trust if you do not tightly couple requests and identities together. This can happen if you rely on a proxy to terminate TLS and pass traffic along (as many vendors and Kubernetes setups do by default), or if a request flows through multiple services before reaching its final destination.

You can avoid this with request signing. That way each request carries a cryptographic binding to the identity that created it, and no proxy or intermediary can tamper with it without detection.
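
A minimal sketch of what that binding can look like, assuming Ed25519 keys and made-up header names; a production system would follow an established scheme such as HTTP Message Signatures rather than invent its own:

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"net/http"
)

// signRequest binds the caller's identity to the request by signing the
// method, path, and a digest of the body. Any later modification breaks
// verification at the destination.
func signRequest(req *http.Request, body []byte, priv ed25519.PrivateKey) {
	digest := sha256.Sum256(body)
	msg := []byte(req.Method + "\n" + req.URL.Path + "\n" + base64.StdEncoding.EncodeToString(digest[:]))
	sig := ed25519.Sign(priv, msg)

	// Hypothetical header names, for illustration only.
	req.Header.Set("X-Body-Digest", base64.StdEncoding.EncodeToString(digest[:]))
	req.Header.Set("X-Signature", base64.StdEncoding.EncodeToString(sig))
}

// verifyRequest is what the final service runs, regardless of how many
// proxies the request passed through on the way.
func verifyRequest(method, path string, body []byte, pub ed25519.PublicKey, sigB64 string) bool {
	digest := sha256.Sum256(body)
	msg := []byte(method + "\n" + path + "\n" + base64.StdEncoding.EncodeToString(digest[:]))
	sig, err := base64.StdEncoding.DecodeString(sigB64)
	if err != nil {
		return false
	}
	return ed25519.Verify(pub, msg, sig)
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)

	body := []byte(`{"action":"transfer","amount":100}`)
	req, _ := http.NewRequest("POST", "https://api.example.internal/transfers", bytes.NewReader(body))
	signRequest(req, body, priv)

	ok := verifyRequest("POST", "/transfers", body, pub, req.Header.Get("X-Signature"))
	fmt.Println("signature valid:", ok)
}
```

Because the signature covers the method, path, and body digest, an intermediary can still route and observe the request, but it cannot alter it without the destination noticing.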

Improper handling of TLS can also derail this approach. If TLS is misconfigured, especially on either side of a proxy, attackers can inject or smuggle requests. Careful configuration, such as disabling protocol downgrades or chunked requests, can mitigate this risk.
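
As one small illustration of the "no chunked requests" rule, a Go service or internal proxy could simply refuse any request body that arrives with a transfer encoding. This is a sketch of a single mitigation, not a complete smuggling defense:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// rejectChunked refuses any request that arrived with a Transfer-Encoding,
// forcing clients to declare an explicit Content-Length. This removes one
// source of framing ambiguity that request-smuggling attacks rely on.
func rejectChunked(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if len(r.TransferEncoding) > 0 {
			http.Error(w, "chunked request bodies are not accepted", http.StatusBadRequest)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	log.Fatal(http.ListenAndServe(":8080", rejectChunked(mux)))
}
```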

If you do the work, you can reliably carry a request and its identity through the system. Now you need to decide where and how to apply authorization.

Policy Engines

You can bake authorization logic directly into application code, but that becomes brittle and difficult to manage. It is often how authorization starts in an internet-facing application, and it is also why broken access control sits at number one in the OWASP Top 10. A better approach is a policy engine.

This can be a library and DSL embedded into your code, or a policy database you query at runtime.

An embedded library and DSL is simpler and does a great job of enforcing consistency in how policy is defined and applied, but changes are harder: updating policy usually means updating and redeploying code.
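
A toy illustration of the embedded style, written in plain Go rather than a real DSL like Polar or Cedar; the point is that the rules ship inside the binary:

```go
package main

import "fmt"

// Principal is whoever is making the request.
type Principal struct {
	ID    string
	Roles []string
}

func hasRole(p Principal, role string) bool {
	for _, r := range p.Roles {
		if r == role {
			return true
		}
	}
	return false
}

// allow is the embedded "policy engine": consistent and fast, but frozen
// at deploy time. Changing a rule means a new build and a new rollout of
// every service that embeds this package.
func allow(p Principal, action, resource string) bool {
	switch {
	case action == "read":
		return hasRole(p, "viewer") || hasRole(p, "editor")
	case action == "write" && resource == "reports":
		return hasRole(p, "editor")
	default:
		return false
	}
}

func main() {
	alice := Principal{ID: "alice", Roles: []string{"viewer"}}
	fmt.Println(allow(alice, "read", "reports"))  // true
	fmt.Println(allow(alice, "write", "reports")) // false
}
```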

The database approach can make defining, enforcing, and updating policy easier. A library and DSL can be run as a sidecar or centralized service to get the same policy-update gains. A truly database-centric approach, like SpiceDB, has an additional benefit: it can match policies against the data itself. If you do not know what data a request will retrieve but have strict access controls on that data, this matters, because the policy lives close to the data.
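
For contrast, here is a toy, in-memory version of the relationship-based (Zanzibar-style) model that systems like SpiceDB implement. The tuple format and API below are simplified stand-ins, not SpiceDB's actual interface:

```go
package main

import "fmt"

// A relationship tuple: subject has relation on object.
// Real systems store these centrally and index them heavily.
type tuple struct {
	object   string // e.g. "report:q3-finance"
	relation string // e.g. "viewer"
	subject  string // e.g. "user:alice"
}

type relStore struct{ tuples []tuple }

func (s *relStore) write(object, relation, subject string) {
	s.tuples = append(s.tuples, tuple{object, relation, subject})
}

// check answers "does subject have relation on object?" The policy lives
// next to the data, so it can be evaluated even when the caller does not
// know in advance which rows a query will touch.
func (s *relStore) check(object, relation, subject string) bool {
	for _, t := range s.tuples {
		if t.object == object && t.relation == relation && t.subject == subject {
			return true
		}
	}
	return false
}

func main() {
	store := &relStore{}
	store.write("report:q3-finance", "viewer", "user:alice")

	fmt.Println(store.check("report:q3-finance", "viewer", "user:alice")) // true
	fmt.Println(store.check("report:q3-finance", "viewer", "user:bob"))   // false
}
```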

The major downside of these approaches is that it is hard to replicate policy everywhere. Getting all three benefits of easily defining, enforcing, and updating policy means you cannot embed the policy engine in every service, which naturally leads to a central service and costly round trips. Those overheads are often invisible to developers and can create performance surprises.

Active Directory and LDAP were early forms of this idea, but they were built on perimeter-model assumptions. Modernizing them is a losing battle.

Application-embedded examples: Oso/Polar, Cedar

Database policy engine examples: AuthZed/SpiceDB

Token-based

Request-end policy engines require a provable identity and request pair to arrive intact at the endpoint. What if you included the authorization decision itself in the request?

Instead of sending only a signed identity token, you could also encode the policy or permissions into the token. In role-based systems, this might be roles. In attribute-based systems, attributes. Depending on the design, this can reduce or even eliminate the need for a central policy engine.

It also makes enforcement easier in small-footprint environments like microservices, edge functions, or embedded devices.
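
A minimal sketch of the pattern, assuming a hand-rolled, HMAC-signed, JWT-style token so the moving parts are visible; a real system would use a vetted library and short expirations:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// Claims carry both identity (sub) and authorization (roles), so a
// verifier can make a decision locally without calling a policy service.
type Claims struct {
	Sub   string   `json:"sub"`
	Roles []string `json:"roles"`
	Exp   int64    `json:"exp"`
}

var b64 = base64.RawURLEncoding

// mint builds header.payload.signature, signed with a shared key.
func mint(claims Claims, key []byte) string {
	header, _ := json.Marshal(map[string]string{"alg": "HS256", "typ": "JWT"})
	payload, _ := json.Marshal(claims)
	signingInput := b64.EncodeToString(header) + "." + b64.EncodeToString(payload)

	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(signingInput))
	return signingInput + "." + b64.EncodeToString(mac.Sum(nil))
}

// verify checks the signature and expiry, then returns the claims the
// endpoint can enforce against on its own.
func verify(token string, key []byte) (Claims, bool) {
	var claims Claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return claims, false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(parts[0] + "." + parts[1]))
	sig, err := b64.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, mac.Sum(nil)) {
		return claims, false
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil || json.Unmarshal(payload, &claims) != nil {
		return claims, false
	}
	return claims, claims.Exp > time.Now().Unix()
}

func main() {
	key := []byte("shared-secret-for-demo-only")
	token := mint(Claims{Sub: "user:alice", Roles: []string{"reports.read"}, Exp: time.Now().Add(5 * time.Minute).Unix()}, key)

	claims, ok := verify(token, key)
	fmt.Println(ok, claims.Roles) // true [reports.read]
}
```

The appeal is that the verifier needs only the key and the token; the cost, as the next paragraphs describe, is that every producer and consumer must agree on what those claims mean.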

This pattern has emerged organically. OAuth and SAML were early forms of it. OAuth began as a way to delegate authority from a human user to a third-party application but expanded to many more use cases. JSON Web Tokens (JWTs) became a common token format for encoding identity and authorization claims.

JWTs are now everywhere, especially in cloud services. But they are flexible to the point of chaos. Different teams and providers use them differently, and standards only partially rein this in. Large providers like AWS even have security token services to help manage the sprawl.

A flexible token format that combines identity and authorization, with few widely adopted standards for its contents, seems like a recipe for disaster. To make matters worse, these tokens are often unknowingly treated as distributed policy caches with long expiration times, stuffed into cookies, and used in place of session tokens.

But with all this mess, they are increasingly used. Why?

Because they can be easily and securely attached to a request for its entire lifecycle, even in complex systems. And with the right architecture, the right underlying token technology (Biscuits), and an opinionated definition of the token's contents, token-based authorization promises to be so much more.

> Where Hessra fits in

At Hessra, we are building a new token-based authorization system designed from the ground up for Zero Trust. Our goal is to make it secure and simple to carry identity and authorization everywhere in your system. By tightly coupling identity, request context, and policy into a single verifiable token, Hessra makes it possible to enforce authorization decisions consistently across services, environments, and infrastructure without relying on implicit trust or brittle perimeter controls.

In future posts, I'll be covering what our architecture looks like and what new possibilities it provides.


— Jacob Valentic, Co-founder of Hessra