The Challenges of Using OPA for Application Authorization
The Open Policy Agent (OPA) project is an incredibly flexible and powerful policy engine. Although it is a general-purpose decision engine, it is used most heavily in the infrastructure space. At Aserto, we think its use can be extended to fine-grained application authorization as well. OPA is at the core of our solution, but we’ve encountered some challenges that developers might face when applying it to their own applications. In this post, we discuss some of these challenges and propose ways to mitigate them.
What does OPA offer?
OPA is a lightweight, general-purpose policy engine. It exposes an API to a decision engine that makes decisions based on a policy, data, and input from the application. It decouples policy decision-making from policy enforcement, which makes software built around it easier to evolve and scale.
OPA policies are written in Rego, a declarative language used to define the decision-making logic. Alongside the policy, which contains the policy rules, developers add a data file in JSON format that holds the information the decision engine needs to make its decisions. These files are packaged together in a tarball to form the policy bundle, which is then loaded into the decision engine. OPA also provides mechanisms for retrieving external data that the decision engine might need.
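To make the packaging step concrete, here is a minimal Python sketch that assembles a policy and a data file into a gzipped tarball in memory. The file names `policy.rego` and `data.json`, the `example.authz` package, and the `build_bundle` helper are assumptions chosen for illustration; in practice, OPA's own tooling would typically produce the bundle.

```python
import io
import json
import tarfile


def build_bundle(policy_rego: str, data: dict) -> bytes:
    """Package a Rego policy and its JSON data into a gzipped tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        files = {
            "policy.rego": policy_rego.encode("utf-8"),
            "data.json": json.dumps(data).encode("utf-8"),
        }
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # tarfile needs the size up front
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Hypothetical policy: allow only users listed as admins in the data file.
EXAMPLE_POLICY = """package example.authz

import rego.v1

default allow := false

allow if input.user in data.admins
"""

bundle_bytes = build_bundle(EXAMPLE_POLICY, {"admins": ["alice"]})
```

The result is an ordinary archive: nothing in it records a name, a version, or who produced it.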
At runtime, the decision engine receives input from the application, and evaluates it against the data and policy that were loaded into it. The policy may retrieve additional data as part of the evaluation process. Once the engine has gathered all of this data, it makes a decision.
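As a rough illustration of this flow from the application's side, the sketch below builds a request for OPA's REST Data API (`POST /v1/data/<path>` with the input wrapped in an `input` field) and interprets the response. The base URL, the rule path, and both helper names are assumptions for the example, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_decision_request(base_url: str, rule_path: str, input_doc: dict):
    """Build (but do not send) a decision query for an OPA-style Data API."""
    url = f"{base_url.rstrip('/')}/v1/data/{rule_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body: bytes) -> bool:
    """Treat a missing or non-true result as a denial."""
    payload = json.loads(response_body)
    return payload.get("result", False) is True
```

Treating an absent `result` as a denial is a deliberate fail-closed choice: if the rule is undefined for the given input, the application should not grant access.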
The way OPA bundles its policies presents a distribution challenge. Once we are done writing our policy and packaging it, we’ll deploy our policy to the engine. Then the question we’ll have to answer will be: how do we know that the policy we are currently running is the policy we intended to run?
OPA offers no guidelines or mechanisms for versioning, naming, or transporting policies after they’ve been bundled. Unfortunately, a metadata file can’t be arbitrarily added to the policy bundle, which limits the ability to annotate the bundle with additional information.
OPA bundles are opaque to consumers after they are packaged as tarballs — there is no outside-in visibility into the bundle. The only way to reason about the content of the policy bundle is to unpack it. This presents a challenge for the lifecycle management of the policy since developers are now forced to retrieve, unpack and inspect the content of the bundle to make any kind of operational decision.
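The sketch below shows what that inspection amounts to in practice: the whole archive has to be opened and walked just to learn which policies and data keys it contains. `inspect_bundle` is a hypothetical helper, and it assumes the bundle is a gzipped tarball containing `.rego` files and `data.json` documents.

```python
import io
import json
import posixpath
import tarfile


def inspect_bundle(bundle_bytes: bytes) -> dict:
    """Unpack a bundle in memory and summarize what it contains."""
    summary = {"policies": [], "data_files": [], "data_keys": []}
    with tarfile.open(fileobj=io.BytesIO(bundle_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            if member.name.endswith(".rego"):
                summary["policies"].append(member.name)
            elif posixpath.basename(member.name) == "data.json":
                summary["data_files"].append(member.name)
                document = json.load(tar.extractfile(member))
                if isinstance(document, dict):
                    # Record only the top-level keys of each data document.
                    summary["data_keys"].extend(sorted(document))
    return summary
```

Even this small summary requires downloading and decompressing the full bundle, which is the operational cost described above.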
Moreover, OPA doesn’t provide or recommend any signing mechanism which would ensure the trustworthiness of policy bundles. So the question posed at the beginning of this section is left unanswered: How can we verify that the policy we are currently running is the policy we intended to run? Can we trust that the policy bundle wasn’t tampered with?
Further, without versioning or naming conventions, there are no out-of-the-box mechanisms for discovering or sharing policy bundles. That makes it a challenge to manage the distribution of policies or to promote their reuse.
Simply put, policy bundles in and of themselves are non-standard, and they leave a lot of open questions for developers to solve.
We mitigate this challenge by applying an enveloping strategy: we wrap the policy bundle with an Open Container Initiative (OCI) image. Using an OCI wrapper moves us away from a data structure that is unique to OPA and towards one that has been standardized and embraced by a much broader ecosystem.
Using the OCI standard, we can apply Semantic Versioning as well as standard signing solutions (like Sigstore) to address these concerns. An OCI artifact can carry labels and annotations that are indexable and searchable, which allows for discoverability and sharing. Semantic Versioning in conjunction with signing lets us know exactly which policy bundle we are currently running and lets us detect tampering, strengthening the integrity of our build.
We created the Open Policy Registry (OPCR) project to address these issues, providing tools that allow developers to create OCI images from their policy bundles that can be tagged and versioned. In addition, OPCR allows for easy discoverability and sharing of policies.
If you want to learn more about how to apply this approach to your policy bundles, head over to openpolicyregistry.io.
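The two ideas of knowing which version is running and noticing tampering can be sketched in a few lines of Python. This is an illustration only: it pins a bundle by its SHA-256 content digest, written in the `sha256:<hex>` form that OCI registries use, and picks the highest plain `MAJOR.MINOR.PATCH` tag. It does not perform Sigstore signature verification, and all function names are hypothetical.

```python
import hashlib
import hmac


def oci_style_digest(blob: bytes) -> str:
    """Content digest in the 'sha256:<hex>' form used for OCI content."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def matches_pinned_digest(blob: bytes, pinned: str) -> bool:
    """True only if the bytes we hold hash to the digest we expected."""
    return hmac.compare_digest(oci_style_digest(blob), pinned)


def parse_semver(tag: str) -> tuple:
    """Parse a plain MAJOR.MINOR.PATCH tag (no pre-release or build parts)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def latest_tag(tags: list) -> str:
    """Pick the highest version numerically, not alphabetically."""
    return max(tags, key=parse_semver)
```

Comparing versions as integer tuples matters: a plain string sort would rank `1.9.3` above `1.10.0`.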
Currently, OPA policy bundles can be published to an S3 bucket, and the engine will poll that bucket for any changes. But there is no guaranteed way of telling which version of the policy is currently being run — since there’s no way to tell when the bucket was polled last, and there are no naming conventions or versioning in place which would provide the answer.
Further, there’s no way to push a new policy to the engine, so there’s no way of knowing for certain which version is running on the engine at any given time. The discovery of new or updated policy bundles is left for developers to figure out.
At Aserto, the use of signed OCI images, as well as the ability to push those images to the edge, gives us the assurance that we are running the version of a policy that we’re expecting.
Some special considerations need to be taken into account when using OPA for the application authorization use case. One of the most important questions is how the identity of a user is resolved by the engine.
The OPA engine has access to a JWT or SAML token, but any other piece of identity information it would want to resolve would have to be resolved over an HTTP call (unless it lives in the data.json file, which is an unlikely scenario). This is problematic for the application authorization use case, for three reasons:
- Using an HTTP call to inject data from an external service compromises the integrity of the decision made by the engine since the policy can no longer be considered to be immutable and read-only.
- In the application authorization use case, the engine is expected to make a decision for every application request, which means decisions have to be made in milliseconds and there’s no time for expensive network calls.
- JWT/SAML tokens are limited in size and might not be adequate for transmitting all information required for making an authorization decision.
The solution for this challenge is to bring the identity information needed to make authorization decisions as close to the engine as possible so that no network calls are made at runtime.
We host a database close to the decision engine itself, and it is synchronized and up to date with a centralized directory. This ensures the decision engine can be autonomous and that it’ll keep running even when the network might be down.
This pattern also allows for better control of the operational footprint of the engine: instead of relying on keeping all the identity information memory resident, we can load the required data on demand.
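A minimal sketch of this pattern, using SQLite as a stand-in for the local database: a sync step copies user records from the central directory into a local table, and the engine loads a single user on demand at decision time. The table layout, the record shape, and the function names are assumptions for illustration, not a description of our implementation.

```python
import json
import sqlite3


def open_local_directory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local replica that sits next to the engine."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id TEXT PRIMARY KEY, attributes TEXT NOT NULL)"
    )
    return conn


def sync_users(conn: sqlite3.Connection, users: list) -> None:
    """Apply a batch of records received from the central directory."""
    with conn:  # commit the whole batch atomically
        conn.executemany(
            "INSERT OR REPLACE INTO users (id, attributes) VALUES (?, ?)",
            [(user["id"], json.dumps(user["attributes"])) for user in users],
        )


def load_user(conn: sqlite3.Connection, user_id: str):
    """Fetch one user's attributes on demand; no network call involved."""
    row = conn.execute(
        "SELECT attributes FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because lookups hit a local store, the engine keeps answering while the network is down, and only the users actually referenced by requests need to be read into memory.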
In addition to the identity context, a fine-grained authorization engine needs to be able to access information regarding the resource being accessed.
Currently, four mechanisms are relevant for this task, each with its own drawbacks:
- Overloading input: the application itself provides the information about the resource being accessed. This approach is problematic because it allows the decision engine to be influenced in unexpected ways by the application state. Ideally, all data points used by the decision would come from a trusted source that can’t be tampered with.
- Bundle API: the resource information is packed into the bundle (in the data.json file). This approach means that the resource data has to be memory resident, which may be prohibitive in cases where there are many resources.
- Push data: similar to the Bundle API, and subject to similar drawbacks. Using this approach, data is replicated from a database into the OPA decision engine, where it lives in memory.
- Pull data: a built-in function like http.send retrieves the required data at evaluation time. This approach is problematic because it increases the processing time of each decision, and it might not be feasible for most application authorization use cases, where decisions have to be made in milliseconds.
Similar to the approach we applied to the identity challenge, we use a database that lives close to the decision engine and is synced with all the resource information automatically. This protects the integrity and speed of the decision engine by eliminating network calls at decision time, and the policy remains a read-only, immutable artifact.
Auditing and tracing are key components of a production-grade authorization solution.
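The difference from overloading the input can be sketched as follows. The application passes only identifiers, and the decision is made from attributes looked up in trusted local stores. Plain dictionaries stand in for the local database here, and the ownership rule, the `admin` role, and all names are hypothetical.

```python
# Stand-ins for the synchronized local store (hypothetical sample data).
USERS = {
    "alice": {"roles": ["admin"]},
    "bob": {"roles": ["viewer"]},
    "carol": {"roles": ["viewer"]},
}
RESOURCES = {"doc-1": {"owner": "bob"}}


def decide(request: dict, users: dict = USERS, resources: dict = RESOURCES) -> bool:
    """Allow resource owners and admins; ignore attributes sent by the caller."""
    user = users.get(request.get("user_id"))
    resource = resources.get(request.get("resource_id"))
    if user is None or resource is None:
        return False  # fail closed when identity or resource is unknown
    is_owner = resource.get("owner") == request["user_id"]
    is_admin = "admin" in user.get("roles", [])
    return is_owner or is_admin
```

A request such as `{"user_id": "carol", "resource_id": "doc-1", "owner": "carol"}` is still denied, because the `owner` claim in the input is never consulted.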
While OPA provides the ability to push decision logs to an HTTP endpoint, it doesn’t help with aggregating and centralizing all the messages. In deployments with multiple decision engine instances, this becomes a real challenge.
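To show what the missing aggregation step might involve, here is a small Python sketch of a collector that merges log batches from several engine instances into one timeline. It assumes each batch arrives as a gzip-compressed JSON array and that each event carries `decision_id` and a UTC `timestamp` ending in `Z`; these field names follow OPA's decision log format as we understand it, so verify them against your OPA version. The function names are hypothetical.

```python
import gzip
import json


def decode_batch(compressed: bytes) -> list:
    """Decode one uploaded batch: a gzip-compressed JSON array of events."""
    return json.loads(gzip.decompress(compressed))


def timestamp_key(timestamp: str) -> tuple:
    """Sort key for UTC timestamps whose fractional seconds vary in length."""
    base, _, fraction = timestamp.rstrip("Z").partition(".")
    return base, fraction.ljust(9, "0")  # pad to nanosecond precision


def merge_batches(batches: list) -> list:
    """Combine batches from many instances, dropping repeated decisions."""
    seen = {}
    for compressed in batches:
        for event in decode_batch(compressed):
            # Keep the first copy of each decision if a batch is re-sent.
            seen.setdefault(event["decision_id"], event)
    return sorted(seen.values(), key=lambda e: timestamp_key(e["timestamp"]))
```

Deduplicating on `decision_id` matters when an instance retries an upload, and normalizing the fractional seconds avoids misordering events that a plain string sort would get wrong.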
OPA is a versatile and flexible tool that makes it possible to build powerful decision flows while keeping them decoupled from the application. With that said, it leaves a lot of room for developers to come up with their own solutions when it comes to the operationalization of policies, the use of external identity and resource data, the centralization of decision logs, and more.
We proposed an enveloping strategy: wrapping policy bundles in OCI images, and using standards like Semantic Versioning and tools like Sigstore, to make distribution clearer and more conducive to sharing and reuse. We also presented the idea of co-locating a synchronized database with the decision engine, to handle data the engine requires that is unsuitable to keep in memory or fetch over the network.