Consistent and Secure Default RBAC

Problem Summary

OpenStack’s initial goal to be a multi-tenant platform drove the idea that users operate within the confines of one project at a time. Early versions of the authorization system, which included keystone and various middleware, fulfilled this requirement.

However, OpenStack’s explosive growth and adoption added services and API surface area to the ecosystem, and this growth quickly outpaced the authorization engine. The community developed rich APIs, across services, that operate on different layers of the infrastructure. For example, OpenStack has APIs that manage compute hosts, services, endpoints, domains, physical networks, and storage pools. All of these resources require knowledge about the underlying hardware, deployment architecture, and usage within a given organization. These APIs are clearly targeted at different users than APIs that expose resources like instances, block storage devices, or virtual networks.

The authorization strategy didn’t age gracefully with the rest of OpenStack. We used the best tools available at the time to protect the APIs we were developing across OpenStack.

This led to the following problems:

  1. By default, users were either average end-users or operators, which is far too restrictive for real-world clouds

  2. The design violated the principle of least privilege

  3. Inconsistent authorization behavior across services, resulting in some default policies being completely open to any user

  4. Operators need to be intimately familiar with the policy implementation to supply overrides for valid use cases (read-only privileges)

  5. Auditing OpenStack APIs requires administrative access

  6. Having no role hierarchy makes it hard to establish any low-level collection of permissions, like a role for read-only access, which is implemented inconsistently across deployments

The above issues aren’t a complete set of all problems related to authorization in OpenStack, but they are pain points we, as the upstream community, know about. They also hinder the adoption of OpenStack by:

  • Requiring operators to understand and configure policy for any compliance target

  • Aggregating all authoritative power into select users, violating the constrained RBAC model

  • Not providing a role hierarchy that allows for easy authorization management

  • Not providing a granular set of permissions

  • Not providing an easy way for operators to audit what a particular user can do within the deployment

Where are we today?

The following initiatives are in progress or complete:

  1. Moved policy and documentation into code

  2. Created a default role hierarchy in keystone (default roles specification)

  3. Added a new scope to keystone (system scope specification)

  4. Updated all libraries to understand the new scope

  5. Documented the idea of personas

  6. Created a policy popup team to enable this work across more OpenStack services

  7. Proposed and implemented several testing strategies using tempest, functional, and unit tests for projects to use as a reference

  8. Audited every active OpenStack project API and mapped administrative functionality into the system-scope personas

  9. Applied the reader and member role consistently to project-scoped resources

  10. Converted each service policy file from using JSON to YAML

To date, the work to audit each API, propose new default policies, and implement unit, functional, or tempest tests has accumulated more than 130,000 lines of code change across 41 repositories.

Direction change

Throughout this process we’ve communicated with operators and end users about the changes to implement a new scope type. Early feedback on the approach to isolate system-level APIs behind a new authorization target alluded to the ability for operators to continue supporting their users by interacting with project-owned resources. A good example of this use case is evacuating an instance from a host.

Based on the initial discussions of how system-scope would be used, we decided to allow operators to interact with project-owned resources using system-scoped tokens.

It wasn’t until we started applying this idea to various services that we realized it was going to cause issues with service-to-service communication and require significant refactoring in each service. OpenStack services have been developed with the assumption that project IDs will always be present, and it’s rare to interact with a resource without a project ID associated with the request.

For example, if an operator uses a system-scoped token to create an instance for a user in a specific project, they need to specify the project ID that owns the instance and they need to pass their system-scoped token to the service. Each service would need to understand the concept of system-scope and make sure to use the correct project ID. This approach is error prone, especially since each OpenStack service can have multiple clients to other services. We worked through the design and uncovered these issues while implementing a specification we targeted for the Xena release that allowed system users to pass-through project IDs with a system-scoped token. This work would have required significant refactoring and non-trivial changes to multiple projects, increasing the risk in implementing the functionality consistently and safely.

We spent a significant amount of time during the Yoga PTG revisiting the discussion (etherpad). Ultimately, we stepped back and realized that the primary use case for allowing system users to operate on project-owned resources with a system-scoped token was to allow for backwards compatibility.

While we certainly want to make things as easy as possible for operators to use, we’re not sure the additional overhead required to teach each OpenStack service about system-scope in this way would be beneficial. This is especially true when we considered the fact that a single user account, or bearer token, carries a significant amount of authorization. We’re really just pushing the problem from a user with the admin role on a project to anyone with the admin role on the system.

Instead, we decided to remove the assumption that anyone using a system-scoped token should automatically be able to access any OpenStack API.

To clarify, we did agree that system administrators (e.g., operators) should be able to manage resources within a project, but we don’t want to conflate that use case into the system-scope construct for the reasons described above. System administrators have the ability to grant themselves authorization to projects, domains, and the deployment system itself. A few extra steps would allow them to get the correct authorization to the intended project and perform the necessary operations using a token flow that’s already supported. Additionally, it provides a very clear audit trail.

So, where do we go from here?

We have a set of OpenStack services that have over-extended the usage of system-scope and applied it to project-specific resources. Other services have yet to adopt the system-scope feature.

Currently, none of the policy work we’ve done since Queens is widely usable by default since it’s not applied consistently across services. The idea of this community goal is to define the absolute minimum amount of work required to allow operators to opt into the new authorization behavior and start using the personas we’ve been developing since Queens.

We should defer any policy work that isn’t absolutely necessary to the criteria of this goal for future improvements. Otherwise we risk delaying the functionality another release. Instead, we can acknowledge the gaps, order them on a timeline for future improvements, and at least deliver something useful to operators sooner rather than later.

Phase 1

Implement support for system-admin, project-admin, project-member, and project-reader personas.

The project-member and project-reader changes are relatively trivial. The majority of the work in this phase is focused on breaking administrative functionality into the project-admin and system-admin personas.

Re-evaluate project-specific API policies

We need to go through each policy across OpenStack services and make sure it aligns with the direction described above. Ideally, each policy should only include a single scope type. For APIs that are truly designed for multiple scopes, refer to the section “Crafting check strings for APIs that interact with multiple scopes”. For example, the following policy was written to eventually allow system administrators to create instances on a targeted host using a system-scoped token:

policy.DocumentedRuleDefault(
    name='os_compute_api:servers:create:forced_host',
    check_str='role:admin and project_id:%(project_id)s',
    scope_types=['system', 'project']
)

Since instances are project-owned resources we want to keep the functionality isolated to project-scoped tokens. The policy should be updated accordingly:

policy.DocumentedRuleDefault(
    name='os_compute_api:servers:create:forced_host',
    check_str='role:admin and project_id:%(project_id)s',
    scope_types=['project']
)

This will only allow operators with a project-scoped token containing the admin role to perform targeted boot. If or when nova sanitizes hypervisor discovery to expose information safely to end users, the policy could evolve further (potentially in Phase 2):

policy.DocumentedRuleDefault(
    name='os_compute_api:servers:create:forced_host',
    check_str='role:manager and project_id:%(project_id)s',
    scope_types=['project']
)

This would push the functionality even closer to end users, making the API more self-serviceable.
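To make the check strings above more concrete, the following is a minimal sketch of how a rule such as role:admin and project_id:%(project_id)s can be matched against a token's credentials and the target resource. It is a simplified stand-in written for illustration, not the oslo.policy parser: the check helper is hypothetical, it only supports terms joined by "and", and it ignores role implication.

```python
def check(check_str, creds, target):
    """Evaluate an 'and'-joined check string (hypothetical helper)."""
    for term in check_str.split(" and "):
        kind, _, match = term.partition(":")
        # Substitute attributes of the target, e.g. %(project_id)s.
        match = match % target
        if kind == "role":
            if match not in creds["roles"]:
                return False
        elif creds.get(kind) != match:
            return False
    return True


# Credentials of a token scoped to project p1 that carries the admin role.
creds = {"roles": ["admin"], "project_id": "p1"}
rule = "role:admin and project_id:%(project_id)s"

same_project = check(rule, creds, {"project_id": "p1"})
other_project = check(rule, creds, {"project_id": "p2"})
print(same_project, other_project)
```

Under these assumptions, the project_id term is what ties the admin role to the project the token is scoped to, so the same token does not match a target owned by another project.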

Isolate system-specific API policies

We need to perform the same exercise for system-specific API policies and ensure system-level APIs are only accessed with system-scoped tokens. These cases should be much more rare than the previous examples, since the majority of OpenStack’s APIs and resources have grown to expect project ownership.

We need to make sure APIs that are truly system-specific set the appropriate scope type. Hypervisors are one example of these resources:

policy.DocumentedRuleDefault(
    name='os_compute_api:os-hypervisors:list',
    check_str='role:admin',
    scope_types=['system']
)

Managed volumes:

policy.DocumentedRuleDefault(
    name='volume_extension:volume_manage',
    check_str='role:admin',
    scope_types=['system'],
)

Services and endpoints:

policy.DocumentedRuleDefault(
    name='identity:delete_service',
    check_str='role:admin',
    scope_types=['system']
)
policy.DocumentedRuleDefault(
    name='identity:create_endpoint',
    check_str='role:admin',
    scope_types=['system']
)

Note

Each example above only uses a role check in the check string. This is by design and allows for backwards compatibility while [oslo_policy] enforce_scope=False, because a user with the admin role on a project is still allowed to access that API.

Once [oslo_policy] enforce_scope=True, the API will only be exposed to system users. After we guarantee that scope enforcement happens in oslo.policy using enforce_scope we can re-assess the roles of each policy and loosen them as necessary (e.g., moving from role:admin to role:member or role:reader where system-member or system-reader is appropriate).
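The interaction between the role check and scope enforcement described in this note can be sketched with a small toy model. The authorize function below is hypothetical and only returns a boolean; it is not how oslo.policy is implemented, and it does not model the errors or warnings the real library produces on a scope mismatch.

```python
def authorize(token_scope, token_roles, required_role, scope_types,
              enforce_scope):
    """Toy model of a role check combined with optional scope enforcement."""
    if enforce_scope and token_scope not in scope_types:
        return False
    return required_role in token_roles


# Modeled on the hypervisor example: role:admin with scope_types=['system'].
rule = {"required_role": "admin", "scope_types": ["system"]}

# A project administrator still passes while scope is not enforced.
project_admin_legacy = authorize("project", ["admin"],
                                 enforce_scope=False, **rule)
# Once scope is enforced, only a system-scoped token passes.
project_admin_scoped = authorize("project", ["admin"],
                                 enforce_scope=True, **rule)
system_admin_scoped = authorize("system", ["admin"],
                                enforce_scope=True, **rule)
print(project_admin_legacy, project_admin_scoped, system_admin_scoped)
```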

Crafting check strings for APIs that interact with multiple scopes

At this point, any remaining policies that are not either project-scoped or system-scoped should have a valid use case for interacting with both scopes.

Flavors are a good example of a resource that should operate with multiple scopes. Operators should be able to create, update, and delete flavors for a deployment, which affects every project and user of the deployment. Project users should be able to view flavors available for them to use. Additionally, users with authorization on a domain should also be able to view flavors.

The following shows how you can specify multiple scopes for a single rule:

scope_types=['system', 'domain', 'project'],

Listing project resources across the deployment

Now that we’re taking a firm stance on how scope interacts with different types of resources, we’re presented with a problem.

Traditionally, anyone with the admin role, usually on a project, could list all resources. This is usually implemented as a query parameter telling the service that the user wants all instances in the entire deployment (e.g., GET /v2.1/servers/detail?all_tenants=True). This pattern is applied across resources and services, and it’s applicable to instances, volumes, backups, snapshots, etc.

The direction defined in this goal suggests that anyone with the admin role on a project should only be able to view resources within that project, even if that persona is reserved for operators. Additionally, we’re also standing firm in our decision to not allow system users to interact with project-owned resources.

How do we support operators that wish to view all resources in a deployment?

There are at least four potential solutions:

  1. Add domain-admin to Phase 1

  2. Add domain-admin to Phase 2

  3. Implement client-side functionality to brute force resource lists in Phase 1

  4. Allow project-admins to view resources across the entire deployment

The first solution is to add formal support for domain-admin. This would allow someone with the admin role on a domain to use a domain-scoped token to call GET /v2.1/servers/detail, and nova would understand that it needs to filter the instance list by all projects owned by the domain. This is probably the correct solution, but it adds to an already full schedule for services implementing Phase 1.

The second solution would push implementing domain-admin off to Phase 2, giving the community more time to focus on delivering Phase 1. If we take this approach, operators waiting to use this functionality won’t have a way to list all resources in the deployment in the Yoga release, or potentially the Z release.

The third solution takes a brute force approach where the client recognizes it’s dealing with a domain-scoped token, queries keystone for all projects within that domain, gets a token scoped to each project, and asks the service for all resources with each project-scoped token. Then, it would aggregate all those results together and present them to the user.
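A minimal sketch of that client-side loop is shown below. The three callables are hypothetical stand-ins for the client calls to keystone (listing projects, issuing project-scoped tokens) and to the target service; they are replaced with in-memory fakes here so the example runs without a cloud.

```python
def list_domain_resources(domain_id, list_projects, get_project_token,
                          list_resources):
    """Aggregate resources from every project in a domain.

    All three callables are hypothetical placeholders for real client calls.
    """
    results = []
    for project_id in list_projects(domain_id):
        # One project-scoped token, and one list request, per project.
        token = get_project_token(project_id)
        results.extend(list_resources(token))
    return results


# In-memory fakes standing in for keystone and a compute service.
PROJECTS = {"domain-a": ["p1", "p2"]}
SERVERS = {"p1": ["web-1"], "p2": ["db-1", "db-2"]}

servers = list_domain_resources(
    "domain-a",
    list_projects=lambda domain_id: PROJECTS[domain_id],
    get_project_token=lambda project_id: {"project_id": project_id},
    list_resources=lambda token: SERVERS[token["project_id"]],
)
print(servers)
```

The sketch makes the cost of this approach visible: the number of token and list requests grows with the number of projects in the domain.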

The fourth solution would be to continue allowing people with the admin role on a project to list all resources across the deployment (for applicable APIs only). The following is an example of what a policy would look like using this approach:

policy.DocumentedRuleDefault(
    name='os_compute_api:servers:detail:get_all_tenants',
    check_str='role:admin',
    scope_types=['project']
)

This would allow things to work as they do today for operators, but with the understanding that this functionality is going to change when services adopt Phase 2. Eventually, domain users will be allowed to list all resources across projects, and at that point we should restrict project-admins from being allowed to list resources outside their project:

policy.DocumentedRuleDefault(
    name='os_compute_api:servers:detail:get_all_tenants',
    check_str='role:admin',
    scope_types=['domain']
)

This functionality is important for operators finding resources, especially for support cases, like rebooting or live migrating an instance.

The direction for Phase 1 is to use solution #4, where a project-admin can continue listing resources across the deployment, while we target domain support for Phase 2 or Phase 3.

How operators opt into the new functionality

If we can complete each item above for the Yoga release, operators will be able to configure each service to opt into the new defaults across all services, securely implementing the same personas across the deployment:

[oslo_policy]
enforce_new_defaults=True
enforce_scope=True

This configuration enables the following personas:

  • System Administrator
    • Denoted by someone with the admin role on the system

    • Intended for the most trusted operators or support personnel

    • Not intended for end users

    • Has the ability to interact with any resource in the deployment because they can give themselves any role on any authorization target (project, domain, or system)

    • Can grant any role to any user or group on any project, domain, or system

    • Add or delete services and endpoints

    • Create new volume types

    • Move pre-existing volumes in and out of projects

    • Create or delete HSM transport keys

  • Project Admin
    • Denoted by someone with the admin role on a project

    • Intended for operators who need elevated privilege on project resources

    • Can perform operations on project resources that affect other projects in the deployment

    • Not intended for end users

    • Forcibly reset the state of an instance

    • Forcibly delete an application stack

    • Make an image public to the entire deployment

    • Create physical provider networks

  • Project Member
    • Denoted by someone with the member role on a project

    • Intended to be used by end users who consume resources within a project

    • Create, delete, or update an instance

    • Create, delete, or update a volume

    • Create, delete, or update a network

  • Project Reader
    • Denoted by someone with the reader role on a project

    • Intended to be used by end users for read-only access within a project

    • Not allowed to make any writable changes to project-owned resources

    • List and get instances

    • List and get volumes

    • List and get images, including private images within the project

    • List and get networks

These new personas divide the current role of an operator between the system-admin and project-admin personas. This is by design and starts to slowly break down the authorization associated with administrative tokens.
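As a rough illustration of the Phase 1 personas listed above, the sketch below maps a role assignment on a scope to the personas it satisfies. The table and the personas function are hypothetical names introduced for this example, and the sketch assumes the default role implication where admin implies member and member implies reader.

```python
# Roles implied by each default role (admin implies member implies reader).
IMPLIED = {
    "admin": {"admin", "member", "reader"},
    "member": {"member", "reader"},
    "reader": {"reader"},
}

# Phase 1 personas expressed as (scope, role) pairs.
PHASE_1_PERSONAS = {
    ("system", "admin"): "system-admin",
    ("project", "admin"): "project-admin",
    ("project", "member"): "project-member",
    ("project", "reader"): "project-reader",
}


def personas(scope, role):
    """Return the Phase 1 personas that a (scope, role) assignment satisfies."""
    return sorted(
        name for (p_scope, p_role), name in PHASE_1_PERSONAS.items()
        if p_scope == scope and p_role in IMPLIED.get(role, set())
    )


print(personas("project", "admin"))
print(personas("system", "admin"))
print(personas("system", "reader"))
```

In this model an admin role on a project covers the three project personas but never system-admin, and a reader role on the system maps to no Phase 1 persona because system-reader is deferred to Phase 3.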

For increased usability, operators could bootstrap their trusted team of operators or support with inherited role assignments on each domain, making it easier for operators to get project-scoped tokens for each project in the deployment:

$ openstack role add --os-cloud system-admin --user 2c0865 --domain foo --inherited reader
$ openstack role add --os-cloud system-admin --group b3dbc2 --domain foo --inherited admin
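The effect of an inherited assignment can be sketched as follows. This is a simplified model under stated assumptions, not keystone's implementation: the effective_project_roles helper and the data layout are hypothetical, group membership is not modeled, and an inherited assignment on a domain is assumed to apply to every project in that domain.

```python
def effective_project_roles(user, project, assignments, project_domains):
    """Roles a user holds on a project, including domain-inherited ones."""
    roles = set()
    for assignment in assignments:
        if assignment["user"] != user:
            continue
        if assignment.get("project") == project:
            # Direct assignment on the project itself.
            roles.add(assignment["role"])
        elif (assignment.get("inherited")
              and assignment.get("domain") == project_domains[project]):
            # Inherited assignment on the domain that owns the project.
            roles.add(assignment["role"])
    return roles


# Mirrors the first command above: inherited reader on domain foo.
assignments = [
    {"user": "2c0865", "domain": "foo", "role": "reader", "inherited": True},
]
# Hypothetical projects and the domains that own them.
project_domains = {"p1": "foo", "p2": "foo", "p3": "bar"}

roles_p1 = effective_project_roles("2c0865", "p1", assignments, project_domains)
roles_p3 = effective_project_roles("2c0865", "p3", assignments, project_domains)
print(roles_p1, roles_p3)
```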

Phase 2

  1. Isolate service-to-service APIs to the service role

  2. Update policies to incorporate project-manager

  3. Implement domain-admin support where services keep track of domain IDs in addition to project IDs as owners of a resource

Any API developed for machines to communicate with each other should use the service role. This is an important part in reducing authorization for each service. For example, neutron needs to inform nova about network changes, but it shouldn’t need the ability to create new users and groups in keystone, which it currently has. The project-manager persona is described as follows:

  • Project Manager
    • Denoted by someone with the manager role on a project

    • Intended to be used by end users

    • Slightly more privileged than regular project-members

    • Locking and unlocking an instance

    • Setting the default volume type for a project

    • Setting the default secret store for a project
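Phase 2 depends on two roles, manager and service, that this goal proposes for keystone. The sketch below models the proposed relationship between roles, where admin implies manager, manager implies member, member implies reader, and service sits outside the hierarchy. The IMPLIES table and the expand helper are hypothetical names used only for this illustration.

```python
# Direct implications in the proposed hierarchy; service stands alone.
IMPLIES = {
    "admin": ["manager"],
    "manager": ["member"],
    "member": ["reader"],
    "reader": [],
    "service": [],
}


def expand(role):
    """Return the role plus every role it transitively implies."""
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, []))
    return seen


print(sorted(expand("admin")))
print(sorted(expand("manager")))
print(sorted(expand("service")))
```

Keeping service outside the hierarchy means that, in this model, an admin assignment does not grant service-to-service permissions, and a service account gains none of the user-facing roles.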

Phase 3

Implement system-member and system-reader personas. This allows operators to use the principle of least privilege for their team members, support personnel, or auditors.

  1. Implement system-member persona for applicable system APIs

  2. Implement system-reader persona for applicable system APIs

After we update the default to [oslo_policy] enforce_scope=True, we can re-assess all system-admin policies and loosen them to implement the system-member and system-reader personas, resulting in the following functionality.

  • System Member
    • Denoted by someone with the member role on the system

    • Intended for operators or lab technicians

    • Not intended for end users

    • Manage hypervisors and aggregates

    • Manage resources in placement

  • System Reader
    • Denoted by someone with the reader role on the system

    • Intended for operators or auditors for system-specific resources

    • Not intended for end users

    • View hypervisor and aggregate information

    • List all cinder services

    • View all domains and identity providers within the deployment

Tracking Etherpad: https://etherpad.opendev.org/p/rbac-goal-tracking

Champion

  1. Lance Bragstad <lbragstad@redhat.com> (lbragstad)

  2. Ghanshyam Mann <gmann@ghanshyammann.com> (gmann)

Gerrit Topic

To facilitate tracking, commits related to this goal should use the gerrit topic:

secure-rbac

Completion Date & Criteria

Yoga Timeline (7th Mar 2022)

  1. Keystone implements a new default role called manager

    The manager role will be a part of the role hierarchy and it will sit in between the admin and member roles. This work requires a keystone specification.

  2. Keystone implements a new default role called service

    The service role will standardize a role that’s already required in some default policies across OpenStack. This role must be built outside the existing role hierarchy, where admin implies manager implies member implies reader. This work requires a keystone specification.

  3. Keystone enforces scope by default

    Keystone sets keystone.conf [oslo_policy] enforce_scope = True.

    Keystone has fully supported system-admin, system-member, system-reader, domain-admin, domain-member, domain-reader, project-admin, project-member, and project-reader since the Train release.

    For the Yoga release, Keystone should remove all deprecated policies, which will require operators to use the new personas. This will be relatively low-touch for end-users since Keystone’s API is mostly administrative. This gives operators the opportunity to experiment with the domain and system personas.

  4. Services start implementing Phase 1

    At this point, all services are free to start implementing system-admin, project-admin, project-member, and project-reader personas as described above in Phase 1. By the end of the Yoga release, at least one service must have Phase 1 complete. Phase 1 introduces the new personas but allows operators to opt into the new behavior for services that complete Phase 1, allowing operators to upgrade smoothly to the new permission model on a per-service basis.

    It’s important that we have an OpenStack-wide release note or statement that explicitly states the status of this work and how permissions behave across OpenStack services.

  5. OpenStack-wide Personas Documentation

    We need very clear documentation that describes all the potential personas, what they mean, who they were designed for, and how to use them. By the end of the Yoga release, this document should include each persona and what its support is across OpenStack services.

    Engineers should use this documentation to determine what the default policy should be for APIs they’re developing and maintaining. Operators should use it to understand what personas are the most appropriate for their users based on the permissions they need. The documentation should also clearly describe the scope associated to each API. Highlighting the relationship between scope and a resource will help build a frame of reference for operators delegating authorization on various scopes. It will also help establish the expectation that mixing and matching scopes won’t be supported in future releases.

At this point, operators must run keystone with enforce_scope=True since the deprecated policies will be gone, and the default value for this specific option in keystone will be updated accordingly. They can also choose to run any service that’s completed Phase 1. This will require the operator to configure the service to use enforce_scope=True and enforce_new_defaults=True if they choose to adopt the new behavior for services that support it.

This means that operators must use the correct scope when interacting with services they’ve configured to enforce scope. For example, an operator will need a system-scoped token to manage domains or service endpoints in keystone. If the operator also deploys nova to enforce scope, they will need a system-scoped token to manage hypervisors or aggregates.

Z-Release Timeline

  1. Keystone implements Phase 2 and updates policies to include the manager role where applicable

    Keystone starts implementing support for manager across project, domain, and system scopes. Keystone has supported system-admin, system-member, and system-reader since Train, which completes the Phase 3 goals.

  2. All services must implement Phase 1

  3. Any service that completed Phase 1 in Yoga can set enforce_scope=True by default

At this point, every OpenStack service will have completed Phase 1, which allows operators to opt into using system-admin, project-admin, project-member, and project-reader across their entire deployment.

To summarize, operators will need to update every service configuration file where they want to use system-admin, project-admin, project-member, and project-reader. For example:

  1. Set glance-api.conf [DEFAULT] enforce_secure_rbac=True

  2. Set glance-api.conf [oslo_policy] enforce_scope=True

  3. Set glance-api.conf [oslo_policy] enforce_new_defaults=True

  4. Set neutron.conf [oslo_policy] enforce_scope=True

  5. Set neutron.conf [oslo_policy] enforce_new_defaults=True

  6. Set cinder.conf [oslo_policy] enforce_scope=True

  7. Set cinder.conf [oslo_policy] enforce_new_defaults=True

  8. Set ironic.conf [oslo_policy] enforce_scope=True

  9. Set ironic.conf [oslo_policy] enforce_new_defaults=True

  10. Set barbican.conf [oslo_policy] enforce_scope=True

  11. Set barbican.conf [oslo_policy] enforce_new_defaults=True
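Because the [oslo_policy] settings are identical across the services listed above, they lend themselves to being generated. The following is a minimal sketch using Python's standard configparser module. It only renders the section as text and does not touch any real configuration file; the glance-specific [DEFAULT] option is intentionally left out, and how the output is merged into each file is left to the deployment tooling.

```python
import configparser
import io

# Configuration files named in the list above; paths are omitted on purpose.
SERVICES = ["glance-api.conf", "neutron.conf", "cinder.conf",
            "ironic.conf", "barbican.conf"]


def render_oslo_policy_section():
    """Render the [oslo_policy] section that opts into the new behavior."""
    parser = configparser.ConfigParser()
    parser["oslo_policy"] = {
        "enforce_scope": "True",
        "enforce_new_defaults": "True",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


snippet = render_oslo_policy_section()
for name in SERVICES:
    # Print the section once per file it would need to be merged into.
    print(f"# {name}\n{snippet}")
```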

AA-Release Timeline

  1. Update oslo.policy enforce_scope=True

    Since all services have completed Phase 1, we can update the default in oslo.policy so that enforcement checks scope by default. This will allow each service to remove the code that overrides enforce_scope=True and use the upstream default from oslo.policy.

  2. Any service that implemented Phase 1 in Yoga and enabled enforce_scope in Z can remove the deprecated policies used to implement Phase 1 and can start implementing Phase 2

Operators consuming the AA release will have the personas delivered in Phase 1 available and enabled by default. This includes system-admin for all system-level administrative APIs, project-admin for project-level administrative APIs, project-member for common end-user interactions, and project-reader for a read-only variant of project-member.

BB-Release Timeline

  1. All services can remove deprecated policies used to implement Phase 1

  2. All services must implement Phase 2

  3. Any service that completed Phase 2 in the AA release can remove the deprecated policies used to implement Phase 2 and start implementing Phase 3

Operators consuming the BB release will have full support for system-admin, project-admin, project-member, project-reader, and a service role dedicated to service-to-service communication. Additionally, they will have a project-manager persona for elevated privileges that are safe for end users on a project.

CC-Release Timeline

  1. All services can remove deprecated policies used to implement Phase 2

  2. All services must implement Phase 3 and remove deprecated policies in a future release following an acceptable deprecation cycle

  3. Any service that completed Phase 3 in the BB release can remove the deprecated policies used to implement Phase 3

Operators will have all the benefits from the BB release, as well as two additional system personas, system-member and system-reader, intended for operators, support personnel, and auditors who need access to system resources.

References

Current State / Anticipated Impact

Current progress is maintained on the wiki page.