Teapot Load Balancing

Load balancers are one of the services that Kubernetes expects the underlying cloud to provide. No multi-tenant bare-metal solution for this exists, so project Teapot would need to provide one. Ideally the external load balancer offering would act as an abstraction over what could be either a tenant-specific software load balancer or multi-tenant-safe access to a hardware (or virtual) load balancer.

There are two ways for an application to request an external load balancer in Kubernetes. The first is to create a Service of type LoadBalancer. This is the older mechanism, but it is still useful for lower-level plumbing and may be required for non-HTTP(S) protocols. The preferred (though nominally beta) way is to create an Ingress. The Ingress API allows more sophisticated control (such as adding TLS termination) and lets multiple Services, potentially under different DNS names, share a single external load balancer and hence a single IP address.
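
For illustration, the two request styles look roughly like this (names, hosts and ports are hypothetical; the Ingress is shown in the current networking.k8s.io/v1 form, while older clusters use a v1beta1 schema):

    # Requesting a dedicated external load balancer directly.
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      type: LoadBalancer
      selector:
        app: my-app
      ports:
      - port: 443
        targetPort: 8443
    ---
    # Sharing an external load balancer (and IP address) via the Ingress API,
    # with TLS terminated at the load balancer.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
    spec:
      tls:
      - hosts:
        - app.example.com
        secretName: my-app-tls
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 8080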

Most managed Kubernetes services provide an Ingress controller that can set up external load balancers, including TLS termination, using the underlying cloud’s services. Without this, tenants can still use an Ingress controller, but it would have to be one that uses resources available to the tenant, such as by running software load balancers in the tenant cluster.

When using a Service of type LoadBalancer (rather than an Ingress), there is no standardised way of requesting TLS termination (some cloud providers permit it using an annotation), so supporting this use case is not a high priority. The LoadBalancer Service type in general should be supported, however (though there are existing Kubernetes offerings where it is not).
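
As an example of the non-standardised annotation approach, the AWS cloud provider terminates TLS at its load balancer when annotations along the following lines are present; other providers use different annotations or none at all (the Service itself is hypothetical):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
      annotations:
        # AWS-specific: certificate to present, ports to terminate TLS on,
        # and the protocol spoken by the backends.
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:region:account:certificate/id"
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    spec:
      type: LoadBalancer
      selector:
        app: my-app
      ports:
      - port: 443
        targetPort: 8080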

Implementation options

The choices below are not mutually exclusive. An administrator of a Teapot cloud and their tenants could each potentially choose from among several available options.

MetalLB (Layer 2) on tenant cluster

The MetalLB project provides two ways of doing load balancing for bare-metal clusters. One requires control over only layer 2, although it really provides only the high-availability aspects of load balancing, not actual balancing. All incoming traffic for each Service is directed to a single node; from there kube-proxy distributes it to the endpoints that handle it. However, should the node die, traffic rapidly fails over to another node.

This form of load balancing does not support offloading TLS termination, results in large amounts of East-West traffic, and consumes resources from the guest cluster.

Tenants could decide to use this unilaterally (i.e. without the involvement of the management cluster or its administrators). However, using MetalLB restricts the choice of CNI plugins – for example it does not work with OVN. A prerequisite for using it is that all tenant machines share a layer 2 broadcast domain, which may be undesirable in larger clouds. Even so, this may be an acceptable solution for Services in some cases.
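
For reference, a minimal Layer 2 configuration for MetalLB looks like the following (shown in the ConfigMap format of older MetalLB releases; newer releases express the same thing as IPAddressPool and L2Advertisement custom resources, and the address range is purely illustrative):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        address-pools:
        - name: tenant-pool
          protocol: layer2
          addresses:
          - 192.0.2.240-192.0.2.250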

MetalLB (Layer 3) on management cluster

The layer 3 form of MetalLB load balancing provides true load balancing, but requires interaction with the network hardware in the form of advertising ECMP routes via BGP (which also places additional requirements on that hardware). Since tenant clusters are not trusted to do this, it would have to run in the management cluster. There would need to be an API in the management cluster to vet requests and pass them on to MetalLB, and a cloud-provider-teapot plugin that tenants could optionally install to connect to it.

This form of load balancing does not support offloading TLS termination either.
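
In BGP mode MetalLB is configured with the router(s) to peer with and the address pools to advertise, roughly as below (older ConfigMap format; addresses and AS numbers are illustrative). In this option the configuration would live in the management cluster and be populated from vetted tenant requests:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        peers:
        - peer-address: 192.0.2.1
          peer-asn: 64512
          my-asn: 64513
        address-pools:
        - name: tenant-a
          protocol: bgp
          addresses:
          - 203.0.113.0/28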

MetalLB (Layer 3) on tenant cluster

While the network cannot trust BGP announcements from tenants, in principle the management cluster could have a component, perhaps based on ExaBGP, that listens to such announcements on the tenant V(x)LANs, drops any that refer to networks not allocated to the tenant, and rebroadcasts the legitimate ones to the network hardware.

This would allow tenant networks to choose to make use of MetalLB in its Layer 3 mode, providing actual traffic balancing as well as making it possible to split tenant machines amongst separate L2 broadcast domains. It would also allow tenants to choose among a much wider range of CNI plugins, many of which also rely on BGP announcements.

This form of load balancing still does not support offloading TLS termination.

Build a new OVN-based load balancer

One drawback of MetalLB is that it is not compatible with using OVN as the network overlay. This is unfortunate, as OVN is one of the most popular network overlays used with OpenStack, and thus might be a common choice for those wanting to integrate workloads running in OpenStack with workloads running in Kubernetes.

A new OVN-based network load balancer in the vein of MetalLB might provide more options for this group.

Build a new API using Ingress resources

A new API in the management cluster would receive requests in a form similar to an Ingress resource, sanitise them, and then proxy them to an Ingress controller running in the management cluster (or some other centrally-controlled cluster). In fact, it is possible the ‘API’ could be as simple as using the existing Ingress API in a namespace with a validating webhook.
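
A rough sketch of the validating-webhook variant (all names here are hypothetical): the management cluster registers a webhook that vets Ingress resources created in the namespaces set aside for tenant requests before the central Ingress controller acts on them.

    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: teapot-ingress-vetting
    webhooks:
    - name: ingress.teapot.example.org
      admissionReviewVersions: ["v1"]
      sideEffects: None
      rules:
      - apiGroups: ["networking.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
      # Only vet Ingresses in the namespaces used for tenant requests.
      namespaceSelector:
        matchLabels:
          teapot.example.org/tenant-ingress: "true"
      clientConfig:
        service:
          namespace: teapot-system
          name: ingress-vetting
          path: /validate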

The most challenging part of this would be coaxing the Ingress controllers on the load balancing cluster to target services in a different cluster (the tenant cluster). Most likely we would have to sync the EndpointSlices from the tenant cluster into the load balancing cluster.
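
The synced data would look something like the EndpointSlice below (hypothetical names; the addresses are pod or node IPs in the tenant cluster that are reachable from the load balancing cluster, and the managed-by label stops the built-in EndpointSlice controller from reconciling the object away):

    apiVersion: discovery.k8s.io/v1
    kind: EndpointSlice
    metadata:
      name: tenant-a-my-app-1
      labels:
        kubernetes.io/service-name: tenant-a-my-app
        endpointslice.kubernetes.io/managed-by: teapot-endpoint-sync
    addressType: IPv4
    ports:
    - name: https
      protocol: TCP
      port: 8443
    endpoints:
    - addresses:
      - 10.64.3.17
      conditions:
        ready: true
    - addresses:
      - 10.64.7.4
      conditions:
        ready: true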

In all likelihood, when using a software-based Ingress controller running in a load balancing cluster, a network load balancer would also be needed on that cluster to keep the load balancers themselves highly available. Examples include MetalLB and kube-keepalived-vip (which uses VRRP). This component would need to be integrated with public IP assignment.

There are already controllers for several types of software load balancers (the nginx controller is even officially supported by the Kubernetes project), as well as for multiple hardware load balancers. This includes an existing Octavia Ingress controller in cloud-provider-openstack, which would be useful for integrating with OpenStack clouds. The ecosystem around this API is likely to see continued growth, and it is also likely to be the site of future innovation around configuration of network hardware, such as hardware firewalls.

In general, Ingress controllers are not expected to support non-HTTP(S) protocols, so it’s not necessarily possible to implement the LoadBalancer Service type with an arbitrary plugin. However, the nginx Ingress controller has support for arbitrary TCP and UDP services, so the API would be able to provide for either type.
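
For example, the nginx Ingress controller reads an extra ConfigMap (passed via its --tcp-services-configmap flag; a --udp-services-configmap equivalent exists for UDP) that maps an external port to a namespace/service:port target, so non-HTTP(S) Services can still be exposed (names illustrative):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ingress-nginx
      name: tcp-services
    data:
      # Listen on port 5432 and forward to port 5432 of the "db" Service
      # in the "tenant-a" namespace.
      "5432": "tenant-a/db:5432"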

Unlike the network load balancer options, this form of load balancing would be able to terminate TLS connections.

Build a new custom API

A new service running on the management cluster would provide an API through which tenants could request a load balancer. An implementation of this API would provide a pure-software load balancer running in containers in the management cluster (or some other centrally-controlled cluster). As in the case of an Ingress-based controller, a network load balancer would likely be used to provide high-availability of the load balancers.

The API would be designed such that alternate implementations of the controller could be created for various load balancing hardware. Ideally one would take the form of a shim to the existing cloud-provider API for load balancers, so that existing plugins could be used. This would include cloud-provider-openstack, for the case where Teapot is installed alongside an OpenStack cloud allowing it to make use of Octavia.
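
Purely as a sketch of what such an API might look like, a tenant-facing custom resource could capture the listeners, backends and optional TLS termination while leaving the choice of implementation (software load balancer, Octavia shim, hardware driver) to the management cluster. None of the names below exist today:

    apiVersion: loadbalancer.teapot.example.org/v1alpha1
    kind: TenantLoadBalancer
    metadata:
      name: my-app
      namespace: tenant-a
    spec:
      listeners:
      - port: 443
        protocol: TCP
        # Optional: terminate TLS at the load balancer.
        tlsSecretName: my-app-tls
      backends:
      - address: 10.64.3.17
        port: 8443
      - address: 10.64.7.4
        port: 8443
    status:
      externalIP: 198.51.100.10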

Unlike the network load balancer options, this form of load balancing would be able to terminate TLS connections.

This option seems to be strictly inferior to using Ingress controllers on the load balancing cluster to implement an API, assuming both options prove feasible.

Build a new Ingress controller

In the event that we build a new API in the management cluster, a Teapot Ingress controller would proxy requests for an Ingress to it. This controller would likely be responsible for syncing the EndpointSlices to the API as well.

Build a new cloud-provider

In the event that we build a new API in the management cluster, a cloud-provider-teapot plugin that tenants could optionally install would allow them to make use of the API in the management cluster to configure Services of type LoadBalancer.

While helpful to increase portability of applications between clouds, this is a much lower priority than building an Ingress controller. Tenants can always choose to use Layer 2 MetalLB for their LoadBalancer Services instead.

OpenStack Octavia

On paper, Octavia provides exactly what we want: a multi-tenant abstraction layer over hardware load balancer APIs, with a software-based driver for those wanting a pure-software solution.

In practice, however, there is only one driver for a hardware load balancer (along with a couple of other out-of-tree drivers), and an Ingress controller for that hardware also exists. More drivers existed for the earlier Neutron LBaaS v2 API, but some vendors had largely moved on to Kubernetes by the time the Neutron API was replaced by Octavia.

The pure-software driver (Amphora) itself supports provider plugins for its compute and networking backends. However, the only currently available providers are for OpenStack Nova and OpenStack Neutron. Nova will not be present in Teapot, and since we want to make use of Neutron only as a replaceable implementation detail – if at all – Teapot cannot allow other components of the system to become dependent on it. Additional providers would have to be written in order to use Octavia in Teapot.

Another possibility is integration in the other direction – using a Kubernetes-based service as a driver for Octavia when Teapot is co-installed with an OpenStack cloud.