Simplifying KubeVirt's `VirtualMachine` UX with Flavors and Preferences

tl;dr

The following is based on an active Design Proposal, an initial foundational PR, and a complete DNM/WIP PR series enhancing the existing Flavors API and introducing Preferences. Reviews are very much welcome on all of these PRs!

Overview

A common pattern for IaaS is to have abstractions separating the resource sizing and performance of a workload from the user-defined values related to launching their custom application. This pattern is evident across all the major cloud providers (also known as hyperscalers) as well as open source IaaS projects like OpenStack. AWS has instance types, GCP has machine types, Azure has VM sizes and OpenStack has flavors.

Let’s take AWS as an example to help visualize what this abstraction enables. Launching an EC2 instance only requires a few top-level arguments, namely the disk image, instance type, keypair, security group, and subnet:

$ aws ec2 run-instances --image-id ami-xxxxxxxx \
                        --count 1 \
                        --instance-type c4.xlarge \
                        --key-name MyKeyPair \
                        --security-group-ids sg-903004f8 \
                        --subnet-id subnet-6e7f829e

When creating the EC2 instance the user doesn’t define the amount of resources, what processor to use, how to optimize the performance of the instance, or what hardware to schedule the instance on. Instead, all of that information is wrapped up in the single --instance-type c4.xlarge CLI argument. Here c4 denotes a specific performance profile version, in this case from the Compute Optimized family, and xlarge denotes a specific amount of compute resources provided by the instance type, in this case 4 vCPUs, 7.5 GiB of RAM, 750 Mbps of EBS bandwidth, etc.

While hyperscalers can provide predefined types with performance profiles and compute resources already assigned, IaaS and virtualization projects such as OpenStack and KubeVirt can only provide the raw abstractions, leaving operators, admins and even vendors to create instances of these abstractions specific to each deployment.

KubeVirt’s VirtualMachine API contains many advanced options for tuning a virtual machine’s performance that go beyond what typical users need to be aware of. Users are unable to simply define the storage/network they want assigned to their VM and then declare in broad terms what quality of resources and kind of performance they need for their VM.

Instead, the user has to be keenly aware of how to request specific compute resources alongside all of the performance tunings available on the VirtualMachine API, and of how those tunings impact their guest operating system, in order to get a desired result.

The partially implemented and currently v1alpha1 Virtual Machine Flavors API was an attempt to provide operators and users with a mechanism to define resource buckets that could be used during VM creation. At present this implementation provides a cluster-wide VirtualMachineClusterFlavor CRD and a namespaced VirtualMachineFlavor CRD. Each contains an array of VirtualMachineFlavorProfile that at present only encapsulates CPU resources by applying a full copy of the CPU type to the VirtualMachineInstance at runtime.

This approach has a few pitfalls, such as using embedded profiles within the CRDs, relying on the user to select the correct Flavor or VirtualMachineFlavorProfile that will allow their workload to run correctly, and not allowing a user to override some viable attributes at runtime.

VirtualMachineFlavor refactor

As suggested in the title of this blog post, the ultimate goal of the Design Proposal is to provide the end user with a simple set of choices when defining a VirtualMachine within KubeVirt. We want to limit this to a flavor, an optional set of preferences, volumes for storage and networks for connectivity.

To achieve this, the existing VirtualMachineFlavor CRDs will be heavily modified and extended to better encapsulate the resource, performance or schedulable attributes of a VM.

This will include the removal of the embedded VirtualMachineFlavorProfile type within the CRDs, which will be replaced with a single VirtualMachineFlavorSpec type per flavor. The decision to remove VirtualMachineFlavorProfile has been made because the concept isn’t prevalent within the wider Kubernetes ecosystem and could be confusing to end users. Instead, users looking to avoid duplication when defining flavors will be directed to use tools such as kustomize to generate their flavors. This tooling is already commonly used when defining resources within Kubernetes and should afford users plenty of flexibility when defining their flavors, either statically or as part of a larger GitOps-based workflow.
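
As a rough sketch of that workflow, a kustomize overlay could derive a larger flavor from a shared base. The directory layout, the base flavor named m and the values below are assumptions made for illustration; only the spec.cpu.guest and spec.memory.guest fields are taken from the examples later in this post.

```yaml
# overlays/large/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base            # assumed to hold one VirtualMachineFlavor named "m"
                        # with spec.cpu.guest and spec.memory.guest already set
nameSuffix: .large      # the generated flavor is named "m.large"
patches:
- target:
    kind: VirtualMachineFlavor
  # JSON 6902 patch; "replace" requires the paths to exist in the base
  patch: |-
    - op: replace
      path: /spec/cpu/guest
      value: 4
    - op: replace
      path: /spec/memory/guest
      value: 4Gi
```

Running kustomize build against such an overlay should emit a single VirtualMachineFlavor with the patched values, so each additional size only needs its own small overlay rather than a full copy of the flavor.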

VirtualMachineFlavorSpec will also include elements of CPU, Devices, HostDevices, GPUs, Memory and LaunchSecurity defined fully below. Users will be unable to override any aspect of the flavor (for example, vCPU count or amount of Memory) within the VirtualMachine itself; any attempt to do so will result in the VirtualMachine being rejected.
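
To make the conflict rule concrete, the following is a minimal sketch of a VirtualMachine that, under the behaviour described above, would be expected to be rejected. It reuses the m.xsmall flavor from the examples later in this post; the VM name is hypothetical, and the exact validation message is deliberately not shown as it depends on the implementation.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: conflicting-example     # hypothetical name
spec:
  flavor:
    name: m.xsmall              # the flavor already defines the guest vCPU count
    kind: VirtualMachineFlavor
  running: false
  template:
    spec:
      domain:
        cpu:
          cores: 2              # user-defined CPU topology conflicts with the flavor
        devices: {}
```

Dropping the cpu stanza from the template leaves the flavor as the only source of CPU sizing, which is the intended usage.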

Introduction of VirtualMachinePreference

A new set of VirtualMachinePreference CRDs will then be introduced to define any remaining attributes related to ensuring the selected guest OS can run. As the name suggests, the VirtualMachinePreference CRDs will only define preferences, so unlike a flavor, if a preference conflicts with something user defined within the VirtualMachine it will be ignored. For example, if a user selects a VirtualMachinePreference that requests a preferredDiskBus of virtio but then sets a disk bus of SATA for one or more disk devices within the VirtualMachine, the supplied preferredDiskBus preference will not be applied to these disks. Any remaining disks that do not have a disk bus defined will, however, use the preferredDiskBus preference of virtio.
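
The disk bus case can be sketched with a VirtualMachine that mixes both situations. This assumes the linux.cirros preference from the examples later in this post (preferredDiskBus: virtio); the VM name and the datadisk entry are hypothetical, and the volumes backing the disks are omitted for brevity.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: mixed-bus-example       # hypothetical name
spec:
  preference:
    name: linux.cirros          # requests preferredDiskBus: virtio
    kind: VirtualMachinePreference
  running: false
  template:
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: sata         # user defined, so the preference is ignored here
            name: datadisk      # hypothetical disk
          - disk: {}            # no bus set, so the preferred virtio bus applies
            name: containerdisk
```

The resulting VirtualMachineInstance would be expected to keep sata on datadisk and show virtio on containerdisk.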

The Design Proposal contains a complete breakdown of where each VirtualMachineInstanceSpec attribute will reside, if at all, in this new approach.

Versioning (TBD)

Versioning of these CRDs is key to ensuring that a VirtualMachine and its VirtualMachineInstance remain unchanged even when an associated Flavor or Preference is modified.

This is currently missing from the Design Proposal but is being worked on and will be incorporated shortly.

What else?

The current Design Proposal does list some useful ideas as non-goals for the initial implementation. All of these should be revisited before the Flavor API graduates from Alpha.

Examples

kustomize

I’ve created an example repo (many thanks to @fabiand for starting this) using kustomize to generate various classes and sizes of flavors alongside preferences.

$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh apply -f ../vmdefs/example.yaml 
selecting docker as container runtime
virtualmachineflavor.flavor.kubevirt.io/c.large created
virtualmachineflavor.flavor.kubevirt.io/c.medium created
virtualmachineflavor.flavor.kubevirt.io/c.small created
virtualmachineflavor.flavor.kubevirt.io/c.xlarge created
virtualmachineflavor.flavor.kubevirt.io/c.xsmall created
virtualmachineflavor.flavor.kubevirt.io/g.medium created
virtualmachineflavor.flavor.kubevirt.io/g.xlarge created
virtualmachineflavor.flavor.kubevirt.io/g.xsmall created
virtualmachineflavor.flavor.kubevirt.io/m.large created
virtualmachineflavor.flavor.kubevirt.io/m.medium created
virtualmachineflavor.flavor.kubevirt.io/m.small created
virtualmachineflavor.flavor.kubevirt.io/m.xlarge created
virtualmachineflavor.flavor.kubevirt.io/m.xsmall created
virtualmachineflavor.flavor.kubevirt.io/r.large created
virtualmachineflavor.flavor.kubevirt.io/r.medium created
virtualmachineflavor.flavor.kubevirt.io/r.xlarge created
virtualmachineflavor.flavor.kubevirt.io/r.xsmall created
virtualmachinepreference.flavor.kubevirt.io/linux.cirros created
virtualmachinepreference.flavor.kubevirt.io/linux.fedora created
virtualmachinepreference.flavor.kubevirt.io/linux.rhel9 created
virtualmachinepreference.flavor.kubevirt.io/windows.windows10 created

$ cat ../vmdefs/example.yaml
[...]
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachineFlavor
metadata:
  name: m.xsmall
spec:
  cpu:
    guest: 1
  memory:
    guest: 512M
[...]
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: linux.cirros
spec:
  devices:
    preferredCdromBus: virtio
    preferredDiskBus: virtio
    preferredRng: {}
[...]

$ cat ../vmdefs/cirros.yaml
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: cirros
  name: cirros
spec:
  flavor:
    name: m.xsmall
    kind: VirtualMachineFlavor
  preference:
    name: linux.cirros
    kind: VirtualMachinePreference
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: cirros
    spec:
      domain:
        devices:
          disks:
          - disk:
            name: containerdisk
          - disk:
            name: cloudinitdisk
        resources: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh

            echo 'printed from cloud-init userdata'
        name: cloudinitdisk

$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh apply -f ../vmdefs/cirros.yaml 
selecting docker as container runtime
virtualmachine.kubevirt.io/cirros created

$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/virtctl.sh start cirros
selecting docker as container runtime
VM cirros was scheduled to start

$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vmis
selecting docker as container runtime
NAME     AGE   PHASE     IP               NODENAME   READY
cirros   9s    Running   10.244.196.134   node01     True

$ diff <(KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vms/cirros -o json | jq --sort-keys .spec.template.spec) <(KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vmis/cirros -o json | jq --sort-keys .spec)
selecting docker as container runtime
selecting docker as container runtime
2a3,8
>     "cpu": {
>       "cores": 1,
>       "model": "host-model",
>       "sockets": 1,
>       "threads": 1
>     },
5a12,14
>           "disk": {
>             "bus": "virtio"
>           },
8a18,20
>           "disk": {
>             "bus": "virtio"
>           },
11c23,38
<       ]
---
>       ],
>       "interfaces": [
>         {
>           "bridge": {},
>           "name": "default"
>         }
>       ],
>       "rng": {}
>     },
>     "features": {
>       "acpi": {
>         "enabled": true
>       }
>     },
>     "firmware": {
>       "uuid": "6784d43b-39fb-5ee7-8c17-ef10c49af985"
16c43,50
<     "resources": {}
---
>     "memory": {
>       "guest": "512M"
>     },
>     "resources": {
>       "requests": {
>         "memory": "512M"
>       }
>     }
17a52,57
>   "networks": [
>     {
>       "name": "default",
>       "pod": {}
>     }
>   ],
22c62,63
<         "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"
---
>         "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",
>         "imagePullPolicy": "IfNotPresent"

Windows

Below is a basic example taken from the Design Proposal that defines a single VirtualMachineFlavor and VirtualMachinePreference to simplify the creation of a Windows-based VirtualMachine and, once started, the resulting VirtualMachineInstance:

VirtualMachineFlavor

---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachineFlavor
metadata:
  name: clarge
spec:
  cpu:
    guest: 4
  memory:
    guest: 8Gi

VirtualMachinePreference

---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: Windows
spec:
  clock:
    preferredClockOffset:
      utc: {}
    preferredTimer:
      hpet:
        present: false
      hyperv: {}
      pit:
        tickPolicy: delay
      rtc:
        tickPolicy: catchup
  cpu:
    preferredCPUTopology: preferSockets
  devices:
    preferredDiskBus: sata
    preferredInterfaceModel: e1000
    preferredTPM: {}
  features:
    preferredAcpi: {}
    preferredApic: {}
    preferredHyperv:
      relaxed: {}
      spinlocks:
        spinlocks: 8191
      vapic: {}
    preferredSmm: {}
  firmware:
    preferredUseEfi: true
    preferredUseSecureBoot: true

VirtualMachine

---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-windows-clarge-windows
  name: vm-windows-clarge-windows
spec:
  flavor:
    kind: VirtualMachineFlavor
    name: clarge
  preference:
    kind: VirtualMachinePreference
    name: Windows
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-windows-clarge-windows
    spec:
      domain:
        devices:
          disks:
          - disk: {}
            name: containerdisk
        resources: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/windows-disk:devel
        name: containerdisk

VirtualMachineInstance

---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  annotations:
    kubevirt.io/flavor-name: clarge
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/preference-name: Windows
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2022-04-19T10:51:53Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  - foregroundDeleteVirtualMachine
  generation: 9
  labels:
    kubevirt.io/nodeName: node01
    kubevirt.io/vm: vm-windows-clarge-windows
  name: vm-windows-clarge-windows
  namespace: default
  ownerReferences:
  - apiVersion: kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachine
    name: vm-windows-clarge-windows
    uid: 8974d1e6-5f41-4486-996a-84cd6ebb3b37
  resourceVersion: "8052"
  uid: 369e9a17-8eca-47cc-91c2-c8f12e0f6f9f
spec:
  domain:
    clock:
      timer:
        hpet:
          present: false
        hyperv:
          present: true
        pit:
          present: true
          tickPolicy: delay
        rtc:
          present: true
          tickPolicy: catchup
      utc: {}
    cpu:
      cores: 1
      model: host-model
      sockets: 4
      threads: 1
    devices:
      disks:
      - disk:
          bus: sata
        name: containerdisk
      interfaces:
      - bridge: {}
        name: default
      tpm: {}
    features:
      acpi:
        enabled: true
      apic:
        enabled: true
      hyperv:
        relaxed:
          enabled: true
        spinlocks:
          enabled: true
          spinlocks: 8191
        vapic:
          enabled: true
      smm:
        enabled: true
    firmware:
      bootloader:
        efi:
          secureBoot: true
      uuid: bc694b87-1373-5514-9694-0f495fbae3b2
    machine:
      type: q35
    memory:
      guest: 8Gi
    resources:
      requests:
        memory: 8Gi
  networks:
  - name: default
    pod: {}
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: registry:5000/kubevirt/windows-disk:devel
      imagePullPolicy: IfNotPresent
    name: containerdisk
status:
  activePods:
    557c7fef-04b2-47c1-880b-396da944a7d3: node01
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-04-19T10:51:57Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: cannot migrate VMI which does not use masquerade to connect to the pod
      network
    reason: InterfaceNotLiveMigratable
    status: "False"
    type: LiveMigratable
  guestOSInfo: {}
  interfaces:
  - infoSource: domain
    ipAddress: 10.244.196.149
    ipAddresses:
    - 10.244.196.149
    - fd10:244::c494
    mac: 66:f7:21:4e:d9:30
    name: default
  launcherContainerImageVersion: registry:5000/kubevirt/virt-launcher@sha256:40b2036eae39776560a73263198ff42ffd6a8f09c9aa208f8bbdc91ec35b42cf
  migrationMethod: BlockMigration
  migrationTransport: Unix
  nodeName: node01
  phase: Running
  phaseTransitionTimestamps:
  - phase: Pending
    phaseTransitionTimestamp: "2022-04-19T10:51:53Z"
  - phase: Scheduling
    phaseTransitionTimestamp: "2022-04-19T10:51:53Z"
  - phase: Scheduled
    phaseTransitionTimestamp: "2022-04-19T10:51:57Z"
  - phase: Running
    phaseTransitionTimestamp: "2022-04-19T10:51:59Z"
  qosClass: Burstable
  runtimeUser: 0
  virtualMachineRevisionName: revision-start-vm-8974d1e6-5f41-4486-996a-84cd6ebb3b37-2
  volumeStatus:
  - name: cloudinitdisk
    size: 1048576
    target: sdb
