tl;dr
The following is based on an active Design Proposal, an initial foundational PR, and a complete DNM/WIP PR series enhancing the existing Flavors API and introducing Preferences. Reviews are very welcome on all of these PRs!
Overview
A common pattern in IaaS is to separate the resource sizing and performance of a workload from the user-defined values needed to launch a custom application. This pattern is evident across all the major cloud providers (also known as hyperscalers) as well as open source IaaS projects like OpenStack. AWS has instance types, GCP has machine types, Azure has VM sizes and OpenStack has flavors.
Let’s take AWS as an example to help visualize what this abstraction enables. Launching an EC2 instance requires only a few top-level arguments (the disk image, instance type, key pair, security group and subnet):
$ aws ec2 run-instances --image-id ami-xxxxxxxx \
--count 1 \
--instance-type c4.xlarge \
--key-name MyKeyPair \
--security-group-ids sg-903004f8 \
--subnet-id subnet-6e7f829e
When creating the EC2 instance the user doesn’t define the amount of resources, what processor to use, how to optimize the performance of the instance, or what hardware to schedule the instance on. Instead, all of that information is wrapped up in the single --instance-type c4.xlarge CLI argument: c4 denotes a specific performance profile version, in this case from the Compute Optimized family, and xlarge denotes a specific amount of compute resources provided by the instance type, in this case 4 vCPUs, 7.5 GiB of RAM, 750 Mbps of EBS bandwidth, etc.
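To make the two halves of such a name concrete, here is a minimal Python sketch that splits an instance type name into its profile and size parts. The `InstanceType` type and `parse_instance_type` helper are hypothetical names used only for illustration; they are not part of any AWS or KubeVirt tooling.

```python
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str  # performance profile and version, e.g. "c4"
    size: str    # amount of compute resources, e.g. "xlarge"


def parse_instance_type(name: str) -> InstanceType:
    """Split a '<family>.<size>' name on its first dot."""
    family, sep, size = name.partition(".")
    if not sep or not family or not size:
        raise ValueError(f"expected '<family>.<size>', got {name!r}")
    return InstanceType(family, size)


parsed = parse_instance_type("c4.xlarge")
print(parsed.family, parsed.size)
```

The point is only that one short string carries both the performance profile and the resource sizing, which is the property the rest of this post tries to bring to KubeVirt.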
While hyperscalers can provide predefined types with performance profiles and compute resources already assigned, IaaS and virtualization projects such as OpenStack and KubeVirt can only provide the raw abstractions for operators, admins and even vendors to then create instances of these abstractions specific to each deployment.
KubeVirt’s VirtualMachine API contains many advanced options for tuning a virtual machine’s performance that go beyond what typical users need to be aware of. Users are unable to simply define the storage and networks they want assigned to their VM and then declare in broad terms what quality of resources and kind of performance they need for their VM.
Instead, the user has to be keenly aware of how to request specific compute resources alongside all of the performance tunings available on the VirtualMachine API, and of how those tunings impact their guest operating system, in order to get the desired result.
The partially implemented, currently v1alpha1, Virtual Machine Flavors API was an attempt to provide operators and users with a mechanism to define resource buckets that could be used during VM creation. At present this implementation provides a cluster-wide VirtualMachineClusterFlavor CRD and a namespaced VirtualMachineFlavor CRD. Each contains an array of VirtualMachineFlavorProfile entries that at present only encapsulate CPU resources, applying a full copy of the CPU type to the VirtualMachineInstance at runtime.
This approach has a few pitfalls, such as embedding profiles within the CRDs, relying on the user to select the correct Flavor or VirtualMachineFlavorProfile that will allow their workload to run correctly, and not allowing a user to override some viable attributes at runtime.
VirtualMachineFlavor refactor
As suggested in the title of this blog post, the ultimate goal of the Design Proposal is to provide the end user with a simple set of choices when defining a VirtualMachine within KubeVirt. We want to limit this to a flavor, an optional set of preferences, volumes for storage and networks for connectivity.
To achieve this the existing VirtualMachineFlavor CRDs will be heavily modified and extended to better encapsulate resource, performance or schedulable attributes of a VM.
This will include the removal of the embedded VirtualMachineFlavorProfile type within the CRDs, which will be replaced with a single VirtualMachineFlavorSpec type per flavor. The decision to remove VirtualMachineFlavorProfile has been made because the concept isn’t prevalent within the wider Kubernetes ecosystem and could be confusing to end users. Instead, users looking to avoid duplication when defining flavors will be directed to use tools such as kustomize to generate their flavors. This tooling is already commonly used when defining resources within Kubernetes and should afford users plenty of flexibility when defining their flavors, either statically or as part of a larger GitOps based workflow.
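Generating flavors with a tool rather than embedding profiles boils down to expanding one template over a table of sizes. The sketch below illustrates that idea only: it is plain Python, not kustomize, and the `make_flavor` and `generate_flavors` helpers are hypothetical. The manifest shape follows the example manifests later in this post, and the size table is invented apart from the xsmall entry (1 vCPU, 512M), which matches the m.xsmall flavor shown there.

```python
import json


def make_flavor(name: str, vcpus: int, memory: str) -> dict:
    """Build a flavor manifest as a plain dict."""
    return {
        "apiVersion": "flavor.kubevirt.io/v1alpha1",
        "kind": "VirtualMachineFlavor",
        "metadata": {"name": name},
        "spec": {"cpu": {"guest": vcpus}, "memory": {"guest": memory}},
    }


# Hypothetical size table; only xsmall mirrors a flavor shown in this post.
SIZES = {
    "xsmall": (1, "512M"),
    "small": (1, "1Gi"),
    "medium": (2, "2Gi"),
}


def generate_flavors(class_prefix: str, sizes: dict) -> list:
    """Expand one class prefix over every size, e.g. 'm' -> m.xsmall, m.small, ..."""
    return [
        make_flavor(f"{class_prefix}.{size}", vcpus, memory)
        for size, (vcpus, memory) in sizes.items()
    ]


for flavor in generate_flavors("m", SIZES):
    print(json.dumps(flavor))
```

With kustomize the same effect would typically be achieved with a base manifest and per-size overlays or patches; the trade-off is the same either way, with the duplication living in tooling rather than in the CRD schema.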
VirtualMachineFlavorSpec will also include elements of CPU, Devices, HostDevices, GPUs, Memory and LaunchSecurity defined fully below. Users will be unable to override any aspect of the flavor (for example, vCPU count or amount of Memory) within the VirtualMachine itself; any attempt to do so will result in the VirtualMachine being rejected.
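The "no overrides" rule can be pictured as a simple validation step. The following Python sketch is an assumption-laden illustration, not KubeVirt's actual admission logic: the `find_flavor_conflicts` and `validate_vm` helpers are hypothetical, and only `cpu` and `memory` are treated as flavor-owned here, whereas the proposal covers more fields.

```python
from typing import List, Optional

# Assumption: for illustration, only these domain fields are owned by a flavor.
FLAVOR_OWNED_FIELDS = ("cpu", "memory")


def find_flavor_conflicts(vm_domain: dict) -> List[str]:
    """Return the flavor-owned fields that the VM's domain also sets."""
    return [field for field in FLAVOR_OWNED_FIELDS if vm_domain.get(field)]


def validate_vm(vm_domain: dict, flavor_name: Optional[str]) -> None:
    """Raise if a VM that references a flavor also sets flavor-owned fields."""
    if flavor_name is None:
        return  # no flavor referenced, nothing to conflict with
    conflicts = find_flavor_conflicts(vm_domain)
    if conflicts:
        raise ValueError(
            f"VirtualMachine rejected: {', '.join(conflicts)} "
            f"conflict with flavor {flavor_name!r}"
        )


# Accepted: the domain leaves CPU and memory to the flavor.
validate_vm({"devices": {}, "resources": {}}, "m.xsmall")

# Rejected: the domain tries to set its own CPU topology.
try:
    validate_vm({"cpu": {"cores": 2}}, "m.xsmall")
except ValueError as err:
    print(err)
```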
Introduction of VirtualMachinePreference
A new set of VirtualMachinePreference CRDs will then be introduced to define any remaining attributes related to ensuring the selected guest OS can run. As the name suggests, the VirtualMachinePreference CRDs will only define preferences, so unlike a flavor, if a preference conflicts with something the user has defined within the VirtualMachine it will be ignored. For example, if a user selects a VirtualMachinePreference that requests a preferredDiskBus of virtio but then sets a disk bus of SATA for one or more disk devices within the VirtualMachine, the supplied preferredDiskBus preference will not be applied to these disks. Any remaining disks that do not have a disk bus defined will however use the preferredDiskBus preference of virtio.
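The disk bus example above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the real implementation: disks are modelled as plain dicts shaped like the YAML manifests in this post, the `apply_preferred_disk_bus` helper is hypothetical, and only entries with a `disk` target are considered.

```python
import copy


def apply_preferred_disk_bus(disks: list, preferred_bus: str) -> list:
    """Return a copy of disks where only disks without a bus get the preference."""
    result = copy.deepcopy(disks)
    for entry in result:
        if "disk" not in entry:
            continue  # assumption: other device kinds are out of scope here
        target = entry["disk"] or {}  # tolerate an empty `disk:` key
        target.setdefault("bus", preferred_bus)  # user-defined bus always wins
        entry["disk"] = target
    return result


disks = [
    {"name": "rootdisk", "disk": {"bus": "sata"}},  # user chose SATA explicitly
    {"name": "datadisk", "disk": {}},               # no bus defined
]

for entry in apply_preferred_disk_bus(disks, "virtio"):
    print(entry["name"], entry["disk"]["bus"])
```

Here rootdisk keeps its user-defined sata bus while datadisk picks up virtio, which is the difference between a preference and a flavor: a conflict is resolved in the user's favour instead of rejecting the VirtualMachine.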
The Design Proposal contains a complete breakdown of where each VirtualMachineInstanceSpec attribute will reside, if at all, in this new approach.
Versioning (TBD)
Versioning of these CRDs is key to ensuring that a VirtualMachine and its VirtualMachineInstance remain unchanged even when an associated Flavor or Preference is modified.
This is currently missing from the Design Proposal but is being worked on and will be incorporated shortly.
What else?
The current Design Proposal lists some useful ideas as non-goals for the initial implementation. These include:
- Introspection of imported images to determine the correct guest OS related VirtualMachinePreferences to apply.
- Using image labels to determine the correct guest OS related VirtualMachinePreferences to apply.
- Removing the need to define Disks within DomainSpec when providing Volumes within a VirtualMachineInstanceSpec.
- Removing the need to define Interfaces within DomainSpec when providing Networks within a VirtualMachineInstanceSpec.
All of these should be revisited before the Flavor API graduates from Alpha.
Examples
kustomize
I’ve created an example repo (many thanks to @fabiand for starting this) using kustomize to generate various classes and sizes of flavors alongside preferences.
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh apply -f ../vmdefs/example.yaml
selecting docker as container runtime
virtualmachineflavor.flavor.kubevirt.io/c.large created
virtualmachineflavor.flavor.kubevirt.io/c.medium created
virtualmachineflavor.flavor.kubevirt.io/c.small created
virtualmachineflavor.flavor.kubevirt.io/c.xlarge created
virtualmachineflavor.flavor.kubevirt.io/c.xsmall created
virtualmachineflavor.flavor.kubevirt.io/g.medium created
virtualmachineflavor.flavor.kubevirt.io/g.xlarge created
virtualmachineflavor.flavor.kubevirt.io/g.xsmall created
virtualmachineflavor.flavor.kubevirt.io/m.large created
virtualmachineflavor.flavor.kubevirt.io/m.medium created
virtualmachineflavor.flavor.kubevirt.io/m.small created
virtualmachineflavor.flavor.kubevirt.io/m.xlarge created
virtualmachineflavor.flavor.kubevirt.io/m.xsmall created
virtualmachineflavor.flavor.kubevirt.io/r.large created
virtualmachineflavor.flavor.kubevirt.io/r.medium created
virtualmachineflavor.flavor.kubevirt.io/r.xlarge created
virtualmachineflavor.flavor.kubevirt.io/r.xsmall created
virtualmachinepreference.flavor.kubevirt.io/linux.cirros created
virtualmachinepreference.flavor.kubevirt.io/linux.fedora created
virtualmachinepreference.flavor.kubevirt.io/linux.rhel9 created
virtualmachinepreference.flavor.kubevirt.io/windows.windows10 created
$ cat ../vmdefs/example.yaml
[...]
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachineFlavor
metadata:
  name: m.xsmall
spec:
  cpu:
    guest: 1
  memory:
    guest: 512M
[...]
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: linux.cirros
spec:
  devices:
    preferredCdromBus: virtio
    preferredDiskBus: virtio
    preferredRng: {}
[...]
$ cat ../vmdefs/cirros.yaml
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: cirros
  name: cirros
spec:
  flavor:
    name: m.xsmall
    kind: VirtualMachineFlavor
  preference:
    name: linux.cirros
    kind: VirtualMachinePreference
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: cirros
    spec:
      domain:
        devices:
          disks:
          - disk:
            name: containerdisk
          - disk:
            name: cloudinitdisk
        resources: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            echo 'printed from cloud-init userdata'
        name: cloudinitdisk
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh apply -f ../vmdefs/cirros.yaml
selecting docker as container runtime
virtualmachine.kubevirt.io/cirros created
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/virtctl.sh start cirros
selecting docker as container runtime
VM cirros was scheduled to start
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vmis
selecting docker as container runtime
NAME AGE PHASE IP NODENAME READY
cirros 9s Running 10.244.196.134 node01 True
$ diff <(KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vms/cirros -o json | jq --sort-keys .spec.template.spec) <(KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vmis/cirros -o json | jq --sort-keys .spec)
selecting docker as container runtime
selecting docker as container runtime
2a3,8
> "cpu": {
> "cores": 1,
> "model": "host-model",
> "sockets": 1,
> "threads": 1
> },
5a12,14
> "disk": {
> "bus": "virtio"
> },
8a18,20
> "disk": {
> "bus": "virtio"
> },
11c23,38
< ]
---
> ],
> "interfaces": [
> {
> "bridge": {},
> "name": "default"
> }
> ],
> "rng": {}
> },
> "features": {
> "acpi": {
> "enabled": true
> }
> },
> "firmware": {
> "uuid": "6784d43b-39fb-5ee7-8c17-ef10c49af985"
16c43,50
< "resources": {}
---
> "memory": {
> "guest": "512M"
> },
> "resources": {
> "requests": {
> "memory": "512M"
> }
> }
17a52,57
> "networks": [
> {
> "name": "default",
> "pod": {}
> }
> ],
22c62,63
< "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"
---
> "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",
> "imagePullPolicy": "IfNotPresent"
Windows
Below is a basic example taken from the Design Proposal that defines a single VirtualMachineFlavor and VirtualMachinePreference to simplify the creation of a Windows-based VirtualMachine and, once started, its VirtualMachineInstance:
VirtualMachineFlavor
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachineFlavor
metadata:
  name: clarge
spec:
  cpu:
    guest: 4
  memory:
    guest: 8Gi
VirtualMachinePreference
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: Windows
spec:
  clock:
    preferredClockOffset:
      utc: {}
    preferredTimer:
      hpet:
        present: false
      hyperv: {}
      pit:
        tickPolicy: delay
      rtc:
        tickPolicy: catchup
  cpu:
    preferredCPUTopology: preferSockets
  devices:
    preferredDiskBus: sata
    preferredInterfaceModel: e1000
    preferredTPM: {}
  features:
    preferredAcpi: {}
    preferredApic: {}
    preferredHyperv:
      relaxed: {}
      spinlocks:
        spinlocks: 8191
      vapic: {}
    preferredSmm: {}
  firmware:
    preferredUseEfi: true
    preferredUseSecureBoot: true
VirtualMachine
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-windows-clarge-windows
  name: vm-windows-clarge-windows
spec:
  flavor:
    kind: VirtualMachineFlavor
    name: clarge
  preference:
    kind: VirtualMachinePreference
    name: Windows
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-windows-clarge-windows
    spec:
      domain:
        devices:
          disks:
          - disk: {}
            name: containerdisk
        resources: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/windows-disk:devel
        name: containerdisk
VirtualMachineInstance
---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  annotations:
    kubevirt.io/flavor-name: clarge
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/preference-name: Windows
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2022-04-19T10:51:53Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  - foregroundDeleteVirtualMachine
  generation: 9
  labels:
    kubevirt.io/nodeName: node01
    kubevirt.io/vm: vm-windows-clarge-windows
  name: vm-windows-clarge-windows
  namespace: default
  ownerReferences:
  - apiVersion: kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachine
    name: vm-windows-clarge-windows
    uid: 8974d1e6-5f41-4486-996a-84cd6ebb3b37
  resourceVersion: "8052"
  uid: 369e9a17-8eca-47cc-91c2-c8f12e0f6f9f
spec:
  domain:
    clock:
      timer:
        hpet:
          present: false
        hyperv:
          present: true
        pit:
          present: true
          tickPolicy: delay
        rtc:
          present: true
          tickPolicy: catchup
      utc: {}
    cpu:
      cores: 1
      model: host-model
      sockets: 4
      threads: 1
    devices:
      disks:
      - disk:
          bus: sata
        name: containerdisk
      interfaces:
      - bridge: {}
        name: default
      tpm: {}
    features:
      acpi:
        enabled: true
      apic:
        enabled: true
      hyperv:
        relaxed:
          enabled: true
        spinlocks:
          enabled: true
          spinlocks: 8191
        vapic:
          enabled: true
      smm:
        enabled: true
    firmware:
      bootloader:
        efi:
          secureBoot: true
      uuid: bc694b87-1373-5514-9694-0f495fbae3b2
    machine:
      type: q35
    memory:
      guest: 8Gi
    resources:
      requests:
        memory: 8Gi
  networks:
  - name: default
    pod: {}
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: registry:5000/kubevirt/windows-disk:devel
      imagePullPolicy: IfNotPresent
    name: containerdisk
status:
  activePods:
    557c7fef-04b2-47c1-880b-396da944a7d3: node01
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-04-19T10:51:57Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: cannot migrate VMI which does not use masquerade to connect to the pod
      network
    reason: InterfaceNotLiveMigratable
    status: "False"
    type: LiveMigratable
  guestOSInfo: {}
  interfaces:
  - infoSource: domain
    ipAddress: 10.244.196.149
    ipAddresses:
    - 10.244.196.149
    - fd10:244::c494
    mac: 66:f7:21:4e:d9:30
    name: default
  launcherContainerImageVersion: registry:5000/kubevirt/virt-launcher@sha256:40b2036eae39776560a73263198ff42ffd6a8f09c9aa208f8bbdc91ec35b42cf
  migrationMethod: BlockMigration
  migrationTransport: Unix
  nodeName: node01
  phase: Running
  phaseTransitionTimestamps:
  - phase: Pending
    phaseTransitionTimestamp: "2022-04-19T10:51:53Z"
  - phase: Scheduling
    phaseTransitionTimestamp: "2022-04-19T10:51:53Z"
  - phase: Scheduled
    phaseTransitionTimestamp: "2022-04-19T10:51:57Z"
  - phase: Running
    phaseTransitionTimestamp: "2022-04-19T10:51:59Z"
  qosClass: Burstable
  runtimeUser: 0
  virtualMachineRevisionName: revision-start-vm-8974d1e6-5f41-4486-996a-84cd6ebb3b37-2
  volumeStatus:
  - name: cloudinitdisk
    size: 1048576
    target: sdb
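
One detail worth calling out in the manifests above is how the flavor's guest: 4 and the preference's preferredCPUTopology: preferSockets combine into sockets: 4, cores: 1, threads: 1 on the VirtualMachineInstance. The Python sketch below is one way to picture that mapping and is not the actual KubeVirt code. The `expand_cpu_topology` helper is hypothetical, and the preferCores and preferThreads values are assumed counterparts that do not appear in this post.

```python
# Which topology field receives the vCPU count for each preference value.
_FIELD_FOR_PREFERENCE = {
    "preferSockets": "sockets",  # used by the Windows preference above
    "preferCores": "cores",      # assumed counterpart, not shown in this post
    "preferThreads": "threads",  # assumed counterpart, not shown in this post
}


def expand_cpu_topology(guest_vcpus: int, preferred_topology: str) -> dict:
    """Place all guest vCPUs on the preferred topology field, leaving the rest at 1."""
    if guest_vcpus < 1:
        raise ValueError("guest_vcpus must be at least 1")
    try:
        field = _FIELD_FOR_PREFERENCE[preferred_topology]
    except KeyError:
        raise ValueError(f"unknown topology preference: {preferred_topology!r}") from None
    topology = {"sockets": 1, "cores": 1, "threads": 1}
    topology[field] = guest_vcpus
    return topology


# Flavor clarge (guest: 4) combined with the Windows preference (preferSockets).
print(expand_cpu_topology(4, "preferSockets"))
```

This keeps sizing (how many vCPUs) in the flavor and guest-specific presentation (how those vCPUs are exposed) in the preference, which is the split the proposal is aiming for.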