As I alluded to in my previous demo post, we have some exciting new features under development concerning instance types and preferences. This post introduces one of them: the ability for KubeVirt to infer the default instance type and preference of a VirtualMachine from a suggested Volume.
The tl;dr is that the {Instancetype,Preference}Matchers will be extended with an inferFromVolume attribute referencing the name of a Volume associated with the VirtualMachine. This Volume will then be used to infer defaults by looking for the following annotations:
instancetype.kubevirt.io/defaultInstancetype
instancetype.kubevirt.io/defaultInstancetypeKind (Defaults to VirtualMachineClusterInstancetype)
instancetype.kubevirt.io/defaultPreference
instancetype.kubevirt.io/defaultPreferenceKind (Defaults to VirtualMachineClusterPreference)
Initially, only PVC- and DataVolume-derived Volumes will be supported, but this will likely be extended to anything with annotations, such as Containers.
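To make that concrete, here is a minimal sketch of how this could look once the feature lands. The exact field layout is still subject to the design review, and the resource names (server.medium, fedora, fedora-root) are made up purely for illustration:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-root
  annotations:
    # Both values below are illustrative resource names
    instancetype.kubevirt.io/defaultInstancetype: server.medium
    instancetype.kubevirt.io/defaultPreference: fedora
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora
spec:
  instancetype:
    inferFromVolume: rootdisk
  preference:
    inferFromVolume: rootdisk
  running: false
  template:
    spec:
      domain:
        devices: {}
      volumes:
      - name: rootdisk
        persistentVolumeClaim:
          claimName: fedora-root

As no Kind annotations are present on the PVC, the defaults of VirtualMachineClusterInstancetype and VirtualMachineClusterPreference would be assumed.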
Feedback is welcome in the design review document if you have any!
Demo
This demo is based on some work-in-progress code posted below:
This is my third demo for KubeVirt, this time introducing the following features and bugfixes:
A new v1alpha2 instancetype API version
New AutoattachInputDevice auto-attach & PreferredAutoattachInputDevice preference attributes
A bugfix for the PreferredMachineType preference
If you’ve been following my instance type development blog series you will note that some of these aren’t that recent, but as I’ve not covered them in a demo until now I wanted to touch on them again.
Expect more demos in the coming weeks as I catch up with the current state of development.
A new v1alpha2 instancetype API version has now been introduced in the above PR from my colleague akrejcir. This switches the ControllerRevisions over to using complete objects instead of just the spec of the object. Amongst other things, this means that kubectl can present the complete object to the user within the ControllerRevision, as shown below:
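The original output isn't reproduced here, but a v1alpha2 ControllerRevision now carries something along these lines; this is a trimmed, hand-written sketch rather than actual cluster output, with the name and values illustrative only:

apiVersion: apps/v1
kind: ControllerRevision
metadata:
  name: demo-csmall-<uid>-1
data:
  # The complete referenced object, not just its spec
  apiVersion: instancetype.kubevirt.io/v1alpha2
  kind: VirtualMachineInstancetype
  metadata:
    name: csmall
  spec:
    cpu:
      guest: 1
    memory:
      guest: 128Mi
revision: 0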
Please note that while the API version has been incremented, this new version is fully backward compatible with v1alpha1 and as a result requires no user modifications to existing v1alpha1 resources.
akrejcir also landed two new subresource APIs that can take either a raw VirtualMachine definition or an existing VirtualMachine resource and expand the VirtualMachineInstanceSpec within it using any referenced VirtualMachineInstancetype or VirtualMachinePreference resources.
expand-spec for existing VirtualMachines
The following expands the spec of a defined vm-cirros-csmall VirtualMachine resource that references the example csmall instancetype, using diff to show the changes between the original and expanded definition returned by the API:
The following expands the spec of a raw, undefined VirtualMachine passed to the API that references the example csmall instancetype, again using diff to show the changes between the original raw definition and the returned expanded definition:
An associated PreferredAutoattachInputDevice preference has also been introduced to control AutoattachInputDevice along with the existing preferredInputType and preferredInputBus preferences:
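The original example isn't included here, but a preference using these fields would look roughly like the following sketch, assuming the field names above; the bus and type values are just illustrative defaults:

apiVersion: instancetype.kubevirt.io/v1alpha2
kind: VirtualMachinePreference
metadata:
  name: desktop
spec:
  devices:
    # Ask for an input device to be attached automatically
    preferredAutoattachInputDevice: true
    # Existing preferences controlling what that device looks like
    preferredInputBus: virtio
    preferredInputType: tablet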
The KubeVirt project has for a while now provided a set of common-templates to help users define VirtualMachines. These OpenShift/OKD templates cover a range of guest OSs and workloads (server, desktop, high-performance, etc.).
I’ve created an instancetype-based equivalent to this outside of KubeVirt for the time being. My common-instancetypes repo provides instancetypes and preferences covering all of the combinations found in common-templates, with some hopefully useful additions such as preferences for CirrOS and Alpine Linux.
The repo currently uses kustomize to generate everything, so deployment into a cluster is extremely simple.
The next step with the repo is obviously to move it under the main KubeVirt namespace, which will require lots of housekeeping to get CI set up, generate releases, etc. In the meantime, issues and pull requests against the original repo are still welcome and encouraged!
We also have plans to have the cluster-wide instancetypes and preferences deployed by default by KubeVirt. This has yet to be raised formally with the community, and as such a deployment mechanism hasn’t been agreed upon yet. Hopefully more on this in the near future.
Bug fixes
8142 - Referenced VirtualMachineInstancetypes can be deleted before a VirtualMachine starts
As described in the issue the VirtualMachine controller would previously wait until the first start of a VirtualMachine to create any ControllerRevisions for a referenced instancetype or preference. As such users could easily modify or even remove the referenced resources ahead of this first start causing failures when the request is eventually made:
# ./cluster-up/kubectl.sh apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml
[..]
virtualmachineinstancetype.instancetype.kubevirt.io/csmall created
virtualmachine.kubevirt.io/vm-cirros-csmall created
# ./cluster-up/kubectl.sh delete virtualmachineinstancetype/csmall
virtualmachineinstancetype.instancetype.kubevirt.io "csmall" deleted
# ./cluster-up/virtctl.sh start vm-cirros-csmall
Error starting VirtualMachine Internal error occurred: admission webhook "virtualmachine-validator.kubevirt.io" denied the request: Failure to find instancetype: virtualmachineinstancetypes.instancetype.kubevirt.io "csmall" not found
The fix here was to move the creation of these ControllerRevisions to earlier within the VirtualMachine controller reconcile loop, ensuring that they are created as soon as the VirtualMachine is seen for the first time.
Preferences are only applied to a VirtualMachineInstance when a user has not already provided a corresponding value within their VirtualMachine [1]. This is an issue for PreferredMachineType, however, as the VirtualMachine mutation webhook always provides some kind of default [2], resulting in the PreferredMachineType never being applied to the VirtualMachineInstance.
The fix here was to look up and apply preferences during the VirtualMachine mutation webhook, ensuring PreferredMachineType is applied when the user hasn’t already provided their own value.
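For illustration, a preference along these lines should now have its machine type applied whenever the VirtualMachine doesn't set one itself; this is a hand-written sketch and the machine type value is just an example:

apiVersion: instancetype.kubevirt.io/v1alpha2
kind: VirtualMachinePreference
metadata:
  name: q35-machine-type
spec:
  machine:
    # Applied only when the VirtualMachine does not define its own machine type
    preferredMachineType: pc-q35-rhel9.0.0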
As set out in the issue, attempting to create a VirtualMachineSnapshot of a VirtualMachine referencing a VirtualMachineInstancetype would previously leave the VirtualMachineSnapshot stuck InProgress, as the VirtualMachine controller was unable to add the required snapshot finalizer.
The fix here was to ensure that any VirtualMachineInstancetype referenced was applied to a copy of the VirtualMachine when checking for conflicts in the VirtualMachine admission webhook, allowing the snapshot finalizer to later be added.
We now have basic user-guide documentation introducing instancetypes. Feedback welcome, please do /cc lyarwood on any issues or PRs related to this doc!
Misc
VirtualMachineInstancePreset deprecation in favor of VirtualMachineInstancetype
This deprecation has now landed. VirtualMachineInstancePresets are based on the PodPreset k8s resource and API that injected data into pods at creation time. However, this API never graduated from alpha and was removed in Kubernetes 1.20 [1]. While useful, there are some issues with the implementation that have resulted in alternative approaches, such as VirtualMachineInstancetypes and VirtualMachinePreferences, being made available within KubeVirt.
As per the CRD versioning docs, this change updated the generated CRD definition of VirtualMachineInstancePreset, marking the currently available versions of v1 and v1alpha3 as deprecated.
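In practice this boils down to setting the deprecation fields described in the CRD versioning docs on each served version. The following is a hand-written sketch of that shape, not the exact generated definition or warning text:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: virtualmachineinstancepresets.kubevirt.io
spec:
  group: kubevirt.io
  names:
    kind: VirtualMachineInstancePreset
    plural: virtualmachineinstancepresets
    singular: virtualmachineinstancepreset
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    deprecated: true
    deprecationWarning: "VirtualMachineInstancePresets are deprecated; use instancetypes and preferences instead"
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
  # v1alpha3 is marked deprecated in the same way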
More context and discussion is also available on the mailing-list [3].
We now have an area label for instancetypes on the kubevirt/kubevirt repo that I’ve been manually applying to PRs and issues. Please feel free to use this by commenting /area instancetype on anything you think is related to instancetypes! I do hope to automate this for specific files in the future.
With the new v1alpha2 version landing I also wanted to again draw attention to the above issue tracking our progress to v1beta1. There is obviously a long way to go, but if you do have any suggested changes ahead of v1beta1 please feel free to comment there.
Support for default instancetype and preference PVC annotations
This topic deserves its own blog post, but for now I’d just like to highlight the design doc and WIP code series above looking at introducing support for default instancetype and preference annotations into KubeVirt. The following example demonstrates the current PVC support in the series, but I’d also like to expand this to other volume types where possible. Again, feedback welcome on the design doc or code series itself!
The use of these informers was previously removed by ee4e266. After further discussions on the mailing-list, however, it has become clear that the removal of these informers from the virt-controller was not required and they can be reintroduced.
virtctl create vm based on instancetype and preferences
I’d like to make a start on this in the coming weeks. The basic idea is that the new command would generate a VirtualMachine definition we could then pipe to kubectl apply, something like the following:
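The command-line syntax is still to be designed, so the following is only a sketch of the kind of VirtualMachine definition such a command might emit; all names are illustrative:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: my-vm
spec:
  instancetype:
    name: server.medium
  preference:
    name: fedora
  running: false
  template:
    spec:
      domain:
        devices: {}
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk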
In my initial VirtualMachineFlavor Update #1 post I included an asciinema recorded demo towards the end. I’ve re-recorded the demo given the recent rename and I’ve also created a personal repo of demos to store the original script and recordings outside of asciinema.
Recording
Transcript
## [..]
## Agenda
# 1. The Basics
# 2. VirtualMachineClusterInstancetype and VirtualMachineClusterPreference
# 3. VirtualMachineInstancetype vs VirtualMachinePreference vs VirtualMachine
# 4. Versioning
# 5. What's next...
# - https://github.com/kubevirt/kubevirt/issues/7897
# - https://blog.yarwood.me.uk/2022/07/21/kubevirt_instancetype_update_2/
#
## Demo #1 The Basics
#
# Lets start by creating a simple namespaced VirtualMachineInstancetype, VirtualMachinePreference and VirtualMachine
cat <<EOF | ./cluster-up/kubectl.sh apply -f -
---
apiVersion: instancetype.kubevirt.io/v1alpha1
kind: VirtualMachineInstancetype
metadata:
  name: small
spec:
  cpu:
    guest: 2
  memory:
    guest: 128Mi
---
apiVersion: instancetype.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: cirros
spec:
  devices:
    preferredDiskBus: virtio
    preferredInterfaceModel: virtio
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo
spec:
  instancetype:
    kind: VirtualMachineInstancetype
    name: small
  preference:
    kind: VirtualMachinePreference
    name: cirros
  running: false
  template:
    spec:
      domain:
        devices: {}
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            echo 'printed from cloud-init userdata'
        name: cloudinitdisk
EOF
selecting docker as container runtime
virtualmachineinstancetype.instancetype.kubevirt.io/small created
virtualmachinepreference.instancetype.kubevirt.io/cirros created
virtualmachine.kubevirt.io/demo created
# # Starting the VirtualMachine applies the VirtualMachineInstancetype and VirtualMachinePreference to the VirtualMachineInstance
./cluster-up/virtctl.sh start demo && ./cluster-up/kubectl.sh wait vms/demo --for=condition=Ready
selecting docker as container runtime
VM demo was scheduled to start
selecting docker as container runtime
virtualmachine.kubevirt.io/demo condition met
# ## We can check this by comparing the two VirtualMachineInstanceSpec fields from the VirualMachine and VirtualMachineInstance
diff --color -u <( ./cluster-up/kubectl.sh get vms/demo -o json | jq .spec.template.spec) <( ./cluster-up/kubectl.sh get vmis/demo -o json | jq .spec)
selecting docker as container runtime
selecting docker as container runtime
--- /dev/fd/63 2022-08-03 13:36:29.588992874 +0100
+++ /dev/fd/62 2022-08-03 13:36:29.588992874 +0100
@@ -1,15 +1,65 @@
 {
"domain": {
- "devices": {},
+ "cpu": {+ "cores": 1,+ "model": "host-model",+ "sockets": 2,+ "threads": 1+ },+ "devices": {+ "disks": [+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "containerdisk"+ },+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "cloudinitdisk"+ }+ ],+ "interfaces": [+ {+ "bridge": {},+ "model": "virtio",+ "name": "default"+ }+ ]+ },+ "features": {+ "acpi": {+ "enabled": true+ }+ },+ "firmware": {+ "uuid": "c89d1344-ee03-5c55-99bd-5df16b72bea0"+ },"machine": {
"type": "q35" },
- "resources": {}
+ "memory": {+ "guest": "128Mi"+ },+ "resources": {+ "requests": {+ "memory": "128Mi"+ }+ } },
+ "networks": [+ {+ "name": "default",+ "pod": {}+ }+ ],"volumes": [
{
"containerDisk": {
- "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"+ "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",+ "imagePullPolicy": "IfNotPresent" },
"name": "containerdisk" },
# ## Demo #2 Cluster wide CRDs
# ## We also have cluster wide instancetypes and preferences we can use, note these are the default if no kind is provided within the VirtualMachine.
cat <<EOF | ./cluster-up/kubectl.sh apply -f -
---
apiVersion: instancetype.kubevirt.io/v1alpha1
kind: VirtualMachineClusterInstancetype
metadata:
  name: small-cluster
spec:
  cpu:
    guest: 2
  memory:
    guest: 128Mi
---
apiVersion: instancetype.kubevirt.io/v1alpha1
kind: VirtualMachineClusterPreference
metadata:
  name: cirros-cluster
spec:
  devices:
    preferredDiskBus: virtio
    preferredInterfaceModel: virtio
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-cluster
spec:
  instancetype:
    name: small-cluster
  preference:
    name: cirros-cluster
  running: false
  template:
    spec:
      domain:
        devices: {}
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            echo 'printed from cloud-init userdata'
        name: cloudinitdisk
EOF
selecting docker as container runtime
virtualmachineclusterinstancetype.instancetype.kubevirt.io/small-cluster created
virtualmachineclusterpreference.instancetype.kubevirt.io/cirros-cluster created
virtualmachine.kubevirt.io/demo-cluster created
# ## InstancetypeMatcher and PreferenceMatcher default to the Cluster CRD Kinds
./cluster-up/kubectl.sh get vms/demo-cluster -o json | jq '.spec.instancetype, .spec.preference'
selecting docker as container runtime
{
"kind": "virtualmachineclusterinstancetype",
"name": "small-cluster"}
{
"kind": "virtualmachineclusterpreference",
"name": "cirros-cluster"}
# ./cluster-up/virtctl.sh start demo-cluster && ./cluster-up/kubectl.sh wait vms/demo-cluster --for=condition=Ready
diff --color -u <( ./cluster-up/kubectl.sh get vms/demo-cluster -o json | jq .spec.template.spec) <( ./cluster-up/kubectl.sh get vmis/demo-cluster -o json | jq .spec)
selecting docker as container runtime
VM demo-cluster was scheduled to start
selecting docker as container runtime
virtualmachine.kubevirt.io/demo-cluster condition met
selecting docker as container runtime
selecting docker as container runtime
--- /dev/fd/63 2022-08-03 13:37:04.897273573 +0100
+++ /dev/fd/62 2022-08-03 13:37:04.897273573 +0100
@@ -1,15 +1,65 @@
 {
"domain": {
- "devices": {},
+ "cpu": {+ "cores": 1,+ "model": "host-model",+ "sockets": 2,+ "threads": 1+ },+ "devices": {+ "disks": [+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "containerdisk"+ },+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "cloudinitdisk"+ }+ ],+ "interfaces": [+ {+ "bridge": {},+ "model": "virtio",+ "name": "default"+ }+ ]+ },+ "features": {+ "acpi": {+ "enabled": true+ }+ },+ "firmware": {+ "uuid": "05fa1ec0-3e45-581d-84e2-36ddc6b50633"+ },"machine": {
"type": "q35" },
- "resources": {}
+ "memory": {+ "guest": "128Mi"+ },+ "resources": {+ "requests": {+ "memory": "128Mi"+ }+ } },
+ "networks": [+ {+ "name": "default",+ "pod": {}+ }+ ],"volumes": [
{
"containerDisk": {
- "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"+ "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",+ "imagePullPolicy": "IfNotPresent" },
"name": "containerdisk" },
# ## Demo #3 Instancetypes vs Preferences vs VirtualMachine
# ## Users cannot overwrite anything set by an instancetype in their VirtualMachine, for example CPU topologies
cat <<EOF | ./cluster-up/kubectl.sh apply -f -
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-instancetype-conflict
spec:
  instancetype:
    kind: VirtualMachineInstancetype
    name: small
  preference:
    kind: VirtualMachinePreference
    name: cirros
  running: false
  template:
    spec:
      domain:
        cpu:
          threads: 1
          cores: 3
          sockets: 1
        devices: {}
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            echo 'printed from cloud-init userdata'
        name: cloudinitdisk
EOF
selecting docker as container runtime
The request is invalid: spec.template.spec.domain.cpu: VM field conflicts with selected Instancetype
# ## Users can however overwrite anything set by a preference in their VirtualMachine, for example disk buses etc.
cat <<EOF | ./cluster-up/kubectl.sh apply -f -
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-instancetype-user-preference
spec:
  instancetype:
    kind: VirtualMachineInstancetype
    name: small
  preference:
    kind: VirtualMachinePreference
    name: cirros
  running: false
  template:
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: sata
            name: containerdisk
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            echo 'printed from cloud-init userdata'
        name: cloudinitdisk
EOF
selecting docker as container runtime
virtualmachine.kubevirt.io/demo-instancetype-user-preference created
# ./cluster-up/virtctl.sh start demo-instancetype-user-preference && ./cluster-up/kubectl.sh wait vms/demo-instancetype-user-preference --for=condition=Ready
diff --color -u <( ./cluster-up/kubectl.sh get vms/demo-instancetype-user-preference -o json | jq .spec.template.spec) <( ./cluster-up/kubectl.sh get vmis/demo-instancetype-user-preference -o json | jq .spec)
selecting docker as container runtime
VM demo-instancetype-user-preference was scheduled to start
selecting docker as container runtime
virtualmachine.kubevirt.io/demo-instancetype-user-preference condition met
selecting docker as container runtime
selecting docker as container runtime
--- /dev/fd/63 2022-08-03 13:37:38.099537528 +0100
+++ /dev/fd/62 2022-08-03 13:37:38.099537528 +0100
@@ -1,5 +1,11 @@
 {
"domain": {
+ "cpu": {+ "cores": 1,+ "model": "host-model",+ "sockets": 2,+ "threads": 1+ },"devices": {
"disks": [
{
@@ -7,18+13,53 @@"bus": "sata" },
"name": "containerdisk"+ },+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "cloudinitdisk"+ }+ ],+ "interfaces": [+ {+ "bridge": {},+ "model": "virtio",+ "name": "default" }
]
},
+ "features": {+ "acpi": {+ "enabled": true+ }+ },+ "firmware": {+ "uuid": "195ea4a3-8505-5368-b068-9536257886ea"+ },"machine": {
"type": "q35" },
- "resources": {}
+ "memory": {+ "guest": "128Mi"+ },+ "resources": {+ "requests": {+ "memory": "128Mi"+ }+ } },
+ "networks": [+ {+ "name": "default",+ "pod": {}+ }+ ],"volumes": [
{
"containerDisk": {
- "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"+ "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",+ "imagePullPolicy": "IfNotPresent" },
"name": "containerdisk" },
# ## Demo #4 Versioning
# ## We have versioning of instancetypes and preferences, note that the InstancetypeMatcher and PreferenceMatcher now have a populated revisionName field
./cluster-up/kubectl.sh get vms/demo -o json | jq '.spec.instancetype, .spec.preference'
selecting docker as container runtime
{
"kind": "VirtualMachineInstancetype",
"name": "small",
"revisionName": "demo-small-4a28a2f3-fd34-421a-98d8-a2659f9a8eb7-1"}
{
"kind": "VirtualMachinePreference",
"name": "cirros",
"revisionName": "demo-cirros-d08c3914-7d2b-43b4-a295-9cd3687bf151-1"}
# ## These are the names of ControllerRevisions containing a copy of the VirtualMachine{Instancetype,Preference}Spec at the time of application
./cluster-up/kubectl.sh get controllerrevisions/$( ./cluster-up/kubectl.sh get vms/demo -o json | jq .spec.instancetype.revisionName | tr -d '"') -o json | jq .
./cluster-up/kubectl.sh get controllerrevisions/$( ./cluster-up/kubectl.sh get vms/demo -o json | jq .spec.preference.revisionName | tr -d '"') -o json | jq .
selecting docker as container runtime
selecting docker as container runtime
{
"apiVersion": "apps/v1",
"data": {
"apiVersion": "",
"spec": "eyJjcHUiOnsiZ3Vlc3QiOjJ9LCJtZW1vcnkiOnsiZ3Vlc3QiOiIxMjhNaSJ9fQ==" },
"kind": "ControllerRevision",
"metadata": {
"creationTimestamp": "2022-08-03T12:36:20Z",
"name": "demo-small-4a28a2f3-fd34-421a-98d8-a2659f9a8eb7-1",
"namespace": "default",
"ownerReferences": [
{
"apiVersion": "kubevirt.io/v1",
"blockOwnerDeletion": true,
"controller": true,
"kind": "VirtualMachine",
"name": "demo",
"uid": "e67ad6ba-7792-40ab-9cd2-a411b6161971" }
],
"resourceVersion": "53965",
"uid": "3f20c656-ea33-45d1-9195-fb3c4f7085b9" },
"revision": 0}
selecting docker as container runtimeselecting docker as container runtime{
"apiVersion": "apps/v1",
"data": {
"apiVersion": "",
"spec": "eyJkZXZpY2VzIjp7InByZWZlcnJlZERpc2tCdXMiOiJ2aXJ0aW8iLCJwcmVmZXJyZWRJbnRlcmZhY2VNb2RlbCI6InZpcnRpbyJ9fQ==" },
"kind": "ControllerRevision",
"metadata": {
"creationTimestamp": "2022-08-03T12:36:20Z",
"name": "demo-cirros-d08c3914-7d2b-43b4-a295-9cd3687bf151-1",
"namespace": "default",
"ownerReferences": [
{
"apiVersion": "kubevirt.io/v1",
"blockOwnerDeletion": true,
"controller": true,
"kind": "VirtualMachine",
"name": "demo",
"uid": "e67ad6ba-7792-40ab-9cd2-a411b6161971" }
],
"resourceVersion": "53966",
"uid": "dc47f75f-b548-41fd-b0db-8af4b458994b" },
"revision": 0}
# # With versioning we can update the VirtualMachineInstancetype, create a new VirtualMachine to assert the changes and then check that our original VirtualMachine hasn't changedcat <<EOF | ./cluster-up/kubectl.sh apply -f ----
apiVersion: instancetype.kubevirt.io/v1alpha1kind: VirtualMachineInstancetypemetadata:
name: smallspec:
cpu:
guest: 3memory:
guest: 256Mi---
apiVersion: instancetype.kubevirt.io/v1alpha1kind: VirtualMachinePreferencemetadata:
name: cirrosspec:
cpu:
preferredCPUTopology: preferCoresdevices:
preferredDiskBus: virtiopreferredInterfaceModel: virtio---
apiVersion: kubevirt.io/v1kind: VirtualMachinemetadata:
name: demo-updatedspec:
instancetype:
kind: VirtualMachineInstancetypename: smallpreference:
kind: VirtualMachinePreferencename: cirrosrunning: falsetemplate:
spec:
domain:
devices: {}
volumes:
- containerDisk:
image: registry:5000/kubevirt/cirros-container-disk-demo:develname: containerdisk - cloudInitNoCloud:
userData: | #!/bin/sh
echo 'printed from cloud-init userdata'name: cloudinitdiskEOFselecting docker as container runtimevirtualmachineinstancetype.instancetype.kubevirt.io/small configuredvirtualmachinepreference.instancetype.kubevirt.io/cirros configuredvirtualmachine.kubevirt.io/demo-updated created# ## Now start the updated VirtualMachine./cluster-up/virtctl.sh start demo-updated && ./cluster-up/kubectl.sh wait vms/demo-updated --for=condition=Readyselecting docker as container runtimeVM demo-updated was scheduled to startselecting docker as container runtimevirtualmachine.kubevirt.io/demo-updated condition met# ## We now see the updated instancetype used by the new VirtualMachine and applied to the VirtualMachineInstancediff --color -u <( ./cluster-up/kubectl.sh get vms/demo-updated -o json | jq .spec.template.spec) <( ./cluster-up/kubectl.sh get vmis/demo-updated -o json | jq .spec)selecting docker as container runtimeselecting docker as container runtime--- /dev/fd/63 2022-08-03 13:38:37.203007409 +0100+++ /dev/fd/62 2022-08-03 13:38:37.204007417 +0100@@ -1,15+1,65 @@ {
"domain": {
- "devices": {},
+ "cpu": {+ "cores": 3,+ "model": "host-model",+ "sockets": 1,+ "threads": 1+ },+ "devices": {+ "disks": [+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "containerdisk"+ },+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "cloudinitdisk"+ }+ ],+ "interfaces": [+ {+ "bridge": {},+ "model": "virtio",+ "name": "default"+ }+ ]+ },+ "features": {+ "acpi": {+ "enabled": true+ }+ },+ "firmware": {+ "uuid": "937dc645-17f0-599b-be81-c1e9dbde8075"+ },"machine": {
"type": "q35" },
- "resources": {}
+ "memory": {+ "guest": "256Mi"+ },+ "resources": {+ "requests": {+ "memory": "256Mi"+ }+ } },
+ "networks": [+ {+ "name": "default",+ "pod": {}+ }+ ],"volumes": [
{
"containerDisk": {
- "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"+ "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",+ "imagePullPolicy": "IfNotPresent" },
"name": "containerdisk" },
# ## With new ControllerRevisions referenced from the underlying VirtualMachine
./cluster-up/kubectl.sh get vms/demo-updated -o json | jq '.spec.instancetype, .spec.preference'
selecting docker as container runtime
{
"kind": "VirtualMachineInstancetype",
"name": "small",
"revisionName": "demo-updated-small-4a28a2f3-fd34-421a-98d8-a2659f9a8eb7-2"}
{
"kind": "VirtualMachinePreference",
"name": "cirros",
"revisionName": "demo-updated-cirros-d08c3914-7d2b-43b4-a295-9cd3687bf151-2"}
# ## We can also stop and start the original VirtualMachine without changing the VirtualMachineInstance it spawns
./cluster-up/virtctl.sh stop demo && ./cluster-up/kubectl.sh wait vms/demo --for=condition=Ready=false
./cluster-up/virtctl.sh start demo && ./cluster-up/kubectl.sh wait vms/demo --for=condition=Ready
diff --color -u <( ./cluster-up/kubectl.sh get vms/demo -o json | jq .spec.template.spec) <( ./cluster-up/kubectl.sh get vmis/demo -o json | jq .spec)
selecting docker as container runtime
VM demo was scheduled to stop
selecting docker as container runtime
virtualmachine.kubevirt.io/demo condition met
selecting docker as container runtime
Error starting VirtualMachine Operation cannot be fulfilled on virtualmachine.kubevirt.io "demo": VM is already running
selecting docker as container runtime
selecting docker as container runtime
--- /dev/fd/63 2022-08-03 13:38:51.291119408 +0100
+++ /dev/fd/62 2022-08-03 13:38:51.291119408 +0100
@@ -1,15 +1,65 @@
 {
"domain": {
- "devices": {},
+ "cpu": {+ "cores": 1,+ "model": "host-model",+ "sockets": 2,+ "threads": 1+ },+ "devices": {+ "disks": [+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "containerdisk"+ },+ {+ "disk": {+ "bus": "virtio"+ },+ "name": "cloudinitdisk"+ }+ ],+ "interfaces": [+ {+ "bridge": {},+ "model": "virtio",+ "name": "default"+ }+ ]+ },+ "features": {+ "acpi": {+ "enabled": true+ }+ },+ "firmware": {+ "uuid": "c89d1344-ee03-5c55-99bd-5df16b72bea0"+ },"machine": {
"type": "q35" },
- "resources": {}
+ "memory": {+ "guest": "128Mi"+ },+ "resources": {+ "requests": {+ "memory": "128Mi"+ }+ } },
+ "networks": [+ {+ "name": "default",+ "pod": {}+ }+ ],"volumes": [
{
"containerDisk": {
- "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"+ "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",+ "imagePullPolicy": "IfNotPresent" },
"name": "containerdisk" },
# ## The ControllerRevisions are owned by the VirtualMachines, as such removal of the VirtualMachines now removes the ControllerRevisions
./cluster-up/kubectl.sh get controllerrevisions
./cluster-up/kubectl.sh delete vms/demo vms/demo-updated vms/demo-cluster vms/demo-instancetype-user-preference
./cluster-up/kubectl.sh get controllerrevisions
selecting docker as container runtime
NAME                                                                              CONTROLLER                                                      REVISION   AGE
demo-cirros-d08c3914-7d2b-43b4-a295-9cd3687bf151-1                                virtualmachine.kubevirt.io/demo                                 0          2m51s
demo-cluster-cirros-cluster-1562ae69-8a4b-4a75-8507-e0da5041c5d2-1                virtualmachine.kubevirt.io/demo-cluster                         0          2m10s
demo-cluster-small-cluster-20c0a541-e24f-47c1-a1d7-1151e981a69c-1                 virtualmachine.kubevirt.io/demo-cluster                         0          2m10s
demo-instancetype-user-preference-cirros-d08c3914-7d2b-43b4-a295-9cd3687bf151-1   virtualmachine.kubevirt.io/demo-instancetype-user-preference   0          98s
demo-instancetype-user-preference-small-4a28a2f3-fd34-421a-98d8-a2659f9a8eb7-1    virtualmachine.kubevirt.io/demo-instancetype-user-preference   0          98s
demo-small-4a28a2f3-fd34-421a-98d8-a2659f9a8eb7-1                                 virtualmachine.kubevirt.io/demo                                 0          2m51s
demo-updated-cirros-d08c3914-7d2b-43b4-a295-9cd3687bf151-2                        virtualmachine.kubevirt.io/demo-updated                         0          41s
demo-updated-small-4a28a2f3-fd34-421a-98d8-a2659f9a8eb7-2                         virtualmachine.kubevirt.io/demo-updated                         0          41s
local-volume-provisioner-55dcc65dc7                                               daemonset.apps/local-volume-provisioner                         1          3h32m
revision-start-vm-5786044a-c20b-41a4-bba3-c7744c624935-2                          virtualmachine.kubevirt.io/demo-instancetype-user-preference   2          98s
revision-start-vm-a334ac37-aed4-4b98-b8b9-af819f54ffda-2                          virtualmachine.kubevirt.io/demo-cluster                         2          2m10s
revision-start-vm-e67ad6ba-7792-40ab-9cd2-a411b6161971-2                          virtualmachine.kubevirt.io/demo                                 2          2m51s
revision-start-vm-f19cf23d-0ad6-438b-a166-20879a704fa9-2                          virtualmachine.kubevirt.io/demo-updated                         2          41s
selecting docker as container runtime
virtualmachine.kubevirt.io "demo" deleted
virtualmachine.kubevirt.io "demo-updated" deleted
virtualmachine.kubevirt.io "demo-cluster" deleted
virtualmachine.kubevirt.io "demo-instancetype-user-preference" deleted
selecting docker as container runtime
NAME                                  CONTROLLER                                REVISION   AGE
local-volume-provisioner-55dcc65dc7   daemonset.apps/local-volume-provisioner   1          3h32m
Welcome to part #2 of this series following the development of instancetypes and preferences within KubeVirt! Please note this is just a development journal of sorts; more formal documentation introducing and describing instancetypes will be forthcoming in the near future!
Versioning through ControllerRevisions has been introduced. As previously discussed, the underlying VirtualMachineInstancetypeSpec or VirtualMachinePreferenceSpec is stored by the VirtualMachine controller in a ControllerRevision unique to the VirtualMachine being started. A reference to the ControllerRevision is then added to the VirtualMachine for future lookups, with the VirtualMachine also referenced as an owner of these ControllerRevisions, ensuring their removal when the VirtualMachine is deleted.
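The result is visible on the VirtualMachine itself: once started, its matchers gain a revisionName pointing at the stashed ControllerRevision, roughly as in this trimmed sketch following the naming pattern seen in the demo above:

spec:
  instancetype:
    kind: VirtualMachineInstancetype
    name: small
    revisionName: demo-small-<uid>-1
  preference:
    kind: VirtualMachinePreference
    name: cirros
    revisionName: demo-cirros-<uid>-1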
This PR will introduce new VirtualMachine subresource APIs to expand a referenced instancetype or set of preferences for an existing VirtualMachine or one provided by the caller.
Hopefully these APIs will be useful to users and fellow KubeVirt/OpenShift devs who want to validate or just present a fully rendered version of their VirtualMachine in some way.
It’s worth noting that during the development of this feature we encountered some interesting OpenAPI behaviour that took a while to debug and fix.
AutoattachInputDevice and PreferredAutoattachInputDevice
While working on a possible future migration of the common-templates project to using VirtualMachineInstancetypes and VirtualMachinePreferences it was noted that we had no way of automatically attaching an input device to a VirtualMachine.
This change introduces both an AutoattachInputDevice attribute to control this in vanilla VirtualMachines and a PreferredAutoattachInputDevice preference to control the behaviour from within a set of preferences.
The PR includes a simple rework of the application of DevicePreferences, applying them before any of the Autoattach logic fires within the VirtualMachine controller. This allows the PreferredAutoattach preferences to control the Autoattach logic, while the original application of preferences after this logic has fired ensures any remaining preferences are also applied to any new devices.
VirtualMachineInstancePreset deprecation in favor of VirtualMachineInstancetype
This proposal still has to be raised formally with the community but as set out in the PR I’d like to start the deprecation cycle of VirtualMachineInstancePreset now as VirtualMachineInstancetype starts to mature as a replacement.
The end goal for this work has been to make the entire DomainSpec within VirtualMachineInstanceSpec optional, hopefully simplifying our VirtualMachine definitions further when used in conjunction with instancetypes and preferences.
The bug above covers a race with the current versioning implementation. This race allows a user to delete a referenced instancetype before the VirtualMachine referencing it has started and stashed a copy of the instancetype in a ControllerRevision. For example:
# ./cluster-up/kubectl.sh apply -f examples/csmall.yaml
virtualmachineinstancetype.instancetype.kubevirt.io/csmall created
# ./cluster-up/kubectl.sh apply -f examples/vm-cirros-csmall.yaml
virtualmachine.kubevirt.io/vm-cirros-csmall created
# ./cluster-up/kubectl.sh delete virtualmachineinstancetype/csmall
virtualmachineinstancetype.instancetype.kubevirt.io "csmall" deleted
# ./cluster-up/virtctl.sh start vm-cirros-csmall
Error starting VirtualMachine Internal error occurred: admission webhook "virtualmachine-validator.kubevirt.io" denied the request: Failure to find instancetype: virtualmachineinstancetypes.instancetype.kubevirt.io "csmall" not found
I believe we need one or more finalizers here ensuring that referenced instancetypes and preferences are not removed before they are stashed in a ControllerRevision.
An alternative to this would be to create ControllerRevisions within the VirtualMachine admission webhooks earlier in the lifecycle of a VirtualMachine. I had tried this originally but failed to successfully Patch the VirtualMachine with a reference back to the ControllerRevision, often seeing failures with the VirtualMachine controller attempting to reconcile the changes.
Moving the API to v1beta1
With the rename now complete and the future direction hopefully set out above, I believe now is a good time to start looking into the graduation of the API itself from the experimental v1alpha1 stage to something more stable.
The software is well tested. Enabling a feature is considered safe. Features are enabled by default.
The support for a feature will not be dropped, though the details may change.
The schema and/or semantics of objects may change in incompatible ways in a subsequent beta or stable release. When this happens, migration instructions are provided. Schema changes may require deleting, editing, and re-creating API objects. The editing process may not be straightforward. The migration may require downtime for applications that rely on the feature.
The software is not recommended for production uses. Subsequent releases may introduce incompatible changes. If you have multiple clusters which can be upgraded independently, you may be able to relax this restriction.
I believe the instancetype API can meet these criteria in the near future, if it doesn’t already, and so I will be looking to start the process soon.
User-guide documentation
With the rename complete I have finally started drafting some upstream user-guide documentation that I hope to post in a PR soon.
Introductory kubevirt.io blog post(s)
Following on from the user-guide documentation I also plan on writing and publishing some material introducing instancetypes and preferences on the kubevirt.io blog.
Much has changed since my last post introducing the VirtualMachine{Flavor,Preference} KubeVirt CRDs. In this post I’m going to touch on some of this, what’s coming next, and provide a quick demo at the end.
What’s new
Introduction of VirtualMachine{ClusterPreference,Preference}
The two main PRs referenced by my previous post have landed, refactoring the initial code and introducing the VirtualMachine{ClusterPreference,Preference} CRDs to KubeVirt.
PreferredCPUTopology now defaults to PreferSockets
This was a trivial change, as it was something the VirtualMachineInstance mutation webhook already defaults to if no topology is provided but a number of vCPUs are defined through resource requests.
func (mutator *VMIsMutator) setDefaultGuestCPUTopology(vmi *v1.VirtualMachineInstance) {
	cores := uint32(1)
	threads := uint32(1)
	sockets := uint32(1)
	vmiCPU := vmi.Spec.Domain.CPU
	if vmiCPU == nil || (vmiCPU.Cores == 0 && vmiCPU.Sockets == 0 && vmiCPU.Threads == 0) {
		// create cpu topology struct
		if vmi.Spec.Domain.CPU == nil {
			vmi.Spec.Domain.CPU = &v1.CPU{}
		}
		// if cores, sockets, threads are not set, take value from domain resources request or limits and
		// set value into sockets, which have best performance (https://bugzilla.redhat.com/show_bug.cgi?id=1653453)
		resources := vmi.Spec.Domain.Resources
		if cpuLimit, ok := resources.Limits[k8sv1.ResourceCPU]; ok {
			sockets = uint32(cpuLimit.Value())
		} else if cpuRequests, ok := resources.Requests[k8sv1.ResourceCPU]; ok {
			sockets = uint32(cpuRequests.Value())
		}
		vmi.Spec.Domain.CPU.Sockets = sockets
		vmi.Spec.Domain.CPU.Cores = cores
		vmi.Spec.Domain.CPU.Threads = threads
	}
}
Lots of work went into this PR, but ultimately the use cases around direct use of the VirtualMachineInstance CRD by end users aren’t strong enough to justify the extra complexity introduced by it.
With the application of a flavor and preference no longer moving to the VirtualMachineInstance mutation webhook, we now had to ensure that all devices would be present by the time the existing application happened within the VirtualMachine controller.
The above change moves and shares code from the VirtualMachineInstance mutation webhook that adds any missing Disks for listed Volumes and also adds a default Network and associated Interface if none are provided. This ensures that any preferences applied by the VirtualMachine controller to the VirtualMachineInstance object are also applied to these devices.
For example, given a VirtualMachinePreference that defines a preferredDiskBus and preferredInterfaceModel of virtio, such as the sketch below, a VirtualMachine that doesn’t list any Disks or Interfaces will now have these preferences applied to the devices added during the creation of the VirtualMachineInstance, with these devices now being introduced by the VirtualMachine controller itself instead of the VirtualMachineInstance mutation webhook.
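A sketch of such a preference and a VirtualMachine that lists no Disks or Interfaces; the names are illustrative and not taken from the original post:

apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: virtio
spec:
  devices:
    preferredDiskBus: virtio
    preferredInterfaceModel: virtio
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: no-devices-listed
spec:
  preference:
    kind: VirtualMachinePreference
    name: virtio
  running: false
  template:
    spec:
      domain:
        devices: {}
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk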
After this I started to notice the occasional CI failure, both upstream and downstream, that appeared to match the suggested symptoms. Either a recently created VirtualMachine{Flavor,Preference} object would not be seen by another worker, or the generation of the object seen by the worker would be older than expected, leading to failures.
As such, we decided to remove the use of SharedInformers for flavors and preferences, reverting back to straight client calls for retrieval instead. The impact of this change hasn’t been fully measured yet but is on our radar for the coming weeks to ensure performance isn’t impacted.
This is another large foundational PR making the entire concept usable by users in the real world. The current design is for ControllerRevisions containing the VirtualMachineFlavorSpec and VirtualMachinePreferenceSpec to be created after the initial application of a flavor and preference to a VirtualMachineInstance by the VirtualMachine controller at start time. A reference to these ControllerRevisions is then patched into the FlavorMatcher and PreferenceMatcher associated with the VirtualMachine and used to gather the specs in the future, hopefully ensuring future restarts will continue to produce the same VirtualMachineInstance object.
I was not involved in the initial design and naming of the CRDs, but after coming onboard it was quickly highlighted that while OpenStack uses Flavors, all other public cloud providers use some form of Type object to contain their resource and performance characteristics. With that in mind we have agreed to rename the CRDs for KubeVirt.
VirtualMachineFlavor and VirtualMachineClusterFlavor will become VirtualMachineInstancetype and VirtualMachineClusterInstancetype.
This aligns us with the public cloud providers while making it clear that these CRDs relate to the VirtualMachineInstance. We couldn’t shorten this to VirtualMachineType anyway as that could easily clash with the MachineType term from QEMU that we already expose as part of our API.
VirtualMachinePreference and VirtualMachineClusterPreference will also become VirtualMachineInstancePreference and VirtualMachineInstanceClusterPreference.
How and when this happens is still up in the air but the current suggestion is that these new CRDs will live alongside the existing CRDs while we deprecate and eventually remove them from the project.
Design document updates
Between versioning and renaming there’s lots of change listed above and I do want to get back to the design document before this all lands.
The demo itself introduces the current CRDs, their basic behaviour, their interaction with default devices and finally their behaviour with the above versioning PR applied.
I was time limited in the downstream presentation I gave using this recording so please be aware it moves pretty quickly between topics. I’d highly recommend downloading the file and using asciinema to play it locally along with the spacebar to pause between commands.
A common pattern for IaaS is to have abstractions separating the resource sizing and performance of a workload from the user-defined values related to launching their custom application. This pattern is evident across all the major cloud providers (also known as hyperscalers) as well as open source IaaS projects like OpenStack. AWS has instance types, GCP has machine types, Azure has VM sizes and OpenStack has flavors.
Let’s take AWS as an example to help visualize what this abstraction enables. Launching an EC2 instance only requires a few top-level arguments: the disk image, instance type, keypair, security group, and subnet.
When creating the EC2 instance the user doesn’t define the amount of resources, what processor to use, how to optimize the performance of the instance, or what hardware to schedule the instance on. Instead all of that information is wrapped up in that single --instance-type c4.xlarge CLI argument, c4 denoting a specific performance profile version, in this case from the Compute Optimized family, and xlarge denoting a specific amount of compute resources provided by the instance type, in this case 4 vCPUs, 7.5 GiB of RAM, 750 Mbps EBS bandwidth, etc.
While hyperscalers can provide predefined types with performance profiles and compute resources already assigned, IaaS and virtualization projects such as OpenStack and KubeVirt can only provide the raw abstractions for operators, admins and even vendors to then create instances of these abstractions specific to each deployment.
KubeVirt’s VirtualMachine API contains many advanced options for tuning virtual machine performance that go beyond what typical users need to be aware of. Users are unable to simply define the storage/network they want assigned to their VM and then declare in broad terms what quality of resources and kind of performance they need for their VM.
Instead, the user has to be keenly aware of how to request specific compute resources alongside all of the performance tunings available on the VirtualMachine API and how those tunings impact their guest’s operating system in order to get a desired result.
This approach has a few pitfalls such as using embedded profiles within the CRDs, relying on the user to select the correct Flavor or VirtualMachineFlavorProfile that will allow their workload to run correctly, not allowing a user to override some viable attributes at runtime etc.
VirtualMachineFlavor refactor
As suggested in the title of this blog post, the ultimate goal of the Design Proposal is to provide the end user with a simple set of choices when defining a VirtualMachine within KubeVirt. We want to limit this to a flavor, optional set of preferences, volumes for storage and networks for connectivity.
To achieve this the existing VirtualMachineFlavor CRDs will be heavily modified and extended to better encapsulate resource, performance or schedulable attributes of a VM.
This will include the removal of the embedded VirtualMachineFlavorProfile type within the CRDs; it will be replaced with a singular VirtualMachineFlavorSpec type per flavor. The decision to remove VirtualMachineFlavorProfile has been made as the concept isn’t prevalent within the wider Kubernetes ecosystem and could be confusing to end users. Instead, users looking to avoid duplication when defining flavors will be directed to use tools such as kustomize to generate their flavors, for example the hypothetical overlay sketched below. This tooling is already commonly used when defining resources within Kubernetes and should afford users plenty of flexibility when defining their flavors either statically or as part of a larger GitOps based workflow.
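For example, a deployment could keep a small base flavor manifest and stamp out size variants with an overlay; the following kustomization is purely hypothetical and not taken from any existing repo:

# kustomization.yaml - a hypothetical overlay, not taken from any existing repo
resources:
- flavor-base.yaml   # a hypothetical VirtualMachineFlavor with a small CPU/memory allocation
nameSuffix: -large
patches:
- target:
    kind: VirtualMachineFlavor
  patch: |-
    - op: replace
      path: /spec/cpu/guest
      value: 8
    - op: replace
      path: /spec/memory/guest
      value: 16Gi

Building such an overlay with kustomize and applying the result would then produce the size-specific flavor.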
VirtualMachineFlavorSpec will also include elements of CPU, Devices, HostDevices, GPUs, Memory and LaunchSecurity defined fully below. Users will be unable to override any aspect of the flavor (for example, vCPU count or amount of Memory) within the VirtualMachine itself; any attempt to do so will result in the VirtualMachine being rejected.
Introduction of VirtualMachinePreference
A new set of VirtualMachinePreference CRDs will then be introduced to define any remaining attributes related to ensuring the selected guest OS can run. As the name suggests, the VirtualMachinePreference CRDs will only define preferences, so, unlike a flavor, if a preference conflicts with something the user has defined within the VirtualMachine it will be ignored. For example, if a user selects a VirtualMachinePreference that requests a preferredDiskBus of virtio but then sets a disk bus of sata for one or more disk devices within the VirtualMachine, the supplied preferredDiskBus preference will not be applied to those disks. Any remaining disks that do not have a disk bus defined will however use the preferredDiskBus preference of virtio.
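A sketch of that situation: with the preference below applied, the first disk keeps its user-defined sata bus while the second disk, which defines no bus, picks up virtio. The names are illustrative:

apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: virtio-preferred
spec:
  devices:
    preferredDiskBus: virtio
---
# Fragment of the VirtualMachine template the preference is applied to
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: sata    # user-defined, so the preference is ignored for this disk
        name: containerdisk
      - disk: {}       # no bus set, so preferredDiskBus virtio is applied
        name: cloudinitdisk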
Versioning of these CRDs is key to ensure VirtualMachine and VirtualMachineInstance remain unchanged even with modifications to an associated Flavor or Preference.
This is currently missing from the Design Proposal but is being worked on and will be incorporated shortly.
What else?
The current Design Proposal does list some useful ideas as non-goals for the initial implementation, these include:
Introspection of imported images to determine the correct guest OS related VirtualMachinePreferences to apply.
Using image labels to determine the correct guest OS related VirtualMachinePreferences to apply.
I’ve created an example repo (many thanks to @fabiand for starting this) using kustomize to generate various classes and sizes of flavors alongside preferences.
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh apply -f ../vmdefs/example.yaml
selecting docker as container runtime
virtualmachineflavor.flavor.kubevirt.io/c.large created
virtualmachineflavor.flavor.kubevirt.io/c.medium created
virtualmachineflavor.flavor.kubevirt.io/c.small created
virtualmachineflavor.flavor.kubevirt.io/c.xlarge created
virtualmachineflavor.flavor.kubevirt.io/c.xsmall created
virtualmachineflavor.flavor.kubevirt.io/g.medium created
virtualmachineflavor.flavor.kubevirt.io/g.xlarge created
virtualmachineflavor.flavor.kubevirt.io/g.xsmall created
virtualmachineflavor.flavor.kubevirt.io/m.large created
virtualmachineflavor.flavor.kubevirt.io/m.medium created
virtualmachineflavor.flavor.kubevirt.io/m.small created
virtualmachineflavor.flavor.kubevirt.io/m.xlarge created
virtualmachineflavor.flavor.kubevirt.io/m.xsmall created
virtualmachineflavor.flavor.kubevirt.io/r.large created
virtualmachineflavor.flavor.kubevirt.io/r.medium created
virtualmachineflavor.flavor.kubevirt.io/r.xlarge created
virtualmachineflavor.flavor.kubevirt.io/r.xsmall created
virtualmachinepreference.flavor.kubevirt.io/linux.cirros created
virtualmachinepreference.flavor.kubevirt.io/linux.fedora created
virtualmachinepreference.flavor.kubevirt.io/linux.rhel9 created
virtualmachinepreference.flavor.kubevirt.io/windows.windows10 created
$ cat ../vmdefs/example.yaml
[...]
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachineFlavor
metadata:
  name: m.xsmall
spec:
  cpu:
    guest: 1
  memory:
    guest: 512M
[...]
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: linux.cirros
spec:
  devices:
    preferredCdromBus: virtio
    preferredDiskBus: virtio
    preferredRng: {}
[...]
$ cat ../vmdefs/cirros.yaml
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: cirros
  name: cirros
spec:
  flavor:
    name: m.xsmall
    kind: VirtualMachineFlavor
  preference:
    name: linux.cirros
    kind: VirtualMachinePreference
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: cirros
    spec:
      domain:
        devices:
          disks:
          - disk:
            name: containerdisk
          - disk:
            name: cloudinitdisk
        resources: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/cirros-container-disk-demo:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            echo 'printed from cloud-init userdata'
        name: cloudinitdisk
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh apply -f ../vmdefs/cirros.yaml
selecting docker as container runtime
virtualmachine.kubevirt.io/cirros created
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/virtctl.sh start cirros
selecting docker as container runtime
VM cirros was scheduled to start
$ KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vmis
selecting docker as container runtime
NAME AGE PHASE IP NODENAME READY
cirros 9s Running 10.244.196.134 node01 True
$ diff <(KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vms/cirros -o json | jq --sort-keys .spec.template.spec) <(KUBEVIRT_PROVIDER=k8s-1.23 ./cluster-up/kubectl.sh get vmis/cirros -o json | jq --sort-keys .spec)
selecting docker as container runtime
selecting docker as container runtime
2a3,8
> "cpu": {> "cores": 1,
> "model": "host-model",
> "sockets": 1,
> "threads": 1> },
5a12,14
> "disk": {> "bus": "virtio"> },
8a18,20
> "disk": {> "bus": "virtio"> },
11c23,38
< ]---
> ],
> "interfaces": [> {> "bridge": {},
> "name": "default"> }> ],
> "rng": {}> },
> "features": {> "acpi": {> "enabled": true
> }> },
> "firmware": {> "uuid": "6784d43b-39fb-5ee7-8c17-ef10c49af985"16c43,50
< "resources": {}---
> "memory": {> "guest": "512M"> },
> "resources": {> "requests": {> "memory": "512M"> }> }17a52,57
> "networks": [> {> "name": "default",
> "pod": {}> }> ],
22c62,63
< "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel"---
> "image": "registry:5000/kubevirt/cirros-container-disk-demo:devel",
> "imagePullPolicy": "IfNotPresent"
Windows
Below is a basic example taken from the Design Proposal that defines a single VirtualMachineFlavor and VirtualMachinePreference to simplify the creation of Windows based VirtualMachine and later once started a VirtualMachineInstance:
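The original manifests aren't reproduced here, but the shape of the example was roughly the following sketch, pairing a flavor that fixes the resources with a preference carrying Windows-friendly device defaults; the field values and the referenced PVC are illustrative, not the exact proposal content:

apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachineFlavor
metadata:
  name: windows.medium
spec:
  cpu:
    guest: 4
  memory:
    guest: 8Gi
---
apiVersion: flavor.kubevirt.io/v1alpha1
kind: VirtualMachinePreference
metadata:
  name: windows.windows10
spec:
  devices:
    preferredDiskBus: sata
    preferredInterfaceModel: e1000
    preferredInputBus: usb
    preferredInputType: tablet
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: windows10
spec:
  flavor:
    name: windows.medium
    kind: VirtualMachineFlavor
  preference:
    name: windows.windows10
    kind: VirtualMachinePreference
  running: false
  template:
    spec:
      domain:
        devices: {}
      volumes:
      - persistentVolumeClaim:
          claimName: windows10-disk   # hypothetical pre-existing root disk
        name: rootdisk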
Randomly this week I’ve found myself needing to share a local build of
Cirros with a simple but unreleased
fix
included to unblock some testing upstream in OpenStack Nova’s
CI.
I’ll admit that it has been a while since I’ve had to personally run any kind
of web service (this blog and versions before it being hosted on Gitlab and
Github pages) and so I didn’t really know where to start.
I knew that I wanted something that was both containerised and able to
automatically sort out letsencrypt certs so I could
serve the build over https. After trying and failing to get anywhere with
the default httpd and nginx container images I gave up and asked around
on Twitter.
Shortly after, a friend recommended Caddy, an Apache 2.0 licensed web server
written in Go with automatic https configuration, exactly what I
was after.
As I said before I wanted a containerised web service to run on my Fedora based
VPS, ensuring I didn’t end up with a full service running directly on the host
that would be a pain to remove later. Thankfully Caddy has an official image on
DockerHub I could pull and use
with Podman, the daemonless container engine and Docker
replacement on Fedora.
Now it is possible to run podman containers under a non-root user, but as I
wanted to run the service on ports 80 and 443 I had to launch the
container using root. In theory you could adjust your host config to allow
non-root users to create ports below 1024 via
net.ipv4.ip_unprivileged_port_start=$start_port or just use higher ports
to avoid this, but I don’t mind running the service as root given the files are
being shared via a read-only volume anyway.
As we used the --browse flag when launching Caddy, you should now find a
simple index page listing your files.
That’s it, simple, fast and hopefully easy to cleanup when I no longer need to
share these files. Maybe I should ask for suggestions on Twitter more often!
Block Device Mappings (BDMs) define how block devices are exposed to an
instance by Nova. At present Nova accepts and stores data relating to these
mappings in various ways across the codebase, often leading to the five stages
of grief being experienced
by anyone unlucky enough to come into contact with them.
The user facing format while awkward and unforgiving is pretty well documented
both in the API reference
guide
and project user
documentation.
The internal data structures used by the nova.compute and nova.virt
layers are not however well documented outside of some limited code comments.
I’m personally working on this document and posting this blog post now as I
plan to spend the rest of the OpenStack Wallaby
cycle working on adding
flavor and image defined ephemeral storage
encryption
support into the libvirt
driver. As part
of this work I’ve had to dive into some of these data structures again and
wanted to document things ahead of any changes required by this work. I also
guess it has been a while since I posted anything of value on this blog so what
the hell.
I’ll aim to keep this post updated over the coming weeks and will add a
reference to the published document once merged.
Generic
BlockDeviceMapping
The top level data structure is the
nova.objects.block_device.BlockDeviceMapping (BDM)
object. It is a NovaObject, persisted in the db. Current code creates a BDM
object for every disk associated with an instance, whether it is a volume or
not.
The BDM object describes properties of each disk as specified by the user. It
is initially from a user request, for more details on the format of these
requests please see the Block Device Mapping in
Nova
document.
The Compute API transforms and consolidates all BDMs to ensure that all disks,
explicit or implicit, have a BDM, then persists them. Look in
nova.objects.block_device for all BDM fields, but in essence they contain
information like (source_type='image', destination_type='local',
image_id='<image uuid>'), or equivalents describing ephemeral disks, swap disks
or volumes, and some associated data.
NOTE
BDM objects are typically stored in variables called bdm with lists in
bdms, although this is obviously not guaranteed (and unfortunately not always
true: bdm in libvirt.block_device is usually a DriverBlockDevice object). This
is a useful reading aid (except when it’s proactively confounding), as there is
also something else typically called ‘block_device_mapping’ which is not a
BlockDeviceMapping object.
block_device_info
Drivers do not directly use BDM objects. Instead, they are transformed into a
different driver-specific representation. This representation is normally
called block_device_info, and is generated by
virt.driver.get_block_device_info(). Its output is based on data in BDMs.
block_device_info is a dict containing:
{
'root_device_name': hypervisor's notion of the root device's name
'ephemerals': A list of all ephemeral disks
'block_device_mapping': A list of all cinder volumes
'swap': A swap disk, or None if there is no swap disk
}
The disks are represented in one of 2 ways, which depends on the specific
driver currently in use. There’s the ’new’ representation, used by the libvirt
and vmwareAPI drivers, and the ’legacy’ representation used by all other
drivers. The legacy representation is a plain dict. It does not contain the
same information as the new representation.
The new representation involves subclasses of
nova.block_device.DriverBlockDevice. As well as containing different
fields, the new representation significantly also retains a reference to the
underlying BDM object. This means that by manipulating the
DriverBlockDevice object, the driver is able to persist data to the BDM
object in the db.
NOTE
Common usage is to pull block_device_mapping out of this dict into a
variable called block_device_mapping. This is not a BlockDeviceMapping
object, or list of them.
NOTE
If block_device_info was passed to the driver by compute manager, it was
probably generated by _get_instance_block_device_info(). By default, this
function filters out all cinder volumes from block_device_mapping which
don’t currently have connection_info. In other contexts this filtering
will not have happened, and block_device_mapping will contain all volumes.
NOTE
Unlike BDMs, block_device_info does not currently represent all disks that
an instance might have. Significantly, it will not contain any representation
of an image-backed local disk, i.e. the root disk of a typical instance which
isn’t boot-from-volume. Other representations used by the libvirt driver
explicitly reconstruct this missing disk.
libvirt
instance_disk_info
The virt driver API defines a method get_instance_disk_info, which returns a
JSON blob. The compute manager calls this and passes the data over RPC
between calls without ever looking at it. This is driver-specific opaque
data. It is also only used by the libvirt driver, despite being part of the
API for all drivers. Other drivers do not return any data. The most
interesting aspect of instance_disk_info is that it is generated from the
libvirt XML, not from nova’s state.
NOTE
instance_disk_info is often named disk_info in code, which is unfortunate as
this clashes with the normal naming of the next structure. Occasionally the two
are used in the same block of code.
instance_disk_info is a list of dicts for some of an instance’s disks.
NOTE
rbd disks (including non-volume disks) and cinder volumes are not included in
instance_disk_info.
Each dict contains the following:
{
'type': libvirt's notion of the disk's type
'path': libvirt's notion of the disk's path
'virt_disk_size': The disk's virtual size in bytes (the size the guest OS sees)
'backing_file': libvirt's notion of the backing file path
'disk_size': The file size of path, in bytes.
'over_committed_disk_size': As-yet-unallocated disk size, in bytes.
}
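As a purely illustrative example, an entry for a qcow2 root disk backed by a
cached base image might look something like this (all paths and sizes are made
up):

# Illustrative instance_disk_info entry; keys match the structure above.
disk_entry = {
    'type': 'qcow2',
    'path': '/var/lib/nova/instances/<instance uuid>/disk',
    'virt_disk_size': 10 * 1024 ** 3,           # 10 GiB as seen by the guest
    'backing_file': '/var/lib/nova/instances/_base/<image hash>',
    'disk_size': 2 * 1024 ** 3,                 # bytes actually used by path
    'over_committed_disk_size': 8 * 1024 ** 3,  # virt_disk_size - disk_size
}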
disk_info
NOTE
Not to be confused with instance_disk_info, which is frequently called
disk_info in code.
This data structure is actually described pretty well in the comment block at
the top of nova.virt.libvirt.blockinfo. It is internal to the libvirt
driver. It contains:
{
'disk_bus': the default bus used by disks
'cdrom_bus': the default bus used by cdrom drives
'mapping': defined below
}
mapping is a dict which maps disk names to a dict describing how that disk
should be passed to libvirt. This mapping contains an entry for every disk
connected to the instance, both local disks and volumes.
First, a note on disk naming. Local disk names used by the libvirt driver are
well defined. They are:
disk: The root disk
disk.local: The flavor-defined ephemeral disk
disk.ephX: Where X is a zero-based index for BDM-defined ephemeral disks
disk.swap: The swap disk
disk.config: The config disk
These names are hardcoded, reliable, and used in lots of places.
In disk_info, volumes are keyed by device name, e.g. 'vda', 'vdb'. Different
buses will be named differently, approximately according to legacy Linux
device naming.
Additionally, disk_info will contain a mapping for 'root', which is the
root disk. This will duplicate one of the other entries, either 'disk' or a
volume mapping.
Each dict within the mapping dict contains three required fields, bus, dev
and type, along with two optional fields, format and boot_index:
{
'bus': the guest bus type ('ide', 'virtio', 'scsi', etc)
'dev': the device name 'vda', 'hdc', 'sdf', 'xvde' etc
'type': type of device eg 'disk', 'cdrom', 'floppy'
'format': Which format to apply to the device if applicable
'boot_index': Number designating the boot order of the device
}
NOTE
BlockDeviceMapping and DriverBlockDevice store boot index zero-based.
However, libvirt’s boot index is 1-based, so the value stored here is 1-based.
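Pulling the above together, a purely illustrative disk_info for an image-backed
instance with a flavor ephemeral disk and a single attached volume might look
something like this (all values made up):

# Illustrative disk_info; keys and naming rules follow the description above.
disk_info = {
    'disk_bus': 'virtio',
    'cdrom_bus': 'ide',
    'mapping': {
        'disk':       {'bus': 'virtio', 'dev': 'vda', 'type': 'disk',
                       'boot_index': 1},   # 1-based here, see the note above
        'root':       {'bus': 'virtio', 'dev': 'vda', 'type': 'disk',
                       'boot_index': 1},   # duplicates 'disk' for this instance
        'disk.local': {'bus': 'virtio', 'dev': 'vdb', 'type': 'disk'},
        'vdc':        {'bus': 'virtio', 'dev': 'vdc', 'type': 'disk'},  # a volume
    },
}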
As in previous cycles I’ve updated some of the Nova-specific dashboards
available within the excellent
gerrit-dash-creator project and
started using them prior to dropping offline on paternity leave.
I’d really like to see more use of these dashboards within Nova to help focus
our limited review bandwidth on active and mergeable changes, so if you do have
any ideas please fire off reviews and add me in!
For now I’ve linked to some of the dashboards I’ve been using most often below
with a brief summary and dump of the current .dash logic used by the
gerrit-dash-creator tooling to build the Gerrit dashboard URLs.
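As a rough sketch of what that .dash logic boils down to: the title, foreach and
per-section queries are URL-encoded into a single Gerrit custom dashboard URL.
The dashboard_url helper below is made up and only approximates what the tooling
actually does, with a truncated example query:

# Approximation only of how a .dash file maps onto a Gerrit custom dashboard URL.
from urllib.parse import quote

def dashboard_url(base, title, foreach, sections):
    params = ['title=%s' % quote(title), 'foreach=%s' % quote(foreach)]
    params += ['%s=%s' % (quote(name), quote(query))
               for name, query in sections.items()]
    return '%s/#/dashboard/?%s' % (base, '&'.join(params))

print(dashboard_url(
    'https://review.opendev.org',
    'Nova Specs - Victoria',
    'project:openstack/nova-specs status:open branch:master',
    {'Needs final +2': 'label:Code-Review>=2 NOT label:workflow>=1'},
))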
The openstack/nova-specs repo contains Nova design
specifications associated
with both the previous and current development
release.
This dashboard specifically targets the current development release, as we
should only see reviews referring to this release landing in Gerrit at present.
[dashboard]
title = Nova Specs - Victoria
description = Review Inbox
foreach = project:openstack/nova-specs status:open NOT label:Workflow<=-1 branch:master NOT owner:self
[section "You are a reviewer, but haven't voted in the current revision"]
query = file:^specs/victoria/.* NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self label:Verified>=1,zuul
[section "Not blocked by -2s"]
query = file:^specs/victoria/.* NOT label:Code-Review<=-2 NOT label:Code-Review>=2 NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self label:Verified>=1,zuul
[section "No votes and spec is > 1 week old"]
query = file:^specs/victoria/.* NOT label:Code-Review>=-2 age:7d label:Verified>=1,zuul
[section "Needs final +2"]
query = file:^specs/victoria/.* label:Code-Review>=2 NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self label:Verified>=1,zuul NOT label:workflow>=1
[section "Broken Specs (doesn't pass Zuul)"]
query = file:^specs/victoria/.* label:Verified<=-1,zuul
[section "Dead Specs (blocked by a -2)"]
query = file:^specs/victoria/.* label:Code-Review<=-2
[section "Dead Specs (Not Proposed for Victoria)"]
query = NOT file:^specs/victoria/.* file:^specs/.*
[section "Not Specs (tox.ini etc)"]
query = NOT file:^specs/.*
I introduced this dashboard after the recent
creation of the libvirt
subteam during
the U cycle. As you can see from the foreach filter the dashboard only
lists changes touching the standard set of libvirt driver-related files within
the openstack/nova codebase. IMHO a dashboard for non-libvirt
drivers would also be useful.
[dashboard]
title = Nova Libvirt Driver Review Inbox
description = Review Inbox for the Nova Libvirt Driver
foreach = project:openstack/nova
status:open
NOT owner:self
NOT label:Workflow<=-1
label:Verified>=1,zuul
NOT reviewedby:self
branch:master
(file:^nova/virt/libvirt/.* OR file:^nova/tests/unit/libvirt/.* OR file:^nova/tests/functional/libvirt/.*)
[section "Small patches"]
query = NOT label:Code-Review>=2,self NOT label:Code-Review<=-1,nova-core NOT message:"DNM" delta:<=10
[section "Needs final +2"]
query = NOT label:Code-Review>=2,self label:Code-Review>=2 limit:50 NOT label:workflow>=1
[section "Bug fix, Passed Zuul, No Negative Feedback"]
query = NOT label:Code-Review>=2,self NOT label:Code-Review<=-1,nova-core message:"bug: " limit:50
[section "Wayward Changes (Changes with no code review in the last two days)"]
query = NOT label:Code-Review<=-1 NOT label:Code-Review>=1 age:2d limit:50
[section "Needs feedback (Changes older than 5 days that have not been reviewed by anyone)"]
query = NOT label:Code-Review<=-1 NOT label:Code-Review>=1 age:5d limit:50
[section "Passed Zuul, No Negative Feedback"]
query = NOT label:Code-Review>=2 NOT label:Code-Review<=-1 limit:50
[section "Needs revisit (You were a reviewer but haven't voted in the current revision)"]
query = reviewer:self limit:50
I have been a Nova Stable Core for a few years now and during that time I have
relied heavily on Gerrit dashboards and queries to help keep track of changes
as they move through our many stable branches. This has been made slightly more
complex by the introduction of
extended-maintenance
branches, but more on that below. For now this dashboard covers the ussuri,
train and stein stable branches.
I’m currently using the by-branch Nova stable dashboards as these allow me
to track changes through each required branch easily without any additional
clicking within Gerrit. There is, however, an all-in-one dashboard if you prefer
that approach.
Finally, for anyone paying attention, you might have noticed I’m also using a
nova-merged
query in Gerrit to track changes recently merged into master. This has helped
me catch and proactively backport useful fixes to stable many times.
[dashboard]
title = Nova Stable Maintenance Review Inbox
description = Review Inbox
foreach = (project:openstack/nova OR project:openstack/python-novaclient) status:open NOT owner:self NOT label:Workflow<=-1 label:Verified>=1,zuul NOT reviewedby:self
[section " stable/ussuri You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/ussuri
[section "stable/ussuri Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/ussuri
[section "stable/ussuri Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/ussuri
[section " stable/train You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/train
[section "stable/train Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/train
[section "stable/train Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/train
[section " stable/stein You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/stein
[section "stable/stein Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/stein
[section "stable/stein Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/stein
In addition to the nova-stable dashboard above I also have a dashboard for our
extended-maintenance
branches. At present these are (or are about to be) rocky, queens and pike.
[dashboard]
title = Nova Extended Maintenance Review Inbox
description = Review Inbox
foreach = (project:openstack/nova OR project:openstack/python-novaclient) status:open NOT owner:self NOT label:Workflow<=-1 label:Verified>=1,zuul NOT reviewedby:self
[section " stable/rocky You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/rocky
[section "stable/rocky Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/rocky
[section "stable/rocky Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/rocky
[section " stable/queens You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/queens
[section "stable/queens Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/queens
[section "stable/queens Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/queens
[section " stable/pike You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/pike
[section "stable/pike Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/pike
[section "stable/pike Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/pike