Block Device Mapping?
Block Device Mappings (BDMs) define how block devices are exposed to an instance by Nova. At present, Nova accepts and stores data relating to these mappings in various ways across the codebase, often leaving anyone unlucky enough to come into contact with them working through the five stages of grief.
The user-facing format, while awkward and unforgiving, is pretty well documented both in the API reference guide and in the project user documentation.
The internal data structures used by the nova.compute and nova.virt layers are not, however, well documented outside of some limited code comments.
The following post is taken from an OpenStack Nova reference document I’m currently working on to correct this. It is based on an email sent by my colleague Matthew Booth almost five years ago, but it is still as relevant today as it was then.
I’m working on this document, and posting this blog post now, because I plan to spend the rest of the OpenStack Wallaby cycle adding flavor- and image-defined ephemeral storage encryption support to the libvirt driver. As part of this work I’ve had to dive into some of these data structures again and wanted to document them ahead of any changes this work requires. I also guess it has been a while since I posted anything of value on this blog, so what the hell.
I’ll aim to keep this post updated over the coming weeks and will add a reference to the published document once merged.
Generic
BlockDeviceMapping
The top-level data structure is the nova.objects.block_device.BlockDeviceMapping (BDM) object. It is a NovaObject, persisted in the db. Current code creates a BDM object for every disk associated with an instance, whether it is a volume or not.
The BDM object describes properties of each disk as specified by the user. It is initially populated from a user request; for more details on the format of these requests, please see the Block Device Mapping in Nova document.
The Compute API transforms and consolidates all BDMs to ensure that all disks, explicit or implicit, have a BDM, and then persists them. Look in nova.objects.block_device for all BDM fields but, in essence, they contain information like (source_type='image', destination_type='local', image_id='<image uuid>'), or equivalents describing ephemeral disks, swap disks or volumes, and some associated data.
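To make those source/destination pairs a little more concrete, here is a minimal sketch that classifies BDM-like records into the kinds of disk mentioned above. It uses plain dicts rather than the real BlockDeviceMapping object, and classify_bdm() is a hypothetical helper. The rule for blank local disks (swap when guest_format is 'swap', ephemeral otherwise) reflects my understanding of new-format BDMs and is worth checking against nova.block_device.

```python
from typing import Any, Dict


def classify_bdm(bdm: Dict[str, Any]) -> str:
    """Return a rough disk kind for a new-format, BDM-like dict.

    Hypothetical helper for illustration only; it is not part of Nova.
    """
    source = bdm.get("source_type")
    dest = bdm.get("destination_type")
    if dest == "volume":
        # Anything whose destination is a volume is backed by cinder.
        return "volume"
    if dest == "local" and source == "image":
        # An image-backed local disk, e.g. the root disk of a typical instance.
        return "image"
    if dest == "local" and source == "blank":
        # Blank local disks are swap or ephemeral, told apart by guest_format.
        return "swap" if bdm.get("guest_format") == "swap" else "ephemeral"
    return "unknown"


bdms = [
    {"source_type": "image", "destination_type": "local",
     "image_id": "<image uuid>", "boot_index": 0},
    {"source_type": "blank", "destination_type": "local",
     "guest_format": "swap", "boot_index": -1},
    {"source_type": "blank", "destination_type": "local", "boot_index": -1},
    {"source_type": "volume", "destination_type": "volume",
     "volume_id": "<volume uuid>", "boot_index": -1},
]

for bdm in bdms:
    print(classify_bdm(bdm))  # image, swap, ephemeral, volume
```

The real object carries many more fields (device_name, volume_size, connection_info and so on); the sketch only looks at the handful that decide what kind of disk a mapping describes.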
NOTE
BDM objects are typically stored in variables called bdm, with lists in bdms, although this is obviously not guaranteed (and unfortunately not always true: bdm in libvirt.block_device is usually a DriverBlockDevice object). This is a useful reading aid (except when it’s proactively confounding), as there is also something else typically called 'block_device_mapping' which is not a BlockDeviceMapping object.
block_device_info
Drivers do not directly use BDM objects. Instead, they are transformed into a different driver-specific representation. This representation is normally called block_device_info, and is generated by virt.driver.get_block_device_info(). Its output is based on data in BDMs.
block_device_info is a dict containing:
{
'root_device_name': hypervisor's notion of the root device's name
'ephemerals': A list of all ephemeral disks
'block_device_mapping': A list of all cinder volumes
'swap': A swap disk, or None if there is no swap disk
}
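As a rough illustration of how those four keys relate to the underlying BDMs, the sketch below groups BDM-like dicts into a block_device_info-shaped dict. build_block_device_info() is hypothetical and far simpler than virt.driver.get_block_device_info(). It assumes blank local BDMs are ephemeral disks unless guest_format is 'swap', and that anything with destination_type='volume' is a cinder volume. Note that an image-backed local disk has no slot in the structure, so it is simply dropped.

```python
from typing import Any, Dict, List, Optional


def build_block_device_info(
    root_device_name: Optional[str], bdms: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Group BDM-like dicts into a block_device_info-shaped dict.

    Hypothetical simplification for illustration; not Nova's implementation.
    """
    info: Dict[str, Any] = {
        "root_device_name": root_device_name,
        "ephemerals": [],
        "block_device_mapping": [],
        "swap": None,
    }
    for bdm in bdms:
        source = bdm.get("source_type")
        dest = bdm.get("destination_type")
        if dest == "volume":
            info["block_device_mapping"].append(bdm)
        elif dest == "local" and source == "blank":
            if bdm.get("guest_format") == "swap":
                info["swap"] = bdm
            else:
                info["ephemerals"].append(bdm)
        # Image-backed local disks (source_type='image',
        # destination_type='local') have no slot, so they are dropped here.
    return info


bdms = [
    {"source_type": "image", "destination_type": "local", "boot_index": 0},
    {"source_type": "blank", "destination_type": "local", "guest_format": "swap"},
    {"source_type": "blank", "destination_type": "local", "device_name": "/dev/vdb"},
    {"source_type": "volume", "destination_type": "volume", "device_name": "/dev/vdc"},
]
info = build_block_device_info("/dev/vda", bdms)
print(len(info["ephemerals"]), len(info["block_device_mapping"]), info["swap"] is not None)
# prints: 1 1 True
```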
The disks are represented in one of two ways, depending on the driver in use. There’s the 'new' representation, used by the libvirt and vmwareAPI drivers, and the 'legacy' representation, used by all other drivers. The legacy representation is a plain dict and does not contain the same information as the new representation.
The new representation involves subclasses of nova.virt.block_device.DriverBlockDevice. As well as containing different fields, the new representation, significantly, also retains a reference to the underlying BDM object. This means that by manipulating the DriverBlockDevice object, the driver is able to persist data to the BDM object in the db.
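The key property of the new representation, a driver-facing dict that keeps a handle on the persisted object, can be sketched as follows. FakeBDM and DriverBlockDeviceSketch are hypothetical stand-ins; the real classes carry far more logic. Here save() simply copies selected dict values back onto the wrapped object and calls its save(), which is a minimal sketch of how a change made by the driver could end up in the db.

```python
from typing import Any, Tuple


class FakeBDM:
    """Hypothetical stand-in for a persisted BlockDeviceMapping object."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)
        self.saved = 0  # counts how many times save() was called

    def save(self) -> None:
        # A real object would write its changed fields to the database here.
        self.saved += 1


class DriverBlockDeviceSketch(dict):
    """A dict view of a BDM that keeps a reference to the underlying object."""

    # Which dict keys are written back to the wrapped object on save().
    persisted_keys: Tuple[str, ...] = ("connection_info", "device_name")

    def __init__(self, bdm_obj: FakeBDM) -> None:
        super().__init__(
            (key, getattr(bdm_obj, key, None)) for key in self.persisted_keys
        )
        self._bdm_obj = bdm_obj

    def save(self) -> None:
        # Copy driver-side changes back onto the BDM, then persist it.
        for key in self.persisted_keys:
            setattr(self._bdm_obj, key, self[key])
        self._bdm_obj.save()


bdm = FakeBDM(device_name="/dev/vdb", connection_info=None)
driver_bdm = DriverBlockDeviceSketch(bdm)
driver_bdm["connection_info"] = {"driver_volume_type": "iscsi"}
driver_bdm.save()
print(bdm.connection_info, bdm.saved)  # {'driver_volume_type': 'iscsi'} 1
```

A plain legacy dict has no such back-reference, which is why the same change made through the legacy representation would simply be lost.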
NOTE
Common usage is to pull block_device_mapping out of this dict into a variable called block_device_mapping. This is not a BlockDeviceMapping object, or a list of them.
NOTE
If block_device_info was passed to the driver by the compute manager, it was probably generated by _get_instance_block_device_info(). By default, this function filters out all cinder volumes from block_device_mapping which don’t currently have connection_info. In other contexts this filtering will not have happened, and block_device_mapping will contain all volumes.
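Both notes can be illustrated together. The sketch below pulls block_device_mapping out of a block_device_info-shaped dict (a list of volume entries, not BlockDeviceMapping objects) and applies the kind of filtering described above, dropping volumes that have no connection_info yet. filter_connected_volumes() is hypothetical and is not the compute manager's code; the mount_device and connection_info values are illustrative.

```python
from typing import Any, Dict, List


def filter_connected_volumes(block_device_info: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of block_device_info keeping only connected volumes.

    Hypothetical helper mirroring the filtering described above.
    """
    filtered = dict(block_device_info)
    filtered["block_device_mapping"] = [
        vol
        for vol in block_device_info.get("block_device_mapping") or []
        if vol.get("connection_info")
    ]
    return filtered


block_device_info = {
    "root_device_name": "/dev/vda",
    "ephemerals": [],
    "swap": None,
    "block_device_mapping": [
        {"mount_device": "/dev/vdb",
         "connection_info": {"driver_volume_type": "iscsi"}},
        {"mount_device": "/dev/vdc", "connection_info": None},
    ],
}

# A plain list of volume entries, not BlockDeviceMapping objects.
block_device_mapping: List[Dict[str, Any]] = filter_connected_volumes(
    block_device_info
)["block_device_mapping"]
print([vol["mount_device"] for vol in block_device_mapping])  # ['/dev/vdb']
```

The unfiltered dict still lists both volumes, which is exactly the "other contexts" case the note warns about.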
NOTE
Unlike BDMs, block_device_info does not currently represent all disks that an instance might have. Significantly, it will not contain any representation of an image-backed local disk, i.e. the root disk of a typical instance which isn’t boot-from-volume. Other representations used by the libvirt driver explicitly reconstruct this missing disk.
libvirt
instance_disk_info
The virt driver API defines a method get_instance_disk_info, which returns a JSON blob. The compute manager calls this and passes the data over RPC between calls without ever looking at it. This is driver-specific opaque data. It is also only used by the libvirt driver, despite being part of the API for all drivers. Other drivers do not return any data. The most interesting aspect of instance_disk_info is that it is generated from the libvirt XML, not from nova’s state.
NOTE
instance_disk_info is often named disk_info in code, which is unfortunate as this clashes with the normal naming of the next structure. Occasionally the two are used in the same block of code.
instance_disk_info is a list of dicts for some of an instance’s disks.
NOTE
rbd disks (including non-volume disks) and cinder volumes are not included in instance_disk_info.
Each dict contains the following:
{
'type': libvirt's notion of the disk's type
'path': libvirt's notion of the disk's path
'virt_disk_size': The disk's virtual size in bytes (the size the guest OS sees)
'backing_file': libvirt's notion of the backing file path
'disk_size': The file size of path, in bytes.
'over_committed_disk_size': As-yet-unallocated disk size, in bytes.
}
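Because instance_disk_info is derived from the libvirt domain XML rather than from Nova’s own state, the idea can be sketched by walking the <disk> elements of that XML with the Python standard library. Several things here are assumptions for illustration: sizes and backing files come from a hypothetical lookup table rather than from inspecting files on the host, 'type' is taken from the <driver type='...'/> attribute, volumes are identified by a caller-supplied list of target devices, and over_committed_disk_size is computed as virtual size minus allocated size.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional, Tuple

DOMAIN_XML = """<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/nova/instances/UUID/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='network' device='disk'>
      <source protocol='rbd' name='vms/UUID_disk.eph0'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/sdx'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
  </devices>
</domain>"""

DISK = "/var/lib/nova/instances/UUID/disk"
# Hypothetical per-path data: (virt_disk_size, disk_size, backing_file).
FILE_INFO: Dict[str, Tuple[int, int, Optional[str]]] = {
    DISK: (10 * 1024**3, 2 * 1024**3, "/var/lib/nova/instances/_base/IMAGE"),
}


def get_instance_disk_info(xml: str, volume_devs: Iterable[str]) -> str:
    """Build an instance_disk_info-style JSON blob from domain XML (sketch)."""
    skip = set(volume_devs)
    disks: List[dict] = []
    for disk in ET.fromstring(xml).iter("disk"):
        source, target = disk.find("source"), disk.find("target")
        if disk.get("type") == "network" or source is None or target is None:
            continue  # rbd (network) disks are not included
        if target.get("dev") in skip:
            continue  # cinder volumes are not included either
        path = source.get("file") or source.get("dev")
        virt_size, disk_size, backing = FILE_INFO.get(path, (0, 0, None))
        driver = disk.find("driver")
        disks.append({
            # Assumption: 'type' is the <driver type='...'/> attribute.
            "type": driver.get("type") if driver is not None else None,
            "path": path,
            "virt_disk_size": virt_size,
            "backing_file": backing,
            "disk_size": disk_size,
            # Assumption: space the disk could still grow into.
            "over_committed_disk_size": virt_size - disk_size,
        })
    return json.dumps(disks)


blob = get_instance_disk_info(DOMAIN_XML, volume_devs=["vdc"])
print(json.loads(blob)[0]["over_committed_disk_size"])  # 8589934592
```

The compute manager would only ever see the returned string, which matches the "opaque JSON blob" description above.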
disk_info
NOTE
This is distinct from instance_disk_info, which is also frequently called disk_info.
This data structure is actually described pretty well in the comment block at the top of nova.virt.libvirt.blockinfo. It is internal to the libvirt driver. It contains:
{
'disk_bus': the default bus used by disks
'cdrom_bus': the default bus used by cdrom drives
'mapping': defined below
}
mapping is a dict which maps disk names to a dict describing how that disk should be passed to libvirt. This mapping contains every disk connected to the instance, both local disks and volumes.
First, a note on disk naming. Local disk names used by the libvirt driver are well defined. They are:
disk: The root disk
disk.local: The flavor-defined ephemeral disk
disk.ephX: Where X is a zero-based index for BDM-defined ephemeral disks
disk.swap: The swap disk
disk.config: The config disk
These names are hardcoded, reliable, and used in lots of places.
In disk_info, volumes are keyed by device name, e.g. 'vda', 'vdb'. Different buses will be named differently, approximately according to legacy Linux device naming.
Additionally, disk_info will contain a mapping for ‘root’, which is the root disk. This will duplicate one of the other entries, either ‘disk’ or a volume mapping.
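To tie the naming rules together, here is a minimal sketch of a disk_info-shaped dict for an image-backed instance with a flavor ephemeral disk, swap, a config disk and one attached volume. build_disk_info() is hypothetical, and both the bus-to-prefix table and the "next free letter" allocation order are simplifications assumed for illustration; nova.virt.libvirt.blockinfo contains the real logic. The point is the shape: local disks keyed by their fixed names, volumes keyed by device name, and 'root' duplicating one of the other entries.

```python
import string
from typing import Any, Dict, List, Set

# Assumed, simplified bus -> device-name prefix table.
DEV_PREFIX: Dict[str, str] = {
    "virtio": "vd", "scsi": "sd", "sata": "sd", "ide": "hd", "xen": "xvd",
}


def next_free_dev(bus: str, used: Set[str]) -> str:
    """Return the first unused device name for a bus, e.g. 'vda', 'vdb'."""
    prefix = DEV_PREFIX[bus]
    for letter in string.ascii_lowercase:
        dev = prefix + letter
        if dev not in used:
            used.add(dev)
            return dev
    raise ValueError(f"no free device names left on bus {bus!r}")


def build_disk_info(
    volumes: List[str], disk_bus: str = "virtio", cdrom_bus: str = "ide"
) -> Dict[str, Any]:
    """Build a disk_info-shaped dict (hypothetical, illustration only)."""
    used: Set[str] = set()
    mapping: Dict[str, Dict[str, str]] = {}
    # Local disks are keyed by their hardcoded names.
    for name in ("disk", "disk.local", "disk.swap"):
        mapping[name] = {
            "bus": disk_bus, "dev": next_free_dev(disk_bus, used), "type": "disk",
        }
    mapping["disk.config"] = {
        "bus": cdrom_bus, "dev": next_free_dev(cdrom_bus, used), "type": "cdrom",
    }
    # Volumes are keyed by their device name instead.
    for _volume_id in volumes:
        dev = next_free_dev(disk_bus, used)
        mapping[dev] = {"bus": disk_bus, "dev": dev, "type": "disk"}
    # 'root' duplicates an existing entry: here the image-backed 'disk'.
    mapping["root"] = dict(mapping["disk"])
    return {"disk_bus": disk_bus, "cdrom_bus": cdrom_bus, "mapping": mapping}


info = build_disk_info(volumes=["<volume uuid>"])
for name, entry in info["mapping"].items():
    print(name, entry["dev"])
```

For a boot-from-volume instance, 'root' would instead duplicate the volume's entry, and there would be no 'disk' key at all.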
Each dict within the mapping dict contains three required fields, bus, dev and type, and two optional fields, format and boot_index:
{
'bus': the guest bus type ('ide', 'virtio', 'scsi', etc)
'dev': the device name 'vda', 'hdc', 'sdf', 'xvde' etc
'type': type of device eg 'disk', 'cdrom', 'floppy'
'format': Which format to apply to the device if applicable
'boot_index': Number designating the boot order of the device
}
NOTE
BlockDeviceMapping and DriverBlockDevice store the boot index zero-based. However, libvirt’s boot index is 1-based, so the value stored here is 1-based.
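The off-by-one is easy to get wrong, so here is a minimal sketch of the conversion when building a mapping entry. It assumes, as the Nova API describes, that a boot_index of None or a negative value marks a device as not bootable, in which case the optional boot_index field is simply omitted. libvirt_boot_index() and mapping_entry() are hypothetical helpers, and this sketch stores a plain int.

```python
from typing import Any, Dict, Optional


def libvirt_boot_index(bdm_boot_index: Optional[int]) -> Optional[int]:
    """Map a zero-based BDM boot index to libvirt's one-based boot order."""
    if bdm_boot_index is None or bdm_boot_index < 0:
        return None  # not bootable: no boot order entry
    return bdm_boot_index + 1


def mapping_entry(bus: str, dev: str, bdm_boot_index: Optional[int]) -> Dict[str, Any]:
    """Build one disk_info['mapping'] entry (hypothetical helper)."""
    entry: Dict[str, Any] = {"bus": bus, "dev": dev, "type": "disk"}
    boot_index = libvirt_boot_index(bdm_boot_index)
    if boot_index is not None:
        entry["boot_index"] = boot_index
    return entry


print(mapping_entry("virtio", "vda", 0))   # BDM boot_index 0 -> libvirt 1
print(mapping_entry("virtio", "vdb", -1))  # not bootable, no boot_index key
```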