OpenStack Nova Block Device Mapping Data Structures

Block Device Mapping?

Block Device Mappings (BDMs) define how Nova exposes block devices to an instance. At present Nova accepts and stores data relating to these mappings in various ways across the codebase, often leading to the five stages of grief being experienced by anyone unlucky enough to come into contact with them.

The user-facing format, while awkward and unforgiving, is pretty well documented in both the API reference guide and the project user documentation. The internal data structures used by the nova.compute and nova.virt layers, however, are not well documented outside of some limited code comments.

The following post is taken from an OpenStack Nova reference document I’m currently working on to correct this. It is based on an email sent by my colleague Matthew Booth almost 5 years ago now, but it is still as relevant today as it was then.

I’m personally working on this document and posting this blog post now as I plan to spend the rest of the OpenStack Wallaby cycle working on adding flavor and image defined ephemeral storage encryption support into the libvirt driver. As part of this work I’ve had to dive into some of these data structures again and wanted to document things ahead of any changes required by this work. I also guess it has been a while since I posted anything of value on this blog so what the hell.

I’ll aim to keep this post updated over the coming weeks and will add a reference to the published document once merged.

Generic

BlockDeviceMapping

The top-level data structure is the nova.objects.block_device.BlockDeviceMapping (BDM) object. It is a NovaObject, persisted in the db. Current code creates a BDM object for every disk associated with an instance, whether it is a volume or not.

The BDM object describes the properties of each disk as specified by the user. It is initially created from a user request. For more details on the format of these requests, please see the Block Device Mapping in Nova document.

The Compute API transforms and consolidates all BDMs to ensure that all disks, explicit or implicit, have a BDM, then persists them. Look in nova.objects.block_device for all BDM fields, but in essence they contain information like (source_type='image', destination_type='local', image_id='<image uuid>'), or equivalents describing ephemeral disks, swap disks or volumes, and some associated data.
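
As a rough illustration of the kind of data described above, the sketch below models three BDMs as plain Python dicts rather than real BlockDeviceMapping objects. Only source_type, destination_type and image_id are named in the text; the other keys (volume_id, guest_format, volume_size, boot_index) and all values are assumptions for illustration and should be checked against nova.objects.block_device.

```python
# Plain dicts standing in for BlockDeviceMapping objects (not the real NovaObject).
image_root = {
    "source_type": "image",
    "destination_type": "local",
    "image_id": "<image uuid>",
    "boot_index": 0,          # assumed field: zero-based boot order
}

swap_disk = {
    "source_type": "blank",
    "destination_type": "local",
    "guest_format": "swap",   # assumed field used to mark swap disks
    "volume_size": 1,         # assumed field, illustrative value
    "boot_index": -1,
}

data_volume = {
    "source_type": "volume",
    "destination_type": "volume",
    "volume_id": "<volume uuid>",  # assumed field
    "boot_index": -1,
}

bdms = [image_root, swap_disk, data_volume]


def describe(bdm):
    """Summarise where a disk comes from and where it ends up."""
    return f"{bdm['source_type']} -> {bdm['destination_type']}"


for bdm in bdms:
    print(describe(bdm))
```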


NOTE

BDM objects are typically stored in variables called bdm with lists in bdms, although this is obviously not guaranteed (and unfortunately not always true: bdm in libvirt.block_device is usually a DriverBlockDevice object). This is a useful reading aid (except when it’s proactively confounding), as there is also something else typically called ‘block_device_mapping’ which is not a BlockDeviceMapping object.


block_device_info

Drivers do not directly use BDM objects. Instead, they are transformed into a different driver-specific representation. This representation is normally called block_device_info, and is generated by virt.driver.get_block_device_info(). Its output is based on data in BDMs. block_device_info is a dict containing:


  {
    'root_device_name': hypervisor's notion of the root device's name
    'ephemerals': A list of all ephemeral disks
    'block_device_mapping': A list of all cinder volumes
    'swap': A swap disk, or None if there is no swap disk
  }

The disks are represented in one of two ways, depending on the specific driver currently in use. There’s the 'new' representation, used by the libvirt and vmwareAPI drivers, and the 'legacy' representation, used by all other drivers. The legacy representation is a plain dict. It does not contain the same information as the new representation.

The new representation involves subclasses of nova.virt.block_device.DriverBlockDevice. As well as containing different fields, the new representation, significantly, also retains a reference to the underlying BDM object. This means that by manipulating the DriverBlockDevice object, the driver is able to persist data to the BDM object in the db.
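
The "retains a reference to the underlying BDM" idea can be sketched with two stand-in classes. Everything here is hypothetical: FakeBDM and FakeDriverBlockDevice are not Nova classes, and the field names mount_device, device_name and the save() methods are assumptions chosen to show the pattern, not the real DriverBlockDevice API.

```python
class FakeBDM:
    """Hypothetical stand-in for a persisted BlockDeviceMapping object."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.saved = None  # last state "written to the db"

    def save(self):
        # A real object would write to the database; here we just snapshot.
        self.saved = dict(self.fields)


class FakeDriverBlockDevice(dict):
    """Hypothetical driver-facing view that keeps a reference to its BDM."""

    def __init__(self, bdm):
        super().__init__(
            mount_device=bdm.fields.get("device_name"),
            connection_info=bdm.fields.get("connection_info"),
        )
        self._bdm = bdm  # the retained reference

    def save(self):
        # Copy driver-side changes back onto the BDM, then persist it.
        self._bdm.fields["device_name"] = self["mount_device"]
        self._bdm.fields["connection_info"] = self["connection_info"]
        self._bdm.save()


bdm = FakeBDM(device_name="/dev/vdb", connection_info=None)
dbd = FakeDriverBlockDevice(bdm)

# The driver updates its own representation, then persists via the wrapper.
dbd["connection_info"] = {"driver_volume_type": "iscsi"}
dbd.save()
```

A legacy-style plain dict has no such back-reference, so a driver holding one could not persist changes this way.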


NOTE

Common usage is to pull block_device_mapping out of this dict into a variable called block_device_mapping. This is not a BlockDeviceMapping object, or list of them.


NOTE

If block_device_info was passed to the driver by compute manager, it was probably generated by _get_instance_block_device_info(). By default, this function filters out all cinder volumes from block_device_mapping which don’t currently have connection_info. In other contexts this filtering will not have happened, and block_device_mapping will contain all volumes.
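
A minimal sketch of the filtering described in this note, using plain dicts. The helper name drop_unconnected_volumes and the mount_device key are hypothetical; only connection_info and block_device_mapping come from the text.

```python
def drop_unconnected_volumes(block_device_mapping):
    """Keep only volume entries that already carry connection_info."""
    return [vol for vol in block_device_mapping if vol.get("connection_info")]


volumes = [
    {"mount_device": "/dev/vdb", "connection_info": {"driver_volume_type": "rbd"}},
    {"mount_device": "/dev/vdc", "connection_info": None},  # not yet connected
]

# What the driver might see when the compute manager has filtered the list.
filtered_info = {"block_device_mapping": drop_unconnected_volumes(volumes)}

# What the driver might see in other contexts, where no filtering happened.
unfiltered_info = {"block_device_mapping": list(volumes)}
```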


NOTE

Unlike BDMs, block_device_info does not currently represent all disks that an instance might have. Significantly, it will not contain any representation of an image-backed local disk, i.e. the root disk of a typical instance which isn’t boot-from-volume. Other representations used by the libvirt driver explicitly reconstruct this missing disk.
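
To make the missing root disk concrete, here is a heavily simplified sketch of grouping BDM-like dicts into a block_device_info-shaped dict. The function build_block_device_info and its classification rules (blank/local for ephemerals and swap, guest_format to tell them apart) are assumptions for illustration; this is not how virt.driver.get_block_device_info() is implemented.

```python
def build_block_device_info(root_device_name, bdms):
    """Group BDM-like dicts into a block_device_info-shaped dict (sketch)."""

    def is_local_blank(bdm):
        return bdm["source_type"] == "blank" and bdm["destination_type"] == "local"

    swaps = [b for b in bdms if is_local_blank(b) and b.get("guest_format") == "swap"]
    return {
        "root_device_name": root_device_name,
        "ephemerals": [
            b for b in bdms if is_local_blank(b) and b.get("guest_format") != "swap"
        ],
        "block_device_mapping": [b for b in bdms if b["destination_type"] == "volume"],
        "swap": swaps[0] if swaps else None,
    }


image_root = {"source_type": "image", "destination_type": "local"}
ephemeral = {"source_type": "blank", "destination_type": "local"}
swap = {"source_type": "blank", "destination_type": "local", "guest_format": "swap"}
volume = {"source_type": "volume", "destination_type": "volume"}

info = build_block_device_info("/dev/vda", [image_root, ephemeral, swap, volume])

# The image-backed root disk has a BDM but lands in none of the lists.
represented = info["ephemerals"] + info["block_device_mapping"] + [info["swap"]]
print(any(disk is image_root for disk in represented))
```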


libvirt

instance_disk_info

The virt driver API defines a method get_instance_disk_info, which returns a JSON blob. The compute manager calls this and passes the data over RPC between calls without ever looking at it. This is driver-specific opaque data. It is also only used by the libvirt driver, despite being part of the API for all drivers. Other drivers do not return any data. The most interesting aspect of instance_disk_info is that it is generated from the libvirt XML, not from nova’s state.

.. note:: instance_disk_info is often named disk_info in code, which is unfortunate as this clashes with the normal naming of the next structure. Occasionally the two are used in the same block of code.

instance_disk_info is a list of dicts for some of an instance’s disks.

.. note:: rbd disks (including non-volume disks) and cinder volumes are not included in instance_disk_info.

Each dict contains the following:

  {
    'type': libvirt's notion of the disk's type
    'path': libvirt's notion of the disk's path
    'virt_disk_size': The disk's virtual size in bytes (the size the guest OS sees)
    'backing_file': libvirt's notion of the backing file path
    'disk_size': The file size of path, in bytes.
    'over_committed_disk_size': As-yet-unallocated disk size, in bytes.
  }
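
Since instance_disk_info travels as a JSON blob, a consumer has to decode it before use. The sketch below builds an example blob with the keys listed above and sums one of the fields. The paths and sizes are made up for illustration, and total_over_committed is a hypothetical helper, not a Nova function.

```python
import json

GiB = 1024 ** 3

# Example blob with invented values; keys follow the structure described above.
blob = json.dumps([
    {
        "type": "qcow2",
        "path": "/path/to/instance/disk",
        "virt_disk_size": 10 * GiB,
        "backing_file": "<backing file>",
        "disk_size": 2 * GiB,
        "over_committed_disk_size": 8 * GiB,
    },
    {
        "type": "raw",
        "path": "/path/to/instance/disk.swap",
        "virt_disk_size": 1 * GiB,
        "backing_file": "",
        "disk_size": 1 * GiB,
        "over_committed_disk_size": 0,
    },
])


def total_over_committed(instance_disk_info_json):
    """Sum the as-yet-unallocated size, in bytes, across all listed disks."""
    disks = json.loads(instance_disk_info_json)
    return sum(int(disk["over_committed_disk_size"]) for disk in disks)


print(total_over_committed(blob))
```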

disk_info

.. note:: This is the structure that is actually named disk_info, as opposed to instance_disk_info, which is frequently also called disk_info.

This data structure is actually described pretty well in the comment block at the top of nova.virt.libvirt.blockinfo. It is internal to the libvirt driver. It contains:

  {
    'disk_bus': the default bus used by disks
    'cdrom_bus': the default bus used by cdrom drives
    'mapping': defined below
  }

mapping is a dict which maps disk names to a dict describing how that disk should be passed to libvirt. This mapping contains every disk connected to the instance, both local and volumes.

First, a note on disk naming. Local disk names used by the libvirt driver are well defined. They are:

  • disk: The root disk
  • disk.local: The flavor-defined ephemeral disk
  • disk.ephX: Where X is a zero-based index for BDM defined ephemeral disks
  • disk.swap: The swap disk
  • disk.config: The config disk

These names are hardcoded, reliable, and used in lots of places.
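
The naming scheme above can be sketched as a small helper. The function local_disk_names and its parameters are hypothetical, and the sketch makes no claim about when the libvirt driver actually creates each disk; it only spells out the names.

```python
def local_disk_names(image_backed_root=True, flavor_ephemeral=False,
                     bdm_ephemerals=0, swap=False, config_drive=False):
    """Return the local disk names an instance could use (illustrative only)."""
    names = []
    if image_backed_root:
        names.append("disk")
    if flavor_ephemeral:
        names.append("disk.local")
    # BDM-defined ephemeral disks use a zero-based index.
    names.extend(f"disk.eph{index}" for index in range(bdm_ephemerals))
    if swap:
        names.append("disk.swap")
    if config_drive:
        names.append("disk.config")
    return names


print(local_disk_names(bdm_ephemerals=2, swap=True))
```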

In disk_info, volumes are keyed by device name, e.g. 'vda', 'vdb'. Different buses will be named differently, approximately according to legacy Linux device naming.

Additionally, disk_info will contain a mapping for ‘root’, which is the root disk. This will duplicate one of the other entries, either ‘disk’ or a volume mapping.

Each dict within the mapping dict contains three required fields (bus, dev and type) and two optional fields (format and boot_index):

  {
    'bus': the guest bus type ('ide', 'virtio', 'scsi', etc)
    'dev': the device name 'vda', 'hdc', 'sdf', 'xvde' etc
    'type': type of device eg 'disk', 'cdrom', 'floppy'
    'format': Which format to apply to the device if applicable
    'boot_index': Number designating the boot order of the device
  }
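
Putting the pieces together, a disk_info for an image-backed instance with one attached volume might look roughly like the sketch below. The values are invented, and validate_mapping_entry is a hypothetical helper that only checks the required and optional field names given above.

```python
REQUIRED_FIELDS = {"bus", "dev", "type"}
OPTIONAL_FIELDS = {"format", "boot_index"}


def validate_mapping_entry(entry):
    """Raise ValueError if an entry lacks a required field or has an unknown one."""
    keys = set(entry)
    missing = REQUIRED_FIELDS - keys
    unknown = keys - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")


root_disk = {"bus": "virtio", "dev": "vda", "type": "disk", "boot_index": 1}

disk_info = {
    "disk_bus": "virtio",
    "cdrom_bus": "ide",
    "mapping": {
        "root": root_disk,  # duplicates the 'disk' entry for an image-backed root
        "disk": root_disk,
        "vdb": {"bus": "virtio", "dev": "vdb", "type": "disk"},  # a volume
    },
}

for entry in disk_info["mapping"].values():
    validate_mapping_entry(entry)
```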

NOTE

BlockDeviceMapping and DriverBlockDevice store boot index zero-based. However, libvirt’s boot index is 1-based, so the value stored here is 1-based.
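
The off-by-one between the two conventions is easy to get wrong, so here is a minimal sketch of the conversion. The helper to_libvirt_boot_index is hypothetical, and treating None or a negative value as "not a boot device" is an assumption of this sketch.

```python
def to_libvirt_boot_index(bdm_boot_index):
    """Convert a zero-based BDM boot index to a 1-based libvirt boot index.

    Assumes None or a negative value means the device is not bootable.
    """
    if bdm_boot_index is None or bdm_boot_index < 0:
        return None
    return bdm_boot_index + 1


for index in (0, 1, -1, None):
    print(index, "->", to_libvirt_boot_index(index))
```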

