File sharing over https with Caddy and Podman

This week I unexpectedly found myself needing to share a local build of Cirros, including a simple but as yet unreleased fix, to unblock some testing upstream in OpenStack Nova's CI.

I'll admit that it has been a while since I've had to personally run any kind of web service (this blog and its predecessors being hosted on GitLab and GitHub Pages), so I didn't really know where to start.

I knew that I wanted something that was both containerised and able to automatically sort out Let's Encrypt certs, so I could serve the build over https. After trying and failing to get anywhere with the default httpd and nginx container images I gave up and asked around on Twitter.

Shortly afterwards a friend recommended Caddy, an Apache 2.0 licensed web server written in Go with automatic https configuration: exactly what I was after.

As I said before, I wanted a containerised web service to run on my Fedora based VPS, ensuring I didn't end up with a full service running directly on the host that would be a pain to remove later. Thankfully Caddy has an official image on DockerHub that I could pull and use with Podman, the daemonless container engine and Docker replacement on Fedora.

Now, it is possible to run Podman containers under a non-root user, but as I wanted the service to listen on ports 80 and 443 I had to launch the container as root. In theory you could adjust your host config to allow non-root users to bind to ports below 1024 via net.ipv4.ip_unprivileged_port_start=$start_port, or just use higher ports to avoid this entirely, but I don't mind running the service as root given the files are being shared via a read-only volume anyway.
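
For reference, if you did want to go the rootless route, lowering the unprivileged port threshold would look something like this (I stuck with root so haven't tested it here, and the sysctl.d file name below is arbitrary):

$ sudo sysctl net.ipv4.ip_unprivileged_port_start=80
$ # persist across reboots; any /etc/sysctl.d/ drop-in name will do
$ echo 'net.ipv4.ip_unprivileged_port_start=80' | sudo tee /etc/sysctl.d/90-unprivileged-ports.conf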

To launch Caddy I used the following commands, obviously with podman already installed and configured.

$ sudo mkdir /run/caddy_data
$ sudo podman run -d -p 80:80 -p 443:443 \
                     -v $PATH_TO_SHARE:/srv:ro \
                     -v /run/caddy_data:/data \
                     --name caddy \
                     docker.io/library/caddy:latest \
                     caddy file-server --domain $domain --browse --root /srv 
$ sudo podman ps
CONTAINER ID  IMAGE                           COMMAND               CREATED       STATUS           PORTS                                     NAMES
65f82d224dde  docker.io/library/caddy:latest  caddy file-server...  35 hours ago  Up 35 hours ago  0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp  caddy

To ensure the service remained active I also created a systemd service.

$ sudo podman generate systemd -n caddy > /etc/systemd/system/container-caddy.service

Note that at the time of writing on Fedora 33 you need to work around Podman issue #8369 by replacing the use of /var/run with /run.

$ sudo sed -i 's/\/var\/run/\/run/g' container-caddy.service
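
Depending on how systemd picks up new unit files you may also need to reload it before enabling the service; it shouldn't hurt to run this either way:

$ sudo systemctl daemon-reload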

Once that’s done you can enable and start the service just like any other systemd service.

$ sudo systemctl enable container-caddy.service
$ sudo systemctl start container-caddy.service
$ sudo systemctl status container-caddy.service
● container-caddy.service - Podman container-caddy.service
     Loaded: loaded (/etc/systemd/system/container-caddy.service; enabled; vendor preset: disabled)
     Active: active (running) since Fri 2021-03-05 08:38:32 GMT; 9min ago
       Docs: man:podman-generate-systemd(1)
    Process: 232308 ExecStart=/usr/bin/podman start caddy (code=exited, status=0/SUCCESS)
   Main PID: 230641 (conmon)
      Tasks: 0 (limit: 9496)
     Memory: 892.0K
        CPU: 129ms
     CGroup: /system.slice/container-caddy.service
             ‣ 230641 /usr/bin/conmon --api-version 1 -c 65f82d224dde0ba6342ac4e1b2e9b61d26d2f7efe887fb08b18632cf849a1ace -u 65f82d224dde0ba6342a>

Mar 05 08:38:32 $domain systemd[1]: Starting Podman container-caddy.service...
Mar 05 08:38:32 $domain systemd[1]: Started Podman container-caddy.service.

As we used the --browse flag when launching Caddy you should now find a simple index page listing your files.
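
A quick way to check both the cert and the listing from another machine is a simple curl against the domain, along these lines:

$ # expect a 200 response and a valid Let's Encrypt certificate
$ curl -I https://$domain/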

That's it: simple, fast and hopefully easy to clean up when I no longer need to share these files. Maybe I should ask for suggestions on Twitter more often!


OpenStack Nova Block Device Mapping Data Structures

Block Device Mapping?

Block Device Mappings (BDMs) define how block devices are exposed to an instance by Nova. At present Nova accepts and stores data relating to these mappings in various ways across the codebase, often leading to the five stages of grief being experienced by anyone unlucky enough to come into contact with them.

The user facing format, while awkward and unforgiving, is pretty well documented, both in the API reference guide and the project user documentation. The internal data structures used by the nova.compute and nova.virt layers are, however, not well documented outside of some limited code comments.

The following post is taken from an OpenStack Nova reference document I'm currently working on to correct this. It is based on an email sent by my colleague Matthew Booth almost 5 years ago now, but it is still as relevant today as it was then.

I'm personally working on this document and posting this now as I plan to spend the rest of the OpenStack Wallaby cycle adding flavor and image defined ephemeral storage encryption support to the libvirt driver. As part of that work I've had to dive into some of these data structures again and wanted to document things ahead of any changes it requires. I also guess it has been a while since I posted anything of value on this blog, so what the hell.

I’ll aim to keep this post updated over the coming weeks and will add a reference to the published document once merged.

Generic

BlockDeviceMapping

The top level data structure is the nova.objects.block_device.BlockDeviceMapping (BDM) object. It is a NovaObject, persisted in the db. Current code creates a BDM object for every disk associated with an instance, whether it is a volume or not.

The BDM object describes properties of each disk as specified by the user. It is initially created from a user request; for more details on the format of these requests please see the Block Device Mapping in Nova document.

The Compute API transforms and consolidates all BDMs to ensure that all disks, explicit or implicit, have a BDM, then persists them. Look in nova.objects.block_device for all BDM fields, but in essence they contain information like (source_type='image', destination_type='local', image_id='<image uuid>'), or equivalents describing ephemeral disks, swap disks or volumes, and some associated data.
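
As a rough illustration (this example is mine rather than lifted from that document), a boot-from-volume server create request passes something like the following block_device_mapping_v2 entry to the API, which the Compute API then turns into a BDM object with fields along the lines described above:

  {
    'boot_index': 0,
    'uuid': '<volume uuid>',
    'source_type': 'volume',
    'destination_type': 'volume',
    'delete_on_termination': False
  }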


NOTE

BDM objects are typically stored in variables called bdm with lists in bdms, although this is obviously not guaranteed (and unfortunately not always true: bdm in libvirt.block_device is usually a DriverBlockDevice object). This is a useful reading aid (except when it’s proactively confounding), as there is also something else typically called ‘block_device_mapping’ which is not a BlockDeviceMapping object.


block_device_info

Drivers do not directly use BDM objects. Instead, the BDMs are transformed into a different, driver-specific representation. This representation is normally called block_device_info, and is generated by virt.driver.get_block_device_info(). Its output is based on data in BDMs. block_device_info is a dict containing:


  {
    'root_device_name': hypervisor's notion of the root device's name
    'ephemerals': A list of all ephemeral disks
    'block_device_mapping': A list of all cinder volumes
    'swap': A swap disk, or None if there is no swap disk
  }

The disks are represented in one of two ways, depending on the specific driver currently in use. There's the 'new' representation, used by the libvirt and vmwareAPI drivers, and the 'legacy' representation used by all other drivers. The legacy representation is a plain dict. It does not contain the same information as the new representation.

The new representation involves subclasses of nova.block_device.DriverBlockDevice. As well as containing different fields, the new representation significantly also retains a reference to the underlying BDM object. This means that by manipulating the DriverBlockDevice object, the driver is able to persist data to the BDM object in the db.
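
As a simplified sketch of what this looks like in practice (not lifted verbatim from the driver code), updating a field on the driver-specific object and then calling save() writes the change back through to the BDM record:

  # driver_bdm is a DriverBlockDevice entry taken from
  # block_device_info['block_device_mapping']
  driver_bdm['connection_info'] = connection_info
  # save() persists the updated field(s) to the underlying
  # BlockDeviceMapping object, and therefore the db
  driver_bdm.save()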


NOTE

Common usage is to pull block_device_mapping out of this dict into a variable called block_device_mapping. This is not a BlockDeviceMapping object, or list of them.


NOTE

If block_device_info was passed to the driver by compute manager, it was probably generated by _get_instance_block_device_info(). By default, this function filters out all cinder volumes from block_device_mapping which don’t currently have connection_info. In other contexts this filtering will not have happened, and block_device_mapping will contain all volumes.


NOTE

Unlike BDMs, block_device_info does not currently represent all disks that an instance might have. Significantly, it will not contain any representation of an image-backed local disk, i.e. the root disk of a typical instance which isn’t boot-from-volume. Other representations used by the libvirt driver explicitly reconstruct this missing disk.


libvirt

instance_disk_info

The virt driver API defines a method get_instance_disk_info, which returns a JSON blob. The compute manager calls this and passes the data over RPC between calls without ever looking at it. This is driver-specific opaque data. It is also only used by the libvirt driver, despite being part of the API for all drivers. Other drivers do not return any data. The most interesting aspect of instance_disk_info is that it is generated from the libvirt XML, not from nova’s state.

NOTE

instance_disk_info is often named disk_info in code, which is unfortunate as this clashes with the normal naming of the next structure. Occasionally the two are used in the same block of code.

instance_disk_info is a list of dicts for some of an instance’s disks.

NOTE

rbd disks (including non-volume disks) and cinder volumes are not included in instance_disk_info.

Each dict contains the following:

  {
    'type': libvirt's notion of the disk's type
    'path': libvirt's notion of the disk's path
    'virt_disk_size': The disk's virtual size in bytes (the size the guest OS sees)
    'backing_file': libvirt's notion of the backing file path
    'disk_size': The file size of path, in bytes.
    'over_committed_disk_size': As-yet-unallocated disk size, in bytes.
  }

disk_info

NOTE

Not to be confused with instance_disk_info, which is also frequently called disk_info in code.

This data structure is actually described pretty well in the comment block at the top of nova.virt.libvirt.blockinfo. It is internal to the libvirt driver. It contains:

  {
    'disk_bus': the default bus used by disks
    'cdrom_bus': the default bus used by cdrom drives
    'mapping': defined below
  }

mapping is a dict which maps disk names to a dict describing how that disk should be passed to libvirt. This mapping contains every disk connected to the instance, both local and volumes.

First, a note on disk naming. Local disk names used by the libvirt driver are well defined. They are:

  • disk: The root disk
  • disk.local: The flavor-defined ephemeral disk
  • disk.ephX: Where X is a zero-based index for BDM defined ephemeral disks
  • disk.swap: The swap disk
  • disk.config: The config disk

These names are hardcoded, reliable, and used in lots of places.

In disk_info, volumes are keyed by device name, e.g. 'vda' or 'vdb'. Different buses will be named differently, approximately according to legacy Linux device naming.

Additionally, disk_info will contain a mapping for ‘root’, which is the root disk. This will duplicate one of the other entries, either ‘disk’ or a volume mapping.

Each dict within the mapping dict contains three required fields, bus, dev and type, along with two optional fields, format and boot_index:

  {
    'bus': the guest bus type ('ide', 'virtio', 'scsi', etc)
    'dev': the device name 'vda', 'hdc', 'sdf', 'xvde' etc
    'type': type of device eg 'disk', 'cdrom', 'floppy'
    'format': Which format to apply to the device if applicable
    'boot_index': Number designating the boot order of the device
  }
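
Pulling that together, a hypothetical mapping for an instance with an image-backed root disk, a flavor-defined ephemeral disk and a single attached volume might look roughly like the following (values purely illustrative):

  {
    'root':       {'bus': 'virtio', 'dev': 'vda', 'type': 'disk', 'boot_index': '1'},
    'disk':       {'bus': 'virtio', 'dev': 'vda', 'type': 'disk', 'boot_index': '1'},
    'disk.local': {'bus': 'virtio', 'dev': 'vdb', 'type': 'disk', 'format': 'ext4'},
    'vdc':        {'bus': 'virtio', 'dev': 'vdc', 'type': 'disk'}
  }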


NOTE

BlockDeviceMapping and DriverBlockDevice store boot index zero-based. However, libvirt’s boot index is 1-based, so the value stored here is 1-based.



OpenStack Nova Victoria Cycle Gerrit Dashboards

As in previous cycles I've updated some of the Nova specific dashboards available within the excellent gerrit-dash-creator project and started using them prior to dropping offline on paternity leave.

I'd really like to see more use of these dashboards within Nova to help focus our limited review bandwidth on active and mergeable changes, so if you do have any ideas please fire off reviews and add me in!

For now I've linked to some of the dashboards I've been using most often below, along with a brief summary and a dump of the current .dash logic used by the gerrit-dash-creator tooling to build the Gerrit dashboard URLs.
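
If you want to generate the dashboard URLs yourself then, assuming a local checkout of gerrit-dash-creator and that the relevant .dash file lives under dashboards/ (file name below is illustrative), it should just be a case of pointing the tool at the file:

$ ./gerrit-dash-creator dashboards/nova-specs.dash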

nova-specs

The openstack/nova-specs repo contains Nova design specifications associated with both the previous and current development release. This dashboard specifically targets the current development release as we should only see reviews landing in gerrit referring to this release at present.

[dashboard]
title = Nova Specs - Victoria
description = Review Inbox
foreach = project:openstack/nova-specs status:open NOT label:Workflow<=-1 branch:master NOT owner:self

[section "You are a reviewer, but haven't voted in the current revision"]
query = file:^specs/victoria/.* NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self label:Verified>=1,zuul

[section "Not blocked by -2s"]
query = file:^specs/victoria/.* NOT label:Code-Review<=-2 NOT label:Code-Review>=2 NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self label:Verified>=1,zuul

[section "No votes and spec is > 1 week old"]
query = file:^specs/victoria/.* NOT label:Code-Review>=-2 age:7d label:Verified>=1,zuul

[section "Needs final +2"]
query = file:^specs/victoria/.* label:Code-Review>=2 NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self label:Verified>=1,zuul  NOT label:workflow>=1

[section "Broken Specs (doesn't pass Zuul)"]
query = file:^specs/victoria/.* label:Verified<=-1,zuul

[section "Dead Specs (blocked by a -2)"]
query = file:^specs/victoria/.* label:Code-Review<=-2

[section "Dead Specs (Not Proposed for Victoria)"]
query = NOT file:^specs/victoria/.* file:^specs/.*

[section "Not Specs (tox.ini etc)"]
query = NOT file:^specs/.*

nova-libvirt

I introduced this dashboard after the recent creation of the libvirt subteam during the U cycle. As you can see from the foreach filter, the dashboard only lists changes touching the standard set of libvirt driver related files within the openstack/nova codebase. IMHO a dashboard for non-libvirt drivers would also be useful.

[dashboard]
title = Nova Libvirt Driver Review Inbox
description = Review Inbox for the Nova Libvirt Driver
foreach =  project:openstack/nova 
           status:open
           NOT owner:self
           NOT label:Workflow<=-1
           label:Verified>=1,zuul
           NOT reviewedby:self
           branch:master
           (file:^nova/virt/libvirt/.* OR file:^nova/tests/unit/libvirt/.* OR file:^nova/tests/functional/libvirt/.*)

[section "Small patches"]
query = NOT label:Code-Review>=2,self NOT label:Code-Review<=-1,nova-core NOT message:"DNM" delta:<=10

[section "Needs final +2"]
query = NOT label:Code-Review>=2,self label:Code-Review>=2 limit:50 NOT label:workflow>=1

[section "Bug fix, Passed Zuul, No Negative Feedback"]
query = NOT label:Code-Review>=2,self NOT label:Code-Review<=-1,nova-core message:"bug: " limit:50

[section "Wayward Changes (Changes with no code review in the last two days)"]
query = NOT label:Code-Review<=-1 NOT label:Code-Review>=1 age:2d limit:50

[section "Needs feedback (Changes older than 5 days that have not been reviewed by anyone)"]
query = NOT label:Code-Review<=-1 NOT label:Code-Review>=1 age:5d limit:50

[section "Passed Zuul, No Negative Feedback"]
query = NOT label:Code-Review>=2 NOT label:Code-Review<=-1 limit:50

[section "Needs revisit (You were a reviewer but haven't voted in the current revision)"]
query = reviewer:self limit:50

nova-stable

I have been a Nova Stable Core for a few years now and during that time I have relied heavily on Gerrit dashboards and queries to help keep track of changes as they move through our many stable branches. This has been made slightly more complex by the introduction of extended-maintenance branches, but more on that below. For now this dashboard covers the ussuri, train and stein stable branches.

I'm currently using the by-branch Nova stable dashboards, as these allow me to track changes through each required branch easily without any additional clicking within Gerrit. There is however an allinone dashboard if you prefer that approach.

Finally, for anyone paying attention, you might have noticed I'm also using a nova-merged query in Gerrit to track changes recently merged into master. This has helped me catch and proactively backport useful fixes to stable many times.
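
I won't reproduce the exact query here, but a simple version that lists changes merged into master over the last couple of days would look something like the following:

project:openstack/nova branch:master status:merged -age:2d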

[dashboard]
title = Nova Stable Maintenance Review Inbox
description = Review Inbox
foreach = (project:openstack/nova OR project:openstack/python-novaclient) status:open NOT owner:self NOT label:Workflow<=-1 label:Verified>=1,zuul NOT reviewedby:self

[section " stable/ussuri You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/ussuri

[section "stable/ussuri Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/ussuri

[section "stable/ussuri Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/ussuri

[section " stable/train You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/train

[section "stable/train Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/train

[section "stable/train Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/train

[section " stable/stein You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/stein

[section "stable/stein Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/stein

[section "stable/stein Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/stein

nova-em

In addition to the nova-stable dashboard above I also have a dashboard for our extended-maintenance branches. At present these are (or are about to be) rocky, queens and pike.

[dashboard]
title = Nova Extended Maintenance Review Inbox
description = Review Inbox
foreach = (project:openstack/nova OR project:openstack/python-novaclient) status:open NOT owner:self NOT label:Workflow<=-1 label:Verified>=1,zuul NOT reviewedby:self

[section " stable/rocky You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/rocky

[section "stable/rocky Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/rocky

[section "stable/rocky Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/rocky

[section " stable/queens You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/queens

[section "stable/queens Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/queens

[section "stable/queens Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/queens

[section " stable/pike You are a reviewer, but haven't voted in the current revision"]
query = NOT label:Code-Review<=-1,self NOT label:Code-Review>=1,self reviewer:self branch:stable/pike

[section "stable/pike Needs final +2"]
query = label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 NOT label:workflow>=1 branch:stable/pike

[section "stable/pike Passed Zuul, No Negative Core Feedback"]
query = NOT label:Code-Review>=2 NOT(reviewerin:stable-maint-core label:Code-Review<=-1) limit:50 branch:stable/pike


OpenStack TripleO FFU M2 progress report

http://lists.openstack.org/pipermail/openstack-dev/2017-December/125321.html

This is a brief progress report from the Upgrades squad for the fast-forward upgrades (FFU) feature in TripleO, introducing N to Q upgrades.

tl;dr Good initial progress, missed the M2 goal of non-voting CI jobs, pushing on to M3.

Overview

For anyone unfamiliar with the concept of fast-forward upgrades the following sentence from the spec gives a brief high level introduction:

> Fast-forward upgrades are upgrades that move an environment from release `N`
> to `N+X` in a single step, where `X` is greater than `1` and for fast-forward
> upgrades is typically `3`.

The spec itself obviously goes into more detail and I’d recommend anyone wanting to know more about our approach for FFU in TripleO to start there:

https://specs.openstack.org/openstack/tripleo-specs/specs/queens/fast-forward-upgrades.html

Note that the spec is being updated at present by the following change, introducing more details on the FFU task layout, ordering, the dependency on the ongoing major upgrade rework in Q, canary compute validation etc.:

WIP ffu: Spec update for M2 https://review.openstack.org/#/c/526353/

M2 Status

The original goal for Queens M2 was to have one or more non-voting FFU jobs deployed somewhere, able to run through the basic undercloud and overcloud upgrade workflows and exercising as many compute service dependencies as we could, up to and including Nova. Unfortunately, while Sofer has made some great progress with this, we do not have any running FFU jobs at present:

http://lists.openstack.org/pipermail/openstack-dev/2017-December/125316.html

We do however have documented demos that cover FFU for some limited overcloud environments from Newton to Queens:

OpenStack TripleO FFU Keystone Demo N to Q https://blog.yarwood.me.uk/2017/11/16/openstack_fastforward_tripleo_keystone/

OpenStack TripleO FFU Nova Demo N to Q https://blog.yarwood.me.uk/2017/12/01/openstack_fastforward_tripleo_nova/

These demos currently use a stack of changes against THT with the first ~4 or so changes introducing the FFU framework:

https://review.openstack.org/#/q/status:open+project:openstack/tripleo-heat-templates+branch:master+topic:bp/fast-forward-upgrades

FWIW getting these initial changes merged would help avoid the current change storm every time this series is rebased to pick up upgrade or deploy related bug fixes.

Also note that the demos currently use the raw Ansible playbooks from the stack outputs to run through the FFU tasks, upgrade tasks and deploy tasks. This is by no means what the final UX will be, with python-tripleoclient and workflow work still to be completed ahead of M3.

M3 Goals

The squad will be focusing on the following goals for M3:

  • Non-voting RDO CI jobs defined and running
  • FFU THT changes tested by the above jobs and merged
  • python-tripleoclient & required Mistral workflows merged
  • Use of ceph-ansible for Ceph upgrades
  • Draft developer and user docs under review

FFU squad

Finally, a quick note to highlight that this report marks the end of my own personal involvement with the FFU feature in TripleO. I’m not going far, returning to work on Nova and happy to make time to talk about and review FFU related changes etc. The members of the upgrade squad taking this forward and your main points of contact for FFU in TripleO will be:

  • Sofer (chem)
  • Lukas (social)
  • Marios (marios)

My thanks again to Sofer, Lukas, Marios, the rest of the upgrade squad and wider TripleO community for your guidance and patience when putting up with my constant inane questioning regarding FFU over the past few months!


OpenStack TripleO FFU Nova Demo N to Q

Update 04/12/17: The initial deployment documented in this demo no longer works due to the removal of a number of plan migration steps that have now been promoted into the Queens repos. We are currently looking into ways to reintroduce these for use in master UC, Newton OC FFU development deployments; until then anyone attempting to run through this demo should start with a Newton OC and UC before upgrading the UC to master.

This is another TripleO fast-forward upgrade demo post, this time focusing on a basic stack of Keystone, Glance, Cinder, Neutron and Nova. At present there are several workarounds still required to allow the upgrade to complete; please see the workaround sections for more details.

Environment

As with the original demo I’m still using tripleo-quickstart to deploy my initial environment, this time with 1 controller and 1 compute, with a Queens undercloud and Newton overcloud. In addition I’m also using a new general config to deploy a minimal control stack able to host Nova.

$ bash quickstart.sh -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/minimal-nova.yml $VIRTHOST

UC - docker_registry.yaml

Again with this demo we are not caching containers locally; the following command will create a docker_registry.yaml file referencing the RDO registry for use during the final deployment of the overcloud to Queens:

$ ssh -F $WD/ssh.config.ansible undercloud
$ openstack overcloud container image prepare \
  --namespace trunk.registry.rdoproject.org/master \
  --tag tripleo-ci-testing \
  --output-env-file ~/docker_registry.yaml

UC - tripleo-heat-templates

We then need to update the version of tripleo-heat-templates deployed on the undercloud host:

$ ssh -F $WD/ssh.config.ansible undercloud
$ cd /home/stack/tripleo-heat-templates
$ git fetch git://git.openstack.org/openstack/tripleo-heat-templates refs/changes/19/518719/9 && git checkout FETCH_HEAD

Finally, as we are using a customised controller role the following services need to be added to the overcloud_services.yml file on the undercloud node under ControllerServices:

parameter_defaults:
  ControllerServices:
[..]
       - OS::TripleO::Services::Docker
       - OS::TripleO::Services::Iscsid
       - OS::TripleO::Services::NovaPlacement

UC - tripleo-common

At present we are waiting for a promotion of tripleo-common that includes various bugfixes around updating the overcloud stack, generating outputs etc. For the time being we can simply install directly from master to work around these issues.

$ ssh -F $WD/ssh.config.ansible undercloud
$ git clone https://github.com/openstack/tripleo-common.git ; cd tripleo-common
$ sudo python setup.py install ; cd ~

OC - Update heat-agents

As documented in my previous demo post, we need to remove any legacy hieradata from all overcloud hosts prior to updating the heat stack:

$ sudo rm -f /usr/libexec/os-apply-config/templates/etc/puppet/hiera.yaml \
             /usr/libexec/os-refresh-config/configure.d/40-hiera-datafiles \
             /etc/puppet/hieradata/*.yaml

We also need to update the heat-agents on all nodes to their Ocata versions:

$ git clone https://github.com/openstack/tripleo-repos.git ; cd tripleo-repos
$ sudo python setup.py install
$ sudo tripleo-repos -b ocata current
$ sudo yum update -y python-heat-agent \
                     python-heat-agent-puppet
$ sudo yum install -y openstack-heat-agents \
                      python-heat-agent-ansible \
                      python-heat-agent-apply-config \
                      python-heat-agent-docker-cmd \
                      python-heat-agent-hiera \
                      python-heat-agent-json-file 

OC - Workarounds #1

As noted in my previous demo post, there is currently a packaging issue when upgrading the openstack-ceilometer packages directly from Newton to Queens; as they are not used in this demo we simply remove them for now:

$ sudo yum remove openstack-ceilometer* -y

UC - Update stack outputs

With the workarounds in place we can now update the stack using the updated version of tripleo-heat-templates on the undercloud. Once again we need to use the original deploy command with a number of additional environment files included:

$ . stackrc
$ openstack overcloud deploy \
  --templates /home/stack/tripleo-heat-templates \
[..]
  -e /home/stack/docker_registry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/docker.yaml \
  -e /home/stack/tripleo-heat-templates/environments/fast-forward-upgrade.yaml \
  -e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml

UC - Download config

Once the stack has been updated we can download the config with the following command:

$ . stackrc
$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-Oalkee-config

UC - FFU and Upgrade plays

Before running through any of the generated playbooks I personally like to add the profile_tasks callback to the callback_whitelist for Ansible within /etc/ansible/ansible.cfg. This provides timestamps during the playbook run and a summary of the slowest tasks at the end.

# enable callback plugins, they can output to stdout but cannot be 'stdout' type.
callback_whitelist = profile_tasks

We first run the fast_forward_upgrade_playbook to complete the upgrade to Pike:

$ . stackrc
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
     /home/stack/config/tripleo-jUY9FB-config/fast_forward_upgrade_playbook.yaml 
[..]
PLAY RECAP *****************************************************************************************************************************
192.168.24.11              : ok=62   changed=8    unreachable=0    failed=0   
192.168.24.16              : ok=123  changed=55   unreachable=0    failed=0   

Friday 01 December 2017  20:39:58 +0000 (0:00:03.967)       0:06:16.615 ******* 
=============================================================================== 
Stop neutron_server ------------------------------------------------------------------------------------------------------------ 32.53s
stop openstack-cinder-volume --------------------------------------------------------------------------------------------------- 16.03s
Stop neutron_l3_agent ---------------------------------------------------------------------------------------------------------- 14.36s
Stop and disable nova-compute service ------------------------------------------------------------------------------------------ 13.16s
Cinder package update ---------------------------------------------------------------------------------------------------------- 12.73s
stop openstack-cinder-scheduler ------------------------------------------------------------------------------------------------ 12.68s
Setup cell_v2 (sync nova/cell DB) ---------------------------------------------------------------------------------------------- 11.79s
Cinder package update ---------------------------------------------------------------------------------------------------------- 11.30s
Neutron package update --------------------------------------------------------------------------------------------------------- 10.99s
Keystone package update -------------------------------------------------------------------------------------------------------- 10.77s
glance package update ---------------------------------------------------------------------------------------------------------- 10.28s
Keystone package update --------------------------------------------------------------------------------------------------------- 9.80s
glance package update ----------------------------------------------------------------------------------------------------------- 8.72s
Neutron package update ---------------------------------------------------------------------------------------------------------- 8.62s
Stop and disable nova-consoleauth service --------------------------------------------------------------------------------------- 7.94s
Update nova packages ------------------------------------------------------------------------------------------------------------ 7.62s
Update nova packages ------------------------------------------------------------------------------------------------------------ 7.24s
Stop and disable nova-scheduler service ----------------------------------------------------------------------------------------- 6.36s
Run puppet apply to set tranport_url in nova.conf ------------------------------------------------------------------------------- 5.78s
install tripleo-repos ----------------------------------------------------------------------------------------------------------- 4.70s

We then run the upgrade_steps_playbook to start the upgrade to Queens:

$ . stackrc
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
    /home/stack/tripleo-Oalkee-config/upgrade_steps_playbook.yaml
[..]
PLAY RECAP *****************************************************************************************************************************
192.168.24.11              : ok=57   changed=45   unreachable=0    failed=0   
192.168.24.16              : ok=165  changed=146  unreachable=0    failed=0   

Friday 01 December 2017  20:51:55 +0000 (0:00:00.038)       0:10:47.865 ******* 
=============================================================================== 
Update all packages ----------------------------------------------------------------------------------------------------------- 263.71s
Update all packages ----------------------------------------------------------------------------------------------------------- 256.79s
Install docker packages on upgrade if missing ---------------------------------------------------------------------------------- 13.77s
Upgrade os-net-config ----------------------------------------------------------------------------------------------------------- 5.71s
Upgrade os-net-config ----------------------------------------------------------------------------------------------------------- 5.12s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 3.36s
Install docker packages on upgrade if missing ----------------------------------------------------------------------------------- 3.14s
Stop and disable mysql service -------------------------------------------------------------------------------------------------- 1.97s
Check for os-net-config upgrade ------------------------------------------------------------------------------------------------- 1.66s
Check for os-net-config upgrade ------------------------------------------------------------------------------------------------- 1.57s
Stop keepalived service --------------------------------------------------------------------------------------------------------- 1.48s
Stop and disable rabbitmq service ----------------------------------------------------------------------------------------------- 1.47s
take new os-net-config parameters into account now ------------------------------------------------------------------------------ 1.31s
take new os-net-config parameters into account now ------------------------------------------------------------------------------ 1.08s
Check if openstack-ceilometer-compute is deployed ------------------------------------------------------------------------------- 0.70s
Check if iscsid service is deployed --------------------------------------------------------------------------------------------- 0.67s
Start keepalived service -------------------------------------------------------------------------------------------------------- 0.48s
Check for nova placement running under apache ----------------------------------------------------------------------------------- 0.46s
Stop and disable mongodb service on upgrade ------------------------------------------------------------------------------------- 0.45s
remove old cinder cron jobs ----------------------------------------------------------------------------------------------------- 0.45s

OC - Workarounds #2

On overcloud-novacompute-0 the following file needs to be removed to work around a known issue:

$ ssh -F $WD/ssh.config.ansible overcloud-novacompute-0
$ sudo rm /etc/iscsi/.initiator_reset

UC - Deploy play

Finally we run through the deploy_steps_playbook:

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
    /home/stack/tripleo-Oalkee-config/deploy_steps_playbook.yaml
[..]
PLAY RECAP *****************************************************************************************************************************
192.168.24.11              : ok=48   changed=11   unreachable=0    failed=0   
192.168.24.16              : ok=76   changed=10   unreachable=0    failed=0   
localhost                  : ok=1    changed=0    unreachable=0    failed=0   

Friday 01 December 2017  21:04:58 +0000 (0:00:00.041)       0:10:24.723 ******* 
=============================================================================== 
Run docker-puppet tasks (generate config) ------------------------------------------------------------------------------------- 186.65s
Run docker-puppet tasks (bootstrap tasks) ------------------------------------------------------------------------------------- 101.10s
Start containers for step 3 ---------------------------------------------------------------------------------------------------- 98.61s
Start containers for step 4 ---------------------------------------------------------------------------------------------------- 41.37s
Run puppet host configuration for step 1 --------------------------------------------------------------------------------------- 32.53s
Start containers for step 1 ---------------------------------------------------------------------------------------------------- 25.76s
Run puppet host configuration for step 5 --------------------------------------------------------------------------------------- 17.91s
Run puppet host configuration for step 4 --------------------------------------------------------------------------------------- 14.47s
Run puppet host configuration for step 3 --------------------------------------------------------------------------------------- 13.41s
Run docker-puppet tasks (bootstrap tasks) -------------------------------------------------------------------------------------- 10.39s
Run puppet host configuration for step 2 --------------------------------------------------------------------------------------- 10.37s
Start containers for step 5 ---------------------------------------------------------------------------------------------------- 10.12s
Run docker-puppet tasks (bootstrap tasks) --------------------------------------------------------------------------------------- 9.78s
Start containers for step 2 ----------------------------------------------------------------------------------------------------- 6.32s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 4.37s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 3.46s
Write the config_step hieradata ------------------------------------------------------------------------------------------------- 1.80s
create libvirt persistent data directories -------------------------------------------------------------------------------------- 1.21s
Write the config_step hieradata ------------------------------------------------------------------------------------------------- 1.03s
Check if /var/lib/docker-puppet/docker-puppet-tasks4.json exists ---------------------------------------------------------------- 1.00s

Verification

I'll revisit this in the coming days and add a more complete set of tasks to verify the end environment, but for now we can boot a simple boot-from-volume instance (as Swift, the default store for Glance, was not installed):

$ cinder create 1
$ cinder set-bootable 46d278f7-31fc-4e45-b5df-eb8220800b1a true
$ nova flavor-create 1 1 512 1 1
$ nova boot --boot-volume 46d278f7-31fc-4e45-b5df-eb8220800b1a --flavor 1 test 
[..]
$ nova list
+--------------------------------------+------+--------+------------+-------------+-------------------+
| ID                                   | Name | Status | Task State | Power State | Networks          |
+--------------------------------------+------+--------+------------+-------------+-------------------+
| 05821616-1239-4ca9-8baa-6b0ca4ea3a6b | test | ACTIVE | -          | Running     | priv=192.168.0.16 |
+--------------------------------------+------+--------+------------+-------------+-------------------+

We can also see the various containerised services running on the overcloud:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ sudo docker ps
CONTAINER ID        IMAGE                                                                                             COMMAND                  CREATED             STATUS                      PORTS               NAMES
d80d6f072604        trunk.registry.rdoproject.org/master/centos-binary-glance-api:tripleo-ci-testing                  "kolla_start"            13 minutes ago      Up 12 minutes (healthy)                         glance_api
61fbf47241ce        trunk.registry.rdoproject.org/master/centos-binary-nova-api:tripleo-ci-testing                    "kolla_start"            13 minutes ago      Up 13 minutes                                   nova_metadata
9defdb5efe0f        trunk.registry.rdoproject.org/master/centos-binary-nova-api:tripleo-ci-testing                    "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_api
874716d99a44        trunk.registry.rdoproject.org/master/centos-binary-nova-novncproxy:tripleo-ci-testing             "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_vnc_proxy
21ca0fd8d8ec        trunk.registry.rdoproject.org/master/centos-binary-neutron-server:tripleo-ci-testing              "kolla_start"            13 minutes ago      Up 13 minutes                                   neutron_api
e0eed85b860a        trunk.registry.rdoproject.org/master/centos-binary-cinder-volume:tripleo-ci-testing               "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         cinder_volume
0882e08ac198        trunk.registry.rdoproject.org/master/centos-binary-nova-consoleauth:tripleo-ci-testing            "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_consoleauth
e3ebc4b066c9        trunk.registry.rdoproject.org/master/centos-binary-nova-api:tripleo-ci-testing                    "kolla_start"            13 minutes ago      Up 13 minutes                                   nova_api_cron
c7d05a04a8a3        trunk.registry.rdoproject.org/master/centos-binary-cinder-api:tripleo-ci-testing                  "kolla_start"            13 minutes ago      Up 13 minutes                                   cinder_api_cron
2f3c1e244997        trunk.registry.rdoproject.org/master/centos-binary-neutron-openvswitch-agent:tripleo-ci-testing   "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_ovs_agent
bfeb120bf77a        trunk.registry.rdoproject.org/master/centos-binary-neutron-metadata-agent:tripleo-ci-testing      "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_metadata_agent
43b2c09aecf8        trunk.registry.rdoproject.org/master/centos-binary-nova-scheduler:tripleo-ci-testing              "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_scheduler
a7a3024b63f6        trunk.registry.rdoproject.org/master/centos-binary-neutron-dhcp-agent:tripleo-ci-testing          "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_dhcp
3df990a68046        trunk.registry.rdoproject.org/master/centos-binary-cinder-scheduler:tripleo-ci-testing            "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         cinder_scheduler
94461ba833aa        trunk.registry.rdoproject.org/master/centos-binary-neutron-l3-agent:tripleo-ci-testing            "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_l3_agent
4bee34f9fce2        trunk.registry.rdoproject.org/master/centos-binary-cinder-api:tripleo-ci-testing                  "kolla_start"            13 minutes ago      Up 13 minutes                                   cinder_api
e8bec9348fe3        trunk.registry.rdoproject.org/master/centos-binary-nova-conductor:tripleo-ci-testing              "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_conductor
22db40c25881        trunk.registry.rdoproject.org/master/centos-binary-keystone:tripleo-ci-testing                    "/bin/bash -c '/usr/l"   15 minutes ago      Up 15 minutes                                   keystone_cron
26769acaaf5e        trunk.registry.rdoproject.org/master/centos-binary-keystone:tripleo-ci-testing                    "kolla_start"            16 minutes ago      Up 16 minutes (healthy)                         keystone
99037a5e5c36        trunk.registry.rdoproject.org/master/centos-binary-iscsid:tripleo-ci-testing                      "kolla_start"            16 minutes ago      Up 16 minutes                                   iscsid
9f4aae72c201        trunk.registry.rdoproject.org/master/centos-binary-nova-placement-api:tripleo-ci-testing          "kolla_start"            16 minutes ago      Up 16 minutes                                   nova_placement
311302abc297        trunk.registry.rdoproject.org/master/centos-binary-horizon:tripleo-ci-testing                     "kolla_start"            16 minutes ago      Up 16 minutes                                   horizon
d465e4f5b7e6        trunk.registry.rdoproject.org/master/centos-binary-mariadb:tripleo-ci-testing                     "kolla_start"            17 minutes ago      Up 17 minutes (unhealthy)                       mysql
b9e062f1d857        trunk.registry.rdoproject.org/master/centos-binary-rabbitmq:tripleo-ci-testing                    "kolla_start"            18 minutes ago      Up 18 minutes (healthy)                         rabbitmq
a57f053afc03        trunk.registry.rdoproject.org/master/centos-binary-memcached:tripleo-ci-testing                   "/bin/bash -c 'source"   18 minutes ago      Up 18 minutes                                   memcached
baeb6d1087e6        trunk.registry.rdoproject.org/master/centos-binary-redis:tripleo-ci-testing                       "kolla_start"            18 minutes ago      Up 18 minutes                                   redis
faafa1bf2d2e        trunk.registry.rdoproject.org/master/centos-binary-haproxy:tripleo-ci-testing                     "kolla_start"            18 minutes ago      Up 18 minutes                                   haproxy
$ exit
$ ssh -F $WD/ssh.config.ansible overcloud-novacompute-0
$ sudo docker ps
CONTAINER ID        IMAGE                                                                                             COMMAND             CREATED             STATUS                    PORTS               NAMES
0363d7008e87        trunk.registry.rdoproject.org/master/centos-binary-neutron-openvswitch-agent:tripleo-ci-testing   "kolla_start"       12 minutes ago      Up 12 minutes (healthy)                       neutron_ovs_agent
c1ff23ee9f16        trunk.registry.rdoproject.org/master/centos-binary-cron:tripleo-ci-testing                        "kolla_start"       12 minutes ago      Up 12 minutes                                 logrotate_crond
d81d8207ec9a        trunk.registry.rdoproject.org/master/centos-binary-nova-compute:tripleo-ci-testing                "kolla_start"       12 minutes ago      Up 12 minutes                                 nova_migration_target
abd9b79e2af8        trunk.registry.rdoproject.org/master/centos-binary-ceilometer-compute:tripleo-ci-testing          "kolla_start"       12 minutes ago      Up 12 minutes                                 ceilometer_agent_compute
aa581489ac9a        trunk.registry.rdoproject.org/master/centos-binary-nova-compute:tripleo-ci-testing                "kolla_start"       12 minutes ago      Up 12 minutes (healthy)                       nova_compute
d4ade28175f0        trunk.registry.rdoproject.org/master/centos-binary-iscsid:tripleo-ci-testing                      "kolla_start"       14 minutes ago      Up 14 minutes                                 iscsid
ae4652853098        trunk.registry.rdoproject.org/master/centos-binary-nova-libvirt:tripleo-ci-testing                "kolla_start"       14 minutes ago      Up 14 minutes                                 nova_libvirt
aac8fea2d496        trunk.registry.rdoproject.org/master/centos-binary-nova-libvirt:tripleo-ci-testing                "kolla_start"       14 minutes ago      Up 14 minutes                                 nova_virtlogd

Conclusion

So in conclusion, this demo takes a simple multi-host OpenStack deployment of Keystone, Glance, Cinder, Neutron and Nova from baremetal Newton to containerised Queens in ~26 minutes. There are many things still to resolve and validate with FFU, but for now, ahead of M2, this is a pretty good start.


OpenStack TripleO FFU Keystone Demo N to Q

This post will introduce a very rough demo of the new TripleO Fast-forward Upgrades (FFU) feature, warts and all, using an overcloud with only Keystone deployed. This should prove to be a useful starting point for anyone interested in this feature and could even be an approach used for future per-service FFU CI jobs.

Environment

I’m currently using the tripleo-quickstart project to deploy virtualised test environments. For this demo I’m using the following command line to create the demo environment:

$ bash quickstart.sh -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/keystone-only.yml \
   -N config/nodes/1ctlr.yml $VIRTHOST

This is made possible by the following unmerged changes to tripleo-quickstart:

https://review.openstack.org/#/q/topic:keystone_only_overcloud

Once deployed you should find the 10.0.3 Newton version of Keystone deployed on overcloud-controller-0:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
[..]
$ rpm -qi openstack-keystone
Name        : openstack-keystone
Epoch       : 1
Version     : 10.0.3
Release     : 0.20170726120406.bd49c3e.el7.centos
Architecture: noarch
Install Date: Fri 10 Nov 2017 04:24:46 AM UTC
Group       : Unspecified
Size        : 175014
License     : ASL 2.0
Signature   : (none)
Source RPM  : openstack-keystone-10.0.3-0.20170726120406.bd49c3e.el7.centos.src.rpm
Build Date  : Wed 26 Jul 2017 12:07:53 PM UTC
Build Host  : n30.pufty.ci.centos.org
Relocations : (not relocatable)
URL         : http://keystone.openstack.org/
Summary     : OpenStack Identity Service
Description :
Keystone is a Python implementation of the OpenStack
(http://www.openstack.org) identity service API.

Before starting the upgrade I recommend that snapshots of the undercloud and overcloud-controller-0 libvirt domains are taken on the virthost:

$ ssh -F $WD/ssh.config.ansible virthost
$ for domain in $(virsh list | grep running | awk '{print $2 }'); do virsh snapshot-create-as ${domain} ${domain}_start ; done
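
If an upgrade attempt then goes sideways these snapshots can be used to roll everything back to the starting state, along the lines of:

$ for domain in $(virsh list | grep running | awk '{print $2 }'); do virsh snapshot-revert ${domain} ${domain}_start ; done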

UC - docker_registry.yaml

As with a normal container based deployment on >=Pike we will need a Docker registry file mapping each service to a container image. The following command will create this file, pointing to the official RDO registry:

$ openstack overcloud container image prepare \
  --namespace trunk.registry.rdoproject.org/master \
  --tag tripleo-ci-testing \
  --output-env-file ~/docker_registry.yaml

Note that this will result in the container images being pulled from the remote RDO registry during the upgrade. We could pre-cache these images on the undercloud to speed the process up; however, as we are only using a single host and a minimal number of services in this demo, I have chosen to skip this for now.

UC - tripleo-heat-templates

FFU itself is controlled by an Ansible playbook using tasks that are contained within the tripleo-heat-templates (THT) project. The following gerrit topic lists all of the current FFU changes up for review:

https://review.openstack.org/#/q/status:open+project:openstack/tripleo-heat-templates+branch:master+topic:bp/fast-forward-upgrades

For this demo we need to update the local copy of THT on the undercloud to include a subset of these changes:

$ cd /home/stack/tripleo-heat-templates
$ git fetch git://git.openstack.org/openstack/tripleo-heat-templates refs/changes/19/518719/2 && git checkout FETCH_HEAD

We also need the following noop-deploy-steps.yaml environment file that allows us to use openstack overcloud deploy to update the stack outputs of the overcloud without forcing an actual redeploy of any resources:

$ curl https://git.openstack.org/cgit/openstack/tripleo-heat-templates/plain/environments/noop-deploy-steps.yaml?h=refs/changes/97/520097/1 > environments/noop-deploy-steps.yaml

Finally, as we have deployed a custom set of services for the Controller role we now have to ensure that the Docker service is added to the role prior to our upgrade:

$ cat overcloud_services.yaml 
parameter_defaults:
  ControllerServices:
       - OS::TripleO::Services::Docker
       - OS::TripleO::Services::Kernel
       - OS::TripleO::Services::Keystone
       - OS::TripleO::Services::RabbitMQ
       - OS::TripleO::Services::MySQL
       - OS::TripleO::Services::HAproxy
       - OS::TripleO::Services::Keepalived
       - OS::TripleO::Services::Ntp
       - OS::TripleO::Services::Timezone
       - OS::TripleO::Services::TripleoPackages

OC - Ocata heat-agents

An older os-apply-config hiera hook and any legacy hiera data need to be removed from the overcloud prior to our upgrade. The following ML post has more details on this workaround:

http://lists.openstack.org/pipermail/openstack-dev/2017-January/110922.html

For the time being this isn’t part of the upgrade playbook and so we need to run the following commands that will update the heat-agents on the host to their Ocata versions and remove the legacy data:

$ sudo rm -f /usr/libexec/os-apply-config/templates/etc/puppet/hiera.yaml /usr/libexec/os-refresh-config/configure.d/40-hiera-datafiles /etc/puppet/hieradata/*.yaml
$ sudo yum install -y \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/openstack-heat-agents-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-ansible-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-apply-config-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-docker-cmd-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-hiera-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-json-file-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-puppet-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm

OC - Remove ceilometer

At present there is a packaging issue when upgrading the openstack-ceilometer packages directly from Newton to Queens. As these packages are installed by default in the Newton overcloud-full image used to deploy the environment but not used in our demo, we can simply remove them for the time being:

$ sudo yum remove openstack-ceilometer* -y

UC - Update stack outputs

We can now use the openstack overcloud deploy command to update the overcloud stack and generate the new stack outputs, including the FFU playbook. To do this we simply add the previously created docker_registry.yaml, environments/docker.yaml and environments/noop-deploy-steps.yaml environment files to the original command used to deploy the environment.

$ . stackrc
$ openstack overcloud deploy \
  --templates /home/stack/tripleo-heat-templates \
[..]
  -e /home/stack/docker_registry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/docker.yaml \
  -e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml

The original command is logged under ~/overcloud_deploy.log on the undercloud, for example:

$ grep openstack\ overcloud\ deploy overcloud_deploy.log 
2017-11-16 14:36:11 | + openstack overcloud deploy --templates /home/stack/tripleo-heat-templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --block-storage-flavor oooq_blockstorage --swift-storage-flavor oooq_objectstorage --timeout 90 -e /home/stack/cloud-names.yaml -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /home/stack/network-environment.yaml -e /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml --validation-warnings-fatal -e /home/stack/overcloud_services.yaml --compute-scale 0 --ntp-server pool.ntp.org
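
Putting the two together, the full update command for this particular environment ends up looking roughly as follows; this is simply the logged command above with the three new environment files appended:

$ openstack overcloud deploy \
  --templates /home/stack/tripleo-heat-templates \
  --libvirt-type qemu \
  --control-flavor oooq_control \
  --compute-flavor oooq_compute \
  --ceph-storage-flavor oooq_ceph \
  --block-storage-flavor oooq_blockstorage \
  --swift-storage-flavor oooq_objectstorage \
  --timeout 90 \
  -e /home/stack/cloud-names.yaml \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/network-environment.yaml \
  -e /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml \
  --validation-warnings-fatal \
  -e /home/stack/overcloud_services.yaml \
  --compute-scale 0 \
  --ntp-server pool.ntp.org \
  -e /home/stack/docker_registry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/docker.yaml \
  -e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml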

UC - Download config

Now that the stack outputs have been updated we can download the overcloud config containing the FFU playbook onto the undercloud:

$ openstack overcloud config download
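
By default the command writes the config into a temporary directory and prints its path. To keep the path predictable, such as the /home/stack/tmp directory used for the playbook run below, the command can also be pointed at a directory explicitly (assuming the --config-dir option is available in the installed python-tripleoclient):

$ openstack overcloud config download --config-dir /home/stack/tmp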

There is a known issue with the generated upgrade tasks at the moment where the ordering of conditionals causes Ansible to fail. To work around this, edit the following Ansible tasks within the Controller/upgrade_tasks.yaml file to ensure the step conditional is always checked first:

- block:
  - name: Upgrade os-net-config
    yum: name=os-net-config state=latest
  - changed_when: os_net_config_upgrade.rc == 2
    command: os-net-config --no-activate -c /etc/os-net-config/config.json -v --detailed-exit-codes
    failed_when: os_net_config_upgrade.rc not in [0,2]
    name: take new os-net-config parameters into account now
    register: os_net_config_upgrade
  tags: step3
  when:
  - step|int == 3
  - not os_net_config_need_upgrade.stdout and os_net_config_has_config.rc == 0
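
The tasks in question are easy to locate in the downloaded config with a quick grep before editing, for example (assuming the config was downloaded under /home/stack/tmp as above):

$ grep -n 'os-net-config' /home/stack/tmp/Controller/upgrade_tasks.yaml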

UC - Run playbook

With the config present on the undercloud we can finally start the FFU upgrade using the following command line:

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
    /home/stack/tmp/fast_forward_upgrade_playbook.yaml
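
tripleo-ansible-inventory is a standard Ansible dynamic inventory script, so if you want to confirm which hosts the playbook will touch before running it, the script can be invoked directly (a quick sanity check, assuming the usual dynamic inventory interface):

$ /usr/bin/tripleo-ansible-inventory --list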

OC - Verification

Once the FFU upgrade is complete we can verify that Keystone is functional in the overcloud with a few simple commands:

$ ssh -F $WD/ssh.config.ansible undercloud
$ . overcloudrc
$ openstack endpoint list
+----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------+
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                        |
+----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------+
| 15fd404ff8c14971b4251b81624edab8 | regionOne | keystone     | identity     | True    | admin     | http://192.168.24.10:35357 |
| 2e513f5fdfc140ec916b081b47a2b8f7 | regionOne | keystone     | identity     | True    | internal  | http://172.16.2.12:5000    |
| 96980f0f9ac44c718c038ef54af814bc | regionOne | keystone     | identity     | True    | public    | http://10.0.0.8:5000       |
+----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------+
$ openstack service list
+----------------------------------+------------+----------+
| ID                               | Name       | Type     |
+----------------------------------+------------+----------+
| 3fc546421e9048f39b2b847b13fa8ea5 | keystone   | identity |
| 7f819190dc6f44d8b995021277b24d67 | ceilometer | metering |
+----------------------------------+------------+----------+

We can also log into the overcloud-controller-0 host and verify that the relevant containers are running:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ sudo docker ps
CONTAINER ID        IMAGE                                                                COMMAND                  CREATED              STATUS                          PORTS               NAMES
4f40f0cf98aa        192.168.24.1:8787/master/centos-binary-keystone:tripleo-ci-testing   "/bin/bash -c '/usr/l"   About a minute ago   Up About a minute                                   keystone_cron
0b9d5cc17f5d        192.168.24.1:8787/master/centos-binary-keystone:tripleo-ci-testing   "kolla_start"            About a minute ago   Up About a minute (healthy)                         keystone
db967d899aaf        192.168.24.1:8787/master/centos-binary-mariadb:tripleo-ci-testing    "kolla_start"            About a minute ago   Up About a minute (unhealthy)                       mysql
1f0b9aa72ec7        192.168.24.1:8787/master/centos-binary-rabbitmq:tripleo-ci-testing   "kolla_start"            2 minutes ago        Restarting (1) 29 seconds ago                       rabbitmq
8e689f5bac22        192.168.24.1:8787/master/centos-binary-haproxy:tripleo-ci-testing    "kolla_start"            2 minutes ago        Up 2 minutes                                        haproxy
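
Note that in the output above the mysql container is still reporting as unhealthy and rabbitmq is restarting, presumably while the services settle after the upgrade. If they fail to recover, the standard docker commands can be used to dig into why, for example:

$ sudo docker inspect --format '{{ .State.Health.Status }}' mysql
$ sudo docker logs --tail 20 rabbitmq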

As I said at the start, this is a very rough demo that we can hopefully clean up and iterate on quickly over the coming weeks. The current goal is to have another working demo available by M2 that covers all of the services required to upgrade the computes, so that we can also start verifying the data plane during the upgrade.

OpenStack TripleO FFU - Getting started

This post will be a living document where I detail how TripleO developers can initially provision an environment and then iterate quickly while working on service upgrade tasks for the new fast-forward upgrade feature in TripleO Queens.

Initial environment

This section details how to configure the initial environment with specific undercloud (UC) and overcloud (OC) versions and layouts using tripleo-quickstart.

Newton UC & OC

This basic combination is required for end-to-end testing of fast-forward upgrades:

$ bash quickstart.sh -R newton $VIRTHOST

Note, however, that the following changes are required so that vbmc is used by the undercloud instead of pxe_ssh (which was removed in Pike):

https://review.openstack.org/#/q/topic:allow_vbmc_newton

Master UC & Newton OC

This combination is a useful starting point for developers looking to work on tripleo-heat-templates changes for a given service:

$ bash quickstart.sh -R master-undercloud-newton-overcloud $VIRTHOST

The master-undercloud-newton-overcloud release config is introduced by the following change:

https://review.openstack.org/#/c/511464/

Master UC & Newton OC with only Keystone deployed

The changes required for this Keystone-only overcloud configuration are tracked under the following topic:

https://review.openstack.org/#/q/topic:keystone_only_overcloud

$ bash quickstart.sh -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/keystone-only.yml \
   -N config/nodes/1ctlr.yml $VIRTHOST
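
Once quickstart completes, the working directory passed via -w contains the ssh config used to reach the nodes, as seen in the verification steps of the demo above, for example:

$ ssh -F $WD/ssh.config.ansible undercloud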

OpenStack - Fast-forward upgrades - Report

http://lists.openstack.org/pipermail/openstack-dev/2017-September/122347.html

My thanks again to everyone who attended and contributed to the skip-level upgrades track over the first two days of last week's PTG. I’ve included a short summary of our discussions below with a list of agreed actions for Queens at the end.

tl;dr s/skip-level/fast-forward/g

https://etherpad.openstack.org/p/queens-PTG-skip-level-upgrades

Monday

During our first session we briefly discussed the history of the skip-level upgrades effort within the community and the various misunderstandings that have arisen from previous conversations around this topic at past events.

We agreed that at present the only way to perform upgrades between N and N+>=2 releases of OpenStack was to upgrade linearly through each major release, without skipping any of the releases between the starting and target release of the upgrade.

This is contrary to previous discussions on the topic, where it had been suggested that releases could be skipped if the DB migrations for those releases were applied in bulk later in the process. As projects within the community currently offer no such support, it was agreed to continue to use the supported N to N+1 upgrade jumps, albeit in a minimal, offline way.

The name skip-level upgrades has had an obvious role to play in the confusion here and as such the renaming of this effort was discussed at length. Various suggestions are listed on the pad but for the time being I’m going to stick with the basic fast-forward upgrades name (FFOU, OFF, BOFF, FFUD etc were all close behind). This removes any notion of releases being skipped and should hopefully avoid any further confusion in the future.

Support by the projects for offline upgrades was then discussed, with a recent Ironic issue highlighted as an example of where projects have required services to be running before the upgrade could be considered complete. The additional requirement of ensuring both workloads and the data plane remain active during the upgrade was also then discussed. It was agreed that both the supports-upgrades and supports-accessible-upgrades tags should be updated to reflect these requirements for fast-forward upgrades.

Given the above, it was agreed that this new definition of what fast-forward upgrades are, and the best practices associated with them, should be clearly documented somewhere. Various operators in the room highlighted that they would like to see a high-level document outlining the steps required to achieve this, hopefully written by someone with past experience of running this type of upgrade.

I failed to capture the names of the individuals who were interested in helping out here. If you are interested, please feel free to add your name to the actions either at the end of this mail or at the bottom of the pad.

In the afternoon we reviewed the current efforts within the community to implement fast-forward upgrades, covering TripleO, Charms (Juju) and openstack-ansible. While this was insightful to many in the room, there didn’t appear to be any obvious areas of collaboration outside of sharing best practice and defining the high-level flow of a fast-forward upgrade.

Tuesday

Tuesday started with a discussion around NFV considerations with fast-forward upgrades. These ranged from the previously mentioned need for the data plane to remain active during the upgrade to the restricted nature of upgrades in NFV environments in terms of time and number of reboots.

It was highlighted that there are some serious as yet unresolved bugs in Nova regarding the live migration of instances using SR-IOV devices. This currently makes the moving of workloads either prior to or during the upgrade particularly difficult.

Rollbacks were also discussed, and it was highlighted that any best-practice documentation around fast-forward upgrades should include steps to allow the recovery of environments if things fail.

We then revisited an idea from the first day: finding or creating a SIG that this effort could call home. It was highlighted that there was a suggestion in the packaging room to create a Deployment / Lifecycle SIG. After speaking with a few individuals later in the week I’ve taken the action to reach out on the openstack-sigs mailing list for further input.

Finally, during a brief discussion on ways we could collaborate and share tooling for fast-forward upgrades, a new tool to migrate configuration files between N and N+>=2 releases was introduced. While interesting, it was seen as a more generic utility that could also be used for N to N+1 upgrades. AFAIK the authors joined the Oslo room shortly after this session ended to gain more feedback from that team.

Actions

  • Modify the supports-upgrades and supports-accessible-upgrades tags

    I have yet to look into the formal process around making changes to these tags but I will aim to make a start ASAP.

  • Find an Ops lead for the documentation effort

    I failed to take down the names of some of the operators who were talking this through at the time. If they or anyone else is still interested in helping here please let me know!

  • Find or create a relevant SIG for this effort

    As discussed above this could be as part of the lifecycle SIG or an independent upgrades SIG. Expect a separate mail to the SIG list regarding this shortly.

  • Identify a room chair for Sydney

    Unfortunately I will not be present in Sydney to lead a similar session. If anyone is interested in helping please feel free to respond here or reach out to me directly!

My thanks again to everyone who attended the track. I had a blast leading the room and hope that the attendees found both the track and some of the outcomes listed above useful.
