OpenStack TripleO FFU Keystone Demo N to Q

This post will introduce a very rough demo of the new TripleO Fast-forward Upgrades (FFU) feature, warts and all, using an overcloud with only Keystone deployed. This should prove to be a useful starting point for anyone interested in this feature and could even be an approach used for future per-service FFU CI jobs.


I’m currently using the tripleo-quickstart project to deploy virtualised test environments. For this demo I’m using the following command line to create the demo environment:

$ bash -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/keystone-only.yml \
   -N config/nodes/1ctlr.yml $VIRTHOST

This is made possible by the following unmerged changes to tripleo-quickstart:

Once deployed you should find the 10.0.3 Newton version of Keystone deployed on overcloud-controller-0:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ rpm -qi openstack-keystone
Name        : openstack-keystone
Epoch       : 1
Version     : 10.0.3
Release     : 0.20170726120406.bd49c3e.el7.centos
Architecture: noarch
Install Date: Fri 10 Nov 2017 04:24:46 AM UTC
Group       : Unspecified
Size        : 175014
License     : ASL 2.0
Signature   : (none)
Source RPM  : openstack-keystone-10.0.3-0.20170726120406.bd49c3e.el7.centos.src.rpm
Build Date  : Wed 26 Jul 2017 12:07:53 PM UTC
Build Host  :
Relocations : (not relocatable)
URL         :
Summary     : OpenStack Identity Service
Description :
Keystone is a Python implementation of the OpenStack
identity service API.

Before starting the upgrade I recommend that snapshots of the undercloud and overcloud-controller-0 libvirt domains are taken on the virthost:

$ ssh -F $WD/ssh.config.ansible virthost
$ for domain in $(virsh list | grep running | awk '{print $2 }'); do virsh snapshot-create-as ${domain} ${domain}_start ; done
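
If the upgrade goes wrong these snapshots give us a way back. What follows is a minimal sketch, assuming the ${domain}_start naming used above; revert_cmds is a hypothetical helper of my own that only prints the virsh snapshot-revert commands so they can be reviewed before being run on the virthost:

```shell
# Hypothetical helper: print (do not run) the revert command for each
# domain passed in, assuming snapshots were named "<domain>_start" as in
# the snapshot loop above.
revert_cmds() {
  for domain in "$@"; do
    printf 'virsh snapshot-revert %s %s_start\n' "$domain" "$domain"
  done
}
```

Calling it with the domain names reported by virsh list prints one command per domain; pipe the output to sh once you are happy with it.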

UC - docker_registry.yaml

As with a normal container based deployment on >=Pike we will need a Docker registry file mapping each service to a container image. The following command will create this file, pointing to the official RDO registry:

$ openstack overcloud container image prepare \
  --namespace \
  --tag tripleo-ci-testing \
  --output-env-file ~/docker_registry.yaml

Note that this will result in the container images being pulled from the remote RDO registry during the upgrade. We can pre-cache these images on the undercloud to speed the process up. However as we are only using a single host and minimal number of services in this demo I have chosen to skip this for now.

UC - tripleo-heat-templates

FFU itself is controlled by an Ansible playbook using tasks that are contained within the tripleo-heat-templates (THT) project. The following gerrit topic lists all of the current FFU changes up for review:

For this demo we need to update the local copy of THT on the undercloud to include a subset of these changes:

$ cd /home/stack/tripleo-heat-templates
$ git fetch git:// refs/changes/19/518719/2 && git checkout FETCH_HEAD

We also need the following noop-deploy-steps.yaml environment file that allows us to use openstack overcloud deploy to update the stack outputs of the overcloud without forcing an actual redeploy of any resources:

$ curl > environments/noop-deploy-steps.yaml

Finally, as we have deployed a custom set of services for the Controller role we now have to ensure that the Docker service is added to the role prior to our upgrade:

$ cat overcloud_services.yaml 
       - OS::TripleO::Services::Docker
       - OS::TripleO::Services::Kernel
       - OS::TripleO::Services::Keystone
       - OS::TripleO::Services::RabbitMQ
       - OS::TripleO::Services::MySQL
       - OS::TripleO::Services::HAproxy
       - OS::TripleO::Services::Keepalived
       - OS::TripleO::Services::Ntp
       - OS::TripleO::Services::Timezone
       - OS::TripleO::Services::TripleoPackages
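
Before moving on it is worth confirming that the Docker service really is listed. A minimal sketch, assuming the file is a YAML list like the one above; has_docker_service is a hypothetical helper, not part of TripleO:

```shell
# Hypothetical helper: succeed only if the services file passed as $1
# contains a list entry that is exactly OS::TripleO::Services::Docker.
has_docker_service() {
  grep -Eq '^[[:space:]]*-[[:space:]]*OS::TripleO::Services::Docker[[:space:]]*$' "$1"
}
```

For example, has_docker_service overcloud_services.yaml || echo "Docker service missing" would flag a file that still needs the extra entry.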

OC - Ocata heat-agents

An older os-apply-config hiera hook and any legacy hiera data need to be removed from the overcloud prior to our upgrade. The following ML post has more details on this workaround:

For the time being this isn’t part of the upgrade playbook and so we need to run the following commands that will update the heat-agents on the host to their Ocata versions and remove the legacy data:

$ sudo rm -f /usr/libexec/os-apply-config/templates/etc/puppet/hiera.yaml /usr/libexec/os-refresh-config/configure.d/40-hiera-datafiles /etc/puppet/hieradata/*.yaml
$ sudo yum install -y \ \ \ \ \ \ \ \

OC - Remove ceilometer

At present there is a packaging issue when upgrading the openstack-ceilometer packages directly from Newton to Queens. As these packages are installed by default in the Newton overcloud-full image used to deploy the environment but not used in our demo we can simply remove them for the time being:

$ sudo yum remove openstack-ceilometer* -y

UC - Update stack outputs

We can now use the openstack overcloud deploy command to update the overcloud stack and generate the new stack outputs, including the FFU playbook. To do this we simply add the previously created docker_registry.yaml, environments/docker.yaml and environments/noop-deploy-steps.yaml environment files to the original command used to deploy the environment.

$ . stackrc
$ openstack overcloud deploy \
  --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/docker_registry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/docker.yaml \
  -e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml

The original command is logged under ~/overcloud_deploy.log on the undercloud, for example:

$ grep openstack\ overcloud\ deploy overcloud_deploy.log 
2017-11-16 14:36:11 | + openstack overcloud deploy --templates /home/stack/tripleo-heat-templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --block-storage-flavor oooq_blockstorage --swift-storage-flavor oooq_objectstorage --timeout 90 -e /home/stack/cloud-names.yaml -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /home/stack/network-environment.yaml -e /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml --validation-warnings-fatal -e /home/stack/overcloud_services.yaml --compute-scale 0 --ntp-server
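
Rather than copying that line by hand, the command can be pulled out of the log and extended. This is only a sketch, assuming log lines of the form "<timestamp> | + <command>" as shown above; both function names are hypothetical and nothing is executed, the result is just printed for review:

```shell
# Print the most recent "openstack overcloud deploy ..." command found in
# the log file $1, stripping the "<timestamp> | + " prefix.
original_deploy_cmd() {
  grep 'openstack overcloud deploy' "$1" | tail -n 1 | sed 's/^.*| + //'
}

# Print that command with an extra "-e <file>" appended for every
# remaining argument. The command is not run.
updated_deploy_cmd() {
  log=$1; shift
  cmd=$(original_deploy_cmd "$log")
  for env in "$@"; do
    cmd="$cmd -e $env"
  done
  printf '%s\n' "$cmd"
}
```

Passing overcloud_deploy.log followed by the three environment files listed earlier would print the full command to run.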

UC - Download config

Now that the stack outputs have been updated we can download the overcloud config containing the FFU playbook onto the undercloud:

$ openstack overcloud config download

There is a known issue with the generated upgrade tasks at the moment where the ordering of conditionals causes Ansible to fail. To work around this, simply edit the following Ansible tasks within the Controller/upgrade_tasks.yaml file to ensure the step conditional is always checked first:

- block:
  - name: Upgrade os-net-config
    yum: name=os-net-config state=latest
  - changed_when: os_net_config_upgrade.rc == 2
    command: os-net-config --no-activate -c /etc/os-net-config/config.json -v --detailed-exit-codes
    failed_when: os_net_config_upgrade.rc not in [0,2]
    name: take new os-net-config parameters into account now
    register: os_net_config_upgrade
  tags: step3
  when:
  - step|int == 3
  - not os_net_config_need_upgrade.stdout and os_net_config_has_config.rc == 0

UC - Run playbook

With the config present on the undercloud we can finally start the FFU upgrade using the following command line:

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \

OC - Verification

Once the FFU upgrade is complete we can verify that Keystone is functional in the overcloud with a few simple commands:

$ ssh -F $WD/ssh.config.ansible undercloud
$ . overcloudrc
$ openstack endpoint list
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                        |
| 15fd404ff8c14971b4251b81624edab8 | regionOne | keystone     | identity     | True    | admin     | |
| 2e513f5fdfc140ec916b081b47a2b8f7 | regionOne | keystone     | identity     | True    | internal  |    |
| 96980f0f9ac44c718c038ef54af814bc | regionOne | keystone     | identity     | True    | public    |       |
$ openstack service list
| ID                               | Name       | Type     |
| 3fc546421e9048f39b2b847b13fa8ea5 | keystone   | identity |
| 7f819190dc6f44d8b995021277b24d67 | ceilometer | metering |

We can also log into the overcloud-controller-0 host and verify that the relevant containers are running:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ sudo docker ps
CONTAINER ID        IMAGE                                                                COMMAND                  CREATED              STATUS                          PORTS               NAMES
4f40f0cf98aa   "/bin/bash -c '/usr/l"   About a minute ago   Up About a minute                                   keystone_cron
0b9d5cc17f5d   "kolla_start"            About a minute ago   Up About a minute (healthy)                         keystone
db967d899aaf    "kolla_start"            About a minute ago   Up About a minute (unhealthy)                       mysql
1f0b9aa72ec7   "kolla_start"            2 minutes ago        Restarting (1) 29 seconds ago                       rabbitmq
8e689f5bac22    "kolla_start"            2 minutes ago        Up 2 minutes                                        haproxy
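
Note that in the output above mysql reports as unhealthy and rabbitmq is restarting. A rough sketch for spotting this without reading the whole table, assuming the names and statuses are fed in as "<name>|<status>" lines; flag_unhealthy is a hypothetical helper:

```shell
# Hypothetical helper: read "<name>|<status>" lines on stdin and print the
# names of containers that are restarting or failing their health check.
flag_unhealthy() {
  awk -F '|' '$2 ~ /unhealthy|Restarting/ { print $1 }'
}
```

On the controller this could be fed with sudo docker ps --format '{{.Names}}|{{.Status}}' | flag_unhealthy.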

As I said at the start this is a very rough demo that we can hopefully clean up and iterate on quickly over the coming weeks. The current goal is to have another working demo available by M2 that covers all of the required services to upgrade the computes so we can also start verification of the data plane during the upgrade.


OpenStack TripleO FFU Getting started

This post will be a living document where I will detail how TripleO developers can initially provision environments and iterate quickly while working on service upgrade tasks for the new fast-forward upgrade feature in TripleO Queens.

Initial environment

This section details how to configure the initial environment with specific undercloud (UC) and overcloud (OC) versions and layouts using tripleo-quickstart.

Newton UC & OC

This basic combination is required for end to end testing of fast forward upgrades:

$ bash -R newton $VIRTHOST

Note however that the following changes are required so that vbmc is used by the undercloud instead of pxe_ssh (removed in Pike):

Master UC & Newton OC

This combination is a useful starting point for developers looking to work on tripleo-heat-templates changes for a given service:

$ bash -R master-undercloud-newton-overcloud $VIRTHOST

The master-undercloud-newton-overcloud release config is introduced by the following change:

Master UC & Newton OC with only Keystone deployed

$ bash -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/keystone-only.yml \
   -N config/nodes/1ctlr.yml $VIRTHOST


OpenStack - Fast-forward upgrades - Report

My thanks again to everyone who attended and contributed to the skip-level upgrades track over the first two days of last week’s PTG. I’ve included a short summary of our discussions below with a list of agreed actions for Queens at the end.

tl;dr s/skip-level/fast-forward/g


During our first session we briefly discussed the history of the skip-level upgrades effort within the community and the various misunderstandings that have arisen from previous conversations around this topic at past events.

We agreed that at present the only way to perform upgrades between N and N+>=2 releases of OpenStack was to upgrade linearly through each major release, without skipping between the starting and target release of the upgrade.

This is contrary to previous discussions on the topic where it had been suggested that releases could be skipped if DB migrations for these releases were applied in bulk later in the process. As projects within the community currently offer no such support for this it was agreed to continue to use the supported N to N+1 upgrade jumps, albeit in a minimal, offline way.

The name skip-level upgrades has had an obvious role to play in the confusion here and as such the renaming of this effort was discussed at length. Various suggestions are listed on the pad but for the time being I’m going to stick with the basic fast-forward upgrades name (FFOU, OFF, BOFF, FFUD etc were all close behind). This removes any notion of releases being skipped and should hopefully avoid any further confusion in the future.

Support by the projects for offline upgrades was then discussed with a recent Ironic issue highlighted as an example where projects have required services to run before the upgrade could be considered complete. The additional requirement of ensuring both workloads and the data plane remain active during the upgrade was also then discussed. It was agreed that both the supports-upgrades and supports-accessible-upgrades tags should be updated to reflect these requirements for fast-forward upgrades.

Given the above it was agreed that this new definition of what fast-forward upgrades are and the best practices associated with them should be clearly documented somewhere. Various operators in the room highlighted that they would like to see a high level document outlining the steps required to achieve this, hopefully written by someone with past experience of running this type of upgrade.

I failed to capture the names of the individuals who were interested in helping out here. If anyone is interested in helping out here please feel free to add your name to the actions either at the end of this mail or at the bottom of the pad.

In the afternoon we reviewed the current efforts within the community to implement fast-forward upgrades, covering TripleO, Charms (Juju) and openstack-ansible. While this was insightful to many in the room there didn’t appear to be any obvious areas of collaboration outside of sharing best practice and defining the high level flow of a fast-forward upgrade.


Tuesday started with a discussion around NFV considerations with fast-forward upgrades. These ranged from the previously mentioned need for the data plane to remain active during the upgrade to the restricted nature of upgrades in NFV environments in terms of time and number of reboots.

It was highlighted that there are some serious as yet unresolved bugs in Nova regarding the live migration of instances using SR-IOV devices. This currently makes the moving of workloads either prior to or during the upgrade particularly difficult.

Rollbacks were also discussed, and it was highlighted that any best practice documentation around fast-forward upgrades needs to include steps for recovering an environment if things fail.

We then revisited an idea from the first day of finding or creating a SIG for this effort to call home. It was highlighted that there was a suggestion in the packaging room to create a Deployment / Lifecycle SIG. After speaking with a few individuals later in the week I’ve taken the action to reach out on the openstack-sigs mailing list for further input.

Finally, during a brief discussion on ways we could collaborate and share tooling for fast-forward upgrades, a new tool to migrate configuration files between N and N+>=2 releases was introduced. While interesting, it was seen as a more generic utility that could also be used for N to N+1 upgrades. AFAIK the authors joined the Oslo room shortly after this session ended to gain more feedback from that team.


  • Modify the supports-upgrades and supports-accessible-upgrades tags

    I have yet to look into the formal process around making changes to these tags but I will aim to make a start ASAP.

  • Find an Ops lead for the documentation effort

    I failed to take down the names of some of the operators who were talking this through at the time. If they or anyone else is still interested in helping here please let me know!

  • Find or create a relevant SIG for this effort

    As discussed above this could be as part of the lifecycle SIG or an independent upgrades SIG. Expect a separate mail to the SIG list regarding this shortly.

  • Identify a room chair for Sydney

    Unfortunately I will not be present in Sydney to lead a similar session. If anyone is interested in helping please feel free to respond here or reach out to me directly!

My thanks again to everyone who attended the track, I had a blast leading the room and hope that the attendees found both the track and some of the outcomes listed above useful.


16 bridges charity walk - We did it!

I’m finally back from a work trip to the US and wanted to share that we completed the walk, 15 bridges rather than 16 as the Golden Jubilee Bridge(s) remain closed, in just over 5 hours 10 days ago!

Again, our thanks to everyone who donated, it’s going to a wonderful charity and will hopefully make a difference to the lives of people living with cardiomyopathy!

I’ve already started looking into similar walks we could take part in next year, with an obvious candidate of the Wye Valley Challenge being most likely at the moment. The full 100km version from Chepstow to Hereford might prove slightly too much for a novice like me, however there are shorter 45km versions ending in Hereford.


OpenStack - Skip level upgrades - PTG


A short reminder that I’ll be chairing the skip-level upgrades room at next week’s OpenStack PTG in Denver. So far ~15 of you have shown interest in this track on the etherpad so I’m looking forward to some useful discussions over the two days. For now we still have available slots so if you do have suggestions please feel free to add them directly on the pad!

At present the agenda for the room (Durango, Atrium level) looks like this:


  • 09:00 - 10:00 - #####
  • 10:00 - 10:30 - Retrospective of what was discussed in Boston, outcomes, etc.
  • 10:30 - 11:00 - Have operator requirements changed since Boston?
  • 11:00 - 14:00 - #####
  • 14:00 - 16:00 - What efforts (if any) are underway to enable skip level upgrades within the community?
  • 16:00 - 18:00 - #####


  • 09:00 - 10:30 - #####
  • 10:30 - 11:00 - NFV considerations
  • 11:00 - 11:30 - API versions control
  • 11:30 - 14:00 - #####
  • 14:00 - 16:00 - How can we collaborate and share tools for skip level upgrades within the community?
  • 16:00 - 18:00 - Should we think about a different way of releasing?


Later in the week I will also be participating in the TripleO track, with a session on Thursday to discuss my WIP skip-level upgrade spec. I’ll be working on this during the week leading up to this session so feel free to review this ahead of time or just grab me in the hallway for a chat if this is something that interests you!


OpenStack - Skip level upgrades - Introduction

I’ve been fortunate enough to be part of a team looking into skip level upgrades recently ahead of the start of the Queens development cycle for OpenStack. What follows is an introduction to the concept of skip level upgrades and an overview of our initial PoC work in this area. Future posts will also cover our plans for enabling skip level upgrades within TripleO and possible work with the wider community to enable this within other deployment tools.


Skip level upgrades are, as the name suggests, upgrades that move an environment from release N to N+X in a single step, where X is greater than 1 and for skip level upgrades is typically 3. For example, in the context of OpenStack, N to N+3 can refer to an upgrade from the Newton release of OpenStack to the Queens release, skipping Ocata and Pike:

Newton    Ocata     Pike       Queens
+-----+   +-----+   +-----+    +-----+
|     |   | N+1 |   | N+2 |    |     |
|  N  | ---------------------> | N+3 |
|     |   |     |   |     |    |     |
+-----+   +-----+   +-----+    +-----+

There are existing alternative methods available for skipping a number of releases during an upgrade. For example, parallel cloud migration is a commonly cited alternative. This is where an additional environment is stood up alongside the original, with workloads migrated to the new environment:

        Newton
        +-----+
        |     |
env#1   |  N  |
        |     |
        +-----+
           \       Queens
            \      +-----+
             \     |     |
env#2         `->  | N+3 |
                   |     |
                   +-----+

The requirement for this type of upgrade is driven by users looking to standardise on a given release (typically LTS), whilst retaining the ability to skip forward when the release hits EOL. This negates the need to keep up with the major release cycle that in the case of OpenStack continues to be every 6 months.

It is worth highlighting that the topic of skip level upgrades is not new to the OpenStack community, with previous attempts to provide skip level upgrade functionality typically made within the various deployment projects. One example is openstack-ansible’s leap-upgrades project, which attempted to move environments between Juno/Kilo and Newton.

More recently the topic of skip level upgrades was discussed at the OpenStack Forum in Boston in May. An RFC thread was also posted to the development mailing list, however no formal actions came of either discussion. I’m looking to restart this discussion at the next PTG in Denver, more on that later.


Now that we understand what skip level upgrades actually are, it’s time to set out some basic requirements for the state of the environment during the upgrade. At the start of this process our team sat down and drafted the following:

  • The control plane is inaccessible for the duration of the upgrade
  • The upgrade must complete successfully or rollback within 4 hours
  • The data plane and workloads must remain available for the duration of the upgrade.
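
The 4 hour limit in the second requirement can be enforced mechanically. The following is only a sketch of the idea, assuming GNU timeout is available; run_with_deadline is a hypothetical wrapper and it treats any failure, including hitting the limit, as a signal to roll back:

```shell
# Hypothetical wrapper: run a command under a time limit and report whether
# a rollback is needed. $1 is a duration understood by GNU timeout (for
# example 4h); the remaining arguments are the command to run.
run_with_deadline() {
  limit=$1; shift
  if timeout "$limit" "$@"; then
    echo "completed"
  else
    echo "rollback needed"
    return 1
  fi
}
```

For example, run_with_deadline 4h followed by whatever command drives the upgrade.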

Proof of concept

With the requirements set out, our first real task was to prove that this was even possible with an OpenStack environment. Given the releases available at the time, we began by manually upgrading an existing Mitaka based RHOSP 9 environment running on RHEL 7.3 to our recently released Ocata based RHOSP 11 release running on RHEL 7.4.

Mitaka    Newton    Ocata
+-----+   +-----+   +-----+
|     |   | N+1 |   |     |
|  N  | ----------> | N+2 |
|     |   |     |   |     |
+-----+   +-----+   +-----+
RHEL73              RHEL74

We were aware that whilst the goal of skip level upgrades is to give the impression of a single jump between releases, in practice this isn’t possible with OpenStack. Upgrades of OpenStack components are verified by the community across N to N+1 jumps, so whilst we wanted to skip ahead to Ocata we knew we would also have to upgrade through Newton to get there.
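
To make the N to N+1 constraint concrete, here is a small sketch that lists every release an upgrade has to pass through. The upgrade_path function is hypothetical and only knows the releases named in this post:

```shell
# Hypothetical helper: print each release that must be stepped through to
# get from $1 to $2, one per line. Assumes $1 comes before $2 in the
# release order Mitaka, Newton, Ocata, Pike, Queens.
upgrade_path() {
  found=0
  for release in mitaka newton ocata pike queens; do
    if [ "$found" -eq 1 ]; then
      echo "$release"
    fi
    if [ "$release" = "$1" ]; then found=1; fi
    if [ "$release" = "$2" ]; then break; fi
  done
}
```

For our PoC, upgrade_path mitaka ocata prints newton and then ocata, matching the two jumps described above.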

The following outlines, at a very high level, the steps we followed during the PoC to upgrade the environment from Mitaka to Ocata:

  • Rolling minor update of the underlying OS
  • Disable control plane and compute services
  • Upgrade a single controller to N+1 and then N+2
    • Update packages
    • Introduce new services as required (nova-placement for example)
    • Update service configuration files
    • Run DB syncs, migrations etc
    • Repeat for N+2
  • Upgrade remaining controllers directly to N+2
    • Update packages
    • Introduce new services as required
    • Update service configuration files
  • Upgrade remaining hosts to N+2
    • Update packages
    • Update service configuration files
  • Enable control plane and compute services
  • Verify workload availability during upgrade
  • Validate the post upgrade environment.
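
The ordering of the core steps above can be sketched as a dry run that prints a plan without touching any hosts. This is a simplification under my own assumptions: plan_upgrade is hypothetical, it leaves out the OS update and the verification steps, and it lumps package, service and configuration changes into a single line per host:

```shell
# Hypothetical dry run of the PoC flow. $1 is a space-separated list of the
# releases to step through (for example "newton ocata"), $2 is the first
# controller and any remaining arguments are the other hosts.
plan_upgrade() {
  steps=$1; first=$2; shift 2
  target=${steps##* }   # last release in the list
  echo "disable control plane and compute services"
  # Only the first controller walks through every release and runs DB syncs.
  for release in $steps; do
    echo "$first: update packages and config to $release, run DB syncs"
  done
  # Everything else jumps straight to the target release.
  for host in "$@"; do
    echo "$host: update packages and config directly to $target"
  done
  echo "enable control plane and compute services"
}
```

The point of the sketch is the asymmetry: a single controller carries the cost of each intermediate release, while every other host moves once.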

Let’s take a look at each of these steps below in more detail.

Rolling minor update of the underlying OS

This initial rolling minor update moved hosts from RHEL 7.3 to RHEL 7.4 whilst also pulling in OVS from our RHOSP11 repos in a bid to limit the number of reboots required in the environment. In practice operators could perform this minor update well ahead of any skip level upgrade, reducing any impact on the overall time required for the upgrade itself.

Disable control plane and compute services

As listed above under requirements, a full control plane outage is accounted for during the skip level upgrade. Note that this does not include the infrastructure services providing the database, messaging queues etc. Compute services are also stopped at this time but should not have any impact on the running workloads and data plane.

Upgrade a single controller to N+1 and then N+2

The main work of upgrading between releases is carried out on a single controller. Packages are updated, new services such as nova-placement are deployed as required, configuration files updated and DB migrations completed. This process is repeated on this host until we reach the target release.

Upgrade remaining controllers directly to N+2

Once the single controller has been upgraded to our target release we then skip any remaining controllers ahead to this target release, updating packages, introducing new services and updating configuration files on these controllers as required.

Upgrade remaining hosts to N+2

This is then repeated for any remaining hosts, such as computes, object storage hosts etc. Again this should not interrupt running workloads or the data plane.

Enable control plane and compute services

Once all hosts are updated to the target release the control and compute services are restarted.

Verify workload availability during upgrade

During our PoC we ran multiple instances across various L2 and L3 networks, using Ansible to first launch and then later collect the results of asynchronous jobs (ping, ssh etc) that had been running between these instances during the upgrade.
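
We used Ansible for this, but the idea can be shown with plain shell. The sketch below is a stand-in rather than what we actually ran: probe_loop and summarise_probes are hypothetical names, and the log format of "<epoch> ok" or "<epoch> fail" is my own assumption:

```shell
# Hypothetical probe: ping target $1 once per second, appending
# "<epoch> ok" or "<epoch> fail" to log file $2. Intended to be started in
# the background before the upgrade and killed once it finishes.
probe_loop() {
  target=$1; log=$2
  while :; do
    if ping -c 1 -W 1 "$target" >/dev/null 2>&1; then
      echo "$(date +%s) ok"
    else
      echo "$(date +%s) fail"
    fi >> "$log"
    sleep 1
  done
}

# Summarise a probe log: total number of probes and how many failed.
summarise_probes() {
  awk '{ total++ } $2 == "fail" { failed++ }
       END { printf "%d probes, %d failed\n", total, failed }' "$1"
}
```

Running summarise_probes against each log after the upgrade gives a quick view of whether an instance dropped off the network at any point.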

Validate the post upgrade environment

Finally Tempest was used to validate the end state of the environment post upgrade.

After many iterations, the eventual introduction of Ansible to automate all the things, and much cursing at the amount of time needed to reconfigure the environment after each run, it was agreed that the team would move on to look into the possible implementation of the above in TripleO.


Thanks for making it this far! Before I end this post I wanted to highlight that I’ve recently agreed to lead the skip level upgrades room at the upcoming Denver PTG. I’ll be posting more details on this shortly to the OpenStack development mailing list but wanted to take this opportunity to encourage anyone interested in this topic to attend and discuss possible ways we can make skip level upgrades a possibility across the various deployment tools within the community.


16 bridges charity walk

As I alluded to in my opening post Katie and I are raising money for Cardiomyopathy UK over the coming months, starting with a walk through London in September. But why, you might ask, are we doing this?

Well, the charity held a conference that we both attended shortly after I was hospitalised in August 2016. Thankfully that episode has since been deemed to simply be Atrial fibrillation that my ICD mistook for Ventricular fibrillation, a very serious and potentially life threatening issue.

At the time we were both struggling to deal with the reality of my long suspected, but never fully confirmed condition ARVC presenting itself. The Cardiomyopathy UK National Conference in November of last year was brilliant, eye opening and helped greatly during that time. Now that things aren’t so bleak we wanted to give something back and this walk is the first step (ha!) in achieving that.

We’ve just been given the 25km route that I’ve included below :

We’ve also started training with a few slow jaunts around Hereford :

Finally, if you would like to donate, it would be very much appreciated. Your money will help people like us, going through scary and confusing times, to access information, support and advice as well as contribute to training for medical professionals. We are gratefully accepting donations via our page with our current progress towards our goal shown below :


Hey, it has been a while..

So it has been a while since I published anything on my original WordPress based blog, so long in fact that I’ve decided to do away with the old and move onto something new, shiny and statically generated. All previously written and frankly rather embarrassing content has been deleted and lost forever.

This new blog, for anyone that is interested, is now hosted on GitLab pages and statically generated using Hugo. I hopefully have some kind of flair in the mail for switching away from WordPress to these new shiny things, I could always use more flair.

Anyway, moving forward this blog is going to capture some of my current work around OpenStack, charity events for Cardiomyopathy UK, hobby projects such as automating my home with Home Assistant, and ramblings about the software industry, while also documenting my numerous spelling and grammatical failures forever.

That’s enough text for this test post, on with the actual content!
