OpenStack TripleO FFU M2 progress report

This is a brief progress report from the Upgrades squad for the fast-forward upgrades (FFU) feature in TripleO, introducing N to Q upgrades.

tl;dr Good initial progress, missed M2 goal of nv CI jobs, pushing on to M3.


For anyone unfamiliar with the concept of fast-forward upgrades the following sentence from the spec gives a brief high level introduction:

> Fast-forward upgrades are upgrades that move an environment from release `N`
> to `N+X` in a single step, where `X` is greater than `1` and for fast-forward
> upgrades is typically `3`.

The spec itself obviously goes into more detail and I’d recommend anyone wanting to know more about our approach for FFU in TripleO to start there:

Note that the spec is being updated at present by the following change, introducing more details on the FFU task layout, ordering, dependency on the on-going major upgrade rework in Q, canary compute validation etc:

WIP ffu: Spec update for M2

M2 Status

The original goal for Queens M2 was to have one or more non-voting FFU jobs deployed somewhere able to run through the basic undercloud and overcloud upgrade workflows, exercising as many compute service dependencies as we could up to and including Nova. Unfortunately while Sofer has made some great progress with this we do not have any running FFU jobs at present:

We do however have documented demos that cover FFU for some limited overcloud environments from Newton to Queens:

OpenStack TripleO FFU Keystone Demo N to Q

OpenStack TripleO FFU Nova Demo N to Q

These demos currently use a stack of changes against THT with the first ~4 or so changes introducing the FFU framework:

FWIW getting these initial changes merged would help avoid the current change storm every time this series is rebased to pick up upgrade or deploy related bug fixes.

Also note that the demos currently use the raw Ansible playbooks stack outputs to run through the FFU tasks, upgrade tasks and deploy tasks. This is by no means what the final UX will be, with python-tripleoclient and workflow work to be completed ahead of M3.

M3 Goals

The squad will be focusing on the following goals for M3:

  • Non-voting RDO CI jobs defined and running
  • FFU THT changes tested by the above jobs and merged
  • python-tripleoclient & required Mistral workflows merged
  • Use of ceph-ansible for Ceph upgrades
  • Draft developer and user docs under review

FFU squad

Finally, a quick note to highlight that this report marks the end of my own personal involvement with the FFU feature in TripleO. I’m not going far, returning to work on Nova and happy to make time to talk about and review FFU related changes etc. The members of the upgrade squad taking this forward and your main points of contact for FFU in TripleO will be:

  • Sofer (chem)
  • Lukas (social)
  • Marios (marios)

My thanks again to Sofer, Lukas, Marios, the rest of the upgrade squad and wider TripleO community for your guidance and patience when putting up with my constant inane questioning regarding FFU over the past few months!

Read More

OpenStack TripleO FFU Nova Demo N to Q

Update 04/12/17 : The initial deployment documented in this demo no longer works due to the removal of a number of plan migration steps that have now been promoted into the Queens repos. We are currently looking into ways to reintroduce these for use in master UC Newton OC FFU development deployments, until then anyone attempting to run through this demo should start with a Newton OC and UC before upgrading the UC to master.

This is another TripleO fast-forward upgrade demo post, this time focusing on a basic stack of Keystone, Glance, Cinder, Neutron and Nova. At present there are several workarounds still required to allow the upgrade to complete, please see the workaround sections for more details.


As with the original demo I’m still using tripleo-quickstart to deploy my initial environment, this time with 1 controller and 1 compute, with a Queens undercloud and Newton overcloud. In addition I’m also using a new general config to deploy a minimal control stack able to host Nova.

$ bash -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/minimal-nova.yml $VIRTHOST

UC - docker_registry.yaml

Again with this demo we are not caching containers locally, the following command will create a docker_registry.yaml file referencing the RDO registry for use during the final deployment of the overcloud to Queens:

$ ssh -F $WD/ssh.config.ansible undercloud
$ openstack overcloud container image prepare \
  --namespace \
  --tag tripleo-ci-testing \
  --output-env-file ~/docker_registry.yaml

UC - tripleo-heat-templates

We then need to update the version of tripleo-heat-templates deployed on the undercloud host:

$ ssh -F $WD/ssh.config.ansible undercloud
$ cd /home/stack/tripleo-heat-templates
$ git fetch git:// refs/changes/19/518719/9 && git checkout FETCH_HEAD

Finally, as we are using a customised controller role the following services need to be added to the overcloud_services.yml file on the undercloud node under ControllerServices:

       - OS::TripleO::Services::Docker
       - OS::TripleO::Services::Iscsid
       - OS::TripleO::Services::NovaPlacement

UC - tripleo-common

At present we are waiting for a promotion of tripleo-common that includes various bugfixes when updating the overcloud stack, generating outputs etc. For the time being we can simply install directly from master to workaround these issues.

$ ssh -F $WD/ssh.config.ansible undercloud
$ git clone ; cd tripleo-common
$ sudo python install ; cd ~

OC - Update heat-agents

As documented in my previous demo post we need to remove any legacy heiradata from all overcloud hosts prior to updating the heat stack:

$ sudo rm -f /usr/libexec/os-apply-config/templates/etc/puppet/hiera.yaml \
             /usr/libexec/os-refresh-config/configure.d/40-hiera-datafiles \

We also need to update the heat-agents on all nodes to their Ocata versions:

$ git clone ; cd tripleo-repos
$ sudo python install
$ sudo tripleo-repos -b ocata current
$ sudo yum update -y python-heat-agent \
$ sudo yum install -y openstack-heat-agents \
                      python-heat-agent-ansible \
                      python-heat-agent-apply-config \
                      python-heat-agent-docker-cmd \
                      python-heat-agent-hiera \

OC - Workarounds #1

$ sudo yum remove openstack-ceilometer* -y

UC - Update stack outputs

With the workarounds in place we can now update the stack using the updated version of tripleo-heat-templates on the undercloud. Once again we need to use the original deploy command with a number of additional environment files included:

$ . stackrc
$ openstack overcloud deploy \
  --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/docker_registry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/docker.yaml \
  -e /home/stack/tripleo-heat-templates/environments/fast-forward-upgrade.yaml \
  -e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml

UC - Download config

Once the stack has been updated we can download the config with the following command:

$ . stackrc
$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-Oalkee-config

UC - FFU and Upgrade plays

Before running through any of the generated playbooks I personally like to add the profile_tasks callback to the callback_whitelist for Ansible within /etc/ansible/ansible.cfg. This provides timestamps during the playbook run and a summary of the slowest tasks at the end.

# enable callback plugins, they can output to stdout but cannot be 'stdout' type.
callback_whitelist = profile_tasks

We first run the fast_forward_upgrade_playbook to complete the upgrade to Pike:

$ . stackrc
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
PLAY RECAP *****************************************************************************************************************************              : ok=62   changed=8    unreachable=0    failed=0              : ok=123  changed=55   unreachable=0    failed=0   

Friday 01 December 2017  20:39:58 +0000 (0:00:03.967)       0:06:16.615 ******* 
Stop neutron_server ------------------------------------------------------------------------------------------------------------ 32.53s
stop openstack-cinder-volume --------------------------------------------------------------------------------------------------- 16.03s
Stop neutron_l3_agent ---------------------------------------------------------------------------------------------------------- 14.36s
Stop and disable nova-compute service ------------------------------------------------------------------------------------------ 13.16s
Cinder package update ---------------------------------------------------------------------------------------------------------- 12.73s
stop openstack-cinder-scheduler ------------------------------------------------------------------------------------------------ 12.68s
Setup cell_v2 (sync nova/cell DB) ---------------------------------------------------------------------------------------------- 11.79s
Cinder package update ---------------------------------------------------------------------------------------------------------- 11.30s
Neutron package update --------------------------------------------------------------------------------------------------------- 10.99s
Keystone package update -------------------------------------------------------------------------------------------------------- 10.77s
glance package update ---------------------------------------------------------------------------------------------------------- 10.28s
Keystone package update --------------------------------------------------------------------------------------------------------- 9.80s
glance package update ----------------------------------------------------------------------------------------------------------- 8.72s
Neutron package update ---------------------------------------------------------------------------------------------------------- 8.62s
Stop and disable nova-consoleauth service --------------------------------------------------------------------------------------- 7.94s
Update nova packages ------------------------------------------------------------------------------------------------------------ 7.62s
Update nova packages ------------------------------------------------------------------------------------------------------------ 7.24s
Stop and disable nova-scheduler service ----------------------------------------------------------------------------------------- 6.36s
Run puppet apply to set tranport_url in nova.conf ------------------------------------------------------------------------------- 5.78s
install tripleo-repos ----------------------------------------------------------------------------------------------------------- 4.70s

We then run the upgrade_steps_playbook to start the upgrade to Queens:

$ . stackrc
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
PLAY RECAP *****************************************************************************************************************************              : ok=57   changed=45   unreachable=0    failed=0              : ok=165  changed=146  unreachable=0    failed=0   

Friday 01 December 2017  20:51:55 +0000 (0:00:00.038)       0:10:47.865 ******* 
Update all packages ----------------------------------------------------------------------------------------------------------- 263.71s
Update all packages ----------------------------------------------------------------------------------------------------------- 256.79s
Install docker packages on upgrade if missing ---------------------------------------------------------------------------------- 13.77s
Upgrade os-net-config ----------------------------------------------------------------------------------------------------------- 5.71s
Upgrade os-net-config ----------------------------------------------------------------------------------------------------------- 5.12s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 3.36s
Install docker packages on upgrade if missing ----------------------------------------------------------------------------------- 3.14s
Stop and disable mysql service -------------------------------------------------------------------------------------------------- 1.97s
Check for os-net-config upgrade ------------------------------------------------------------------------------------------------- 1.66s
Check for os-net-config upgrade ------------------------------------------------------------------------------------------------- 1.57s
Stop keepalived service --------------------------------------------------------------------------------------------------------- 1.48s
Stop and disable rabbitmq service ----------------------------------------------------------------------------------------------- 1.47s
take new os-net-config parameters into account now ------------------------------------------------------------------------------ 1.31s
take new os-net-config parameters into account now ------------------------------------------------------------------------------ 1.08s
Check if openstack-ceilometer-compute is deployed ------------------------------------------------------------------------------- 0.70s
Check if iscsid service is deployed --------------------------------------------------------------------------------------------- 0.67s
Start keepalived service -------------------------------------------------------------------------------------------------------- 0.48s
Check for nova placement running under apache ----------------------------------------------------------------------------------- 0.46s
Stop and disable mongodb service on upgrade ------------------------------------------------------------------------------------- 0.45s
remove old cinder cron jobs ----------------------------------------------------------------------------------------------------- 0.45s

OC - Workarounds #2

On overcloud-novacompute-0 the following file needs to be removed to workaround a known issue:

$ ssh -F $WD/ssh.config.ansible overcloud-novacompute-0
$ sudo rm /etc/iscsi/.initiator_reset

UC - Deploy play

Finally we run through the deploy_steps_playbook:

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
PLAY RECAP *****************************************************************************************************************************              : ok=48   changed=11   unreachable=0    failed=0              : ok=76   changed=10   unreachable=0    failed=0   
localhost                  : ok=1    changed=0    unreachable=0    failed=0   

Friday 01 December 2017  21:04:58 +0000 (0:00:00.041)       0:10:24.723 ******* 
Run docker-puppet tasks (generate config) ------------------------------------------------------------------------------------- 186.65s
Run docker-puppet tasks (bootstrap tasks) ------------------------------------------------------------------------------------- 101.10s
Start containers for step 3 ---------------------------------------------------------------------------------------------------- 98.61s
Start containers for step 4 ---------------------------------------------------------------------------------------------------- 41.37s
Run puppet host configuration for step 1 --------------------------------------------------------------------------------------- 32.53s
Start containers for step 1 ---------------------------------------------------------------------------------------------------- 25.76s
Run puppet host configuration for step 5 --------------------------------------------------------------------------------------- 17.91s
Run puppet host configuration for step 4 --------------------------------------------------------------------------------------- 14.47s
Run puppet host configuration for step 3 --------------------------------------------------------------------------------------- 13.41s
Run docker-puppet tasks (bootstrap tasks) -------------------------------------------------------------------------------------- 10.39s
Run puppet host configuration for step 2 --------------------------------------------------------------------------------------- 10.37s
Start containers for step 5 ---------------------------------------------------------------------------------------------------- 10.12s
Run docker-puppet tasks (bootstrap tasks) --------------------------------------------------------------------------------------- 9.78s
Start containers for step 2 ----------------------------------------------------------------------------------------------------- 6.32s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 4.37s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 3.46s
Write the config_step hieradata ------------------------------------------------------------------------------------------------- 1.80s
create libvirt persistent data directories -------------------------------------------------------------------------------------- 1.21s
Write the config_step hieradata ------------------------------------------------------------------------------------------------- 1.03s
Check if /var/lib/docker-puppet/docker-puppet-tasks4.json exists ---------------------------------------------------------------- 1.00s


I’ll revisit this in the coming days and add a more complete set of tasks to verify the end environment but for now we can run a simple boot from volume instance (as Swift, the default store for Glance was not installed):

$ cinder create 1
$ cinder set-bootable 46d278f7-31fc-4e45-b5df-eb8220800b1a true
$ nova flavor-create 1 1 512 1 1
$ nova boot --boot-volume 46d278f7-31fc-4e45-b5df-eb8220800b1a --flavor 1 test 
$ nova list
| ID                                   | Name | Status | Task State | Power State | Networks          |
| 05821616-1239-4ca9-8baa-6b0ca4ea3a6b | test | ACTIVE | -          | Running     | priv= |

We can also see the various containerised services running on the overcloud:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ sudo docker ps
CONTAINER ID        IMAGE                                                                                             COMMAND                  CREATED             STATUS                      PORTS               NAMES
d80d6f072604                  "kolla_start"            13 minutes ago      Up 12 minutes (healthy)                         glance_api
61fbf47241ce                    "kolla_start"            13 minutes ago      Up 13 minutes                                   nova_metadata
9defdb5efe0f                    "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_api
874716d99a44             "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_vnc_proxy
21ca0fd8d8ec              "kolla_start"            13 minutes ago      Up 13 minutes                                   neutron_api
e0eed85b860a               "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         cinder_volume
0882e08ac198            "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_consoleauth
e3ebc4b066c9                    "kolla_start"            13 minutes ago      Up 13 minutes                                   nova_api_cron
c7d05a04a8a3                  "kolla_start"            13 minutes ago      Up 13 minutes                                   cinder_api_cron
2f3c1e244997   "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_ovs_agent
bfeb120bf77a      "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_metadata_agent
43b2c09aecf8              "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_scheduler
a7a3024b63f6          "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_dhcp
3df990a68046            "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         cinder_scheduler
94461ba833aa            "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         neutron_l3_agent
4bee34f9fce2                  "kolla_start"            13 minutes ago      Up 13 minutes                                   cinder_api
e8bec9348fe3              "kolla_start"            13 minutes ago      Up 13 minutes (healthy)                         nova_conductor
22db40c25881                    "/bin/bash -c '/usr/l"   15 minutes ago      Up 15 minutes                                   keystone_cron
26769acaaf5e                    "kolla_start"            16 minutes ago      Up 16 minutes (healthy)                         keystone
99037a5e5c36                      "kolla_start"            16 minutes ago      Up 16 minutes                                   iscsid
9f4aae72c201          "kolla_start"            16 minutes ago      Up 16 minutes                                   nova_placement
311302abc297                     "kolla_start"            16 minutes ago      Up 16 minutes                                   horizon
d465e4f5b7e6                     "kolla_start"            17 minutes ago      Up 17 minutes (unhealthy)                       mysql
b9e062f1d857                    "kolla_start"            18 minutes ago      Up 18 minutes (healthy)                         rabbitmq
a57f053afc03                   "/bin/bash -c 'source"   18 minutes ago      Up 18 minutes                                   memcached
baeb6d1087e6                       "kolla_start"            18 minutes ago      Up 18 minutes                                   redis
faafa1bf2d2e                     "kolla_start"            18 minutes ago      Up 18 minutes                                   haproxy
$ exit
$ ssh -F $WD/ssh.config.ansible overcloud-novacompute-0
$ sudo docker ps
CONTAINER ID        IMAGE                                                                                             COMMAND             CREATED             STATUS                    PORTS               NAMES
0363d7008e87   "kolla_start"       12 minutes ago      Up 12 minutes (healthy)                       neutron_ovs_agent
c1ff23ee9f16                        "kolla_start"       12 minutes ago      Up 12 minutes                                 logrotate_crond
d81d8207ec9a                "kolla_start"       12 minutes ago      Up 12 minutes                                 nova_migration_target
abd9b79e2af8          "kolla_start"       12 minutes ago      Up 12 minutes                                 ceilometer_agent_compute
aa581489ac9a                "kolla_start"       12 minutes ago      Up 12 minutes (healthy)                       nova_compute
d4ade28175f0                      "kolla_start"       14 minutes ago      Up 14 minutes                                 iscsid
ae4652853098                "kolla_start"       14 minutes ago      Up 14 minutes                                 nova_libvirt
aac8fea2d496                "kolla_start"       14 minutes ago      Up 14 minutes                                 nova_virtlogd


So in conclusion this demo takes a simple multi-host OpenStack deployment of Keystone, Glance, Cinder, Neutron and Nova from baremetal Newton to containerised Queens in ~26 minutes. There are many things still to resolve and validate with FFU but for now, ahead of M2 this is a pretty good start.

Read More

OpenStack TripleO FFU Keystone Demo N to Q

This post will introduce a very rough demo of the new TripleO Fast-forward Upgrades (FFU) feature, warts and all, using an overcloud with only Keystone deployed. This should prove to be a useful starting point for anyone interested in this feature and could even be an approach used for future per-service FFU CI jobs.


I’m currently using the tripleo-quickstart project to deploy virtualised test environments. For this demo I’m using the following command line to create the demo environment:

$ bash -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/keystone-only.yml \
   -N config/nodes/1ctlr.yml $VIRTHOST

This is made possible by following unmerged changes to tripleo-quickstart:

Once deployed you should find the 10.0.3 Newton version of Keystone deployed on overcloud-controller-0:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ rpm -qi openstack-keystone
Name        : openstack-keystone
Epoch       : 1
Version     : 10.0.3
Release     : 0.20170726120406.bd49c3e.el7.centos
Architecture: noarch
Install Date: Fri 10 Nov 2017 04:24:46 AM UTC
Group       : Unspecified
Size        : 175014
License     : ASL 2.0
Signature   : (none)
Source RPM  : openstack-keystone-10.0.3-0.20170726120406.bd49c3e.el7.centos.src.rpm
Build Date  : Wed 26 Jul 2017 12:07:53 PM UTC
Build Host  :
Relocations : (not relocatable)
URL         :
Summary     : OpenStack Identity Service
Description :
Keystone is a Python implementation of the OpenStack
( identity service API.

Before starting the upgrade I recommend that snapshots of the undercloud and overcloud-controller-0 libvirt domains are taken on the virthost:

$ ssh -F $WD/ssh.config.ansible virthost
$ for domain in $(virsh list | grep running | awk '{print $2 }'); do virsh snapshot-create-as ${domain} ${domain}_start ; done

UC - docker_registry.yaml

As with a normal container based deployment on >=Pike we will need a Docker registry file mapping each service to a container image. The following command will create this file, pointing to the offical RDO registry:

$ openstack overcloud container image prepare \
  --namespace \
  --tag tripleo-ci-testing \
  --output-env-file ~/docker_registry.yaml

Note that this will result in the container images being pulled from the remote RDO registry during the upgrade. We can pre-cache these images on the undercloud to speed the process up. However as we are only using a single host and minimal number of services in this demo I have chosen to skip this for now.

UC - tripleo-heat-templates

FFU itself is controlled by an Ansible playbook using tasks that are contained within the tripleo-heat-templates (THT) project. The following gerrit topic lists all of the current FFU changes up for review:

For this demo we need to update the local copy of THT on the undercloud to include a subset of these changes:

$ cd /home/stack/tripleo-heat-templates
$ git fetch git:// refs/changes/19/518719/2 && git checkout FETCH_HEAD

We also need the following noop-deploy-steps.yaml environment file that allows us to use openstack overcloud deploy to update the stack outputs of the overcloud without forcing an actual redeploy of any resources:

$ curl > environments/noop-deploy-steps.yaml

Finally, as we have deployed a custom set of services for the Controller role we now have to ensure that the Docker service is added to the role prior to our upgrade:

$ cat overcloud_services.yaml 
       - OS::TripleO::Services::Docker
       - OS::TripleO::Services::Kernel
       - OS::TripleO::Services::Keystone
       - OS::TripleO::Services::RabbitMQ
       - OS::TripleO::Services::MySQL
       - OS::TripleO::Services::HAproxy
       - OS::TripleO::Services::Keepalived
       - OS::TripleO::Services::Ntp
       - OS::TripleO::Services::Timezone
       - OS::TripleO::Services::TripleoPackages

OC - Ocata heat-agents

An older os-apply-config hiera hook and any legacy hiera data needs to be removed from the overcloud prior to our upgrade. The following ML post has more details on this workaround:

For the time being this isn’t part of the upgrade playbook and so we need to run the following commands that will update the heat-agents on the host to their Ocata versions and remove the legacy data:

$ sudo rm -f /usr/libexec/os-apply-config/templates/etc/puppet/hiera.yaml /usr/libexec/os-refresh-config/configure.d/40-hiera-datafiles /etc/puppet/hieradata/*.yaml
$ sudo yum install -y \ \ \ \ \ \ \ \

OC - Remove ceilometer

At present there is a packaging issue when upgrading the openstack-ceilometer packages directly from Newton to Queens. As these packages are installed by default in the Newton overcloud-full image used to deploy the environment but not used in our demo we can simply remove them for the time being:

$ sudo yum remove openstack-ceilometer* -y

UC - Update stack outputs

We can now use the openstack overcloud deploy command to update the overcloud stack and generate the new stack outputs, including the FFU playbook. To do this we simply add the previously created docker_registry.yaml, environments/docker.yaml and environments/noop-deploy-steps.yaml environment files to the original command used to deploy the environment.

$ . stackrc
$ openstack overcloud deploy \
  --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/docker_registry.yaml \
  -e /home/stack/tripleo-heat-templates/environments/docker.yaml \
  -e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml

The original command is logged under ~/overcloud_deploy.log on the undercloud, for example:

$ grep openstack\ overcloud\ deploy overcloud_deploy.log 
2017-11-16 14:36:11 | + openstack overcloud deploy --templates /home/stack/tripleo-heat-templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --block-storage-flavor oooq_blockstorage --swift-storage-flavor oooq_objectstorage --timeout 90 -e /home/stack/cloud-names.yaml -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /home/stack/network-environment.yaml -e /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml --validation-warnings-fatal -e /home/stack/overcloud_services.yaml --compute-scale 0 --ntp-server

UC - Download config

Now that the stack outputs have been updated we can download the overcloud config containing the FFU playbook onto the undercloud:

$ openstack overcloud config download

There is a known issue with the generated upgrade tasks at the moment where the ordering of conditionals causes Ansible to fail. To workaround this, simply edit the following Ansible tasks within the Controller/upgrade_tasks.yaml file to ensure the step conditional is always checked first:

- block:
  - name: Upgrade os-net-config
    yum: name=os-net-config state=latest
  - changed_when: os_net_config_upgrade.rc == 2
    command: os-net-config --no-activate -c /etc/os-net-config/config.json -v --detailed-exit-codes
    failed_when: os_net_config_upgrade.rc not in [0,2]
    name: take new os-net-config parameters into account now
    register: os_net_config_upgrade
  tags: step3
  - step|int == 3
  - not os_net_config_need_upgrade.stdout and os_net_config_has_config.rc == 0

UC - Run playbook

With the config present on the undercloud we can finally start the FFU upgrade using the following command line:

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \

OC - Verification

Once the FFU upgrade is complete we can verify that Keystone is functional in the overcloud with a few simple commands:

$ ssh -F $WD/ssh.config.ansible undercloud
$ . overcloudrc
$ openstack endpoint list
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                        |
| 15fd404ff8c14971b4251b81624edab8 | regionOne | keystone     | identity     | True    | admin     | |
| 2e513f5fdfc140ec916b081b47a2b8f7 | regionOne | keystone     | identity     | True    | internal  |    |
| 96980f0f9ac44c718c038ef54af814bc | regionOne | keystone     | identity     | True    | public    |       |
$ openstack service list
| ID                               | Name       | Type     |
| 3fc546421e9048f39b2b847b13fa8ea5 | keystone   | identity |
| 7f819190dc6f44d8b995021277b24d67 | ceilometer | metering |

We can also log into the overcloud-controller-0 host and verify that the relevant containers are running:

$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ sudo docker ps
CONTAINER ID        IMAGE                                                                COMMAND                  CREATED              STATUS                          PORTS               NAMES
4f40f0cf98aa   "/bin/bash -c '/usr/l"   About a minute ago   Up About a minute                                   keystone_cron
0b9d5cc17f5d   "kolla_start"            About a minute ago   Up About a minute (healthy)                         keystone
db967d899aaf    "kolla_start"            About a minute ago   Up About a minute (unhealthy)                       mysql
1f0b9aa72ec7   "kolla_start"            2 minutes ago        Restarting (1) 29 seconds ago                       rabbitmq
8e689f5bac22    "kolla_start"            2 minutes ago        Up 2 minutes                                        haproxy

As I said at the start this is a very rough demo that we can hopefully clean up and iterate on quickly over the coming weeks. The current goal is to have another working demo available by M2 that covers all of the required services to upgrade the computes so we can also start verification of the data plane during the upgrade.

Read More

Openstack TripleO FFU Getting started

This post will be a living document where I will detail how TripleO developers can initially provision and iterate quickly while working on service upgrade tasks the new fast-forward upgrade feature for in TripleO Queens.

Initial environment

This section details how to configure the initial environment with specific undercloud (UC) and overcloud (OC) versions and layouts using tripleo-quickstart.

Newton UC & OC

This basic combnination is required for end to end testing of fast forward upgrades:

$ bash -R newton $VIRTHOST

Note however that the following changes are required so that vbmc is used by the undercloud instead of pxe_ssh (removed in Pike):

Master UC & Newton OC

This combination is a useful starting point for developers looking to work on tripleo-heat-templates changes for a given service:

$ bash -R master-undercloud-newton-overcloud $VIRTHOST

The master-undercloud-newton-overcloud release config is introduced by the following change:

Master UC & Newton OC with only Keystone deployed

$ bash -w $WD -t all -R master-undercloud-newton-overcloud  \
   -c config/general_config/keystone-only.yml \
   -N config/nodes/1ctlr.yml $VIRTHOST

Read More

OpenStack - Fast-forward upgrades - Report

My thanks again to everyone who attended and contributed to the skip-level upgrades track over the first two days of last weeks PTG. I’ve included a short summary of our discussions below with a list of agreed actions for Queens at the end.

tl;dr s/skip-level/fast-forward/g


During our first session we briefly discussed the history of the skip-level upgrades effort within the community and the various misunderstandings that have arisen from previous conversations around this topic at past events.

We agreed that at present the only way to perform upgrades between N and N+>=2 releases of OpenStack was to upgrade linearly through each major release, without skipping between the starting and target release of the upgrade.

This is contrary to previous discussions on the topic where it had been suggested that releases could be skipped if DB migrations for these releases were applied in bulk later in the process. As projects within the community currently offer no such support for this it was agreed to continue to use the supported N to N+1 upgrade jumps, albeit in a minimal, offline way.

The name skip-level upgrades has had an obvious role to play in the confusion here and as such the renaming of this effort was discussed at length. Various suggestions are listed on the pad but for the time being I’m going to stick with the basic fast-forward upgrades name (FFOU, OFF, BOFF, FFUD etc were all close behind). This removes any notion of releases being skipped and should hopefully avoid any further confusion in the future.

Support by the projects for offline upgrades was then discussed with a recent Ironic issue highlighted as an example where projects have required services to run before the upgrade could be considered complete. The additional requirement of ensuring both workloads and the data plane remain active during the upgrade was also then discussed. It was agreed that both the supports-upgrades and supports-accessible-upgrades tags should be updated to reflect these requirements for fast-forward upgrades.

Given the above it was agreed that this new definition of what fast-forward upgrades are and the best practices associated with them should be clearly documented somewhere. Various operators in the room highlighted that they would like to see a high level document outline the steps required to achieve this, hopefully written by someone with past experience of running this type of upgrade.

I failed to capture the names of the individuals who were interested in helping out here. If anyone is interested in helping out here please feel free to add your name to the actions either at the end of this mail or at the bottom of the pad.

In the afternoon we reviewed the current efforts within the community to implement fast-forward upgrades, covering TripleO, Charms (Juju) and openstack-ansible. While this was insightful to many in the room there didn’t appear to be any obvious areas of collaboration outside of sharing best practice and defining the high level flow of a fast-forward upgrade.


Tuesday started with a discussion around NFV considerations with fast-forward upgrades. These ranged from the previously mentioned need for the data plane to remain active during the upgrade to the restricted nature of upgrades in NFV environments in terms of time and number of reboots.

It was highlighted that there are some serious as yet unresolved bugs in Nova regarding the live migration of instances using SR-IOV devices. This currently makes the moving of workloads either prior to or during the upgrade particularly difficult.

Rollbacks were also discussed and the need for any best practice documentation around fast-forward upgrades to include steps to allow the recovery of environments if things fail was also highlighted.

We then revisited an idea from the first day of finding or creating a SIG for this effort to call home. It was highlighted that there was a suggestion in the packaging room to create a Deployment / Lifecycle SIG. After speaking with a few individuals later in the week I’ve taken the action to reach out on the openstack-sigs mailing list for further input.

Finally, during a brief discussion on ways we could collaborate and share tooling for fast-forward upgrades a new tool to migrate configuration files between N to N+>=2 releases was introduced. While interesting it was seen as a more generic utility that could also be used between N to N+1 upgrades. AFAIK the authors joined the Oslo room shortly after this session ended to gain more feedback from that team.


  • Modify the supports-upgrades and supports-accessible-upgrades tags

    I have yet to look into the formal process around making changes to these tags but I will aim to make a start ASAP.

  • Find an Ops lead for the documentation effort

    I failed to take down the names of some of the operators who were talking this through at the time. If they or anyone else is still interested in helping here please let me know!

  • Find or create a relevant SIG for this effort

    As discussed above this could be as part of the lifecycle SIG or an independent upgrades SIG. Expect a separate mail to the SIG list regarding this shortly.

  • Identify a room chair for Sydney

    Unfortunately I will not be present in Sydney to lead a similar session. If anyone is interested in helping please feel free to respond here or reach out to me directly!

My thanks again to everyone who attended the track, I had a blast leading the room and hope that the attendees found both the track and some of the outcomes listed above useful.

Read More

16 bridges charity walk - We did it!

I’m finally back from a work trip to the US and wanted share that we completed the 16 bridges 15 bridges (as the Golden Jubilee Bridge(s) remain closed) in just over 5 hours 10 days ago!

Again, our thanks to everyone who donated, it’s going to a wonderful charity and will hopefully make a difference to the lives of people living with cardiomyopathy!

I’ve already started looking into similar walks we could take part in next year, with an obvious candidate of the Wye Valley Challenge being most likely at the moment. The full 100km version from Chepstow to Hereford might prove slightly too much for a novice like me however there are shorter, 45km versions ending in Hereford.

Read More

OpenStack - Skip level upgrades - PTG


A short reminder that I’ll be chairing the skip-level upgrades room at next week’s OpenStack PTG in Denver. So far ~15 of you have shown interest in this track on the etherpad so I’m looking forward to some useful discussions over the two days. For now we still have available slots so if you do have suggestions please feel free to add them directly on the pad!

At present the agenda for the room (Durango, Atrium level) looks like this:


  • 09:00 - 10:00 - #####
  • 10:00 - 10:30 - Retrospective of what was discussed in Boston, outcomes, etc.
  • 10:30 - 11:00 - Have operator requirements changed since Boston?
  • 11:00 - 14:00 - #####
  • 14:00 - 16:00 - What efforts (if any) are underway to enable skip level upgrades within the community?
  • 16:00 - 18:00 - #####


  • 09:00 - 10:30 - #####
  • 10:30 - 11:00 - NFV considerations
  • 11:00 - 11:30 - API versions control
  • 11:30 - 14:00 - #####
  • 14:00 - 16:00 - How can we collaborate and share tools for skip level upgrades within the community?
  • 16:00 - 18:00 - Should we think about a different way of releasing?


Later in the week I will also be participating in the TripleO track, with a session on Thursday to discuss my WIP skip-level upgrade spec. I’ll be working on this during the week leading up to this session so feel free to review this ahead of time or just grab me in the hallway for a chat if this is something that interests you!

Read More

OpenStack - Skip level upgrades - Introduction

I’ve been fortunate enough to be part of a team looking into skip level upgrades recently ahead of the start of the Queens development cycle for OpenStack. What follows is an introduction to the concept of skip level upgrades and an overview of our initial PoC work in this area. Future posts will also cover our plans for enabling skip level upgrades within TripleO and possible work with the wider community to enable this within other deployment tools.


Skip level upgrades are as the name suggests, upgrades that move an environment from release N to N+X in a single step, where X is greater than 1 and for skip level upgrades is typically 3. For example in the context of OpenStack N to N+3 can refer to an upgrade from the Newton release of Openstack to the Queens release, skipping Ocata and Pike:

Newton    Ocata     Pike       Queens
+-----+   +-----+   +-----+    +-----+
|     |   | N+1 |   | N+2 |    |     |
|  N  | ---------------------> | N+3 |
|     |   |     |   |     |    |     |
+-----+   +-----+   +-----+    +-----+

There are existing alternative methods available for skipping a number of releases during an upgrade. For example, parallel cloud migration is a commonly cited alternative. This is where an additional environment is stood up alongside the original, with workloads migrated to the new environment:

        |     |
env#1   |  N  |
        |     |
           \       Queens
            \      +-----+
             \     |     |
env#2         `->  | N+3 |
                   |     |

The requirement for this type of upgrade is driven by users looking to standardise on a given release (typically LTS), whilst retaining the ability to skip forward when the release hits EOL. This negates the need to keep up with the major release cycle that in the case of OpenStack continues to be every 6 months.

It is worth highlighting that the topic of skip level upgrades is not new to the OpenStack community, with attempts to provide skip level upgrade functionality within the community before now, typically within the various deployment projects. For example openstack-ansible’s leap-upgrades project that attempted to move environments between Juno/Kilo and Newton.

More recently the topic of skip level upgrades was discussed at the OpenStack Forum in Boston in May. A RFC thread was also posted to the development mailing list, however no formal actions came of either discussion. I’m looking to restart this discussion at the next PTG in Denver, more on that later.


Now that we understand what skip level upgrades actually are, it’s time to set out some basic requirements for the state of the environment during the upgrade. At the start of this process our team sat down and drafted the following:

  • The control plane is inaccessible for the duration of the upgrade
  • The upgrade must complete successfully or rollback within 4 hours
  • The data plane and workloads must remain available for the duration of the upgrade.

Proof of concept

With the requirements set out, our first real task was to prove that this was even possible with an OpenStack environment. Given the releases available at the time, we began by manually upgrading an existing Mitaka based RHOSP 9 environment running on RHEL 7.3 to our recently released Ocata based RHOSP 11 release running on RHEL 7.4.

Mitaka    Newton    Ocata
+-----+   +-----+   +-----+
|     |   | N+1 |   |     |
|  N  | ----------> | N+2 |
|     |   |     |   |     |
+-----+   +-----+   +-----+
RHEL73              RHEL74

We were aware that whilst the goal of skip level upgrades is to give the impression of a single jump between releases, in practice this isn’t possible with OpenStack. Upgrades of OpenStack components are verified by the community across N to N+1 jumps, so whilst we wanted to skip ahead to Ocata we knew we would also have to upgrade through Newton to get there.

The following outlines, at a very high level, the steps we followed during the PoC to upgrade the environment from Mitaka to Ocata:

  • Rolling minor update of the underlying OS
  • Disable control plane and compute services
  • Upgrade a single controller to N+1 and then N+2
    • Update packages
    • Introduce new services as required (nova-placement for example)
    • Update service configuration files
    • Run DB syncs, migrations etc
    • Repeat for N+2
  • Upgrade remaining controllers directly to N+2
    • Update packages
    • Introduce new services as required
    • Update service configuration files
  • Upgrade remaining hosts to N+2
    • Update packages
    • Update service configuration files
  • Enable control plane and compute services
  • Verify workload availability during upgrade
  • Validate the post upgrade environment.

Let’s take a look at each of these steps below in more detail.

Rolling minor update of the underlying OS

This initial rolling minor update moved hosts from RHEL 7.3 to RHEL 7.4 whilst also pulling in OVS from our RHOSP11 repos in a bid to limit the number of reboots required in the environment. In practise operators could perform this minor update well ahead of any skip level upgrade, reducing any impact on the overall time required for the upgrade itself.

Disable control plane and compute services

As listed above under requirements, a full control plane outage is accounted for during the skip level upgrade. Note that this does not include the infrastructure services providing the database, messaging queues etc. Compute services are also stopped at this time but should not have any impact on the running workloads and data plane.

Upgrade a single controller to N+1 and then N+2

The main work of upgrading between releases is carried out on a single controller. Packages are updated, new services such as nova-placement are deployed as required, configuration files updated and DB migrations completed. This process is repeated on this host until we reach the target release.

Upgrade remaining controllers directly to N+2

Once the single controller has been upgraded to our target release we then skip any remaining controllers ahead to this target release. Updating packages, introducing new services and updating configuration files on these controllers as required.

Upgrade remaining hosts to N+2

This is then repeated for any remaining hosts, such as computes, object storage hosts etc. Again this should not interrupt running workloads or the data plane.

Enable control plane and compute services

Once all hosts are updated to the target release the control and compute services are restarted.

Verify workload availability during upgrade

During our PoC we ran multiple instances across various L2 and L3 networks, using Ansible to first launch and then later collect the results of asynchronous jobs (ping, ssh etc) that had been running between these instances during the upgrade.

Validate the post upgrade environment

Finally Tempest was used to validate the end state of the environment post upgrade.

After many iterations, the eventual introduction of Ansible to automate all the things and much cursing at the amount of time to reconfigure the environment after each run it was agreed that the team would move on to look into the possible implementation of the above in TripleO.


Thanks for making it this far! Before I end this post I wanted to highlight that I’ve recently agreed to lead the skip level upgrades room at the upcoming Denver PTG. I’ll be posting more details on this shortly to the OpenStack development mailing list but wanted to take this opportunity to encourage anyone interested in this topic to attend and discuss possible ways we can make skip level upgrades a possibility across the various deployment tools within the community.

Read More