OpenStack - Fast-forward upgrades - Report

2017-09-20

Comments

http://lists.openstack.org/pipermail/openstack-dev/2017-September/122347.html

My thanks again to everyone who attended and contributed to the skip-level upgrades track over the first two days of last week's PTG. I've included a short summary of our discussions below, with a list of agreed actions for Queens at the end.

tl;dr s/skip-level/fast-forward/g

https://etherpad.openstack.org/p/queens-PTG-skip-level-upgrades

Monday

During our first session we briefly discussed the history of the skip-level upgrades effort within the community and the various misunderstandings that have arisen from previous conversations around this topic at past events.

We agreed that at present the only way to upgrade between N and N+>=2 releases of OpenStack is to upgrade linearly through each major release, without skipping any release between the starting and target releases.

This is contrary to previous discussions on the topic, where it had been suggested that releases could be skipped if the DB migrations for these releases were applied in bulk later in the process. As projects within the community currently offer no support for this, it was agreed to continue using the supported N to N+1 upgrade jumps, albeit in a minimal, offline way.
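
To make the linear path concrete, here is a minimal sketch of how a fast-forward upgrade breaks down into individual N to N+1 hops. The function name `fast_forward_path` and the short `RELEASES` window are purely illustrative assumptions and do not come from any deployment tool discussed in the room.

```python
# Hypothetical illustration only: a short, ordered window of release names.
RELEASES = ["newton", "ocata", "pike", "queens"]


def fast_forward_path(start, target, releases=RELEASES):
    """Return the ordered (from, to) hops needed to go from start to target.

    Every intermediate release appears in the result; none is skipped.
    """
    i, j = releases.index(start), releases.index(target)
    if j <= i:
        raise ValueError("target release must be newer than the starting release")
    return [(releases[k], releases[k + 1]) for k in range(i, j)]


# Each hop is a supported N to N+1 upgrade, run in sequence.
for src, dst in fast_forward_path("newton", "queens"):
    print(f"{src} -> {dst}")
```

Going from Newton to Queens therefore still means three upgrades (Newton to Ocata, Ocata to Pike, Pike to Queens); the "fast-forward" part is about running them back to back, offline, rather than skipping any of them.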

The name skip-level upgrades has had an obvious role to play in the confusion here and as such the renaming of this effort was discussed at length. Various suggestions are listed on the pad but for the time being I’m going to stick with the basic fast-forward upgrades name (FFOU, OFF, BOFF, FFUD etc were all close behind). This removes any notion of releases being skipped and should hopefully avoid any further confusion in the future.

Support by the projects for offline upgrades was then discussed, with a recent Ironic issue highlighted as an example where projects have required services to run before the upgrade could be considered complete. We also discussed the additional requirement of ensuring that both workloads and the data plane remain active during the upgrade. It was agreed that both the supports-upgrades and supports-accessible-upgrades tags should be updated to reflect these requirements for fast-forward upgrades.

Given the above, it was agreed that this new definition of fast-forward upgrades and the best practices associated with them should be clearly documented somewhere. Various operators in the room highlighted that they would like to see a high-level document outlining the steps required to achieve this, hopefully written by someone with past experience of running this type of upgrade.

I failed to capture the names of the individuals who were interested in helping out here. If you would like to help, please feel free to add your name to the actions either at the end of this mail or at the bottom of the pad.

In the afternoon we reviewed the current efforts within the community to implement fast-forward upgrades, covering TripleO, Charms (Juju) and openstack-ansible. While this was insightful to many in the room there didn’t appear to be any obvious areas of collaboration outside of sharing best practice and defining the high level flow of a fast-forward upgrade.

Tuesday

Tuesday started with a discussion around NFV considerations with fast-forward upgrades. These ranged from the previously mentioned need for the data plane to remain active during the upgrade to the restricted nature of upgrades in NFV environments in terms of time and number of reboots.

It was highlighted that there are some serious, as yet unresolved bugs in Nova regarding the live migration of instances using SR-IOV devices. This currently makes moving workloads either prior to or during the upgrade particularly difficult.

Rollbacks were also discussed, along with the need for any best practice documentation around fast-forward upgrades to include steps for recovering environments if things fail.

We then revisited an idea from the first day of finding or creating a SIG for this effort to call home. It was highlighted that there was a suggestion in the packaging room to create a Deployment / Lifecycle SIG. After speaking with a few individuals later in the week I’ve taken the action to reach out on the openstack-sigs mailing list for further input.

Finally, during a brief discussion on ways we could collaborate and share tooling for fast-forward upgrades, a new tool to migrate configuration files between N and N+>=2 releases was introduced. While interesting, it was seen as a more generic utility that could also be used for N to N+1 upgrades. AFAIK the authors joined the Oslo room shortly after this session ended to gain more feedback from that team.

Actions

Modify the supports-upgrades and supports-accessible-upgrades tags

I have yet to look into the formal process around making changes to these tags but I will aim to make a start ASAP.

Find an Ops lead for the documentation effort

I failed to take down the names of some of the operators who were talking this through at the time. If they or anyone else is still interested in helping here please let me know!

Find or create a relevant SIG for this effort

As discussed above this could be as part of the lifecycle SIG or an independent upgrades SIG. Expect a separate mail to the SIG list regarding this shortly.

Identify a room chair for Sydney

Unfortunately I will not be present in Sydney to lead a similar session. If anyone is interested in helping please feel free to respond here or reach out to me directly!

My thanks again to everyone who attended the track. I had a blast leading the room and hope that the attendees found both the track and some of the outcomes listed above useful.
