OpenStack TripleO FFU M2 progress report
http://lists.openstack.org/pipermail/openstack-dev/2017-December/125321.html
This is a brief progress report from the Upgrades squad on the fast-forward upgrades (FFU) feature in TripleO, enabling Newton to Queens (N to Q) upgrades.
tl;dr Good initial progress, but we missed the M2 goal of non-voting CI jobs; pushing on to M3.
Overview
For anyone unfamiliar with the concept of fast-forward upgrades the following sentence from the spec gives a brief high level introduction:
> Fast-forward upgrades are upgrades that move an environment from release `N`
> to `N+X` in a single step, where `X` is greater than `1` and for fast-forward
> upgrades is typically `3`.
The spec itself goes into more detail and I’d recommend that anyone wanting to know more about our approach to FFU in TripleO start there:
https://specs.openstack.org/openstack/tripleo-specs/specs/queens/fast-forward-upgrades.html
Note that the spec is currently being updated by the following change, introducing more detail on the FFU task layout and ordering, the dependency on the ongoing major upgrade rework in Queens, canary compute validation etc:
WIP ffu: Spec update for M2 https://review.openstack.org/#/c/526353/
M2 Status
The original goal for Queens M2 was to have one or more non-voting FFU jobs deployed somewhere, able to run through the basic undercloud and overcloud upgrade workflows while exercising as many compute service dependencies as we could, up to and including Nova. Unfortunately, while Sofer has made some great progress with this, we do not have any running FFU jobs at present:
http://lists.openstack.org/pipermail/openstack-dev/2017-December/125316.html
We do however have documented demos that cover FFU for some limited overcloud environments from Newton to Queens:
OpenStack TripleO FFU Keystone Demo N to Q https://blog.yarwood.me.uk/2017/11/16/openstack_fastforward_tripleo_keystone/
OpenStack TripleO FFU Nova Demo N to Q https://blog.yarwood.me.uk/2017/12/01/openstack_fastforward_tripleo_nova/
These demos currently use a stack of changes against THT, with the first ~4 changes introducing the FFU framework. FWIW, getting these initial changes merged would help avoid the current change storm every time the series is rebased to pick up upgrade or deploy related bug fixes.
Also note that the demos currently use the raw Ansible playbooks from the stack outputs to run through the FFU, upgrade and deploy tasks. This is by no means the final UX; python-tripleoclient and Mistral workflow work will be completed ahead of M3.
M3 Goals
The squad will be focusing on the following goals for M3:
- Non-voting RDO CI jobs defined and running
- FFU THT changes tested by the above jobs and merged
- python-tripleoclient & required Mistral workflows merged
- Use of ceph-ansible for Ceph upgrades
- Draft developer and user docs under review
FFU squad
Finally, a quick note to highlight that this report marks the end of my own personal involvement with the FFU feature in TripleO. I’m not going far: I’m returning to work on Nova and will happily make time to talk about and review FFU related changes. The members of the upgrade squad taking this forward, and your main points of contact for FFU in TripleO, will be:
- Sofer (chem)
- Lukas (social)
- Marios (marios)
My thanks again to Sofer, Lukas, Marios, the rest of the upgrade squad and wider TripleO community for your guidance and patience when putting up with my constant inane questioning regarding FFU over the past few months!
OpenStack TripleO FFU Nova Demo N to Q
Update 04/12/17: The initial deployment documented in this demo no longer works, as a number of plan migration steps it relied on have been removed after being promoted into the Queens repos. We are currently looking into ways to reintroduce these for use in master UC, Newton OC FFU development deployments; until then, anyone attempting to run through this demo should start with a Newton OC and UC before upgrading the UC to master.
This is another TripleO fast-forward upgrade demo post, this time focusing on a basic stack of Keystone, Glance, Cinder, Neutron and Nova. At present several workarounds are still required to allow the upgrade to complete; please see the workaround sections below for more details.
Environment
As with the original demo I’m using tripleo-quickstart to deploy the initial environment, this time with 1 controller and 1 compute, a Queens undercloud and a Newton overcloud. I’m also using a new general config to deploy a minimal control stack able to host Nova.
$ bash quickstart.sh -w $WD -t all -R master-undercloud-newton-overcloud \
-c config/general_config/minimal-nova.yml $VIRTHOST
UC - docker_registry.yaml
Again with this demo we are not caching containers locally; the following command will create a docker_registry.yaml file referencing the RDO registry for use during the final deployment of the overcloud to Queens:
$ ssh -F $WD/ssh.config.ansible undercloud
$ openstack overcloud container image prepare \
--namespace trunk.registry.rdoproject.org/master \
--tag tripleo-ci-testing \
--output-env-file ~/docker_registry.yaml
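The resulting docker_registry.yaml is a normal Heat environment file mapping each service to a container image. An illustrative excerpt (the exact parameter names and images will vary with the release):
parameter_defaults:
  DockerKeystoneImage: trunk.registry.rdoproject.org/master/centos-binary-keystone:tripleo-ci-testing
  DockerNovaApiImage: trunk.registry.rdoproject.org/master/centos-binary-nova-api:tripleo-ci-testing
  [..]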
UC - tripleo-heat-templates
We then need to update the version of tripleo-heat-templates deployed on the undercloud host:
$ ssh -F $WD/ssh.config.ansible undercloud
$ cd /home/stack/tripleo-heat-templates
$ git fetch git://git.openstack.org/openstack/tripleo-heat-templates refs/changes/19/518719/9 && git checkout FETCH_HEAD
Finally, as we are using a customised controller role, the following services need to be added to the overcloud_services.yml file on the undercloud node under ControllerServices:
parameter_defaults:
  ControllerServices:
    [..]
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::NovaPlacement
UC - tripleo-common
At present we are waiting on a promotion of tripleo-common that includes various bugfixes for updating the overcloud stack, generating outputs etc. For the time being we can simply install directly from master to work around these issues.
$ ssh -F $WD/ssh.config.ansible undercloud
$ git clone https://github.com/openstack/tripleo-common.git ; cd tripleo-common
$ sudo python setup.py install ; cd ~
OC - Update heat-agents
As documented in my previous demo post, we need to remove any legacy hieradata from all overcloud hosts prior to updating the heat stack:
$ sudo rm -f /usr/libexec/os-apply-config/templates/etc/puppet/hiera.yaml \
/usr/libexec/os-refresh-config/configure.d/40-hiera-datafiles \
/etc/puppet/hieradata/*.yaml
We also need to update the heat-agents on all nodes to their Ocata versions:
$ git clone https://github.com/openstack/tripleo-repos.git ; cd tripleo-repos
$ sudo python setup.py install
$ sudo tripleo-repos -b ocata current
$ sudo yum update -y python-heat-agent \
    python-heat-agent-puppet
$ sudo yum install -y openstack-heat-agents \
python-heat-agent-ansible \
python-heat-agent-apply-config \
python-heat-agent-docker-cmd \
python-heat-agent-hiera \
python-heat-agent-json-file
OC - Workarounds #1
As with the earlier Keystone demo, a packaging issue currently prevents the openstack-ceilometer packages from being upgraded directly from Newton to Queens, so we simply remove them for now:
$ sudo yum remove openstack-ceilometer* -y
UC - Update stack outputs
With the workarounds in place we can now update the stack using the updated version of tripleo-heat-templates on the undercloud. Once again we need to use the original deploy command with a number of additional environment files included:
$ . stackrc
$ openstack overcloud deploy \
--templates /home/stack/tripleo-heat-templates \
[..]
-e /home/stack/docker_registry.yaml \
-e /home/stack/tripleo-heat-templates/environments/docker.yaml \
-e /home/stack/tripleo-heat-templates/environments/fast-forward-upgrade.yaml \
-e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml
UC - Download config
Once the stack has been updated we can download the config with the following command:
$ . stackrc
$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-Oalkee-config
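The generated directory contains the playbooks used throughout the rest of this post; a quick listing should confirm they are present (output abridged):
$ ls /home/stack/tripleo-Oalkee-config/*playbook*.yaml
/home/stack/tripleo-Oalkee-config/deploy_steps_playbook.yaml
/home/stack/tripleo-Oalkee-config/fast_forward_upgrade_playbook.yaml
/home/stack/tripleo-Oalkee-config/upgrade_steps_playbook.yaml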
UC - FFU and Upgrade plays
Before running through any of the generated playbooks I personally like to add the profile_tasks callback to the callback_whitelist for Ansible within /etc/ansible/ansible.cfg. This provides timestamps during the playbook run and a summary of the slowest tasks at the end.
# enable callback plugins, they can output to stdout but cannot be 'stdout' type.
callback_whitelist = profile_tasks
We first run the fast_forward_upgrade_playbook to complete the upgrade to Pike:
$ . stackrc
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
/home/stack/config/tripleo-jUY9FB-config/fast_forward_upgrade_playbook.yaml
[..]
PLAY RECAP *****************************************************************************************************************************
192.168.24.11 : ok=62 changed=8 unreachable=0 failed=0
192.168.24.16 : ok=123 changed=55 unreachable=0 failed=0
Friday 01 December 2017 20:39:58 +0000 (0:00:03.967) 0:06:16.615 *******
===============================================================================
Stop neutron_server ------------------------------------------------------------------------------------------------------------ 32.53s
stop openstack-cinder-volume --------------------------------------------------------------------------------------------------- 16.03s
Stop neutron_l3_agent ---------------------------------------------------------------------------------------------------------- 14.36s
Stop and disable nova-compute service ------------------------------------------------------------------------------------------ 13.16s
Cinder package update ---------------------------------------------------------------------------------------------------------- 12.73s
stop openstack-cinder-scheduler ------------------------------------------------------------------------------------------------ 12.68s
Setup cell_v2 (sync nova/cell DB) ---------------------------------------------------------------------------------------------- 11.79s
Cinder package update ---------------------------------------------------------------------------------------------------------- 11.30s
Neutron package update --------------------------------------------------------------------------------------------------------- 10.99s
Keystone package update -------------------------------------------------------------------------------------------------------- 10.77s
glance package update ---------------------------------------------------------------------------------------------------------- 10.28s
Keystone package update --------------------------------------------------------------------------------------------------------- 9.80s
glance package update ----------------------------------------------------------------------------------------------------------- 8.72s
Neutron package update ---------------------------------------------------------------------------------------------------------- 8.62s
Stop and disable nova-consoleauth service --------------------------------------------------------------------------------------- 7.94s
Update nova packages ------------------------------------------------------------------------------------------------------------ 7.62s
Update nova packages ------------------------------------------------------------------------------------------------------------ 7.24s
Stop and disable nova-scheduler service ----------------------------------------------------------------------------------------- 6.36s
Run puppet apply to set tranport_url in nova.conf ------------------------------------------------------------------------------- 5.78s
install tripleo-repos ----------------------------------------------------------------------------------------------------------- 4.70s
We then run the upgrade_steps_playbook to start the upgrade to Queens:
$ . stackrc
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
/home/stack/tripleo-Oalkee-config/upgrade_steps_playbook.yaml
[..]
PLAY RECAP *****************************************************************************************************************************
192.168.24.11 : ok=57 changed=45 unreachable=0 failed=0
192.168.24.16 : ok=165 changed=146 unreachable=0 failed=0
Friday 01 December 2017 20:51:55 +0000 (0:00:00.038) 0:10:47.865 *******
===============================================================================
Update all packages ----------------------------------------------------------------------------------------------------------- 263.71s
Update all packages ----------------------------------------------------------------------------------------------------------- 256.79s
Install docker packages on upgrade if missing ---------------------------------------------------------------------------------- 13.77s
Upgrade os-net-config ----------------------------------------------------------------------------------------------------------- 5.71s
Upgrade os-net-config ----------------------------------------------------------------------------------------------------------- 5.12s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 3.36s
Install docker packages on upgrade if missing ----------------------------------------------------------------------------------- 3.14s
Stop and disable mysql service -------------------------------------------------------------------------------------------------- 1.97s
Check for os-net-config upgrade ------------------------------------------------------------------------------------------------- 1.66s
Check for os-net-config upgrade ------------------------------------------------------------------------------------------------- 1.57s
Stop keepalived service --------------------------------------------------------------------------------------------------------- 1.48s
Stop and disable rabbitmq service ----------------------------------------------------------------------------------------------- 1.47s
take new os-net-config parameters into account now ------------------------------------------------------------------------------ 1.31s
take new os-net-config parameters into account now ------------------------------------------------------------------------------ 1.08s
Check if openstack-ceilometer-compute is deployed ------------------------------------------------------------------------------- 0.70s
Check if iscsid service is deployed --------------------------------------------------------------------------------------------- 0.67s
Start keepalived service -------------------------------------------------------------------------------------------------------- 0.48s
Check for nova placement running under apache ----------------------------------------------------------------------------------- 0.46s
Stop and disable mongodb service on upgrade ------------------------------------------------------------------------------------- 0.45s
remove old cinder cron jobs ----------------------------------------------------------------------------------------------------- 0.45s
OC - Workarounds #2
On overcloud-novacompute-0 the following file needs to be removed to work around a known issue:
$ ssh -F $WD/ssh.config.ansible overcloud-novacompute-0
$ sudo rm /etc/iscsi/.initiator_reset
UC - Deploy play
Finally we run through the deploy_steps_playbook:
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
/home/stack/tripleo-Oalkee-config/deploy_steps_playbook.yaml
[..]
PLAY RECAP *****************************************************************************************************************************
192.168.24.11 : ok=48 changed=11 unreachable=0 failed=0
192.168.24.16 : ok=76 changed=10 unreachable=0 failed=0
localhost : ok=1 changed=0 unreachable=0 failed=0
Friday 01 December 2017 21:04:58 +0000 (0:00:00.041) 0:10:24.723 *******
===============================================================================
Run docker-puppet tasks (generate config) ------------------------------------------------------------------------------------- 186.65s
Run docker-puppet tasks (bootstrap tasks) ------------------------------------------------------------------------------------- 101.10s
Start containers for step 3 ---------------------------------------------------------------------------------------------------- 98.61s
Start containers for step 4 ---------------------------------------------------------------------------------------------------- 41.37s
Run puppet host configuration for step 1 --------------------------------------------------------------------------------------- 32.53s
Start containers for step 1 ---------------------------------------------------------------------------------------------------- 25.76s
Run puppet host configuration for step 5 --------------------------------------------------------------------------------------- 17.91s
Run puppet host configuration for step 4 --------------------------------------------------------------------------------------- 14.47s
Run puppet host configuration for step 3 --------------------------------------------------------------------------------------- 13.41s
Run docker-puppet tasks (bootstrap tasks) -------------------------------------------------------------------------------------- 10.39s
Run puppet host configuration for step 2 --------------------------------------------------------------------------------------- 10.37s
Start containers for step 5 ---------------------------------------------------------------------------------------------------- 10.12s
Run docker-puppet tasks (bootstrap tasks) --------------------------------------------------------------------------------------- 9.78s
Start containers for step 2 ----------------------------------------------------------------------------------------------------- 6.32s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 4.37s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 3.46s
Write the config_step hieradata ------------------------------------------------------------------------------------------------- 1.80s
create libvirt persistent data directories -------------------------------------------------------------------------------------- 1.21s
Write the config_step hieradata ------------------------------------------------------------------------------------------------- 1.03s
Check if /var/lib/docker-puppet/docker-puppet-tasks4.json exists ---------------------------------------------------------------- 1.00s
Verification
I’ll revisit this in the coming days and add a more complete set of tasks to verify the end environment, but for now we can run a simple boot-from-volume instance (as Swift, the default store for Glance, was not installed):
$ cinder create 1
$ cinder set-bootable 46d278f7-31fc-4e45-b5df-eb8220800b1a true
$ nova flavor-create 1 1 512 1 1
$ nova boot --boot-volume 46d278f7-31fc-4e45-b5df-eb8220800b1a --flavor 1 test
[..]
$ nova list
+--------------------------------------+------+--------+------------+-------------+-------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+-------------------+
| 05821616-1239-4ca9-8baa-6b0ca4ea3a6b | test | ACTIVE | - | Running | priv=192.168.0.16 |
+--------------------------------------+------+--------+------------+-------------+-------------------+
We can also see the various containerised services running on the overcloud:
$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d80d6f072604 trunk.registry.rdoproject.org/master/centos-binary-glance-api:tripleo-ci-testing "kolla_start" 13 minutes ago Up 12 minutes (healthy) glance_api
61fbf47241ce trunk.registry.rdoproject.org/master/centos-binary-nova-api:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes nova_metadata
9defdb5efe0f trunk.registry.rdoproject.org/master/centos-binary-nova-api:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) nova_api
874716d99a44 trunk.registry.rdoproject.org/master/centos-binary-nova-novncproxy:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) nova_vnc_proxy
21ca0fd8d8ec trunk.registry.rdoproject.org/master/centos-binary-neutron-server:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes neutron_api
e0eed85b860a trunk.registry.rdoproject.org/master/centos-binary-cinder-volume:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) cinder_volume
0882e08ac198 trunk.registry.rdoproject.org/master/centos-binary-nova-consoleauth:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) nova_consoleauth
e3ebc4b066c9 trunk.registry.rdoproject.org/master/centos-binary-nova-api:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes nova_api_cron
c7d05a04a8a3 trunk.registry.rdoproject.org/master/centos-binary-cinder-api:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes cinder_api_cron
2f3c1e244997 trunk.registry.rdoproject.org/master/centos-binary-neutron-openvswitch-agent:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) neutron_ovs_agent
bfeb120bf77a trunk.registry.rdoproject.org/master/centos-binary-neutron-metadata-agent:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) neutron_metadata_agent
43b2c09aecf8 trunk.registry.rdoproject.org/master/centos-binary-nova-scheduler:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) nova_scheduler
a7a3024b63f6 trunk.registry.rdoproject.org/master/centos-binary-neutron-dhcp-agent:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) neutron_dhcp
3df990a68046 trunk.registry.rdoproject.org/master/centos-binary-cinder-scheduler:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) cinder_scheduler
94461ba833aa trunk.registry.rdoproject.org/master/centos-binary-neutron-l3-agent:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) neutron_l3_agent
4bee34f9fce2 trunk.registry.rdoproject.org/master/centos-binary-cinder-api:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes cinder_api
e8bec9348fe3 trunk.registry.rdoproject.org/master/centos-binary-nova-conductor:tripleo-ci-testing "kolla_start" 13 minutes ago Up 13 minutes (healthy) nova_conductor
22db40c25881 trunk.registry.rdoproject.org/master/centos-binary-keystone:tripleo-ci-testing "/bin/bash -c '/usr/l" 15 minutes ago Up 15 minutes keystone_cron
26769acaaf5e trunk.registry.rdoproject.org/master/centos-binary-keystone:tripleo-ci-testing "kolla_start" 16 minutes ago Up 16 minutes (healthy) keystone
99037a5e5c36 trunk.registry.rdoproject.org/master/centos-binary-iscsid:tripleo-ci-testing "kolla_start" 16 minutes ago Up 16 minutes iscsid
9f4aae72c201 trunk.registry.rdoproject.org/master/centos-binary-nova-placement-api:tripleo-ci-testing "kolla_start" 16 minutes ago Up 16 minutes nova_placement
311302abc297 trunk.registry.rdoproject.org/master/centos-binary-horizon:tripleo-ci-testing "kolla_start" 16 minutes ago Up 16 minutes horizon
d465e4f5b7e6 trunk.registry.rdoproject.org/master/centos-binary-mariadb:tripleo-ci-testing "kolla_start" 17 minutes ago Up 17 minutes (unhealthy) mysql
b9e062f1d857 trunk.registry.rdoproject.org/master/centos-binary-rabbitmq:tripleo-ci-testing "kolla_start" 18 minutes ago Up 18 minutes (healthy) rabbitmq
a57f053afc03 trunk.registry.rdoproject.org/master/centos-binary-memcached:tripleo-ci-testing "/bin/bash -c 'source" 18 minutes ago Up 18 minutes memcached
baeb6d1087e6 trunk.registry.rdoproject.org/master/centos-binary-redis:tripleo-ci-testing "kolla_start" 18 minutes ago Up 18 minutes redis
faafa1bf2d2e trunk.registry.rdoproject.org/master/centos-binary-haproxy:tripleo-ci-testing "kolla_start" 18 minutes ago Up 18 minutes haproxy
$ exit
$ ssh -F $WD/ssh.config.ansible overcloud-novacompute-0
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0363d7008e87 trunk.registry.rdoproject.org/master/centos-binary-neutron-openvswitch-agent:tripleo-ci-testing "kolla_start" 12 minutes ago Up 12 minutes (healthy) neutron_ovs_agent
c1ff23ee9f16 trunk.registry.rdoproject.org/master/centos-binary-cron:tripleo-ci-testing "kolla_start" 12 minutes ago Up 12 minutes logrotate_crond
d81d8207ec9a trunk.registry.rdoproject.org/master/centos-binary-nova-compute:tripleo-ci-testing "kolla_start" 12 minutes ago Up 12 minutes nova_migration_target
abd9b79e2af8 trunk.registry.rdoproject.org/master/centos-binary-ceilometer-compute:tripleo-ci-testing "kolla_start" 12 minutes ago Up 12 minutes ceilometer_agent_compute
aa581489ac9a trunk.registry.rdoproject.org/master/centos-binary-nova-compute:tripleo-ci-testing "kolla_start" 12 minutes ago Up 12 minutes (healthy) nova_compute
d4ade28175f0 trunk.registry.rdoproject.org/master/centos-binary-iscsid:tripleo-ci-testing "kolla_start" 14 minutes ago Up 14 minutes iscsid
ae4652853098 trunk.registry.rdoproject.org/master/centos-binary-nova-libvirt:tripleo-ci-testing "kolla_start" 14 minutes ago Up 14 minutes nova_libvirt
aac8fea2d496 trunk.registry.rdoproject.org/master/centos-binary-nova-libvirt:tripleo-ci-testing "kolla_start" 14 minutes ago Up 14 minutes nova_virtlogd
Conclusion
In conclusion, this demo takes a simple multi-host OpenStack deployment of Keystone, Glance, Cinder, Neutron and Nova from baremetal Newton to containerised Queens in ~26 minutes. There are many things still to resolve and validate with FFU, but for now, ahead of M2, this is a pretty good start.
OpenStack TripleO FFU Keystone Demo N to Q
This post will introduce a very rough demo of the new TripleO Fast-forward Upgrades (FFU) feature, warts and all, using an overcloud with only Keystone deployed. This should prove to be a useful starting point for anyone interested in this feature and could even be an approach used for future per-service FFU CI jobs.
Environment
I’m currently using the tripleo-quickstart project to deploy virtualised test environments. For this demo I’m using the following command line to create the demo environment:
$ bash quickstart.sh -w $WD -t all -R master-undercloud-newton-overcloud \
-c config/general_config/keystone-only.yml \
-N config/nodes/1ctlr.yml $VIRTHOST
This is made possible by the following unmerged changes to tripleo-quickstart:
https://review.openstack.org/#/q/topic:keystone_only_overcloud
Once deployed you should find the 10.0.3 Newton version of Keystone deployed on overcloud-controller-0:
$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
[..]
$ rpm -qi openstack-keystone
Name : openstack-keystone
Epoch : 1
Version : 10.0.3
Release : 0.20170726120406.bd49c3e.el7.centos
Architecture: noarch
Install Date: Fri 10 Nov 2017 04:24:46 AM UTC
Group : Unspecified
Size : 175014
License : ASL 2.0
Signature : (none)
Source RPM : openstack-keystone-10.0.3-0.20170726120406.bd49c3e.el7.centos.src.rpm
Build Date : Wed 26 Jul 2017 12:07:53 PM UTC
Build Host : n30.pufty.ci.centos.org
Relocations : (not relocatable)
URL : http://keystone.openstack.org/
Summary : OpenStack Identity Service
Description :
Keystone is a Python implementation of the OpenStack
(http://www.openstack.org) identity service API.
Before starting the upgrade I recommend taking snapshots of the undercloud and overcloud-controller-0 libvirt domains on the virthost:
$ ssh -F $WD/ssh.config.ansible virthost
$ for domain in $(virsh list | grep running | awk '{print $2 }'); do virsh snapshot-create-as ${domain} ${domain}_start ; done
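Reverting to these snapshots later makes it cheap to rerun the upgrade from a clean state; a minimal sketch mirroring the loop above:
$ for domain in $(virsh list | grep running | awk '{print $2 }'); do virsh snapshot-revert ${domain} ${domain}_start ; done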
UC - docker_registry.yaml
As with a normal container based deployment on >=Pike we will need a Docker registry file mapping each service to a container image. The following command will create this file, pointing to the official RDO registry:
$ openstack overcloud container image prepare \
--namespace trunk.registry.rdoproject.org/master \
--tag tripleo-ci-testing \
--output-env-file ~/docker_registry.yaml
Note that this will result in the container images being pulled from the remote RDO registry during the upgrade. We could pre-cache these images on the undercloud to speed the process up; however, as we are only using a single host and a minimal number of services in this demo, I have chosen to skip this for now.
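For reference, pre-caching should be possible by asking the prepare command for an image list and uploading it to the undercloud registry; something along these lines (untested in this demo):
$ openstack overcloud container image prepare \
  --namespace trunk.registry.rdoproject.org/master \
  --tag tripleo-ci-testing \
  --output-images-file ~/overcloud_containers.yaml
$ openstack overcloud container image upload --config-file ~/overcloud_containers.yaml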
UC - tripleo-heat-templates
FFU itself is controlled by an Ansible playbook using tasks contained within the tripleo-heat-templates (THT) project, with a gerrit topic tracking all of the FFU changes currently up for review.
For this demo we need to update the local copy of THT on the undercloud to include a subset of these changes:
$ cd /home/stack/tripleo-heat-templates
$ git fetch git://git.openstack.org/openstack/tripleo-heat-templates refs/changes/19/518719/2 && git checkout FETCH_HEAD
We also need the following noop-deploy-steps.yaml environment file, which allows us to use openstack overcloud deploy to update the stack outputs of the overcloud without forcing an actual redeploy of any resources:
$ curl https://git.openstack.org/cgit/openstack/tripleo-heat-templates/plain/environments/noop-deploy-steps.yaml?h=refs/changes/97/520097/1 > environments/noop-deploy-steps.yaml
Finally, as we have deployed a custom set of services for the Controller role we now have to ensure that the Docker service is added to the role prior to our upgrade:
$ cat overcloud_services.yaml
parameter_defaults:
  ControllerServices:
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::Keystone
    - OS::TripleO::Services::RabbitMQ
    - OS::TripleO::Services::MySQL
    - OS::TripleO::Services::HAproxy
    - OS::TripleO::Services::Keepalived
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoPackages
OC - Ocata heat-agents
An older os-apply-config hiera hook and any legacy hiera data need to be removed from the overcloud prior to our upgrade. The following ML post has more details on this workaround:
http://lists.openstack.org/pipermail/openstack-dev/2017-January/110922.html
For the time being this isn’t part of the upgrade playbook, so we need to run the following commands to update the heat-agents on the host to their Ocata versions and remove the legacy data:
$ sudo rm -f /usr/libexec/os-apply-config/templates/etc/puppet/hiera.yaml /usr/libexec/os-refresh-config/configure.d/40-hiera-datafiles /etc/puppet/hieradata/*.yaml
$ sudo yum install -y \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/openstack-heat-agents-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-ansible-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-apply-config-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-docker-cmd-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-hiera-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-json-file-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm \
https://trunk.rdoproject.org/centos7-ocata/current-tripleo/python-heat-agent-puppet-1.0.1-0.20170412210405.769d0de.el7.centos.noarch.rpm
OC - Remove ceilometer
At present there is a packaging issue when upgrading the openstack-ceilometer
packages directly from Newton to Queens. As these packages are installed by
default in the Newton overcloud-full image used to deploy the environment but
not used in our demo we can simply remove them for the time being:
$ sudo yum remove openstack-ceilometer* -y
UC - Update stack outputs
We can now use the openstack overcloud deploy command to update the overcloud stack and generate the new stack outputs, including the FFU playbook. To do this we simply add the previously created docker_registry.yaml, environments/docker.yaml and environments/noop-deploy-steps.yaml environment files to the original command used to deploy the environment:
$ . stackrc
$ openstack overcloud deploy \
--templates /home/stack/tripleo-heat-templates \
[..]
-e /home/stack/docker_registry.yaml \
-e /home/stack/tripleo-heat-templates/environments/docker.yaml \
-e /home/stack/tripleo-heat-templates/environments/noop-deploy-steps.yaml
The original command is logged under ~/overcloud_deploy.log on the undercloud, for example:
$ grep openstack\ overcloud\ deploy overcloud_deploy.log
2017-11-16 14:36:11 | + openstack overcloud deploy --templates /home/stack/tripleo-heat-templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --block-storage-flavor oooq_blockstorage --swift-storage-flavor oooq_objectstorage --timeout 90 -e /home/stack/cloud-names.yaml -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /home/stack/network-environment.yaml -e /home/stack/tripleo-heat-templates/environments/low-memory-usage.yaml --validation-warnings-fatal -e /home/stack/overcloud_services.yaml --compute-scale 0 --ntp-server pool.ntp.org
UC - Download config
Now that the stack outputs have been updated we can download the overcloud config containing the FFU playbook onto the undercloud:
$ openstack overcloud config download
There is currently a known issue with the generated upgrade tasks where the ordering of conditionals causes Ansible to fail. To work around this, simply edit the following Ansible tasks within the Controller/upgrade_tasks.yaml file to ensure the step conditional is always checked first:
- block:
    - name: Upgrade os-net-config
      yum: name=os-net-config state=latest
    - changed_when: os_net_config_upgrade.rc == 2
      command: os-net-config --no-activate -c /etc/os-net-config/config.json -v --detailed-exit-codes
      failed_when: os_net_config_upgrade.rc not in [0,2]
      name: take new os-net-config parameters into account now
      register: os_net_config_upgrade
      tags: step3
      when:
        - step|int == 3
        - not os_net_config_need_upgrade.stdout and os_net_config_has_config.rc == 0
UC - Run playbook
With the config present on the undercloud we can finally start the FFU upgrade using the following command line:
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
/home/stack/tmp/fast_forward_upgrade_playbook.yaml
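When iterating on individual service tasks it can also be useful to rerun the playbook against a single host using standard Ansible options, for example:
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
  /home/stack/tmp/fast_forward_upgrade_playbook.yaml \
  --limit overcloud-controller-0 -v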
OC - Verification
Once the FFU upgrade is complete we can verify that Keystone is functional in the overcloud with a few simple commands:
$ ssh -F $WD/ssh.config.ansible undercloud
$ . overcloudrc
$ openstack endpoint list
+----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------+
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------+
| 15fd404ff8c14971b4251b81624edab8 | regionOne | keystone | identity | True | admin | http://192.168.24.10:35357 |
| 2e513f5fdfc140ec916b081b47a2b8f7 | regionOne | keystone | identity | True | internal | http://172.16.2.12:5000 |
| 96980f0f9ac44c718c038ef54af814bc | regionOne | keystone | identity | True | public | http://10.0.0.8:5000 |
+----------------------------------+-----------+--------------+--------------+---------+-----------+----------------------------+
$ openstack service list
+----------------------------------+------------+----------+
| ID | Name | Type |
+----------------------------------+------------+----------+
| 3fc546421e9048f39b2b847b13fa8ea5 | keystone | identity |
| 7f819190dc6f44d8b995021277b24d67 | ceilometer | metering |
+----------------------------------+------------+----------+
We can also log into the overcloud-controller-0 host and verify that the relevant containers are running:
$ ssh -F $WD/ssh.config.ansible overcloud-controller-0
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4f40f0cf98aa 192.168.24.1:8787/master/centos-binary-keystone:tripleo-ci-testing "/bin/bash -c '/usr/l" About a minute ago Up About a minute keystone_cron
0b9d5cc17f5d 192.168.24.1:8787/master/centos-binary-keystone:tripleo-ci-testing "kolla_start" About a minute ago Up About a minute (healthy) keystone
db967d899aaf 192.168.24.1:8787/master/centos-binary-mariadb:tripleo-ci-testing "kolla_start" About a minute ago Up About a minute (unhealthy) mysql
1f0b9aa72ec7 192.168.24.1:8787/master/centos-binary-rabbitmq:tripleo-ci-testing "kolla_start" 2 minutes ago Restarting (1) 29 seconds ago rabbitmq
8e689f5bac22 192.168.24.1:8787/master/centos-binary-haproxy:tripleo-ci-testing "kolla_start" 2 minutes ago Up 2 minutes haproxy
As I said at the start, this is a very rough demo that we can hopefully clean up and iterate on quickly over the coming weeks. The current goal is to have another working demo available by M2 covering all of the services required to upgrade the computes, so we can also start verifying the data plane during the upgrade.
16 bridges charity walk - certificate
I know, I know, another post about our 16 bridges walk. We just received our certificate from Cardiomyopathy UK showing that we raised a grand total of £635.00. My thanks again to everyone who donated!
Openstack TripleO FFU Getting started
This post will be a living document where I will detail how TripleO developers can initially provision an environment and then iterate quickly while working on service upgrade tasks for the new fast-forward upgrade feature in TripleO Queens.
Initial environment
This section details how to configure the initial environment with specific undercloud (UC) and overcloud (OC) versions and layouts using tripleo-quickstart.
Newton UC & OC
This basic combination is required for end-to-end testing of fast-forward upgrades:
$ bash quickstart.sh -R newton $VIRTHOST
Note however that the following changes are required so that vbmc is used by the undercloud instead of pxe_ssh (removed in Pike):
https://review.openstack.org/#/q/topic:allow_vbmc_newton
Master UC & Newton OC
This combination is a useful starting point for developers looking to work on tripleo-heat-templates changes for a given service:
$ bash quickstart.sh -R master-undercloud-newton-overcloud $VIRTHOST
The master-undercloud-newton-overcloud release config is introduced by the following change:
https://review.openstack.org/#/c/511464/
Master UC & Newton OC with only Keystone deployed
https://review.openstack.org/#/q/topic:keystone_only_overcloud
$ bash quickstart.sh -w $WD -t all -R master-undercloud-newton-overcloud \
-c config/general_config/keystone-only.yml \
-N config/nodes/1ctlr.yml $VIRTHOST
OpenStack - Fast-forward upgrades - Report
http://lists.openstack.org/pipermail/openstack-dev/2017-September/122347.html
My thanks again to everyone who attended and contributed to the skip-level upgrades track over the first two days of last week’s PTG. I’ve included a short summary of our discussions below, with a list of agreed actions for Queens at the end.
tl;dr s/skip-level/fast-forward/g
https://etherpad.openstack.org/p/queens-PTG-skip-level-upgrades
Monday
During our first session we briefly discussed the history of the skip-level upgrades effort within the community and the various misunderstandings that have arisen from previous conversations around this topic at past events.
We agreed that at present the only way to perform upgrades between N and N+>=2 releases of OpenStack was to upgrade linearly through each major release between the starting and target release, without skipping any.
This is contrary to previous discussions on the topic where it had been suggested that releases could be skipped if DB migrations for these releases were applied in bulk later in the process. As projects within the community currently offer no such support for this it was agreed to continue to use the supported N to N+1 upgrade jumps, albeit in a minimal, offline way.
The name skip-level upgrades has had an obvious role to play in the confusion here, and as such the renaming of this effort was discussed at length. Various suggestions are listed on the pad, but for the time being I’m going to stick with the basic fast-forward upgrades name (FFOU, OFF, BOFF, FFUD etc were all close behind). This removes any notion of releases being skipped and should hopefully avoid any further confusion in the future.
Support by the projects for offline upgrades was then discussed with a recent Ironic issue highlighted as an example where projects have required services to run before the upgrade could be considered complete. The additional requirement of ensuring both workloads and the data plane remain active during the upgrade was also then discussed. It was agreed that both the supports-upgrades and supports-accessible-upgrades tags should be updated to reflect these requirements for fast-forward upgrades.
Given the above it was agreed that this new definition of what fast-forward upgrades are, and the best practices associated with them, should be clearly documented somewhere. Various operators in the room highlighted that they would like to see a high level document outlining the steps required to achieve this, hopefully written by someone with past experience of running this type of upgrade.
I failed to capture the names of the individuals who were interested in this. If anyone would like to help out, please feel free to add your name to the actions either at the end of this mail or at the bottom of the pad.
In the afternoon we reviewed the current efforts within the community to implement fast-forward upgrades, covering TripleO, Charms (Juju) and openstack-ansible. While this was insightful to many in the room there didn’t appear to be any obvious areas of collaboration outside of sharing best practice and defining the high level flow of a fast-forward upgrade.
Tuesday
Tuesday started with a discussion around NFV considerations with fast-forward upgrades. These ranged from the previously mentioned need for the data plane to remain active during the upgrade to the restricted nature of upgrades in NFV environments in terms of time and number of reboots.
It was highlighted that there are some serious as yet unresolved bugs in Nova regarding the live migration of instances using SR-IOV devices. This currently makes the moving of workloads either prior to or during the upgrade particularly difficult.
Rollbacks were also discussed and the need for any best practice documentation around fast-forward upgrades to include steps to allow the recovery of environments if things fail was also highlighted.
We then revisited an idea from the first day of finding or creating a SIG for this effort to call home. It was highlighted that there was a suggestion in the packaging room to create a Deployment / Lifecycle SIG. After speaking with a few individuals later in the week I’ve taken the action to reach out on the openstack-sigs mailing list for further input.
Finally, during a brief discussion on ways we could collaborate and share tooling for fast-forward upgrades a new tool to migrate configuration files between N to N+>=2 releases was introduced. While interesting it was seen as a more generic utility that could also be used between N to N+1 upgrades. AFAIK the authors joined the Oslo room shortly after this session ended to gain more feedback from that team.
Actions
- Modify the supports-upgrades and supports-accessible-upgrades tags
  I have yet to look into the formal process around making changes to these tags but I will aim to make a start ASAP.
- Find an Ops lead for the documentation effort
  I failed to take down the names of some of the operators who were talking this through at the time. If they or anyone else is still interested in helping here please let me know!
- Find or create a relevant SIG for this effort
  As discussed above this could be as part of the lifecycle SIG or an independent upgrades SIG. Expect a separate mail to the SIG list regarding this shortly.
- Identify a room chair for Sydney
  Unfortunately I will not be present in Sydney to lead a similar session. If anyone is interested in helping please feel free to respond here or reach out to me directly!
My thanks again to everyone who attended the track, I had a blast leading the room and hope that the attendees found both the track and some of the outcomes listed above useful.
16 bridges charity walk - We did it!
I’m finally back from a work trip to the US and wanted to share that, 10 days ago, we completed the 16 bridges walk (well, 15 bridges, as the Golden Jubilee Bridge(s) remain closed) in just over 5 hours!
Again, our thanks to everyone who donated, it’s going to a wonderful charity and will hopefully make a difference to the lives of people living with cardiomyopathy!
I’ve already started looking into similar walks we could take part in next year, with the Wye Valley Challenge the most likely candidate at the moment. The full 100km version from Chepstow to Hereford might prove slightly too much for a novice like me, however there are shorter 45km versions ending in Hereford.
OpenStack - Skip level upgrades - PTG
PTG
A short reminder that I’ll be chairing the skip-level upgrades room at next week’s OpenStack PTG in Denver. So far ~15 of you have shown interest in this track on the etherpad, so I’m looking forward to some useful discussions over the two days. For now we still have slots available, so if you have suggestions please feel free to add them directly to the pad!
At present the agenda for the room (Durango, Atrium level) looks like this:
Monday
- 09:00 - 10:00 - #####
- 10:00 - 10:30 - Retrospective of what was discussed in Boston, outcomes, etc.
- 10:30 - 11:00 - Have operator requirements changed since Boston?
- 11:00 - 14:00 - #####
- 14:00 - 16:00 - What efforts (if any) are underway to enable skip level upgrades within the community?
- 16:00 - 18:00 - #####
Tuesday
- 09:00 - 10:30 - #####
- 10:30 - 11:00 - NFV considerations
- 11:00 - 11:30 - API versions control
- 11:30 - 14:00 - #####
- 14:00 - 16:00 - How can we collaborate and share tools for skip level upgrades within the community?
- 16:00 - 18:00 - Should we think about a different way of releasing?
TripleO
Later in the week I will also be participating in the TripleO track, with a session on Thursday to discuss my WIP skip-level upgrade spec. I’ll be working on this during the week leading up to the session, so feel free to review it ahead of time or just grab me in the hallway for a chat if this is something that interests you!
OpenStack - Skip level upgrades - Introduction
I’ve been fortunate enough to be part of a team looking into skip level upgrades recently ahead of the start of the Queens development cycle for OpenStack. What follows is an introduction to the concept of skip level upgrades and an overview of our initial PoC work in this area. Future posts will also cover our plans for enabling skip level upgrades within TripleO and possible work with the wider community to enable this within other deployment tools.
Introduction
Skip level upgrades are, as the name suggests, upgrades that move an environment from release N to N+X in a single step, where X is greater than 1 and for skip level upgrades is typically 3. For example, in the context of OpenStack, N to N+3 can refer to an upgrade from the Newton release of OpenStack to the Queens release, skipping Ocata and Pike:
Newton Ocata Pike Queens
+-----+ +-----+ +-----+ +-----+
| | | N+1 | | N+2 | | |
| N | ---------------------> | N+3 |
| | | | | | | |
+-----+ +-----+ +-----+ +-----+
There are existing alternative methods available for skipping a number of releases during an upgrade. For example, parallel cloud migration is a commonly cited alternative. This is where an additional environment is stood up alongside the original, with workloads migrated to the new environment:
Newton
+-----+
| |
env#1 | N |
| |
+-----+
------------------------------------
\ Queens
\ +-----+
\ | |
env#2 `-> | N+3 |
| |
+-----+
The requirement for this type of upgrade is driven by users looking to standardise on a given release (typically LTS), whilst retaining the ability to skip forward when the release hits EOL. This negates the need to keep up with the major release cycle that in the case of OpenStack continues to be every 6 months.
It is worth highlighting that the topic of skip level upgrades is not new to the OpenStack community; there have been attempts to provide this functionality before now, typically within the various deployment projects. For example, openstack-ansible’s leap-upgrades project attempted to move environments between Juno/Kilo and Newton.
More recently the topic of skip level upgrades was discussed at the OpenStack Forum in Boston in May. An RFC thread was also posted to the development mailing list, however no formal actions came of either discussion. I’m looking to restart this discussion at the next PTG in Denver, more on that later.
Requirements
Now that we understand what skip level upgrades actually are, it’s time to set out some basic requirements for the state of the environment during the upgrade. At the start of this process our team sat down and drafted the following:
- The control plane is inaccessible for the duration of the upgrade
- The upgrade must complete successfully or rollback within 4 hours
- The data plane and workloads must remain available for the duration of the upgrade.
Proof of concept
With the requirements set out, our first real task was to prove that this was even possible with an OpenStack environment. Given the releases available at the time, we began by manually upgrading an existing Mitaka based RHOSP 9 environment running on RHEL 7.3 to our recently released Ocata based RHOSP 11 release running on RHEL 7.4.
Mitaka Newton Ocata
+-----+ +-----+ +-----+
| | | N+1 | | |
| N | ----------> | N+2 |
| | | | | |
+-----+ +-----+ +-----+
RHOSP9 RHOSP10 RHOSP11
RHEL73 RHEL74
We were aware that whilst the goal of skip level upgrades is to give the impression of a single jump between releases, in practice this isn’t possible with OpenStack. Upgrades of OpenStack components are verified by the community across N to N+1 jumps, so whilst we wanted to skip ahead to Ocata we knew we would also have to upgrade through Newton to get there.
The following outlines, at a very high level, the steps we followed during the PoC to upgrade the environment from Mitaka to Ocata:
- Rolling minor update of the underlying OS
- Disable control plane and compute services
- Upgrade a single controller to N+1 and then N+2
  - Update packages
  - Introduce new services as required (nova-placement for example)
  - Update service configuration files
  - Run DB syncs, migrations etc
  - Repeat for N+2
- Upgrade remaining controllers directly to N+2
  - Update packages
  - Introduce new services as required
  - Update service configuration files
- Upgrade remaining hosts to N+2
  - Update packages
  - Update service configuration files
- Enable control plane and compute services
- Verify workload availability during upgrade
- Validate the post upgrade environment
Let’s take a look at each of these steps below in more detail.
Rolling minor update of the underlying OS
This initial rolling minor update moved hosts from RHEL 7.3 to RHEL 7.4, whilst also pulling in OVS from our RHOSP 11 repos in a bid to limit the number of reboots required in the environment. In practice, operators could perform this minor update well ahead of any skip level upgrade, reducing its impact on the overall time required for the upgrade itself.
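As a rough sketch, assuming SSH access to each overcloud node as the heat-admin user (host names illustrative), such a rolling update might look like:
$ for host in overcloud-controller-0 overcloud-novacompute-0; do
    ssh heat-admin@${host} 'sudo yum update -y && sudo reboot' || true
    # wait for the host to return before moving on to the next one
    until ssh -o ConnectTimeout=5 heat-admin@${host} true 2>/dev/null; do sleep 10; done
  done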
Disable control plane and compute services
As listed above under requirements, a full control plane outage is accounted for during the skip level upgrade. Note that this does not include the infrastructure services providing the database, messaging queues etc. Compute services are also stopped at this time but should not have any impact on the running workloads and data plane.
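On each controller this amounts to stopping the OpenStack services while leaving the infrastructure services untouched; a minimal sketch using systemd glob patterns (the exact unit list depends on the deployed roles):
$ sudo systemctl stop 'openstack-*' 'neutron-*' httpd
# mariadb and rabbitmq-server are deliberately left running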
Upgrade a single controller to N+1 and then N+2
The main work of upgrading between releases is carried out on a single controller. Packages are updated, new services such as nova-placement are deployed as required, configuration files updated and DB migrations completed. This process is repeated on this host until we reach the target release.
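A heavily simplified sketch of that per-release loop, with purely illustrative commands (the PoC itself used downstream repos and many more migration steps):
$ for release in newton ocata; do
    # switch the host to the ${release} repos, then update packages
    sudo yum update -y
    # update service configuration files, then run the DB syncs, e.g.
    sudo keystone-manage db_sync
    sudo nova-manage db sync
  done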
Upgrade remaining controllers directly to N+2
Once the single controller has been upgraded to our target release we then skip any remaining controllers ahead to this target release, updating packages, introducing new services and updating configuration files on these controllers as required.
Upgrade remaining hosts to N+2
This is then repeated for any remaining hosts, such as computes, object storage hosts etc. Again this should not interrupt running workloads or the data plane.
Enable control plane and compute services
Once all hosts are updated to the target release the control and compute services are restarted.
Verify workload availability during upgrade
During our PoC we ran multiple instances across various L2 and L3 networks, using Ansible to first launch and then later collect the results of asynchronous jobs (ping, ssh etc) that had been running between these instances during the upgrade.
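As an illustration of the approach, Ansible’s ad-hoc async support can launch a long-running ping ahead of the upgrade and collect its result afterwards (host and address illustrative; the inventory here lists the test instances):
$ ansible test-instance -i inventory -B 14400 -P 0 -m shell -a 'ping -c 14400 192.168.0.16'
# ...run the upgrade, then poll the job id returned above:
$ ansible test-instance -i inventory -m async_status -a 'jid=<job id>'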
Validate the post upgrade environment
Finally Tempest was used to validate the end state of the environment post upgrade.
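For example, a simple smoke run against the upgraded environment, assuming a configured Tempest workspace:
$ tempest run --smoke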
After many iterations, the eventual introduction of Ansible to automate all the things, and much cursing at the amount of time it took to reconfigure the environment after each run, it was agreed that the team would move on to look into a possible implementation of the above in TripleO.
PTG
Thanks for making it this far! Before I end this post I wanted to highlight that I’ve recently agreed to lead the skip level upgrades room at the upcoming Denver PTG. I’ll be posting more details on this shortly to the OpenStack development mailing list but wanted to take this opportunity to encourage anyone interested in this topic to attend and discuss possible ways we can make skip level upgrades a possibility across the various deployment tools within the community.