Rackspace Private Cloud Backup and Recovery


This guide describes backup and recovery procedures for a Rackspace Private Cloud installation.

1. Introduction
Intended Audience
Document Change History
Additional Resources
Contact Rackspace
2. Backup and Recovery with backup-manager
Ubuntu's backup-manager
Backing Up the Controller Node
Backing Up the Compute Node
Recovery
Recovering the Controller Node
Recovering the Compute Node
3. Advanced Controller Node Backup and Recovery
Configure the Standby Node
Capturing Chef State
Synchronizing the Image State
Synchronizing the MySQL State
Failover

Chapter 1. Introduction

Table of Contents

Intended Audience
Document Change History
Additional Resources
Contact Rackspace

Rackspace has developed Rackspace Private Cloud Software, a fast, free, and easy way to download and install a Rackspace Private Cloud powered by OpenStack in any data center. Rackspace Private Cloud Software is suitable for anyone who wants to install a stable, tested, and supportable OpenStack private cloud, and can be used for all scenarios from initial evaluations to production deployments.

Rackspace Private Cloud Software v 2.0 supports the Folsom release of OpenStack.

Intended Audience

This document provides guidance for backing up the OpenStack cloud that is created with Rackspace Private Cloud Software, and for recovering the controller and compute nodes in the event of failure.

To use this document, you should have prior knowledge of OpenStack and cloud computing, basic Linux administration skills, familiarity with Opscode's Chef, and experience with Rackspace Private Cloud Software.

Document Change History

This version of Rackspace Private Cloud Backup and Recovery replaces and obsoletes all previous versions. The most recent changes are described in the table below:

Revision Date Summary of Changes
August 15, 2012
  • Rackspace Private Cloud Software v 1.0
November 15, 2012
  • Rackspace Private Cloud Software v 2.0
  • Added "Advanced Controller Node Backup and Recovery".

Additional Resources

Contact Rackspace

For more information about sales and support, contact us at . For feedback on the documentation, contact us at , or leave a comment at the Knowledge Center.

Chapter 2. Backup and Recovery with backup-manager

Table of Contents

Ubuntu's backup-manager
Backing Up the Controller Node
Backing Up the Compute Node
Recovery
Recovering the Controller Node
Recovering the Compute Node

After you have installed Rackspace Private Cloud Software, there are no backups configured for your cluster. If you want to back up your cluster, this document provides guidance on performing a backup.

Note that this document does not attempt to cover broader problems of instance failure and recovery, and it does not address backup policy questions, such as appropriate retention policies and rotations.

The following directories contain crucial components for backup:

  • /var/lib/glance/images
  • /var/lib/chef/backups
  • /etc/mysql

You must also back up the following MySQL databases:

  • mysql
  • nova
  • glance
  • dash
  • keystone

Ubuntu's backup-manager

Ubuntu includes a simple tool called backup-manager. Any backup tool compatible with Ubuntu 12.04 can be used, but backup-manager is relatively simple to use for users already familiar with UNIX and bash scripts.

The following procedure will configure backup-manager to use the script that you will create to back up the controller node.

  1. Log into the controller node and switch to root access with sudo -i. You will need root access for all of the procedures in this chapter.
  2. Install and launch backup-manager:
    $ apt-get install -y backup-manager
    
  3. You will be prompted to provide a directory in which to store the backup-manager archives. You may accept the default of /var/archives.
  4. Choose root as the owner user of the repository.
  5. Choose root as the owner group of the repository.
  6. Designate the following directories for backup:
    • /var/lib/chef/backups
    • /var/lib/glance/images
    • /etc/mysql
  7. In the /etc/backup-manager.conf file, edit the following variables to match these settings:
    export BM_PRE_BACKUP_COMMAND="/usr/local/bin/chef-backup.sh"
    export BM_MYSQL_DATABASES="nova glance keystone dash mysql"
    export BM_MYSQL_ADMINPASS=[root password defined in /root/.my.cnf]
    export BM_ARCHIVE_METHOD="tarball mysql"
    

You can now edit the backup-manager configuration file to suit your retention policy and upload requirements. For example, you can create a simple data redundancy plan by uploading the backup to a secondary server that is accessible via SSH. Refer to the backup-manager documentation for more information about configuring the backup.

By default, backup-manager automatically executes nightly. You can also generate a backup manually.

Note: Because all image files are backed up, these backups can be quite large. Ensure that you have enough space.
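As one example of the upload configuration mentioned above, backup-manager can push its archives to a secondary server over SSH. The variables below follow the stock /etc/backup-manager.conf naming; the host name, user, and destination path are placeholder assumptions for illustration:

```shell
# Sketch of an SSH upload section for /etc/backup-manager.conf.
# backup.example.com, bmuser, and the destination path are placeholders.
export BM_UPLOAD_METHOD="ssh"
export BM_UPLOAD_SSH_USER="bmuser"
export BM_UPLOAD_SSH_KEY="/root/.ssh/id_rsa"
export BM_UPLOAD_SSH_HOSTS="backup.example.com"
export BM_UPLOAD_SSH_DESTINATION="/var/archives/controller01"
```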

Backing Up the Controller Node

In a Rackspace Private Cloud installation, the controller node houses all the configuration information for the cluster, all OpenStack databases, and all images. To back up the configuration data, follow this procedure.

  1. Create a script file in /usr/local/bin/chef-backup.sh and include the following:
     #!/bin/bash
     BACKUP_DIR=${BACKUP_DIR:-/var/lib/chef/backups}
     set -e

     topics="node environment" # "client role cookbook"
     declare -A flags
     flags=([default]=-Fj [node]=-lFj)

     for topic in $topics; do
         OUT_DIR=${BACKUP_DIR}/${topic}
         flag=${flags[${topic}]:-${flags[default]}}
         rm -rf ${OUT_DIR}
         mkdir -p ${OUT_DIR}

         echo "Dumping $topic data"
         for item in $(knife ${topic} list | awk '{print $1; }'); do
             if [ "$topic" != "cookbook" ]; then
                 knife ${topic} show $flag $item > ${OUT_DIR}/${item}.js
             else
                 knife cookbook download $item -N --force -d $OUT_DIR
             fi
         done
     done
    

    This script will place the configuration data for your cluster in the directory that you specify in the BACKUP_DIR environment variable. By default, it will choose /var/lib/chef/backups.

  2. Make the script executable with chmod +x /usr/local/bin/chef-backup.sh, then run it and confirm that files appear in the backup directory.
  3. If the script ran successfully, you may now run backup-manager to back up your controller node to your desired backup destination, or wait for backup-manager to execute automatically.

You can also set up a cron job to schedule backup-manager. Run the command crontab -u root -e and enter a cron job specifier. The following sample specifier would run backup-manager every night at midnight.

@midnight /usr/bin/backup-manager                    

Backing Up the Compute Node

The only information unique to the compute nodes is the disks for running instances. The instances can be backed up with the OpenStack snapshot tools in the Horizon dashboard and in the nova-compute API. You can also install backup tools inside the instances themselves.
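As a sketch of the snapshot approach, the loop below uses the nova CLI to snapshot every ACTIVE instance. It assumes OpenStack credentials (OS_USERNAME, OS_PASSWORD, OS_TENANT_NAME, OS_AUTH_URL) are already exported in the environment, and the naming convention is an illustration only:

```shell
# Snapshot every ACTIVE instance through the nova CLI (a sketch).
snapshot_name() {
    # Illustrative naming convention: <instance-id>-backup-<YYYYMMDD>
    echo "$1-backup-$(date +%Y%m%d)"
}

# nova list prints a table; the second column of ACTIVE rows is the ID.
for id in $(nova list 2>/dev/null | awk '/ACTIVE/ {print $2}'); do
    nova image-create "$id" "$(snapshot_name "$id")"
done
```

Like any snapshot scheme, this captures disk state only; schedule it from cron in the same way as the backup-manager job if you want it to run unattended.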

Recovery

For the recovery process, you will reinstall the components with the Rackspace Private Cloud Software ISO. Before you begin, ensure that you have the correct networking information:

  • The IP address that you want to assign to each controller and compute node. This can be an IPv4 address in the format xxx.xxx.xxx.xxx or a CIDR range, and each node must be able to access the internet.
  • Network subnet mask.
  • Network default gateway. This address is usually a xxx.xxx.xxx.1 address.
  • The server host name. You may be able to define this yourself, or you may need to contact your network administrator for the name.
  • Fully-qualified domain name for the host.
  • The address for the nova fixed network, in CIDR format. Instances created in the OpenStack cluster will have IP addresses in this range.
  • Optional DMZ network address. This address is also in CIDR format. Specifying a DMZ enables network traffic between instances and resources outside of the nova fixed network without network address translation. For example, if the nova fixed network is 10.1.0.0/16 and you specify a DMZ of 10.2.0.0/16, any devices or hosts in that range will be able to communicate with the instances on the nova fixed network.
  • A password for an admin OpenStack user.
  • A password for a non-admin OpenStack user, as well as a username if you do not want to use the default of demo.
  • A full real name, username, and password for an operating system user.

Recovering the Controller Node

  1. Use the ISO to re-install the controller node.
  2. Log into the controller node and switch to root access with sudo -i.
  3. Restore all backed up files to their appropriate locations.
  4. Use the following script to restore the chef server contents after /var/lib/chef/backups is restored:
    #!/bin/bash                                                                                    
    BACKUP_DIR=${BACKUP_DIR:-/var/lib/chef/backups}
    #restore chef node attributes and environment overrides
    for n in $(ls ${BACKUP_DIR}/{node,environment}/*.js | 
        grep -v '_default.js$'); do
        knife $(basename $(dirname "${n}")) from file "${n}"
    done
    
  5. Restore all databases. If you are using backup-manager, use the following script to restore the MySQL databases with no user intervention. Set the BACKUP_DIR environment variable to the directory that contains the backup-manager SQL archives (/var/archives by default).
    #!/bin/bash
    BACKUP_DIR=${BACKUP_DIR:-/var/archives}
    cd "$BACKUP_DIR"
    # Restore the mysql database's own dump last so that restored
    # grant tables do not interfere with the remaining imports.
    for sql in $(ls *.sql.bz2 | perl -ne '{ if (m/-mysql\./) 
        { $line = $_} else {print;}} END {print $line}'); do
        db=$(echo $sql | cut -d- -f2 | cut -d. -f1)
        bzcat "$sql" | mysql "$db"
    done
    cd -
    
  6. Restart MySQL. Do NOT run chef-client before MySQL has been restarted. Doing so could cause data loss.
  7. Run chef-client on the controller node.
  8. Delete the client certificate on all compute nodes (located in /etc/chef/client.pem).
  9. Rerun chef-client on all compute nodes.
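The restore ordering enforced by the perl filter in step 5 of the procedure above can be expressed in plain bash for clarity: every other database dump is emitted first, and the dump of the mysql database itself last. The file names here are illustrative, not backup-manager's exact naming:

```shell
# Print dump file names in restore order: hold back any file whose name
# contains "-mysql." and emit it last (mirrors the perl filter's logic).
order_dumps() {
    local last="" f
    for f in "$@"; do
        case "$f" in
            *-mysql.*) last="$f" ;;
            *)         echo "$f" ;;
        esac
    done
    [ -n "$last" ] && echo "$last"
}
```

For example, `order_dumps host-nova.sql.bz2 host-mysql.sql.bz2 host-glance.sql.bz2` prints the nova and glance dumps first and the mysql dump last.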

Recovering the Compute Node

Before restoring compute nodes, you must remove existing compute node data from the controller node.

  1. Log into the controller node and switch to root access with sudo -i.
  2. Execute the following command to remove compute node data:
    $ knife client delete name_of_compute_node
    
  3. Use the ISO to re-install the compute node.

You can now re-create the instances. Note that when a compute node fails, all instance data is lost, so you must restore the instance data from configuration management, other backup recovery methods, or deployment of snapshots. IP addresses on instances will not be stable, so some reconfiguration may be necessary.

Chapter 3. Advanced Controller Node Backup and Recovery

Table of Contents

Configure the Standby Node
Capturing Chef State
Synchronizing the Image State
Synchronizing the MySQL State
Failover

Rackspace Private Cloud Software installs all "control plane" services on the controller node, including but not limited to:

  • All API services
  • The MySQL database that OpenStack uses to maintain information about the state of the clusters
  • All images uploaded to Glance

You can reduce recovery times and potential data loss windows by configuring a standby server for the controller node. This does not constitute a true "high-availability" solution, but increases application layer resilience.

In this chapter, the main controller node will be referred to as the "active node", and the backup controller node as the "standby node". The configuration process includes the following stages: configuring the standby node, capturing the chef state, synchronizing the image state, and synchronizing the MySQL state.

In the event of failure, follow the steps in the Failover procedure to switch to the standby node.

Configure the Standby Node

Install Rackspace Private Cloud software on the device that you want to use as a standby node.

  1. Boot the ISO on the standby node.
  2. After the ISO has launched and loaded, accept the EULA statement.
  3. Select Controller.
  4. Enter the NIC address. If you have more than one, you must designate one as public and one as private.
  5. When prompted, enter the node IP address, subnet mask, gateway, name server, and host name. Use the same host name as that of the active controller node.
  6. Enter the address for the nova fixed network.
  7. If you want to configure a DMZ network, enter the DMZ address and the DMZ gateway address. Be sure that you have at least two NICs on the server.
  8. Enter a password for the admin user. You will use this admin username and password to access the API and the dashboard.
  9. For the additional non-admin user, accept the default demo or enter your own username, and provide a password at the prompt. This user will not have admin privileges, but will be able to perform basic OpenStack functions, such as creating instances from images. Creating the user will also automatically create a project (also known as a tenant) for this user.
  10. Enter the real name, user name, and password for the operating system user account. For example, the user Jane Doe would enter the following information:
    • Full name for the new user: Jane Doe
    • Username for your account: jdoe
    • Password: mysecurepassword

    At this point, it will take approximately 5-10 minutes for the Ubuntu operating system installation to complete.

  11. If you have a proxy, enter the proxy URL at the prompt in the format http://proxy_ip_address:proxy_ip_port. If you do not have a proxy, press enter to skip this step and leave the proxy information blank.

At this point, the installation process will run for approximately 30 minutes without the need for user intervention. The device will reboot during the installation process. You will see a screen with the Rackspace Private Cloud logo, followed by a screen that displays a progress bar; you can use Ctrl+Alt+F2 to toggle between the progress bar screen and a Linux TTY screen (Ctrl+Alt+Fn+F2 on a Mac). You can follow the log during installation by switching to the correct TTY screen and viewing the log in /var/log/post-install.log.

After the installation is complete, you can view the install log by logging into the operating system with the username and password that you configured in Step 10. The log is stored in /var/log/post-install.log.

Capturing Chef State

Whenever a node is added to your cloud, you should back up the configuration data.

Create a script file in /usr/local/bin/chef-backup.sh and include the following:

 #!/bin/bash
 BACKUP_DIR=${BACKUP_DIR:-/var/lib/chef/backups}
 set -e

 topics="node environment" # "client role cookbook"
 declare -A flags
 flags=([default]=-Fj [node]=-lFj)

 for topic in $topics; do
     OUT_DIR=${BACKUP_DIR}/${topic}
     flag=${flags[${topic}]:-${flags[default]}}
     rm -rf ${OUT_DIR}
     mkdir -p ${OUT_DIR}

     echo "Dumping $topic data"
     for item in $(knife ${topic} list | awk '{print $1; }'); do
         if [ "$topic" != "cookbook" ]; then
             knife ${topic} show $flag $item > ${OUT_DIR}/${item}.js
         else
             knife cookbook download $item -N --force -d $OUT_DIR
         fi
     done
 done

This script will place the configuration data for your cluster in the directory that you specify in the BACKUP_DIR environment variable. By default, it will choose /var/lib/chef/backups. Ensure that this script has executed. If it has done so successfully, you may now run backup-manager to back up your controller node to your desired backup destination.

Synchronizing the Image State

The most robust mechanism for protecting your images is to configure an OpenStack Object Storage (Swift) cluster and configure Glance to store images in that cluster. If this is not a viable option, you can use rsync to copy the images to your standby node.

This section describes the rsync method in detail. For information about using OpenStack Storage, refer to Rackspace Private Cloud Software OpenStack Storage Installation.
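If you do move image storage into Swift, the change is made in the Glance API configuration. The fragment below shows the Folsom-era option names; the auth address, account, user, and key values are placeholders and your deployment's values will differ:

```ini
# Fragment of /etc/glance/glance-api.conf (Folsom-era option names;
# the auth address, account, user, and key are placeholder assumptions).
default_store = swift
swift_store_auth_address = http://keystone_IP:5000/v2.0/
swift_store_user = service:glance
swift_store_key = glance_password
swift_store_create_container_on_put = True
```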

  1. Connect to the active node via ssh and use sudo -i to switch to root access.
  2. Verify that rsync is installed. If it is not, install it with apt-get install rsync.
  3. Use cat to obtain the contents of the public key.
    $ cat /root/.ssh/id_rsa.pub               

    The command will output the public key in a string similar to the following:

    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJUXnQwaTpw5Gj072PHF
    jxD6Av3gSdDYx1blyNB/L3CA52tvRGWwwwFzbrbqHWE+VpYgeoiL6ePul5H
    W4ENG1QYxkc6xTpWNHfM4lZNHOXEguxRDhM5W0MAAlO9tr62NETe4AvpUtI
    NskwdCWkthyt0c+jG0pW4FxuHFfdrF2S55pL4Sfh1SkDGEicCbpPtcvFXc0
    /aIRgB9/coDE2SEsCiMQDcCfKZR/tWmezDmTY0dAE2qsSPIw75QzCySujbs
    4t+rP8/mrjYqo0urYbYlhV7zvcoZNrgbaxciZJ2NXzh253Yy2NN9Wp9QAix
    lCOLAqChPoTZah9iwYHchwy+Q4d root@controller01
    

    Save this output by whatever means you prefer (copy-pasting it to a text editor, etc.).

  4. Connect to the standby node via ssh and use sudo -i to switch to root access.
  5. Verify that rsync is installed. If it is not, install it with apt-get install rsync.
  6. Open the standby node's /root/.ssh/authorized_keys file in your preferred editor and paste in the active node's id_rsa.pub key as a single line, with no line breaks.
  7. Connect to the active node again and use the following command to verify that rsync is working correctly. Replace standby_node_IP with the IP address of the standby node. Note the trailing slash on the source path, which copies the directory's contents rather than nesting the directory inside the target.
    $ rsync -av /var/lib/glance/images/ root@standby_node_IP:/var/lib/glance/images
    
    

    The command should produce output similar to the following:

    sending incremental file list
    ./
    4ba6e796-a69d-4c61-b9f3-e8c398b1aa5b
    58ee77f9-6028-47d3-8f3c-5ee1deec128b
    929e52b3-d37f-4cfa-9578-73dab283bafd                                
    

This output shows that the synchronization between the active node and the standby node was successful. You will now need to create a cron job to automate the synchronization. Because rsync copies only new or changed files, the initial transfer may be slow, but subsequent synchronization runs will complete more quickly.

In the following procedure, rsync is configured to copy chef backup information in addition to the Glance images, and to run every five minutes.

  1. On the standby node, create a chef backups directory.
    $ mkdir -p /var/lib/chef/backups
    
  2. On the active node, create a script file in /usr/local/bin/rsync_job.sh and include the following, replacing standby_node_IP with the standby node's IP address. The flock guard prevents overlapping runs if a previous synchronization has not yet finished.
     #!/bin/bash
     # Skip this run if a previous rsync_job.sh still holds the lock.
     exec 9>/var/lock/rsync_job.lock
     flock -n 9 || exit 0
     rsync -av /var/lib/glance/images/ root@standby_node_IP:/var/lib/glance/images
     rsync -av --delete /var/lib/chef/backups/ root@standby_node_IP:/var/lib/chef/backups
    
    
  3. On the active node, edit the access permissions to ensure that the script is executable.
    $ chmod +rx /usr/local/bin/rsync_job.sh
    
  4. On the active node, set up the cron job with the command crontab -u root -e. Enter the following cron job specifier to make the rsync job run every five minutes:
    */5 * * * * /usr/local/bin/rsync_job.sh
    

In this configuration, images deleted on the active node are not deleted on the standby node. This means that you can retrieve an image from the standby node if it is accidentally deleted from the active node, but if you are frequently adding and deleting images, the undeleted images can take up a lot of disk space. To automatically remove images from the standby node when they are deleted from the active node, add a --delete flag to the rsync images command in the script file:

 #!/bin/bash
 # Skip this run if a previous rsync_job.sh still holds the lock.
 exec 9>/var/lock/rsync_job.lock
 flock -n 9 || exit 0
 rsync -av --delete /var/lib/glance/images/ root@standby_node_IP:/var/lib/glance/images
 rsync -av --delete /var/lib/chef/backups/ root@standby_node_IP:/var/lib/chef/backups

To ensure that the latest chef state is backed up as well, add a chef backup script to rsync_job.sh.

#!/bin/bash
BACKUP_DIR=${BACKUP_DIR:-/var/lib/chef/backups}
set -e

topics="node environment" # "client role cookbook"
declare -A flags
flags=([default]=-Fj [node]=-lFj)

for topic in $topics; do
    OUT_DIR=${BACKUP_DIR}/${topic}
    flag=${flags[${topic}]:-${flags[default]}}
    rm -rf ${OUT_DIR}
    mkdir -p ${OUT_DIR}

    echo "Dumping $topic data"
    for item in $(knife ${topic} list | awk '{print $1; }'); do
        if [ "$topic" != "cookbook" ]; then
            knife ${topic} show $flag $item > ${OUT_DIR}/${item}.js
        else
            knife cookbook download $item -N --force -d $OUT_DIR
        fi
    done
done

# Skip the transfer if a previous rsync_job.sh still holds the lock.
exec 9>/var/lock/rsync_job.lock
flock -n 9 || exit 0
rsync -av --delete /var/lib/glance/images/ root@standby_node_IP:/var/lib/glance/images
rsync -av --delete /var/lib/chef/backups/ root@standby_node_IP:/var/lib/chef/backups

Synchronizing the MySQL State

In addition to Glance and chef information, you must also ensure that your MySQL state is in sync between the active node and the standby node.

Note: Running the FLUSH TABLES command will be disruptive. The control plane of OpenStack will be inoperable until that step is complete.

  1. Perform a backup on the active node as described in Backing Up the Controller Node.
  2. Using that backup, run steps 2-4 of Recovering the Controller Node on the standby node. The recovery of /etc/mysql is the most important element.
  3. On the active node, create a configuration file at /etc/mysql/conf.d/replication.cnf and include the following content:
    [mysqld]
    log-bin=mysql-bin
    server-id=1
    
  4. Run the command restart mysql on the active node.
  5. Run mysql on the active node and enter the following at the prompts, replacing standby_password and standby_node_IP with the password and IP address for the standby node.
    mysql> CREATE USER 'repl'@'standby_node_IP' IDENTIFIED BY 'standby_password';
    mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'standby_node_IP';
    
  6. On the standby node, create a configuration file at /etc/mysql/conf.d/replication.cnf and include the following content:
    [mysqld]
    log-bin=mysql-bin
    server-id=2
    
  7. Run restart mysql on the standby node. Leave the ssh session to the standby node open.
  8. Create two new ssh sessions to the active node.
  9. In the first ssh session on the active node, run mysql. At the prompt, enter the command FLUSH TABLES WITH READ LOCK;. Leave the session open at the mysql prompt.
    mysql> FLUSH TABLES WITH READ LOCK;
    mysql> 
    
  10. In the second ssh session on the active node, run the following command:
    $ mysqldump --all-databases --master-data >dbdump.db
    
  11. When this command has completed, switch back to the first ssh session and enter UNLOCK TABLES; at the mysql prompt. Exit mysql.
    mysql> UNLOCK TABLES;
    mysql> exit
    
  12. Return to the second ssh session and issue the following command to transfer the dbdump.db file to the standby node, replacing standby_node_IP with the standby node's IP address.
    $ scp dbdump.db root@standby_node_IP:
    
  13. On the standby node, run the following command.
    $ grep 'CHANGE MASTER TO MASTER_LOG' dbdump.db
    

    This command will return a statement that includes the filename of the master log file and a master log position.

  14. Run the following set of commands to initiate the replication process. The standby_password is the one used for the user repl in step 5, and the grep_log_file and grep_position values are the filename and position returned by the grep command in step 13.
    $ mysql < /root/dbdump.db
    $ mysql -e "CHANGE MASTER TO MASTER_HOST='active_node_IP', \
        MASTER_USER='repl', MASTER_PASSWORD='standby_password', \
        MASTER_LOG_FILE='grep_log_file', MASTER_LOG_POS=grep_position;"
    $ knife node edit `hostname -f`
    $ vim /root/.my.cnf    # Replace password
    $ mysql -e 'START SLAVE;'

The MySQL replication is now configured on the active and standby nodes. For more information about MySQL replication, refer to "How to Set Up Replication" in the MySQL documentation.
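Once replication is running, SHOW SLAVE STATUS on the standby node should report both Slave_IO_Running and Slave_SQL_Running as Yes. The small helper below is a sketch of a health check; feed it the real output of mysql -e 'SHOW SLAVE STATUS\G' on the standby node:

```shell
# Returns success only when both replication threads report "Yes".
# Intended input: the output of  mysql -e 'SHOW SLAVE STATUS\G'
replication_ok() {
    [ "$(echo "$1" | grep -c 'Running: Yes')" -eq 2 ]
}
```

For example: `replication_ok "$(mysql -e 'SHOW SLAVE STATUS\G')" && echo "replication healthy"`.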

Failover

In the event that the active node fails and you need to switch to the standby node, follow this procedure.

  1. Power down the active node.
  2. Use the following script to restore the chef server contents on the standby node:
    #!/bin/bash
    BACKUP_DIR=${BACKUP_DIR:-/var/lib/chef/backups}
    
    #restore chef node attributes and environment overrides
    for n in ${BACKUP_DIR}/{node,environment}/*.js; do
      knife $(basename $(dirname "${n}")) from file "${n}"
    done
    
  3. Assign the IP address of the formerly active node to the standby node. At this point, all services should begin working.
  4. Open the /etc/hosts file in a text editor and comment out the line that binds the hostname to the standby node IP address. For example, in a configuration where the standby node's IP address is 172.16.137.10 and the hostname is standby.myhost.com, the line would look like this:
    # 172.16.137.10 standby.myhost.com
    
  5. Run chef-client on the node.
  6. Delete the client.pem file from all the nodes in the cluster and run chef-client on all compute nodes.

© 2011-2013 Rackspace US, Inc.

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License


See license specifics and DISCLAIMER