
Managing OpenStack Object Storage With Chef


WARNING: The Rackspace Private Cloud offering is being updated, and this documentation is obsolete.

This section describes some common OpenStack Object Storage maintenance tasks that you can perform with Chef.

Adding a Node
Managing Drive Failure
Short Failures
Long Failures
Managing Node Failure
Short Node Failures
Long Node Failures

For detailed OpenStack Object Storage troubleshooting and administrative procedures, refer to the OpenStack Swift developer documentation.

Adding a Node

You can add a new OpenStack Object Storage node at any time with the following process; a command-level sketch of the Chef-related steps follows the list.

  1. Decide whether the node will go into an existing zone or a new zone.
  2. Install Ubuntu 12.04 on the node.
  3. Configure Chef on the new node.
  4. Add the appropriate Chef roles to the node.
  5. Set the swift zone node attribute on the new node.
  6. Run chef-client on the new node twice: once to discover the candidate drives and again to mount them.
  7. Run chef-client on the management server to populate the rings.
  8. Run chef-client on the storage and proxy nodes to pull the new ring.
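The sketch below shows one way to drive steps 3 through 8 from the management server, assuming knife is configured there. The role name (role[swift-object-storage]) and the attribute layout mentioned in the comments are illustrative assumptions; use the role and attribute names defined by your cookbooks.

    # Assign the object-storage role to the new node (role name is an assumption)
    $ knife node run_list add new-node.example.com 'role[swift-object-storage]'

    # Set the swift zone attribute on the node; the attribute path is illustrative,
    # e.g. add "swift": { "zone": "3" } while editing the node object
    $ knife node edit new-node.example.com

    # On the new node: first run discovers candidate drives, second run mounts them
    $ chef-client
    $ chef-client

    # On the management server: populate the rings
    $ chef-client

    # On each storage and proxy node: pull the new ring
    $ chef-client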

Managing Drive Failure

swift-drive-audit runs periodically from cron and examines /var/log/kern.log for error messages that pertain to drives. When an error is found, the drive is automatically unmounted, and swift replicates its data objects to the handoff nodes. If the issue is minor and does not require a drive replacement, refer to Short Failures. If the drive does require replacement, refer to Long Failures. For more information, refer to Detecting Failed Drives in the OpenStack documentation.
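A minimal sketch of how the audit is typically wired up is shown below. The cron interval and threshold values are illustrative assumptions rather than the exact values deployed by the Chef cookbooks, although the option names are standard swift-drive-audit settings.

    # Illustrative /etc/cron.d-style entry (the interval is an assumption)
    */30 * * * * root /usr/bin/swift-drive-audit /etc/swift/drive-audit.conf

    # Illustrative /etc/swift/drive-audit.conf
    [drive-audit]
    device_dir = /srv/node     # directory where object drives are mounted
    minutes = 60               # how far back in kern.log to look
    error_limit = 1            # errors allowed before a drive is unmounted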

Short Failures

  1. Unmount the drive.
  2. Examine the drive and resolve the issue, referring to the drive documentation if necessary. You may need to reformat the drive.
  3. Mount the drive again and verify that replication is running. Use the process in Test the OpenStack Object Storage Cluster; a command sketch of this procedure also follows the list.
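A minimal command sketch of the short-failure cycle, assuming the drive is /dev/sdb1, mounted at /srv/node/sdb1, and formatted with XFS; substitute your own device, mount point, and filesystem options.

    $ umount /dev/sdb1

    # ...examine and repair the drive; reformat only if necessary...
    $ mkfs.xfs -f -i size=1024 /dev/sdb1

    # Remount (assumes an existing fstab entry for /srv/node/sdb1)
    $ mount /srv/node/sdb1

    # Confirm the replicators are running again, for example with swift-recon
    # (requires the recon middleware) or by checking the replicator log entries
    $ swift-recon --replication
    $ grep object-replicator /var/log/syslog | tail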

Long Failures

If the drive failure is serious, follow this procedure to remove the drive from the ring and rebalance the rings.

Note: Depending on the configuration of your swift cluster, the failed drive may not be in all three rings. Remove the failed drive from each ring in which it is present.

  1. Log in to the management server and switch to root with sudo -i.
  2. Get the device ID of the failed drive from each ring. Running swift-ring-builder with only the builder file lists all devices in that ring.
    $ swift-ring-builder /etc/swift/ring-workspace/rings/container.builder
    $ swift-ring-builder /etc/swift/ring-workspace/rings/object.builder
    $ swift-ring-builder /etc/swift/ring-workspace/rings/account.builder
    
  3. Remove the failed drive from each ring, specifying the failed node's IP address and the device ID (the node's port can also be included in the search value).
    $ swift-ring-builder /etc/swift/ring-workspace/rings/container.builder \
      remove failed_node_IP/device_ID
    $ swift-ring-builder /etc/swift/ring-workspace/rings/object.builder \
      remove failed_node_IP/device_ID
    $ swift-ring-builder /etc/swift/ring-workspace/rings/account.builder \
      remove failed_node_IP/device_ID
    
  4. Rebalance the rings.
    $ swift-ring-builder /etc/swift/ring-workspace/rings/container.builder \
      rebalance
    $ swift-ring-builder /etc/swift/ring-workspace/rings/object.builder \
      rebalance
    $ swift-ring-builder /etc/swift/ring-workspace/rings/account.builder \
      rebalance
    
  5. Push the ring.
    $ cd /etc/swift/ring-workspace/rings
    $ git commit -a -m "ring update"
    $ git push
  6. Redistribute the ring to the storage and proxy nodes by running chef-client; see the example below.
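One way to trigger those chef-client runs from the management server is knife ssh. The role names in the search query below are assumptions and depend on how your cookbooks name the swift roles.

    # Role names are assumptions -- adjust the search to match your environment
    $ knife ssh "role:swift-object-storage OR role:swift-proxy" "sudo chef-client"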

Managing Node Failure

In most cases, node failures are short events such as reboots. OpenStack Object Storage replication handles object replication and storage through the handoff nodes, and no action needs to be taken to maintain the consistency of the cluster.

OpenStack Object Storage replicates deletions by writing a tombstone file, just as it replicates object creation. By default, reclaim_age is set to 86400 seconds (one day), after which tombstone files are removed. If a storage node will be offline for more than a day, the node should be removed from the cluster until it can be brought back online. This preserves data consistency.

For example, a file could be deleted on one node just before that node fails, so the tombstone file is never replicated to the other members of the swift cluster. If the failed node is brought back online only after the reclaim_age period has passed, the deletion may never be propagated correctly, leaving the cluster with inconsistent copies of the object.
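The reclaim_age value lives in the replicator sections of the swift server configuration files; a minimal sketch, assuming the stock file locations, looks like the following (the same option exists for the account and container replicators).

    # /etc/swift/object-server.conf
    [object-replicator]
    reclaim_age = 86400    # seconds a tombstone is kept before it is reclaimed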

Short Node Failures

In most cases the best course of action is simply to get the server back online. If the node will be down for more than one day, refer to Long Node Failures. No modification of the swift ring is required as long as drives are not replaced.

Long Node Failures

If the storage node will be down for more than one day, you should remove the node from the ring.

  1. List all devices associated with the failed storage node in each ring.
    $ cd /etc/swift/ring-workspace/rings
    $ swift-ring-builder /etc/swift/ring-workspace/rings/account.builder \
      search failed_node_IP
    $ swift-ring-builder /etc/swift/ring-workspace/rings/container.builder \
      search failed_node_IP
    $ swift-ring-builder /etc/swift/ring-workspace/rings/object.builder search \
      failed_node_IP
    
  2. Remove all devices associated with the failed storage node from each ring.
    $ swift-ring-builder /etc/swift/ring-workspace/rings/account.builder \
      remove failed_node_IP
    $ swift-ring-builder /etc/swift/ring-workspace/rings/container.builder \
      remove failed_node_IP
    $ swift-ring-builder /etc/swift/ring-workspace/rings/object.builder \
      remove failed_node_IP
    
  3. Rebalance the rings.
    $ swift-ring-builder /etc/swift/ring-workspace/rings/account.builder \
      rebalance
    $ swift-ring-builder /etc/swift/ring-workspace/rings/container.builder \
      rebalance
    $ swift-ring-builder /etc/swift/ring-workspace/rings/object.builder \
      rebalance
    
  4. Push the ring.
    $ cd /etc/swift/ring-workspace/rings
    $ git commit -a -m "ring update"
    $ git push
    
  5. Distribute the ring to the storage and proxy nodes by running chef-client. When the node has been recovered, you will need to add the devices on the node back into the ring, as described in the remaining steps.

    Note: If you reinstalled the operating system on the node, wipe the drives and start from scratch.

  6. Run chef-client on the management server.
    $ chef-client
    
  7. Inspect the ring generation script and verify that the recovered node's devices will be added back to the rings.
    $ cd /etc/swift/ring-workspace
    $ cat ./generate-rings.sh
    
  8. Generate the new rings. The sed command comments out the script's exit 0 guard so that it runs to completion. (An illustrative swift-ring-builder add command is shown after this procedure.)
    $ sed -i 's/exit 0/# exit 0/g' generate-rings.sh
    $ ./generate-rings.sh
    
  9. Push the rings.
    $ cd /etc/swift/ring-workspace/rings
    $ git commit -a -m "ring update"
    $ git push
    
  10. Redistribute the ring to the storage and proxy nodes by running chef-client.
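For reference, the generated script typically wraps standard swift-ring-builder add commands. The sketch below shows an illustrative add for a single recovered device; the zone, IP address, port, device name, and weight are all placeholders, and the real values come from the Chef-generated script.

    # Illustrative only -- the actual commands are produced by generate-rings.sh
    $ swift-ring-builder /etc/swift/ring-workspace/rings/object.builder \
      add z1-recovered_node_IP:6000/sdb1 100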





