Using Swiftly to upload an image to be imported


Prerequisites

The steps in this article assume that you have already prepared an image for import into the public cloud and that you have already installed the swiftly Cloud Files client. If you need help with either of these prerequisites, please consult the relevant Knowledge Center articles.

Uploading your image

Understanding how Large Objects are stored in Cloud Files

To provide high availability and data resiliency, Cloud Files limits the size of any single file to 5GB. That is smaller than most VM images, and smaller than many other things people may want to store. To get around this, a Large Object (any file larger than 5GB) must be split into segments that are bound together by a manifest. There are two types of manifest objects in Cloud Files: Dynamic Large Objects and Static Large Objects. For image uploads, we recommend that you use a Static Large Object, so that's what we'll focus on here.

With a Static Large Object, the manifest is an explicit listing of the size, MD5 checksum, and location of each segment that makes up the Large Object. Fortunately, the swiftly tool will divide your local file into segments, upload the segments in parallel, and create a manifest for you automatically, so we won't have to discuss those details here. (If you're interested, you can read all about Cloud Files Large Objects in the Cloud Files API documentation.)

Set some environment variables

Do the following in a bash shell:

CF_USERNAME=       # your Rackspace cloud username
CF_API_KEY=        # your Rackspace cloud API key
CF_REGION=         # 3 char region code for where you're uploading (e.g., ORD)
SOURCEFILE=        # the local file you are uploading
CONTAINER=         # the container in Cloud Files where the image will go
IMAGEFILENAME=     # the name you want the image to be called in Cloud Files
SWFLY_SEG_BYTES=134217728    # segment size in bytes (128MB)
SWFLY_CONCURRENCY=20         # maximum number of parallel upload threads
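
Because the listing above leaves most values blank for you to fill in, it can be worth checking that nothing was missed before you start a long upload. The following is a minimal sketch of such a check; the function name check_upload_env is made up for this illustration and is not part of swiftly.

```shell
# Hypothetical helper (not part of swiftly): confirm that every variable
# used in this article is set and that the source file is readable.
check_upload_env() {
    local missing name value
    missing=0
    for name in CF_USERNAME CF_API_KEY CF_REGION SOURCEFILE CONTAINER \
                IMAGEFILENAME SWFLY_SEG_BYTES SWFLY_CONCURRENCY; do
        # Look up the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    # swiftly cannot segment a file that it cannot read
    if [ -n "${SOURCEFILE:-}" ] && [ ! -r "$SOURCEFILE" ]; then
        echo "cannot read: $SOURCEFILE" >&2
        missing=1
    fi
    return "$missing"
}
```

The function returns a non-zero status and names each problem on standard error, so you can run it on its own line and only continue to the upload if it succeeds.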

Make sure that the container into which you want to upload your object already exists in the appropriate region in Cloud Files. (You can create it in the Cloud Control Panel, if necessary.)

As mentioned earlier, swiftly automatically segments your image file and uploads the segments in parallel. The two SWFLY environment variables in the preceding example control this behavior.

  • SWFLY_SEG_BYTES specifies the size, in bytes, that swiftly will use for each segment (except for the last segment, which could be smaller). The value above is 128MB expressed in bytes (128 × 1024 × 1024), and it's the value that Cloud Files engineers suggest for this purpose. You can experiment with different values to see if you get better performance, but we don't recommend going smaller than this (and certainly no smaller than 100MB), and you probably shouldn't go larger than 1GB.
  • SWFLY_CONCURRENCY specifies the maximum number of parallel threads that swiftly will use to upload the object. The value above, 20, is the one suggested by the author of swiftly. You may want to experiment a bit, but keep in mind that if you set it too high, your parallel uploads may saturate your network card and actually slow down the overall file transfer.
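
To get a feel for what a given segment size means in practice, the arithmetic can be sketched in shell. The helper name segment_plan is made up for this illustration, and the file size used is only an example value; swiftly performs this division for you during the upload.

```shell
# Hypothetical helper: how many segments does a file of a given size produce,
# and how big is the last one?
# Usage: segment_plan FILE_BYTES SEGMENT_BYTES
segment_plan() {
    local file_bytes seg_bytes count remainder last
    file_bytes=$1
    seg_bytes=$2
    # Ceiling division: all the full segments, plus one more for any remainder
    count=$(( (file_bytes + seg_bytes - 1) / seg_bytes ))
    remainder=$(( file_bytes % seg_bytes ))
    # If the file divides evenly, the last segment is a full-size one
    if [ "$remainder" -eq 0 ]; then
        last=$seg_bytes
    else
        last=$remainder
    fi
    echo "segments=$count last_segment_bytes=$last"
}

# Example: a 2584576512-byte image with 128MB segments
segment_plan 2584576512 134217728
# prints: segments=20 last_segment_bytes=34439680
```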

Invoking Swiftly

After you've got your environment variables set, you can invoke swiftly from the command line to perform the upload. (You may want to do this in a screen session. If you're not familiar with the GNU screen program, you can find a quick introduction in the article: Installing the Swiftly Cloud Files Client.)

swiftly \
  --auth-url=https://identity.api.rackspacecloud.com/v2.0 \
  --auth-user=$CF_USERNAME \
  --auth-key=$CF_API_KEY \
  --region=$CF_REGION \
  --concurrency=$SWFLY_CONCURRENCY \
  put \
    --segment-size=s${SWFLY_SEG_BYTES} \
    --input=$SOURCEFILE \
  ${CONTAINER}/${IMAGEFILENAME}

Depending on your use case, you may be importing from a cloud server that's already in the Rackspace open cloud. If that's the case, you'll want to add the '--snet' option to the command so that the file will be transferred over the internal cloud network. Additionally, if you want swiftly to keep you fully notified about what it's doing as it uploads your image file, you can add the '--verbose' option. If you add these, your invocation will look like this:

swiftly \
  --auth-url=https://identity.api.rackspacecloud.com/v2.0 \
  --auth-user=$CF_USERNAME \
  --auth-key=$CF_API_KEY \
  --region=$CF_REGION \
  --snet \
  --verbose \
  --concurrency=$SWFLY_CONCURRENCY \
  put \
    --segment-size=s${SWFLY_SEG_BYTES} \
    --input=$SOURCEFILE \
  ${CONTAINER}/${IMAGEFILENAME}
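
If you find yourself switching between the two forms of the command, one option is to wrap the invocation in a small shell function. This is only a sketch: the names upload_image, USE_SNET, USE_VERBOSE, and SWIFTLY_BIN are invented here and mean nothing to swiftly itself. The swiftly options are exactly the ones shown above.

```shell
# Hypothetical wrapper around the invocation shown in this article.
# USE_SNET, USE_VERBOSE and SWIFTLY_BIN are names made up for this sketch.
upload_image() {
    # Global options that go before the "put" subcommand
    set -- \
        --auth-url=https://identity.api.rackspacecloud.com/v2.0 \
        --auth-user="$CF_USERNAME" \
        --auth-key="$CF_API_KEY" \
        --region="$CF_REGION"
    if [ "${USE_SNET:-0}" = 1 ]; then
        set -- "$@" --snet        # transfer over the internal cloud network
    fi
    if [ "${USE_VERBOSE:-0}" = 1 ]; then
        set -- "$@" --verbose     # report progress during the upload
    fi
    # Quoting keeps file names that contain spaces together as one argument
    "${SWIFTLY_BIN:-swiftly}" "$@" \
        --concurrency="$SWFLY_CONCURRENCY" \
        put \
        --segment-size="s${SWFLY_SEG_BYTES}" \
        --input="$SOURCEFILE" \
        "${CONTAINER}/${IMAGEFILENAME}"
}
```

With SWIFTLY_BIN=echo set, the function prints the arguments instead of running swiftly, which is a cheap way to review the command first. Keep in mind that the printed line includes your API key.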

One last comment. Notice that the swiftly invocation contains the following line:

--segment-size=s${SWFLY_SEG_BYTES}

The s after the equals sign is not a typo. It's telling swiftly to create a Static Large Object. As mentioned earlier, we highly recommend that you upload your image as a Static Large Object (so highly that we're not going to talk about the alternative type of Large Object in this article!).

Checking Your Upload

Let's suppose that these were among the environment variable settings you used for your upload:

CF_REGION="DFW"
SOURCEFILE="my-awesome-image.vhd"
CONTAINER="uploaded-images"
IMAGEFILENAME="my-custom-image.vhd"

# and this was the image you uploaded:
$ ls -l
total 2524008
-rw-rw-r-- 1 joeuser joeuser 2584576512 Apr 24 03:01 my-awesome-image.vhd

First, let's take a look at the manifest for the Static Large Object that was created in Cloud Files.

# get the manifest
swiftly \
 --auth-url=https://identity.api.rackspacecloud.com/v2.0 \
 --auth-user=$CF_USERNAME \
 --auth-key=$CF_API_KEY \
 --region=$CF_REGION \
 get \
 --query=multipart-manifest=get \
 --output=my-manifest.json \
 ${CONTAINER}/${IMAGEFILENAME}

# look at the manifest
$ python -m json.tool my-manifest.json
[
    {
        "bytes": 134217728,
        "content_type": "application/octet-stream",
        "hash": "bc5dc9c7f93b214e648e3ce2b9ee4bd1",
        "last_modified": "2014-04-24T03:46:16.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000000"
    },
    {
        "bytes": 134217728,
        "content_type": "application/octet-stream",
        "hash": "c4a2dbe171bd60a3a23198baa916879c",
        "last_modified": "2014-04-24T03:46:23.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000001"
    },
/* etc. */
    {
        "bytes": 134217728,
        "content_type": "application/octet-stream",
        "hash": "9acffa882c4bf8beb3025e856f6e9d01",
        "last_modified": "2014-04-24T03:47:52.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000018"
    },
    {
        "bytes": 34439680,
        "content_type": "application/octet-stream",
        "hash": "dd594916413c2e1ef05875606b813528",
        "last_modified": "2014-04-24T03:47:55.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000019"
    }
]

You can see the segment list in the manifest. Alternatively, you can look at your Cloud Files account in the Cloud Control Panel.

On the Cloud Files page in the Cloud Control Panel, with the containers in the DFW region displayed, you'll find two containers for this example. The uploaded-images container holds the manifest file, and the uploaded-images_segments container holds the segments that are combined to make the image file. Each segment file's name ends with a number sequence identifying the segment (like 00000000, 00000001, 00000002, etc.).

Note that while the manifest in uploaded-images depends on the image segments in uploaded-images_segments, Cloud Files won't prevent you from deleting segments. If any of those segments are deleted, you'll get an error when you attempt to download the Static Large Object that comprises the image file.
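
To see exactly which objects the manifest depends on, you can pull the segment names out of a manifest file that you've saved locally, such as my-manifest.json from the earlier step. The following is a text-level sketch rather than a real JSON parser, and the function name list_manifest_segments is made up for this illustration.

```shell
# Hypothetical helper: print the segment object names recorded in a manifest
# file that has been saved locally.
# Usage: list_manifest_segments MANIFEST_FILE
list_manifest_segments() {
    # Pull out each "name": "..." pair, then strip everything but the value
    grep -o '"name": *"[^"]*"' "$1" | sed 's/^"name": *"//; s/"$//'
}
```

Each line of output is the path of one segment, starting with the segments container. Those are the objects that must not be deleted for as long as you need the image.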

If you look inside the uploaded-images container, you'll see the image file that the manifest describes (in this example, my-custom-image.vhd) rather than the manifest itself. The file size that Cloud Files displays in the container view reflects the overall size of the Static Large Object, because that's the size of the object you'd get if you downloaded the file. The actual storage used in that container is only the few kilobytes used by the manifest file, because the image data is stored in the uploaded-images_segments container.
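
One way to confirm that the segments account for the whole image is to add up the bytes values in a saved copy of the manifest and compare the total with the size of your local file. The sketch below does that with standard text tools; the function names manifest_total_bytes and manifest_matches_file are made up for this illustration.

```shell
# Hypothetical helper: add up the "bytes" values in a saved manifest file.
# Usage: manifest_total_bytes MANIFEST_FILE
manifest_total_bytes() {
    # Split each match on the colon so that the number is the second field
    grep -o '"bytes": *[0-9]*' "$1" |
        awk -F: '{ sum += $2 } END { printf "%.0f\n", sum }'
}

# Hypothetical helper: succeed only if the manifest total equals the size
# of the local file.
# Usage: manifest_matches_file MANIFEST_FILE LOCAL_FILE
manifest_matches_file() {
    local total size
    total=$(manifest_total_bytes "$1")
    # wc -c reports the size in bytes; $(( )) strips any padding it adds
    size=$(( $(wc -c < "$2") ))
    [ "$total" -eq "$size" ]
}
```

For the example in this article, you would compare my-manifest.json with my-awesome-image.vhd and expect the total to be 2584576512 bytes.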

Summary

Hopefully you haven't skipped directly to this section!  We really encourage you to read through the entire article.

  • To download the image from the example above, you'll request the object named "my-custom-image.vhd" from the "uploaded-images" container in the DFW region of your Cloud Files account.
  • To import the example image using Cloud Images, you'll create an import task, specifying uploaded-images/my-custom-image.vhd as the value for import_from.
  • Do not delete any of the segments in the "uploaded-images_segments" container or you will corrupt your image!
  • Swiftly automatically takes care of dividing your image file into segments, uploading the segments to their own container, and creating the Static Large Object manifest in the container you requested. But it's important to know what it's doing and how your data is stored so that you don't corrupt your image by mistake.
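
As a rough illustration of the import_from value mentioned above, the following sketch prints a request body for an import task. Apart from import_from, the layout of the body is an assumption on our part, and the function name import_task_body and the image name are made up. Please check the Cloud Images API documentation for the authoritative format before relying on it.

```shell
# Hypothetical helper: print a JSON body for a Cloud Images import task.
# Only import_from comes from this article; the rest of the layout is an
# assumption to be confirmed against the Cloud Images API documentation.
# Usage: import_task_body IMAGE_NAME CONTAINER/OBJECT
import_task_body() {
    # No JSON escaping is done, so keep quotes and backslashes out of the values
    printf '{"type": "import", "input": {"image_properties": {"name": "%s"}, "import_from": "%s"}}\n' "$1" "$2"
}

import_task_body "my-custom-image" "uploaded-images/my-custom-image.vhd"
```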


© 2015 Rackspace US, Inc.

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License

