Using Swiftly to upload an image to be imported


Prerequisites

The steps in this article assume that you have already properly prepared an image for import into the public cloud and that you have already installed the swiftly Cloud Files client. If you need help on either of these prerequisites, please consult these Knowledge Center articles:

Uploading your image

Understanding how Large Objects are stored in Cloud Files

To provide high availability and data resiliency, Cloud Files limits the size of any single file to 5GB. That limit is smaller than many VM images, and smaller than a lot of other things people may want to store. To get around it, a Large Object (any file larger than 5GB) must be split into segments that are bound together by a manifest. There are two types of manifest objects in Cloud Files: Dynamic Large Objects and Static Large Objects. For image uploads, we recommend that you use a Static Large Object, so that's what we'll focus on here.

With a Static Large Object, the manifest is an explicit listing of the size, MD5 checksum, and location of each segment that makes up the Large Object. Fortunately, the swiftly tool will divide your local file into segments, upload the segments in parallel, and create a manifest for you automatically, so we won't have to discuss those details here. (If you're interested, you can read all about Cloud Files Large Objects in the Cloud Files API documentation.)
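To make the manifest idea concrete, here is a minimal local sketch that splits a file into fixed-size pieces and prints the size and MD5 checksum of each piece, the same two values a Static Large Object manifest records for every segment. It is an illustration only: swiftly does its own segmenting during the upload, and the function name `describe_segments` and the use of GNU `split`, `md5sum`, and `wc` are assumptions of this example, not part of swiftly.

```shell
# Illustration only: split a local file into fixed-size pieces and print
# "<piece-name> <bytes> <md5>" for each one. swiftly performs the real
# segmenting and manifest creation itself when it uploads.
describe_segments() (
  file=$1        # path of the local file to inspect
  seg_bytes=$2   # piece size in bytes
  workdir=$(mktemp -d) || exit 1
  # -b: bytes per piece; -d -a 8: numeric suffixes 00000000, 00000001, ...
  split -b "$seg_bytes" -d -a 8 "$file" "$workdir/" || exit 1
  for seg in "$workdir"/*; do
    printf '%s %s %s\n' \
      "$(basename "$seg")" \
      "$(wc -c < "$seg" | tr -d ' ')" \
      "$(md5sum "$seg" | cut -d' ' -f1)"
  done
  rm -rf "$workdir"
)

# Example call (needs as much free temporary disk space as the file is large):
#   describe_segments my-awesome-image.vhd 134217728
```

Every piece except possibly the last has exactly the requested size, which is the same pattern you would expect to see in the segment list of a manifest.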

Set some environment variables

Do the following in a bash shell:

CF_USERNAME=       # your Rackspace cloud username
CF_API_KEY=        # your Rackspace cloud API key
CF_REGION=         # 3 char region code for where you're uploading (e.g., ORD)
SOURCEFILE=        # the local file you are uploading
CONTAINER=         # the container in Cloud Files where the image will go
IMAGEFILENAME=     # the name you want the image to be called in Cloud Files
SWFLY_SEG_BYTES=134217728    # segment size in bytes (128MB)
SWFLY_CONCURRENCY=20         # maximum number of parallel upload threads

Make sure that the container into which you want to upload your object already exists in the appropriate region in Cloud Files. (You can create it in the Cloud Control Panel, if necessary.)
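Because several of the variables above start out blank, it can be worth confirming that none of them was left empty before you start a long upload. The helper below is a small sketch of such a check; the name `missing_vars` is hypothetical and is not something swiftly provides.

```shell
# Report every variable named in the argument list that is unset or empty.
# Returns 0 only when all of them have a value.
missing_vars() (
  status=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    # Only pass literal variable names you typed yourself to this function.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      status=1
    fi
  done
  return "$status"
)

# Example call:
#   missing_vars CF_USERNAME CF_API_KEY CF_REGION SOURCEFILE CONTAINER \
#     IMAGEFILENAME SWFLY_SEG_BYTES SWFLY_CONCURRENCY && echo "ready to upload"
```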

As mentioned earlier, swiftly automatically segments your image file and uploads the segments in parallel. The two SWFLY environment variables in the preceding example control this behavior.

  • SWFLY_SEG_BYTES specifies the size, in bytes, that swiftly will use for each segment (except, of course, for the last segment, which could be smaller). The value above is 128MB expressed in bytes. It's the value that Cloud Files engineers suggest for this purpose. You can experiment with different values to see if you get better performance, but you don't want to go smaller than this (certainly no smaller than 100MB), and you probably shouldn't go larger than 1GB.
  • SWFLY_CONCURRENCY specifies the maximum number of parallel threads that swiftly will use to upload the object. The value above (20) is the one suggested by the author of swiftly. You may want to experiment a bit, but keep in mind that if you set it too high, your parallel uploads may saturate your network card and actually slow down the overall file transfer.
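If you do experiment with the segment size, shell arithmetic is a quick way to turn a size in MB into bytes and to estimate how many segments a given file will produce. The two helper names below are hypothetical, and the 2584576512-byte file size is just a sample value.

```shell
# Convert a size in MB (1MB = 1024 * 1024 bytes) to bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Number of segments needed for a file: file_bytes / seg_bytes, rounded up.
segment_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

mb_to_bytes 128                      # prints 134217728
mb_to_bytes 1024                     # prints 1073741824 (1GB)
segment_count 2584576512 134217728   # prints 20
```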

Invoking Swiftly

After you've got your environment variables set, you can invoke swiftly from the command line to perform the upload. (You may want to do this in a screen session. If you're not familiar with the GNU screen program, you can find a quick introduction in the article: Installing the Swiftly Cloud Files Client.)

swiftly \
  --auth-url=https://identity.api.rackspacecloud.com/v2.0 \
  --auth-user="$CF_USERNAME" \
  --auth-key="$CF_API_KEY" \
  --region="$CF_REGION" \
  --concurrency="$SWFLY_CONCURRENCY" \
  put \
    --segment-size=s${SWFLY_SEG_BYTES} \
    --input="$SOURCEFILE" \
  "${CONTAINER}/${IMAGEFILENAME}"

Depending on your use case, you may be importing from a cloud server that's already in the Rackspace open cloud. If that's the case, you'll want to add the '--snet' option to the command so that the file will be transferred over the internal cloud network. Additionally, if you want swiftly to keep you fully notified about what it's doing as it uploads your image file, you can add the '--verbose' option. If you add these, your invocation will look like this:

swiftly \
  --auth-url=https://identity.api.rackspacecloud.com/v2.0 \
  --auth-user="$CF_USERNAME" \
  --auth-key="$CF_API_KEY" \
  --region="$CF_REGION" \
  --snet \
  --verbose \
  --concurrency="$SWFLY_CONCURRENCY" \
  put \
    --segment-size=s${SWFLY_SEG_BYTES} \
    --input="$SOURCEFILE" \
  "${CONTAINER}/${IMAGEFILENAME}"

One last comment. Notice that the swiftly invocation contains the following line:

--segment-size=s${SWFLY_SEG_BYTES}

The s after the equals sign is not a typo. It's telling swiftly to create a Static Large Object. As mentioned earlier, we highly recommend that you upload your image as a Static Large Object (so highly that we're not going to talk about the alternative type of Large Object in this article!).

Checking Your Upload

Let's suppose that these were among the environment variable settings you used for your upload:

CF_REGION="DFW"
SOURCEFILE="my-awesome-image.vhd"
CONTAINER="uploaded-images"
IMAGEFILENAME="my-custom-image.vhd"

# and this was the image you uploaded:
$ ls -l
total 2524008
-rw-rw-r-- 1 joeuser joeuser 2584576512 Apr 24 03:01 my-awesome-image.vhd

First, let's take a look at the manifest for the Static Large Object that was created in Cloud Files.

# get the manifest and save it to a local file
swiftly \
  --auth-url=https://identity.api.rackspacecloud.com/v2.0 \
  --auth-user="$CF_USERNAME" \
  --auth-key="$CF_API_KEY" \
  --region="$CF_REGION" \
  get \
    --query=multipart-manifest=get \
    --output=my-manifest.json \
  "${CONTAINER}/${IMAGEFILENAME}"

# look at the manifest
$ cat my-manifest.json | python -m json.tool
[
    {
        "bytes": 134217728,
        "content_type": "application/octet-stream",
        "hash": "bc5dc9c7f93b214e648e3ce2b9ee4bd1",
        "last_modified": "2014-04-24T03:46:16.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000000"
    },
    {
        "bytes": 134217728,
        "content_type": "application/octet-stream",
        "hash": "c4a2dbe171bd60a3a23198baa916879c",
        "last_modified": "2014-04-24T03:46:23.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000001"
    },
/* etc. */
    {
        "bytes": 134217728,
        "content_type": "application/octet-stream",
        "hash": "9acffa882c4bf8beb3025e856f6e9d01",
        "last_modified": "2014-04-24T03:47:52.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000018"
    },
    {
        "bytes": 34439680,
        "content_type": "application/octet-stream",
        "hash": "dd594916413c2e1ef05875606b813528",
        "last_modified": "2014-04-24T03:47:55.000000",
        "name": "/uploaded-images_segments/my-custom-image.vhd/1398308466.19/2584576512/00000019"
    }
]

You can see the segment list in the manifest. Alternatively, you can look at your Cloud Files account in the Cloud Control Panel.
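One simple sanity check on the manifest is that the segment sizes add up to the size of the file you uploaded. The sketch below sums the "bytes" fields of a saved manifest with grep and awk; the function name `manifest_total_bytes` is hypothetical, and the pattern matching is a shortcut that suits this manifest layout rather than a general JSON parser.

```shell
# Sum the "bytes" field of every segment listed in a saved manifest file.
# Works on raw or pretty-printed JSON because it only looks for
# "bytes": <number> pairs.
manifest_total_bytes() {
  grep -o '"bytes": *[0-9][0-9]*' "$1" |
    awk -F: '{ total += $2 } END { printf "%.0f\n", total }'
}

# The example manifest has segments 00000000 through 00000019:
# 19 full 128MB segments plus one final, smaller segment.
echo $(( 19 * 134217728 + 34439680 ))   # prints 2584576512

# Example call:
#   manifest_total_bytes my-manifest.json
```

For the example upload, the total matches the 2584576512 bytes reported by ls -l for my-awesome-image.vhd.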

Here's the Control Panel "Files" page showing the containers we have in the DFW region.

Screenshot of the contents of a Cloud Files container

Notice that there are two containers: the container "uploaded-images" that existed before we did the upload, and a new container named "uploaded-images_segments". If you look at the size of each container, you can see that the "uploaded-images" container is very small. That's because all it contains is the Static Large Object manifest file, which is a text file in JSON format. (You saw the content of this file in the preceding example.)

Swiftly has created the "uploaded-images_segments" container. Not surprisingly, it contains the actual image data divided up into segment files. You can see that the size of this container is 2.41GB, which, again not surprisingly, is the size of the image we uploaded.
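The 2.41GB figure lines up with the byte count of the local file if you assume that 1GB means 1024 * 1024 * 1024 bytes. A short sketch of that conversion (the helper name `bytes_to_gb` is hypothetical):

```shell
# Convert a byte count to GB (1GB = 1024^3 bytes), rounded to two decimals.
bytes_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

bytes_to_gb 2584576512   # prints 2.41
```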

Let's take a look at the segments container. (We've edited the image so that you see only the first four and the final three segments.)

Screenshot of the contents of a Cloud Files container containing the segments of a Static Large Object (first few entries)

As we instructed swiftly, each segment is 128MB (except for the last segment). The key point is that you should not delete any of these segments unless you plan to delete them all, along with the manifest object that lives in the "uploaded-images" container. Keep in mind that the two containers, "uploaded-images" and "uploaded-images_segments", are very tightly coupled, but Cloud Files does not know that they are connected. You can delete a segment from the "uploaded-images_segments" container without receiving any warning that a Static Large Object depends on it. The error will occur only later, when you try to download the Static Large Object, and because Cloud Files does not have an "undelete" function, by that time it will be too late to recover the deleted segment.
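Since Cloud Files won't warn you, it can help to list exactly which objects a manifest depends on before you delete anything from a segments container. The sketch below pulls the segment paths out of a saved manifest file; the function name `manifest_segment_names` is hypothetical, and the pattern matching assumes that object names contain no escaped quote characters.

```shell
# Print the object path of every segment named in a saved manifest file,
# one per line. Pattern-based, not a general JSON parser.
manifest_segment_names() {
  grep -o '"name": *"[^"]*"' "$1" |
    sed -e 's/^"name": *"//' -e 's/"$//'
}

# Example call:
#   manifest_segment_names my-manifest.json
```

Any object that appears in this list is still needed by the Static Large Object that the manifest describes.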

Finally, let's take a look at the "uploaded-images" container.

Screenshot of the contents of a Cloud Files container containing a single Static Large Object

You can see that the size of the file "my-custom-image.vhd" is listed as 2.41GB. That indicates that if you download this file, you can expect to get 2.41GB of data. But don't worry, you aren't being charged twice for your image: this container holds only 4.84KB of data, the manifest itself.

Summary

Hopefully you haven't skipped directly to this section!  We really encourage you to read through the entire article.

  • To download the image from the example above, you'll request the object named "my-custom-image.vhd" from the "uploaded-images" container in the DFW region of your Cloud Files account.
  • To import the example image using Cloud Images, you'll create an import task, specifying uploaded-images/my-custom-image.vhd as the value for import_from.
  • Do not delete any of the segments in the "uploaded-images_segments" container or you will corrupt your image!
  • Swiftly automatically takes care of dividing your image file into segments, uploading the segments to their own container, and creating the Static Large Object manifest in the container you requested. But it's important to know what it's doing and how your data is stored so that you don't corrupt your image by mistake.
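As a rough sketch of the import step, the helper below builds a JSON request body for an import task. Only the import_from value (container/object) comes from this article. The surrounding field names ("type", "input", "image_properties") and the function name `import_task_body` are assumptions about the general shape of a Cloud Images task request, so check them against the Cloud Images API documentation before relying on them.

```shell
# Build a JSON body for an import task.
# $1 = container, $2 = object name in Cloud Files, $3 = name for the new image.
# ASSUMPTION: every field name other than import_from is illustrative and
# must be verified against the Cloud Images API documentation.
import_task_body() {
  printf '{"type": "import", "input": {"image_properties": {"name": "%s"}, "import_from": "%s/%s"}}\n' \
    "$3" "$1" "$2"
}

import_task_body uploaded-images my-custom-image.vhd "My custom image"
```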


© 2011-2013 Rackspace US, Inc.

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License

