Nested Folders in Rackspace Cloud Files

Rackspace Cloud Files is a cloud storage system that is great for storing large amounts of information. A common misconception is that this storage system behaves like a traditional file system, complete with byte-level manipulation and nested folders. It is the second of these that I want to talk about: how to simulate a nested directory (or folder) structure in Rackspace Cloud Files.

Cloud Files is better understood as a storage system, not a file system, with three basic parts: accounts, containers, and objects. These three parts can be easily seen in the URL that the REST API uses to reference an object, which names the account, the container, and the object in turn. Containers are large-scale groupings of objects, operating at a higher level, conceptually, than folders. If objects were books, containers might be genres. Containers cannot be nested. That is, one cannot put a container inside of another container.

However, it is fairly easy to simulate a directory structure with objects. These “virtual directories” are not directories, per se, but object name prefixes over which one can iterate. An example should make this concept easy to understand. Suppose I wanted to store books in Cloud Files. From my analogy above, I can use the genre of the book as my container name. The object name will be of the form “author/title.” This way, I can list all books by a particular author (within a genre).

Let’s load the following books into Cloud Files:

•    The Pit and the Pendulum, Poe, Horror
•    The Masque of the Red Death, Poe, Horror
•    Pride and Prejudice and Zombies, Grahame-Smith, Horror
•    The Far Side Gallery, Larson, Comics
•    Something Under the Bed Is Drooling, Watterson, Comics
•    It’s A Magical World, Watterson, Comics

First, I will create two containers, horror and comics. Next I will name my files according to the pattern I laid out above. I will have the files “poe/the_pit_and_the_pendulum”, “poe/the_masque_of_the_red_death”, “larson/the_far_side_gallery”, etc. Then I will upload these files to their appropriate container. As a final step, I need to upload “directory marker” files. These are empty (zero-sized) files with a content-type of “application/directory.”
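The naming scheme can be sketched in plain Python, with no Cloud Files calls involved. The helper names (`slugify`, `object_name`, `plan_uploads`) and the tuple layout are assumptions made for illustration; only the “author/title” pattern and the “application/directory” content type come from the text above.

```python
import re

# (title, author, genre) tuples for the six books listed above.
BOOKS = [
    ("The Pit and the Pendulum", "Poe", "Horror"),
    ("The Masque of the Red Death", "Poe", "Horror"),
    ("Pride and Prejudice and Zombies", "Grahame-Smith", "Horror"),
    ("The Far Side Gallery", "Larson", "Comics"),
    ("Something Under the Bed Is Drooling", "Watterson", "Comics"),
    ("It's A Magical World", "Watterson", "Comics"),
]

# Content type the text gives for the empty directory marker files.
DIRECTORY_CONTENT_TYPE = "application/directory"


def slugify(text):
    """Hypothetical helper: lowercase, drop apostrophes, spaces to underscores."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"\s+", "_", text.strip())


def object_name(title, author):
    """Build an object name of the form author/title."""
    return slugify(author) + "/" + slugify(title)


def plan_uploads(books):
    """Group object names and directory marker names by container (genre)."""
    plan = {}
    for title, author, genre in books:
        entry = plan.setdefault(slugify(genre), {"markers": set(), "objects": []})
        entry["markers"].add(slugify(author))  # one empty marker per author
        entry["objects"].append(object_name(title, author))
    return plan


plan = plan_uploads(BOOKS)
```

Each key of `plan` is a container to create. Each name in `markers` would be uploaded as a zero-sized object with `DIRECTORY_CONTENT_TYPE`, and each name in `objects` would carry the book itself.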

[NOTE: The following gets technical. For those wishing to use this feature of Cloud Files and not wanting to program, I recommend using a third-party tool like Cyberduck (if you are using a Mac). Cyberduck handles virtual nested directories completely transparently.]

Now to take advantage of these “virtual directories”, I can do container listings and give an appropriate path value. In the Python language bindings, this would look similar to the following:

    # cf_connection is assumed to be an authenticated Cloud Files connection
    container = cf_connection.get_container('horror')
    books_by_poe = container.get_objects(path='poe')

The path parameter on the get_objects call restricts the listing to objects in the given virtual directory. In this case, it returns the two books in the virtual “poe” directory. Similarly, if I had given the value “grahame-smith,” I would have found his adaptation of the classic love story.
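To make the effect of a path query concrete, here is a small in-memory simulation. It is not how the service is implemented, and `list_path` is a hypothetical function. It assumes a path query matches only the names that sit directly inside the virtual directory, not names nested further down.

```python
def list_path(object_names, path):
    """Return the names that sit directly inside the virtual directory `path`."""
    prefix = path.rstrip("/") + "/"
    return sorted(
        name
        for name in object_names
        if name.startswith(prefix)
        and len(name) > len(prefix)          # skip a bare "dir/" entry
        and "/" not in name[len(prefix):]    # skip deeper levels
    )


# Contents of the horror container, including the directory markers.
horror = [
    "poe",  # directory marker
    "poe/the_pit_and_the_pendulum",
    "poe/the_masque_of_the_red_death",
    "grahame-smith",  # directory marker
    "grahame-smith/pride_and_prejudice_and_zombies",
]

books_by_poe = list_path(horror, "poe")
```

Here `books_by_poe` holds the two Poe titles and leaves out both the markers and the Grahame-Smith book.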

In my example, I’ve used two genre containers and virtual directories only one level deep. I could just as easily put everything into one container and nest the authors under a genre virtual directory. An object name would then be like “comics/larson/the_far_side_gallery.” The only limitation to using this feature in Cloud Files is keeping the length of the object name (including all virtual directories) under the maximum allowed (1024 characters).
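A single-container layout might be sketched as follows. The function names are hypothetical, and two assumptions are baked in: the limit is counted in characters, as stated above (the service may measure the encoded name instead), and every level of nesting gets its own directory marker.

```python
MAX_OBJECT_NAME_LENGTH = 1024  # limit quoted in the text, counted in characters here


def nested_name(*parts):
    """Join virtual directory levels and the title into one object name."""
    name = "/".join(parts)
    if len(name) > MAX_OBJECT_NAME_LENGTH:
        raise ValueError("object name is %d characters long" % len(name))
    return name


def parent_directories(name):
    """List every virtual directory above `name`, shallowest first."""
    parts = name.split("/")[:-1]
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


name = nested_name("comics", "larson", "the_far_side_gallery")
markers = parent_directories(name)
```

With this layout, `markers` lists the directory marker objects that would accompany the book: one for the genre and one for the author beneath it.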

For more detailed information on how to implement virtual directories, see the Cloud Files developer guide. The relevant information is found in the “Pseudo hierarchical folders/directories” section.

About the Author

This is a post written and contributed by John Dickinson.

John Dickinson is a developer at Rackspace Hosting where he works on Rackspace’s Cloud Files product. He has been active in the OpenStack community since its inception and is the Project Technical Lead for OpenStack Swift. John has built enormous data storage clusters that store billions of objects and petabytes of data for customers. Rackspace Cloud Files is currently one of the largest production deployments of OpenStack Object Storage. Be sure to read John's personal blog for more insights on the OpenStack technologies.


6 Comments

Is there a restriction on how many objects can be stored in a container? Can I store millions of objects in a single container, and does it impact performance?

Giga on February 4, 2010

Giga,

There is currently no limit on the number of objects per container, but performance for some use cases will slow down (and level off) after adding millions of objects. Object read performance will not be affected. Personally, I’d recommend keeping containers to about 10 million objects. Actual usage would depend on your specific use case, though. There is nothing to stop you from putting a billion items in a container.

John Dickinson on February 4, 2010

Thank you very much John…i am planning a photo upload type website and expecting thousands of uploads a day.

Another question: basically, my script authenticates and connects each time a user uploads a photo, so there may be multiple simultaneous authentication requests. Is this the best way, or is there a way to maintain a persistent connection?

BTW, I use the PHP binding for Cloud Files. Thanks again for your help.

Giga on February 5, 2010

Giga,

You will get better overall performance if you persist your connections. The language bindings should reuse connections when possible, but, of course, this is determined by how you are using the bindings in your app. Fewer requests on the back end directly translates to less latency for your users.

Auth tokens are good for up to 24 hours and can be reused. The only time you need to request a new token is when your last request returned unauthorized. Again, the language bindings should handle this for you as they are able.

From your description, it sounds like connections and tokens can be reused per page or per user action but not between users. It should be possible for you to reuse the connections and auth tokens. Not knowing the design of your app, I don’t know how simple or difficult that change would be.

Several Cloud Files developers and other users are in the #cloudfiles channel on freenode. Come in and ask us any questions you have.

John on February 5, 2010

Thank you very much John, I really appreciate your reply.

Giga on February 8, 2010

If I were to publish the container to CDN, how would directories work in that case? For example, if my virtual directory structure in a container is /products/250/img.jpg, would the CDN URL work in the same fashion, i.e. http://cdnno.rackcdn.com/products/250/img.jpg?

Pratik Thakkar on June 19, 2012
