Syncing to Cloud Files with fileconveyor


There are many file syncing applications out there, but few work the way we want them to or are as versatile as an open source application called fileconveyor. Thankfully, the source code is fully documented and the tool is easy to install. In a matter of minutes you can have the project up and running, syncing local files on your server to a destination like Rackspace Cloud Files.

Using fileconveyor to sync files to the CDN lets you use ecommerce solutions like Magento or CMS applications like Drupal or WordPress with Cloud Files without relying on a plug-in to handle the file transfers.

Prerequisites

You can run fileconveyor on Linux or Mac OS X. Windows is not supported at the time of this writing. This document was written for fileconveyor version 0.3.

Your system will need Python 2.5 or higher installed.
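You can check which version of Python is installed by running:

python -V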

Installation will require git and pip.

Install git

If you don't have git installed already, you can download it from the project's website:

http://git-scm.com/

Most Linux distributions also have git in their main package repository, under the package name "git".
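For example, on Debian or Ubuntu systems you can run:

sudo apt-get install git

On RHEL, CentOS, or Fedora systems, use yum instead:

sudo yum install git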

Install pip

You'll also need pip, the Python package manager.

If you don't have pip installed, the easiest way to get it is to install the Python setuptools package. You can download the installer from its website:

http://pypi.python.org/pypi/setuptools

As an alternative you can use a Linux package manager to install setuptools. On most distributions the package name is "python-setuptools".
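For example, on Debian or Ubuntu systems:

sudo apt-get install python-setuptools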

Once you've installed setuptools you can install pip by running:

sudo easy_install pip
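You can confirm that pip installed correctly by checking its version:

pip --version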

Install fileconveyor

Now you can install fileconveyor.

Change to the directory you want to hold the fileconveyor files, then run:

sudo pip install -e git+https://github.com/wimleers/fileconveyor@master#egg=fileconveyor

The fileconveyor source files will be downloaded to the src/fileconveyor directory, relative to where you run the pip command. For example, if you run pip in the /usr/local directory, the fileconveyor script directory will be in /usr/local/src/fileconveyor.

Running the install with sudo (or as root) lets pip handle installing dependencies like Django and python-cloudfiles.
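Once the install finishes, you can confirm that those dependencies were pulled in by listing the packages pip knows about:

pip freeze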

Sample configuration

Before running fileconveyor you'll need to configure it by creating a file named "config.xml" in the same directory as the arbitrator.py file.

If you're still in the directory from which you ran the install, you can run:

sudo nano src/fileconveyor/fileconveyor/config.xml

For a simple configuration that will sync the contents of a directory with a Cloud Files container, paste the following text into the file:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- Sources -->
  <sources ignoredDirs="">
    <source name="test" scanPath="/var/www/html/test" />
  </sources>

  <!-- Servers -->
  <servers>
    <server name="Rackspace Cloud Files" transporter="cloudfiles">
      <username>USERNAME</username>
      <api_key>APIKEY</api_key>
      <container>CONTAINER</container>
    </server>
  </servers>

  <!-- Rules -->
  <rules>
    <rule for="test" label="Test">
      <destinations>
        <destination server="Rackspace Cloud Files" path="test" />
      </destinations>
    </rule>
  </rules>
</config>

You'll need to modify the config to fit your environment and account details.

  • In the "Sources" section change the "scanPath" property to the directory you want to sync.
  • In the "Servers" section set "username" and "api_key" to match your credentials, and set "container" to the name of the container to hold the synced files.
  • In the "Rules" section set the "path" property to the subdirectory to sync to in the container. Leave the value blank to sync to the root of the container (path="").

It's possible to perform more complex syncs by using multiple rules, syncing from multiple sources, or having fileconveyor change the filename or some of a file's properties before copying it to Cloud Files (using "processors"). More details can be found in fileconveyor's documentation and on the project's website.

Running fileconveyor

With the configuration all set, it's time to run fileconveyor for its initial sync. The "arbitrator.py" script handles launching fileconveyor's various components:

sudo python src/fileconveyor/fileconveyor/arbitrator.py

The fileconveyor program is written to run as a console script; it doesn't include an init script or a means of forking itself to run as a daemon. For testing purposes you can run the script directly from a command line. For persistent use you'll want to either set up an init script or run the program from a screen session, as in:

screen python src/fileconveyor/fileconveyor/arbitrator.py
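You can detach from the screen session with Ctrl-A followed by D, then reattach later to check on progress:

screen -r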

Once the initial sync completes you should be able to see the results in the target container via the Cloud Control Panel.
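If you prefer to check from the command line, one option is a quick one-liner using the python-cloudfiles library that pip installed as a dependency. Substitute your own username, API key, and container name:

python -c "import cloudfiles; print cloudfiles.get_connection('USERNAME', 'APIKEY').get_container('CONTAINER').list_objects()"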

Further details

The sample configuration we provide is simple, and you can do much more with fileconveyor to customize its operation to your needs. Check the documentation in the source directory and the project web page for full details, but here are a few more options:

  • Running verify.py will check the source directory against the Cloud Files container to confirm that the files synced properly (see the example after this list).

  • These instructions have you install and run via sudo, but fileconveyor doesn't require root privileges to run. You can also chown the fileconveyor directory and its contents to an unprivileged user.

  • The application uses a Django backend to connect to the various servers, and any DeprecationWarning entries in the log can be safely disregarded.

  • You can edit values in the settings.py file to make the locations of the SQLite databases, the pid file, and other system files more permanent. For example:

    RESTART_AFTER_UNHANDLED_EXCEPTION = True    # restart the program after a crash
    RESTART_INTERVAL = 10                       # seconds to wait before restarting
    LOG_FILE = '/var/log/fileconveyor.log'      # log file location
    PID_FILE = '/var/run/fileconveyor/fileconveyor.pid'
    PERSISTENT_DATA_DB = '/etc/fileconveyor/persistent_data.db'  # SQLite database of queued work
    SYNCED_FILES_DB = '/etc/fileconveyor/synced_files.db'        # SQLite database of synced files
    WORKING_DIR = '/tmp/fileconveyor'           # scratch space for processors
    MAX_FILES_IN_PIPELINE = 50                  # files in flight at once
    MAX_SIMULTANEOUS_PROCESSORCHAINS = 1        # concurrent processor chains
    MAX_SIMULTANEOUS_TRANSPORTERS = 10          # concurrent uploads
    MAX_TRANSPORTER_QUEUE_SIZE = 1
    QUEUE_PROCESS_BATCH_SIZE = 20               # queue items handled per batch
    CALLBACKS_CONSOLE_OUTPUT = False
    CONSOLE_LOGGER_LEVEL = logging.INFO         # log level for console output
    FILE_LOGGER_LEVEL = logging.DEBUG           # log level for the log file
    RETRY_INTERVAL = 30                         # seconds between retries of failed transfers
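For example, assuming verify.py sits alongside arbitrator.py in the source directory (the same layout used in the instructions above), a verification run would look like:

sudo python src/fileconveyor/fileconveyor/verify.py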
    

