There are many file syncing applications out there, but few work the way we want them to or are as versatile as an open source application called fileconveyor. Thankfully, the source code is fully documented and the tool is easily installed. In a matter of minutes you can have the project up and running, syncing local files on your server to a destination like Rackspace Cloud Files.
Using fileconveyor to sync files to the CDN lets you use ecommerce solutions like Magento or CMS applications like Drupal or WordPress with Cloud Files without relying on a plug-in to handle the file transfers.
You can run fileconveyor on Linux or Mac OS X. Windows is not supported at the time of this writing. This document was written for fileconveyor version 0.3.
Your system will need to have python 2.5 or higher installed.
Installation will require git and pip.
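As a quick sanity check before installing, the prerequisites above can be sketched in a few lines of Python. This is only an illustrative sketch: the names find_on_path and check_prerequisites are made up for this example and are not part of fileconveyor, and the check only confirms that executables named "git" and "pip" can be found on your PATH.

```python
import os
import sys


def find_on_path(program, path=None):
    """Return the full path of `program` if it is an executable on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_prerequisites(tools=("git", "pip")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The article asks for python 2.5 or higher.
    if sys.version_info < (2, 5):
        problems.append("python 2.5 or higher is required")
    for tool in tools:
        if find_on_path(tool) is None:
            problems.append("%s was not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

If the script prints nothing, both tools were found. If it reports pip as missing, the setuptools steps that follow cover installing it.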
If you don't have git installed already, you can download it from the project's website:
http://git-scm.com/
Most Linux distributions also have git in their main package repository, under the package name "git".
You'll also need the python package manager pip.
If you don't have pip installed, the easiest way to get it is to install the python setuptools package. You can download the installer from its website:
http://pypi.python.org/pypi/setuptools
As an alternative you can use a Linux package manager to install setuptools. On most distributions the package name is "python-setuptools".
Once you've installed setuptools you can install pip by running:
sudo easy_install pip
Now you can install fileconveyor.
Change to the directory you want to hold the fileconveyor files, then run:
sudo pip install -e git+https://github.com/wimleers/fileconveyor@master#egg=fileconveyor
The fileconveyor source files will be downloaded to the src/fileconveyor directory, relative to where you run the pip command. For example, if you run pip in the /usr/local directory, the fileconveyor script directory will be in /usr/local/src/fileconveyor.
Running the install with sudo (or as root) lets pip handle installing dependencies like django and python-cloudfiles.
Before running fileconveyor you'll need to configure it by creating a file named "config.xml" in the same directory as the arbitrator.py file.
If you are starting in the directory you were in when you started the install, you can run:
sudo nano src/fileconveyor/fileconveyor/config.xml
For a simple configuration that will sync the contents of a directory with a Cloud Files container, paste the following text into the file:
<?xml version="1.0" encoding="UTF-8"?>
<config>
<!-- Sources -->
<sources ignoredDirs="">
<source name="test" scanPath="/var/www/html/test" />
</sources>
<!-- Servers -->
<servers>
<server name="Rackspace Cloud Files" transporter="cloudfiles">
<username>USERNAME</username>
<api_key>APIKEY</api_key>
<container>CONTAINER</container>
</server>
</servers>
<!-- Rules -->
<rules>
<rule for="test" label="Test">
<destinations>
<destination server="Rackspace Cloud Files" path="test" />
</destinations>
</rule>
</rules>
</config>
You'll need to modify the config to fit your environment and account details.
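In the sample above, the rule's "for" attribute repeats the source name and the destination's "server" attribute repeats the server name. Assuming fileconveyor resolves these references by name, a typo in either one is an easy mistake to make when adapting the file. The following is a minimal sketch of a check for that, not part of fileconveyor itself: check_config is a hypothetical helper, and it only looks at name cross-references, not at credentials or paths.

```python
import xml.etree.ElementTree as ET


def check_config(xml_text):
    """Return a list of name mismatches found in a fileconveyor-style config."""
    root = ET.fromstring(xml_text)
    # Names declared in the <sources> and <servers> sections.
    source_names = set(s.get("name") for s in root.findall("sources/source"))
    server_names = set(s.get("name") for s in root.findall("servers/server"))

    problems = []
    for rule in root.findall("rules/rule"):
        label = rule.get("label")
        if rule.get("for") not in source_names:
            problems.append(
                "rule %r refers to unknown source %r" % (label, rule.get("for"))
            )
        for dest in rule.findall("destinations/destination"):
            if dest.get("server") not in server_names:
                problems.append(
                    "rule %r refers to unknown server %r" % (label, dest.get("server"))
                )
    return problems
```

To use it, read your config.xml into a string and pass it to check_config; an empty list means every rule points at a declared source and server.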
It's possible to perform more complex syncs by using multiple rules, syncing from multiple sources, or having fileconveyor change the filename or some of a file's properties before copying it to Cloud Files (using "processors"). More details can be found in fileconveyor's documentation and on the project's website.
With the configuration all set, it's time to run fileconveyor for its initial sync. The "arbitrator.py" script handles launching fileconveyor's various components:
sudo python src/fileconveyor/fileconveyor/arbitrator.py
The fileconveyor program is written to be run as a console script, without an included init script or means of forking the process to run as a daemon. For testing purposes you can run the script directly from a command line. For persistent use you'll want to either set up an init script or run the program from a screen session, as in:
screen python src/fileconveyor/fileconveyor/arbitrator.py
Once the initial sync completes you should be able to see the results in the target container via the Cloud Control Panel.
The sample configuration we provide is simple, and you can do much more with fileconveyor to customize its operation to your needs. Check the documentation in the source directory and the project web page for full details, but here are a few more options:
Running verify.py will check the source directory against the Cloud Files container to confirm that the files synced properly.
These instructions have you install and run via sudo, but fileconveyor doesn't require root privileges to run. You can also chown the fileconveyor directory and its contents to an unprivileged user.
The application uses a django backend to connect to the various servers. Any DeprecationWarning entries in the log can be safely disregarded.
You can edit values in the settings.py file to make the locations of the SQLite databases, the pid file, and other system files more permanent. For example:
RESTART_AFTER_UNHANDLED_EXCEPTION = True
RESTART_INTERVAL = 10
LOG_FILE = '/var/log/fileconveyor.log'
PID_FILE = '/var/run/fileconveyor/fileconveyor.pid'
PERSISTENT_DATA_DB = '/etc/fileconveyor/persistent_data.db'
SYNCED_FILES_DB = '/etc/fileconveyor/synced_files.db'
WORKING_DIR = '/tmp/fileconveyor'
MAX_FILES_IN_PIPELINE = 50
MAX_SIMULTANEOUS_PROCESSORCHAINS = 1
MAX_SIMULTANEOUS_TRANSPORTERS = 10
MAX_TRANSPORTER_QUEUE_SIZE = 1
QUEUE_PROCESS_BATCH_SIZE = 20
CALLBACKS_CONSOLE_OUTPUT = False
CONSOLE_LOGGER_LEVEL = logging.INFO
FILE_LOGGER_LEVEL = logging.DEBUG
RETRY_INTERVAL = 30
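If you move these files to system locations, the parent directories need to exist and be writable by the user running fileconveyor. Whether fileconveyor creates missing directories on its own is not something this article confirms, so the following sketch simply reports which ones are absent. The helper name missing_parent_dirs is hypothetical, and the paths are copied from the example settings above.

```python
import os


def missing_parent_dirs(paths):
    """Return the sorted parent directories of `paths` that do not exist yet."""
    parents = set(os.path.dirname(p) for p in paths)
    return sorted(d for d in parents if d and not os.path.isdir(d))


# Paths copied from the example settings.py values.
SETTINGS_PATHS = [
    '/var/log/fileconveyor.log',
    '/var/run/fileconveyor/fileconveyor.pid',
    '/etc/fileconveyor/persistent_data.db',
    '/etc/fileconveyor/synced_files.db',
]

if __name__ == "__main__":
    for directory in missing_parent_dirs(SETTINGS_PATHS):
        print("missing directory: %s" % directory)
```

Any directory it lists can be created by hand and, if you run fileconveyor as an unprivileged user, chowned to that user.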
© 2011-2013 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License

33 Comments
uk authentication
Is there a place where the specific auth url for uk has to be defined?
re: UK Cloud Files
The first step will be finding where python has your packages installed. This is probably in /usr/local/lib/pythonX.X/dist-packages, where "X.X" would be the installed python version.
In the packages directory look for the directory for django-cumulus. Inside that directory, cd into the "cumulus" directory.
In the cumulus directory, edit "settings.py".
Early in that file it defines several properties for "CUMULUS", including the line:
'AUTH_URL': 'us_authurl',
Change the "us" to "uk", so it looks like:
'AUTH_URL': 'uk_authurl',
Hopefully that should make it so the next time you launch fileconveyor it will connect to UK Cloud Files.
Thnx for your feedback, the
I've changed the AUTH_URL setting in /usr/lib/python2.6/site-packages/cumulus/settings.py.
Crashes Consistently
OSError: [Errno 2] No such file or directory: '/home/clients/test.com/htdocs/wp-content/themes/dbs_bp/4913'
The 4913 at the end should be a file name like settings.php, and stays consistent (it's always 4913). The path is valid.
Just hoping someone has a clue, since it seems like a very nice tool.
re: 4913
I assume you haven't made any changes to pyinotify.py, the python package that would be pushing the file change notification to fileconveyor. Maybe the 4913 is being added to a tempfile, and the file's being deleted before fileconveyor gets to where it's preparing to sync it?
One test I'd run would be to copy a file from an unsynced directory into the synced directory, to see if the copied file syncs or throws the same error. That could at least tell you if the issue is being created by WordPress somehow when it edits files or if it's in fileconveyor itself.
Yes, I thought it was strange
I meant to ask earlier: should files sync automatically the first time arbitrator.py is run? Because they don't. I assumed they would, but it's yet to copy one file. In any case, copying a file into the synced folder has the same effect -- nothing at all happens, no errors, no messages on the console, nothing is uploaded.
Thanks for the help!
I tried your suggestion, and copying a file into the synced folder literally does nothing.
re: Strange
https://github.com/wimleers/fileconveyor
You might also run verify.py in the fileconveyor package to see if it reveals anything of interest.
Failing that, open an issue on the project's github site. The author might be able to provide more insight into why you would see that behavior.
I actually started with the
verify.py says ...
Finished verifying synced files. Results:
- Number of checked synced files: 0
- Number of invalid synced files: 0
I did open a support request.
I hadn't noticed a log file ... there is something interesting ... I am getting a:
Filter queue: dropped '/home/clients/example.com/htdocs/wp-content/themes/dbs_bp/functions.php' because it doesn't match any rules.
There are a lot of those. In the rules section I changed the path setting to just "" per the instructions above, but I'm still not clear what that could/should be. Right now I am just trying to sync one folder as a test. So maybe that is the problem? It throws an error on startup if I comment out that section.
Thanks.
re: Config
By all means: http://pastie
re: Stuff to try
Things to try:
- Try removing the trailing slash ("/") from the end of the scanPath value in the source definition (so end with "/dbs_bp" instead of "/dbs_bp/"). The examples lack that trailing slash, so the parser might get confused by it.
- In the rule's destination, you might change path="/" to path="", since it sounds like that will still tell it to use the root of the container.
- Try removing the "<filter>" block entirely. If that isn't there then it should try syncing everything in the directory (so you might also change the source directory, if there's a ton of stuff in there). If there's no filter and it starts syncing, then it could be that the filter wasn't matching files properly.
I very much appreciate any
1. I had started without the trailing slash. I tried adding that as a shot in the dark -- makes no difference.
2. On the path setting I have tried "", "/", ".", "*". No difference.
3. I just tried without the filter section, and still not syncing.
Still getting that crazy 4913 error if anything is changed while it's running.
Some progress .... I have the
I still have the crash, though, any time a file is changed :(
More good news ... after much
http://pastie.org/7530640
I am adding this to my support request with the author. Not sure I know enough python to fix it properly.
re: Progress
set backupdir=~/tmp
Hopefully that will ensure vim doesn't create any extra files in the directory to be synced and throw things off.
Vim was set up that way
re: vim
set backupcopy=yes
It sounds like that directive should tell vim to use a backup copy for the editing, then replace the contents of the original when saved instead of unlinking the original and making a new file.
Hopefully that will help.
Thanks for the suggestion.
re: No difference
It may work "well enough" for
I first heard of fileconveyor on a blog by someone else at Rackspace that was focused on Magento. That post said RS had forked fileconveyor and made some changes. Do you know anything about that? http://www.rackspace.com/blog/easily-sync-server-files-to-the-cloud/
re: blog post
re: fork
Excellent! This would be a
re: timeline
Thanks!
I've run into a similar
I filed another issue report with the author (no answers yet). If Rackspace is going to be maintaining their own fork, maybe someone can address this. A solid way of ignoring files and folders would really be nice.
re: WP Supercache
I'll pass this further information along to the guy working on the fork. He mentioned trying to nail down a bug with inotify; it's possible your problems could be related to that issue.
Can the file conveyor be used
Am I correct that the process is:
file conveyor:
server writes something local -> file conveyor is run once in a while-> file is on the cloud
cloudfuse:
server writes something on the mounted cloud container
Cloudfuse has not been updated for a while and I have had some issues with it that are not addressed.
Thanks,
re: CloudFuse
re: Can the file conveyor be used
re: Support
sync error to cf container in Chicago(ORD)
Arbitrator.Transporter - ERROR - The transporter 'Cloud Files' has failed while transporting the file 'filename' (action: 1). Error: 'ord_container_name'.
re: sync error with ORD