Online Collaboration
Collaborative Editing
Suppose you are working on an article or a book with a number of coauthors. All coauthors work on parts of the book in parallel. Even if only one person works on each chapter, say, there are files that everyone changes frequently, e.g., a central bibliography file. Not all of the people involved change the files; some (e.g., your publisher) just want to see where the project's at and view or download files. Everyone wants access to the most up-to-date versions of the files. The files are in various formats: LaTeX source files and BibTeX bibliographies, PDFs of the output, Word files for things like style guidelines. How do you manage something like this more efficiently than by emailing files back and forth?
The Problems
- Duplication of work/Keeping everything up to date
- The most pressing problem is that unless files are shared in some way, there is a very real danger that not everyone will be working with the most up-to-date versions of the files. You don't want to make changes to a chapter only to discover later that someone else has completely rewritten the chapter, but not told you about it. You don't want to add a bunch of references into the bibliography only to discover that someone else has already added them, but hasn't shared their updated bibliography file.
- There should be a mechanism for keeping everyone's files up-to-date at all times and for giving everyone access to the most up-to-date versions of all files.
- Conflicts
- If two or more people work on the same file, make changes simultaneously, and then share their changes (via whatever method is used to keep things up-to-date), their changes will conflict. Say both A and B access the latest version of the central bibliography and each adds an entry; then A shares their version, followed by B. A's version doesn't contain B's entry, and B's doesn't contain A's. If B's version counts as "most up-to-date" because it was shared later, A's edits will be lost.
- There should be a mechanism for handling conflicts automatically ("merging"), or at least for alerting collaborators that conflicts exist.
- Versions
- Sometimes people make mistakes: they inadvertently delete text or entire files, change things that shouldn't be changed, etc. If these mistakes get propagated to everyone's copy by the "keep up-to-date" mechanism, someone has to do a lot of work to fix things. This can be avoided if there is a way to "undo" changes, i.e., to revert to an earlier version of the files.
- There should be a mechanism for versioning: it should be possible to recover older versions of files.
- Offline Use
- Not everyone has constant internet access. It should be possible to have all the files on a local computer, and work there (which is especially important if you need to not just edit, but compile files, as when working with LaTeX) without an internet connection.
- Usability
- Not all your collaborators are computer whizzes running Linux and doing everything on a command line. They want things to be simple to install, to have a nice (graphical) user interface, and to not have to learn lots of technical material to get things running.
- The solution should be user-friendly. Ideally, there should be a web interface at least for read-only access.
- Cost
- The solution should not cost anything. In particular, the software should be freely available, you should be able to utilise existing (free) services, and it should not be labor-intensive to implement the solution (i.e., it should not involve setting up your own server).
- Cross-platform
- Your coauthors work on a variety of different platforms. The solution must be available on Windows, Mac OS, and Linux.
- Many Formats
- You have a number of different file formats, some text-based (.tex, .bib), some binary (.pdf, .doc).
- Security
- Your work should only be accessible to authorized users. You do not want the world to see what you're doing, still less to make changes to it.
- File Formats
- Text (and consequently also LaTeX) files on Unix-based systems end every line with a line-feed character (LF); on Windows, every line ends with a carriage-return/line-feed (CR/LF) pair. When you email a Unix text file to a Windows machine, programs like Notepad may show all the lines run together (the lines are missing the CR); if you email a Windows text file to a Unix machine, lines end with stray CR characters (often displayed as ^M) or appear interspersed with blank lines, depending on the program. So if you have people working on both Windows and Unix/Mac OS, you constantly have to convert text files using utilities like dos2unix. That is annoying.
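For the curious, here is roughly what dos2unix and its counterpart unix2dos do, as a minimal Python sketch. The function names are made up for illustration, and it assumes the files really are plain text: running it on a PDF or Word file would corrupt it.

```python
def to_unix(data: bytes) -> bytes:
    """Convert Windows CR/LF line endings to Unix LF (what dos2unix does)."""
    return data.replace(b"\r\n", b"\n")


def to_windows(data: bytes) -> bytes:
    """Convert Unix LF line endings to Windows CR/LF (what unix2dos does).

    Normalizing to LF first avoids turning an existing CR/LF into CR/CR/LF.
    """
    return to_unix(data).replace(b"\n", b"\r\n")


def convert_file(path: str, windows: bool = False) -> None:
    """Rewrite a text file in place with Unix (default) or Windows line endings."""
    with open(path, "rb") as f:
        data = f.read()
    converted = to_windows(data) if windows else to_unix(data)
    with open(path, "wb") as f:
        f.write(converted)


if __name__ == "__main__":
    print(to_unix(b"Intro\r\nSome text.\r\n"))  # b'Intro\nSome text.\n'
    print(to_windows(b"a\nb\n"))                # b'a\r\nb\r\n'
```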
Thanks to the commenters on this blog post for suggestions.
Revision Control Systems
Revision control systems were developed to facilitate collaboration between people working on large software projects. In particular, they are designed to address the problems of versioning and of resolving editing conflicts.
- CVS is the most widely used system.
- Quick Reference to revision control software.
- Feature Comparison of various options.
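To see how "merging" can work without anyone's edits getting lost, here is a simplified sketch of a line-based three-way merge in Python. It compares each person's version against the common ancestor (base) and combines the changes. Where both people changed the same lines differently, it keeps both versions between conflict markers, in the style CVS uses. This is an illustration under simplifying assumptions (for example, it lets adjacent but non-overlapping edits through without complaint), not the actual diff3 algorithm used by real revision control systems, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def _hunks(base, other, side):
    """Edits turning base into other, as (start, end, new_lines, side) tuples:
    base[start:end] is replaced by new_lines."""
    sm = SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2], side)
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def _apply(base, start, end, hunks):
    """Apply one side's hunks (sorted by start) to the region base[start:end]."""
    out, pos = [], start
    for s, e, new, _ in hunks:
        out.extend(base[pos:s])
        out.extend(new)
        pos = e
    out.extend(base[pos:end])
    return out


def merge3(base, mine, theirs):
    """Merge two edited copies of base. Returns (merged_lines, number_of_conflicts)."""
    changes = sorted(_hunks(base, mine, 0) + _hunks(base, theirs, 1),
                     key=lambda h: (h[0], h[1]))
    merged, conflicts, pos, k = [], 0, 0, 0
    while k < len(changes):
        start, end = changes[k][0], changes[k][1]
        group = [changes[k]]
        k += 1
        # Grow the region while the next hunk overlaps it (or starts at the
        # same base line, e.g. two insertions at the same spot).
        while k < len(changes) and (changes[k][0] < end or changes[k][0] == start):
            group.append(changes[k])
            end = max(end, changes[k][1])
            k += 1
        merged.extend(base[pos:start])  # untouched lines before the region
        ours = _apply(base, start, end, [h for h in group if h[3] == 0])
        theirs_ = _apply(base, start, end, [h for h in group if h[3] == 1])
        sides = {h[3] for h in group}
        if sides == {0}:
            merged.extend(ours)       # only I changed this region
        elif sides == {1} or ours == theirs_:
            merged.extend(theirs_)    # only they changed it, or we made the same change
        else:
            conflicts += 1
            merged += ["<<<<<<< mine", *ours, "=======", *theirs_, ">>>>>>> theirs"]
        pos = end
    merged.extend(base[pos:])
    return merged, conflicts
```

Applied to the bibliography example (split each file into lines with splitlines()), with A adding one entry and B adding another in a different place, merge3 returns a file containing both entries and reports zero conflicts. Only when A and B change the same entry differently does a human have to step in.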
Transfer Protocols
Any solution will require sending files back and forth between users, or between users and a central server. Two ways of transferring files to and from a server are useful in this context.
- WebDAV
- WebDAV is a relatively new protocol that extends HTTP (the protocol used to transfer web pages). It is the protocol behind Windows' "Web Folders" feature; on Windows, Mac OS, and Linux it is possible to "mount" a WebDAV server so that it appears as if it were a local directory/folder. It can also be accessed by standalone client programs. Some file synchronization programs, such as sitecopy, support it, and some free storage services also allow access through it. If the server also supports the DeltaV versioning extensions to WebDAV, files on the server can be put under version control: every change to a file on the server then results in a separate version of the file, and previous versions remain accessible. Note, however, that HTTP transfers file contents unchanged; it does not by itself convert text files between Unix and Windows line endings, so (unlike FTP in ASCII mode) any conversion depends on the client program.
- ssh
- ssh (secure shell) is a way to access a remote server over a secure, authenticated connection. Many revision control programs, as well as the file synchronization tool unison, use it for remote access. To transfer files over ssh, you need a server that supports it and an account on that server for everyone who needs access. Free storage services generally don't support it.
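Under the hood, WebDAV clients talk to the server using extra HTTP methods. For instance, listing a folder is a PROPFIND request, which the server answers with a 207 Multi-Status XML document. Here is a minimal sketch using only Python's standard library; the host, path, and credentials are placeholders you would supply, and it assumes the server accepts HTTP Basic authentication over HTTPS.

```python
import base64
import http.client
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree's notation for the "DAV:" XML namespace

# Ask only for two properties of each item in the folder.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:propfind xmlns:D="DAV:">'
    b"<D:prop><D:getlastmodified/><D:getcontentlength/></D:prop>"
    b"</D:propfind>"
)


def propfind(host: str, path: str, user: str, password: str) -> bytes:
    """List a WebDAV folder (Depth: 1) over HTTPS; return the raw XML reply."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("PROPFIND", path, body=PROPFIND_BODY, headers={
            "Depth": "1",  # the folder itself plus its immediate children
            "Content-Type": "application/xml; charset=utf-8",
            "Authorization": "Basic " + token,
        })
        resp = conn.getresponse()
        data = resp.read()
    finally:
        conn.close()
    if resp.status != 207:  # 207 Multi-Status signals success for PROPFIND
        raise RuntimeError(f"PROPFIND failed: {resp.status} {resp.reason}")
    return data


def last_modified(multistatus: bytes) -> dict:
    """Map each href in a 207 Multi-Status reply to its getlastmodified value."""
    root = ET.fromstring(multistatus)
    return {
        r.findtext(DAV + "href"): r.findtext(".//" + DAV + "getlastmodified")
        for r in root.iter(DAV + "response")
    }
```

Something like last_modified(propfind("dav.example.edu", "/book/", user, password)), with a made-up host name, would give you the modification date of every file in the folder: the kind of information a synchronization tool needs to decide what to transfer.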
Online Storage Services
- Online storage services comparison
- freepository provides free CVS repositories (300MB), with access via a web-based interface and via (cross-platform) CVS clients.
- box.net offers 1GB of free space and WebDAV access, but collaboration features are not included in the free accounts.
- sharemation/xythos has free accounts with web-based and WebDAV access (with the DeltaV versioning extensions), but only 5MB of space.
A Low-rent Solution
A low-cost option: use the unison bi-directional file synchronization software with a star-like topology (a central server that everyone syncs to) to keep everyone up-to-date. It works cross-platform, has a graphical user interface, and detects conflicts; it can also merge conflicting changes if you configure it to call an external merge program (such as diff3). Drawbacks:
- No versioning (but it can keep a predefined number of backups).
- No checkin/checkout/locking, so there's no control over who gets to edit a file at any given time (but with a merge program configured, unison can do automagic merging, so if two people edit a file simultaneously and then sync, all non-overlapping changes should be reflected in the central version).
- No way to see who made which changes, or when.
- You need ssh access to the server, so you can't do this with a free service like box.net. However, if you can make a WebDAV server appear as a local drive or directory, you could use unison to sync between two local directories. On Linux, you can do this using the davfs2 file system (which may require re-compiling your kernel); on Windows, there are a couple of programs that map WebDAV sites to drive letters, such as Xythos Drive or WebDrive (not free, but check with your university: many schools offer WebDAV-based file storage services and have site licenses for such software).
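How does a bi-directional synchronizer like unison know which way to copy a file, and when to flag a conflict? The basic idea is that it remembers the state of each file at the last successful sync (its "archive") and compares both replicas against that. Here is a simplified Python sketch of that decision for the star topology, where each collaborator syncs with the central server. The names are hypothetical, files are represented by a content hash (None meaning "doesn't exist"), and real unison tracks considerably more (permissions, modification times, directories).

```python
from enum import Enum


class Action(Enum):
    NOTHING = "nothing to do"
    PUSH = "copy local -> server"   # also covers propagating a local deletion
    PULL = "copy server -> local"   # also covers propagating a server deletion
    CONFLICT = "conflict"


def decide(archive, local, server):
    """Decide what to do with one file, given its hash at the last sync
    (archive) and now on each side; None means the file does not exist."""
    local_changed = local != archive
    server_changed = server != archive
    if not local_changed and not server_changed:
        return Action.NOTHING
    if local_changed and not server_changed:
        return Action.PUSH
    if server_changed and not local_changed:
        return Action.PULL
    # Both sides changed since the last sync.
    if local == server:
        return Action.NOTHING  # the same change was made on both sides
    return Action.CONFLICT


def plan(archive: dict, local: dict, server: dict) -> dict:
    """Apply decide() to every file name known on either side or in the archive."""
    names = set(archive) | set(local) | set(server)
    return {n: decide(archive.get(n), local.get(n), server.get(n))
            for n in sorted(names)}
```

In the bibliography example, A and B would each sync in turn: A's change gets pushed, but when B syncs, the file has changed both locally and on the server, so B gets a conflict instead of silently overwriting A's entry. That is the point where a configured merge program takes over.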
Using WebDAV with Versioning
Use a WebDAV server that supports the DeltaV versioning extensions. With the right clients, you can access the server and keep a local copy of the files. Use the client to check out, check in, and lock the files you're working on. Anyone with the right access rights can access the files through Windows Explorer, Finder (Mac), or Nautilus (Gnome); anyone with a DeltaV-aware WebDAV client can read and write them while respecting file locks.
For this solution, you need a WebDAV server with versioning enabled. The only free services I've found that offer this are sharemation (5MB disk space) and BSCW (10MB disk space). Versioning PDFs will quickly eat this up, but in WebDAV you can decide which files to put under version control and which ones not. In addition, both sharemation and BSCW have nice web-based interfaces; BSCW even lets you add things like discussions and polls to your folders. Drawbacks:
- If you want more disk space, you have to run your own server. Xythos, the software behind sharemation, is commercial, and so is BSCW, though the BSCW license is free for non-commercial use.
- WebDAV does versioning, but it doesn't deal with conflicts. If A and B (in the example above) both upload their changed versions to the WebDAV server, it will result in two versions of the file, but neither contains all the edits.
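Locking is what prevents the A-and-B problem in this setup: while A holds the lock, B's client can't write the file. For illustration, here is a sketch in Python of the check-out/edit/check-in cycle a DeltaV-aware client performs. The methods LOCK and UNLOCK come from WebDAV, and CHECKOUT and CHECKIN from DeltaV. It assumes the file is already under version control on a server that supports both; the host, path, and auth headers are placeholders, and the function names are hypothetical. In practice your client does all this for you.

```python
import http.client

# Request body for an exclusive write lock (WebDAV LOCK).
LOCKINFO = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:lockinfo xmlns:D="DAV:">'
    b"<D:lockscope><D:exclusive/></D:lockscope>"
    b"<D:locktype><D:write/></D:locktype>"
    b"</D:lockinfo>"
)


def edit_steps(lock_token: str, new_content: bytes):
    """Requests that store a new version of a file we hold a lock on,
    as (method, extra_headers, body) tuples."""
    submit = {"If": f"(<{lock_token}>)"}  # proves to the server we hold the lock
    return [
        ("CHECKOUT", submit, None),  # DeltaV: make the file writable
        ("PUT", {**submit, "Content-Type": "application/octet-stream"}, new_content),
        ("CHECKIN", submit, None),   # DeltaV: freeze the upload as a new version
        ("UNLOCK", {"Lock-Token": f"<{lock_token}>"}, None),
    ]


def save_new_version(host: str, path: str, new_content: bytes, auth: dict) -> None:
    """Lock, check out, upload, check in, unlock. If a step fails, the lock
    simply expires after the requested timeout (600 seconds here)."""
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("LOCK", path, body=LOCKINFO, headers={
            **auth, "Depth": "0", "Timeout": "Second-600",
            "Content-Type": "application/xml; charset=utf-8"})
        resp = conn.getresponse()
        resp.read()
        if resp.status != 200:
            raise RuntimeError(f"LOCK failed: {resp.status} {resp.reason}")
        # The server returns the token as "<opaquelocktoken:...>".
        token = resp.getheader("Lock-Token", "").strip("<>")
        for method, extra, body in edit_steps(token, new_content):
            conn.request(method, path, body=body, headers={**auth, **extra})
            resp = conn.getresponse()
            resp.read()
            if resp.status >= 300:
                raise RuntimeError(f"{method} failed: {resp.status} {resp.reason}")
    finally:
        conn.close()
```

Note that the lock only protects you if everyone's client actually takes it; someone uploading through a plain web folder that ignores locks can still end up with the two-versions problem described above.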
Using a Revision Control System
The cleanest solution is to use a revision control system. Most of these, however, are difficult to set up, don't have graphical user interfaces, don't run on all platforms, or require a separate server.
A passable solution is to use CVS and a free CVS repository. You can get a free 300MB CVS repository at freepository.com. It has a web interface, so you can access the repository from anywhere and any platform, and it lets you add other users to your repository. It supports all the CVS niceties (versioning, branches, diffs, etc.), and you can access it with a CVS client as well. freepository, however, uses the sserver CVS extension to access the server over an SSL connection, and that's not supported out of the box by all CVS clients. They do provide precompiled binaries for a number of Windows and Linux clients, though. CVSNT now supports sserver natively, and since CVSNT also runs under OS X, Mac users are covered as well. Two problems:
- The web interface lets you update a file, and this will create a new revision, but if someone else checked in something between the time you downloaded the file and updated it with your changes, their changes will be overwritten (i.e., no automagic merging). Conflicts are only resolved if you use CVS to update/checkin your files.
- CVS handles binary files (e.g., PDFs) poorly: instead of storing only the changes, each update results in a new copy of the entire file (one for each revision). That may eat up your 300MB on the free account pretty fast.
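The difference between the web interface and a real CVS commit comes down to one check: CVS refuses a commit if the repository has moved on since you last updated (its "Up-to-date check failed" error), forcing you to run cvs update first, which merges the other person's changes into your working copy. A toy Python model of a single file shows why the web upload loses edits and the commit doesn't. The class and function names are hypothetical, and this is not how CVS actually stores anything.

```python
class Repository:
    """Toy single-file repository contrasting a blind web upload with a
    CVS-style commit that refuses to overwrite revisions you haven't seen."""

    def __init__(self, text: str):
        self.revisions = [text]  # revision n is self.revisions[n - 1]

    @property
    def head(self) -> int:
        return len(self.revisions)

    def checkout(self):
        """Return (revision number, text) of the current head."""
        return self.head, self.revisions[-1]

    def web_upload(self, text: str) -> int:
        """Like the web interface: store a new revision unconditionally."""
        self.revisions.append(text)
        return self.head

    def commit(self, base_rev: int, text: str) -> int:
        """Like 'cvs commit': accept the change only if it is based on the head."""
        if base_rev != self.head:
            raise ValueError("Up-to-date check failed: update and merge first")
        self.revisions.append(text)
        return self.head


def demo():
    # Web interface: B's upload silently discards A's entry.
    repo = Repository("@book{base}\n")
    _, text_a = repo.checkout()
    _, text_b = repo.checkout()
    repo.web_upload(text_a + "@book{a}\n")
    repo.web_upload(text_b + "@book{b}\n")
    lost_update = "@book{a}" not in repo.checkout()[1]

    # CVS-style commit: B's stale commit is refused instead.
    repo = Repository("@book{base}\n")
    rev_a, text_a = repo.checkout()
    rev_b, text_b = repo.checkout()
    repo.commit(rev_a, text_a + "@book{a}\n")
    try:
        repo.commit(rev_b, text_b + "@book{b}\n")
        rejected = False
    except ValueError:
        rejected = True
    return lost_update, rejected
```

demo() returns (True, True): with blind uploads, A's entry is gone; with the up-to-date check, B's stale commit is refused rather than overwriting A's work.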
A Combined Solution
Suppose you decide to use a revision control system, but can't use the hosted services mentioned above (freepository for CVS, or Xythos or BSCW with their web front ends), e.g., because the disk quotas are too small, or because you don't want to use CVS, which is being superseded by newer and fancier revision control systems like Subversion or Bazaar. If you have a cooperative IT department, they might let you install that system on your university's server; or perhaps they are already running one. Now you still have to solve the problem that the non-geeky collaborators, who won't run the newfangled software to keep their tree synchronized, don't have easy access to the current files. (Not all revision control systems have web interfaces; even for those that do, installing the interface requires additional work.)
Well, the geeks who can deal with installing command-line programs and actually work on the LaTeX can use the revision control system to keep their files in sync. One of them then additionally keeps their source tree synced with a box.net account (using some utility that does uni-directional syncing over WebDAV, such as sitecopy). The people who don't need write access, but would like to see the most current versions of the files, can access them through the box.net account. You exclude binaries (e.g., PDFs) from the revision control system, but the most current versions of the PDFs get mirrored to the box.net account.
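Uni-directional syncing is much simpler than the two-way kind: the local tree always wins, so all you need is a record of what was uploaded last time. Here is a rough Python sketch of how a tool in the spirit of sitecopy could decide what to send to the box.net mirror. The function names and the excluded extensions are assumptions, and the actual uploads and deletions (WebDAV PUT and DELETE requests) are left out.

```python
import hashlib
import os


def snapshot(root: str, exclude_exts=(".aux", ".log")) -> dict:
    """Map each file path (relative to root, '/'-separated) to a SHA-256 of its contents.
    PDFs are deliberately included, even though they're kept out of revision control."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip revision control metadata (CVS/, .svn/, .bzr/, ...).
        dirnames[:] = [d for d in dirnames if not d.startswith(".") and d != "CVS"]
        for name in filenames:
            if name.endswith(exclude_exts):
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                state[rel] = hashlib.sha256(f.read()).hexdigest()
    return state


def mirror_plan(last_uploaded: dict, current: dict):
    """One-way sync: what to upload and delete so the mirror matches the local tree.
    Changes made on the mirror side are never copied back."""
    uploads = sorted(p for p, h in current.items() if last_uploaded.get(p) != h)
    deletes = sorted(p for p in last_uploaded if p not in current)
    return uploads, deletes
```

After a successful upload, you'd save the current snapshot as the new last_uploaded record, so the next run only sends what changed since then.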