On collaborative (remote) paper writing

Writing scientific conference or journal papers is an art in itself. This article is not about writing great papers, as there are already many good articles that cover the technical, structural, and stylistic aspects of paper writing. Instead, I want to give an overview of collaborative writing and share some experiences from my last few scientific paper submissions.

Simple collaborative writing works in a sequential mode: one party writes while another gives feedback or makes slight changes to the paper. This model fits the adviser and grad student relationship fairly well, where the grad student produces individual drafts of a paper and the adviser gives feedback on each draft. The draft can start off with a vague description of the actual research project, and many research questions might only be answered along the way, yet the model is very simple and easy to coordinate: either push based, where the student explicitly asks the adviser for feedback, or pull based, where the adviser periodically polls the student for progress.

But as soon as (i) more people join the project, (ii) not all team members are at the same physical location (i.e., all-hands meetings are no longer possible, or at least no longer easy), or (iii) the collaboration becomes more interactive, things get more complicated. Some of the questions that arise are:

* How can we write concurrently on different sections?
* How can we coordinate a common goal or structural changes?
* How do we stay focused and on track?

In my (limited and still very short) experience, it makes sense to split the collaborative writing process into three phases: (i) the brainstorming and research phase, while the project is still very volatile, (ii) the distributed paper writing phase, where the key points are shaped, and (iii) the freeze phase, where individual sections are finished and closed off before the final submission.

Brainstorming and research phase

During this phase the project is still fresh and very volatile, and not all research questions have to be fully defined yet. A shared document (a wiki page or a shared Google document) is crucial to fast-track this process, as it offers a convenient place to write down notes and a rough design section of the project. An additional section can cover related work and the key differences between the current project and other related projects. At this point it does not make sense to write the abstract, introduction, or conclusion yet, as the direction of the project might still change.

As time progresses, the first evaluation results can be added in their own section. This is also a good time to hold weekly (or bi-weekly) status meetings, via phone, Skype/Hangouts, or email. This phase lasts until a couple of weeks before the submission deadline, and the evaluation results should improve continuously along the way.

Distributed paper writing phase

The "hot" paper writing phase starts a couple of weeks (at least 2, ideally 3-5) before the paper deadline. The existing text is moved from the wiki page (or Google document) into a source repository and formatted according to the submission guidelines of the conference (or journal).

To produce a great final paper, I propose a weekly build cycle, but you can of course adjust the time for each sub-task to your needs. Each cycle consists of the following sub-tasks: (i) 4 days of concurrent modification with (sub-)section-level locks, (ii) a 1-day reading pass, and (iii) a request for external feedback.
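To make the cycle concrete, here is a minimal Python sketch that lays out weekly build cycles between the start of the writing phase and the deadline. The sub-task lengths (4 days of modification, 1 day of reading), the assumption that drafts go out for external feedback on the day after the reading pass, the two-day freeze window (matching the freeze phase at the end of this article), and the function name `plan_cycles` are all illustrative assumptions, not a fixed rule.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Assumed sub-task lengths; adjust them to your team's needs.
MODIFY_DAYS = 4   # concurrent modification with section-level locks
READING_DAYS = 1  # synchronized reading pass


def plan_cycles(start: date, deadline: date,
                freeze_days: int = 2) -> List[Tuple[date, date, date]]:
    """Return (modify_start, reading_day, drafts_out) for each weekly cycle.

    A cycle is only planned if its drafts go out no later than the day
    freezing starts (`freeze_days` days before the deadline).
    """
    freeze_start = deadline - timedelta(days=freeze_days)
    cycles = []
    cycle_start = start
    while True:
        reading_day = cycle_start + timedelta(days=MODIFY_DAYS)
        # Drafts are sent out for external feedback after the reading pass.
        drafts_out = reading_day + timedelta(days=READING_DAYS)
        if drafts_out > freeze_start:
            break
        cycles.append((cycle_start, reading_day, drafts_out))
        cycle_start += timedelta(weeks=1)
    return cycles


if __name__ == "__main__":
    # Made-up example dates.
    for cycle in plan_cycles(date(2024, 3, 4), date(2024, 4, 5)):
        print(cycle)
```

With these made-up dates, the plan contains four full cycles; a fifth cycle would only finish after freezing has started, so it is dropped.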

Concurrent modification

During this task, all team members collaborate on the paper and update individual sections concurrently. At this point the paper and all its files are already in a version control system. Most of these systems, like git or svn, handle concurrent changes to different parts of a file fairly well, even in monolithic files: if person A changes the first section while person B changes the second section, the changes usually merge automatically. The choice between a single monolithic file and per-section files is almost religious (just like vim versus emacs), but I prefer monolithic files because, e.g., text is simple to move around and global string replacement is easy.

But as soon as two people change the same section concurrently, conflicts become very hard to resolve. To reduce the risk of conflicts, it helps to use explicit locks: only the person who currently holds the lock for a specific section is allowed to edit that section. Depending on the size of the team, different lock strategies are possible. For small teams it can be advantageous to push lock information by sending explicit emails to all team members. For larger teams the number of emails explodes, and it becomes confusing who currently holds which locks. The wiki page (or Google document) from the first phase comes to the rescue again: the shared document can track, per section, who holds which lock, and if all team members acquire and release their locks as they work on their sections, the risk of conflicts is largely eliminated. The shared document can also hold a queue of who needs a lock next once it is released; depending on the protocol, the releasing team member can then send a direct push email to the person who continues on that section.
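The lock protocol can be modeled as a small data structure: one lock per section, plus a queue of people waiting for it. The following Python sketch is one possible reading of that protocol, not a tool the workflow requires. The class and method names are hypothetical, releasing a lock hands it directly to the next person in the queue (an assumption; a team could also let the next person acquire it explicitly), and the `notifications` list merely stands in for the push email.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class SectionLock:
    holder: Optional[str] = None                      # who may edit right now
    queue: Deque[str] = field(default_factory=deque)  # who is waiting, in order


class LockTable:
    """Per-section locks with wait queues, mirroring the shared lock document."""

    def __init__(self, sections: List[str]) -> None:
        self.locks: Dict[str, SectionLock] = {s: SectionLock() for s in sections}
        self.notifications: List[str] = []  # stand-in for the push email

    def acquire(self, section: str, person: str) -> bool:
        """Take the lock if it is free (or already ours); otherwise queue up."""
        lock = self.locks[section]
        if lock.holder in (None, person):
            lock.holder = person
            return True
        if person not in lock.queue:
            lock.queue.append(person)
        return False

    def release(self, section: str, person: str) -> Optional[str]:
        """Release the lock and hand it to the next person in the queue, if any."""
        lock = self.locks[section]
        if lock.holder != person:
            raise ValueError(f"{person} does not hold the lock for {section!r}")
        lock.holder = lock.queue.popleft() if lock.queue else None
        if lock.holder is not None:
            self.notifications.append(f"to {lock.holder}: you now hold {section!r}")
        return lock.holder

    def can_edit(self, section: str, person: str) -> bool:
        return self.locks[section].holder == person


if __name__ == "__main__":
    table = LockTable(["Introduction", "Design", "Evaluation"])
    table.acquire("Design", "alice")   # alice gets the lock
    table.acquire("Design", "bob")     # bob is queued behind alice
    table.release("Design", "alice")   # the lock passes to bob
    print(table.notifications)
```

In this example, bob is queued while alice works on the design section; when alice releases it, bob becomes the holder and a notification for him is recorded, which is exactly the point where a real team member would send the push email.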

Bookkeeping and full passes

Every couple of days it makes sense to temporarily freeze all sections for a synchronized reading pass, in which team members concurrently read the paper from beginning to end. Only two kinds of changes are allowed in this pass: (i) fixes to typos, phrasing, and wording, and (ii) todo notes added to individual sections, which the team member responsible for that section then handles in the next concurrent modification pass.

After the reading pass, each team member adds a list of remaining tasks to the shared document, together with open questions to discuss during the next meeting.
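For a monolithic LaTeX file, the todo notes from a reading pass can be collected per section owner with a short script. This Python sketch assumes that notes are written with the todonotes package's `\todo` command (with or without an optional argument), that sections start with `\section`, and that the owner of each section is copied from the shared lock document into a plain dictionary. The function name `collect_todos` is hypothetical, and the regular expressions do not handle nested braces.

```python
import re
from collections import defaultdict
from typing import Dict, List

# \section{Title} or \section*{Title}; does not match \subsection.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")
# \todo{text} or \todo[options]{text}; nested braces are not supported.
TODO_RE = re.compile(r"\\todo(?:\[[^\]]*\])?\{([^}]*)\}")


def collect_todos(tex: str, owners: Dict[str, str]) -> Dict[str, List[str]]:
    """Group todo notes by the owner of the section they appear in."""
    per_owner: Dict[str, List[str]] = defaultdict(list)
    section = "(before first section)"
    for line in tex.splitlines():
        match = SECTION_RE.search(line)
        if match:
            section = match.group(1)
        for note in TODO_RE.findall(line):
            # Sections without a listed owner are reported as "unassigned".
            owner = owners.get(section, "unassigned")
            per_owner[owner].append(f"[{section}] {note}")
    return dict(per_owner)
```

Subsections are attributed to their enclosing section, which matches locking at the section level; teams that lock subsections would need a finer-grained pattern.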

External reviews

At the end of each bookkeeping pass, it makes sense to generate a draft and send it to (i) advisers or more remote team members, so that they can give feedback on the progress as well, (ii) friends who have never read the paper, to get valuable first-hand comments, and, as soon as the paper is mature enough, (iii) external reviewers who can give detailed and harsh feedback.

Depending on the number of passes you can do (i.e., how many weeks you have left until the deadline), you can spread your friends out and send different drafts to different sets of friends. Remember, though, that you should not overburden your colleagues, as they might be working towards the same deadline (and always return the favor by reviewing their papers as well). Also keep in mind that each person can read the paper for the first time only once. This matters a lot, because a potential reviewer will read the paper as a first-time reader without any prior knowledge of the idea, design, or other background information.
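The rule that each person can read the paper for the first time only once translates into a simple planning constraint: never send a later draft to someone who has already read an earlier one. The Python sketch below, with a hypothetical function name and made-up example names, gives each draft round its own batch of fresh readers and skips friends who are busy with the same deadline.

```python
from typing import Dict, Iterable, List


def assign_first_readers(friends: List[str], rounds: int, per_round: int,
                         busy: Iterable[str] = ()) -> Dict[int, List[str]]:
    """Give each draft round its own batch of first-time readers.

    Nobody is reused across rounds, because a person can read the paper for
    the first time only once. Friends in `busy` (e.g., working towards the
    same deadline) are skipped entirely. The last batch may be smaller than
    `per_round`.
    """
    skip = set(busy)
    available = [f for f in friends if f not in skip]
    plan: Dict[int, List[str]] = {}
    for round_no in range(1, rounds + 1):
        batch, available = available[:per_round], available[per_round:]
        if not batch:
            break  # out of fresh readers: stop rather than reuse anyone
        plan[round_no] = batch
    return plan
```

For example, with five friends, two readers per round, and one friend busy with the same deadline, only two of three planned rounds get fresh readers; the function stops instead of reusing someone who already knows the paper.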

The feedback from this task is then applied during the next concurrent modification pass. Individual reviews will trickle in asynchronously and can be discussed on demand.

Freeze phase

One to two days before the deadline, the lead author should start freezing sections by marking them as frozen in the shared locking document. Only the lead author may change text in a frozen section, and only after a second person has reviewed the change.
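The freeze rule amounts to a simple check on every proposed change. In this minimal Python sketch, the `frozen` set mirrors the markings in the shared locking document, the function and parameter names are hypothetical, and changes to sections that are not frozen are left to the regular lock protocol.

```python
from typing import Optional, Set


def may_change(section: str, author: str, frozen: Set[str],
               lead_author: str, reviewer: Optional[str] = None) -> bool:
    """Check a proposed change against the freeze rule."""
    if section not in frozen:
        return True  # not frozen: the regular section locks apply instead
    # Frozen: only the lead author may edit, and a second person must review.
    return (author == lead_author
            and reviewer is not None
            and reviewer != author)
```

In practice this is a checklist item rather than software, but spelling it out makes both conditions explicit: the change comes from the lead author, and someone other than the author has reviewed it.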

The last hours before the deadline are always very stressful, and human mistakes tend to pile up. Freezing sections and requiring a two-person review reduces the risk of such errors and makes the submission process smoother. Also remember to submit intermediate versions of the paper after every couple of changes; it is best to start submitting the first versions as soon as you start freezing sections.

Conclusion

Writing papers is fun, and writing in a big remote collaboration can be fun too! Collaboration and teamwork always bring additional challenges, but they are manageable if you prepare well and are willing to adhere to a strict regimen: (i) synchronization is crucial, and you should know what the other team members are doing, (ii) lock pages only work if people actually keep track of their locks, and (iii) you must keep track of the schedule for sending out drafts for external review. So enjoy your next collaborative research project!
