Writing scientific conference or journal papers is an art in itself. This
article is not about writing great papers, as there are already many good
articles that focus on paper writing itself and cover technical, structural,
or stylistic aspects. In this article I want to give an overview of
collaborative writing and share some experiences from my last couple of
scientific paper submissions.
Simple collaborative writing starts in a sequential mode where one party is
always writing and another party gives feedback or makes slight changes to the
paper. This simple model fits the adviser and grad student relationship fairly
well: the grad student produces individual drafts of a paper and the adviser
gives feedback on each draft. The draft can start off with a vague description
of the actual research project, and many research questions might only be
answered along the way. Still, the model is very simple and easy to coordinate:
either push based, where the student explicitly asks for feedback, or pull
based, where the adviser periodically polls the student for progress.
But it gets more complicated as soon as (i) more people join the project,
(ii) not all team members are at the same physical location (i.e., all-hands
meetings are no longer possible or at least no longer easy), or (iii) the
collaboration becomes more interactive. Some of the questions that arise are:
* How can we write concurrently on different sections?
* How can we coordinate a common goal or structural changes?
* How do we stay focused and on track?
In my (limited and still very short) experience it makes sense to group the
collaborative writing process into three phases: (i) the brainstorming and
research phase when the project is still very volatile, (ii) the distributed
paper writing phase where the key points are shaped, and (iii) the freeze phase
where individual sections are finished and closed off before the final
submission.
Brainstorming and research phase
During this phase the project is still fresh and very volatile: not all research
questions have to be fully defined yet. A shared document (a wiki page or a
shared Google document) is crucial to fast-track this process, as it offers a
convenient way to write down notes and a rough design section of the project.
An additional section can cover related work and the key differences between
the current project and related projects. At this point it does not make sense
to write the abstract, introduction, or conclusion yet, as the direction of the
project might still change.
As time progresses, the first results of the evaluation can be added in their
own section. This is also a great time to hold weekly (or bi-weekly) status
meetings, either via phone call, Skype/Hangout, or email. This phase continues
until a couple of weeks before the submission deadline, and the results in the
evaluation should improve continuously along the way.
Distributed paper writing phase
The "hot" paper writing phase starts a couple of weeks (at least 2, ideally 3-5)
before the paper deadline. The existing text is moved from the wiki page (or
Google document) into a source repository and formatted according to the
submission guidelines of the conference (or journal).
To produce a great final paper I propose a weekly build cycle, but you can
obviously adjust the time for each sub task to your needs. Each cycle consists
of the following sub tasks: (i) 4 days of concurrent modifications with
(sub-)section-level locks, (ii) a 1-day reading pass, and (iii) a request for
external feedback.
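As a rough illustration of how such a cycle could be laid out on a calendar,
here is a minimal sketch in Python. The function name plan_cycles, the dates,
and the 2-day window for external feedback are assumptions made for this
example (the text only fixes the 4 days and the 1 day), so treat it as one
possible reading and not as a fixed recipe.

```python
from datetime import date, timedelta

# (sub task, length in days). The 4 and 1 come from the cycle described above;
# the 2-day feedback window is an assumption that fills the rest of the week.
CYCLE = [
    ("concurrent modification", 4),
    ("reading pass", 1),
    ("external feedback", 2),
]


def plan_cycles(start, deadline):
    """Return (cycle, sub task, first day, last day) for every full cycle
    that ends before the deadline."""
    cycle_length = sum(days for _, days in CYCLE)
    schedule = []
    day, cycle = start, 1
    while day + timedelta(days=cycle_length) <= deadline:
        for task, days in CYCLE:
            last = day + timedelta(days=days - 1)
            schedule.append((cycle, task, day, last))
            day = last + timedelta(days=1)
        cycle += 1
    return schedule


# Hypothetical dates: writing starts three weeks before the deadline.
for cycle, task, first, last in plan_cycles(date(2024, 3, 4), date(2024, 3, 25)):
    print(f"cycle {cycle}: {task:<24} {first} to {last}")
```

Adjusting the day counts in CYCLE is the equivalent of adapting the time for
each sub task to your own needs.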
Concurrent modification
During this task all team members collaborate on the paper and update
individual sections of the paper concurrently. The paper and all files are
already in a revision control system. Most of these systems, such as git or
svn, handle partial concurrent textual changes and simple conflicts fairly
well, even in monolithic files (e.g., person A changes the first section while
person B changes the second section concurrently). The discussion of using a
single monolithic file versus per-section files is almost religious (just like
vim versus emacs), but I like monolithic files because of, e.g., the simplicity
of moving text around and the ease of global string replacement.
But as soon as two people change the same section concurrently it becomes very
hard to resolve conflicts. To reduce the risk of conflicts it helps to use
explicit locks (i.e., only the person who currently holds the lock for a
specific section is allowed to edit that section). Depending on the size of the
team, different lock strategies are possible. For small teams it can be
advantageous to send explicit emails to all team members, thereby pushing
explicit lock information. For larger teams the number of emails explodes and
it becomes confusing who currently holds which locks. The wiki page (or Google
document) from the first phase comes to the rescue again: the shared document
can be used to keep track, on a per-section basis, of who holds which locks.
If all team members acquire and release their locks as they work on their
sections, the risk of conflicts is largely eliminated. The shared document
also allows a queue of people who are waiting for a lock. Depending on the
protocol, the releasing team member can send a direct push email to the person
who continues on that section.
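The lock protocol above can be sketched in a few lines of Python. This is only
an in-memory stand-in for the table in the shared document, not a tool used in
practice: the class SectionLocks, its method names, and the people alice and
bob are all hypothetical.

```python
from collections import deque


class SectionLocks:
    """In-memory stand-in for the per-section lock table in the shared document."""

    def __init__(self, sections):
        self.holder = {section: None for section in sections}
        self.waiting = {section: deque() for section in sections}

    def acquire(self, section, person):
        """Grant the lock if the section is free, otherwise queue the request.

        Returns True if the person holds the lock after the call."""
        holder = self.holder[section]
        if holder is None:
            self.holder[section] = person
            return True
        if holder == person:
            return True
        if person not in self.waiting[section]:
            self.waiting[section].append(person)
        return False

    def release(self, section, person):
        """Release the lock and hand it to the next person in the queue.

        Returns the new holder (the person to notify), or None."""
        if self.holder[section] != person:
            raise PermissionError(f"{person} does not hold the lock on {section!r}")
        queue = self.waiting[section]
        self.holder[section] = queue.popleft() if queue else None
        return self.holder[section]

    def may_edit(self, section, person):
        """Only the current lock holder may edit a section."""
        return self.holder[section] == person


locks = SectionLocks(["design", "evaluation"])
locks.acquire("design", "alice")            # alice now holds the lock
locks.acquire("design", "bob")              # bob is queued behind alice
print(locks.release("design", "alice"))     # bob is next and should be notified
```

The return value of release corresponds to the direct push email: it names the
person who continues on that section.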
Bookkeeping and full passes
Every couple of days it makes sense to temporarily freeze all sections to allow
a synchronized reading pass. Team members concurrently read the paper from
beginning to end. In this pass there are only two kinds of changes allowed: (i)
typos, phrasing, and wording and (ii) adding todo notes to individual sections
that are then handled by the team member who is responsible for this section in
the next concurrent modification pass.
After the reading pass each team member adds a list of remaining tasks to the
shared document, along with open questions to be discussed during the next
meeting.
External reviews
At the end of each bookkeeping pass it makes sense to generate a draft and send
it to (i) advisers or more remote team members so that they can give feedback on
the progress as well, (ii) friends who have never read the paper, to get
valuable first-hand comments, and, as soon as the paper is mature enough,
(iii) external reviewers who can give detailed and harsh feedback.
Depending on the number of passes you can do (i.e., how many weeks you have left
until the deadline) you can spread out your friends and send different drafts
to different sets of friends. Remember two things. First, do not overburden
your colleagues, as they might be working towards the same deadline (and always
return the favor by reviewing papers for them as well). Second, each person can
read a paper for the first time only once. This is very important, as a
potential reviewer will read the paper as a first-time reader without any prior
knowledge about the idea, design, or other background information.
The feedback from this task is then applied during the next concurrent
modification task. Individual reviews will trickle in asynchronously and can be
discussed on demand.
Freeze phase
One to two days before the deadline the lead author should start to freeze
sections by marking sections as frozen in the shared locking document. A
frozen section indicates that only the lead author may change any text in this
section and only after a second person has reviewed this change.
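Written down as a check, the freeze rule could look like the following minimal
Python sketch. The function change_allowed, its parameters, and the names in
the example are hypothetical and exist only for illustration.

```python
def change_allowed(section, author, reviewer, frozen_sections, lead_author):
    """Check a proposed change against the freeze rule described above."""
    if section not in frozen_sections:
        # Unfrozen sections still follow the normal section lock protocol.
        return True
    # A review only counts if it comes from a second person.
    reviewed = reviewer is not None and reviewer != author
    return author == lead_author and reviewed


frozen = {"introduction"}
print(change_allowed("introduction", "alice", "bob", frozen, lead_author="alice"))
print(change_allowed("introduction", "bob", "alice", frozen, lead_author="alice"))
```

The first call is allowed because the lead author makes the change and a second
person has reviewed it; the second is rejected because the author is not the
lead author.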
The last hours before the deadline are always very stressful and human mistakes
tend to pile up. Freezing sections and requiring a two-person review reduces the
risk of human error and makes the submission process smoother. Also remember to
submit a version of the paper after every couple of changes; it is best to
start submitting the first versions when you start freezing sections.
Conclusion
Writing papers is fun, and writing in a big, remote collaboration can be fun
too! Collaboration and teamwork always bring additional challenges, but they
are manageable if you prepare well and are willing to adhere to a strict
regimen: (i) synchronization is crucial, so you should know what the other team
members are doing, (ii) lock pages only work if people actually keep track of
the locks, and (iii) you must keep track of the schedule to send out the drafts
for external reviews. So enjoy your next collaborative research project!