Tuesday, August 4, 2009

Tips for Writing Technical Papers

Jennifer Widom, January 2006


Here are the notes from a presentation I gave at the Stanford InfoLab Friday lunch, 1/27/06. The presentation covered:

Running Example

As a running (fictitious!) example, suppose you've designed and run experiments with a new algorithm for external multipass merge-sort. Your algorithm reduces the complexity from O(n log n) to O(n), under the premise that it's acceptable to have some bounded "unsortedness" in the result. You plan to write up the results for submission to a major conference.

Note: This example was used throughout the live presentation but I haven't followed through much in these notes. Thus, the notes include several exercises for the reader.

Paper Title

Titles can be long and descriptive, or short and sweet. Hector believes it's important for the paper (or at least the algorithm) to have a cute name that sticks in people's minds.

The Abstract

State the problem, your approach and solution, and the main contributions of the paper. Include little if any background and motivation. Be factual but comprehensive. The material in the abstract should not be repeated later word for word in the paper.

(Exercise: Write an abstract for the multiway sort example.)

The Introduction

Here is the Stanford InfoLab's patented five-point structure for Introductions. Unless there's a good argument against it, the Introduction should consist of five paragraphs answering the following five questions:
  1. What is the problem?
  2. Why is it interesting and important?
  3. Why is it hard? (E.g., why do naive approaches fail?)
  4. Why hasn't it been solved before? (Or, what's wrong with previous proposed solutions? How does mine differ?)
  5. What are the key components of my approach and results? Also include any specific limitations.

(Exercise: Answer these questions for the multiway sort example.)

Then have a final paragraph or subsection: "Summary of Contributions". It should list the major contributions in bullet form, mentioning in which sections they can be found. This material doubles as an outline of the rest of the paper, saving space and eliminating redundancy.

(Exercise: Write the bullet list for the multiway sort example.)

Related Work

The perennial question: Should related work be covered near the beginning of the paper or near the end?

The Body

Critical rule of thumb: A clear new important technical contribution should have been articulated by the time the reader finishes page 3 (i.e., a quarter of the way through the paper). Aside from this rule of thumb, which applies to every paper, the structure of the body varies a lot depending on content. Important components are:

Performance Experiments

We could have an entire treatise on this topic alone and I am surely not the expert. Here are some random thoughts:

The Conclusions

In general a short summarizing paragraph will do, and under no circumstances should the paragraph simply repeat material from the Abstract or Introduction. In some cases it's possible to now make the original claims more concrete, e.g., by referring to quantitative performance results.

Future Work

This material is important -- part of the value of a paper is showing how the work sets new research directions. I like bullet lists here. (Actually I like them in general.) A couple of things to keep in mind:

The Acknowledgements

Don't forget them or you'll have people with hurt feelings. Acknowledge anyone who contributed in any way: through discussions, feedback on drafts, implementation, etc. If in doubt about whether to include someone, include them.

Citations

Spend the effort to make all citations complete and consistent. Do not just copy random inconsistent BibTeX (or other) entries from the web and call it a day. Check over your final bibliography carefully and make sure every entry looks right.

Appendices

Appendices should contain detailed proofs and algorithms only. Appendices can be crucial for overlength papers, but are still useful otherwise. Think of appendices as random-access substantiation of underlying gory details. As a rule of thumb:

Grammar and Small-Scale Presentation Issues

In general everyone writing papers is strongly encouraged to read the short and very useful The Elements of Style by Strunk and White. Here's a random list of pet peeves.
  • Just like a program, all "variables" (terminology and notation) in the paper should be defined before being used, and should be defined only once. (Exception: Sometimes after a long hiatus it's useful to remind the reader of a definition.) Global definitions should be grouped into the Preliminaries section; other definitions should be given just before their first use.
  • Do not use "etc." unless the remaining items are completely obvious.
    • Acceptable: We shall number the phases 1, 3, 5, 7, etc.
    • Unacceptable: We measure performance factors such as volatility, scalability, etc.
  • Never say "for various reasons". (Example: We decided not to consider the alternative, for various reasons.) Tell the reader the reasons!
  • Avoid nonreferential use of "this", "that", "these", "it", and so on (Ullman pet peeve). Requiring explicit identification of what "this" refers to enforces clarity of writing. Here is a typical example of nonreferential "this": Our experiments test several different environments and the algorithm does well in some but not all of them. This is important because ...

    (Exercise: The above rule is violated at least once in this document. Find the violations.)

  • Italics are for definitions or quotes, not for emphasis (Gries pet peeve). Your writing should be constructed such that context alone provides sufficient emphasis.

    (Exercise: The above rule is violated at least once in this document. Find the violations.)

  • People frequently use "which" versus "that" incorrectly. "That" is defining; "which" is nondefining. Examples of correct use:
    • The algorithms that are easy to implement all run in linear time.
    • The algorithms, which are easy to implement, all run in linear time.
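
A few of the pet peeves above are mechanical enough that a script can flag candidates for a human to review. The following is a minimal sketch, not part of the original talk: the function name `find_peeves` and the three patterns are assumptions chosen for illustration. It flags every "etc." (including acceptable ones), every "for various reasons", and sentences that open with "This is", "That is", "These are", and similar. The author still decides whether each hit is a real problem.

```python
import re

# Hypothetical patterns, chosen for illustration only. Each one over-reports
# on purpose: the script proposes candidates, the author makes the call.
PEEVES = [
    ("vague 'etc.'", re.compile(r"\betc\.")),
    ("'for various reasons'",
     re.compile(r"\bfor various reasons\b", re.IGNORECASE)),
    ("possibly nonreferential 'This'",
     re.compile(r"(?:^|[.!?]\s+)(?:This|That|These|It) (?:is|are|was|were)\b")),
]


def find_peeves(text):
    """Return (line_number, label) pairs, one per suspicious line and pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PEEVES:
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

Calling `find_peeves` on the text of a draft returns the 1-based line numbers to revisit. The which/that rule is left out because it depends on meaning, which a regular expression cannot judge.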

Mechanics

Versions and Distribution
