Rules for Researchers

Mar 13, 2017

Below is a selection of rules that I have accumulated over time about academic production and output. I impose them on my students, but I have never before put them in one place.

Papers should be focused and to the point, and not begin with trite observations like "Congestion is a problem the world over." Usually you can delete your opening paragraph if it begins like that, and the reader is no worse off.
Strunk and White correctly say, "Omit needless words." Be merciless with the delete key. If you feel bad about that, cut and paste your precious words into a new document called "__Cuts.tex". You can later bring them back or use them elsewhere. Always ask why each word is in the document. If you cannot give a good answer, it is deletable.
Get an editor.
You should use a version control system for writing, so you can access earlier drafts as needed. Some software has this built in now.
The Abstract should not say the same thing as the Introduction or the Conclusion.
The Introduction should not say the same thing as the Conclusion.
A minimum standard for a good paper is transparency and replicability. Can the reader understand what you did, repeat it, and get the same answer?
Keep the equivalent of a lab notebook. For instance, when doing statistics, record each regression and each change to a variable definition.
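A lab notebook can be as simple as an append-only text file. The sketch below assumes Python and a JSON Lines file (one JSON record per line) and timestamps each regression or variable-definition change; the function names `log_entry` and `read_notebook` and the record fields are illustrative, not a standard format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_entry(notebook_path, kind, description, **details):
    """Append one timestamped record to a JSON Lines 'lab notebook' file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "kind": kind,                # e.g. "regression" or "variable_definition"
        "description": description,  # human-readable summary of what changed
        "details": details,          # free-form extras: formula, sample, notes
    }
    # Opening in append mode means earlier entries are never overwritten.
    with Path(notebook_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_notebook(notebook_path):
    """Return all records in the order they were written."""
    with Path(notebook_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A hypothetical session might call `log_entry("notebook.jsonl", "variable_definition", "t is door-to-door travel time in minutes")` and then `log_entry("notebook.jsonl", "regression", "Model 2: add income", formula="y ~ t + income")`, so the notebook shows which definition was in force when each model was run.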
Exploratory analysis is useful; it helps you understand the data. It is important to correlate and to graph your variables.
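As a minimal sketch of the correlating half of exploratory analysis (graphing still needs a plotting tool), the Python below computes pairwise Pearson correlations using only the standard library. The function names and the dictionary-of-columns input are assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant variable has no defined correlation.
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(variables):
    """Pairwise correlations for a dict mapping variable name -> list of values."""
    return {
        (a, b): pearson(variables[a], variables[b])
        for a, b in combinations(variables, 2)
    }
```

For example, `correlation_table({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})` gives a correlation of 1.0 for the pair `("x", "y")`. A correlation near zero does not rule out a strong nonlinear relationship, which is one reason to graph the variables as well.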
Formulate clear hypotheses based on your best understanding of how the universe works. If your data do not corroborate your hypotheses, you might consider alternative hypotheses. Nevertheless, do not discard your original hypotheses just because they were not corroborated.
Do not p-hack (or engage in other disreputable methods), intentionally or unintentionally. Statistically non-significant results are fine: they add to human knowledge, and avoiding them is not worth losing your ethics over. Maybe not everything gives you cancer. Non-significant results may make publication more difficult, which is unfortunate, but in the end, replicability is critical.
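A short worked example shows why testing many variables and reporting only the "hits" is a problem. Assuming k independent tests, each at significance level alpha, with every null hypothesis true, the chance of at least one spurious "significant" result is 1 - (1 - alpha)^k. The Python sketch below computes that and checks it by simulation; the function names are hypothetical.

```python
import random


def familywise_error_rate(alpha, k):
    """Chance of at least one 'significant' result among k independent tests
    when every null hypothesis is true: 1 - (1 - alpha)**k."""
    return 1 - (1 - alpha) ** k


def simulate_fishing(alpha, k, trials, seed=0):
    """Monte Carlo check. Under a true null, a continuous test's p-value is
    uniform on [0, 1], so draw k p-values per 'study' and count the studies
    in which at least one falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))
        for _ in range(trials)
    )
    return hits / trials
```

With alpha = 0.05 and k = 20, `familywise_error_rate(0.05, 20)` is about 0.64, so a fishing expedition across 20 unrelated variables will more often than not turn up something "significant", and `simulate_fishing(0.05, 20, 10000)` lands close to the same value.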
If you must report a (reduced) model with only significant results to satisfy a reviewer, also report the model with the variables that you tested that were insignificant (the complete model).
Every document (dissertation, thesis, report, paper) with more than a handful of variables shall have a table of nomenclature which includes each variable and its definition.
Each variable shall have one, and only one, definition per document.
Each defined term in the document shall be represented by one and only one variable.
Lowercase and uppercase versions of the same letter should be logically related. For instance, use lowercase letters for the PDF (probability density function) or an individual instance, and uppercase letters for the CDF (cumulative distribution function) or the population, so that when you sum, the index runs from i = 1 to I, k = 1 to K, and so on.
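Under this convention, a density f and its cumulative distribution F, or individual values x_i and a population total X, might be written as follows (the symbols are illustrative):

```latex
F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad
X = \sum_{i=1}^{I} x_i, \qquad
Y = \sum_{k=1}^{K} y_k
```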
All variables shall be a single letter or symbol. Double- or triple-letter variables can be confused with multiplication. If you have more than 52 symbols in your paper (26 letters in both lower and upper case), consider (a) whether there are too many, and (b) using Greek or Hebrew characters.
Use subscripts liberally to differentiate things that, for instance, are of a class but measured differently, or computed with different assumptions.
All equations shall have all of their variables defined.
All maps shall have legends and scales, and north shall be at the top (unless stated otherwise).
All units shall be metric (SI) units. Imperial units may be listed as alternates.
All graphs shall have their axes labeled, clearly, with units as appropriate.
Legends and scales shall be as consistent as possible between graphics, so that they can be compared.
Pseudo-3D graphs are morally wrong, and aim to deceive and obfuscate. Real 3D graphs with three axes are fine if you can make them readable on a static 2D page.
The use of Microsoft Word is forbidden. Consider using LaTeX instead. Stick to the same template for your writing in as many documents as possible; this will ease compilation and mixing and matching for future research projects. Many reports to sponsors become theses and journal articles, so it is convenient to be able to reuse the text with a minimum of reformatting.
Use a standard reference database, like BibDesk for BibTeX. Get your references from Google Scholar or similar to maintain naming conventions for references.
Use the same template for all your presentations, and use consistent fonts and styles, so you can mix and match slides with a minimum of grief. Beamer or Keynote are recommended.
When you submit a paper to a journal, put an event in your calendar to email the editor after 90 days to remind them to get the reviews back to you. Journals can be black holes.
The "reviewer is always right" even when they are wrong. You need to either suck it up and be obsequious and acknowledge their points, even if you don't agree with them, or send to another journal. Sending to another journal of course restarts the review clock.
Papers are almost never accepted in the first round. This is a sad fact, and it reflects less on you than on the academic process and the belief that nothing is perfect.
Good papers are often rejected. Bad papers are sometimes accepted.
Highly ranked journals do not necessarily have better papers than less highly ranked journals.
Impact factors and h-indices have embedded scale biases favoring large journals with lots of papers (given the non-normal distribution of citations per paper). At the level of the journal, they are mostly nonsense.
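A toy example illustrates the scale effect for the h-index. The Python sketch below assumes two hypothetical journals whose papers follow exactly the same skewed citation profile, one publishing ten times as many papers as the other. Mean citations per paper (roughly what an impact factor measures over its fixed window) are identical, but the larger journal's h-index is five times higher simply because it publishes more.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical per-paper citation profile: a few highly cited papers, a long tail.
profile = [40, 20, 10, 5, 3, 2, 1, 1, 0, 0]
small_journal = profile          # 10 papers
large_journal = profile * 10     # 100 papers, identical distribution

print(mean(small_journal), h_index(small_journal))   # 8.2 4
print(mean(large_journal), h_index(large_journal))   # 8.2 20
```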
In the end, science favors truth, so flashy but wrong claims will eventually be rebutted. In the meantime, those wrong claims will garner many citations. This does not justify knowingly being wrong.
Articles in hot, controversial areas garner more citations immediately than articles in new areas. In the long run, it is better to pioneer a new important field than to be the second or third or twentieth entrant in a crowded one.
Review papers garner lots of citations, and deservedly so. They are no less important for scientific progress than the papers they review; they help organize the knowledge to date and identify gaps, holes, areas of consensus, and areas of dispute.
If your paper is rejected, submit it somewhere else quickly, unless it genuinely was problematic. Papers under review are better than papers sitting on your computer.
Make an open access version of your publication available (either publish in an open access journal or place a copy of the paper in an open archive, such as a university conservancy, arXiv, or RePEc). This increases your citations, but more importantly, it increases the free flow of knowledge.
Make your data (and code) publicly available in an open data archive, unless you cannot because of privacy or ownership considerations. This includes documenting the data properly with metadata so that someone else (or you in a year) can properly use it.
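As a sketch of documenting data with metadata, the Python below writes a CSV file alongside a JSON "sidecar" that gives each column's definition and units, mirroring a table of nomenclature. The sidecar layout, its field names, and the function name are assumptions; established data archives may require their own metadata schema.

```python
import csv
import json
from pathlib import Path


def write_with_metadata(csv_path, rows, variables, description, license_name):
    """Write rows to CSV plus a JSON sidecar describing every column.

    variables maps column name -> dict of attributes, e.g.
    {"definition": "...", "units": "..."}; its order fixes the column order.
    """
    csv_path = Path(csv_path)
    columns = list(variables)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    meta = {
        "description": description,
        "license": license_name,
        "columns": [{"name": name, **variables[name]} for name in columns],
    }
    # trips.csv -> trips.meta.json, kept next to the data file.
    meta_path = csv_path.with_name(csv_path.stem + ".meta.json")
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta_path
```

A hypothetical call such as `write_with_metadata("trips.csv", rows, {"t": {"definition": "door-to-door travel time", "units": "min"}, "d": {"definition": "trip distance", "units": "km"}}, "Sample trip data", "CC-BY-4.0")` produces `trips.csv` and `trips.meta.json`, so someone else (or you in a year) can tell what `t` and `d` mean without the paper at hand.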
Have your own website, on a domain that you own, that follows you across jobs. Do not rely on proprietary sites. Do not rely on your employer or university. This website should link to all your products (papers, projects, code, models, data sets, etc.), which are, of course, open access and on the web. I like WordPress, but there are others.
Along with information management, time management is critical. In research, few deadlines are externally imposed. A fixed paper submission deadline may be the key to the success of conferences like the Transportation Research Board; the whole community organizes its workflow to satisfy the hard August 1 deadline. Unfortunately, many people require urgency before they act. Getting a paper out the door is rarely urgent in the absence of deadlines, and thus many papers become non-publications. The strategies in Getting Things Done are useful. This, more than intelligence, creativity, or even hard work, is the key to success.
