This blog post summarizes how to cite software used in research papers and how developers can set recommendations for how the software should be cited.
I recommend reading the Software Sustainability Institute’s blog post and Smith et al. (2016), “Software citation principles”, on citing software; both were very helpful.
The tradition and importance of citing others’ work is noticeable in academia, at least when it comes to journal articles. Vital open source software used to conduct the research is, however, not necessarily cited. Some fields have come further than others; a friend told me that the field of mathematics has started to classify papers by software. Outside of academia it’s not obvious that people cite the work they used in their projects. E.g., if a company uses open source software to build a product, it’s not a given that it credits the developers of that software.
The importance of citation includes:
(Note that some proprietary software licenses require that the software is cited.)
When using LaTeX it’s possible to automatically generate references from a `.bib` file.
There are two main ways to generate references: BibTeX and BibLaTeX. BibLaTeX is a modern reimplementation of the LaTeX reference system with additional features and is in general recommended over BibTeX.
In BibTeX the `.bib` database supports the following types: article, book, booklet, conference, inbook, incollection, inproceedings, manual, mastersthesis, misc, phdthesis, proceedings, techreport and unpublished. In BibLaTeX this list is extended.
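As a minimal sketch of how a `.bib` file is used with BibLaTeX, assuming a bibliography file named `references.bib` that contains an entry with the key `rcore` (both names are placeholders chosen for illustration):

```latex
\documentclass{article}
% Load BibLaTeX; biber is the backend that processes the .bib file
\usepackage[backend=biber]{biblatex}
% references.bib is a hypothetical file name
\addbibresource{references.bib}

\begin{document}
% rcore is a hypothetical citation key defined in references.bib
The analysis was carried out in R \parencite{rcore}.

% Typeset the list of cited works
\printbibliography
\end{document}
```

Compiling this typically takes a LaTeX run, a `biber` run, and another LaTeX run. With classic BibTeX the document would use `\bibliographystyle` and `\bibliography` instead of `\addbibresource` and `\printbibliography`.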
Since more and more research requires computer science tools it’s vital to cite these tools. This is, however, not always as easy as it sounds. Should I cite every library I use? How should I cite a project with only a GitHub page? Although not straightforward, if the work was of significant importance to your project it should be cited in one way or another.
A guideline is that it depends on the focus of the research: “Did the software play an important role?” and “Did the software bring anything novel to the research?”
The R project has an entry in their FAQ on how to cite R:
To cite R in publications,
@Manual{,
title = {R: A Language and Environment for Statistical
Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = YEAR,
url = {https://www.R-project.org}
}
where `YEAR` is the release year of the version of `R` used and can determined as `R.version$year`.
This BibTeX entry can also be found by running the command `citation()` in R:
To cite R in publications use:
R Core Team (2018). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
URL https://www.R-project.org/.
A BibTeX entry for LaTeX users is
@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2018},
url = {https://www.R-project.org/},
}
We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also ‘citation("pkgname")’ for
citing R packages.
To get only the BibTeX output in R:
cite <- citation()
toBibtex(cite)
Similarly, the recommended citation for an R package can be found with `citation('package_name')`; e.g., `citation('ggplot2')` outputs the contents of the package’s `CITATION` file:
@Book{,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2016},
isbn = {978-3-319-24277-4},
url = {https://ggplot2.tidyverse.org},
}
Citation files for R packages are usually found under `inst/CITATION`. GitHub will link to these citation files under the tab “Cite this repository”, but it will not try to parse the file. GitHub only parses `CITATION.cff` files.
Note that BibLaTeX has an additional type called `software`: “The standard styles will treat this entry type as an alias for @misc.”
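A sketch of what a `software` entry might look like. The package, author, version, year and URL are invented placeholders; only the entry type itself comes from BibLaTeX:

```bibtex
% Hypothetical example: every value below is a placeholder
@software{examplepkg,
  author  = {Doe, Jane},
  title   = {examplepkg: A Hypothetical Analysis Package},
  version = {1.2.0},
  year    = {2021},
  url     = {https://example.org/examplepkg},
}
```

The `version` field is where the release actually used can be recorded. Because the standard styles treat the type as an alias for `@misc`, the rendered reference will look like a `misc` entry unless the chosen style handles `software` specifically.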
Furthermore, there is no standard specifying whether the research paper itself should contain all the information about the software needed to reproduce the results. This is circumstantial and depends on where the research is published and whether the code used to conduct the research is available. If the paper itself should contain all information necessary to reproduce the results, specific information, such as which version or git commit was used, may be needed. However, Smith et al. (2016) do not recommend citing commits, since they are not permanent (projects can migrate), which again highlights potential problems with citing software.
By setting a recommended way of citing the software, developers encourage citation of their work and make it easy for people to cite it.
The GitHub docs have advice on how to add a `CITATION.cff` file to a repository to help users cite the software. `CITATION.cff` (Citation File Format) is a human- and machine-readable text file; read more about it here. If a developer puts a `CITATION.cff` file in the root of the default branch of a repo, it’s automatically linked on GitHub and the information is rendered on the repo’s page. A BibTeX snippet is also made available that users can copy.
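A minimal sketch of what such a file might contain, assuming version 1.2.0 of the Citation File Format. The project name, author, version, release date and URL are invented placeholders:

```yaml
# CITATION.cff, placed in the repository root (all values are placeholders)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "examplepkg"
authors:
  - family-names: "Doe"
    given-names: "Jane"
version: "1.2.0"
date-released: 2021-08-11
url: "https://example.org/examplepkg"
```

To my understanding, `cff-version`, `message`, `title` and `authors` are the required keys, while the remaining ones are optional but help readers cite a specific release.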
Following the R structure, the SSI blog post recommends that developers put a text file called `CITATION` in the root directory, containing information on how to cite the software.
If there are white papers, `.bib` files (BibTeX for most compatibility) can be created.
The Journal of Open Source Software (JOSS) is built so that “after you’ve done the hard work of writing great software, it shouldn’t take weeks and months to write a paper about your work.” This is also a good way for white papers to get published.
There are a few advantages and disadvantages of citing papers associated with software. The advantages include the general advantages of citing a paper. The disadvantage, if the paper is the only method of citation, is that the software may change while the version used is not cited. This can be mitigated if the code for the research project is public and includes sufficiently specified requirement files, or if the software is cited in addition to the paper.
Different journals have different style guides for how to cite software. These recommendations range from not citing (using a footnote or in-text mention instead) to guidelines on how the citation should be done.
There is no category for white papers in the BibTeX standard. However, one of the following types would be suitable (fields in parentheses):
Example using `misc`:
@misc{nakamoto2008,
title = {Bitcoin: A Peer-to-Peer Electronic Cash System},
howpublished = {White paper},
year = {2008},
url = {https://bitcoin.org/bitcoin.pdf},
author = {Satoshi Nakamoto}
}
However, when using BibLaTeX there is a suitable entry type: `report`.
With the fields:
Note, however, that `institution` is a required field of this type, and if no institution is suitable then the `misc` type might still be the best option, as is the case with the Bitcoin white paper.
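A sketch of what a `report` entry for a white paper might look like when an institution does exist. The author, title, institution, year and URL are invented placeholders. To my understanding of the BibLaTeX manual, `author`, `title`, `type`, `institution` and `year` (or `date`) are the required fields:

```bibtex
% Hypothetical example: every value below is a placeholder
@report{doe2020whitepaper,
  author      = {Doe, Jane},
  title       = {A Hypothetical White Paper},
  type        = {White paper},
  institution = {Example Institute},
  year        = {2020},
  url         = {https://example.org/whitepaper.pdf},
}
```

The `type` field plays the role that `howpublished` plays in the `misc` example: it tells the reader what kind of document is being cited.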
Note: The blog post by the Software Sustainability Institute emphasizes that if DOIs (digital object identifiers) exist, you should use them. The post states that “the advantage of DOIs is that they separate the description of an output from its location, allowing the output to move over time” and also mildly advises against using URLs.
Grey literature is research produced outside of academia and includes the following (see links for a more elaborate explanation):
I have also seen other “colors”, e.g., the Ethereum Beige Paper, explaining the Ethereum Yellow Paper with simpler syntax.
We can all agree that it’s important to cite work that contributes to our research. However, there is no standard way to cite software. Which method to use boils down to what the author requests, where you are publishing, and how you fulfill your research ethics obligations. Thus, when citing software, first look at what the recommendation is. If there is no recommendation, then from my understanding the following are advisable ways to cite software:
Cite the software itself, using `manual` (BibTeX) or `software` (BibLaTeX).
Cite an associated white paper, using `misc` (BibTeX) or `report` (BibLaTeX).
Multiple citation methods can be used simultaneously, e.g., both a white paper and the software itself could be cited.