Wikipedia Influences Language in Science Publications

Wikipedia is the 5th most popular website globally.[1] The online encyclopedia contains about 0.5 – 1.0 million scientific articles, but scientists still rarely cite them as a source of knowledge in their papers. However, this does not mean that they are not reading Wikipedia articles. In a recently published working paper[2], Neil Thompson and Douglas Hanley describe an experiment that focuses on language patterns rather than citations. The authors find that words and phrases from Wikipedia articles are indeed reflected in the scientific literature.

Why Wikipedia is rarely cited

There is roughly one Wikipedia article for every 120 scientific journal articles. And while the number of publications that cite Wikipedia has increased over time[3], these occurrences are still very rare considering the size of Wikipedia’s corpus and the popularity of the encyclopedia. Thompson and Hanley describe two main reasons for that:

  1. Academia is actively fighting Wikipedia with university policies that discourage the use of Wikipedia as a source for academic papers.
  2. Facts appearing on Wikipedia might be perceived as generally known and accepted, and therefore not in need of a reference.

Language drift as an indicator of Wikipedia’s impact on science

Measuring the correlation between the citation counts of scientific articles and whether they are referenced on Wikipedia can give some insight into how the scientific literature impacts Wikipedia.[3] But for the reasons outlined above, this approach provides little understanding of how Wikipedia shapes scientific articles in return.

Thompson and Hanley instead use text-mining techniques to measure the correlation of language between Wikipedia articles and scientific papers. In particular, they look for similarities in word patterns and how these change over time. They estimate the natural drift in language, caused by the introduction of new terms into a field, at roughly one new term for every 250 words.
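As a rough illustration of what measuring word-pattern similarity can look like, the sketch below computes the share of a paper's words that also occur in a Wikipedia article. This is not the authors' method: the function names `tokenize` and `shared_term_rate` and the simple word-matching rule are assumptions made for this example only.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def shared_term_rate(wiki_text, paper_text):
    """Return the fraction of the paper's words that also occur in the Wikipedia text.

    Hypothetical helper for illustration; a real analysis would need to
    handle stop words, phrases, and field-specific vocabulary.
    """
    wiki_vocab = set(tokenize(wiki_text))
    paper_tokens = tokenize(paper_text)
    if not paper_tokens:
        return 0.0
    hits = sum(1 for token in paper_tokens if token in wiki_vocab)
    return hits / len(paper_tokens)
```

Comparing this rate for papers published before and after a Wikipedia article appears would give a crude picture of how shared vocabulary changes over time.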

To establish the influence of the language used in Wikipedia on top of the natural language drift, the authors then performed an experiment. PhD students were hired to create new Wikipedia articles on scientific topics that weren’t yet covered in Wikipedia. Half of these articles were published, while the other half were held back. The hypothesis was that text from the published Wikipedia articles should appear more often in future scientific articles than text from the unpublished ones. The authors find exactly that: word patterns from published Wikipedia articles show up more often in the subsequent scientific literature. The rate at which that happens is approximately 1 in 300 words.
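The logic of such an experiment can be sketched in a few lines: randomly split the new articles into a published and a held-back group, then compare the average language overlap with later papers between the two groups. The sketch below is a simplified illustration, not the authors' analysis. The names `assign_groups` and `treatment_effect` and the per-article overlap scores are hypothetical.

```python
import random
from statistics import mean


def assign_groups(article_ids, seed=0):
    """Randomly split article ids into a published half and a held-back half."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(article_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def treatment_effect(overlap_by_article, published, held_back):
    """Return mean overlap of published articles minus mean overlap of held-back ones.

    overlap_by_article maps each article id to an (assumed) overlap score
    with the subsequent scientific literature.
    """
    treated = mean(overlap_by_article[a] for a in published)
    control = mean(overlap_by_article[a] for a in held_back)
    return treated - control
```

A positive difference would point in the direction the article describes, although a real study would also need to test whether the difference is statistically significant.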

Greater effect in lower income countries

When looking at journal impact, the influence of Wikipedia is more pronounced in less-cited journals. Top journals show only a small impact from Wikipedia. This makes sense, as articles in those journals usually describe new-to-the-world science. However, once these ideas are reflected in Wikipedia, they shape follow-on research.

The authors also analyzed whether the income level of a country influences the effect of Wikipedia on the scientific literature, using GDP per capita as a proxy. The findings show that the effect is strongest in countries with lower income, where scientists may have less access to traditional scientific information. These results suggest that public resources such as Wikipedia help extend access to science for those with fewer resources.

Implications for public policy

Thompson and Hanley hope that their findings motivate scientists to contribute content and edits to Wikipedia. In addition, public policy interventions could promote science through further development of public repositories of science. For instance, grants could require scientists to contribute to a public repository after publication of their work. Grants could also be given directly to public repositories to support their operating costs.

Ultimately, authors of scientific publications that are referenced in Wikipedia can embrace their contribution: By making their ideas accessible to the public, they help shape the science of tomorrow.




[2] Thompson, N. & Hanley, D. Science is shaped by Wikipedia: Evidence from a randomized control trial. MIT Sloan Research Paper No. 5238-17 (September 19, 2017).

[3] Kousha, K. & Thelwall, M. Are Wikipedia citations important evidence of the impact of scholarly articles and books? Journal of the Association for Information Science and Technology 68, 762–779 (2017).