The classic image of the researcher working alone, cut off from the world and from the wider scientific community, is misleading: in reality, research is built on continuous interaction within that community, first to understand the work of others and then to share our own findings.
Reading and writing papers that are published in academic journals and presented at conferences is a central part of a researcher’s life. When researchers write academic papers, they need to cite the work of their colleagues to provide context, detail sources of inspiration, and explain differences in approach or results. Being cited positively by other researchers is an important indicator of the visibility of a researcher’s own work.
But what happens when this citation system is manipulated? Our recent research, published in the Journal of the Association for Information Science and Technology (JASIST) by a team that includes information scientists, computer scientists, and mathematicians, has uncovered an insidious way to artificially inflate citation counts through metadata manipulation: “sneaked references.”
Hidden operations
People are becoming more aware of how scientific publishing works and of its potential flaws: in the past year alone, more than 10,000 scientific papers were retracted. The problems caused by citation gaming, and the damage it does to the scientific community, including the undermining of its credibility, are well documented.
Citations of scientific articles follow a standardized referencing system: each reference explicitly states at least the title, author names, publication year, journal or conference name, and page numbers of the cited publication. These details are stored as metadata, not directly visible in the article’s text, and are associated with a Digital Object Identifier (DOI), a unique identifier for each scientific publication.
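To make this concrete, here is a minimal sketch, assuming only the public Crossref REST API (api.crossref.org), of how anyone can inspect the reference list a publisher has deposited for a given DOI. The DOI and function name in the example are placeholders for illustration, not articles discussed in this study.

```python
# Minimal sketch: fetch the reference list deposited in Crossref metadata for a DOI.
# Assumes the public Crossref REST API; the DOI below is a placeholder to replace.
import requests

def deposited_references(doi: str) -> list[dict]:
    """Return the 'reference' entries recorded in Crossref metadata for a DOI."""
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    response.raise_for_status()
    work = response.json()["message"]
    # The 'reference' field is absent when the publisher deposited no references.
    return work.get("reference", [])

if __name__ == "__main__":
    for ref in deposited_references("10.1000/example-doi"):  # placeholder DOI
        # Each entry may carry a DOI, an unstructured citation string, or both.
        print(ref.get("DOI") or ref.get("unstructured", "<no identifier>"))
```

Comparing this deposited list with the bibliography printed in the article itself is what reveals any mismatch.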
References in scientific publications allow authors to justify methodological choices and present the results of previous studies, emphasizing the iterative and collaborative nature of science.
However, we discovered by chance that some unscrupulous actors add extra references when submitting their papers to scientific databases: references that do not appear anywhere in the text but are present in the paper’s metadata. The result? A sudden jump in the citation counts of certain researchers and journals, even though those references are never actually cited by the authors in their papers.
Accidental discovery
The study began when Guillaume Cabanac, a professor at the University of Toulouse, wrote a post on PubPeer, a website dedicated to post-publication peer review, where scientists discuss and analyze publications. In it, he detailed how he had noticed a discrepancy: a Hindawi journal article that he suspected was fraudulent, because it contained awkward phrasing, had received far more citations than downloads, which is highly unusual.
This post caught the attention of several scientific sleuths, who are now the authors of the JASIST article. We used scientific search engines to look for papers citing the article in question. Google Scholar found none, but Crossref and Dimensions did. What’s the difference? Google Scholar relies primarily on the article’s main text to extract the references that appear in the bibliography, while Crossref and Dimensions use the metadata provided by the publisher.
A new type of fraud
To understand the extent of the manipulation, we looked at three scientific journals published by Technoscience Academy, the publisher of the paper containing the questionable citations.
Our study consists of three steps:
- We listed the references explicitly present in the HTML or PDF versions of the articles.
- We compared these lists with the metadata recorded by Crossref and found extra references that had been added to the metadata but did not appear in the articles (a minimal sketch of this comparison follows the list).
- We checked Dimensions, a bibliographic platform that uses Crossref as a metadata source, and found further inconsistencies.
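This is not the authors’ actual pipeline, but a minimal sketch of the comparison logic in the second step, assuming the visible and deposited references have already been reduced to lists of DOIs; the function name and the DOIs in the example are made up for illustration.

```python
# Sketch of the comparison step: references visible in the article vs. references
# deposited in the metadata. Inputs are assumed to be lists of DOI strings.
def compare_reference_lists(visible_dois: list[str], metadata_dois: list[str]) -> dict:
    visible = {doi.lower() for doi in visible_dois}
    deposited = {doi.lower() for doi in metadata_dois}
    return {
        "sneaked": deposited - visible,  # in the metadata but never cited in the text
        "lost": visible - deposited,     # cited in the text but missing from the metadata
        "sneaked_share": len(deposited - visible) / len(deposited) if deposited else 0.0,
    }

# Example with made-up DOIs:
report = compare_reference_lists(
    visible_dois=["10.1000/aaa", "10.1000/bbb"],
    metadata_dois=["10.1000/aaa", "10.1000/bbb", "10.1000/ccc"],
)
print(report["sneaked"], f"{report['sneaked_share']:.0%} of deposited references are sneaked")
```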
In the journals published by Technoscience Academy, at least 9% of the recorded references were “sneaked references”. These extra references appeared only in the metadata, distorting citation counts and giving certain authors an unfair advantage. Some legitimate references were also lost, meaning they did not appear in the metadata at all.
Moreover, our analysis of the sneaked references shows that some researchers benefited substantially: for example, a single researcher affiliated with Technoscience Academy gained more than 3,000 additional illegitimate citations, while several journals from the same publisher also gained hundreds of additional sneaked citations.
We wanted our findings to be validated externally, so we posted our study as a preprint and notified both Crossref and Dimensions, providing a link to the preprint. Dimensions acknowledged the illegitimate citations and confirmed that its database mirrors Crossref’s data. Crossref also confirmed the extra references in Retraction Watch and stressed that this was the first time it had been notified of such a problem in its database. Based on Crossref’s investigation, the publisher has taken steps to correct the problem.
Impact and potential solutions
Why is this finding important? Citation counts heavily influence research funding, academic promotions, and institutional rankings. Manipulating citations can therefore lead to unjust decisions based on false data. More worryingly, the discovery raises questions about the integrity of systems for measuring scientific impact, weaknesses that researchers have been pointing out for years. Because these systems can be gamed, they can foster unhealthy competition among researchers, tempting them to take shortcuts to publish faster or accumulate more citations.
To combat this practice, we propose several measures.
- Rigorous validation of metadata by publishers and agencies such as Crossref.
- Independent audits to ensure data reliability.
- Increased transparency in managing references and citations.
This study is, to our knowledge, the first to report this type of metadata manipulation and to discuss its impact on the evaluation of researchers. It underlines once more that over-reliance on metrics to evaluate researchers, their research, and its impact is inherently flawed and potentially erroneous.
Such over-reliance encourages questionable practices such as hypothesizing after the results are known (HARKing), splitting one dataset into multiple papers (salami slicing), data manipulation, and plagiarism. It also undermines the transparency that is key to more robust and efficient research. The problematic citation metadata and sneaked references have apparently been corrected, but, as is common with scientific corrections, the fix came too late.
Lonni Besançon is Assistant Professor of Data Visualization at Linköping University. Guillaume Cabanac is a professor of computer science at the University of Toulouse. Thierry Viéville is a research director for scientific mediation at Inria. This article is republished from The Conversation under a Creative Commons license. Read the original article.