Detecting potential reference list manipulation within a citation network
Abstract
Although citations are used as a quantifiable, objective metric of academic influence, references could be added to a paper to inflate the perceived influence of a body of research. This reference list manipulation (RLM) could take place during the peer-review process or prior to it. Surveys have estimated how many people may have been affected by coercive RLM at one time or another, but it is not known how many authors engage in RLM, nor to what degree. By examining a subset of active, highly published authors (n = 20,803) in PubMed, we find that the frequency of non-self citations (NSC) to one author coming from one paper approximates Zipf’s law, permitting the task to be approached statistically. We frame the task as an anomaly detection problem, in which confidence increases the more an author's outlier status is correlated across dimensions, relative to non-outliers. We find that the NSC Gini Index correlates highly with anomalous patterns across multiple RLM-related distributions. Between 81 (FDR < 0.05) and 231 (FDR < 0.10) authors are outliers on the curve, suggestive of chronic, repeated RLM. Approximately 16% of all authors may have engaged in RLM to some degree. Authors who use 18% or more of their references for self-citation are significantly more likely to have NSC Gini distortions, suggesting a potential willingness to coerce others to cite them.
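As an illustration of the NSC Gini Index mentioned above, the following minimal sketch computes a standard Gini coefficient over the number of non-self citations that each citing paper gives to a single author. It assumes the textbook rank-weighted Gini formula; the exact definition used in the study may differ, and the function name `gini` and both example count lists are hypothetical.

```python
def gini(counts):
    """Standard Gini coefficient of a list of non-negative counts.

    Returns 0.0 for a perfectly even distribution and approaches 1.0
    as the total concentrates in a single entry.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical data: NSC counts to one author, one entry per citing paper.
evenly_cited = [1, 1, 2, 1, 1, 2]      # most papers cite the author once or twice
concentrated = [1, 1, 1, 1, 1, 25]     # one paper supplies most of the citations

print(gini(evenly_cited), gini(concentrated))
```

Under this assumed definition, an author whose non-self citations arrive in a few unusually large blocks from individual papers receives a higher value than one cited once or twice per paper, which is the kind of distortion the abstract associates with potential RLM.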