Normalizing Citations - Beyond the H-index

Mar 08, 2018

The proper metric for an academic's influence on the academic world of academic publishing is academic citations. An academic might make many (say 100) small contributions, each cited a small number (say 10) of times, or one contribution cited widely (say 1000 times). Neither is inherently superior, despite claims to the contrary, and for the academic in question, it was probably easier to write one widely cited piece than 100 smaller ones, but that was unpredictable at the time.

[Image caption: Citation needed. Source: Unknown.]

Citation counts are cumulative: they can never go down (they can with retractions, but we will neglect that). So by this measure senior academics on average appear more influential than younger academics, which of course they are. But a raw cumulative count is not a useful measure for filtering prospective candidates for hiring and promotion, which is why these metrics exist: to sort people based on productivity and establish a social hierarchy.

So to begin, we have two corrections to make. First, senior academics have had more opportunities to write papers. A junior academic simply has not had the cumulative time to author 100 papers. Second, the senior academic's papers have had more time to accumulate citations. So I suggest dividing total citations by Years^2 to account for these two time-dependent accumulation effects.
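As a rough sketch of that first correction, the division by Years^2 might look like the following. The function name and the example numbers are made up for illustration, and what counts as "years" is left as a plain input here.

```python
def time_normalized_citations(total_citations: int, years: float) -> float:
    """Divide total citations by years squared.

    One factor of `years` stands for the time available to write papers,
    the other for the time those papers have had to collect citations.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return total_citations / years ** 2


# Hypothetical academics: 1000 citations over 10 years, 4000 over 20 years.
junior_score = time_normalized_citations(1000, 10)
senior_score = time_normalized_citations(4000, 20)
print(junior_score, senior_score)
```

Under this normalization the two hypothetical academics come out equal, even though one has four times the raw citations of the other.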

But which "Years"? Years since terminal degree? This favors the young who start publishing before their degree. Years since they began their degree? Almost no one has any paper in year 1 of their graduate career. So we can split the difference and use years since graduation with the terminal degree, plus 2, on the theory that by the time you graduate you should have had at least 3 papers, which means you started publishing about 2 years before graduation. This is still highly sensitive to assumptions for younger academics, though it will wash out for older academics. Domains will of course vary in publishing culture.
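To see how much the "+2" matters, here is a small sketch. The function names are hypothetical, and the 2-year head start is just the assumption stated above, exposed as a parameter so it can be varied by field.

```python
def career_years(degree_year: int, current_year: int, head_start: int = 2) -> int:
    """Years since the terminal degree, plus an assumed publishing head start."""
    years = current_year - degree_year + head_start
    if years <= 0:
        raise ValueError("career length must be positive")
    return years


def head_start_sensitivity(years_since_degree: int, head_start: int = 2) -> float:
    """Factor by which the Years^2 divisor grows when the head start is added."""
    if years_since_degree <= 0:
        raise ValueError("years_since_degree must be positive")
    return ((years_since_degree + head_start) / years_since_degree) ** 2


# Someone 3 years past the degree: the divisor changes by about 2.78x.
print(head_start_sensitivity(3))
# Someone 30 years past the degree: the divisor changes by about 1.14x.
print(head_start_sensitivity(30))
```

The choice of head start nearly triples the divisor for the hypothetical junior academic and barely moves it for the senior one, which is the sensitivity described above.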

There are other problems, for instance, co-authorship. At the extreme, all 108 billion people who ever lived have contributed fractionally to every paper, but they don't all get co-authorship (except on experimental physics papers). But someone who puts all of their PhD students on all of their group's papers is gaming the system to the detriment of those who publish more individually authored papers. So each paper's citations should be weighted by the fraction of authorship that the academic in question deserves. This is impossible to assess directly (promotion files sometimes ask for percentages on co-authored papers, but these are never systematically estimated or consistent), so dividing each paper's citations by the number of authors on that paper is a good surrogate.
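The equal-split surrogate can be sketched as below. The function name and the paper list are invented for illustration, and each paper is represented simply as a (citations, number of authors) pair.

```python
from typing import Iterable, Tuple


def author_adjusted_citations(papers: Iterable[Tuple[int, int]]) -> float:
    """Sum each paper's citations divided by its number of authors.

    Equal shares per author stand in for the true, unknowable
    fraction of credit each author deserves.
    """
    total = 0.0
    for citations, n_authors in papers:
        if n_authors < 1:
            raise ValueError("a paper needs at least one author")
        total += citations / n_authors
    return total


# Hypothetical record: a solo paper, a 3-author paper, a 5-author paper.
papers = [(100, 1), (300, 3), (50, 5)]
adjusted = author_adjusted_citations(papers)
print(adjusted)  # 100 + 100 + 10 = 210.0
```

The raw count for this record would be 450; the adjusted count credits the solo paper in full and the shared papers in proportion.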

I am not in the business of bibliometrics, so I will leave that to others. But hopefully someone in the industry (Scopus, Web of Science, Google Scholar) can run the proposed corrections on their databases and produce a normalized citation measure as a standard output.
