In the middle of 2022, the new Labor government cancelled the current Excellence in Research Australia round.1 The ‘ERA’ as it has been known since it began (copying a similar model developed in the UK) was tasked with ranking disciplines in Australian universities through a process that involved an evaluation of publication and other metrics of research achievement, alongside (in disciplines that have historically depended on peer review to underline quality) detailed peer review of materials submitted to the ERA process by universities and their qualifying disciplines. In calling for a suspension of ERA 2023 the Minister for Education Jason Clare wrote a letter to the Australian Research Council that comprised a ‘Statement of Expectations.’ Specifically with regard to the ERA process this statement reads as follows:
In light of the sector’s concerns about workload, I ask that you discontinue preparations for the 2023 ERA round and commence work to develop a transition plan, in consultation with the sector and my Department, to establish a modern data driven approach informed by expert review. In addition, I ask you to continue your work with my Department on developing research engagement and impact indicators to inform the Engagement and Impact assessments.
I ask you provide me with a transition plan by the end of 2022, which in addition to any recommendations from the ARC review, can be considered for implementation in 2024–2025.2
This statement was welcomed by the ARC, which posted a public response quoting its CEO, Ms Judi Zielke. The response reads in part:
In response to the Statement, the ARC will be prioritising the development of a modern data driven approach to Excellence in Research for Australia (ERA) informed by expert review, for implementation in 2024–25. This means an ERA evaluation round will not be undertaken in 2023.3
Both the Minister’s statement and the CEO of the ARC’s response underlined the importance of data ‘informed by expert review’. The ARC then established ‘a working group drawing on experts from across the university sector, peak bodies and the Department of Education to provide advice on this transition.’
The review took place, receiving 223 submissions, and its report, authored by Margaret Sheil, Susan Dodds, and Mark Hutchinson and entitled Trusting Australia’s Ability: Review of the Australian Research Council Act 2001, was sent to the Minister and released publicly in April 2023.
A version of the current paper was developed for the Australian University Heads of English’s ‘The Value of Literature’ conference in Melbourne in December 2022, and it was edited down to key points as part of the AUHE submission to this ARC review (as the peak body for the study of literature at Australian universities) in response to those questions specifically related to the ERA. Because the ERA assessment process was felt to place too heavy a burden on universities, the review, as expected, recommended that the ERA process be scrapped. The AUHE working party was pleased that the arguments we and other submitters made about the flaws of metrics-based systems used in isolation were clearly accepted by the review panel, as outlined below. I quote recommendation 10 from this report in full here:
Recommendation 10: Evaluation of Excellence and Impact
We recommend that:
i. the role of the ARC in relation to evaluation of excellence, impact and research capability within Australian universities be re-affirmed by inclusion in the ARC Act
ii. the Excellence in Research for Australia (ERA) and Engagement and Impact (EI) exercises be discontinued
iii. the resourcing for evaluations be maintained so the ARC retains an internationally recognised expert evaluation capability;
iv. the ARC collaborates with TEQSA to develop assessment processes that enable TEQSA to draw on the expertise of the ARC to make decisions on the extent to which current and future higher education providers meet research provider standards;
v. the ARC develops a framework for regular evaluation and reporting on the outcomes of the NCGP program over a timeframe that allows the full impact of research funding to be assessed and the public benefit explained; and
vi. the ARC develops a program to evaluate current and future research capabilities within Australian universities, giving priority in the first instance to the capability of Aboriginal and Torres Strait Islander researchers and research that impacts on Indigenous Australians.
We do not recommend that ERA and EI be replaced by a metrics-based exercise because of the evidence that such metrics can be biased or inherently flawed in the absence of expert review and interpretation. (Sheil et al. Trusting, 57)
The recommendations are interesting not only for their clear statement of the limitations of applying existing data-driven systems to research in Literary Studies, and the humanities more generally. They also underline that the process of review and of building systems of analysis in our fields is ongoing, though it will now be passed on to, or developed in collaboration with, TEQSA (Tertiary Education Quality and Standards Agency).4 The debate remains very much ‘live’, then, and the arguments developed in this paper remain relevant to this process. They are equally relevant to the ongoing processes of assessment developed by individual universities throughout Australia (and elsewhere), where, in many if not most cases, data-driven metrics are the predominant means of assessing the value and impact of research in Literary Studies.
In light of this it is essential that the potential pitfalls of the data-driven systems that currently operate in relation to, and directly affect, research in the humanities be brought clearly into the light. TEQSA and the ARC, and the research offices of Australian universities, must take into account the failings of certain prominent ‘data based’ systems of ranking, and clearly recognise the drawbacks of these systems when they are applied to certain disciplines within the humanities. Here I will be focusing on English, but the problem is equally significant for History, Philosophy and the Creative Arts.
It must in turn be clearly recognised that these failings are often specific to these humanities and creative arts disciplines which work through qualitative methodologies and which value research outputs that are published not just in refereed journals but in refereed books and book chapters and for creative arts in Non-Traditional Research Outputs (NTROs), such as novels, collections of poetry, plays, musical compositions, works of visual art and so on. Here I will leave aside the problem of NTROs which is a separate issue and concentrate on refereed academic publications in Literary Studies [ARC Field of Research Category 4705]. In this paper I will argue that TEQSA and the ARC must consult with peak bodies that represent disciplines in the humanities to work towards developing approaches that better represent those disciplines.
The failings of data-driven metrics occur in two main ways. One of these is to do with the collection and propagation of what might objectively be qualified as ‘bad data’ as a way of representing these disciplines. This is seen in the major journal ranking and citation ranking systems that currently operate in the university sector, in particular Scopus/Scimago,5 and Clarivate Analytics/Web of Science.6 While these systems are well accepted and accurate in regard to many citation-heavy disciplines that publish in journals captured by these providers, they are, as is well recognised, poor in relation to the humanities (see Bornmann; Hammarfelt ‘Beyond’; Hammarfelt and Haddow; Linmans; Rauhvargers; Gómez-Sancho and Pérez-Esparrells; Hazelkorn).
The other is to do with the collection and propagation of what might objectively be called ‘bad peer review’ as a way of representing humanities disciplines in international university ranking systems. This is seen in the major university ranking systems that currently operate in the university sector, most prominently in Australia the QS Survey and the Times Higher Education Survey. In both the citation tracking and the university rankings the major flaws are simple to see, are openly admitted within those systems, and have been pointed out by researchers examining these systems of ranking (see Reale et al.; Scott; Sowter).
The problems, indeed, are somewhat connected. With regard to the tracking of citations of publications in Literary Studies there are many problems, but perhaps the most telling is that neither Scopus nor Clarivate (Web of Science) comprehensively counts books (academic monographs) or book chapters (peer-reviewed chapters in thematically organised academic collections of essays). In addition, while Scopus is estimated to index twice as many journals as Clarivate/Web of Science, according to a 2017 KPMG report, Scopus only counts ‘around 22% of the published work of Australian humanities researchers in a calendar year’ (KPMG UNSW, 22).
This means that the kind of citation report Scopus, say, might offer about an individual academic in Literary Studies will not include comprehensive reference to the citations accrued through books they have written or been published in, and will further exclude any citations from journals not included in Scopus. This, of course, leads to compounding errors and a major under-counting of true citations. I illustrate the disparity here using my own citations (to avoid ethics concerns).
Compare the citations recorded by Scopus/Web of Science and by Google Scholar (for a detailed comparative analysis of these databases in relation to humanities research see Prins et al.). Note how much lower the stated citations are in Scopus (116 total citations listed), which excludes most of my books and edited collections, than in Google Scholar (1948 total citations), which does collect reasonably comprehensive references to books and book chapters. Further detail could be entered into, but this, I think, is sufficient to demonstrate the merit of the conclusion reached by KPMG in their 2017 report for UNSW which, citing numerous studies, states:
Any research evaluation using citation data supplied by Scopus or WoS would be based on very small samples of an academic’s work, and it is unlikely that this would provide for robust and verifiable assessment of their portfolio. The limited coverage means that commercial bibliometric databases are not suitable for research evaluation in the humanities. (KPMG 22)
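The scale of the disparity can be expressed as a simple ratio. The following minimal sketch, in Python, uses only the two totals reported above (116 citations in Scopus and 1948 in Google Scholar). The function name coverage_share is introduced here purely for illustration, and treating the Google Scholar total as the reference figure is an assumption rather than a claim that it is a complete count.

```python
def coverage_share(captured: int, reference: int) -> float:
    """Return the fraction of the reference citation count that the
    narrower database captures."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return captured / reference


# Totals reported in the text for a single researcher.
scopus_total = 116
google_scholar_total = 1948

share = coverage_share(scopus_total, google_scholar_total)
ratio = google_scholar_total / scopus_total

print(f"Scopus records {share:.1%} of the Google Scholar total")
print(f"Google Scholar lists {ratio:.1f} times as many citations")
```

On these figures Scopus records roughly 6% of the citations listed by Google Scholar, which is consistent with the KPMG observation that evaluation based on such databases rests on a very small sample of an academic’s work.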
Similar issues then occur for the journals that are included by Scimago and ranked as being the best in particular disciplines as these relate to the disciplines of the humanities. Again, here, I will focus on our discipline which is called ‘Literature and Literary Theory’ in Scimago.
As we have seen, KPMG cites studies that conclude only about 22% of the published work of humanities researchers is captured by Scopus, which is relied upon by Scimago. It is clear, through a comparison of journals in ‘Literature and Literary Theory’ and Project Muse (which aggregates many of the most important journals in the discipline, largely based in the United States), that many Project Muse journals are simply not captured in Scimago. And yet these same Project Muse journals are often widely considered to be leaders in their fields. The same is true of important journals in Australian literature, which are not captured by Scimago. It is also true of JSTOR, which houses many of the most prestigious journals in the humanities in an archive that stretches back over a century. That is, many of the most prominent journals in the field of Literary Studies do not appear at all in Scimago. To illustrate this point I began to compare Scimago with Project Muse. Stopping at the letter ‘C’ in an alphabetic process undertaken in December 2022, the following journals held by Project Muse were found not to be listed in Scimago:
Other methodological decisions further undermine the reliability of the Scimago list as an accurate source for ranking the quality of journals in the humanities. Firstly, Scimago looks to short timeframes for the citation of sources and ranks its Q1, Q2, etc., journals on an annual basis based on captured citations. However, humanities research, unlike some research in high-citation disciplines, takes place over much longer timeframes, with researchers in our fields often engaging with works that are decades and, in the case of primary source materials, centuries old. Researchers often cite scholarly publications that are many years old given the nature of research in our disciplines (this would be the focus of a separate discussion, but it is crucial to what is going on). This again can be simply illustrated by an example drawn from my own research. My monograph Beckett and Poststructuralism, published in late 1999, has continued to draw citations each year since publication, and this is typical of similar monographs in the field.
Secondly, Scimago allows journals to nominate multiple disciplines and on occasion this can radically distort the spread of results. See here the list of what are supposed to be the top 34 journals in Literature and Literary Theory (as accessed in December 2022).
Source: Scimago (accessed December 2022)
I do not have space here to go through this list exhaustively but will note that many of these journals are from high-citation disciplines using quantitative methodologies which, for unstated reasons, claim ‘Literature and Literary Theory’ as one of their disciplines. However, a quick analysis of a number of these journals indicates that articles related to literary studies are rare within their pages. See, for example, the number one ranked journal, Criminology and Public Policy. The journal’s title clearly situates it within disciplines outside Literary Studies, and two other disciplines, ‘Law’ and ‘Public Administration’, are also cited in relation to it.
Further, while it is no doubt the case that some articles are published in the journal that draw upon literature or use works of literature as examples, or perhaps make use of methodologies from ‘literary theory’, clearly many issues include no references at all to works of literature. An example is offered below from publications listed on the journal website in 2022. Rather than being the most prominent journal in the field of literary studies, then, this journal seems to only be related to the field in a marginal way.
That this problem seems common within the top 100 journals listed in ‘Literature and Literary Theory’ in Scimago attests to there being distorting factors in the application of the ranks of Q1, Q2, et cetera. The high-citation journals that publish predominantly in other disciplines will distort the average citations across ‘Literature and Literary Theory’ generally, and push those journals that publish exclusively or predominantly in the field to lower rankings. I would argue, then, on this basis alone, that just as Scopus and Web of Science are not reliable sources for judging citations, Scimago is not a reliable source for judging journal quality.
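The mechanism can be sketched with a toy calculation. The following Python example assumes, as a simplification, that quartiles are assigned by ranking every journal in a category on a single citation-based score and dividing the ranked list into four equal bands. All journal names and scores are hypothetical and are not drawn from Scimago; the function assign_quartiles is introduced here for illustration only.

```python
def assign_quartiles(scores):
    """Rank journals by score (highest first) and split the ranked list
    into four equal bands labelled Q1 to Q4."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return {name: f"Q{(i * 4) // n + 1}" for i, name in enumerate(ranked)}


# Hypothetical category containing only journals that publish in the field.
literary = {f"Literary journal {k}": 0.40 - 0.03 * k for k in range(8)}

# Hypothetical high-citation journals from other disciplines that also
# nominate the category.
cross_listed = {f"Cross-listed journal {k}": 2.0 + 0.1 * k for k in range(8)}

before = assign_quartiles(literary)
after = assign_quartiles({**literary, **cross_listed})

top = "Literary journal 0"
print(f"{top}: {before[top]} without cross-listed journals, "
      f"{after[top]} with them")
```

In this toy category the strongest literary journal falls from the first quartile to the third once the cross-listed journals are counted, although nothing about the journal itself has changed.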
This is stating the problem, and these problems are already known in our discipline. They need to be better known, however, and better explained to the research offices of universities in Australia and elsewhere. Firstly, then, I merely wish to restate as forcefully as possible that TEQSA, the ARC and the research offices of universities cannot rely on this data to judge citations in the humanities, or to rank journals within the humanities. I will offer suggestions as to what might be done in their place at the end of this paper.
Bad Peer Review
The correspondence related to the ERA cited at the beginning of this paper also refers to a data-driven approach ‘informed by expert review’, but it is clear that the amount of peer review to be undertaken jointly by TEQSA and the ARC will be greatly diminished from what was in place in previous iterations of the ERA. In the next part of this paper I want to further underline that the ARC/TEQSA and research offices cannot rely on particular publicly available ‘peer review’ of university discipline rankings in the humanities. Again, I underline that these ranking systems are not universally bad. Rather, they work much better for those disciplines that have access to good or excellent citation data. Yet again, working well in one area does not justify using them in relation to other areas where they do not work well.
Note the ranking measures the QS Survey uses for universities in general, or for the Arts and Humanities in general: there are several categories and these seem comprehensive and involve sophisticated engagement with the data available. However, the approach shifts radically between those disciplines that have clean and consistent data relating to citations and the humanities disciplines. When I wrote the first iteration of this paper in 2022 these methodologies were clearly outlined on the QS Survey webpage under the sub-heading ‘Methodology’. The information on methodology has been altered since then and less information is now easily accessible. However, the statement of methodology from December 2022 (accessed now via ‘The Wayback Machine’, which archives web content) stated:
As research cultures and publication rates vary significantly across academic disciplines, the QS World University Rankings by Subject applies a different weighting of the above indicators in each subject. For example, in medicine, where publication rates are very high, research citations and the h-index account for 25% of each university’s total score. On the other hand, in areas with much lower publication rates such as history, these research-related indicators only account for 15% of the total ranking score. Meanwhile in subjects such as art and design, where there are too few papers published to be statistically significant, the ranking is based solely on the employer and academic surveys.
This shows that while 25% of the ranking for a discipline like medicine is based on objective data related to research outputs, only 15% is based on data of this kind for some humanities disciplines. However, rankings for other disciplines, including English Language and Literature, are ‘based solely on the employer and academic surveys’. These surveys were, in 2022, described under ‘Academic Reputation’ and ‘Employer Reputation’. These are both ‘reputation’ surveys.
So then, what counts as ‘peer review’ in ranking disciplines, such as Literary Studies, within universities when no citations are factored in? The answer is two reputation surveys. For academic reputation, for example, email surveys are sent to ‘experts in the discipline’. These surveys ask these experts to rank ‘up to 10 domestic and 30 international institutions’. They are of course not allowed to include their own institution.
While these ‘experts’ no doubt are experts in their fields, no evidence is provided to suggest they might therefore have detailed knowledge of the work being done at other institutions in their country or region or internationally at any given time, or that they are consulting data of any kind in making these assessments. Of course, they will know something, but it is unlikely to be knowledge based on a recent analysis of evidence presented, as has traditionally been the case with peer review in our discipline in the ERA.
Rather, these surveys rank universities on historical reputation, whether or not these institutions are currently supporting the discipline in question and helping it to thrive, or, for example, giving little by way of resources to that discipline. The logic of reputation surveys of this kind, which I call ‘bad peer review’, is one that makes it difficult if not impossible for less established institutions to be ranked more highly no matter how much these institutions and the staff working in the discipline invest into the discipline. This in turn will tend to discourage investment in these areas, since no progress is recognised through the surveys.
So too (and I base these observations on work I was asked to undertake to review English departments at a number of institutions), those who work in the disciplines in the higher reputation institutions can and have found that their institution might not invest in their discipline despite impressive results in these ranking systems. This is because, based on the logic of a reputation survey, these results will remain consistently high because of the historical strengths of the discipline rather than its current health. Given the rankings remain high even in the face of extensive cuts to the fields, what is the incentive of universities of this kind to invest in these disciplines?
Again, I will quickly illustrate the kinds of discrepancies that occur, here looking at the QS Survey. I choose this over the Times Higher Education (THE) rankings since QS is the most pertinent to the individual disciplines of the humanities, as it offers disciplinary distinctions that resemble the recognised distinctions that pertain historically in the humanities. The THE rankings do not individually recognise the discipline of English but bundle it together under the heading of ‘Languages, Literature and Linguistics’ and do the same for most other disciplines, making it very difficult to glean meaningful data regarding Fields of Research codes in Australia. Their categories are listed here:
Art, performing arts and design
Languages, literature and linguistics
History, philosophy and theology
THE also relies on reputation surveys for at least 50% of rankings in the key areas of teaching and research. In order to demonstrate some of the distortions these systems generate I will steer away from Australian universities and use examples from universities from the United States.
It quickly becomes clear that certain institutions are given surprising rankings because of the QS methodology built upon reputation surveys. In the discipline field of ‘English Language and Literature’ some no doubt excellent universities based in non-English-speaking countries are given high rankings, based on their being ranked highly in the regions they come from. This is not in any way surprising in itself, yet what is surprising is how they rank above institutions that have outstanding reputations specifically in English Language and Literature.
Here we can see how some universities with outstanding track records and reputations in English Literature and Language teaching do not rank well in the QS system. Just to take a few examples, Johns Hopkins is ranked 73 in QS when it is ranked 13 in the US News Best English Programs. Rice University is ranked 151–200 in QS and 41 in US News Best English Programs. SUNY Buffalo is ranked 151–200 in QS and 46 in US News. Fordham University is ranked 251–300 in QS and 53 in US News.
These excellent schools, all of which would claim high status and standing in the field of English Language and Literature, are ranked well below what one might expect in QS. While Hong Kong and Singapore are no doubt excellent universities, their English Departments do not match or surpass those of Johns Hopkins or Rice University. This is due in part not only to the nature of the reputation survey, but also to the particular methodology adopted by QS, which asks those who undertake its survey to rank ‘10 domestic […] institutions in their field’:
Drawing on responses from over 130,000 academics, respondents are asked to list up to 10 domestic and 30 international institutions which they consider to be excellent for research in the given area. The results of the survey are then filtered according to the narrow area of expertise identified by respondents. (https://www.topuniversities.com/subject-rankings/methodology, accessed December 2022)
This method, statistically, will have the effect of causing those institutions that fall just outside, say, a reputationally recognised top ten, to gather very few points notwithstanding their actual excellence. They will then in turn be disadvantaged against other institutions in different countries that gather votes related to their own domestic order of rankings.
While the US News rankings are also entirely based on reputational surveys and so equally subject to error and reputational bias, their methodology is at least more nuanced. Here the surveys are sent to departmental heads (rather than individual experts who may choose not to respond as in QS) and rather than being asked to list a top 10 they are asked to give grades to each institution.
The questionnaires asked respondents to rate the academic quality of the programs at other institutions on a 5-point scale: outstanding (5), strong (4), good (3), adequate (2) or marginal (1). Individuals who were unfamiliar with a particular school's programs were asked to select ‘don't know.’
(https://www.usnews.com/education/best-graduate-schools/articles/social-sciences-and-humanities-schools-methodology, accessed December 2022)
While not solving the inherent problems of the reputation survey it is arguable that the method of requiring heads of departments, who are more likely to take an interest in the structure of their own schools and those of rivals and feel obligated to respond in a timely and conscientious way, is superior to that of choosing generally defined experts in the fields who might not represent all institutions and are not obliged to respond. Secondly, the scale used by US News, while reasonably crude, at least avoids the problem of giving very few points to what might be excellent schools (because they fall outside of the generally recognised top ten). Indeed, it will further require the respondent to reflect somewhat on the quality of the programs offered, rather than to call to mind institutions with historically strong reputations in the field.
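The difference between the two survey designs can be illustrated with a small sketch. The Python example below is a deliberately simplified model, not a reconstruction of either provider's actual scoring: it assumes every respondent holds the same perception of twelve hypothetical institutions, expressed on the 5-point scale quoted above, and it compares counting top-ten nominations with averaging grades. The names nomination_counts and mean_grades are introduced here for illustration.

```python
def nomination_counts(ballots, k=10):
    """Count how often each institution appears among a respondent's
    k most highly regarded institutions."""
    counts = {}
    for ballot in ballots:
        for name in ballot:
            counts.setdefault(name, 0)
        # sorted() is stable, so ties keep the ballot's original order.
        top_k = sorted(ballot, key=ballot.get, reverse=True)[:k]
        for name in top_k:
            counts[name] += 1
    return counts


def mean_grades(ballots):
    """Average the grade each institution receives across respondents."""
    totals, seen = {}, {}
    for ballot in ballots:
        for name, grade in ballot.items():
            totals[name] = totals.get(name, 0) + grade
            seen[name] = seen.get(name, 0) + 1
    return {name: totals[name] / seen[name] for name in totals}


# Hypothetical perceptions: ten institutions graded 'outstanding' (5)
# and two graded 'strong' (4), shared by all three respondents.
ballot = {f"Institution {i:02d}": 5 for i in range(1, 11)}
ballot["Institution 11"] = 4
ballot["Institution 12"] = 4
ballots = [dict(ballot) for _ in range(3)]

nominations = nomination_counts(ballots)
grades = mean_grades(ballots)

for name in ("Institution 10", "Institution 11"):
    print(name, "nominations:", nominations[name], "mean grade:", grades[name])
```

Under these assumptions the eleventh institution receives no nominations at all in the top-ten design, while the graded design still records it as ‘strong’. Real respondents will of course differ from one another, which softens but does not remove the cliff at the edge of the nominated list.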
What Might Be Done?
So then, these are the problems, but what might be done about them? Firstly, I would argue that the ARC, TEQSA, and the Federal Government, in recognising the limitations of data sets and of limited peer review, should allow the peak bodies that represent particular disciplines in the humanities to consult within their disciplines for a reasonable period about rankings in those disciplines. This consultation process would then allow disciplines to develop strategies and methodologies that might viably replace any reliance on Scopus, Web of Science or university rankings systems. This at least would allow for an informed consideration of the pros and cons of proposed methodologies.
My own suggestions are as follows. Although it has certain drawbacks, Google Scholar allows for a much more accurate collection of citations than Scopus or Web of Science. The KPMG report argues that using Google Scholar is not possible for two reasons: first, because manual processing is required; and second, because it uses an ‘open’ rather than ‘closed’ system of collection (see KPMG 23; Linmans; Hammarfelt ‘Beyond’). I believe neither of these arguments is compelling. On the first, academics are already required to provide individual data to their institutions, and maintaining their own Google Scholar pages and providing access to them via links on university webpages is already done in some institutions. On the second, the ‘closed system’ of Scopus and Web of Science, which only collect from journals within their systems, is, as I have stated above, a problem for humanities academics rather than a benefit. Google’s ‘open’ system is, in Literary Studies at least, readily justifiable: with regard to refereed publications (journal articles, book chapters and books), which are the standard in Literary Studies, the citations that accrue to them on Google Scholar are almost exclusively drawn from other refereed publications.
Other issues might arise related to harvesting this data, yet web-scraping and data trawling are extremely common in business and government and many systems exist to allow for coding that facilitates this process. If universities provide links to pages (since Google’s own links do not always work for sites that wish to announce themselves as public) then it should be possible, or be made possible, for universities and the ARC to harvest this data.
What else can be done? One must question any data-driven approach that insists on being equally applicable to ‘all disciplines’. Such an approach simply does not recognise or value publishing and research traditions that are specific to different fields. ‘The Norwegian Model’, which runs successfully in many European countries, is designed to account for such field-specific differences and I encourage the ARC and TEQSA to examine this model (see detailed outlines of this model: Sivertsen ‘Publication-Based’; Aksnes and Sivertsen; Sivertsen ‘Norwegian’; Hammarfelt ‘Taking Comfort’; Engels and Guns). Rather than inventing an entirely new system, it makes much more sense to look to a model such as this, which seeks to directly address the kinds of distorting factors outlined in this paper, and which has been running with apparent success for some time in a number of countries.
However, I will venture some further suggestions that might be potentially useful at the discipline level within universities as a means of evaluating the performance of our disciplines. Here again the methodologies will need to be clear.
Objective measures could be accumulated from the numbers of publications, grants received, Google Scholar citations for a number of staff in the discipline, and postgraduate completions. The publications might achieve points, for example, through markers of quality, added to Google citations. Such markers could be suggested and refined by disciplinary working groups within peak bodies, taking into account what is and is not practical in relation to the gleaning of data. These recommendations could then be used to develop something approaching the ‘modern data driven approach informed by expert review’ the Minister requested in 2022.
Accessed via https://web.archive.org/web/20230129015901/https://www.topuniversities.com/subject-rankings/methodology. Please note: the link to the Support page no longer functions. The original link https://www.topuniversities.com/subject-rankings/methodology now directs to a different article.