‘Worm Wars’: The Unravelling of the Randomised Control Trial Success Story

SAMANTHA VANDERSLOTT
OXFORD VACCINE GROUP UNIVERSITY OF OXFORD &
NIHR OXFORD BIOMEDICAL RESEARCH CENTRE
UNITED KINGDOM

Abstract

What counts as evidence in global health? What happens when evidence is contested? This article concentrates on the ‘Worm Wars’, a public academic debate in 2015 on the effectiveness of health interventions to treat populations with parasitic worms, to assess how different disciplinary perspectives appraise health interventions. I discuss what happens when a success story about randomised controlled trials (RCTs), often hailed as the ‘gold standard’ of evidence adjudication, is contested but left unresolved. Questioning the prominence of RCTs through the unravelling of this evidence success story offers insights into how these forms of measurement are used in practice, first in the medical field and then more widely in economic development and global health policy. I address what a gold standard is within a hierarchy of evidence, the standards that researchers from different disciplines imbue into RCTs, and the evidence required of health interventions in determining the impact of deworming medication. Through the Worm Wars, I show how important measurement standards have become in defining and advocating for global health problems, and what this means for the production of evidence.

Keywords

worm wars; RCT; randomised controlled trial; evidence

Introduction

Deworming substantially improved health and school participation among untreated children in both treatment schools and neighboring schools, and these externalities are large enough to justify fully subsidizing treatment. (Miguel and Kremer 2004, 1).

This quote comes from a landmark 2004 economic development publication in the journal Econometrica. The development economists Edward Miguel and Michael Kremer presented the seemingly straightforward finding that deworming treatment for parasitic worms improved health and school participation. Yet over a decade after publication, their work became highly controversial when their original findings were challenged in a 2015 reanalysis by a group of epidemiologists based at the London School of Hygiene and Tropical Medicine (LSHTM), who concluded:

Re-applying analytical approaches originally used, but correcting various errors, we found little evidence for some previously-reported indirect effects of a deworming intervention. (Aiken et al. 2015, 1573).

Thus began the ‘Worm Wars.’ In this article, I explore this public academic debate and controversy about the effectiveness of a health intervention for deworming populations against parasitic worms. The Worm Wars raise questions about what counts as legitimate evidence in global health, and about how standardised forms of measurement link health issues to economic development. Miguel and Kremer, at the centre of the Worm Wars, undoubtedly attracted attention because of their high profile: Miguel held a position at UC Berkeley, Kremer was at Harvard, and their article appeared in the aforementioned prestigious economics journal. But what was novel about their research was that they employed a randomised controlled trial (RCT) in a field where RCTs had not been a common tool.

RCTs are a type of scientific experiment that uses randomisation (such as the random allocation of participants) to control for factors not under direct experimental control, and so to reduce bias. Academics and practitioners hold RCTs to be a superior means of acquiring evidence about the effectiveness of health interventions (Bédécarrats et al. 2020; Bothwell et al. 2016; Cassidy 2015; Rosemann 2019). In Miguel and Kremer’s case, a ‘cluster-controlled trial’ was conducted to measure the effect of deworming treatment on school absenteeism in a group of 75 Kenyan primary schools, some of which had received the deworming pills (albendazole and praziquantel), as well as in neighbouring schools that had not (Miguel and Kremer 2004). Allocation took place through randomised phasing: treatment was introduced in different schools at different times, which also made it possible to detect externalities on neighbouring schools. The article entered the limelight because it presented a correlation between mass deworming and improved school attendance and health. Its results were welcomed in development economics because they appeared to reaffirm the positive effect of worm treatment on school attendance, and also showed that this treatment had spillovers, or externalities, in surrounding areas.

While there had been similar efforts to provide evidence for other health issues, the need to show the economic impact of parasitic worm infections, through their effects on educational attainment and employment, was greater because these infections cause more illness and disability than death. High-mortality diseases present a more obvious case for intervention; because worms mainly cause disability, that case has to be made. Parasitic worms (also called helminths), such as roundworm, hookworm, whipworm, and threadworm, affect the intestines after transmission via the soil or through food (Feasey et al. 2010), as is also the case with food-borne trematodiasis. Some worms parasitise the blood vessels, causing an infection in humans called schistosomiasis, while others attack the lymphatic system, causing lymphatic filariasis (ibid.). Common consequences of parasitic worm infection include impaired nutrient absorption, leading to anaemia, nutritional deficiency, and disability (Stephenson et al. 2000).

Had the trial measured only health outcomes, it would have resembled a trial conducted by health researchers or epidemiologists; but the economists were interested in the causal relationship with a social outcome, school attendance (Abdelghafour 2017). The article drew on previous work using educational impacts to justify deworming, and was innovative both in using an RCT to show that the association between the two was statistically significant and in its consideration of externalities. The authors contended that previous studies had underestimated the impact of deworming because they had failed to take possible externalities into account. While the specific methodology of Miguel and Kremer’s article was new, less apparent was the longer history of health interventions against parasitic worms. By attending to this history, I offer insight into the wider politics of knowledge creation and standardisation. I examine why exactly such a controversy was provoked (i.e. who and what was at stake) and situate this question in a historical view of measurement and evidence production for health interventions against worms.

The relative merits of either side of the controversy have been discussed in depth elsewhere (Evans 2015). My treatment of controversy follows the ‘mapping controversies’ approach first outlined by Bruno Latour, which focuses on the process of knowledge production, legitimation, and agreement surrounding scientific knowledge, rather than on the scientific facts or outcomes themselves (Latour 1987). Tommaso Venturini and Anders Kristian Munk (2021) also emphasise how new digital spaces and the publics of controversies become especially important in matters of concern where doubt is raised about facts and expertise is contested. I therefore focus on the public visibility of the academic debate and what it reveals about the dynamics of knowledge production and legitimation. I seek out the key actors, events, and topics of debate that formed a controversy about intervention impact and measurement in global health. Methodologically, I draw on a larger project on neglected tropical diseases (NTDs) (2013–7), which relied on documentary sources and a small selection of qualitative interviews (Vanderslott 2017). I also attended public events on the topic, including an ‘International Society for NTDs’ (ISNTD) Conference with a session on school-based deworming in April 2016 at the Institute of Child Health, University College London (UCL) (see: ISNTD, n.d.).

This article is divided into four sections. The first provides background on how global health measurements became controversial, focusing on RCTs and replication, and on approaches to standardisation that attempt to remove politics from metrics. Second, I turn to a short history of how measurement has been applied to the problem of deworming, starting with the challenge of counting worms in the world, in a campaign spearheaded by the parasitologist and epidemiologist Norman Stoll. Third, I explore in detail the RCT that sparked the Worm Wars, explaining the need for a measure to demonstrate the connection between school deworming and economic development, and then examine the controversy. Fourth, I set out the arguments for why RCTs have come to be seen as a ‘gold standard’ in the evidence hierarchy and what their use has meant for the resulting health intervention of deworming. I conclude with a discussion of how standardised health measurements are used to define and advocate for global health problems.

Contested Evidence Production in Global Health

The Worm Wars began when a group of epidemiologists based at LSHTM (Alexander Aiken, Calum Davey, James Hargreaves, and Richard Hayes) presented findings that seemed to refute Miguel and Kremer’s article (Miguel and Kremer 2004). The LSHTM authors were infectious disease epidemiologists who scrutinised the methodology used by the two development economists, including basic statistical practices, and found that they could not replicate their results. The ensuing controversy played out predominantly among academics (mostly development economists and epidemiologists) and health commentators (the Cochrane review, international organisations, and media figures), centring on the relationship between a public health intervention against parasitic worms and its economic development impact. The discourse was quasi-academic, in that it had academic roots but then spilt over into the public arena. This spillover may have been prompted by the contentious nature of the academic discussion from the outset, spurred on by the international Cochrane Collaboration, which produces the Cochrane Library database of systematic reviews. The Cochrane Collaboration is itself an important authority in evidence adjudication. Established in 1993, it has sought to systematically review published research to facilitate decision-making about interventions by healthcare researchers, practitioners, and policymakers (Hill 2000). The Collaboration is a research charity consisting of a network of review groups, based at universities and research institutions, that produce reports; its Cochrane Library database includes systematic reviews of RCTs on the evidence for health interventions (Shah and Chung 2009).

Spotlight on Evidence Production for Global Health

The clash between the researchers, which also drew in the Cochrane Collaboration, brought uncertainty and questioning to the core of global health. As the American news and opinion website Vox put it, this was ‘The fight tearing apart the global health community’ (Belluz 2015). The media interest, particularly from high-profile science and health journalists, as well as from various non-governmental organisations (NGOs) and research and funding institutions, fuelled diverging opinions online and on social media. What, then, pushed a measurement controversy into the spotlight?

The Worm Wars timeline begins with a first Cochrane Collaboration review of deworming impact in 2000, which prompted the original Miguel and Kremer research article (2004), followed by a new Cochrane review in 2012, and then two reanalyses of the Miguel and Kremer research and two author responses in 2015. What followed in 2015 was an unusual academic back-and-forth, conducted both formally, through the published reanalyses and author responses, and informally. The ensuing lively online debate included media articles and then more informal interpretations and responses, which quickly characterised the measurement controversy as the ‘worm wars’ in hashtags as well as in social media posts, blogposts, and commentaries (on academic, journalist, and NGO websites).

The Worm Wars controversy would not have occurred if the academic debate had not gone on to become a wider public discussion. This lively epistemological and cross-disciplinary controversy took place on scientific and popular news outlets, blogs (across a whole range of individuals and organisations) and on Twitter (with the hashtag #wormwars), with ‘worms’ reaching some prominence in the public arena. The World Bank produced an anthology of the controversy (Evans 2015), which also includes the two most high-profile articles that appeared in the press:

‘Scientists Are Hoarding Data and It’s Ruining Medical Research’, BuzzFeed (Goldacre 2015)

‘New research debunks merits of global deworming programmes’, The Guardian (Boseley 2015)

The authors of these pieces were Ben Goldacre (a popular science author) and Sarah Boseley (the Health Editor of The Guardian newspaper). Goldacre has more widely highlighted the beneficial role of RCTs in challenging entrenched tendencies to favour clinical expertise over systematic examinations of evidence (Pearce and Raman 2014). Boseley has been a central figure in British health reporting, covering both UK and global health issues, including in her role as The Guardian’s Health Correspondent. While Goldacre used the Worm Wars to show the importance of sharing data for the purposes of replication or reanalysis, Boseley warned that deworming was being presented as a panacea. These different viewpoints reflected the authors’ existing preoccupations: Goldacre’s with replication and transparency in science, and Boseley’s with the political economy of international aid and development.

The overall message of each was rooted in a broader tendency to see global health problems as solvable through technological and pharmacological fixes. As Goldacre acknowledged, ‘The seductiveness of simple pills, as a solution to complex problems in the developing world, is perhaps overwhelming’ (Goldacre 2015). As Boseley reported, Paul Garner of the Liverpool School of Tropical Medicine (LSTM) went one step further, seeing the promotion of deworming as ‘a panacea’ that sought a:

. . . single solution to multiple problems in low- and middle-income countries, and that the belief that deworming will impact substantially on economic development seems delusional when you look at the results of reliable controlled trials (Boseley 2015).

The appeal to RCTs as evidence for deworming reflects their current recognition as the preferred method for collecting evidence to assess the safety and efficacy of drugs and therapeutics. Beginning in 1948, trials of the antibiotic streptomycin undertaken by the British Medical Research Council (MRC) used the statistical technique of randomisation (Valier and Timmermann 2008). Together with similar trials conducted at the same time in the United States (US), these are recognised as the first published RCTs. The 1950s then saw the development of government-funded multi-centre RCTs, which used new organisational techniques to divide specialist labour and conduct central review (ibid., 493).[1] Soon afterwards, RCT approaches were also adopted in non-medical fields, for example in large-scale social experiments conducted by governments for policy analysis (Greenberg and Robins 1986). By the 1960s and ’70s, campaigners were advocating the more systematic use of RCTs for rational therapeutic assessment. They included Archibald Cochrane, whose aspiration for medicine to be more effective and efficient led to the development of today’s Cochrane institutions.

RCTs have thus become an epistemological and institutional success more broadly, as demonstrated by the high trust the research community places in them. As Weesely (2007) describes, randomisation reduces the bias that may undermine internal validity (Samii 2020), bias that is inherent in other study designs used to examine causal relationships. These features mean that RCTs serve as a ‘gold standard’ of modern clinical research and a reference point against which other health interventions are compared (Hariton and Locascio 2018). The expression ‘gold standard’ was first used in this sense in a 1982 publication, to mean a ‘definitive exemplar of quality and reliability . . . reflecting the broad aspirations in medicine for evidentiary solid ground and standardisation’ (Jones and Podolsky 2015, 1502–3). Jones and Podolsky note, however, that despite this vigorous promotion, critiques of and challenges to the status of RCTs emerged early, including the observation that their findings were difficult to translate into practice.[2]

RCTs are also a standardised form of measurement for ascertaining evidentiary quality, which can be used to assess the likely truth or validity of claims. As Oscar Maldonado and Tiago Moreira argue, such measurement standardisation allows for ‘explicit, formalised rules or specifications informing collective engagement with objects or persons in a particular realm of action’ (2019, 203). This has made RCTs a preferred form of measurement. As Vincanne Adams has argued more broadly of measurement in the shaping and governing of global health (2016), the overarching objective of measurement is to produce apolitical, value-neutral evidence, but the removal of politics is an illusion. The political nature of RCTs is therefore important to explore precisely because of their appearance of neutrality. The question to ask of RCTs is: what does measurement do ‘to set aside questions of politics; [and] turn moral questions . . . . into problems of numbers’? (ibid.).

Adams’ identification of the importance of the ‘economy, sovereignty, and politics of knowledge’ within global health is key to understanding how RCTs have come to occupy such a privileged position in standardising modes of evidence evaluation and generation (2016). Measurement in global health has been closely tied to an economic view of the world in which health has become subject to economic standards of justification. Intensifying existing trends, international institutions took up the enterprise of collecting social and economic statistics, particularly from the post-war period onwards. These data were used for planning under the constraints of public financing and for ranking countries on scales of development, with the increasing involvement of agencies concerned with economic affairs, such as the World Bank (Gorsky and Sirrs 2017). David Reubi, Clare Herrick, and Tim Brown highlight how, in recent decades, a greater focus on health economics has led to even more reliance on evidence-based justification and intervention strategies (2016). This perspective is evident in the use of the evidence produced by RCTs, and in the need to show economic reasoning through the cost-effectiveness of an intervention and its impact on the economy, rendering both in comparable terms. By dictating what kind of evidence needed to be mobilised, RCTs thus enabled uniform conversations about ‘how best to intervene, how best to conceptualize health and disease, and how best to count and be accountable, and how best to pay for it all’ (ibid., 6). However, this does not mean that RCTs were really as neutral as their proponents suggest. The socio-political-economic context that helped to transform the impact of parasitic worm treatment into a matter of concern was marked by the growing importance of RCTs as an evidence practice for improving the effectiveness of policymaking and aid at the turn of the millennium (Donovan 2018).
The supporters of RCTs have been dubbed the ‘randomistas’ (ibid.). Kevin Donovan places the origin of the randomistas’ thought collective and centre of authority in the Abdul Latif Jameel Poverty Action Lab (J-PAL) at the Massachusetts Institute of Technology (MIT). J-PAL was established in 2003 as a poverty research centre that conducts randomised impact evaluations (ibid., 30). Along with international institutions, governments, and philanthropic foundations (e.g. the Bill and Melinda Gates Foundation, launched in 2000), J-PAL promoted the use of the Miguel and Kremer RCT as evidence. J-PAL and the Bill and Melinda Gates Foundation were, and still are, guided by evidence-based policymaking, in which economic arguments are used to allocate funding.

Another measurement standard employed in implementing evidence-based policy was replication. As Donovan argues, this was ‘a means of reaffirming the epistemic virtues of experimentation, and organizations like 3ie have recently begun funding, guiding and conducting international replications’ (ibid., 46). The replication studies were commissioned and funded by the International Initiative for Impact Evaluation (3ie) Registry for International Development Impact Evaluations (RIDIE n.d.), which launched in 2013 and offered small grants to check the evidence base of high-profile articles in economic development (Brown and Wood 2018). 3ie is an NGO set up in 2008 to provide grants that promote evidence-informed development policies. The Miguel and Kremer article had been nominated to the 3ie registry, and the authors provided the original dataset and computer code. The original authors were located in the US and the reanalysis authors in the UK, and these were therefore the locations where most of the ensuing debate about measurement in health and the appraisal of credible evidence across disciplines played out. What was ultimately questioned was the range of ways of measuring and generating evidence, a range that resists standardisation. Deworming became a point of focus, but other topics might equally have prompted the wider debate about how to measure impact through the production of data and evidence in global health programmes.

RCTs and replication have thus played a central role in the production and reaffirmation of evidence in global health, and both lie at the core of this controversy over standardisation. Next, I explore the grounds for a standardisation of worms themselves on two fronts, involving different measurement typologies: first, ways of counting worms and their treatment within communities and nations; second, ways of calculating the health and development impact of school deworming against helminth infection. These measurement typologies would then be targeted for policy through RCTs and challenged through replication.

Counting Worms and Their Treatment

There is a long history of statistical counting of parasitic worms, which has been important for establishing the rationale and securing ‘buy-in’ for interventions, knowing how to address the problem, and measuring success or failure afterwards. Because these afflictions are often not outwardly visible in signs or symptoms, numbers provide more certainty about the problem and the benefits of treatment. The challenges of ‘counting worms’ are reflected in the different methods for doing so, which in turn produce different priorities and different inclusions in, or exclusions from, public health concern.

The US began a campaign to eradicate hookworm disease from the American South through the Rockefeller Sanitary Commission (RSC) for the Eradication of Hookworm Disease (1909–14), which conducted a survey finding that 40 per cent of school-aged children in the South were infected (Bleakley 2007). These early surveys targeted school-aged children, an emphasis confirmed by subsequent studies that did not find an impact in adults (ibid.). After RSC-sponsored treatment and education campaigns, follow-up studies indicated that the campaign substantially reduced hookworm disease, and that areas with higher pre-campaign levels of hookworm infection saw greater increases in school enrolment, attendance, and literacy (ibid.). Following this campaign, debates began in the 1920s about the benefits of deworming programs for children, with many new programs starting across South and Central America, Australia, and the US. The US became a leading country in scientific research on parasitic worms, as shown by the founding in 1916 of a Department of Helminthology at the Johns Hopkins School of Hygiene and Public Health with the aim of ‘applying modern science and quantitative methods’ (Brooker, Bethony, and Hotez 2004, 2).

Researchers were also connecting severe and chronic infection, through its effects on school or educational performance, with productivity in adulthood, in order to show an effect on economic outcomes. Brown charted the RSC’s obsession with increased productivity at that time: ‘In virtually every annual report, every memorandum, and every discussion the extent of hookworm infection was described and the loss in labor productivity estimated’ (Brown 1976, 900). Australian researchers Waite and Neilson (1919) concentrated on the effects on ‘mental development’, while US researchers Wilson Smillie and Donald Augustine (1926) considered the economic impact as dependent on the severity of infection. Counting worms was further systematised in the 1920s. Smillie and Augustine, along with their contemporaries, used the ‘Stoll method’ of ova counts (Stoll 1923), which estimated the number of hookworm eggs in faeces and encouraged a quantitative mapping of the problem.

Treatment programmes also expanded as the US took a technical and political interest in the issue, especially in Latin America. As Marcos Cueto outlines, the 1920s and ’30s were marked by an absence ‘of an effective international framework through which Latin American countries could act on common health problems’, a gap that the RSC filled by playing an active role in eradication campaigns (Cueto 1995, 222). This stemmed partly from the success of yellow fever eradication for the Panama Canal, the fear of (re)infection across the border in the US, and the perceived role of the US in protecting countries under its economic influence (ibid.). The concentration on hookworm served an economic development purpose, as the disease was seen as a cause of the low productivity of rural workers. Work began in 1920 with a preliminary survey in Colombia that found infection rates of 75 per cent, a finding ‘considered crucial for presenting hookworm as a separate and dramatic disease and for convincing everybody of the urgency of treatment’ (ibid., 224). Counting remained important because it supported the case that the problem was severe and so could prompt action. However, by the 1930s campaigns in Paraguay (1923–7), Venezuela (1927–8), and Mexico (1923–8) had run into unexpected obstacles to the goal of eradication:

. . . the gigantic dimension of hookworm infestation in some countries, the problems in administering safe doses, the high cost of the campaigns, the resistance of native healers and some physicians, the tension between foreign experts and local inspectors, the temporary nature of the majority of the latrines constructed, the disturbed political conditions of some countries which discontinued services and the fact that despite the surveys hookworm disease was never fully considered a problem of primary importance in many countries. (Cueto 1995, 225)

Following disillusionment with these only partially successful campaigns, Norman Stoll, the renowned parasitologist and epidemiologist based at the Rockefeller University, sought to reinvigorate the campaign against parasitic worms, with an eye to measurement on a global scale. In the 1940s, Stoll joined other parasitologists in a wider attempt to gain traction in international health (Mason Dentinger 2018).[3] Stoll (1947) played a key role in drawing attention to the problem, invoking the experience of US servicemen who had avoided worms on home soil but became infected in other countries. He recognised that measurement was needed to advocate for a problem of undetermined scale and importance. Counting thus had a rhetorical force that helped to raise awareness of the issue. Indeed, measurement of prevalence had been a key component, alongside treatment, in the eventual eradication of hookworm in the American South (Elman et al. 2014).

Stoll’s work in highlighting the problem of worms showed how measurement matters for policy and public interest. In an article in the Journal of Parasitology (Stoll 1947), first presented as an address to the American Society of Parasitologists, he discussed the first systematic attempt to measure human helminthiasis worldwide, appealing to the problem of worm-infected servicemen returning from the Pacific battlefields of World War II (Klass 2015). He called helminthiasis ‘the great infection of mankind’ and highlighted the need to conquer worms, as among the most prevalent human infections, for the common good and to raise human capabilities: ‘(F)or only in a society made up of parasite-free individuals will we know of what the human being is capable’ (ibid.). To reach such a state, he posed the central question: ‘Just how much human helminthiasis is there in the world?’ (Zhou et al. 2010).

Answering this question proved difficult. Despite the mass campaigns of the 1920s and ’40s, there was no central, standardised source for the information needed. Stoll nonetheless provided an estimate of the global numbers of infection with major parasitic worms. Drawing on an extensive review of the literature and consultation with other parasitologists, he covered the soil-transmitted helminths (roundworm, hookworm, whipworm, and threadworm) along with lymphatic filariasis, schistosomiasis, and food-borne trematodiasis (Utzinger et al. 2010). He produced this estimate without uniform measurements and with widespread gaps in global surveillance, amid the dramatic post-war intensification of global health. Even so, his estimates showed a global roundworm prevalence in 1940 of 29.8 per cent, with 644.4 million people infected (ibid.). Counting is therefore also intertwined with scale: Stoll began with counting at the individual level and then moved to a larger scale to achieve a global health impact. The growing number of international institutions and donors required the symbolic force of counting on a global scale.

Alongside the US-led efforts, worms and deworming campaigns have also been at the heart of an aspirational, large-scale international development agenda. As one of the most prevalent parasitic worm infections, hookworm encapsulated many of the aims of fledgling international health organisations: to awaken public interest in hygiene and sanitation as well as in scientific medicine. While the RSC had served as a focal point for hookworm efforts in the ‘ambitious and controversial’ aim of eradication (Cueto 1995, 222), it also became a model organisation for other health initiatives to imitate (Farley 1995). Global deworming later fell behind as a public health goal, although epidemiologists continued to contribute through transmission modelling studies pioneered in the 1980s.

By the 1990s and 2000s, the development agenda for human capabilities had shifted from a perspective of paternalist imperialism to one framed more squarely within economic development, on a neoliberal foundation. The concern with worm infections expanded from measuring the occurrence of worms to better measuring their economic and development impact. In a global health era of cost-effectiveness reviews of interventions, defined by new entrants such as the Bill and Melinda Gates Foundation, there was a growing emphasis on counting exercises that aimed to connect worm prevalence directly with human and economic development. At the same time, new forms of counting, such as quality-adjusted life years (QALYs), a measure combining the quality and quantity of life lived, and later disability-adjusted life years (DALYs), aimed to translate health interventions into estimates of impact on population productivity (Wahlberg and Rose 2015).[4] Peter Hotez and Jennifer Herricks at the Baylor College of Medicine produced a ‘Worm Index’ comparing disease burden data from the World Health Organization (WHO) (indices range from 0 to 1, with 1 being the highest) with the United Nations Development Programme (UNDP) Human Development Index (HDI) (Hotez and Herricks 2015). The HDI measures a country’s achievement in education, the standard of living, and years of life lived in good health (ibid.). McGillivray emphasises that it is intended as a comparative measure, an assessment of ‘. . . intercountry development levels on the basis of three so-called deprivation indicators: life expectancy, adult literacy and the logarithm of purchasing power adjusted per capita GDP’ (McGillivray 1991, 1461).

While estimates have become more uniform since the early 2000s, detail and specificity were still lacking; as Simon Brooker, Jeffrey Bethony, and Peter J. Hotez (2004) argue, the global burden is likely to be underestimated because of gaps in reporting for individual countries. They pointed to the lack of published information from the former Soviet Union and Eastern Europe in international literature. By 2003–5, global prevalence had dropped to 12.4–18.8 per cent, but the number of people infected had increased (807–1,221 million) because of population growth (Zhou et al. 2010, 199). As Rachel Pullan et al. (2014) have noted, improvements in cartographic techniques and mathematical modelling approaches have since produced more comprehensive estimates of helminth infection. These numbers form an important benchmark against which to evaluate future scale-up of major control efforts. More recent estimates show that the overall prevalence of roundworm declined by 10 per cent between 2005 and 2015 owing to infection control efforts (ibid.).

Stoll’s influence is still felt in country comparisons of worm infections. The highly ambitious goal of global worm control, and with it the quantification or counting of infections, continues to occupy researchers. The visualisation of worms and the management of large amounts of data have become increasingly important tools for providing a rationale, targeting interventions, and determining success. There are, however, inherent limitations that afflict disease mapping generally, not least the challenge presented by national boundaries. As Adams (2016) has noted, data collection practices depend on national willingness and support structures to collect data, as well as on the reliability of numbers and gaps in representational practices. As I have shown in this section, the rationale for worm metrics has changed over time; the next question is what different disciplinary perspectives have brought to the ideals of measurement.

Re-Analysis of School Deworming for Development

The epidemiologists Roy Anderson and Robert May first highlighted school-aged populations as being at greatest risk of heavy worm burdens, and they later developed a modern framework for understanding parasite transmission (Anderson and May 1982; Anderson 1986). Others took up school-based deworming, with Donald Bundy et al. and Lorenzo Savioli providing a proof of concept of the benefits of school deworming programmes (Bundy et al. 1990; Savioli, Bundy, and Tomkins 1992). Interest in school deworming persisted because the adverse effects of worms begin early in life. In economic development terms, this was understood as affecting schoolchildren in the education system who would become working adults in the labour market, which is why schools became a central site and target for deworming interventions.

However, proving the impact of such interventions would be challenging, and the evidence gap widened over time. Despite attempts at measurement, as described in the Cochrane Report (Dickson et al. 2000), the quality and robustness of the evidence still fell short. Nevertheless, international organisations such as the WHO still referred to ample evidence supporting mass deworming of children who suffered nutritionally, educationally, and economically. This catching-up by the global health community in providing evidence was a feature of many health areas, where often well-established interventions had to be justified as cost-effective as part of the neoliberal reforms of the 1980s and ’90s (Williamson 2009).[5] As Christian McMillen (2021) argues through the case of clean water and sanitation, an emphasis on quantifying benefits through cost-effectiveness analysis became a preoccupation, especially among economists at the World Bank as it grew in global health influence, meaning ‘economic concerns not necessarily public health imperatives would guide global health policy’ (ibid.). McMillen points out that economic questions had emerged in the late 1970s, whereas before then it had been assumed that clean water and sanitation had obvious health benefits.

A concern with quantification had been present earlier, including a recognition of the difficulty of establishing a causal relationship between intervention and impact, but quantification had not been needed in the same way to justify intervention. The rise and increasing sophistication of measurement tools such as the RCT, alongside a renewed emphasis on economic concerns, meant that proof of such a link came to be required. The goal of RCTs was to attain greater certainty and precision about interventions, but where this proved difficult for large-scale problems with hard-to-measure benefits, a bias formed towards short-term, inexpensive interventions that could be more easily measured. Striking similarities occurred with ‘Water, Sanitation, and Hygiene’ (WASH) interventions, as McMillen (2021) has outlined comprehensively, where accounting for rationale and impact came into contradiction with self-evident, general observations. It was intuitive that preventing parasitic worms was beneficial, but fitting intuition within the robust measurement logic of new global health evidence norms was a challenge. Counting worms would regain interest not merely as counting, to assess the burden of disease or its correlation with human and economic development, but as a justification for health interventions and a means of appraising their success. As Cal Biruk (2021) also points out, many of the success stories in global health are ‘propped up’ by quantitative evidence.

There was certainly a need for deworming interventions to be justified through proven economic development impact. A WHO report (2005) used the Miguel and Kremer article as justification for deworming interventions and, indeed, as an evidence-based rationale for why neglected tropical diseases such as parasitic worms deserved attention. The report presented the claim about school deworming, stating: ‘The package of neglected tropical diseases is a clear example of a rapid-impact intervention with a high pay-off at a very low cost. School deworming, for example, is highly cost-effective’ (ibid., 8). The WHO report thus linked lower worm infection rates with economic development. What was striking about this claim was the uncontested assertion that the intervention could be replicated in other countries, which mirrored the early Rockefeller Foundation rhetoric of applying eradication campaigns in the American South to other areas of the world, particularly Latin America. Deworming was thereby framed as an intervention that could be measured in terms of impact, cost-effectiveness, and quick results. The impact lay in better school attendance translating into economic results, as the WHO describes:

From a microeconomic perspective, tackling neglected tropical diseases provides both health and economic benefits at low cost. There is ample evidence of significant gains in worker productivity as well as impressive effects on school attendance test scores. Externalities also apply to children attending schools without deworming activities, given the lower rate of infection in the community (ibid., 9).

In Miguel and Kremer’s study, the schools that received the pills experienced lower absenteeism, but the authors also showed that, for pupils who did not receive the pills, both in the same schools and in nearby schools, there was a positive spillover effect, or externality, of lower absenteeism. One explanation for these spillovers was that less infected excrement in the environment could mean that others did not become infected. This result presented deworming as an even cheaper way of increasing school participation, as the spillover effect magnified the pills’ impact. However, a challenge arose in 2015, when a reanalysis by Calum Davey et al. (2015) could not replicate the findings. They found a smaller, no longer statistically significant effect on the attendance of treated children, and no benefit for neighbouring schools.

The group of epidemiologists from LSHTM had wanted to conduct a reanalysis in the wake of a Cochrane Report (Taylor-Robinson et al. 2015) that had systematically reviewed the available evidence for mass deworming. The LSHTM researchers (Aiken et al. 2015) found a smaller and no longer statistically significant effect on attendance among treated children. Specifically, they found that the lines of code used to calculate which schools fell into the ‘deworming nearby’ category had erroneously excluded the majority of schools, so that once this key error was corrected, the benefit for neighbouring schools ‘effectively disappeared’ (Goldacre 2015). Even after re-applying the analytical approaches originally used and correcting various errors, the epidemiologists found little evidence for the previously reported indirect effects of the deworming intervention. However, the evidence varied by method, as shown in another article by some of the same LSHTM authors, which put forward a more positive analysis:

The evidence supporting an improvement in school attendance differed by analysis method. . . . We find that the study provides some evidence, but with high risk of bias, that a school-based drug-treatment and health-education intervention improved school attendance and no evidence of effect on examination performance (Davey et al. 2015, 1581–2).

Reanalysis is a crucial element of the Cochrane approach, and the circulation of RCT evidence, and the controversies around it, are often linked to reanalysis. It is therefore important to ask why reanalyses are carried out, who is behind them, and what interests are at stake. Two reanalyses were conducted: the first a ‘pure’ replication and the second an ‘alternative’ analysis. The first reanalysis article by the LSHTM researchers (Aiken et al. 2015) used the original computer code and dataset to try to reproduce the findings of the original article and to check for errors and fraud. After correcting coding errors, they found little evidence for an indirect effect of deworming. The second article, which quickly followed from the same authors (Davey et al. 2015, with a different author order), was an alternative analysis that examined the data in ways the original did not, namely by analysing each year of the RCT separately.

The LSHTM researchers were interested in understanding how development economists used RCTs from their own disciplinary perspective, and because of this interest the Worm Wars turned a critical lens on the use of RCTs in development economics. In an ‘author’s response’ to the controversy, they stated that, as HIV epidemiologists, they wanted to learn about the evaluation methodology guiding economist-led randomised trials of the early 2000s on HIV-risk behaviours (e.g. drug use and sexual practices), as they found ‘. . .appraising these studies [is] challenging because of different approaches to study design, reporting and analysis’ (Hargreaves et al. 2015, 1597). Their experience with RCTs in the context of HIV could be viewed as one group of experts questioning another group’s use of ‘their’ methods. However, their own accounts suggest that they were driven by curiosity to understand more about the adoption of RCTs by another discipline. According to Deaton and Cartwright, ‘what epidemiology knows is not what is known by economics, or political science, or sociology, or philosophy—and the reverse’ (Deaton and Cartwright 2018, 2). RCT literature across disciplines ‘uses its own language and different understandings and misunderstandings characterize different fields and different kinds of projects’ (ibid.). The Worm Wars thus soon spilled over into a broader critique of RCTs across disciplines, concerning the differing emphases given and choices made. Indeed, critiques would appear in the following years on the use of RCTs for HIV behavioural prevention interventions (Friedman, Perlman, and Ompad 2015).

Responding to the reanalysis of their findings, Kremer and Miguel (2015) asserted the validity of their initial study while accepting corrections to some errors in the externality and school participation effects. The message here is not only that standardised measures are contestable, but also two key points about how measurement is used to provide evidence for understanding health issues and their interventions:

1. a connection is sought between an intervention for a health issue and economic development outcomes;

2. a hierarchy of methods exists for addressing health issues, privileging quantifiable, statistical measurement that appears objective, scientific, and comparable.

Making a connection between deworming and education (attendance and performance) serves as a justification for intervention. It also fits within status quo understandings that health improvement leads directly to better economic outcomes, an assumption that is particularly evident for problems that seem amenable to straightforward technical interventions such as the administration of deworming drugs.

While such sentiments are not new and have surfaced repeatedly for over a century, the Worm Wars represent a collision between disciplines over the use of measurement. Differing disciplinary perspectives and concerns among philosophers, ethicists, economists, and medical researchers lead to a lack of consensus on how key topics should be defined and measured (Anderson and Burckhardt 1999). The increasing emphasis on measurement for evidence in global health, and the standards, metrics, and quantification it entails, creates further contention. Disagreement arises where universal standards meet local practices and, as Maldonado and Moreira point out, a tension has developed between those who seek to ascertain the

value of health interventions across populations [and those who] place less value on developing a standardised measure of health gain. . . For them, health research should not be concerned with the question of whether public health interventions work but, instead, with understanding the process of making them work in specific contexts. (2019, 214).

Many have argued that RCTs have more limited utility for evaluating public health interventions and can result in distorted conclusions (Shelton 2014). Other options include observational studies, as well as qualitative research and evaluation methods that do not privilege unbiasedness over other statistical qualities such as precision (Abdelghafour 2017). Authors such as Kingori (2013) and Biruk (2018) pay particular attention to data collectors (or fieldworkers) and the inner workings of data collection practices, which they find offer further insight into the lived reality of people, in particular their socio-economic context and demands. As Kvangraven (2020) and others have also argued, the data collection stage is given lower priority than the second stage of econometric data analysis. The result is errors in collection and entry, which may be due to a lack of experience or knowledge or, more systemically, to a hierarchical division of labour between project managers and field staff and a lack of quality checking across the large number of trials that researchers oversee (ibid.).

There has been a paucity of well-supported alternatives to RCTs. The WHO, for example, relies on the GRADE system (Grading of Recommendations, Assessment, Development and Evaluation). However, GRADE continues to place importance on RCTs in developing public health recommendations (Guyatt et al. 2011). Such approaches still include RCTs while purporting to widen the methods used to incorporate other study designs. The GRADE system is therefore not an alternative to RCTs, as experimental designs retain a high position within its evidence hierarchy. As Littoz-Monnet and Uribe (2023) have argued, GRADE acts as a methods regime for producing policy evidence but has limitations: it undervalues ecological and environmental knowledge and cannot readily be applied to understand complex phenomena. ‘Those forms of knowledge that do not rank high according to GRADE, such as observational studies or case reports, are often disregarded or seen as anecdotal’ (ibid., 3–4). The increased interest in measures of health, and in their validation and use, cannot simply be understood through a neoliberal logic of health production to maximise economic productivity, as there is an:

interactive relationship between health measurement and the politics of health . . . characterised by controversy and uncertainty about how to interlock normative ideals and approaches to knowledge-making about health (Maldonado and Moreira 2019, 203).

It was also a time when the utility of RCTs was being stretched beyond the original scope of medical science to produce measures of local efficacy with non-standardisable data in development economics, while epidemiological methods were simultaneously being attempted that could account for large externalities and make further connections with socio-economic impact (e.g., education and productivity). It was in this context that Kremer was able to carry out the deworming RCT with Miguel from 1998 to 2001, after gaining the support of an education-focused Dutch NGO, ‘International Christian Support Fund Africa’ (ICS) (Abdelghafour 2017; Kvangraven 2020).

The promise of RCTs was to bring a level of rigour and legitimacy to the practical field of development economics (and international development), which had been marred by criticism of a disconnect between theory and practice, as well as by a lack of justification for the measures being applied. This has been a central claim of the key development economists Abhijit Banerjee and Esther Duflo, who are proponents of RCTs and of experiments more broadly. They have argued that development policy was driven too much by ideology, which produced many failures (Kvangraven 2020). Paralleling the institutional development of RCTs via Cochrane, in 2003 they founded the Abdul Latif Jameel Poverty Action Lab (J-PAL) with Sendhil Mullainathan, with the aim of testing and improving the effectiveness of social programs (The Abdul Latif Jameel Poverty Action Lab n.d.).

The RCT in the Evidence Hierarchy

At a deeper level, the appeal of measuring the impact of drugs and correlating it with economic development reflects the wider shift in the valuation of evidence that RCTs have been part of and have accelerated. The Cochrane Review process puts RCTs at the top of a hierarchy of forms of evidence. RCTs have acquired the reputation of being a ‘. . . “fair test” of whether an intervention works – throughout the entire community of development work’ (Goldacre 2015), and so the value of evidence is determined less by how it reflects complex interrelations in the field and more by how easily it fits into frameworks that can be studied via RCTs. In the process, other types of research, particularly qualitative research, are marginalised (Cartwright 2007). This problem is not unique to RCTs but encompasses the wider quantification of international and global health, which accelerated with the widespread adoption of metrics from the 1970s onwards and the increasing need to justify the cost-effectiveness of health interventions in economic terms. The fact that RCTs are no panacea, and will not always clarify the impact of interventions in complex human and non-human environments, is seldom reflected upon (Cassidy 2019).

The original Cochrane review of 2000, which prompted Kremer and Miguel’s response, had stated:

. . .the evidence of benefit for mass treatment of children related to positive effects on growth and cognitive performance is not convincing. In the light of these data, we would be unwilling to recommend that countries or regions invest in programmes that routinely treat children with anthelmintic drugs to improve their growth or cognitive performance. (Dickson et al. 2000, 1700).

However, it appears that the Cochrane review excluded articles that were not ‘pure’ RCTs, as well as historical articles that did not incorporate RCTs (Kremer and Miguel 2015). This approach follows the framework of ‘Evidence-based Medicine’ (EBM) in creating a hierarchy of evidence to guide health interventions (Sackett 1997). RCTs are rated as the highest-quality evidence among unfiltered information, ahead of cohort studies and case-control studies; among filtered information, systematic reviews, such as those Cochrane carries out, rank highest.

Failing to meet the rigorous requirements of RCTs can lead to the defunding of interventions. In 2012, the charity evaluator ‘Giving What We Can’ revised its earlier estimate of the cost-effectiveness of deworming treatments in light of the 2012 Cochrane review, which had indicated that they were less effective than first thought (Cotton-Barratt 2012). A further Cochrane review in 2019 summarised the evidence then available, drawing on searches for relevant trials up to 19 September 2018, which covered:

51 trials, including 10 cluster‐RCTs, that met the inclusion criteria. One trial evaluating mortality included over one million children, and the remaining 50 trials included a total of 84,336 participants. (Taylor-Robinson et al. 2019, 2).

While apparently reliable and robust evidence is important for justifying policy decisions and actions, treating incompatibility with RCTs as an a priori reason to exclude data can lead to a blinkered approach of cherry-picking quantifiable data and designing measurements that yield RCT-compatible data but may fail to capture multi-causal realities on the ground. In the case of deworming, Deaton has argued that an RCT cannot establish economic causality and that, therefore, the ‘imposition of a hierarchy of evidence is both dangerous and unscientific’ (in Bédécarrats, Guérin, and Roubaud 2020). Bédécarrats, Guérin, and Roubaud (ibid.) similarly contend not only that establishing causation is difficult, but also that the term ‘deworming’ is applied inconsistently to refer to different infections and treatment regimes.

Also important to consider is the way deworming drugs are administered as a health intervention and how that is measured. They are typically delivered via mass drug administration (MDA). Medical researchers have run studies measuring the impact of MDA, in which a target population is given regular deworming treatment whether or not they are infected. This is the type of deworming treatment Miguel and Kremer were measuring, and researchers, including epidemiologists at LSHTM, have also used cluster-randomised controlled trials to assess the impact of this intervention (Hart et al. 2020). In a different discipline, the anthropologists Tim Allen and Melissa Parker (2016) have taken an interest in school deworming as the main vehicle for achieving MDA. In weighing into the Worm Wars, they further emphasised how the evidence for MDA is lacking. They were less concerned with the intricacies of measurement per se than with what the controversy indicated about the high stakes attached to MDA, which they thought worth challenging. They questioned the attractiveness of MDA as an intervention, arguing that it requires good communication, engagement, and acceptance for people to take pills even when they are not ill and to tolerate any side effects. Similarly, Simon Croft at LSHTM has worried that the good news story of MDA does not encompass some of its drawbacks, especially, picking up on the previous point, the targeting of different worms in the same way:

. . .Mass Drug Administration can lead to drug resistance. . . doesn’t separate out the fact that some worms the ascaris roundworms, 90% sensitive to most of these drugs but trichuris and other worms are only about 30% sensitive.[6] You classify them all together (Interview with author, Croft 2014).

The ability to measure drug interventions through RCTs prevents other interventions from coming to the forefront in place of deworming drugs. At the WHO, Antonio Montresor has researched RCTs for MDA against parasitic worms. He believes there is some delusion about the benefit of MDA, and about the capacity of RCTs to measure the effects of the intervention. Instead, other interventions, such as WASH, should be considered, although their benefits are more difficult to demonstrate. As Montresor recounted, post-war Italy eliminated parasitic worms not through the distribution of drugs but through sanitary improvements arising from economic development, and Japan showed similar if not faster results: ‘I think sanitation is a right of a person but is not the case for 99% of people’ (Interview with author, Montresor 2013). For him, addressing the prevalence of worms in poor areas means improving sanitation through new solutions adapted to different environments. In addition, Brooker, Bethony, and Hotez (2004) argue that it was the economic development of the US South that led to hookworm elimination (not the Rockefeller Hookworm Eradication Campaigns), and that a similar story applies to South Korea, with economic reforms alongside specific control programmes in the 1960s and ’70s.

However, rather than encouraging consideration of other methods, the dominance of evidence-based medicine favours the empiricism and experimental methodology of the RCT and faith in biostatistical techniques (Donovan 2018). These methods for evidence synthesis set a dogmatic agenda shaped by what can be measured rather than by the questions that need to be answered. Moreover, RCTs still rely on expert interpretation and lack reflection on the possible limits of statistical methodology (ibid.). As the Worm Wars show, particular methods of measurement and evaluation are so pervasive that they even dictate the kinds of questions posed, narrowing the topics that can be researched and limiting the range of possible solutions.

Conclusion: The Meaning of the Worm Wars

In this article I have shown how the Worm Wars reveal the importance placed on measurement and evidence standards by various actors, including academics (economists and epidemiologists in this instance) but also by policymakers, NGOs, and international organisations (for example, in World Bank reporting). Different disciplinary approaches have caused debate over how best to measure interventions and their impacts, which raises more fundamental concerns about the difficulty of measuring policy outcomes.

As Deaton and Cartwright (2018) observe, the literatures on RCTs across disciplines overlap but differ. Indeed, epidemiologists have noted that they come from an older tradition of RCTs: it was the British epidemiologist Bradford Hill who formalised RCT methods in the 1940s. Laura Bothwell et al. (2016) also argue that it was clinical epidemiologists who, by the early 1980s, labelled RCTs the gold standard of medical knowledge. Economists outside development economics had also been using RCTs for some time; Miguel and Kremer, however, were among the first to apply the method to their specialism. Others have asserted that the fault lies not in the type of evidence-based methodology but in the inability of researchers from a different discipline to replicate the same findings using the same data. The failure to reanalyse and replicate the results was, beyond the points about miscalculations and which forms of evaluation to include, at its core a disagreement over the magnitude of impact. A key argument is that reanalysis and replication are needed to have confidence in the reliability of research (Ioannidis 2005). What became increasingly obvious was that, despite being an allegedly uniform gold standard for adjudicating measurements, an RCT by itself is powerless to evaluate the different philosophies guiding what measurements were designed to find out in the first place.

Nevertheless, the Worm Wars have played out at a level far removed from the original attempts to connect deworming with improved school attendance and, in turn, with economic growth and development. No remeasurement of the original data took place. Since the early 2000s, other studies measuring worm burdens have shied away from RCTs and focused on the seemingly straightforward approach of counting worm infections and examining correlations between worms and economic indicators such as the HDI. There has also been some pushback against the use of RCTs to measure the success of deworming interventions at the international level, as discussed by WHO scientist Antonio Montresor. More generally, it is becoming apparent that complex environmental interventions such as improved sanitation may be less easily measured through RCTs, which may influence whether such interventions are pursued if the RCT is so highly prized as a metric. McMillen (2021) has documented one such example in the pushback by the hygiene and sanitation community against RCTs. There is clearly a difference in the expectations and assumptions about how RCTs are performed across subjects, particularly between the social sciences and medicine. This is not to argue that RCTs are useful only to some disciplines and cannot be applied to others, but rather to be realistic about complex interventions involving humans and many contributing factors when applying similar methodological approaches across disciplines.

Measurement is thus crucial both in setting standards and in defining and advocating for global health problems. The pervasive use of standard quantitative measurement in global health is having an effect, especially in how higher-income countries deploy metrics to set development and health agendas and priorities for lower-income countries and areas. In measuring the impact of global health interventions, the assumed success story of applying RCTs to economic development has unravelled, and as a basis for setting standards in global health it is difficult to uphold. Ongoing debate about RCTs has only prompted further contention about how to approach global health interventions and how to audit and appraise them. Yet, once past the controversy, the RCT has emerged as the victor, its use continuing and its privileged position intact.

This position was affirmed in 2019, when one of the development economists at the centre of the Worm Wars controversy, Michael Kremer, was awarded the Nobel Prize in Economic Sciences with fellow development economists Abhijit Banerjee and Esther Duflo, ‘for their experimental approach to alleviating global poverty’ (Nobel Prize 2019). Kremer, Banerjee, and Duflo were responsible for early experimentation and for establishing institutions to run development economics RCTs, such as J-PAL, and Kremer set up the charity ‘Deworm the World’. RCTs, therefore, transformed development economics, and the prize acknowledged the tremendous impact this form of measurement had within the field. Although the Worm Wars show the need to give epistemic weight to the contextual dependencies of RCTs, the RCT is a measurement standard that seems set to stay, even as criticism continues (including from Deaton, himself also a Nobel Prize winner in economics). These continued critiques remain necessary. Rather than assuming a hegemony of standardised measures to guide the transfer of funding, resources, and knowledge, we also need to consider the imperative behind doing so.

Acknowledgements

This article has been a long time in the making: it began during my PhD and was then kick-started again by a really engaging workshop, ‘Standards & Their Containers’, organised by Claas Kirchhelle and Aro Velmet in 2019. I would like to thank the many people who provided input, including the anonymous reviewers, special issue editors, and journal editors, whose feedback improved this paper.

Author Biography

Samantha Vanderslott is a health sociologist and associate professor at the University of Oxford, leading the Vaccines and Society Unit (VAS) hosted by the Oxford Vaccine Group. Her work centres around health, society, and policy topics, drawing on perspectives from sociology, anthropology, history, global health, and science and technology studies (STS).

References

Abdelghafour, Nassima. 2017. “Randomized Controlled Experiments to End Poverty?” Anthropologie & Développement (46–47): 235–62.
https://doi.org/10.4000/anthropodev.611.

Adams, Vincanne, ed. 2016. Metrics: What Counts in Global Health. Durham: Duke University Press.

Aiken, Alexander M., Calum Davey, James R. Hargreaves, and Richard J. Hayes. 2015. “Re-Analysis of Health and Educational Impacts of a School-Based Deworming Programme in Western Kenya: A Pure Replication.” International Journal of Epidemiology 44(5): 1572–80.
https://doi.org/10.1093/ije/dyv127.

Allen, Tim, and Melissa Parker. 2016. “Deworming Delusions? Mass Drug Administration in East African Schools.” Journal of Biosocial Science 48(S1): S116–47.
https://doi.org/10.1017/s0021932016000171.

Anderson, Kathryn L., and Carol S. Burckhardt. 1999. “Conceptualization and Measurement of Quality of Life as an Outcome Variable for Health Care Intervention and Research.” Journal of Advanced Nursing 29(2): 298–306.
https://doi.org/10.1046/j.1365-2648.1999.00889.x.

Anderson, Roy M. 1986. “The Population Dynamics and Epidemiology of Intestinal Nematode Infections.” Transactions of the Royal Society of Tropical Medicine and Hygiene 80(5): 686–96.
https://doi.org/10.1016/0035-9203(86)90367-6.

Anderson, Roy M., and Robert M. May. 1982. “Population Dynamics of Human Helminth Infections: Control by Chemotherapy.” Nature 297(5867): 557–63.
https://doi.org/10.1038/297557a0.

Bédécarrats, Florent, Isabelle Guérin, and François Roubaud. 2020. Randomized Control Trials in the Field of Development: A Critical Perspective. Oxford: Oxford University Press.

Belluz, Julia. 2015. “Worm Wars: The Fight Tearing Apart the Global Health Community, Explained.” Vox. Accessed October 14, 2022. Last modified July 29, 2015.
https://www.vox.com/2015/7/24/9031909/worm-wars-explained.

Biruk, Cal. 2018. Cooking Data: Culture and Politics in an African Research World. Durham: Duke University Press.

⸻. 2021. “The Politics of Global Health.” PoLAR: Political and Legal Anthropology Review 44(2): e161–80.
https://doi.org/10.1111/plar.12431.

Bleakley, Hoyt. 2007. “Disease and Development: Evidence from Hookworm Eradication in the American South.” The Quarterly Journal of Economics 122(1): 73–117.
https://doi.org/10.1162/qjec.121.1.73.

Boseley, Sarah. 2015. “New Research Debunks Merits of Global Deworming Programmes.” The Guardian. July 23, 2015. Accessed December 10, 2015.
https://www.theguardian.com/society/2015/jul/23/research-global-deworming-programmes.

Bothwell, Laura E., Jeremy A. Greene, Scott H. Podolsky, and David S. Jones. 2016. “Assessing the Gold Standard—Lessons from the History of RCTs.” The New England Journal of Medicine 374(22): 2175–81.
https://doi.org/10.1056/nejmms1604593.

Brooker, Simon, Jeffrey Bethony, and Peter J. Hotez. 2004. “Human Hookworm Infection in the 21st Century.” Advances in Parasitology 58: 197–288.
https://doi.org/10.1016/S0065-308X(04)58004-1.

Brown, Annette N., and Benjamin D. K. Wood. 2018. “Replication Studies of Development Impact Evaluations.” The Journal of Development Studies 55(5): 917–25.
https://doi.org/10.1080/00220388.2018.1506582.

Brown, E. Richard. 1976. “Public Health in Imperialism: Early Rockefeller Programs at Home and Abroad.” American Journal of Public Health 66(9): 897–903.
https://doi.org/10.2105/AJPH.66.9.897.

Bundy, Donald A. P., Michael S. Wong, Lewis L. Lewis, and John Horton. 1990. “Control of Geohelminths by Delivery of Targeted Chemotherapy through Schools.” Transactions of the Royal Society of Tropical Medicine and Hygiene 84(1): 115–20.
https://doi.org/10.1016/0035-9203(90)90399-Y.

Cartwright, Nancy. 2007. “Are RCTs the Gold Standard?” BioSocieties 2(1): 11–20.
https://doi.org/10.1017/S1745855207005029.

Cassidy, Angela. 2015. “‘Big Science’ in the Field: Experimenting with Badgers and Bovine TB, 1995–2015.” History and Philosophy of the Life Sciences 37(3): 305–25.
https://doi.org/10.1007/s40656-015-0072-z.

⸻. 2019. Vermin, Victims and Disease: British Debates over Bovine Tuberculosis and Badgers. Cham: Springer International Publishing.

Cotton-Barratt, Conrad. 2012. “Neglected Tropical Diseases – Are They Cost-Effective to Treat?” Giving What We Can, blog post. October 11, 2012. Accessed May 26, 2019.
https://www.givingwhatwecan.org/post/2012/10/neglected-tropical-diseases-are-they-cost-effective-to-treat/.

Cueto, Marcos. 1995. “The Cycles of Eradication: The Rockefeller Foundation and Latin American Public Health, 1918–1940.” In International Health Organisations and Movements 1918–1939, edited by Paul Weindling, 222–43. Cambridge: Cambridge University Press.

Davey, Calum, Alexander M. Aiken, Richard J. Hayes, and James R. Hargreaves. 2015. “Re-Analysis of Health and Educational Impacts of a School-Based Deworming Programme in Western Kenya: A Statistical Replication of a Cluster Quasi-Randomized Stepped-Wedge Trial.” International Journal of Epidemiology 44(5): 1581–92.
https://doi.org/10.1093/ije/dyv128.

Deaton, Angus, and Nancy Cartwright. 2018. “Understanding and Misunderstanding Randomized Controlled Trials.” Social Science & Medicine 210: 2–21.
https://doi.org/10.1016/j.socscimed.2017.12.005.

Dickson, Rumona, Shally Awasthi, Colin Demellweek, and Paula R. Williamson. 2000. “Anthelmintic Drugs for Treating Worms in Children: Effects on Growth and Cognitive Performance.” Cochrane Database of Systematic Reviews (2). Modified April 18, 2007.
https://doi.org/10.1002/14651858.CD000371.

Donovan, Kevin P. 2018. “The Rise of the Randomistas: On the Experimental Turn in International Aid.” Economy and Society 47(1): 27–58.
https://doi.org/10.1080/03085147.2018.1432153.

Elman, Cheryl, Robert A. McGuire, and Barbara Wittman. 2014. “Extending Public Health: The Rockefeller Sanitary Commission and Hookworm in the American South.” American Journal of Public Health 104(1): 47–58.
https://doi.org/10.2105/AJPH.2013.301472.

Evans, David. 2015. “Worm Wars: The Anthology | Impact Evaluations.” The World Bank Blog: Development Impact. Modified January 4, 2016. Accessed November 10, 2015.
https://blogs.worldbank.org/impactevaluations/worm-wars-anthology.

Farley, John. 1995. “The International Health Division of the Rockefeller Foundation: The Russell Years, 1920–1934.” In International Health Organisations and Movements 1918–1939, edited by Paul Weindling, 203–21. Cambridge: Cambridge University Press.

Feasey, Nick, Mark Wansbrough-Jones, David C. W. Mabey, and Anthony W. Solomon. 2010. “Neglected Tropical Diseases.” British Medical Bulletin 93(1): 179–200.
https://doi.org/10.1093/bmb/ldp046.

Friedman, Samuel R., David C. Perlman, and Danielle C. Ompad. 2015. “The Flawed Reliance on Randomized Controlled Trials in Studies of HIV Behavioral Prevention Interventions for People Who Inject Drugs and Other Populations.” Substance Use & Misuse 50(8–9): 1117–24.
https://doi.org/10.3109/10826084.2015.1007677.

Goldacre, Ben. 2015. “Scientists Are Hoarding Data and It’s Ruining Medical Research.” BuzzFeed News. July 23, 2015. Accessed December 10, 2015.
https://www.buzzfeed.com/bengoldacre/deworming-trials.

Gorsky, Martin, and Christopher Sirrs. 2017. “World Health by Place: The Politics of International Health System Metrics, 1924–c. 2010.” Journal of Global History 12(3): 361–85.
https://doi.org/10.1017/S1740022817000134.

Greenberg, David H., and Philip K. Robins. 1986. “The Changing Role of Social Experiments in Policy Analysis.” Journal of Policy Analysis and Management 5(2): 340–62.
https://doi.org/10.1002/pam.4050050210.

Guyatt, Gordon H., Andrew D. Oxman, Holger J. Schünemann, Peter Tugwell et al. 2011. “GRADE Guidelines: A New Series of Articles in the Journal of Clinical Epidemiology.” Journal of Clinical Epidemiology 64(4): 380–2.
https://doi.org/10.1016/j.jclinepi.2010.09.011.

Hargreaves, James R., Alexander M. Aiken, Calum Davey, and Richard J. Hayes. 2015. “Authors’ Response to: Deworming Externalities and School Impacts in Kenya.” International Journal of Epidemiology 44(5): 1596–99.
https://doi.org/10.1093/ije/dyv130.

Hariton, Eduardo, and Joseph J. Locascio. 2018. “Randomised Controlled Trials—The Gold Standard for Effectiveness Research.” BJOG: An International Journal of Obstetrics and Gynaecology 125(13): 1716.
https://doi.org/10.1111/1471-0528.15199.

Hart, John D., Lyson Samikwa, Feston Sikina, Khumbo Kalua et al. 2020. “Effects of Biannual Azithromycin Mass Drug Administration on Malaria in Malawian Children: A Cluster-Randomized Trial.” The American Journal of Tropical Medicine and Hygiene 103(3): 1329–34.
https://doi.org/10.4269/ajtmh.19-0619.

Hill, Gerry B. 2000. “Archie Cochrane and His Legacy: An Internal Challenge to Physicians’ Autonomy?” Journal of Clinical Epidemiology 53(12): 1189–92.
https://doi.org/10.1016/s0895-4356(00)00253-5.

Hotez, Peter J., Miriam Alvarado, María-Gloria Basáñez, Ian Bolliger et al. 2014. “The Global Burden of Disease Study 2010: Interpretation and Implications for the Neglected Tropical Diseases.” PLoS Neglected Tropical Diseases 8(7): e2865.
https://doi.org/10.1371/journal.pntd.0002865.

Hotez, Peter J., and Jennifer R. Herricks. 2015. “Helminth Elimination in the Pursuit of Sustainable Development Goals: ‘A Worm Index’ for Human Development.” PLoS Neglected Tropical Diseases 9(4): e0003618.

International Society for NTDs (ISNTD). n.d. “International Society for Neglected Tropical Diseases.” Accessed December 14, 2016.
http://www.isntd.org/#/isntd-conferences/4565830915.

Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2(8): e124.
https://doi.org/10.1371/journal.pmed.0020124.

Jones, David S., and Scott H. Podolsky. 2015. “The History and Fate of the Gold Standard.” The Lancet 385(9977): 1502–3.
https://doi.org/10.1016/S0140-6736(15)60742-5.

Kingori, Patricia. 2013. “Experiencing Everyday Ethics in Context: Frontline Data Collectors Perspectives and Practices of Bioethics.” Social Science & Medicine 98: 361–70.
https://doi.org/10.1016/j.socscimed.2013.10.013.

Klass, Perri. 2015. “War of the Worms.” The New Yorker. December 14, 2015.
https://www.newyorker.com/tech/annals-of-technology/war-of-the-worms.

Kremer, Michael, and Edward Miguel. 2015. “Understanding Deworming Impacts on Education.” University of California, Berkeley. Accessed December 10, 2015.
https://emiguel.econ.berkeley.edu/assets/miguel_research/63/Deworming-summary_Kremer-Miguel_2015-07-24-CLEAN.pdf.

Kvangraven, Ingrid Harvold. 2020. “Nobel Rebels in Disguise—Assessing the Rise and Rule of the Randomistas.” Review of Political Economy 32(3): 305–41.
https://doi.org/10.1080/09538259.2020.1810886.

Latour, Bruno. 1987. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, MA: Harvard University Press.

Littoz-Monnet, Annabelle, and Juanita Uribe. 2023. “Methods Regimes in Global Governance: The Politics of Evidence-Making in Global Health.” International Political Sociology 17(2): olad005.
https://doi.org/10.1093/ips/olad005.

Maldonado, Oscar Javier, and Tiago Moreira. 2019. “Metrics in Global Health: Situated Differences in the Valuation of Human Life.” Historical Social Research 44(2): 202–24.
https://doi.org/10.12759/hsr.44.2019.2.202-224.

Mason Dentinger, Rachel. 2018. “The Parasitological Pursuit: Crossing Species and Disciplinary Boundaries with Calvin W. Schwabe and the Echinococcus Tapeworm, 1956–1975.” In Animals and the Shaping of Modern Medicine, 161–91. Medicine and Biomedical Sciences in Modern History. Cham: Palgrave Macmillan.

McGillivray, Mark. 1991. “The Human Development Index: Yet Another Redundant Composite Development Indicator?” World Development 19(10): 1461–8.
https://doi.org/10.1016/0305-750X(91)90088-Y.

McMillen, Christian. 2021. “‘These Findings Confirm Conclusions Many Have Arrived at by Intuition or Common Sense’: Water, Quantification and Cost-Effectiveness at the World Bank, ca. 1960 to 1995.” Social History of Medicine 34(2): 351–74.
https://doi.org/10.1093/shm/hkaa006.

Miguel, Edward, and Michael Kremer. 2004. “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities.” Econometrica 72(1): 159–217.
https://doi.org/10.1111/j.1468-0262.2004.00481.x.

Nobel Prize. 2019. “The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2019.” Press release, October 14, 2019. Accessed November 30, 2021.
https://www.nobelprize.org/prizes/economic-sciences/2019/press-release/.

Pearce, Warren, and Sujatha Raman. 2014. “The New Randomised Controlled Trials (RCT) Movement in Public Policy: Challenges of Epistemic Governance.” Policy Sciences 47: 387–402.
https://doi.org/10.1007/s11077-014-9208-3.

Pullan, Rachel L., Jennifer L. Smith, Rashmi Jasrasaria, and Simon J. Brooker. 2014. “Global Numbers of Infection and Disease Burden of Soil Transmitted Helminth Infections in 2010.” Parasites & Vectors 7: 37.
https://doi.org/10.1186/1756-3305-7-37.

Registry for International Development Impact Evaluations (RIDIE). n.d. Website.
https://ridie.3ieimpact.org/.

Reubi, David, Clare Herrick, and Tim Brown. 2016. “The Politics of Non-Communicable Diseases in the Global South.” Health & Place 39: 179–87.
https://doi.org/10.1016/j.healthplace.2015.09.001.

Rosemann, Achim. 2019. “Alter-Standardizing Clinical Trials: The Gold Standard in the Crossfire.” Science as Culture 28(2): 125–48.
https://doi.org/10.1080/09505431.2019.1606190.

Sackett, D. L. 1997. “Evidence-Based Medicine.” Seminars in Perinatology 21(1): 3–5.
https://doi.org/10.1016/S0146-0005(97)80013-4.

Samii, Cyrus. 2020. “Reasons for Policy Experimentation That Have Nothing to Do with Selection Bias.” World Development 127: 104825.
https://doi.org/10.1016/j.worlddev.2019.104825.

Savioli, Lorenzo, Donald Bundy, and Andrew Tomkins. 1992. “Intestinal Parasitic Infections: A Soluble Public Health Problem.” Transactions of the Royal Society of Tropical Medicine and Hygiene 86(4): 353–4.
https://doi.org/10.1016/0035-9203(92)90215-X.

Shah, Hriday M., and Kevin C. Chung. 2009. “Archie Cochrane and His Vision for Evidence-Based Medicine.” Plastic and Reconstructive Surgery 124(3): 982–8.
https://doi.org/10.1097/PRS.0b013e3181b03928.

Shelton, James D. 2014. “Evidence-Based Public Health: Not Only Whether It Works, but How It Can Be Made to Work Practicably at Scale.” Global Health, Science and Practice 2(3): 253–8.
https://doi.org/10.9745/GHSP-D-14-00066.

Smillie, Wilson G., and Donald L. Augustine. 1926. “Hookworm Infestation: The Effect of Varying Intensities on the Physical Condition of School Children.” American Journal of Diseases of Children 31(2): 151–68.
https://doi.org/10.1001/archpedi.1926.04130020003001.

Stephenson, Lani S., Michael C. Latham, and Eric A. Ottesen. 2000. “Malnutrition and Parasitic Helminth Infections.” Parasitology 121(S1): S23–38.
https://doi.org/10.1017/S0031182000006491.

Stoll, Norman R. 1923. “Investigations on the Control of Hookworm Disease XV: An Effective Method of Counting Hookworm Eggs in Feces.” American Journal of Epidemiology 3(1): 59–70.
https://doi.org/10.1093/oxfordjournals.aje.a118916.

⸻. 1947. “This Wormy World.” The Journal of Parasitology 33(1): 1–18.
https://www.jstor.org/stable/3273613.

Taylor-Robinson, David C., Nicola Maayan, Sarah Donegan, Marty Chaplin et al. 2019. “Public Health Deworming Programmes for Soil-Transmitted Helminths in Children Living in Endemic Areas.” Cochrane Database of Systematic Reviews. September 11, 2019.
https://doi.org/10.1002/14651858.CD000371.pub7.

Taylor-Robinson, David C., Nicola Maayan, Karla Soares-Weiser, Sarah Donegan et al. 2015. “Deworming Drugs for Soil-Transmitted Intestinal Worms in Children: Effects on Nutritional Indicators, Haemoglobin and School Performance.” Cochrane Database of Systematic Reviews. July 23, 2015.
https://doi.org/10.1002/14651858.cd000371.pub6.

The Abdul Latif Jameel Poverty Action Lab. n.d. “About Us.” Accessed October 17, 2022.
https://www.povertyactionlab.org/about-us.

This Wormy World. n.d. “Global Atlas of Helminth Infections.” LSHTM. No longer available.
http://www.thiswormyworld.org/.

Utzinger, Jürg, Robert Bergquist, Remigio Olveda, and Xiao-Nong Zhou. 2010. “Important Helminth Infections in Southeast Asia: Diversity, Potential for Control and Prospects for Elimination.” Advances in Parasitology 72: 1–30.
https://doi.org/10.1016/s0065-308x(10)72001-7.

Valier, Helen, and Carsten Timmermann. 2008. “Clinical Trials and the Reorganization of Medical Research in Post-Second World War Britain.” Medical History 52(4): 493–510.
https://doi.org/10.1017/S0025727300002994.

Vanderslott, Samantha Josephine. 2017. “Neglect in Policy Problems: The Case of ‘Neglected Tropical Diseases.’” Doctoral dissertation, University College London (UCL).

Venturini, Tommaso, and Anders Kristian Munk. 2021. Controversy Mapping: A Field Guide. Cambridge: Polity.

Wahlberg, Ayo, and Linsey McGoey. 2007. “An Elusive Evidence Base: The Construction and Governance of Randomized Controlled Trials.” BioSocieties 2(1): 1–10.
https://doi.org/10.1017/S1745855207005017.

Wahlberg, Ayo, and Nikolas Rose. 2015. “The Governmentalization of Living: Calculating Global Health.” Economy and Society 44(1): 60–90.
https://doi.org/10.1080/03085147.2014.983830.

Waite, J. H., and I. L. Neilson. 1919. “A Study of the Effects of Hookworm Infection upon the Mental Development of North Queensland School Children.” Medical Journal of Australia 1(1): 1–8.
https://doi.org/10.5694/j.1326-5377.1919.tb29570.x.

Wessely, Simon. 2007. “A Defence of the Randomized Controlled Trial in Mental Health.” BioSocieties 2(1): 115–27.
https://doi.org/10.1017/S1745855207005091.

Williamson, John. 2009. “A Short History of the Washington Consensus.” Law and Business Review of the Americas 15(1): 7–23.
https://scholar.smu.edu/lbra/vol15/iss1/3.

World Health Organization. 2005. “Strategic and Technical Meeting on Intensified Control of Neglected Tropical Diseases: A Renewed Effort to Combat Entrenched Communicable Diseases of the Poor.” Report of an International Workshop, Berlin, April 18–20, 2005.
https://iris.who.int/handle/10665/69297.

Zhou, Xiao-Nong, Robert Bergquist, Remigio Olveda, and Jürg Utzinger. 2010. “Important Helminth Infections in Southeast Asia: Diversity and Potential for Control and Prospects for Elimination.” Advances in Parasitology 72: 1–30.
https://doi.org/10.1016/S0065-308X(10)72001-7.

Notes

Controlled clinical trials have a longer history, but the financial, organisational, and institutional conditions became ripe for a convergence of disciplinary approaches when physicians, statisticians, bacteriologists, and radiologists came together as a multidisciplinary team to test streptomycin against bed rest for tuberculosis treatment (ibid.).
This article does not concentrate directly on whether trust in RCTs is warranted. That question is itself wide-ranging: as Ayo Wahlberg and Linsey McGoey (2007) argue, it is not only practical, technical, ethical, and moral concerns that matter but also practices of construction and governance. Notably, Angus Deaton and Nancy Cartwright (2018) have engaged extensively with the merits and limitations of RCTs.
Schwabe, referred to as the ‘father’ of ‘One Health’, combined biological and medical approaches to subjects such as the tapeworm Echinococcus granulosus, which he treated as an animal worthy of study rather than simply as a disease; he was not limited by discipline or by conceptions of species relationships between human and non-human animals.
The Global Burden of Disease Study 2010 uses a measure, the disability-adjusted life year (DALY), that combines years of life lost due to premature mortality with years lived with disability; by this measure, helminth infections accounted for a total of 41.4 million DALYs (for soil-transmitted helminths, schistosomiasis, lymphatic filariasis, and onchocerciasis) (Hotez et al. 2014).
Neoliberal reforms reached their zenith through the Washington Consensus. The term, coined in 1989 by the English economist John Williamson, described ten free-market policies (or ‘prescriptions’) for economic development, intended as a standard reform package for developing countries in crisis and thought to command wide institutional agreement across international organisations.
Different types of soil-transmitted helminths.

Copyright, Citation, Contact

To cite this article: Vanderslott, Samantha. 2024. “‘Worm Wars’: The Unravelling of the Randomised Control Trial Success Story.” Engaging Science, Technology, and Society 10(1–2): 235–261. https://doi.org/10.17351/ests2023.1469.

To contact Samantha Vanderslott by email: samantha.vanderslott@paediatrics.ox.ac.uk.

Engaging Science, Technology, & Society