Nordic Economic Policy Review 2026

Comments on Johan Vikström: Evaluating Active Labour Market Policies Using Randomised Control Trials


Jukka Mattila
Johan Vikström has most definitely found an interesting subject – discussing the evaluation of labour market policy using randomised controlled trials (RCTs). The article explains why RCTs are useful for evaluating labour market policy, but its most interesting and arguably novel points concern the practical issues a researcher (or policy maker) faces in motivating, conducting, and implementing RCTs. Few, if any, economists contest the broad usefulness of RCTs, so it is perhaps not worth dwelling on this point for too long. The practical questions concerning experiments are more interesting, I think, also from a design point of view. Resource constraints and other practical limits mean that large-scale controlled experiments will probably always be rare, but public officials, e.g., tax officers or employment services directors, often have sufficient operational leeway to conduct small-scale field experiments. As demonstrated by the highly fascinating Bjørvatn et al. (2021), an example cited by Vikström, such experiments might also yield important insights that could lead to significant improvements in the service concerned. As multiple examples show, a culture in which public agencies, together with researchers, use RCTs to assess the impact of various programmes and actions would be a useful way of enhancing the effectiveness of the public sector. Why is it so difficult, then?
As Vikström notes, challenges in designing and implementing RCTs in practice are rarely documented in research outputs, even though there are many, ranging from ethical questions to design and effective implementation. Anyone who has attempted to facilitate or even manage an RCT is familiar with at least some of these issues. Experience in Finland has demonstrated that large-scale experiments are prone to various types of problems that can be difficult to foresee and sometimes difficult to solve. Some of the issues faced by the Finnish basic income experiment are documented in Hämäläinen & Verho (2022) and range from the reluctance of the tax authority to take part in the trial to other policy changes coinciding with the experiment. The Finnish two-year preschool experiment has been criticised for spillover effects between the control and treatment groups due to a shortage of trained early education teachers and the municipalities moving qualified teachers from the control group to the treatment group. Large-scale experiments are expensive investments, with the budgets for both of the Finnish experiments just mentioned in the €20 million ballpark. It would be worth investing in avoiding mishaps in such cases. Some of the problems encountered might at least have been alleviated if shared experiences and good practices established by fellow researchers were more readily available. However, other problems persist.
Vikström lists many key subjects that deserve attention in the practical questions related to RCTs, and he provides a brief analysis of this broad topic. One of the more interesting points he makes is about institutions. In practice, it often seems that what underlies an ethical argument against trials, for example, is not really ethics but incentives – only wearing a disguise. Politicians might not have any motivation to explore whether ideological aims result in what they hoped (or claimed) they would – failure to achieve their goals might not be good for them. Civil servants dislike changing their routines and are reluctant to adopt new approaches. Vikström recounts the positive story of Sweden’s public employment services dedicating in-house researchers to support RCTs, but the Finnish experience has been mixed. In unfortunate cases, in-house researchers in public agencies may vehemently oppose such experiments. They, too, might be uninterested in finding out whether the service they may have spent decades developing would stand up to the scrutiny of a rigorous controlled trial. Or, as Vikström also points out, they might simply be committed to the idea that what they are doing is so good that a trial is not warranted. They might lack the capacity to conduct such trials and be concerned for their status. Ultimately, they might have more to lose than to gain: the proposed services might turn out to have no significant effect or, in the worst-case scenario, even to be harmful. In such cases, ignorance can be bliss. Institutions can then, instead of advocating for RCTs, curl up in a hedgehog-like position and simply oppose them by any means at their disposal. The first objection usually concerns ethics.
The core issue is that incentives matter. In its quest for identification, contemporary microeconomics appreciates RCTs for their value as evidence, so researchers have a natural incentive to be interested in them. They produce good publications. Other key stakeholders, politicians and civil servants, might feel less incentivised. In the worst cases, they may even have incentives not to participate in or support such a policy; after all, it would require extra effort on top of their Monday morning chores. The public sector is not incentivised by market competition in the same way as the private sector. Some big tech companies have a reputation not only for recruiting microeconomists but also for running a lot of A/B tests, a close cousin of RCTs, when developing services. Unlike a tech company that faces market competition, the public employment service agency will probably still be there tomorrow, regardless of whether its work has any impact, and legislation decrees that it will have customers for its services. It would seem to me that the key question of incentives is the real problem that undermines our capability to develop a culture of RCTs, not only in labour market policy but in the public sector in general – a culture that would probably serve us well by making public services more efficient and enhancing the wellbeing of the population.
It is also worth noting, as Vikström points out at the end, that even RCTs can have built-in problems and might not be the silver bullet people sometimes consider them to be. For example, controlled experiments (as opposed to natural ones) cannot always be blinded. As such, the participants – preschool teachers and the staff of public-sector employment agencies – may know that they are taking part in an experiment. Whether this affects their behaviour seems to be an open question, but placebo effects are well documented in other disciplines, such as medicine. If the simple act of participating in an interesting, possibly societally significant experiment that receives media attention is a motivating factor, it may raise concerns about external validity. To a policy maker or a civil servant, of course, the internal validity of the experiment is somewhat less interesting if the external validity is compromised, because a lot of the value of an experiment lies in testing something with a view to a full-scale rollout. A practical application of a policy can also probably never be a continuous randomised controlled experiment with a control group. If key incentives related to the experiment rely on, for example, a procurement contract that requires a control group (Pesola et al., 2025), it might be difficult to move from a trial to a full-scale service because the lack of a control group could ruin the incentive structure. This is not a criticism of Vikström. On the contrary, these points perhaps only underline the importance of the discussion Vikström initiates about developing better RCTs in practical environments. Carefully selecting the appropriate way to generate information is an important part of the researcher’s understanding.
Vikström’s article is an interesting and important starting point for a potentially fruitful discussion about RCTs. Society at large would benefit from an administrative culture that, as a matter of routine, seeks to improve the services it provides based on evidence from RCTs. Economics might also help us find ways to provide institutions with incentives to work together and to routinely design controlled trials alongside researchers as part of their normal work of improving services. The irony, however, is probably that a paper about institutions would have less value in the academic discipline right now than an experiment would, even though finding ways to establish institutions that foster these kinds of trials would be crucial to conducting more of them. On this point, it is easy to agree with Vikström’s concluding remarks that improving institutions and co-operation between researchers and policy institutions (and the civil servants involved) would be important going forward, not just for labour market policy, of course, but for a great many policy spheres related to the provision of public services.

References

Bjørvatn, K., Ekström, M., & Garcia Pires, A. J. (2021). Setting goals for keystone habits improves labor market prospects and life satisfaction for unemployed youth: Experimental evidence from Norway. Journal of Economic Behavior & Organization, 188, 1109–1123.
Hämäläinen, K., & Verho, J. (2022). Design and evaluation of the Finnish basic income experiment. National Tax Journal, 75(3), 563–592.
Pesola, H., Sarvimäki, M., & Virkola, T. (2025). Randomization as an incentive device: Evidence from public procurement of immigrant integration services. SSRN Working Paper.