Nordic Economic Policy Review 2026

Advancing Evidence-Based Policy: Insights from Randomised Experiments in the Nordic Countries

Antti Kauhanen and Roope Uusitalo

1 Introduction to the issue

Randomised controlled trials (RCTs) are the gold standard of the scientific approach. The natural sciences have a long tradition of relying on experiments. In medicine, passing randomised trials conducted according to strict protocols is a prerequisite for launching new drugs. The trials must demonstrate that the positive effects of the new medicine are sufficiently large and that the potential side effects are sufficiently small compared to those of existing drugs.

By contrast, public policies can be implemented with high hopes but little evidence of their likely effects. To a degree, this lies at the heart of our democratic systems. Governments and parliaments make political decisions and rely on the support of voters, not on the support of scientific research. Voters have conflicting interests, and political decisions reflect the political strength of different groups of voters. Politics cannot be reduced to the implementation of scientifically tested treatments.

However, an increasing number of governments rely on, or at least claim to rely on, evidence-based policies. In an ideal world, political decision-makers would have unbiased and reliable information on the likely outcomes of potential policy choices and could base their actions on that information. Such a shift, however, places high demands on the quality of the evidence.

Over the past 30 years, economics has undergone a credibility revolution. Better research designs have improved the quality of the evidence and made the results more convincing. Better designs also enable researchers to analyse causal effects rather than only correlations.

The credibility revolution initially increased the use of “natural experiments”, i.e., situations in which the target population is divided into “treatment” and “control” groups without anyone actively allocating potential participants into these groups. A bit later, RCTs also became more commonplace in economics. The key difference is that, in an RCT, random allocation to the treatment and control groups is monitored and often implemented by the evaluation team. RCTs have been used to evaluate planned reforms in active labour market policy (ALMP), education, health, and several other fields.

In a randomised trial, there are no systematic differences between the treatment and control groups. Any differences between groups after the treatment that exceed what would be expected to occur by chance can, therefore, be interpreted as the effect of the treatment. This makes evaluating effects simple, transparent, and convincing.
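The comparison described above can be sketched in a few lines. The example below is a minimal illustration on simulated data, not an analysis from any of the chapters: the sample size, the outcome distribution, and the true effect of 2.0 are all hypothetical, and `difference_in_means` is a name introduced here for illustration.

```python
import random
from math import sqrt
from statistics import fmean, variance

def difference_in_means(treated, control):
    """Return (estimate, standard error) for the difference in mean outcomes."""
    estimate = fmean(treated) - fmean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return estimate, se

# Hypothetical simulated trial: outcome with mean 50 and SD 10, true effect 2.0.
rng = random.Random(0)
n = 10_000
assigned = [rng.random() < 0.5 for _ in range(n)]  # coin-flip randomisation
outcomes = [rng.gauss(50.0, 10.0) + (2.0 if t else 0.0) for t in assigned]

treated = [y for y, t in zip(outcomes, assigned) if t]
control = [y for y, t in zip(outcomes, assigned) if not t]

estimate, se = difference_in_means(treated, control)
print(f"estimated effect: {estimate:.2f} (standard error {se:.2f})")
```

Because assignment is random, the difference in means is an unbiased estimate of the average effect, and the standard error indicates how large a difference could plausibly arise by chance alone.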

Even if evaluating the effects of experiments is easy, designing them is anything but. This is particularly true in policy experiments implemented in real-life settings outside the laboratory. Experiments involve addressing important issues, such as the design of the experiment, co-operation with organisations implementing the policies, the legal and ethical grounds for treating potential participants unequally, and the use of the results.

Researchers in all of the Nordic countries have actively used an experimental approach to evaluate various policies. The treatments that have been analysed range from “light” ones involving sending encouragement letters to participants to massive multi-million budget experiments, changing the rules of the social security system for a randomly selected group, as in the Finnish basic income experiment, or changing the parameters in the tax system, as in the new Norwegian tax allowance experiment.

Researchers in Nordic countries have one major comparative advantage in conducting policy experiments: access to population-wide high-quality register data. Access to these data solves many issues related to the representativeness of the treatment and control groups. As far as outcomes from the experiments can be measured from administrative register data, the selectivity of the respondents to follow-up surveys generates much smaller risks of bias in the results. In some cases, the experiment can be conducted without even contacting the control group, which often has very low response rates due to minimal incentives to respond to surveys related to experiments in which they did not take part. In addition, data collection is often one of the most expensive parts of an experiment. Using administrative data means that the data-acquisition costs do not depend on the sample size, allowing much larger treatment and control groups and, hence, much more precise estimates.
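The link between sample size and precision can be made explicit with a standard textbook approximation (it is not taken from the chapters). Assuming two equal-sized groups of n individuals each and a common outcome standard deviation σ, the standard error of the difference in mean outcomes is:

```latex
\[
\mathrm{SE}\left(\bar{Y}_T - \bar{Y}_C\right)
  = \sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}}
  = \sigma \sqrt{\frac{2}{n}}
\]
```

Under these assumptions, quadrupling the size of both groups halves the standard error. When outcomes come from registers rather than surveys, such an increase in n adds little to the cost of data collection.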

In this issue of the Nordic Economic Policy Review, researchers who have conducted randomised policy experiments in the Nordic Region share their experiences. Hopefully, these experiences will be useful to those planning future experiments and to those using the experimental results to guide policy choices. In this introduction, we introduce some of the issues covered in later chapters.

2 Issues involved in experiments

2.1 Degree of control in controlled experiments

In an ideal RCT, the participants are randomly allocated to the treatment and control groups. Randomisation ensures that the treated group is, on average, similar to the control group in all aspects. All individuals in the treatment group receive the treatment, and none in the control group do. Finally, treatment only affects the treatment group and has no impact on the control group. In practice, however, this may not be the case.

Randomisation does not guarantee that all individuals allocated to the treatment group will receive the treatment. Some may refuse it, and some may drop out during the experiment. This question is especially relevant when the ‘treatment’ is a service rather than, e.g., a financial subsidy. If individuals cannot be forced to participate, researchers often estimate the impact of offering the treatment, as in the chapter on recruitment subsidies by Einiö and Nivala. As noted in the chapter by Izadi and Sarvimäki, even if participation in an experiment is in principle mandatory, take-up is often incomplete, as in the Finnish preschool experiment. In cases where allocation to treatment is decentralised, researchers have an additional problem and must make sure that all units follow the same protocol in assigning individuals to treatment and control groups. This might be challenging, for example, in experiments on employment services, where multiple agencies are responsible for implementing the experiment. This issue is discussed in Vikström’s chapter on ALMP experiments.

It may also be the case that individuals allocated to the control group receive treatment from another provider or receive something similar to the treatment. For example, in the case of ALMP, individuals allocated to the control group in an experiment on enhanced job search assistance may buy similar services from private providers, or in decentralised experiments, some offices may not follow the randomisation protocol, and some individuals in the control group may be treated. These implementation problems complicate interpretation of the results.
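One common way to handle incomplete take-up and treated controls is to report the effect of being offered the treatment (the intention-to-treat effect) and, under additional assumptions, to rescale it by the difference in take-up between the groups (the Wald estimator). The sketch below is illustrative only and is not the method of any particular chapter: the function name and the toy data are hypothetical, and the rescaling is valid only if the offer affects outcomes solely through take-up and never discourages anyone from taking the treatment.

```python
from statistics import fmean

def itt_and_wald(outcome, assigned, received):
    """Return (ITT effect, take-up gap, Wald estimate) from parallel lists."""
    y_offered = [y for y, z in zip(outcome, assigned) if z]
    y_control = [y for y, z in zip(outcome, assigned) if not z]
    d_offered = [d for d, z in zip(received, assigned) if z]
    d_control = [d for d, z in zip(received, assigned) if not z]

    itt = fmean(y_offered) - fmean(y_control)          # effect of the offer
    take_up_gap = fmean(d_offered) - fmean(d_control)  # extra share treated
    if take_up_gap == 0:
        raise ValueError("assignment did not change take-up")
    return itt, take_up_gap, itt / take_up_gap

# Hypothetical toy data: 10 people offered the treatment, 10 controls.
# Six of the offered take it up; one control obtains it elsewhere.
assigned = [1] * 10 + [0] * 10
received = [1] * 6 + [0] * 4 + [1] * 1 + [0] * 9
# By construction, receiving the treatment raises the outcome from 10 to 14.
outcome = [10.0 + 4.0 * d for d in received]

itt, take_up_gap, wald = itt_and_wald(outcome, assigned, received)
print(f"ITT: {itt:.2f}, take-up gap: {take_up_gap:.2f}, Wald: {wald:.2f}")
```

In this constructed example, the offer raises the mean outcome by about 2 while the effect of actually receiving the treatment is 4; dividing by the take-up gap of 0.5 recovers the latter. Spillovers onto the control group would violate the assumptions behind this rescaling.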

In many cases, the treatment may also affect the control group. This is a serious problem. The role of the control group is to estimate what would have happened to the treatment group if it had not been treated. If the treatment also affects the control group, it cannot serve this purpose. Vikström’s ALMP chapter discusses an important case in which these spillover effects may be significant: enhanced job search services. If some unemployed job seekers are offered an enhanced service, it may improve their chances of finding a new job, but it may also lower the chances of other job seekers.

Spillover effects are also discussed in the chapter on the Norwegian tax experiment. During the design phase, the researchers discussed whether to conduct the experiment at the individual or regional level. A regional analysis would have captured potential spillover effects, which, in this case, might stem from increased labour demand driven by higher purchasing power.

However, incomplete take-up, spillover effects, or alternative treatments for the control group do not invalidate the experimental approach, although they make the analysis more complicated. The chapters in this volume demonstrate how the researchers approach these issues and manage to uncover treatment effects even in imperfect experiments.

2.2 Following participants over time

To evaluate experiments, researchers need to have outcome measures for both the treatment and control groups. Pre-treatment information is also often useful for validating the approach and obtaining more precise results. Data acquisition for experiments is costly, especially if the data are gathered by conducting surveys of potential participants. Ensuring high response rates can be expensive, and even when substantial resources are devoted to this, response rates may remain low, reducing the experiment’s statistical power and potentially biasing the results if non-responses are not random.

One advantage Nordic countries often have is access to the data needed for evaluation from statistical population registers. When this is the case, non-response problems are minimised, and the costs of gathering the relevant data are small compared to those of sending out surveys. Population registers also allow tracking individuals over long periods, which is rarely feasible with surveys.

Register data becomes particularly valuable when there is a long time lag before a response to the treatment or when the effects may fade over time. For example, in the Norwegian tax experiment discussed by Markussen and Bjørneby, the experiment lasts five years, and people may adjust their behaviour slowly. In the Finnish preschool experiment discussed by Izadi and Sarvimäki, the researchers plan to follow participants throughout their school careers and develop evaluation tools for use in their follow-up work.

2.3 Legal and ethical aspects

Experiments necessarily involve unequal treatment of potential participants. Allocation to the treatment group may provide direct financial benefits to participants or additional resources, e.g., in education or job search. Or the experiment may deny the control group access to these resources. Programmes that require compulsory participation may also require participants to act in ways they would not do voluntarily.

Unequal treatment is necessary to create variation in the treatment. However, it must be justified. Experimental research uses several approaches to address this problem.

The issues are probably least severe in experiments where participation is voluntary, and participants provide informed consent to the treatment. Most experimental medical studies rely on this principle. In the social sciences, prime examples are experiments that provide information and encourage participation in the treatment, particularly when resources are limited, and not all potential participants can receive treatment. Even in these cases, research designs typically need to be pre-approved by ethics committees, but ethical considerations rarely prevent such experiments.

Ethical issues become more important when the treatment group receives benefits from the public sector or faces mandates from public bodies. Even in these cases, unequal treatment may not be the most fundamental issue. Even in the absence of experiments, many public policies involve age limits, apply only in specific regions, or otherwise treat people differently. In experiments, the question is whether scientific research provides sufficient justification for unequal treatment.

As shown by the authors in this edition of NEPR, unequal treatment of control and treatment groups is possible as long as the goals of the experiment are sufficiently important and as long as the experiment does not cause disproportionate harm to the participants. In some cases, it is also necessary that the experiment is governed by law. Following these principles, specific legislation was introduced in Finland compelling randomly selected five-year-olds to participate in preschool (Izadi & Sarvimäki), and legislation granting tax exemptions to a randomly selected cohort of young people was introduced in Norway (Markussen & Bjørneby).

Legal mandates can also be useful in ways other than enabling the experiment. For example, the Finnish employee payroll tax experiment granted the researchers full access to data (Einiö & Nivala), and the Finnish preschool experiment made it compulsory for daycare centres and schools to test children’s skills (Izadi & Sarvimäki).

2.4 What would we know without experiments?

Given the cost of conducting experiments, it is natural to ask whether it would be possible to use non-experimental methods to conduct credible evaluations of policies. This discussion began in economics in 1986, when Robert LaLonde published his landmark paper, showing that the non-experimental methods available at the time could not reproduce the estimates obtained from large-scale experiments (LaLonde, 1986). Further studies have shown that better data (for example, similar measurements of the treatment and control groups) and more advanced methods help substantially reduce bias in non-experimental evaluations (e.g., Heckman et al., 1998; Smith and Todd, 2005). A very recent paper shows that the most recent non-experimental methods can replicate experimental results. However, to enable causal interpretation, researchers should always assess the key assumptions (Imbens and Xu, 2025). Therefore, the answer to the question of whether non-experimental methods can be used to conduct credible evaluations of policies is that it depends on the quality of the data and on whether the treatment is more or less randomly assigned.

The chapters in this edition touch upon this question several times. The chapter on experiments in education by Nyhus Larsen, Noer Poulsen, Rosholm and Bønneland Tølbøll includes both experimental and non-experimental evaluations. It shows that, on average, the results are similar. This supports the idea that, at least in this setting, experimental and non-experimental approaches can be used to evaluate policies. On the other hand, Vikström’s ALMP chapter discusses an example in which experimental and non-experimental evaluations led to opposite conclusions about the effectiveness of a policy. In that case, decisions based on non-experimental evaluation would have led to the wrong policy being implemented. The chapter on the Norwegian tax experiment by Markussen and Bjørneby explains that the experiment was conducted precisely because the policy could not be evaluated by any other method than an experiment. Finally, the chapter on the recruitment subsidy by Einiö and Nivala shows that the experiment’s results were more precise than those of a non-experimental evaluation of an earlier recruitment subsidy. This case highlights another difference between experimental and non-experimental evaluations. When the experiment was designed, the way the subsidy was implemented was changed to improve take-up. Thus, the evaluation led to a rethink of how to implement the subsidy to maximise its impact.

2.5 What can the experimental approach be used for?

As noted above, in many cases, carefully conducted non-experimental research can yield results similar to those of experimental research. However, this always depends on the treatment’s selectivity and the opportunities to address selectivity issues through research design. Sometimes this is not possible, and in all cases, the credibility of non-experimental research depends on the credibility of its assumptions.

Randomised trials are an ideal way to solve selectivity issues. However, they also present problems. One important factor is publication bias. If journal editors or the researchers themselves prefer statistically significant results, it is possible that only large and significant findings will be published. This could easily lead to incorrect conclusions based on the cumulative empirical evidence.

Fortunately, there is a simple solution to this problem. Requiring that experiments be pre-registered before the results are known increases the likelihood that insignificant findings are also reported and reduces the incentive to run many experiments and report only those with the desired results.
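The publication-bias mechanism described above can be illustrated with a small simulation. This is a stylised sketch under hypothetical assumptions (a true effect of 0.1 standard deviations, 2,000 small trials with 50 participants per arm, and a rule that only results significant at the 5% level are published); `run_study` is a name introduced here for illustration.

```python
import random
from math import sqrt
from statistics import fmean, variance

def run_study(rng, true_effect, n_per_arm):
    """Simulate one small RCT; return (estimate, is_significant)."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    estimate = fmean(treated) - fmean(control)
    se = sqrt(variance(treated) / n_per_arm + variance(control) / n_per_arm)
    return estimate, abs(estimate / se) > 1.96  # two-sided test at the 5% level

rng = random.Random(1)
true_effect = 0.1  # hypothetical small true effect, in SD units
studies = [run_study(rng, true_effect, n_per_arm=50) for _ in range(2000)]

# Average over every study versus only the "publishable" significant ones.
mean_all = fmean([est for est, _ in studies])
mean_published = fmean([est for est, significant in studies if significant])

print(f"true effect:            {true_effect:.2f}")
print(f"mean of all studies:    {mean_all:.2f}")
print(f"mean of published only: {mean_published:.2f}")
```

Averaging over all simulated studies recovers the true effect closely, whereas averaging only the significant ones overstates it several times over. Pre-registration addresses this by making the full set of studies visible.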

Another issue is that the experiments are time-consuming. Designing, conducting, and monitoring the outcomes of an experiment takes years, and this often exceeds the patience of policymakers elected for a fixed term. In some cases, as discussed by Izadi and Sarvimäki in this edition, it may be possible to convince anxious policymakers to implement a pilot or, rather, a controlled experiment, instead of rolling out a full-scale reform. Unfortunately, the results of experiments do not always match the expectations policy designers had for their favourite programmes. Whether this is a positive result, saving money that would have been used for the policy, or a negative one sinking a great idea that would have been implemented without a disappointing experiment, probably depends on the observer.

3 Description of the chapters

The chapter by Ramin Izadi and Matti Sarvimäki details the evaluation of the Finnish two-year preschool experiment. It involved approximately 35,000 children at more than 1,000 centres in 148 municipalities, facilitated by temporary legislation that made participation compulsory, specified randomisation, and authorised extensive data collection.

The treatment brought preschool forward by a year (from age six to age five) and significantly altered learning environments by introducing more qualified teachers, more homogeneous age groups, and slightly more guided learning time, while leaving overall resources and schedules largely unchanged. A central contribution of the experiment is the data infrastructure: large-scale direct assessments of academic (numeracy, literacy) and teacher-rated socioemotional skills (task performance, social skills, emotional regulation, peer relationships) at ages 5–7 for over 30,000 children, expanding to nearly 60,000 at school entry, linked to rich administrative records covering children, families, teachers, peers, schools, and centres, with long-term follow-up until the early 2030s.

The results show that extending preschool produced sizable short-term gains in academic and socioemotional skills for the children who were shifted from home care to centre-based preschool. However, for the large majority of children, the alternative was standard daycare. For this group, extending preschool had no effect on skills measured at school entry. As a result, average effects at the population level were essentially zero.

The chapter by Simen Markussen and Marie Bjørneby presents a significant real-world Norwegian tax experiment aimed at generating evidence for policymaking. This experiment involves randomly granting an earned income tax allowance to approximately 100,000 young taxpayers – about 8% of the 20–35 age group – through their 2025 tax cards. The experiment started in January 2026, so no results are available yet. The chapter details the design choices, anticipated responses, ethical considerations, and implementation of the experiment on a national scale. Power analyses indicate that, with 100,000 recipients, the effects on employment and earnings can be estimated with precision. The chapter also explores ethical trade-offs, such as minor indirect costs to non-recipients. It discusses consent, noting that the ministry deemed an opt-out option infeasible before parliamentary approval, given the absence of negative risks to anyone. In addition, it addresses legal proportionality and reasonable justification for unequal treatment after dismissing non-randomised alternatives as insufficiently informative.
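To give a sense of what such a power analysis involves, the sketch below computes a minimum detectable effect for a binary outcome such as employment, using the standard normal approximation. It is not the chapter's own calculation. The control-group size is a rough figure implied by the statement that 100,000 recipients make up about 8% of the age group, and the baseline employment rate of 75% is purely hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(p, n_treat, n_control, alpha=0.05, power=0.80):
    """Smallest true difference in proportions detectable with the given power."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    se = sqrt(p * (1 - p) * (1 / n_treat + 1 / n_control))
    return z * se

n_treat = 100_000                       # recipients, as stated in the text
n_control = round(n_treat / 0.08) - n_treat  # rest of the age group (rough)
baseline_employment = 0.75              # hypothetical assumption

mde = minimum_detectable_effect(baseline_employment, n_treat, n_control)
print(f"minimum detectable effect: {100 * mde:.2f} percentage points")
```

Under these assumptions, the design could detect an employment effect of roughly 0.4 percentage points with 80% power, which illustrates why samples of this size permit precise estimates.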

The chapter by Johan Vikström surveys how RCTs can credibly evaluate active labour market policies, explaining why non-experimental comparisons are vulnerable to selection and stressing RCTs as the gold standard for identifying “what works, for whom, and under what conditions.” The discussion is structured around three examples: a Danish study of private providers, a Swedish “early meetings” intervention, and a Norwegian light-touch habits programme. These trials are used to offer practical advice on ethics, randomisation designs, pre-analysis plans, implementation challenges, and maintaining research independence. A key message is that random assignment can be as ethical as any alternative when resources are scarce, provided transparency and safeguards are in place. The chapter then turns to measurement and analysis, highlighting the strengths of administrative data (e.g., reporting incentives and shifting responsibilities), the importance of studying take-up, the value of boosting compliance, and the use of objective outcome and cost data. Finally, the chapter urges designers to take general equilibrium responses seriously and discusses the typical sources of equilibrium effects and how they can be studied using two-level randomisation.
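Two-level randomisation can be sketched as follows: regions are first randomly assigned a treatment intensity, and individuals are then randomised within each region at that intensity. This is a generic illustration of the design, not the procedure of any trial discussed in the chapter; the function name, the intensities of 20% and 80%, and the toy regions are hypothetical.

```python
import random

def two_level_assignment(people_by_region, shares=(0.2, 0.8), seed=0):
    """Randomise regions to a treatment share, then people within regions.

    Returns {person: (region, region_share, is_treated)}.
    """
    rng = random.Random(seed)
    regions = sorted(people_by_region)
    rng.shuffle(regions)                      # level 1: randomise regions
    half = len(regions) // 2
    assignment = {}
    for i, region in enumerate(regions):
        share = shares[0] if i < half else shares[1]
        people = list(people_by_region[region])
        n_treated = round(share * len(people))
        treated = set(rng.sample(people, n_treated))  # level 2: individuals
        for person in people:
            assignment[person] = (region, share, person in treated)
    return assignment

# Hypothetical toy population: four regions with ten job seekers each.
people_by_region = {
    f"region_{r}": [f"r{r}_p{i}" for i in range(10)] for r in range(4)
}
assignment = two_level_assignment(people_by_region)
n_treated_total = sum(is_treated for _, _, is_treated in assignment.values())
print(f"{n_treated_total} of {len(assignment)} individuals treated")
```

Comparing untreated individuals in high-intensity regions with untreated individuals in low-intensity regions then gives an indication of spillovers, since neither group receives the treatment directly.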

The chapter by Elias Einiö and Annika Nivala examines Finland’s Recruitment Subsidy Experiment, shedding light on its implications for other large-scale randomised national experiments that involve population registers and costly, imperfectly adopted treatments.

This chapter underscores the primary design challenge of balancing fixed budgets with uncertain take-up, detailing the trade-offs between under-subscription, which mainly results in a loss of statistical power, and over-subscription, which can introduce bias. It proposes a staged strategy to estimate take-up with minimal risk of over-subscription and then scale offers by explicitly managing the tolerated probability of over-subscription using the sampling distribution of the estimated take-up rate. The chapter also discusses how population register data can eliminate survey costs, enhance planning and implementation, and enable precise estimation of the effects of offering treatment on a national scale.
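The logic of scaling offers under an uncertain take-up rate can be sketched as below. This is a simplified illustration and not the chapter's actual procedure: it uses a normal approximation to the sampling distribution of the pilot take-up rate, ignores the additional randomness of take-up in the main stage, and all names and numbers (`max_offers`, a pilot of 500 offers, 1,000 funded places) are hypothetical.

```python
from math import floor, sqrt
from statistics import NormalDist

def max_offers(pilot_offers, pilot_takers, slots, tolerated_prob=0.05):
    """Number of offers to send so that expected take-up, evaluated at an
    upper confidence bound for the take-up rate, stays within `slots`."""
    if pilot_takers == 0:
        raise ValueError("pilot gives no information on take-up")
    p_hat = pilot_takers / pilot_offers
    se = sqrt(p_hat * (1 - p_hat) / pilot_offers)
    # One-sided upper bound: the true rate exceeds it with probability
    # of roughly `tolerated_prob` under the normal approximation.
    p_upper = min(1.0, p_hat + NormalDist().inv_cdf(1 - tolerated_prob) * se)
    return floor(slots / p_upper)

# Hypothetical pilot: 500 offers, 100 accepted; budget covers 1,000 more places.
cautious = max_offers(pilot_offers=500, pilot_takers=100, slots=1000)
naive = floor(1000 / (100 / 500))  # ignores uncertainty in the estimate
print(f"offers with safety margin: {cautious}, naive: {naive}")
```

The safety margin shrinks as the pilot grows, because the take-up rate is then estimated more precisely; a lower tolerated probability of over-subscription widens it.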

The results indicate that the recruitment subsidy had a significant effect on hiring, although its total employment effect was rather modest, at 207 jobs. Compared to the previous non-experimental evaluation, the experiment achieved a much higher take-up rate and more precise estimates and hence potentially different conclusions on the effectiveness of the policy.

The chapter by Stine Nyhus Larsen, Nikolaj Noer Poulsen, Michael Rosholm, and Katrine Bønneland Tølbøll synthesises a decade of experimental and quasi-experimental intervention research by TrygFonden’s Centre for Child Research in Denmark to estimate learning outcomes and explain the variation in effects across programmes and contexts. The authors report modest yet positive average improvements in outcomes, with larger impacts observed for preschool-age targets, daycare settings, educator-focused delivery, and an emphasis on process quality. The average effects do not differ between the RCTs and quasi-experiments.

This study updates and extends Rosholm et al. (2021), incorporating more RCTs and all available non-RCT causal studies from the same research centre, and replicates the pattern that earlier investments yield larger returns, consistent with the so-called Heckman curve. While these design- and context-sensitive insights are valuable for policy, the authors are transparent about the following limitations: short-run outcomes, potential cost measurement errors, clustering and spillovers, and generalisability, primarily to the Scandinavian welfare context. In addition, the focus on one centre – chosen to ensure methodological and contextual homogeneity and rich cost information – necessarily excludes other Nordic interventions, potentially leading to a skewed picture of regional effectiveness.

Taken together, the chapters in this volume illustrate both the promise and the complexity of using randomised experiments to inform public policy in the Nordic countries. They show that high-quality administrative data, careful design, and close collaboration between researchers and public institutions can generate exceptionally credible evidence, while also highlighting the ethical, legal, and practical challenges inherent in experimenting with real-world policies. As governments increasingly look for solid evidence to guide decisions, the lessons from these studies provide practical help for designing better trials, understanding their results, and using experimentation more routinely in policymaking. We hope this collection not only contributes to current discussions but also encourages Nordic policymakers and researchers to keep promoting a rigorous, transparent, and learning-oriented approach to public policy.

References

Heckman, J. J., Ichimura, H., Smith, J. A., & Todd, P. E. (1998). Characterizing selection bias using experimental data. Econometrica, 66(5), 1017–1098.

Imbens, G. W., & Xu, Y. (2025). Comparing experimental and non-experimental methods: What lessons have we learned four decades after LaLonde (1986)? Journal of Economic Perspectives, 39(4), 173–202.

LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76(4), 604–620.

Rosholm, M., Paul, A., Bleses, D., Højen, A., Dale, P. S., Jensen, P., Justice, L. M., Svarer, M., & Andersen, S. C. (2021). Are impacts of early interventions in the Scandinavian welfare state consistent with a Heckman curve? A meta-analysis. Journal of Economic Surveys, 35(1), 106–140.

Smith, J. A., & Todd, P. E. (2005). Does matching overcome LaLonde’s critique of non-experimental estimators? Journal of Econometrics, 125(1–2), 305–353.