2 Issues involved in experiments
2.1 Degree of control in controlled experiments
In an ideal RCT, participants are randomly allocated to the treatment and control groups. Randomisation ensures that the treatment group is, on average, similar to the control group in all respects, both observed and unobserved. All individuals in the treatment group receive the treatment, and none in the control group do. Finally, the treatment affects only the treatment group and has no impact on the control group. In practice, however, these conditions may not hold.
Randomisation does not guarantee that all individuals allocated to the treatment group will receive the treatment. Some may refuse it, and some may drop out during the experiment. This issue is especially relevant when the ‘treatment’ is a service rather than, e.g., a financial subsidy. If individuals cannot be forced to participate, researchers often estimate the impact of offering the treatment (the intention-to-treat effect), as in the chapter on recruitment subsidies by Einiö and Nivala. As noted in the chapter by Izadi and Sarvimäki, even when participation in an experiment is in principle mandatory, take-up is often incomplete, as in the Finnish preschool experiment. When allocation to treatment is decentralised, researchers face an additional problem: they must ensure that all units follow the same protocol in assigning individuals to the treatment and control groups. This can be challenging, for example, in experiments on employment services, where multiple agencies are responsible for implementing the experiment. This issue is discussed in Vikström’s chapter on ALMP experiments.
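To illustrate how incomplete take-up is commonly handled, the sketch below simulates a hypothetical experiment in which only some of those offered a service accept it and no one in the control group receives it. The difference in mean outcomes by assignment is the intention-to-treat (ITT) effect; dividing it by the difference in take-up rates (the Wald, or instrumental-variables, estimator) recovers the average effect on those who actually took up the treatment. The sample size, take-up process, effect size and the unobserved "motivation" trait are all assumptions chosen for illustration, not figures from the chapters in this edition.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000          # hypothetical sample size
true_effect = 2.0    # hypothetical effect on those who take up the treatment

# Random assignment: 1 = offered the treatment, 0 = control.
assigned = rng.integers(0, 2, size=n)
# Hypothetical unobserved trait that raises both take-up and the outcome.
motivation = rng.normal(size=n)
# One-sided non-compliance (assumption): only those offered can take up,
# and more motivated individuals are more likely to accept.
takes_up = (assigned == 1) & (motivation + rng.normal(size=n) > 0)
outcome = 10 + 1.5 * motivation + true_effect * takes_up + rng.normal(size=n)

# Intention-to-treat: compare groups by assignment, ignoring actual take-up.
itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()
# First stage: difference in take-up rates between the two groups.
take_up_diff = takes_up[assigned == 1].mean() - takes_up[assigned == 0].mean()
# Wald estimator: rescale the ITT effect by the take-up difference.
wald = itt / take_up_diff
# Naive participant vs non-participant comparison, biased by self-selection.
naive = outcome[takes_up].mean() - outcome[~takes_up].mean()

print(f"take-up difference: {take_up_diff:.2f}")
print(f"ITT: {itt:.2f}  Wald: {wald:.2f}  naive: {naive:.2f}")
```

Under these assumptions about half of those offered take up the service, so the ITT estimate is roughly half the effect on participants, while the Wald estimate is close to the assumed effect. The naive comparison of participants with everyone else overstates it, because the more motivated select into treatment.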
It may also be the case that individuals allocated to the control group receive the treatment from another provider or receive something similar to it. For example, in an ALMP experiment on enhanced job search assistance, individuals in the control group may buy similar services from private providers. In decentralised experiments, some offices may not follow the randomisation protocol, so that some individuals in the control group are treated. Such implementation problems complicate the interpretation of the results.
In many cases, the treatment may also affect the control group. This is a serious problem. The role of the control group is to estimate what would have happened to the treatment group if it had not been treated. If the treatment also affects the control group, it cannot serve this purpose. Vikström’s ALMP chapter discusses an important case in which these spillover effects may be significant: enhanced job search services. If some unemployed job seekers are offered an enhanced service, it may improve their chances of finding a new job, but it may also lower the chances of other job seekers, including those in the control group, who compete for the same vacancies. A comparison of the two groups would then overstate the effect of the service.
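The displacement mechanism described above can be made concrete with a deliberately stylised model, which is an assumption for illustration rather than the model used in Vikström's chapter: a fixed number of vacancies is shared among job seekers in proportion to their search intensity, with control group members searching at intensity 1 and treated job seekers at a higher, hypothetical intensity. The function name and parameter values below are likewise hypothetical.

```python
def job_finding_rates(vacancy_rate: float, intensity_treated: float,
                      share_treated: float) -> tuple[float, float]:
    """Stylised displacement model (an assumption for illustration).

    A fixed number of vacancies (vacancy_rate = vacancies / job seekers) is
    allocated in proportion to search intensity: 1 for controls and
    intensity_treated for the treated. Returns (treated rate, control rate).
    """
    # Average search intensity per job seeker once the programme is running.
    mean_intensity = 1 + share_treated * (intensity_treated - 1)
    control = vacancy_rate / mean_intensity
    treated = vacancy_rate * intensity_treated / mean_intensity
    return treated, control


baseline = 0.30   # hypothetical job-finding rate with no programme
share = 0.5       # hypothetical share of job seekers in the treatment group
treated, control = job_finding_rates(baseline, intensity_treated=1.5,
                                     share_treated=share)

experimental_estimate = treated - control   # what a treatment-control comparison measures
effect_on_treated = treated - baseline      # gain relative to a world without the programme
effect_on_controls = control - baseline     # negative: the spillover onto the control group
aggregate_change = share * treated + (1 - share) * control - baseline  # jobs are only reshuffled
```

With these assumed numbers, the treatment-control gap is 12 percentage points, twice the 6-point gain the treated obtain relative to no programme, because the controls lose 6 points; total employment does not change at all. The model is extreme in holding the number of vacancies fixed, but it shows why an individual-level comparison can overstate the policy-relevant effect when job seekers compete for the same jobs.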
Spillover effects are also discussed in the chapter on the Norwegian tax experiment. During the design phase, the researchers discussed whether to conduct the experiment at the individual or regional level. A regional analysis would have captured potential spillover effects, which, in this case, might stem from increased labour demand driven by higher purchasing power.
However, incomplete take-up, spillover effects, or alternative treatments for the control group do not invalidate the experimental approach, although they make the analysis more complicated. The chapters in this volume demonstrate how the researchers approach these issues and manage to uncover treatment effects even in imperfect experiments.
2.2 Following participants over time
To evaluate experiments, researchers need outcome measures for both the treatment and control groups. Pre-treatment information is also often useful for validating the approach and obtaining more precise results. Data acquisition for experiments is costly, especially if the data are gathered through surveys of participants. Ensuring high response rates can be expensive, and even when substantial resources are devoted to this, response rates may remain low. This reduces the experiment’s statistical power and may bias the results if non-response is not random.
One advantage Nordic countries often have is access to the data needed for evaluation from statistical population registers. When this is the case, non-response problems are minimised, and the costs of gathering the relevant data are small compared to those of sending out surveys. Population registers also allow tracking individuals over long periods, which is rarely feasible with surveys.
Register data are particularly valuable when there is a long time lag before a response to the treatment or when the effects may fade over time. For example, in the Norwegian tax experiment discussed by Markussen and Bjørneby, the experiment lasts five years, and people may adjust their behaviour slowly. In the Finnish preschool experiment discussed by Izadi and Sarvimäki, the researchers plan to follow participants throughout their school careers and develop evaluation tools for use in their follow-up work.
2.3 Legal and ethical aspects
Experiments necessarily involve unequal treatment of potential participants. Allocation to the treatment group may provide direct financial benefits to participants or additional resources, e.g., in education or job search. Alternatively, the experiment may deny the control group access to these resources. Programmes that require compulsory participation may also require participants to act in ways they would not choose voluntarily.
Unequal treatment is necessary to create variation in treatment, but it must be justified. Experimental research uses several approaches to address this problem.
The issues are probably least severe in experiments where participation is voluntary and participants give informed consent to the treatment. Most experimental medical studies rely on this principle. In the social sciences, prime examples are experiments that provide information and encourage participation in the treatment, particularly when resources are limited and not all potential participants can be treated. Even in these cases, research designs typically need to be pre-approved by ethics committees, but ethical considerations rarely prevent such experiments.
Ethical issues become more important when the treatment group receives benefits from the public sector or faces mandates from public bodies. Even then, unequal treatment as such may not be the most fundamental issue: many public policies already involve age limits, apply only in specific regions, or otherwise treat people differently. In experiments, the question is whether scientific research provides sufficient justification for unequal treatment.
As the authors in this edition of NEPR show, unequal treatment of the treatment and control groups can be justified as long as the goals of the experiment are sufficiently important and the experiment does not cause disproportionate harm to the participants. In some cases, the experiment must also be based on legislation. Following these principles, Finland introduced specific legislation compelling randomly selected five-year-olds to participate in preschool (Izadi & Sarvimäki), and Norway introduced legislation granting tax exemptions to a randomly selected cohort of young people (Markussen & Bjørneby).
Legal mandates can also be useful beyond enabling the experiment itself. For example, the legislation behind the Finnish employee payroll tax experiment granted the researchers full access to data (Einiö & Nivala), and the legislation behind the Finnish preschool experiment made it compulsory for daycare centres and schools to test children’s skills (Izadi & Sarvimäki).
2.4 What would we know without experiments?
Given the cost of conducting experiments, it is natural to ask whether non-experimental methods could deliver credible policy evaluations. In economics, this discussion began in 1986, when Robert LaLonde published his landmark paper showing that the non-experimental methods available at the time could not reproduce the estimates obtained from a large-scale experiment. Further studies have shown that better data (for example, measuring outcomes in the same way for the treatment and comparison groups) and more advanced methods substantially reduce bias in non-experimental evaluations (Heckman et al. 1998, Smith and Todd 2005, etc.). A very recent paper (Imbens and Xu 2025) shows that modern non-experimental methods can replicate experimental results, but stresses that, to enable a causal interpretation, researchers should always assess the key assumptions. Therefore, whether non-experimental methods can deliver credible policy evaluations depends on the quality of the data and on whether the treatment is as good as randomly assigned once observed differences are accounted for.
The chapters in this edition touch upon this question several times. The chapter on experiments in education by Nyhus Larsen, Noer Poulsen, Rosholm and Bønneland Tølbøll includes both experimental and non-experimental evaluations. It shows that, on average, the results are similar. This supports the idea that, at least in this setting, both experimental and non-experimental approaches can be used to evaluate policies. On the other hand, Vikström’s ALMP chapter discusses an example in which experimental and non-experimental evaluations led to opposite conclusions about the effectiveness of a policy. In that case, decisions based on the non-experimental evaluation would have led to the wrong policy being implemented. The chapter on the Norwegian tax experiment by Markussen and Bjørneby explains that the experiment was conducted precisely because the policy could not be evaluated by any other method. Finally, the chapter on the recruitment subsidy by Einiö and Nivala shows that the experiment yielded more precise results than a non-experimental evaluation of an earlier recruitment subsidy. This case also highlights another difference between experimental and non-experimental evaluations: when the experiment was designed, the way the subsidy was implemented was changed to improve take-up. Thus, the evaluation itself prompted a rethink of how to implement the subsidy to maximise its impact.
2.5 What can the experimental approach be used for?
As noted above, carefully conducted non-experimental research can in many cases yield results similar to those of experimental research. However, this always depends on how selective participation in the treatment is and on the opportunities to address selectivity through the research design. Sometimes this is not possible, and in all cases, the credibility of non-experimental research depends on the credibility of its assumptions.
Randomised trials are an ideal way to solve selectivity issues, but they also present problems. One important problem is publication bias. If journal editors or the researchers themselves prefer statistically significant results, only large and significant findings may end up being published. The cumulative empirical evidence could then easily lead to incorrect conclusions, for instance by overstating the average effect of a policy.
Fortunately, there is a simple remedy. Requiring that experiments be pre-registered before the results are known increases the likelihood that statistically insignificant findings are also reported and reduces the incentive to run many experiments and report only those with the desired results.
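A small simulation illustrates why selective publication distorts the cumulative evidence and why reporting every pre-registered result helps. It assumes many hypothetical small experiments of the same policy, each producing an estimate that is normally distributed around a true effect, and a publication filter that keeps only estimates significant at the 5% level in the expected direction. The true effect, standard error and number of studies are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
true_effect = 0.1      # hypothetical true effect of the policy
standard_error = 0.1   # hypothetical standard error of each small experiment
n_studies = 10_000     # hypothetical number of independent experiments

# Each experiment yields an unbiased but noisy estimate of the true effect.
estimates = rng.normal(true_effect, standard_error, size=n_studies)
# Publication filter (assumption): only positive results with z > 1.96 appear in print.
significant = estimates / standard_error > 1.96

mean_all = estimates.mean()                       # every result reported, as with pre-registration
mean_published = estimates[significant].mean()    # only "significant" results published
share_published = significant.mean()

print(f"share published: {share_published:.2f}")
print(f"mean of all estimates: {mean_all:.3f}")
print(f"mean of published estimates: {mean_published:.3f}")
```

Under these assumed values only about one experiment in six clears the filter, and the average published estimate is roughly two and a half times the true effect, whereas the average over all experiments is close to it. The exaggeration is largest when individual experiments are small relative to the effect they try to detect.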
Another issue is that experiments are time-consuming. Designing, conducting, and monitoring the outcomes of an experiment takes years, which often exceeds the patience of policymakers elected for a fixed term. In some cases, as Izadi and Sarvimäki discuss in this edition, it may be possible to convince anxious policymakers to implement a pilot, or rather a controlled experiment, instead of rolling out a full-scale reform. Unfortunately, the results of experiments do not always match the expectations policy designers had for their favourite programmes. Whether this is a positive outcome, saving money that would otherwise have been spent on the policy, or a negative one, sinking a great idea that would have been implemented had the experiment not disappointed, probably depends on the observer.