
Nordic Economic Policy Review 2026

Large-Scale Randomised Experiments with Population Data


Elias Einiö and Annika Nivala

Abstract

This article features insights from the Finnish Recruitment Subsidy Experiment (Einiö et al., 2021; Einiö and Nivala, 2026), which we designed and implemented along with the Ministry of Economic Affairs and Employment of Finland. It belongs to a class of randomised controlled trials characterised by large, national-scale, costly treatments and imperfect take-up, which are typical features of many public policies. The experimental setting is based on data drawn from population registers covering all individuals, firms or other relevant units of observation. Data of this sort are becoming increasingly common in many countries, and are particularly accessible in the Nordic Region. We address econometric issues typical for these types of settings, including the management of over- and under-subscription, as well as the calculation of statistical power with multiple experimental waves. We also discuss practical issues relevant to running large-scale national randomised experiments involving a research team and several parties from the public administration.
JEL codes: H25, J38, L26, M51

1 Introduction

Randomised experiments are a common tool for estimating the causal impacts of policy interventions, yet there is great variation in terms of experimental settings. One factor that often limits the design of experiments is the cost of data collection. When data needs to be collected in multiple survey waves, both before and after the experiment, the costs increase with the number of waves and observations. In this article, we consider a specific class of experiments in which such limitations are minimal – namely, randomised controlled trials conducted in settings where data are available in population registers covering all individuals, firms or other relevant units of observation.
We address econometric topics that are relevant to large-scale experiments with register data, and offer insights from an experiment that we designed and implemented in conjunction with the Ministry of Economic Affairs and Employment of Finland: the Finnish Recruitment Subsidy Experiment (Einiö et al., 2021; Einiö and Nivala, 2026). The article is not intended to be a comprehensive guide to designing randomised experiments, but rather a discussion of specific issues that may arise in the implementation of large-scale randomised experiments that use register data and involve imperfect take-up and costly treatments (e.g., monetary subsidies). These are commonly occurring features in public support policies, whether they target individuals or firms, and arise in the context of national experiments on such instruments, conducted with the public administration.
The availability of population register data has several advantages, an important one being the elimination of data-collection costs. Population data also enable large-scale experiments that yield highly precise estimates of treatment effects. The ready availability of data also helps with planning and implementing experiments. Importantly, population register data allow for an estimation of the effects of offering a treatment, which is essential in many public policy contexts in which individuals are not required to take up a subsidy, transfer or other form of public support.
In the economics literature, the effect of offering a treatment is often referred to as the intention-to-treat (ITT) effect. It is widely viewed as the single most important parameter for policy-making. The difference between ITT and the treatment effect on the treated units – which is often the focus and the relevant parameter, e.g., in medical studies – is that some units do not necessarily take up the treatment. For instance, a firm may not take up an investment subsidy if the subsidy does not make new investments profitable enough, or an individual may not take up a student allowance if it does not make studying the most beneficial long-term option. More generally, in many public policy settings, public expenditure on a unit – be it an individual or a firm – is determined by both eligibility and the unit’s own actions.
The importance of the intention-to-treat parameter increases with the share of units that do not take up the treatment, making it a key parameter of interest in many public policy contexts (e.g., Duflo et al., 2007). By estimating the ITT effect, it becomes possible to predict the effects of a policy when it is made available to a wider population – this effect arises from both the impact of the treatment on the treated and the proportion of individuals who take it up. For instance, an investment subsidy that doubles investment in a firm that takes up the treatment has little aggregate impact if only a few firms make use of it (e.g., if the subsidised investment is deemed insufficiently profitable). The ITT estimate recovers the average effect of offering a treatment at population level, which is often the primary concern for decision-making.
Experiments are almost always limited by budget constraints. It may be desirable, for multiple reasons, to use the whole of the budget. When conducting experiments with the public administration, the budget may be set as part of a political process, with the intention of spending it all. When the maximum number of treatments are distributed, the experiment’s ability to detect treatment effects with sufficient statistical precision typically improves compared to when a smaller number of treatments are distributed. In this article, we discuss the potential challenges that a fixed budget may introduce in settings where take-up is not perfect – a feature of many public policies – and not known in advance, along with strategies for managing over- and under-subscription to the treatment.
The second part of the article features lessons from the Finnish Recruitment Subsidy Experiment, which we conducted with the Finnish Ministry of Economic Affairs and Employment. The goal of the experiment was to evaluate whether such a subsidy is effective at inducing non-employer entrepreneurs to hire their first employees and to what extent this would spur growth among these firms. We discuss issues that are relevant in practice when running large-scale national randomised experiments involving a research team and several parties from the public administration. We describe the planning, timeline, implementation and evaluation of the experiment, as well as data management issues. We briefly discuss the results of the experiment and compare them to a previous observational study on a similar policy instrument. The final part of the article provides recommendations and concluding remarks.

2 Experiments with imperfect take-up and population data

In this section, we discuss some econometric issues specific to large-scale experiments with population data and imperfect take-up. Our practical example is the recruitment subsidy for first hires, which is conditional on becoming an employer or incurring positive wage costs. More generally, the setting we discuss corresponds to a randomised experiment in which the treatment group is offered a treatment (e.g., a monetary subsidy) that is conditional on taking an intended action. The agent is not forced to take the action, but the treatment offer increases their incentives to choose it. This is a typical treatment setting for many public policies, such as investment subsidies for firms or allowances for students. The control group may also choose to take the intended action (e.g., invest or study), even without being offered the additional incentive. Other examples of such policies are free vaccines, bonus pension coverage from additional years of work, and so forth.
The goal of the experiment is to assess the effect of the treatment not only on the likelihood of choosing the intended action, but also on other outcomes, whether intended or not.

2.1 Under- and over-subscription

Experiments are rarely performed without budget constraints. The budget, including the cost per treatment, determines the number of units to which the treatment can be granted. When the take-up rate is uncertain, two possible situations arise: (i) under-subscription, where fewer units take the treatment than allowed for in the budget; and (ii) over-subscription, where more units would take the treatment than it is possible to accommodate.
Suppose the researcher estimates the take-up rate at 5% – that is, 5% of firms in the treatment group would take up the subsidy. In addition, suppose that the budget can cover 10,000 treatments (e.g., a monetary subsidy). How many firms should be offered the treatment? This decision would be an easy one if there were no uncertainty about the take-up rate: with a take-up rate of exactly 5%, offering the subsidy to 200,000 firms (10,000/0.05) would exhaust the budget. However, because the rate needs to be estimated, there is a possibility that the actual take-up rate will be higher than the estimate and too many will accept the treatment (over-subscription); or the take-up rate could be lower, leading to fewer accepting the treatment than the budget can accommodate (under-subscription). Therefore, if the researcher prefers under-subscription to over-subscription, they should offer the subsidy to fewer firms than the estimate of the take-up rate would imply, and vice versa.
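The arithmetic behind this decision can be sketched in a few lines of Python. The function names are hypothetical, and the 4% and 6% realised take-up rates are illustrative assumptions, not figures from the experiment.

```python
def planned_offers(max_treatments: int, expected_take_up: float) -> int:
    """Number of offers whose expected take-up exactly exhausts the budget."""
    return round(max_treatments / expected_take_up)


def subscription_gap(offers: int, realised_take_up: float, max_treatments: int) -> int:
    """Takers minus budgeted treatments: positive means over-subscription,
    negative means under-subscription."""
    return round(offers * realised_take_up) - max_treatments


budget = 10_000
offers = planned_offers(budget, expected_take_up=0.05)
print(f"planned offers: {offers:,}")

# Illustrative scenarios for the realised take-up rate (assumed values).
for rate in (0.04, 0.05, 0.06):
    gap = subscription_gap(offers, rate, budget)
    print(f"realised take-up {rate:.0%}: gap of {gap:+,} treatments")
```

Offering the subsidy to fewer firms than `planned_offers` suggests shifts the risk from over-subscription towards under-subscription.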
Whether a researcher prefers over- or under-subscription depends on the weight they give to problems arising from each of these scenarios. Under-subscription means that more units could have been offered the treatment, given the actual take-up rate. While this does not bias the experimental results, a smaller treatment group size reduces the statistical power of the experiment. Conversely, over-subscription means that some firms that would like to accept the treatment will not receive it. If those firms are somehow selected, this may lead to bias in the estimate of the treatment effect. We discuss the potential biases introduced by over-subscription in Section 2.4.
In light of the previous discussion, over-subscription is often a greater concern than under-subscription, as the former may lead to bias in the estimation. However, significant under-subscription is also problematic because it may lead to underpowered statistical testing, making the experiment less useful. As such, it is important to balance the trade-offs between over- and under-subscription.

2.2 Managing over-subscription: A multi-wave setting

In this section, we briefly discuss an approach to recovering the take-up rate to calculate the treatment group size – namely, using a waved experiment comprising at least two experimental waves. The aim of the first wave is to estimate the take-up rate. In this wave, the number of treatment offers is large enough to provide a sufficiently reliable estimate of the rate, but small enough that the risk of over-subscription is negligible. The minimum number of waves for this approach is two, but the larger the number, the more accurate the estimate of the targeted number of treatments. The trade-off, however, is that the duration of the experiment increases with the number of waves.
Once the take-up rate has been estimated in the first wave, the size of the second wave can be determined, based on a statistical calculation that takes into account the uncertainty of the first-wave take-up rate estimate. The aim is to avoid over-subscription without significantly reducing the experiment’s size and power. In this approach, the researcher decides the maximum risk of over-subscription that they are willing to accept. They can then calculate an adjusted value for the take-up rate that produces over-subscription with that probability. We discuss the details of this approach in the Technical Appendix.
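The Technical Appendix is not reproduced in this section. The sketch below is therefore one plausible version of such an adjustment, not necessarily the calculation used in the experiment. It assumes a normal approximation to the first-wave take-up estimate and uses a one-sided upper bound as the adjusted rate. It ignores the sampling variation of take-up within the second wave itself. All function names and input numbers are hypothetical.

```python
from math import floor, sqrt
from statistics import NormalDist


def adjusted_take_up(takers: int, offers: int, max_risk: float) -> float:
    """Upper bound on the take-up rate that the true rate exceeds with
    probability of roughly `max_risk` (normal approximation, an assumption)."""
    p_hat = takers / offers
    std_err = sqrt(p_hat * (1 - p_hat) / offers)
    z = NormalDist().inv_cdf(1 - max_risk)
    return p_hat + z * std_err


def second_wave_offers(remaining_treatments: int, takers: int,
                       offers: int, max_risk: float) -> int:
    """Second-wave offers sized with the adjusted (conservative) rate."""
    return floor(remaining_treatments / adjusted_take_up(takers, offers, max_risk))


# Illustrative inputs: 50 takers among 2,000 first-wave offers,
# 1,000 treatments left in the budget.
for risk in (0.10, 0.05, 0.01):
    n = second_wave_offers(1_000, takers=50, offers=2_000, max_risk=risk)
    print(f"accepted over-subscription risk {risk:.0%}: offer to {n:,} firms")
```

The lower the accepted risk of over-subscription, the higher the adjusted take-up rate and the smaller the second wave.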
Another approach that does not rely on estimating the take-up rate is to run multiple waves with small treatment and control group sizes, until the desired number of allocated treatments has been achieved. The practical problem with this approach is often time: in many policy interventions, individuals need a reasonably long time to respond and take up the treatment. Such an experiment would therefore take a long time to run, unless the response time can be very short. What is considered a reasonable time to respond depends, of course, on the type of treatment. In our example of firms considering hiring their first employee, planning and acquiring the required information may take some time.
A third approach is to randomise the applicant pool. While this type of design can identify the treatment effect of receiving the subsidy, it is not necessarily informative about population-level effects, because the procedure leads to a selected experimental population. For instance, some firms that would use the subsidy on offer to hire employees might not apply for it if receiving it is a lottery rather than a certainty. This type of selection might be particularly important for entrepreneurs at the margin of employer entry, whose response is an important component of the ITT effect.

2.3 Power in a multi-wave setting

When pre-planning RCTs, it is essential to calculate the experiment’s statistical precision. These calculations inform the planner of the degree of precision in the estimates of the expected treatment effects. The key factors that determine statistical power are the variation in the outcome variables, the size of treatment and control groups, and the level of randomisation.
For several types of single-wave RCTs, mathematical formulas for statistical power are available, and the calculations are straightforward (see, e.g., McConnell and Vera-Hernandez, 2015). For more complex settings, formulas may not be available. In such cases, Monte Carlo methods can be used to simulate the statistical power. In addition, simulations can be used to evaluate the magnitude and likelihood of over-subscription bias and are a useful tool for developing sampling algorithms.
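As a minimal sketch of the Monte Carlo approach, the code below simulates the power of a test for an ITT effect on a binary outcome, such as becoming an employer. It assumes a single wave, randomisation at the unit level and a two-proportion z-test. The function name, group sizes and rates are illustrative assumptions. A multi-wave version would draw each wave separately and pool the simulated data before testing.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(n_treat, n_control, p_control, itt_effect,
                   reps=300, alpha=0.05, seed=1):
    """Share of simulated experiments in which a two-sided two-proportion
    z-test rejects the null of no effect at level `alpha`."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Draw binary outcomes for each group.
        y_t = sum(rng.random() < p_control + itt_effect for _ in range(n_treat))
        y_c = sum(rng.random() < p_control for _ in range(n_control))
        diff = y_t / n_treat - y_c / n_control
        pooled = (y_t + y_c) / (n_treat + n_control)
        std_err = sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_control))
        if std_err > 0 and abs(diff) / std_err > z_crit:
            rejections += 1
    return rejections / reps


# Illustrative values: 5% of control firms hire, the offer adds 3 points.
print(simulate_power(n_treat=2_000, n_control=2_000,
                     p_control=0.05, itt_effect=0.03))
```

Setting `itt_effect=0` gives a quick check of the test's size, which should be close to `alpha`.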
The size of the target population can be a limiting factor for large-scale randomised experiments with a low take-up rate. If the take-up rate is low, very large treatment groups may be required in order to reach the desired number of units taking up the treatment. In some cases, the treatment group could be so large that it accounts for more than half of the target population, while the number of units left for the control group could be small. If the control group is too small, the precision of the experiment is reduced. In that case, rather than aiming to allocate as many subsidies as possible, the most effective design can be determined via methods that identify the optimal split into treatment and control groups (e.g., McConnell and Vera-Hernandez, 2015). 

2.4 Selection of treated units due to over-subscription

In this section, before proceeding to the practical example of the Finnish Recruitment Subsidy Experiment, we discuss the biases associated with over-subscription.
Over-subscription means that the treatment cannot be given to all units in the treatment group that want it. If those units that would have accepted but did not receive the treatment are systematically selected, this can introduce bias into the estimates. This can be a particular concern in settings in which the public administration grants treatment to treatment group units based solely on the order in which they applied, as is typical in public policy settings. For example, in a setting where the public administration stops granting treatments when the maximum number has been allocated, those who apply last will be left out. If those units would have taken the intended action (e.g., invest or study) only if they had received the treatment, their exclusion from the group of treated units biases the ITT estimate of the effect of offering the treatment on the incidence of the intended action downward.
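The direction of this bias can be illustrated with a stylised calculation. The sketch assumes two types of applicants in the treatment group: units that take the intended action with or without the subsidy, and units that take it only if they receive the subsidy. It also assumes that applicants are served in random order until the cap binds. The function name and all shares are hypothetical.

```python
def itt_with_cap(n_offered, share_always, share_only_if_treated, cap):
    """Expected ITT effect on the incidence of the intended action when at
    most `cap` treatments are granted (stylised, assumed setting)."""
    applicants = n_offered * (share_always + share_only_if_treated)
    served_share = min(1.0, cap / applicants)
    # Units that act regardless do so in both groups; the others act
    # only if their application is served before the cap binds.
    rate_treatment = share_always + share_only_if_treated * served_share
    rate_control = share_always
    return rate_treatment - rate_control


no_cap = itt_with_cap(10_000, 0.02, 0.03, cap=1_000)   # cap does not bind
binding = itt_with_cap(10_000, 0.02, 0.03, cap=400)    # 500 applicants, 400 served
print(f"ITT without a binding cap: {no_cap:.3f}")
print(f"ITT with a binding cap:    {binding:.3f}")
```

If the units that act only when treated tend to apply later than the others, a first-come-first-served rule would exclude more of them, and the downward bias would be larger than under random ordering.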

3 Lessons from practice: The Finnish Recruitment Subsidy Experiment

The Ministry of Economic Affairs and Employment commissioned the Finnish Recruitment Subsidy Experiment in 2022–2023. The experiment followed the approach described above: a subsidy was offered to the treatment group with ex ante uncertain, imperfect take-up, within a budget that allowed a maximum of 900 subsidies. The experiment was run in two waves, and the treatment group size was adjusted based on the take-up rate in the first wave.
All of the data needed to plan, implement and evaluate the experiment was pre-existing register data, with the exception of the Experiment Register, which was specifically created to run the experiment and administer subsidy payments. There was no need for extensive data collection outside of normal administrative processes. The key public register that enabled the experiment was the Incomes Register, which requires all employers to report wages paid to employees on a monthly basis.

3.1 Motivation

The economic motivation for the first employee subsidy stems from the idea that becoming an employer incurs some fixed costs or market frictions that hamper a firm’s growth. For example, a non-employer firm needs to learn about employer responsibilities and recruitment before hiring, and there may be uncertainty about the productivity of the first employees. These considerations lead to uncertainty about the profitability of hiring and costs before the firm receives any potential profits from the new employees’ input. These types of costs, including time spent on hiring and acquiring information on employer liabilities, can create barriers to becoming an employer. In such cases, a temporary subsidy for hiring could incentivise a firm to become a permanent employer and accelerate its long-term growth.
There is little evidence regarding the extent to which these factors limit firm growth. However, over half of all firms have no employees, indicating a large potential target group for a recruitment subsidy.

3.2 Planning process and timeline

A recruitment subsidy experiment aimed at encouraging hiring by non-employer firms was part of the 2019–2023 government programme. The Ministry began planning the experiment in early 2020 and commissioned us to design it (Einiö et al., 2021). The planning process took place in summer 2020. The Ministry then prepared to conduct the experiment and drew up a legal proposal. During this time, we did additional consulting work on running the experiment, including statistical power calculations based on register data, technical feedback on the proposed legislation, and drafting the letters sent to the treatment group of firms. Parliament passed the legislation on the Recruitment Subsidy Experiment in January 2022 (FINLEX, 2022).
As part of the planning process, we drew up an idealised proposal for the experiment. The Ministry took the time to respond to this proposal and set constraints based on feasibility. The final plan incorporated these feasibility constraints as well as suggestions by policy-makers. The plan struck a balance between the ideal experiment, which tested a recruitment subsidy to increase first hires, and administrative and political feasibility. The experiment and subsidy design were close to our initial proposal, but there were some changes, as discussed below.
We argued for randomisation at the level of the whole target population, to be identified by the tax authority on the basis of register data. This allowed us to estimate a population-wide intention-to-treat effect, the key parameter policy-makers use to decide whether to implement the policy on a larger scale. Another option discussed was randomisation from the applicant pool. As discussed in Section 2.2, while this type of design can identify the treatment effect of receiving the subsidy among the applicants, this treatment effect in itself is not necessarily informative about population-level effects.
When identifying the effects of offering subsidies, one issue to consider is whether they have equilibrium effects. These can lead to spillover effects on control firms, or other firms, which can make it difficult to disentangle the treatment effect. In addition, even if the effects of a smaller-scale pilot experiment are clearly identified, they may not reflect the effects of adopting the policy at full scale. The reason is that equilibrium effects may be more likely to arise in a full-scale policy due to a larger number of treated units. In the recruitment subsidy experiment, the subsidy cannot be expected to have significant equilibrium effects, at least in the short run, because first hires by non-employer firms account for a fairly small proportion of the labour market.
One key aspect of the subsidy design was to minimise application costs and bureaucracy for the entrepreneur. An increased administrative burden may lead firms to decide not to apply for subsidies, which can be a major reason for their ineffectiveness as an incentive. The experiment therefore made it as easy as possible for firms to apply for and be granted the subsidy. This both maximised effectiveness and ensured that the results reflect how well the subsidy reduces the barriers to hiring first employees, rather than frictions in take-up due to administrative complexity and costs.
We suggested that receiving the subsidy would be automatic, based on the firm’s reported wage costs in the Incomes Register, without the need for an application process. However, this was not feasible due to business subsidy regulations, which require firms to provide certain information in order to be granted a subsidy, including adherence to the EU de minimis rule. As a result, the firms in the treatment group that wanted the subsidy had to apply for it via a single, short online application. After this initial application, the subsidy was paid automatically based on the wage costs in the Incomes Register. We also suggested ruling out the hiring of family members with the subsidy, to reduce the possibility of misuse. However, this was not considered administratively feasible.
The experiment required a time limit for subsidy use. The initially suggested timeframe for hiring was one year, but it ended up being shorter, at five months, which may not have been long enough for some firms.
The experiment was facilitated by access to register data at all stages: planning, implementation and evaluation. We used firm tax return data from previous years to aid in planning the experiment, including selecting the target group. The target group was identified from the registries, and randomly allocated to treatment and control groups. Automatic subsidy payments (conditional on the initial application) were based on the Incomes Register, which allowed payments to be made in near real time, with very little lag relative to realised wage costs and at low administrative cost to firms. Due to the ready availability of the register data, there was no need for extensive data collection, which enabled us to evaluate the effects quickly and at relatively low cost.

3.3 Use of previous evidence in planning

The planning of experiments can benefit substantially from previous research on similar instruments and outcomes. Previous research can inform the researcher both about the distributions of outcomes and the expected take-up rates and effects of the instrument. This information can be very useful even if it shows no significant impacts, as it can help researchers design more effective treatments. Overall, previous evidence can help to inform both the design of the treatment and the details of the experiment.
For the recruitment subsidy experiment, one source of information that benefitted the planning was an earlier regional support instrument that encouraged non-employer firms to hire their first employees. Nivala (2024) studied the effects of this previous regional subsidy programme using a difference-in-differences design, which compared regions in which firms were eligible for the subsidy to regions in which similar firms were ineligible. In Section 3.10, we compare the results from the observational evaluation of the regional subsidy and the results from the RCT evaluation of the Recruitment Subsidy Experiment (Einiö and Nivala, 2026). Here we discuss how the experiences from the regional experiment informed the planning of the Recruitment Subsidy Experiment.
The previous regional first-employee subsidy programme was implemented in parts of Finland in 2007–2011. It targeted firms that had no employees for at least 12 months. To qualify, a firm had to hire an employee on a permanent full-time contract (defined as 25 hours per week). The subsidy covered 30% of wage costs in the first year and 15% in the second year. Firms had to apply for the subsidy before hiring and then apply for subsidy payments, which were paid out twice a year.
Nivala (2024) finds that the programme had no effect on the probability of hiring. A striking observation is that the regional subsidy had an extremely low take-up rate: only 2% among new employers. Low take-up rates can be due to scheme complexity, poor salience of the programme or high application costs (e.g., Bhargava and Manoli, 2015; Finkelstein and Notowidigdo, 2019). One key detail that may have discouraged firms from using the subsidy was the requirement to hire on a full-time, permanent contract. Additionally, application costs and payment delays may have reduced the desirability of taking up the subsidy.
While this prior evidence does not provide good estimates for a potential take-up rate in the RCT, it led us to think particularly carefully about which design features would improve take-up. It also helped convince the administration of the importance of introducing such features. As a result, significant efforts were made to ensure that the subsidy instrument was as simple and easy as possible. In the RCT, the subsidy was paid automatically, based on wage costs in the Incomes Register. In addition, letters were sent directly to the treatment group, notifying them that they were eligible for a subsidy.

3.4 The recruitment subsidy

In the randomised experiment, the recruitment subsidy was 50% of the firm’s total wage costs for the first 12 months, up to a maximum of €10,000. The target group included firms that: 1) had no employees (except the entrepreneur(s) themselves) in the previous 12 months; 2) had revenue of €15,000–1M in the previous year; and 3) had no tax debt, no ongoing bankruptcy filings, etc. The target group was identified based on tax registry data. The revenue constraint was introduced to increase the experiment’s statistical power: excluding extremely small and large firms reduced the variance of the outcome variables, which allowed for greater precision in the estimates. The third restriction was required by other business subsidy regulations.
Each of the firms in the treatment group was sent a letter notifying them of their eligibility for the subsidy. To qualify, they had to hire their first employee within four months after the month of receiving the letter. The letters included an explanation of the subsidy, information on how to apply and referrals to additional information.
To apply, firms had to complete a simple electronic application form administered by TE Services of Southeast Finland (hereinafter TE Services), a public employment services office under the Ministry of Economic Affairs and Employment, which provides business and employment services and allocates support for a range of programmes. The firms also had to provide the social security number of the hired employee, either on the application or as an addendum after applying. If a firm hired an employee before applying, the wage costs after the application were eligible for the subsidy. A firm could be denied the subsidy if it was not in the treatment group, or if its total of de minimis subsidies exceeded the allowed ceiling.
Once the subsidy was granted, it was paid automatically each month, based on the wage costs reported in the Incomes Register.

3.5 Randomisation

We conducted a stratified random sampling from the target population to treatment and control groups in two waves (see Section 3.6 for details). Stratification means that randomisation takes place within subgroups of units (called strata), which improves the balance between treatment and control groups. This approach enables the treatment effect of the subsidy programme to be estimated using a simple regression framework that compares the treatment and control groups within strata. The aim of the randomisation is to ensure that the control and treatment groups are similar. The stratified sampling further balances the treatment and control groups based on the background characteristics that determine the strata. The randomisation was stratified by revenue quartiles within two-digit industries.
First, the tax administration identified the target population from tax registries and sent the list of firms and entrepreneurs to TE Services, which then ran the randomisation code that we had written. When controlling for strata, the baseline characteristics of the treatment and control groups were found to be balanced – in other words, there were no statistically significant differences between the groups in terms of predetermined outcomes (Einiö and Nivala, 2026).
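Stratified assignment of this kind can be sketched as follows. This is an illustration, not the code run by TE Services. The field names, the example firms, the treatment share and the seed are hypothetical, and the revenue quartile is assumed to have been computed within each industry beforehand.

```python
import random
from collections import defaultdict


def stratified_assignment(firms, treat_share, seed=2022):
    """Randomly assign firms to treatment or control within each stratum,
    defined here by (industry, revenue quartile)."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    strata = defaultdict(list)
    for firm in firms:
        strata[(firm["industry"], firm["revenue_quartile"])].append(firm["id"])

    assignment = {}
    for key in sorted(strata):
        ids = sorted(strata[key])  # fixed order before shuffling
        rng.shuffle(ids)
        n_treat = round(len(ids) * treat_share)
        for position, firm_id in enumerate(ids):
            assignment[firm_id] = "treatment" if position < n_treat else "control"
    return assignment


# Hypothetical target population: 2 industries x 4 quartiles x 10 firms.
firms = [
    {"id": f"{industry}-{quartile}-{k}", "industry": industry,
     "revenue_quartile": quartile}
    for industry in ("41", "62") for quartile in (1, 2, 3, 4) for k in range(10)
]
groups = stratified_assignment(firms, treat_share=0.3)
n_treated = sum(group == "treatment" for group in groups.values())
print(f"{n_treated} of {len(groups)} firms assigned to treatment")
```

Because the same share is drawn in every stratum, the treatment and control groups are balanced on industry and revenue quartile by construction.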

3.6 Waves

The experiment was conducted in two waves in order to manage the number of treatment-takers. The budget for the subsidy was fixed at €9 million, which meant 900 companies could be granted full subsidies. However, before the experiment began, the take-up rate in the target group was unknown. The fixed constraint on how many firms could be granted the subsidy, along with the ex ante uncertain take-up rate, raised a potential problem of over- or under-subscription, as discussed in Section 2.1. Conducting the experiment in two waves allowed us to manage the trade-off between these two.
The first, smaller wave allowed us to estimate the take-up rate. In the first wave, 3,500 firms were randomised into the treatment group, and 20,000 into the control group. The randomisation took place on 8 March 2022, letters to firms were sent by 14 March, and the firms had until the end of July to hire their first employee and apply for the subsidy.
The take-up rate in the first wave was estimated at 1.5%, which meant the subsidy could be offered to more than half of the target population. Hence, the sizes of the treatment and control groups in the second wave were set in such a way as to ensure an optimal split between them (e.g., McConnell and Vera-Hernandez, 2015). In the second wave, the treatment group size was set to 31,000 (rounded to the last hundred). In total, the treatment groups in both waves covered 34,500 firms, while the control groups in the rest of the target population totalled 38,771 firms.
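The reasoning behind this choice can be checked with a short calculation based on the figures reported in this section. The sketch treats all firms in both waves as a single pool and assumes equal outcome variance in the two groups, so that the variance of a simple difference in means is proportional to 1/n_T + 1/n_C. It ignores stratification and the wave structure, and the function name is hypothetical.

```python
from math import sqrt


def variance_factor(n_treat, n_control):
    """Term 1/n_T + 1/n_C that scales the variance of a difference in means."""
    return 1 / n_treat + 1 / n_control


budget_slots = 900            # maximum number of subsidies
take_up = 0.015               # first-wave take-up estimate
population = 34_500 + 38_771  # treatment plus control firms over both waves

# Offers the budget could cover if take-up stayed at the first-wave rate.
max_offers = round(budget_slots / take_up)
share_of_population = max_offers / population

# Standard error if every affordable offer were made, relative to the
# split that was actually chosen.
se_ratio = sqrt(
    variance_factor(max_offers, population - max_offers)
    / variance_factor(34_500, 38_771)
)
print(f"affordable offers: {max_offers:,} "
      f"({share_of_population:.0%} of the target population)")
print(f"standard error relative to the chosen split: {se_ratio:.2f}")
```

Under these assumptions, making every affordable offer would leave a small control group and raise the standard error by roughly 30% relative to the near-equal split.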
The two-wave design had an important implication for the size of the treatment group. As a comparison, for a single-wave design, the statistical analysis and calculations in the original experimental plan yielded a treatment group comprising 13,087 firms, which would have produced a significantly smaller number of treatment-takers and diminished the statistical power. This comparison demonstrates that wave design is a fundamental tool when running public policy experiments with incomplete take-up.
In the second wave, the randomisation was performed on 10 August 2022. At the beginning of August, the tax administration performed the same target group identification as in the first wave, but based on up-to-date data. The randomisation code excluded firms that were in the treatment or control groups in the first wave, and then randomised the rest of the target group. The letters were sent between 15 and 29 August. As the second-wave treatment group was substantially larger, it took somewhat longer for TE Services to send out the letters.

3.7 Allocation of subsidies

TE Services administered the allocation of subsidies. This was done via a regional office that was responsible for Southeast Finland, but handled the treatment allocation throughout the whole country. The office was also responsible for communicating with the treatment group (e.g. the content of the letter informing the treatment group companies of their eligibility), which was planned together with the research team. This regional office was chosen because it had previously administered some temporary COVID subsidies, so it had experience of working on similar policies.

3.8 Data management

The authors, operating under the licence of their research institute (VATT), undertook the planning phase of the experiment using firm tax data accessed through Statistics Finland’s secure online research data environment (FIONA). For the legal planning and randomisation phase, the tax authority provided VATT with data on the target population, along with the variables needed to design the randomisation algorithm. TE Services was provided with the wider dataset needed to meet its legal requirements and to carry out the sampling based on the randomisation program provided by the researchers. These data also formed the basis for the Experiment Register, administered by the KEHA centre, which provided IT services to TE Services. The Experiment Register included information on the target population, treatment status, randomisation strata and entrepreneurs, as well as decisions and payments for firms that applied for the subsidy. The legislation detailed the access to and use of the data for conducting the experiment.
In the analysis phase, the evaluator’s data access was secured by the legislation regarding the experiment. The tax authority provided outcome and background data to VATT, while the KEHA centre provided the Experiment Register, which was merged with the tax data. All data used in the analysis were pseudonymised. The evaluation used tax data from the Incomes Register, including monthly wage costs and employees, firm income tax returns, VAT returns and individual income tax returns. The extensive data allowed for an evaluation of the effects on outcomes at the firm, entrepreneur, and worker levels.

3.9 Evaluation

One important feature of the legislation was that it granted the evaluator access to all relevant register data held by public authorities to conduct the evaluation. It also stipulated that the public authorities, including the tax authority, had to provide the data at no cost. This shortened the time needed to produce the evaluation results.
Einiö and Nivala (2026) provide the results of the evaluation. According to their study, the subsidy had a positive and statistically significant effect on the proportion of firms that became employers. The proportion of firms that hired workers increased by 0.6 percentage points during the first six months (20% relative to the baseline) and 0.7 percentage points (11% from the baseline) over the whole subsidy period. They find effects of a similar magnitude on wage costs and the number of employees. Importantly, they find that the increase in the proportion of employers persists even after the end of the subsidy period, which suggests that the subsidy, while temporary, created permanent employers.
According to Einiö and Nivala (2026), the subsidy created 207 new employers and 138 permanent employers by the end of the observation period, making the direct subsidy cost of an employer and a permanent employer €18,200 and €27,300, respectively. In other words, every €1 in subsidies created €2.3 of labour input. In addition, according to their results, the average monthly number of employees increased by 0.006 during the subsidy period, corresponding to 207 individuals, which is equivalent to the number of new employers. The authors argue that because the subsidy had a permanent effect on hiring, it appears to have helped some non-employer firms overcome a recruitment threshold. However, while the subsidy had a statistically significant and, in relative terms, large effect on the incidence of hiring first employees and on employment, the overall employment effects are limited by the low take-up rate.
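The figures in this paragraph can be cross-checked with a few lines of Python. The sketch below uses only numbers reported in the text. The implied total subsidy cost and the scaling of the per-firm employment effect by the 34,500 treated firms are our own back-of-the-envelope inferences, and since the reported per-employer costs are rounded, the implied total is approximate.

```python
NEW_EMPLOYERS = 207
PERMANENT_EMPLOYERS = 138
COST_PER_EMPLOYER = 18_200              # EUR, as reported
COST_PER_PERMANENT_EMPLOYER = 27_300    # EUR, as reported
TREATED_FIRMS = 34_500                  # treatment groups in both waves
EFFECT_ON_EMPLOYEES = 0.006             # average monthly employees per treated firm

# Both cost figures should imply the same total direct subsidy cost.
total_from_new = NEW_EMPLOYERS * COST_PER_EMPLOYER
total_from_permanent = PERMANENT_EMPLOYERS * COST_PER_PERMANENT_EMPLOYER

# Scaling the per-firm effect by the number of treated firms gives a headcount.
extra_employees = EFFECT_ON_EMPLOYEES * TREATED_FIRMS

print(total_from_new, total_from_permanent)
print(round(extra_employees))
```

Both cost figures imply the same total direct subsidy cost of about €3.77 million, and the per-firm employment effect scales to 207 individuals, in line with the numbers above.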
The study also finds that most of the hired employees were already employed, and therefore, the subsidy may have primarily increased the hiring of already employed people. If the subsidy merely reallocated individuals who would have been working regardless of the support, this may further reduce the effect on overall employment. Finally, the study does not show statistically significant effects on measures of firm growth other than employment, although the estimates for revenue and added value are positive, which is consistent with the positive employment effect.

3.10 Comparison of RCT results with previous work

In this section, we compare the recruitment subsidy experiment to the earlier regional first-employee subsidy programme. Table 1 summarises the treatments and key findings from these studies. The results differ greatly. The RCT had an economically and statistically significant effect on hiring probability, while the regional programme had no detectable effect. The take-up rate in the RCT is an order of magnitude higher than in the regional programme.
Both instruments covered wage costs. For the regional subsidy, the support rate was 30% in the first year of use, and 15% in the second. The recruitment subsidy support rate was 50% for one year. They are therefore comparable in size, but the recruitment subsidy is front-loaded compared to the earlier regional subsidy. The realised subsidy payments were of a similar magnitude for recipients. Hence, the support rate or financial benefits of the subsidy are unlikely to explain the differences in the results. In addition, in the RCT, the firms only had five months to hire their first employee to qualify for the treatment, compared to approximately four years in the regional programme. If anything, this should reduce the effect of the RCT compared to the regional programme.
The main difference between the results of the two studies is that the recruitment subsidy design led to a higher take-up rate among new employers, which in turn had a positive effect on becoming an employer. The main design differences are (i) flexibility of the employment contract in the RCT; (ii) lower administrative costs and timely subsidy payments in the RCT; and (iii) direct information letters sent to the treatment group in the RCT. Based on this evidence alone, it is impossible to disentangle which of these best explains the difference in the take-up rate. However, it is clear that when these design features are implemented together, as in the RCT, they can have a large impact on a programme’s effectiveness. Nivala (2024) argues that restricting the regional first-employee subsidy to full-time contracts and potential information frictions reduced take-up. In addition, the regional programme had higher administrative costs. It is likely that a combination of these factors explains the difference in take-up rates. 
Table 1. Comparison between the RCT and prior evaluation of the regional subsidy
| | Recruitment Subsidy Experiment 2022 | Regional First-Employee Subsidy 2007-2011 |
|---|---|---|
| Study | Einiö and Nivala (2026) | Nivala (2024) |
| Research method | Randomised controlled trial | Regional difference-in-differences |
| Subsidy rule | 50% total wage costs for 12 months, max €10,000 | 30% first-employee wage costs for the first year, 15% second year |
| Time to take up | 5 months | 3.5 to 4.5 years |
| Eligibility | No employees for 12 months, treatment group | No employees for 12 months, in eligible municipality, hires for permanent contract, at least 25 hours/week |
| Administration | Simple online application, automatic monthly payments | Simple application, apply for payments after realised wage costs twice a year |
| Information | Directly informed by letters | Spread through ELY Centres, firms may need to seek information |
| Estimated effect (std. err.) | 0.006 (0.001) | −0.003 (0.005) |
| Effect relative to baseline | 20% | −4% |
| Number of targeted firms | 34,500 | c. 40,000 |
| Take-up rate in target population | 1.5% | 0.2% |
| Take-up rate among new employers | 32% | 1.9% |
| Average paid subsidy (EUR) | 8,131 | 7,459 |
Note: Information and estimates gathered from referenced studies.

4 Recommendations

The parameter of interest to be estimated depends on the policy instrument and the research questions. Our general recommendation is to consider designing experiments intended to estimate the ITT effect, especially in cases of public policy interventions with incomplete take-up.
Planning is often needed before political decisions can be made about large-scale experiments. It is important to integrate research perspectives and analysis into such processes to a sufficient degree, particularly when the policy design changes. In such situations, the experiment’s power to detect effects may change, so the statistical properties of the experiment should be recalculated.
A significant and demanding part of an experimental project’s scientific work takes place during the design and pre-analysis phases, and therefore at this stage, it is advisable to avoid uncertainty regarding the continuation of the project. In the Recruitment Subsidy Experiment, the pre-planning, the scientific support provided during the preparation of the proposed legislation and the implementation of the treatment allocation, as well as the evaluation, were carried out in separate phases. This meant that the researchers experienced some uncertainty as to whether they would ultimately be able to evaluate the experiment. Our recommendation is that the planning and evaluation components of projects should be included as part of a single, comprehensive contract. This would allow the research side to be better integrated into the research programme in the long term. It would also create stronger incentives, as it ensures that the researchers can utilise the scientific outputs from their design and pre-analysis work.
One aspect not implemented in the Recruitment Subsidy Experiment was the collection of survey data to identify the factors behind the low take-up rate for this type of support, and the reasons why some firms in the treatment group that hired employees chose not to use the subsidy. The project lacked sufficient resources to carry out such a survey. Our recommendation for future projects is that sufficient funding be allocated for surveys of this kind. It is important that such surveys take place after the end of the treatment period, including in projects where population registers are the primary data source.

5 Concluding remarks

The intended functioning of public policies is crucial for social and economic welfare. Randomised controlled trials are the gold standard for reliable assessments of the impacts of policy interventions. Yet governments often roll out new and expensive programmes with little evidence regarding their impacts, despite the fact that experimentation prior to full implementation provides significant opportunities (Muralidharan and Niehaus, 2017). The resulting lack of evidence on what works and what does not is therefore likely to reduce the effectiveness of aggregate public spending significantly.
The Recruitment Subsidy Experiment, as described in this article, studied how entrepreneurs respond to incentives for hiring their first employees. While there was a previous observational study on a similar policy instrument (Nivala, 2024), the current experiment provided new insights. Contrary to the previous results, Einiö and Nivala (2026) find that the subsidy programme had a significant effect on hiring rates among non-employer firms. Together, these two studies show how policy design matters for effectiveness. Duflo (2020) describes field experiments as a practice of policy: one experiment reveals whether a treatment worked; researchers and practitioners use that evidence to redesign the next treatment; and the new experiment again increases our understanding of what works. The Finnish Recruitment Subsidy Experiment is one example of such a practice, which builds on previous evaluations to develop and test an improved type of policy design.
We discussed issues that are likely to arise in the evaluation of many typical public policies with costly treatments and incomplete take-up. As a practical example of such an evaluation, we discussed the Recruitment Subsidy Experiment. We hope that this study will inspire similar large-scale experiments aimed at assessing and improving the effectiveness of a wide variety of public policies.

References

Bhargava, S. and Manoli, D. (2015). Psychological frictions and the incomplete takeup of social benefits: Evidence from an IRS field experiment. American Economic Review, 105(11):3489–3529.
Duflo, E. (2020). Field Experiments and the Practice of Policy. American Economic Review, 110(7):1952–1973.
Duflo, E., Glennerster, R., and Kremer, M. (2007). Using randomization in development economics research: A toolkit. Handbook of Development Economics, 4:3895–3962.
Einiö, E. and Nivala, A. (2026). Making employers: The effects of supporting first hires in a large-scale randomized experiment. SSRN Working Paper no. 6063771.
Einiö, E., Nivala, A., and Nokso-Koivisto, O. (2021). Yksinyrittäjien rekrytointituen vaikutusten arviointi: Suunnitelma satunnaistetusta kontrolloidusta kokeesta. Työ- ja elinkeinoministeriön julkaisuja 2021:1.
Finkelstein, A. and Notowidigdo, M. J. (2019). Takeup and targeting: Experimental evidence from SNAP. The Quarterly Journal of Economics, 134(3):1505–1556.
FINLEX (2022). Laki rekrytointitukikokeilusta 20/2022. Helsinki, 14.1.2022.
McConnell, B. and Vera-Hernandez, M. (2015). Going beyond simple sample size calculations: A practitioner’s guide. IFS working paper w15/17, London.
Muralidharan, K. and Niehaus, P. (2017). Experimentation at scale. Journal of Economic Perspectives, 31(4):103–124.
Nivala, A. (2024). (No) Effects of Subsidizing the First Employee: Evidence of a Low Takeup Puzzle Among Firms. Working Papers 23, Finnish Centre of Excellence in Tax Systems Research.

Technical Appendix

Formally, the adjusted take-up rate used to set the treatment group size for the second wave can be written as
$$r_2 = \hat{p}_1 + z_{\gamma}^{*}\frac{\hat{\sigma}_1}{\sqrt{n_1}} \tag{1}$$
where $r_2$ is the adjusted take-up rate used to calculate the number of offers in the second wave, $\hat{p}_1$ and $\hat{\sigma}_1$ are the estimates of the take-up rate and its standard deviation from the first wave (so that $\hat{\sigma}_1/\sqrt{n_1}$ is the standard error of $\hat{p}_1$), $z_{\gamma}^{*}$ is the (one-sided test) critical value of the standard normal distribution corresponding to the chosen risk level $\gamma$ for over-subscription, and $n_1$ is the number of observations in the first-wave treatment group. When over-subscription is to be avoided, this approach inflates the anticipated fraction of treatment-takers, lowering the number of treatment offers. For example, suppose that $\hat{p}_1 = 0.05$, $\hat{\sigma}_1 \approx 0.2179$, and $n_1 = 10,000$ (we note that for a binary variable, the standard deviation is a function of $p$ in this setup: $\sigma = \sqrt{p(1 - p)}$). If the researcher is willing to accept a risk level of over-subscription of 5%, we have $z_{0.05}^{*} \approx 1.6449$ and $r_2 \approx 0.05 + 1.6449 \cdot \frac{0.2179}{100} \approx 0.0536$. With a budget of $B_2$ left for the second wave and a cost per treatment of $s$, the second wave will offer the treatment to $n_2 = B_2/(0.0536s)$ units.
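The calculation in Equation (1) can be reproduced with a short Python sketch that uses only the standard library. The function names are hypothetical, and the budget and cost per treatment in the usage example are made-up values chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def adjusted_take_up(p_hat, n1, gamma):
    """Equation (1): first-wave take-up rate inflated by a one-sided safety margin."""
    sigma_hat = sqrt(p_hat * (1 - p_hat))   # standard deviation of a binary outcome
    z = NormalDist().inv_cdf(1 - gamma)     # one-sided critical value, e.g. about 1.6449 for 5%
    return p_hat + z * sigma_hat / sqrt(n1)

def second_wave_offers(budget, cost_per_treatment, r2):
    """Number of offers the remaining budget can cover at the adjusted take-up rate."""
    return int(budget / (r2 * cost_per_treatment))   # rounded down to whole units

# Worked example from the text: p = 0.05, n_1 = 10,000, 5% risk of over-subscription.
r2 = adjusted_take_up(p_hat=0.05, n1=10_000, gamma=0.05)
print(round(r2, 4))  # 0.0536

# Hypothetical budget of EUR 1,000,000 and cost of EUR 10,000 per treatment.
n2 = second_wave_offers(budget=1_000_000, cost_per_treatment=10_000, r2=r2)
print(n2)
```

A lower accepted risk level gives a larger critical value and hence a larger adjusted take-up rate, so fewer offers are sent in the second wave.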
One specific issue is how to determine the size of the first wave. The key threat arises from it being too large. However, in a large-scale setting, a relatively small first wave is typically sufficient to provide enough control to avoid over-subscription. Sometimes, previous studies or other existing measures can be helpful guides when setting the size of the first wave. However, this may be limited by the fact that even policies with similar goals may differ in design features that can have a considerable impact on take-up rates.
The size of the first wave can also be assessed with available register data on the incidence of taking the intended action, and by adjusting this information with an expected effect.