An introduction to the synthetic control method in R

Statistics
R
Author

Celâl Güney

Published

October 7, 2025

Comparative politics and comparative political economy rely extensively on comparative case studies. The beginnings of comparative politics were mostly qualitative, resting on the close examination and comparison of a small sample of countries. This includes the founding works of Montesquieu, De l’esprit des lois (1748), and, almost a century later, Tocqueville, De la démocratie en Amérique (1835). Despite the increasing use of statistics in the social sciences during the 19th and 20th centuries, the most notable comparative case studies remained mostly qualitative. One can refer for instance to Katzenstein (1985) and Shonfield (1974), who used statistics, but only descriptively. Quantitative methods nonetheless became predominant in comparative case studies during the 1990s and 2000s, notably with the use of factor analysis to identify different types of socio-economic and political systems. Lijphart (2012) famously sought quantitative support for his typology of democracies, Esping-Andersen (1990) for his types of welfare regimes, and Amable (2003) for his typology of capitalisms.

But once different socio-economic and political regimes have been identified at one point in time, the question of change remains, and more specifically the question of how a given policy or reform would affect these regimes. To tackle it, researchers became increasingly interested in policy impact evaluation through causal analysis, which includes instrumental variable regression, regression discontinuity design and difference in differences (DiD). The latter is widely used for its simplicity and because it applies to a wide variety of case studies involving repeated cross-section and panel data. However, one problem with DiD is its parallel trends assumption, which rarely holds in practice. Comparative case studies using a DiD approach therefore often look for cases with very similar trends and characteristics to make this assumption credible. But what if, instead of comparing our case of interest with one or several quasi-counterfactuals, one could construct an artificial counterfactual by synthesizing core information from many other cases?

This is the purpose of the synthetic control method which, according to its main creator Alberto Abadie, usefully complements and facilitates qualitative analysis in comparative case studies (Abadie, Diamond, and Hainmueller 2015). Another interesting property of this method is that it is designed for the very specific case in which a policy, reform or any other “intervention” happens in the case of interest but not in the comparison units.

The core ideas of the synthetic control method are the following (see footnote 1):

We face the fundamental problem of causal inference in the context of an intervention/event/treatment which happened to a single unit or a small number of units at an aggregate level (countries, regions or cities). The causal effect that we want to estimate is:

\[ \tau_{1t} = Y_{1t}^I - Y_{1t}^N \tag{1}\]

With \(Y_{1t}^I\) the outcome of interest for the treated unit \(1\) when treated at time \(t\) and \(Y_{1t}^N\) the outcome it would have had without treatment. Since \(Y_{1t}^N\) is not observed, the synthetic control method proposes to construct a synthetic index from the units \(j = 2, 3, ..., J+1\) which have not been treated (the “donor pool”), using a set of predictors of \(Y_{it}\), the outcome of interest.

The synthetic control is defined as a weighted average of the units in the donor pool:

\[ \hat{Y}_{1t}^N = \sum^{J+1}_{j=2}w_j Y_{jt} \tag{2}\]

The weights \(w_j\) are restricted to be nonnegative and to sum to one. They are chosen so that the synthetic index best reproduces the treated unit's pre-intervention values of the predictors of the outcome variable. The solution of Abadie, Diamond, and Hainmueller is to choose the weights that minimize

\[ \lVert X_1 - X_0 W\rVert_V = \sqrt{(X_1 - X_0 W)^\top V (X_1 - X_0 W)} \tag{3}\]

With \(X_1\) the \(k \times 1\) vector of pre-intervention predictors for treated unit 1, \(X_0\) the \(k \times J\) matrix of the same predictors for the untreated units and \(W = (w_2, ..., w_{J+1})^\top\) the weight vector.

An important point is that Equation 3 involves another set of weights, \(V\), a \(k \times k\) diagonal matrix (for \(k\) predictors) whose diagonal elements are predictor weights, that is, weights which capture the importance of each predictor. In the Synth package, \(V\) is chosen such that the mean squared prediction error of the outcome over the pre-intervention period is minimized.
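To make Equation 3 concrete, here is a minimal base R sketch on made-up numbers. It is not the algorithm used by the Synth package: it fixes \(V\) to equal weights (an assumption) and uses a softmax reparametrization (my choice) so that a generic optimizer returns weights that are nonnegative and sum to one.

```r
# Toy data (made up): 3 predictors in rows, 4 donor units in columns
X0 <- matrix(c(10, 12,  9, 14,
                2,  5,  1,  3,
               50, 45, 60, 42),
             nrow = 3, byrow = TRUE)
w_true <- c(0.5, 0.2, 0.2, 0.1)   # weights used to build the toy treated unit
X1 <- X0 %*% w_true               # treated unit's predictors (3 x 1)
V  <- diag(3) / 3                 # assumption: equal predictor weights

# Softmax turns any real vector into nonnegative weights that sum to one
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Squared V-weighted distance (X1 - X0 W)' V (X1 - X0 W)
loss <- function(theta) {
  d <- X1 - X0 %*% softmax(theta)
  as.numeric(t(d) %*% V %*% d)
}

fit   <- optim(par = rep(0, ncol(X0)), fn = loss, method = "BFGS")
w_hat <- softmax(fit$par)
round(w_hat, 3)
```

Because the toy treated unit is built as an exact weighted average of the donors, the fitted loss should be close to zero. With real data the fit is only approximate, and the choice of \(V\) matters.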

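To sum up: the synthetic control is a weighted average of the donor units, where the unit weights \(W\) and the predictor weights \(V\) are chosen so that the synthetic unit reproduces the treated unit as closely as possible before the intervention. The gap between the treated unit and its synthetic counterpart after the intervention is the estimate of the causal effect.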

I will now show how the synthetic control method can be run in R by comparing three packages: Synth, tidysynth and synthdid. I will reproduce the results of Abadie, Diamond, and Hainmueller (2015) on the negative effects of German reunification on West Germany’s economic growth.

The data used are panel data for West Germany and 16 other countries such as Switzerland, the UK, USA and France. The data set can be downloaded here.

#install.packages("Synth")

library(haven) # to import the dta file
library(Synth)
library(tidyverse)
library(sjlabelled)
library(gt)

setwd("~/Documents/R training/synthetic_control")
data <- read_dta("repgermany.dta")
country = unique(as_label(data$country)) # save the country names for later
glimpse(data) # have a quick look at the data
Rows: 748
Columns: 10
$ year      <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, …
$ gdp       <dbl> 2373, 2346, 2539, 2717, 2873, 2973, 3230, 3404, 3788, 4162, …
$ infrate   <dbl> NA, 2.5121319, -0.2784740, 0.5305781, 2.3611109, 3.9620080, …
$ trade     <dbl> 31.26644, 29.82107, 30.15854, 32.05540, 31.52728, 31.03560, …
$ schooling <dbl> 50.4, NA, NA, NA, NA, 48.9, NA, NA, NA, NA, 48.3, NA, NA, NA…
$ invest60  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ invest70  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ invest80  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ industry  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 37.29359, 36.800…
$ country   <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

The repgermany data set contains 10 variables for 17 countries, including West Germany. The variables needed to conduct a synthetic control analysis are a time index, the outcome of interest, a unit identifier (countries or any other aggregate entities) and a set of variables which will be used to construct the synthetic control. Here, the time frame spans the years 1960 to 2003, which is good because the synthetic control method needs a long pre-intervention period. gdp, the real GDP per capita at purchasing power parity, is the outcome of interest; infrate is the inflation rate; trade is exports plus imports as a percentage of GDP (a standard measure of trade openness); schooling is the percentage of the population aged 25 and older that attained secondary schooling; the invest variables measure the investment rate; and industry is the industry share of value added (see footnote 2).

Synthetic control with the Synth package

Running a synthetic control analysis with the Synth package is a bit tedious. There are two main functions: dataprep(), which transforms the data into a format suitable for synth(), and synth(), which runs the computation. It is important to note that the data (here a tibble returned by read_dta()) must first be converted into a plain data frame with as.data.frame().

The dataprep() function takes a lot of arguments and some of them are not easy to grasp at first sight. predictors takes the variables that are averaged over the whole period given in time.predictors.prior; here it includes the dependent variable, which must also be specified with dependent. special.predictors is for the variables that we want only for specific periods. treatment.identifier takes the value of the country of interest, here West Germany, which is coded as 17 in the country variable. controls.identifier does the same for the comparison units.

The time of intervention enters through the time.optimize.ssr and time.predictors.prior arguments, both set to 1960:1990. This means that synth() will choose the weights of the synthetic West Germany using the period 1960-1990.

data <- as.data.frame(data)
data$country <- as.numeric(data$country)

dataprep.out <-
  Synth::dataprep(foo = data,
           predictors    = c("gdp","trade","infrate"),
           predictors.op = "mean",
           time.predictors.prior = 1960:1990,
           dependent     = "gdp",
           unit.variable = 10,
           special.predictors = list(
             list("industry" ,1981:1990, c("mean")),
             list("schooling",c(1980,1985), c("mean")),
             list("invest80" ,1980, c("mean"))),
           time.variable = "year",
           treatment.identifier = 17,
           controls.identifier = unique(data$country)[-17],
           time.optimize.ssr = 1960:1990,
           time.plot = 1960:2003
  )

synth.out = synth(dataprep.out)

The dataprep() function is the main drawback of the Synth package because the specification of the country of interest and the comparison units is very finicky: the identifier arguments only accept numeric values, and even labelled numeric variables need to be converted to plain numeric. Once an object is created with this function, it can be passed to synth() to run the computation, which returns a list with 6 objects.
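A side note on the controls.identifier argument: unique(data$country)[-17] selects by position, which gives the intended result only when the codes run from 1 to 17 in order of appearance. A value-based selection with setdiff() is a safer habit. The small sketch below uses hypothetical unit codes (not the repgermany data) to show the difference:

```r
# Hypothetical unit codes: unsorted, with gaps, one entry per unit-year
codes   <- c(3, 3, 1, 1, 7, 7, 12, 12)
treated <- 7

# Value-based selection: every distinct code except the treated one
controls <- setdiff(sort(unique(codes)), treated)
controls

# Position-based selection tries to drop the 7th element of unique(codes);
# there are only 4 distinct codes here, so nothing is dropped at all
unique(codes)[-treated]
```

In the dataprep() calls of this post, the equivalent would be setdiff(unique(data$country), 17).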

The first object contains the \(v\) weights for the predictors:

synth.out[["solution.v"]] %>% 
  gt()
gdp trade infrate special.industry.1981.1990 special.schooling.1980.1985 special.invest80.1980
0.1145228 0.0007370546 2.613284e-10 0.4284261 0.3161824 0.1401317

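We can see that industry (0.43) and schooling (0.32) account for most of the weight, followed by invest80 (0.14) and GDP (0.11), whereas trade and inflation receive almost none.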

The country-weights are:

v = cbind(Country = as_character(country[-17]), synth.out[["solution.w"]])
v %>% 
  as.data.frame() %>% 
  gt()
Country w.weight
Australia 0.00493616648877229
Austria 0.494464588824856
Belgium 0.00374765056464507
Denmark 0.00447848121462468
France 0.00413526136545093
Greece 0.00290062920746648
Italy 0.00407317677505729
Japan 0.085554943108393
Netherlands 0.00453674236178326
New Zealand 0.00402326577624126
Norway 0.00463325326047622
Portugal 0.00213694831684883
Spain 0.00244940221147909
Switzerland 0.180458433997324
UK 0.0490710197789965
USA 0.148400036710521

Checking the country weights is a particularly interesting feature of the synthetic control method, because these weights show which countries are the best comparison units for our treated country. As a result, the synthetic control method can be useful to identify the countries which best resemble our country of interest. Here, we can see that Austria captures almost half of the weight (0.49), Switzerland around 0.18, the USA 0.15 and Japan 0.09, whereas the remaining weights are very low.

Now that we have the predictor weights and the country weights stored in dataprep.out and synth.out, we can compute the synthetic West Germany for the period 1960-2003. The synthetic index is obtained by multiplying dataprep.out$Y0plot, the GDP per capita of the untreated countries, by synth.out$solution.w, the country weights. We are here computing \(\hat{Y}_{1t}^N = \sum^{J+1}_{j=2}w_j Y_{jt}\), our estimator of the counterfactual:

synthetic_west_germany = dataprep.out$Y0plot %*% synth.out$solution.w
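To see what this matrix product does, here is a small self-contained illustration with made-up numbers (two donor units, three years). The object names are illustrative and not part of Synth:

```r
# Made-up outcomes: 3 years (rows) x 2 donor units (columns)
Y0 <- matrix(c(100, 200,
               110, 210,
               120, 240),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("1989", "1990", "1991"), c("A", "B")))
w  <- c(A = 0.25, B = 0.75)   # nonnegative weights summing to one

# The matrix product is the weighted average of the donors, year by year
synthetic <- Y0 %*% w
synthetic

# Gap between a hypothetical treated unit and its synthetic counterpart
Y1  <- c(170, 190, 230)
gap <- Y1 - as.vector(synthetic)
gap
```

Each row of the result is 0.25 times unit A plus 0.75 times unit B for that year, which is Equation 2 applied to one year at a time.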

We can now compare our synthetic West Germany with actual West Germany by plotting the actual gdp of West Germany with its synthetic counterpart:

data2 <- synthetic_west_germany %>% 
  as_tibble() %>% 
  rename(synthetic_west_germany = "w.weight") %>% 
  mutate(
    west_germany = data$gdp[data$country == 17],
    year = 1960:2003,
    gap = west_germany - synthetic_west_germany
  )


data2 %>%
  ggplot()+
  aes(x = year, y = west_germany, color = "Actual West Germany (with reunification at 1990)")+
  geom_line()+
  geom_line(aes(y = synthetic_west_germany, color = "Synthetic West Germany"))+
  geom_vline(xintercept = 1990)+
  theme_minimal()+
  labs(x = "", y = "GDP per Capita")+
  theme(legend.position = c(0.3, 0.7),
        legend.title = element_blank())

We can see that before reunification happened in 1990, the line for synthetic West Germany closely follows the line for actual West Germany, meaning that the synthetic index is a good approximation. After 1990, reunified West Germany at first performs better than its non-reunified synthetic counterpart, but not for long, as synthetic West Germany then grows faster than reunified West Germany.

The Synth package offers a built-in function to plot the same graph in base R:

path.plot(synth.res = synth.out, dataprep.res = dataprep.out,
          Ylab = "Per capita GDP (PPP, 2002 USD)", Xlab = "Year",
          Ylim = c(0, 35000), Legend = c("West Germany","Synthetic West Germany"), Legend.position = "bottomright",
          tr.intake = 1990)

One can also plot the gaps between the two lines to have a better view of the causal effect:

data2 %>% 
  ggplot(aes(x = year, y = gap))+
  geom_line()+
  geom_hline(yintercept = 0, alpha = 0.5)+
  geom_vline(xintercept = 1990, alpha = 0.5)+
  theme_minimal()+
  labs(x = "", y = "Gap")

Placebo and inferential tests

Like any model, the synthetic control method comes with several important assumptions that need to be checked. The most important one is that the intervention/event/treatment was not anticipated. This assumption can be checked with an “in-time placebo” test, which re-assigns the treatment to a period before 1990. This can be done simply by changing the time.predictors.prior and time.optimize.ssr arguments in the dataprep() function:

dataprep.out2 <-
  Synth::dataprep(foo = data,
           predictors    = c("gdp","trade","infrate"),
           predictors.op = "mean",
           time.predictors.prior = 1975:1990,
           dependent     = "gdp",
           unit.variable = 10,
           special.predictors = list(
             list("industry" ,1981:1990, c("mean")),
             list("schooling",c(1980,1985), c("mean")),
             list("invest80" ,1980, c("mean"))),
           time.variable = "year",
           treatment.identifier = 17,
           controls.identifier = unique(data$country)[-17],
           time.optimize.ssr = 1975:1990,
           time.plot = 1960:2003
  )

synth.out2 = synth(dataprep.out2)
synthetic_west_germany2 = dataprep.out2$Y0plot %*% synth.out2$solution.w

data3 <- synthetic_west_germany2 %>% 
  as_tibble() %>% 
  rename(synthetic_west_germany = "w.weight") %>% 
  mutate(
    west_germany = data$gdp[data$country == 17],
    year = 1960:2003,
    gap = west_germany - synthetic_west_germany
  )


data3 %>%
  ggplot()+
  aes(x = year, y = west_germany, color = "Actual West Germany")+
  geom_line()+
  geom_line(aes(y = synthetic_west_germany, color = "Synthetic West Germany"))+
  geom_vline(xintercept = 1990)+
  geom_vline(xintercept = 1975)+
  theme_minimal()+
  labs(x = "", y = "GDP per Capita",
       title = "Placebo test with treatment starting at 1975")+
  theme(legend.position = c(0.1, 0.7),
        legend.title = element_blank())

We can see that the estimate is very robust to the in-time check, as re-assigning the treatment to 1975 does not change the overall picture at all.

Another way to test for the significance and robustness of the estimates is a cross-entity placebo test. This is essentially a permutation test in which we run the same analysis while assigning the treatment to each country of the sample in turn. If these permutations produce gaps of similar magnitude to that of our treated unit, our result for West Germany is not really significant.

It is however here that the Synth package reaches its limits, because it does not include any function to do this directly. Here is how to do it manually with a for loop which assigns the treatment to each country in turn and collects the gaps.

gaps = list()
for (i in 1:17) {
  # treat country i as if it were the treated unit; all others are controls
  # (loop-local names avoid overwriting dataprep.out and synth.out from above)
  dataprep.i <-
    Synth::dataprep(foo = data,
             predictors    = c("gdp","trade","infrate"),
             predictors.op = "mean",
             time.predictors.prior = 1960:1990,
             dependent     = "gdp",
             unit.variable = 10,
             special.predictors = list(
               list("industry" ,1981:1990, c("mean")),
               list("schooling",c(1980,1985), c("mean")),
               list("invest80" ,1980, c("mean"))),
             time.variable = "year",
             treatment.identifier = i,
             controls.identifier = unique(data$country)[-i],
             time.optimize.ssr = 1960:1990,
             time.plot = 1960:2003
    )

  synth.i = synth(dataprep.i)

  # gap = actual outcome of country i minus its synthetic counterpart
  gaps[[i]] = dataprep.i$Y1plot - dataprep.i$Y0plot %*% synth.i$solution.w
}

# transform the gaps list into a (tibble) dataframe, also adding the years and transforming into long format for ggplot
gaps2 = as_tibble(gaps, .name_repair = c("unique_quiet")) %>% 
  mutate(year = 1960:2003) %>% 
  pivot_longer(cols = 1:17, names_to = "country", values_to = "gaps") 
library(gghighlight) # use gghighlight to highlight West Germany
gaps2 %>% 
  ggplot(aes(x = year, y = gaps, group = country, colour = country))+
  geom_line()+
  gghighlight(country %in% c("...17", "...11"))+
  theme_minimal()+
  labs(x = "", y = "Gaps")+
  geom_vline(xintercept = 1990)

The gaps plot indicates that there is substantial variation in gaps across countries and that West Germany (coded as 17 and visible in blue) does not seem to be an outlier. Several other countries even display larger gaps. The gaps plot can be useful to detect outliers that can be removed from the analysis to obtain better comparison units. Here, we could for instance remove Norway (coded as 11).

Other robustness and significance checks, such as the leave-one-out procedure, are also available. But since we have reached the limits of what the Synth package can do with built-in functions, I will show next how synthetic control can be done more easily using the tidysynth package.

Tidysynth

Tidysynth is essentially an improvement over Synth and makes synthetic control analysis far simpler in R. It includes more built-in functions, which are for instance compatible with ggplot and directly compute placebo and inferential tests. The functions are also far more intuitive and the main computation can be done in a single workflow:

#install.packages("tidysynth")

library(tidysynth)
library(haven)
library(tidyverse)
library(sjlabelled)

data <- read_dta("repgermany.dta")

# since the data are in stata format, they need a little bit of transformation, especially for the categorical variables. We need to convert the "country" column coded in number into a character variable with the actual names:
data$country = as_character(data$country)


# Now we can run the synthetic control:

results_tidysynth <-
  data %>%
  # Construct the synthetic control object
  synthetic_control(outcome = gdp,
                    unit = country,
                    time = year,
                    i_unit = "West Germany",
                    i_time = 1990,
                    generate_placebos = TRUE) %>%
  
  # Defining the predictors that will be constructing the weights
  generate_predictor(time_window=1960:1990,
                     gdp = mean(gdp),
                     infrate = mean(infrate, na.rm = TRUE),
                     trade = mean(trade, na.rm = TRUE)) %>%
  
  generate_predictor(time_window=c(1970, 1975),
                     schooling = mean(schooling, na.rm = TRUE)) %>% 
  
  generate_predictor(time_window = 1980,
                     invest80 = mean(invest80)) %>% 
  
  generate_predictor(time_window = 1971:1975,
                     industry = mean(industry)) %>% 
  
  
  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1960:1990) %>%
  
  # Generate the synthetic control
  generate_control()

Tidysynth offers many plot functions, for instance plot_trends() and plot_differences(), which produce the same kind of plots as above:

results_tidysynth %>% 
  plot_trends()

results_tidysynth %>% 
  plot_differences()

In addition, we can also plot the weights:

results_tidysynth %>% 
  plot_weights()

The attentive reader will notice that the weights differ from those obtained with the Synth package. Differences in weights across packages are a recurrent feature of synthetic control analysis. Even though tidysynth is built on Synth, there are still some small differences in default parameters, especially regarding the optimization algorithm for the weights. Note also that the time windows used above for schooling (1970 and 1975) and industry (1971-1975) are not the same as in the Synth specification (1980 and 1985, and 1981-1990), which also contributes to the difference. However, this does not dramatically change the overall results.

The cross-country placebo test, which took a lot of code with the Synth package, can now be done in a single line with plot_placebos():

results_tidysynth %>% 
  plot_placebos(prune = FALSE)

We can also plot the ratio between the post-treatment and pre-treatment mean squared prediction errors (MSPE). This ratio relates the post-intervention divergence between actual and synthetic outcomes to the quality of the pre-intervention fit. A higher ratio for the treated unit than for the untreated units is a good sign, because it implies that the post-intervention gap, relative to the pre-intervention fit, is larger for the treated unit than for the placebos.

results_tidysynth %>% 
  plot_mspe_ratio()
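The ratio can also be computed by hand from a vector of gaps. The sketch below uses made-up numbers; the function name mspe_ratio and the choice to count the intervention year as post-treatment are assumptions of this sketch, not necessarily what tidysynth does internally.

```r
# MSPE ratio from a vector of gaps (actual minus synthetic outcome)
mspe_ratio <- function(gap, year, treat_year) {
  pre  <- mean(gap[year <  treat_year]^2)   # pre-treatment fit
  post <- mean(gap[year >= treat_year]^2)   # post-treatment divergence
  post / pre
}

# Made-up gaps: small before 1990, large afterwards
year <- 1986:1993
gap  <- c(1, -1, 2, -2, 10, 20, 30, 40)
mspe_ratio(gap, year, treat_year = 1990)
```

Here the pre-treatment MSPE is 2.5 and the post-treatment MSPE is 750, so the ratio is 300: the post-treatment divergence is large relative to how well the synthetic unit fitted before the intervention.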

With grab_significance(), we get a table with the MSPEs and other statistics such as Fisher's exact p-value, which is computed by ranking the ratios of the treated and placebo units and dividing each rank by the total number of units. This test measures how often placebo effects are as large as the observed treatment effect.

results_tidysynth %>% 
  grab_significance() %>% 
  gt()
unit_name type pre_mspe post_mspe mspe_ratio rank fishers_exact_pvalue z_score
West Germany Treated 9429.751 3861807.5 409.534429 1 0.05882353 3.64670134
Norway Donor 244847.049 35911017.7 146.667145 2 0.11764706 0.99232432
Spain Donor 191745.925 8322283.5 43.402662 3 0.17647059 -0.05041812
USA Donor 515341.841 18484150.9 35.867747 4 0.23529412 -0.12650406
Greece Donor 284464.698 10123815.3 35.589004 5 0.29411765 -0.12931875
New Zealand Donor 549756.452 18661352.9 33.944764 6 0.35294118 -0.14592193
Japan Donor 348967.603 7804760.2 22.365286 7 0.41176471 -0.26284899
Australia Donor 95946.690 1989050.2 20.730785 8 0.47058824 -0.27935383
Italy Donor 324352.526 6369501.2 19.637588 9 0.52941176 -0.29039270
Netherlands Donor 53553.867 978965.9 18.280022 10 0.58823529 -0.30410111
UK Donor 195423.463 2166460.9 11.085981 11 0.64705882 -0.37674497
Austria Donor 65789.057 475479.4 7.227332 12 0.70588235 -0.41570878
Denmark Donor 34803.458 209290.2 6.013489 13 0.76470588 -0.42796591
Switzerland Donor 1400736.762 8114299.7 5.792880 14 0.82352941 -0.43019357
Belgium Donor 125571.637 446763.5 3.557838 15 0.88235294 -0.45276254
France Donor 39383.013 98115.9 2.491325 16 0.94117647 -0.46353195
Portugal Donor 1851083.924 995475.9 0.537780 17 1.00000000 -0.48325843

West Germany has the highest MSPE ratio and its rank is thus 1. Since we have 17 units in total, the p-value is \(1/17 = 0.0588\), slightly above the common significance threshold of 0.05. This is because we have only 17 countries in the sample: with this test, the smallest attainable p-value is one over the number of units, so at least 20 units are needed to reach 0.05 (\(1/20 = 0.05\)).
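The rank-based p-values can be reproduced by hand from the mspe_ratio column of the table above. This is a plain base R sketch of the ranking logic, not the tidysynth implementation:

```r
# MSPE ratios copied from the grab_significance() table above
ratios <- c("West Germany" = 409.534429, Norway = 146.667145, Spain = 43.402662,
            USA = 35.867747, Greece = 35.589004, "New Zealand" = 33.944764,
            Japan = 22.365286, Australia = 20.730785, Italy = 19.637588,
            Netherlands = 18.280022, UK = 11.085981, Austria = 7.227332,
            Denmark = 6.013489, Switzerland = 5.792880, Belgium = 3.557838,
            France = 2.491325, Portugal = 0.537780)

ranks   <- rank(-ratios)            # rank 1 = largest ratio
p_value <- ranks / length(ratios)   # share of units with a ratio at least as large
round(p_value[c("West Germany", "Norway")], 4)
```

The value for West Germany is 1/17, which matches the fishers_exact_pvalue column.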

Synthetic difference in differences

Another package which deserves to be mentioned here is synthdid, which implements the synthetic control method presented above and one of its extensions, synthetic difference in differences, developed by Arkhangelsky et al. (2021). The main difference with the synthetic control method is that synthetic difference in differences (SDiD) also includes weights for time periods, motivated by the idea that time weights can remove bias and increase precision by downweighting pre-treatment periods that are very different from the post-treatment periods.
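For reference, in the notation of Arkhangelsky et al. (2021), which differs slightly from the notation used above (\(W_{it}\) is here the treatment indicator, \(\hat{\omega}_i\) are unit weights and \(\hat{\lambda}_t\) are time weights), the SDiD estimate solves a weighted two-way fixed effects regression:

\[ (\hat{\tau}^{sdid}, \hat{\mu}, \hat{\alpha}, \hat{\beta}) = \underset{\tau, \mu, \alpha, \beta}{\arg\min} \sum_{i=1}^{N}\sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau\right)^2 \hat{\omega}_i \hat{\lambda}_t \]

In the same framework, classical DiD corresponds to the unweighted version of this regression, and the synthetic control estimator to the version without unit fixed effects \(\alpha_i\) and without time weights.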

The main functions are panel.matrices which prepares the data and synthdid_estimate which runs the computations. In contrast to Synth and tidysynth, the data must be augmented to include a binary variable for the treatment argument in panel.matrices:

#devtools::install_github("synth-inference/synthdid") 

library(synthdid)

data <- read_dta("repgermany.dta")

# add a binary variable taking value 1 for West Germany for period 1990-2003
data <- data %>% 
  mutate(
    treatment = if_else(year >= 1990 & country == 17, 1, 0)
    )

The data preparation is rather simple and intuitive compared to Synth: a single call to panel.matrices() creates the setup object used in the rest of this section.

panel.matrices returns a list with four objects: Y which is the observation matrix; N0 the number of control units (countries); T0 the number of pre-treatment periods and W the treatment indicator.
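The objects described above can be illustrated without the package. The sketch below builds Y, N0 and T0 by hand in base R from a made-up long-format panel; it only mimics the structure of the output and is not the panel.matrices() implementation. The commented call at the end shows how the setup object could be created from the real data, with argument values that are assumptions based on the column names shown earlier.

```r
# Toy long-format panel (made up): 3 units x 4 years, unit "C" treated from 1990
panel <- expand.grid(year = 1988:1991, country = c("A", "B", "C"),
                     stringsAsFactors = FALSE)
panel$gdp       <- c(1, 2, 3, 4,   2, 3, 4, 5,   3, 4, 6, 8)
panel$treatment <- as.integer(panel$country == "C" & panel$year >= 1990)

# Reshape to wide matrices: units in rows, periods in columns
Y <- with(panel, tapply(gdp,       list(country, year), sum))
W <- with(panel, tapply(treatment, list(country, year), sum))

N0 <- sum(rowSums(W) == 0)   # number of never-treated units
T0 <- sum(colSums(W) == 0)   # number of periods before any treatment
list(N0 = N0, T0 = T0, dim_Y = dim(Y))

# With the real data, the call would be along these lines (assumed arguments):
# setup <- panel.matrices(as.data.frame(data), unit = "country", time = "year",
#                         outcome = "gdp", treatment = "treatment")
```

In the toy panel there are two control units and two pre-treatment years, so N0 and T0 are both 2 and Y is a 3 by 4 matrix.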

For synthdid_estimate to work, Y, N0 and T0 must be passed into the function as follows:

synthdid_results = synthdid_estimate(setup$Y, setup$N0, setup$T0)

And we can have a quick look at the result with:

plot(synthdid_results)

We can see that the SDiD estimates give similar results to the synthetic control estimates computed with the Synth and tidysynth packages. A good feature of the synthdid package is that we can also run the estimates for the classical DiD and synthetic control methods and plot the results together:

sc_results   = sc_estimate(setup$Y, setup$N0, setup$T0)
did_results  = did_estimate(setup$Y, setup$N0, setup$T0)

# the order of the list must match the order of the names below
estimates = list(did_results, sc_results, synthdid_results)

names(estimates) = c('Diff-in-Diff', 'Synthetic Control', 'Synthetic Diff-in-Diff')
synthdid_plot(estimates)

The synthdid package looks promising, but is still under development and the documentation is thus for now incomplete. Until it is further improved, tidysynth remains by far the best package to run synthetic control analysis in R.

References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105 (490): 493–505. https://doi.org/10.1198/jasa.2009.ap08746.
———. 2015. “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science 59 (2): 495–510. https://doi.org/10.1111/ajps.12116.
Amable, Bruno. 2003. The Diversity of Modern Capitalism. Oxford; New York: Oxford University Press.
Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. 2021. “Synthetic Difference-in-Differences.” American Economic Review 111 (12): 4088–118. https://doi.org/10.1257/aer.20190159.
Esping-Andersen, Gøsta. 1990. The Three Worlds of Welfare Capitalism. Princeton: Princeton University Press.
Katzenstein, Peter J. 1985. Small States in World Markets: Industrial Policy in Europe. Ithaca; London: Cornell University Press.
Lijphart, Arend. 2012. Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. Second edition. New Haven: Yale University Press.
Shonfield, Andrew. 1974. Modern Capitalism: The Changing Balance of Public and Private Power. London: Oxford University Press Inc.

Footnotes

  1. I am repeating here the formalization of the method as presented in Abadie, Diamond, and Hainmueller (2010, 2015).

  2. For more information on the data sources of the repgermany data set, see the appendix of Abadie, Diamond, and Hainmueller (2015).