Getting Started with pvarife

Introduction

pvarife implements the Panel VAR estimator with Interactive Fixed Effects (IFE) of Tuğan (2021). The model is

\[ y_{i,t} = \sum_{\ell=1}^{L} \Theta_\ell \, y_{i,t-\ell} + F_t \lambda_i + e_{i,t}, \quad i = 1,\ldots,I,\; t = 1,\ldots,T, \]

where \(y_{i,t}\) is a \(K\times 1\) vector of endogenous variables, \(F_t\) is an \(r\times 1\) vector of unobservable common factors, \(\lambda_i\) is a unit-specific loading vector, and \(e_{i,t}\) is an idiosyncratic error.

The estimator jointly identifies \(\beta = [\Theta_1,\ldots,\Theta_L]\), \(F = \{F_t\}\), and \(\Lambda = \{\lambda_i\}\) via an inner/outer EM algorithm (Bai 2009). Impulse responses under recursive (Cholesky) identification and asymptotic confidence bands (Theorem 2.3 of Tuğan 2021) are provided.

Reference: Tuğan, M. (2021). Panel VAR models with interactive fixed effects. Econometrics Journal, 24, 225–246. https://doi.org/10.1093/ectj/utaa021

1 Data Format

The main input is a three-dimensional array y[i, t, k]:

Dimension	Index	Meaning
1	`i`	cross-sectional unit
2	`t`	time period
3	`k`	VAR variable

NA values are allowed for unbalanced panels; the algorithm imputes missing entries via the EM step.

2 Simulate Data

library(pvarife)

set.seed(1)
sim <- sim_pvarife(
  n_units   = 50,   # I
  n_time    = 30,   # T
  n_vars    = 2,    # K
  n_lags    = 1,    # lag order
  n_factors = 1,    # number of interactive fixed effects
  seed      = 42
)

dim(sim$y)          # I x T x K
#> [1] 50 30  2
sim$beta_true       # true coefficient vector
#> [1] 1.00 1.00 0.65 0.30 0.20 0.60
sim$sigma_true      # true reduced-form covariance
#>      [,1] [,2]
#> [1,]  1.0  0.5
#> [2,]  0.5  1.0

The function generates data from the DGP of Tuğan (2021), Section S10:

VAR coefficient matrix \(\Theta_1 = \begin{pmatrix}0.65 & 0.30\\0.20 & 0.60\end{pmatrix}\)
Reduced-form covariance \(\Sigma_e = \begin{pmatrix}1 & 0.5\\0.5 & 1\end{pmatrix}\)
Common factor: AR(1) with coefficient 0.5
Loadings \(\lambda_{k,i} \sim N(1, 1)\) independently

3 Estimate the Model

fit <- pvarife(
  y         = sim$y,
  n_lags    = 1,
  n_factors = 1,
  n_out     = 20,   # outer GLS iterations (paper default: 50)
  n_in      = 5    # inner PCA/EM iterations (paper default: 10)
)

print(fit)
#> Panel VAR with Interactive Fixed Effects
#>   Units (I): 50  |  Time periods (T): 30  |  Variables (K): 2
#>   Lag order: 1  |  Factors: 1
#> 
#> Coefficients:
#>   Intercepts:
#> intercept[1] intercept[2] 
#>     1.111169     1.099592 
#> 
#>   Theta_1 (VAR lag 1):
#>      y[1,t-1] y[2,t-1]
#> y[1] 0.620220 0.317411
#> y[2] 0.224987 0.549718
#> 
#> Reduced-form covariance (Sigma):
#>          [,1]     [,2]
#> [1,] 0.916247 0.474269
#> [2,] 0.474269 0.937656

The n_out and n_in parameters control the convergence of the EM algorithm. For publication-quality results use the defaults (n_out = 50, n_in = 10); smaller values speed up experimentation.

Extract coefficients

coef(fit)
#> intercept[1] intercept[2]  theta1[1,1]  theta1[1,2]  theta1[2,1]  theta1[2,2] 
#>    1.1111692    1.0995918    0.6202204    0.3174106    0.2249868    0.5497178

The first \(K\) elements are equation-specific intercepts; the remaining \(K^2 L\) elements are the VAR lag matrices \(\Theta_1, \ldots, \Theta_L\) (stacked row-major).

4 Impulse Response Functions

Point estimates (no confidence bands)

ir <- compute_irf(
  fit,
  n_periods = 8,
  shock     = 1L     # shock to first variable (Cholesky ordering)
)

dim(ir)    # K x n_periods
#> [1] 2 8

plot(ir, var_names = c("Variable 1", "Variable 2"))

Confidence bands (asymptotic simulation)

The irf_bands() function draws from the joint asymptotic distribution of \(\hat\beta\) and the common component (Theorem 2.3 of Tuğan 2021).

bands <- irf_bands(
  fit,
  n_periods = 8,
  n_draw    = 200,    # paper default: 500
  level     = 0.95,
  seed      = 1
)

plot(bands, var_names = c("Variable 1", "Variable 2"))

Classical residual bootstrap (optional)

For robustness checks that do not rely on asymptotic approximations:

bsb <- bootstrap_irf_bands(
  fit,
  n_periods = 8,
  n_boot    = 100,   # computationally expensive
  level     = 0.95,
  seed      = 2
)
plot(bsb, var_names = c("Variable 1", "Variable 2"))

5 Long-Run Identification (Blanchard-Quah)

By default compute_irf() uses short-run (Cholesky) identification: the impact matrix \(A_0 = \mathrm{chol}(\hat\Sigma)^\top\) is lower-triangular, so shock 1 has no contemporaneous effect on variable 2.

For long-run identification, the cumulative impact matrix \(C(1) = (I - \Theta_1)^{-1} A_0\) is constrained to be lower-triangular: shock 1 has no long-run (permanent) effect on variable 2. The impact matrix becomes

\[A_0 = (I - \Theta_1)\,\mathrm{chol}(D)^\top, \quad D = (I-\Theta_1)^{-1}\hat\Sigma(I-\Theta_1)^{-\top}.\]

This matches the MATLAB Monte Carlo code IRs_to_Shocks_LR_Identification.m.

Long-run DGP

sim_lr <- sim_pvarife(
  n_units        = 50,
  n_time         = 30,
  identification = "long_run",   # Blanchard-Quah DGP
  seed           = 42
)
sim_lr$identification          # "long_run"
sim_lr$diff_vars_suggested     # 1L  (display cumulative IRF for variable 1)

Long-run IRFs and bands

fit_lr <- pvarife(sim_lr$y, n_lags = 1, n_factors = 1, n_out = 20, n_in = 5)

# Point estimate at estimated beta (uncorrected)
ir_lr <- compute_irf(
  fit_lr,
  n_periods      = 8,
  identification = "long_run",
  diff_vars      = sim_lr$diff_vars_suggested   # cumulate variable 1
)
plot(ir_lr, var_names = c("Variable 1", "Variable 2"))

# Bias-corrected point estimate
ir_lr_bc <- compute_irf(
  fit_lr,
  n_periods      = 8,
  identification = "long_run",
  diff_vars      = 1L,
  bias_correct   = TRUE    # apply Theorem 2.3 bias correction
)

# Full confidence bands (median is already bias-corrected)
bands_lr <- irf_bands(
  fit_lr,
  n_periods      = 8,
  identification = "long_run",
  diff_vars      = 1L,
  n_draw         = 200,
  seed           = 1
)
plot(bands_lr, var_names = c("Variable 1", "Variable 2"))

6 Bias-Corrected Point Estimate

The pvarife() estimator has a bias of order \(O(1/\sqrt{TC})\) documented in Theorem 2.3 of Tuğan (2021). Setting bias_correct = TRUE in compute_irf() subtracts the analytical bias from \(\hat\beta\) before computing the MA representation.

Call	Description
`compute_irf(fit)`	Fast uncorrected point estimate
`compute_irf(fit, bias_correct = TRUE)`	Bias-corrected point estimate
`irf_bands(fit)`	Full inference; median is implicitly bias-corrected

ir_unc <- compute_irf(fit, n_periods = 8)
ir_bc  <- compute_irf(fit, n_periods = 8, bias_correct = TRUE)
# ir_bc is close to the median returned by irf_bands()

7 Asymptotic Inference

avar <- asymptotic_var(fit)
se   <- sqrt(diag(avar$variance))

# Bias-corrected 95% confidence intervals for each element of beta
beta_hat   <- as.numeric(fit$beta)
beta_biasC <- beta_hat - avar$bias
ci_lower   <- beta_biasC - 1.96 * se
ci_upper   <- beta_biasC + 1.96 * se

data.frame(
  parameter = names(coef(fit)),
  estimate  = round(beta_hat, 4),
  std_err   = round(se, 4),
  ci_lower  = round(ci_lower, 4),
  ci_upper  = round(ci_upper, 4)
)

8 Working with Unbalanced Panels

Simply set missing cells to NA before calling pvarife():

y_unbal <- sim$y
# Randomly set 15% of unit-time observations to NA
set.seed(10)
mask <- array(runif(prod(dim(y_unbal))) < 0.15, dim = dim(y_unbal))
y_unbal[mask] <- NA

fit_unbal <- pvarife(y_unbal, n_lags = 1, n_factors = 1, n_out = 10, n_in = 5)
print(fit_unbal)

The EM step in the inner loop imputes missing observations using the current factor estimates, consistent with the approach of Bai (2009).

9 Tuning Advice

Parameter	Recommended	Notes
`n_out`	50	Increase if coefficients have not converged
`n_in`	10	Increase for more precise factor extraction
`n_factors`	Use IC criteria	See paper Section 3 for selection
`n_draw`	500	For `irf_bands()`; reduce for testing
`n_boot`	200	For `bootstrap_irf_bands()`; very slow

To check convergence, compare results with n_out = 50 and n_out = 100.

10 Replicating the Liaqat (2019) Application

The paper’s empirical application uses panel data on government debt and capital formation from Liaqat (2019). The dataset is not bundled with this package due to redistribution restrictions, but the replication code is available in the MATLAB package at documentforrepl/replication package/Economic Application/.

The R workflow would be:

Load the Liaqat dataset and compute first-differenced variables
Set n_lags = 1, n_factors = 2 (for Panel B of Figure 1, rmax = 2)
Call pvarife(), compute_irf(), and irf_bands() with n_draw = 500
Normalize IRFs by dividing by ir[1, 1] (shock to first variable at horizon 1)

References

Tuğan, M. (2021). Panel VAR models with interactive fixed effects. Econometrics Journal, 24, 225–246. https://doi.org/10.1093/ectj/utaa021
Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica, 77(4), 1229–1279.
Liaqat, Z. (2019). Does public debt cause crowding out? A cross-country empirical evidence. Economics Letters, 181, 1–4.