Title: | Econometrics of Network Data |
---|---|
Description: | Simulating and estimating peer effect models and network formation models. The class of peer effect models includes linear-in-means models (Lee, 2004; <doi:10.1111/j.1468-0262.2004.00558.x>), Tobit models (Xu and Lee, 2015; <doi:10.1016/j.jeconom.2015.05.004>), and discrete numerical data models (Houndetoungan, 2024; <doi:10.2139/ssrn.3721250>). The network formation models include pair-wise regressions with degree heterogeneity (Graham, 2017; <doi:10.3982/ECTA12679>) and exponential random graph models (Mele, 2017; <doi:10.3982/ECTA10400>). |
Authors: | Aristide Houndetoungan [cre, aut] |
Maintainer: | Aristide Houndetoungan <[email protected]> |
License: | GPL-3 |
Version: | 2.2.1 |
Built: | 2025-02-01 02:41:57 UTC |
Source: | https://github.com/ahoundetoungan/cdatanet |
The CDatanet package simulates and estimates peer effect models and network formation models. The peer effect models include linear-in-means models (Lee, 2004; Lee et al., 2010),
Tobit models (Xu and Lee, 2015), and discrete numerical data models (Houndetoungan, 2024).
The network formation models include pairwise regressions with degree heterogeneity (Graham, 2017; Yan et al., 2019) and exponential random graph models (Mele, 2017).
To enhance computation speed, CDatanet uses C++
via the Rcpp package (Eddelbuettel et al., 2011).
Maintainer: Aristide Houndetoungan [email protected]
Eddelbuettel, D., & Francois, R. (2011). Rcpp: Seamless R and C++
integration. Journal of Statistical Software, 40(8), 1-18, doi:10.18637/jss.v040.i08.
Houndetoungan, E. A. (2024). Count Data Models with Heterogeneous Peer Effects. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72(6), 1899-1925, doi:10.1111/j.1468-0262.2004.00558.x.
Lee, L. F., Liu, X., & Lin, X. (2010). Specification and estimation of social interaction models with network structures. The Econometrics Journal, 13(2), 145-176, doi:10.1111/j.1368-423X.2010.00310.x
Xu, X., & Lee, L. F. (2015). Maximum likelihood estimation of a spatial autoregressive Tobit model. Journal of Econometrics, 188(1), 264-280, doi:10.1016/j.jeconom.2015.05.004.
Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica, 85(4), 1033-1063, doi:10.3982/ECTA12679.
Mele, A. (2017). A structural model of dense network formation. Econometrica, 85(3), 825-850, doi:10.3982/ECTA10400.
Yan, T., Jiang, B., Fienberg, S. E., & Leng, C. (2019). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association, 114(526), 857-868, doi:10.1080/01621459.2018.1448829.
Useful links:
Report bugs at https://github.com/ahoundetoungan/CDatanet/issues
cdnet
estimates count data models with social interactions under rational expectations using the NPL algorithm (see Houndetoungan, 2024).
cdnet( formula, Glist, group, Rmax, Rbar, starting = list(lambda = NULL, Gamma = NULL, delta = NULL), Ey0 = NULL, ubslambda = 1L, optimizer = "fastlbfgs", npl.ctr = list(), opt.ctr = list(), cov = TRUE, data )
cdnet( formula, Glist, group, Rmax, Rbar, starting = list(lambda = NULL, Gamma = NULL, delta = NULL), Ey0 = NULL, ubslambda = 1L, optimizer = "fastlbfgs", npl.ctr = list(), opt.ctr = list(), cov = TRUE, data )
formula |
a class object formula: a symbolic description of the model. The |
Glist |
adjacency matrix. For networks consisting of multiple subnets (e.g., schools), |
group |
a vector indicating the individual groups. The default assumes a common group. For two groups, i.e., |
Rmax |
an integer indicating the theoretical upper bound of |
Rbar |
an |
starting |
(optional) a starting value for |
Ey0 |
(optional) a starting value for |
ubslambda |
a positive value indicating the upper bound of |
optimizer |
specifies the optimization method, which can be one of: |
npl.ctr |
a list of controls for the NPL method (see details). |
opt.ctr |
a list of arguments to be passed to |
cov |
a Boolean indicating whether the covariance should be computed. |
data |
an optional data frame, list, or environment (or an object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in |
The count variable takes the value
with probability.
In this equation, is a vector of control variables;
is the distribution function of the standard normal distribution;
is the average of
among peers using the
s
-th network definition;
is the
r
-th cut-point in the cost group .
The following identification conditions have been introduced: ,
,
, and
for any
. The last condition implies that
for any
.
For any
, the distance between two cut-points is
.
As the number of cut-points can be large, a quadratic cost function is considered for
, where
.
With the semi-parametric cost function,
.
The model parameters are: ,
, and
,
where
for
.
The number of single parameters in
depends on
and
. The components
or/and
must be removed in certain cases.
If , then
.
If (binary models), then
must be empty.
If , then
.
npl.ctr
The model parameters are estimated using the Nested Partial Likelihood (NPL) method. This approach
begins with an initial guess for and
and iteratively refines them.
The solution converges when the
-distance between two consecutive estimates of
and
is smaller than a specified tolerance.
The argument npl.ctr
must include the following parameters:
the tolerance level for the NPL algorithm (default is 1e-4).
the maximum number of iterations allowed (default is 500).
a boolean value indicating whether the estimates should be printed at each step.
the number of simulations performed to compute the integral in the covariance using importance sampling.
A list consisting of:
info |
a list containing general information about the model. |
estimate |
the NPL estimator. |
Ey |
|
GEy |
the average of |
cov |
a list that includes (if |
details |
step-by-step output returned by the optimizer. |
Houndetoungan, A. (2024). Count Data Models with Heterogeneous Peer Effects. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.
set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) n <- sum(nvec) # Adjacency matrix A <- list() for (m in 1:M) { nm <- nvec[m] Am <- matrix(0, nm, nm) max_d <- 30 #maximum number of friends for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Am[i, tmp] <- 1 } A[[m]] <- Am } Anorm <- norm.network(A) #Row-normalization # X X <- cbind(rnorm(n, 1, 3), rexp(n, 0.4)) # Two group: group <- 1*(X[,1] > 0.95) # Networks # length(group) = 2 and unique(sort(group)) = c(0, 1) # The networks must be defined as to capture: # peer effects of `0` on `0`, peer effects of `1` on `0` # peer effects of `0` on `1`, and peer effects of `1` on `1` G <- list() cums <- c(0, cumsum(nvec)) for (m in 1:M) { tp <- group[(cums[m] + 1):(cums[m + 1])] Am <- A[[m]] G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)), Am * ((1 - tp) %*% t(tp)), Am * (tp %*% t(1 - tp)), Am * (tp %*% t(tp)))) } # Parameters lambda <- c(0.2, 0.3, -0.15, 0.25) Gamma <- c(4.5, 2.2, -0.9, 1.5, -1.2) delta <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) # Data data <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) = c("x1", "x2", "gx1", "gx2") ytmp <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), lambda = lambda, Gamma = Gamma, delta = delta, group = group, data = data) y <- ytmp$y hist(y, breaks = max(y) + 1) table(y) # Estimation est <- cdnet(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), group = group, optimizer = "fastlbfgs", data = data, opt.ctr = list(maxit = 5e3, eps_f = 1e-11, eps_g = 1e-11)) summary(est)
set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) n <- sum(nvec) # Adjacency matrix A <- list() for (m in 1:M) { nm <- nvec[m] Am <- matrix(0, nm, nm) max_d <- 30 #maximum number of friends for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Am[i, tmp] <- 1 } A[[m]] <- Am } Anorm <- norm.network(A) #Row-normalization # X X <- cbind(rnorm(n, 1, 3), rexp(n, 0.4)) # Two group: group <- 1*(X[,1] > 0.95) # Networks # length(group) = 2 and unique(sort(group)) = c(0, 1) # The networks must be defined as to capture: # peer effects of `0` on `0`, peer effects of `1` on `0` # peer effects of `0` on `1`, and peer effects of `1` on `1` G <- list() cums <- c(0, cumsum(nvec)) for (m in 1:M) { tp <- group[(cums[m] + 1):(cums[m + 1])] Am <- A[[m]] G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)), Am * ((1 - tp) %*% t(tp)), Am * (tp %*% t(1 - tp)), Am * (tp %*% t(tp)))) } # Parameters lambda <- c(0.2, 0.3, -0.15, 0.25) Gamma <- c(4.5, 2.2, -0.9, 1.5, -1.2) delta <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) # Data data <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) = c("x1", "x2", "gx1", "gx2") ytmp <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), lambda = lambda, Gamma = Gamma, delta = delta, group = group, data = data) y <- ytmp$y hist(y, breaks = max(y) + 1) table(y) # Estimation est <- cdnet(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), group = group, optimizer = "fastlbfgs", data = data, opt.ctr = list(maxit = 5e3, eps_f = 1e-11, eps_g = 1e-11)) summary(est)
homophili.data
converts the matrix of explanatory variables between directed network models and symmetric network models.
homophili.data(data, nvec, to = c("lower", "upper", "symmetric"))
homophili.data(data, nvec, to = c("lower", "upper", "symmetric"))
data |
A |
nvec |
A vector of the number of individuals in the networks. |
to |
Indicates the direction of the conversion. For a matrix of explanatory variables |
The transformed data.frame
.
homophily.fe
implements a Logit estimator for a network formation model with homophily. The model includes degree heterogeneity using fixed effects (see details).
homophily.fe( network, formula, data, symmetry = FALSE, fe.way = 1, init = NULL, method = c("L-BFGS", "Block-NRaphson", "Mix"), ctr = list(maxit.opt = 10000, maxit.nr = 50, eps_f = 1e-09, eps_g = 1e-09, tol = 1e-04), print = TRUE )
homophily.fe( network, formula, data, symmetry = FALSE, fe.way = 1, init = NULL, method = c("L-BFGS", "Block-NRaphson", "Mix"), ctr = list(maxit.opt = 10000, maxit.nr = 50, eps_f = 1e-09, eps_g = 1e-09, tol = 1e-04), print = TRUE )
network |
A matrix or list of sub-matrices of social interactions containing 0 and 1, where links are represented by 1. |
formula |
An object of class formula: a symbolic description of the model. The |
data |
An optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables
in the model. If not found in data, the variables are taken from |
symmetry |
Indicates whether the network model is symmetric (see details). |
fe.way |
Indicates whether it is a one-way or two-way fixed effect model. The expected value is 1 or 2 (see details). |
init |
(optional) Either a list of starting values containing |
method |
A character string specifying the optimization method. Expected values are |
ctr |
(optional) A list containing control parameters for the solver. For the |
print |
A boolean indicating if the estimation progression should be printed. |
Let be the probability for a link to go from individual
to individual
.
This probability is specified for two-way effect models (
fe.way = 2
) as
where is the cumulative distribution function of the standard logistic distribution. Unobserved degree heterogeneity is captured by
and
. These are treated as fixed effects (see
homophily.re
for random effect models).
As shown by Yan et al. (2019), the estimator of the parameter is biased. A bias correction is necessary but not implemented in this version. However,
the estimators of
and
are consistent.
For one-way fixed effect models (fe.way = 1
), . For symmetric models, the network is not directed, and the fixed effects need to be one-way.
A list consisting of:
model.info |
A list of model information, such as the type of fixed effects, whether the model is symmetric, the number of observations, etc. |
estimate |
The maximizer of the log-likelihood. |
loglike |
The maximized log-likelihood. |
optim |
The returned value from the optimization solver, which contains details of the optimization. The solver used is |
init |
The returned list of starting values. |
loglike.init |
The log-likelihood at the starting values. |
Yan, T., Jiang, B., Fienberg, S. E., & Leng, C. (2019). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association, 114(526), 857-868, doi:10.1080/01621459.2018.1448829.
set.seed(1234) M <- 2 # Number of sub-groups nvec <- round(runif(M, 20, 50)) beta <- c(.1, -.1) Glist <- list() dX <- matrix(0, 0, 2) mu <- list() nu <- list() Emunu <- runif(M, -1.5, 0) # Expectation of mu + nu smu2 <- 0.2 snu2 <- 0.2 for (m in 1:M) { n <- nvec[m] mum <- rnorm(n, 0.7*Emunu[m], smu2) num <- rnorm(n, 0.3*Emunu[m], snu2) X1 <- rnorm(n, 0, 1) X2 <- rbinom(n, 1, 0.2) Z1 <- matrix(0, n, n) Z2 <- matrix(0, n, n) for (i in 1:n) { for (j in 1:n) { Z1[i, j] <- abs(X1[i] - X1[j]) Z2[i, j] <- 1*(X2[i] == X2[j]) } } Gm <- 1*((Z1*beta[1] + Z2*beta[2] + kronecker(mum, t(num), "+") + rlogis(n^2)) > 0) diag(Gm) <- 0 diag(Z1) <- NA diag(Z2) <- NA Z1 <- Z1[!is.na(Z1)] Z2 <- Z2[!is.na(Z2)] dX <- rbind(dX, cbind(Z1, Z2)) Glist[[m]] <- Gm mu[[m]] <- mum nu[[m]] <- num } mu <- unlist(mu) nu <- unlist(nu) out <- homophily.fe(network = Glist, formula = ~ -1 + dX, fe.way = 2) muhat <- out$estimate$mu nuhat <- out$estimate$nu plot(mu, muhat) plot(nu, nuhat)
set.seed(1234) M <- 2 # Number of sub-groups nvec <- round(runif(M, 20, 50)) beta <- c(.1, -.1) Glist <- list() dX <- matrix(0, 0, 2) mu <- list() nu <- list() Emunu <- runif(M, -1.5, 0) # Expectation of mu + nu smu2 <- 0.2 snu2 <- 0.2 for (m in 1:M) { n <- nvec[m] mum <- rnorm(n, 0.7*Emunu[m], smu2) num <- rnorm(n, 0.3*Emunu[m], snu2) X1 <- rnorm(n, 0, 1) X2 <- rbinom(n, 1, 0.2) Z1 <- matrix(0, n, n) Z2 <- matrix(0, n, n) for (i in 1:n) { for (j in 1:n) { Z1[i, j] <- abs(X1[i] - X1[j]) Z2[i, j] <- 1*(X2[i] == X2[j]) } } Gm <- 1*((Z1*beta[1] + Z2*beta[2] + kronecker(mum, t(num), "+") + rlogis(n^2)) > 0) diag(Gm) <- 0 diag(Z1) <- NA diag(Z2) <- NA Z1 <- Z1[!is.na(Z1)] Z2 <- Z2[!is.na(Z2)] dX <- rbind(dX, cbind(Z1, Z2)) Glist[[m]] <- Gm mu[[m]] <- mum nu[[m]] <- num } mu <- unlist(mu) nu <- unlist(nu) out <- homophily.fe(network = Glist, formula = ~ -1 + dX, fe.way = 2) muhat <- out$estimate$mu nuhat <- out$estimate$nu plot(mu, muhat) plot(nu, nuhat)
homophily.re
implements a Bayesian Probit estimator for network formation model with homophily. The model includes degree heterogeneity using random effects (see details).
homophily.re( network, formula, data, symmetry = FALSE, group.fe = FALSE, re.way = 1, init = list(), iteration = 1000, print = TRUE )
homophily.re( network, formula, data, symmetry = FALSE, group.fe = FALSE, re.way = 1, init = list(), iteration = 1000, print = TRUE )
network |
matrix or list of sub-matrix of social interactions containing 0 and 1, where links are represented by 1. |
formula |
an object of class formula: a symbolic description of the model. The |
data |
an optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables
in the model. If not found in data, the variables are taken from |
symmetry |
indicates whether the network model is symmetric (see details). |
group.fe |
indicates whether the model includes group fixed effects. |
re.way |
indicates whether it is a one-way or two-way random effect model. The expected value is 1 or 2 (see details). |
init |
(optional) list of starting values containing |
iteration |
the number of iterations to be performed. |
print |
boolean indicating if the estimation progression should be printed. |
Let be a probability for a link to go from the individual
to the individual
.
This probability is specified for two-way effect models (
re.way = 2
) as
where is the cumulative of the standard normal distribution. Unobserved degree heterogeneity is captured by
and
. The latter are treated as random effects (see
homophily.fe
for fixed effect models).
For one-way random effect models (re.way = 1
), . For symmetric models, the network is not directed and the
random effects need to be one way.
A list consisting of:
model.info |
list of model information, such as the type of random effects, whether the model is symmetric, number of observations, etc. |
posterior |
list of simulations from the posterior distribution. |
init |
returned list of starting values. |
set.seed(1234) library(MASS) M <- 4 # Number of sub-groups nvec <- round(runif(M, 100, 500)) beta <- c(.1, -.1) Glist <- list() dX <- matrix(0, 0, 2) mu <- list() nu <- list() cst <- runif(M, -1.5, 0) smu2 <- 0.2 snu2 <- 0.2 rho <- 0.8 Smunu <- matrix(c(smu2, rho*sqrt(smu2*snu2), rho*sqrt(smu2*snu2), snu2), 2) for (m in 1:M) { n <- nvec[m] tmp <- mvrnorm(n, c(0, 0), Smunu) mum <- tmp[,1] - mean(tmp[,1]) num <- tmp[,2] - mean(tmp[,2]) X1 <- rnorm(n, 0, 1) X2 <- rbinom(n, 1, 0.2) Z1 <- matrix(0, n, n) Z2 <- matrix(0, n, n) for (i in 1:n) { for (j in 1:n) { Z1[i, j] <- abs(X1[i] - X1[j]) Z2[i, j] <- 1*(X2[i] == X2[j]) } } Gm <- 1*((cst[m] + Z1*beta[1] + Z2*beta[2] + kronecker(mum, t(num), "+") + rnorm(n^2)) > 0) diag(Gm) <- 0 diag(Z1) <- NA diag(Z2) <- NA Z1 <- Z1[!is.na(Z1)] Z2 <- Z2[!is.na(Z2)] dX <- rbind(dX, cbind(Z1, Z2)) Glist[[m]] <- Gm mu[[m]] <- mum nu[[m]] <- num } mu <- unlist(mu) nu <- unlist(nu) out <- homophily.re(network = Glist, formula = ~ dX, group.fe = TRUE, re.way = 2, iteration = 1e3) # plot simulations plot(out$posterior$beta[,1], type = "l") abline(h = cst[1], col = "red") plot(out$posterior$beta[,2], type = "l") abline(h = cst[2], col = "red") plot(out$posterior$beta[,3], type = "l") abline(h = cst[3], col = "red") plot(out$posterior$beta[,4], type = "l") abline(h = cst[4], col = "red") plot(out$posterior$beta[,5], type = "l") abline(h = beta[1], col = "red") plot(out$posterior$beta[,6], type = "l") abline(h = beta[2], col = "red") plot(out$posterior$sigma2_mu, type = "l") abline(h = smu2, col = "red") plot(out$posterior$sigma2_nu, type = "l") abline(h = snu2, col = "red") plot(out$posterior$rho, type = "l") abline(h = rho, col = "red") i <- 10 plot(out$posterior$mu[,i], type = "l") abline(h = mu[i], col = "red") plot(out$posterior$nu[,i], type = "l") abline(h = nu[i], col = "red")
set.seed(1234) library(MASS) M <- 4 # Number of sub-groups nvec <- round(runif(M, 100, 500)) beta <- c(.1, -.1) Glist <- list() dX <- matrix(0, 0, 2) mu <- list() nu <- list() cst <- runif(M, -1.5, 0) smu2 <- 0.2 snu2 <- 0.2 rho <- 0.8 Smunu <- matrix(c(smu2, rho*sqrt(smu2*snu2), rho*sqrt(smu2*snu2), snu2), 2) for (m in 1:M) { n <- nvec[m] tmp <- mvrnorm(n, c(0, 0), Smunu) mum <- tmp[,1] - mean(tmp[,1]) num <- tmp[,2] - mean(tmp[,2]) X1 <- rnorm(n, 0, 1) X2 <- rbinom(n, 1, 0.2) Z1 <- matrix(0, n, n) Z2 <- matrix(0, n, n) for (i in 1:n) { for (j in 1:n) { Z1[i, j] <- abs(X1[i] - X1[j]) Z2[i, j] <- 1*(X2[i] == X2[j]) } } Gm <- 1*((cst[m] + Z1*beta[1] + Z2*beta[2] + kronecker(mum, t(num), "+") + rnorm(n^2)) > 0) diag(Gm) <- 0 diag(Z1) <- NA diag(Z2) <- NA Z1 <- Z1[!is.na(Z1)] Z2 <- Z2[!is.na(Z2)] dX <- rbind(dX, cbind(Z1, Z2)) Glist[[m]] <- Gm mu[[m]] <- mum nu[[m]] <- num } mu <- unlist(mu) nu <- unlist(nu) out <- homophily.re(network = Glist, formula = ~ dX, group.fe = TRUE, re.way = 2, iteration = 1e3) # plot simulations plot(out$posterior$beta[,1], type = "l") abline(h = cst[1], col = "red") plot(out$posterior$beta[,2], type = "l") abline(h = cst[2], col = "red") plot(out$posterior$beta[,3], type = "l") abline(h = cst[3], col = "red") plot(out$posterior$beta[,4], type = "l") abline(h = cst[4], col = "red") plot(out$posterior$beta[,5], type = "l") abline(h = beta[1], col = "red") plot(out$posterior$beta[,6], type = "l") abline(h = beta[2], col = "red") plot(out$posterior$sigma2_mu, type = "l") abline(h = smu2, col = "red") plot(out$posterior$sigma2_nu, type = "l") abline(h = snu2, col = "red") plot(out$posterior$rho, type = "l") abline(h = rho, col = "red") i <- 10 plot(out$posterior$mu[,i], type = "l") abline(h = mu[i], col = "red") plot(out$posterior$nu[,i], type = "l") abline(h = nu[i], col = "red")
The vec.to.mat
function creates a list of square matrices from a given vector.
Elements of the generated matrices are taken from the vector and placed column-wise or row-wise, progressing from the first matrix in the list to the last.
The diagonals of the generated matrices are set to zeros.
The mat.to.vec
function creates a vector from a given list of square matrices.
Elements of the generated vector are taken column-wise or row-wise, starting from the first matrix in the list to the last, excluding diagonal entries.
The norm.network
function row-normalizes matrices in a given list.
norm.network(W) vec.to.mat(u, N, normalise = FALSE, byrow = FALSE) mat.to.vec(W, ceiled = FALSE, byrow = FALSE)
norm.network(W) vec.to.mat(u, N, normalise = FALSE, byrow = FALSE) mat.to.vec(W, ceiled = FALSE, byrow = FALSE)
W |
A matrix or list of matrices to convert. |
u |
A numeric vector to convert. |
N |
A vector of sub-network sizes such that |
normalise |
A boolean indicating whether the returned matrices should be row-normalized ( |
byrow |
A boolean indicating whether entries in the matrices should be taken by row ( |
ceiled |
A boolean indicating whether the given matrices should be ceiled before conversion ( |
A vector of size sum(N * (N - 1))
or a list of length(N)
square matrices, with matrix sizes determined by N[1], N[2], ...
.
# Generate a list of adjacency matrices ## Sub-network sizes N <- c(250, 370, 120) ## Rate of friendship p <- c(0.2, 0.15, 0.18) ## Network data u <- unlist(lapply(1:3, function(x) rbinom(N[x] * (N[x] - 1), 1, p[x]))) W <- vec.to.mat(u, N) # Convert W into a list of row-normalized matrices G <- norm.network(W) # Recover u v <- mat.to.vec(G, ceiled = TRUE) all.equal(u, v)
# Generate a list of adjacency matrices ## Sub-network sizes N <- c(250, 370, 120) ## Rate of friendship p <- c(0.2, 0.15, 0.18) ## Network data u <- unlist(lapply(1:3, function(x) rbinom(N[x] * (N[x] - 1), 1, p[x]))) W <- vec.to.mat(u, N) # Convert W into a list of row-normalized matrices G <- norm.network(W) # Recover u v <- mat.to.vec(G, ceiled = TRUE) all.equal(u, v)
The peer.avg
function computes peer average values using network data (provided as a list of adjacency matrices) and observable characteristics.
peer.avg(Glist, V, export.as.list = FALSE)
peer.avg(Glist, V, export.as.list = FALSE)
Glist |
An adjacency matrix or a list of sub-adjacency matrices representing the network structure. |
V |
A vector or matrix of observable characteristics. |
export.as.list |
(optional) A boolean indicating whether the output should be a list of matrices ( |
The matrix product diag(Glist[[1]], Glist[[2]], ...) %*% V
, where diag()
represents the block diagonal operator.
# Generate a list of adjacency matrices ## Sub-network sizes N <- c(250, 370, 120) ## Rate of friendship p <- c(0.2, 0.15, 0.18) ## Network data u <- unlist(lapply(1:3, function(x) rbinom(N[x] * (N[x] - 1), 1, p[x]))) G <- vec.to.mat(u, N, normalise = TRUE) # Generate a vector y y <- rnorm(sum(N)) # Compute G %*% y Gy <- peer.avg(Glist = G, V = y)
# Generate a list of adjacency matrices ## Sub-network sizes N <- c(250, 370, 120) ## Rate of friendship p <- c(0.2, 0.15, 0.18) ## Network data u <- unlist(lapply(1:3, function(x) rbinom(N[x] * (N[x] - 1), 1, p[x]))) G <- vec.to.mat(u, N, normalise = TRUE) # Generate a vector y y <- rnorm(sum(N)) # Compute G %*% y Gy <- peer.avg(Glist = G, V = y)
Summary and print methods for the class simcdEy
as returned by the function simcdEy.
## S3 method for class 'simcdEy' print(x, ...) ## S3 method for class 'simcdEy' summary(object, ...) ## S3 method for class 'summary.simcdEy' print(x, ...)
## S3 method for class 'simcdEy' print(x, ...) ## S3 method for class 'simcdEy' summary(object, ...) ## S3 method for class 'summary.simcdEy' print(x, ...)
x |
an object of class |
... |
further arguments passed to or from other methods. |
object |
an object of class |
A list of the same objects in object
.
The remove.ids
function removes identifiers with missing values (NA) from adjacency matrices in an optimal way.
Multiple combinations of rows and columns can be deleted to eliminate NAs, but this function ensures that the smallest
number of rows and columns are removed to retain as much data as possible.
remove.ids(network, ncores = 1L)
remove.ids(network, ncores = 1L)
network |
A list of adjacency matrices to process. |
ncores |
The number of cores to use for parallel computation. |
A list containing:
A list of adjacency matrices without missing values.
A list of vectors indicating the indices of retained rows and columns for each matrix.
# Example 1: Small adjacency matrix A <- matrix(1:25, 5) A[1, 1] <- NA A[4, 2] <- NA remove.ids(A) # Example 2: Larger adjacency matrix with multiple NAs B <- matrix(1:100, 10) B[1, 1] <- NA B[4, 2] <- NA B[2, 4] <- NA B[, 8] <- NA remove.ids(B)
# Example 1: Small adjacency matrix A <- matrix(1:25, 5) A[1, 1] <- NA A[4, 2] <- NA remove.ids(A) # Example 2: Larger adjacency matrix with multiple NAs B <- matrix(1:100, 10) B[1, 1] <- NA B[4, 2] <- NA B[2, 4] <- NA B[, 8] <- NA remove.ids(B)
sar
computes quasi-maximum likelihood estimators for linear-in-mean models with social interactions (see Lee, 2004 and Lee et al., 2010).
sar( formula, Glist, lambda0 = NULL, fixed.effects = FALSE, optimizer = "optim", opt.ctr = list(), print = TRUE, cov = TRUE, cinfo = TRUE, data )
sar( formula, Glist, lambda0 = NULL, fixed.effects = FALSE, optimizer = "optim", opt.ctr = list(), print = TRUE, cov = TRUE, cinfo = TRUE, data )
formula |
a class object formula: a symbolic description of the model. |
Glist |
The network matrix. For networks consisting of multiple subnets, |
lambda0 |
an optional starting value of |
fixed.effects |
a Boolean indicating whether group heterogeneity must be included as fixed effects. |
optimizer |
is either |
opt.ctr |
list of arguments of nlm or optim (the one set in |
print |
a Boolean indicating if the estimate should be printed at each step. |
cov |
a Boolean indicating if the covariance should be computed. |
cinfo |
a Boolean indicating whether information is complete ( |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables
in the model. If not found in data, the variables are taken from |
In the complete information model, the outcome for individual
is defined as:
where represents the average outcome
among individual
's peers,
is a vector of control variables, and
is the error term.
In the case of incomplete information models with rational expectations, the outcome
is defined as:
where is the expected average outcome of
's peers, as perceived by individual
.
A list consisting of:
info |
list of general information on the model. |
estimate |
Maximum Likelihood (ML) estimator. |
cov |
covariance matrix of the estimate. |
details |
outputs as returned by the optimizer. |
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72(6), 1899-1925, doi:10.1111/j.1468-0262.2004.00558.x.
Lee, L. F., Liu, X., & Lin, X. (2010). Specification and estimation of social interaction models with network structures. The Econometrics Journal, 13(2), 145-176, doi:10.1111/j.1368-423X.2010.00310.x
# Groups' size set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 1000)) n <- sum(nvec) # Parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # X X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Network G <- list() for (m in 1:M) { nm <- nvec[m] Gm <- matrix(0, nm, nm) max_d <- 30 for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Gm[i, tmp] <- 1 } rs <- rowSums(Gm); rs[rs == 0] <- 1 Gm <- Gm/rs G[[m]] <- Gm } # data data <- data.frame(X, peer.avg(G, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") ytmp <- simsar(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data) data$y <- ytmp$y out <- sar(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, optimizer = "optim", data = data) summary(out)
# Groups' size set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 1000)) n <- sum(nvec) # Parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # X X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Network G <- list() for (m in 1:M) { nm <- nvec[m] Gm <- matrix(0, nm, nm) max_d <- 30 for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Gm[i, tmp] <- 1 } rs <- rowSums(Gm); rs[rs == 0] <- 1 Gm <- Gm/rs G[[m]] <- Gm } # data data <- data.frame(X, peer.avg(G, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") ytmp <- simsar(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data) data$y <- ytmp$y out <- sar(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, optimizer = "optim", data = data) summary(out)
sart
estimates Tobit models with social interactions based on the framework of Xu and Lee (2015).
The method allows for modeling both complete and incomplete information scenarios in networks, incorporating rational expectations in the latter case.
sart( formula, Glist, starting = NULL, Ey0 = NULL, optimizer = "fastlbfgs", npl.ctr = list(), opt.ctr = list(), cov = TRUE, cinfo = TRUE, data )
sart( formula, Glist, starting = NULL, Ey0 = NULL, optimizer = "fastlbfgs", npl.ctr = list(), opt.ctr = list(), cov = TRUE, cinfo = TRUE, data )
formula |
An object of class formula: a symbolic description of the model. The formula must follow the structure,
e.g., |
Glist |
The network matrix. For networks consisting of multiple subnets, |
starting |
(Optional) A vector of starting values for
|
Ey0 |
(Optional) A starting value for |
optimizer |
The optimization method to be used. Choices are:
Additional arguments for these functions, such as |
npl.ctr |
A list of controls for the NPL (Nested Pseudo-Likelihood) method (refer to the details in |
opt.ctr |
A list of arguments to be passed to the chosen solver ( |
cov |
A Boolean indicating whether to compute the covariance matrix ( |
cinfo |
A Boolean indicating whether the information structure is complete ( |
data |
An optional data frame, list, or environment (or object coercible by as.data.frame) containing the variables
in the model. If not found in |
For a complete information model, the outcome is defined as:
where is the average of
among peers,
is a vector of control variables,
and
.
In the case of incomplete information models with rational expectations, is defined as:
A list containing:
info
General information about the model.
estimate
The Maximum Likelihood (ML) estimates of the parameters.
Ey
, the expected values of the endogenous variable.
GEy
The average of among peers.
cov
A list including covariance matrices (if cov = TRUE
).
details
Additional outputs returned by the optimizer.
Xu, X., & Lee, L. F. (2015). Maximum likelihood estimation of a spatial autoregressive Tobit model. Journal of Econometrics, 188(1), 264-280, doi:10.1016/j.jeconom.2015.05.004.
# Group sizes set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) n <- sum(nvec) # Parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # Covariates (X) X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Network creation G <- list() for (m in 1:M) { nm <- nvec[m] Gm <- matrix(0, nm, nm) max_d <- 30 for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Gm[i, tmp] <- 1 } rs <- rowSums(Gm); rs[rs == 0] <- 1 Gm <- Gm / rs G[[m]] <- Gm } # Data creation data <- data.frame(X, peer.avg(G, cbind(x1 = X[, 1], x2 = X[, 2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") ## Complete information game ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = TRUE) data$yc <- ytmp$y ## Incomplete information game ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = FALSE) data$yi <- ytmp$y # Complete information estimation for yc outc1 <- sart(formula = yc ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = TRUE) summary(outc1) # Complete information estimation for yi outc1 <- sart(formula = yi ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = TRUE) summary(outc1) # Incomplete information estimation for yc outi1 <- sart(formula = yc ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = FALSE) summary(outi1) # Incomplete information estimation for yi outi1 <- sart(formula = yi ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = FALSE) summary(outi1)
# Group sizes set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) n <- sum(nvec) # Parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # Covariates (X) X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Network creation G <- list() for (m in 1:M) { nm <- nvec[m] Gm <- matrix(0, nm, nm) max_d <- 30 for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Gm[i, tmp] <- 1 } rs <- rowSums(Gm); rs[rs == 0] <- 1 Gm <- Gm / rs G[[m]] <- Gm } # Data creation data <- data.frame(X, peer.avg(G, cbind(x1 = X[, 1], x2 = X[, 2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") ## Complete information game ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = TRUE) data$yc <- ytmp$y ## Incomplete information game ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = FALSE) data$yi <- ytmp$y # Complete information estimation for yc outc1 <- sart(formula = yc ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = TRUE) summary(outc1) # Complete information estimation for yi outc1 <- sart(formula = yi ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = TRUE) summary(outc1) # Incomplete information estimation for yc outi1 <- sart(formula = yc ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = FALSE) summary(outi1) # Incomplete information estimation for yi outi1 <- sart(formula = yi ~ x1 + x2 + gx1 + gx2, optimizer = "nlm", Glist = G, data = data, cinfo = FALSE) summary(outi1)
simcdpar
computes the average expected outcomes for count data models with social interactions and standard errors using the Delta method.
This function can be used to examine the effects of changes in the network or in the control variables.
simcdEy(object, Glist, data, group, tol = 1e-10, maxit = 500, S = 1000)
simcdEy(object, Glist, data, group, tol = 1e-10, maxit = 500, S = 1000)
object |
an object of class |
Glist |
adjacency matrix. For networks consisting of multiple subnets, |
data |
an optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from |
group |
the vector indicating the individual groups (see function |
tol |
the tolerance value used in the Fixed Point Iteration Method to compute the expectancy of |
maxit |
the maximal number of iterations in the Fixed Point Iteration Method. |
S |
number of simulations to be used to compute integral in the covariance by important sampling. |
A list consisting of:
Ey |
|
GEy |
the average of |
aEy |
the sampling mean of |
se.aEy |
the standard error of the sampling mean of |
simcdnet
simulates the count data model with social interactions under rational expectations developed by Houndetoungan (2024).
simcdnet( formula, group, Glist, parms, lambda, Gamma, delta, Rmax, Rbar, tol = 1e-10, maxit = 500, data )
simcdnet( formula, group, Glist, parms, lambda, Gamma, delta, Rmax, Rbar, tol = 1e-10, maxit = 500, data )
formula |
A class object of class formula: a symbolic description of the model. |
group |
A vector indicating the individual groups. By default, this assumes a common group. If there are 2 groups (i.e., |
Glist |
An adjacency matrix or list of adjacency matrices. For networks consisting of multiple subnets (e.g., schools), |
parms |
A vector defining the true values of |
lambda |
The true value of the vector |
Gamma |
The true value of the vector |
delta |
The true value of the vector |
Rmax |
An integer indicating the theoretical upper bound of |
Rbar |
An |
tol |
The tolerance value used in the Fixed Point Iteration Method to compute the expectancy of |
maxit |
The maximum number of iterations in the Fixed Point Iteration Method. |
data |
An optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in |
The count variable takes the value
with probability.
In this equation, is a vector of control variables;
is the distribution function of the standard normal distribution;
is the average of
among peers using the
s
-th network definition;
is the
r
-th cut-point in the cost group .
The following identification conditions have been introduced: ,
,
, and
for any
. The last condition implies that
for any
.
For any
, the distance between two cut-points is
.
As the number of cut-points can be large, a quadratic cost function is considered for
, where
.
With the semi-parametric cost function,
.
The model parameters are: ,
, and
,
where
for
.
The number of single parameters in
depends on
and
. The components
or/and
must be removed in certain cases.
If , then
.
If (binary models), then
must be empty.
If , then
.
A list consisting of:
yst |
|
y |
the observed count variable. |
Ey |
|
GEy |
the average of |
meff |
a list including average and individual marginal effects. |
Rmax |
infinite sums in the marginal effects are approximated by sums up to Rmax. |
iteration |
number of iterations performed by sub-network in the Fixed Point Iteration Method. |
Houndetoungan, A. (2024). Count Data Models with Heterogeneous Peer Effects. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.
set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) # Random group sizes n <- sum(nvec) # Total number of individuals # Adjacency matrix for each group A <- list() for (m in 1:M) { nm <- nvec[m] # Size of group m Am <- matrix(0, nm, nm) # Empty adjacency matrix max_d <- 30 # Maximum number of friends for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) # Sample friends Am[i, tmp] <- 1 # Set friendship links } A[[m]] <- Am # Add to the list } Anorm <- norm.network(A) # Row-normalization of the adjacency matrices # Covariates (X) X <- cbind(rnorm(n, 1, 3), rexp(n, 0.4)) # Random covariates # Two groups based on first covariate group <- 1 * (X[,1] > 0.95) # Assign to groups based on x1 # Networks: Define peer effects based on group membership # The networks should capture: # - Peer effects of `0` on `0` # - Peer effects of `1` on `0` # - Peer effects of `0` on `1` # - Peer effects of `1` on `1` G <- list() cums <- c(0, cumsum(nvec)) # Cumulative indices for groups for (m in 1:M) { tp <- group[(cums[m] + 1):(cums[m + 1])] # Group membership for group m Am <- A[[m]] # Adjacency matrix for group m # Define networks based on peer effects G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)), Am * ((1 - tp) %*% t(tp)), Am * (tp %*% t(1 - tp)), Am * (tp %*% t(tp)))) } # Parameters for the model lambda <- c(0.2, 0.3, -0.15, 0.25) Gamma <- c(4.5, 2.2, -0.9, 1.5, -1.2) delta <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) # Repeated values for delta # Prepare data for the model data <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) = c("x1", "x2", "gx1", "gx2") # Set column names # Simulate outcomes using the `simcdnet` function ytmp <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), lambda = lambda, Gamma = Gamma, delta = delta, group = group, data = data) y <- ytmp$y # Plot histogram of the simulated outcomes hist(y, breaks = max(y) + 1) # Display frequency table of the simulated outcomes table(y)
set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) # Random group sizes n <- sum(nvec) # Total number of individuals # Adjacency matrix for each group A <- list() for (m in 1:M) { nm <- nvec[m] # Size of group m Am <- matrix(0, nm, nm) # Empty adjacency matrix max_d <- 30 # Maximum number of friends for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) # Sample friends Am[i, tmp] <- 1 # Set friendship links } A[[m]] <- Am # Add to the list } Anorm <- norm.network(A) # Row-normalization of the adjacency matrices # Covariates (X) X <- cbind(rnorm(n, 1, 3), rexp(n, 0.4)) # Random covariates # Two groups based on first covariate group <- 1 * (X[,1] > 0.95) # Assign to groups based on x1 # Networks: Define peer effects based on group membership # The networks should capture: # - Peer effects of `0` on `0` # - Peer effects of `1` on `0` # - Peer effects of `0` on `1` # - Peer effects of `1` on `1` G <- list() cums <- c(0, cumsum(nvec)) # Cumulative indices for groups for (m in 1:M) { tp <- group[(cums[m] + 1):(cums[m + 1])] # Group membership for group m Am <- A[[m]] # Adjacency matrix for group m # Define networks based on peer effects G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)), Am * ((1 - tp) %*% t(tp)), Am * (tp %*% t(1 - tp)), Am * (tp %*% t(tp)))) } # Parameters for the model lambda <- c(0.2, 0.3, -0.15, 0.25) Gamma <- c(4.5, 2.2, -0.9, 1.5, -1.2) delta <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) # Repeated values for delta # Prepare data for the model data <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) = c("x1", "x2", "gx1", "gx2") # Set column names # Simulate outcomes using the `simcdnet` function ytmp <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), lambda = lambda, Gamma = Gamma, delta = delta, group = group, data = data) y <- ytmp$y # Plot histogram of the simulated outcomes hist(y, breaks = max(y) + 1) # Display frequency table of the simulated outcomes table(y)
simnetwork
generates adjacency matrices based on specified probabilities.
simnetwork(dnetwork, normalise = FALSE)
simnetwork(dnetwork, normalise = FALSE)
dnetwork |
A list of sub-network matrices, where the (i, j)-th position of the m-th matrix represents the probability that individual |
normalise |
A boolean indicating whether the returned matrices should be row-normalized ( |
A list of (row-normalized) adjacency matrices.
# Generate a list of adjacency matrices ## Sub-network sizes N <- c(250, 370, 120) ## Probability distributions dnetwork <- lapply(N, function(x) matrix(runif(x^2), x)) ## Generate networks G <- simnetwork(dnetwork)
# Generate a list of adjacency matrices ## Sub-network sizes N <- c(250, 370, 120) ## Probability distributions dnetwork <- lapply(N, function(x) matrix(runif(x^2), x)) ## Generate networks G <- simnetwork(dnetwork)
simsar
simulates continuous variables under linear-in-mean models with social interactions, following the specifications described
in Lee (2004) and Lee et al. (2010). The model incorporates peer interactions, where the value of an individual’s outcome depends
not only on their own characteristics but also on the average characteristics of their peers in the network.
simsar(formula, Glist, theta, cinfo = TRUE, data)
simsar(formula, Glist, theta, cinfo = TRUE, data)
formula |
A symbolic description of the model, passed as a class object of type formula.
The formula must specify the endogenous variable and control variables, for example:
|
Glist |
A list of network adjacency matrices representing multiple subnets. The |
theta |
A numeric vector defining the true values of the model parameters |
cinfo |
A Boolean flag indicating whether the information is complete ( |
data |
An optional data frame, list, or environment (or an object coercible by as.data.frame to a data frame) containing the variables in the model. If not provided, the variables are taken from the environment of the function call. |
In the complete information model, the outcome for individual
is defined as:
where represents the average outcome
among individual
's peers,
is a vector of control variables, and
is the error term.
In the case of incomplete information models with rational expectations, the outcome
is defined as:
where is the expected average outcome of
's peers, as perceived by individual
.
A list containing the following elements:
y |
the observed count data. |
Gy |
the average of y among friends. |
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72(6), 1899-1925, doi:10.1111/j.1468-0262.2004.00558.x.
Lee, L. F., Liu, X., & Lin, X. (2010). Specification and estimation of social interaction models with network structures. The Econometrics Journal, 13(2), 145-176, doi:10.1111/j.1368-423X.2010.00310.x
# Groups' size set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 1000)) n <- sum(nvec) # Parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # X X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Network G <- list() for (m in 1:M) { nm <- nvec[m] Gm <- matrix(0, nm, nm) max_d <- 30 for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Gm[i, tmp] <- 1 } rs <- rowSums(Gm); rs[rs == 0] <- 1 Gm <- Gm/rs G[[m]] <- Gm } # data data <- data.frame(X, peer.avg(G, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") ytmp <- simsar(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data) y <- ytmp$y
# Groups' size set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 1000)) n <- sum(nvec) # Parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # X X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Network G <- list() for (m in 1:M) { nm <- nvec[m] Gm <- matrix(0, nm, nm) max_d <- 30 for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) Gm[i, tmp] <- 1 } rs <- rowSums(Gm); rs[rs == 0] <- 1 Gm <- Gm/rs G[[m]] <- Gm } # data data <- data.frame(X, peer.avg(G, cbind(x1 = X[,1], x2 = X[,2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") ytmp <- simsar(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data) y <- ytmp$y
simsart
simulates censored data with social interactions (see Xu and Lee, 2015).
simsart(formula, Glist, theta, tol = 1e-15, maxit = 500, cinfo = TRUE, data)
simsart(formula, Glist, theta, tol = 1e-15, maxit = 500, cinfo = TRUE, data)
formula |
a class object |
Glist |
The network matrix. For networks consisting of multiple subnets, |
theta |
a vector defining the true value of |
tol |
the tolerance value used in the fixed-point iteration method to compute |
maxit |
the maximum number of iterations in the fixed-point iteration method. |
cinfo |
a Boolean indicating whether information is complete ( |
data |
an optional data frame, list, or environment (or object coercible by |
For a complete information model, the outcome is defined as:
where is the average of
among peers,
is a vector of control variables,
and
.
In the case of incomplete information models with rational expectations, is defined as:
A list consisting of:
, the latent variable.
The observed censored variable.
, the expected value of
.
The average of among peers.
The average of among peers.
A list including average and individual marginal effects.
The number of iterations performed per sub-network in the fixed-point iteration method.
Xu, X., & Lee, L. F. (2015). Maximum likelihood estimation of a spatial autoregressive Tobit model. Journal of Econometrics, 188(1), 264-280, doi:10.1016/j.jeconom.2015.05.004.
# Define group sizes set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) # Number of nodes per sub-group n <- sum(nvec) # Total number of nodes # Define parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # Generate covariates (X) X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Construct network adjacency matrices G <- list() for (m in 1:M) { nm <- nvec[m] # Nodes in sub-group m Gm <- matrix(0, nm, nm) # Initialize adjacency matrix max_d <- 30 # Maximum degree for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) # Random connections Gm[i, tmp] <- 1 } rs <- rowSums(Gm) # Normalize rows rs[rs == 0] <- 1 Gm <- Gm / rs G[[m]] <- Gm } # Prepare data data <- data.frame(X, peer.avg(G, cbind(x1 = X[, 1], x2 = X[, 2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") # Add column names # Complete information game simulation ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = TRUE) data$yc <- ytmp$y # Add simulated outcome to the dataset # Incomplete information game simulation ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = FALSE) data$yi <- ytmp$y # Add simulated outcome to the dataset
# Define group sizes set.seed(123) M <- 5 # Number of sub-groups nvec <- round(runif(M, 100, 200)) # Number of nodes per sub-group n <- sum(nvec) # Total number of nodes # Define parameters lambda <- 0.4 Gamma <- c(2, -1.9, 0.8, 1.5, -1.2) sigma <- 1.5 theta <- c(lambda, Gamma, sigma) # Generate covariates (X) X <- cbind(rnorm(n, 1, 1), rexp(n, 0.4)) # Construct network adjacency matrices G <- list() for (m in 1:M) { nm <- nvec[m] # Nodes in sub-group m Gm <- matrix(0, nm, nm) # Initialize adjacency matrix max_d <- 30 # Maximum degree for (i in 1:nm) { tmp <- sample((1:nm)[-i], sample(0:max_d, 1)) # Random connections Gm[i, tmp] <- 1 } rs <- rowSums(Gm) # Normalize rows rs[rs == 0] <- 1 Gm <- Gm / rs G[[m]] <- Gm } # Prepare data data <- data.frame(X, peer.avg(G, cbind(x1 = X[, 1], x2 = X[, 2]))) colnames(data) <- c("x1", "x2", "gx1", "gx2") # Add column names # Complete information game simulation ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = TRUE) data$yc <- ytmp$y # Add simulated outcome to the dataset # Incomplete information game simulation ytmp <- simsart(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, theta = theta, data = data, cinfo = FALSE) data$yi <- ytmp$y # Add simulated outcome to the dataset
Summary and print methods for the class cdnet
as returned by the function cdnet.
## S3 method for class 'cdnet' summary(object, Glist, data, S = 1000L, ...) ## S3 method for class 'summary.cdnet' print(x, ...) ## S3 method for class 'cdnet' print(x, ...)
## S3 method for class 'cdnet' summary(object, Glist, data, S = 1000L, ...) ## S3 method for class 'summary.cdnet' print(x, ...) ## S3 method for class 'cdnet' print(x, ...)
object |
an object of class |
Glist |
adjacency matrix. For networks consisting of multiple subnets, |
data |
an optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from |
S |
number of simulations to be used to compute integral in the covariance by important sampling. |
... |
further arguments passed to or from other methods. |
x |
an object of class |
A list of the same objects in object
.
Summary and print methods for the class sar
as returned by the function sar
.
## S3 method for class 'sar' summary(object, ...) ## S3 method for class 'summary.sar' print(x, ...) ## S3 method for class 'sar' print(x, ...)
## S3 method for class 'sar' summary(object, ...) ## S3 method for class 'summary.sar' print(x, ...) ## S3 method for class 'sar' print(x, ...)
object |
an object of class |
... |
further arguments passed to or from other methods. |
x |
an object of class |
A list of the same objects in object
.
Summary and print methods for the class sart
as returned by the function sart.
## S3 method for class 'sart' summary(object, Glist, data, ...) ## S3 method for class 'summary.sart' print(x, ...) ## S3 method for class 'sart' print(x, ...)
## S3 method for class 'sart' summary(object, Glist, data, ...) ## S3 method for class 'summary.sart' print(x, ...) ## S3 method for class 'sart' print(x, ...)
object |
an object of class |
Glist |
adjacency matrix or list sub-adjacency matrix. This is not necessary if the covariance method was computed in cdnet. |
data |
dataframe containing the explanatory variables. This is not necessary if the covariance method was computed in cdnet. |
... |
further arguments passed to or from other methods. |
x |
an object of class |
A list of the same objects in object
.