Generate a simulated matrix that is the superposition of a low-rank component and a sparse component.

generate.matrix(
  n,
  p,
  rank = NULL,
  support.size = NULL,
  beta = NULL,
  snr = Inf,
  sigma = NULL,
  seed = 1
)

Arguments

n

The number of observations.

p

The number of predictors of interest.

rank

The rank of the low-rank matrix.

support.size

The number of non-zero entries in the sparse matrix \(S\), i.e. the size of its support set. Can be omitted if beta is supplied.

beta

The coefficient values in the underlying regression model. If it is supplied, support.size can be omitted.

snr

A positive value controlling the signal-to-noise ratio (SNR). A larger SNR makes the sparse matrix easier to identify. The default snr = Inf means that no noise is added.

sigma

A numerical value specifying the variance of the Gaussian noise. The default sigma = NULL means that it is determined by snr.

seed

The random seed. Default: seed = 1.

Value

A list object comprising:

x

An \(n\)-by-\(p\) matrix.

L

The latent low-rank matrix.

S

The latent sparse matrix.

Details

The low-rank matrix \(L\) is generated as \(L = UV\), where \(U\) is an \(n\)-by-\(rank\) matrix and \(V\) is a \(rank\)-by-\(p\) matrix. The elements of \(U\) and \(V\) are drawn i.i.d. from \(N(0, 1/n)\).

The sparse matrix \(S\) is an \(n\)-by-\(p\) matrix, matching the dimensions of \(L\) so that the two can be added. It is generated by choosing a support set of size support.size uniformly at random. The non-zero entries of \(S\) are independent Bernoulli (-1, +1) variables.

The noise matrix \(N\) is an \(n\)-by-\(p\) matrix whose elements are i.i.d. Gaussian random variables with standard deviation \(\sigma\).

The SNR is defined as the variance of the vectorized matrix \(L + S\) divided by \(\sigma^2\).
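
Rearranging this definition gives the noise level implied by a given SNR. This is only a restatement in the notation used here, with \(\mathrm{vec}(\cdot)\) denoting vectorization, and not a description of the function's internals: $$\mathrm{SNR} = \frac{\mathrm{Var}\left(\mathrm{vec}(L + S)\right)}{\sigma^2} \quad\Longleftrightarrow\quad \sigma^2 = \frac{\mathrm{Var}\left(\mathrm{vec}(L + S)\right)}{\mathrm{SNR}}.$$ In particular, \(\mathrm{SNR} = \infty\) corresponds to \(\sigma = 0\), so the noise matrix \(N\) vanishes.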

The matrix \(x\) is the superposition of \(L\), \(S\), \(N\): $$x = L + S + N.$$
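
The generating process described above can be sketched in base R. This is an illustration only: simulate_low_rank_plus_sparse() is a hypothetical helper, not the implementation of generate.matrix(), and drawing -1 and +1 with equal probability is an assumption that the description does not state.

```r
# Illustrative base-R sketch of the process described in Details.
# simulate_low_rank_plus_sparse() is a hypothetical helper, NOT the
# package's generate.matrix(); it only mirrors the text above.
simulate_low_rank_plus_sparse <- function(n, p, rank, support.size,
                                          snr = Inf, seed = 1) {
  set.seed(seed)

  # Low-rank part: L = U V, entries drawn i.i.d. from N(0, 1/n),
  # i.e. with standard deviation sqrt(1 / n).
  U <- matrix(rnorm(n * rank, mean = 0, sd = sqrt(1 / n)), nrow = n, ncol = rank)
  V <- matrix(rnorm(rank * p, mean = 0, sd = sqrt(1 / n)), nrow = rank, ncol = p)
  L <- U %*% V

  # Sparse part: support of size support.size chosen uniformly at random;
  # non-zero entries are -1 or +1 (equal probability is an assumption).
  S <- matrix(0, nrow = n, ncol = p)
  support <- sample.int(n * p, size = support.size)
  S[support] <- sample(c(-1, 1), size = support.size, replace = TRUE)

  # Noise part: sigma^2 = Var(vec(L + S)) / snr, so snr = Inf gives sigma = 0.
  sigma <- sqrt(var(as.vector(L + S)) / snr)
  N <- matrix(rnorm(n * p, mean = 0, sd = sigma), nrow = n, ncol = p)

  list(x = L + S + N, L = L, S = S)
}

sim <- simulate_low_rank_plus_sparse(n = 30, p = 20, rank = 2, support.size = 10)
dim(sim$x)        # dimensions of the simulated matrix (n rows, p columns)
sum(sim$S != 0)   # number of non-zero entries in the sparse component
```

With the default snr = Inf in this sketch, the noise standard deviation is zero, so sim$x equals sim$L + sim$S.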

Author

Jin Zhu

Examples

# Generate simulated data
n <- 30
p <- 20
dataset <- generate.matrix(n, p)
# Visualize the sparse component S as a heatmap
stats::heatmap(as.matrix(dataset[["S"]]),
  Rowv = NA,
  Colv = NA,
  scale = "none",
  col = grDevices::cm.colors(256),
  frame.plot = TRUE,
  margins = c(2.4, 2.4)
)