Generate simulated data under the generalized linear model and the Cox proportional hazards model.

```
generate.data(
n,
p,
support.size = NULL,
rho = 0,
family = c("gaussian", "binomial", "poisson", "cox", "mgaussian", "multinomial",
"gamma", "ordinal"),
beta = NULL,
cortype = 1,
snr = 10,
sigma = NULL,
weibull.shape = 1,
uniform.max = 1,
y.dim = 3,
class.num = 3,
seed = 1
)
```

- n
The number of observations.

- p
The number of predictors of interest.

- support.size
The number of nonzero coefficients in the underlying regression model. Can be omitted if `beta` is supplied.

- rho
A parameter characterizing the pairwise correlation between predictors. Default is `0`.

- family
The distribution of the simulated response: `"gaussian"` for a univariate quantitative response, `"binomial"` for a binary classification response, `"poisson"` for a count response, `"cox"` for a right-censored survival response, `"mgaussian"` for a multivariate quantitative response, `"multinomial"` for a multi-class classification response, `"gamma"` for a positive continuous response, and `"ordinal"` for an ordinal response.

- beta
The coefficient values in the underlying regression model. If it is supplied, `support.size` is ignored.

- cortype
The correlation structure of the predictors. `cortype = 1` denotes the independence structure, where the \((i,j)\) entry of the covariance matrix equals \(I(i = j)\). `cortype = 2` denotes the exponential structure, where the \((i,j)\) entry of the covariance matrix equals \(\rho^{|i-j|}\). `cortype = 3` denotes the constant structure, where the off-diagonal entries of the covariance matrix are \(\rho\) and the diagonal entries are 1.

- snr
A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined as the variance of \(x\beta\) divided by the variance of the Gaussian noise: \(\frac{Var(x\beta)}{\sigma^2}\). The Gaussian noise \(\epsilon\) has mean 0 and variance \(\sigma^2\), and it is added to the linear predictor \(\eta = x\beta\). Default is `snr = 10`. Note that this argument has no effect if `sigma` is supplied with a non-null value.

- sigma
The variance of the Gaussian noise. The default `sigma = NULL` means it is determined by `snr`.

- weibull.shape
The shape parameter of the Weibull distribution. It is used only when `family = "cox"`. Default: `weibull.shape = 1`.

- uniform.max
A parameter controlling the censoring rate: a larger value gives a smaller censoring rate, and a smaller value gives a larger one. It is used only when `family = "cox"`. Default: `uniform.max = 1`.

- y.dim
The dimension of the response. It is used only when `family = "mgaussian"`. Default: `y.dim = 3`.

- class.num
The number of classes. It is used only when `family = "multinomial"`. Default: `class.num = 3`.

- seed
The random seed. Default: `seed = 1`.
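The three `cortype` structures can be sketched in base R as below. The helpers `make_sigma` and `make_x` are hypothetical names introduced for illustration, not functions exported by the package, and drawing the design matrix through a Cholesky factor is one standard technique assumed here rather than taken from the source.

```r
# Hypothetical helper: the p x p covariance matrix described by `cortype`.
# 1 = independence, 2 = exponential, 3 = constant (illustrative sketch only).
make_sigma <- function(p, rho = 0, cortype = 1) {
  if (cortype == 1) {
    diag(p)                                # (i, j) entry is I(i = j)
  } else if (cortype == 2) {
    rho^abs(outer(1:p, 1:p, "-"))          # (i, j) entry is rho^|i - j|
  } else if (cortype == 3) {
    m <- matrix(rho, p, p)                 # off-diagonal entries are rho
    diag(m) <- 1                           # diagonal entries are 1
    m
  } else {
    stop("cortype must be 1, 2, or 3")
  }
}

# Hypothetical helper: draw n rows with covariance Sigma.
# If Z has iid N(0, 1) entries and Sigma = t(R) %*% R, then Z %*% R has covariance Sigma.
make_x <- function(n, Sigma) {
  Z <- matrix(rnorm(n * ncol(Sigma)), n, ncol(Sigma))
  Z %*% chol(Sigma)
}

set.seed(1)
S <- make_sigma(4, rho = 0.5, cortype = 2)
x <- make_x(1000, S)
round(cor(x), 2)   # close to S for a large sample
```

With `cortype = 2` neighbouring predictors are the most correlated and the correlation decays geometrically with distance, whereas `cortype = 3` gives every pair the same correlation.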

A `list` object comprising:

- x
Design matrix of predictors.

- y
Response variable.

- beta
The coefficients used in the underlying regression model.

For `family = "gaussian"`, the data model is
$$Y = X \beta + \epsilon.$$
The nonzero entries of the regression coefficient \(\beta\) are drawn from the uniform distribution on \([m, 100m]\), where \(m = 5 \sqrt{2\log(p)/n}\).
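A minimal sketch of the Gaussian model follows, assuming independent N(0, 1) predictors (`cortype = 1`) and choosing the noise standard deviation from the sample variance of \(X\beta\) so that the ratio equals `snr`. `simulate_gaussian` is a hypothetical name; it mirrors the formulas above, not the package's internal code.

```r
# Hypothetical sketch of Y = X beta + epsilon with a target SNR.
simulate_gaussian <- function(n, p, support.size, snr = 10) {
  m <- 5 * sqrt(2 * log(p) / n)
  beta <- numeric(p)
  support <- sample.int(p, support.size)             # positions of nonzero coefficients
  beta[support] <- runif(support.size, m, 100 * m)   # nonzero entries ~ U[m, 100m]
  x <- matrix(rnorm(n * p), n, p)                    # assumed independent predictors
  eta <- drop(x %*% beta)                            # linear predictor
  sigma <- sqrt(var(eta) / snr)                      # noise sd so that Var(x beta) / sigma^2 = snr
  y <- eta + rnorm(n, 0, sigma)
  list(x = x, y = y, beta = beta, sigma = sigma)
}

set.seed(1)
sim <- simulate_gaussian(200, 20, 5)
which(sim$beta != 0)   # indices of the true support
```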

For `family = "binomial"`, the data model is
$$Prob(Y = 1) = \exp(X \beta + \epsilon)/(1 + \exp(X \beta + \epsilon)).$$
The nonzero entries of the regression coefficient \(\beta\) are drawn from the uniform distribution on \([2m, 10m]\), where \(m = 5 \sqrt{2\log(p)/n}\).
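The binomial data model can be sketched as a logistic link applied to the noisy linear predictor, followed by a Bernoulli draw. `simulate_binomial_y` is a hypothetical helper, and the design matrix, coefficients and noise level in the usage lines are illustrative assumptions.

```r
# Hypothetical sketch: Y ~ Bernoulli(exp(eta) / (1 + exp(eta))), eta = X beta + epsilon.
simulate_binomial_y <- function(x, beta, sigma) {
  eta <- drop(x %*% beta) + rnorm(nrow(x), 0, sigma)  # X beta + epsilon
  prob <- plogis(eta)                                  # equals exp(eta) / (1 + exp(eta)), overflow-safe
  rbinom(nrow(x), size = 1, prob = prob)
}

set.seed(1)
n <- 500; p <- 10
x <- matrix(rnorm(n * p), n, p)
m <- 5 * sqrt(2 * log(p) / n)
beta <- c(runif(3, 2 * m, 10 * m), rep(0, p - 3))     # first 3 predictors are active
y <- simulate_binomial_y(x, beta, sigma = 1)
table(y)
```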

For `family = "poisson"`, the data are modeled to have a Poisson distribution:
$$Y \sim Poisson(\exp(X \beta + \epsilon)).$$
The nonzero entries of the regression coefficient \(\beta\) are drawn from the uniform distribution on \([2m, 10m]\), where \(m = \sqrt{2\log(p)/n}/3\).
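For the count response, a sketch under the assumption that counts are Poisson with mean \(\exp(X\beta + \epsilon)\) looks as follows. `simulate_poisson_y` is hypothetical; the small scale \(m = \sqrt{2\log(p)/n}/3\) keeps the exponentiated means moderate.

```r
# Hypothetical sketch: counts drawn with mean exp(X beta + epsilon).
simulate_poisson_y <- function(x, beta, sigma) {
  eta <- drop(x %*% beta) + rnorm(nrow(x), 0, sigma)  # noisy linear predictor
  rpois(nrow(x), lambda = exp(eta))                   # log link
}

set.seed(1)
n <- 500; p <- 10
x <- matrix(rnorm(n * p), n, p)
m <- sqrt(2 * log(p) / n) / 3                          # small coefficients keep exp(eta) moderate
beta <- c(runif(3, 2 * m, 10 * m), rep(0, p - 3))
y <- simulate_poisson_y(x, beta, sigma = 0.1)
summary(y)
```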

For `family = "gamma"`, the data are modeled to have a gamma distribution:
$$Y = Gamma(X \beta + \epsilon + 10, shape),$$
where \(shape\) is the shape parameter of the gamma distribution. The nonzero entries of the regression coefficient \(\beta\) are drawn from the uniform distribution on \([2m, 100m]\), where \(m = \sqrt{2\log(p)/n}\).

For `family = "ordinal"`, the data are modeled to have an ordinal distribution.

For `family = "cox"`, the model for the failure time \(T\) is
$$T = \left(-\log(U) / \exp(X \beta)\right)^{1/weibull.shape},$$
where \(U\) is a uniform random variable on [0, 1]. The censoring time \(C\) is generated from the uniform distribution on \([0, uniform.max]\); the censoring status is then \(\delta = I(T \le C)\) and the observed time is \(R = \min\{T, C\}\). The nonzero entries of the regression coefficient \(\beta\) are drawn from the uniform distribution on \([2m, 10m]\), where \(m = 5 \sqrt{2\log(p)/n}\).
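The survival data can be sketched by inverse-transform sampling from a Weibull proportional hazards model, assuming the standard inversion \(T = (-\log U / \exp(X\beta))^{1/weibull.shape}\), plus independent uniform censoring. `simulate_cox_y` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical sketch: Weibull proportional hazards failure times with uniform censoring.
simulate_cox_y <- function(x, beta, weibull.shape = 1, uniform.max = 1) {
  n <- nrow(x)
  u <- runif(n)                                                     # U ~ Uniform(0, 1)
  time <- (-log(u) / exp(drop(x %*% beta)))^(1 / weibull.shape)    # failure time T
  cens <- runif(n, 0, uniform.max)                                  # censoring time C
  status <- as.integer(time <= cens)                                # delta = I(T <= C)
  cbind(time = pmin(time, cens), status = status)                   # R = min(T, C)
}

set.seed(1)
n <- 500; p <- 10
x <- matrix(rnorm(n * p), n, p)
m <- 5 * sqrt(2 * log(p) / n)
beta <- c(runif(3, 2 * m, 10 * m), rep(0, p - 3))
set.seed(2); y1 <- simulate_cox_y(x, beta, uniform.max = 0.5)
set.seed(2); y2 <- simulate_cox_y(x, beta, uniform.max = 5)
c(censored_small_max = mean(y1[, "status"] == 0), censored_large_max = mean(y2[, "status"] == 0))
```

Because the same seed is reused, the second call sees censoring times ten times longer than the first, so no observation that failed under `uniform.max = 0.5` is censored under `uniform.max = 5`; this matches the statement that a larger `uniform.max` lowers the censoring rate.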

For `family = "mgaussian"`, the data model is
$$Y = X \beta + E.$$
The nonzero values of the regression matrix \(\beta\) are sampled from the uniform distribution on \([m, 100m]\), where \(m = 5 \sqrt{2\log(p)/n}\).
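A sketch of the multivariate Gaussian model is shown below. It assumes the coefficient matrix is row-sparse (the same `support.size` predictors are active for every response) and draws each row of \(E\) from \(MVN(0, \sigma^2 I_{q \times q})\), which is the same as filling \(E\) with iid \(N(0, \sigma^2)\) entries. `simulate_mgaussian` is a hypothetical name.

```r
# Hypothetical sketch of Y = X B + E with q = y.dim response columns.
simulate_mgaussian <- function(n, p, support.size, y.dim = 3, sigma = 1) {
  m <- 5 * sqrt(2 * log(p) / n)
  B <- matrix(0, p, y.dim)
  support <- sample.int(p, support.size)                      # assumed shared support across responses
  B[support, ] <- runif(support.size * y.dim, m, 100 * m)     # nonzero values ~ U[m, 100m]
  x <- matrix(rnorm(n * p), n, p)
  E <- matrix(rnorm(n * y.dim, 0, sigma), n, y.dim)           # rows ~ MVN(0, sigma^2 I_q)
  list(x = x, y = x %*% B + E, beta = B)
}

set.seed(1)
sim <- simulate_mgaussian(200, 20, 5)
dim(sim$y)
```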

For `family = "multinomial"`, the data model is
$$Prob(Y = 1) = \exp(X \beta + E)/(1 + \exp(X \beta + E)).$$
The nonzero values of the regression coefficient \(\beta\) are sampled from the uniform distribution on \([2m, 10m]\), where \(m = 5 \sqrt{2\log(p)/n}\).

In the above models, \(\epsilon \sim N(0, \sigma^2)\) and \(E \sim MVN(0, \sigma^2 \times I_{q \times q})\), where \(\sigma^2\) is determined by `snr` and \(q\) is `y.dim`.

```
library(abess)
# Generate simulated data
n <- 200
p <- 20
support.size <- 5
dataset <- generate.data(n, p, support.size)
str(dataset)
#> List of 3
#> $ x : num [1:200, 1:20] -0.626 0.184 -0.836 1.595 0.33 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:20] "x1" "x2" "x3" "x4" ...
#> $ y : num [1:200, 1] -170.92 68.42 2.05 -7.15 96.92 ...
#> $ beta: num [1:20] 0 0 0 0 0 ...
```