vignettes/v10-algorithm.Rmd
v10-algorithm.Rmd
The ABESS algorithm employing โsplicingโ technique can exactly solve
general best subset problem in a polynomial time. The aim of this page
to provide a complete and coherent documentation for ABESS algorithm
such that users can easily understand the ABESS algorithm and its
variants, thereby facilitating the usage of abess
software.
We provide the details of splicing approach to linear model as follows,
and its variants can be applied in similar ways.
Consider the constraint minimization problem, where ${n}()=|y-X |{2}^{2}. $ Without loss of generality, we consider . Given any initial set with cardinality , denote and compute We call and as the active set and the inactive set, respectively.
Given the active set and , we can define the following two types of sacrifices:
Unfortunately, it is noteworthy that these two sacrifices are incomparable because they have different sizes of support set. However, if we exchange some โirrelevantโ variables in and some โimportantโ variables in , it may result in a higher-quality solution. This intuition motivates our splicing method. Specifically, given any splicing size , define
to represent least relevant variables in and to represent most relevant variables in Then, we splice and by exchanging and and obtain a new active set Let , and be a threshold. If , then is preferable to The active set can be updated iteratively until the loss function cannot be improved by splicing. Once the algorithm recovers the true active set, we may splice some irrelevant variables, and then the loss function may decrease slightly. The threshold can reduce this unnecessary calculation. Typically, is relatively small, e.g.ย
In practice, the support size is usually unknown. We use a data-driven procedure to determine s. For any active set , define an as follows: where . To identify the true model, the model complexity penalty is and the slow diverging rate is set to prevent underfitting. Theorem 4 states that the following ABESS algorithm selects the true support size via SIC.
Let be the maximum support size. We suggest as the maximum possible recovery size. Typically, we set where denotes the integer part of .
abess()
function
Notations in algorithm | Parameters in | Definitions |
---|---|---|
max.splicing.iter | The maximum number of splicing algorithm | |
c.max | The maximum splicing size | |
support.size | A sequence representing the support sizes | |
tune.type | The type of criterion for choosing the support size | |
group.index | A vector indicator the group that each variable belongs to | |
lambda | A value for regularized best subset selection | |
Golden-section searching | gs.range | Upper and lower bounds of the search range |