CMX Lunch Seminar
Controlling the false positive error in model selection is a prominent paradigm for gathering evidence in data-driven science. In model selection problems such as variable selection and graph estimation, models are characterized by an underlying Boolean structure, such as the presence or absence of a variable or an edge. The false positive or false negative error can therefore be conveniently specified as the number of variables or edges that are incorrectly included in, or excluded from, an estimated model. However, the increasing complexity of modern datasets has been accompanied by the use of sophisticated modeling paradigms in which defining false positive error is a significant challenge. For example, models specified by structures such as partitions (for clustering), permutations (for ranking), directed acyclic graphs (for causal inference), or subspaces (for principal component analysis) are not characterized by a simple Boolean logical structure, which leads to difficulties in formalizing and controlling false positive error. We present a generic approach to endow a collection of models with a partial order structure, which leads to systematic approaches for defining natural generalizations of false positive error and to methodology for controlling this error.
(Joint work with Armeen Taeb, Mateo Diaz, Peter Bühlmann, Parikshit Shah)
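As a small illustration of the Boolean case described in the abstract, the sketch below counts false positives and false negatives for variable selection as set differences between a true and an estimated set of variables. The function name `selection_errors` and the variable labels are hypothetical, chosen only for this example; it does not cover the partial order generalization presented in the talk.

```python
def selection_errors(true_support, estimated_support):
    """Count variables wrongly included and wrongly excluded.

    Hypothetical helper for illustration only: it covers the Boolean
    (presence/absence) setting, not the partially ordered one.
    """
    true_support = set(true_support)
    estimated_support = set(estimated_support)
    # False positives: selected by the estimate but absent from the truth.
    false_positives = estimated_support - true_support
    # False negatives: present in the truth but missed by the estimate.
    false_negatives = true_support - estimated_support
    return len(false_positives), len(false_negatives)


# Made-up example: three true variables, four estimated ones.
true_vars = {"x1", "x2", "x3"}
est_vars = {"x2", "x3", "x4", "x5"}
fp, fn = selection_errors(true_vars, est_vars)
print(fp, fn)  # 2 1
```

The same counting applies to graph estimation if each element of the sets is an edge, for instance a pair of node labels. For partitions, permutations, or subspaces there is no such element-wise inclusion to count, which is the difficulty the abstract points to.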