Finding Sparse Features in Strongly Confounded Medical Binary Data


A typical task in statistical genetics is to find a sparse linear relation between genotypes and phenotypes, but the data are often confounded by age, ethnicity, or population structure. We generalize the linear mixed model (LMM) Lasso approach for feature selection under confounding to the case of binary labels. This case is considerably more involved, because marginalizing over the correlated noise leads to an intractable integral. We overcome this problem with approximate inference techniques. On synthetic and real-world data, we demonstrate that the sparse features our method finds are less correlated with the top confounders.
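
To make the source of the intractability concrete, the following is a minimal sketch of one way such a model could be written down. It assumes a probit-style threshold model; the abstract does not state the exact formulation. The symbols $w$ (sparse weights), $x_i$ (genotype features), $K$ (a similarity matrix capturing the confounding structure), $\sigma_g^2$, $\sigma_e^2$, and $\lambda$ are illustrative assumptions, not notation taken from the paper.

```latex
% Assumed model: binary labels from a thresholded linear model with correlated noise
y_i = \operatorname{sign}\!\left(x_i^\top w + \epsilon_i\right), \qquad
\epsilon \sim \mathcal{N}(0, \Sigma), \qquad
\Sigma = \sigma_g^2 K + \sigma_e^2 I .

% Marginalizing the noise gives an n-dimensional Gaussian orthant probability
P(y \mid X, w) = \int_{\mathbb{R}^n} \mathcal{N}(\epsilon;\, 0, \Sigma)
  \prod_{i=1}^{n} \mathbb{1}\!\left[\, y_i \left(x_i^\top w + \epsilon_i\right) > 0 \,\right] d\epsilon .

% Sparse estimate: L1-penalized negative log-likelihood
\hat{w} = \arg\min_{w} \; -\log P(y \mid X, w) + \lambda \lVert w \rVert_1 .
```

Under these assumptions, a diagonal $\Sigma$ with entries $\sigma_i^2$ lets the integral factorize into a product of one-dimensional terms $\Phi\!\left(y_i\, x_i^\top w / \sigma_i\right)$, where $\Phi$ is the standard normal CDF. Once the noise is correlated across samples, $\Sigma$ is no longer diagonal, the factorization is lost, and the integral has no closed form, which is why approximate inference is needed.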

NIPS 2015 Workshop on Machine Learning in Healthcare
Oral Presentation
