Strategies for tuning unsupervised learning hyperparameters in the context of dimensionality reduction for multimodal omics data
- Date
- September 18 (Thu) 14:00 - 15:00, 2025 (JST)
- Speaker
- Dorothy Ellis (Postdoctoral Researcher, Laboratory for Integrative Genomics, RIKEN Center for Integrative Medical Sciences (IMS))
- Venue
- via Zoom
- Language
- English
- Host
- Catherine Beauchemin
We are actively developing multi-omics winnowing in R (mowR), a non-negative matrix factorization (NMF)-based model that expands upon the functionality of joint graph-regularized single cell sparse non-negative matrix factorization (jrSiCKLSNMF) from Ellis et al. (2023). “Omics” data characterize the molecular components of a biological sample. Examples of omics modalities include transcriptomics (RNA), epigenomics (epigenetic modifications), metabolomics (metabolites), proteomics (proteins), and genomics (DNA). Multi-omics analysis involves the integration of two or more of these modalities, and omics data are often high-dimensional and sparse. Therefore, dimension reduction techniques are often required to extract interpretable information from these datasets.
NMF, one such dimension reduction technique, approximates an M × N data matrix X (M omics features by N observations) by the product of an M × D loadings matrix W and a D × N activations matrix H, where the number of latent factors D << min(M, N). The jrSiCKLSNMF model extends the basic NMF model by fitting a shared H across v ∈ {1, ..., V} omics count modalities. It also incorporates ridge regularization on H, graph regularization on the feature matrix W_v in modality v, and sum-to-one L2 norm constraints on the rows of H. We extend jrSiCKLSNMF to mowR by implementing mini-batch updates (Serizel et al., 2016), modality-specific loss functions (e.g., Poisson K-L divergence for count modalities and Frobenius norm for Gaussian modalities), modality-specific activation matrices H_v and weights ω_v on H to allow constraints on W_v, loss weights, LASSO regularization on H, and L2 norm constraints on W_v.
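As a rough illustration of the basic factorization X ≈ WH described above, the following Python sketch fits a plain single-modality NMF with the standard multiplicative updates for the Frobenius loss. It is not the mowR or jrSiCKLSNMF implementation: the function name `nmf_multiplicative`, the toy data, and all parameter values are assumptions made for this example, and none of the regularization terms, constraints, or mini-batch updates mentioned in the abstract are included.

```python
import numpy as np

def nmf_multiplicative(X, D, n_iter=200, tiny=1e-10, seed=0):
    """Plain NMF (Frobenius loss) via multiplicative updates.

    X is M x N (features by observations); returns W (M x D) and H (D x N).
    Illustrative only: no regularization, constraints, or mini-batches.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, D)) + tiny
    H = rng.random((D, N)) + tiny
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + tiny)
        W *= (X @ H.T) / (W @ H @ H.T + tiny)
    return W, H

# Toy count matrix (hypothetical data): 50 features by 30 observations.
rng = np.random.default_rng(1)
X = rng.poisson(lam=3.0, size=(50, 30)).astype(float)

W, H = nmf_multiplicative(X, D=5)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape, round(rel_err, 3))
```

Here D = 5 is much smaller than min(M, N) = 30, so W and H together hold far fewer entries than X.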
We also introduce a novel technique to tune hyperparameters in unsupervised learning by combining the data thinning/count splitting techniques outlined in Neufeld et al. (2023, 2024) with Bayesian optimization as implemented in the R package ParBayesianOptimization by Wilson (2018). In this talk, we focus on mowR’s hyperparameter tuning strategy, highlighting its current limitations and strategies to overcome them.
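The general idea of tuning by count splitting can be sketched in Python as follows, under strong simplifying assumptions. Poisson counts are split by binomial thinning into independent training and test folds, an NMF with Poisson K-L loss is fit on the training fold for each candidate number of factors D, and the candidate with the lowest held-out loss is kept. This is not mowR's procedure: a simple grid search stands in for Bayesian optimization, and the helper names (`count_split`, `fit_kl_nmf`, `poisson_kl`), the simulated data, and the candidate grid are all hypothetical.

```python
import numpy as np

def count_split(X, eps=0.5, seed=0):
    """Binomial thinning of integer counts into train and test folds.

    If X ~ Poisson(lam), the folds are independent Poisson(eps * lam)
    and Poisson((1 - eps) * lam).
    """
    rng = np.random.default_rng(seed)
    X_train = rng.binomial(X, eps)
    return X_train, X - X_train

def fit_kl_nmf(X, D, n_iter=300, tiny=1e-10, seed=0):
    """NMF with Poisson K-L loss via multiplicative updates (no penalties)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, D)) + 0.1
    H = rng.random((D, N)) + 0.1
    for _ in range(n_iter):
        R = X / (W @ H + tiny)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + tiny)
        R = X / (W @ H + tiny)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + tiny)
    return W, H

def poisson_kl(X, X_hat, tiny=1e-10):
    """Generalized K-L divergence, using the convention 0 * log(0) = 0."""
    X_hat = np.maximum(X_hat, tiny)
    ratio = np.where(X > 0, X / X_hat, 1.0)
    return float(np.sum(X * np.log(ratio) - X + X_hat))

# Simulated counts with three latent factors (hypothetical data).
rng = np.random.default_rng(42)
W_true = rng.gamma(2.0, 1.0, size=(60, 3))
H_true = rng.gamma(2.0, 1.0, size=(3, 40))
X = rng.poisson(W_true @ H_true)

eps = 0.5
X_train, X_test = count_split(X, eps=eps)

candidates = [1, 2, 3, 5, 8]  # grid search stands in for Bayesian optimization
test_loss = {}
for D in candidates:
    W, H = fit_kl_nmf(X_train, D)
    # Rescale the training-fold fit to the expected scale of the test fold.
    test_loss[D] = poisson_kl(X_test, (1 - eps) / eps * (W @ H))

best_D = min(test_loss, key=test_loss.get)
print(test_loss, best_D)
```

Because the two folds are independent under the Poisson assumption, the held-out loss is not biased toward larger D the way the training loss is. A Bayesian optimizer would replace the loop over `candidates` by proposing hyperparameter values based on the losses observed so far.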
This is a closed event for scientists. Non-scientists are not allowed to attend. If you are not a member or related person and would like to attend, please contact us using the inquiry form. Please note that the event organizer or speaker must authorize your request to attend.