Strategies for tuning unsupervised learning hyperparameters in the context of dimensionality reduction for multimodal omics data

ComSHeL Seminar

Date: September 18 (Thu) 14:00 - 15:00, 2025 (JST)
Speaker: Dorothy Ellis (Postdoctoral Researcher, Laboratory for Integrative Genomics, RIKEN Center for Integrative Medical Sciences (IMS))
Venue: via Zoom
Language: English
Host: Catherine Beauchemin

We are actively developing multi-omics winnowing in R (mowR), a non-negative matrix factorization (NMF)-based model that expands upon the functionality of joint graph-regularized single cell sparse non-negative matrix factorization (jrSiCKLSNMF) from Ellis et al. (2023). “Omics” data characterize the molecular components of a biological sample. Examples of omics modalities include transcriptomics (RNA), epigenomics (epigenetic modifications), metabolomics (metabolites), proteomics (proteins), and genomics (DNA). Multi-omics analysis involves the integration of two or more of these modalities, and omics data are often high-dimensional and sparse. Therefore, dimension reduction techniques are often required to extract interpretable information from these datasets.

NMF, one such dimension reduction technique, approximates an M × N data matrix X (M omics features by N observations) by the product of an M × D loadings matrix W and a D × N activations matrix H, where the number of latent factors D << min(M, N). The jrSiCKLSNMF model extends the basic NMF model by fitting a shared H across v ∈ {1, ..., V} omics count modalities. It also incorporates ridge regularization on H, graph regularization on the feature matrix W_v in each modality v, and sum-to-one L2 norm constraints on the rows of H. We extend jrSiCKLSNMF to mowR by implementing mini-batch updates (Serizel et al., 2016), modality-specific loss functions (e.g. Poisson K-L divergence for count modalities and Frobenius norm for Gaussian modalities), modality-specific activation matrices H_v and weights ω_v on H to allow constraints on W_v, loss weights, LASSO regularization on H, and L2 norm constraints on W_v.
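As a rough illustration of the basic NMF model described above (not of jrSiCKLSNMF or mowR themselves), the following sketch fits X ≈ WH for a single count modality using the classical multiplicative updates for the Poisson K-L divergence. The function names `nmf_kl` and `kl_divergence`, the random initialization, and the iteration count are assumptions made for this example; none of the regularization terms, constraints, or multi-modality features from the abstract are included.

```python
import numpy as np

def nmf_kl(X, D, n_iter=200, seed=0, eps=1e-10):
    """Plain NMF under the K-L (Poisson) loss via multiplicative updates.

    X is an M x N non-negative matrix (features by observations).
    Returns W (M x D loadings) and H (D x N activations).
    Illustrative only: no regularization or norm constraints.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, D)) + eps
    H = rng.random((D, N)) + eps
    for _ in range(n_iter):
        # Update H with W held fixed.
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W with H held fixed.
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(X, WH, eps=1e-10):
    """Generalized K-L divergence between X and its approximation WH."""
    mask = X > 0  # entries with X == 0 contribute only the WH term
    log_term = np.sum(X[mask] * np.log(X[mask] / (WH[mask] + eps)))
    return log_term - X.sum() + WH.sum()
```

Calling `nmf_kl(X, D=4)` on a count matrix returns non-negative factors whose product can be scored with `kl_divergence(X, W @ H)`; the number of latent factors D is exactly the kind of hyperparameter that has to be chosen without labels.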

We also introduce a novel technique to tune hyperparameters for unsupervised learning by combining the data thinning/count splitting techniques outlined in Neufeld et al. (2023, 2024) with Bayesian optimization as implemented in the R package ParBayesianOptimization from Wilson (2018). In this talk, we focus on mowR’s hyperparameter tuning strategy, highlighting its current limitations and strategies to overcome them.
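To make the count splitting idea concrete, here is a minimal sketch for Poisson counts: each count is thinned binomially into a training part and a test part, which are independent Poisson variables when the original counts are Poisson. The helper names `count_split` and `heldout_poisson_loss`, the default split fraction, and the choice of scoring function are assumptions for illustration only; this is not mowR's tuning code, and the Bayesian optimization step that would search over hyperparameters using such a score is omitted.

```python
import numpy as np

def count_split(X, epsilon=0.5, seed=0):
    """Split integer counts X into train and test parts by binomial thinning.

    If X ~ Poisson(lam), then X_train ~ Poisson(epsilon * lam) and
    X_test ~ Poisson((1 - epsilon) * lam), and the two are independent.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    X_train = rng.binomial(X, epsilon)
    X_test = X - X_train
    return X_train, X_test

def heldout_poisson_loss(X_test, mean_train, epsilon=0.5, eps=1e-10):
    """Poisson negative log-likelihood of X_test, up to an additive constant.

    mean_train is a fitted mean for the training split (hypothetical input,
    e.g. W @ H from a model fit on X_train). It is rescaled by
    (1 - epsilon) / epsilon to match the expected scale of the test split.
    """
    mu = mean_train * (1.0 - epsilon) / epsilon + eps
    return np.sum(mu - X_test * np.log(mu))
```

A tuning loop would fit the model on `X_train` for each candidate hyperparameter setting and keep the setting with the lowest held-out loss on `X_test`; an optimizer such as Bayesian optimization replaces exhaustive search over those settings.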

This is a closed event for scientists. Non-scientists are not allowed to attend. If you are not a member or related person and would like to attend, please contact us using the inquiry form. Please note that the event organizer or speaker must authorize your request to attend.
