ComSHeL Seminar
10 events
-
Seminar
Prediction of viral evolution and exploration of next-pandemic viruses
June 15 (Mon) 15:00 - 16:00, 2026
Jumpei Ito (Professor, Research Institute for Microbial Diseases, The University of Osaka)
One of the major challenges in controlling viral infectious diseases is that viruses continuously alter their properties through evolution. During the COVID-19 pandemic, for example, variants with enhanced immune escape and increased fitness emerged successively, thereby making epidemic control substantially more difficult. In this seminar, I will introduce our research on understanding and predicting viral evolution and epidemic dynamics by integrating protein language models, massive viral genome sequence data, and large-scale experimental datasets to model the relationships among viral genotypes, antigenicity, and fitness. Another major factor complicating the control of viral infectious diseases is the cross-species transmission of viruses harbored by wild animals to humans and livestock, leading to the emergence of novel infectious diseases. The COVID-19 pandemic, for instance, is thought to have originated from a coronavirus carried by horseshoe bats that subsequently spilled over into humans. To prepare for future pandemics, it is essential to comprehensively identify and systematically catalog viruses circulating in wildlife populations. In this seminar, I will also present our research on efficiently discovering novel viruses from massive public RNA-seq datasets by predicting viral infection based on host immune responses.
Venue: Seminar Room #359 (Main Venue) / via Zoom
Event Official Language: English
-
Seminar
Synthetic Data from Domain Knowledge: Pretraining Medical Deep Networks under Data Scarcity
May 18 (Mon) 14:00 - 15:00, 2026
Naoki Nonaka (Senior Research Scientist, Medical Science Deep Learning Team, Division of Applied Mathematical Science, RIKEN Center for Interdisciplinary Theoretical and Mathematical Sciences (iTHEMS))
Training deep learning models typically requires large-scale data, yet in the medical domain such data are often difficult to obtain due to privacy constraints, the rarity of certain diseases, and the high cost of acquisition. In this talk, I present one approach to this challenge: pretraining with synthetic data generated from domain knowledge. As concrete examples, I introduce the synthesis of electrocardiograms (ECG) and phonocardiograms (PCG). For ECG, each waveform component (P, Q, R, S, and T) is modeled with Gaussian functions; for PCG, synthetic signals are generated by combining S1 and S2 heart sounds with modulated noise. I show that pretraining a model on such synthetic data and then fine-tuning on a small amount of real data substantially improves classification performance compared to training on real data alone, and that this improvement becomes more pronounced as the size of the real dataset decreases. I will also touch on extensions such as self-supervised learning with synthetic data and a comparison between knowledge-driven simulators and learned generative models, and discuss the broader potential of domain knowledge as a data source for medical applications where real data are limited.
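The ECG synthesis described in the abstract above (each of the P, Q, R, S, and T components modeled with a Gaussian function) can be illustrated with a minimal sketch. The wave amplitudes, centres, and widths below, the sampling rate, and the function names `synth_beat` and `synth_ecg` are all assumptions chosen for illustration; they are not taken from the talk and do not represent the speaker's actual simulator.

```python
import numpy as np

# Hypothetical wave parameters: (amplitude in mV, centre in s, width in s).
# Illustrative values only, not taken from the talk.
WAVES = {
    "P": (0.15, 0.20, 0.025),
    "Q": (-0.10, 0.35, 0.010),
    "R": (1.00, 0.38, 0.012),
    "S": (-0.20, 0.41, 0.010),
    "T": (0.30, 0.60, 0.040),
}


def synth_beat(fs=500, duration=0.8, waves=WAVES):
    """Return (t, signal) for one heartbeat as a sum of Gaussian bumps."""
    t = np.arange(int(fs * duration)) / fs
    sig = np.zeros_like(t)
    for amp, mu, sigma in waves.values():
        sig += amp * np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    return t, sig


def synth_ecg(n_beats=5, noise_std=0.01, seed=0, **beat_kwargs):
    """Repeat one synthetic beat n_beats times and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    _, beat = synth_beat(**beat_kwargs)
    ecg = np.tile(beat, n_beats)
    return ecg + rng.normal(0.0, noise_std, ecg.shape)


if __name__ == "__main__":
    ecg = synth_ecg(n_beats=5)
    print(ecg.shape)
```

Varying the parameters in `WAVES` per sample (for example, widening or flattening one component) is one plausible way such a generator could produce labeled variety for pretraining, although the abstract does not say how the classes are constructed.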
Venue: Seminar Room #359 (Main Venue) / via Zoom
Event Official Language: English
-
Seminar
Challenges in virology & neurodegeneration: improving experimental procedures and theoretical insights
April 20 (Mon) 14:00 - 15:00, 2026
Catherine Beauchemin (Deputy Director, RIKEN Center for Interdisciplinary Theoretical and Mathematical Sciences (iTHEMS))
After repeatedly finding errors in experimental data provided by collaborators, my group developed an online tool (midSIN, https://midsin.roadcake.org/) to improve estimation of the concentration of infectious viruses in samples. This led to an unexpected new collaboration with researchers working to measure the concentration of aggregating fibrils in samples from patients suffering from neurodegenerative diseases such as dementia with Lewy bodies and Parkinson's disease. In the first part of my talk, I will introduce the basics of how infectious virion and aggregating fibril concentrations are measured experimentally, and discuss challenges in tackling these assays' limitations to improve their accuracy and sensitivity. In the second part of my talk, I will discuss the challenges we face in trying to identify the type and minimal number of experimental measurements required to predict the severity and transmission efficacy of diverse influenza viruses collected as part of pandemic surveillance efforts. I hope you will join the talk to learn of these challenges and consider contributing new ideas or approaches to overcome them.
Venue: Hybrid Format (4F #435-437 and Zoom), Main Research Building
Event Official Language: English
-
Seminar
Data-Driven Stratification and Prediction of Complex Diseases
March 24 (Tue) 14:00 - 15:15, 2026
Eiryo Kawakami (Team Director, Medical Science Data-driven Mathematics Team, Division of Applied Mathematical Science, RIKEN Center for Interdisciplinary Theoretical and Mathematical Sciences (iTHEMS))
Many common diseases such as cancer, chronic heart failure, and diabetes exhibit substantial biological and clinical heterogeneity, which complicates diagnosis, risk assessment, and treatment decisions. In this talk, I introduce a data-driven framework for disease stratification and prediction using machine learning applied to multidimensional medical data. First, unsupervised machine learning methods are used to identify previously unrecognized disease subtypes based on clinical and biomarker data. These stratification approaches reveal hidden patient groups with distinct clinical characteristics and prognoses. To enable practical application in clinical datasets, we further develop supervised learning models that reproduce and generalize unsupervised clusters, allowing robust subtype estimation even in datasets with missing variables. Next, I present approaches for early disease detection using large-scale medical history data, focusing on combinations of comorbidities as early indicators of severe diseases. Finally, I discuss how large-scale deep learning models can be leveraged to predict disease prognosis from medical images and other high-dimensional data. These studies demonstrate how machine learning can redefine disease categories and enable earlier detection and more precise prediction in heterogeneous diseases.
Venue: Hybrid Format (3F #359 and Zoom), Main Research Building
Event Official Language: English
-
Seminar
5th ComSHeL Seminar
October 31 (Fri) 11:00 - 12:00, 2025
Motoko Kotani (Executive Director of Science, RIKEN)
Title: Discrete Geometric Analysis and its application to materials science
Abstract: Discrete Geometric Analysis is a discrete version of Geometric Analysis. It is, however, not merely a discretization but a development of methods to bridge the discrete and the continuum. I will explain these methods and share some applications to materials science with you.
Venue: Hybrid Format (3F #359 and Zoom), Seminar Room #359 (Main Venue) / via Zoom
Event Official Language: English
-
Seminar
Strategies for tuning unsupervised learning hyperparameters in the context of dimensionality reduction for multimodal omics data
September 18 (Thu) 14:00 - 15:00, 2025
Dorothy Ellis (Postdoctoral Researcher, Laboratory for Integrative Genomics, RIKEN Center for Integrative Medical Sciences (IMS))
We are actively developing multi-omics winnowing in R (mowR), a non-negative matrix factorization (NMF)-based model that expands upon the functionality of joint graph-regularized single cell sparse non-negative matrix factorization (jrSiCKLSNMF) from Ellis et al. (2023). “Omics” data characterize the molecular components of a biological sample. Examples of omics modalities include transcriptomics (RNA), epigenomics (epigenetic modifications), metabolomics (metabolites), proteomics (proteins), and genomics (DNA). Multi-omics analysis involves the integration of two or more of these modalities, and omics data are often high-dimensional and sparse. Therefore, dimension reduction techniques are often required to extract interpretable information from these datasets. NMF, one such dimension reduction technique, finds a low-dimensional approximation of an M omics features by N observations data matrix X via the product of an M × D loadings matrix W and a D × N activations matrix H, where the number of latent factors D << min(M, N). The jrSiCKLSNMF model extends the basic NMF model by fitting a shared H across v ∈ {1, ..., V} omics count modalities. It also incorporates ridge regularization on H, graph regularization on feature matrix W_v in modality v, and sum-to-one L2 norm constraints on the rows of H. We extend jrSiCKLSNMF to mowR by implementing mini-batch updates (Serizel et al., 2016), modality-specific loss functions (e.g. Poisson K-L divergence for count modalities and Frobenius norm for Gaussian modalities), modality-specific activation matrices H_v and weights ω_v on H to allow constraints on W_v, loss weights, LASSO regularization on H, and L2 norm constraints on W_v. We also introduce a novel technique to tune hyperparameters for unsupervised data by combining the data thinning/count splitting techniques outlined in Neufeld et al. (2023, 2024) with Bayesian optimization as implemented in the R package ParBayesianOptimization from Wilson (2018).
In this talk, we focus on mowR’s hyperparameter tuning strategy, highlighting its current limitations and strategies to overcome them.
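To make the basic factorization X ≈ WH and the idea of count splitting more concrete, here is a minimal sketch in Python. It is not mowR or jrSiCKLSNMF: it uses plain Frobenius-loss NMF with standard multiplicative updates, no regularization, a single modality, and a simple grid over D instead of Bayesian optimization. The function names `nmf`, `count_split`, and `held_out_error` are hypothetical, and using held-out reconstruction error as the tuning criterion is an assumption made for illustration.

```python
import numpy as np


def nmf(X, D, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF: X (M x N) ~ W (M x D) @ H (D x N), Frobenius loss,
    fitted with multiplicative updates that keep W and H non-negative."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, D)) + eps
    H = rng.random((D, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def count_split(X, p=0.5, seed=0):
    """Binomial thinning of a count matrix: train ~ Binomial(X, p),
    test = X - train. For Poisson counts the two parts are independent."""
    rng = np.random.default_rng(seed)
    train = rng.binomial(X, p)
    return train, X - train


def held_out_error(X, D, p=0.5, seed=0):
    """Fit NMF on the training split and score it on the test split."""
    train, test = count_split(X, p, seed)
    W, H = nmf(train, D, seed=seed)
    # Rescale the fitted mean from the train fraction p to the test fraction 1 - p.
    pred = (W @ H) * (1.0 - p) / p
    return np.linalg.norm(test - pred)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated counts with a rank-3 Poisson mean (illustrative data only).
    mean = rng.random((40, 3)) @ rng.random((3, 60)) * 10.0
    X = rng.poisson(mean)
    for D in (1, 2, 3, 5, 8):
        print(D, held_out_error(X, D))
```

Because the training and test splits share the same underlying structure but have independent noise, a held-out score of this kind can penalize overly large D, which is the general motivation for combining thinning with a hyperparameter search.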
Venue: via Zoom
Event Official Language: English
-
Seminar
A Discussion on Quantum Machine Learning for Medical Data
August 26 (Tue) 14:00 - 15:00, 2025
Satoru Sugimoto (Senior Research Scientist, Medical Science Data-driven Mathematics Team, Division of Applied Mathematical Science, RIKEN Center for Interdisciplinary Theoretical and Mathematical Sciences (iTHEMS))
Our team is investigating the applicability of machine learning using quantum computers to medical data. In this talk, we will provide a brief overview of supervised machine learning for medical data as a topic for discussion.
Venue: via Zoom
Event Official Language: English
-
Seminar
ComSHeL collaboration planning
July 22 (Tue) 14:00 - 15:00, 2025
The objective of this 3rd monthly meeting of the ComSHeL Study Group is to discuss specific collaborations we could undertake across our own Teams/Divisions on projects of common interest to take advantage of our complementary skills and expertise. We also want to consider how ComSHeL could help respond to specific calls for focus in certain research areas within RIKEN and the broader funding landscape.
Venue: Hybrid Format (3F #359 and Zoom), Main Research Building
Event Official Language: English
-
Seminar
ComSHeL introductions meeting
June 24 (Tue) 14:00 - 15:30, 2025
Following our Launch Meeting on May 1st, in this second meeting of our study group we plan for each member of the ComSHeL Study Group and anyone who joins us that day to introduce their research briefly, so that we can get to know one another's focus and expertise. If you are interested in possibly collaborating with ComSHeL members and/or you would like to get to know some of the researchers who joined us as part of iTHEMS' new Division of Applied Mathematical Science, please join us. I extended the duration to 90 min (from our usual 60 min) to make sure we have enough time to hear from everyone. Each attendee will have approximately 4 minutes to explain their past, current, or upcoming research, and time will be kept strictly. Time might be adjusted on the day of the meeting based on the number of attendees. If you would like to show some slides (max 3 slides), please prepare them in advance and send them to cbeau@riken.jp in PDF format no later than June 20. But no one should feel they must prepare slides: it is fine to speak freely and informally about your work.
Venue: Hybrid Format (3F #359 and Zoom), Seminar Room #359
Event Official Language: English
-
Seminar
ComSHeL Launch Meeting
May 1 (Thu) 14:00 - 15:00, 2025
This is the very first meeting of the new Computationally-driven Solutions for Healthier Lives (ComSHeL) Study Group. The study group brings together members from iTHEMS' Fundamental Division, the ECL Mathematical Genomics Unit, and Teams from iTHEMS' Applied Math Division (Medical Science Data-driven Math Team and Medical Science Deep Learning Team). The goal of this first meeting will be to discuss and decide on the format for this monthly study group, and to get to know each other (each member introducing their research briefly). I hope you can take the time to join us.
Venue: Hybrid Format (3F #359 and Zoom), Seminar Room #359
Event Official Language: English