June 3, 2021, 10:00 - 11:00 (JST)
  • via Zoom

This talk summarizes research that my team and I carried out between 2016 and 2019, when I was a postdoctoral researcher at Aalto University and the University of Helsinki in Finland. The team was an active international collaboration spanning many fields, including statistical physics, biology, computer science, and statistics.
The goal is to analyze ultra-high-dimensional, large-population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, without phenotypic data. The analysis identifies interacting networks of resistance, virulence, and core machinery genes. We developed several different approaches, which can be applied more generally to other datasets with a similar mathematical setting. I will explain methods based on statistical models [1,2] and mutual information [3], as well as a theoretical performance analysis of the statistical model [4]. Finally, I will briefly introduce a new random-matrix phenomenon that was discovered while developing statistical significance filtering [5].

*Please refer to the email to get access to the Zoom meeting room.


  1. Marcin J. Skwark, et al., “Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis”, PLOS Genetics, 13(2), e1006508, February 2017
  2. Santeri Puranen, et al., “SuperDCA for genome-wide epistasis analysis”, Microbial Genomics, 2018, 4
  3. Johan Pensar, et al., “Genome-wide epistasis and co-selection study using mutual information”, Nucleic Acids Research, Oxford University Press, 47(18), e112, October 2019
  4. Alia Abbara, et al., “Learning performance in inverse Ising problems with sparse teacher couplings”, Journal of Statistical Mechanics: Theory and Experiment (2020) 073402, July 2020
  5. Yingying Xu, et al., “Inverse finite-size scaling for high-dimensional significance analysis”, Physical Review E, 97, 062112, June 2018
