by Michael Seifert, Khalil Abou-El-Ardat, Betty Friedrich, Barbara Klink, and Andreas Deutsch
Changes in gene expression programs play a central role in the development of cancer. Deletions and duplications of chromosomal regions directly influence the expression levels of affected genes. Such local chromosomal dependencies lead to highly significant positive correlations of gene expression levels of neighboring genes. These dependencies should be utilized to improve the modeling and the analysis of individual tumor expression profiles.
We develop a novel model class of autoregressive higher-order Hidden Markov Models (HMMs) that carefully exploit local data-dependent chromosomal dependencies to improve the identification of differentially expressed genes in individual tumors. Standard first-order HMMs are limited in how well they model dependencies between genes in close chromosomal proximity. Autoregressive higher-order HMMs overcome these limitations by combining two novel model features: higher-order state transitions and autoregressive emissions. We train autoregressive higher-order HMMs with a specifically developed Bayesian Baum-Welch algorithm that allows prior knowledge on the measurement distribution of genes to be integrated. We apply autoregressive higher-order HMMs to the analysis of gene expression data from breast cancer and from different types of brain tumors, and perform in-depth model evaluation studies.
We find that autoregressive higher-order HMMs clearly improve the identification of overexpressed genes with underlying gene copy number duplications in breast cancer in comparison to mixture models, standard first- and higher-order HMMs, and other existing related methods. This performance benefit is attributed to the simultaneous use of higher-order state transitions and autoregressive emissions, and could not be reached by using either of these two features alone. We also find that autoregressive higher-order HMMs are better able than the majority of related methods to identify differentially expressed genes in tumors independently of the underlying gene copy number status. This is further supported by the identification of well-known and previously unreported hotspots of differential expression in glioblastomas, demonstrating the efficacy of autoregressive higher-order HMMs for the analysis of individual tumor expression profiles. Moreover, we reveal interesting novel details of systematic alterations of gene expression levels in known cancer signaling pathways that distinguish oligodendrogliomas, astrocytomas and glioblastomas.
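To make the two model features more concrete, the following is a minimal Python sketch of Viterbi decoding in a second-order HMM with autoregressive Gaussian emissions. It is an illustration under stated assumptions and not the implementation shipped in the ARHMM JAR file: the function names, the three-state setup (underexpressed, unchanged, overexpressed), the AR(1) form of the emission mean (state mean plus `ar` times the previous log-ratio) and all parameter values are hypothetical. The sketch also covers decoding only, not the Bayesian Baum-Welch training.

```python
import numpy as np

def log_gauss(x, mean, sd):
    """Log density of a normal distribution, evaluated element-wise."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - (x - mean) ** 2 / (2.0 * sd ** 2)

def viterbi_ar_order2(obs, log_start, log_trans1, log_trans2, means, sds, ar):
    """Most likely state path for a second-order HMM with AR(1) emissions.

    Assumed (hypothetical) emission model: the log-ratio of gene t in state s
    is normal with mean means[s] + ar * obs[t-1] and standard deviation sds[s].
    log_trans1[i, j] = log P(s_1 = j | s_0 = i)
    log_trans2[i, j, k] = log P(s_t = k | s_{t-2} = i, s_{t-1} = j)
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    n_obs = len(obs)
    if n_obs == 0:
        return []

    def emit(t):
        # The first gene has no predecessor, so its AR term is set to zero.
        prev = obs[t - 1] if t > 0 else 0.0
        return log_gauss(obs[t], means + ar * prev, sds)

    d0 = log_start + emit(0)
    if n_obs == 1:
        return [int(np.argmax(d0))]

    # delta[i, j]: best log score of any path ending in states (i, j).
    delta = d0[:, None] + log_trans1 + emit(1)[None, :]
    back = []
    for t in range(2, n_obs):
        cand = delta[:, :, None] + log_trans2        # indexed by (i, j, k)
        back.append(np.argmax(cand, axis=0))         # best i for each (j, k)
        delta = np.max(cand, axis=0) + emit(t)[None, :]

    # Backtrack from the best final state pair.
    j, k = np.unravel_index(np.argmax(delta), delta.shape)
    path = [int(j), int(k)]
    for psi in reversed(back):
        path.insert(0, int(psi[path[0], path[1]]))
    return path

# Hypothetical toy setup: 0 = underexpressed, 1 = unchanged, 2 = overexpressed.
n_states = 3
log_start = np.log(np.full(n_states, 1.0 / n_states))
log_trans1 = np.log(np.full((n_states, n_states), 0.1) + 0.7 * np.eye(n_states))
# For simplicity the second-order transitions here ignore the older state.
log_trans2 = np.broadcast_to(log_trans1, (n_states, n_states, n_states))
log_ratios = [0.1, -0.1, 2.1, 1.9, 2.0, 0.0, -2.1, -1.9]
path = viterbi_ar_order2(log_ratios, log_start, log_trans1, log_trans2,
                         means=[-2.0, 0.0, 2.0], sds=[0.5, 0.5, 0.5], ar=0.3)
print(path)
```

With `ar=0` the emissions reduce to those of a standard HMM, and when `log_trans2` ignores the older state, as in the toy setup, the model behaves like a first-order HMM. This makes it easy to see what each of the two features adds.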
The paper "Autoregressive Higher-Order Hidden Markov Models: Exploiting Local Chromosomal Dependencies in the Analysis of Tumor Expression Profiles" has been published in PLoS One.
Note: ARHMMs can also be applied to the analysis of log-ratio profiles of tumors analyzed by RNA-seq. I recently applied ARHMMs successfully to compare different glioma grades based on log-ratio profiles generated by Cuffdiff.
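As a rough illustration of what a log-ratio profile is, the sketch below computes per-gene log2 ratios from two expression vectors. It is only an assumption-laden example: the function name `log2_ratio`, the pseudocount and the sample values are hypothetical, and Cuffdiff reports its own log2 fold changes, which may be computed differently.

```python
import numpy as np

def log2_ratio(tumor, reference, pseudocount=1.0):
    """Per-gene log2 ratio of tumor versus reference expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    and damps ratios of very lowly expressed genes.
    """
    tumor = np.asarray(tumor, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.log2((tumor + pseudocount) / (reference + pseudocount))

# Hypothetical expression values for four genes in chromosomal order.
tumor_expr = [3.0, 0.0, 15.0, 7.0]
reference_expr = [1.0, 0.0, 3.0, 7.0]
profile = log2_ratio(tumor_expr, reference_expr)
print(profile)
```

Positive values indicate higher expression in the tumor than in the reference, negative values lower expression, and values near zero unchanged expression.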
- ARHMM: A ZIP file containing a JAR file for analyzing tumor gene expression data sets by ARHMMs. This file also contains the considered breast cancer and the glioma gene expression data sets.
- DSHMM: exploiting prior knowledge and gene distances in the analysis of tumor expression profiles
- SHMM: utilizing gene-pair orientations for improved analysis of ChIP-chip promoter array data
- PHHMM: improved analysis of Array-CGH data
- MeDIP-HMM: HMM-based analysis of DNA methylation profiles
- HMM Book: Hidden Markov Models with Applications in Computational Biology