hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images
- Publisher: MDPI
- Publication Type: Journal Article
- Citation: Cancers, 2023, 15(9), 2569
- Issue Date: 2023-04-30
Open Access
This item is open access.
Gene expression can be used to subtype breast cancer with improved prediction of risk of recurrence and treatment responsiveness over that obtained using routine immunohistochemistry (IHC). However, in the clinic, molecular profiling is primarily used for ER+ breast cancer, which is costly, tissue destructive, requires specialised platforms, and takes several weeks to obtain a result. Deep learning algorithms can effectively extract morphological patterns in digital histopathology images to predict molecular phenotypes quickly and cost-effectively. We propose a new, computationally efficient approach called hist2RNA, inspired by bulk RNA sequencing techniques, to predict the expression of 138 genes (incorporated from 6 commercially available molecular profiling tests), including luminal PAM50 subtype, from hematoxylin and eosin (H&E)-stained whole slide images (WSIs). The training phase involves the aggregation of extracted features for each patient from a pretrained model to predict gene expression at the patient level using annotated H&E images from The Cancer Genome Atlas (TCGA, n = 335). We demonstrate successful gene prediction on a held-out test set (n = 160, corr = 0.82 across patients, corr = 0.29 across genes) and perform exploratory analysis on an external tissue microarray (TMA) dataset (n = 498) with known IHC and survival information. Our model is able to predict gene expression and luminal PAM50 subtype (Luminal A versus Luminal B) on the TMA dataset with prognostic significance for overall survival in univariate analysis (c-index = 0.56, hazard ratio = 2.16 (95% CI 1.12–3.06), p < 5 × 10⁻³), and independent significance in multivariate analysis incorporating standard clinicopathological variables (c-index = 0.65, hazard ratio = 1.87 (95% CI 1.30–2.68), p < 5 × 10⁻³). The proposed strategy achieves superior performance while requiring less training time, resulting in less energy consumption and computational cost compared to patch-based models. Additionally, hist2RNA predicts gene expression that
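The abstract mentions aggregating extracted features per patient and reporting correlations across patients and across genes. The following is a minimal sketch of those two ideas, not the authors' implementation. Mean pooling is an assumed aggregation (the abstract does not name one), and the function names `aggregate_patient_features` and `mean_pearson` are hypothetical. The abstract also does not say which axis corresponds to "across patients" versus "across genes", so the axis is left as a parameter.

```python
import numpy as np


def aggregate_patient_features(patch_features, patient_ids):
    """Pool patch-level feature vectors into one vector per patient.

    ASSUMPTION: mean pooling; the abstract only says features are aggregated.
    patch_features: array of shape (n_patches, n_features)
    patient_ids:    sequence of length n_patches naming each patch's patient
    """
    patch_features = np.asarray(patch_features, dtype=float)
    patient_ids = np.asarray(patient_ids)
    unique_ids = np.unique(patient_ids)  # sorted unique patient identifiers
    pooled = np.stack(
        [patch_features[patient_ids == pid].mean(axis=0) for pid in unique_ids]
    )
    return unique_ids, pooled


def mean_pearson(y_true, y_pred, axis):
    """Mean Pearson correlation between two (n_patients, n_genes) matrices.

    The correlation is computed along `axis` and averaged over the other axis.
    Every slice along `axis` must be non-constant, otherwise the result is NaN.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    t = y_true - y_true.mean(axis=axis, keepdims=True)
    p = y_pred - y_pred.mean(axis=axis, keepdims=True)
    numerator = (t * p).sum(axis=axis)
    denominator = np.sqrt((t ** 2).sum(axis=axis) * (p ** 2).sum(axis=axis))
    return float(np.mean(numerator / denominator))
```

With an expression matrix laid out as patients by genes, `axis=0` correlates each gene's values over patients and `axis=1` correlates each patient's profile over genes. Which of these the reported 0.82 and 0.29 refer to should be checked against the full paper.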
Please use this identifier to cite or link to this item: