Exploring Clinical Knowledge to Enhance Deep Learning Models for Medical Report Generation

Publication Type: Thesis
Issue Date: 2023
Automatic generation of long and coherent medical reports from given medical images (e.g., chest X-ray and Fundus Fluorescein Angiography (FFA)) has great potential to support clinical practice. Researchers have explored advanced methods from computer vision and natural language processing, especially deep learning, to generate readable medical reports. However, when writing a report, experts make inferences using prior clinical knowledge. Not surprisingly, existing methods that lack sufficient medical knowledge struggle to match the performance achieved in generic image captioning, since even researchers without a medical background cannot fully understand such images. This thesis therefore investigates how to explore clinical knowledge to enhance deep learning models for automatic report generation. The thesis first explores knowledge by mimicking radiologists' working patterns and uses such knowledge to guide an encoder-decoder framework to generate accurate reports. Since medical decisions may have life-or-death consequences, a reliable rationale for interpretation is also expected, along with accurate prediction. However, existing medical report generation (MRG) benchmarks lack both explainable annotations and reliable evaluation tools, which also hinders current research. The thesis then proposes an explainable and reliable MRG benchmark based on FFA Images and Reports (FFA-IR). Based on FFA-IR, the thesis extracts structural information from clinically recorded reports and explores such clinical knowledge to enhance a cross-modal Transformer for ophthalmic report generation along with the corresponding disease diagnosis. Finally, to unlock the potential of backbone networks, the thesis explores clinical knowledge to enhance the pretraining process and thereby improve the quality of predicted reports.
To validate the proposed approaches and components, extensive experiments are also conducted on various downstream tasks, such as disease classification, medical visual question answering (VQA), and medical image-text retrieval.