Novel overlapping subgraph clustering for the detection of antigen epitopes

Publication Type: Journal Article
Citation: Bioinformatics, 2018, 34 (12), pp. 2061-2068
Issue Date: 2018-06-15
© The Author(s) 2018.

Motivation: Antigens that contain overlapping epitopes have occasionally been reported. Because current algorithms mainly take a one-antigen-one-epitope approach to epitope prediction, they cannot accurately detect these multiple, overlapping epitopes, or even the multiple, separated epitopes found in some other antigens.

Results: We introduce a novel subgraph clustering algorithm for more accurate detection of epitopes. The algorithm takes graph partitions as seeds and expands them, merging overlapping subgraphs according to a similarity computed from term frequency-inverse document frequency (TF-IDF) features. Each merged subgraph is then classified as an epitope or a non-epitope. We tested the algorithm on three newly collected datasets of antigens: in the first, each antigen contains only a single epitope; in the second, each antigen contains only multiple, separated epitopes; and in the third, each antigen contains overlapping epitopes. The prediction performance of our algorithm is significantly better than that of state-of-the-art methods. The lifts in averaged F-score over the best existing methods are 60%, 75% and 22% for single-epitope detection, detection of multiple separated epitopes and detection of overlapping epitopes, respectively.

Availability and implementation: The source code is available at github.com/lzhlab/glep/.

Supplementary information: Supplementary data are available at Bioinformatics online.
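The seed-expansion and merging step summarized in the abstract can be illustrated with a minimal sketch. It is not the GLEP implementation. It assumes that each vertex of an antigen surface graph carries one categorical label treated as a "term", that each seed is expanded by its one-hop neighbours, and that two overlapping subgraphs are merged when the cosine similarity of their TF-IDF vectors reaches a threshold. The function names (`tfidf_vectors`, `cosine`, `expand_and_merge`), the smoothed IDF formula, the toy labels and the threshold values are all hypothetical choices, and the final epitope/non-epitope classification is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document (list of terms)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # guard against an empty subgraph
        # Smoothed IDF (an assumed variant): terms present in every doc keep weight 1.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_and_merge(seeds, adjacency, labels, threshold=0.6):
    """Hypothetical seed expansion followed by similarity-based merging."""
    # 1. Grow each seed by its one-hop neighbours, so neighbouring seeds overlap.
    subgraphs = []
    for seed in seeds:
        grown = set(seed)
        for v in seed:
            grown |= adjacency.get(v, set())
        subgraphs.append(grown)
    # 2. Merge one overlapping, sufficiently similar pair at a time, then
    #    recompute TF-IDF on the new collection until no pair qualifies.
    merged = True
    while merged:
        merged = False
        docs = [[labels[v] for v in sorted(g)] for g in subgraphs]
        vecs = tfidf_vectors(docs)
        for i in range(len(subgraphs)):
            for j in range(i + 1, len(subgraphs)):
                if subgraphs[i] & subgraphs[j] and cosine(vecs[i], vecs[j]) >= threshold:
                    other = subgraphs.pop(j)  # j > i, so index i is unaffected
                    subgraphs[i] |= other
                    merged = True
                    break
            if merged:
                break
    return subgraphs


# Toy example: a path graph 0-1-2-3-4-5 with hypothetical residue labels.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
labels = {0: "K", 1: "R", 2: "K", 3: "R", 4: "L", 5: "L"}
seeds = [{0, 1}, {2, 3}, {4, 5}]  # stand-ins for graph-partition seeds

result = expand_and_merge(seeds, adjacency, labels, threshold=0.6)
print([sorted(g) for g in result])  # → [[0, 1, 2, 3, 4], [3, 4, 5]]
```

In this toy run the first two expanded seeds are merged, while the remaining pair shares vertices 3 and 4 but falls below the threshold, so two overlapping subgraphs survive. Lowering the threshold to 0.4 in the same example merges everything into a single subgraph. The threshold therefore controls how readily overlapping candidates collapse into one.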