Data mining methods for conformational B-cell epitope prediction

Ren, Jing

Publication Type: Thesis
Issue Date: 2016

Closed Access

	Filename	Description	Size	Format
	01front.pdf	contents and abstract	178.39 kB	Adobe PDF
	02whole.pdf	thesis	7.21 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Ren, Jing
dc.date.accessioned	2016-11-18T00:58:24Z
dc.date.available	2016-11-18T00:58:24Z
dc.date.issued	2016
dc.identifier.uri	http://hdl.handle.net/10453/62388
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.description	NO FULL TEXT AVAILABLE. This thesis contains 3rd party copyright material. The hardcopy may be available for consultation at the UTS Library.
dc.description.abstract	Antigen-antibody binding is an essential immune mechanism. As the binding site on the antigen side, the B-cell epitope plays a fundamental role in immune recognition and provides an ideal target for vaccine development, disease diagnosis and immunotherapy. More than 90% of B-cell epitopes are conformational. However, the high complexity of wet-lab experiments is a primary obstacle to identifying conformational B-cell epitopes, and current prediction methods, which ignore some vital issues, typically have limited performance. To tackle these performance-affecting issues, this study designs and implements a series of data mining methods for conformational B-cell epitope prediction from both antigen structure and antigen sequence. The study also proposes a practical propensity that can be used to recognise conserved B-cell epitopes effectively.
A major contribution of this thesis is the construction of more accurate structure-based prediction models. A serious problem of existing structure-based approaches is that they conventionally build their prediction models on antigens isolated directly from antigen-antibody bound structures (i.e., quaternary structures). Such antigens already carry information about the binding site, such as shape and B-factor, that gives the model an unfair advantage; this is recognised as one of the primary causes of unsatisfactory performance. To deal with this issue, this study develops a new prediction method, CeePre, based on unbound antigen structures (i.e., tertiary structures). Additionally, this work applies the B-factor derived from the tertiary structure and shows its effectiveness through propensity analysis. Based on the principle of antigenic residue aggregation, a second learning step is deployed to further refine the results.
A second key issue that inhibits performance improvements is the incomplete annotation of the data sets. There can be multiple epitopes on one antigen; nevertheless, in most cases only a portion of the epitopes have been determined or annotated. This situation is particularly obvious in previous methods based on bound structures. They conventionally label only one epitope for each antigen, and all unselected or undetermined epitopes are labelled as non-epitope, which biases epitope prediction. A novel positive-unlabelled learning method is proposed to handle this issue and is applied to conformational B-cell epitope prediction. With manually labelled species, a species-specific analysis is performed on several propensities. This analysis reaches an important conclusion: similar trends between epitope and surface exist in different species, which implies that general predictors can work for all species; however, the details vary, so refinement using species information may help to enhance prediction performance.
Another primary contribution of this thesis is an accurate prediction model built from antigen sequences. The purpose is to overcome the main drawback of structure-based methods: fewer antigen structures are available than antigen sequences. In addition, this approach addresses the common problem of data heterogeneity by proposing a staged heterogeneity learning framework, which learns both the characteristics and the heterogeneity of the data in a phased manner. The framework is applied to build a sequence-based conformational B-cell epitope prediction model, which achieves excellent performance on heterogeneous data sources. Furthermore, an algorithm is designed to cluster the predicted individual antigenic residues into conformational B-cell epitopes, which gives the predictions strong potential for real-world applications such as vaccine development.
A conserved epitope is an epitope retained by multiple strains of a virus; it is the target of a broadly neutralising antibody, and identifying conserved epitopes can help to design broad-spectrum vaccines. This thesis proposes a propensity, the Average Amino Acid Conservation Score (AAACS), to identify conserved epitopes; its effectiveness is validated on the influenza HA (hemagglutinin) antigen. All the prediction methods proposed in this thesis outperform state-of-the-art approaches. They should contribute to the recognition and application of B-cell epitopes.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Data mining methods for conformational B-cell epitope prediction	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	closed_access
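
The abstract in the record above mentions an algorithm that clusters individually predicted antigenic residues into conformational B-cell epitopes, but the record does not describe it and the full text is closed access. As a rough illustration of the general idea only, the following Python sketch groups residues whose coordinates lie within a distance cutoff of each other (single-linkage grouping with a union-find structure). The function name `cluster_residues`, the 6.0 angstrom cutoff, the minimum cluster size of 3 and the use of one coordinate per residue are all assumptions made for this example; none of them is taken from the thesis.

```python
from math import dist


def cluster_residues(coords, cutoff=6.0, min_size=3):
    """Group predicted antigenic residues into candidate epitopes.

    coords maps a residue id to one (x, y, z) coordinate in angstroms.
    cutoff and min_size are illustrative values, not taken from the thesis.
    """
    ids = sorted(coords)
    parent = {r: r for r in ids}

    def find(r):
        # Follow parent links to the group representative (with path halving).
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Merge the groups of any two residues closer than the cutoff.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    # Collect residues by representative and drop groups that are too small.
    groups = {}
    for r in ids:
        groups.setdefault(find(r), []).append(r)
    kept = [g for g in groups.values() if len(g) >= min_size]
    return sorted(kept, key=lambda g: g[0])


if __name__ == "__main__":
    # Hypothetical coordinates: two patches of three residues and one stray residue.
    example = {
        1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0), 3: (8.0, 0.0, 0.0),
        10: (100.0, 0.0, 0.0), 11: (104.0, 0.0, 0.0), 12: (108.0, 0.0, 0.0),
        50: (50.0, 50.0, 50.0),
    }
    print(cluster_residues(example))
```

Single-linkage grouping is used here only because it is simple to state; it lets chains of neighbouring residues join one patch even when the two ends are further apart than the cutoff, which may or may not match the behaviour of the thesis's algorithm.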

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/62388