Siamese network based features fusion for adaptive visual tracking

Publication Type:
Conference Proceeding
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, 11012 LNAI pp. 759-771
Issue Date:
Filename: camera-ready-2018.pdf
Description: Published version
Size: 2.16 MB
Format: Adobe PDF
© Springer Nature Switzerland AG 2018.

Visual object tracking is a popular but challenging problem in computer vision. The main challenge is the lack of a priori knowledge about the tracking target, which may be specified only by a bounding box given in the first frame. In addition, tracking is affected by many factors, such as scale variation, deformation, partial occlusion and motion blur. Addressing these difficulties requires a tracking framework that can adapt to different tracking scenarios. This paper presents a novel approach for robust visual object tracking that fuses multiple features in a Siamese network. Hand-crafted appearance features and CNN features are combined so that each compensates for the other's weaknesses while their respective strengths are reinforced. The proposed network operates in three steps. First, different types of features are extracted from the tracking frames. Second, each type of feature is used separately with a correlation filter to learn a corresponding template, and each template generates its own response map. Finally, the response maps are fused into a single, more reliable response map, which helps to locate the target more accurately. Comprehensive experiments are conducted on three benchmarks: Temple-Color, OTB50 and UAV123. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance on these benchmarks.
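The three-step pipeline described in the abstract (a correlation filter learned per feature type, one response map per filter, then fusion of the maps) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several parts are assumptions chosen for illustration: the MOSSE-style single-channel filter, the two stand-in features (zero-mean intensity and a circular gradient magnitude, used in place of the paper's hand-crafted and CNN features), min-max normalisation, and fixed equal fusion weights. The Siamese network, cosine windowing, online template updates and scale estimation are all omitted. All function names are hypothetical.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired filter output: a 2-D Gaussian peaked at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2.0 * sigma ** 2))


def train_filter(feat, label, lam=1e-2):
    """Learn one single-channel correlation filter in closed form (MOSSE-style
    ridge regression in the Fourier domain). Returns conj(H) in frequency space."""
    F = np.fft.fft2(feat)
    G = np.fft.fft2(label)
    return G * np.conj(F) / (F * np.conj(F) + lam)


def respond(h_conj, feat):
    """Apply a learned filter to a search-region feature map (circular correlation)."""
    return np.real(np.fft.ifft2(np.fft.fft2(feat) * h_conj))


def normalise(resp):
    """Min-max scale a response map to [0, 1] so maps from different features are comparable."""
    return (resp - resp.min()) / (resp.max() - resp.min() + 1e-12)


# Hypothetical stand-in features (NOT the paper's hand-crafted/CNN features).
# Circular differences make a shifted patch yield an exactly shifted feature map.
def feat_intensity(patch):
    return patch - patch.mean()


def feat_gradient(patch):
    gx = np.roll(patch, -1, axis=1) - np.roll(patch, 1, axis=1)
    gy = np.roll(patch, -1, axis=0) - np.roll(patch, 1, axis=0)
    return np.hypot(gx, gy)


FEATURES = [feat_intensity, feat_gradient]
WEIGHTS = [0.5, 0.5]  # assumed fixed fusion weights


def track_step(template_patch, search_patch):
    """Return the (dy, dx) target displacement at the peak of the fused response map."""
    label = gaussian_label(template_patch.shape)
    fused = np.zeros(template_patch.shape)
    for extract, weight in zip(FEATURES, WEIGHTS):
        # Step 2: learn a per-feature template, then compute its response map.
        h_conj = train_filter(extract(template_patch), label)
        fused += weight * normalise(respond(h_conj, extract(search_patch)))
    # Step 3: the fused map's peak gives the new target location.
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    h, w = fused.shape
    return int(peak[0] - h // 2), int(peak[1] - w // 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.random((32, 32))
    search = np.roll(template, shift=(3, -2), axis=(0, 1))  # target moved 3 down, 2 left
    print(track_step(template, search))  # prints (3, -2)
```

Each feature produces its own response map, and the maps are normalised before they are summed. This keeps a feature with a larger numeric range from dominating the fused map. An adaptive weighting scheme could replace the fixed `WEIGHTS` without changing the rest of the sketch.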