A novel tree pattern-based violence detection model using audio signals

Yildiz, AM; Barua, PD; Dogan, S; Baygin, M; Tuncer, T; Ooi, CP; Fujita, H; Rajendra Acharya, U



Publisher: PERGAMON-ELSEVIER SCIENCE LTD
Publication Type: Journal Article
Citation: Expert Systems with Applications, 2023, 224
Issue Date: 2023-08-15

Closed Access

	Filename	Description	Size	Format
	1-s2.0-S095741742300533X-main.pdf	Published version	2.55 MB	Adobe PDF


This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yildiz, AM
dc.contributor.author	Barua, PD
dc.contributor.author	Dogan, S
dc.contributor.author	Baygin, M
dc.contributor.author	Tuncer, T
dc.contributor.author	Ooi, CP
dc.contributor.author	Fujita, H
dc.contributor.author	Rajendra Acharya, U
dc.date.accessioned	2024-03-19T07:37:22Z
dc.date.available	2024-03-19T07:37:22Z
dc.date.issued	2023-08-15
dc.identifier.citation	Expert Systems with Applications, 2023, 224
dc.identifier.issn	0957-4174
dc.identifier.issn	1873-6793
dc.identifier.uri	http://hdl.handle.net/10453/176926
dc.description.abstract	Physical violence detection using multimedia data is crucial for public safety and security, and it is an important research area in information security and digital forensics. Research in video-based violence detection (VVD) has grown steadily in recent years with the rapid increase in video surveillance systems worldwide. Verbal aggression detection technologies, on the other hand, remain limited: because computer vision models are popular, researchers have preferred to detect violence in videos. We present a new automatic audio violence detection (AVD) model to fill this gap. Our AVD model is handcrafted, and its details are as follows. For this work, we collected a new audio dataset of verbal aggression from YouTube. The proposed handcrafted model comprises multilevel feature extraction, feature selection, classification, and majority voting phases. A new local feature extraction function based on a binary tree was used to generate features from the audio signals. We call this function tree pattern-23 (TreePat23), where 23 is the number of input signals (wavelet bands/audio signals). The wavelet bands were generated using the tunable Q wavelet transform (TQWT) and then passed to TreePat23 for feature extraction. Iterative neighborhood component analysis (INCA) and Chi2 were used to select the features. The selected features were classified using k-nearest neighbors (kNN) and support vector machine (SVM) classifiers, and the iterative majority voting (IMV) method was then applied. The best predicted vector was selected using a greedy algorithm. Finally, a new validation technique, leave-one-record-out (LORO) cross-validation (CV), was used to validate the results. Our proposed TreePat23 model attained classification accuracies of 89.68% and 89.75% with kNN and SVM, respectively. The system generated 14 results for each classifier and automatically selected the best one.
Hence, this is a self-organized audio classification model, which yielded over 89% classification accuracy for both classifiers under the LORO CV strategy.
dc.language	English
dc.publisher	PERGAMON-ELSEVIER SCIENCE LTD
dc.relation.ispartof	Expert Systems with Applications
dc.relation.isbasedon	10.1016/j.eswa.2023.120031
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	01 Mathematical Sciences, 08 Information and Computing Sciences, 09 Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	A novel tree pattern-based violence detection model using audio signals
dc.type	Journal Article
utslib.citation.volume	224
utslib.for	01 Mathematical Sciences
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Civil and Environmental Engineering
utslib.copyright.status	closed_access	*
dc.date.updated	2024-03-19T07:37:20Z
pubs.publication-status	Published
pubs.volume	224
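
The abstract names leave-one-record-out (LORO) cross-validation without defining it in detail. A minimal sketch follows, assuming that each audio record is cut into several segments and that each fold holds out every segment of one record, so no record contributes to both training and testing. The function names (loro_splits, knn_predict, loro_accuracy), the 1-nearest-neighbour classifier with Euclidean distance, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from math import dist  # Euclidean distance, Python 3.8+
from typing import Hashable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def loro_splits(record_ids: Sequence[Hashable]) -> Iterator[Tuple[List[int], List[int], Hashable]]:
    """Yield (train_indices, test_indices, held_out_record), one fold per record.

    Hypothetical helper: record_ids[i] names the recording that segment i came from.
    """
    by_record = defaultdict(list)
    for index, record in enumerate(record_ids):
        by_record[record].append(index)
    for record, test_idx in by_record.items():
        held_out = set(test_idx)
        # Every segment of the held-out record is excluded from training.
        train_idx = [i for i in range(len(record_ids)) if i not in held_out]
        yield train_idx, test_idx, record


def knn_predict(train_X: Sequence[Vector], train_y: Sequence[Hashable], x: Vector, k: int = 1) -> Hashable:
    """Majority label among the k training vectors closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loro_accuracy(X: Sequence[Vector], y: Sequence[Hashable], record_ids: Sequence[Hashable], k: int = 1) -> float:
    """Accuracy of kNN when the segments of one record are tested at a time."""
    correct = 0
    for train_idx, test_idx, _ in loro_splits(record_ids):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(y)


if __name__ == "__main__":
    # Toy 2-D "features" for six segments drawn from four records.
    X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.0, 0.1], [1.0, 1.1]]
    y = [0, 0, 1, 1, 0, 1]
    records = ["a", "a", "b", "b", "c", "d"]
    print(loro_accuracy(X, y, records))  # 1.0
```

Grouping folds by record rather than by segment keeps segments cut from the same recording from appearing on both sides of a split.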


Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/176926
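
The iterative majority voting (IMV) and greedy selection steps named in the abstract can be sketched as follows. This assumes a common formulation: the classifier prediction vectors are ranked by accuracy, the top k vectors are combined by element-wise majority vote for k = 3, ..., n, and the greedy step keeps whichever original or voted vector scores highest. The function names, the starting value k = 3, and the tie-breaking rule (the higher-ranked vector wins) are illustrative assumptions, not the authors' code.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Labels = Sequence[Hashable]


def accuracy(pred: Labels, truth: Labels) -> float:
    """Fraction of positions where the prediction matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(vectors: Sequence[Labels]) -> List[Hashable]:
    """Element-wise mode across prediction vectors.

    Counter.most_common orders equal counts by first occurrence, so a tie
    goes to the label from the earliest (highest-ranked) vector.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*vectors)]


def iterative_majority_voting(predictions: Sequence[Labels], truth: Labels, start: int = 3) -> List[List[Hashable]]:
    """Vote over the top-k accuracy-ranked vectors for k = start..n."""
    # sorted() is stable (also with reverse=True), so equal accuracies keep input order.
    ranked = sorted(predictions, key=lambda p: accuracy(p, truth), reverse=True)
    return [majority_vote(ranked[:k]) for k in range(start, len(ranked) + 1)]


def greedy_best(predictions: Sequence[Labels], truth: Labels, start: int = 3) -> Tuple[List[Hashable], float]:
    """Return the highest-accuracy vector among the inputs and the IMV outputs."""
    candidates = [list(p) for p in predictions]
    candidates += iterative_majority_voting(predictions, truth, start)
    best = max(candidates, key=lambda p: accuracy(p, truth))
    return best, accuracy(best, truth)


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1]
    predictions = [
        [0, 1, 1, 0, 0],  # 80% correct
        [0, 1, 0, 0, 1],  # 80% correct
        [1, 1, 1, 0, 1],  # 80% correct
        [0, 0, 1, 1, 1],  # 60% correct
    ]
    best, acc = greedy_best(predictions, truth)
    print(best, acc)  # [0, 1, 1, 0, 1] 1.0
```

Because both the ranking and the final pick are scored against the same labels, the returned accuracy is the best over all candidates rather than an independent estimate.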