Resource-efficient video analytics and streaming for immersive experiences
- Publication Type: Thesis
- Issue Date: 2025
This item is open access.
Immersive media technologies, such as Extended Reality (XR) and volumetric video, have gained significant attention from academia and industry. Video analytics, powered by artificial intelligence, plays an essential role in immersive applications: it facilitates real-time object detection, tracking, and scene understanding, thereby enabling interactive experiences.
Real-time volumetric video streaming has emerged as a critical application in immersive media. It offers six Degrees of Freedom (6DoF), allowing users to explore the content freely by changing their position (X, Y, Z) and viewing direction (yaw, pitch, roll).
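For concreteness, a 6DoF viewport is fully described by a position and an orientation. The minimal Python sketch below (names and units are illustrative, not taken from the thesis) shows one such representation.

```python
from dataclasses import dataclass

@dataclass
class Viewport6DoF:
    """A 6DoF viewport: three translational and three rotational degrees of freedom."""
    x: float      # position along X (metres)
    y: float      # position along Y (metres)
    z: float      # position along Z (metres)
    yaw: float    # rotation about the vertical axis (degrees)
    pitch: float  # rotation about the lateral axis (degrees)
    roll: float   # rotation about the viewing axis (degrees)

# Example: a user standing 1.6 m above the origin, looking 30 degrees to the left.
viewport = Viewport6DoF(x=0.0, y=1.6, z=0.0, yaw=30.0, pitch=0.0, roll=0.0)
```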
Despite their promise, immersive media applications demand extensive computation and bandwidth resources, often exceeding the capabilities of current infrastructures. This thesis addresses these challenges by proposing dynamic resource allocation strategies and efficient streaming schemes to provide high-quality, low-latency real-time analytics and video streaming for immersive media. The proposed solutions aim to optimize video analytics and volumetric video streaming within limited resources, ensuring a seamless and responsive user experience.
First, the thesis introduces a dual-image Field Programmable Gate Array (FPGA) solution for video analytics pipelines to address the limitation of fixed computing resources. This design allows flexible resource allocation between CPUs and GPUs by switching between different FPGA images. Novel algorithms for FPGA resource allocation and video analytics configuration selection are developed to optimize accuracy in both single- and multi-camera scenarios.
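To illustrate the kind of decision involved, the sketch below jointly picks an FPGA image (e.g., one offloading CPU work, the other GPU work) and an analytics configuration so that expected accuracy is maximized within the available resources. All names, configurations, and accuracy/cost figures are assumptions for illustration, not the thesis' actual algorithm.

```python
from itertools import product

# Hypothetical FPGA images and analytics configurations (illustrative values only).
FPGA_IMAGES = {
    "cpu_offload": {"cpu_saving": 0.4, "gpu_saving": 0.0},
    "gpu_offload": {"cpu_saving": 0.0, "gpu_saving": 0.5},
}
CONFIGS = [
    {"resolution": 1080, "fps": 30, "accuracy": 0.92, "cpu": 0.9, "gpu": 0.8},
    {"resolution": 720,  "fps": 30, "accuracy": 0.85, "cpu": 0.6, "gpu": 0.5},
    {"resolution": 720,  "fps": 15, "accuracy": 0.78, "cpu": 0.4, "gpu": 0.3},
]

def select(cpu_budget: float, gpu_budget: float):
    """Exhaustively pick the (FPGA image, configuration) pair with the highest
    accuracy whose CPU/GPU demand, after FPGA offloading, fits the budgets."""
    best = None
    for (name, image), cfg in product(FPGA_IMAGES.items(), CONFIGS):
        cpu_need = cfg["cpu"] * (1 - image["cpu_saving"])
        gpu_need = cfg["gpu"] * (1 - image["gpu_saving"])
        if cpu_need <= cpu_budget and gpu_need <= gpu_budget:
            if best is None or cfg["accuracy"] > best[2]:
                best = (name, cfg, cfg["accuracy"])
    return best

print(select(cpu_budget=0.5, gpu_budget=0.6))
```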
Second, the thesis proposes a Progressive Layer-based Volumetric Video Streaming (PLVS) framework to cope with limited bandwidth and network dynamics by prefetching future frames when the current bandwidth is sufficient. To further improve network utilization, volumetric videos are partitioned into several layers of varying quality, allowing progressive refinement and efficient retrieval during streaming. The PLVS framework is applied to two practical scenarios, depending on whether frames may be skipped, enabling adaptive quality refinement and more efficient prefetching in diverse streaming environments.
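A minimal sketch of the prefetching idea, assuming a simple greedy policy: base layers of the frames in the window are fetched first to keep playback running, and any leftover bandwidth is spent on enhancement layers for progressive refinement. Function names, data layout, and sizes are illustrative assumptions.

```python
def schedule_downloads(frames, bandwidth_budget):
    """Greedy layer scheduling: fetch the base layer of every frame in the window
    first, then spend leftover bandwidth on enhancement layers.

    `frames` is a list of dicts: {"base": size, "enhancements": [sizes...]}.
    Returns a list of (frame_index, layer_index) download decisions.
    """
    plan, remaining = [], bandwidth_budget
    # Pass 1: base layers guarantee a playable (if coarse) version of each frame.
    for i, f in enumerate(frames):
        if f["base"] <= remaining:
            plan.append((i, 0))
            remaining -= f["base"]
    # Pass 2: progressive refinement with whatever bandwidth is left.
    for i, f in enumerate(frames):
        for j, size in enumerate(f["enhancements"], start=1):
            if size <= remaining:
                plan.append((i, j))
                remaining -= size
    return plan

frames = [{"base": 2.0, "enhancements": [1.5, 1.0]},
          {"base": 2.0, "enhancements": [1.5, 1.0]}]
print(schedule_downloads(frames, bandwidth_budget=7.0))
```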
Third, the thesis develops a Hierarchical Reinforcement Learning (HRL)-based streaming scheme to balance computing and bandwidth resources in tile-based volumetric video streaming. Tiling and culling techniques reduce the transmitted data size by eliminating invisible tiles, but they introduce extra computation overhead. Culling performance and compression efficiency depend heavily on tile size, and the optimal tile size varies with users' viewports and dynamic bandwidth conditions. The proposed scheme therefore jointly selects tile sizes and allocates quality, maintaining a balance between computing and bandwidth resources.
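The culling step removes tiles that fall outside the user's view before transmission. The sketch below is a deliberately simplified visibility test on the ground plane (ignoring pitch and occlusion); the geometry, field of view, and names are assumptions meant only to illustrate viewport-dependent culling.

```python
import math

def visible_tiles(tiles, cam_pos, cam_yaw_deg, fov_deg=90.0):
    """Keep only tiles whose centre lies within the horizontal field of view.

    `tiles` is a list of (x, z) tile centres; `cam_pos` is the camera (x, z);
    `cam_yaw_deg` is the viewing direction (0 degrees looks along +Z).
    """
    half_fov = math.radians(fov_deg) / 2.0
    yaw = math.radians(cam_yaw_deg)
    kept = []
    for (tx, tz) in tiles:
        angle = math.atan2(tx - cam_pos[0], tz - cam_pos[1])
        # Smallest signed angular difference between tile direction and view direction.
        diff = math.atan2(math.sin(angle - yaw), math.cos(angle - yaw))
        if abs(diff) <= half_fov:
            kept.append((tx, tz))
    return kept

tiles = [(0.0, 5.0), (5.0, 0.0), (-5.0, 0.0), (0.0, -5.0)]
print(visible_tiles(tiles, cam_pos=(0.0, 0.0), cam_yaw_deg=0.0))  # -> [(0.0, 5.0)]
```

In practice, smaller tiles allow finer culling but compress less efficiently and cost more computation, which is exactly the trade-off the HRL scheme navigates.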