Resource-efficient video analytics and streaming for immersive experiences
- Publication Type: Thesis
- Issue Date: 2025
This item is open access.
Immersive media technologies, such as Extended Reality (XR) and volumetric video, have gained significant attention from academia and industry. Video analytics, powered by artificial intelligence, plays an essential role in immersive applications: it facilitates real-time object detection, tracking, and scene understanding, thereby enabling interactive experiences.
Real-time volumetric video streaming has emerged as a critical application in immersive media. It offers six Degrees of Freedom (6DoF), allowing users to explore the content freely by changing their position (X, Y, Z) and viewing direction (yaw, pitch, roll).
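For concreteness, a 6DoF viewport is fully described by a position and an orientation. The minimal Python sketch below (names and units are illustrative, not taken from the thesis) shows one such representation.

```python
from dataclasses import dataclass

@dataclass
class Viewport6DoF:
    """A 6DoF viewport: three translational and three rotational degrees of freedom."""
    x: float      # position along X (metres)
    y: float      # position along Y (metres)
    z: float      # position along Z (metres)
    yaw: float    # rotation about the vertical axis (degrees)
    pitch: float  # rotation about the lateral axis (degrees)
    roll: float   # rotation about the viewing axis (degrees)

# Example: a user standing 1.6 m above the origin, looking 30 degrees to the left.
viewport = Viewport6DoF(x=0.0, y=1.6, z=0.0, yaw=30.0, pitch=0.0, roll=0.0)
```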
Despite their promise, immersive media applications demand extensive computation and bandwidth resources, often exceeding the capabilities of current infrastructures. This thesis addresses these challenges by proposing dynamic resource allocation strategies and efficient streaming schemes to provide high-quality, low-latency real-time analytics and video streaming for immersive media. The proposed solutions aim to optimize video analytics and volumetric video streaming within limited resources, ensuring a seamless and responsive user experience.
First, the thesis introduces a dual-image Field Programmable Gate Array (FPGA) solution for video analytics pipelines to address the limitation of fixed computing resources. This design allows flexible resource allocation between CPUs and GPUs by switching between different FPGA images. Novel algorithms for FPGA resource allocation and video analytics configuration selection are developed to optimize accuracy in both single- and multi-camera scenarios.
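To illustrate the kind of decision involved, the sketch below jointly picks an FPGA image (e.g., one offloading CPU work, the other GPU work) and an analytics configuration so that expected accuracy is maximized within the available resources. All names, configurations, and accuracy/cost figures are assumptions for illustration, not the thesis' actual algorithm.

```python
from itertools import product

# Hypothetical FPGA images and analytics configurations (illustrative values only).
FPGA_IMAGES = {
    "cpu_offload": {"cpu_saving": 0.4, "gpu_saving": 0.0},
    "gpu_offload": {"cpu_saving": 0.0, "gpu_saving": 0.5},
}
CONFIGS = [
    {"resolution": 1080, "fps": 30, "accuracy": 0.92, "cpu": 0.9, "gpu": 0.8},
    {"resolution": 720,  "fps": 30, "accuracy": 0.85, "cpu": 0.6, "gpu": 0.5},
    {"resolution": 720,  "fps": 15, "accuracy": 0.78, "cpu": 0.4, "gpu": 0.3},
]

def select(cpu_budget: float, gpu_budget: float):
    """Exhaustively pick the (FPGA image, configuration) pair with the highest
    accuracy whose CPU/GPU demand, after FPGA offloading, fits the budgets."""
    best = None
    for (name, image), cfg in product(FPGA_IMAGES.items(), CONFIGS):
        cpu_need = cfg["cpu"] * (1 - image["cpu_saving"])
        gpu_need = cfg["gpu"] * (1 - image["gpu_saving"])
        if cpu_need <= cpu_budget and gpu_need <= gpu_budget:
            if best is None or cfg["accuracy"] > best[2]:
                best = (name, cfg, cfg["accuracy"])
    return best

print(select(cpu_budget=0.5, gpu_budget=0.6))
```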
Second, the thesis proposes a Progressive Layer-based Volumetric Video Streaming (PLVS) framework to cope with limited bandwidth and network dynamics by prefetching future frames when the current bandwidth is sufficient. To further improve network utilization, volumetric videos are partitioned into several layers of varying quality, allowing progressive refinement and efficient retrieval during streaming. The PLVS framework is applied to two practical scenarios, depending on whether frames may be skipped, enabling adaptive quality refinement and more efficient prefetching in diverse streaming environments.
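A minimal sketch of the prefetching idea, assuming a simple greedy policy: base layers of the frames in the window are fetched first to keep playback running, and any leftover bandwidth is spent on enhancement layers for progressive refinement. Function names, data layout, and sizes are illustrative assumptions.

```python
def schedule_downloads(frames, bandwidth_budget):
    """Greedy layer scheduling: fetch the base layer of every frame in the window
    first, then spend leftover bandwidth on enhancement layers.

    `frames` is a list of dicts: {"base": size, "enhancements": [sizes...]}.
    Returns a list of (frame_index, layer_index) download decisions.
    """
    plan, remaining = [], bandwidth_budget
    # Pass 1: base layers guarantee a playable (if coarse) version of each frame.
    for i, f in enumerate(frames):
        if f["base"] <= remaining:
            plan.append((i, 0))
            remaining -= f["base"]
    # Pass 2: progressive refinement with whatever bandwidth is left.
    for i, f in enumerate(frames):
        for j, size in enumerate(f["enhancements"], start=1):
            if size <= remaining:
                plan.append((i, j))
                remaining -= size
    return plan

frames = [{"base": 2.0, "enhancements": [1.5, 1.0]},
          {"base": 2.0, "enhancements": [1.5, 1.0]}]
print(schedule_downloads(frames, bandwidth_budget=7.0))
```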
Third, the thesis develops a Hierarchical Reinforcement Learning (HRL)-based streaming scheme to balance computing and bandwidth resources in tile-based volumetric video streaming. Tiling and culling techniques reduce the transmitted data size by eliminating invisible tiles, but they introduce extra computation overhead. Culling performance and compression efficiency depend heavily on tile size, and the optimal tile size varies with users' viewports and dynamic bandwidth conditions. The proposed scheme therefore jointly selects tile sizes and allocates quality, maintaining a balance between computing and bandwidth resources.
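The culling step removes tiles that fall outside the user's view before transmission. The sketch below is a deliberately simplified visibility test on the ground plane (ignoring pitch and occlusion); the geometry, field of view, and names are assumptions meant only to illustrate viewport-dependent culling.

```python
import math

def visible_tiles(tiles, cam_pos, cam_yaw_deg, fov_deg=90.0):
    """Keep only tiles whose centre lies within the horizontal field of view.

    `tiles` is a list of (x, z) tile centres; `cam_pos` is the camera (x, z);
    `cam_yaw_deg` is the viewing direction (0 degrees looks along +Z).
    """
    half_fov = math.radians(fov_deg) / 2.0
    yaw = math.radians(cam_yaw_deg)
    kept = []
    for (tx, tz) in tiles:
        angle = math.atan2(tx - cam_pos[0], tz - cam_pos[1])
        # Smallest signed angular difference between tile direction and view direction.
        diff = math.atan2(math.sin(angle - yaw), math.cos(angle - yaw))
        if abs(diff) <= half_fov:
            kept.append((tx, tz))
    return kept

tiles = [(0.0, 5.0), (5.0, 0.0), (-5.0, 0.0), (0.0, -5.0)]
print(visible_tiles(tiles, cam_pos=(0.0, 0.0), cam_yaw_deg=0.0))  # -> [(0.0, 5.0)]
```

In practice, smaller tiles allow finer culling but compress less efficiently and cost more computation, which is exactly the trade-off the HRL scheme navigates.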