MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents

Publisher:
Association for Computational Linguistics (ACL)
Publication Type:
Conference Proceeding
Citation:
Findings of the Association for Computational Linguistics: NAACL 2025 (2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics), 2025, pp. 5830-5841
Issue Date:
2025-01-01
Understanding temporal dynamics is critical for conversational agents, enabling effective content analysis and informed decision-making. However, time-aware datasets, particularly for persona-grounded conversations, remain scarce, and the few that exist are narrow in scope and limited in complexity. To address this gap, we introduce MTPChat, a multimodal, time-aware persona dialogue dataset that integrates linguistic, visual, and temporal elements within both dialogue and persona memory. Leveraging MTPChat, we propose two time-sensitive tasks, Temporal Next Response Prediction (TNRP) and Temporal Grounding Memory Prediction (TGMP), both designed to assess a model’s ability to understand implicit temporal cues and dynamic interactions. Additionally, we present a framework featuring an adaptive temporal module that integrates multimodal streams and captures temporal dependencies. Experimental results confirm the difficulty of the tasks posed by MTPChat and demonstrate the effectiveness of our framework in multimodal time-sensitive scenarios.