Abstract: Text-to-video (T2V) generative models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous ...
Semantic Entity Alignment and Non-Corresponding Reasoning for Text-to-Image Person Re-identification
Abstract: With the rapid development of intelligent surveillance technology, the massive amount of multimodal data (e.g., videos, images, and text) has imposed higher demands on efficient information ...