📢 System Requirements: Both the official Python inference code and the ComfyUI workflow were tested on Ubuntu 20.04 with Python 3.10, PyTorch 2.5.1, and CUDA 12.1 on an NVIDIA A800 GPU. Before ...
Abstract: Automatic Audio Captioning (AAC) aims at generating natural language descriptions for audio content. However, existing methods are often affected by latent confounders and spurious ...
We present TwiFF, a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks involving ...
Andrej Karpathy announced on X that he has released a 243-line, dependency-free Python implementation that can both train and run a GPT model, presenting the full algorithmic content without external ...