Abstract: Automatically creating description sentences for images is a task that involves aligning image under-standing with natural language processing. This paper presents a model for image ...
Abstract: State-of-the-art audio captioning methods typically use the encoder-decoder structure with pretrained audio neural networks (PANNs) as encoders for feature extraction. However, the ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results