Abstract: Automatically creating description sentences for images is a task that involves aligning image understanding with natural language processing. This paper presents a model for image ...
Abstract: State-of-the-art audio captioning methods typically use the encoder-decoder structure with pretrained audio neural networks (PANNs) as encoders for feature extraction. However, the ...