This paper aims at the transferability of the zero-shot captioning for out-of-domain images. As shown in this image, we demonstrate the susceptibility of pre-trained vision-language models and large ...