ViLT
Title: ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. Authors: Wonjae Kim, Bokyung Son, Ildoo Kim. Published: ICML 2021. Official code. The first to do away with …
Tag
3 articles tagged "Multimodal Learning".
Title: ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. Authors: Wonjae Kim, Bokyung Son, Ildoo Kim. Published: ICML 2021. Official code. The first to do away with …
Title: Learning Transferable Visual Models From Natural Language Supervision. Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini …
Title: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. Authors: Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, …