BLIP2: BLIP with frozen image encoders and LLMs

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. BLIP-2 is a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer (Q-Former), which is pretrained in two stages. The first stage bootstraps vision-language representation learning from a frozen image encoder. The second stage bootstraps vision-to-language generative learning from a frozen language model. BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods. For example, BLIP-2 outperforms Flamingo80B by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters. BLIP-2 also has emerging capabilities of zero-shot image-to-text generation that can follow natural language instructions.

In this video, I will cover the following:
- What can the BLIP-2 model do?
- How is the BLIP-2 model pretrained?
- How does the BLIP-2 model perform?

For more details, please see https://arxiv.org/pdf/2301.12597.pdf and https://github.com/salesforce/LAVIS/t...

Li, Junnan, Dongxu Li, Silvio Savarese, and Steven Hoi. "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models." arXiv preprint arXiv:2301.12597 (2023).
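The core idea above, a small trainable module bridging a frozen vision encoder and a frozen LLM, can be sketched in a few lines of PyTorch. This is a loose, hypothetical illustration and not the paper's implementation: the real Q-Former is a BERT-sized transformer with 32 learned queries, alternating self-attention and cross-attention layers, and stage-specific objectives (image-text contrastive, image-text matching, and image-grounded generation in stage one; a language-modeling loss through the frozen LLM in stage two). All names and dimensions below are assumptions chosen for illustration.

    import torch
    import torch.nn as nn

    class QFormerSketch(nn.Module):
        """Hypothetical, simplified Q-Former: a fixed set of learned query
        embeddings cross-attends to frozen image features, the queries
        interact via self-attention, and the outputs are projected into the
        frozen LLM's input embedding space ("soft visual prompts")."""

        def __init__(self, num_queries=32, dim=768, llm_dim=2560):
            super().__init__()
            # Learned query embeddings (the only image-side trainable state).
            self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
            # Cross-attention: queries attend to frozen image-encoder features.
            self.cross_attn = nn.MultiheadAttention(dim, num_heads=12, batch_first=True)
            # Self-attention block: queries exchange information with each other.
            self.self_attn = nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)
            # Linear projection into the LLM's embedding dimension.
            self.to_llm = nn.Linear(dim, llm_dim)

        def forward(self, image_feats):
            # image_feats: (B, N_patches, dim) from a frozen image encoder (e.g. a ViT).
            b = image_feats.size(0)
            q = self.queries.unsqueeze(0).expand(b, -1, -1)
            q, _ = self.cross_attn(q, image_feats, image_feats)  # queries read the image
            q = self.self_attn(q)
            return self.to_llm(q)  # (B, num_queries, llm_dim)

    # Only the Q-Former parameters receive gradients; the image encoder and
    # the LLM stay frozen throughout both pre-training stages.
    frozen_vit_out = torch.randn(2, 257, 768)   # stand-in for frozen ViT features
    soft_prompts = QFormerSketch()(frozen_vit_out)
    print(soft_prompts.shape)                   # torch.Size([2, 32, 2560])

The design point this sketch tries to capture is that gradients flow only into the query embeddings, the Q-Former layers, and the output projection; the frozen image encoder and frozen LLM keep their pre-trained weights, which is what makes BLIP-2's pre-training cheap relative to end-to-end methods.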
