Comments (2)
Thank you for your question; it touches on one of LanguageBind's highlights. To summarize: first, a language-binding approach suits most language-based downstream tasks better, because no intermediate modality is required. Second, LanguageBind is pre-trained on the VIDAL-10M dataset, where directly aligning video, infrared, depth, and language data pairs outperforms indirectly integrating them through image-based modality data. In addition, the language modality in VIDAL-10M is a multi-view textual description enhanced by advanced models such as ChatGPT, which ensures that the central modality carries enough semantic information to bind effectively with the other modalities.
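To make the "direct alignment" point concrete, here is a minimal sketch of binding one modality straight to language. It assumes a CLIP-style symmetric contrastive (InfoNCE) objective, which this reply does not spell out. The function names, the temperature value, and the toy 2-D embeddings are all made up for illustration and are not taken from the LanguageBind codebase.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def language_binding_loss(modality_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss between one modality and language.

    Hypothetical helper: the modality is compared with text directly,
    with no image embedding acting as an intermediate step.
    """
    return 0.5 * (info_nce(modality_emb, text_emb, temperature)
                  + info_nce(text_emb, modality_emb, temperature))


# Toy embeddings (made-up numbers): row i of each list is the same sample.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
depth_aligned = [[0.9, 0.1], [0.1, 0.9]]    # depth agrees with its caption
depth_shuffled = [[0.1, 0.9], [0.9, 0.1]]   # depth paired with the wrong caption

aligned_loss = language_binding_loss(depth_aligned, text_emb)
shuffled_loss = language_binding_loss(depth_shuffled, text_emb)
```

In this toy setup the correctly paired depth embeddings give a much lower loss than the shuffled ones. Under the stated assumption, that is the signal that pulls each modality toward its language description.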
from languagebind.
Got it, thanks for your patient reply!