Comments (1)
Yes, generative tasks make a lot of sense! For now, though, we are focusing only on discriminative tasks such as classification and retrieval. We don't plan to work on video caption generation at the moment, because the key to that task is the data, and we have already demonstrated the validity of our data.
Our dataset draws on many sources. As stated in the paper, different data sources can be used for different downstream tasks. The generated data could perhaps be used to produce more semantically correct captions, since recent work such as BLIP has observed that generated captions benefit the model. By the way, in our internal experiments, mixing multiple generated data sources gave better results.
Also, because the data from multiple modalities are all aligned to language, trying multi-modal interactions might be a good direction!