Topic: vision-language
Something interesting about vision-language
vision-language,Official implementation of SEED-LLaMA (ICLR 2024).
Organization: ailab-cvc
Home Page: https://ailab-cvc.github.io/seed
vision-language,Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
User: airaria
Home Page: https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca
vision-language,[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.
User: algolzw
Home Page: https://algolzw.github.io/daclip-uir
vision-language,A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Organization: alibabaresearch
vision-language,MixGen: A New Multi-Modal Data Augmentation
Organization: amazon-science
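The MixGen entry above describes an augmentation whose core idea is easy to sketch: a new image-text pair is formed by linearly interpolating two images and concatenating their captions. A minimal sketch under that reading of the paper; the helper name and fixed mixing weight are illustrative, not the repo's API:

```python
# Minimal MixGen-style augmentation sketch: mix images, concatenate captions.
# `mixgen` is a hypothetical helper, not code from the amazon-science repo.
from typing import List, Tuple
import torch

def mixgen(images: torch.Tensor, texts: List[str], lam: float = 0.5) -> Tuple[torch.Tensor, List[str]]:
    """images: (B, C, H, W) batch; texts: B captions. Returns an augmented batch."""
    perm = torch.randperm(images.size(0))
    # Image side: plain mixup with weight lam.
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    # Text side: concatenate each caption with its mixing partner's caption.
    mixed_texts = [t + " " + texts[j] for t, j in zip(texts, perm.tolist())]
    return mixed_images, mixed_texts
```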
vision-language,[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"
Organization: astra-vision
Home Page: https://astra-vision.github.io/PODA/
vision-language,🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
User: chendelong1999
Home Page: https://arxiv.org/abs/2306.11029
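The released RemoteCLIP checkpoints are reported to load into standard open_clip models; a zero-shot scene-classification sketch under that assumption (the checkpoint path and image are placeholders):

```python
# Sketch assuming RemoteCLIP weights drop into an open_clip ViT-B-32; verify
# the checkpoint name and loading procedure against the repo's README.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32")
model.load_state_dict(torch.load("RemoteCLIP-ViT-B-32.pt", map_location="cpu"))  # placeholder path
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("satellite.jpg")).unsqueeze(0)  # placeholder image
texts = tokenizer(["an airport", "a forest", "a residential area"])
with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)  # per-class scores
```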
vision-language,CLIPort: What and Where Pathways for Robotic Manipulation
User: cliport
Home Page: https://cliport.github.io
vision-language,PyTorch implementation of MCM (Delving into out-of-distribution detection with vision-language representations), NeurIPS 2022
Organization: deeplearning-wisc
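The MCM entry above scores inputs by the maximum softmax over temperature-scaled cosine similarities between the image embedding and the class-prompt text embeddings; low confidence flags out-of-distribution inputs. A minimal sketch of that score, with tensor shapes assumed and no code taken from the repo:

```python
# Sketch of the MCM (maximum concept matching) OOD score.
import torch
import torch.nn.functional as F

def mcm_score(image_feat: torch.Tensor, text_feats: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """image_feat: (D,) image embedding; text_feats: (K, D) class-prompt embeddings.
    Returns the MCM confidence; low values suggest out-of-distribution inputs."""
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = text_feats @ image_feat                 # (K,) cosine similarities
    probs = torch.softmax(sims / tau, dim=-1)      # temperature-scaled softmax
    return probs.max()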
vision-language,A Framework of Small-scale Large Multimodal Models
Organization: dlcv-buaa
Home Page: https://arxiv.org/abs/2402.14289
vision-language,NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
User: doc-doc
vision-language,Official repository for the ICCV 2023 paper: "Waffling around for Performance: Visual Classification with Random Words and Broad Concepts"
Organization: explainableml
vision-language,[CVPR 2023] Official repository of paper titled "CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search".
User: fahadshamshad
Home Page: https://fahadshamshad.github.io/Clip2Protect/
vision-language,Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
Organization: google-research
vision-language,[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
User: henghuiding
vision-language,PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
User: howard-hou
vision-language,[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
User: huanglizi
vision-language,Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Organization: idea-research
Home Page: https://arxiv.org/abs/2303.05499
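Grounding DINO detects objects from a free-text prompt; a sketch along the lines of the repo's documented inference helpers, with placeholder paths, image, and thresholds (check the README for current names):

```python
# Open-set detection sketch using Grounding DINO's inference utilities.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # model config (placeholder path)
    "weights/groundingdino_swint_ogc.pth",              # pretrained weights (placeholder path)
)
image_source, image = load_image("photo.jpg")           # placeholder image
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="a cat . a remote control .",  # free-text categories, '.'-separated
    box_threshold=0.35,
    text_threshold=0.25,
)
```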
vision-language,Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).
User: ivonajdenkoska
vision-language,Overview of Japanese LLMs (日本語LLMまとめ)
Organization: llm-jp
Home Page: https://llm-jp.github.io/awesome-japanese-llm
vision-language,Third-party implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".
User: longzw1997
vision-language,Hierarchical Universal Language Conditioned Policies
User: lukashermann
Home Page: http://hulc.cs.uni-freiburg.de
vision-language,Unified embedding generation and search engine. Also available as a cloud service at cloud.marqo.ai.
Organization: marqo-ai
Home Page: https://www.marqo.ai/
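Marqo exposes its index through a small Python client; a minimal indexing-and-search sketch, assuming a locally running instance (index name, model, and fields are illustrative, and client signatures vary across versions):

```python
# Sketch against the Marqo Python client; run a local Marqo server first.
import marqo

mq = marqo.Client(url="http://localhost:8882")  # default local endpoint
mq.create_index("movies", model="open_clip/ViT-B-32/laion2b_s34b_b79k")
mq.index("movies").add_documents(
    [{"Title": "The Travels of Marco Polo", "Description": "A 13th-century travelogue"}],
    tensor_fields=["Description"],              # fields embedded for vector search
)
results = mq.index("movies").search(q="journeys across Asia")
print(results["hits"][0]["Title"])              # best-matching document
```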
vision-language,[ICCV 2023] Official repo of "BEVBert: Multimodal Map Pre-training for Language-guided Navigation"
User: marsaki
vision-language,"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Organization: mbzuai-oryx
Home Page: https://mbzuai-oryx.github.io/Video-ChatGPT
vision-language,💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain. (CVPR2021)
User: mczhuge
vision-language,CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
User: mees
Home Page: http://calvin.cs.uni-freiburg.de
vision-language,Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
User: mertyg
vision-language,Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
User: mikewangwzhl
vision-language,Tools for movie and video research
Organization: movienet
Home Page: http://movienet.github.io
vision-language,Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models".
User: muzairkhattak
Home Page: https://muzairkhattak.github.io/ProText/
vision-language,Chinese version of CLIP, supporting Chinese cross-modal retrieval and representation generation.
Organization: ofa-sys
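Chinese-CLIP ships as the cn_clip package with a CLIP-like interface; a minimal zero-shot matching sketch, assuming that packaging (model name and inputs are placeholders):

```python
# Zero-shot image-text matching sketch with the cn_clip package.
import torch
from PIL import Image
import cn_clip.clip as clip
from cn_clip.clip import load_from_name

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_from_name("ViT-B-16", device=device)
model.eval()

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)  # placeholder image
texts = clip.tokenize(["一只猫", "一只狗"]).to(device)  # "a cat", "a dog"
with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feats = model.encode_text(texts)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feats /= text_feats.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feats.T).softmax(dim=-1)  # per-caption scores
```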
vision-language,Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Organization: ofa-sys
vision-language,A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Organization: ofa-sys
vision-language,DriveLM: Driving with Graph Visual Question Answering
Organization: opendrivelab
Home Page: https://opendrivelab.com/DriveLM/
vision-language,[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
User: qiantianwen
vision-language,PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Organization: salesforce
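BLIP is also packaged in Salesforce's LAVIS library; a minimal image-captioning sketch assuming that interface rather than this repo's training code (model variant and image are placeholders):

```python
# Captioning sketch via LAVIS's packaging of BLIP.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)
raw_image = Image.open("photo.jpg").convert("RGB")            # placeholder image
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))                       # e.g. ["a photograph of ..."]
```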
vision-language,[ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"
User: sauradip
Home Page: https://sauradip.github.io/project_pages/STALE/
vision-language,A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Organization: shikras
Home Page: https://arxiv.org/abs/2307.12813
vision-language,[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
User: sunzey
Home Page: https://aleafy.github.io/alpha-clip
vision-language,Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
User: txh-mercury
Home Page: https://arxiv.org/abs/2305.18500
vision-language,[ICRA 2024 Oral] Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Organization: uark-aicv
Home Page: https://uark-aicv.github.io/OpenFusion/
vision-language,[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Organization: uark-aicv
Home Page: https://uark-aicv.github.io/VLTinT/
vision-language,VaLM: Visually-augmented Language Modeling. ICLR 2023.
User: victorwz
Home Page: https://openreview.net/forum?id=8IN-qLkl215
vision-language,[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Organization: wisconsinaivision
Home Page: https://vip-llava.github.io/
vision-language,Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021
User: woodfrog
Home Page: https://vse-infty.github.io/
vision-language,Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
User: yangli18
vision-language,[TIP 2022] Official code of the paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
User: zchoi
Home Page: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9882977
vision-language,[IJCAI 2022] Official PyTorch code for the paper “S2 Transformer for Image Captioning”
User: zchoi
Home Page: https://www.ijcai.org/proceedings/2022/0224.pdf
vision-language,Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamless and versatile user experience.
User: zjr2000