Topic: multi-modal

Something interesting about multi-modal

👇 Here are 267 public repositories matching this topic...

boschresearch / oasis

multi-modal,Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)

Organization: boschresearch

semantic-image-synthesis gan image-to-image-translation computer-vision multi-modal generative-adversarial-networks deep-learning pytorch image-generation label-to-image-translation

bytedance / salmonn

multi-modal,SALMONN: Speech Audio Language Music Open Neural Network

Organization: bytedance

Home Page: https://bytedance.github.io/SALMONN/

audio audio-processing large-language-models multi-modal speech speech-recognition bytedance tsinghua-university music iclr2024

dirtyharrylyl / transformer-in-vision

multi-modal,Recent Transformer-based CV and related works.

User: dirtyharrylyl

transformer vision-transformers computer-vision self-attention multi-modal visual-language deep-learning paper

docarray / docarray

multi-modal,Represent, send, store and search multimodal data

Organization: docarray

Home Page: https://docs.docarray.org/

cross-modal data-structures dataclass deep-learning docarray elasticsearch fastapi machine-learning multi-modal multimodal nearest-neighbor-search nested-data neural-search protobuf pydantic pytorch qdrant semantic-search weaviate

dvlab-research / lisa

multi-modal,Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Organization: dvlab-research

large-language-model llm multi-modal segmentation

endlesssora / tsit

multi-modal,[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation

User: endlesssora

generative-adversarial-network gan image-to-image-translation image-generation image-manipulation two-stream-networks versatile feature-transformation multi-scale style-transfer

haiyang-w / unitr

multi-modal,[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"

User: haiyang-w

Home Page: https://arxiv.org/abs/2308.07732

3d 3d-object-detection backbone camera iccv2023 multi-modal point-cloud transformer unified 3d-segmentation

iflytek / vle

multi-modal,VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model)

Organization: iflytek

multi-modal vle cv language nlp vision llm

intellabs / fastrag

multi-modal,Efficient Retrieval Augmentation and Generation Framework

Organization: intellabs

nlp benchmark colbert information-retrieval semantic-search sentence-transformers summarization transformers diffusion knowledge-graph

jokieleung / awesome-visual-question-answering

multi-modal,A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

User: jokieleung

awesome-list vqa attention-networks multi-modal multi-modal-learning

juliarobotics / caesar.jl

multi-modal,Robust robotic localization and mapping, together with NavAbility(TM). Reach out to [email protected] for help.

Organization: juliarobotics

Home Page: https://www.wherewhen.ai

multi-modal parametric-navigation-solutions caesar slam isam robotics julia database non-parametric

junchen14 / multi-modal-transformer

multi-modal,This repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.

User: junchen14

image-transformer language efficiency-transformer vision-transformer video-transformer video-language mlp-mixer transformer-readling-list multi-modal multi-modal-cvpr2021

kav-k / gptdiscord

multi-modal,A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

User: kav-k

artificial-intelligence asyncio gpt3 help-wanted openai openai-api python dalle2 embeddings extractive-question-answering

kyegomez / rt-2

multi-modal,Democratization of RT-2 "RT-2: New model translates vision and language into action"

User: kyegomez

Home Page: https://discord.gg/qUtxnK2NMf

artificial-intelligence attention-mechanism embodied-agent gpt4 multi-modal robotics transformer

kyegomez / zeta

multi-modal,Build high-performance AI models with modular building blocks

User: kyegomez

Home Page: https://zeta.apac.ai

artificial-intelligence multi-modal transformers deep-learning gpt4 llama2 multi-agent-systems multi-modal-learning multi-platform pytorch speech-recognition transformer longnet

liuyang-ict / awesome-visual-transformers

multi-modal,[TNNLS] A Comprehensive Survey of Awesome Visual Transformer Literature.

User: liuyang-ict

classification detection multi-modal multi-sensor-fusion point-cloud segmentation self-supervision transformer

lucidrains / dalle-pytorch

multi-modal,Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch

User: lucidrains

artificial-intelligence deep-learning attention-mechanism text-to-image transformers multi-modal

marqo-ai / marqo

multi-modal,Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Organization: marqo-ai

Home Page: https://www.marqo.ai/

deep-learning information-retrieval machinelearning vector-search tensor-search clip multi-modal search-engine transformers vision-language

medmnist / medmnist

multi-modal,[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification

Organization: medmnist

Home Page: https://medmnist.com/

dataset benchmark automl mnist medical medical-image-analysis medmnist multi-modal decathlon medical-imaging

microsoft / farmvibes-ai

multi-modal,FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

Organization: microsoft

Home Page: https://microsoft.github.io/farmvibes-ai/

agriculture ai geospatial geospatial-analytics stac sustainability multi-modal remote-sensing weather

modelscope / agentscope

multi-modal,Start building LLM-empowered multi-agent applications in an easier way.

Organization: modelscope

Home Page: https://modelscope.github.io/agentscope/

agent chatbot gpt-4 large-language-models llm llm-agent multi-agent distributed-agents multi-modal llama3

modelscope / data-juicer

multi-modal,A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷

Organization: modelscope

data-analysis data-science dataset large-language-models llm nlp chinese data-visualization opendata gpt

modelscope / modelscope

multi-modal,ModelScope: bring the notion of Model-as-a-Service to life.

Organization: modelscope

Home Page: https://www.modelscope.cn/

nlp cv speech multi-modal science deep-learning machine-learning python

ofa-sys / chinese-clip

multi-modal,Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Organization: ofa-sys

chinese computer-vision multi-modal-learning nlp pytorch vision-and-language-pre-training image-text-retrieval clip pretrained-models vision-language

open-compass / vlmevalkit

multi-modal,Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

Organization: open-compass

Home Page: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard

gpt-4v large-language-models llava multi-modal openai vqa llm openai-api qwen gpt computer-vision pytorch gpt4 chatgpt clip vit evaluation claude gemini

open3da / ll3da

multi-modal,[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Organization: open3da

Home Page: https://ll3da.github.io/

3d 3d-models gpt instruction-tuning language-model llm multi-modal 3d-to-text scene-understanding cvpr2024

openbmb / minicpm-v

multi-modal,MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Organization: openbmb

minicpm minicpm-v multi-modal

opengvlab / internvl

multi-modal,[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.

Organization: opengvlab

Home Page: https://internvl.github.io/

image-classification image-text-retrieval llm mme semantic-segmentation video-classification vision-language-model vit-22b vit-6b multi-modal

openmotionlab / motiongpt

multi-modal,[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs

Organization: openmotionlab

Home Page: https://motion-gpt.github.io

3d-generation chatgpt gpt language-model motion motion-generation text-driven text-to-motion motiongpt multi-modal

patrickjohncyh / fashion-clip

multi-modal,FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.

User: patrickjohncyh

nlp clip ecommerce nlp-machine-learning fashion multi-modal transformer

pku-yuangroup / languagebind

multi-modal,【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Organization: pku-yuangroup

Home Page: https://arxiv.org/abs/2310.01852

language-central multi-modal pretraining zero-shot

pku-yuangroup / moe-llava

multi-modal,Mixture-of-Experts for Large Vision-Language Models

Organization: pku-yuangroup

Home Page: https://arxiv.org/abs/2401.15947

large-vision-language-model mixture-of-experts moe multi-modal

pku-yuangroup / video-llava

multi-modal,Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Organization: pku-yuangroup

Home Page: https://arxiv.org/pdf/2311.10122.pdf

instruction-tuning large-vision-language-model multi-modal

qin2dim / hcaptcha-challenger

multi-modal,🥂 Gracefully face hCaptcha challenges with an MoE (ONNX) embedded solution.

User: qin2dim

Home Page: https://docs.captchax.top/

yolov5 hcaptcha opencv-python onnx-models hcaptcha-solver solver onnx yolo onnxruntime playwright

salesforce / unicontrol

multi-modal,Unified Controllable Visual Generation Model

Organization: salesforce

Home Page: https://canqin001.github.io/UniControl-Page/

aigc generation multi-modal

scisharp / llamasharp

multi-modal,A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

Organization: scisharp

Home Page: https://scisharp.github.io/LLamaSharp

chatbot gpt llama llamacpp llm semantic-kernel llava multi-modal llama2 llama3

tangxyw / recsyspapers

multi-modal,A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.

User: tangxyw

Home Page: https://tangxyw.github.io/

calibration causal-inference cold-start contrastive-learning debias distillation diverse fairness match papers

tebmer / awesome-knowledge-distillation-of-llms

multi-modal,This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

User: tebmer

data-augmentation instruction-following kd knowledge-distillation large-language-model llm self-training survey compression data-synthesis

thudm / cogvlm

multi-modal,A state-of-the-art-level open visual language model | multimodal pre-trained model

Organization: thudm

cross-modality language-model multi-modal pretrained-models visual-language-models

thudm / cogvlm2

multi-modal,GPT4V-level open-source multi-modal model based on Llama3-8B

Organization: thudm

cogvlm language-model multi-modal pretrained-models

thudm / visualglm-6b

multi-modal,Chinese and English bilingual multimodal conversational language model

Organization: thudm

chatglm-6b gpt multi-modal

v-iashin / specvqgan

multi-modal,Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

User: v-iashin

Home Page: https://v-iashin.github.io/SpecVQGAN

transformer vqvae gan pytorch audio-generation video-features melgan multi-modal video-understanding vggsound vas bmvc evaluation-metrics audio video

valhalla / valhalla

multi-modal,Open Source Routing Engine for OpenStreetMap

Organization: valhalla

Home Page: https://valhalla.github.io/valhalla/

openstreetmap dijkstra astar tiled directions isochrones multi-modal traveling-salesman routing-engine routing

vercel / modelfusion

multi-modal,The TypeScript library for building AI applications.

Organization: vercel

Home Page: https://modelfusion.dev

chatbot gpt-3 javascript js llm openai ts typescript whisper ai

wangsuzhen / audio2head

multi-modal,Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)

User: wangsuzhen

talking-head multi-modal talking-face paper ijcai2021 codes

wangxiao5791509 / multimodal_bigmodels_survey

multi-modal,[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models

User: wangxiao5791509

audio depth event-camera multi-modal natural-language pengchenglab point-cloud pre-training radar review

wisconsinaivision / vip-llava

multi-modal,[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Organization: wisconsinaivision

Home Page: https://vip-llava.github.io/

chatbot clip foundation-models gpt-4 gpt-4-vision llama llama2 llava multi-modal vision-language visual-prompting cvpr2024

wjun0830 / qd-detr

multi-modal,Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)

User: wjun0830

Home Page: https://arxiv.org/abs/2303.13874

computer-vision moment-retrieval multi-modal video-highlight-detection video-retrieval video-summarization text-video-retrieval deep-learning detection-transformer

zjukg / mygo

multi-modal,[Paper][Preprint 2024] MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion

Organization: zjukg

Home Page: https://arxiv.org/abs/2404.09468

contrastive-learning knowledge-graph-completion multi-modal multi-modal-fusion multi-modal-knowledge-graph mygo tokenization

zjunlp / deepke

multi-modal,[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Organization: zjunlp

Home Page: http://deepke.zjukg.cn/

knowledge-graph relation-extraction chinese named-entity-recognition attribute-extraction low-resource document-level information-extraction pytorch deepke
