Topic: multi-modal Goto Github
Something interesting about multi-modal
multi-modal,Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
Organization: boschresearch
multi-modal,SALMONN: Speech Audio Language Music Open Neural Network
Organization: bytedance
Home Page: https://bytedance.github.io/SALMONN/
multi-modal,Recent Transformer-based CV and related works.
User: dirtyharrylyl
multi-modal,Represent, send, store and search multimodal data
Organization: docarray
Home Page: https://docs.docarray.org/
multi-modal,Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Organization: dvlab-research
multi-modal,[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation
User: endlesssora
multi-modal,[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
User: haiyang-w
Home Page: https://arxiv.org/abs/2308.07732
multi-modal,Efficient Retrieval Augmentation and Generation Framework
Organization: intellabs
multi-modal,A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
User: jokieleung
multi-modal,Robust robotic localization and mapping, together with NavAbility(TM).
Organization: juliarobotics
Home Page: https://www.wherewhen.ai
multi-modal,This repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these domains.
User: junchen14
multi-modal,A robust, all-in-one GPT interface for Discord: ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge bases, a YouTube summarizer, and more!
User: kav-k
multi-modal,Democratization of RT-2 "RT-2: New model translates vision and language into action"
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
multi-modal,Build high-performance AI models with modular building blocks
User: kyegomez
Home Page: https://zeta.apac.ai
multi-modal,[TNNLS] A Comprehensive Survey of Awesome Visual Transformer Literature.
User: liuyang-ict
multi-modal,Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
User: lucidrains
multi-modal,Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Organization: marqo-ai
Home Page: https://www.marqo.ai/
multi-modal,[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification
Organization: medmnist
Home Page: https://medmnist.com/
multi-modal,FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
Organization: microsoft
Home Page: https://microsoft.github.io/farmvibes-ai/
multi-modal,Start building LLM-empowered multi-agent applications in an easier way.
Organization: modelscope
Home Page: https://modelscope.github.io/agentscope/
multi-modal,A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Organization: modelscope
multi-modal,ModelScope: bring the notion of Model-as-a-Service to life.
Organization: modelscope
Home Page: https://www.modelscope.cn/
multi-modal,Chinese version of CLIP that achieves Chinese cross-modal retrieval and representation generation.
Organization: ofa-sys
multi-modal,Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
Organization: open-compass
Home Page: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
multi-modal,[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
Organization: open3da
Home Page: https://ll3da.github.io/
multi-modal,[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
Organization: opengvlab
Home Page: https://internvl.github.io/
multi-modal,[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
Organization: openmotionlab
Home Page: https://motion-gpt.github.io
multi-modal,FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
User: patrickjohncyh
multi-modal,[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Organization: pku-yuangroup
Home Page: https://arxiv.org/abs/2310.01852
multi-modal,Mixture-of-Experts for Large Vision-Language Models
Organization: pku-yuangroup
Home Page: https://arxiv.org/abs/2401.15947
multi-modal,Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Organization: pku-yuangroup
Home Page: https://arxiv.org/pdf/2311.10122.pdf
multi-modal,🥂 Gracefully handle hCaptcha challenges with an embedded MoE (ONNX) solution.
User: qin2dim
Home Page: https://docs.captchax.top/
multi-modal,Unified Controllable Visual Generation Model
Organization: salesforce
Home Page: https://canqin001.github.io/UniControl-Page/
multi-modal,A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
Organization: scisharp
Home Page: https://scisharp.github.io/LLamaSharp
multi-modal,A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
User: tangxyw
Home Page: https://tangxyw.github.io/
multi-modal,This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
User: tebmer
multi-modal,A state-of-the-art-level open visual language model | multimodal pre-trained model
Organization: thudm
multi-modal,GPT4V-level open-source multi-modal model based on Llama3-8B
Organization: thudm
multi-modal,Chinese and English multimodal conversational language model
Organization: thudm
multi-modal,Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
User: v-iashin
Home Page: https://v-iashin.github.io/SpecVQGAN
multi-modal,Open Source Routing Engine for OpenStreetMap
Organization: valhalla
Home Page: https://valhalla.github.io/valhalla/
multi-modal,The TypeScript library for building AI applications.
Organization: vercel
Home Page: https://modelfusion.dev
multi-modal,Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)
User: wangsuzhen
multi-modal,[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models
User: wangxiao5791509
multi-modal,[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Organization: wisconsinaivision
Home Page: https://vip-llava.github.io/
multi-modal,Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
User: wjun0830
Home Page: https://arxiv.org/abs/2303.13874
multi-modal,[Paper][Preprint 2024] MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion
Organization: zjukg
Home Page: https://arxiv.org/abs/2404.09468
multi-modal,[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
Organization: zjunlp
Home Page: http://deepke.zjukg.cn/