Topic: multimodal
Something interesting about multimodal
multimodal,Actionable AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)
Organization: alan-ai
Home Page: https://alan.app/
multimodal,Actionable AI SDK for Apache Cordova to enable text and voice conversations with actions (iOS and Android)
Organization: alan-ai
multimodal,Actionable AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)
Organization: alan-ai
Home Page: https://alan.app
multimodal,Actionable AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)
Organization: alan-ai
Home Page: https://alan.app
multimodal,A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Organization: alibabaresearch
multimodal,An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
User: arrowluo
Home Page: https://arxiv.org/abs/2104.08860
multimodal,Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
User: atfortes
multimodal,Images to inference with no labeling (use foundation models to train supervised models).
Organization: autodistill
Home Page: https://docs.autodistill.com
multimodal,This repository is the official implementation of Disentangling Writer and Character Styles for Handwriting Generation (CVPR23).
User: dailenson
multimodal,Represent, send, store and search multimodal data
Organization: docarray
Home Page: https://docs.docarray.org/
multimodal,Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
User: enricoros
Home Page: https://big-agi.com
multimodal,A curated list of Multimodal Related Research.
User: eurus-holmes
multimodal,A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Organization: facebookresearch
Home Page: https://mmf.sh/
multimodal,WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Organization: google-research-datasets
Home Page: https://github.com/google-research-datasets/wit
multimodal,[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
User: haotian-liu
Home Page: https://llava.hliu.cc
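multimodal,Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Center for Cognitive Computing and Natural Language at the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.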
Organization: idea-ccnl
multimodal,HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Organization: internlm
multimodal,InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Organization: internlm
multimodal,Meta-Transformer for Unified Multimodal Learning
User: invictus717
Home Page: https://arxiv.org/abs/2307.10802
multimodal,🪩 Create Disco Diffusion artworks in one line
Organization: jina-ai
multimodal,☁️ Build multimodal AI applications with cloud-native stack
Organization: jina-ai
Home Page: https://docs.jina.ai
multimodal,Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
multimodal,Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, which the project says elevates model reasoning by at least 70%
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
multimodal,Overview of Japanese LLMs (日本語LLMまとめ)
Organization: llm-jp
Home Page: https://llm-jp.github.io/awesome-japanese-llm
multimodal,Curated tutorials and resources for Large Language Models, AI Painting, and more.
Organization: luban-agi
multimodal,Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
User: lucidrains
multimodal,Foundation Architecture for (M)LLMs
Organization: microsoft
Home Page: https://aka.ms/GeneralAI
multimodal,Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Organization: microsoft
Home Page: https://aka.ms/GeneralAI
multimodal,ms-swift: Use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs
Organization: modelscope
multimodal,Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
User: next-gpt
Home Page: https://next-gpt.github.io/
multimodal,A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Organization: nvidia
Home Page: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
multimodal,Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Organization: ofa-sys
multimodal,A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Organization: ofa-sys
multimodal,OpenMMLab Pre-training Toolbox and Benchmark
Organization: open-mmlab
Home Page: https://mmpretrain.readthedocs.io/en/latest/
multimodal,Multimodal-GPT
Organization: open-mmlab
multimodal,[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint), based on the CPM foundation models
Organization: openbmb
multimodal,InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Organization: opengvlab
Home Page: https://igpt.opengvlab.com
multimodal,Video Foundation Models & Data for Multimodal Understanding
Organization: opengvlab
multimodal,Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Organization: rerun-io
Home Page: https://rerun.io/
multimodal,Easily compute CLIP embeddings and build a CLIP retrieval system with them
User: rom1504
Home Page: https://rom1504.github.io/clip-retrieval/
multimodal,Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
User: rom1504
multimodal,This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
User: skalskip
multimodal,SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
Organization: stability-ai
Home Page: https://platform.stability.ai/
multimodal,Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
User: swyxio
Home Page: https://latent.space/
multimodal,Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Organization: unum-cloud
Home Page: https://unum-cloud.github.io/uform/
multimodal,🩺 The first Chinese medical multimodal large model that can read and summarize chest radiographs.
User: wangrongsheng
multimodal,Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Organization: x-plug
Home Page: https://arxiv.org/abs/2401.16158
multimodal,mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Organization: x-plug
multimodal,mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
Organization: x-plug
Home Page: https://www.modelscope.cn/studios/damo/mPLUG-Owl
multimodal,(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
User: yutong-zhou-cv