m-bain
Name: Max Bain
Type: User
Bio: multimodal
Twitter: maxhbain
Blog: maxbain.com
Mapping a variable-length sentence to a fixed-length vector using BERT model
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
A CLIP-Hitchhiker's Guide to Long Video Retrieval [arXiv 2022]
Video embeddings for retrieval - code for the paper "Use What You Have: Video retrieval using representations from collaborative experts"
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Condensed Movies Challenge 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
Hydra is a framework for elegantly configuring complex applications
LAVIS - A One-stop Library for Language-Vision Intelligence
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Automated Audiovisual Behaviour Recognition in Wild Primates
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more
A PyTorch classifier for multi-label classification
Simple Diarization model
A simple command line tool to show GPU usage on a SLURM cluster
PyTorch port of Google Research's VGGish model, used for extracting audio features.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Implementations of Transformers for Video
Easily create large video datasets from video URLs
Extract video features from raw videos using multiple GPUs. We support RAFT and PWC flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, ResNet features.
Large-scale text-video dataset. 10 million captioned short videos.
Robust Speech Recognition via Large-Scale Weak Supervision
OpenAI Whisper ASR Webservice API
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)