Muhammad Maaz's Projects
An experimental open-source attempt to make GPT-4 fully autonomous.
:book: For those who wanna learn Bash
Contrastive Self-Supervised Learning for Visual Recognition: A Survey
This repository contains code to fix ID switches in tracks labelled with Intel's CVAT tool.
Windows and Linux version of Darknet Yolo v3 & v2 Neural Networks for object detection (Tensor Cores are used)
Destruction and Construction Learning for Fine-grained Image Recognition
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Detectron2 is FAIR's next-generation platform for object detection, segmentation and other visual recognition tasks.
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Official implementation of the paper "DETReg: Unsupervised Pretraining with Region Priors for Object Detection".
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Discipline Anomaly Detection using Audio and Video Processing
An easy implementation of Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf) in PyTorch.
[CADL'22, ECCVW] Official repository of paper titled "EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications".
Structured Edge Detection Toolbox
Facial Keypoints detection using CNN modified from https://github.com/udacity/P1_Facial_Keypoints
PyTorch extensions for high performance and large scale training.
Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks [CVPR 2024].
Image Captioning Using CNN-RNN Architecture modified from https://github.com/udacity/CVND---Image-Captioning-Project
Official repository for "Intriguing Properties of Vision Transformers" (NeurIPS 2021--Spotlight)
PyTorch 0.4.1 code for Light-Head R-CNN
LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
PyTorch reimplementation of the paper "MaxViT: Multi-Axis Vision Transformer" [arXiv 2022].
Mask-Guided Attention Network for Occluded Pedestrian Detection. (ICCV'19)
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes