Coder Social home page Coder Social logo

computer_vision_final's Introduction

Computer Vision (DATA130051) Final Projects

This is the final projects for Computer Vision (DATA130051)

Authors: Zixuan Chen, Yanjun Shao, Boyuan Yao

Task1

You may also find these steps in the official documents of MMSegmentation.

Prerequisites

Step 0. Install MMCV using MIM. (Installing mmcv-full may take about 10 mins)

pip install -U openmim
mim install mmcv-full

Step 1. Install MMSegmentation.

git clone https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -v -e .

Detect Videos

from mmseg.apis import inference_segmentor, init_segmentor
import mmcv
import cv2
import tqdm

# DeepLabv3+
config_file = 'configs/deeplabv3plus/deeplabv3plus_r101-d8_fp16_512x1024_80k_cityscapes.py'
checkpoint_file = 'checkpoints/deeplabv3plus_r101-d8_fp16_512x1024_80k_cityscapes_20200717_230920-f1104f4b.pth'

# SETR
config_file = 'configs/setr/setr_vit-large_pup_8x1_768x768_80k_cityscapes.py'
checkpoint_file = 'checkpoints/setr_pup_vit-large_8x1_768x768_80k_cityscapes_20211122_155115-f6f37b8f.pth'

img_file = 'resources/video.mp4'
out_file = 'resources/output.avi'


# build the model from a config file and a checkpoint file
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')

# test a video and show the results
video = mmcv.VideoReader('resources/video.mp4')
fourcc = cv2.VideoWriter_fourcc(*'XVID')
writer = cv2.VideoWriter(out_file, fourcc, video.fps, (video.width, video.height))

for frame in tqdm.tqdm(video):
    result = inference_segmentor(model, frame)
    output = model.show_result(frame, result, opacity=0.5)
    writer.write(output)
writer.release() 

You can find the result video here.

链接:https://pan.baidu.com/s/1_EtihmU6v1Gqqs6pT8GwFg 提取码:0216 --来自百度网盘超级会员V5的分享

Task2

The code is the same with the mid-term Faster R-CNN code, and can be found here.

All the trained models can be downloaded here.

Task3

In this task we train transformer-based models on CIFAR-100 from scratch, improving data efficiency by data augmentation, distillation mentioned in DeiT and use specially designed models.

To use our trained ViT models, you need to first download the models from here (password: e6qt), and using ViT_test.py to load the models, e.g.

python ViT_test.py --src vit_baseline.pt # Indicates the source .pt file

To train ViT models, use python ViT_train.py --help to seek help, here is an example

python main.py --net_type vit # Net type: ViT or DeiT
			   --workers 16 # Number of workers to load images
			   -b 64 # Batchsize
			   --verbose # Verbose the training process
			   --lr 0.01 # Learning rate
			   --epochs 50 # Number of epochs for base phase
			   --model_save # Save model
			   --aug_type cutmix # Data augmentation types
			   --beta 10 # Hyperparameter for data augmentation
			   --aug_prob 0.5 # Augmentation probability
			   --save_path ./vit_cutmix.pt # Path to save the model
			   -tp ./log/vit_cutmix # Tensorboard log path
			   -tl /vit/ # Tensorboard label
			   --scheduler # Using Cosine Annealing scheduler
			   --restart 2 # Number of restart phaeses (N - 1)
			   --mult 2 # mult factor for number of epochs after each restart
			   --dim 768 # Dimension after embedding
			   --depth 16 # Number of transformer blocks
			   --heads 12 # Number of heads for multi-heads attention
			   --mlp_dim 1024 # MLP dimension

The above codes could be found in folder Task3. We also provide a teacher model ResNet-152 mentioned in our report for you to train your DeiT.

The extra work Swin Transformer with localization can be found here. We get the code from this repo and applied some slight changes to it so that it can be adapted to our Pytorch version. The trained model can be found following this link.

The Task3/CvT folder contains a reproduced-from-scratch Convolutional Vision Transformer. Unfortunately, we are not able to reproduce the result as expected.

computer_vision_final's People

Contributors

cypher30 avatar super-dainiu avatar zixuan417 avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.