deeprob-final-project's Introduction

DeepRob-Trackformer

Reimplementation, Evaluation, and Fine-tuning of Trackformer on Novel Data

Final Project

Based on the paper:

TrackFormer: Multi-Object Tracking with Transformers, by Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixé, and Christoph Feichtenhofer.

Title Breakdown

Reimplementation - The main goal of our project was to reimplement the paper for testing and evaluation.

Evaluation - Using the reimplemented Trackformer, we tested data that Trackformer had not previously been evaluated on, to explore the limitations of the technique.

Fine-tuning - Improving the existing model so that it tracks more successfully on datasets outside the original scope.

Trackformer - A new approach to multi-object tracking using transformers.

Inputs/Outputs

Inputs:

- New data not previously tested on Trackformer.
- Videos with characteristics significantly different from MOT17, the dataset used to train the model.

Outputs:

- Part 1: Results from testing on novel data.
- Part 2: New model based on the original Trackformer model, fine-tuned with new datasets (not completed).

Original Plan A

[Image: Original Plan A]

Original Plan B

[Image: Original Plan B]

Updated Plan A

[Image: Updated Plan A]

Updated Plan B

[Image: Updated Plan B]

Poster

[Image: Poster]

Roadblocks

Getting Trackformer to Run on local machine

Each roadblock took hours to days to solve, but we made steady progress throughout this first stage. The stage ended with a GPU incompatibility with CUDA 10.2, which forced us to abandon the local machine and move to Google Colab. We lost a couple of weeks of work to this roadblock, and we had no way of knowing it would be an issue until all the other roadblocks were solved.

Old software

- Required downgrade to Python 3.7
- Required downgrade to PyTorch 1.5
- Required downgrade to torchvision 0.6
- Unable to use typical install methods because PyTorch 1.5 is not supported with Python 3.7 through pip
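A quick way to confirm that an environment actually matches these pins is to compare installed versions against the required major.minor values before building anything else. The following is a minimal sketch, not part of the Trackformer repo: the `REQUIRED` table simply restates the versions listed above, and the helper names `matches` and `report` are hypothetical.

```python
import sys

# Versions this write-up reports needing for the local setup.
REQUIRED = {"python": "3.7", "torch": "1.5", "torchvision": "0.6"}


def matches(installed: str, required: str) -> bool:
    """True if `installed` has the same leading components as `required`.

    A local build suffix such as "+cu102" is ignored.
    """
    want = required.split(".")
    have = installed.split("+")[0].split(".")
    return have[:len(want)] == want


def report(installed_versions: dict) -> dict:
    """Map each required package to True/False depending on whether it matches."""
    return {
        name: matches(installed_versions.get(name, ""), required)
        for name, required in REQUIRED.items()
    }


if __name__ == "__main__":
    installed = {"python": "{}.{}".format(*sys.version_info[:2])}
    try:
        import torch
        import torchvision
        installed["torch"] = str(torch.__version__)
        installed["torchvision"] = str(torchvision.__version__)
    except ImportError:
        pass  # missing packages are reported as mismatches
    print(report(installed))
```

Running this at the start of a session makes a silent version drift (for example, a notebook runtime that ships a newer PyTorch) visible before any wheel building begins.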

Debugging packages

- Trackformer requires many outdated, unmaintained packages that rely on now-deprecated functions.
- Constant issues building wheels for packages; several days of debugging.
- Particular issues with lap and lapsolver.
- Debugged with pip, wheelbuilder, wheels, setup.py, and many other methods.
- Eventually got all packages to build wheels and install.

Windows incompatibilities

- Issues with the C++ toolkit due to Windows incompatibilities.
- The pycocotools version specified in the Trackformer repo is not compatible with Windows, so we had to use a different one. We found a Windows-compatible version on GitHub and modified its setup to work with Trackformer.

CUDA

- Old CUDA version (10.2) needed for PyTorch 1.5

GPU

- The local GPU is not compatible with CUDA 10.2. The RTX 4070 Ti is too recent for CUDA 10.2 and therefore incompatible with PyTorch 1.5. This could not be overcome on the local machine, so we were forced to move to Google Colab.
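The "too recent" problem comes down to compute capability: each CUDA toolkit release can only build kernels for the GPU architectures that existed when it shipped. The sketch below encodes the first toolkit release that targets a few architectures and checks a GPU against a toolkit version. The table is a hand-written subset for illustration, and the helper name `toolkit_supports` is hypothetical; it also ignores PTX forward compatibility, which can sometimes let older builds run on newer GPUs.

```python
# First CUDA toolkit release able to compile for each compute capability.
# Hand-written subset for illustration, not an exhaustive table.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. T4
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. A40
    (8, 9): (11, 8),  # Ada Lovelace, e.g. RTX 4070 Ti
}


def toolkit_supports(capability, cuda_version):
    """True if a toolkit of `cuda_version` can target `capability`.

    Both arguments are (major, minor) tuples, so comparison is numeric
    rather than string-based.
    """
    return cuda_version >= MIN_CUDA_FOR_CAPABILITY[capability]


if __name__ == "__main__":
    cuda_10_2 = (10, 2)
    for name, cap in [("T4", (7, 5)), ("V100", (7, 0)), ("RTX 4070 Ti", (8, 9))]:
        print(name, toolkit_supports(cap, cuda_10_2))
```

With CUDA 10.2, the check passes for Turing and Volta cards and fails for the RTX 4070 Ti, which matches what we saw on the local machine.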

Getting Trackformer to Run on Google Colab

Google Colab had its own set of issues that took several days to solve. Ultimately we were able to run Trackformer on Google Colab, which was a major milestone in the project: we could use the pre-trained model to test on new data. Eventually, though, we ran into hardware limitations on Colab as well.

Testing different versions of Python and Pytorch

- We had many more compatibility issues between Python and the packages, but through trial and error we found a compatible combination of Python and PyTorch versions. It differed both from the versions we used on the local machine and from the versions specified in the Trackformer repo.

Debugging packages

- We ran into many of the same problems as on the local machine, and we were able to reuse some of the same solutions.
- We once again had issues with lap and lapsolver and had to find different solutions for the Colab environment.
- We ended up using a combination of pip and conda to get all the packages to install correctly.
- pycocotools was an issue again; we had to use conda and then pip to build the wheel for the version from the repo.

Testing

- We were finally able to run testing on the pre-trained model in Colab. We tested on the MOT17 dataset and obtained the same results as the original paper.
- We also tested on the new types of data collected from various sources, which produced the results displayed in the poster.

Hardware limitations

- Debugging and difficulties in Colab forced us to upgrade to Colab Pro for access to more compute units and more GPU memory.
- Primary testing was done on the T4 GPUs available from Colab Pro. These were sufficient for testing on the MOT17 dataset but not for training.
- Training hit VRAM limits almost immediately on the T4 GPUs. T4s have approximately 15 GB of VRAM on Colab, which was not enough (the original paper used 32 GB of VRAM).
- Upgrading to Colab Pro+ gave us access to V100 GPUs (16 GB VRAM) and more compute units. The slightly larger VRAM let us train a bit longer, but training still stopped before one epoch completed.
- Colab Pro+ also gave us access to A100 GPUs with 40 GB of VRAM, but these GPUs are too recent and we hit the same issues as when trying to run locally. We could not overcome the incompatibility between Trackformer, PyTorch 1.5, and CUDA 10.2.
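The dilemma in this list is that two constraints had to hold at once: the GPU had to run the old software stack, and it had to have enough memory to train. The sketch below restates the Colab GPUs from the list as data and filters them on both conditions. The `Gpu` class and `usable` helper are hypothetical names, and treating the paper's 32 GB figure as the training requirement is an assumption, since we never found the true minimum.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gpu:
    name: str
    vram_gb: int
    ran_our_stack: bool  # did Trackformer + PyTorch 1.5 + CUDA 10.2 run on it?


# Figures as reported in the list above.
COLAB_GPUS = [
    Gpu("T4", 15, True),
    Gpu("V100", 16, True),
    Gpu("A100", 40, False),
]


def usable(gpus: List[Gpu], needed_gb: int) -> List[Gpu]:
    """Return the GPUs that both ran the stack and have at least `needed_gb` of VRAM."""
    return [g for g in gpus if g.ran_our_stack and g.vram_gb >= needed_gb]


if __name__ == "__main__":
    # 32 GB is the VRAM figure quoted for the original paper; using it as
    # the training requirement is an assumption.
    print([g.name for g in usable(COLAB_GPUS, 32)])
    # Inference on MOT17 worked within the T4's roughly 15 GB.
    print([g.name for g in usable(COLAB_GPUS, 15)])
```

At 32 GB the filter returns nothing: the only card with enough memory is the one that could not run the stack. At 15 GB it returns the T4 and V100, the two cards we could test on.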

MSI Training Roadblocks

- Module load cuda 11.2 & python 3.6.3_Anaconda
- Created env4 with python 3.7
- Installed pip and requirements.txt
- Lap solver could not be built (cmake version too low)
  - Tried to conda install from forge X
  - Tried installing lapsolver again from pip
- Installed torch 1.7.1 and torchvision 0.8.2 from the paper with pip
- Installed pycoco from source
- Running install script fails (FIXED)
  - Tried ensuring pip was upgraded and rebuilt setup tools X
  - Tried to upgrade g++ but could not find the path to python 3.7
  - Verified path exists
  - Retried with sourcing python 3.7.1
  - Used the bash command cxx==g++ to execute script with g++
  - Tried to conda install from forge cxx to update g++
  - Same issue (python not found)
  - Tried using cxx compiler flag
  - Tried to update conda to latest version
  - File permission error, needed to deactivate the environment
  - Tried to install latest gcc with conda from the source
  - Used a source URL from a community post to update gcc (WORKED!)
- Got the mot17 dataset and unzipped, ran script to do coco annotation
- Running Track.py to test the pretrained model fails (FIXED)
  - Caused by A40 GPU being incompatible with deprecated torch version
  - Tried github fix on installing nightly version of cuda toolkit
  - Tried to reinstall numpy with conda
  - Pip installed typing_extensions after conda install doesn’t work
  - Tried to install torch and torchvision with conda instead of pip
  - Interestingly cuda is not available if conda is used to install torch
  - Upgraded pytorch with pip to 1.10.0
  - Tried several different cuda versions (10.1, 9.1, 8.0)
  - Switched to Mangi
- Code finally executes but no kernel images in msdeformable_im2col_cuda
  - Tried different versions of cuda and cudatoolkit
  - Tried pip install from source pytorch+cu111
  - Module loaded cudnn/8.2.0
  - Reran build script (FIXED)
- Wrong numpy vector size
  - Upgraded numpy to 1.20.3 (Author’s recommendation) (FIXED)
- Training
  - Downloaded the custom sports mot dataset
  - This was 35 GB, uploading to MSI would take 7 hours
  - Eliminated much of the data to reduce the dataset to 5GB
  - Fixed broken zip structure by running the command zif FFu
  - Training starts! But we run into an issue: the json files now have pointers to nonexistent files. Changing these pointers is a long process.
- NEXT STEPS: Either update all the pointers or find a way to move the entire dataset. Could try rsync.
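The last open issue, annotation JSON files that point at images removed when the dataset was cut down, could in principle be handled by a script rather than by hand. The sketch below assumes the annotations follow the usual COCO layout (an `images` list with `id` and `file_name`, and an `annotations` list with `image_id`); that layout and the helper name `prune_missing_images` are assumptions, and this was not run as part of the project. Any Trackformer-specific per-sequence fields in the annotation file would still need to be checked separately after frames are dropped.

```python
import json
from pathlib import Path


def prune_missing_images(coco: dict, image_dir: Path) -> dict:
    """Return a copy of a COCO-style dict without entries for missing image files.

    Images whose `file_name` does not exist under `image_dir` are dropped,
    along with every annotation that refers to them. Other top-level keys
    (for example `categories`) are carried over unchanged.
    """
    kept_images = [
        img for img in coco["images"]
        if (image_dir / img["file_name"]).is_file()
    ]
    kept_ids = {img["id"] for img in kept_images}
    kept_annotations = [
        ann for ann in coco["annotations"] if ann["image_id"] in kept_ids
    ]
    pruned = dict(coco)  # shallow copy; the input dict is not modified
    pruned["images"] = kept_images
    pruned["annotations"] = kept_annotations
    return pruned


def prune_file(src: Path, dst: Path, image_dir: Path) -> None:
    """Read an annotation file, prune it, and write the result to `dst`."""
    coco = json.loads(src.read_text())
    dst.write_text(json.dumps(prune_missing_images(coco, image_dir)))
```

Writing the pruned annotations to a new file, rather than overwriting the original, keeps the full annotation set available in case the whole dataset is later transferred with rsync instead.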

Novel Data Testing Results

Minneapolis Lakes Dataset

Mecca Data

Anime Tracking Data

Sports Messi Data
