I am currently an intern at Pika Lab and a master's student in the Information Processing Lab at the University of Washington. I work on video understanding and generation, as well as embodied agents. Have a look at my homepage for more details.

When I am not doing research, I like photography, traveling, and singing.





Updates:

  • 07/2024: Two papers accepted to ACM MM 2024.
  • 07/2024: Two papers accepted to ECCV 2024.
  • 06/2024: One technical report accepted to CVPR 2024 workshop @ NTIRE.
  • 06/2024: We are working with Pika Lab to develop next-generation video understanding and generation models.
  • 05/2024: One paper accepted to CVPR 2024 workshop @ Embodied AI.
  • 04/2024: We are hosting CVPR 2024 Long-form Video Understanding Challenge @ LOVEU.
  • 04/2024: Invited talk at AgentX seminar about our STEVE series works.
  • 03/2024: One paper accepted to ICLR 2024 workshop @ LLM Agents.
  • 02/2024: Two papers accepted to CVPR 2024.
  • 02/2024: Invited talk at AAAI 2024 workshop @ IMAGEOMICS.
  • 12/2023: One paper accepted to ICASSP 2024.
  • 12/2023: One paper accepted to AAAI 2024.
  • 11/2023: Two papers accepted to WACV 2024 and its workshop @ CV4Smalls.
  • 09/2023: One paper accepted to ICCV 2023 workshop @ TNGCV-DataComp.
  • 09/2023: One paper accepted to IEEE T-MM.
  • 08/2023: One paper accepted to BMVC 2023.
  • 07/2023: Two papers accepted to ACM MM 2023.
  • 07/2023: Finished my research internship at Microsoft Research Asia (MSRA), Beijing.
  • 07/2023: Two papers accepted to ICCV 2023.

Wenhao Chai's Projects

llm-agent-paper-list

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

minisora

The Mini Sora project aims to explore the implementation path and future development direction of Sora.

missing-label-detection

With imperfect bounding-box annotations (30% of labels are missing in this project), standard detection methods such as YOLOv5 do not achieve good results. We use the COCO dataset and greatly reduce the negative influence of missing labels by using a modified loss function with dynamic weights.
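The description only says "a modified loss function and dynamic weight", so the exact formulation is not given here. Below is a minimal sketch of one way such weighting could work, assuming a binary objectness loss: a negative that the model scores as a confident object may actually be an unlabeled object, so its weight shrinks as its confidence rises. The function name and the `threshold` and `min_weight` parameters are hypothetical, not taken from the project.

```python
import math


def dynamic_weighted_bce(probs, labels, threshold=0.5, min_weight=0.1):
    """Hypothetical objectness loss that down-weights suspicious negatives.

    probs:  predicted objectness probabilities in (0, 1)
    labels: 1 for annotated objects, 0 for background
    Assumption: a high-confidence "background" prediction may be a missing
    label, so it is penalized less than plain BCE would penalize it.
    """
    eps = 1e-7
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if y == 1:
            w = 1.0
            loss = -math.log(p)
        else:
            # Dynamic weight: 1 up to the threshold, then decays linearly
            # toward 0 as p -> 1, floored at min_weight.
            w = 1.0 if p <= threshold else max(min_weight, (1 - p) / (1 - threshold))
            loss = -math.log(1 - p)
        total += w * loss
        weight_sum += w
    # Weighted mean so the loss scale does not depend on batch composition
    return total / weight_sum if weight_sum > 0 else 0.0
```

The weight is continuous at the threshold (it equals 1 there), so the loss does not jump as a prediction crosses it.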

moviechat

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

multi-modality-arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
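Arena-style leaderboards such as Chatbot Arena commonly turn pairwise human votes into Elo ratings. The sketch below assumes that scheme and is not the project's code. The starting rating of 1000, `k=32`, and the model names are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update_elo(ratings, model_a, model_b, outcome, k=32, initial=1000.0):
    """Apply one side-by-side vote.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    Unseen models start at `initial` (an assumed default).
    """
    r_a = ratings.get(model_a, initial)
    r_b = ratings.get(model_b, initial)
    e_a = expected_score(r_a, r_b)
    ratings[model_a] = r_a + k * (outcome - e_a)
    ratings[model_b] = r_b + k * ((1 - outcome) - (1 - e_a))
    return ratings


# Usage with hypothetical votes: (model A, model B, outcome for A)
ratings = {}
battles = [("model_x", "model_y", 1.0), ("model_y", "model_z", 0.5)]
for a, b, outcome in battles:
    update_elo(ratings, a, b, outcome)
```

Each update is zero-sum: whatever A gains, B loses, so the total rating mass stays constant.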

muse-maskgit-pytorch

Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in PyTorch
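Masked generative transformers decode all image tokens in parallel over a few steps, re-masking the least confident predictions each time. This is an illustrative sketch of the MaskGIT-style cosine schedule that Muse builds on, not code from this repository. The function names are hypothetical.

```python
import math


def cosine_mask_schedule(num_tokens, num_steps):
    """Number of tokens still masked after each decoding step.

    Mask ratio follows cos(pi/2 * t/T), so few tokens are revealed early
    and many late. floor() guarantees the last step leaves 0 masked
    (ceil would leave 1 because cos(pi/2) is not exactly 0 in floats).
    """
    counts = []
    for t in range(1, num_steps + 1):
        ratio = math.cos(math.pi / 2 * t / num_steps)
        counts.append(math.floor(ratio * num_tokens))
    return counts


def tokens_to_keep_masked(confidences, num_masked):
    """Indices of the `num_masked` least confident positions, to re-mask."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return sorted(order[:num_masked])
```

At each step, the model predicts every masked token, keeps the confident ones, and re-masks `cosine_mask_schedule(...)[t]` positions chosen by `tokens_to_keep_masked`.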

old_web

Personal website built on Beautiful Jekyll; feel free to clone and modify.

openscene

3D Occupancy Prediction Benchmark in Autonomous Driving

pose2img

Pose-driven natural human image generation based on a latent diffusion model.

poseda

[ICCV 2023] Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

rese1f

Config files for my GitHub profile.

stablevideo

[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing

steve

β›πŸ’Ž STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

uniap

[AAAI 2024] UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning

univhp

Unified Human-centric Perception Model and Benchmark in Sports

vfd-2000

[ICTAI 2022] VFD-2000 Dataset and official page for "Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model"

video-dataset-maker

A pipeline that covers downloading videos from YouTube and extracting frames using ffmpeg.
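A minimal sketch of the two pipeline stages, assuming yt-dlp as the downloader (the description names only YouTube and ffmpeg) and one sampled frame per second by default. The function names, directory layout, and `%06d.jpg` naming pattern are hypothetical choices, not the repository's.

```python
import subprocess
from pathlib import Path


def download_cmd(url, out_path):
    """Download command. Assumes yt-dlp; '-f mp4' picks an MP4 format."""
    return ["yt-dlp", "-f", "mp4", "-o", str(out_path), url]


def extract_frames_cmd(video_path, frames_dir, fps=1):
    """ffmpeg command sampling `fps` frames per second into numbered JPEGs."""
    pattern = str(Path(frames_dir) / "%06d.jpg")
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def run_pipeline(url, workdir, fps=1):
    """Download one video, then extract frames (needs yt-dlp and ffmpeg on PATH)."""
    workdir = Path(workdir)
    frames_dir = workdir / "frames"
    frames_dir.mkdir(parents=True, exist_ok=True)
    video_path = workdir / "video.mp4"
    subprocess.run(download_cmd(url, video_path), check=True)
    subprocess.run(extract_frames_cmd(video_path, frames_dir, fps), check=True)
    return frames_dir
```

Building commands as argument lists (rather than shell strings) avoids quoting problems with URLs and paths. `check=True` stops the pipeline if either tool fails. A call would look like `run_pipeline("https://www.youtube.com/watch?v=...", "data/clip_0001", fps=2)`.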

video_captioning_datasets

A summary of video-to-text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
