
RSTCANet

Residual Swin Transformer Channel Attention Network for Image Demosaicing (EUVIP 2022)

Wenzhu Xing, Karen Egiazarian


Abstract: Image demosaicing is the problem of interpolating full-resolution color images from raw sensor (color filter array) data. During the last decade, deep neural networks have been widely used in image restoration, and in particular in demosaicing, attaining significant performance improvements. In recent years, vision transformers have been designed and successfully applied to various computer vision tasks. SwinIR, a recent image restoration method based on the Swin Transformer (ST), demonstrates state-of-the-art performance with a smaller number of parameters than neural network-based methods. Inspired by the success of SwinIR, we propose in this paper a novel Swin Transformer-based network for image demosaicing, called RSTCANet. To extract image features, RSTCANet stacks several residual Swin Transformer Channel Attention blocks (RSTCAB), introducing channel attention for every two successive ST blocks. Extensive experiments demonstrate that RSTCANet outperforms state-of-the-art image demosaicing methods while having a smaller number of parameters.


Network Architecture

The architecture of our proposed residual Swin Transformer Channel Attention network (RSTCANet). The network consists of three modules: shallow feature extraction, deep feature extraction, and image reconstruction. The shallow feature extraction module is composed of a pixel shuffle layer and a vanilla linear embedding layer. For deep feature extraction, we propose residual Swin Transformer Channel Attention blocks (RSTCAB) to extract both hierarchical window-based self-attention-aware features and vertical channel-attention-aware features; this module consists of K RSTCAB and one 3x3 convolutional layer. The shallow and deep features are aggregated by a long skip connection before they are fed into the image reconstruction module, which consists of an up-sampling layer and two 3x3 convolutional layers. A minimal sketch of this pipeline follows.
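
The sketch below traces the three-module data flow in PyTorch. It is an illustrative assumption, not the repository's code: the linear embedding is written as a 1x1 convolution, the ReLU between the two reconstruction convolutions is a guess, and RSTCAB is a placeholder that is sketched in detail after the next paragraph.

    # Minimal PyTorch sketch of the RSTCANet pipeline described above.
    # Layer choices inside each module are illustrative assumptions.
    import torch
    import torch.nn as nn

    RSTCAB = nn.Identity  # placeholder; sketched in detail after the next paragraph

    class RSTCANetSketch(nn.Module):
        def __init__(self, nc=72, K=2):
            super().__init__()
            # Shallow feature extraction: pixel shuffle + vanilla linear embedding.
            self.unshuffle = nn.PixelUnshuffle(2)       # 1-channel CFA -> 4 channels at half resolution
            self.embed = nn.Conv2d(4, nc, 1)            # linear embedding as a 1x1 conv
            # Deep feature extraction: K RSTCAB followed by one 3x3 conv.
            self.blocks = nn.Sequential(*[RSTCAB(nc) for _ in range(K)])
            self.conv_deep = nn.Conv2d(nc, nc, 3, padding=1)
            # Image reconstruction: up-sampling layer + two 3x3 convs.
            self.up = nn.Sequential(nn.Conv2d(nc, 4 * nc, 3, padding=1), nn.PixelShuffle(2))
            self.recon = nn.Sequential(
                nn.Conv2d(nc, nc, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(nc, 3, 3, padding=1),
            )

        def forward(self, cfa):                         # cfa: (B, 1, H, W) mosaiced image
            shallow = self.embed(self.unshuffle(cfa))
            deep = self.conv_deep(self.blocks(shallow))
            return self.recon(self.up(shallow + deep))  # long skip connection, then reconstruct

    x = torch.randn(1, 1, 128, 128)                     # a 128x128 mosaiced (CFA) image
    print(RSTCANetSketch()(x).shape)                    # -> torch.Size([1, 3, 128, 128])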

Each residual Swin Transformer Channel Attention block (RSTCAB) contains N Swin Transformer layers (STL), N/2 channel attention blocks (CA), and one 3x3 convolutional layer. A skip connection inside the RSTCAB guarantees that the block focuses on the differences between its input and output. For every two successive STLs, the channel attention block generates channel statistics from the input of the two STLs and multiplies the produced attention with the output of the two STLs. The N/2 channel attention blocks in the same RSTCAB share parameters, as the single shared CA module in the sketch below reflects.
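
A minimal sketch of one RSTCAB under the description above. Two assumptions: the channel attention follows the common squeeze-and-excitation pattern (global average pooling, two-layer MLP, sigmoid), and SwinLayer is only a stand-in for a real shifted-window Swin Transformer layer.

    # Minimal sketch of one RSTCAB; CA design and SwinLayer are assumptions.
    import torch
    import torch.nn as nn

    def SwinLayer(nc):
        # Stand-in only: a real STL would apply window attention and an MLP
        # with LayerNorms and residual connections.
        return nn.Identity()

    class ChannelAttention(nn.Module):
        def __init__(self, nc, reduction=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)         # per-channel global statistics
            self.mlp = nn.Sequential(
                nn.Conv2d(nc, nc // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(nc // reduction, nc, 1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.mlp(self.pool(x))               # (B, C, 1, 1) attention weights

    class RSTCAB(nn.Module):
        def __init__(self, nc, N=6):
            super().__init__()
            self.stls = nn.ModuleList(SwinLayer(nc) for _ in range(N))
            self.ca = ChannelAttention(nc)              # one CA module = shared parameters for all N/2 pairs
            self.conv = nn.Conv2d(nc, nc, 3, padding=1)

        def forward(self, x):
            out = x
            for i in range(0, len(self.stls), 2):       # every two successive STLs
                pair_in = out                           # channel statistics come from the pair's input
                out = self.stls[i + 1](self.stls[i](out))
                out = out * self.ca(pair_in)            # scale the pair's output by the attention
            return x + self.conv(out)                   # skip connection around the whole block

    x = torch.randn(1, 72, 64, 64)
    print(RSTCAB(72)(x).shape)                          # -> torch.Size([1, 72, 64, 64])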

Architecture Variants

The parameter settings of the different model variants. C is the number of channels; K and N denote the number of RSTCAB and the number of STLs in one RSTCAB, respectively.
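
These settings can be read off the pretrained checkpoints and test commands in the Code section below:

  Model        C (nc)   num_head   N   K   Checkpoint size
  RSTCANet-B   72       6          6   2   5.5 MB
  RSTCANet-S   96       6          6   4   16.0 MB
  RSTCANet-L   128      8          8   4   32.6 MB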

Code

  1. Pretrained Models:

    1. RSTCANet_B.pth (5.5 MB)
    2. RSTCANet_S.pth (16.0 MB)
    3. RSTCANet_L.pth (32.6 MB)
  2. Test:

    python main_test_dm.py --model_name RSTCANet_B --testset_name urban100 --nc 72 --num_head 6 --N 6 --K 2
    python main_test_dm.py --model_name RSTCANet_S --testset_name urban100 --nc 96 --num_head 6 --N 6 --K 4
    python main_test_dm.py --model_name RSTCANet_L --testset_name urban100 --nc 128 --num_head 8 --N 8 --K 4

Results on Image Demosaicing

Resulting Images

Visual results comparison of different demosaicing methods. (a) Ground-truth and selected area; (b) Ground-truth; (c) Mosaiced; (d) IRCNN; (e) RSTCANet-B; (f) DRUNet; (g) RSTCANet-S; (h) RNAN; (i) RSTCANet-L.

Acknowledgement

This code borrows heavily from SwinIR and SwinTransformer.


rstcanet's Issues

About training the model

I found your paper on the internet and became interested in it. I want to ask about the way you trained the model.
I saw that you are using the Swin Transformer in your model. When you trained the model, did you train it from scratch on your dataset, or did you use pretrained weights? One more question: can you tell me what environment you used to train this model?
Thank you.
