
fetnet's Introduction

I am a first-year graduate student at Xidian University. My supervisor is Professor Cheng Deng.

My areas of interest are multimodal understanding and co-speech gesture generation.

Personal page: Guangtao Lyu (吕光涛)

Feel free to contact me at Guangtao Lyu (吕光涛) ([email protected])

🔥 News

  • 2023.06: Obtained a bachelor's degree from Wuhan University of Technology.
  • 2023.05: One paper was accepted by PR2023.
  • 2022.03: One paper was accepted by ICME2022.


fetnet's People

Contributors

guangtaolyu


fetnet's Issues

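Question about how to prepare the training dataset

Hello, and thank you for open-sourcing FETNet.

Background: I want to use FETNet to produce images with the handwriting removed, so I need to train on my own dataset. I currently have two ways of preparing the data.

First: use Photoshop to paint over the handwriting in white.
image

Second: print the image, write on it by hand in red ink, then scan it, so that the handwriting can be told apart by color.
image

Questions:

  1. What are the pros and cons of these two approaches, and which one suits FETNet better?
  2. What is the minimum number of images the dataset needs?
  3. Is there a better way of preparing the data that you would recommend?

I would be very grateful for an answer.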
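To make the second approach concrete, here is a minimal sketch of how red ink could be separated from black print and white paper by color alone. It is plain Python over a grid of (R, G, B) tuples; the function name `red_ink_mask` and the thresholds `min_red` and `margin` are hypothetical and would need tuning on real scans. It says nothing about the data format FETNet itself expects.

```python
def red_ink_mask(pixels, min_red=150, margin=60):
    """Return a binary mask (1 = red ink) for a 2-D grid of (R, G, B) tuples.

    A pixel counts as red ink when its red channel is bright and clearly
    dominates both green and blue. Thresholds are illustrative assumptions.
    """
    return [
        [1 if (r >= min_red and r - max(g, b) >= margin) else 0
         for (r, g, b) in row]
        for row in pixels
    ]


# Tiny 2x3 "scan": white paper, black print, red handwriting.
scan = [
    [(255, 255, 255), (20, 20, 20), (200, 30, 40)],
    [(210, 40, 35), (250, 250, 250), (15, 15, 15)],
]
mask = red_ink_mask(scan)
print(mask)
```

White pixels fail the dominance test and black pixels fail the brightness test, so only the handwriting is marked.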

CUDA OOM

Hello. Thank you for your work and contribution.
I'm using this model to remove text and noticed that feeding it medium or large images produces the error: OutOfMemoryError: CUDA out of memory. Tried to allocate 485.16 GiB. GPU 0 has a total capacity of 23.50 GiB of which 11.37 GiB is free.

The model is quite small, so I'm surprised that it needs so much memory (more than 10 GB).

I tried downsizing the input image to 1024 x 1024 but still got OOM.

At the moment I can run inference using bfloat16 and an image size of 1024 x 512. In this setup the model uses 8 GB, which is still quite a lot.
My code:

import torch
from matplotlib import pyplot as plt
from PIL import Image
from torchvision import transforms as T
from torchvision.utils import make_grid

from modules.FETNet import FETNet  # model definition from the FETNet repo

to_tensor = T.ToTensor()
to_pil_image = T.ToPILImage()


def plot(image, figsize=(12, 12)):
    """Show an image without axis ticks."""
    fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(image, cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    plt.show()


# Downsample the input image to reduce memory usage.
# PIL's resize takes (width, height). NOTE: set (1024, 1024) to get OOM
def load_and_preprocess_image(image_path, size=(512, 1024)):
    img = Image.open(image_path).convert('RGB')
    img = img.resize(size)
    img = to_tensor(img).float()
    return img.unsqueeze(0)  # add the batch dimension


G = FETNet(3)
ckpt_dict = torch.load("scut_enstext.pth")
G.load_state_dict(ckpt_dict)
G = G.to("cuda").to(torch.bfloat16)
G.eval()

total_params = sum(p.numel() for p in G.parameters())
print(total_params)

img = load_and_preprocess_image("2.jpg")

with torch.no_grad():
    img = img.to("cuda").to(torch.bfloat16)
    fake_B, masks_out = G(img)

# Composite: generated pixels where the mask is 0, input pixels where it is 1
comp_B = fake_B * (1 - masks_out) + img * masks_out
for k in range(comp_B.size(0)):
    grid = make_grid(comp_B[k:k + 1])
    grid = to_pil_image(grid)
    plot(grid)

Test image: 2 (image)

Therefore the issue remains the same. Don't you think that the model is consuming too much memory?
Perhaps there is a memory leak somewhere in the source code?
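One possible contributor, offered as a back-of-the-envelope sketch and not as a diagnosis: the attention code quoted in the attention-score issue on this page builds `b_att` with `torch.bmm(f1.permute(0, 2, 1), f1)`, which yields an N x N matrix where N is the number of feature-map positions. The memory for one such matrix can be estimated as below. The downsampling factor of 4 is an assumption; the resolution at which FETNet actually applies this attention is not stated here.

```python
GIB = 1024 ** 3


def attention_map_bytes(height, width, downsample, bytes_per_element):
    """Bytes for one (N x N) attention matrix, N = feature-map positions."""
    n = (height // downsample) * (width // downsample)
    return n * n * bytes_per_element


# downsample=4 is a hypothetical value chosen only for illustration.
for h, w in [(1024, 512), (1024, 1024), (2048, 2048)]:
    for name, size in [("float32", 4), ("bfloat16", 2)]:
        gib = attention_map_bytes(h, w, downsample=4, bytes_per_element=size) / GIB
        print(f"{h}x{w} {name}: {gib:.1f} GiB")
```

Because N grows with the image area, the matrix grows with the square of the area: doubling both sides of the input multiplies this cost by 16, independent of the parameter count.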

A question about the attention score computation

Hello,
I have a question about how the attention score is computed in Text_Texture_Erase_And_Enhance_Module.

x_out = (torch.bmm(x_out, b_att_1.permute(0, 2, 1))).view(m_batchsize, c, width, height)

self.softmax = nn.Softmax(dim=1)
f1 = x_1.view(m_batchsize, -1, width * height)
b_att = torch.bmm(f1.permute(0, 2, 1), f1)
f1 = x_mask.view(m_batchsize, -1, width * height)
mask_att = torch.bmm(f1.permute(0, 2, 1), f1)
b_att = b_att * mask_att
b_att = b_att.view(m_batchsize, -1, width, height)
b_att = self.softmax(b_att)

b_att is computed as (batchsize, query, dim) @ (batchsize, dim, key) = (batchsize, query, key).
The softmax in the source uses dim=1 rather than dim=-1. Why is the softmax not taken over the key dimension?
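To make the two choices concrete, here is a small sketch in plain Python (no PyTorch) that normalises a toy score matrix over each axis. The helper names are hypothetical. It relies on one fact visible in the snippet: `f1^T f1` is a Gram matrix and therefore symmetric, and the elementwise product of two symmetric matrices is symmetric too. It does not explain the author's intent.

```python
import math


def softmax(values):
    """Numerically stable softmax of a 1-D list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def softmax_over_keys(scores):
    """Each row sums to 1: dim=-1 on a (batch, query, key) tensor."""
    return [softmax(row) for row in scores]


def softmax_over_queries(scores):
    """Each column sums to 1: dim=1 on a (batch, query, key) tensor."""
    return transpose([softmax(col) for col in transpose(scores)])


# Toy features with 2 channels at 3 positions; scores = f^T f is symmetric.
features = [[1.0, 0.0, 2.0],
            [0.5, 1.5, -1.0]]
n = len(features[0])
scores = [[sum(f[i] * f[j] for f in features) for j in range(n)]
          for i in range(n)]

by_keys = softmax_over_keys(scores)
by_queries = softmax_over_queries(scores)
```

For a symmetric score matrix, `by_queries` is exactly the transpose of `by_keys`. Whether the later `b_att_1.permute(0, 2, 1)` in the quoted line is meant to undo that transpose cannot be told from the snippet, since the definition of `b_att_1` is not shown.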

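Are there requirements on the input image size at inference time?

This is a great project. Finally there is an end-to-end method for text erasing that needs no text detection or mask construction at inference time. Computer vision is not my field, so a few basic things are unclear to me, and I hope you can help.

  1. When predicting with the provided weights, I found after repeated tests that the input height and width must be powers of two, for example 128/256/512, otherwise an error is raised. Is this by design, or is it a bug?
  2. If I want to train the model myself, do I only need a, b, and c from the figure in the paper, shown below?
    image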
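If the power-of-two constraint in item 1 holds as observed (it is the reporter's finding, not something confirmed here), one workaround is to pad the input up to the next power of two and crop the output back. The following is a minimal sketch of the size arithmetic only; the helper names are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be a positive integer)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def padding_to_power_of_two(height, width):
    """Return (pad_bottom, pad_right) that make both sides powers of two."""
    return (next_power_of_two(height) - height,
            next_power_of_two(width) - width)


pad_bottom, pad_right = padding_to_power_of_two(600, 400)
print(pad_bottom, pad_right)
```

The padding itself could then be applied with something like `torch.nn.functional.pad` before the forward pass, and the result cropped to the original height and width afterwards. Padding to a power of two can nearly double a side length, which increases memory use accordingly.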
