guangtaolyu / fetnet

FETNet: Feature Erasing and Transferring Network for Scene Text Removal

Python 100.00%

fetnet's Introduction

I am a first-year graduate student at Xidian University. My supervisor is Professor Cheng Deng.

My areas of interest are multimodal understanding and co-speech gesture generation.

Personal page: Guangtao Lyu(吕光涛)

Feel free to contact me at Guangtao Lyu(吕光涛)([email protected])

🔥 News

2023.06: Obtained a bachelor's degree from Wuhan University of Technology.
2023.05: One paper is accepted by PR2023.
2022.03: One paper is accepted by ICME2022.

fetnet's Issues

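Question about how to prepare a training dataset

Hello, and thank you for open-sourcing FETNet.

Background: I want to use FETNet to produce images with the handwriting removed, so I need to train on my own dataset. I currently see two ways to prepare it.

First: use Photoshop to paint over the handwriting in white.

Second: print the image, write on it by hand in red, then scan it. The handwriting is then distinguished by its color.

Questions:

What are the pros and cons of these two approaches, and which one suits FETNet better?
What is the minimum number of images the dataset needs?
Is there a better way to prepare the data that you would recommend?

I would be very grateful for an answer.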
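
To make the second preparation method concrete, here is a minimal sketch of how red handwriting could be separated from a scan by color. It only illustrates the "distinguish by color" idea from the question; the function names and the thresholds `min_red` and `max_other` are hypothetical, and nothing here is taken from FETNet's own data pipeline.

```python
def is_red(pixel, min_red=150, max_other=100):
    """Return True if an (R, G, B) pixel looks like red ink.

    The thresholds are illustrative guesses and would need tuning
    for a real scanner and pen.
    """
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other


def red_ink_mask(pixels):
    """Build a binary mask (1 = red ink) from rows of (R, G, B) tuples."""
    return [[1 if is_red(p) else 0 for p in row] for row in pixels]


# A tiny 2 x 3 "scan": white paper, dark printed text, and red handwriting.
scan = [
    [(255, 255, 255), (200, 30, 40), (255, 255, 255)],
    [(20, 20, 20), (210, 50, 60), (250, 250, 250)],
]
mask = red_ink_mask(scan)
```

With a real scan, the pixel rows could come from an image library, and the resulting mask would mark the handwriting region that has to be erased.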

CUDA OOM

Hello. Thank you for your work and contribution.
I'm using this to remove text and noticed that when I feed it large or medium-sized images I get the error: OutOfMemoryError: CUDA out of memory. Tried to allocate 485.16 GiB. GPU 0 has a total capacity of 23.50 GiB of which 11.37 GiB is free.

Your model is quite small, so I'm surprised that it requires so much memory (more than 10 GB).

I tried downsizing the input image to 1024 x 1024 but still got OOM.

At the moment I can only run inference using bfloat16 and an image size of 1024 x 512. In this setup, the model uses 8 GB, which is still quite a lot.
My code:

import torch
from modules.Losses import *
from torchvision.utils import make_grid
from torchvision import transforms as T
from PIL import Image
from modules.FETNet import FETNet
from utils.erode import *
from matplotlib import pyplot as plt


def plot(image, figsize=(12, 12)):
    # Show the image without axis ticks.
    fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(image, cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    plt.show()


# Downsample the input image to reduce memory usage
def load_and_preprocess_image(image_path, size=(512, 1024)):           # NOTE: set (1024, 1024) to get OOM
    img = Image.open(image_path).convert('RGB')
    img = img.resize(size)
    img = to_tensor(img).float()
    img = torch.unsqueeze(img, 0)
    return img


to_tensor = T.ToTensor()
to_pil_image = T.ToPILImage()

G = FETNet(3)
ckpt_dict = torch.load("scut_enstext.pth")
G.load_state_dict(ckpt_dict)
G = G.to("cuda").to(torch.bfloat16)
G.eval()

total_params = sum(p.numel() for p in G.parameters())
print(total_params)

img = load_and_preprocess_image("2.jpg")

with torch.no_grad():
    img = img.to("cuda").to(torch.bfloat16)
    fake_B, masks_out = G(img)

comp_B = fake_B * (1 - masks_out) + img * masks_out
for k in range(comp_B.size(0)):
    grid = make_grid(comp_B[k:k + 1])
    grid = to_pil_image(grid)
    plot(grid)

Test image:

Therefore the issue remains the same. Don't you think that the model is consuming too much memory?
Perhaps there is a memory leak somewhere in the source code?
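
A rough back-of-the-envelope check may help explain these numbers. The attention code quoted in the attention-score issue on this page builds its score matrix with torch.bmm(f1.permute(0, 2, 1), f1), which has one row and one column per spatial position, so its size grows with the square of the number of positions. The sketch below estimates that memory. The `stride` (how far the network downsamples before the attention module) and `bytes_per_elem` are assumptions for illustration, not values confirmed by the repository.

```python
def attention_matrix_gib(height, width, stride=4, bytes_per_elem=4, batch=1):
    """Estimate the size in GiB of an (N x N) attention score matrix.

    N is the number of spatial positions in the feature map. `stride` is a
    hypothetical downsampling factor, not a value taken from FETNet.
    """
    positions = (height // stride) * (width // stride)
    return batch * positions * positions * bytes_per_elem / 2**30


# Compare a few square input sizes under the assumed stride.
for side in (256, 512, 1024):
    print(side, attention_matrix_gib(side, side))
```

Under these assumptions, doubling both sides of the input multiplies the number of positions by 4 and the matrix size by 16, which would be consistent with memory use that looks out of proportion to the parameter count rather than with a leak.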

A question about the attention score computation

Hello,
I have a question about how the attention score is computed in Text_Texture_Erase_And_Enhance_Module.

FETNet/modules/FETNet.py

Line 138 in c8357ea

    
           x_out = (torch.bmm(x_out, b_att_1.permute(0, 2, 1))).view(m_batchsize, c, width, height)

self.softmax = nn.Softmax(dim=1)
f1 = x_1.view(m_batchsize, -1, width * height)
b_att = torch.bmm(f1.permute(0, 2, 1), f1)
f1 = x_mask.view(m_batchsize, -1, width * height)
mask_att = torch.bmm(f1.permute(0, 2, 1), f1)
b_att = b_att * mask_att
b_att = b_att.view(m_batchsize, -1, width, height)
b_att = self.softmax(b_att)

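b_att is computed as (batchsize, query positions, dim) @ (batchsize, dim, key positions) = (batchsize, query positions, key positions).
The softmax in the source uses dim=1 rather than dim=-1. Why is the softmax not taken over the key dimension?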
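
One observation that may be relevant to this question: a score matrix of the form f^T f is symmetric, and an element-wise product of two symmetric matrices is symmetric too. For a symmetric score matrix, normalizing over the first position index gives exactly the transpose of normalizing over the last one. The sketch below checks this on a tiny example in plain Python. It is not the repository's code, the helper names are hypothetical, and it does not claim to describe the author's intent.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def softmax_last(scores):
    """Normalize each row, i.e. over the last (key) index."""
    return [softmax(row) for row in scores]


def softmax_first(scores):
    """Normalize each column, i.e. over the first (query) index."""
    return transpose([softmax(col) for col in transpose(scores)])


# Features with 2 channels and 3 spatial positions (columns are positions).
features = [[1.0, 0.5, -0.3], [0.2, -1.0, 0.8]]
n = len(features[0])

# Gram matrix f^T f: one score per pair of positions.
scores = [
    [sum(ch[i] * ch[j] for ch in features) for j in range(n)]
    for i in range(n)
]

over_first = softmax_first(scores)
over_last_t = transpose(softmax_last(scores))
same = all(
    math.isclose(over_first[i][j], over_last_t[i][j])
    for i in range(n)
    for j in range(n)
)
```

If the scores in the quoted code are symmetric in this way, the choice between the two dimensions would amount to a transpose, which the later permute(0, 2, 1) could account for; whether that is the intended design is for the author to confirm.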

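Are there requirements on the input image size at prediction time?

This is a great project: finally an end-to-end method for text erasing that removes the need for text detection and mask construction at prediction time. I don't have a CV background, though, so a few simple points are unclear to me and I hope you can help.

When predicting with the provided weights, repeated tests show that the input image's height and width must each be a power of two, for example 128, 256, or 512; otherwise an error is raised. Is this by design, or is it a bug?
If I want to train the model myself, do I only need a, b, and c from the following figure in the paper?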
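
As a possible workaround for the size restriction reported above, an input could be padded (or resized) up to the next power of two before prediction and cropped back afterwards. The sketch below only computes the target size and the padding amounts. It relies solely on the reporter's observation, not on a confirmed requirement of the model, and the helper names are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def padding_for(height, width):
    """Return (target_h, target_w, pad_h, pad_w) for a power-of-two canvas."""
    target_h = next_power_of_two(height)
    target_w = next_power_of_two(width)
    return target_h, target_w, target_h - height, target_w - width


# Example: a 300 x 500 image would need a 512 x 512 canvas.
target_h, target_w, pad_h, pad_w = padding_for(300, 500)
```

Whether padding or resizing gives better erasing results is not addressed in this thread and would need to be tested.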

Model link is dead

@GuangtaoLyu The Baidu Netdisk link for the model weights no longer works.
