qcf-568 / doctamper

[CVPR2023] Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution

Python 100.00%

doctamper's Introduction

DocTamper

The DocTamper dataset is now available on BaiduDrive and Google Drive (part 1 and part 2).

The DocTamper dataset is available for non-commercial use only. To request the password, send an email from an educational (institutional) address to [email protected] explaining your intended use.

To visualize the images and their ground-truth masks stored in the provided .mdb files, run "python vizlmdb.py --input DocTamperV1-FCD --i 0".

The official implementation of the paper Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution is in the "models" directory.

I have delayed the release of the training code at the request of my supervisor and of the partner company that purchased it. My training pipeline for the DocTamper dataset and the IoU metric are largely borrowed from a well-known project in this area; the paper's results can easily be reproduced with it by adjusting the loss functions and the learning-rate decay schedule. I also used its augmentation pipeline, except for RandomBrightnessContrast, ShiftScaleRotate and CoarseDropout.
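The borrowed IoU metric is not included in the repository. For reference only, a minimal sketch of common pixel-level metrics for a binary tampering mask (precision, recall, F1 and IoU of the tampered class) might look like this; the function name and the eps smoothing term are illustrative choices, not the code used for the paper.

```python
import numpy as np


def pixel_prf_iou(pred, gt, eps=1e-8):
    """Pixel-level precision, recall, F1 and IoU of the tampered class.

    pred, gt: arrays of the same shape; non-zero / True marks tampered pixels.
    eps avoids division by zero when a mask is empty.
    """
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    tp = np.count_nonzero(pred & gt)   # tampered pixels that were found
    fp = np.count_nonzero(pred & ~gt)  # pristine pixels flagged as tampered
    fn = np.count_nonzero(~pred & gt)  # tampered pixels that were missed
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)  # harmonic mean
    iou = tp / (tp + fp + fn + eps)
    return precision, recall, f1, iou
```

Whether scores are averaged per image or accumulated over the whole test set changes the reported numbers, so that choice should match the protocol being compared against.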

Open-source schedule:
1. Inference models and code: June 2023.
2. Training code: TBD.
3. Data synthesis code: within 2024.

For any questions about this work, please contact [email protected].

If you find this work useful in your research, please consider citing:

@inproceedings{qu2023towards,
  title={Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution},
  author={Qu, Chenfan and Liu, Chongyu and Liu, Yuliang and Chen, Xinhong and Peng, Dezhi and Guo, Fengjun and Jin, Lianwen},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5937--5946},
  year={2023}
}


doctamper's Issues

details of data augmentation

Hi, besides JPEG compression, is any other augmentation used? And in what order are they applied: augmentations then JPEG, or JPEG then augmentations?

The image is not reopened after that last compression

https://github.com/qcf-568/DocTamper/blob/db5189013c9ab4ab4404b0a7a3e64f388dcde18b/dataloader.py#L76C23-L76C23
Hello,
In the dataloader.py file, it appears that the image isn’t reopened after the final compression. This means that if there are three compressions, the image provided to the model has only undergone two compressions. However, the DCT coefficients and the quantization table are from the final compression, which is problematic. Could you clarify the reasoning behind this approach, or is this simply an oversight in the code?
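One way to avoid the mismatch described above, sketched under assumptions: the helper names and the setting of 1 to 3 rounds with quality drawn from [minq, 100] are illustrative, and this is not the repository's dataloader.py. The idea is that after every round the compressed bytes are decoded again, so the pixels given to the model come from the same bytes from which the DCT coefficients and quantization table would be read.

```python
import io
import random

import numpy as np
from PIL import Image


def random_qualities(minq, rng):
    """Draw 1 to 3 JPEG quality factors, each uniformly from [minq, 100]."""
    return [rng.randint(minq, 100) for _ in range(rng.randint(1, 3))]


def recompress(img, qualities):
    """Compress an RGB PIL image once per quality factor, decoding after every round.

    Returns (final_jpeg_bytes, rgb_array); rgb_array is decoded from
    final_jpeg_bytes, so pixels, DCT coefficients and quantization table
    all describe the same (last) compression.
    """
    if not qualities:
        raise ValueError("need at least one quality factor")
    data = b""
    for q in qualities:
        buf = io.BytesIO()  # fresh buffer per round, no stale bytes left behind
        img.save(buf, format="JPEG", quality=q)
        data = buf.getvalue()
        # Reopen the compressed bytes so the next round (and the model) sees
        # the decoded result of this compression, not the previous one.
        img = Image.open(io.BytesIO(data)).convert("RGB")
    return data, np.asarray(img)
```

The returned bytes can then be written to a temporary file and passed to whichever JPEG library is used to extract the DCT coefficients and quantization table.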

About doctamper

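Are all images in the DocTamper dataset tampered? What is the ratio of authentic to tampered images, and is there a label indicating whether an image is tampered or authentic?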

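About recompression

Hello, during training did you compress each image once with quality q and then compress 1 to 3 times at test time, or did you compress a random 1 to 3 times in both training and testing?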

Can this only detect tampering in photographed documents, or can it also detect tampering in ordinary digital documents such as PDFs?

Cat(D0,0, D0,1, D0,2, D0,3)

How exactly does the concatenate operation mentioned in Cat(D0,0, D0,1, D0,2, D0,3) fuse the features?
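Assuming D0,0 to D0,3 denote the outputs of the top row of a UNet++-style nested decoder, which all share the same spatial resolution, "Cat" would normally be a channel-wise concatenation, often followed by a convolution that mixes the stacked channels. A NumPy sketch of that reading follows; whether DTD uses a 1x1 convolution afterwards, and the channel counts, are assumptions rather than details taken from dtd.py.

```python
import numpy as np


def cat_nested_outputs(features):
    """Concatenate same-resolution NCHW feature maps along the channel axis."""
    n, _, h, w = features[0].shape
    for f in features:
        if f.shape[0] != n or f.shape[2:] != (h, w):
            raise ValueError("all inputs must share batch size and spatial size")
    return np.concatenate(features, axis=1)


def conv1x1(x, weight):
    """A bias-free 1x1 convolution: mixes channels independently at every pixel.

    x: (N, C_in, H, W); weight: (C_out, C_in).
    """
    return np.einsum("oc,nchw->nohw", weight, x)
```

With four inputs of C channels each, the concatenated tensor has 4C channels, and the following convolution decides how much each decoder output contributes at every pixel.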

Why 'rgb': np.clip(np.abs(dct),0,20)?
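One plausible reading of this expression, offered as a sketch rather than the authors' explanation: quantized JPEG DCT coefficients are integers concentrated near zero, and taking the absolute value and clipping at 20 maps each one to one of 21 discrete values (0 to 20). Such values can serve directly as indices into a 21-row lookup table, for example a PyTorch nn.Embedding(21, d); whether DTD consumes them this way is an assumption, and the helper names below are hypothetical.

```python
import numpy as np


def dct_to_indices(dct_coef, max_abs=20):
    """Map quantized DCT coefficients to integer indices in [0, max_abs].

    The sign is dropped and large magnitudes saturate at max_abs, so every
    value can index a lookup table with max_abs + 1 rows.
    """
    return np.clip(np.abs(dct_coef), 0, max_abs).astype(np.int64)


def embed(indices, table):
    """Embedding lookup: row i of `table` is the learned vector for index i."""
    return table[indices]
```

Saturating at a small bound keeps the table small while still distinguishing the common small magnitudes, at the cost of treating all large coefficients alike.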

Tampering detection in an image captured from a tampered document.

Hi, does this model work for use cases where a person captures an image of a document (using a scanner, for example), makes changes to the document, captures an image of the tampered document with a smartphone and submits it to the system? I tried this model on untampered images captured with a smartphone camera, but the model reports a lot of tampering. Please see the attached image.

Any guidance is truly appreciated.

FileNotFoundError: [Errno 2] No such file or directory: 'pks/DocTamperV1-TrainingSet/_75.pk'

Traceback (most recent call last):
  File "/mnt/data/experiments/DocTamper-main/models/train.py", line 114, in <module>
    train_data = TamperDataset(train_imgs_dir, 'train')
  File "/mnt/data/experiments/DocTamper-main/models/train.py", line 36, in __init__
    with open('pks/'+roots+'_%d.pk'%minq,'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'pks/DocTamperV1-TrainingSet/_75.pk'

I followed your hints and wrote train.py, but I don't know how to generate the .pk file for the training set.
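The traceback shows that train.py opens a pickle at 'pks/'+roots+'_%d.pk'%minq. What that pickle must contain is not documented here. Since the test-set .pk files reportedly record the random compression factors, a hypothetical generator could store one list of quality factors per training sample. The list-of-lists structure, the 1 to 3 rounds and the function names are all assumptions; compare against a released test-set .pk file before relying on this.

```python
import os
import pickle
import random


def make_quality_plan(num_samples, minq, seed=0):
    """For each sample, draw 1 to 3 JPEG quality factors in [minq, 100]."""
    rng = random.Random(seed)
    return [[rng.randint(minq, 100) for _ in range(rng.randint(1, 3))]
            for _ in range(num_samples)]


def save_plan(plan, roots, minq, pk_dir="pks"):
    """Pickle `plan` to the path train.py opens: pks/<roots>_<minq>.pk."""
    # With roots = 'DocTamperV1-TrainingSet/' (trailing slash) this yields
    # 'pks/DocTamperV1-TrainingSet/_75.pk', the path in the traceback.
    path = os.path.join(pk_dir, roots + "_%d.pk" % minq)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(plan, f)
    return path
```

Fixing the seed makes the per-sample compression settings reproducible across runs, which is presumably why they are stored in a file rather than drawn on the fly.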

public pristine set

Hi, thank you very much for your work. I was wondering whether you plan to release a pristine (untampered) dataset for further study, since all the samples in the DocTamper dataset are forged.

Question regarding VPH and Swin for feature extraction

Hi, thanks for providing the code for digital tamper detection, great work!

I have a question: were the VPH and Swin layers trained end-to-end with the full DTD network, or were they pre-trained separately on ImageNet and plugged in as feature extractors?

I ask because you use only the last three stages of the Swin Transformer.

T-SROIE dataset link

Hello

I know you didn't create the dataset, but could you point out where to find it? Do its authors also share code or weights? Are the links in the publication? I don't speak Chinese and couldn't find them when translating.

full capacity

I encountered an issue with the training code I wrote: memory usage kept increasing until memory was full (around the 4th epoch) and training stopped, so I could not train for more epochs.
For confidentiality purposes, could you please provide an email address so that I can privately send you this Python file?

Why do you feed both the DCT coefficients and the quantization table (qtable) into the FPH branch instead of just the DCT coefficients? What is the significance of the qtable?
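A short sketch of why the quantization table carries information, under standard JPEG assumptions (the FPH code itself is not reproduced here): a JPEG file stores each 8x8 block's DCT coefficients divided by a per-frequency step size and rounded, and those step sizes form the quantization table. The same stored integer therefore corresponds to different actual DCT values at different quality factors, and only the table tells them apart. The example builds the standard luminance table with libjpeg's quality scaling and dequantizes a coefficient array; the function names are illustrative.

```python
import numpy as np

# Standard JPEG (ITU-T T.81 Annex K) luminance quantization table.
BASE_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])


def luma_qtable(quality):
    """libjpeg-style scaling of the base table for a quality factor in 1..100."""
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return np.clip((BASE_LUMA * scale + 50) // 100, 1, 255)


def dequantize(coef, qtable):
    """Approximate DCT values: each stored coefficient times its step size."""
    h, w = coef.shape
    if h % 8 or w % 8:
        raise ValueError("coefficient array must be a multiple of 8 in each dimension")
    return coef * np.tile(qtable, (h // 8, w // 8))
```

For instance, a stored coefficient of 1 at the DC position means a step of 16 at quality 50 but a step of 8 at quality 75.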

self.swin = torch.load('swin_imagenet.pt')

self.vph = torch.load('vph_imagenet.pt')
self.swin = torch.load('swin_imagenet.pt')
self.fph = FPH()
In line 294 of dtd.py, why is it not necessary to instantiate the modules with self.vph = VPH() and self.swin = SwinTransformerV2()? How is from swins import * in line 29 used? And why can't I inspect any information about the self.swin variable in the debugger?

The compressed image is not re-read

In line 76, the image is saved in JPEG format, and the JPEG information is read from the file (line 77). However, the RGB data is not re-read from the file.

Some variables in VPH(nn.Module) in dtd.py, namely depths and self.dims, do not exist. Could you please add them?

Specific tampering type of DocTamper Dataset

As shown in Table 2 of the paper, copy-move, splicing and generation are all included, and the number of samples of each tampering type is given. Is the tampering type of each image labeled? If not, how were the numbers in Table 2 counted, and is there any way to re-split the dataset by tampering type?
Thanks!

What are the two embedding layers for?

What are self.obembed and self.qtembed used for?

how to generate test_masks in tsroie

How are the test_masks for T-SROIE generated for https://github.com/qcf-568/DocTamper/blob/main/models/tsroie/infer_sroie.py, given that its labels are rectangles? The code reads gt_mask = cv2.imread('test_masks/'+path[:-4]+'.png',0); why .png instead of .jpg?
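Assuming the T-SROIE annotations give each tampered text region as an axis-aligned rectangle (x0, y0, x1, y1) (the actual annotation format is not shown here), a mask could be rasterized as below; the function name and box convention are hypothetical. PNG is a sensible storage format because it is lossless: cv2.imread(..., 0) then returns exactly 0 and 255, whereas JPEG compression would introduce intermediate gray values along the box edges and distort the ground truth.

```python
import numpy as np


def boxes_to_mask(height, width, boxes):
    """Rasterize axis-aligned boxes into a binary mask (255 = tampered, 0 = pristine).

    boxes: iterable of (x0, y0, x1, y1) in pixel coordinates, x1/y1 exclusive.
    Boxes are clipped to the image bounds.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        x0, y0 = max(0, int(x0)), max(0, int(y0))
        x1, y1 = min(width, int(x1)), min(height, int(y1))
        if x1 > x0 and y1 > y0:
            mask[y0:y1, x0:x1] = 255
    return mask
```

The result can be saved with cv2.imwrite('test_masks/' + path[:-4] + '.png', mask) so that it matches the reading code above.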

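Does the dataset contain combined tampering types, and could you provide weights of the compared methods trained on DocTamper?

Do any images in the dataset contain combined tampering types, i.e. a single image that includes copy-move as well as splicing or generation?

Could you provide the weights of the methods compared in the paper, trained on DocTamper, to support follow-up comparative studies?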

AttributeError: 'GELU' object has no attribute 'approximate'

Hi, I am getting this error while running eval_dtd.py. I have installed the packages from the provided requirements.txt. Any idea why this might be happening?
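A common cause of this error, stated as an assumption rather than a confirmed diagnosis: the *_imagenet.pt files are loaded with torch.load as whole pickled modules, and modules pickled with a PyTorch version older than about 1.12 lack the approximate attribute that newer versions of nn.GELU read in forward. Besides installing a matching PyTorch version, one workaround is to add the missing attribute after loading, using its default value 'none'. The helper below is hypothetical.

```python
def patch_gelu(model):
    """Give GELU modules loaded from old pickles the `approximate` attribute.

    Works on any object with a `modules()` method (e.g. torch.nn.Module);
    returns the number of modules patched.
    """
    patched = 0
    for m in model.modules():
        if type(m).__name__ == "GELU" and not hasattr(m, "approximate"):
            m.approximate = "none"  # nn.GELU's default: exact (erf-based) GELU
            patched += 1
    return patched
```

For example, calling patch_gelu(self.swin) right after self.swin = torch.load('swin_imagenet.pt') (and likewise for self.vph) would patch the loaded sub-networks before the first forward pass.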

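Questions about the compression quality factors for training and testing

Hello, I would like to ask the following questions:
1. How should the JPEG quality factor of the images be chosen during training?
2. During testing, don't the images need to be compressed one final time with quality factor q? (In the code, the compressed image is written to temp but not saved.)

3. I tested your code and weights on DocTamper. With a quality factor of 75 applied only once, P, R and F are 0.588, 0.487 and 0.533 respectively, which is lower than when the quality factor is chosen from 75 to 100 and compression is applied 1 to 3 times. Is this normal?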

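About the dataset

Hello, do you plan to release a PNG version of the dataset? Besides the masks and the tampered images, does the dataset contain any other information? Are the DCT coefficients stored in the dataset? Thanks for your answer!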

How to test the code on a sample image

I want to test the code on a sample image.

can the network be trained?

I used your network (class seg_dtd) to train on my dataset, but training behaves abnormally: it does not converge.

sss

How do I run the code? Do I need to train on the data myself?

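Random compression factors for the training set

I read in the paper that "all models are trained with dynamic JPEG compression to match the configuration of the test set." However, unlike the test set, the training set does not come with a file such as "xxx_75.pk" that records the random compression factors. Is the training set you provided already compressed?
If not, what minq value is used for the random compression factor of the training set: 75, as mentioned in the paper? Is a model trained with this fixed minq and then tested on test sets with different minq values? Or does the training minq match the test set, e.g. training with minq=85 to test on the minq=85 test set?
Thank you very much for your answer!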

Dataset password

After downloading the dataset from Baidu Cloud, why is a password needed to decompress it? What password did the author set?

About training CATNET on Doctamper

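Hello, sorry to bother you again. I am using the default settings and training scripts from CAT-Net (https://github.com/mjkwon2021/CAT-Net) to train and test on the full DocTamper training and test sets. During training the loss does decrease, but when validating on the test set the output metrics are exactly the same every time: the IoU is 49.2 after the first epoch, and in later epochs not even the decimals change. Adding the Lovász loss made no difference, and training without loading the pretrained RGB and DCT weights gives similar results. I checked that the model used for validation has indeed been updated. I'm not very familiar with this and find it strange; could you help me analyze what might be wrong? Many thanks.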
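A hedged diagnostic for a metric frozen near 49: if the reported IoU is a two-class mean over pristine and tampered pixels (an assumption about this CAT-Net setup, not verified here), a model that predicts every pixel as pristine scores close to 1 on the pristine class and 0 on the tampered class, so the mean sits just under 50% and never changes. The tampered-pixel ratio below is made up for illustration.

```python
import numpy as np


def mean_iou(pred, gt, num_classes=2):
    """Mean of per-class IoU over the classes that appear in pred or gt."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    ious = []
    for c in range(num_classes):
        inter = np.count_nonzero((pred == c) & (gt == c))
        union = np.count_nonzero((pred == c) | (gt == c))
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))


# Hypothetical ground truth: 16 of 1000 pixels (1.6%) tampered.
gt = np.zeros(1000, dtype=np.int64)
gt[:16] = 1
collapsed = np.zeros_like(gt)  # model predicts "pristine" everywhere
print(round(mean_iou(collapsed, gt), 3))  # 0.492
```

If the validation predictions turn out to be all zeros, the problem lies in training (for example class imbalance or the decision threshold) rather than in the validation loop.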

model.load_state_dict(ckpt['state_dict']) error

A question:

model = seg_dtd('',2).cuda()
ckpt = torch.load("./pths/dtd_sroie.pth",map_location='cpu')

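No error is reported up to this point (in the DTD class, the paths of the two weight files it loads have been changed to load from the ./pths directory).

When execution reaches:
model.load_state_dict(ckpt['state_dict'])

the following error is raised: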
model.load_state_dict(ckpt['state_dict'])
File "/home/vipuser/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for seg_dtd:
Missing key(s) in state_dict: "model.vph.downsample_layers.0.0.weight", "model.vph.downsample_layers.0.0.bias", "model.vph.downsample_layers.0.1.weight", "model.vph.downsample_layers.0.1.bias", "model.vph.downsample_layers.1.0.weight", "model.vph.downsample_layers.1.0.bias", "model.vph.downsample_layers.1.1.weight", "model.vph.downsample_layers.1.1.bias", "model.vph.stages.0.0.gamma", "model.vph.stages.0.0.dwconv.weight", "model.vph.stages.0.0.dwconv.bias", "model.vph.stages.0.0.norm.weight", "model.vph.stages.0.0.norm.bias", "model.vph.stages.0.0.pwconv1.weight", ........

The weight file was downloaded from Baidu Netdisk.

Could you tell me what the problem might be? Thank you.

Are all images in DocTamper tampered? Are there no positive samples (I mean untampered samples)?

DocTamperV1

After the dataset is downloaded from Baidu Cloud, why do you need a password to decompress it? What is the password set by the author?

Is the JPEG compression rate during training selected from the range of 100-75 or 100-88?

For the CLTD compression, the chosen range is (100-S/T, 100), with a total of 100K iterations and T set to 8192. Does this imply that the minimum compression quality is 100-(100K/8192), i.e. approximately 88?
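Reading the schedule exactly as written in this issue (lower bound 100 - S/T at iteration S, with T = 8192), the arithmetic is 100 - 100000/8192 = 100 - 12.207 ≈ 87.8, so about 88 after 100K iterations. The sketch below implements that reading only; it is not taken from the paper's code, and the floor parameter is a hypothetical addition for anyone who wants to cap the schedule (e.g. at 75).

```python
import math
import random


def cltd_min_quality(step, T=8192, floor=None):
    """Lower bound of the JPEG quality range at a training step: 100 - step / T.

    floor is a hypothetical optional lower limit; the issue does not mention one.
    """
    lo = 100 - step / T
    return lo if floor is None else max(lo, floor)


def sample_quality(step, rng, T=8192, floor=None):
    """Sample an integer quality uniformly from [ceil(lower bound), 100]."""
    lo = math.ceil(cltd_min_quality(step, T, floor))
    return rng.randint(min(lo, 100), 100)


print(cltd_min_quality(100_000))  # 87.79296875
```

Early in training the range collapses to quality 100 and it widens linearly with the step count, which is the curriculum behaviour the issue describes.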

Failure to load the provided weights

Test demo: missing files

Hello, if I want to run detection on my own images, which file should I run? Also, where can I download the three files "sroie_test_1011.json", "data.mdb" and "lock.mdb"?

What is the code in this repository for? If it isn't training code, what is it?

Regarding dataset access

Respected Sir,
I am a student from IIT Kharagpur, India, working on a problem statement related to fraud detection in medical invoices and bills. Our competition starts today. I emailed you earlier about this but did not receive a reply.

I urgently need access to the dataset.

Please provide me with the password to access the dataset. You can contact me at [email protected].

Inference on PNG images?

Hi, can the model run inference on PNG images instead of JPG/JPEG images?

Could you provide information on the tampering types in the dataset?

When will the training code be released?

Invalid JPEG file structure: two SOI markers

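During data preprocessing in the dataset class, some DocTamper images, after multiple (usually three) compressions, trigger the error "Invalid JPEG file structure: two SOI markers". So far I cannot catch this error, so I can't yet locate which line of code causes it.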

requirements.txt dependency conflicts

The library versions in requirements.txt conflict and I can't resolve them manually. Could you check and update the file?

Where are the inference results saved, i.e. the regions of the document where tampering was detected?

how to implement CLTD?

Hello, I emailed you to request the dataset password for study purposes, but I haven't received a reply yet. Sorry to take up your valuable time!

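Could you release your training-set generation code?

This is undoubtedly an excellent paper that greatly advances research on document tampering detection, especially by releasing a very high-quality document tampering dataset. However, for people like us, the part of the paper that describes in great detail how to build the dataset is hard to grasp, and building it ourselves is a real struggle. So, as a kindness to those of us still struggling in this field, would it be possible to release your training-set generation code, so that we can reproduce this work from scratch, find more inspiration while reproducing and learning from it, and contribute more to document tampering detection research? Of course, if it truly cannot be released, we understand; in that case, could you explain the dataset generation process in more detail than the paper does, so that we can follow your example step by step? Many thanks; we are deeply grateful.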