textgan's Introduction

TextGAN: Unsupervised Text Segmentation (PyTorch 1.0.1)

Text segmentation is a difficult problem because of the potentially vast variation in text and scene landscape. Moreover, systems that learn to perform text segmentation usually need non-trivial annotation effort. This repository contains the implementation of an unsupervised method for segmenting text at the pixel level from scene images. The proposed model relies on generative adversarial networks and does not need each scene image to be paired with the ground truth of the text it contains. The main advantage is that it skips the pixel-level annotated dataset normally required to train powerful text segmentation models. The code is based on PyTorch 1.0.1 and might also work with versions >=0.4.

Trained models can be found in text_segmentation256-Jun-2. Each model has been built using only 9 residual blocks.

Prerequisites

Numpy; PIL

Installing

Download or clone the repository, and off you go.

Datasets

Place the training and testing samples in two separate folders, called train and test, respectively. Each folder should contain the scene-text images in a subfolder called A and the pixel-level annotations in a subfolder called B. The test folder must contain paired images so that performance can be verified via F1, but the train folder can contain unpaired images. The strategy is simple: you only need to copy your images into these folders. The folder that contains train and test is named 'text_segmentation256' by default, but the name can be changed. This folder is placed outside the implementation, so make sure to correct the path in the code to match your folder's location.
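The layout described above can be checked before training with a short script. This is a minimal sketch, not part of the repository: the function name check_layout is hypothetical, and comparing file counts in test/A and test/B is only an assumed stand-in for "paired", since the pairing rule itself is not stated here.

```python
from pathlib import Path


def check_layout(root):
    """Count files in train/A, train/B, test/A and test/B under `root`.

    Raises FileNotFoundError if any of the four folders is missing, and
    ValueError if test/A and test/B hold different numbers of files
    (an assumed proxy for the test images being paired).
    """
    root = Path(root)
    counts = {}
    for split in ("train", "test"):
        for domain in ("A", "B"):
            folder = root / split / domain
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            counts[f"{split}/{domain}"] = sum(
                1 for p in folder.iterdir() if p.is_file()
            )
    if counts["test/A"] != counts["test/B"]:
        raise ValueError("test/A and test/B should contain the same number of images")
    return counts


if __name__ == "__main__":
    # Default folder name from the text; adjust the path to your own setup.
    data_root = Path("../text_segmentation256")
    if data_root.is_dir():
        print(check_layout(data_root))
    else:
        print(f"{data_root} not found; pass your own dataset path to check_layout()")
```

The train split is deliberately not checked for equal counts, because unpaired training images are allowed.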

Running the tests

To train a model, use CycleGAN_text.py; to test the model, use test_GAN_AB.py.
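The paired test set exists so that segmentation quality can be scored with F1. How test_GAN_AB.py computes the score is not shown here, so the following is only an illustrative sketch of a pixel-level F1 between a predicted binary mask and its ground-truth mask. The function name pixel_f1 and the list-of-lists mask format are assumptions made for the example.

```python
def pixel_f1(pred, target):
    """F1 score between two binary masks given as equally sized 2-D lists of 0/1.

    Text pixels are 1, background pixels are 0. Returns 1.0 when both
    masks contain no text pixels at all.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # text predicted, text present
            elif p and not t:
                fp += 1      # text predicted, background present
            elif t and not p:
                fn += 1      # background predicted, text present
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    print(f"F1 = {pixel_f1(predicted, ground_truth):.4f}")
```

In practice the generator output would first be thresholded into a 0/1 mask; the threshold is a separate choice that this sketch leaves out.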

Author

  • Mohammed Al-Rawi

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

Text Segmentation Samples via CycleGAN

[Figure: sample segmentation results. Columns: img | GAN(img) | -ve(img) | GAN(-ve(img)) | GAN(img)+GAN(-ve(img))]

textgan's Issues

RuntimeError: CUDA out of memory.

I have about 7 GB of GPU VRAM and around 13 GB of RAM available, with the following training configuration:

  • torch==1.4.0
  • batch size=1
  • text_batch_size=5
  • data_mode = ''

I have placed my training data outside the repo at a relative path (as per cyclegan_text.py):

../data/<Project_name>/train/A/*
../data/<Project_name>/train/B/*

The test files are placed similarly. I also commented out line 56 in cyclegan_text.py:

opt.dataset_name = 'text_segmentation' + str(opt.img_width)

#############

But when I run python3 cyclegan_text.py, I get the following error trace:

Experiment parameters Namespace(AMS_grad=True, aligned=False, b1=0.5, b2=0.999, batch_size=16, batch_test_size=1, channels=3, checkpoint_interval=5, data_mode='', dataset_name='mandate', decay_epoch=10, epoch=0, experiment_name='mandate-Apr-5', img_height=80, img_width=500, lambda_GAN_AB=tensor(1., device='cuda:0'), lambda_GAN_BA=tensor(1., device='cuda:0'), lambda_cycle_A=tensor(10., device='cuda:0'), lambda_cycle_B=tensor(10., device='cuda:0'), lambda_id_A=tensor(5., device='cuda:0'), lambda_id_B=tensor(5., device='cuda:0'), lr=0.0002, n_cpu=8, n_epochs=20, n_residual_blocks=9, p_RGB2BGR_augment=0, p_invert_augment=0, sample_interval=100, seed_value=12345, show_progress_every_n_iterations=20, test_interval=10, use_F1_loss=False, use_whollyG=False)

../data/mandate/train/B/.
../data/mandate/test/B/.
Traceback (most recent call last):
  File "cyclegan_text.py", line 267, in <module>
    loss_id_B = criterion_identity_B(G_AB(real_B), real_B)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tft-ml/TextGAN/models.py", line 76, in forward
    return self.model(x)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tft-ml/TextGAN/models.py", line 33, in forward
    return x + self.conv_block(x)
RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 7.80 GiB total capacity; 5.90 GiB already allocated; 27.06 MiB free; 6.64 GiB reserved in total by PyTorch)

I am watching nvidia-smi, but the GPU does not seem to be fully utilized, either by this process or by any other.

Tried, but didn't work:

  1. torch.cuda.empty_cache()
  2. Downgrading torch version to 0.4.0 and 1.0.0 and 1.1.0
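One detail worth checking: the configuration listed above says batch size 1, but the printed Namespace reports batch_size=16 with img_height=80 and img_width=500. The rough estimate below shows how much memory a single feature map inside a residual block would need under each setting. It is a sketch under stated assumptions, not a diagnosis: it assumes the generator follows the usual CycleGAN layout (two stride-2 convolutions before the residual blocks, 256 channels inside them), which the trace does not confirm, and the helper name activation_bytes is hypothetical.

```python
def activation_bytes(batch_size, channels, height, width,
                     downsample_steps=2, bytes_per_value=4):
    """Bytes needed for one float32 feature map after the downsampling layers.

    Assumes each downsampling layer is a stride-2, kernel-3, padding-1
    convolution, which maps a side of length n to (n - 1) // 2 + 1.
    """
    h, w = height, width
    for _ in range(downsample_steps):
        h = (h - 1) // 2 + 1
        w = (w - 1) // 2 + 1
    return batch_size * channels * h * w * bytes_per_value


if __name__ == "__main__":
    MIB = 1024 ** 2
    for batch_size in (16, 1):
        # 256 channels is an assumption borrowed from the standard CycleGAN generator.
        size = activation_bytes(batch_size, channels=256, height=80, width=500)
        print(f"batch_size={batch_size}: {size / MIB:.2f} MiB per feature map")
```

With batch_size=16 the estimate is about 39 MiB per feature map, close to the 40.00 MiB allocation that failed; with batch_size=1 it is about 2.4 MiB. If those assumptions hold, the run may be using a batch of 16 rather than 1, and many such feature maps are kept alive for the backward pass across the 9 residual blocks.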
