Comments (8)

da03 commented on June 1, 2024

Hi @wzlxjtu, since this model is trained only on im2latex-100k, without any font or environment variations, it is expected that the network fails to generalize to other scientific domains, even though the images look very similar to a human. To get a more robust model, you might need to construct a training dataset with the same level of variation as your target images (e.g., if you want to handle scientific papers, you might need to render LaTeX in various font sizes and font families).

However, it's weird that screenshots would fail. I think you might need to rescale the screenshots so that the font size matches that of the training images (e.g., if '\lambda' is 8-by-10 pixels in the training set, rescale the screenshot so that '\lambda' is the same size there).
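The rescaling suggested above comes down to simple arithmetic. Below is a minimal sketch, assuming you have measured the pixel height of the same glyph in a training image and in your screenshot; the function names are hypothetical and not part of im2markup.

```python
def rescale_factor(train_glyph_height: float, screenshot_glyph_height: float) -> float:
    """Factor by which to scale the screenshot so glyph sizes match the training set."""
    if train_glyph_height <= 0 or screenshot_glyph_height <= 0:
        raise ValueError("glyph heights must be positive")
    return train_glyph_height / screenshot_glyph_height


def rescaled_size(size: tuple, factor: float) -> tuple:
    """Apply the factor to a (width, height) pair, keeping each side at least 1 pixel."""
    width, height = size
    return (max(1, round(width * factor)), max(1, round(height * factor)))


# Example: '\lambda' is 10 px tall in the training set but 25 px tall in the screenshot.
factor = rescale_factor(10, 25)
new_size = rescaled_size((800, 200), factor)
```

The resulting size can then be passed to whatever image library you use for resizing.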

from im2markup.

da03 commented on June 1, 2024

btw, for the screenshots, you might also need to make sure that they are in grayscale, and downsampled by 2 if you took a screenshot of an unpreprocessed image.
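For the grayscale part, here is a minimal sketch using Pillow. It assumes the screenshot may carry an alpha channel and that transparent areas should be treated as white background; that assumption and the function name `to_grayscale` are mine, not from im2markup. The downsampling filter is not specified in this comment, so that step is left out here.

```python
from PIL import Image


def to_grayscale(img: Image.Image) -> Image.Image:
    """Flatten any transparency onto white, then convert to 8-bit grayscale."""
    rgba = img.convert("RGBA")
    white = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    # Composite first: converting RGBA straight to "L" would drop the alpha
    # channel and could turn transparent regions black.
    return Image.alpha_composite(white, rgba).convert("L")


# Usage (hypothetical file name):
# gray = to_grayscale(Image.open("screenshot.png"))
```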

wzlxjtu commented on June 1, 2024

Hi @da03, I found out that the padding on the left and top plays a critical role. The padding should be 4 pixels (as stated in your paper: 8 pixels, then downsampled by 2). After I got the padding right, I got some output that makes sense. However, it seems that the way you downsample the image is also critical for precision. I tried to linearly downsample the original images in the IM2LATEX-100K dataset but could not reproduce your preprocessed images. Take bc13232098.png for example.

Yours: [image not preserved]
Mine: [image not preserved]
Original: [image not preserved]

Did you downsample the images with a Gaussian filter or something similar? Am I missing some other important preprocessing step? I tried to find this information, but it seems this step is not documented. I really appreciate your help!
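The padding described in this comment can be sketched as follows with Pillow. This is only an illustration under stated assumptions: the image is dark ink on a white background, the formula is first cropped to its ink, and the same 8-pixel border is added on all four sides (the comment only mentions left and top). The name `crop_and_pad` is hypothetical.

```python
from PIL import Image, ImageOps

PAD = 8  # pixels before downsampling, as described above; 8 / 2 = 4 afterwards


def crop_and_pad(gray: Image.Image, pad: int = PAD) -> Image.Image:
    """Crop a white-background image to its ink and add a white border."""
    gray = gray.convert("L")
    # getbbox() bounds the non-zero pixels, so invert to make the ink non-zero.
    bbox = ImageOps.invert(gray).getbbox()
    if bbox is None:  # completely white image: nothing to crop
        return gray
    return ImageOps.expand(gray.crop(bbox), border=pad, fill=255)
```

After a downsample by 2, the 8-pixel border becomes the 4-pixel padding mentioned above.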

da03 commented on June 1, 2024

Hmm interesting. I used LANCZOS resampling:
https://github.com/harvardnlp/im2markup/blob/master/scripts/utils/image_utils.py#L56
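For readers who do not want to open the link, a downsample-by-2 with LANCZOS resampling might look roughly like this in Pillow. This is a sketch, not a copy of the linked file; the function name, the integer rounding of the new size, and the handling of `ratio == 1` are assumptions.

```python
from PIL import Image


def downsample(img: Image.Image, ratio: int = 2) -> Image.Image:
    """Shrink an image by an integer ratio using Pillow's LANCZOS filter."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    if ratio == 1:
        return img.copy()
    width, height = img.size
    new_size = (max(1, width // ratio), max(1, height // ratio))
    return img.resize(new_size, Image.LANCZOS)
```

A bilinear ("linear") resize gives different pixel values from LANCZOS, which would explain why a linearly downsampled image does not match the preprocessed one.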

wzlxjtu commented on June 1, 2024

Oh! Really appreciate it!

HongChow commented on June 1, 2024

"btw, for the screenshots, you might also need to make sure that they are in grayscale, and downsampled by 2 if you took a screenshot of an unpreprocessed image."

@da03, could you please tell me why the screenshots need to be downsampled by 2?
Thanks a lot!

da03 commented on June 1, 2024

It's because we downsampled by 2 during preprocessing. Since deep neural networks generally do not work well on out-of-domain data, we need to apply the same preprocessing at test time. To get a model that is robust to different resolutions or color maps, we would need to add those transformations/noise during training as well.
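As an illustration of adding resolution variation during training, here is a minimal sketch of a random-rescale augmentation with Pillow. The scale range, the function name, and the choice of filter are all hypothetical; im2markup itself does not ship this step as far as this thread shows.

```python
import random

from PIL import Image


def random_rescale(img: Image.Image, rng: random.Random,
                   low: float = 0.8, high: float = 1.2) -> Image.Image:
    """Resize an image by a random factor drawn uniformly from [low, high]."""
    factor = rng.uniform(low, high)
    width, height = img.size
    new_size = (max(1, round(width * factor)), max(1, round(height * factor)))
    return img.resize(new_size, Image.LANCZOS)


# Usage: apply to each training image, after the fixed preprocessing steps.
rng = random.Random(0)  # seeded only to make runs repeatable
augmented = random_rescale(Image.new("L", (100, 50), 255), rng)
```

At test time the fixed preprocessing still has to match what was used in training; the augmentation only widens the range of inputs the model has seen.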

vyaslkv commented on June 1, 2024

I am facing the same issue: I do not get sensible results for images outside the test dataset. I ran the preprocessing step below, but the results are still not sensible:

onmt_preprocess -data_type img \
    -src_dir data/im2text/images/ \
    -train_src data/im2text/src-train.txt \
    -train_tgt data/im2text/tgt-train.txt \
    -valid_src data/im2text/src-val.txt \
    -valid_tgt data/im2text/tgt-val.txt \
    -save_data data/im2text/demo \
    -tgt_seq_length 150 \
    -tgt_words_min_frequency 2 \
    -shard_size 500 \
    -image_channel_size 1
