Comments (8)
Hi @wzlxjtu, since this model is trained only on im2latex-100k, without any font or environment variation, it is normal that the network fails to generalize to other scientific domains, even though the images look very similar to humans. To get a more robust model, you might need to construct a training dataset with the same level of noise as your target data (e.g., if you want to handle scientific papers, you might need to render LaTeX in various font sizes and font families).
However, it's weird that screenshots fail. I think you might need to rescale the screenshots so that the font size matches the training images (e.g., if '\lambda' is 8-by-10 pixels in the training set, rescale the screenshot so that '\lambda' is 8-by-10 pixels there as well).
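The rescaling suggested above comes down to one scale factor: the glyph height in the training images divided by the height of the same glyph in the screenshot. A minimal sketch in Python follows; the function name `rescale_size` and all pixel values are hypothetical and only illustrate the arithmetic, they are not taken from im2markup.

```python
def rescale_size(width, height, glyph_height_px, target_glyph_height_px):
    """Return the (width, height) a screenshot should be resized to so that
    a reference glyph ends up target_glyph_height_px tall."""
    if glyph_height_px <= 0 or target_glyph_height_px <= 0:
        raise ValueError("glyph heights must be positive")
    # Same factor on both axes so the aspect ratio is preserved.
    scale = target_glyph_height_px / glyph_height_px
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical numbers: '\lambda' is 25 px tall in a 600x150 screenshot
# and 10 px tall in the training images.
new_size = rescale_size(600, 150, glyph_height_px=25, target_glyph_height_px=10)
print(new_size)
```

The resulting size can then be passed to whatever image library does the actual resize.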
from im2markup.
btw, for the screenshots, you might also need to make sure that they are in grayscale, and downsampled by 2 if you took a screenshot of an unpreprocessed image.
Hi @da03, I found out that it's the padding on the left and top that plays a critical role. The padding should be 4 pixels (as stated in your paper: 8 pixels, then downsampled by 2). After I got the padding correct, I got some output that makes sense. However, it seems that the way you downsample the image is also critical for precision. I tried to linearly downsample the original images in the IM2LATEX-100K dataset but could not reproduce your preprocessed images. Take bc13232098.png for example.
Did you downsample the images with a Gaussian filter or anything like that? Am I missing some other important preprocessing steps? I tried to find this information, but it seems this step was not documented. I really appreciate your help!
Hmm interesting. I used LANCZOS resampling:
https://github.com/harvardnlp/im2markup/blob/master/scripts/utils/image_utils.py#L56
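For readers who want to try this on their own screenshots, here is a minimal sketch with Pillow of the steps mentioned in this thread: grayscale conversion, 8 pixels of padding, and downsampling by 2 with LANCZOS. The function name `preprocess_like_training` is hypothetical. The sketch also assumes a pure white background, applies the same padding on all four sides (the thread only confirms left and top), and leaves out any other step of the real pipeline, so treat it as an approximation rather than the repository's exact code.

```python
from PIL import Image, ImageOps

# Works on both older and newer Pillow versions.
LANCZOS = getattr(Image, "Resampling", Image).LANCZOS


def preprocess_like_training(img, pad=8, ratio=2):
    """Grayscale, crop to the formula, pad, then downsample with LANCZOS."""
    gray = img.convert("L")

    # Crop to the ink. Assumption: the background is pure white (255),
    # so after inversion the background is 0 and getbbox() finds the ink.
    bbox = ImageOps.invert(gray).getbbox()
    if bbox is not None:
        gray = gray.crop(bbox)

    # Pad with white. 8 px before downsampling by 2 gives 4 px afterwards.
    # Assumption: the same padding is used on the right and bottom.
    padded = Image.new("L", (gray.width + 2 * pad, gray.height + 2 * pad), 255)
    padded.paste(gray, (pad, pad))

    # Downsample by `ratio` with LANCZOS resampling.
    new_size = (max(1, padded.width // ratio), max(1, padded.height // ratio))
    return padded.resize(new_size, LANCZOS)


# Synthetic stand-in for a screenshot: a black box on a white RGB canvas.
demo = Image.new("RGB", (100, 60), "white")
demo.paste((0, 0, 0), (30, 20, 50, 40))
result = preprocess_like_training(demo)
print(result.mode, result.size)
```

For a real screenshot, replace the synthetic image with `Image.open(path)` and compare the output against a preprocessed training image of similar content.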
Oh! Really appreciate it!
"btw, for the screenshots, you might also need to make sure that they are in grayscale, and downsampled by 2 if you took a screenshot of an unpreprocessed image."
@da03, could you please tell me why the images need to be downsampled by 2?
Many thanks.
It's because we downsampled by 2 during preprocessing. Since deep neural networks generally do not work well on out-of-domain data, we need to apply the same preprocessing at test time. To get a model that is robust to different resolutions or color maps, we would need to add those transformations/noise during training as well.
I am also facing the same issue: I am not getting sensible results for images outside the test dataset, even after running the preprocessing step below:
onmt_preprocess -data_type img \
    -src_dir data/im2text/images/ \
    -train_src data/im2text/src-train.txt \
    -train_tgt data/im2text/tgt-train.txt \
    -valid_src data/im2text/src-val.txt \
    -valid_tgt data/im2text/tgt-val.txt \
    -save_data data/im2text/demo \
    -tgt_seq_length 150 \
    -tgt_words_min_frequency 2 \
    -shard_size 500 \
    -image_channel_size 1