robertostling / eflomal
Efficient Low-Memory Aligner
License: GNU General Public License v3.0
Hi,
I am training a large alignment model on more than 15M sentences and get the error "failed to read token: Success". On another occasion I got the error "failed to read length: Success".
It does not occur with smaller data, nor with small data containing very long sentences.
Could it be a memory problem? Maybe eflomal writes temporary files that I have not taken into account?
Any idea?
regards and thanks in advance
Would it be hard to add the fastalign format -i option back? It would be useful to avoid having to change scripts :)
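Until such an option exists, scripts that produce fast_align-style input could be bridged with a small converter. The sketch below assumes the usual fast_align layout of one sentence pair per line, written as source ||| target; the function name split_fast_align is hypothetical and not part of eflomal.

```python
def split_fast_align(lines):
    """Split 'source ||| target' lines into two parallel lists of sentences.

    Hypothetical helper, not part of eflomal. Assumes exactly one
    ' ||| ' separator per line, as in fast_align input files.
    """
    sources, targets = [], []
    for number, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split(" ||| ")
        if len(parts) != 2:
            raise ValueError(
                f"line {number}: expected exactly one ' ||| ' separator"
            )
        sources.append(parts[0].strip())
        targets.append(parts[1].strip())
    return sources, targets


if __name__ == "__main__":
    src, tgt = split_fast_align(["the house ||| das Haus\n"])
    print(src, tgt)
```

The two lists can then be written to separate files and passed to align.py with -s and -t, as in the command lines quoted elsewhere in these issues.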
Hi, what if I want to increase MAX_SENT_LEN from 0x400 to, let's say, 0x1000 (changing the maximum sentence length in both eflomal.c and eflomal.pyx)? Would it work? What is your recommendation about raising the maximum sentence length beyond the current limit of 1024 (0x400)?
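Before changing the constant, it may help to check how many sentences in the corpus actually exceed it. This is a minimal sketch that assumes the limit counts whitespace-separated tokens rather than characters, which is an assumption and not confirmed in this thread; overlong_sentences is a hypothetical helper.

```python
MAX_SENT_LEN = 0x400  # value quoted in the question above (1024)


def overlong_sentences(lines, limit=MAX_SENT_LEN):
    """Return (line_number, token_count) for every line longer than `limit`.

    Assumption: length is measured in whitespace-separated tokens.
    """
    result = []
    for number, line in enumerate(lines, start=1):
        count = len(line.split())
        if count > limit:
            result.append((number, count))
    return result


if __name__ == "__main__":
    sample = ["a short sentence", " ".join(["tok"] * 1500)]
    print(overlong_sentences(sample))
```

If only a handful of lines are reported, filtering or splitting them may be simpler than rebuilding with a larger limit.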
I was hoping there was an aligner that actually worked on Windows, but unfortunately it doesn't. The first issue I bump into is the file handling.
On Windows you can't re-open NamedTemporaryFile files that are still open - it will give a Permission Denied error.
EDIT: I spoke too soon. The issue seems more intricate than I had assumed. Apologies. The issue remains, though.
Trace:
PS C:\tools\eflomal> python .\align.py -s .\source.txt -t .\target.txt
Traceback (most recent call last):
File ".\align.py", line 142, in <module>
if __name__ == '__main__': main()
File ".\align.py", line 136, in main
use_gdb=args.debug)
File "python\eflomal\eflomal.pyx", line 123, in eflomal.align
with open(source_filename, 'rb') as f:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\user\\AppData\\Local\\Temp\\tmphqbksy6m'
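The Python documentation for tempfile notes that, on Windows, a NamedTemporaryFile cannot be opened a second time by name while it is still open. A common workaround is to create the file with delete=False, close it, reopen it by name, and remove it manually. The sketch below illustrates that pattern only; write_then_reopen is a hypothetical helper, not eflomal's actual code.

```python
import os
import tempfile


def write_then_reopen(data: bytes) -> bytes:
    """Write data to a temporary file, close it, then read it back by name."""
    # delete=False keeps the file on disk after close(), so it can be
    # reopened by name on every platform, including Windows.
    tmp = tempfile.NamedTemporaryFile("wb", delete=False)
    try:
        tmp.write(data)
        tmp.close()
        with open(tmp.name, "rb") as f:
            return f.read()
    finally:
        tmp.close()          # harmless if the file is already closed
        os.unlink(tmp.name)  # manual cleanup replaces delete=True


if __name__ == "__main__":
    print(write_then_reopen(b"source sentence\n"))
```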
Hi @robertostling, I'm somewhat new to this, but I just tried compiling eflomal on my Ubuntu machine in WSL 2 and got an error while running the make command, and I cannot find any solution on Google.
Here's the output:
cc -Ofast -march=native -Wall --std=gnu99 -Wno-unused-function -g -fopenmp -c eflomal.c
cc1: error: bad value (‘tigerlake’) for ‘-march=’ switch
cc1: note: valid arguments to ‘-march=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 native
cc1: error: bad value (‘tigerlake’) for ‘-mtune=’ switch
cc1: note: valid arguments to ‘-mtune=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 generic native
make: *** [Makefile:10: eflomal.o] Error 1
Any idea why?
Any help would greatly be appreciated. I just can't get around this issue.
I'm trying to run eflomal, but keep getting the following error:
Traceback (most recent call last):
File "./align.py", line 142, in <module>
if __name__ == '__main__': main()
File "./align.py", line 136, in main
use_gdb=args.debug)
File "python/eflomal/eflomal.pyx", line 152, in eflomal.align
subprocess.call(args)
File "/home/bene/anaconda3/lib/python3.6/subprocess.py", line 267, in call
with Popen(*popenargs, **kwargs) as p:
File "/home/bene/anaconda3/lib/python3.6/subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "/home/bene/anaconda3/lib/python3.6/subprocess.py", line 1333, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'eflomal'
I tried running both eflomal and align.py, from both the eflomal directory and outside it. Am I doing something wrong?
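The traceback shows subprocess.call failing because no executable named eflomal could be located. A quick way to check what Python can see is shutil.which. The sketch below is only a diagnostic; find_executable and its extra_dirs parameter are hypothetical names, and where the binary should be installed is not stated in this thread.

```python
import os
import shutil


def find_executable(name, extra_dirs=()):
    """Return the full path of `name`, or None if it cannot be found.

    Searches the hypothetical `extra_dirs` first, then the PATH variable.
    """
    search_path = os.pathsep.join(
        list(extra_dirs) + [os.environ.get("PATH", "")]
    )
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_executable("eflomal")
    if location is None:
        print("eflomal binary not found on PATH")
    else:
        print("eflomal binary found at", location)
```

If this reports that the binary is missing, the compiled eflomal executable probably needs to be built and placed in a directory listed in PATH.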
I found a publication which states:
We find that alignments created from embeddings are superior for four and comparable for two language pairs compared to those produced by traditional statistical aligners – even with abundant parallel data; e.g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than eflomal, a high-quality statistical aligner, trained on 100k parallel sentences
https://arxiv.org/pdf/2004.08728.pdf
What do you think about it?
I started my research a few hours ago and do not have an opinion yet, so I do not know whether I should learn more about eflomal or about "Static and Contextualized Embeddings".
I install eflomal into a prefix, as follows:
cloud-user@opus-rr:~/guarani/source/eflomal$ python3 setup.py install --prefix=$HOME/guarani/local
running install
running bdist_egg
running egg_info
writing requirements to eflomal.egg-info/requires.txt
writing eflomal.egg-info/PKG-INFO
writing dependency_links to eflomal.egg-info/dependency_links.txt
writing top-level names to eflomal.egg-info/top_level.txt
reading manifest file 'eflomal.egg-info/SOURCES.txt'
writing manifest file 'eflomal.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-3.4/eflomal.cpython-34m.so -> build/bdist.linux-x86_64/egg
creating stub loader for eflomal.cpython-34m.so
byte-compiling build/bdist.linux-x86_64/egg/eflomal.py to eflomal.cpython-34.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying eflomal.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying eflomal.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying eflomal.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying eflomal.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying eflomal.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.eflomal.cpython-34: module references __file__
creating 'dist/eflomal-0.1-py3.4-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Creating /home/cloud-user/guarani/local/lib/python3.4/site-packages/site.py
Processing eflomal-0.1-py3.4-linux-x86_64.egg
removing '/home/cloud-user/guarani/local/lib/python3.4/site-packages/eflomal-0.1-py3.4-linux-x86_64.egg' (and everything under it)
creating /home/cloud-user/guarani/local/lib/python3.4/site-packages/eflomal-0.1-py3.4-linux-x86_64.egg
Extracting eflomal-0.1-py3.4-linux-x86_64.egg to /home/cloud-user/guarani/local/lib/python3.4/site-packages
eflomal 0.1 is already the active version in easy-install.pth
Installed /home/cloud-user/guarani/local/lib/python3.4/site-packages/eflomal-0.1-py3.4-linux-x86_64.egg
Processing dependencies for eflomal==0.1
Searching for numpy==1.8.2
Best match: numpy 1.8.2
numpy 1.8.2 is already the active version in easy-install.pth
Using /usr/lib/python3/dist-packages
Finished processing dependencies for eflomal==0.1
Everything seems to work, and the prefix is in my PYTHONPATH:
$ echo $PYTHONPATH
:/home/cloud-user/guarani/local/lib/python3.4/site-packages/
But when I try to run it I get the following error:
$ cat ../../iterations/grn-spa.0.eflomal
Traceback (most recent call last):
File "align.py", line 142, in <module>
if __name__ == '__main__': main()
File "align.py", line 136, in main
use_gdb=args.debug)
File "eflomal.pyx", line 156, in eflomal.align (python/eflomal/eflomal.c:3657)
File "/usr/lib/python3.4/subprocess.py", line 537, in call
with Popen(*popenargs, **kwargs) as p:
File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.4/subprocess.py", line 1457, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: '/home/cloud-user/guarani/source/eflomal/eflomal'
Any ideas?
Dear Robert,
after installing the required dependencies and then building Eflomal, I ran the Python install script:
$ sudo python3 setup.py install
running install
Checking .pth file support in /usr/local/lib/python3.4/dist-packages/
/usr/bin/python3 -E -c pass
TEST PASSED: /usr/local/lib/python3.4/dist-packages/ appears to support .pth files
running bdist_egg
running egg_info
creating eflomal.egg-info
writing requirements to eflomal.egg-info/requires.txt
writing eflomal.egg-info/PKG-INFO
writing dependency_links to eflomal.egg-info/dependency_links.txt
writing top-level names to eflomal.egg-info/top_level.txt
writing manifest file 'eflomal.egg-info/SOURCES.txt'
reading manifest file 'eflomal.egg-info/SOURCES.txt'
writing manifest file 'eflomal.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'eflomal' extension
creating build
creating build/temp.linux-x86_64-3.4
creating build/temp.linux-x86_64-3.4/python
creating build/temp.linux-x86_64-3.4/python/eflomal
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include/python3.4m -c python/eflomal/eflomal.c -o build/temp.linux-x86_64-3.4/python/eflomal/eflomal.o
In file included from /usr/lib/python3/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1761:0,
from /usr/lib/python3/dist-packages/numpy/core/include/numpy/ndarrayobject.h:17,
from /usr/lib/python3/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
from python/eflomal/eflomal.c:353:
/usr/lib/python3/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
python/eflomal/eflomal.c: In function ‘__pyx_f_7eflomal_write_text’:
python/eflomal/eflomal.c:2328:7: error: format not a string literal and no format arguments [-Werror=format-security]
fprintf(__pyx_v_f, __pyx_k_0);
^
...
At the end, the script failed with exit status 1
gcc version 4.8.4
Thank you for your help. Best, Philippe
Trying to execute the make command I get the following error:
clang: error: unsupported option '-fopenmp'
I tried to install gcc via brew using:
brew install cmake gcc
but it doesn't change anything.
How can I fix it?
Hi Robert,
thank you for your help and the nice development you made with your team, I want to give Eflomal a try.
But can you add some indications about the input data format supported by Eflomal (or tell me where it is written)?
I have formatted my data like in the "3rdparty/data/" folder and just learned (see #2 (comment)) it wasn't necessary :-) .
Best, Philippe
When I try to run the make command, I get the following error:
ld: library not found for -lrt
collect2: error: ld returned 1 exit status
make: *** [eflomal] Error 1
I am using gcc 8.3
Any ideas?
Hi, I cannot compile eflomal with gcc 4.9.3 using the make command. The error is as follows. I could not find a solution through Google. Have you ever encountered this error?
cc -lm -lrt -lgomp eflomal.o -o eflomal
eflomal.o: In function `align._omp_fn.2':
/home/user/work/tool/align/eflomal/eflomal.c:867: undefined reference to `omp_get_num_threads'
/home/user/work/tool/align/eflomal/eflomal.c:867: undefined reference to `omp_get_thread_num'
/home/user/work/tool/align/eflomal/eflomal.c:870: undefined reference to `GOMP_critical_start'
/home/user/work/tool/align/eflomal/eflomal.c:870: undefined reference to `GOMP_critical_end'
eflomal.o: In function `align._omp_fn.1':
/home/user/work/tool/align/eflomal/eflomal.c:848: undefined reference to `omp_get_num_threads'
/home/user/work/tool/align/eflomal/eflomal.c:848: undefined reference to `omp_get_thread_num'
/home/user/work/tool/align/eflomal/eflomal.c:851: undefined reference to `GOMP_critical_start'
/home/user/work/tool/align/eflomal/eflomal.c:851: undefined reference to `GOMP_critical_end'
eflomal.o: In function `main._omp_fn.0':
/home/user/work/tool/align/eflomal/eflomal.c:1007: undefined reference to `omp_get_num_threads'
/home/user/work/tool/align/eflomal/eflomal.c:1007: undefined reference to `omp_get_thread_num'
eflomal.o: In function `align':
/home/user/work/tool/align/eflomal/eflomal.c:848: undefined reference to `GOMP_parallel'
/home/user/work/tool/align/eflomal/eflomal.c:867: undefined reference to `GOMP_parallel'
/home/user/work/tool/align/eflomal/eflomal.c:867: undefined reference to `GOMP_parallel'
/home/user/work/tool/align/eflomal/eflomal.c:848: undefined reference to `GOMP_parallel'
eflomal.o: In function `main':
/home/user/work/tool/align/eflomal/eflomal.c:955: undefined reference to `omp_set_nested'
/home/user/work/tool/align/eflomal/eflomal.c:1007: undefined reference to `GOMP_parallel'
collect2: error: ld returned 1 exit status
: recipe for target 'eflomal' failed
make: *** [eflomal] Error 1