apertium / apertium-eng-spa

Apertium translation pair for English and Spanish

License: GNU General Public License v2.0

Makefile 0.11% XSLT 0.12% Shell 0.01% M4 0.01% Python 0.01% Perl 0.01% XML 99.38% Rich Text Format 0.37%
apertium-trunk

apertium-eng-spa's Introduction

English--Spanish
===================================================================

You need apertium-3.0 and lttoolbox-3.0 to use this translator.  To compile
the linguistic data simply do:

autoreconf -fvi
./configure
make
make install

inside of this directory.


===================================================================

More information about this module, and others, can be found on
the Apertium Wiki, https://wiki.apertium.org


apertium-eng-spa's People

Contributors

albertonl, alxmamaev, bentley, ftyers, gramirez-prompsit, jaspock, jimregan, jonorthwash, kindleton, kj7rrv, mespla, mlforcada, mr-martian, nordfalk, paulomalvar, sushain97, tinodidriksen, tradumatica, unhammer

apertium-eng-spa's Issues

Detecting Names (at least ones with titles)

"Mrs de Palacio" is translated as "Mrs of Palace".
Even without a complex NER system, whatever follows a title should be treated as a name.

Transfer rule to ignore/transliterate, probably?
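
To make the idea concrete outside of Apertium, here is a minimal sketch (plain Python, not a transfer rule) that marks the words following a title as a name span that should be left untranslated. The title list, the particle list and the function name are all assumptions made for illustration; real coverage would need far more entries.

```python
import re

# Hypothetical lists, for illustration only.
TITLES = ("Mr", "Mrs", "Ms", "Dr")
PARTICLES = ("de", "del", "la", "van", "von")

# A title, then optional lowercase particles, then one or more capitalised words.
NAME_AFTER_TITLE = re.compile(
    r"\b(?P<title>" + "|".join(TITLES) + r")\.?\s+"
    r"(?P<name>(?:(?:" + "|".join(PARTICLES) + r")\s+)*"
    r"[A-Z]\w+(?:\s+[A-Z]\w+)*)"
)


def find_titled_names(text):
    """Return (title, name) pairs; the name part should not be translated."""
    return [
        (m.group("title"), m.group("name"))
        for m in NAME_AFTER_TITLE.finditer(text)
    ]
```

With this, "de Palacio" is picked up as one unit after "Mrs", so "de" would never reach the dictionary lookup that turns it into "of".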

Pronoun generation when translating verbs

When Spanish verbs are translated to English, the pronoun isn't generated even when it is not ambiguous (at least in the examples I came across).

For example,
"Si la Asamblea está de acuerdo, haré lo que el señor Evans acaba de sugerir."
translates to
"If the Assembly agrees, will do what the gentleman Evans finishes to suggest."

This should be 'I will do'.
A transfer rule should fix this.
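
As a rough sketch of the decision such a rule would have to make, the snippet below maps the person and number tags of a verb analysis to an English subject pronoun, and declines to guess where English is ambiguous. It is plain Python rather than a transfer rule, the function name is hypothetical, and it assumes the caller has already established that the clause has no explicit subject.

```python
import re

# Person/number combinations whose English subject pronoun is unambiguous.
# Third person singular is left out on purpose: it could be he, she or it.
UNAMBIGUOUS = {
    ("p1", "sg"): "I",
    ("p1", "pl"): "we",
    ("p2", "sg"): "you",
    ("p2", "pl"): "you",
    ("p3", "pl"): "they",
}


def subject_pronoun(analysis):
    """Return the pronoun for an analysis like 'hacer<vblex><fti><p1><sg>'.

    Returns None when the person/number is missing or ambiguous in English.
    """
    tags = re.findall(r"<([^<>]+)>", analysis)
    person = next((t for t in tags if t in ("p1", "p2", "p3")), None)
    number = next((t for t in tags if t in ("sg", "pl")), None)
    return UNAMBIGUOUS.get((person, number))
```

For the example above, a first person singular analysis of "haré" would yield "I", which is the missing word in "will do".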

Find out if WikDict dictionaries can be used to improve vocabulary

I'm the developer of http://www.wikdict.com and I'm considering using the generated dictionaries to improve Apertium. If this works out for one language pair, I'll be able to provide the same for many additional language pairs. The data comes originally from Wiktionary and is licensed under CC-BY-SA 3.0.
The same process might be usable for the dictionaries from http://www.freedict.org , but those are less homogeneous, so I'll leave that for later.

I've made a quick first attempt at converting entries and would like some feedback on the current state.

Example:

<e><p><l>house<s n="n"></l><r>casa</r></p></e>
<e><p><l>house<s n="v"></l><r>alojar</r></p></e>
<e><p><l>house<s n="v"></l><r>envolver</r></p></e>
<e><p><l>house<s n="v"></l><r>almacenar</r></p></e>
<e><p><l>house<s n="v"></l><r>albergar</r></p></e>
<e><p><l>house<s n="v"></l><r>hospedar</r></p></e>
<e><p><l>house<s n="v"></l><r>encajar</r></p></e>

Full data at: http://download.wikdict.com/apertium/

Things to note:

  • I've just generated single lines of entries, not a valid XML file
  • This is generated from a monodirectional dictionary eng->spa
  • Gender and POS are still missing for the target language. I'll probably be able to add these for many entries by joining to the dictionary for the opposite translation direction
  • I didn't find a proper match for the following POS, so I left them like this: 'article', 'conjunction', 'particle', 'phraseologicalUnit', 'postposition', 'prefix', 'proverb', 'suffix', 'symbol'

My main question is: how close is this to being usable for Apertium, and what are the minimum to-dos before it can be used at all? It's obvious to me that this is not ready yet, but I would like a realistic idea of whether I can get it into a usable state before taking more complicated steps.
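
One small step toward valid XML is to build each entry with an XML library instead of string concatenation, so that tags are always closed and special characters are escaped. The following is a minimal sketch using Python's standard library; `make_entry` and `to_xml` are hypothetical helper names, and the optional target-side tag is an assumption about what a complete entry would carry once the target POS is known.

```python
import xml.etree.ElementTree as ET


def make_entry(src, src_pos, trg, trg_pos=None):
    """Build one <e><p><l>..</l><r>..</r></p></e> element."""
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    left = ET.SubElement(p, "l")
    left.text = src
    ET.SubElement(left, "s", n=src_pos)  # serialised as a self-closing tag
    right = ET.SubElement(p, "r")
    right.text = trg
    if trg_pos is not None:  # target POS is still missing for many entries
        ET.SubElement(right, "s", n=trg_pos)
    return e


def to_xml(entry):
    """Serialise a single entry as a string."""
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(make_entry("house", "n", "casa", "n")))
    print(to_xml(make_entry("house", "v", "alojar")))
```

Because the output is produced by a serialiser, every line can be parsed back on its own, which makes it easy to validate the full data set before trying it in a dictionary.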

Move to three letter ISO codes

This pair should be moved to three letter ISO codes. The name should probably be apertium-eng-spa.

The following files (at minimum) will need to be checked:

  • Makefile.am
  • configure.ac
  • modes.xml
  • README

The pair should also be checked to see if it can be adapted to work with monolingual language packages in languages/
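
A quick way to check the files listed above is to scan them for leftover two-letter pair names. This is a minimal sketch, assuming the old names appear as en-es or es-en (as in apertium-en-es); the function name is hypothetical.

```python
import re
from pathlib import Path

# Old-style pair names, not embedded in a longer run of letters.
OLD_PAIR = re.compile(r"(?<![A-Za-z])(en-es|es-en)(?![A-Za-z])")


def leftover_codes(text):
    """Return (line_number, line) for lines still using two-letter pair names."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if OLD_PAIR.search(line)
    ]


if __name__ == "__main__":
    for name in ("Makefile.am", "configure.ac", "modes.xml", "README"):
        path = Path(name)
        if path.exists():
            content = path.read_text(encoding="utf-8", errors="replace")
            for number, line in leftover_codes(content):
                print(f"{name}:{number}: {line}")
```

Variant names such as es-en_US are matched as well, since the underscore is not a letter.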

_US no longer works

Moving to three‐letter codes results in the US variant translation not working.

Comparing git master built from “make dist” to apertium-en-es-0.8.0:

$ echo 'Los colores' | apertium es-en   
The colours
$ echo 'Los colores' | apertium es-en_US
The colors
$ echo 'Los colores' | apertium spa-eng   
The colours
$ echo 'Los colores' | apertium spa-eng_US
USAGE: apertium-transfer trules preproc biltrans [input [output]]
       apertium-transfer -b trules preproc [input [output]]
       apertium-transfer -n trules preproc [input [output]]
       apertium-transfer -x extended trules preproc biltrans [input [output]]
       apertium-transfer -c trules preproc biltrans [input [output]]
       apertium-transfer -t trules preproc biltrans [input [output]]
  trules     transfer rules file
  preproc    result of preprocess trules file
  biltrans   bilingual letter transducer file
  input      input file, standard input by default
  output     output file, standard output by default
  -b         input from lexical transfer
  -n         don't use bilingual dictionary
  -x bindix  extended mode with user dictionary
  -c         case-sensitiveness while accessing bilingual dictionary
  -t         trace (show rule numbers and patterns matched)
  -T         trace, for apertium-transfer-tools (also sets -t)
  -z         null-flushing output on '
  -h         shows this message
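
The usage message suggests apertium-transfer was started with arguments it did not accept. One possible cause (an assumption, not confirmed in this report) is that the spa-eng_US mode refers to a file that was never built or installed under the new name. The sketch below lists the files each mode in modes.xml refers to and reports the ones that are absent; the function names are hypothetical, and the set of existing files is passed in by the caller.

```python
import xml.etree.ElementTree as ET


def files_by_mode(modes_xml):
    """Map each mode name to the file names used in its pipeline."""
    root = ET.fromstring(modes_xml)
    return {
        mode.get("name"): [f.get("name") for f in mode.iter("file")]
        for mode in root.iter("mode")
    }


def missing_files(modes_xml, existing):
    """Return {mode: [files not present in `existing`]} for affected modes."""
    report = {}
    for mode, files in files_by_mode(modes_xml).items():
        absent = [name for name in files if name not in existing]
        if absent:
            report[mode] = absent
    return report
```

In practice `existing` could be built from a directory listing of the build or install directory, and `modes_xml` read from the pair's modes.xml.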

eng tagger bugs

  • eng tagger is tagging $some_number as ^$ ^some_number<num>$
    input: AIDS treatments that may cost $18 000 per annum for one person are obviously not affordable by countries whose annual health budget may be less than $5 per capita
    output: ^AIDS<n><acr><sg>$ ^treatment<n><pl>$ ^that<cnjsub>$ ^may<vaux><inf>$ ^cost<vblex><inf>$ ^$ ^18<num>$ ^000<num>$ ^per<pr>$ ^*annum$ ^for<pr>$ ^one<num><sg>$ ^person<n><sg>$ ^be<vbser><pres>$ ^obviously<adv>$ ^not<adv>$ ^affordable<adj>$ ^by<pr>$ ^country<n><pl>$ ^whose<rel><aa><mf><sp>$ ^annual<adj>$ ^health<n><sg>$ ^budget<n><sg>$ ^may<vaux><inf>$ ^be<vbser><inf>$ ^less~than<pr>$ ^$ ^5<num>$ ^per~capita<adj>$ ^.<sent>$
  • It is also tagging IT (as in "IT systems") as a pronoun
    input: system of electronic employment cards and using suitable IT systems to monitor all these policies.
    output: ^system<n><sg>$ ^of<pr>$ ^electronic<adj>$ ^employment<n><sg>$ ^card<n><pl>$ ^and<cnjcoo>$ ^use<vblex><ger>$ ^suitable<adj>$ ^prpers<prn><subj><p3><nt><sg>$ ^system<n><pl>$ ^to<pr>$ ^monitor<vblex><inf>$ ^all<predet><sp>$ ^this<det><dem><pl>$ ^policy<n><pl>$ ^.<sent>$
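
The first problem can be spotted mechanically, because a bare dollar sign ends up as an empty lexical unit, ^$, in the tagger output. Here is a minimal sketch for finding such units in a stream; the function names are hypothetical and the parser is simplified (it only understands ^...$ units and backslash escapes).

```python
import re

# One lexical unit: ^ ... $, where a backslash escapes the next character.
UNIT = re.compile(r"\^((?:\\.|[^$\\])*)\$")


def lexical_units(stream):
    """Return the contents of every ^...$ unit, in order."""
    return [m.group(1) for m in UNIT.finditer(stream)]


def empty_unit_positions(stream):
    """Return the indexes (within the unit list) of empty units."""
    return [i for i, unit in enumerate(lexical_units(stream)) if unit == ""]
```

Running this over tagger output for a test corpus would give a quick count of how often currency amounts are affected.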

Missing translation for "brightness"

First of all, apologies if this is not the proper way to report this kind of issue. I asked on the #apertium IRC channel and was advised to report it here.

I've found an unknown word in the translation from English to Spanish: brightness

I don't know how to contribute the translation pair "brightness --> brillo". Any insight? Thanks!

Superfluous whitespace added: eng-spa

When translating from English to Spanish, extra whitespace is added, including a space before the full stop at the end of some sentences.

Input:

  1. Pilar is 25 years old. She is studying medicine in Tarragona and has lots of friends.
    Pilar is very likable and kind.

  2. She is going out with a young man called Javier. He is 30 years old and works in Barcelona.
    Javier is an engineer. Javier likes going out in the evenings, going to the cinema and meeting friends.

Output:

  1. Pilar tiene 25 años . Está estudiando medicina en Tarragona y tiene muchos amigos.
    Pilar es muy *likable y clase.

  2. Está saliendo con un hombre joven llamó Javier. Tiene 30 años y obras en Barcelona.
    Javier es un ingeniero . A Javier le gusta salir en las tardes, yendo al cine y cumpliendo amigos.
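
For regression testing, stray spaces of this kind are easy to count automatically. The following is a minimal sketch, not a fix for the underlying rule; the function name and the punctuation set are assumptions.

```python
import re

# Whitespace directly before closing punctuation.
SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,;:!?])")


def count_stray_spaces(text):
    """Count places where whitespace precedes a punctuation mark."""
    return len(SPACE_BEFORE_PUNCT.findall(text))
```

Applied to the translated text, this flags "años ." and "ingeniero ." while leaving correctly spaced sentences alone.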

Brackets '[' and ']' translation bug

lt-proc -b spa-eng.autobil.bin
on
^común<adj><mf><sg>$ ^[<lpar>$ ^*COM$ ^(<lpar>$ ^2002<num>$ ^)<rpar>$ ^186<num>$ ^?<sent>$ ^*C5$ ^-<guio>$ ^0331<num>$ ^2002<num>$ ^-<guio>$ ^2002<num>$ ^2175<num>$ ^(<lpar>$ ^*COS$ ^)<rpar>$ ^]<rpar>$ ^.<sent>$
gives me
^común<adj><mf><sg>/common<adj><sg>$ [<lpar>$ ^*COM$ ^(<lpar>$ ^2002<num>$ ^)<rpar>$ ^186<num>$ ^?<sent>$ ^*C5$ ^-<guio>$ ^0331<num>$ ^2002<num>$ ^-<guio>$ ^2002<num>$ ^2175<num>$ ^(<lpar>$ ^*COS$ ^)<rpar>$ ^]^<rpar>/@<rpar>$ ^.<sent>/.<sent>$
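
In the Apertium stream format, unescaped square brackets delimit superblanks, which would explain why everything between [ and ] passes through untranslated here. If the brackets are meant to be ordinary text, they need a backslash in front of them before they reach lt-proc. A minimal sketch of such escaping follows; the function name is hypothetical, and the exact set of reserved characters is an assumption that should be checked against the stream format documentation.

```python
# Characters assumed to be reserved in the stream format (check the docs).
RESERVED = set("[]{}^$/\\<>@")


def escape_surface(text):
    """Backslash-escape reserved characters in a surface form or lemma."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)
```

With this, a literal opening bracket would be written as a backslash followed by [ inside the lexical unit, rather than starting a superblank.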

Build issue (AP_MKINCLUDE)

I am trying to build the data for this language pair using:

autoreconf -fvi
./configure
make
make install

./configure succeeds, but includes the output:

./configure: line 3052: AP_MKINCLUDE: command not found

make then runs for a while but fails with:

make: *** No rule to make target 'modes/eng-spa.mode', needed by 'all-am'.  Stop.

This is on macOS, and I have:

$ apertium -V
Apertium 3.8.1
$ lt-proc --version
lt-proc version 3.6.8

It seems to be missing a macro that I can see in the apertium repo (AP_MKINCLUDE in apertium.m4). Is there some setup needed for the apertium install beyond having the tools in my path?

Thanks

Repo size

This repo is an absurd 662 MiB. There's a bunch of big files that do not need to exist in history, so at some point this repo should be scrubbed.
