own-pt / openwordnet-pt

OpenWordnet-PT: an open access wordnet for Portuguese

Home Page: http://openwordnet-pt.org

License: Other

Shell 100.00%
computational-linguistics wordnet lexical-semantics

openwordnet-pt's Introduction

Open Portuguese WordNet (OWN-PT)

This repository hosts the Portuguese WordNet data in textual format. It is an experimental branch of http://openwordnet-pt.org, linked to (but independent from) the Open English WordNet.

You can also get the data in JSON and RDF format.

See the Wiki for how the data was generated, how it compares to Princeton WordNet, and the syntax of the text files. This data is validated and exported by the mill tool; see its repository for more information about validation, export formats, etc.

openwordnet-pt's People

Contributors

arademaker, fcbr, fredsonerd, gdemelo, odanoburu, rfhaeusler, vcvpaiva

openwordnet-pt's Issues

Hugo report

  • non-lemmatized forms (e.g. in the plural)
  • forms ending in '' (e.g. tempo)
  • many named entities... (countries, states, rivers)
  • república_da

Verbs of Emotion in OpenWN-PT

  1. 01819147-v 'deprive of courage or hope'; V2; add to the Portuguese entry (empty at the moment)
    desencorajar, dissuadir, desanimar
    which are in the Spanish and Catalan versions
  2. 00838043-v ✪ 'make believe with the intent to deceive'; V2;
    PT: simular , aparentar , proceder com hipocrisia , fingir
    remove "proceder com hipocrisia"
  3. 02684924-v 'continue a certain state, condition, or activity'; V2;
    PT: prosseguir , continuar , sustenar
    Remove "sustenar"? I think it doesn't exist in Portuguese?

don't know what to do about "causar aborrecimento"
01821884-v 'cause to be bored'; V2;
PT: fatigar , aborrecer , chatear , entediar , cansar , causar aborrecimento
remove it or not?

Adding verbs without se to OWN-PT

Missing from OpenWN-PT, with Valeria's suggestion for synset:

1. safar 02074377-v
2. revoltar 02583139-v
3. reafirmar 01011923-v
4. queixar 00907147-v
5. pavonear 02141973-v
6. onerar (only have "onerar com") NA
7. malograr NA
8. irradiar 02686952-v
9. intrometer 00780191-v
10. insurgir 02583139-v
11. habituar 00273445-v
12. filiar 02598765-v
13. exaurir 00075021-v
14. escancarar 02718750-v
15. enroscar 01868370-v
16. enamorar 00148597-v
17. empavonar 02141973-v
18. embasbacar 02164531-v
19. desenganar 01798936-v
20. descamar 00009492-v
21. desbocar 00865387-v
22. corresponder 01006810-v
23. contorcer 01868370-v
24. compenetrar 00728954-v
25. assegurar 00890590-v
26. arrepender 01796582-v
27. aproximar 02666060-v
28. apiedar (-se de) 01821996-v
29. aperceber 02106506-v
30. apear 01958452-v
31. apartar 01560984-v
32. amoldar 01560984-v
33. aconchegar 01425348-v
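
A list like this could be turned into edit forms mechanically. The sketch below is only an illustration: it reuses the `(with-synset ... (add "..."))` notation that appears in other issues here, and the helper names (`group_by_synset`, `with_synset_add`) are hypothetical, not part of any project tooling. Only a few pairs from the list are shown.

```python
# Hypothetical helpers: group (verb, synset id) pairs and print with-synset forms.
# The with-synset notation is copied from other issues; whether any tool consumes
# exactly this layout is an assumption.
pairs = [
    ("safar", "02074377-v"),
    ("revoltar", "02583139-v"),
    ("insurgir", "02583139-v"),
    ("onerar", None),  # listed as NA above: no synset suggested yet
]

def group_by_synset(pairs):
    """Collect verbs per synset id, skipping entries without a suggestion."""
    grouped = {}
    for verb, synset_id in pairs:
        if synset_id is None:
            continue
        grouped.setdefault(synset_id, []).append(verb)
    return grouped

def with_synset_add(synset_id, words):
    """Format one with-synset form that adds every word in `words`."""
    adds = "\n".join(f'  (add "{w}")' for w in words)
    return f"(with-synset {synset_id}\n{adds})"

commands = [with_synset_add(sid, verbs)
            for sid, verbs in group_by_synset(pairs).items()]
print("\n\n".join(commands))
```

Grouping first means verbs that share a synset (such as revoltar and insurgir) end up in a single form.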

Typos in Verbs

TYPOs:

  1. 00859153-v   annimar-se, alegrar-se   become cheerful
    annimar-se --> animar-se
  2. 00860292-v   aclamar, aplaudir, applaudir, louvar   express approval of
    applaudir --> remove as 'aplaudir' is there
  3. arquar (may be a usage from Portugal...) --> curvar-se "The road bends"
  4. 00276373-v   disordenar, baguncar   bring disorder to
    baguncar --> bagunçar (missing cedilla)
  5. 00636888-v   basar, fundamentar   use as a basis for; found on
    basar --> basear (Catalan?)
  6. 02668523-v   cometer uma tentado   fail to agree with; be in violation of; as of rules or patterns
    cometer uma tentado --> "ir contra"
  7. 00527232-v   compost   convert to compost
    compost --> "fazer adubo"

  8. de ?
    We need to change the 5 synsets below:

    8a. 02031335-a   de esquerda, de, esquerda, esquerdista   believing in or supporting tenets of the political left
    remove "de, esquerda" as we already have "de esquerda" and "esquerdista"

    8b. 04452615-n   caixa, ferramentas, de   a box or chest or cabinet for holding hand tools
    transform "caixa, ferramentas, de" into "caixa de ferramentas"

    8c. 09866222-n   apostas, de, agenciador   a gambler who accepts and pays off bets (especially on horse races)
    transform "apostas, de, agenciador" into "agenciador de apostas" ???

    8d. 00411570-r   alguma, de, forma   in no manner
    transform "alguma, de, forma" into "de forma alguma"

    8e. 01188144-v   morrendo, de, faminto, esfomear, fome, estar   be hungry; go without food
    transform "morrendo, de, fome" into "morrer de fome", "esfomear", "estar faminto"
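
The five synsets above share a pattern: a multiword expression was split at the spaces, leaving a bare function word such as "de" as its own entry. A minimal sketch of a check for that pattern follows. The `FUNCTION_WORDS` set and the `suspicious_entries` helper are assumptions made for illustration, and the input is the comma-separated word line as displayed in these issues, not the repository's actual file syntax.

```python
# Assumption: a tiny illustrative stop list; a real check would need a fuller one.
FUNCTION_WORDS = {"de", "em", "se"}

def suspicious_entries(word_line):
    """Return entries of a comma-separated word line that are bare function words."""
    words = [w.strip() for w in word_line.split(",")]
    return [w for w in words if w in FUNCTION_WORDS]

# Word lines copied from the issues above.
synsets = {
    "02031335-a": "de esquerda, de, esquerda, esquerdista",
    "04452615-n": "caixa, ferramentas, de",
    "00859153-v": "annimar-se, alegrar-se",
}

flagged = {}
for synset_id, line in synsets.items():
    hits = suspicious_entries(line)
    if hits:
        flagged[synset_id] = hits

print(flagged)
```

A flagged synset still needs manual review, since the check cannot tell which fragments belong together.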

  9. dyskwalifikować
    not found anymore...
  10. 00361797-v   inchar, edemaciar   become bloated or swollen or puff up
    remove "edemaciar"
  11. em ?
    00627824-v   ler, diagonal, superficialmente, em   read superficially
    transform "ler, diagonal, superficialmente, em" into "ler em diagonal", "ler superficialmente"
  12. 00431327-v   espessar-se, engrossar, avolumar   make thick or thicker
    remove "espessar-se"
  13. 01065456-v   explanar, explicar   define
    remove "explanar"
  14. 00700896-v   inicializar, inciar   assign an initial value to a computer program
    transform "inciar" to "iniciar"
  15. 02384686-v   invitar, convidar   invite someone to one's house
    remove "invitar"
  16. 00794640-v   irver, fazer visita   have recourse to or make an appeal or request for help or information to
    This synset is completely wrong. Remove "irver, fazer visita" and add "se dirigir".

  17. 02098827-v   peneirar, joeirar   move as if through a sieve
    Keep as it is, I believe it's from Portugal (joio e trigo)
  18. 01322854-v   jugar, trucidar, massacrar, matar   kill (animals) usually for food consumption
    transform "jugar" into "abater"
  19. lf ? not found now
  20. 02066510-v   luir, fluir, correr   move or progress freely as if in a stream
    remove "luir"
  21. 01492725-v   contundir, machudar   injure the underlying soft tissue or bone of
    machudar --> machucar
  22. 01649999-v   propulsar, montivar, provocar, motivar, estimular   give an incentive for action
    remove "montivar"
  23. 02250340-v   pag antecipadamente, franquear   pay for something before receiving it
    pag antecipadamente --> pagar antecipadamente
  24. 02737569-v   pertenecer, pertencer   be rightly classified in a class or category
    remove "pertenecer"
  25. 00924777-v   prophetizar   foretell by divine inspiration
    prophetizar --> profetizar
  26. 00343334-v   reocorrer, repitir, acontecer de novo   happen or occur again
    repitir --> repetir
  27. 02223630-v   mondar, sachar   remove unwanted elements
    remove "mondar, sachar", add "remover indesejaveis"
  28. 01196037-v   se, abster-se, privar, abster   choose not to consume
    remove "se"
  29. 01319885-v   gadanhar, segar, ceifar   cut with a scythe
    01320009-v   colher, segar, respigar   gather, as of natural products
    Remove "segar" from both? It might be from Portugal.
  30. 01272457-v   alcatroar, cobrir com alcatrão, tar   coat with tar
    remove "tar" from the synset.

Thanks,
Valeria

salad

07806221-n ✪ 'food mixtures either arranged on a plate or tossed and served with a moist dressing';
should only be "salada" in Portuguese, not
salada , Saladas

remove casa from Synset 02913152-n

at the moment we have: casa, edifício, edifícios, prédio
Remove casa and edifícios, leaving only edifício, prédio.

Synset 02726305-n

now: Portuguese casa, apartamento, aposentos
desired: Portuguese apartamento

Synset 03544360-n
now: Portuguese teatro, habitação, casa, Habitações, vivenda, edifícios residenciais, firma
desired: Portuguese casa, habitação, vivenda, edifício residencial

Synset 08559508-n
now: Portuguese casa, Lar
desired: Portuguese casa, lar

Synset 08078020-n
now: Portuguese agregado familiar, pessoa da família que mora na mesma casa, casa, família, classe, linhagem
desired: Portuguese casa, família, lar

Synset 03002816-n
now: Portuguese casa, chalé
desired: Portuguese chalé

Synset 15273626-n

Remove the word 'ano' from Synset 15273626-n; it must be a typo.
Also remove the word 'ano' from synset 15153787-n.

Remove "Presidente, Listas de presidentes" from the synset
10468750-n reitor, Presidente, Listas de presidentes, presidente

Remove "Presidente dos estados unidos, Presidente dos Estados Unidos, Presidente dos estados unidos da américa, Presidente dos Estados Unidos da América"
below, add "presidente dos Estados Unidos da América" to the synset

10467395-n Presidente dos estados unidos, Presidente dos Estados Unidos, Presidente dos estados unidos da américa, Presidente dos Estados Unidos da América, presidente

Remove the plural form from
06556481-n mandatos, mandato a document giving an official instruction or command
Remove the plural and capitalized forms from synset
08256968-n Partido Político, partido, Partido político, partidos políticos, partido político
leaving only partido político, partido.

Add "poder, ofício" to synset
13945102-n cargo (of a government or government official) holding an office means being in power
Remove "Eleições" from synset
00181781-n Eleições, eleição a vote to select the winner of a position or political office.
Remove "países, pais" from
08168978-n países, nação, país, pais, república, união
thanks!
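
Several requests in this issue are about capitalized or plural duplicates of a word already in the synset. The sketch below shows one way such entries might be flagged automatically. Both helpers are hypothetical, and the plural test is deliberately naive: it only strips a final "s" from each token, so it catches "mandatos" but misses forms like "países" or "Eleições".

```python
def capitalized(words):
    """Entries containing at least one uppercase letter."""
    return [w for w in words if w != w.lower()]

def naive_plurals(words):
    """Entries that match another entry once a final 's' is stripped from each token.

    Naive on purpose: Portuguese plurals such as 'países' are not handled.
    """
    present = {w.lower() for w in words}

    def singular(entry):
        tokens = entry.lower().split()
        return " ".join(t[:-1] if t.endswith("s") else t for t in tokens)

    return [w for w in words
            if singular(w) != w.lower() and singular(w) in present]

# Word list copied from synset 08256968-n above.
words = ["Partido Político", "partido", "Partido político",
         "partidos políticos", "partido político"]
print(capitalized(words))
print(naive_plurals(words))
```

Entries flagged this way are candidates for removal only; proper nouns legitimately keep their capitals.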

Inglaterra Synset 08871007-n

Synset 08871007-n reads Portuguese Esportistas da Inglaterra, Inglaterra
remove Esportistas da Inglaterra,
Synset 08885211-n reads Portuguese Inglaterra, Yorkshire
remove Inglaterra,

include

célere -> lenta
celeridade -> lentidão

DHBB checking 1 (nouns)

Remove the word "ano" from Synset 15153787-n
Remove the word "ano" from Synset 15273626-n
Remove the words "Listas de presidentes", "Presidente" from Synset 10468750-n
Remove the words "Presidente dos Estados Unidos, Presidente dos estados unidos, presidente, Presidente dos estados unidos da américa" from Synset 10467395-n.
Add the word "legenda" to synset 08256968-n

DHBB-3

Remove "Membros" from synset 05560244-n
Remove "Penis, Penís, Pénis," from synset 05526384-n
Remove "candidato" from Synset 10001647-n, add "demandante"
Remove "Candidatos" from Synset 09890749-n, add "concorrente"
Remove "movimento" from Synset 00294190-n??? Not sure what to add instead

fix: Arak? remove, add 'desertar' main verb here

Synset 02584097-v
Portuguese Arak, abandonar
substitute by desertar, abandonar

Synset 00496673-v
substitute "abandonar" by "deixar"

Synset 02227741-v
change Portuguese abandonar
for desistir, abandonar

Synset 01083044-v
change to
desistir, abandonar, sair fora

Synset 02316304-v
change to
renunciar, desistir, abandonar

Synset 02229055-v
change Portuguese legar, entregar, dar, testar, abandonar
to deixar, passar, dar, entregar, legar (it has neither testar nor abandonar)

Synset 00613683-v change to
deixar, abandonar

Synset 02223136-v change to
desfazer-se de, abandonar

Synset 00363110-v change to
desistir, parar, abandonar

Synset 02383440-v
change to
deixar, partir, abandonar

Synset 02303331-v change to
renunciar, perder, privar-se

Synset 01579028-n

at the moment it says:
Portuguese Corvus, Corvo
it should be simply
Portuguese corvo

No Capitals, no Latin name.

Synset 02236124-v

Reads at the moment
Portuguese admitir, obter, aceitar
remove admitir, from the set

Synset 01263445-a

we have at the moment
Portuguese animal, brutal, pecuário
remove pecuário, replace with bestial

Synset 01270004-a

sedento
is correct
but add "com sede"?
gloss: com necessidade ou vontade de beber "depois de brincar as criancas estavam com sede"

abduzir and aduzir are different, but were conflated

Synset 01471043-v is about 'abduct' (sequestrar, extrair), in Portuguese abduzir,
while Synset 01015866-v is about 'abduce, adduce, cite', in Portuguese aduzir (alegar, citar). The two have been conflated in 01015866-v in Portuguese, which has
Portuguese alegar, retrair, aduzir, abduzir.
We have to remove abduzir from this synset and create a new one for
01449427-v: abduct, which doesn't exist in Portuguese yet.

to sing or not to sing

01067816-v 'to make melodious sounds'; V1, V2;
01731031-v 'deliver by singing'; V2;

(with-synset 01731031-v
(add "cantar")
)

(with-synset 01067816-v
(add "cantar")
)
don't know if we should add "cantar" also to
01043887-v. 'make a whining, ringing, or whistling sound'; V1;
we say "sair cantando pneu"...

fix

Camara (legislativa)

Need to translate Synset 08318904-n as câmara (with circumflex on the a), i.e. câmara legislativa. Useful for the work on DHBB.

Ator

09765278-n ✪ 'a theatrical performer';
PT:intérprete , Atores de teatro , atriz , actriz , ator , artista , cômico , comediante , actor
remove "Atores de teatro".
Keep atriz and "actriz" (from Portugal?) wordnet has also
09767700-n 'a female actor';
PT: atriz , actriz , ator , actor
so problem with gender is worse than I expected...

DHBB-4

Remove "lua" from synset 15209413-n
Remove "lua" from Synset 15206296-n
Remove the word "generalidades" from Synset 10123844-n
Remove the word "Generais" from 10125786-n
Remove "Ciência política, Ciência Política" from Synset 06148148-n
Remove " empresas" from synset 08056231-n
Remove "dia, tempo (parâmetro)" from Synset 15122231-n
Remove "braço, perna, Mão" from Synset 05564590-n
We have a problem with Synset 01197634-a 'most helpful and reliable' the example seems to be for
"braço direito", ie my right-hand man

Synset 00826509-v

Reads at the moment
Portuguese apreciar, criticar
remove apreciar!! this is very wrong.

generate new RDF file

  1. generate new RDF
  2. rename blank nodes (words and wordsenses)
  3. fix the names of word nodes that have the form wordXXXX instead of word-XXXX, where XXXX is the lexical form of the word with spaces replaced by underscores.
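
The naming rule in step 3 can be sketched as follows. This is only an illustration of the rule as stated in the issue: the function names are hypothetical, and the assumption that wordsense nodes start with "wordsense" (and must be left alone) is not confirmed by the source.

```python
def word_node_name(lexical_form):
    """Build a node name: 'word-' plus the lexical form, spaces replaced by underscores."""
    return "word-" + lexical_form.strip().replace(" ", "_")

def fix_node_name(name):
    """Rewrite 'wordXXXX' as 'word-XXXX'; leave every other name unchanged."""
    # Assumption: wordsense nodes share the 'word' prefix and must not be rewritten.
    if name.startswith("wordsense") or name.startswith("word-"):
        return name
    if name.startswith("word"):
        return "word-" + name[len("word"):]
    return name

print(word_node_name("caixa de ferramentas"))
print(fix_node_name("wordcasa"))
```

Note that a prefix test like this cannot tell a malformed node for a lexical form that itself begins with "sense" from a wordsense node, so a real fix would work from the lexical form rather than from the old name.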

pineapple

(with-synset 07753275-n
(remove "Ananas"))

Synset 07381231-n

should be cocoricocó, and
Synset 07382286-n should be cacarejo (of the rooster)

Synset 01579028-n
should be simply corvo
not:
Corvus, Corvo

DHBB-2

Remove "Partido político, partidos políticos, Partido Político," from synset 08256968-n
Remove "Eleições" from synset 00181781-n
Remove " pais, países, " from synset 08168978-n
Remove "myto" from synset 06372680-n
Remove "Descendência ", but add "descendência" from/to synset 10373998-n
Remove "filho, criança, menino" from Synset 09992837-n

Fruit trees problems

NB: For many fruits we will have this problem.
WN has two senses for "avocado", the fruit and the tree. In Portuguese we have two different lexical items, abacate and abacateiro, and OpenWN-PT now has both items for both senses. So we need to remove one item from each synset:
07764847-n (15) abacateiro, abacate avocado, alligator pear, avocado pear, aguacate a pear-shaped tropical fruit with green or blackish skin and rich yellowish pulp enclosing a single large seed
11706761-n
abacateiro, abacate
avocado, avocado tree, Persea Americana

THAT IS:
(with-synset 07764847-n
(remove "abacateiro"))

(with-synset 11706761-n
(remove "abacate"))

Synset 07644382-n bird

The correct translation in Portuguese is just "ave" in this case. See the explanation:

the flesh of a bird or fowl (wild or domestic) used as food 

We need to remove 'passaro' and add 'ave'.

another one: Synset 01579028-n
Portuguese Corvus, Corvo
should be only "corvo"

and Synset 07381231-n should exist in Portuguese as cocoricocó,
while
Synset 07382286-n should exist in Portuguese as cacarejo (similar to the Spanish)

Synset 13104059-n
Portuguese Árvores, árvore, árvores, Arvore, árvove
remove all the rest, keep only árvore
Synset 13912260-n is missing in Portuguese: árvore (grafo)

Removing conjugated verbs from synsets

  1. 00907147-v queixar-se, queixe-se, lastimar-se, lamentar, reclamar express complaints, discontent, displeasure, or unhappiness
    Remove 'queixe-se', add 'queixar', without 'se'.
  2. 02073714-v fugir, evada-se, escapar, esconder-se run away; usually includes taking something or somebody along
    transform 'evada-se' into 'evadir'
  3. 00059899-v fazer abortar, ter mau êxito, malograr-se, abortar terminate a pregnancy by undergoing an abortion
    remove "ter mau êxito", "malograr-se"

00579977-n too many expressions

00579977-n Serviço Militar , conscrição , serviço militar obrigatório , Serviço militar

I suggest leaving only 'serviço militar', as even this might be wrong, if the definition is to be believed.

abdicar

Remove the verb deserdar from Synset 02379198-v.
It should read:
abdicar, renunciar, abrir mão de, demitir-se, resignar

Synset 00014742-v sleep

We have at the moment
Portuguese dormir, cochilar, tirar uma soneca, pernoitar
remove pernoitar
as it does not correspond to the physical act of going to sleep.

More typos in verbs

  1. 02673965-v   exceler, distinguir-se, ser o melhor, adquirir excelência, ir além dos outros   distinguish oneself
    remove exceler --> exceder?
    remove "ir além dos outros"
  2. 01203074-v   abastar   be able to feed
    remove abastar --> manejar?

Alberto's improvements

04258982-n 'the underside of footwear or a golf club';
(with-synset 04258982-n
(remove "Planta do pé"))

Need a template to add new entries:

  1. we don't have the verb 'solar' as in 'solar o bolo' or 'solar sapatos'. (heel = put a new heel on the shoe)
  2. need a mechanism for when the synset exists and should be there, but it isn't.
    example: have inscricao in
    06405699-n 'letters inscribed (especially words engraved or carved) on something';
    but don't have it in
    00615011-n 'the activity of inscribing (especially carving or engraving) letters or words';
    difference between the activity and the result.
  3. need a mechanism for extending a sense(?? don't know, for discussion)
    example: have adjective desinteressante as
    00691696-a ✪ 'lacking physical depth';
    but don't have it as "dull"
    00393992-a '(of color) very low in saturation';
