commonsense / conceptnet

ConceptNet: a semantic network of common sense knowledge

Home Page: http://csc.media.mit.edu/conceptnet

License: GNU General Public License v2.0

Python 100.00%

conceptnet's People

barraq amumu-dev dheerajrajagopal dallarosa brainsciences kyoungrok0517 pombredanne michwill web5design won21kr vcillusion liyanghua azizur77 errord phlizik ilyych magicbill yangls06 bruce2008github javalover qutian lcmaizld chenhx rockefys liang2012 marviel xuanhan863 hysteam snowman828 psyoblade riseofthetigers wrapperband raymondeng guangminglion lwllovewf2010 colinsongf aaronzhangl bidexbido plexzhang littlefei omari1988 colubragens chenny0808 c4pt000 jaeyun95 waldo2590 waelbou3

conceptnet's Issues

REST term lists aren't normalized

The wiki states that: "Every term in the term list will be normalized according to the language you specify, so for example /list/en/dogs is the same as /list/en/dog."

But that's not the case:
http://conceptnet5.media.mit.edu/data/5.2/assoc/list/en/dog
http://conceptnet5.media.mit.edu/data/5.2/assoc/list/en/dogs

The former has many results; the latter has none.
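The wiki's promise amounts to passing every term in the list through the same language-specific normalizer before lookup, so that "dogs" and "dog" produce the same key. Below is a minimal sketch of that expected behavior. It assumes comma-separated term lists and uses a toy English plural rule (hypothetical, and only a stand-in for ConceptNet's real normalizer):

```python
# Toy sketch of the documented behavior: /list/en/dogs and /list/en/dog should
# normalize to the same terms. The suffix rules are a hypothetical stand-in for
# ConceptNet's real, language-specific normalizer and only handle a few
# regular English plurals.

def normalize_term(term: str) -> str:
    """Lowercase a term and strip a simple English plural suffix."""
    word = term.strip().lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"   # "berries" -> "berry"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]         # "dogs" -> "dog", while "glass" is left alone
    return word


def normalize_term_list(terms: str) -> list[str]:
    """Normalize a comma-separated term list (assumed format), e.g. 'dogs,cats'."""
    return [normalize_term(t) for t in terms.split(",") if t.strip()]
```

With this behavior, a lookup keyed on normalize_term_list(...) would return the same results for both URLs above.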

"get" and "getting" are handled inconsistently

(from Launchpad bug list)

We have a number of passive concepts, like "get fired" and "getting served". They should be treated differently than their active counterparts, e.g., "fire" and "serve". "getting served" normalizes to "get serve", which seems right, but "get fired" normalizes to "fire".

I suspect "get" is a stopword but "getting" isn't, and stopword removal happens before lemmatization. Fixing this bug will require a test that clearly illustrates the desired behavior lest it break again ;)
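Following the suggestion above, the desired behavior can be pinned down in a test before the fix. The sketch below uses tiny hypothetical STOPWORDS and LEMMAS tables (not ConceptNet's real data). It reproduces the suspected ordering bug and specifies a fixed behavior in which the passive "get" survives normalization:

```python
# Toy reproduction of the suspected bug, plus a test for the desired behavior.
# STOPWORDS and LEMMAS are hypothetical miniature tables, not ConceptNet's data.
import unittest

STOPWORDS = {"get", "a", "the"}
LEMMAS = {"getting": "get", "fired": "fire", "served": "serve"}


def lemmatize(word: str) -> str:
    return LEMMAS.get(word, word)


def normalize_buggy(phrase: str) -> str:
    # Stopwords are checked against surface forms first, so "get" is dropped
    # while "getting" survives and is only later lemmatized to "get".
    kept = [w for w in phrase.lower().split() if w not in STOPWORDS]
    return " ".join(lemmatize(w) for w in kept)


def normalize_fixed(phrase: str, keep=frozenset({"get"})) -> str:
    # Lemmatize first so all inflections are treated alike, then drop
    # stopwords, except words in `keep` such as the passive auxiliary "get".
    lemmas = [lemmatize(w) for w in phrase.lower().split()]
    return " ".join(w for w in lemmas if w not in STOPWORDS or w in keep)


class PassiveGetTest(unittest.TestCase):
    def test_buggy_behavior_is_inconsistent(self):
        self.assertEqual(normalize_buggy("get fired"), "fire")
        self.assertEqual(normalize_buggy("getting served"), "get serve")

    def test_passive_get_is_kept(self):
        self.assertEqual(normalize_fixed("get fired"), "get fire")
        self.assertEqual(normalize_fixed("getting served"), "get serve")
        # Passive phrases must stay distinct from their active counterparts.
        self.assertNotEqual(normalize_fixed("get fired"), normalize_fixed("fire"))
```

Saved as a module, the test case runs with python -m unittest. Exempting "get" explicitly through a parameter, rather than silently editing the stopword list, keeps the intent visible in the code.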

Make PostgreSQL dumps work

A user reported on 2009-11-13 that PostgreSQL dumps are still not working.

My last update (from 2009-08-17):

I now have a script in my csc-sql home directory that will make a PostgreSQL dump for ConceptNet 4. The old script is still running, though.

Ken, the weekly script is writing to a directory that only you have write access to, and I don't know where it's running from. Can you update the script sometime, using the list of tables in ~rspeer/dump_db_without_users?
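The core of the dump job is choosing which tables to export. Below is a hypothetical sketch of a script that reads the table list and passes each name to pg_dump's -t/--table option, so tables that are not listed (such as user tables) are left out. It assumes ~rspeer/dump_db_without_users contains one table name per line. The database name and output path are placeholders.

```python
# Hypothetical dump helper: export only the tables named in a list file.
# Assumed list format: one table name per line; blank lines and "#" comments
# are ignored.
import shlex
import subprocess
from pathlib import Path


def read_table_list(path: Path) -> list[str]:
    """Return the non-empty, non-comment lines of the table-list file."""
    tables = []
    for line in path.read_text().splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            tables.append(name)
    return tables


def build_dump_command(dbname: str, tables: list[str], outfile: str) -> list[str]:
    """pg_dump accepts -t repeatedly; -f names the output file."""
    cmd = ["pg_dump", "--no-owner", "-f", outfile]
    for table in tables:
        cmd += ["-t", table]
    cmd.append(dbname)
    return cmd


def run_dump(dbname: str, table_file: str, outfile: str) -> None:
    tables = read_table_list(Path(table_file).expanduser())
    cmd = build_dump_command(dbname, tables, outfile)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Keeping the table list in a file the script reads on every run means the weekly job picks up changes without anyone needing write access to its output directory.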

Portuguese data needs to be updated

We're running on a very old export of the Portuguese knowledge base (and it was even kind of vandalized at the time). We should update from the new database exports that Open Mind Commonsense no Brasil has sent us.

Outdated concept_net_client

Hi,

For a university project, I wanted to use this library to process some text.
For simplicity, I just copied the example rest_client you've already implemented.
(https://github.com/commonsense/conceptnet/blob/master/conceptnet/webapi/rest_client.py)

While using it, I noticed that it is quite outdated. Since I still want to use your library via the web API, I figured that I could refactor it and submit a pull request.

Now my question is: is this desired at all? Otherwise, I can keep the changes to myself.

Have a great day,
Tim
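If the client is refactored, a small standard-library version might look like the sketch below. It assumes the current public API at api.conceptnet.io (the host used in another report on this page), which may not be the API the original rest_client.py targeted. The function names are hypothetical. concept_uri is a simplified version of ConceptNet's term standardization that only lowercases a phrase and joins its words with underscores.

```python
# Minimal web-API client sketch using only the standard library.
# Assumption: the public ConceptNet API at api.conceptnet.io.
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ROOT = "http://api.conceptnet.io"


def concept_uri(text: str, lang: str = "en") -> str:
    """Simplified standardization: 'lead singer' -> '/c/en/lead_singer'."""
    return f"/c/{lang}/" + "_".join(text.lower().split())


def build_url(path: str, **params) -> str:
    """Join the API root, a percent-encoded path, and optional query parameters."""
    url = API_ROOT + quote(path)  # quote() keeps "/" by default
    if params:
        url += "?" + urlencode(params)
    return url


def get_json(path: str, timeout: float = 10.0, **params) -> dict:
    """GET a path from the API and decode the JSON body."""
    with urlopen(build_url(path, **params), timeout=timeout) as resp:
        return json.load(resp)


def lookup(text: str, lang: str = "en") -> dict:
    """Fetch the edges for a concept, e.g. lookup('dog')."""
    return get_json(concept_uri(text, lang))
```

For example, lookup("dog") requests http://api.conceptnet.io/c/en/dog, and get_json("/related/c/en/lead_singer", filter="/c/en") makes a related-terms query restricted to English.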

Unix setup successful but issues with the output

We have been able to run all steps of the ConceptNet build process, and it works.
However, the output data contains only an empty string for "rel.label".
Can someone please help?

e.g.
Output:
"rel": {
    "@id": "/r/RelatedTo",
    "@type": "Relation",
    "label": ""
}

Expected:
"rel": {
    "@id": "/r/RelatedTo",
    "@type": "Relation",
    "label": "RelatedTo"
}

Please let me know whether I missed something in the build process, and how this can be fixed.
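Until the cause in the build is found, one possible stopgap is to fill the label from the relation URI. In the expected output, the label is just the last path segment of "@id". Below is a sketch that assumes edges are available as parsed JSON dictionaries shaped like the example above. It does not address why the build leaves the label empty.

```python
# Post-processing workaround sketch, not a fix for the build itself:
# fill an empty rel.label from the relation URI, e.g. "/r/RelatedTo" -> "RelatedTo".


def fill_relation_label(edge: dict) -> dict:
    """Fill an empty rel.label from rel['@id']; leave non-empty labels alone."""
    rel = edge.get("rel")
    if isinstance(rel, dict) and not rel.get("label"):
        rel["label"] = rel.get("@id", "").rstrip("/").rsplit("/", 1)[-1]
    return edge


def fill_labels(edges):
    """Lazily apply fill_relation_label to an iterable of parsed edges."""
    for edge in edges:
        yield fill_relation_label(edge)
```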

database has unused data

Our database includes many tables that we don't use anymore. We should archive and remove them sometime.

However, we should not remove things that are used only occasionally, like the usertest_* tables.
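To plan the cleanup, it can help to compute the archive candidates mechanically while never proposing a protected table. Below is a sketch that assumes the team supplies the list of tables still in use. The usertest_* pattern comes from the note above; any other table names are hypothetical.

```python
# Sketch: list tables that look unused, excluding protected name patterns.
from fnmatch import fnmatchcase

PROTECTED_PATTERNS = ("usertest_*",)  # used only occasionally, so never archive


def archive_candidates(all_tables, tables_in_use, protected=PROTECTED_PATTERNS):
    """Return sorted table names that are neither in use nor protected."""
    in_use = set(tables_in_use)
    return sorted(
        t for t in all_tables
        if t not in in_use and not any(fnmatchcase(t, p) for p in protected)
    )
```

fnmatchcase is used instead of fnmatch so that matching does not depend on the operating system's case rules.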

Import more Chinese data

We have some Chinese data files, from the Taiwanese pet game, that should be imported into our database.

If RawAssertions were less wedged (issue GH-3), this would be easier.

Inconsistent web browse vs python API access

On the ConceptNet.io browse page, a search for "lead singer" returns a reasonable list of related terms:

Related terms

en front man (n) ➜
en frontman ➜
en frontwoman ➜
en vocalist ➜
en band ➜

However, when accessing it through the Python API like this:

obj = requests.get('http://api.conceptnet.io/related/c/en/lead_singer?filter=/c/en').json()

it seems that it cannot handle the phrase:

[{'@id': '/c/en/lead_poisoning', 'weight': 0.706}, {'@id': '/c/en/lead_to', 'weight': 0.649}, {'@id': '/c/en/lead_acid', 'weight': 0.634}, {'@id': '/c/en/result_in', 'weight': 0.556}, {'@id': '/c/en/nicad', 'weight': 0.527}, {'@id': '/c/en/nickel_cadmium', 'weight': 0.506}, {'@id': '/c/en/alkaline_battery', 'weight': 0.503}, {'@id': '/c/en/lead', 'weight': 0.496}, {'@id': '/c/en/leads', 'weight': 0.491}, {'@id': '/c/en/electrolytic', 'weight': 0.472}, {'@id': '/c/en/batteries', 'weight': 0.462}, {'@id': '/c/en/plumbum', 'weight': 0.441}, {'@id': '/c/en/battery', 'weight': 0.434}, {'@id': '/c/en/lithium_battery', 'weight': 0.432}, {'@id': '/c/en/leadeth', 'weight': 0.429}, {'@id': '/c/en/li_ion', 'weight': 0.428}, {'@id': '/c/en/come_from', 'weight': 0.427}, {'@id': '/c/en/lithium_ion_battery', 'weight': 0.425}, {'@id': '/c/en/rechargeable_battery', 'weight': 0.413}, {'@id': '/c/en/duracell', 'weight': 0.408}, {'@id': '/c/en/electrolyte', 'weight': 0.407}, {'@id': '/c/en/bring_on', 'weight': 0.402}, {'@id': '/c/en/anode', 'weight': 0.397}, {'@id': '/c/en/terne', 'weight': 0.397}, {'@id': '/c/en/be_made', 'weight': 0.394}, {'@id': '/c/en/litharge', 'weight': 0.381}, {'@id': '/c/en/electrolytically', 'weight': 0.379}, {'@id': '/c/en/come_to', 'weight': 0.379}, {'@id': '/c/en/comes_to', 'weight': 0.378}, {'@id': '/c/en/charge_battery', 'weight': 0.377}, {'@id': '/c/en/cadmium', 'weight': 0.375}, {'@id': '/c/en/get_to', 'weight': 0.364}, {'@id': '/c/en/turn_to', 'weight': 0.362}, {'@id': '/c/en/calin', 'weight': 0.361}, {'@id': '/c/en/expected_to', 'weight': 0.361}, {'@id': '/c/en/led', 'weight': 0.359}, {'@id': '/c/en/end_up', 'weight': 0.347}, {'@id': '/c/en/mercury_poisoning', 'weight': 0.344}, {'@id': '/c/en/come_in', 'weight': 0.343}, {'@id': '/c/en/leaded', 'weight': 0.342}, {'@id': '/c/en/recharger', 'weight': 0.341}, {'@id': '/c/en/electroplating', 'weight': 0.34}, {'@id': '/c/en/minamata_disease', 'weight': 0.339}, {'@id': '/c/en/electronic_devices', 'weight': 0.339}, {'@id': '/c/en/overvoltage', 'weight': 0.338}, {'@id': '/c/en/rechargeable', 'weight': 0.338}, {'@id': '/c/en/deal_with', 'weight': 0.337}, {'@id': '/c/en/coming_in', 'weight': 0.335}, {'@id': '/c/en/charger', 'weight': 0.333}, {'@id': '/c/en/fall_in', 'weight': 0.333}]

However, other phrases seem to work ok:

obj = requests.get('http://api.conceptnet.io/related/c/en/kissing_cousin?filter=/c/en').json()

[{'@id': '/c/en/kissing_cousins', 'weight': 1.0}, {'@id': '/c/en/cousins', 'weight': 0.881}, {'@id': '/c/en/cousin', 'weight': 0.787}, {'@id': '/c/en/relative', 'weight': 0.66}, {'@id': '/c/en/kinswoman', 'weight': 0.626}, {'@id': '/c/en/niece', 'weight': 0.603}, {'@id': '/c/en/relatives', 'weight': 0.602}, {'@id': '/c/en/nephew', 'weight': 0.6}, {'@id': '/c/en/uncles', 'weight': 0.599}, {'@id': '/c/en/nephews', 'weight': 0.597}, {'@id': '/c/en/nieces', 'weight': 0.588}, {'@id': '/c/en/aunts', 'weight': 0.581}, {'@id': '/c/en/neice', 'weight': 0.578}, {'@id': '/c/en/siblings', 'weight': 0.568}, {'@id': '/c/en/younger_sibling', 'weight': 0.566}, {'@id': '/c/en/sister', 'weight': 0.546}, {'@id': '/c/en/brother', 'weight': 0.544}, {'@id': '/c/en/paternal_uncle', 'weight': 0.542}, {'@id': '/c/en/sibling', 'weight': 0.541}, {'@id': '/c/en/kinfolk', 'weight': 0.529}, {'@id': '/c/en/maternal_aunt', 'weight': 0.526}, {'@id': '/c/en/sisters', 'weight': 0.525}, {'@id': '/c/en/maternal_uncle', 'weight': 0.518}, {'@id': '/c/en/granduncle', 'weight': 0.515}, {'@id': '/c/en/stepbrother', 'weight': 0.511}, {'@id': '/c/en/kinsman', 'weight': 0.51}, {'@id': '/c/en/aunt', 'weight': 0.51}, {'@id': '/c/en/kid_sister', 'weight': 0.508}, {'@id': '/c/en/sistren', 'weight': 0.506}, {'@id': '/c/en/stepsister', 'weight': 0.5}, {'@id': '/c/en/uncle', 'weight': 0.499}, {'@id': '/c/en/paternal_aunt', 'weight': 0.498}, {'@id': '/c/en/grandniece', 'weight': 0.495}, {'@id': '/c/en/grandnephew', 'weight': 0.494}, {'@id': '/c/en/distantly_related', 'weight': 0.489}, {'@id': '/c/en/brothers', 'weight': 0.488}, {'@id': '/c/en/brethren', 'weight': 0.484}, {'@id': '/c/en/kinsfolk', 'weight': 0.483}, {'@id': '/c/en/younger_brother', 'weight': 0.477}, {'@id': '/c/en/kin', 'weight': 0.47}, {'@id': '/c/en/sista', 'weight': 0.463}, {'@id': '/c/en/kinfolks', 'weight': 0.455}, {'@id': '/c/en/aunty', 'weight': 0.451}, {'@id': '/c/en/elder_brother', 'weight': 0.447}, {'@id': '/c/en/family_reunion', 'weight': 0.447}, {'@id': '/c/en/grandparents', 'weight': 0.445}, {'@id': '/c/en/aunties', 'weight': 0.444}, {'@id': '/c/en/granddaughters', 'weight': 0.442}, {'@id': '/c/en/congeneric', 'weight': 0.441}, {'@id': '/c/en/cognatic', 'weight': 0.434}]

Does anybody know what's happening here?
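One way to make the discrepancy concrete is to check which of the browse page's related terms appear in the API response at all. Below is a diagnostic sketch with two assumptions. First, the /related response keeps its results under a "related" key as a list of {'@id', 'weight'} dictionaries. The printed lists above have that element shape, but the wrapper key has not been verified. Second, it uses a simplified phrase-to-URI conversion (lowercase, underscores).

```python
# Diagnostic sketch: compare browse-page related terms with /related API results.
# Assumption: the JSON response has a "related" list of {"@id", "weight"} dicts.
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def to_uri(term: str, lang: str = "en") -> str:
    """Simplified standardization: 'front man' -> '/c/en/front_man'."""
    return f"/c/{lang}/" + "_".join(term.lower().split())


def fetch_related(term: str, lang: str = "en") -> list[dict]:
    query = urlencode({"filter": f"/c/{lang}"})
    url = f"http://api.conceptnet.io/related{to_uri(term, lang)}?{query}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["related"]


def browse_overlap(browse_terms: list[str], related: list[dict], lang: str = "en") -> dict:
    """Map each browse-page term to its API weight, or None if the API omitted it."""
    weights = {item["@id"]: item["weight"] for item in related}
    return {t: weights.get(to_uri(t, lang)) for t in browse_terms}
```

Applied to the lead_singer response quoted above, browse_overlap(["front man", "frontman", "frontwoman", "vocalist", "band"], related) would map every term to None, because none of those URIs appear in that list.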

ConceptTools

ConceptNet 2.1 had some useful reasoning tools that we never ported over, and people sometimes want them. Here's a possibly relevant old list of desiderata:

“Guess Concept”
“Guess Topic”
Extract new and existing concepts from a given text (ConceptNet 2.1 with MontyLingua does not perform well enough)
“Summarize”
“Analogy” for given concept.
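To illustrate what a "Guess Concept" query could do, below is a toy sketch. Given some known facts about an unknown concept as (relation, object) clues, it ranks candidate concepts by how many of those clues they satisfy. This is a plain feature-overlap heuristic, not the ConceptNet 2.1 algorithm. The assertions used with it are made-up examples.

```python
# Toy "Guess Concept" sketch: rank concepts by overlap with known clues.
# This is a simple feature-overlap heuristic, not ConceptNet 2.1's method.
from collections import Counter


def guess_concept(clues, assertions):
    """Rank subjects by how many distinct (relation, object) clues they satisfy.

    clues: iterable of (relation, object) pairs known about the unknown concept.
    assertions: iterable of (subject, relation, object) triples.
    """
    wanted = set(clues)
    # dict.fromkeys drops repeated triples (keeping order) so duplicates
    # do not inflate a concept's score.
    unique = dict.fromkeys(assertions)
    scores = Counter(subj for subj, rel, obj in unique if (rel, obj) in wanted)
    return scores.most_common()
```

For example, with clues [("IsA", "pet"), ("CapableOf", "bark")], a concept asserted to be a pet that can bark outranks one that only matches the first clue.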

Move launchpad bugs over

https://bugs.launchpad.net/conceptnet still has some bugs that don't seem to be here. If they're still valid, we should move them over.

Multiple equivalent RawAssertions that only differ in creator name

A bug in csc.conceptnet.models allowed equivalent RawAssertions to exist, differing only in their creator. The underlying code has been fixed, but the API has not been updated to use the fix, because the database is now inconsistent with the code.

This affects all languages to some extent, but particularly Chinese, whose entire import was done after this bug was introduced. It also particularly affects people who add assertions through the API.
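Before the database can be made consistent with the fixed code, the duplicates have to be found. Below is a sketch that groups raw assertions by every field except the creator. The RawAssertion dataclass and its fields are hypothetical stand-ins for the real csc.conceptnet.models fields. Keeping the lowest primary key is only one possible policy. The issue leaves open which row to keep and how to merge creator credit or votes.

```python
# Sketch: find RawAssertions that differ only in their creator (and primary key).
# The dataclass fields are hypothetical stand-ins for the real model fields.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RawAssertion:
    pk: int
    language: str
    relation: str
    text1: str
    text2: str
    creator: str


def duplicate_groups(rows):
    """Return lists of assertions that are identical apart from pk and creator."""
    groups = defaultdict(list)
    for row in rows:
        key = (row.language, row.relation, row.text1, row.text2)
        groups[key].append(row)
    return [g for g in groups.values() if len(g) > 1]


def keep_oldest(group):
    """One possible policy: keep the smallest pk; the rest are redundant."""
    ordered = sorted(group, key=lambda r: r.pk)
    return ordered[0], ordered[1:]
```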

What's the detailed meaning of weight?

In the official document, the weight field is explained as "the strength with which this edge expresses this assertion. A typical weight is 1, but weights can be higher or lower. All weights are positive."

However, I still don't know how the weights in the graph are calculated. What metric determines the weight of each edge? Is it based on occurrence frequency, or on other machine learning techniques such as random walks?

Thank you very much!

django.core.exceptions.ImproperlyConfigured: You haven't set the database ENGINE setting yet.

I'm a beginning ConceptNet developer. I followed all the instructions and installed the newest versions of Python, Django, and SQLite, plus ConceptNet 4 with the ConceptNet.db database. I can import conceptnet.models.* in a Python shell. The database file is stored in my home dir (~/.conceptnet/ConceptNet.db; it's also set correctly in ConceptNet's Django config). But when I try to follow the basic example (dog = Concept.get('dog','en')), I receive the following error:

(...)
django.core.exceptions.ImproperlyConfigured: You haven't set the database ENGINE setting yet.
(...)

I've tried everything I could think of, and I can't fix it. It's a very annoying issue. Can anyone suggest how to fix it?
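This error generally means that the settings module Django actually loaded has no database engine configured, so it is worth checking which module DJANGO_SETTINGS_MODULE points to. Below is a sketch of the relevant part of that settings module. It assumes a Django release that reads the DATABASES dictionary (1.2 or later) and uses the database path given above. Whether ConceptNet 4's own config file is picked up by the installed Django version has not been confirmed here.

```python
# settings.py fragment (sketch). Assumes a Django version that reads the
# DATABASES dictionary and the SQLite file path from the report above.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.path.expanduser("~/.conceptnet/ConceptNet.db"),
    }
}
```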