
usc-isi-i2 / cskg

CSKG: The CommonSense Knowledge Graph

Home Page: https://cskg.readthedocs.io/en/latest/

License: Creative Commons Attribution Share Alike 4.0 International

Languages: Jupyter Notebook 95.59%, Python 4.31%, Shell 0.10%
Topics: commonsense-knowledge, embeddings, knowledge-graph

cskg's People

Contributors

bin-go2, dependabot[bot], filievski, shubhamnagarkar, zhizhizhi997


cskg's Issues

Two exceptions in tutorial/Playing with grounding.ipynb notebook: "KeyError: 'token_embedders'" and "ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder.""

Hi, I am not able to parse questions and answers and ground them to ConceptNet/CSKG after the necessary setup/installation. I am using the https://github.com/usc-isi-i2/cskg/blob/master/tutorial/Playing%20with%20grounding.ipynb notebook for this purpose. This notebook uses the 'groundcn' and 'groundcskg' directories, which are not present in the CSKG GitHub repo (https://github.com/usc-isi-i2/cskg/).

2021-08-29 23:53:45,240 - INFO - faiss.loader - Loading faiss with AVX2 support.
2021-08-29 23:53:45,317 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
2021-08-29 23:53:46,040 - INFO - allennlp.common.plugins - Plugin allennlp_models available
2021-08-29 23:53:46,603 - INFO - allennlp.common.file_utils - cache of https://s3-us-west-2.amazonaws.com/allennlp/models/coref-model-2018.02.05.tar.gz is up-to-date
2021-08-29 23:53:46,609 - INFO - allennlp.models.archival - loading archive file https://s3-us-west-2.amazonaws.com/allennlp/models/coref-model-2018.02.05.tar.gz from cache at /root/.allennlp/cache/b37780f4ed0365ac7b155e8b33e2c64c80658a2615a99eabe298a9980914bc92.1ad851a2740b5abf5c5806bc254ecfaa4f4865bcb09ce64d09bfab6db423a686
2021-08-29 23:53:46,610 - INFO - allennlp.models.archival - extracting archive file /root/.allennlp/cache/b37780f4ed0365ac7b155e8b33e2c64c80658a2615a99eabe298a9980914bc92.1ad851a2740b5abf5c5806bc254ecfaa4f4865bcb09ce64d09bfab6db423a686 to temp dir /tmp/tmpywzyqu_q
2021-08-29 23:53:47,333 - INFO - allennlp.common.params - dataset_reader.type = coref
2021-08-29 23:53:47,334 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2021-08-29 23:53:47,334 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2021-08-29 23:53:47,339 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2021-08-29 23:53:47,339 - INFO - allennlp.common.params - dataset_reader.max_span_width = 10
2021-08-29 23:53:47,340 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.type = characters
2021-08-29 23:53:47,341 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.namespace = token_characters
2021-08-29 23:53:47,342 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.character_tokenizer = <allennlp.data.tokenizers.character_tokenizer.CharacterTokenizer object at 0x7fc78540f250>
2021-08-29 23:53:47,342 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.start_tokens = None
2021-08-29 23:53:47,343 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.end_tokens = None
2021-08-29 23:53:47,343 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.min_padding_length = 0
2021-08-29 23:53:47,344 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.token_min_padding_length = 0
/opt/conda/lib/python3.7/site-packages/allennlp/data/token_indexers/token_characters_indexer.py:60: UserWarning: You are using the default value (0) of `min_padding_length`, which can cause some subtle bugs (more info see https://github.com/allenai/allennlp/issues/1954). Strongly recommend to set a value, usually the maximum size of the convolutional layer size when using CnnEncoder.
  UserWarning,
2021-08-29 23:53:47,348 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = single_id
2021-08-29 23:53:47,349 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tokens
2021-08-29 23:53:47,350 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.lowercase_tokens = False
2021-08-29 23:53:47,351 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.start_tokens = None
2021-08-29 23:53:47,353 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.end_tokens = None
2021-08-29 23:53:47,353 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.feature_name = text
2021-08-29 23:53:47,354 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.default_value = THIS IS A REALLY UNLIKELY VALUE THAT HAS TO BE A STRING
2021-08-29 23:53:47,354 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2021-08-29 23:53:47,355 - INFO - allennlp.common.params - dataset_reader.wordpiece_modeling_tokenizer = None
2021-08-29 23:53:47,356 - INFO - allennlp.common.params - dataset_reader.max_sentences = None
2021-08-29 23:53:47,356 - INFO - allennlp.common.params - dataset_reader.remove_singleton_clusters = False
2021-08-29 23:53:47,357 - INFO - allennlp.common.params - dataset_reader.type = coref
2021-08-29 23:53:47,358 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2021-08-29 23:53:47,358 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2021-08-29 23:53:47,359 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2021-08-29 23:53:47,360 - INFO - allennlp.common.params - dataset_reader.max_span_width = 10
2021-08-29 23:53:47,360 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.type = characters
2021-08-29 23:53:47,361 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.namespace = token_characters
2021-08-29 23:53:47,361 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.character_tokenizer = <allennlp.data.tokenizers.character_tokenizer.CharacterTokenizer object at 0x7fc78540f250>
2021-08-29 23:53:47,362 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.start_tokens = None
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.end_tokens = None
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.min_padding_length = 0
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.token_min_padding_length = 0
2021-08-29 23:53:47,364 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = single_id
2021-08-29 23:53:47,366 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tokens
2021-08-29 23:53:47,366 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.lowercase_tokens = False
2021-08-29 23:53:47,367 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.start_tokens = None
2021-08-29 23:53:47,367 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.end_tokens = None
2021-08-29 23:53:47,368 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.feature_name = text
2021-08-29 23:53:47,368 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.default_value = THIS IS A REALLY UNLIKELY VALUE THAT HAS TO BE A STRING
2021-08-29 23:53:47,370 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2021-08-29 23:53:47,370 - INFO - allennlp.common.params - dataset_reader.wordpiece_modeling_tokenizer = None
2021-08-29 23:53:47,371 - INFO - allennlp.common.params - dataset_reader.max_sentences = None
2021-08-29 23:53:47,371 - INFO - allennlp.common.params - dataset_reader.remove_singleton_clusters = False
2021-08-29 23:53:47,372 - INFO - allennlp.common.params - type = from_instances
2021-08-29 23:53:47,373 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmpywzyqu_q/vocabulary.
2021-08-29 23:53:47,420 - INFO - allennlp.common.params - model.type = coref
2021-08-29 23:53:47,422 - INFO - allennlp.common.params - model.regularizer = None
2021-08-29 23:53:47,422 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2021-08-29 23:53:47,423 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpywzyqu_q
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/allennlp/common/params.py in pop(self, key, default, keep_as_dict)
    237             try:
--> 238                 value = self.params.pop(key)
    239             except KeyError:

KeyError: 'token_embedders'

During handling of the above exception, another exception occurred:

ConfigurationError                        Traceback (most recent call last)
<ipython-input-12-0f753a47034e> in <module>
----> 1 parse_trees=graphify.graphify_dataset(sentences)

~/cskg/grounding/graphify/graphify.py in graphify_dataset(sentences, output_file)
    275         spacy_parser = spacy.load(SPACY_MODEL, disable=['parser', 'tagger'])
    276 
--> 277     coref_predictor = Predictor.from_path(COREF_MODEL, cuda_device=CUDA_DEVICE)
    278     srl_predictor = Predictor.from_path(SRL_MODEL, cuda_device=CUDA_DEVICE)
    279 

/opt/conda/lib/python3.7/site-packages/allennlp/predictors/predictor.py in from_path(cls, archive_path, predictor_name, cuda_device, dataset_reader_to_load, frozen, import_plugins, overrides, **kwargs)
    364             plugins.import_plugins()
    365         return Predictor.from_archive(
--> 366             load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
    367             predictor_name,
    368             dataset_reader_to_load=dataset_reader_to_load,

/opt/conda/lib/python3.7/site-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
    225             config.duplicate(), serialization_dir
    226         )
--> 227         model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
    228 
    229         # Load meta.

/opt/conda/lib/python3.7/site-packages/allennlp/models/archival.py in _load_model(config, weights_path, serialization_dir, cuda_device)
    273         weights_file=weights_path,
    274         serialization_dir=serialization_dir,
--> 275         cuda_device=cuda_device,
    276     )
    277 

/opt/conda/lib/python3.7/site-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
    417             # get_model_class method, that recurses whenever it finds a from_archive model type.
    418             model_class = Model
--> 419         return model_class._load(config, serialization_dir, weights_file, cuda_device)
    420 
    421     def extend_embedder_vocab(self, embedding_sources_mapping: Dict[str, str] = None) -> None:

/opt/conda/lib/python3.7/site-packages/allennlp/models/model.py in _load(cls, config, serialization_dir, weights_file, cuda_device)
    316         remove_keys_from_params(model_params)
    317         model = Model.from_params(
--> 318             vocab=vocab, params=model_params, serialization_dir=serialization_dir
    319         )
    320 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    591                     constructor_to_call=constructor_to_call,
    592                     constructor_to_inspect=constructor_to_inspect,
--> 593                     **extras,
    594                 )
    595             else:

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    619                 # This class has a constructor, so create kwargs for it.
    620                 constructor_to_inspect = cast(Callable[..., T], constructor_to_inspect)
--> 621                 kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
    622 
    623             return constructor_to_call(**kwargs)  # type: ignore

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in create_kwargs(constructor, cls, params, **extras)
    198         explicitly_set = param_name in params
    199         constructed_arg = pop_and_construct_arg(
--> 200             cls.__name__, param_name, annotation, param.default, params, **extras
    201         )
    202 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in pop_and_construct_arg(class_name, argument_name, annotation, default, params, **extras)
    305         return None
    306 
--> 307     return construct_arg(class_name, name, popped_params, annotation, default, **extras)
    308 
    309 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in construct_arg(class_name, argument_name, popped_params, annotation, default, **extras)
    339             elif isinstance(popped_params, dict):
    340                 popped_params = Params(popped_params)
--> 341             return annotation.from_params(params=popped_params, **subextras)
    342         elif not optional:
    343             # Not optional and not supplied, that's an error!

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    591                     constructor_to_call=constructor_to_call,
    592                     constructor_to_inspect=constructor_to_inspect,
--> 593                     **extras,
    594                 )
    595             else:

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    619                 # This class has a constructor, so create kwargs for it.
    620                 constructor_to_inspect = cast(Callable[..., T], constructor_to_inspect)
--> 621                 kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
    622 
    623             return constructor_to_call(**kwargs)  # type: ignore

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in create_kwargs(constructor, cls, params, **extras)
    198         explicitly_set = param_name in params
    199         constructed_arg = pop_and_construct_arg(
--> 200             cls.__name__, param_name, annotation, param.default, params, **extras
    201         )
    202 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in pop_and_construct_arg(class_name, argument_name, annotation, default, params, **extras)
    301         return result
    302 
--> 303     popped_params = params.pop(name, default) if default != _NO_DEFAULT else params.pop(name)
    304     if popped_params is None:
    305         return None

/opt/conda/lib/python3.7/site-packages/allennlp/common/params.py in pop(self, key, default, keep_as_dict)
    241                 if self.history:
    242                     msg += f' at location "{self.history}"'
--> 243                 raise ConfigurationError(msg)
    244         else:
    245             value = self.params.pop(key, default)

ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."

Error when getting the relation label

I'm using the Python kgtk library to work with the cskg.tsv file. When I execute the following query:

kgtk(""" query -i cskg.tsv --match '(n1)-[r]->(n2)' --where 'n1.label="bicycle"' --return 'n1.label, n2.label, r.relation' --limit 3 """)

I get these values:

node1;label    node2;label     relation
bicycle        bicycle shop    /r/AtLocation
bicycle        garage          /r/AtLocation
bicycle        lab             /r/AtLocation

but when I change it to:

kgtk(""" query -i cskg.tsv --match '(n1)-[r]->(n2)' --where 'n1.label="bicycle"' --return 'n1.label, n2.label, r.label' --limit 3 """)

I get the following error:

Exception in thread background thread for pid 81394:
Traceback (most recent call last):
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1641, in wrap
    fn(*rgs, **kwargs)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2569, in background_thread
    handle_exit_code(exit_code)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2269, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 869, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1:
sh.ErrorReturnCode_1:

RAN: /bin/bash -c 'kgtk query -i cskg.tsv --match '"'"'(n1)-[r]->(n2)'"'"' --where '"'"'n1.label="bicycle"'"'"' --return '"'"'n1.label, n2.label, r.label'"'"' --limit 3'

STDOUT:

STDERR:

no such column: graph_1_c1.label

Is there any way to get the relation's label? When I run the query without the return parameter, I see a column with the label.
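For context on why `r.relation` works and `r.label` does not: in the CSKG edge file, the relation identifier lives in a column named `relation` and its human-readable name in `relation;label`, rather than in the default `label` column kgtk expects. A minimal sketch below bypasses kgtk and reads the TSV directly with Python's csv module; the sample rows and column names are assumptions based on the CSKG format, not taken from the real file:

```python
import csv
import io

# Tiny stand-in for cskg.tsv (tab-separated KGTK edge file);
# the column names here are assumed from the CSKG format.
sample = (
    "id\tnode1\trelation\tnode2\tnode1;label\tnode2;label\trelation;label\n"
    "e1\t/c/en/bicycle\t/r/AtLocation\t/c/en/garage\tbicycle\tgarage\tat location\n"
    "e2\t/c/en/bicycle\t/r/AtLocation\t/c/en/lab\tbicycle\tlab\tat location\n"
)

rows = [r for r in csv.DictReader(io.StringIO(sample), delimiter="\t")
        if r["node1;label"] == "bicycle"]
for r in rows:
    # 'relation' holds the identifier; 'relation;label' the readable name
    print(r["node1;label"], r["node2;label"], r["relation"], r["relation;label"])
```

The same column names should work inside a kgtk `--return` clause, e.g. `r.relation` as in the first query above.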

Playing with CSKG Grounding Notebook: 'grounding/groundcn' and 'grounding/groundcskg' folders empty causing failed imports

Issues while running Playing with CSKG Grounding Notebook:

  • Numberbatch file and BERT embeddings were downloaded and placed into the directory ../output/embeddings, and gunzip-ed.
  • Needed to install pygraphviz package before running the notebook:
    !apt-get update -y
    !apt-get install -y graphviz libgraphviz-dev graphviz-dev pkg-config
    !python -m pip install pygraphviz
  • The ‘groundcn’ and ‘groundcskg’ folders in the ‘../grounding/’ folder are empty and do not contain ‘graphify’. As a result, “from groundcn.graphify import graphify”, “from groundcn.graphify import link” and “from groundcskg.graphify import link” raise ModuleNotFoundError and the rest of the code does not execute, as shown in the attached screencapture5.pdf.

Analyse CSKG Notebook: broken pipe errors while running zcat commands, kgtk queries and head commands

Issues while running Analyse CSKG Notebook:

  • In the first cell, which sets up environment variables and paths, the files referenced in the code do not exist. I changed those paths to other files available in the shared drive folder:
    kg = "cskg_connected.kgtk.gz" (changed to “cskg.tsv.gz”)
    nkg = "cskg-normalized.kgtk.gz" (changed to “cskg_connected_normalized.tsv.gz”)
  • The zcat, kgtk query and head commands do not give the desired results due to several errors, including broken pipes, as shown in the attached screencapture3.pdf.
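A note on the broken-pipe errors above: with `zcat file | head`, a BrokenPipeError is usually just `head` closing the pipe early after its N lines, not data corruption. If it gets in the way in a notebook, one workaround is to preview the gzipped file from Python instead of a shell pipeline. A minimal sketch, using a throwaway file for the demo (with the real data you would pass e.g. "cskg_connected_normalized.tsv.gz"):

```python
import gzip
import tempfile
from itertools import islice

def head_gz(path, n=5):
    """Return the first n lines of a gzipped text file, with no shell
    pipe involved (so no BrokenPipeError from `zcat ... | head`)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]

# Demo on a throwaway gzipped TSV; the contents are made up.
with tempfile.NamedTemporaryFile(suffix=".tsv.gz", delete=False) as tmp:
    with gzip.open(tmp.name, "wt", encoding="utf-8") as f:
        f.write("node1\trelation\tnode2\n"
                "/c/en/bicycle\t/r/AtLocation\t/c/en/garage\n")
    print(head_gz(tmp.name, 2))
```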

Dependency conflicts during installation/setup

Dependency issues:

allennlp.common.checks.ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."

Hello,

I am still facing this issue: #14

cskg/grounding/groundcn/graphify/graphify.py :
Line 277 coref_predictor = Predictor.from_path(COREF_MODEL, cuda_device=CUDA_DEVICE)
Line 278 srl_predictor = Predictor.from_path(SRL_MODEL, cuda_device=CUDA_DEVICE)

Running the above code gives me the following error:
allennlp.common.checks.ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."

About mw:SameAs relation

I read the CSKG paper (https://doi.org/10.48550/arXiv.2012.11490), in which the authors explain that there are mw:SameAs relationships in the data.
So I downloaded a CSKG file (cskg.tsv.gz) from https://zenodo.org/record/4331372; however, I couldn't find any.
I tried the following commands.

% zgrep mw:SameAs cskg.tsv.gz

and

% kgtk query -i cskg.tsv.gz --match '()-[z{relation:"mw:SameAs"}]->()'

Could you let me know where I can find a dataset that contains that relationship?
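Before grepping a large dump for one specific relation, it can help to enumerate which relation identifiers the file actually contains. A small sketch that reads the gzipped TSV directly; it assumes a header row with a `relation` column (per the CSKG format) and demonstrates on a throwaway file (with the real dump you would pass "cskg.tsv.gz"):

```python
import gzip
import tempfile
from collections import Counter

def relation_counts(path):
    """Count relation identifiers in a gzipped KGTK edge file.
    Assumes a header row containing a 'relation' column (CSKG format)."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        rel = f.readline().rstrip("\n").split("\t").index("relation")
        for line in f:
            counts[line.rstrip("\n").split("\t")[rel]] += 1
    return counts

# Demo on a throwaway file with made-up edges.
with tempfile.NamedTemporaryFile(suffix=".tsv.gz", delete=False) as tmp:
    with gzip.open(tmp.name, "wt", encoding="utf-8") as f:
        f.write("node1\trelation\tnode2\n"
                "a\t/r/IsA\tb\n"
                "c\tmw:SameAs\td\n")
    counts = relation_counts(tmp.name)
    print(counts.get("mw:SameAs", 0))  # → 1
```

If this reports zero mw:SameAs edges in the downloaded file, the relation genuinely isn't in that particular release, rather than the zgrep or kgtk query being wrong.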

Where can I find mappings file?

I tried to reproduce the CSKG creation procedure by running consolidation/create_cskg.sh,
but I can't find the input/mappings/*.csv files in this repository that are needed for the next step.

Could you provide wn_wn_mappings.csv, fn_cn_mappings.csv and wn_wdt_mappings.csv?

'Empty node2 value' error in normalize

In the tutorial notebook 'Analyzing CSKG', I get an error when I run:

!kgtk normalize --verbose -i $PKG -o $TMPKG --columns-to-lower 'relation;dimension' source sentence 'node1;label' 'relation;label' 'node2;label'

The error says:
Empty node2 value when lowering 7 to 2: dimension in input line 1

even though I don't see any empty values in node2.
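Judging from the error message, `kgtk normalize --columns-to-lower` turns each listed extra column into a secondary edge, and "Empty node2 value" refers to a row where the column being lowered (here `relation;dimension`) is empty — not the main node2 column. The toy sketch below illustrates that lowering step under this assumption; it is not kgtk's actual implementation, and it skips empty values where kgtk reports them:

```python
def lower_columns(edge, columns):
    """Toy illustration of KGTK 'lowering': turn extra columns of one
    edge row into secondary edges whose node1 is the original edge's id.
    NOT kgtk's real implementation -- kgtk reports 'Empty node2 value'
    for missing values, whereas this sketch simply skips them."""
    secondary = []
    for col in columns:
        value = edge.get(col, "")
        if value == "":
            continue  # the case kgtk flags as 'Empty node2 value'
        secondary.append({"node1": edge["id"], "label": col, "node2": value})
    return secondary

# Made-up edge row with an empty 'relation;dimension' value.
edge = {"id": "e1", "node1": "/c/en/bicycle", "relation": "/r/AtLocation",
        "node2": "/c/en/garage", "relation;dimension": "", "source": "CN"}
result = lower_columns(edge, ["relation;dimension", "source"])
print(result)  # only 'source' is lowered; the empty column is skipped
```

If this reading is right, the fix is to fill or drop rows with empty values in the listed columns before normalizing.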

How to retrieve CSKG data using the KGTK command-line tool

I'd like to see some examples of retrieving ConceptNet data with the KGTK CLI tool.
When I try to retrieve the data as follows, I get an error:

% kgtk query -i ./cskg_connected.tsv.gz --match '()-[:`/r/IsA`]->()' --limit 3
no such column: graph_1_c1.label
