usc-isi-i2 / cskg
CSKG: The CommonSense Knowledge Graph
Home Page: https://cskg.readthedocs.io/en/latest/
License: Creative Commons Attribution Share Alike 4.0 International
Hi, I am not able to parse questions and answers and ground them to ConceptNet/CSKG after doing the necessary setup/installation. I am using the https://github.com/usc-isi-i2/cskg/blob/master/tutorial/Playing%20with%20grounding.ipynb notebook for this purpose. This notebook uses the 'groundcn' and 'groundcskg' directories, which are not present in the CSKG GitHub repo (https://github.com/usc-isi-i2/cskg/).
2021-08-29 23:53:45,240 - INFO - faiss.loader - Loading faiss with AVX2 support.
2021-08-29 23:53:45,317 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
2021-08-29 23:53:46,040 - INFO - allennlp.common.plugins - Plugin allennlp_models available
2021-08-29 23:53:46,603 - INFO - allennlp.common.file_utils - cache of https://s3-us-west-2.amazonaws.com/allennlp/models/coref-model-2018.02.05.tar.gz is up-to-date
2021-08-29 23:53:46,609 - INFO - allennlp.models.archival - loading archive file https://s3-us-west-2.amazonaws.com/allennlp/models/coref-model-2018.02.05.tar.gz from cache at /root/.allennlp/cache/b37780f4ed0365ac7b155e8b33e2c64c80658a2615a99eabe298a9980914bc92.1ad851a2740b5abf5c5806bc254ecfaa4f4865bcb09ce64d09bfab6db423a686
2021-08-29 23:53:46,610 - INFO - allennlp.models.archival - extracting archive file /root/.allennlp/cache/b37780f4ed0365ac7b155e8b33e2c64c80658a2615a99eabe298a9980914bc92.1ad851a2740b5abf5c5806bc254ecfaa4f4865bcb09ce64d09bfab6db423a686 to temp dir /tmp/tmpywzyqu_q
2021-08-29 23:53:47,333 - INFO - allennlp.common.params - dataset_reader.type = coref
2021-08-29 23:53:47,334 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2021-08-29 23:53:47,334 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2021-08-29 23:53:47,339 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2021-08-29 23:53:47,339 - INFO - allennlp.common.params - dataset_reader.max_span_width = 10
2021-08-29 23:53:47,340 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.type = characters
2021-08-29 23:53:47,341 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.namespace = token_characters
2021-08-29 23:53:47,342 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.character_tokenizer = <allennlp.data.tokenizers.character_tokenizer.CharacterTokenizer object at 0x7fc78540f250>
2021-08-29 23:53:47,342 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.start_tokens = None
2021-08-29 23:53:47,343 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.end_tokens = None
2021-08-29 23:53:47,343 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.min_padding_length = 0
2021-08-29 23:53:47,344 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.token_min_padding_length = 0
/opt/conda/lib/python3.7/site-packages/allennlp/data/token_indexers/token_characters_indexer.py:60: UserWarning: You are using the default value (0) of `min_padding_length`, which can cause some subtle bugs (more info see https://github.com/allenai/allennlp/issues/1954). Strongly recommend to set a value, usually the maximum size of the convolutional layer size when using CnnEncoder.
UserWarning,
2021-08-29 23:53:47,348 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = single_id
2021-08-29 23:53:47,349 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tokens
2021-08-29 23:53:47,350 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.lowercase_tokens = False
2021-08-29 23:53:47,351 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.start_tokens = None
2021-08-29 23:53:47,353 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.end_tokens = None
2021-08-29 23:53:47,353 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.feature_name = text
2021-08-29 23:53:47,354 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.default_value = THIS IS A REALLY UNLIKELY VALUE THAT HAS TO BE A STRING
2021-08-29 23:53:47,354 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2021-08-29 23:53:47,355 - INFO - allennlp.common.params - dataset_reader.wordpiece_modeling_tokenizer = None
2021-08-29 23:53:47,356 - INFO - allennlp.common.params - dataset_reader.max_sentences = None
2021-08-29 23:53:47,356 - INFO - allennlp.common.params - dataset_reader.remove_singleton_clusters = False
2021-08-29 23:53:47,357 - INFO - allennlp.common.params - dataset_reader.type = coref
2021-08-29 23:53:47,358 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2021-08-29 23:53:47,358 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2021-08-29 23:53:47,359 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2021-08-29 23:53:47,360 - INFO - allennlp.common.params - dataset_reader.max_span_width = 10
2021-08-29 23:53:47,360 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.type = characters
2021-08-29 23:53:47,361 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.namespace = token_characters
2021-08-29 23:53:47,361 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.character_tokenizer = <allennlp.data.tokenizers.character_tokenizer.CharacterTokenizer object at 0x7fc78540f250>
2021-08-29 23:53:47,362 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.start_tokens = None
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.end_tokens = None
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.min_padding_length = 0
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.token_min_padding_length = 0
2021-08-29 23:53:47,364 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = single_id
2021-08-29 23:53:47,366 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tokens
2021-08-29 23:53:47,366 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.lowercase_tokens = False
2021-08-29 23:53:47,367 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.start_tokens = None
2021-08-29 23:53:47,367 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.end_tokens = None
2021-08-29 23:53:47,368 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.feature_name = text
2021-08-29 23:53:47,368 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.default_value = THIS IS A REALLY UNLIKELY VALUE THAT HAS TO BE A STRING
2021-08-29 23:53:47,370 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2021-08-29 23:53:47,370 - INFO - allennlp.common.params - dataset_reader.wordpiece_modeling_tokenizer = None
2021-08-29 23:53:47,371 - INFO - allennlp.common.params - dataset_reader.max_sentences = None
2021-08-29 23:53:47,371 - INFO - allennlp.common.params - dataset_reader.remove_singleton_clusters = False
2021-08-29 23:53:47,372 - INFO - allennlp.common.params - type = from_instances
2021-08-29 23:53:47,373 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmpywzyqu_q/vocabulary.
2021-08-29 23:53:47,420 - INFO - allennlp.common.params - model.type = coref
2021-08-29 23:53:47,422 - INFO - allennlp.common.params - model.regularizer = None
2021-08-29 23:53:47,422 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2021-08-29 23:53:47,423 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpywzyqu_q
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/allennlp/common/params.py in pop(self, key, default, keep_as_dict)
237 try:
--> 238 value = self.params.pop(key)
239 except KeyError:
KeyError: 'token_embedders'
During handling of the above exception, another exception occurred:
ConfigurationError Traceback (most recent call last)
<ipython-input-12-0f753a47034e> in <module>
----> 1 parse_trees=graphify.graphify_dataset(sentences)
~/cskg/grounding/graphify/graphify.py in graphify_dataset(sentences, output_file)
275 spacy_parser = spacy.load(SPACY_MODEL, disable=['parser', 'tagger'])
276
--> 277 coref_predictor = Predictor.from_path(COREF_MODEL, cuda_device=CUDA_DEVICE)
278 srl_predictor = Predictor.from_path(SRL_MODEL, cuda_device=CUDA_DEVICE)
279
/opt/conda/lib/python3.7/site-packages/allennlp/predictors/predictor.py in from_path(cls, archive_path, predictor_name, cuda_device, dataset_reader_to_load, frozen, import_plugins, overrides, **kwargs)
364 plugins.import_plugins()
365 return Predictor.from_archive(
--> 366 load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
367 predictor_name,
368 dataset_reader_to_load=dataset_reader_to_load,
/opt/conda/lib/python3.7/site-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
225 config.duplicate(), serialization_dir
226 )
--> 227 model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
228
229 # Load meta.
/opt/conda/lib/python3.7/site-packages/allennlp/models/archival.py in _load_model(config, weights_path, serialization_dir, cuda_device)
273 weights_file=weights_path,
274 serialization_dir=serialization_dir,
--> 275 cuda_device=cuda_device,
276 )
277
/opt/conda/lib/python3.7/site-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
417 # get_model_class method, that recurses whenever it finds a from_archive model type.
418 model_class = Model
--> 419 return model_class._load(config, serialization_dir, weights_file, cuda_device)
420
421 def extend_embedder_vocab(self, embedding_sources_mapping: Dict[str, str] = None) -> None:
/opt/conda/lib/python3.7/site-packages/allennlp/models/model.py in _load(cls, config, serialization_dir, weights_file, cuda_device)
316 remove_keys_from_params(model_params)
317 model = Model.from_params(
--> 318 vocab=vocab, params=model_params, serialization_dir=serialization_dir
319 )
320
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
591 constructor_to_call=constructor_to_call,
592 constructor_to_inspect=constructor_to_inspect,
--> 593 **extras,
594 )
595 else:
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
619 # This class has a constructor, so create kwargs for it.
620 constructor_to_inspect = cast(Callable[..., T], constructor_to_inspect)
--> 621 kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
622
623 return constructor_to_call(**kwargs) # type: ignore
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in create_kwargs(constructor, cls, params, **extras)
198 explicitly_set = param_name in params
199 constructed_arg = pop_and_construct_arg(
--> 200 cls.__name__, param_name, annotation, param.default, params, **extras
201 )
202
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in pop_and_construct_arg(class_name, argument_name, annotation, default, params, **extras)
305 return None
306
--> 307 return construct_arg(class_name, name, popped_params, annotation, default, **extras)
308
309
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in construct_arg(class_name, argument_name, popped_params, annotation, default, **extras)
339 elif isinstance(popped_params, dict):
340 popped_params = Params(popped_params)
--> 341 return annotation.from_params(params=popped_params, **subextras)
342 elif not optional:
343 # Not optional and not supplied, that's an error!
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
591 constructor_to_call=constructor_to_call,
592 constructor_to_inspect=constructor_to_inspect,
--> 593 **extras,
594 )
595 else:
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
619 # This class has a constructor, so create kwargs for it.
620 constructor_to_inspect = cast(Callable[..., T], constructor_to_inspect)
--> 621 kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
622
623 return constructor_to_call(**kwargs) # type: ignore
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in create_kwargs(constructor, cls, params, **extras)
198 explicitly_set = param_name in params
199 constructed_arg = pop_and_construct_arg(
--> 200 cls.__name__, param_name, annotation, param.default, params, **extras
201 )
202
/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in pop_and_construct_arg(class_name, argument_name, annotation, default, params, **extras)
301 return result
302
--> 303 popped_params = params.pop(name, default) if default != _NO_DEFAULT else params.pop(name)
304 if popped_params is None:
305 return None
/opt/conda/lib/python3.7/site-packages/allennlp/common/params.py in pop(self, key, default, keep_as_dict)
241 if self.history:
242 msg += f' at location "{self.history}"'
--> 243 raise ConfigurationError(msg)
244 else:
245 value = self.params.pop(key, default)
ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."
parse_trees
Right now the setup instructions are very incomplete; they need to be updated and tested:
https://github.com/usc-isi-i2/cskg/blob/master/Playing%20with%20embeddings.ipynb
https://github.com/usc-isi-i2/cskg/blob/master/Playing%20with%20grounding.ipynb
I'm using the python kgtk library to work with the cskg.tsv file. When I execute the following query: kgtk(""" query -i cskg.tsv --match '(n1)-[r]->(n2)' --where 'n1.label="bicycle"' --return 'n1.label, n2.label, r.relation' --limit 3 """)
I got the following values:
node1;label | node2;label | relation
---|---|---
bicycle | bicycle shop | /r/AtLocation
bicycle | garage | /r/AtLocation
bicycle | lab | /r/AtLocation
but when I change to
kgtk(""" query -i cskg.tsv --match '(n1)-[r]->(n2)' --where 'n1.label="bicycle"' --return 'n1.label, n2.label, r.label' --limit 3 """)
I got the following error:
Exception in thread background thread for pid 81394:
Traceback (most recent call last):
File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1641, in wrap
fn(*rgs, **kwargs)
File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2569, in background_thread
handle_exit_code(exit_code)
File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2269, in fn
return self.command.handle_command_exit_code(exit_code)
File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 869, in handle_command_exit_code
raise exc
sh.ErrorReturnCode_1:
RAN: /bin/bash -c 'kgtk query -i cskg.tsv --match '"'"'(n1)-[r]->(n2)'"'"' --where '"'"'n1.label="bicycle"'"'"' --return '"'"'n1.label, n2.label, r.label'"'"' --limit 3'
STDOUT:
STDERR:
no such column: graph_1_c1.label
Is there any way to get the relation's label? When I run the query without the return parameter, I see a column with the label.
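For readers hitting the same question, a hedged note: in the CSKG edge file the relation's human-readable name appears to live in the relation;label column (the same column a normalize command elsewhere in these reports lowers), and Kypher can address columns whose names contain special characters by backquoting them in the --return clause. A minimal sketch, assuming that backquote syntax is supported by your KGTK version:
% kgtk query -i cskg.tsv --match '(n1)-[r]->(n2)' --where 'n1.label="bicycle"' --return 'n1.label, n2.label, r.`relation;label`' --limit 3
The r.label shorthand most likely fails here because CSKG names its relation column relation rather than label, which is also what the no such column: graph_1_c1.label message points at.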
Issues while running the "Playing with CSKG Grounding" notebook:
There is no requirements file and there are no installation instructions for this repository.
Issues while running the "Analyse CSKG" notebook:
Dependency issues:
While installing dependencies for CSKG and the "Playing with CSKG Grounding" notebook, I ran into conflicting dependencies. I was trying to install the dependencies of both CSKG (https://github.com/usc-isi-i2/cskg/blob/master/requirements.txt) and grounding (renamed from mowgli-uci) (https://github.com/ucinlp/mowgli-uci/blob/master/requirements.txt). The console output of the install commands showing the dependency conflicts is provided in screencapture1.pdf.
Following the suggestion in the console output, I removed the specific package versions from both requirements.txt files so that pip could automatically pick compatible versions, which worked for most packages. However, it still shows make errors while building wheels for some packages, as shown in screencapture2.pdf.
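A hedged workaround, assuming the two notebooks do not have to share one environment: install the CSKG and grounding requirements into separate conda environments so their pins never need to be reconciled. A rough sketch (environment names and the Python version are illustrative, not taken from the repositories):
# illustrative names; adjust paths to wherever the two repositories are cloned
% conda create -n cskg python=3.7 -y
% conda activate cskg
% pip install -r cskg/requirements.txt
% conda create -n cskg-grounding python=3.7 -y
% conda activate cskg-grounding
% pip install -r mowgli-uci/requirements.txt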
Hello,
I am still facing this issue: #14
cskg/grounding/groundcn/graphify/graphify.py:
Line 277 coref_predictor = Predictor.from_path(COREF_MODEL, cuda_device=CUDA_DEVICE)
Line 278 srl_predictor = Predictor.from_path(SRL_MODEL, cuda_device=CUDA_DEVICE)
Running the above code gives me the following error:
allennlp.common.checks.ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."
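A hedged observation: coref-model-2018.02.05.tar.gz predates the configuration format that newer AllenNLP releases expect (which nests embedders under token_embedders), so this error typically means the old archive is being loaded with a too-new AllenNLP. One possible workaround, assuming the grounding code still targets the older API, is to pin AllenNLP to the version given in the grounding (mowgli-uci) requirements file before launching the notebook, for example:
# the version below is illustrative; check the grounding repository's requirements.txt for the exact pin
% pip install "allennlp==0.9.0"
Switching to a newer pretrained coreference model would be the alternative, but that would also require changes to graphify.py.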
I read the CSKG paper (https://doi.org/10.48550/arXiv.2012.11490), and the authors explain that there are mw:SameAs relationships in the data. So I downloaded a CSKG file (cskg.tsv.gz) from https://zenodo.org/record/4331372; however, I couldn't find them.
I tried the following commands.
% zgrep mw:SameAs cskg.tsv.gz
and
% kgtk query -i cskg.tsv.gz --match '()-[z{relation:"mw:SameAs"}]->()'
Could you let me know where I can find a dataset which contains that relationship?
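In case it helps triage, a hedged way to check which relation identifiers the downloaded file actually contains (assuming your KGTK version supports distinct in the return clause) is to list them, and a case-insensitive count guards against a casing mismatch in the identifier:
% kgtk query -i cskg.tsv.gz --match '()-[r]->()' --return 'distinct r.relation'
% zgrep -ci 'mw:sameas' cskg.tsv.gz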
I tried to reproduce the CSKG creation procedure by running consolidation/create_cskg.sh, but I can't find any input/mappings/*.csv files in this repository for the next step.
Could you provide wn_wn_mappings.csv, fn_cn_mappings.csv, and wn_wdt_mappings.csv?
In the tutorial notebook 'Analyzing CSKG', I got an error when I run:
!kgtk normalize --verbose -i $PKG -o $TMPKG --columns-to-lower 'relation;dimension' source sentence 'node1;label' 'relation;label' 'node2;label'
The error says:
Empty node2 value when lowering 7 to 2: dimension in input line 1
even though I don't see any empty values in node2.
I'd like to know some examples of retrieving ConceptNet data using the KGTK CLI tool.
When trying to retrieve the data as follows, I got an error.
% kgtk query -i ./cskg_connected.tsv.gz --match '()-[:`/r/IsA`]->()' --limit 3
no such column: graph_1_c1.label
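A hedged workaround for the same no such column: graph_1_c1.label failure: the :`/r/IsA` shorthand restricts the edge's label column, which CSKG appears to name relation instead, so filtering on r.relation in a --where clause avoids the missing column. A minimal sketch:
% kgtk query -i ./cskg_connected.tsv.gz --match '()-[r]->()' --where 'r.relation = "/r/IsA"' --limit 3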