
usc-isi-i2 / cskg

CSKG: The CommonSense Knowledge Graph

Home Page: https://cskg.readthedocs.io/en/latest/

License: Creative Commons Attribution Share Alike 4.0 International

Languages: Jupyter Notebook 95.59%, Python 4.31%, Shell 0.10%
Topics: commonsense-knowledge, embeddings, knowledge-graph

cskg's People

Contributors

bin-go2, dependabot[bot], filievski, shubhamnagarkar, zhizhizhi997


cskg's Issues

Two exceptions in tutorial/Playing with grounding.ipynb notebook: "KeyError: 'token_embedders'" and "ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder.""

Hi, I am not able to parse questions and answers and ground them to ConceptNet/CSKG after the necessary setup/installation. I am using the https://github.com/usc-isi-i2/cskg/blob/master/tutorial/Playing%20with%20grounding.ipynb notebook for this purpose. This notebook uses the 'groundcn' and 'groundcskg' directories, which are not present in the CSKG GitHub repo (https://github.com/usc-isi-i2/cskg/).

2021-08-29 23:53:45,240 - INFO - faiss.loader - Loading faiss with AVX2 support.
2021-08-29 23:53:45,317 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
2021-08-29 23:53:46,040 - INFO - allennlp.common.plugins - Plugin allennlp_models available
2021-08-29 23:53:46,603 - INFO - allennlp.common.file_utils - cache of https://s3-us-west-2.amazonaws.com/allennlp/models/coref-model-2018.02.05.tar.gz is up-to-date
2021-08-29 23:53:46,609 - INFO - allennlp.models.archival - loading archive file https://s3-us-west-2.amazonaws.com/allennlp/models/coref-model-2018.02.05.tar.gz from cache at /root/.allennlp/cache/b37780f4ed0365ac7b155e8b33e2c64c80658a2615a99eabe298a9980914bc92.1ad851a2740b5abf5c5806bc254ecfaa4f4865bcb09ce64d09bfab6db423a686
2021-08-29 23:53:46,610 - INFO - allennlp.models.archival - extracting archive file /root/.allennlp/cache/b37780f4ed0365ac7b155e8b33e2c64c80658a2615a99eabe298a9980914bc92.1ad851a2740b5abf5c5806bc254ecfaa4f4865bcb09ce64d09bfab6db423a686 to temp dir /tmp/tmpywzyqu_q
2021-08-29 23:53:47,333 - INFO - allennlp.common.params - dataset_reader.type = coref
2021-08-29 23:53:47,334 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2021-08-29 23:53:47,334 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2021-08-29 23:53:47,339 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2021-08-29 23:53:47,339 - INFO - allennlp.common.params - dataset_reader.max_span_width = 10
2021-08-29 23:53:47,340 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.type = characters
2021-08-29 23:53:47,341 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.namespace = token_characters
2021-08-29 23:53:47,342 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.character_tokenizer = <allennlp.data.tokenizers.character_tokenizer.CharacterTokenizer object at 0x7fc78540f250>
2021-08-29 23:53:47,342 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.start_tokens = None
2021-08-29 23:53:47,343 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.end_tokens = None
2021-08-29 23:53:47,343 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.min_padding_length = 0
2021-08-29 23:53:47,344 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.token_min_padding_length = 0
/opt/conda/lib/python3.7/site-packages/allennlp/data/token_indexers/token_characters_indexer.py:60: UserWarning: You are using the default value (0) of `min_padding_length`, which can cause some subtle bugs (more info see https://github.com/allenai/allennlp/issues/1954). Strongly recommend to set a value, usually the maximum size of the convolutional layer size when using CnnEncoder.
  UserWarning,
2021-08-29 23:53:47,348 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = single_id
2021-08-29 23:53:47,349 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tokens
2021-08-29 23:53:47,350 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.lowercase_tokens = False
2021-08-29 23:53:47,351 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.start_tokens = None
2021-08-29 23:53:47,353 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.end_tokens = None
2021-08-29 23:53:47,353 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.feature_name = text
2021-08-29 23:53:47,354 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.default_value = THIS IS A REALLY UNLIKELY VALUE THAT HAS TO BE A STRING
2021-08-29 23:53:47,354 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2021-08-29 23:53:47,355 - INFO - allennlp.common.params - dataset_reader.wordpiece_modeling_tokenizer = None
2021-08-29 23:53:47,356 - INFO - allennlp.common.params - dataset_reader.max_sentences = None
2021-08-29 23:53:47,356 - INFO - allennlp.common.params - dataset_reader.remove_singleton_clusters = False
2021-08-29 23:53:47,357 - INFO - allennlp.common.params - dataset_reader.type = coref
2021-08-29 23:53:47,358 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2021-08-29 23:53:47,358 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2021-08-29 23:53:47,359 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2021-08-29 23:53:47,360 - INFO - allennlp.common.params - dataset_reader.max_span_width = 10
2021-08-29 23:53:47,360 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.type = characters
2021-08-29 23:53:47,361 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.namespace = token_characters
2021-08-29 23:53:47,361 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.character_tokenizer = <allennlp.data.tokenizers.character_tokenizer.CharacterTokenizer object at 0x7fc78540f250>
2021-08-29 23:53:47,362 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.start_tokens = None
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.end_tokens = None
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.min_padding_length = 0
2021-08-29 23:53:47,363 - INFO - allennlp.common.params - dataset_reader.token_indexers.token_characters.token_min_padding_length = 0
2021-08-29 23:53:47,364 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = single_id
2021-08-29 23:53:47,366 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tokens
2021-08-29 23:53:47,366 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.lowercase_tokens = False
2021-08-29 23:53:47,367 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.start_tokens = None
2021-08-29 23:53:47,367 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.end_tokens = None
2021-08-29 23:53:47,368 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.feature_name = text
2021-08-29 23:53:47,368 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.default_value = THIS IS A REALLY UNLIKELY VALUE THAT HAS TO BE A STRING
2021-08-29 23:53:47,370 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2021-08-29 23:53:47,370 - INFO - allennlp.common.params - dataset_reader.wordpiece_modeling_tokenizer = None
2021-08-29 23:53:47,371 - INFO - allennlp.common.params - dataset_reader.max_sentences = None
2021-08-29 23:53:47,371 - INFO - allennlp.common.params - dataset_reader.remove_singleton_clusters = False
2021-08-29 23:53:47,372 - INFO - allennlp.common.params - type = from_instances
2021-08-29 23:53:47,373 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmpywzyqu_q/vocabulary.
2021-08-29 23:53:47,420 - INFO - allennlp.common.params - model.type = coref
2021-08-29 23:53:47,422 - INFO - allennlp.common.params - model.regularizer = None
2021-08-29 23:53:47,422 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2021-08-29 23:53:47,423 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpywzyqu_q
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/allennlp/common/params.py in pop(self, key, default, keep_as_dict)
    237             try:
--> 238                 value = self.params.pop(key)
    239             except KeyError:

KeyError: 'token_embedders'

During handling of the above exception, another exception occurred:

ConfigurationError                        Traceback (most recent call last)
<ipython-input-12-0f753a47034e> in <module>
----> 1 parse_trees=graphify.graphify_dataset(sentences)

~/cskg/grounding/graphify/graphify.py in graphify_dataset(sentences, output_file)
    275         spacy_parser = spacy.load(SPACY_MODEL, disable=['parser', 'tagger'])
    276 
--> 277     coref_predictor = Predictor.from_path(COREF_MODEL, cuda_device=CUDA_DEVICE)
    278     srl_predictor = Predictor.from_path(SRL_MODEL, cuda_device=CUDA_DEVICE)
    279 

/opt/conda/lib/python3.7/site-packages/allennlp/predictors/predictor.py in from_path(cls, archive_path, predictor_name, cuda_device, dataset_reader_to_load, frozen, import_plugins, overrides, **kwargs)
    364             plugins.import_plugins()
    365         return Predictor.from_archive(
--> 366             load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
    367             predictor_name,
    368             dataset_reader_to_load=dataset_reader_to_load,

/opt/conda/lib/python3.7/site-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
    225             config.duplicate(), serialization_dir
    226         )
--> 227         model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
    228 
    229         # Load meta.

/opt/conda/lib/python3.7/site-packages/allennlp/models/archival.py in _load_model(config, weights_path, serialization_dir, cuda_device)
    273         weights_file=weights_path,
    274         serialization_dir=serialization_dir,
--> 275         cuda_device=cuda_device,
    276     )
    277 

/opt/conda/lib/python3.7/site-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
    417             # get_model_class method, that recurses whenever it finds a from_archive model type.
    418             model_class = Model
--> 419         return model_class._load(config, serialization_dir, weights_file, cuda_device)
    420 
    421     def extend_embedder_vocab(self, embedding_sources_mapping: Dict[str, str] = None) -> None:

/opt/conda/lib/python3.7/site-packages/allennlp/models/model.py in _load(cls, config, serialization_dir, weights_file, cuda_device)
    316         remove_keys_from_params(model_params)
    317         model = Model.from_params(
--> 318             vocab=vocab, params=model_params, serialization_dir=serialization_dir
    319         )
    320 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    591                     constructor_to_call=constructor_to_call,
    592                     constructor_to_inspect=constructor_to_inspect,
--> 593                     **extras,
    594                 )
    595             else:

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    619                 # This class has a constructor, so create kwargs for it.
    620                 constructor_to_inspect = cast(Callable[..., T], constructor_to_inspect)
--> 621                 kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
    622 
    623             return constructor_to_call(**kwargs)  # type: ignore

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in create_kwargs(constructor, cls, params, **extras)
    198         explicitly_set = param_name in params
    199         constructed_arg = pop_and_construct_arg(
--> 200             cls.__name__, param_name, annotation, param.default, params, **extras
    201         )
    202 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in pop_and_construct_arg(class_name, argument_name, annotation, default, params, **extras)
    305         return None
    306 
--> 307     return construct_arg(class_name, name, popped_params, annotation, default, **extras)
    308 
    309 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in construct_arg(class_name, argument_name, popped_params, annotation, default, **extras)
    339             elif isinstance(popped_params, dict):
    340                 popped_params = Params(popped_params)
--> 341             return annotation.from_params(params=popped_params, **subextras)
    342         elif not optional:
    343             # Not optional and not supplied, that's an error!

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    591                     constructor_to_call=constructor_to_call,
    592                     constructor_to_inspect=constructor_to_inspect,
--> 593                     **extras,
    594                 )
    595             else:

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in from_params(cls, params, constructor_to_call, constructor_to_inspect, **extras)
    619                 # This class has a constructor, so create kwargs for it.
    620                 constructor_to_inspect = cast(Callable[..., T], constructor_to_inspect)
--> 621                 kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
    622 
    623             return constructor_to_call(**kwargs)  # type: ignore

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in create_kwargs(constructor, cls, params, **extras)
    198         explicitly_set = param_name in params
    199         constructed_arg = pop_and_construct_arg(
--> 200             cls.__name__, param_name, annotation, param.default, params, **extras
    201         )
    202 

/opt/conda/lib/python3.7/site-packages/allennlp/common/from_params.py in pop_and_construct_arg(class_name, argument_name, annotation, default, params, **extras)
    301         return result
    302 
--> 303     popped_params = params.pop(name, default) if default != _NO_DEFAULT else params.pop(name)
    304     if popped_params is None:
    305         return None

/opt/conda/lib/python3.7/site-packages/allennlp/common/params.py in pop(self, key, default, keep_as_dict)
    241                 if self.history:
    242                     msg += f' at location "{self.history}"'
--> 243                 raise ConfigurationError(msg)
    244         else:
    245             value = self.params.pop(key, default)

ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."

Error when getting the relation label

I'm using the Python kgtk library to work with the cskg.tsv file. When I execute the following query:

kgtk(""" query -i cskg.tsv --match '(n1)-[r]->(n2)' --where 'n1.label="bicycle"' --return 'n1.label, n2.label, r.relation' --limit 3 """)

I get these values:

node1;label    node2;label     relation
bicycle        bicycle shop    /r/AtLocation
bicycle        garage          /r/AtLocation
bicycle        lab             /r/AtLocation

but when I change it to:

kgtk(""" query -i cskg.tsv --match '(n1)-[r]->(n2)' --where 'n1.label="bicycle"' --return 'n1.label, n2.label, r.label' --limit 3 """)

I get the following error:

Exception in thread background thread for pid 81394:
Traceback (most recent call last):
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1641, in wrap
    fn(*rgs, **kwargs)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2569, in background_thread
    handle_exit_code(exit_code)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2269, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/Users/u/opt/anaconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 869, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1:
sh.ErrorReturnCode_1:

RAN: /bin/bash -c 'kgtk query -i cskg.tsv --match '"'"'(n1)-[r]->(n2)'"'"' --where '"'"'n1.label="bicycle"'"'"' --return '"'"'n1.label, n2.label, r.label'"'"' --limit 3'

STDOUT:

STDERR:

no such column: graph_1_c1.label

Is there any way to get the relation's label? When I run the query without the return parameter, I see a column with the label.
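For context on why `r.relation` works and `r.label` does not: in the CSKG edge file, the relation identifier lives in a column named `relation` and its human-readable name in `relation;label`, rather than in the default `label` column kgtk expects. A minimal sketch below bypasses kgtk and reads the TSV directly with Python's csv module; the sample rows and column names are assumptions based on the CSKG format, not taken from the real file:

```python
import csv
import io

# Tiny stand-in for cskg.tsv (tab-separated KGTK edge file);
# the column names here are assumed from the CSKG format.
sample = (
    "id\tnode1\trelation\tnode2\tnode1;label\tnode2;label\trelation;label\n"
    "e1\t/c/en/bicycle\t/r/AtLocation\t/c/en/garage\tbicycle\tgarage\tat location\n"
    "e2\t/c/en/bicycle\t/r/AtLocation\t/c/en/lab\tbicycle\tlab\tat location\n"
)

rows = [r for r in csv.DictReader(io.StringIO(sample), delimiter="\t")
        if r["node1;label"] == "bicycle"]
for r in rows:
    # 'relation' holds the identifier; 'relation;label' the readable name
    print(r["node1;label"], r["node2;label"], r["relation"], r["relation;label"])
```

The same column names should work inside a kgtk `--return` clause, e.g. `r.relation` as in the first query above.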

Playing with CSKG Grounding Notebook: 'grounding/groundcn' and 'grounding/groundcskg' folders empty causing failed imports

Issues while running Playing with CSKG Grounding Notebook:

  • Numberbatch file and BERT embeddings were downloaded and placed into the directory ../output/embeddings, and gunzip-ed.
  • Needed to install pygraphviz package before running the notebook:
    !apt-get update -y
    !apt-get install -y graphviz libgraphviz-dev graphviz-dev pkg-config
    !python -m pip install pygraphviz
  • The ‘groundcn’ and ‘groundcskg’ folders in the ‘../grounding/’ folder are empty and do not contain ‘graphify’. As a result, “from groundcn.graphify import graphify”, “from groundcn.graphify import link” and “from groundcskg.graphify import link” raise ModuleNotFoundError and the rest of the code does not execute, as shown in the attached screencapture5.pdf.

Analyse CSKG Notebook: broken pipe errors while running zcat commands, kgtk queries and head commands

Issues while running Analyse CSKG Notebook:

  • In the first cell, which sets up environment variables and paths, the files referenced in the code do not exist. I changed those paths to other files available in the shared drive folder:
    kg = "cskg_connected.kgtk.gz" (changed to “cskg.tsv.gz”)
    nkg = "cskg-normalized.kgtk.gz" (changed to “cskg_connected_normalized.tsv.gz”)
  • The zcat, kgtk query and head commands do not give the desired results due to several errors, including broken pipes, as shown in the attached screencapture3.pdf.
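A note on the broken-pipe errors above: with `zcat file | head`, a BrokenPipeError is usually just `head` closing the pipe early after its N lines, not data corruption. If it gets in the way in a notebook, one workaround is to preview the gzipped file from Python instead of a shell pipeline. A minimal sketch, using a throwaway file for the demo (with the real data you would pass e.g. "cskg_connected_normalized.tsv.gz"):

```python
import gzip
import tempfile
from itertools import islice

def head_gz(path, n=5):
    """Return the first n lines of a gzipped text file, with no shell
    pipe involved (so no BrokenPipeError from `zcat ... | head`)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]

# Demo on a throwaway gzipped TSV; the contents are made up.
with tempfile.NamedTemporaryFile(suffix=".tsv.gz", delete=False) as tmp:
    with gzip.open(tmp.name, "wt", encoding="utf-8") as f:
        f.write("node1\trelation\tnode2\n"
                "/c/en/bicycle\t/r/AtLocation\t/c/en/garage\n")
    print(head_gz(tmp.name, 2))
```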

Dependency conflicts during installation/setup

Dependency issues:

allennlp.common.checks.ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."

Hello,

I am still facing this issue: #14

cskg/grounding/groundcn/graphify/graphify.py :
Line 277 coref_predictor = Predictor.from_path(COREF_MODEL, cuda_device=CUDA_DEVICE)
Line 278 srl_predictor = Predictor.from_path(SRL_MODEL, cuda_device=CUDA_DEVICE)

Running the above code gives me the following error:
allennlp.common.checks.ConfigurationError: key "token_embedders" is required at location "model.text_field_embedder."

About mw:SameAs relation

I read the CSKG paper (https://doi.org/10.48550/arXiv.2012.11490), in which the authors explain that there are mw:SameAs relationships in the data.
So I downloaded a CSKG file (cskg.tsv.gz) from https://zenodo.org/record/4331372; however, I couldn't find any.
I tried the following commands.

% zgrep mw:SameAs cskg.tsv.gz

and

% kgtk query -i cskg.tsv.gz --match '()-[z{relation:"mw:SameAs"}]->()'

Could you let me know where I can find a dataset that contains that relationship?
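Before grepping a large dump for one specific relation, it can help to enumerate which relation identifiers the file actually contains. A small sketch that reads the gzipped TSV directly; it assumes a header row with a `relation` column (per the CSKG format) and demonstrates on a throwaway file (with the real dump you would pass "cskg.tsv.gz"):

```python
import gzip
import tempfile
from collections import Counter

def relation_counts(path):
    """Count relation identifiers in a gzipped KGTK edge file.
    Assumes a header row containing a 'relation' column (CSKG format)."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        rel = f.readline().rstrip("\n").split("\t").index("relation")
        for line in f:
            counts[line.rstrip("\n").split("\t")[rel]] += 1
    return counts

# Demo on a throwaway file with made-up edges.
with tempfile.NamedTemporaryFile(suffix=".tsv.gz", delete=False) as tmp:
    with gzip.open(tmp.name, "wt", encoding="utf-8") as f:
        f.write("node1\trelation\tnode2\n"
                "a\t/r/IsA\tb\n"
                "c\tmw:SameAs\td\n")
    counts = relation_counts(tmp.name)
    print(counts.get("mw:SameAs", 0))  # → 1
```

If this reports zero mw:SameAs edges in the downloaded file, the relation genuinely isn't in that particular release, rather than the zgrep or kgtk query being wrong.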

Where can I find mappings file?

I tried to reproduce the CSKG creation procedure by running consolidation/create_cskg.sh,
but I can't find the input/mappings/*.csv files in this repository that are needed for the next step.

Could you provide wn_wn_mappings.csv, fn_cn_mappings.csv and wn_wdt_mappings.csv?

'Empty node2 value' error in normalize

In the tutorial notebook 'Analyzing CSKG', I get an error when I run:

!kgtk normalize --verbose -i $PKG -o $TMPKG --columns-to-lower 'relation;dimension' source sentence 'node1;label' 'relation;label' 'node2;label'

The error says:
Empty node2 value when lowering 7 to 2: dimension in input line 1

even though I don't see any empty values in node2.
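Judging from the error message, `kgtk normalize --columns-to-lower` turns each listed extra column into a secondary edge, and "Empty node2 value" refers to a row where the column being lowered (here `relation;dimension`) is empty — not the main node2 column. The toy sketch below illustrates that lowering step under this assumption; it is not kgtk's actual implementation, and it skips empty values where kgtk reports them:

```python
def lower_columns(edge, columns):
    """Toy illustration of KGTK 'lowering': turn extra columns of one
    edge row into secondary edges whose node1 is the original edge's id.
    NOT kgtk's real implementation -- kgtk reports 'Empty node2 value'
    for missing values, whereas this sketch simply skips them."""
    secondary = []
    for col in columns:
        value = edge.get(col, "")
        if value == "":
            continue  # the case kgtk flags as 'Empty node2 value'
        secondary.append({"node1": edge["id"], "label": col, "node2": value})
    return secondary

# Made-up edge row with an empty 'relation;dimension' value.
edge = {"id": "e1", "node1": "/c/en/bicycle", "relation": "/r/AtLocation",
        "node2": "/c/en/garage", "relation;dimension": "", "source": "CN"}
result = lower_columns(edge, ["relation;dimension", "source"])
print(result)  # only 'source' is lowered; the empty column is skipped
```

If this reading is right, the fix is to fill or drop rows with empty values in the listed columns before normalizing.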

How to retrieve CSKG data using the KGTK command-line tool

I'd like to see some examples of retrieving ConceptNet data with the KGTK CLI tool.
When I try to retrieve the data as follows, I get an error:

% kgtk query -i ./cskg_connected.tsv.gz --match '()-[:`/r/IsA`]->()' --limit 3
no such column: graph_1_c1.label
