webke's Introduction

WebKE: Knowledge Triple Extraction from Semi-structured Web with Pre-trained Markup Language Models

This repository contains code and data for the paper: WebKE: Knowledge Triple Extraction from Semi-structured Web with Pre-trained Markup Language Models. Chenhao Xie, Wenhao Huang, Jiaqing Liang, Chengsong Huang and Yanghua Xiao. CIKM. 2021.

Folders

pretrained_model/ contains the pretrained model HTMLBERT.
webke/ contains all code and data.

Citation

@inproceedings{xie2021webke,
    title={WebKE: Knowledge Triple Extraction from Semi-structured Web with Pre-trained Markup Language Models.},
    author={Xie, Chenhao and Huang, Wenhao and Liang, Jiaqing and Huang, Chengsong and Xiao, Yanghua},
    booktitle={Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM)},
    year={2021}
}

webke's People

Contributors

redreamality

webke's Issues

Unable to reproduce extraction results

Hi!

I'm interested in your approach, and I'd like to run your model and reproduce your results so I can use it in my own research. However, I am unable to get the system up and running.

I've tried to run the model in a Python 3.7 Docker container, to ensure a correct environment and reproducibility of the error.

docker run -it --volume=${PWD}:/app --name webke python:3.7.9 /bin/bash

I've downloaded the weights, models and dataset to the correct locations, and then tried executing the following commands in the Docker container:

# python --version
Python 3.7.13
# pip --version
pip 22.0.4 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
# pip install bert4keras==0.10.0 tensorflow==2.2.0 beautifulsoup4 tqdm
...

# cd app/webke
# mkdir results
# python html_extract_with_pos.py
2022-04-07 15:07:44.716868: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-04-07 15:07:44.716896: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2022-04-07 15:07:44.716918: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (8d3b7a0a3317): /proc/driver/nvidia/version does not exist
2022-04-07 15:07:44.717074: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-07 15:07:44.738792: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2599990000 Hz
2022-04-07 15:07:44.739629: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f8f78000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-04-07 15:07:44.739650: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
weight/tiny_object_nbaplayer_with_pos.weights
load data: 19941it [00:01, 14690.62it/s]
0it [00:00, ?it/s]354
59
WARNING:tensorflow:5 out of the last 7 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f8edee53830> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
...
WARNING:tensorflow:6 out of the last 11 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f8eb9a23b90> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Traceback (most recent call last):
  File "html_extract_with_pos.py", line 180, in <module>
    evaluate(data)
  File "html_extract_with_pos.py", line 145, in evaluate
    R = set([pred for pred in html_extract(d)])
  File "html_extract_with_pos.py", line 124, in html_extract
    preds_list = extract_preds(string, pos)
  File "/app/webke/predicate_extraction_with_pos.py", line 221, in extract_preds
    pred_preds = pred_model.predict([token_ids, segment_ids, x0, y0, x1, y1])
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 88, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1268, in predict
    tmp_batch_outputs = predict_function(iterator)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 618, in _call
    results = self._stateful_fn(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  indices[0,512] = 512 is not in [0, 512)
	 [[node model_3/Embedding-Position/Gather (defined at /app/webke/layers.py:625) ]] [Op:__inference_predict_function_63294]

Errors may have originated from an input operation.
Input Source operations connected to node model_3/Embedding-Position/Gather:
 model_3/Embedding-Position/ReadVariableOp/resource (defined at /app/webke/layers.py:618)	
 model_3/Embedding-Position/strided_slice_2 (defined at /app/webke/layers.py:597)

Function call stack:
predict_function

f1: 0.82353, precision: 1.00000, recall: 0.70000: : 1it [00:29, 29.27s/it]

Across repeated runs, the crash occurs at different stages: sometimes during segment extraction, other times during predicate extraction. Judging by the error message, an input longer than the 512 positions of the model's position embedding is being passed into the model, but I cannot figure out what I am doing wrong.
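To illustrate my reading of the error, here is a minimal sketch in plain NumPy rather than the actual WebKE code. It assumes the position embedding has 512 rows (taken from the "[0, 512)" range in the message) and that the model inputs are arrays of shape (batch, seq_len). The helper truncate_inputs and the toy arrays are hypothetical.

```python
import numpy as np

# Assumption: the position-embedding table has 512 rows, as suggested by the
# "[0, 512)" range in the error message above.
MAX_POSITIONS = 512


def truncate_inputs(arrays, max_len=MAX_POSITIONS):
    """Cut each (batch, seq_len) array down to at most max_len columns.

    Hypothetical helper, not part of the WebKE code base.
    """
    return [np.asarray(a)[:, :max_len] for a in arrays]


# Hypothetical input that is one token too long (513 tokens).
seq_len = MAX_POSITIONS + 1
token_ids = np.zeros((1, seq_len), dtype="int32")
segment_ids = np.zeros((1, seq_len), dtype="int32")

# Stand-in for the position-embedding lookup: a table with 512 rows.
position_table = np.zeros((MAX_POSITIONS, 4))

try:
    position_table[np.arange(token_ids.shape[1])]  # position 512 has no row
except IndexError as err:
    print("lookup failed:", err)

# After truncation every position index falls inside the table.
token_ids, segment_ids = truncate_inputs([token_ids, segment_ids])
position_table[np.arange(token_ids.shape[1])]  # positions 0..511 all exist
print(token_ids.shape, segment_ids.shape)
```

If this reading is right, the same truncation would presumably have to be applied to all six arrays passed to pred_model.predict, assuming x0, y0, x1 and y1 are also per-token arrays. Truncating drops everything after position 511, so any predictions for those tokens would be lost. I don't know whether the original preprocessing is supposed to split or shorten long pages before this point.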

I don't know if it is relevant, but I am currently not using GPU acceleration.

Hopefully you have an idea about what is going wrong and you can point me in the right direction to fix it.

Thanks!
