mickeysjm / setrank

The source code, dataset, and evaluation scripts used for SetRank, published in SIGIR 2018

License: Apache License 2.0

Python 25.34% Shell 0.14% C++ 4.30% Makefile 1.27% C 68.96%
information-retrieval elasticsearch ranking-algorithm sigir

setrank's Introduction

SetRank

Introduction

This repo includes all the benchmark datasets, source code, evaluation toolkit, and experiment results for the SetRank framework, developed for entity-set-aware literature search.

Data

The ./data/ folder contains two benchmark datasets used for evaluating literature search, namely S2-CS and TREC-BIO.

Model Implementations

The ./code/ folder includes the baseline models and our proposed SetRank framework (including AutoSetRank). The model implementations depend heavily on Elasticsearch 5.4.0, an open-source search engine for indexing and full-text search. You also need to install the required Python packages with the following command:

$ pip3 install -r requirements.txt
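
Because the scripts talk to a running Elasticsearch server, it can help to confirm that something is listening before launching a search. The sketch below is not part of the repository. It assumes the server is at localhost:9200, the address that appears in the request log quoted in the issues further down, and it uses only the Python standard library. The function name is_port_open is hypothetical.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed address: Elasticsearch's default HTTP port on the local machine.
    if is_port_open("localhost", 9200):
        print("A server is accepting connections on localhost:9200.")
    else:
        print("Nothing is listening on localhost:9200; start Elasticsearch first.")
```

A successful connection only shows that some server is accepting connections on that port; it does not verify that the index has been created.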

After creating the index, you can perform the search using the following commands:

$ cd ./code/SetRank
$ python3 setRank_TREC.py -query ../../data/S2-CS/s2_query.json -output ../../results/s2/setRank.run

The results will then be saved in "../../results/s2/setRank.run".
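
To inspect the saved rankings, the run file can be loaded into a dictionary. This is a minimal sketch, assuming the file uses the standard six-column TREC run format (query ID, Q0, document ID, rank, score, run tag). The README does not state the format, so check a few lines of the file first. The helper name load_run is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_run(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group (doc_id, score) pairs by query, sorted by descending score."""
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _q0, doc_id, _rank, score, _tag = parts
        run[query_id].append((doc_id, float(score)))
    for docs in run.values():
        docs.sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)


# Made-up lines for illustration only; not taken from the released results.
example = [
    "q1 Q0 docA 1 12.5 setRank",
    "q1 Q0 docB 2 9.1 setRank",
    "q2 Q0 docC 1 3.0 setRank",
]
run = load_run(example)
print(run["q1"][0])  # ('docA', 12.5)
```

With a real file, pass an open file handle instead of a list, for example the handle returned by open("../../results/s2/setRank.run").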

Evaluation Tool

The ./pytrec_eval/ folder includes the original evaluation toolkit pytrec_eval and our customized scripts for performing model evaluation.

First follow the instructions in ./pytrec_eval/README.md to install the package, then run the model evaluation using the following commands:

$ cd ./pytrec_eval/examples
$ ./eval.sh ../../results/s2/setRank.run setRank ## first argument: path to the run file; second argument: name for the saved result files

Experiment Results

The ./results/ folder includes all the experiment results reported in our paper. Specifically, each file with the suffix .run is a model output ranking file, and each file with the suffix .eval.tsv is a query-specific evaluation result. Note that the paper reports only NDCG@{5,10,15,20}, while here we also release results for other metrics, including MAP@{5,10,15,20} and success@{1,5,10}.
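
For readers unfamiliar with these metrics, here is a sketch of how NDCG@k and success@k can be computed for a single query. It assumes a common formulation: linear gain with a log2(rank + 1) discount for NDCG, and success@k equal to 1 when any relevant document appears in the top k. The numbers in ./results/ come from the customized pytrec_eval scripts, whose exact definitions may differ. All function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k gains (ranks start at 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, float], k: int) -> float:
    """DCG@k of the ranking divided by the DCG@k of an ideal ordering."""
    gains = [qrels.get(doc, 0.0) for doc in ranking]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def success_at_k(ranking: List[str], qrels: Dict[str, float], k: int) -> float:
    """1.0 if at least one relevant document is ranked in the top k, else 0.0."""
    return 1.0 if any(qrels.get(doc, 0.0) > 0 for doc in ranking[:k]) else 0.0


# Made-up relevance judgments and ranking, for illustration only.
qrels = {"d1": 2.0, "d2": 1.0}
ranking = ["d3", "d1", "d2"]
print(ndcg_at_k(ranking, qrels, k=5))
print(success_at_k(ranking, qrels, k=1))  # 0.0, since d3 is not relevant
```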

Citation

If you use the datasets or model implementation code produced in this paper, please cite our SIGIR paper:

@inproceedings{JiamingShen2018ess,
  title={Entity Set Search of Scientific Literature: An Unsupervised Ranking Approach},
  author={Jiaming Shen and Jinfeng Xiao and Xinwei He and Jingbo Shang and Saurabh Sinha and Jiawei Han},
  publisher={ACM},
  booktitle={SIGIR},
  year={2018},
}

Furthermore, if you use the pytrec_eval toolkit, please also consider citing the original paper:

@inproceedings{VanGysel2018pytreceval,
  title={Pytrec\_eval: An Extremely Fast Python Interface to trec\_eval},
  author={Van Gysel, Christophe and de Rijke, Maarten},
  publisher={ACM},
  booktitle={SIGIR},
  year={2018},
}

setrank's People

Contributors

mickeysjm

setrank's Issues

What entity linker was used

I couldn't locate any resource or link for the entity linker that was used in the publication. @mickeystroller, can you provide the link for it?

ConnectionRefusedError: where should I configure this?

After I ran the command as instructed in README.md, an error occurred:

=== Arguments ===
Input Query: ../../data/S2-CS/s2_query.json
Output Run: ../../results/s2/setRank.run
Parameters: title:20.0,abstract:5.0,title_ana:20.0,abstract_ana:5.0,title_mu:1000.0,abstract_mu:1000.0,title_ana_mu:1000.0,abstract_ana_mu:1000.0,entity_lambda:0.5,type_interaction:1.0,consider_entity_set:1.0,consider_word_set:1.0,consider_type:1.0,word_dependency:1.0
GET http://localhost:9200/trec/_search [status:N/A request:0.001s]
Traceback (most recent call last):
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
raise err
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/vagrant/software/conda/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 114, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/util/retry.py", line 333, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/vagrant/software/conda/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/vagrant/software/conda/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/vagrant/software/conda/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/vagrant/software/conda/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/vagrant/software/conda/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/connection.py", line 166, in connect
conn = self._new_conn()
File "/home/vagrant/software/conda/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f6b8de68f98>: Failed to establish a new connection: [Errno 111] Connection refused

Any advice on how I can fix it?

Suggest to loosen the dependency on textblob

Hi, your project SetRank (commit id: 9821514) requires "textblob==0.13.0" in its dependencies. After analyzing the source code, we found that the following versions of textblob are also suitable: textblob 0.9.0, 0.9.1, 0.10.0, 0.11.0, 0.11.1, 0.12.0, 0.13.1, 0.14.0, 0.15.0, 0.15.1, 0.15.2, 0.15.3, 0.16.0, 0.17.0, 0.17.1. All functions that you use from the package directly (1 API: textblob.blob.TextBlob.__init__) or indirectly (propagating to 5 of textblob's internal APIs and 0 outside APIs) are unchanged in these versions, so your usage is not affected.

Therefore, we believe it is quite safe to loosen your dependency on textblob from "textblob==0.13.0" to "textblob>=0.9.0,<=0.17.1". This will improve the applicability of SetRank and reduce the possibility of further dependency conflicts with other projects.

May I open a pull request to loosen the dependency on textblob?

By the way, could you please tell us whether such an automatic tool for dependency analysis might make maintaining dependencies easier during your development?
