whuir / dazer

The TensorFlow implementation of the ACL 2018 paper "A deep relevance model for zero-shot document filtering" by Chenliang Li, Wei Zhou, Feng Ji, Yu Duan, and Haiqing Chen: http://aclweb.org/anthology/P18-1214

Python 99.49% Shell 0.51%

tensorflow zero-shot document-classification document-filtering deeplearning document-ranking

dazer's People

tianke0711 hsouporto bigdataedison qmindong llouislu popocq yue123161 buaachuanwang raman-r-4978 howard1337 songkaisong himanshuverma02

dazer's Issues

Unable to reproduce MAP numbers

Hello,

Thank you for the great work. Below are the steps I follow to run the code, assuming task = space.

1. Use https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html to build the training data, excluding the training records in the categories ['sci.space'] and ['comp.graphics']. This gives training_data_size = 10,134.
2. Use https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html to get the test data. This gives testing_data_size = 7,532.
3. Set c.DAZER.train_class_num = 18 in sample.config. The rest of the settings remain the same.
4. Run sample-train.sh and sample-test.sh.
5. A relevance score file is produced.
6. For the test set, ignore the documents in ['comp.graphics'], label the documents in ['sci.space'] as 1, and label the documents in all other categories as 0.
7. Use https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html to calculate the AP score for task = space, where y_true is the binary labels and y_score is the relevance scores.
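The labelling and scoring steps above can be sketched in plain Python as follows. This is a minimal sketch, not the repository's evaluation code: `build_binary_labels` and `average_precision` are hypothetical helper names, and the AP function follows the step-wise definition documented for scikit-learn's `average_precision_score` (AP = Σₙ (Rₙ − Rₙ₋₁) Pₙ over distinct score thresholds, highest first), written out by hand so it runs without scikit-learn installed.

```python
from typing import List, Sequence, Tuple


def build_binary_labels(
    categories: Sequence[str],
    scores: Sequence[float],
    target: str,
    ignored: Sequence[str] = (),
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: drop ignored categories, label target as 1, the rest as 0."""
    y_true, y_score = [], []
    for cat, score in zip(categories, scores):
        if cat in ignored:
            continue  # e.g. the other held-out category, comp.graphics
        y_true.append(1 if cat == target else 0)
        y_score.append(score)
    return y_true, y_score


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AP = sum_n (R_n - R_{n-1}) * P_n over distinct score thresholds, descending."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("AP is undefined without positive documents")
    # Rank documents by relevance score, highest first.
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    ap, tp, seen, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(pairs):
        # All documents tied at this score form a single threshold.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            seen += 1
            i += 1
        recall = tp / total_pos
        precision = tp / seen
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy example with made-up scores, not real DAZER output.
    cats = ["sci.space", "rec.autos", "comp.graphics", "sci.space", "sci.med"]
    scores = [0.9, 0.8, 0.7, 0.4, 0.1]
    y_true, y_score = build_binary_labels(cats, scores, "sci.space", ignored=["comp.graphics"])
    print(y_true)  # [1, 0, 1, 0]
    print(average_precision(y_true, y_score))  # about 0.833
```

On real data, `categories` would be the test-set category names and `scores` the values read from the relevance score file, in the same document order; how that file is laid out is not described here, so parsing it is left out.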

Following the above steps, I get a MAP of about 0.050, which is far from the reported number. Could you please explain how you calculated the MAP scores? Also, please let me know if any of the above steps are incorrect. Thanks.
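Two points may help when comparing against the paper. MAP is commonly the unweighted mean of per-task AP over all evaluated categories; the sketch below assumes that convention, which may not match the authors' own script. Also, with 20 roughly balanced newsgroups, one category makes up about 5% of the test documents, and a random ranking has an expected AP close to the positive rate, so a value near 0.05 is roughly chance level. The snippet checks this on synthetic random scores only (no real data); `ap_at_relevant_ranks`, `mean_average_precision`, and `random_baseline` are hypothetical names.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def ap_at_relevant_ranks(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Non-interpolated AP: mean precision at the rank of each relevant document.

    Assumes there are no tied scores (true for continuous random scores).
    """
    ranked = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return mean(precisions)


def mean_average_precision(tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """MAP as the unweighted mean of per-task AP (an assumed convention)."""
    return mean(ap_at_relevant_ranks(y, s) for y, s in tasks.values())


def random_baseline(n_docs: int, pos_rate: float, seed: int) -> Tuple[List[int], List[float]]:
    """Synthetic task: about pos_rate positives, scores drawn uniformly at random."""
    rng = random.Random(seed)
    y_true = [1 if rng.random() < pos_rate else 0 for _ in range(n_docs)]
    y_score = [rng.random() for _ in range(n_docs)]
    return y_true, y_score


if __name__ == "__main__":
    tasks = {f"task{i}": random_baseline(7000, 0.05, seed=i) for i in range(4)}
    # A random ranking lands near the 5% positive rate.
    print(round(mean_average_precision(tasks), 3))
```

If a trained model's MAP sits at this level, the scores are carrying little ranking signal, which points to a mismatch in data preparation or score alignment rather than a small metric difference.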

An end-to-end working example is appreciated

I just finished reading the paper and it's a great one! Very clearly written with solid experimental results.

It would greatly help people try out your model if you could provide an end-to-end working example that starts from publicly available word embeddings and datasets. The current code requires the user to follow a specific data format, and converting the data before feeding it to the model takes time.

Where can I find the paper

The Data

training data format