whuir / dazer

The TensorFlow implementation of the accepted ACL 2018 paper "A deep relevance model for zero-shot document filtering", Chenliang Li, Wei Zhou, Feng Ji, Yu Duan, Haiqing Chen, http://aclweb.org/anthology/P18-1214

Python 99.49% Shell 0.51%
tensorflow zero-shot document-classification document-filtering deeplearning document-ranking

dazer's People

Contributors

lichenliang-whu

dazer's Issues

Unable to reproduce MAP numbers

Hello,

Thank you for the great work. Below are the steps I follow to run the code, assuming task = space.

  1. Use https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html to build the training data, ignoring training records that belong to categories = ['sci.space'] and ['comp.graphics']. This gives training_data_size = 10,134.
  2. Use https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html to get the validation/test data. This gives testing_data_size = 7,532.
  3. Set c.DAZER.train_class_num = 18 in sample.config. The rest of the settings remain the same.
  4. Run sample-train.sh and sample-test.sh.
  5. A relevance score file is produced.
  6. For the testing dataset, ignore documents belonging to ['comp.graphics'], mark documents in category ['sci.space'] as 1, and mark documents in all other categories as 0.
  7. Use https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html to calculate the AP score for task = space, where y_true is the binary labels and y_score is the relevance scores.
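
For reference, steps 6 and 7 can be sketched in plain Python as below. This is only an illustration of the evaluation described in this issue, not the authors' evaluation script. The function names (binarize, average_precision) and the assumption that the score file yields one score per test document, in the same order as the test documents, are hypothetical. For scores without ties, the result should match sklearn.metrics.average_precision_score.

```python
from typing import List, Sequence, Tuple

TARGET = "sci.space"       # category treated as relevant (task = space)
IGNORED = "comp.graphics"  # category dropped from the evaluation (step 6)


def binarize(categories: Sequence[str],
             scores: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Drop IGNORED documents; label TARGET documents 1 and all others 0.

    Assumes categories[i] and scores[i] describe the same test document.
    """
    if len(categories) != len(scores):
        raise ValueError("categories and scores must be aligned one-to-one")
    y_true, y_score = [], []
    for category, score in zip(categories, scores):
        if category == IGNORED:
            continue
        y_true.append(1 if category == TARGET else 0)
        y_score.append(score)
    return y_true, y_score


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k where a relevant document appears."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


# Toy data standing in for the real test labels and relevance scores.
categories = ["sci.space", "rec.autos", "comp.graphics", "sci.space", "talk.politics.misc"]
scores = [0.9, 0.8, 0.95, 0.3, 0.1]
y_true, y_score = binarize(categories, scores)
ap = average_precision(y_true, y_score)
```

One thing worth checking with a setup like this: a random ranking gives an AP close to the fraction of relevant documents in the evaluated set, so an AP near that fraction may point to scores and labels being out of order relative to each other rather than to the model itself.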

Following the above steps, I get MAP ~ 0.050, which is far from the reported number. Could you please let me know how you calculated the MAP scores? Additionally, please let me know if any of the above steps are incorrect. Thanks.

An end-to-end working example is appreciated

I just finished reading the paper and it's a great one! Very clearly written with solid experimental results.

It would greatly help people try out your model if you could provide an end-to-end working example starting from publicly available word embeddings and datasets. The current code requires the user to follow a specific data format, and it takes time to convert the data before feeding it to the model.

The Data

Which public dataset are you using? Alternatively, could you send me the data you used? I would like to run the program correctly. Thank you.
My email [email protected]

training data format

What are positive_document and negative_document in the training data format when training with the 20 Newsgroups dataset?
