
HiMatch

Code for the ACL 2021 long paper "Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification".

Dependency

PyTorch>1.1, sklearn, tqdm  

Dataset

RCV1-V2
WOS
EURLEX-57K
GloVe embeddings (glove.6B.300d.txt)

Preprocess

Dataset Preprocess

Transform your dataset into JSON format, where each sample is {'token': List[str], 'label': List[str]}.
Refer to data_modules/preprocess.py for an example. The preprocessed WOS dataset is available on Google Drive.
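As a rough illustration of the target format, the sketch below builds records and writes them one JSON object per line. The one-object-per-line layout, the lowercase-and-split tokenizer, and the example text are assumptions. The repository's data_modules/preprocess.py may tokenize and lay out the file differently.

```python
import json
from typing import Dict, List

def make_sample(text: str, labels: List[str]) -> Dict[str, List[str]]:
    """Build one record in the {'token': List[str], 'label': List[str]} format."""
    # Hypothetical tokenizer: lowercase + whitespace split. The repository's
    # data_modules/preprocess.py may clean and tokenize text differently.
    return {"token": text.lower().split(), "label": list(labels)}

def dump_samples(samples: List[Dict[str, List[str]]], path: str) -> None:
    """Write one JSON object per line (assumed JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    # Illustrative WOS-style record: a level-1 label followed by a level-2 label.
    record = make_sample("Graph neural networks for molecules", ["CS", "Machine learning"])
    print(json.dumps(record))
```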

Label Prior Probability (Label Structure)

Preprocess the taxonomy format (data/wos.taxnomy and data/wos_prob_child_parent.json)
Extract Label Prior Probability

python helper/hierarchy_tree_statistic.py config/wos.json  
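The label prior can be read as the conditional probability P(child | parent), estimated from training-label counts. The sketch below makes several assumptions. It assumes a tab-separated taxonomy with one "parent, children..." line per node, a root node named "Root", and the count(child) / count(parent) estimator. It does not reproduce helper/hierarchy_tree_statistic.py, and the dictionary layout it prints is only illustrative.

```python
import json
from collections import Counter
from typing import Dict, List

def parse_taxonomy(lines: List[str]) -> Dict[str, List[str]]:
    """Assumed format: one 'parent<TAB>child1<TAB>child2...' entry per line."""
    tree: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if parts[0]:  # skip blank lines
            tree[parts[0]] = parts[1:]
    return tree

def label_prior(tree: Dict[str, List[str]], train_labels: List[List[str]],
                root: str = "Root") -> Dict[str, Dict[str, float]]:
    """Estimate P(child | parent) as count(child) / count(parent) on the training set."""
    counts: Counter = Counter()
    for labels in train_labels:
        counts.update(set(labels))
    # Every sample implicitly belongs to the (assumed) root node.
    counts[root] = len(train_labels)
    prior: Dict[str, Dict[str, float]] = {}
    for parent, children in tree.items():
        denom = counts[parent]
        prior[parent] = {c: counts[c] / denom if denom else 0.0 for c in children}
    return prior

if __name__ == "__main__":
    # Toy hierarchy and labels, for illustration only.
    taxonomy = ["Root\tCS\tMedical", "CS\tML\tDB", "Medical\tAllergies"]
    labels = [["CS", "ML"], ["CS", "DB"], ["Medical", "Allergies"], ["CS", "ML"]]
    print(json.dumps(label_prior(parse_taxonomy(taxonomy), labels), indent=2))
```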

Label Description

We use classic TF-IDF to extract representative words for each label.

python construct_label_desc.py  
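One plausible way to apply TF-IDF here is to merge all training texts of a label into a single pseudo-document and keep the highest-scoring words. The sketch below does this using only the standard library. The pseudo-document construction, `top_k`, and the stopword handling are assumptions, not necessarily what construct_label_desc.py does. The IDF term uses the same smoothed formula as scikit-learn's `TfidfVectorizer` default, ln((1 + n) / (1 + df)) + 1.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List

def label_descriptions(samples: List[Dict[str, List[str]]], top_k: int = 3,
                       stopwords: FrozenSet[str] = frozenset()) -> Dict[str, List[str]]:
    """Pick the top_k TF-IDF words per label, treating each label as one document."""
    # Merge the tokens of every sample carrying a label into one pseudo-document.
    label_docs: Dict[str, Counter] = defaultdict(Counter)
    for sample in samples:
        tokens = [t for t in sample["token"] if t not in stopwords]
        for label in sample["label"]:
            label_docs[label].update(tokens)
    n_docs = len(label_docs)
    # Document frequency: number of label pseudo-documents containing each word.
    df: Counter = Counter()
    for counts in label_docs.values():
        df.update(counts.keys())
    descriptions: Dict[str, List[str]] = {}
    for label, counts in label_docs.items():
        total = sum(counts.values())
        scores = {
            # Normalized term frequency times smoothed IDF.
            w: (c / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1.0)
            for w, c in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        descriptions[label] = ranked[:top_k]
    return descriptions

if __name__ == "__main__":
    samples = [
        {"token": ["neural", "network", "training", "data"], "label": ["CS"]},
        {"token": ["network", "database", "query", "data"], "label": ["CS"]},
        {"token": ["allergy", "pollen", "symptoms", "data"], "label": ["Medical"]},
    ]
    print(label_descriptions(samples))
```

Words shared by many labels (such as "data" in the toy input) receive a lower IDF, so label-specific words rank first.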

For RCV1-V2, you can find the label descriptions here.
In our follow-up practice, we found that richer label representations bring further improvement.

Train

Modify the training settings in config/wos.json.

python train.py config/wos.json  

Hyperparameter Description

sample_num: 2. The average number of labels per WOS sample is 2. Every positive label is treated as a positive label index, and matching pairs are constructed for each of them.
negative_ratio: 3. Three kinds of negative labels: a coarse-grained label, a wrong sibling label, and another wrong label.
total_sample_num: 2 * 3 = 6.
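The sketch below shows how sample_num and negative_ratio combine into 2 * 3 = 6 matching pairs for one WOS-style sample. It is hypothetical. Treating the coarse-grained label as the parent of a fine-grained positive label is an assumption, as is falling back to a random wrong label for top-level labels. The `parent` map and the toy labels are also illustrative. The repository's training code may sample negatives differently.

```python
import random
from typing import Dict, List, Tuple

SAMPLE_NUM = 2       # average number of positive labels per WOS sample
NEGATIVE_RATIO = 3   # coarse-grained, wrong sibling, other wrong label

def build_matching_pairs(positive: List[str], parent: Dict[str, str],
                         rng: random.Random) -> List[Tuple[str, str]]:
    """Return (positive_label, negative_label) pairs, NEGATIVE_RATIO per positive label.

    `parent` maps every label to its parent; top-level labels map to 'Root'.
    Assumes the label set is large enough that each candidate pool is non-empty.
    """
    pos = set(positive)
    wrong = sorted(label for label in parent if label not in pos)
    pairs: List[Tuple[str, str]] = []
    for label in positive:
        # Coarse-grained label: the parent of a fine-grained label (hypothetical rule);
        # top-level labels have no real parent, so fall back to a random wrong label.
        coarse = parent[label] if parent[label] in parent else rng.choice(wrong)
        # Wrong sibling: a non-positive label sharing the same parent.
        siblings = [l for l in wrong if parent[l] == parent[label]]
        sibling = rng.choice(siblings) if siblings else rng.choice(wrong)
        # Other wrong label: any remaining non-positive label.
        other = rng.choice([l for l in wrong if l not in (coarse, sibling)])
        pairs.extend((label, neg) for neg in (coarse, sibling, other))
    return pairs

if __name__ == "__main__":
    parent = {"CS": "Root", "Medical": "Root", "Civil": "Root",
              "ML": "CS", "DB": "CS", "Allergies": "Medical", "Asthma": "Medical"}
    print(build_matching_pairs(["CS", "ML"], parent, random.Random(0)))
```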

Other Experimental Settings

For the experimental settings on EURLEX-57K, see KAMG.
For the experimental settings with BERT, see Bert-Multi-Label-Text-Classification.

Cite

@inproceedings{chen-etal-2021-hierarchy,
    title = "Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification",
    author = "Chen, Haibin  and Ma, Qianli  and Lin, Zhenxi  and Yan, Jiangyue",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    url = "https://aclanthology.org/2021.acl-long.337",
    pages = "4370--4379"
}
