tsuno0829 / hierarchical-text-multi-label-classificaiton

This project forked from runlongyu/hierarchical-text-multi-label-classificaiton

About: Hierarchical Multi-Label Text Classification based on a hybrid method (local & global).

License: Apache License 2.0

Python 100.00%

Hierarchical Text Multi Label Classification

This repository is my research project, and it also serves as a study of TensorFlow and deep learning.

The main objective of the project is to solve the hierarchical multi-label text classification (HMC) problem. Unlike flat multi-label text classification, HMC can classify each instance (object) into several different paths of the class hierarchy.
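
To make the difference from flat multi-label classification concrete, the sketch below encodes one document that belongs to two paths of a small, made-up hierarchy as one multi-hot vector per level. The label names and the `encode_paths` helper are illustrative assumptions, not part of this repository.

```python
from typing import Dict, List, Sequence

# Hypothetical two-level hierarchy: every label is listed under its level.
LEVELS: List[List[str]] = [
    ["A", "B"],             # level 1 (top)
    ["A01", "A02", "B01"],  # level 2
]


def encode_paths(paths: Sequence[Sequence[str]],
                 levels: List[List[str]] = LEVELS) -> List[List[int]]:
    """Turn root-to-leaf label paths into one multi-hot vector per level."""
    index: List[Dict[str, int]] = [
        {label: i for i, label in enumerate(level)} for level in levels
    ]
    vectors = [[0] * len(level) for level in levels]
    for path in paths:
        for depth, label in enumerate(path):
            vectors[depth][index[depth][label]] = 1
    return vectors


# One document assigned to two different paths: A -> A02 and B -> B01.
doc_vectors = encode_paths([["A", "A02"], ["B", "B01"]])
print(doc_vectors)  # [[1, 1], [0, 1, 1]]
```

A flat multi-label setup would keep only a single vector over all labels; keeping one vector per level preserves which labels sit on which level of the hierarchy.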

Requirements

  • Python 3.6
  • TensorFlow 1.8+
  • NumPy
  • Gensim

Introduction

Many real-world applications involve hierarchical multi-label classification and organize data in a hierarchical structure: classes are specialized into subclasses or grouped into superclasses. This structure is a good way to show the characteristics of the data, and it provides a multidimensional perspective on the problem.

Most types of electronic documents (e.g. web pages, digital libraries, patents and e-mails) are associated with one or more categories, and these categories are stored hierarchically in a tree or Directed Acyclic Graph (DAG).

The figure shows an example of predefined labels in hierarchical multi-label classification of patent documents.

  • Documents are shown as colored rectangles, labels as rounded rectangles.
  • Circles in the rounded rectangles indicate that the corresponding document has been assigned the label.
  • Arrows indicate hierarchical structure between labels.
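
A label hierarchy of this kind can be stored as a mapping from each label to its parents, which covers both trees (one parent per label) and DAGs (several parents). The following is a minimal sketch with hypothetical label names; `with_ancestors` is a helper assumed here, showing how assigning a label implies all of its ancestors.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical hierarchy: child -> list of parents.
# "X" has two parents, so this is a DAG rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": [],
    "A1": ["A"],
    "B1": ["B"],
    "X": ["A1", "B1"],
}


def with_ancestors(labels: Iterable[str],
                   parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given labels together with every ancestor in the hierarchy."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label not in closed:
            closed.add(label)
            stack.extend(parents[label])
    return closed


print(sorted(with_ancestors(["X"], PARENTS)))  # ['A', 'A1', 'B', 'B1', 'X']
```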

Data

See the data format in the data folder, which includes the sample data files.

Text Segment

You can use the jieba package if you are working with Chinese text data.
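
A minimal segmentation sketch, assuming jieba is installed. The `segment` wrapper and its `cut` parameter are names introduced here for illustration; `jieba.lcut` is the library call that returns a list of tokens.

```python
from typing import Callable, List, Optional


def segment(text: str,
            cut: Optional[Callable[[str], List[str]]] = None) -> List[str]:
    """Split text into tokens, dropping whitespace-only tokens."""
    if cut is None:
        import jieba  # third-party: pip install jieba
        cut = jieba.lcut  # returns a list of tokens
    return [tok for tok in cut(text) if tok.strip()]
```

Calling `segment("我爱自然语言处理")` returns a list of word strings; the exact split depends on jieba's dictionary and version. Passing a different `cut` function lets the same wrapper be used for other languages.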

Data Format

This repository can be used with other text classification datasets in two ways:

  1. Convert your dataset into the same format as the sample.
  2. Modify the data preprocessing code in data_helpers.py.

Which approach to choose depends on your data and your task.
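
For the first option, a small converter is often enough. The sketch below assumes a one-JSON-object-per-line layout and uses made-up field names on both sides; neither is confirmed by this README, so check the sample files in the data folder and replace `FIELD_MAP` with the real names.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical mapping: keys are field names in YOUR data, values are the
# field names used by the sample files (look them up in the data folder).
FIELD_MAP: Dict[str, str] = {
    "body": "content",
    "tags": "labels",
}


def convert(records: Iterable[dict],
            field_map: Dict[str, str] = FIELD_MAP) -> Iterator[str]:
    """Yield one JSON string per record with fields renamed via field_map."""
    for record in records:
        renamed = {new: record[old] for old, new in field_map.items()}
        yield json.dumps(renamed, ensure_ascii=False)


rows = [{"body": "some text", "tags": [3, 17]}]
print(list(convert(rows)))
```

Writing each yielded string as one line of an output file produces a dataset with the renamed fields.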

Pre-trained Word Vectors

You can pre-train your word vectors (based on your corpus) in several ways:

  • Use the gensim package to pre-train word vectors.
  • Use the GloVe tools to pre-train word vectors.
  • Use a fastText network to pre-train word vectors.
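
As an illustration of the gensim route, the sketch below trains skip-gram vectors and stacks them into an embedding matrix. The function names, hyperparameters and the zero-vector fallback for unknown words are assumptions made for this example, not this repository's settings. The keyword for the vector dimension differs between gensim versions (`size` before 4.0, `vector_size` from 4.0 on), so both are tried.

```python
import numpy as np


def train_word2vec(sentences, dim=100, min_count=1):
    """Train skip-gram vectors on tokenized sentences (lists of tokens)."""
    from gensim.models import Word2Vec  # third-party: pip install gensim

    params = dict(window=5, min_count=min_count, workers=1, sg=1)
    try:
        model = Word2Vec(sentences, vector_size=dim, **params)  # gensim >= 4.0
    except TypeError:
        model = Word2Vec(sentences, size=dim, **params)  # gensim < 4.0
    return model.wv  # KeyedVectors: supports `word in wv` and `wv[word]`


def build_embedding_matrix(vectors, vocab, dim):
    """Stack vectors in vocab order; words without a vector stay all-zero."""
    matrix = np.zeros((len(vocab), dim), dtype=np.float32)
    for row, word in enumerate(vocab):
        if word in vectors:
            matrix[row] = vectors[word]
    return matrix
```

A typical call would be `wv = train_word2vec(tokenized_corpus, dim=100)` followed by `build_embedding_matrix(wv, vocab, 100)`, where `tokenized_corpus` is a list of token lists and `vocab` is your ordered word list. The resulting matrix can be used to initialize an embedding layer.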

Network Structure

HMC-LMLP

References:


HMCN

HMCN-F

HMCN-R
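
The project description calls the approach a hybrid of local and global methods. As a rough illustration only (not taken from this repository's code), one way to combine the two views is a weighted average of the concatenated per-level local probabilities and the global probabilities over all labels. `combine_predictions` and the weight `beta` are names assumed for this sketch.

```python
import numpy as np


def combine_predictions(local_probs, global_probs, beta=0.5):
    """Blend per-level local outputs with a global output over all labels."""
    # Concatenate the per-level vectors so they line up with the global vector.
    local_all = np.concatenate(local_probs, axis=-1)
    if local_all.shape != global_probs.shape:
        raise ValueError("local and global outputs must cover the same labels")
    return beta * local_all + (1.0 - beta) * global_probs


# Two levels with 2 and 3 labels; the global output covers all 5 labels.
local = [np.array([0.9, 0.1]), np.array([0.2, 0.8, 0.4])]
glob = np.array([0.7, 0.3, 0.0, 0.6, 0.2])
final = combine_predictions(local, glob, beta=0.5)
print(final)
```

With `beta=1.0` only the local outputs are used, and with `beta=0.0` only the global output is used.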

References:


About Me

黄威,Randolph

SCU SE Bachelor; USTC CS Master

Email: [email protected]

My Blog: randolph.pro

LinkedIn: randolph's linkedin
