
This project forked from xiangyi-njust/fws


The code for the paper: Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

Python 36.64% PureBasic 63.36%


Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

Overview

Data and source code for the paper "Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain".

The aim of this paper is the automatic recognition and classification of Future Work Sentences (FWS) from academic articles. We choose the Natural Language Processing (NLP) domain as an example and use papers from three main conferences, namely ACL, EMNLP and NAACL (available at https://aclanthology.org/), as the experimental dataset. Our work includes the following aspects:

  • FWS Recognition: After human annotation of future work sentences, we use traditional machine learning models, including Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM) and Random Forest (RF), to judge whether a sentence is an FWS or not.
  • FWS Classification: After FWS recognition, we classify the FWS in each paper into six types (Method, Resources, Evaluation, Application, Problem and Other) using BERT, SciBERT, TextCNN and BiLSTM models.
  • FWS Evaluation: In addition, we compare the keywords extracted from the FWS with those extracted from the abstracts of papers published several years later, to evaluate the effectiveness of FWS.
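
To make the recognition step concrete, here is a minimal sketch of a binary FWS / Non-FWS classifier. It is not the code in this repository: it is a toy multinomial Naïve Bayes written with the Python standard library only, and the class `NaiveBayes`, the helper `tokenize`, and all training sentences are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for word in tokenize(sentence):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy examples; label 1 = FWS, 0 = Non-FWS.
train_sentences = [
    "In future work, we plan to extend the model to other languages.",
    "We leave the evaluation on larger corpora for future work.",
    "In the future, we will explore additional features.",
    "We train the model on the benchmark dataset.",
    "The results are reported in the table.",
    "Our approach outperforms the baseline.",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(train_sentences, train_labels)
prediction = model.predict("We plan to explore this in future work.")
print(prediction)
```

In practice the same idea would be trained on Corpus_For_FWS_Recognition.csv with a machine learning library rather than on six hand-written sentences.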

Directory structure

FWS                                                  Root directory
├─ Dataset                                           Experimental datasets
│    ├─ Corpus For KeyphraseExtraction               Corpus for content analysis of FWS
│    │    └─ Title and Abstract.csv                  Corpus for content analysis of FWS, including title and abstract
│    │
│    ├─ Corpus_For_FWS_Recognition.csv               Training dataset for FWS recognition
│    ├─ Corpus_For_FWS_Recognition_Predict.csv       Sample test dataset for FWS recognition
│    ├─ Corpus_For_FWS_TypeClassify.csv              Training dataset for FWS classification
│    └─ Corpus_For_FWS_TypeClassify_Predict.csv      Sample test dataset for FWS classification
│
├─ FWS Classification                                Module of FWS classification
│    ├─ Bert.py                                      Source code of BERT/SciBERT classification model
│    ├─ Bilstm.py                                    Source code of Bi-LSTM model
│    ├─ TextCNN.py                                   Source code of TextCNN model
│    ├─ logs.txt                                     Log file recording the performance of the classification models
│    ├─ main.py                                      Source code for selecting a model to train Corpus_For_FWS_Recognition by command line arguments
│    ├─ predict.py                                   Source code for using a trained model to predict FWS labels in the test dataset
│    ├─ run.py                                       Source code to start the training process of FWS classification
│    └─ weights                                      Model weights
│           ├─ bilstm                                Weights of Bi-LSTM model
│           └─ textcnn                               Weights of TextCNN model
│
├─ FWS Recognition                                   Module of FWS recognition
│    ├─ main.py                                      Source code of data preprocessing, training and testing of FWS recognition model
│    └─ run.py                                       Source code to start training of FWS recognition
│
└─ README.md

Dataset description

We release all of our training datasets in the Dataset directory:

  • Corpus_For_FWS_Recognition.csv: Training dataset for recognition of Future Work Sentences. It contains 9,009 FWS and 55,887 Non-FWS.
  • Corpus_For_FWS_TypeClassify.csv: Training dataset for type classification of Future Work Sentences. It contains 9,009 records.

    Each line of Corpus_For_FWS_Recognition.csv includes:

  • id: Paper ID in ACL Anthology.
  • year: Year of publication.
  • text: Content of FWS or Non-FWS.
  • label: 1: FWS and 0: Non-FWS.
  • chapter: Type of chapter headings.

    Each line of Corpus_For_FWS_TypeClassify.csv includes:

  • id: Paper ID in ACL Anthology.
  • lable: Six types of FWS including method, resources, evaluation, application, problem and other.
  • text: Content of FWS.
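
As a quick illustration of the recognition schema described above, the sketch below reads rows with the columns id, year, text, label and chapter and counts how many are FWS and Non-FWS. The sample rows, the paper IDs, the chapter values and the helper `label_distribution` are hypothetical; the exact header and column order of the released CSV files are an assumption and should be checked against the files in Dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical rows that mimic the documented columns. For the real data,
# open Dataset/Corpus_For_FWS_Recognition.csv instead of this in-memory text.
sample_csv = io.StringIO(
    "id,year,text,label,chapter\n"
    'X00-0001,2000,"In future work, we plan to add more languages.",1,conclusion\n'
    "X00-0001,2000,We train the model on a benchmark.,0,experiment\n"
    "X00-0002,2001,The results are shown in the table.,0,results\n"
)


def label_distribution(handle):
    """Count rows per label (1 = FWS, 0 = Non-FWS) in a CSV file handle."""
    reader = csv.DictReader(handle)
    return Counter(int(row["label"]) for row in reader)


counts = label_distribution(sample_csv)
print(counts[1], "FWS and", counts[0], "Non-FWS")
```

Using `csv.DictReader` keeps the code independent of column order and handles quoted sentences that contain commas.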

    Additionally, we release a sample of our test dataset. If you need the full data, please contact us.

    Quick start

    To reproduce our experimental results, follow these steps:

    Recognition

    Open a terminal in the FWS Recognition directory and run this command:

    python run.py 

    Classification

    Open a terminal in the FWS Classification directory and run this command:

    python run.py

    Extract keywords

    We provide two notebooks. Follow the steps in them to extract keywords and carry out the preprocessing.
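
Once keywords are available, the comparison between FWS keywords and keywords from later abstracts mentioned in the overview could be sketched as a simple set overlap. This is only an illustration: the Jaccard measure, the helpers `normalize` and `jaccard`, and the keyword lists are assumptions made here, not the procedure used in the notebooks or the paper.

```python
def normalize(keywords):
    """Lowercase and strip keywords, dropping empty entries."""
    return {k.strip().lower() for k in keywords if k.strip()}


def jaccard(a, b):
    """Jaccard overlap between two keyword collections (0.0 if both are empty)."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical keyword lists for illustration only.
fws_keywords = ["low-resource languages", "data augmentation", "Domain Adaptation"]
later_abstract_keywords = [
    "domain adaptation",
    "data augmentation",
    "pre-trained models",
    "prompt tuning",
]

overlap = jaccard(fws_keywords, later_abstract_keywords)
print(round(overlap, 2))
```

A higher overlap would suggest that topics named in future work sentences reappear in later publications; other measures could be substituted without changing the overall structure.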

    Citation

    Please cite the following paper if you use this code or these datasets in your work.

    Chengzhi Zhang, Yi Xiang, Wenke Hao, Zhicheng Li, Yuchen Qian, Yuzhuo Wang. Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain. Journal of Informetrics, 2023, 17(1): 101373.

