aqibsaeed / research-paper-categorization

Research paper classification using machine learning and NLP

License: Apache License 2.0

Jupyter Notebook 100.00%
nlp text-classification machine-learning

research-paper-categorization's Introduction

Research paper categorization

  • Python notebook and dataset for the blog post.
  • Make sure you have Python 2.7 and the following libraries installed: scikit-learn (0.17), pandas, numpy, nltk and gensim. The Anaconda distribution is recommended.
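Several of the issues reported for this project come down to version mismatches, so a quick environment check can be useful. The sketch below is a Python 3 helper (it relies on `importlib.util.find_spec`, which Python 2.7 does not have), and `check_environment` is a hypothetical name. It only reports whether each listed package is importable. It does not check versions, such as the scikit-learn 0.17 that the README asks for.

```python
import importlib.util
import sys

# Import names for the packages listed in the README
# (scikit-learn is imported as "sklearn").
REQUIRED = ["sklearn", "pandas", "numpy", "nltk", "gensim"]


def check_environment(required=REQUIRED):
    """Return ((major, minor), {package: installed?}) without importing the packages."""
    status = {name: importlib.util.find_spec(name) is not None for name in required}
    return sys.version_info[:2], status


if __name__ == "__main__":
    version, status = check_environment()
    print("Python %d.%d" % version)
    for name, ok in status.items():
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```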

research-paper-categorization's People

Contributors

aaqibsaeed, aqibsaeed

research-paper-categorization's Issues

Python 2 or Python 3... xrange or range

Hello,

When I run the code, the first error I get is that range is not found, as I am running the code in Python 3. Should this be run in Python 2? I tried that, but sklearn was not found.

```python
from nltk.corpus import stopwords


def preProcessing(features):
    # features: array-like of titles (e.g. a pandas Series)
    num_titles = features.size
    clean_titles = []
    stops = set(stopwords.words("english"))
    for i in range(0, num_titles):  # <- problem is here
        # letters_only = re.sub("[^a-zA-Z]", " ", features[i])
        words = features[i].lower().split()
        # Drop English stop words
        words = [w for w in words if w not in stops]
        clean_titles.append(" ".join(words))
    return clean_titles
```
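One way to sidestep the `range`/`xrange` question entirely is to iterate over the titles directly instead of by index, which behaves the same under Python 2 and 3. The sketch below is a hypothetical rewrite (`pre_process_titles` is not in the repository). It takes the stop-word set as a parameter so that it runs without the NLTK corpus, whereas the original builds it with `set(stopwords.words("english"))`. Iterating over values also avoids `features[i]`, which on a pandas Series can look up the index label `i` rather than the i-th position.

```python
def pre_process_titles(features, stops):
    """Lower-case each title and drop stop words.

    Iterates over the values directly, so neither range nor xrange is needed,
    and it works the same for a list, a NumPy array, or a pandas Series.
    """
    clean_titles = []
    for title in features:
        words = title.lower().split()
        clean_titles.append(" ".join(w for w in words if w not in stops))
    return clean_titles


# Hypothetical stop-word subset, for illustration only; in practice use
# set(nltk.corpus.stopwords.words("english")).
example_stops = {"a", "and", "for", "of", "the"}
print(pre_process_titles(["A Survey of Deep Learning", "The Theory of Everything"], example_stops))
# prints ['survey deep learning', 'theory everything']
```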

TypeError: only integer arrays with one element can be converted to an index

Hello,

Thanks for sharing this project. I am trying to run it and have run into a couple of problems. I will submit another bug report in a moment.

Here I am getting an error that I tried to fix by setting the inputs to document_term_matrix as NumPy arrays, but that still does not help. Do you have any idea what the problem is?

I am using Python 3.

Regards

Burke

```
TypeError                                 Traceback (most recent call last)
in ()
----> 1 precision, recall, fscore = crossValidate(chisqDtm, labels, "SVM", 10)

in crossValidate(document_term_matrix, labels, classifier, nfold)
     16     for train_index, test_index in skf:
     17         print(train_index, test_index)
---> 18         X_train, X_test = document_term_matrix[train_index], document_term_matrix[test_index]
     19         y_train, y_test = labels[train_index], labels[test_index]
     20         model = clf.fit(X_train, y_train)

TypeError: only integer arrays with one element can be converted to an index
```
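The message in this traceback is what older NumPy versions typically raise when a plain Python list is indexed with a NumPy array of indices, as in `labels[train_index]`; current NumPy words the error differently. Iterating over `skf` directly also matches the scikit-learn 0.17 `cross_validation` API, which has since been removed in favour of `sklearn.model_selection`. Below is a minimal Python 3 sketch of the cross-validation loop under these assumptions: it targets scikit-learn 0.22 or later, it hard-codes `LinearSVC` where the original selects a classifier by name (`"SVM"`), and macro averaging of the scores is a guess. `cross_validate_sketch` is a hypothetical name, not the repository's `crossValidate`.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC


def cross_validate_sketch(document_term_matrix, labels, nfold=10):
    """Stratified k-fold CV; returns mean (precision, recall, fscore) over folds."""
    # Index arrays work on NumPy arrays and SciPy sparse matrices, not on lists.
    X = document_term_matrix
    if not issparse(X):
        X = np.asarray(X)
    y = np.asarray(labels)

    # scikit-learn >= 0.18 API: configure the splitter, then call .split(X, y).
    skf = StratifiedKFold(n_splits=nfold, shuffle=True, random_state=0)
    fold_scores = []
    for train_index, test_index in skf.split(X, y):
        clf = LinearSVC()  # assumption: stands in for the original "SVM" choice
        clf.fit(X[train_index], y[train_index])
        predicted = clf.predict(X[test_index])
        precision, recall, fscore, _ = precision_recall_fscore_support(
            y[test_index], predicted, average="macro", zero_division=0
        )
        fold_scores.append((precision, recall, fscore))

    # Average each metric across the folds.
    precision, recall, fscore = np.mean(fold_scores, axis=0)
    return precision, recall, fscore
```

With the names from the traceback, a call would look like `precision, recall, fscore = cross_validate_sketch(chisqDtm, labels, 10)`.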
