tsinik / bow

This project forked from brendano/bow

A patched version of bow & rainbow 20020213 that compiles with modern gcc 4.0.1, OSX 10.5

Home Page: original => http://www.cs.cmu.edu/~mccallum/bow/

@chapter Bag Of Words Library README

@c set the vars BOW_VERSION
@include version.texi

@samp{libbow}, version @value{BOWVERSION}.

@include libbow-desc.texi


@section Rainbow

@samp{Rainbow} is a standalone program that does document
classification.  Here are some examples:

@itemize @bullet

@item

@example
rainbow -i ./training/positive ./training/negative
@end example

Using the text files found under the directories
@file{./training/positive} and @file{./training/negative},
tokenize, build word vectors, and write the resulting data structures
to disk.

@item

@example
rainbow --query=./testing/254
@end example

Tokenize the text document @file{./testing/254}, and classify it,
producing output like:

@example
/home/mccallum/training/positive 0.72
/home/mccallum/training/negative 0.28
@end example

@item

@example
rainbow --test-set=0.5 -t 5
@end example

Perform 5 trials, each consisting of a new random test/train split, and
output the classification of the test documents.

@end itemize
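
The query output shown above is one line per class: a class label
followed by a score.  As a minimal sketch, assuming exactly that
two-column layout (real rainbow output may differ between versions and
options), the following Python snippet parses such output and picks the
highest-scoring class.  The function names are hypothetical and are not
part of libbow.

```python
def parse_scores(output):
    """Parse lines of the form '<class label> <score>' into (label, score) pairs."""
    scores = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so labels containing spaces survive.
        label, score = line.rsplit(None, 1)
        scores.append((label, float(score)))
    return scores


def best_class(scores):
    """Return the (label, score) pair with the highest score."""
    return max(scores, key=lambda pair: pair[1])


# Sample text copied from the example output above.
sample = """\
/home/mccallum/training/positive 0.72
/home/mccallum/training/negative 0.28
"""
scores = parse_scores(sample)
print(best_class(scores))
```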

Typing @samp{rainbow --help} will give a list of all rainbow options.

After you have compiled @samp{libbow} and @samp{rainbow}, you can run
the shell script @file{./demo/script} to see an annotated demonstration
of the classifier in action.

More information and documentation is available at
http://www.cs.cmu.edu/~mccallum/bow


@format
Rainbow improvements coming eventually:
   Better documentation.
   Incremental model training.
@end format



@section Arrow

@samp{Arrow} is a standalone program that does document retrieval by
TFIDF.  
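
To illustrate the idea behind TFIDF retrieval, here is a small Python
sketch.  It uses a generic textbook weighting (raw term frequency times
the logarithm of inverse document frequency) and whitespace
tokenization; these are assumptions made for illustration and not
necessarily what arrow implements.  All names in it are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build term frequencies per document and IDF per term.

    docs maps a document name to its text.
    """
    tf = {name: Counter(tokenize(text)) for name, text in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each document counts once per term
    n = len(docs)
    idf = {term: math.log(n / df[term]) for term in df}
    return tf, idf


def rank(query, tf, idf):
    """Score every document against the query, best match first."""
    terms = tokenize(query)
    results = []
    for name, counts in tf.items():
        # Terms missing from a document or from the index contribute 0.
        score = sum(counts[t] * idf.get(t, 0.0) for t in terms)
        results.append((name, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


docs = {
    "a.txt": "the cat sat on the mat",
    "b.txt": "the dog chased the cat",
    "c.txt": "stock markets fell sharply",
}
tf, idf = build_index(docs)
ranking = rank("dog", tf, idf)
print(ranking[0][0])
```

Terms that occur in few documents receive a larger weight, so a rare
query word such as "dog" dominates the ranking, while a word found in
every document would contribute nothing.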

Index all the documents in directory @file{foo} by typing

@example
arrow --index foo
@end example

Make a single query by typing

@example
arrow --query
@end example

then typing your query, and pressing Control-D.

If you want to make many queries, it will be more efficient to run arrow
as a server, and query it multiple times without restarts by
communicating through a socket.  Type, for example,

@example
arrow --query-server=9876
@end example

Then access it through port 9876.  For example:

@example
telnet localhost 9876
@end example

In this mode there is no need to press Control-D to end a query.  Simply
type your query on one line, and press return.
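
The telnet session above can also be scripted.  The following Python
sketch sends one query line to the server and collects the reply.  The
host, the port, and the function name are illustrative assumptions, and
since the reply format is not documented here, the sketch simply
returns the raw text.  It reads until the server closes the connection
or until no data arrives within the timeout.

```python
import socket


def query_server(query, host="localhost", port=9876, timeout=2.0):
    """Send one query line and return whatever text the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # In server mode a query is a single line terminated by a newline.
        conn.sendall(query.encode("utf-8") + b"\n")
        chunks = []
        try:
            while True:
                data = conn.recv(4096)
                if not data:  # the server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # nothing more arrived within the timeout; keep what we have
    return b"".join(chunks).decode("utf-8", errors="replace")
```

With a query server already listening on port 9876, a call such as
query_server("machine learning") returns the server's reply as a
string.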


@section Crossbow

@samp{Crossbow} is a standalone program that does document clustering.
Sorry, there is no documentation yet.


@section Archer

@samp{Archer} is a standalone program that does document retrieval with
AltaVista-type queries, using @samp{+}, @samp{-}, @samp{""}, etc.  The
commands in the @samp{arrow} examples above also work for @samp{archer}.
See @samp{archer --help} for more information.
