jamesponddotco / wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

Home Page: https://sr.ht/~jamesponddotco/wikiextract/

License: GNU General Public License v2.0

Makefile 17.39% Go 82.61%
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

wikiextract

wikiextract is a word extractor for Wikipedia articles. It extracts words longer than four characters from a given Wikipedia page or list of pages and saves them to a file, which you can later use as the source for generating diceware passwords.

Installation

From source

First install the dependencies:

  • Go 1.22 or above.
  • make.
  • scdoc.

Switch to the latest stable tag, v1.0.0, then compile and install:

git checkout v1.0.0
make
sudo make install

Usage

$ wikiextract --help
NAME:
   wikiextract - a simple word extractor for Wikipedia articles

USAGE:
   wikiextract [global options] 

VERSION:
   1.0.0

GLOBAL OPTIONS:
   --input-url value, -u value [ --input-url value, -u value ]  the URL of the Wikipedia page
   --input-file value, -f value                                 a file containing a list of URLs
   --output value, -o value                                     the path to the output file
   --help, -h                                                   show help
   --version, -v                                                print the version

$ wikiextract -u 'https://en.wikipedia.org/wiki/Wikipedia' -o 'output.txt'

See wikiextract(1) after installing for more information.

Contributing

Anyone can help make wikiextract better. Send patches on the mailing list and report bugs on the issue tracker.

You must sign off your work using git commit --signoff. See the Linux kernel's Developer's Certificate of Origin for more details.

All contributions are made under the GPL-2.0 license.

Released under the GPL-2.0 license.
