mwoffliner
is a tool for making a local HTML snapshot of
any online (recent) MediaWiki instance. It goes through all articles
(or a selection, if specified) and writes the HTML and images to a local
directory. It has mainly been tested against Wikimedia projects such as
Wikipedia and Wiktionary, but it should also work with any recent
MediaWiki instance.
- *NIX Operating System (Linux/macOS)
- NodeJS
- Redis
- Libzim (on Linux, the binaries are downloaded automatically)
- Various build tools that are probably already installed on your machine (e.g. libjpeg, gcc)
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash && \
source ~/.bashrc && \
nvm install stable && \
node --version
> brew install redis
To install libzim, see the instructions here: https://github.com/openzim/libzim
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash && \
source ~/.bashrc && \
nvm install stable && \
node --version
> sudo apt-get install redis-server
> npm i -g mwoffliner
> mwoffliner --help
> mwoffliner \
--mwUrl=https://es.wikipedia.org \
--adminEmail=admin@example.com \
--verbose \
--format=nozim \
--articleList=./articleList
# --format=nozim: won't make a final ZIM file
# --articleList=./articleList: will download only the article(s) listed in that file
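The --articleList option points to a local file. A minimal sketch of generating such a file from Node.js, assuming it is plain text with one article title per line (an assumption about the format, not stated above). The helper name `writeArticleList` and the sample title are hypothetical:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes one article title per line (assumed format).
// Blank entries are skipped and surrounding whitespace is trimmed.
function writeArticleList(filePath, titles) {
  const cleaned = titles.map((t) => t.trim()).filter((t) => t.length > 0);
  fs.writeFileSync(filePath, cleaned.join('\n') + '\n', 'utf8');
  return cleaned.length;
}

// Example: a list containing a single article, written to the temp directory.
const listPath = path.join(os.tmpdir(), 'articleList');
const written = writeArticleList(listPath, ['Kiwix']);
console.log(`${written} title(s) written to ${listPath}`);
```

The resulting path can then be passed as the value of --articleList.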
const mwoffliner = require('mwoffliner');
const parameters = {
mwUrl: "https://es.wikipedia.org",
adminEmail: "admin@example.com",
verbose: true,
format: "nozim",
articleList: "./articleList"
};
mwoffliner.execute(parameters); // returns a Promise
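Because execute returns a Promise, a failed run surfaces as a rejection. A minimal sketch of handling both outcomes follows. The wrapper name `runScrape` is hypothetical, and the scraper is passed in as an argument so that the snippet runs without mwoffliner installed; `fakeScraper` and its error message are stand-ins for illustration only, not mwoffliner behaviour:

```javascript
// Hypothetical wrapper: resolves to an exit code instead of throwing.
async function runScrape(scraper, parameters) {
  try {
    await scraper.execute(parameters);
    return 0;
  } catch (err) {
    console.error('Scrape failed:', err && err.message ? err.message : err);
    return 1;
  }
}

// Stand-in object for illustration; in real use, pass require('mwoffliner').
const fakeScraper = {
  execute: (params) =>
    params.mwUrl
      ? Promise.resolve()
      : Promise.reject(new Error('mwUrl is required')),
};

const okRun = runScrape(fakeScraper, { mwUrl: 'https://es.wikipedia.org' });
const badRun = runScrape(fakeScraper, {});

// Logs the exit codes of the two runs once both have settled.
Promise.all([okRun, badRun]).then((codes) => console.log(codes));
```

In a real script, the returned code could be handed to process.exit so that the shell sees whether the run succeeded.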
Please see CONTRIBUTING.md
git clone https://github.com/openzim/mwoffliner.git
cd mwoffliner
npm i
./watch.sh # Watch for changes in "src/*"
We follow the tslint:recommended
rule set with a few changes; see ./tslint.json for details.
It's best to use TSLint to check your code as you develop. This project is pre-configured for development with VSCode and the TSLint plugin.
There is a pre-configured debug configuration for VSCode; just open the debugging tab.
Make sure you read CONTRIBUTING.md for tips on how to best debug and submit issues.
To publish, it's best to use a clean clone of the project:
git clone https://github.com/openzim/mwoffliner.git
cd mwoffliner
npm i # required for Snyk checks
./build.sh
npm publish # you must be logged in already (npm login)
There are two Wikitext parsers. mwoffliner uses Parsoid.
- Wikitext is the name of the markup language that Wikipedia uses.
- MediaWiki is a PHP package that runs a wiki, including Wikipedia.
- MediaWiki includes a parser that converts Wikitext into HTML; this parser currently renders Wikipedia.
- There is another Wikitext parser, called Parsoid, implemented in JavaScript (Node.js).
- Parsoid is planned to eventually become the main parser for Wikipedia.
- mwoffliner calls Parsoid and then post-processes the results into an offline format.
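
As a rough illustration of what post-processing for offline use can involve, the sketch below rewrites wiki links in an HTML fragment so that they point at local files. It is not mwoffliner's actual code: the function name `localizeLinks`, the `./Title` link shape, and the `.html` suffix are all assumptions chosen for the example.

```javascript
// Illustrative only: rewrite href="./Title" links to href="Title.html",
// keeping any #fragment and leaving all other links untouched.
function localizeLinks(html) {
  return html.replace(
    /href="\.\/([^"#]+)(#[^"]*)?"/g,
    (match, title, fragment) => `href="${title}.html${fragment || ''}"`
  );
}

const sample =
  '<p><a href="./Madrid">Madrid</a>, ' +
  '<a href="./Lima#Historia">Lima</a>, ' +
  '<a href="https://example.org/">external</a></p>';

console.log(localizeLinks(sample));
```

A regular expression keeps the sketch short; a real pipeline would more likely operate on a parsed DOM so that attribute order and quoting do not matter.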