gnosygnu commented on August 12, 2024

Hi! This is already available. See home/wiki/Help:Import/Command-line within XOWA. There's also a section on command-line import at home/wiki/Help:Contents.

Let me know if you run into any issues, or if the instructions aren't clear. Thanks!

pioterj commented on August 12, 2024

Good to hear. I can't find the instructions you are referring to. Are they somewhere at https://gnosygnu.github.io/xowa/?

gnosygnu commented on August 12, 2024

Nope. XOWA currently keeps most of its documentation within the app. In this case, you would do the following:

  • Start XOWA
  • Copy-paste "home/wiki/Help:Import/Command-line" into the URL bar

I list the wikitext below, but you're better off reading it within XOWA.

I am planning to upload these to https://gnosygnu.github.io/xowa/. However, there are a lot of pages and I'd like to automate generation and synchronization of them. If I can't get around to coding a system in the next few months, I'll just upload them all by hand.

XOWA can import a wiki using a plain-text build script and the command line.
{{Help/Css}}

== Import simple.wikipedia.org through the command-line ==
* Open a command line. For example, on Windows, run <span class='bold'>cmd</span>
* Run the following: <span class='console'>java -jar {{#invoke:Xowa_url|plat_jar}} --cmd_file {{#invoke:Xowa_url|plat_url|xowa_build.gfs}} --app_mode cmd</span>
* Wait about 10 minutes for the script to complete
* Launch XOWA and enter <span class='url'>simple.wikipedia.org</span> in the URL bar
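
The command-line step above can also be launched from a program. A minimal Python sketch, assuming the jar is named `xowa_linux_64.jar` (the Linux 64-bit name used later in this thread) and that `xowa_build.gfs` sits in the XOWA root directory; both paths and the helper names are assumptions. Only `--cmd_file` and `--app_mode cmd` come from the instructions above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_import_command(xowa_root: str,
                         jar_name: str = "xowa_linux_64.jar",
                         script_name: str = "xowa_build.gfs") -> List[str]:
    """Return the argv for a headless XOWA import run (hypothetical helper)."""
    root = Path(xowa_root)
    return [
        "java", "-jar", str(root / jar_name),    # assumption: Linux 64-bit jar name
        "--cmd_file", str(root / script_name),   # the build script to execute
        "--app_mode", "cmd",                     # run without the UI
    ]


def run_import(xowa_root: str) -> int:
    """Run the import and return java's exit code.

    Assumption: launching from the XOWA root directory; the help page does not
    say whether the working directory matters.
    """
    completed = subprocess.run(build_import_command(xowa_root), cwd=xowa_root)
    return completed.returncode
```

For example, `run_import("/opt/xowa")` (path assumed) blocks until the script finishes, roughly 10 minutes for simple.wikipedia.org. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.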

== Import a different wiki by editing the build script ==
* Open the following file in a [[Help:Text_editor|text editor]]: <span class='path'>{{#invoke:Xowa_url|plat_url|xowa_build.gfs}}</span>. See Script below for the full text.
* Replace all instances of <span class='bold'>simple.wikipedia.org</span> with the domain name of the wiki you want to import. For example, for English Wikipedia, use <span class='bold'>en.wikipedia.org</span>
* Run the command-line import again.
* Launch XOWA and enter the domain name in the URL bar.
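
Because the edit is a plain find-and-replace, it can also be scripted. A minimal Python sketch; `retarget_build_script` and `retarget_build_file` are hypothetical helper names, not part of XOWA.

```python
from pathlib import Path

OLD_DOMAIN = "simple.wikipedia.org"  # the domain used by the stock build script


def retarget_build_script(script_text: str, new_domain: str,
                          old_domain: str = OLD_DOMAIN) -> str:
    """Return script_text with every instance of old_domain replaced by new_domain."""
    if old_domain not in script_text:
        # Nothing to replace: the script was probably already edited.
        raise ValueError(f"{old_domain!r} not found in build script")
    return script_text.replace(old_domain, new_domain)


def retarget_build_file(path: str, new_domain: str) -> None:
    """Rewrite the build script in place (hypothetical helper; back up the file first)."""
    script = Path(path)
    original = script.read_text(encoding="utf-8")
    script.write_text(retarget_build_script(original, new_domain), encoding="utf-8")
```

For example, `retarget_build_file("/opt/xowa/xowa_build.gfs", "en.wikipedia.org")` (path assumed), then run the command-line import again. Raising when the old domain is missing catches a script that was already retargeted.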

== Import a wiki with a manual download ==
=== Download the wiki dump ===
* Navigate to https://dumps.wikimedia.org/enwiki
* Click on the '''latest''' directory
* Download the file just under "'''Articles, templates, media/file descriptions, and primary meta-pages.'''". It should read '''enwiki-latest-pages-articles.xml.bz2'''
: The download is 11+ GB and may take anywhere between 2 and 5 hours to complete.
: If you also want talk pages, you should download the "'''Recombine all pages, current versions only.'''" version. It should read '''enwiki-latest-pages-meta-current.xml.bz2'''. Note that this dump is twice the size of the regular dump.
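
The download can also be scripted. A minimal Python sketch, assuming the '''latest''' directory keeps the `<wiki>-latest-<dump type>.xml.bz2` naming shown above (for example `enwiki`, or `simplewiki` for simple.wikipedia.org); `dump_url` and `download_dump` are hypothetical helpers. It streams the file to disk in 1 MB chunks so an 11+ GB dump is never held in memory, but it does not resume interrupted downloads.

```python
import shutil
import urllib.request
from pathlib import Path

DUMPS_BASE = "https://dumps.wikimedia.org"


def dump_url(wiki: str, dump_type: str = "pages-articles") -> str:
    """URL of the latest bz2 dump, e.g. wiki='enwiki', dump_type='pages-meta-current'."""
    file_name = f"{wiki}-latest-{dump_type}.xml.bz2"
    return f"{DUMPS_BASE}/{wiki}/latest/{file_name}"


def download_dump(wiki: str, dest_dir: str, dump_type: str = "pages-articles") -> Path:
    """Stream the dump into dest_dir and return the local file path."""
    url = dump_url(wiki, dump_type)
    dest = Path(dest_dir) / url.rsplit("/", 1)[1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, length=1024 * 1024)  # 1 MB chunks
    return dest
```

Use `dump_type="pages-meta-current"` for the larger dump that also contains talk pages.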

=== Specify location of the wiki dump ===
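* In the build script, replace the line <span class='code'>add ('simple.wikipedia.org' , 'text.init');</span> with the following, pointing <span class='code'>src_bz2_fil</span> at the downloaded file:
: <span class='code'>add ('simple.wikipedia.org', 'text.init') {src_bz2_fil = '/your_directory/simplewiki-20130103-pages-articles.xml.bz2';}</span>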
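
The same edit can be applied programmatically. A minimal Python sketch, assuming the `text.init` line still has the bare `add ('<domain>' , 'text.init');` form shown in the Script section; `set_dump_location` is a hypothetical helper, not part of XOWA.

```python
import re


def set_dump_location(script_text: str, domain: str, bz2_path: str) -> str:
    """Turn add ('<domain>' , 'text.init'); into the src_bz2_fil form."""
    pattern = re.compile(
        r"add\s*\(\s*'" + re.escape(domain) + r"'\s*,\s*'text\.init'\s*\)\s*;")
    replacement = f"add ('{domain}', 'text.init') {{src_bz2_fil = '{bz2_path}';}}"
    # A callable replacement keeps backslashes in Windows paths literal.
    new_text, count = pattern.subn(lambda _match: replacement, script_text)
    if count != 1:
        raise ValueError(f"expected one text.init line for {domain!r}, found {count}")
    return new_text
```

Matching with flexible whitespace means the column-aligned `add     (...)` style of the stock script is still recognized.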

== Script ==
<pre class='code'>
// do not show a "Press enter to continue" at the end of the script
app.bldr.pause_at_end = 'n';

// run xowa.gfs
app.scripts.run_file_by_type('xowa_cfg_app');

// import wiki; for more info see [[Help:Import/Command-line]]
app.bldr.cmds {
  // delete all files in directory; note that subdirectories and file databases ("-file.xowa") will not be deleted
  add     ('simple.wikipedia.org' , 'util.cleanup')          {delete_all = 'y';}

  // download main dump file; contains all articles
  add     ('simple.wikipedia.org' , 'util.download')         {dump_type = 'pages-articles';}

  // download categorylinks file; contains links from category to pages
  add     ('simple.wikipedia.org' , 'util.download')         {dump_type = 'categorylinks';}

  // download page_props file; contains information on hidden categories
  add     ('simple.wikipedia.org' , 'util.download')         {dump_type = 'page_props';}

  // start wiki import
  add     ('simple.wikipedia.org' , 'text.init');

  // import articles
  add     ('simple.wikipedia.org' , 'text.page');

  // generate search data
  add     ('simple.wikipedia.org' , 'text.search');

  // generate main category data
  add     ('simple.wikipedia.org' , 'text.cat.core');

  // import category links
  add     ('simple.wikipedia.org' , 'text.cat.link');

  // apply hidden categories
  add     ('simple.wikipedia.org' , 'text.cat.hidden');

  // end import
  add     ('simple.wikipedia.org' , 'text.term');

  // import css into wiki
  add     ('simple.wikipedia.org' , 'text.css');

  // cleanup temp files; delete xml and bz2
  add     ('simple.wikipedia.org' , 'util.cleanup')          {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
}

// run cmds
app.bldr.run;
</pre>
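
Each `add (domain, command) {options}` line above queues one build step, and `app.bldr.run` runs the queue. To check which steps an edited script will run, a minimal Python sketch can list the `(domain, command)` pairs. It is a hypothetical helper that uses a regular expression rather than XOWA's own GFS parser, and it assumes the steps run in the order they are listed; lines commented out with `//` are skipped because the pattern requires `add` at the start of a line.

```python
import re
from typing import List, Tuple

# add ('<domain>' , '<command>') at the start of a line, options block ignored
ADD_RE = re.compile(r"^\s*add\s*\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)", re.MULTILINE)


def list_build_steps(script_text: str) -> List[Tuple[str, str]]:
    """Return (domain, command) pairs in the order they appear in the script."""
    return [(m.group(1), m.group(2)) for m in ADD_RE.finditer(script_text)]
```

For the stock script this yields 13 steps, from `util.cleanup` through `text.css` and back to `util.cleanup`.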

pioterj commented on August 12, 2024

Thanks. One of the steps is "Launch XOWA", which involves starting a UI. So actually there's no way to use it in completely headless mode, right?

gnosygnu commented on August 12, 2024

So actually there's no way to use it in completely headless mode, right?

Well, there are two other ways, but I'm not sure how they'll work for your environment:

Run XOWA as an HTTP server

Run XOWA in command-line mode

  • Open up a shell
  • Run either of the following:
    • (wikitext) java -jar xowa_linux_64.jar --app_mode cmd --show_license n --show_args n --cmd_text "app.shell.fetch_page('Help:Import/Command-line' 'wiki');"
    • (html) java -jar xowa_linux_64.jar --app_mode cmd --show_license n --show_args n --cmd_text "app.shell.fetch_page('Help:Import/Command-line' 'html');"
  • Read the text in the shell, or pipe it to a text file and read it in a text editor
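
The same call can be driven from a program, which sidesteps shell quoting of the nested single and double quotes. A minimal Python sketch; `fetch_page_command` and `save_page` are hypothetical helper names, the `app.shell.fetch_page` argument layout is copied verbatim from the commands above, and page names containing a single quote are not handled.

```python
import subprocess
from pathlib import Path
from typing import List


def fetch_page_command(jar: str, page: str, fmt: str = "wiki") -> List[str]:
    """argv for the command-line page fetch; fmt is 'wiki' (wikitext) or 'html'."""
    if fmt not in ("wiki", "html"):
        raise ValueError("fmt must be 'wiki' or 'html'")
    gfs = f"app.shell.fetch_page('{page}' '{fmt}');"  # layout copied from the thread
    return ["java", "-jar", jar, "--app_mode", "cmd",
            "--show_license", "n", "--show_args", "n",
            "--cmd_text", gfs]


def save_page(jar: str, page: str, out_file: str, fmt: str = "wiki") -> None:
    """Scripted equivalent of piping the command's output into a text file."""
    result = subprocess.run(fetch_page_command(jar, page, fmt),
                            capture_output=True, text=True, check=True)
    Path(out_file).write_text(result.stdout, encoding="utf-8")
```

For example, `save_page("xowa_linux_64.jar", "Help:Import/Command-line", "import-help.txt")` writes the wikitext version of the help page to `import-help.txt` (file name assumed).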

Let me know if neither of the above works. Thanks.

gnosygnu commented on August 12, 2024

I'm going to mark this ticket closed. The command-line route should handle setup without a UI. Keep in mind this is what I use to generate all the wikis for archive.org. If you have questions, please feel free to reopen the issue and I'll respond. Thanks.
