waldyrious / primerpedia Goto Github PK



Simplified extracts of Wikipedia articles, showing just the basic information.

Home Page: https://primerpedia.toolforge.org

License: Other

JavaScript 64.88% HTML 19.47% CSS 15.65%

wikipedia tldr summary

primerpedia's Introduction

Primerpedia

Primerpedia is a proof-of-concept demo for the Concise Wikipedia proposal. It provides short summaries of Wikipedia articles, for when you just need a quick overview of the topic.

Try the live demo here: https://primerpedia.toolforge.org

(If that link is down, try this one or this one.)

To achieve this, it uses the MediaWiki API to fetch the first section of an article, and cleans it up for presentation, removing extra details and editing-related templates (currently using the cleanup procedure implemented by the MobileFrontend extension).
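As a rough illustration of the fetching step, the request for an article's first section could be built as below. This is only a sketch: it assumes the `prop=extracts` module with the `exintro` flag (provided by the TextExtracts extension), and the helper name `buildIntroRequestUrl` is made up for this example. The parameters actually used in primerpedia.js may differ.

```javascript
// Hypothetical helper (not taken from primerpedia.js): builds a MediaWiki API
// URL that asks only for the intro (first section) of an article.
function buildIntroRequestUrl(title, apiUrl) {
    var params = new URLSearchParams({
        action: "query",
        prop: "extracts",   // extract text of the page (TextExtracts)
        exintro: "1",       // only the content before the first heading
        redirects: "1",     // follow redirects to the target article
        titles: title,
        format: "json",
        origin: "*"         // allow anonymous cross-origin requests
    });
    return apiUrl + "?" + params.toString();
}

// Example: the resulting URL can be passed to fetch() or XMLHttpRequest.
var introUrl = buildIntroRequestUrl("Canada", "https://en.wikipedia.org/w/api.php");
```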

This tool should also help identify issues with the lead sections of Wikipedia articles, which, according to the Manual of Style, "should define the topic and summarize the body of the article with appropriate weight."

primerpedia's People


bennylin swalling imclab jdew192837 witia1

primerpedia's Issues

Implement search function

Preferably with autocomplete

Move auxiliary functions to a separate utils.js file

To avoid getting primerpedia.js too long, and to better separate primerpedia-specific functionality from generic DOM manipulation functions, I believe it could be a good idea to split the auxiliary functions (e.g. apiRequest, toggleVisibility, clearNode) to a separate file, say utils.js or somesuch.

The main goal is to keep the project's sources simple, concise, and easy to read (and thus welcoming to new contributors).

Add level-of-detail / length control

There could be a set of radio buttons, a dropdown list or even a slider (UI up for discussion) to pick between different content lengths:

- clause
  - structured
    - Wikidata description
    - short descriptions
  - unstructured
- sentence (WP:LEADSENTENCE)
- paragraph (WP:LEADPARAGRAPH)
- section (WP:LEADSECTION)

Better layout in small screens

Followup to #10

make it fit correctly in mobile browsers

do they report larger sizes than they should?

Allow loading a specific page by passing an url parameter

Suggested here by oscar.vives

Question: using query string (location.search) or fragment identifier (location.hash)? A query string seems to be more appropriate from a semantic point of view (we'd be actually providing the app with parameters to dictate its output, rather than specifying a part of the complete output to be shown).
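A minimal sketch of the query-string option, assuming the parameter is called `search` (the function name `getRequestedTitle` is hypothetical):

```javascript
// Reads the requested article from a query string such as location.search.
// Returns null when the parameter is absent or blank.
function getRequestedTitle(queryString) {
    var params = new URLSearchParams(queryString);
    var value = params.get("search");
    return value === null || value.trim() === "" ? null : value;
}

// In the browser this would be called as getRequestedTitle(window.location.search).
```

`URLSearchParams` takes care of percent-decoding, so the returned value can be used as a title directly.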

Setup Project for Primerpedia

Proposal for how to configure the first project for primerpedia.

In the context of GitHub, projects are easiest to understand as a representation of the existing workflow applied to a fixed goal.

This goal can be a certain event (like the end of a release cycle), a timespan (often also called a sprint), or a set date.

I'm not sure if this project has any formal "launch", since it's an ever-evolving tech demo (as I understand it), so I think the best thing would be to have annual projects. That way older tickets and issues can "fade away".

A Project contains Columns which represent the current state of the managed issue. By convention an issue should usually step through each (or most) columns from left to right. These manipulations are tracked within the issue.

I propose these columns, which correspond to a new workflow:

Preparation
Backlog
In Progress
On Hold
Done

Since Issues can be added as Notes to a Project, column 1 (Preparation) is to be used for that. Any ticket that will be solved within this Project but still needs to be supplemented in some way should also be stored here.

Backlog contains any Issues that remain yet to be worked on in the current Project.

Any issue that is currently being worked on needs to be moved into In Progress. While we have not yet had the problem of an issue being worked on by two different developers, this will ensure that there is a specific point in time when work on a ticket started. This should also go hand in hand with setting an assignee, but I don't think that's possible for non-contributors.

Issues that are resolved should finally be put into Done. This can also be used when changing to the next Project.

In addition to the Columns, there remains the topic of Labels. Since we do not use them for anything but the intended use case, I do not see any need for change.

Example Picture:

Fix permalink feature to include the full URL

The "copy permalink" feature was implemented in #42, but there seems to be an issue. Relevant quotes from #41:

if I search for a string (or load a random article) and then click the clipboard icon, what's copied is not the entire URL but just, e.g. /primerpedia/?search=John_Doe.

and

This [happens] since the function getShareableLink only uses location.pathname which of course only contains /primerpedia.
It's a straightforward fix to add location.origin as well, but I'm fairly certain that I tested this functionality thoroughly.

Probably wanted to append it to

primerpedia/primerpedia.js

Line 112 in 8dfcbcb

var shareLink = window.location.href;

but forgot?

and

While trying to implement this feature I found that it's not completely done with just adding window.location.origin (as it can start with // which would break the link). Some proper URL handling needs to be in place (add the search parameter if it is missing, change it if it is already there).
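One possible way to get that URL handling without manual string concatenation is the standard `URL` class. This is a sketch, not the fix that was eventually applied; `buildShareableLink` is a hypothetical name, and the `search` parameter name is taken from the quotes above.

```javascript
// Builds an absolute permalink from the current page address.
// searchParams.set() adds the parameter if it is missing and replaces
// its value if it is already present.
function buildShareableLink(currentHref, title) {
    var url = new URL(currentHref);
    url.searchParams.set("search", title);
    url.hash = ""; // drop any fragment so the link stays clean
    return url.href;
}

// In the browser: buildShareableLink(window.location.href, "John_Doe")
```

Because the result is derived from the full `href`, it always includes the scheme and host, which is what the clipboard feature was missing.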

Setup translation on TranslateWiki.net

Once the multilingual setup described in #46 is in place, it could be nicer to set up the translation on TranslateWiki.net, which would reuse an existing platform that's well integrated with Wikimedia projects, and provide us with more visibility among the community of translators (meaning, hopefully, more available languages for end users).

At that point, the i18n work done on our custom system, assuming it is a simple one, should be primarily on the translation side, so it would be reusable for the most part, by importing the existing translations into TWN.

Update URL upon submitting a search query

So that the URL can be shared. Background: #13.

Ditch jQuery

It's overkill. IIRC only the ajax call is using jQuery to an appreciable amount. Try replacing it with vanilla JS. Potentially useful links:

Show automated summary from Reasonator when article doesn't have an intro section

Show loading text & graphic

Due to AJAX requests. Especially useful if on slow connections.

Don't show disambiguation pages when using random function

Which other pages should be hidden as well? Maybe lists?
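A possible approach, sketched under the assumption that the wiki marks disambiguation pages with a `disambiguation` page property (as the Disambiguator extension does on Wikipedia): request that property together with the random page, and skip pages that have it. Both function names are hypothetical.

```javascript
// Builds a request for one random main-namespace page, including the
// "disambiguation" page property when the page has it.
function buildRandomRequestUrl(apiUrl) {
    var params = new URLSearchParams({
        action: "query",
        generator: "random",
        grnnamespace: "0",
        prop: "pageprops",
        ppprop: "disambiguation",
        format: "json",
        origin: "*"
    });
    return apiUrl + "?" + params.toString();
}

// Takes one page object from response.query.pages and reports whether
// it is flagged as a disambiguation page.
function isDisambiguationPage(page) {
    return Boolean(page.pageprops) && "disambiguation" in page.pageprops;
}
```

When `isDisambiguationPage` returns true, the caller would simply request another random page.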

Use "search" or "page" URL parameters, depending on the intent

#21 and #34 implemented a way to control the interface of Primerpedia via a URL parameter, which is currently named "search".

Here's a use case where the current behavior may be confusing:

When I enter a search term, the search box keeps the entered text, but the URL shows the title of the page; if I then click the random link twice, the URL changes each time to reflect this, and the search box is emptied. All this is fine and expected.
But if I now click the back button on the browser, the search field gets filled with the value of the URL, which is redundant for pages loaded via the random function, and is incorrect for pages loaded via the search function (since the search terms typically do not match the title of the article exactly, even if only in capitalization).

I'm wondering if we shouldn't set the url parameter to search= and page=, respectively, in order to be able to make the interface behave as expected in both cases, i.e. only fill the search box automatically for pages loaded via search, and show the search terms rather than the actual article title.

Of course, if both parameters are present simultaneously, one must take precedence: we could either (1) fill the search terms back into the search input without triggering a search (and perform a search with the exact page title to load it), or (2) ignore the page title and execute a search with the provided terms, hoping that it results in the same page (which most of the time it probably will). For a URL like ...?search=united_states&page=canada, the first approach would show the Canada article, while the second approach would show the United States article.
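Approach (1) could look roughly like this. It is a sketch only; `resolveUrlParameters` and the shape of the returned object are made up for illustration.

```javascript
// Decides what to load and what to show in the search box,
// giving the exact page title precedence over the search terms.
function resolveUrlParameters(queryString) {
    var params = new URLSearchParams(queryString);
    var search = params.get("search");
    var page = params.get("page");
    return {
        // Text to restore into the search input (null leaves it empty).
        searchBoxText: search,
        // Title or terms to request: "page" wins when both are present.
        titleToLoad: page !== null ? page : search
    };
}
```

With `?search=united_states&page=canada`, the search box would show `united_states` while the Canada article is loaded. With only `page=` present (as after using the random function), the search box stays empty.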

Use + or _ instead of %20 in URL, for ease of sharing

(See #34 and #22 for context.)

Currently the URL is updated to reflect the article being shown, and any spaces in the title end up percent-encoded as %20. For example:

http://waldyrious.github.io/primerpedia/?search=Avatar%20(2009%20film)

It would be easier to read and share such URLs if the spaces were instead replaced by +, as is customary for form-submitted inputs, or _, which is what Wikipedia uses for article titles. So in the case above, that would be:

http://waldyrious.net/primerpedia/?search=Avatar_(2009_film)

Both URLs should work and show the title in the HTML with spaces, and in the URL with either _ or +, for both searches performed manually, and articles loaded using the random feature.
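The two conversions could be sketched as follows (function names are hypothetical). The decoder works on the raw parameter value taken from `location.search`, since `URLSearchParams` would already have turned `+` into a space.

```javascript
// Title -> URL value: spaces become underscores, as in Wikipedia URLs.
// encodeURIComponent leaves letters, digits, "_", "(" and ")" untouched.
function titleToUrlValue(title) {
    return encodeURIComponent(title.replace(/ /g, "_"));
}

// Raw URL value -> title: accepts "_", "+" and "%20" as spaces.
// The replacement runs before decoding so that an encoded plus (%2B) survives.
function urlValueToTitle(rawValue) {
    return decodeURIComponent(rawValue.replace(/[_+]/g, " "));
}
```

One caveat of this scheme is that a literal `+` in a title only round-trips when it is percent-encoded, which `titleToUrlValue` does.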

Fix the icons so the entire image is clickable

Probably making the icons display: inline-block will make the active (clickable) area cover the entire image.

However, the problem may be something else, since the active area seems to be shifted downwards rather than actually smaller than the image.

Speculative: aggregated stats

Example of possible stats:

Percentage of articles with descriptions at each level (as described in #19)
Appropriateness of the lead (as described in more detail in #51)
Comparison of first sentence with Wiktionary description and Wikidata description (see #31)

Translate the Primerpedia interface (i18n)

A simple approach could be to store a config file in this repo (as suggested in #38) for the strings that would need to be translated; this will probably be a quick way to getting a functional multi-language site off the ground.

Eventually, we will want to set up the translation on TranslateWiki, rather than rolling out our own translation system in this repo. That's tracked in #47.

(Edit - just fixed a broken link)

Support loading articles from other languages

~~If not the interface itself, at least the content.~~ edit: as discussed below, this issue will tackle only content multi-language support, and others will be opened to address the interface translation.

apiUrl and article.url need to be configurable.
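A small sketch of what that configuration might look like, assuming the standard `<language>.wikipedia.org` host layout. `getWikiConfig` and the returned property names are illustrative and do not reflect the existing code.

```javascript
// Derives the API endpoint and article URLs from a language code.
function getWikiConfig(language) {
    var base = "https://" + language + ".wikipedia.org";
    return {
        apiUrl: base + "/w/api.php",
        articleUrl: function (title) {
            return base + "/wiki/" + encodeURIComponent(title.replace(/ /g, "_"));
        }
    };
}

// Example: getWikiConfig("pt").apiUrl is the Portuguese Wikipedia endpoint.
```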

Make Primerpedia available on Wikimedia Toolforge

It appears to be possible to host Toolforge apps directly from a github repo -- see for example the OABot project (repository / live site).

Having Primerpedia on the Toolforge would increase its visibility and integrate it better with the Wikimedia ecosystem. (Note: it would continue to be available via the Github Pages hosting service, at https://waldyrious.github.io/primerpedia and http://waldyrious.net/primerpedia.)

New Toolforge domains

(The first one seems to have CSS issues)

Announcement post: New names for everyone!

Support browser history navigation

See http://onjava.com/pub/a/onjava/2005/10/26/ajax-handling-bookmarks-and-back-button.html?page=2 (there are probably more concise guides around, but this is a start)

Add permalink to current search query

Having a link visible in the actual page would make sharing the page more evident than simply copying the URL, which would be available once #22 is implemented.

Disable search button when search field is empty

Currently, clicking the search button with an empty search field results in a random page being loaded, which might be unexpected. It would be better to deactivate the search button whenever the search field is empty.
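A minimal sketch of that behavior. The element ids in the commented wiring are assumptions, not the ids used in the actual page.

```javascript
// Disables the button while the input is empty or contains only whitespace.
function updateSearchButton(searchInput, searchButton) {
    searchButton.disabled = searchInput.value.trim() === "";
}

// Possible wiring in the browser (ids are hypothetical):
// var input = document.getElementById("search-term");
// var button = document.getElementById("search-button");
// input.addEventListener("input", function () {
//     updateSearchButton(input, button);
// });
// updateSearchButton(input, button); // set the initial state on page load
```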

Replace gif loading icon with SVG one

This one, for instance, would be a great candidate, as it's freely licensed and the actual source code is very simple.

Add description of what Primerpedia is before any article is loaded

Allow login and editing directly from Primerpedia

Will need server-side component, see http://www.mediawiki.org/wiki/API:Data_formats#JSON_callback_restrictions (state-changing actions aren't possible when using JSON in callback mode)

Hide CC icon

Don't show it until an article is actually loaded. Also add to the tooltip that the link to the original article is on the title.

Alternatively, load a random article on load (this doesn't affect the tooltip enhancement)

cleanup text

remove infoboxes, parser errors (missing references, maybe others?) maintenance templates, etc.

Show links to other articles

...and make them point to primerpedia. Currently all links are stripped by the MobileFrontend API module. (I wonder if this behavior is customizable?)

URLs with spaces and parentheses not preserved when navigating the history

Noticed a bug where URLs with spaces and parentheses (e.g. “John_Bowman_(footballer)”) are not properly preserved when navigating the history back and forth (even though they do get filled into the search bar correctly!)

Add instructions for managing the Toolserver instance

Follow-up of #50. Mostly for self-reference.

Commands for restarting the web service if it's down
Commands for updating the static copy at https://tools-static.wmflabs.org/primerpedia/

Get rid of flowtype.js and use CSS instead

Currently we're using (a variant of) flowtype.js for dynamically adjusting text size and line spacing according to the container element's size. Modern browsers make this possible using only CSS (based on the available screen size), so it would be nice to get rid of the extra script.

Here's a very simple (2-line) implementation, which could be adapted for this project.

Provide an automated score of the lead content

The lead shouldn't be too long, nor too short; it should be readable, etc.

Such issues may serve as prompts for the user, with suggestions for editing the lead ~~(#3)~~.

The templates listed at Wikipedia:Template messages/Cleanup § Introduction may help in identifying articles with additional issues in the lead.

add link to edit original intro

?action=edit&section=0
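For illustration, the edit link could be derived from the article URL like this (`buildIntroEditUrl` is a hypothetical name). Using the `URL` class rather than plain concatenation keeps the result valid even if the article URL already carries a query string.

```javascript
// Adds action=edit&section=0 to an article URL, which opens the
// editor on the lead section only.
function buildIntroEditUrl(articleUrl) {
    var url = new URL(articleUrl);
    url.searchParams.set("action", "edit");
    url.searchParams.set("section", "0");
    return url.href;
}
```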

[upstream] implement extracts using Parsoid

The extracts are currently generated with the old API which doesn't produce valid xhtml5, so any malformed ajax request just won't show up in the content area.

Parsoid does produce xhtml5, so Extension:TextExtracts needs to be extended to support parsing with Parsoid rather than the old API. Trail of bug tracker links (updated as of 2017-11-20):

~~https://bugzilla.wikimedia.org/65169~~
original request: "TextExtracts: Support for Parsoid-based extracts"
~~https://phabricator.wikimedia.org/T67169~~
same issue, migrated to Phabricator
~~https://phabricator.wikimedia.org/T113094~~
"[EPIC] The Page Summary API needs to provide useful content for the majority of articles"
~~https://phabricator.wikimedia.org/T166272~~
sub-issue of the above. "HTML version of text extracts is not balanced/well formed and naive".
Closed, declined:

I have made the decision that this will not be fixed. T170617 will make sure that api consumers know about this problem. For those who want well formed HTML we will be providing a new service on RESTBase which will guarantee that (please follow along with T113094)

https://phabricator.wikimedia.org/T165017
AC (acceptance criteria?): "[...] The extract property is present and contains valid HTML."
Closed, resolved. Does this mean the functionality is now available?
https://phabricator.wikimedia.org/T177431
it looks like this is the task whose completion will enable well-formed HTML extracts (extract_html in the example output). Note: isn't that already available?

Handle articles with empty intros

e.g. http://en.wikipedia.org/wiki/1123_in_Ireland?oldid=516608391

A list of such articles is available at Category:Pages missing lead section.

Add a footer listing the license and the URL of the article

Suggested here by @hashar

Write a CONTRIBUTING.md file

Simple instructions about what's expected of contributors and of maintainers, mention of semantic newlines, and other items I might be overlooking at the moment.

Fetch description from Wikidata

Once the simpler variable content length feature is implemented (#19), it would be nice to compare the first sentence of the Wikipedia article with the Wikidata description, and eventually even allow editing them (#3)

The API calls to get the description from Wikidata should be pretty similar to the ones already used to fetch the article content, but I haven't tried them. From a quick look, I'd say that either wbsearchentities or wbgetentities actions are the ones we'd care about (there are examples at the bottom of both pages, as well as links to the live sandbox where experiments can be performed).
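An untested sketch of the `wbgetentities` route, looking up the Wikidata item by its Wikipedia sitelink and requesting only the descriptions. The function names are hypothetical, and the response shape assumed in `extractDescription` (`entities.<id>.descriptions.<language>.value`) should be checked in the API sandbox.

```javascript
// Builds a Wikidata API request for the description of the item linked
// to the given Wikipedia article.
function buildDescriptionRequestUrl(title, language) {
    var params = new URLSearchParams({
        action: "wbgetentities",
        sites: language + "wiki",  // e.g. "enwiki"
        titles: title,
        props: "descriptions",
        languages: language,
        format: "json",
        origin: "*"
    });
    return "https://www.wikidata.org/w/api.php?" + params.toString();
}

// Pulls the description out of a parsed response, or returns null
// when the item or the description in that language is missing.
function extractDescription(response, language) {
    var entities = response.entities || {};
    var ids = Object.keys(entities);
    for (var i = 0; i < ids.length; i++) {
        var descriptions = entities[ids[i]].descriptions;
        if (descriptions && descriptions[language]) {
            return descriptions[language].value;
        }
    }
    return null;
}
```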

get json through api
normalize searching datalist using javascript