
An OpenRefine extension that helps with Wikimedia Commons editing: start projects from Wikimedia Commons categories; Commons-specific GREL functions.

License: BSD 3-Clause "New" or "Revised" License

JavaScript 23.24% Java 66.92% HTML 2.97% Less 6.86%
java openrefine sdc wikimedia wikicommons extension

commonsextension's Introduction

Wikimedia Commons Extension for OpenRefine

This extension provides several helpful functionalities for OpenRefine users who want to edit (structured data of) media files (images, videos, PDFs...) on Wikimedia Commons. For more info, documentation and how-tos about OpenRefine for Wikimedia Commons, see https://commons.wikimedia.org/wiki/Commons:OpenRefine.

Features included in this extension:

  • Start an OpenRefine project by loading file names from one or more Wikimedia Commons categories (including category depth)
  • Add columns with Commons categories and/or M-ids of each file name
  • File names will already be reconciled when starting the project
  • A few dedicated GREL commands allow basic processing and extraction of Wikitext: extractFromTemplate and value.extractCategories
  • (In this extension's 0.1.1 release and later) Basic support for file thumbnail previews of existing Wikimedia Commons files. Thumbnails are displayed for some (but not all) file types/extensions. There is currently thumbnail support for jpeg, gif, png, djvu, pdf, svg, webm and ogv files.

It works with OpenRefine 3.6.x and later versions of OpenRefine. It is not compatible with OpenRefine 3.5.x or earlier. (OpenRefine supports editing Wikimedia Commons from version 3.6; this is not possible in earlier versions.)

This extension was first released in October 2022. It has been funded by a Wikimedia project grant.

How to use this extension

Install this extension in OpenRefine

Download the .zip file of the latest release of this extension. Unzip this file and place the unzipped folder in your OpenRefine extensions folder. Read more about installing extensions in OpenRefine's user manual.

When this extension is installed correctly, you will now see the additional option 'Wikimedia Commons' when starting a new project in OpenRefine.

Start an OpenRefine project from one or more Wikimedia Commons categories

After installing this extension, click the 'Wikimedia Commons' option to start a new project in OpenRefine. You will be prompted to add one or more Wikimedia Commons categories.

There's no need to type the Category: prefix.

You can specify category depth by typing or selecting a number in the input field after each category. Depth 0 means only files from the current category level; depth 1 will retrieve files from one sub-category level down, etc.

Next, in the project preview screen (Configure parsing options), you can choose to also include a column with each file's M-id (unique MediaInfo identifier) and/or Commons categories.

File names will already be reconciled when your project starts.

When you load larger categories (thousands of files) in a new project, OpenRefine will start slowly and will give you a memory warning. This is a known issue. Wait for a bit; the project will eventually start. The Commons Extension has been tested with a project of more than 450,000 files.

GREL commands to extract data from Wikitext

The Wikimedia Commons Extension also enables two dedicated GREL commands, which help to extract specific information from the Wikitext of Wikimedia Commons files. (GREL, General Refine Expression Language, is a dedicated scripting language used in OpenRefine for many flexible data operations. For a general reference on using GREL in OpenRefine, see https://docs.openrefine.org/manual/grelfunctions.)

Firstly, retrieve the Wikitext from a list of Commons files in your project. In the column menu of the reconciled file names' column, select Edit column > Add column from reconciled values... and select Wikitext in the resulting dialog window.

From this new column with Wikitext, you can now extract values and categories as described below. Start by selecting Edit column > Add column based on this column... in the column menu. In the next dialog window, you can use various specific GREL commands:

Extract values from template parameters: extractFromTemplate

Use the following syntax:

extractFromTemplate(value, "BHL", "source")[0]

where you replace BHL with the name of the template (without curly brackets) and source with the parameter from which you want to extract the value. This GREL syntax will return the first (and usually the only) value of said parameter, e.g. https://www.flickr.com/photos/biodivlibrary/10329116385.

Extract Wikimedia Commons categories: value.extractCategories

Use the following syntax:

value.extractCategories().join('#')

This GREL syntax will return all categories mentioned in the Wikitext, separated by the # character, which you can then use to split the resulting cell further as needed.
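To illustrate what category extraction involves, here is a minimal Java sketch that pulls category names out of Wikitext with a regular expression. It is not the extension's actual implementation: the class name CategorySketch and the regular expression are assumptions made for this example, and the sketch only handles plain [[Category:...]] links, with or without a sort key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CategorySketch {
    // Matches [[Category:Name]] and [[Category:Name|sort key]]; group 1 is the name.
    // Hypothetical pattern written for this example only.
    private static final Pattern CATEGORY_LINK = Pattern.compile(
            "\\[\\[\\s*Category\\s*:\\s*([^\\]|]+?)\\s*(?:\\|[^\\]]*)?\\]\\]",
            Pattern.CASE_INSENSITIVE);

    /** Returns the category names linked in the wikitext, in order of appearance. */
    static List<String> extractCategories(String wikitext) {
        List<String> names = new ArrayList<>();
        Matcher m = CATEGORY_LINK.matcher(wikitext);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String wikitext = "{{PD-USGov-NASA}}\n"
                + "[[Category:ISS Expedition 30 Crew Earth Observations (dump)|226922]]\n"
                + "[[Category:Taken with Nikon D3s]]";
        // prints: ISS Expedition 30 Crew Earth Observations (dump)#Taken with Nikon D3s
        System.out.println(String.join("#", extractCategories(wikitext)));
    }
}
```

Joining the list with # mirrors the join('#') call in the GREL expression above; the sort key after the | character is dropped.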

Development

Building from source

Run

mvn package

This creates a zip file in the target folder, which can then be installed in OpenRefine.

Developing it

To avoid having to unzip the extension into the extensions directory every time you want to test it, you can use a different setup: create a symbolic link from your OpenRefine extensions folder to the local copy of this repository. With this setup, you do not need to run mvn package when making changes to the extension, but you will still need to compile it with mvn compile if you change Java files, and restart OpenRefine if you make changes to any files.

Releasing it

  • Make sure you are on the master branch and it is up to date (git pull)
  • Open pom.xml and set the version to the desired version number, such as <version>0.1.0</version>
  • Commit and push those changes
  • Add a corresponding git tag, with git tag -a v0.1.0 -m "Version 0.1.0" (when working from GitHub Desktop, you can follow this process and manually add the v0.1.0 and Version 0.1.0 tags)
  • Push the tag to GitHub: git push --tags (in GitHub Desktop, just push again)
  • Create the zip file for the release: mvn package
  • Create a new release on GitHub at https://github.com/OpenRefine/CommonsExtension/releases/new, providing a release title (such as "Commons extension 0.1.0") and a description of the features in this release. Upload the zip file you generated at the previous step as an attachment (it can be found in the target subfolder of your local copy of the repository).
  • Open pom.xml and set the version to the expected next version number, followed by -SNAPSHOT. For instance, if you just released 0.1.0, you could set <version>0.1.1-SNAPSHOT</version>
  • Commit and push those changes.

commonsextension's People

Contributors

antoine2711, j-sal, trnstlntk, wetneb

commonsextension's Issues

New structure for category fetching

To simplify our code, I would propose the following architecture for the category fetching, based on Java iterators. We would need the following classes:

  • A class (say FileRecord) which would essentially represent the contents of a record in the project (although it would not yet be formatted as a list of rows). It would contain the attributes:

    • a file name
    • its corresponding mid
    • the list of categories it belongs to
  • A class where the constructor takes a single category name as parameter, and implements the Iterator<FileRecord> interface: it iterates over the file names contained in that category. In each FileRecord the categories would be left empty as a first step. So really the only task of this class would be to make the HTTP requests to the Commons API with the appropriate paging.

  • A class which takes an Iterator<FileRecord> (an iterator over file names) as parameter, and implements Iterator<FileRecord> again: its task would be to fetch the categories each file belongs to, and store them in each FileRecord.

  • A class which takes an Iterator<FileRecord> and implements TableDataReader. Its task would be to convert each FileRecord to one or more rows (by spreading the categories down on blank rows as we are currently trying to do)

With all those building blocks, you could then combine them (chain them) all together into the importer.
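The chaining described above could look roughly like the following Java sketch. It is only an illustration under several assumptions: the names IteratorChainSketch, CategoryAddingIterator and toRows are invented here, the category lookup is a plain Map standing in for the HTTP requests to the Commons API, and rows are returned as string arrays rather than through OpenRefine's TableDataReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class IteratorChainSketch {

    /** One record: a file name, its M-id, and the categories it belongs to. */
    static final class FileRecord {
        final String fileName;
        final String mid;
        final List<String> categories;

        FileRecord(String fileName, String mid, List<String> categories) {
            this.fileName = fileName;
            this.mid = mid;
            this.categories = categories;
        }
    }

    /** Wraps an iterator of records and fills in the categories of each record. */
    static final class CategoryAddingIterator implements Iterator<FileRecord> {
        private final Iterator<FileRecord> source;
        // Hypothetical stand-in for a call to the Commons API.
        private final Map<String, List<String>> lookup;

        CategoryAddingIterator(Iterator<FileRecord> source, Map<String, List<String>> lookup) {
            this.source = source;
            this.lookup = lookup;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public FileRecord next() {
            FileRecord r = source.next();
            return new FileRecord(r.fileName, r.mid, lookup.getOrDefault(r.fileName, List.of()));
        }
    }

    /** Spreads each record over rows: file name and M-id appear on the first row only. */
    static List<String[]> toRows(Iterator<FileRecord> records) {
        List<String[]> rows = new ArrayList<>();
        while (records.hasNext()) {
            FileRecord r = records.next();
            if (r.categories.isEmpty()) {
                rows.add(new String[] {r.fileName, r.mid, null});
            }
            for (int i = 0; i < r.categories.size(); i++) {
                rows.add(new String[] {
                        i == 0 ? r.fileName : null, i == 0 ? r.mid : null, r.categories.get(i)});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real first stage would page through the API instead.
        Iterator<FileRecord> files = List.of(
                new FileRecord("File:A.jpg", "M1", List.of()),
                new FileRecord("File:B.jpg", "M2", List.of())).iterator();
        Map<String, List<String>> lookup = Map.of("File:A.jpg", List.of("Cats", "Pets"));
        for (String[] row : toRows(new CategoryAddingIterator(files, lookup))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because each stage only consumes an Iterator<FileRecord>, stages can be tested in isolation with in-memory data, which is the main appeal of the proposed structure.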

Implement findTemplateValues function

We could have a findTemplateValues function (name to be improved) which would work like this:

  • first argument, mandatory: the wikitext to parse
  • second argument, mandatory: the name of the template to look for in the wikitext
  • third argument, mandatory too: the name of the template parameter to extract

It would return the list of values of the given parameter in the given template.

For instance, calling findTemplateValues(value, "foo", "bar") on a cell containing the following value:

{{some template|bar=test}}
{{foo|bar={{other template}}}}
{{foo| foo = not important| bar = second value }}

should return
["{{other template}}", "second value" ].

Extensive documentation about templates in Wikitext can be found here: https://en.wikipedia.org/wiki/Help:Template (but that is probably much more than you need)
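As a starting point for discussion, here is a rough Java sketch of the behaviour described above. It is an assumption-laden illustration, not the extension's implementation: the class name TemplateValuesSketch and the helper methods are invented for this example, template names are compared exactly after trimming, and constructs such as triple-brace parameters or nowiki sections are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateValuesSketch {

    /** Returns the values of parameter `param` in every {{template|...}} call in the wikitext. */
    static List<String> findTemplateValues(String wikitext, String template, String param) {
        List<String> out = new ArrayList<>();
        collect(wikitext, template.trim(), param.trim(), out);
        return out;
    }

    private static void collect(String text, String template, String param, List<String> out) {
        int i = 0;
        while ((i = text.indexOf("{{", i)) >= 0) {
            int end = matchingClose(text, i);
            if (end < 0) {
                return; // unbalanced braces: stop rather than guess
            }
            String inner = text.substring(i + 2, end);
            List<String> parts = splitTopLevel(inner, '|');
            if (parts.get(0).trim().equals(template)) {
                for (String part : parts.subList(1, parts.size())) {
                    List<String> kv = splitTopLevel(part, '=');
                    if (kv.size() > 1 && kv.get(0).trim().equals(param)) {
                        out.add(part.substring(kv.get(0).length() + 1).trim());
                    }
                }
            }
            collect(inner, template, param, out); // also look inside nested templates
            i = end + 2;
        }
    }

    /** Index of the "}}" that closes the "{{" starting at `open`, or -1 if there is none. */
    private static int matchingClose(String text, int open) {
        int depth = 0;
        int i = open;
        while (i < text.length() - 1) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (text.startsWith("}}", i)) {
                depth--;
                if (depth == 0) {
                    return i;
                }
                i += 2;
            } else {
                i++;
            }
        }
        return -1;
    }

    /** Splits on `sep`, ignoring separators nested inside {{...}} or [[...]]. */
    private static List<String> splitTopLevel(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.startsWith("{{", i) || s.startsWith("[[", i)) {
                depth++;
                i++;
            } else if (s.startsWith("}}", i) || s.startsWith("]]", i)) {
                depth--;
                i++;
            } else if (s.charAt(i) == sep && depth == 0) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String wikitext = "{{some template|bar=test}}\n"
                + "{{foo|bar={{other template}}}}\n"
                + "{{foo| foo = not important| bar = second value }}";
        // prints: [{{other template}}, second value]
        System.out.println(findTemplateValues(wikitext, "foo", "bar"));
    }
}
```

Tracking brace depth, rather than using a single regular expression, is what lets the nested {{other template}} survive as a value instead of being cut at its first closing braces.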

extractCategories fails on some example wikitext

Running the value.extractCategories() expression on the following cell value should give some categories as output, but it returns the empty list:

== {{int:filedesc}} ==
{{Information
|Description={{en|1=View of Earth taken during ISS Expedition 30.}}
|Source=[https://eol.jsc.nasa.gov/SearchPhotos/photo.pl?mission=ISS030&roll=E&frame=226922 JSC Gateway to Astronaut Photography of Earth]
|Date=2012-04-12 05:20:47
|Author=Earth Science and Remote Sensing Unit, NASA Johnson Space Center
|Permission=
|other_versions=
|other_fields=
{{InFi|name=Sun Azimuth|value=33°}}
{{InFi|name=Sun Elevatation|value=-34°}}
{{InFi|name=Altitude|value={{convert|211|nmi|km}}}}
{{InFi|name=Mission|value=ISS030}}
{{InFi|name=Roll|value=E}}
{{InFi|name=Frame|value=226922}}
{{InFi|name=Camera|value=NIKON D3S S/N: 2008336}}
{{InFi|name=Focal length|value=28 mm}}
}}

{{location}}

{{NASA-image|id=ISS030-E-226922|center=JSC}}
== {{int:license-header}} ==
{{PD-USGov-NASA}}

[[Category:ISS Expedition 30 Crew Earth Observations (dump)|226922]]
[[Category:Taken with Nikon D3s]]

I would be curious to know why, and if it can be fixed :)

Implement Commons categories (+ category depth) as 'starting point' for new OpenRefine projects

(This first was an issue in the OpenRefine/OpenRefine repository, but we have decided to implement this as part of the Commons extension.)

When editing batches of Wikimedia Commons files, regular Commons and GLAM contributors will typically take one or more Wikimedia Commons categories as input or 'starting point' in various tools. Examples include AC/DC, the ISA Tool, and VisualFileChange. It would be great if that would be possible in OpenRefine too (rather than asking users to start with a list of file names).

Alternatives considered

In the earlier integration scenarios, we kind of assumed that users would start off with lists of file names.

However, this does produce some extra hurdles to users (which may be especially difficult and annoying to newcomers). They would have to use other tools in order to get a list of file names on Wikimedia Commons, which complicates the workflow.

Typically, users would resort to

These tools are not rocket science, but they're not super intuitive either. It's absolutely possible to help users figure this out through documentation and by providing sample queries that are very easy to modify. That said, the experience will be smoother and less frustrating if the Commons Extension supports direct use of Commons categories.

Additional context

Typically, Wikimedia tools allow refined interaction with Commons categories, which we want to facilitate too:

  • Choosing/selecting categories at once
  • Specifying subcategory depth for each individual category

Various screenshots of how other Wikimedia tools do this:

The ISA Tool

image

PetScan

image
(link to this example)

Thumbnail previews of media files from URL, to be uploaded to a Wikibase (or Wikimedia Commons)

Scenario: User wants to use OpenRefine to upload new files to Wikimedia Commons, from the web. They select a set of URLs (or start a project from API, or similar).

We want to give users the option to toggle previews (thumbnails) of the media files, and the option to click the thumbnails to enlarge them. Such preview thumbnails are helpful during editing (e.g. to check if a certain thing is indeed depicted in a file, without having to preview the file via another application on the local drive itself).

Wireframes for this feature have been drawn by @lozanaross for another scenario (show thumbnails of files already on Wikimedia Commons), but I think the basic UX behavior is the same? OpenRefine/OpenRefine#5154 provides the technical basis for making this feature possible.

Most recent wireframes I found (v4, development version; shows files on Commons):

image

Issue reporting in Wikibase edit schema, focused on the Wikimedia Commons use case

When using OpenRefine's schema editor to prepare edits and uploads to Wikimedia Commons, users should get clear feedback (in the 'issues' tab) about what may go wrong in their edit/upload batches.

This feedback will in some cases be different from the generic feedback for any Wikibase (i.e. not just constraint related) and specific to Wikimedia Commons: think about feedback that concerns duplicate file names, file names with invalid characters...

  • @trnstlntk to write specifications for this (what potential problems do we need to cover here; draft copy for clear error messages?..) - deadline May 31, 2022
  • @wetneb to build this - deadline June 30, 2022

Build template support (minimum version) - w/o wikitext generation

From the roadmap documentation: this relates to building support for "schemas with holes" in OpenRefine (current schemas include the values, but we also need an option to preserve only the shape of a schema, without the metadata values). Schemas will be saved as JSON files in the extension repo, and users can potentially contribute templates by creating a pull request and following a review process towards merging.

Thumbnail previews of media files available on Wikimedia Commons

Scenario: user wants to use OpenRefine to add structured data to existing files from Wikimedia Commons. They load a series of file paths from Wikimedia Commons and reconcile them with Wikimedia Commons.

We want to give users the option to toggle previews (thumbnails) of the media files, and the option to click the thumbnails to enlarge them. Such preview thumbnails are helpful during editing (e.g. to check if a certain thing is indeed depicted in a file, without having to click through to the file page on Commons).

Wireframes for this feature have been drawn by @lozanaross and OpenRefine/OpenRefine#5154 provides the technical basis for making this feature possible.

Most recent wireframes I found (v4, development version):

image

Display entire Commons category names in category autosuggestion

When I want to select a Commons category with a long name in the Commons Extension category selector screen, and there are multiple categories whose names start the same way, it's currently not possible to distinguish between them during autosuggest. See screenshot:

image

In this example, we have for instance Category:Sculptures in Rotterdam-Noord / Category:Sculptures in Rotterdam-Zuid etc etc. It would be good to redesign the dropdown a bit so that the full category names become visible (even though they may sometimes be really long).
In the screenshot above, the category names (truncated) are actually repeated (bigger bold and smaller greyish), which may replicate other design patterns in OpenRefine (?) but perhaps it makes more sense to display the category name only once, but then fully.

For comparison, this is what autosuggest looks like in Wikimedia Commons' own search box
image

Thumbnail previews of media files from local drive, to be uploaded to a Wikibase (or Wikimedia Commons)

Scenario: User wants to use OpenRefine to upload new files to Wikimedia Commons from their hard drive. They select a set of files from that drive.

We want to give users the option to toggle previews (thumbnails) of the media files, and the option to click the thumbnails to enlarge them. Such preview thumbnails are helpful during editing (e.g. to check if a certain thing is indeed depicted in a file, without having to preview the file via another application on the local drive itself).

Wireframes for this feature have been drawn by @lozanaross for another scenario (show thumbnails of files already on Wikimedia Commons), but I think the basic UX behavior is the same? OpenRefine/OpenRefine#5154 provides the technical basis for making this feature possible. In the backend

Most recent wireframes I found (v4, development version; shows files on Commons):

image

Feature request: simple image upload from one or multiple Flickr URL(s)

Many Wikimedians upload (appropriately licensed) images from Flickr to Wikimedia Commons.

Upload from Flickr is supported by the (default) Wikimedia Commons UploadWizard (up to 500 files at once): https://commons.wikimedia.org/wiki/Special:UploadWizard
image
Which is pretty barebones (it literally takes the photo description as on Flickr and then still needs manual input for license and other metadata).
image

Another frequently used tool is Flickr2Commons by Magnus Manske: https://flickr2commons.toolforge.org/#/
image
It offers a bit more flexibility and intelligence, but still uploads the descriptions in a quite barebones way.

Advantages of Flickr integration in OpenRefine would include:

  • Ability to parse and reconcile elements from the Flickr file descriptions
  • Addition of (more) refined and diverse structured data, file names, and Wikitext to the files upon upload

I am posting this after receiving a request for test uploading from the very awesome Biodiversity Heritage Library, which stores many files on Flickr and for whom such a transfer functionality (including the more advanced features that OpenRefine could offer) would be very helpful. Here's one example of an album in their vast Flickr repository. I can imagine more GLAMs are in this situation.

We can't do this before the October 2022 Wikimedia grant deadline, but it's something to keep an eye on for future development. It would be good to involve the Wikimedia Commons / OpenRefine user community to help us prioritize this feature request.

GREL function for {{Creator:Peter Paul Rubens}} {{Institution:Rijksmuseum}} style templates

Wikimedia Commons Wikitext also often contains templates that are formed as {{Creator:Peter Paul Rubens}} or {{Institution:Rijksmuseum}} (note the : instead of a | character). Because such templates are very prevalent (the {{Creator}} and {{Institution}} one are used on millions of files) it would be great to be able to have a dedicated GREL syntax/function for these. See https://commons.wikimedia.org/wiki/Template:Creator and https://commons.wikimedia.org/wiki/Template:Institution for a bit more info about both templates.

In the {{Creator:Peter Paul Rubens}} and {{Institution:Rijksmuseum}} examples respectively, the OpenRefine end user will want to extract Peter Paul Rubens and Rijksmuseum.

This file has both templates, as just one example.
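One possible shape for such a function is sketched below in Java. This is only a sketch under assumptions: the class name PrefixedTemplateSketch, the method name extractPrefixed and the regular expression are invented for this example and do not exist in the extension, and calls containing nested templates are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixedTemplateSketch {

    /**
     * Returns the part after "prefix:" for every {{prefix:Name}} call in the wikitext.
     * Any parameters after a | character are ignored.
     */
    static List<String> extractPrefixed(String wikitext, String prefix) {
        // Hypothetical pattern: {{ prefix : name [| anything without braces] }}
        Pattern call = Pattern.compile(
                "\\{\\{\\s*" + Pattern.quote(prefix) + "\\s*:\\s*([^{}|]+?)\\s*(?:\\|[^{}]*)?\\}\\}");
        List<String> names = new ArrayList<>();
        Matcher m = call.matcher(wikitext);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String wikitext = "|Author={{Creator:Peter Paul Rubens}}\n"
                + "|Institution={{Institution:Rijksmuseum}}";
        System.out.println(extractPrefixed(wikitext, "Creator"));     // prints: [Peter Paul Rubens]
        System.out.println(extractPrefixed(wikitext, "Institution")); // prints: [Rijksmuseum]
    }
}
```

Requiring the opening braces directly before the prefix keeps a parameter name such as |Institution= from being mistaken for a template call.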

Depth support for category fetching

We want to support fetching subcategories recursively up to a given depth, as other tools such as PetScan do.

Here is a proposed architecture for this.

/**
 * Fetches a category recursively, up to the given depth, from the MediaWiki API.
 * The stream of FileRecords contains the filenames and mids, but not the related
 * categories (which must be fetched separately).
 * Set the depth to 0 to ignore subcategories.
 */
static Iterator<FileRecord> listCategoryMembers(String endpoint, String categoryName, int depth) {
    // TODO
}

/**
 * Fetches the direct subcategories of a given category, from the MediaWiki API.
 * The supplied stream contains category names (TBD: with or without the `Category:` prefix?).
 */
static Iterator<String> fetchSubcategories(String endpoint, String categoryName) {
    // TODO
}

/**
 * Fetches the files which are direct members of a given category, from the MediaWiki API.
 * The stream of FileRecords contains the filenames and mids, but not the related
 * categories (which must be fetched separately).
 */
static Iterator<FileRecord> fetchDirectFileMembers(String endpoint, String categoryName) {
    // TODO
}

/**
 * Internal function used to iterate over the paginated results of the MediaWiki API
 * when fetching files or categories. This function is used both by fetchSubcategories and
 * by fetchDirectFileMembers.
 * The `subcategories` parameter can be set to true to fetch categories and false to fetch files
 */
static Iterator<JsonNode> fetchCategoryMembers(String endpoint, String categoryName, boolean subcategories) {
    // TODO
}

To migrate to this architecture, I propose the following steps:

  • the current FileFetcher class is adapted to implement Iterator<JsonNode> instead of Iterator<FileRecord>: it is no longer responsible for parsing each JSON result into a FileRecord. Furthermore, the FileFetcher constructor takes a new boolean parameter indicating whether it should fetch files or subcategories (it cannot do both).
  • the static method fetchCategoryMembers is a simple wrapper on top of FileFetcher
  • the removed parsing code is moved into fetchDirectFileMembers, which converts the Iterator<JsonNode> to an Iterator<FileRecord> by parsing each result
  • similarly, fetchSubcategories parses each result, but extracts only the category names, without pageids
  • finally, the listCategoryMembers method combines fetchSubcategories and fetchDirectFileMembers in a recursive algorithm which traverses categories up to a certain depth.
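The depth-limited traversal could be sketched as follows. Several things here are assumptions made for illustration: the two Maps stand in for fetchSubcategories and fetchDirectFileMembers (no API calls are made), file names are plain strings rather than FileRecords, and the category and file names are made up. The sketch also walks the tree level by level instead of recursing, a deliberate swap so that the visited set (needed because category graphs can contain cycles) always records a category at its shallowest depth.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DepthSketch {
    // Hypothetical in-memory stand-in for fetchSubcategories; note the artificial cycle.
    static final Map<String, List<String>> SUBCATEGORIES = Map.of(
            "Root", List.of("Child"),
            "Child", List.of("Grandchild"),
            "Grandchild", List.of("Root"));

    // Hypothetical in-memory stand-in for fetchDirectFileMembers.
    static final Map<String, List<String>> FILES = Map.of(
            "Root", List.of("File:Root.jpg"),
            "Child", List.of("File:Child.jpg"),
            "Grandchild", List.of("File:Grandchild.jpg"));

    /** Lists the files in `root` and in its subcategories, down to `depth` levels. */
    static List<String> listCategoryMembers(String root, int depth) {
        List<String> files = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(root));
        List<String> level = List.of(root);
        for (int d = 0; d <= depth && !level.isEmpty(); d++) {
            List<String> next = new ArrayList<>();
            for (String category : level) {
                files.addAll(FILES.getOrDefault(category, List.of()));
                if (d == depth) {
                    continue; // deepest level reached: do not look up subcategories
                }
                for (String sub : SUBCATEGORIES.getOrDefault(category, List.of())) {
                    if (seen.add(sub)) { // skip categories already visited
                        next.add(sub);
                    }
                }
            }
            level = next;
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(listCategoryMembers("Root", 0)); // prints: [File:Root.jpg]
        System.out.println(listCategoryMembers("Root", 1)); // prints: [File:Root.jpg, File:Child.jpg]
    }
}
```

A file that sits in several of the visited categories would appear more than once in this sketch; deduplication is left out to keep the example short.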

User testing in September

This issue serves as a deadline to complete and test a number of the added-value features needed to fulfil the additional WMF grant with an October deadline.

Onboarding - UI steps / documentation

Depending on how the rest of the development process goes over the summer, this issue will be reviewed again to determine whether it involves interventions at the UI level (e.g. a guided tour for new users via strategic pop-ups around the UI), or is purely a documentation task (e.g. video tutorials and written instructions).

Design a UI with which CommonsExtension users interact with Wikitext-specific GREL commands

@j-sal is building various pieces of very helpful GREL syntax that will make it easier for users of OpenRefine's Commons Extension to parse and process Wikitext.

It would be good to have a dedicated UI with which users can choose and test these various GREL commands; ideally without even having to know/memorize them or look them up. It would also be good to have some preview functionality of what each operation will do.

For inspiration / input for this task:

Register `ExtractFromTemplate()` in `controller.js`

New functions added to the Commons extension, such as the extractFromTemplate() GREL function for retrieving Template Values as specified in #2, need to be registered in the extension's controller.js module so that they are visible in OpenRefine.

Support for extracting positional template parameters

Sometimes we want to extract the values of template parameters which are positional (just designated by positions, not parameter names).

For instance:
{{location|60.165787|24.9460811}}
(taken from Sandra's parsing wishlist).

We could want to extract the first and second values in the template. One possible way would be to extend the functionality of the findTemplateValues() function that we already have, so that it accepts a number as third argument, like this:

findTemplateValues(wikitext, 'location', 1)

On the sample wikitext above, it would return ['60.165787'].

When running findTemplateValues(wikitext, 'location', 2) you would get ['24.9460811'].
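A minimal Java sketch of positional extraction is given below. It is an illustration only: the class and method names (PositionalSketch, findPositionalValues) are invented here, positions are 1-based as in the examples above, any parameter containing = is treated as named and skipped, and only flat template calls without nested braces are matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositionalSketch {

    /** Returns the value at the given 1-based position for every flat {{template|...}} call. */
    static List<String> findPositionalValues(String wikitext, String template, int position) {
        List<String> out = new ArrayList<>();
        // Only flat calls are handled: the parameter list may not contain braces.
        Pattern call = Pattern.compile(
                "\\{\\{\\s*" + Pattern.quote(template) + "\\s*\\|([^{}]*)\\}\\}");
        Matcher m = call.matcher(wikitext);
        while (m.find()) {
            int seen = 0;
            for (String part : m.group(1).split("\\|", -1)) {
                if (part.contains("=")) {
                    continue; // named parameter, does not count as positional
                }
                if (++seen == position) {
                    out.add(part.trim());
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String wikitext = "{{location|60.165787|24.9460811}}";
        System.out.println(findPositionalValues(wikitext, "location", 1)); // prints: [60.165787]
        System.out.println(findPositionalValues(wikitext, "location", 2)); // prints: [24.9460811]
    }
}
```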

Filenames already reconciled when starting an OpenRefine project from Commons categories

When users start an OpenRefine project using the Commons categories feature, it would be great if the file name column would already be reconciled against Wikimedia Commons.

Currently the project loads like this (checked M-id and categories options):
image
Where the user still needs to manually reconcile the file names.

It would be great if a project would start with the file names already reconciled like this:
image

Review interface copy and tooltips in @Lozanaross wireframes dd Apr 26, 2022

During an OpenRefine/SDC workshop at De Krook in Ghent, April 26, 2022, @lozanaross presented a new version of Wikimedia Commons wireframes: https://xd.adobe.com/view/fdf5a12c-9c30-4449-9eba-d1ea8523dddb-a8a6/

Several of these wireframes need a review by @trnstlntk

  • General review of interface copy that is specific to Wikimedia Commons

  • Create draft texts for tooltips - (i) icons which are newly introduced here

  • For tooltips for structured data statements in the Schema dialog: investigate whether these are generically applicable to Commons or Wikidata (and hence be a Wikidata statement) or are really OR-specific (and hence need to be maintained outside Wikidata)

Make usage of 'category depth' input field a bit clearer by greying out (until category is entered) and perhaps by integrating a counter

To make the function of the 'category depth' input box a bit clearer, @lozanaross suggests it should be greyed out until the user picks a category, then it becomes white.

Additionally, it could include a counter (currently, users do not understand that they are supposed to type a number). If fetching only the current category means depth=0, perhaps the input box can be prefilled with a 0 by default, and when clicking the user gets a dropdown with more numbers to select?

Allow people to remove Commons categories from the selected list, while listing desired categories to retrieve files from

When I start the Commons extension and list the Commons categories I want to retrieve files from, I sometimes make mistakes. I may have listed several categories (and their depth) but then want to remove one or more of them again (because I changed my mind, I mistyped, etc.).

E.g. in the example below:
image
Oops! I absolutely do NOT want to load files from the Category:Sculptures in Amsterdam. Why would I? Sculptures in Rotterdam and Delft are way cooler.

It would be great to add the option to remove categories here, by adding a cross at the end of each line (behind the category depth field). If the user clicks that cross, the line will be deleted and files from the category will not be retrieved anymore.

We already use such a 'removing cross' in various other places in OpenRefine's interface, e.g. in the selection of (and removal of) reconciliation services:
image

Warn user that they are working with one or more file name(s) that already exist(s) on Wikimedia Commons

File names on Wikimedia Commons must be unique (two files can't have the same name).

The default Wikimedia Commons UploadWizard warns the user when they are naming a file the same as one that already exists.

Image

It would be great if OpenRefine also warns uploaders of new Commons files if this happens. I can imagine this will be part of the 'Issues' tab when creating a schema for uploading files to Commons, see #22

Build the UI to pick a template

This relates specifically to the possibility to select templates from a dropdown, as well as the possibility to save them (see the latest version of the wireframes below):

Image

Integrate thumbnail previews during the Wikimedia Commons batch file upload process

See https://commons.wikimedia.org/wiki/Commons_talk:OpenRefine

Request received in a conversation in the Wikimedia Commons Telegram channel. During a batch upload process of media files, it is extremely helpful if one can (easily) see thumbnail previews of the media files that are being uploaded.

Some existing Wikimedia Commons (batch) upload tools support this indeed (the default UploadWizard, for instance), others don't (Pattypan only shows previews of files and their infoboxes during the checking phase of the upload process, after all data has been prepared already).

OpenRefine is essentially a data-centric tool, so this may be a stretch, but it's good to have this request on the radar, as it makes a lot of sense IMO.

Support the specific OpenRefine/SDC upload workflow from a IIIF endpoint

When talking to potential users of the Structured Data on Commons (SDC) batch upload functionalities for OpenRefine, we hear a lot about the use case of IIIF endpoints.

IIIF is the International Image Interoperability Framework. According to the framework's website it is "a set of open standards for delivering high-quality, attributed digital objects online at scale. It’s also an international community developing and implementing the IIIF APIs. IIIF is backed by a consortium of leading cultural institutions."

Many cultural institutions around the world present their files through a IIIF endpoint. This is indeed a sector-wide API standard.

Many IIIF endpoint managers are, or may be, interested in uploading files to Wikimedia Commons using this specific set of APIs.

  1. In any endpoint, the source files to be uploaded to Wikimedia Commons can be called upon in a specific standardized way.
  2. Metadata about the files (if present) can also be called upon in the same kind of standardized way.

OpenRefine users can use both of these specific API calls, during project creation and while wrangling data inside OpenRefine. But that's advanced stuff, and we can make that process easier.

We can tackle this in various ways:

  1. Lightweight, documentation-focused approach: we don't build specific features for IIIF users but we document the process well for them;
  2. And/or (perhaps at a later stage, if we see a lot of interest in this) we indeed create a specific IIIF-focused feature or wizard, probably to be used during project creation.

Rename 'Include nested category levels:' to 'Subcategory depth:'

During user testing, @lozanaross asked users (without giving them instructions) to guess what the subcategory depth input field meant. Many people thought it was a checkbox and most couldn't guess what it was supposed to do (especially people unfamiliar with other tools to work with Wikimedia Commons categories).

image

We suggest renaming the Include nested category levels: text in the interface to Subcategory depth:, which is a bit shorter and makes it clearer that a number needs to be entered in the input field.

User testing in June

This issue serves as a deadline to complete and test a number of the minimum requirement issues to fulfil the original WMF grant with what we promised for a June deadline.

Make it possible to Import IIIF collections

IIIF and the IIIF Presentation API are used by many GLAM institutions, and the ability to import records from IIIF Collections would greatly help reusers who wish to clean GLAM data, as well as users of the Commons extension.

Proposed solution

Given the collection root URL, an importer would traverse its content and fetch data from the various IIIF manifests in it.

Additional context

Category autocompletion while entering Commons categories

Refinement / specific sub-feature for #3.

Many Wikimedia Commons tools/interfaces allow users to enter / work with Commons categories. Usually, these tools or interfaces offer autocompletion of names of Commons categories. We will make Wikimedia Commons users in OpenRefine quite happy if the Commons Extension also offers this functionality!

Just for inspiration, showing what this looks like in various tools.

image
In the Wikimedia Commons UploadWizard

image
In the HotCat gadget (note the blue checkmark that appears when the user has selected a correct category name)

image
In the ISA Tool (the user types something without Category: but Category: is then displayed)
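
One possible way to back such an autocomplete field is the MediaWiki opensearch API module, restricted to the Category namespace (number 14). The sketch below is in Python for brevity and is not the extension's implementation; the function names are hypothetical. It only builds the request URL and post-processes a response, and leaves out the HTTP call itself.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
CATEGORY_NAMESPACE = 14  # MediaWiki namespace number for "Category:"
CATEGORY_PREFIX = "Category:"


def category_suggest_url(typed_text: str, limit: int = 10) -> str:
    """Build an opensearch request URL for category names starting with typed_text."""
    params = {
        "action": "opensearch",
        "search": typed_text,
        "namespace": CATEGORY_NAMESPACE,
        "limit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def suggestions_from_response(response: list) -> list:
    """Extract display names from an opensearch response.

    The response has the shape [query, [titles], [descriptions], [urls]].
    The "Category:" prefix is stripped so that users can type and read bare names.
    """
    titles = response[1] if len(response) > 1 else []
    return [
        title[len(CATEGORY_PREFIX):] if title.startswith(CATEGORY_PREFIX) else title
        for title in titles
    ]
```

Whether the prefix is hidden (as here) or shown (as in the ISA Tool) is a design choice for the interface; the API call is the same either way.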

Finalize basic Commons-specific schema template specifications: Information, Artwork, Art Photo, Book

We've continued thinking about Wikimedia Commons-specific template support inside OpenRefine's Wikibase schema builder.

We've come up with the concept of 'schema templates': basically, these are empty Wikibase schemas inside OpenRefine. For the Wikimedia Commons use case, we want to add a few default ones corresponding to frequently-used file information templates that are also Structured Data on Commons-driven (Information, Artwork, Art Photo, Book - for now). Users will be able to add their own.

image

Current wireframes by @lozanaross were based on a spreadsheet by Sandra, but need some tweaking and finalization (basically, there will be other custom statements that reflect current SDC modeling conventions). @trnstlntk will create these based on her knowledge of these modeling practices.

Update README.md with basic documentation on installation and functionalities of this extension

By the end of October it would be good to update this repo's README.md so that it is clear to laypeople who want to install and use the extension. Basic things to include:

  • Info on how to install the extension (can link to more specific info in our docs, but let's have some basic info here too)
  • Info and a few examples on the GREL commands we built (same)
  • Info and examples of the workflow of starting a project with Commons categories (same)

Decide upon, design and develop parsing options that appear after user has entered Commons categories

After a user has entered one or more Commons categories to start an OpenRefine project with, it makes sense to present them with some custom parsing options in the 'Configure Parsing Options' dialog window.

Several that would make sense, off the top of my head:

  • Reconcile file names (+ let users specify the language against which they want to reconcile - for possible data extension they may want to do later)
  • Display thumbnails of files? y/n
  • Display column with Wikitext? y/n
  • Display column with categories? y/n
  • Display one or more columns with some SDC in it already (user specifies the properties they are interested in)

I may be missing some obvious ones, and I can imagine that conversations with potential end users may give us more good ideas/suggestions.

As for building this, the existing Wikitext parsing options in OpenRefine can be used for inspiration (although that interface is not optimal).

In our March 17 team meeting we talked about this a bit. Some of this should be retrievable via API, bypassing the need for the end user to run the Commons reconciliation service, which would be a good thing!
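
As a rough illustration of what "retrievable via API" could look like: a MediaWiki action=query request with prop=categories and formatversion=2 returns, for each file, its page ID and categories, and on Commons the M-id of a file is the letter M followed by that page ID. The sketch below (Python, hypothetical function name, not the extension's code) turns such an already-parsed response into rows, without any reconciliation step.

```python
def rows_from_query_response(response: dict) -> list:
    """Convert a parsed action=query response (formatversion=2) into project rows.

    Each row holds the file name, its M-id (derived from the page ID) and the
    categories listed for it in the response.
    """
    rows = []
    for page in response.get("query", {}).get("pages", []):
        if "pageid" not in page:
            # Pages that do not exist are reported without a page ID; skip them.
            continue
        rows.append({
            "file": page["title"],
            "mid": "M" + str(page["pageid"]),
            "categories": [cat["title"] for cat in page.get("categories", [])],
        })
    return rows
```

Wikitext and existing structured data would need further requests (or further prop values) and are left out of this sketch.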

Add extension tab in UI

The Commons extension needs to be accessible from the 'Create project'->'Get data from' options.

Follow documentation from the technical reference to implement the required .js files, using the 'Database' and the 'GData' extension files as examples.
