alphagov / govuk-knowledge-graph-search

Web app to make the Knowledge Graph simpler to search

Home Page: https://docs.data-community.publishing.service.gov.uk/tools/govsearch/

JavaScript 3.24% SCSS 3.22% Dockerfile 0.10% TypeScript 88.38% Shell 0.43% Nunjucks 4.63%
govuk

govuk-knowledge-graph-search's Issues

Mixed search with taxons misses results

Searching for the keyword "passeport" and displaying the taxons of the results will show the "Brexit" taxon.
However searching for "passeport" and specifying to search in the "Brexit" taxon will return no results.

Bug: can't do advanced search without keywords

https://trello.com/c/7qi16AUA/2248-allow-search-without-keywords

If you want to show all pages that are published using Specialist Publisher you would normally go to Advanced search, select Publisher and click Search. But that does nothing.

This is because search won't start if the keywords field is empty, which is a bug. A workaround is to enter a single space in the field, but a proper fix would be to allow search to run even with an empty keywords field.
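One way the proper fix could look is sketched below. It assumes a hypothetical `SearchParams` shape and a hypothetical `shouldRunSearch` check; neither name comes from the app's code. The idea is that search starts when any criterion is set, not only keywords.

```typescript
// Hypothetical shape of the search form's state; field names are assumptions.
interface SearchParams {
  keywords: string;
  publishingApp?: string;
  taxon?: string;
  organisation?: string;
  linkSearchUrl?: string;
}

// True when a value is present and is not just whitespace.
const isSet = (value: string | undefined): boolean =>
  value !== undefined && value.trim().length > 0;

// Allow the search to start when any criterion is set, so that a
// filter-only search (for example Publisher alone) still runs.
function shouldRunSearch(params: SearchParams): boolean {
  return [
    params.keywords,
    params.publishingApp,
    params.taxon,
    params.organisation,
    params.linkSearchUrl,
  ].some(isSet);
}
```

With a check like this, selecting only a publisher would start the search, while a completely empty form would still do nothing.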

Ignore common words in meta queries

Searching for "The Ministry of Justice" won't show the meta box for it, because "The" isn't in the Organisation's title.
Maybe it would be better to remove "The" (and other common words, like "of" or "for").

But we need to think of cases where it makes a difference first.
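A rough sketch of the idea, to help find the cases where it makes a difference. The stop-word list and the function names are illustrative assumptions, not the app's code.

```typescript
// Illustrative stop-word list; the real list would need agreeing first.
const STOP_WORDS = new Set(["the", "of", "for"]);

// Lower-case the text, split it on whitespace and drop stop words.
function normaliseForMetaMatch(text: string): string {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word))
    .join(" ");
}

// Compare a query with an entity title after normalising both sides.
function matchesMetaTitle(query: string, title: string): boolean {
  return normaliseForMetaMatch(query) === normaliseForMetaMatch(title);
}
```

Under this sketch "The Ministry of Justice" matches the title "Ministry of Justice". It also shows one case where it makes a difference: two titles that differ only by "of" versus "for" would become indistinguishable.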

Adding any more filters might require a redesign

Users have asked for date-range filters on two fields: first published at, and updated at. This would introduce several new and complex filters into a design that already has a lot of filters.

Current filters: keyword search

[Image: current keyword search filters]

Current filters: advanced search

[Image: current advanced search filters]

Proposed filters: all types of search

For both the "First published at" and the "Updated at" fields:

  • on a date
  • on or before a date
  • on or after a date
  • between two dates (inclusive)
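To get a feel for how much the proposed filters add, here is a minimal sketch of how they could be modelled and turned into a parameterised BigQuery condition. The type names, the column names `first_published_at` and `updated_at`, and the parameter naming are all assumptions for illustration.

```typescript
// Hypothetical model of the four proposed comparisons; dates are ISO strings
// such as "2023-01-31".
type DateFilter =
  | { kind: "on"; date: string }
  | { kind: "onOrBefore"; date: string }
  | { kind: "onOrAfter"; date: string }
  | { kind: "between"; from: string; to: string };

// Assumed column names for the two fields.
type DateField = "first_published_at" | "updated_at";

interface Condition {
  sql: string;
  params: Record<string, string>;
}

// Build a SQL condition with named parameters. Values are returned separately
// so that they can be bound as query parameters rather than interpolated.
function dateCondition(field: DateField, filter: DateFilter): Condition {
  const column = `DATE(${field})`;
  // Prefix parameter names with the field so both fields can be filtered at once.
  const param = (suffix: string): string => `${field}_${suffix}`;
  const cast = (suffix: string): string => `CAST(@${param(suffix)} AS DATE)`;
  switch (filter.kind) {
    case "on":
      return { sql: `${column} = ${cast("date")}`, params: { [param("date")]: filter.date } };
    case "onOrBefore":
      return { sql: `${column} <= ${cast("date")}`, params: { [param("date")]: filter.date } };
    case "onOrAfter":
      return { sql: `${column} >= ${cast("date")}`, params: { [param("date")]: filter.date } };
    case "between":
      // BETWEEN is inclusive at both ends, matching the proposal.
      return {
        sql: `${column} BETWEEN ${cast("from")} AND ${cast("to")}`,
        params: { [param("from")]: filter.from, [param("to")]: filter.to },
      };
  }
}
```

Each field would need a control for the kind of comparison plus one or two date inputs, which is where the pressure on the design comes from.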

Related ideas

The "tabs" could be removed, so that only a single kind of search is offered, which would be the current "Advanced" search.

Accessibility: add "jump to" links

Add links so that you can jump back to the top of the form from the focused Results heading, and jump from the heading to the actual results.

The links would be invisible until they are reached by tabbing.

Partial-matching of links within GOV.UK

https://trello.com/c/LpFksgoJ/2249-allow-partial-matching-in-link-search

A user has asked:

We're looking for a partial query string /email-signup?link=/topic/ to find pages where government publishers have added links to specialist topic page email subscriptions to the main body of the page.

The email sign up button is on each specialist topic page - for example https://www.gov.uk/topic/business-tax/vat where the full email sign up link is https://www.gov.uk/email-signup?link=/topic/business-tax/vat

We've seen a couple of examples where content designers have added a link to the email sign up in the body of the page in the past. Unhelpfully we don't have those page links now and we also need to identify all the pages that might have that so they can be removed when we retire specialist topic pages.

Currently, if what the user enters into the link-search box matches a regular expression ^((https:\/\/)?((www\.)?gov\.uk))?\/ then it is normalised to begin with https://www.gov.uk/, and an exact match is required. Otherwise, a partial match is allowed.

The user's search works if they omit the leading / and search for email-signup?link=/topic/, but there's no way that they could know to do that.
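The behaviour described above can be reproduced with a short sketch. The regular expression is the one quoted in this issue; the function and type names are hypothetical and this is not the app's actual code.

```typescript
// The pattern quoted in the issue: an optional GOV.UK origin followed by "/".
const GOVUK_PREFIX = /^((https:\/\/)?((www\.)?gov\.uk))?\//;

interface LinkQuery {
  value: string;
  exact: boolean;
}

// Input matching the pattern is normalised to begin with https://www.gov.uk/
// and matched exactly; anything else is matched partially.
function interpretLinkSearch(input: string): LinkQuery {
  if (GOVUK_PREFIX.test(input)) {
    return { value: input.replace(GOVUK_PREFIX, "https://www.gov.uk/"), exact: true };
  }
  return { value: input, exact: false };
}
```

Here `interpretLinkSearch("/email-signup?link=/topic/")` becomes an exact match on `https://www.gov.uk/email-signup?link=/topic/`, which finds nothing, while the same text without the leading `/` falls through to a partial match.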

Populate dropdowns from tables in the search dataset

It would be best to only ever query the search dataset from this app, so that it's easier to reason about the data pipeline.

bigQuery(`
SELECT DISTINCT locale
FROM \`content.locale\`
`),
bigQuery(`
SELECT name
FROM \`search.taxon\`
`),
bigQuery(`
SELECT DISTINCT title
FROM \`graph.organisation\`
`),

Add a column for the number of documents in a collection

I made a spreadsheet of document collections, and how many documents are in each collection. 376 (6%) of them don't have any documents. Another 247 (4%) only have one document. Someone in another organisation found this useful, and asked whether they can run their own report.

This requires work in https://github.com/alphagov/govuk-knowledge-graph-gcp. It could use links from the Publishing API database, or it could get them from the Content API database.

Fix CSV generation

  1. Dates are now shown as objects
  2. Withdrawn reason (and possibly other prose) isn't properly escaped
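A minimal sketch of a fix for both problems, assuming dates arrive as JavaScript `Date` objects (if they are wrapper objects from the database client, they would need unwrapping first). The names `csvCell` and `csvRow` are hypothetical.

```typescript
type CsvValue = string | number | boolean | Date | null | undefined;

// Format one cell: dates become ISO 8601 strings, and any field containing
// a quote, comma or line break is quoted with inner quotes doubled (RFC 4180).
function csvCell(value: CsvValue): string {
  if (value === null || value === undefined) return "";
  const text = value instanceof Date ? value.toISOString() : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join the formatted cells of one row with commas.
function csvRow(values: CsvValue[]): string {
  return values.map(csvCell).join(",");
}
```

Prose such as a withdrawn reason then survives intact even when it contains commas, quotes or line breaks.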

Support results other than pages

A recent request was for a list of links. Not a list of pages where a matching link appears, but a list of the links themselves.

a list of all the urls linked to from GOV.UK that are part of the domain example.gov.uk

We used to do this, but a redesign and dwindling staff numbers meant we couldn't continue.

Other types of result that could be returned:

  • abbreviation
  • organisation
  • role
  • person
  • transaction
  • contact details
  • table
  • phone number
  • entity

Some of these are still part of the things table in GovGraph.

Add tests

We need integration and unit tests to avoid breaking anything when making changes.

The simplest approach would be to check that the generated Cypher queries are the expected ones for given values of the user-set parameters.
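As a rough illustration of that approach, the sketch below pairs user-set parameters with the query expected for them. `buildCypher` is a hypothetical stand-in for the app's query builder, and the node label, property names and parameter shape are assumptions.

```typescript
// Hypothetical stand-in for the app's query builder.
interface QueryParams {
  keywords: string[];
  combinator: "all" | "any";
}

function buildCypher(params: QueryParams): string {
  const joiner = params.combinator === "all" ? " AND " : " OR ";
  const where = params.keywords
    .map((_, i) => `toLower(n.title) CONTAINS toLower($keyword${i})`)
    .join(joiner);
  return `MATCH (n:Page)${where ? ` WHERE ${where}` : ""} RETURN n.url`;
}

// Table-driven cases: each pairs user-set parameters with the expected query.
const cases: Array<{ name: string; params: QueryParams; expected: string }> = [
  {
    name: "single keyword",
    params: { keywords: ["tax"], combinator: "all" },
    expected: "MATCH (n:Page) WHERE toLower(n.title) CONTAINS toLower($keyword0) RETURN n.url",
  },
  {
    name: "any of two keywords",
    params: { keywords: ["tax", "vat"], combinator: "any" },
    expected:
      "MATCH (n:Page) WHERE toLower(n.title) CONTAINS toLower($keyword0) OR toLower(n.title) CONTAINS toLower($keyword1) RETURN n.url",
  },
  {
    name: "no keywords",
    params: { keywords: [], combinator: "all" },
    expected: "MATCH (n:Page) RETURN n.url",
  },
];

// Returns the names of any failing cases; an empty array means all pass.
function runCases(): string[] {
  return cases.filter((c) => buildCypher(c.params) !== c.expected).map((c) => c.name);
}
```

The same table could be fed to whichever test runner the project adopts, with one assertion per case.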

Support both partial and exact matches in link search

Link search currently supports only partial matches. A user wanted exact matches of /government/statistical-data-sets, but the search returns many results that link to URLs that merely begin with that string.

Would a single "Exact matches only" checkbox suffice?

We should check that the links being matched are fully resolved, i.e. the above search would match exactly https://www.gov.uk/government/statistical-data-sets, even if the actual link in a page is only relative.
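A small sketch of how the checkbox and link resolution could fit together, using the standard `URL` constructor to resolve relative links against the GOV.UK origin. The function names and the `exactOnly` flag are hypothetical.

```typescript
const GOVUK_ORIGIN = "https://www.gov.uk";

// Resolve a link as a browser would, relative to the GOV.UK origin.
// Absolute URLs are returned unchanged apart from normalisation.
function resolveLink(href: string): string {
  return new URL(href, GOVUK_ORIGIN).href;
}

// Exact mode compares fully resolved URLs; partial mode looks for the raw
// search text anywhere in the resolved link.
function linkMatches(pageLink: string, search: string, exactOnly: boolean): boolean {
  const link = resolveLink(pageLink);
  return exactOnly ? link === resolveLink(search) : link.includes(search);
}
```

With `exactOnly` set, a relative link `/government/statistical-data-sets` in a page matches a search for the full `https://www.gov.uk/government/statistical-data-sets`, while links to longer URLs that only begin with that string no longer match.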
