alphagov / govuk-knowledge-graph-search
Web app to make the Knowledge Graph simpler to search
Home Page: https://docs.data-community.publishing.service.gov.uk/tools/govsearch/
https://trello.com/c/0e30HOUh/2251-improve-the-deployment-pipeline
Following many frontend changes, the tests need updating.
We're seeing blank entries (rather than unset) in custom dimensions in the analytics data, since #201 was deployed.
Searching for the keyword "passeport" and displaying the taxon of results will show the "Brexit" taxon.
However, searching for "passeport" and restricting the search to the "Brexit" taxon returns no results.
Maybe some aria attribute
https://trello.com/c/7qi16AUA/2248-allow-search-without-keywords
If you want to show all pages that are published using Specialist Publisher you would normally go to Advanced search, select Publisher and click Search. But that does nothing.
This is because search won't start if the keywords field is empty, which is a bug. A workaround is to enter a single space in the field, but a proper fix would be to allow search to run even with an empty keywords field.
Currently we use the BigQuery tables from the govuk-knowledge-graph in the GCP project.
See https://github.com/alphagov/govuk-knowledge-graph-search/blob/bigquery/bigquery.ts#L13
We should use an environment variable instead, so that the right BigQuery project is used depending on the environment we're in (dev, staging or production).
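A minimal sketch of how the project could be read from the environment rather than hard-coded. The variable name BIGQUERY_PROJECT_ID and both function names are hypothetical, not taken from the repository; the environment is passed in as a plain object (for example process.env) so that the sketch stays self-contained.

```typescript
// Hypothetical names throughout: BIGQUERY_PROJECT_ID, resolveBigQueryProject
// and tableRef are illustrative and do not come from the repository.
type Env = Record<string, string | undefined>;

function resolveBigQueryProject(env: Env): string {
  const projectId = env["BIGQUERY_PROJECT_ID"];
  if (projectId === undefined || projectId.trim() === "") {
    // Fail loudly rather than silently querying the wrong environment.
    throw new Error("BIGQUERY_PROJECT_ID is not set");
  }
  return projectId.trim();
}

// Build a fully qualified table name for whichever project is configured.
function tableRef(env: Env, dataset: string, table: string): string {
  return `${resolveBigQueryProject(env)}.${dataset}.${table}`;
}
```

Throwing when the variable is missing is a design choice: a default value would risk dev or staging quietly reading production data.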
When more than one metabox matches, we might want to add a disambiguation page.
The back button currently doesn't always go back to the previous page.
Searching for "The Ministry of Justice" won't show the meta box for it, because "The" isn't in the Organisation's title.
Maybe it would be better to remove "The" (and other common words, like "of" or "for").
But we need to think of cases where it makes a difference first.
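One way the idea could look, as a sketch only: both the query and the organisation title are normalised by dropping a small stop-word list before comparison. The list and the function names are assumptions made for illustration.

```typescript
// Assumed stop-word list; which words are safe to drop is still an open question.
const STOP_WORDS: readonly string[] = ["the", "of", "for"];

// Lower-case the text and drop stop words, keeping the remaining word order.
function normaliseTitle(text: string): string {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word !== "" && STOP_WORDS.indexOf(word) === -1)
    .join(" ");
}

// Hypothetical helper: decide whether a query should trigger a meta box.
function titlesMatch(query: string, title: string): boolean {
  return normaliseTitle(query) === normaliseTitle(title);
}
```

With this sketch, "The Ministry of Justice" matches "Ministry of Justice", but "Department for Education" and "Department of Education" also become indistinguishable, which is the kind of case that needs thinking through first.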
Users have asked for date-range filters on two fields: first published at, and updated at. This would introduce several new and complex filters into a design that already has a lot of filters.
Current filters: advanced search
The "tabs" could be removed, so that only a single kind of search is offered, which would be the current "Advanced" search.
e.g. ?selected-words=justice&search-in-title=true&area=any
Either use variables, which is a bit complicated when you have a varying number of keywords in your query, or sanitize the text inputs: keywords, excluded keywords, links, etc.
See also https://neo4j.com/developer/kb/protecting-against-cypher-injection/
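The "variables" option can be sketched as follows: each keyword gets its own numbered Cypher parameter, so a varying number of keywords is handled without interpolating user text into the query. The node label Page and the properties title and url are placeholders, not the app's real schema, and the function name is hypothetical.

```typescript
interface ParameterisedQuery {
  text: string;
  params: Record<string, string>;
}

// Build one CONTAINS clause per keyword, each bound to its own parameter.
// Page, title and url are placeholder names for illustration only.
function buildKeywordQuery(keywords: string[]): ParameterisedQuery {
  const params: Record<string, string> = {};
  const clauses = keywords.map((keyword, i) => {
    const name = `keyword${i}`;
    params[name] = keyword; // user text travels only as a parameter value
    return `n.title CONTAINS $${name}`;
  });
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { text: `MATCH (n:Page)${where} RETURN n.url`, params };
}
```

The query text then contains only names generated by the code, and the params object would be handed to the database driver alongside it.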
So that you can jump back to the top of the form from the focused Results heading, as well as jump to the actual results from the heading.
Invisible links that become visible when tabbed to
use npm ci
Since the CSV is the same even if the user uses pagination or clicks toggles, we don't need to recalculate it every time.
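A sketch of one way to avoid the recalculation: keep the last CSV and rebuild it only when a key describing the search itself changes. The searchKey argument (for example, the serialised search parameters without pagination or toggles), toCsv and createCsvCache are all hypothetical names.

```typescript
type Row = Record<string, string>;

// Minimal CSV writer: quote a field only if it contains a comma, quote or newline.
function toCsv(rows: Row[], columns: string[]): string {
  const escape = (value: string): string =>
    /[",\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
  const lines = rows.map((row) =>
    columns.map((column) => escape(row[column] || "")).join(",")
  );
  return [columns.join(",")].concat(lines).join("\n");
}

// Remember the last CSV and rebuild it only when the search key changes.
function createCsvCache(build: (rows: Row[]) => string) {
  let lastKey: string | null = null;
  let lastCsv = "";
  return (searchKey: string, rows: Row[]): string => {
    if (searchKey !== lastKey) {
      lastCsv = build(rows);
      lastKey = searchKey;
    }
    return lastCsv;
  };
}
```

Caching only the most recent result keeps memory use bounded, on the assumption that users export the search they are currently viewing.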
The BigQuery project history shows the same query twice when "topic" search is used once.
SELECT "Taxon" as type, * FROM search.taxon WHERE lower(name) = lower(@name);
https://trello.com/c/120IsxSh/2250-fix-query-descriptions
govuk-knowledge-graph-search/src/ts/utils.ts
Line 100 in 541671c
For example, when searching "Uk help and services in Bahrain", the keywords that are actually used are "Uk", "help", "services", "in", and "Bahrain", but the query description says "pages that contain help and services and Bahrain".
https://trello.com/c/LpFksgoJ/2249-allow-partial-matching-in-link-search
A user has asked:
We're looking for a partial query string /email-signup?link=/topic/ to find pages where government publishers have added links to specialist topic page email subscriptions to the main body of the page. The email sign up button is on each specialist topic page - for example https://www.gov.uk/topic/business-tax/vat where the full email sign up link is https://www.gov.uk/email-signup?link=/topic/business-tax/vat
We've seen a couple of examples where content designers have added a link to the email sign up in the body of the page in the past. Unhelpfully we don't have those page links now and we also need to identify all the pages that might have that so they can be removed when we retire specialist topic pages.
Currently, if what the user enters into the link-search box matches the regular expression ^((https:\/\/)?((www\.)?gov\.uk))?\/ then it is normalised to begin with https://www.gov.uk/, and an exact match is required. Otherwise, a partial match is allowed.
The user's search works if they omit the leading / and search for email-signup?link=/topic/, but there's no way that they could know to do that.
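The behaviour described above can be sketched as follows. This is a reading of the description, not the repository's actual code; interpretLinkSearch and LinkSearch are hypothetical names, while the regular expression is the one quoted in the issue.

```typescript
// The regular expression quoted in the issue.
const GOV_UK_PREFIX = /^((https:\/\/)?((www\.)?gov\.uk))?\//;

interface LinkSearch {
  value: string;
  exact: boolean;
}

// Inputs that look like GOV.UK paths are normalised and matched exactly;
// anything else is left as-is and matched partially.
function interpretLinkSearch(input: string): LinkSearch {
  if (GOV_UK_PREFIX.test(input)) {
    return {
      value: input.replace(GOV_UK_PREFIX, "https://www.gov.uk/"),
      exact: true,
    };
  }
  return { value: input, exact: false };
}
```

Under this reading, /email-signup?link=/topic/ becomes an exact search for https://www.gov.uk/email-signup?link=/topic/, while the same text without the leading slash stays a partial search, which is why only the latter finds the pages.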
See this GovGraph PR:
It would be best to only ever query the search dataset from this app, so that it's easier to reason about the data pipeline.
govuk-knowledge-graph-search/bigquery.ts
Lines 80 to 91 in 17feed8
I made a spreadsheet of document collections, and how many documents are in each collection. 376 (6%) of them don't have any documents. Another 247 (4%) only have one document. Someone in another organisation found this useful, and asked whether they can run their own report.
This requires work in https://github.com/alphagov/govuk-knowledge-graph-gcp. It could use links from the Publishing API database, or it could get them from the Content API database.
A recent request was for a list of links. Not a list of pages where a matching link appears, but a list of the links themselves.
a list of all the urls linked to from GOV.UK that are part of the domain example.gov.uk
We used to do this, but a redesign and dwindling staff numbers meant we couldn't continue.
Other types of result that could be returned:
Some of these are still part of the things table in GovGraph.
When the info box for persons is merged, it will only show one organisation per role, because I couldn't quickly see how to show more.
Some roles (I've forgotten the example) belong to more than one organisation.
And perhaps even HTML. A user tried to search for "s1.", which is GovSpeak for a numbered step.
I haven't checked that this is really a bug, but I think contentID must be contentId, otherwise it will silently return null.
govuk-knowledge-graph-search/src/ts/neo4j.ts
Line 423 in f5c9804
We need integration and unit tests to avoid breaking anything when making changes.
The simplest approach would be to check that the generated Cypher queries are the right ones for the values of the user-set parameters.
Simplifies the CI pipeline as you no longer need to install and run sass and webpack/ts.
Current link search is only partial. A user requested exact matches of /government/statistical-data-sets, which returns many results that link to URLs that only begin with that string.
Would a single checkbox suffice? "Exact matches only"?
We should check that the links being matched are fully resolved, i.e. the above search would match exactly https://www.gov.uk/government/statistical-data-sets, even if the actual link in a page is only relative.
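A sketch of how an "Exact matches only" option might combine with link resolution, assuming relative links are resolved against https://www.gov.uk and that a partial match means "contains". The function names and the exactOnly flag are hypothetical.

```typescript
const GOV_UK_ORIGIN = "https://www.gov.uk";

// Resolve site-relative paths ("/foo") against GOV.UK; leave absolute and
// protocol-relative ("//host/...") links untouched.
function resolveLink(href: string): string {
  const isSiteRelative = href.charAt(0) === "/" && href.charAt(1) !== "/";
  return isSiteRelative ? GOV_UK_ORIGIN + href : href;
}

// Compare resolved forms so relative and absolute spellings of a link agree.
function linkMatches(href: string, search: string, exactOnly: boolean): boolean {
  const resolved = resolveLink(href);
  const target = resolveLink(search);
  return exactOnly ? resolved === target : resolved.indexOf(target) !== -1;
}
```

Resolving both sides before comparing means a page that links to the relative path and a page that links to the full URL are treated alike.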