alphagov / govuk-knowledge-graph-search

Web app to make the Knowledge Graph simpler to search

Home Page: https://docs.data-community.publishing.service.gov.uk/tools/govsearch/

govuk-knowledge-graph-search's Introduction

Gov Search (frontend)

Gov Search is a search engine for GOV.UK, with advanced functionality for content designers. This repository includes the code of the GovSearch front-end.

This is an ExpressJS application written in TypeScript. It shows the user a search interface and queries the backend to fetch and display search results.

The full documentation is available in the Data Community Tech Docs.

Running locally

  • Prerequisite: you need to be logged in to gcloud (using the gcloud command) and have access to BigQuery (specifically you need the BigQuery Viewer and BigQuery Job Runner roles). Ask a BigQuery admin to add you.

  • Clone this repository

  • Run npm install to install all dependencies

  • Install Sass and compile the Sass sources to CSS with sass ./src/frontend/scss/main.scss > ./public/main.css

  • Install webpack and compile the browser-side TypeScript code to JavaScript by running webpack

  • Copy the assets from GOV.UK Frontend by running npm run copy-assets

  • Set an environment variable called PROJECT_ID to the name of the GCP project your server will be running on. This tells the server where to connect to run searches and get the data. For instance, use govuk-knowledge-graph-dev to get the data from the development backend.

  • Set an environment variable called ENABLE_AUTH to "false" (or anything but "true", or don't set it at all), as you won't need authentication on your local machine

  • Start the server with npm run dev.

  • Point your browser to https://localhost:8080 (the port can be changed using the PORT environment variable)
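The three environment variables mentioned above could be read on the server along the following lines. This is only a sketch: the readConfig function and the ServerConfig interface are hypothetical names used for illustration, not code from this repository, and 8080 is assumed to be the default port because it is the one given in the URL above.

```typescript
// Hypothetical helper: readConfig and ServerConfig are illustrative names,
// not taken from the repository.
interface ServerConfig {
  projectId: string
  enableAuth: boolean
  port: number
}

function readConfig(env: Record<string, string | undefined>): ServerConfig {
  const projectId = env.PROJECT_ID
  if (!projectId) {
    throw new Error('PROJECT_ID must be set, e.g. govuk-knowledge-graph-dev')
  }
  return {
    projectId,
    // Authentication is enabled only when ENABLE_AUTH is exactly "true"
    enableAuth: env.ENABLE_AUTH === 'true',
    // Assumed default: the port used in the local URL above
    port: env.PORT ? Number(env.PORT) : 8080,
  }
}
```

In a Node.js process this would typically be called as readConfig(process.env).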

Developing

Files

  • src/backend/app.ts: the main server file. All the .ts files in the same folder are server-side code.

  • src/backend: the server-side files.

  • src/frontend: the main browser-side files. webpack compiles everything to public/main.js.

  • src/frontend/scss/main.scss: the Sass file that sass compiles to public/main.css

  • ./public/assets: publicly served fonts and images

Software architecture

Mostly for historical reasons, much of the functionality offered runs browser-side. That's why the application is more JavaScript-heavy than your usual alphagov app. Although the JavaScript code is generated from TypeScript sources, it doesn't use any framework like React.

The browser-side code uses the Elm Architecture model: the whole application's state is held in a variable called state, and a function called view renders the HTML that corresponds to the current value of state, and sets event handlers on that HTML. Whenever an event happens (user clicks on a button, or a search returns) the handleEvent function updates the state accordingly and runs view again. This forms the main interaction loop. For instance:

  • The user enters search terms and clicks "search"
  • The triggered DOM event handler (defined in view.ts) runs handleEvent, which:
    • retrieves the new search terms from the form,
    • updates the state (specifically state.selectedKeywords) with the new values,
    • starts the search in BigQuery via the API offered by the ExpressJS server,
    • and meanwhile calls view to show the "Please wait" message.
  • Eventually the API call returns and triggers handleEvent, which updates state with the search results.
  • handleEvent then calls view again.
  • view renders the state, including the search results.
  • The page waits for the next event, and the loop repeats.
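The interaction loop described above can be sketched without a browser as follows. Only state, view, handleEvent and selectedKeywords come from the description. The event names, the other state fields, the renders array and the fakeSearch stub are assumptions made for illustration. The stub answers synchronously so that the example is deterministic, whereas the real API call is asynchronous.

```typescript
// Event and field names below are assumptions, except selectedKeywords.
type AppEvent =
  | { type: 'search-submitted'; keywords: string }
  | { type: 'results-received'; results: string[] }

interface State {
  selectedKeywords: string
  waiting: boolean
  results: string[]
}

let state: State = { selectedKeywords: '', waiting: false, results: [] }

// Stand-in for the DOM: every call to view records the markup it produced
const renders: string[] = []

// view turns the current state into markup
function view(): void {
  const html = state.waiting
    ? '<p>Please wait</p>'
    : `<ul>${state.results.map((r) => `<li>${r}</li>`).join('')}</ul>`
  renders.push(html)
}

// Stub for the call to the ExpressJS API (synchronous here, asynchronous in reality)
function fakeSearch(keywords: string, onDone: (results: string[]) => void): void {
  onDone([`Result for ${keywords}`])
}

// handleEvent is the only place where state changes; it always re-renders
function handleEvent(event: AppEvent): void {
  switch (event.type) {
    case 'search-submitted':
      state = { ...state, selectedKeywords: event.keywords, waiting: true }
      view() // shows the "Please wait" message
      fakeSearch(event.keywords, (results) =>
        handleEvent({ type: 'results-received', results })
      )
      break
    case 'results-received':
      state = { ...state, waiting: false, results: event.results }
      view() // shows the results
      break
  }
}
```

Calling handleEvent({ type: 'search-submitted', keywords: 'tax' }) produces two renders: the waiting message, then the result list.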

Running tests

Unit tests

We use Jest. Run the unit tests with npm run test.

End-to-end tests

We use Cypress, which is installed automatically on installing the dev npm packages. If Chrome is installed on your system it should be as simple as running npx cypress open for the interactive version and npx cypress run for the command-line version.

To run a single test file, use --spec. For instance:

npx cypress run --spec cypress/e2e/url.cy.ts

Deployment

Staging Deployment

Staging deployment is triggered automatically whenever a pull request is merged into the main branch.

This is made possible by the deploy-staging GitHub Action.

Steps:

  1. Create your feature/fix on a new branch.
  2. Create a PR targeting the main branch.
  3. Ensure the PR passes all the CI checks and has been approved.
  4. Merge.
  5. The action will deploy the changes. Cloud Run revisions that have been deployed automatically start with the "main-" prefix.

Manually: You can also trigger the workflow manually in the actions tab of the repo, by selecting the "deploy-staging" workflow.

Production Deployment

Production deployments have to be triggered manually for security reasons.

  1. After testing the changes in the staging environment, create a PR of the main branch against the production branch.
  2. Review and approve the PR.
  3. Merge the PR.
  4. Wait for completion of the create-release-tag GitHub Action. It will create a new release and a tag in the GitHub repository for every production deployment.
  5. Once the release appears in the repository, manually run the production-deploy workflow. Select the latest tag from the production branch.
  6. (Optional) Write a custom description for the release in GitHub.

Please note:

  • Deployments from a local machine should be limited to the development environment.
  • The deploy-staging GitHub Action will run again during the PR from main to production. This is expected and won't cause issues.
  • There may be a slight delay (usually less than a minute) before the new release appears in the repository after the create-release-tag action has completed.

Deployment Steps

  1. Go to the production site https://govgraphsearch.dev/ and view the source.
  2. Look for the line beginning <!-- Google Tag Manager (noscript) --><noscript><iframe src="https://www.googletagmanager.com/ns.html? and note the values of the URL parameters id and gtm_auth. They look like GTM-XXXXXXX and aWEg5ABBXXXXXXXXXXXXXXXXX.
  3. Run the script deploy-to-gcp.sh located in the root directory.
  4. Enter the value of id from step 2 as the GTM tracking ID.
  5. Enter the value of gtm_auth from step 2 as the GTM AUTH.
  6. You may be prompted to authenticate with gcloud auth login, in which case do so and start again.
  7. Choose the GCP region europe-west2.
  8. Continue. Check in the web console that a revision was deployed, and try using it at https://govgraphsearch.dev.

Logging

We use Pino for logging. Pino provides structured logging, human-readable formatting and very good performance.

⚠️⚠️ Pino doesn't behave the same as console.log. ⚠️⚠️

Unlike console.log, it doesn't print an arbitrary list of values. Pass it a single message string. If you want to log an object alongside the message, pass the object as the first argument and the message string as the second.

For example:

console.log('this', 'is', 'a', 'test')
// Pino equivalent: a single message string
log.info('this is a test')

console.log('Look at this object', { ...anObject })
// Pino equivalent: the object first, then the message
log.info({ ...anObject }, 'Look at this object')

console.log('A few objects', { ...obj1 }, { ...obj2 })
// Pino equivalent: separate your logs instead
log.info('A few objects')
log.info({ ...obj1 }, 'Object 1')
log.info({ ...obj2 }, 'Object 2')

// This WON'T WORK
// The second object is swallowed, because only the first argument can be an object.
log.info(obj1, obj2)
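As an alternative to separating the logs, several objects can be nested under their own keys in a single object, which then goes first. The helper below is a hypothetical sketch of that idea. It is neither part of Pino nor of this repository, and the key names obj1, obj2 and so on are arbitrary. It only prepares the arguments and does not call the logger itself.

```typescript
// Hypothetical helper, not part of Pino or of this repository.
interface PinoCall {
  mergingObject: Record<string, unknown> | undefined
  message: string
}

// Converts console.log-style arguments into one object plus one message string
function toPinoCall(...args: unknown[]): PinoCall {
  const messageParts: string[] = []
  const merged: Record<string, unknown> = {}
  let objectCount = 0
  for (const arg of args) {
    if (typeof arg === 'object' && arg !== null) {
      // Nest each object under its own key so that none is swallowed
      objectCount += 1
      merged[`obj${objectCount}`] = arg
    } else {
      messageParts.push(String(arg))
    }
  }
  return {
    mergingObject: objectCount > 0 ? merged : undefined,
    message: messageParts.join(' '),
  }
}
```

With the result in a variable called call, the log would be written as log.info(call.mergingObject, call.message) when an object is present, or log.info(call.message) otherwise.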

If you want the request object logged alongside:

function middleware(req, res, next) {
  req.log.info('A log')
  // logs: {...req} A log
  next()
}

To use the logger otherwise:

import log from './utils/logging'

log.info('in msg')
log.debug('debug msg')
log.error('error')
log.error(error, 'error')

// If you want to log an object:
log.info({ a: 123, b: 456 }, 'This is a log')

govuk-knowledge-graph-search's People

Contributors

dependabot[bot], guilhem-fry, j-marvin-gds, jideboris, maxf, nacnudus, sengi, somme

govuk-knowledge-graph-search's Issues

Filter by date

Users ask for this, but more research is needed because there are so many different ways to filter by date.

  • By published_at or first_published_at or updated_at or publisher_updated_at?
  • On/before/after/between dates?

Populate dropdowns from tables in the search dataset

Trello

It would be best to only ever query the search dataset from this app, so that it's easier to reason about the data pipeline.

bigQuery(`
SELECT DISTINCT locale
FROM \`content.locale\`
`),
bigQuery(`
SELECT name
FROM \`search.taxon\`
`),
bigQuery(`
SELECT DISTINCT title
FROM \`graph.organisation\`
`),

Support both partial and exact matches in link search

Current link search is only partial. A user wanted exact matches of /government/statistical-data-sets, but the search returns many results that link to URLs that merely begin with that string.

Would a single checkbox suffice? Exact matches only?

We should check that the links being matched are fully resolved, i.e. the above search would match exactly https://www.gov.uk/government/statistical-data-sets, even if the actual link in a page is only relative.
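A minimal sketch of how exact and partial matching could coexist, assuming links are resolved against https://www.gov.uk before comparison. The function names and the exactOnly flag (standing in for the proposed checkbox) are hypothetical, and partial matching is assumed to mean "contains".

```typescript
const GOVUK_ORIGIN = 'https://www.gov.uk'

// Resolve a link as found in a page, so relative and absolute forms compare equal
function resolveLink(href: string): string {
  return new URL(href, GOVUK_ORIGIN).href
}

// exactOnly stands in for the "Exact matches only" checkbox (an assumption)
function linkMatches(href: string, query: string, exactOnly: boolean): boolean {
  const resolved = resolveLink(href)
  return exactOnly
    ? resolved === resolveLink(query) // both sides fully resolved
    : resolved.includes(query) // partial: the query may appear anywhere
}
```

With this sketch, a relative link /government/statistical-data-sets in a page matches an exact search for https://www.gov.uk/government/statistical-data-sets, while a link to a longer URL under that path only matches when exactOnly is false.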

Adding any more filters might require a redesign

Users have asked for date-range filters on two fields: first published at, and updated at. This would introduce several new and complex filters into a design that already has a lot of filters.

Current filters: keyword search

image

Current filters: advanced search

image

Proposed filters: all types of search

  • on a date
  • on or before a date
  • on or after a date
  • between two dates (inclusive)
  • As above, for both the "First published at" and the "Updated at" fields
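One way to keep these four proposed filters manageable is to model them as a single discriminated union. The sketch below is illustrative only: the type and function names are hypothetical, the field names are borrowed from the date columns mentioned in the "Filter by date" issue, and values are assumed to be ISO 8601 strings in UTC so that the first ten characters give the calendar date.

```typescript
type DateField = 'first_published_at' | 'updated_at'

type DateFilter =
  | { kind: 'on'; field: DateField; date: string }
  | { kind: 'onOrBefore'; field: DateField; date: string }
  | { kind: 'onOrAfter'; field: DateField; date: string }
  | { kind: 'between'; field: DateField; from: string; to: string }

type PageDates = Record<DateField, string>

// Dates in YYYY-MM-DD form compare correctly as plain strings
function matchesDateFilter(page: PageDates, filter: DateFilter): boolean {
  const day = page[filter.field].slice(0, 10) // drop any time component
  switch (filter.kind) {
    case 'on':
      return day === filter.date
    case 'onOrBefore':
      return day <= filter.date
    case 'onOrAfter':
      return day >= filter.date
    case 'between':
      return day >= filter.from && day <= filter.to // inclusive at both ends
  }
}
```

Because every filter carries its own field, the same user interface component could serve both "First published at" and "Updated at".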

Resources

Related ideas

The "tabs" could be removed, so that only a single kind of search is offered, which would be the current "Advanced" search.

image

Ignore common words in meta queries

Searching for "The Ministry of Justice" won't show the meta box for it, because "The" isn't in the Organisation's title.
Maybe it would be better to remove "The" (and other common words, like "of" or "for").

But we need to think of cases where it makes a difference first.
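A small sketch of the idea, assuming a fixed list of common words. The list and the function names are hypothetical, and comparison is case-insensitive.

```typescript
// Assumed list: only the words mentioned in this issue
const COMMON_WORDS = new Set(['the', 'of', 'for'])

function stripCommonWords(text: string): string {
  return text
    .split(/\s+/)
    .filter((word) => word !== '' && !COMMON_WORDS.has(word.toLowerCase()))
    .join(' ')
}

// True when two strings are equal once common words and case are ignored
function sameIgnoringCommonWords(query: string, title: string): boolean {
  return stripCommonWords(query).toLowerCase() === stripCommonWords(title).toLowerCase()
}
```

Here "The Ministry of Justice" and "Ministry of Justice" compare as equal. The sketch also shows the kind of case that needs thought: "Department for Education" and "Department of Education" become indistinguishable.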

Bug: can't do advanced search without keywords

https://trello.com/c/7qi16AUA/2248-allow-search-without-keywords

If you want to show all pages that are published using Specialist Publisher you would normally go to Advanced search, select Publisher and click Search. But that does nothing.

This is because search won't start if the keywords field is empty, which is a bug. A workaround is to enter a single space in the field, but a proper fix would be to allow search to run even with an empty keywords field.
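The proper fix could take the form of a guard that lets a search start when either keywords or at least one filter is present. This is a sketch under assumptions: SearchParams and its filter fields are hypothetical and do not reflect the application's actual state shape.

```typescript
// Hypothetical shape of the search form's values
interface SearchParams {
  keywords: string
  publisher?: string
  taxon?: string
  organisation?: string
}

function canStartSearch(params: SearchParams): boolean {
  const hasKeywords = params.keywords.trim() !== ''
  const filters = [params.publisher, params.taxon, params.organisation]
  const hasFilter = filters.some((value) => value !== undefined && value !== '')
  // Allow a search when either keywords or at least one filter is set
  return hasKeywords || hasFilter
}
```

With this guard, selecting a publisher and leaving the keywords field empty is enough to run the search.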

Optionally support fuzzy search

Users report that it's annoying to search for words that happen to be substrings of common longer words, such as "cat" being a substring of "category". There's an unmet need for a Google-style search that is also filterable by metadata.
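The "cat" versus "category" complaint is about substring matching rather than fuzziness as such, so one partial remedy is whole-word matching. The sketch below shows that narrower technique with a regular expression word boundary. The function names are hypothetical, and it assumes the search term begins and ends with a letter or digit.

```typescript
// Escape characters that have a special meaning in a regular expression
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// True when word occurs in text as a whole word, ignoring case
function containsWholeWord(text: string, word: string): boolean {
  return new RegExp(`\\b${escapeRegExp(word)}\\b`, 'i').test(text)
}
```

A Google-style ranked search would need more than this, for example stemming and relevance scoring, which the sketch does not attempt.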

Support results other than pages

A recent request was for a list of links. Not a list of pages where a matching link appears, but a list of the links themselves.

a list of all the urls linked to from GOV.UK that are part of the domain example.gov.uk

We used to do this, but a redesign and dwindling staff numbers meant we couldn't continue.

Other types of result that could be returned:

  • abbreviation
  • organisation
  • role
  • person
  • transaction
  • contact details
  • table
  • phone number
  • entity

Some of these are still part of the things table in GovGraph.

Fix CSV generation

  1. Dates are now shown as objects
  2. Withdrawn reason (and possibly other prose) isn't properly escaped
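Both problems could be handled in one cell-formatting function. The sketch below is an assumption about the shape of the data rather than the application's actual code: it supposes that dates arrive either as Date instances or as wrapper objects with a string value property, and it quotes fields following the usual CSV convention of doubling embedded quotes.

```typescript
// Assumed cell types: the { value: string } form stands for a date wrapper object
type Cell = string | number | Date | { value: string } | null | undefined

function csvCell(cell: Cell): string {
  if (cell === null || cell === undefined) return ''
  let text: string
  if (cell instanceof Date) {
    text = cell.toISOString()
  } else if (typeof cell === 'object') {
    text = cell.value // unwrap instead of printing the object
  } else {
    text = String(cell)
  }
  // Quote fields containing a comma, a quote or a line break; double embedded quotes
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

function csvRow(cells: Cell[]): string {
  return cells.map(csvCell).join(',')
}
```

For instance, a withdrawn reason such as Replaced by "new" guidance, see below is emitted as a single quoted field with its inner quotes doubled.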

Add a column for the number of documents in a collection

I made a spreadsheet of document collections, and how many documents are in each collection. 376 (6%) of them don't have any documents. Another 247 (4%) only have one document. Someone in another organisation found this useful, and asked whether they can run their own report.

This requires work in https://github.com/alphagov/govuk-knowledge-graph-gcp. It could use links from the Publishing API database, or it could get them from the Content API database.

Accessibility: add "jump to" links

So that you can jump back to the top of the form from the focused Results heading, as well as jump to the actual results from the heading.

The links should be invisible by default and become visible when reached by tabbing.

Mixed search with taxons misses results

Searching for the keyword "passeport" and displaying the taxon of results will show the "Brexit" taxon.
However searching for "passeport" and specifying to search in the "Brexit" taxon will return no results.

Add tests

We need integration and unit tests to avoid breaking anything when making changes.

The simplest approach would be to check that the generated Cypher queries are the right ones for the values of the user-set parameters.

Partial-matching of links within GOV.UK

https://trello.com/c/LpFksgoJ/2249-allow-partial-matching-in-link-search

A user has asked:

We're looking for a partial query string /email-signup?link=/topic/ to find pages where government publishers have added links to specialist topic page email subscriptions to the main body of the page.

The email sign up button is on each specialist topic page - for example https://www.gov.uk/topic/business-tax/vat where the full email sign up link is https://www.gov.uk/email-signup?link=/topic/business-tax/vat

We've seen a couple of examples where content designers have added a link to the email sign up in the body of the page in the past. Unhelpfully we don't have those page links now and we also need to identify all the pages that might have that so they can be removed when we retire specialist topic pages.

Currently, if what the user enters into the link-search box matches a regular expression ^((https:\/\/)?((www\.)?gov\.uk))?\/ then it is normalised to begin with https://www.gov.uk/, and an exact match is required. Otherwise, a partial match is allowed.

The user's search works if they omit the leading / and search for email-signup?link=/topic/, but there's no way that they could know to do that.
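The behaviour described in this issue can be reproduced with the regular expression quoted above. The sketch below is a reconstruction for illustration, not the application's actual code: the names classifyLinkSearch and LinkSearch are hypothetical, and the normalisation is assumed to replace the matched prefix with https://www.gov.uk/.

```typescript
// The regular expression quoted in this issue
const GOVUK_PREFIX = /^((https:\/\/)?((www\.)?gov\.uk))?\//

type LinkSearch =
  | { mode: 'exact'; url: string }
  | { mode: 'partial'; text: string }

function classifyLinkSearch(input: string): LinkSearch {
  if (GOVUK_PREFIX.test(input)) {
    // Normalise to begin with https://www.gov.uk/ and require an exact match
    return { mode: 'exact', url: input.replace(GOVUK_PREFIX, 'https://www.gov.uk/') }
  }
  // Anything else is searched for as a partial match
  return { mode: 'partial', text: input }
}
```

This shows why the leading slash matters: /email-signup?link=/topic/ is classified as an exact search for https://www.gov.uk/email-signup?link=/topic/, whereas email-signup?link=/topic/ is classified as partial.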
