Coder Social home page Coder Social logo

dfe-digital / gias-query-tool Goto Github PK

View Code? Open in Web Editor NEW
10.0 13.0 2.0 4.85 MB

The easiest way to write advanced queries against the Get Information About Schools (GIAS) database

Home Page: https://gias-api-development.test.teacherservices.cloud/

Makefile 0.04% SQL 99.96%
data postgis postgresql uk-schools gias gias-api

gias-query-tool's Introduction

The [Unofficial] GIAS Query Toolkit

Ever wanted quickly query the Get Information About Schools dataset but couldn't because it's provided in an unwieldy CSV file that's not properly-encoded and has too many columns?

Great, you're in the right place!

This set of tools downloads and imports the data into a locally-running PostgreSQL database and lets you take advantage of PostGIS to supercharge your queries.

Getting up and running

Prerequisites

  • GNU Make, used to run the automatic download and import
  • GNU Iconv, fix file encoding
  • GNU Wget for downloading the GIAS CSV file
  • PostgreSQL with an local superuser account
  • PostGIS for geographic query goodness

Running the command

To download, cleanse, import and build the data objects only a single command should be required.

make

When debugging, use make refresh to run the import steps without repeatedly downloading the export file.

make refresh

Manual importing

The entire import, apart from file cleansing, is written in standard SQL. Executing the statements needs to be done in the correct order, the Makefile is the best place to get a feel for how it works.

Sending this data to BigQuery

Dependencies

pip install bigquery-schema-generator
brew install gcloud-cli
gcloud auth login
gcloud config set project PROJECT_ID

The script assumes you have a gias dataset in your BigQuery project and a Google Cloud Storage bucket called rugged-abacus-uploads.

Uploading

First build the tables in your local database

make

Then run the import

make load_to_bq

Tables and views

The importer creates the following database objects:

Name Type Description
schools table All schools, both open and closed
deprivation_pupil_premium table DPP information broken down by school
open_schools materialized view Only open schools
regions table England's regions and associated gegoraphic information
local_authorities table England's local authorities and associated gegoraphic information
establishment type School types (eg. Foundation school, Free school)
establishment_group type School categories (eg. Independent Schools, Universities, Colleges
gender type School gender policies (eg. Boys, Girls, Mixed)
ofsted_rating type All Ofsted ratings, including deprecated ones
phase type School phases (eg. Secondary, Primary, 16 plus)
rural_urban_classification type Classification of a school's setting, source links in definition

FAQs

Why use enumerated types when you could've just used a varchar?

Efficiency aside, the main reason is to allow ordering by rank rather than alphabetic position.

I want to query a column that's not included in the import, how do I add it?

  1. Add your desired column to the create statement in ddl/tables/create_schools.sql. Ensure it is the correct datatype.
  2. Then add your new column name to the list of target columns in dml/import_schools.sql and add a the corresponding column in the source data (i.e. the source CSV) to the select part of the statement. For datatypes other than varchar, cast appropriately with ::new_datatype There are plenty of examples of casting in the file already
  3. See if it worked, run make refresh to drop everything and re-import

Where can I find and how do I import other geographic data?

There are plenty of great sources for educational geographic data:

Once you've found a useful dataset, select the Shapefile download option if available. Now we can use shp2pgsql to import it. It comes with a GUI which makes the process very simple.

shp2pgsql spatial data loader

Alternatively, and more-flexibly (if the dataset you want isn't available as a Shapefile), you can use GDAL's ogr2ogr.

Nomenclature

Word Definition
EduBase The old name for Get information about schools (GIAS)
Ofsted The Office for Standards in Education, Children's Services and Skills (Ofsted) is a non-ministerial department of the UK government, reporting to Parliament.A
URN A six-digit number used by the UK government to identify educational establishments in the United Kingdom.

Example queries

"What's the breakdown of school genders by Ofsted rating?" ๐Ÿ˜•

select
  os.ofsted_rating as "Ofsted rating",
  os.gender,
  count(*)
from
  open_schools os
group by
  os.ofsted_rating,
  os.gender
order by
  os.ofsted_rating,
  os.gender
\crosstabview

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚    Ofsted rating     โ”‚ Boys โ”‚ Girls โ”‚ Mixed โ”‚ (null) โ”‚ Not applicable โ”‚
โ•žโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ก
โ”‚ Outstanding          โ”‚   58 โ”‚    95 โ”‚  3343 โ”‚      1 โ”‚                โ”‚
โ”‚ Good                 โ”‚  139 โ”‚   107 โ”‚ 13844 โ”‚      1 โ”‚                โ”‚
โ”‚ Requires improvement โ”‚   37 โ”‚    19 โ”‚  2017 โ”‚        โ”‚                โ”‚
โ”‚ Inadequate           โ”‚   20 โ”‚    13 โ”‚    70 โ”‚        โ”‚                โ”‚
โ”‚ Serious Weaknesses   โ”‚    2 โ”‚     1 โ”‚    97 โ”‚        โ”‚                โ”‚
โ”‚ Special Measures     โ”‚    7 โ”‚     1 โ”‚   165 โ”‚        โ”‚                โ”‚
โ”‚ (null)               โ”‚  166 โ”‚   225 โ”‚  4886 โ”‚    256 โ”‚           1345 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

"Find all the schools within 3km of Stonehenge" ๐Ÿค”

select
  os.urn,
  os.name
from
  open_schools os
where st_dwithin(
  os.coordinates,                         -- Database column that holds the school's location
  st_setsrid(
    st_makepoint(-1.826194, 51.178868),   -- Stonehenge's coords
    4326                                  -- World Geodetic System
  ),
  3000                                    -- Search radius in metres
);


โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  urn   โ”‚                     name                      โ”‚
โ•žโ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ก
โ”‚ 145545 โ”‚ Larkhill Primary School                       โ”‚
โ”‚ 143006 โ”‚ St Michael's Church of England Primary School โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Obligatory sense check ๐Ÿง

larkhill_primary

Looks good!

"I'd like a percentile summary of the twenty local authorities with the lowest average deprivation pupil premium, excluding authorities with fewer than fifteen qualifying schools" ๐Ÿคญ

select
    os.local_authority,
    percentile_disc(0.4) within group (order by dpp.allocation) as "P40", -- discrete percentile at 0.4 (40%)
    percentile_disc(0.5) within group (order by dpp.allocation) as "P50",
    percentile_disc(0.6) within group (order by dpp.allocation) as "P60",
    percentile_disc(0.7) within group (order by dpp.allocation) as "P70",
    percentile_disc(0.8) within group (order by dpp.allocation) as "P80",
    percentile_disc(0.9) within group (order by dpp.allocation) as "P90"
from
    deprivation_pupil_premium dpp
inner join
    open_schools os on dpp.urn = os.urn
group by
    os.local_authority
having
    count(*) > 15                                                         -- only select local authorities with more than fifteen schools
order by
    avg(dpp.allocation::decimal) asc                                      -- order by DPP allocation ascending, we want the lowest
limit
    20
;

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚     local_authority      โ”‚    P40     โ”‚    P50     โ”‚    P60     โ”‚    P70     โ”‚     P80     โ”‚     P90     โ”‚
โ•žโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ก
โ”‚ Rutland                  โ”‚ ยฃ13,200.00 โ”‚ ยฃ15,840.00 โ”‚ ยฃ18,480.00 โ”‚ ยฃ23,760.00 โ”‚  ยฃ43,560.00 โ”‚  ยฃ70,125.00 โ”‚
โ”‚ North Yorkshire          โ”‚ ยฃ10,560.00 โ”‚ ยฃ15,895.00 โ”‚ ยฃ25,080.00 โ”‚ ยฃ40,920.00 โ”‚  ยฃ62,645.00 โ”‚  ยฃ98,175.00 โ”‚
โ”‚ Cumbria                  โ”‚ ยฃ14,520.00 โ”‚ ยฃ20,570.00 โ”‚ ยฃ31,680.00 โ”‚ ยฃ48,620.00 โ”‚  ยฃ64,680.00 โ”‚ ยฃ118,800.00 โ”‚
โ”‚ West Berkshire           โ”‚ ยฃ19,800.00 โ”‚ ยฃ25,080.00 โ”‚ ยฃ38,280.00 โ”‚ ยฃ52,800.00 โ”‚  ยฃ74,800.00 โ”‚ ยฃ100,320.00 โ”‚
โ”‚ Windsor and Maidenhead   โ”‚ ยฃ26,400.00 โ”‚ ยฃ31,680.00 โ”‚ ยฃ43,560.00 โ”‚ ยฃ60,720.00 โ”‚  ยฃ74,800.00 โ”‚ ยฃ100,045.00 โ”‚
โ”‚ Wokingham                โ”‚ ยฃ22,440.00 โ”‚ ยฃ34,320.00 โ”‚ ยฃ43,560.00 โ”‚ ยฃ58,080.00 โ”‚  ยฃ73,865.00 โ”‚  ยฃ91,080.00 โ”‚
โ”‚ Herefordshire, County of โ”‚ ยฃ21,120.00 โ”‚ ยฃ26,400.00 โ”‚ ยฃ34,320.00 โ”‚ ยฃ50,490.00 โ”‚  ยฃ62,040.00 โ”‚ ยฃ118,800.00 โ”‚
โ”‚ Wiltshire                โ”‚ ยฃ21,120.00 โ”‚ ยฃ31,680.00 โ”‚ ยฃ44,880.00 โ”‚ ยฃ59,400.00 โ”‚  ยฃ81,840.00 โ”‚ ยฃ114,840.00 โ”‚
โ”‚ Buckinghamshire          โ”‚ ยฃ22,440.00 โ”‚ ยฃ29,040.00 โ”‚ ยฃ37,895.00 โ”‚ ยฃ47,685.00 โ”‚  ยฃ64,680.00 โ”‚ ยฃ135,465.00 โ”‚
โ”‚ Shropshire               โ”‚ ยฃ18,480.00 โ”‚ ยฃ23,760.00 โ”‚ ยฃ40,920.00 โ”‚ ยฃ62,645.00 โ”‚  ยฃ83,215.00 โ”‚ ยฃ125,290.00 โ”‚
โ”‚ Central Bedfordshire     โ”‚ ยฃ22,440.00 โ”‚ ยฃ35,640.00 โ”‚ ยฃ50,160.00 โ”‚ ยฃ63,360.00 โ”‚  ยฃ89,760.00 โ”‚ ยฃ128,095.00 โ”‚
โ”‚ Oxfordshire              โ”‚ ยฃ22,440.00 โ”‚ ยฃ30,360.00 โ”‚ ยฃ42,240.00 โ”‚ ยฃ61,380.00 โ”‚  ยฃ88,440.00 โ”‚ ยฃ134,173.00 โ”‚
โ”‚ Cheshire East            โ”‚ ยฃ19,800.00 โ”‚ ยฃ29,040.00 โ”‚ ยฃ47,520.00 โ”‚ ยฃ68,640.00 โ”‚ ยฃ100,320.00 โ”‚ ยฃ144,925.00 โ”‚
โ”‚ South Gloucestershire    โ”‚ ยฃ26,400.00 โ”‚ ยฃ34,320.00 โ”‚ ยฃ43,560.00 โ”‚ ยฃ62,040.00 โ”‚  ยฃ81,840.00 โ”‚ ยฃ136,043.00 โ”‚
โ”‚ Devon                    โ”‚ ยฃ21,120.00 โ”‚ ยฃ26,400.00 โ”‚ ยฃ44,880.00 โ”‚ ยฃ60,720.00 โ”‚  ยฃ92,400.00 โ”‚ ยฃ141,240.00 โ”‚
โ”‚ Dorset                   โ”‚ ยฃ22,440.00 โ”‚ ยฃ31,790.00 โ”‚ ยฃ47,520.00 โ”‚ ยฃ68,640.00 โ”‚  ยฃ92,400.00 โ”‚ ยฃ130,680.00 โ”‚
โ”‚ Gloucestershire          โ”‚ ยฃ23,760.00 โ”‚ ยฃ33,000.00 โ”‚ ยฃ49,060.00 โ”‚ ยฃ66,000.00 โ”‚  ยฃ96,305.00 โ”‚ ยฃ133,705.00 โ”‚
โ”‚ Leicestershire           โ”‚ ยฃ30,360.00 โ”‚ ยฃ44,880.00 โ”‚ ยฃ58,080.00 โ”‚ ยฃ71,280.00 โ”‚  ยฃ91,080.00 โ”‚ ยฃ134,640.00 โ”‚
โ”‚ Somerset                 โ”‚ ยฃ27,720.00 โ”‚ ยฃ36,960.00 โ”‚ ยฃ51,425.00 โ”‚ ยฃ72,600.00 โ”‚  ยฃ90,695.00 โ”‚ ยฃ153,120.00 โ”‚
โ”‚ Surrey                   โ”‚ ยฃ33,000.00 โ”‚ ยฃ42,240.00 โ”‚ ยฃ56,540.00 โ”‚ ยฃ75,240.00 โ”‚  ยฃ99,110.00 โ”‚ ยฃ141,240.00 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Try doing that in Excel ๐Ÿ˜…

"List all the currently-open schools in London excluding those in Kensington and Chelsea, Southwark, and Tower Hamlets" ๐Ÿคจ

with local_authorities_to_exclude as (
  select
    st_union(la.edge) as edges               -- union multiple edges into a single geometry
  from
    local_authorities la
  where
    name in (
      'Kensington and Chelsea',
      'Southwark',
      'Tower Hamlets'
    )
)
select
  distinct on (urn)
  os.urn,
  os.name,
  os.coordinates
from
  open_schools os
inner join                                   -- join on region containing coordinates
  regions r
    on st_contains(
      r.edge,
      os.coordinates::geometry
    )
inner join                                   -- exclude the named LAs from above
  local_authorities la
    on not st_contains(
      (select edges from local_authorities_to_exclude),
      os.coordinates::geometry
    )
where
  r.name = 'London'
;

There are too many results to list, but here's a screenshot displaying the results in QGIS. Note that QGIS fully supports PostGIS, all queries that include a geospatial column can be displayed and manipulated by the software and used to create reports or perform advanced queries.

Schools in London minus Kensington and Chelsea, Tower Hamlets and Southwark

gias-query-tool's People

Contributors

duncanjbrown avatar neilgilmour avatar peteryates avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

gias-query-tool's Issues

We may host an API for this - is that ok?

Hiya @peteryates, I'm planning to build an API for our internal use using this repo as a starting point but build the API in Laravel. We'd be very happy to open-source it and/or host it and offer free access with registration. As the BDFL of GIAS querying I wanted to get your thoughts and blessing...

Test the importer on macOS

It appears that when running make on macOS, the reencoding of the file, done by iconv might fail.

This is probably due to an inconsistency between the versions of iconv available on Linux and macOS. If anyone with a Mac fancies trying this out please let me know the result here and I'll update the Makefile accordingly.

Fix the importer

Some new columns have been added to the dataset which causes the importer to fail.

The changes are:

72c72
< ,"InspectorateName (name)"
---
> ,"BSOInspectorateName (name)"
119c119
< ,"Inspectorate (name)"
---
> ,"InspectorateName (name)"
131a132,137
> ,"QABName (code)"
> ,"QABName (name)"
> ,"EstablishmentAccredited (code)"
> ,"EstablishmentAccredited (name)"
> ,"QABReport"
> ,"CHNumber"
  • Add the new columns to schools_raw
  • Test the import to make sure it works

Update email address list

This FOI request should give us a more up-to-date list of email addresses than the one supplied in #2. Once the request has been fulfilled, this can be started.

  • Replace data/email-addresses-2019-16-01.csv with the updated version and remove unnecessary columns
  • Check the import works

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.