Ever wanted quickly query the Get Information About Schools dataset but couldn't because it's provided in an unwieldy CSV file that's not properly-encoded and has too many columns?
Great, you're in the right place!
This set of tools downloads and imports the data into a locally-running PostgreSQL database and lets you take advantage of PostGIS to supercharge your queries.
- GNU Make, used to run the automatic download and import
- GNU Iconv, fix file encoding
- GNU Wget for downloading the GIAS CSV file
- PostgreSQL with an local superuser account
- PostGIS for geographic query goodness
To download, cleanse, import and build the data objects only a single command should be required.
make
When debugging, use make refresh
to run the import steps without repeatedly downloading
the export file.
make refresh
The entire import, apart from file cleansing, is written in standard SQL. Executing
the statements needs to be done in the correct order, the Makefile
is the best
place to get a feel for how it works.
Dependencies
pip install bigquery-schema-generator
brew install gcloud-cli
gcloud auth login
gcloud config set project PROJECT_ID
The script assumes you have a gias
dataset in your BigQuery project and a
Google Cloud Storage bucket called rugged-abacus-uploads
.
First build the tables in your local database
make
Then run the import
make load_to_bq
The importer creates the following database objects:
Name | Type | Description |
---|---|---|
schools |
table |
All schools, both open and closed |
deprivation_pupil_premium |
table |
DPP information broken down by school |
open_schools |
materialized view |
Only open schools |
regions |
table |
England's regions and associated gegoraphic information |
local_authorities |
table |
England's local authorities and associated gegoraphic information |
establishment |
type |
School types (eg. Foundation school, Free school) |
establishment_group |
type |
School categories (eg. Independent Schools, Universities, Colleges |
gender |
type |
School gender policies (eg. Boys, Girls, Mixed) |
ofsted_rating |
type |
All Ofsted ratings, including deprecated ones |
phase |
type |
School phases (eg. Secondary, Primary, 16 plus) |
rural_urban_classification |
type |
Classification of a school's setting, source links in definition |
Why use enumerated types when you could've just used a varchar
?
Efficiency aside, the main reason is to allow ordering by rank rather than alphabetic position.
- Add your desired column to the
create
statement inddl/tables/create_schools.sql
. Ensure it is the correct datatype. - Then add your new column name to the list of target columns in
dml/import_schools.sql
and add a the corresponding column in the source data (i.e. the source CSV) to theselect
part of the statement. For datatypes other thanvarchar
, cast appropriately with::new_datatype
There are plenty of examples of casting in the file already - See if it worked, run
make refresh
to drop everything and re-import
There are plenty of great sources for educational geographic data:
Once you've found a useful dataset, select the Shapefile download option
if available. Now we can use shp2pgsql
to import it. It comes with a GUI which makes the process very simple.
Alternatively, and more-flexibly (if the dataset you want isn't available as a Shapefile), you can use GDAL's ogr2ogr.
Word | Definition |
---|---|
EduBase | The old name for Get information about schools (GIAS) |
Ofsted | The Office for Standards in Education, Children's Services and Skills (Ofsted) is a non-ministerial department of the UK government, reporting to Parliament.A |
URN | A six-digit number used by the UK government to identify educational establishments in the United Kingdom. |
select
os.ofsted_rating as "Ofsted rating",
os.gender,
count(*)
from
open_schools os
group by
os.ofsted_rating,
os.gender
order by
os.ofsted_rating,
os.gender
\crosstabview
โโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโฌโโโโโโโโฌโโโโโโโโฌโโโโโโโโโฌโโโโโโโโโโโโโโโโโ
โ Ofsted rating โ Boys โ Girls โ Mixed โ (null) โ Not applicable โ
โโโโโโโโโโโโโโโโโโโโโโโโชโโโโโโโชโโโโโโโโชโโโโโโโโชโโโโโโโโโชโโโโโโโโโโโโโโโโโก
โ Outstanding โ 58 โ 95 โ 3343 โ 1 โ โ
โ Good โ 139 โ 107 โ 13844 โ 1 โ โ
โ Requires improvement โ 37 โ 19 โ 2017 โ โ โ
โ Inadequate โ 20 โ 13 โ 70 โ โ โ
โ Serious Weaknesses โ 2 โ 1 โ 97 โ โ โ
โ Special Measures โ 7 โ 1 โ 165 โ โ โ
โ (null) โ 166 โ 225 โ 4886 โ 256 โ 1345 โ
โโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโดโโโโโโโโดโโโโโโโโดโโโโโโโโโดโโโโโโโโโโโโโโโโโ
"Find all the schools within 3km of Stonehenge" ๐ค
select
os.urn,
os.name
from
open_schools os
where st_dwithin(
os.coordinates, -- Database column that holds the school's location
st_setsrid(
st_makepoint(-1.826194, 51.178868), -- Stonehenge's coords
4326 -- World Geodetic System
),
3000 -- Search radius in metres
);
โโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ urn โ name โ
โโโโโโโโโโชโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโก
โ 145545 โ Larkhill Primary School โ
โ 143006 โ St Michael's Church of England Primary School โ
โโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Obligatory sense check ๐ง
Looks good!
"I'd like a percentile summary of the twenty local authorities with the lowest average deprivation pupil premium, excluding authorities with fewer than fifteen qualifying schools" ๐คญ
select
os.local_authority,
percentile_disc(0.4) within group (order by dpp.allocation) as "P40", -- discrete percentile at 0.4 (40%)
percentile_disc(0.5) within group (order by dpp.allocation) as "P50",
percentile_disc(0.6) within group (order by dpp.allocation) as "P60",
percentile_disc(0.7) within group (order by dpp.allocation) as "P70",
percentile_disc(0.8) within group (order by dpp.allocation) as "P80",
percentile_disc(0.9) within group (order by dpp.allocation) as "P90"
from
deprivation_pupil_premium dpp
inner join
open_schools os on dpp.urn = os.urn
group by
os.local_authority
having
count(*) > 15 -- only select local authorities with more than fifteen schools
order by
avg(dpp.allocation::decimal) asc -- order by DPP allocation ascending, we want the lowest
limit
20
;
โโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโ
โ local_authority โ P40 โ P50 โ P60 โ P70 โ P80 โ P90 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโชโโโโโโโโโโโโโชโโโโโโโโโโโโโชโโโโโโโโโโโโโชโโโโโโโโโโโโโชโโโโโโโโโโโโโโชโโโโโโโโโโโโโโก
โ Rutland โ ยฃ13,200.00 โ ยฃ15,840.00 โ ยฃ18,480.00 โ ยฃ23,760.00 โ ยฃ43,560.00 โ ยฃ70,125.00 โ
โ North Yorkshire โ ยฃ10,560.00 โ ยฃ15,895.00 โ ยฃ25,080.00 โ ยฃ40,920.00 โ ยฃ62,645.00 โ ยฃ98,175.00 โ
โ Cumbria โ ยฃ14,520.00 โ ยฃ20,570.00 โ ยฃ31,680.00 โ ยฃ48,620.00 โ ยฃ64,680.00 โ ยฃ118,800.00 โ
โ West Berkshire โ ยฃ19,800.00 โ ยฃ25,080.00 โ ยฃ38,280.00 โ ยฃ52,800.00 โ ยฃ74,800.00 โ ยฃ100,320.00 โ
โ Windsor and Maidenhead โ ยฃ26,400.00 โ ยฃ31,680.00 โ ยฃ43,560.00 โ ยฃ60,720.00 โ ยฃ74,800.00 โ ยฃ100,045.00 โ
โ Wokingham โ ยฃ22,440.00 โ ยฃ34,320.00 โ ยฃ43,560.00 โ ยฃ58,080.00 โ ยฃ73,865.00 โ ยฃ91,080.00 โ
โ Herefordshire, County of โ ยฃ21,120.00 โ ยฃ26,400.00 โ ยฃ34,320.00 โ ยฃ50,490.00 โ ยฃ62,040.00 โ ยฃ118,800.00 โ
โ Wiltshire โ ยฃ21,120.00 โ ยฃ31,680.00 โ ยฃ44,880.00 โ ยฃ59,400.00 โ ยฃ81,840.00 โ ยฃ114,840.00 โ
โ Buckinghamshire โ ยฃ22,440.00 โ ยฃ29,040.00 โ ยฃ37,895.00 โ ยฃ47,685.00 โ ยฃ64,680.00 โ ยฃ135,465.00 โ
โ Shropshire โ ยฃ18,480.00 โ ยฃ23,760.00 โ ยฃ40,920.00 โ ยฃ62,645.00 โ ยฃ83,215.00 โ ยฃ125,290.00 โ
โ Central Bedfordshire โ ยฃ22,440.00 โ ยฃ35,640.00 โ ยฃ50,160.00 โ ยฃ63,360.00 โ ยฃ89,760.00 โ ยฃ128,095.00 โ
โ Oxfordshire โ ยฃ22,440.00 โ ยฃ30,360.00 โ ยฃ42,240.00 โ ยฃ61,380.00 โ ยฃ88,440.00 โ ยฃ134,173.00 โ
โ Cheshire East โ ยฃ19,800.00 โ ยฃ29,040.00 โ ยฃ47,520.00 โ ยฃ68,640.00 โ ยฃ100,320.00 โ ยฃ144,925.00 โ
โ South Gloucestershire โ ยฃ26,400.00 โ ยฃ34,320.00 โ ยฃ43,560.00 โ ยฃ62,040.00 โ ยฃ81,840.00 โ ยฃ136,043.00 โ
โ Devon โ ยฃ21,120.00 โ ยฃ26,400.00 โ ยฃ44,880.00 โ ยฃ60,720.00 โ ยฃ92,400.00 โ ยฃ141,240.00 โ
โ Dorset โ ยฃ22,440.00 โ ยฃ31,790.00 โ ยฃ47,520.00 โ ยฃ68,640.00 โ ยฃ92,400.00 โ ยฃ130,680.00 โ
โ Gloucestershire โ ยฃ23,760.00 โ ยฃ33,000.00 โ ยฃ49,060.00 โ ยฃ66,000.00 โ ยฃ96,305.00 โ ยฃ133,705.00 โ
โ Leicestershire โ ยฃ30,360.00 โ ยฃ44,880.00 โ ยฃ58,080.00 โ ยฃ71,280.00 โ ยฃ91,080.00 โ ยฃ134,640.00 โ
โ Somerset โ ยฃ27,720.00 โ ยฃ36,960.00 โ ยฃ51,425.00 โ ยฃ72,600.00 โ ยฃ90,695.00 โ ยฃ153,120.00 โ
โ Surrey โ ยฃ33,000.00 โ ยฃ42,240.00 โ ยฃ56,540.00 โ ยฃ75,240.00 โ ยฃ99,110.00 โ ยฃ141,240.00 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโโโโโ
Try doing that in Excel ๐
"List all the currently-open schools in London excluding those in Kensington and Chelsea, Southwark, and Tower Hamlets" ๐คจ
with local_authorities_to_exclude as (
select
st_union(la.edge) as edges -- union multiple edges into a single geometry
from
local_authorities la
where
name in (
'Kensington and Chelsea',
'Southwark',
'Tower Hamlets'
)
)
select
distinct on (urn)
os.urn,
os.name,
os.coordinates
from
open_schools os
inner join -- join on region containing coordinates
regions r
on st_contains(
r.edge,
os.coordinates::geometry
)
inner join -- exclude the named LAs from above
local_authorities la
on not st_contains(
(select edges from local_authorities_to_exclude),
os.coordinates::geometry
)
where
r.name = 'London'
;
There are too many results to list, but here's a screenshot displaying the results in QGIS. Note that QGIS fully supports PostGIS, all queries that include a geospatial column can be displayed and manipulated by the software and used to create reports or perform advanced queries.