Files for the Database Laboratory project. The project models data about higher education.
All the data comes from Inep and can be downloaded as a single zip file (258 MB).
The files are structured as follows:
- ANEXOS/ contains all the files that describe the data;
- DADOS/ contains all the database files (zip -> csv);
- FILTROS/ contains one file with some tips on how to model the data in SQL; and
- LEIA-ME/ contains a file that explains how to open the database files in R, SPSS and SAS.
One important file is ANEXOS/ANEXO I - Dicionário de Dados e Tabelas Auxiliares/Dicionário_de_Dados.xlsx, which contains the description of each field in every database file.
This modeling shows how all the data is structured.
The next steps describe how to set up the whole environment.
Execute the following commands:
git clone https://github.com/danielventurini/ideal-memory
cd ideal-memory/
curl download.inep.gov.br/microdados/microdados_educacao_superior_2017.zip --output microdados.zip
unzip microdados.zip
mv Microdados_Educacao_Superior_2017 microdados # rename path
cd microdados/DADOS/
find . -name "*.zip" -exec unzip {} \; # extract all files
Using PostgreSQL, execute the query files in the following order:
Before executing create_temps.sql, open it at lines 406-411 and change /[absolute/path]/ to your absolute path.
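The path change can also be scripted instead of edited by hand. The following is a minimal sketch, assuming the placeholder appears in the file exactly as /[absolute/path]/; the helper name set_data_path and the output file name are hypothetical and not part of this repository.

```shell
# Hypothetical helper: print the SQL file with the placeholder path replaced.
# Arguments: $1 = SQL file, $2 = absolute path to use (should end with "/").
set_data_path() {
  sql_file=$1
  data_path=$2
  # "|" is the sed delimiter, so the slashes in both paths need no escaping.
  # The brackets are escaped because they are literal characters in the placeholder.
  sed "s|/\[absolute/path\]/|${data_path}|g" "$sql_file"
}

# Example (assumption: the placeholder stands for the current directory):
# set_data_path create_temps.sql "$(pwd)/" > create_temps.local.sql
```

Writing the result to a separate file leaves the original script untouched. The sketch does not handle paths that contain "|" or "&", which sed would interpret specially.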
Tip: before executing the files, you can start from an empty public schema with the following commands:
drop schema if exists public cascade;
create schema public;
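One way to run the query files from the command line is a small wrapper around psql. This is only a sketch: it assumes psql is on the PATH and can connect to the database without extra options, and the names run_sql_files, PSQL and my_database are hypothetical. The files run in whatever order they are passed.

```shell
# Hypothetical helper: run SQL files in the order given, stopping at the first error.
# Arguments: $1 = database name, remaining arguments = SQL files.
# PSQL may be set to another command (for a dry run, for example); it defaults to psql.
run_sql_files() {
  db=$1
  shift
  for f in "$@"; do
    echo "running $f" >&2
    # ON_ERROR_STOP makes psql exit with a non-zero status when a statement fails.
    "${PSQL:-psql}" -v ON_ERROR_STOP=1 -d "$db" -f "$f" || return 1
  done
}

# Example (assumption: a database named my_database already exists):
# run_sql_files my_database create_temps.sql
```

Stopping at the first error avoids running later files against tables that were only partially created.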
- The DM_CURSO.csv file contains a single invalid record. At line 6302, column T, the TP_ATRIBUTO_INGRESSO field holds a value that does not exist in the TP_ATRIBUTO_INGRESSO table. The only values in that table are 0, 1 and 2, but the value at line 6302 is 3. For this reason, the last line of create_temps.sql deletes this record from curso_temp.
- The file insert_tables.sql has an UPDATE that takes many hours to run: line 995.
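The invalid TP_ATRIBUTO_INGRESSO value described above can be located before loading the data. The sketch below assumes the CSV has a header row and uses "|" as the field separator (pass a different separator as the third argument if that is not the case); the helper name find_invalid_values is hypothetical.

```shell
# Hypothetical helper: print "line:value" for every row whose column is not 0, 1 or 2.
# Arguments: $1 = CSV file, $2 = column name, $3 = field separator (assumed "|").
find_invalid_values() {
  awk -F"${3:-|}" -v col="$2" '
    # Drop carriage returns and double quotes so names and values compare cleanly.
    { gsub(/\r|"/, "") }
    # Header row: remember the position of the requested column.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    # Data rows: report anything other than a single 0, 1 or 2.
    idx && $idx !~ /^[012]$/ { print NR ":" $idx }
  ' "$1"
}

# Example:
# find_invalid_values DM_CURSO.csv TP_ATRIBUTO_INGRESSO
```

Looking the column up by name rather than by position keeps the check independent of the column order. Empty values are reported as well, since they do not match 0, 1 or 2.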