brazil-civil-registry-data's Introduction

Brazil Civil Registry Data

Raw scrapings of https://transparencia.registrocivil.org.br/

The idea is that everyone benefits if fewer people scrape their website, so this repo tries to keep the data as fine-grained as possible. Due to the design of their website, extracting detailed information can be costly.

If you feel any data you need is missing, please open an issue here.

Notice: This repo is just a copy of the data available at the site and isn't responsible for it. Please read their documentation.

Also, scraping the site is a continuous, incremental and lengthy process that may introduce additional errors in the data. Keep that in mind when analyzing it.

Tables

civil_registry_xxxxx.csv

Registrations at https://transparencia.registrocivil.org.br/registros

Monthly entries covering all reported cities and states since 2015. There are multiple sub-types; see below.

name             type      notes
start_date       date      yyyy-mm-dd; registration date period start (inclusive)
end_date         date      yyyy-mm-dd; registration date period end (inclusive)
state            string    Registration UF code
state_ibge_code  integer   Registration state IBGE code
city             string    Registration city name; if empty, xxxxx_total is state-wide
city_ibge_code   integer   Registration city IBGE code; if empty, xxxxx_total is state-wide
xxxxx_total      integer   Total registrations in the date period
created_at       datetime  yyyy-mm-dd hh:mm; approximate time the request to the server was made
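As a rough illustration of the layout above, here is a minimal sketch that reads such a file with Python's standard csv module and separates state-wide rows (empty city) from per-city rows. It assumes a comma-separated file with a header row; the sample rows are made-up values, and split_rows is a hypothetical helper name, not part of this repo.

```python
import csv
import io
from datetime import date

# Made-up sample in the civil_registry_deaths.csv layout (illustrative values only).
SAMPLE = """start_date,end_date,state,state_ibge_code,city,city_ibge_code,deaths_total,created_at
2020-01-01,2020-01-31,SP,35,,,100,2020-06-01 10:00
2020-01-01,2020-01-31,SP,35,Campinas,3509502,10,2020-06-01 10:00
"""

def split_rows(handle, total_column="deaths_total"):
    """Split rows into state-wide totals (empty city) and per-city totals."""
    state_rows, city_rows = [], []
    for row in csv.DictReader(handle):
        record = {
            "start_date": date.fromisoformat(row["start_date"]),
            "end_date": date.fromisoformat(row["end_date"]),
            "state": row["state"],
            "city": row["city"] or None,  # empty string means a state-wide row
            "total": int(row[total_column]),
        }
        (city_rows if record["city"] else state_rows).append(record)
    return state_rows, city_rows

state_rows, city_rows = split_rows(io.StringIO(SAMPLE))
```

For the births file, the same helper would be called with total_column="births_total", assuming the xxxxx placeholder expands that way.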

civil_registry_deaths.csv

Scrape of all-cause death registrations

civil_registry_births.csv

Scrape of birth certificate registrations

civil_registry_covid_xxxxx.csv

Scrape of natural-cause deaths at https://transparencia.registrocivil.org.br/especial-covid (from Causas Cardiacas)

Notice: The name covid comes from their panel. The table actually contains natural-cause deaths, not only COVID deaths.

Daily entries. There are multiple sub-types; see below.

name                         type      notes
date                         date      yyyy-mm-dd; occurrence date
state                        string    Occurrence UF code
state_ibge_code              integer   Occurrence state IBGE code
city                         string    [optional] Occurrence city name
city_ibge_code               integer   [optional] Occurrence city IBGE code
place                        string    [optional] Place(s) where the deaths occurred, + separated (hospital, home, public, others)
gender                       string    [optional] F, M
age_group                    string    [optional] Age group (9-, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100+, NA)
deaths_sars                  integer   Number of SARS deaths (SRAG)
deaths_pneumonia             integer   Number of pneumonia deaths (PNEUMONIA)
deaths_respiratory_failure   integer   Number of respiratory failure deaths (INSUFICIENCIA_RESPIRATORIA)
deaths_septicemia            integer   Number of septicemia deaths (SEPTICEMIA)
deaths_indeterminate         integer   Number of indeterminate deaths (INDETERMINADA)
deaths_others                integer   Number of other deaths (OUTRAS)
deaths_covid19               integer   Number of COVID-19 only deaths (COVID)
deaths_stroke                integer   Number of stroke deaths (AVC)
deaths_stroke_covid19        integer   Number of stroke deaths with COVID-19 (COVID_AVC)
deaths_cardiopathy           integer   Number of cardiopathy deaths (CARDIOPATIA)
deaths_cardiogenic_shock     integer   Number of cardiogenic shock deaths (CHOQUE_CARD)
deaths_heart_attack          integer   Number of heart attack deaths (INFARTO)
deaths_heart_attack_covid19  integer   Number of heart attack deaths with COVID-19 (COVID_INFARTO)
deaths_sudden_cardiac        integer   Number of sudden cardiac arrest deaths (SUBITA)
created_at                   datetime  yyyy-mm-dd hh:mm; approximate time the data was produced according to the server
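Since the place column packs several values into one "+"-separated string, a small parser can help when filtering rows. The following is a minimal sketch assuming the four place values listed in the table are the only ones that occur; parse_places is a hypothetical helper name.

```python
# Place values as listed in the table above (assumed to be exhaustive).
PLACES = {"hospital", "home", "public", "others"}

def parse_places(value):
    """Return the set of places in a '+'-separated field; empty set if the field is empty."""
    if not value:
        return set()
    places = {part.strip() for part in value.split("+")}
    unknown = places - PLACES
    if unknown:
        # Fail loudly rather than silently accept a value the table does not document.
        raise ValueError(f"unexpected place value(s): {sorted(unknown)}")
    return places
```

For example, parse_places("hospital+home") yields the set containing "hospital" and "home", and an empty field yields an empty set.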

Notice: On the site, there are some displayed aggregations:

Name                                                           Aggregation
COVID-19                                                       deaths_covid19 + deaths_stroke_covid19 + deaths_heart_attack_covid19
Demais óbitos cardiovasculares (other cardiovascular deaths)   deaths_cardiopathy + deaths_cardiogenic_shock + deaths_sudden_cardiac
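The two aggregations above can be reproduced from a single row as in the sketch below. It assumes each row is a mapping from column name to a value convertible to int, and that a missing or empty cell should count as zero; site_aggregations and the example row are hypothetical.

```python
COVID_COLUMNS = (
    "deaths_covid19",
    "deaths_stroke_covid19",
    "deaths_heart_attack_covid19",
)
OTHER_CARDIO_COLUMNS = (
    "deaths_cardiopathy",
    "deaths_cardiogenic_shock",
    "deaths_sudden_cardiac",
)

def _sum_columns(row, columns):
    # Missing or empty cells are treated as zero (an assumption, not documented by the site).
    return sum(int(row.get(column) or 0) for column in columns)

def site_aggregations(row):
    """Compute the two aggregated figures the site displays, from one table row."""
    return {
        "COVID-19": _sum_columns(row, COVID_COLUMNS),
        "Demais óbitos cardiovasculares": _sum_columns(row, OTHER_CARDIO_COLUMNS),
    }

# Made-up row, as csv.DictReader would return it (all values are strings).
example_row = {
    "deaths_covid19": "10",
    "deaths_stroke_covid19": "2",
    "deaths_heart_attack_covid19": "3",
    "deaths_cardiopathy": "4",
    "deaths_cardiogenic_shock": "",
    "deaths_sudden_cardiac": "1",
}
totals = site_aggregations(example_row)
```

Summing the three COVID columns is also how a total COVID-19 count per region would be obtained from these files.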

civil_registry_covid_states.csv

Table (no gender or age group) for all 27 Brazilian states, since 2018

civil_registry_covid_cities.csv

Table (no gender or age group) for Brazilian cities with a population over 100,000 plus the capitals (about 317), since 2018

civil_registry_covid_states_detailed.csv

Table (with gender and age group) for all 27 Brazilian states, since 2019

civil_registry_covid_cities_detailed.csv

Table (with gender and age group) for Brazilian cities with a population over 500,000 plus the capitals (about 56), since 2019

Notice: The repo is normally updated daily, except for the detailed scrapes, which are normally updated weekly (they take more than a day to run).

Changelog

2021-04-10

  • Added civil_registry_births.csv containing birth certificates
  • To improve scraping speed, the detailed covid scrapes now reuse data from older scrapes if they detect no data change in broader queries (quarterly, monthly, etc.). The created_at column may therefore show older dates, since the data did not change and was reused.
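The reuse idea can be sketched roughly as follows. This is not the repo's actual scraper code: scrape_period, fetch_summary, fetch_details and the cache layout are all hypothetical names chosen for illustration, and the sketch assumes a broad query returns a value that can be compared for equality between runs.

```python
from datetime import datetime

def scrape_period(period, fetch_summary, fetch_details, cache, now=datetime.now):
    """Re-run the detailed queries for `period` only if its broad summary changed."""
    summary = fetch_summary(period)  # one cheap, broad query (e.g. a whole month)
    cached = cache.get(period)
    if cached is not None and cached["summary"] == summary:
        # Nothing changed: reuse the old rows, which keep their old created_at.
        return cached["rows"]
    stamp = now().strftime("%Y-%m-%d %H:%M")
    rows = [dict(row, created_at=stamp) for row in fetch_details(period)]
    cache[period] = {"summary": summary, "rows": rows}
    return rows
```

With this shape, a second call for an unchanged period costs one broad query instead of the full set of detailed queries, which is why created_at can lag behind the date of the latest run.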

2021-01-16

Now includes year 2021

2020-06-26

Added cardiac causes, 7 more columns, from deaths_stroke to deaths_sudden_cardiac, as committed at cd7a6b3

Notice: COVID-19 deaths are now split in three columns (deaths_covid19, deaths_stroke_covid19, deaths_heart_attack_covid19), see table above.

2020-06-21

Fixes #4 by adding capital cities to detailed, as committed at fbef16

2020-06-13

To fix #3, the city of "Brasilia" (ibge_code=5300108) now contains the data for the whole state "DF" (ibge_code=53), as committed at a043e3d3

IBGE codes

https://www.ibge.gov.br/explica/codigos-dos-municipios.php

Licensing

Creative Commons Attribution ShareAlike

Please mention the original source and this repo.

More information, special thanks

brazil-civil-registry-data's People

Contributors

capyvara

brazil-civil-registry-data's Issues

Include administrative regions in DF

The civil registry returns the administrative regions separately: Brasilia, Ceilandia, Gama, Guara, etc.

We have two options:

  • Aggregate everything as if it were Brasilia
  • Keep them separate and accept that, besides cities, administrative regions (with longer codes) will also be included

Add capital cities to detailed scrap

Some capitals with a population under 500k were not making it into the list. Missing:

TO (Palmas 1721000)
AC (Rio Branco 1200401)
RR (Boa Vista 1400100)
ES (Vitória 3205309)

Detailed scraps taking too long, specially cities

The detailed scrapes are currently taking too long. cities_detailed needs ~151840 queries (52 cities * 365 days * 4 places * 2 genders) and takes multiple days to complete. It would have to stay below ~100k queries (which would still take several hours).

Retries are necessary because the server goes into an update/maintenance window during that time.

Options:

  • Query by month: ~4992 queries
  • Query by epidemiological week: ~43382 queries (2020 and 2019 would have to be done separately)
  • Remove cities by raising the population cap or keeping only capitals (see below); each city removed would save ~2920 queries, and 17 would have to go
  • Remove places: ~37960 queries

It would also be possible to split the work, for example keeping one monthly scrape and one per epidemiological week.
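The query counts quoted in this issue follow from simple multiplication, reproduced in the sketch below. The variable names are illustrative only, and the epidemiological-week figure is left out because the issue does not state how it was derived.

```python
# Dimensions quoted in the issue text.
CITIES, DAYS, MONTHS, PLACES, GENDERS = 52, 365, 12, 4, 2

daily_queries = CITIES * DAYS * PLACES * GENDERS      # current daily scrape
monthly_queries = CITIES * MONTHS * PLACES * GENDERS  # option: query by month
queries_per_city = DAYS * PLACES * GENDERS            # saved per city removed
without_places = CITIES * DAYS * GENDERS              # option: drop the place dimension
after_removing_17 = daily_queries - 17 * queries_per_city

print(daily_queries)      # 151840
print(monthly_queries)    # 4992
print(queries_per_city)   # 2920
print(without_places)     # 37960
print(after_removing_17)  # 102200
```

Removing 17 cities brings the daily scrape to 102200 queries, which is roughly the ~100k target mentioned in the issue.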

Row(state='PA', state_ibge_code=15, city_ibge_code=1500800, city='Ananindeua', estimated_population=530598, is_capital=None)
Row(state='GO', state_ibge_code=52, city_ibge_code=5201405, city='Aparecida de Goiânia', estimated_population=578179, is_capital=None)
Row(state='SE', state_ibge_code=28, city_ibge_code=2800308, city='Aracaju', estimated_population=657013, is_capital=1)
Row(state='PA', state_ibge_code=15, city_ibge_code=1501402, city='Belém', estimated_population=1492745, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3300456, city='Belford Roxo', estimated_population=510906, is_capital=None)
Row(state='MG', state_ibge_code=31, city_ibge_code=3106200, city='Belo Horizonte', estimated_population=2512070, is_capital=1)
Row(state='RR', state_ibge_code=14, city_ibge_code=1400100, city='Boa Vista', estimated_population=399213, is_capital=1)
Row(state='DF', state_ibge_code=53, city_ibge_code=5300108, city='Brasília', estimated_population=3015268, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3509502, city='Campinas', estimated_population=1204073, is_capital=None)
Row(state='MS', state_ibge_code=50, city_ibge_code=5002704, city='Campo Grande', estimated_population=895982, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3301009, city='Campos dos Goytacazes', estimated_population=507548, is_capital=None)
Row(state='RS', state_ibge_code=43, city_ibge_code=4305108, city='Caxias do Sul', estimated_population=510906, is_capital=None)
Row(state='MG', state_ibge_code=31, city_ibge_code=3118601, city='Contagem', estimated_population=663855, is_capital=None)
Row(state='MT', state_ibge_code=51, city_ibge_code=5103403, city='Cuiabá', estimated_population=612547, is_capital=1)
Row(state='PR', state_ibge_code=41, city_ibge_code=4106902, city='Curitiba', estimated_population=1933105, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3301702, city='Duque de Caxias', estimated_population=919596, is_capital=None)
Row(state='BA', state_ibge_code=29, city_ibge_code=2910800, city='Feira de Santana', estimated_population=614872, is_capital=None)
Row(state='SC', state_ibge_code=42, city_ibge_code=4205407, city='Florianópolis', estimated_population=500973, is_capital=1)
Row(state='CE', state_ibge_code=23, city_ibge_code=2304400, city='Fortaleza', estimated_population=2669342, is_capital=1)
Row(state='GO', state_ibge_code=52, city_ibge_code=5208707, city='Goiânia', estimated_population=1516113, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3518800, city='Guarulhos', estimated_population=1379182, is_capital=None)
Row(state='PE', state_ibge_code=26, city_ibge_code=2607901, city='Jaboatão dos Guararapes', estimated_population=702298, is_capital=None)
Row(state='PB', state_ibge_code=25, city_ibge_code=2507507, city='João Pessoa', estimated_population=809015, is_capital=1)
Row(state='SC', state_ibge_code=42, city_ibge_code=4209102, city='Joinville', estimated_population=590466, is_capital=None)
Row(state='MG', state_ibge_code=31, city_ibge_code=3136702, city='Juiz de Fora', estimated_population=568873, is_capital=None)
Row(state='PR', state_ibge_code=41, city_ibge_code=4113700, city='Londrina', estimated_population=569733, is_capital=None)
Row(state='AP', state_ibge_code=16, city_ibge_code=1600303, city='Macapá', estimated_population=503327, is_capital=1)
Row(state='AL', state_ibge_code=27, city_ibge_code=2704302, city='Maceió', estimated_population=1018948, is_capital=1)
Row(state='AM', state_ibge_code=13, city_ibge_code=1302603, city='Manaus', estimated_population=2182763, is_capital=1)
Row(state='RN', state_ibge_code=24, city_ibge_code=2408102, city='Natal', estimated_population=884122, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3303302, city='Niterói', estimated_population=513584, is_capital=None)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3303500, city='Nova Iguaçu', estimated_population=821128, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3534401, city='Osasco', estimated_population=698418, is_capital=None)
Row(state='TO', state_ibge_code=17, city_ibge_code=1721000, city='Palmas', estimated_population=299127, is_capital=1)
Row(state='RS', state_ibge_code=43, city_ibge_code=4314902, city='Porto Alegre', estimated_population=1483771, is_capital=1)
Row(state='RO', state_ibge_code=11, city_ibge_code=1100205, city='Porto Velho', estimated_population=529544, is_capital=1)
Row(state='PE', state_ibge_code=26, city_ibge_code=2611606, city='Recife', estimated_population=1645727, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3543402, city='Ribeirão Preto', estimated_population=703293, is_capital=None)
Row(state='AC', state_ibge_code=12, city_ibge_code=1200401, city='Rio Branco', estimated_population=407319, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3304557, city='Rio de Janeiro', estimated_population=6718903, is_capital=1)
Row(state='BA', state_ibge_code=29, city_ibge_code=2927408, city='Salvador', estimated_population=2872347, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3547809, city='Santo André', estimated_population=718773, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3548708, city='São Bernardo do Campo', estimated_population=838936, is_capital=None)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3304904, city='São Gonçalo', estimated_population=1084839, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3549904, city='São José dos Campos', estimated_population=721944, is_capital=None)
Row(state='MA', state_ibge_code=21, city_ibge_code=2111300, city='São Luís', estimated_population=1101884, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3550308, city='São Paulo', estimated_population=12252023, is_capital=1)
Row(state='ES', state_ibge_code=32, city_ibge_code=3205002, city='Serra', estimated_population=517510, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3552205, city='Sorocaba', estimated_population=679378, is_capital=None)
Row(state='PI', state_ibge_code=22, city_ibge_code=2211001, city='Teresina', estimated_population=864845, is_capital=1)
Row(state='MG', state_ibge_code=31, city_ibge_code=3170206, city='Uberlândia', estimated_population=691305, is_capital=None)
Row(state='ES', state_ibge_code=32, city_ibge_code=3205309, city='Vitória', estimated_population=362097, is_capital=1)

Add skin color

The following options are now available:
(Indiferente) or Amarela, Branca, Ignorada, Indigena, Parda, Preta

This would multiply the permutations by 6, which is not feasible for the detailed scrapes, so we need to find an alternative.

Can we aggregate some of them (e.g. Parda and Preta)?
How important is this data?
Ideas?

2019 data in covid-specific files?

Hi Marcelo,

Thank you so much for providing this valuable resource! I noticed that in the file "civil_registry_covid_cities_detailed.csv" there are deaths listed for 2019, but of course COVID wasn't around in 2019. Do you have any idea why that might be? I know you are not the curator of these data, just thought you might have some easily available insight.

A little about myself, in case you're curious: I am an infectious disease researcher and epidemiologist working on understanding the transition of SARS-CoV-2 to endemicity (e.g., see our latest paper https://doi.org/10.1126/science.abe6522). I am currently working through hypotheses as to what may be causing the second wave in Manaus and having age-stratified excess mortality data would be hugely useful. Getting a better grasp on the details here may be of great import.

Thanks for any insight you might be able to provide!

With gratitude,
Jennie Lavine

Questions about the data extracted from the Registro Civil

Hi, how are you? I saw the Covid-19 data you are extracting from the civil registry database and would like to ask a few questions. Do you know why Covid-19 deaths are split into three variables:

  • deaths_stroke_covid19 - Number of stroke deaths with Covid-19;
  • deaths_heart_attack_covid19 - Number of heart attack deaths with Covid-19; and
  • deaths_covid19.

To find the number of Covid-19 deaths for each region, do the three variables above need to be added together? Is it possible to get Covid-19 mortality associated with other factors such as diabetes? Best regards, congratulations on the work, and thank you very much for your attention.

Extract 2018 data

Check cost/benefit

There are still some issues with RJ, so that data is still being updated.

Add home city

There is now a field "cidade_id_tipo": "cityFalecimento", so it looks like they are going to add a city-of-residence option.

How important is this data, and does it make sense to put it in the same .csv?
