owid / co2-data

Data on CO2 and greenhouse gas emissions by Our World in Data

Home Page: https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions

Python 100.00%

co2-emissions energy environment greenhouse-gas-emissions

co2-data's Issues

Updates

I would like to know whether this dataset is still being updated and, if so, how often.

I saw in the changelog:
April 15, 2022:
Updated primary energy consumption data.
Updated CO2 data to include aggregations for the different country income levels.

However, when I look into the JSON file, I can't see anything after 2020. Am I missing something?

One more question: is it possible to get live data for CO2 emissions?

Eswatini doesn't have co2 per capita data

Not sure if there's a good reason for this, but it appears that Eswatini has data for both co2 and population, but not co2 per capita? Is this as simple as a division? (looks like it happens here: https://github.com/owid/co2-data/blob/master/scripts/co2_emissions.py#L230-L234)
The same omissions are in the csv and xlsx files

Edit: co2 per gdp is also missing
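
A minimal sketch of how the gap could be checked and filled by plain division. It assumes co2 is in million tonnes and co2_per_capita in tonnes per person, which matches the Afghanistan 1949 row of owid-co2-data.csv but is still an assumption about the units. The helper names are hypothetical and the second row is a labelled placeholder, not Eswatini data; this is not the logic in co2_emissions.py.

```python
import numpy as np
import pandas as pd


def find_missing_per_capita(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows where co2 and population exist but co2_per_capita is empty."""
    has_inputs = df["co2"].notna() & df["population"].notna()
    return df[has_inputs & df["co2_per_capita"].isna()]


def fill_per_capita(df: pd.DataFrame) -> pd.DataFrame:
    """Fill empty co2_per_capita cells by division.

    Assumption: co2 is in million tonnes, so multiply by 1e6 to get tonnes.
    """
    out = df.copy()
    computed = (out["co2"] * 1e6 / out["population"]).round(3)
    out["co2_per_capita"] = out["co2_per_capita"].fillna(computed)
    return out


df = pd.DataFrame(
    {
        # First row is from the CSV; second row is a made-up placeholder.
        "country": ["Afghanistan", "Placeholder"],
        "year": [1949, 2000],
        "co2": [0.015, 1.0],
        "population": [7663783, 1_000_000],
        "co2_per_capita": [0.002, np.nan],
    }
)

missing = find_missing_per_capita(df)
filled = fill_per_capita(df)
```

Running find_missing_per_capita on the published CSV would list every country and year affected, which would show whether Eswatini is an isolated case.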

Update Global Carbon Project to 2020 data

In late 2020, the Global Carbon Project released an update of its data, which this repo uses as co2-data/scripts/input/co2/co2_gcp.xlsx

Our input should be updated to the 2020 version, and the pipeline should be re-run.

CO2 per capita reporting constant zero

Hi there,
The co2_per_capita column has the value of zero across all countries/regions and years in both the csv and xls files. I checked the python script but with my limited python debugging skills I didn't see anything obvious.
Cheers,
krueschan.

consumption co2 for Russia missing

Hi,

Consumption CO2 data for Russia is missing from the dataset, even though it is shown on the OWID website.

thanks for the great work!

Add Global Carbon Project land-use change CO2 emissions

Could you please add the land-use change co2 emissions through 2020 to the owid-co2-data.csv file?
This OWID page has them in the green line of the plot:
https://ourworldindata.org/co2-emissions

However, I am unable to find this data. It would be ideal to have a new column added to this file. Thank you very much.

population for north america is wrong

population.csv:
...
North America,2019,366600992
...

That is roughly the combined population of the USA and Canada. North America includes many more countries than that.

Is there any timeline with regard to the release of 2022 data?

Incorrect Mapping of Population

Issue

Looking at the CO2 data explorer and the raw data, I observed that the region South America does not seem to have per-capita data. This is possibly caused by incorrect raw data or by the raw data being read incorrectly.

Potential Cause

The issue might be due to a missing "population" value for this region. Indeed, when loading the population of countries and regions, the corresponding file in input/shared/population.csv does not contain an entry for South America. On the other hand, it contains entries for Latin America, which look complete to me as seen in the following snippet.

Latin America,2015,623934016
Latin America,2016,630145024
Latin America,2017,636233024
Latin America,2018,642217024
Latin America,2019,648121024

Moreover, when adding the population numbers for all regions up, the world population for 2019 lies around 7.6 billion people, i.e. when including Latin America, which seems valid to me.

Potential Solution

Changing Latin America to South America in input/shared/population.csv and running make-dataset.py should produce per-capita data for South America as a region. Alternatively, one can add a mapping when loading the population data and combining it with other dataframes. Given these two options and their possible side effects, I am opening this as an issue rather than a pull request.
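
The mapping option could look roughly like the sketch below. The population rows are the ones quoted in this issue; the column names, the ENTITY_MAPPING dict and the placeholder emissions rows are assumptions for illustration, not code from the repository. Whether Latin America and South America cover the same set of countries in the source data is not confirmed here.

```python
import io

import pandas as pd

# Rows copied from the issue. The snippet has no header, so the column
# names below are assumed.
POPULATION_CSV = """Latin America,2015,623934016
Latin America,2016,630145024
Latin America,2017,636233024
Latin America,2018,642217024
Latin America,2019,648121024
"""

# Hypothetical mapping from names in the population file to names in the CO2 data.
ENTITY_MAPPING = {"Latin America": "South America"}

population = pd.read_csv(
    io.StringIO(POPULATION_CSV), names=["country", "year", "population"]
)
population["country"] = population["country"].replace(ENTITY_MAPPING)

# Placeholder emissions rows, present only to demonstrate the merge.
co2 = pd.DataFrame(
    {"country": ["South America"] * 2, "year": [2018, 2019], "co2": [1.0, 1.0]}
)

# indicator=True adds a _merge column, so rows without a population match are easy to list.
merged = co2.merge(population, on=["country", "year"], how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
```

Inspecting the unmatched rows after the merge would also reveal any other regions whose names differ between the two inputs.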

Updating the database with the latest data

Hello there,
The Global Carbon Project website shows that the 2019 carbon budget was released quite some time ago.
When will that be merged into this dataset along with other parameters/values like energy consumption or methane emissions?

GHG update from CAIT

Dear OWID team,

CAIT has data up to 2018. Would it be possible to update the GHG total (total_ghg, ghg_per_capita) values? The current OWID dataset only goes up to 2016.

Thanks

Potential performance issue: concat is slow in pandas versions below 2.1

Issue Description:

Hello.
I have discovered a performance degradation in the .concat function of pandas version 1.5.2, and I notice that this repository depends on pandas 1.5.2 in scripts/requirements.txt. I am not sure whether this performance problem in pandas affects this repository. I found some discussions on the pandas GitHub related to this issue, including #50652 and #52685.
I also found that scripts/make_dataset.py uses the affected API. There may be more files using it.

Suggestion

I would recommend considering an upgrade to pandas >= 2.1, or exploring other ways to optimize the performance of .concat.
Any other workarounds or solutions would be greatly appreciated.
Thank you!
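
One generic workaround, independent of the pandas version, is to collect the pieces in a list and call pd.concat once rather than growing a DataFrame inside a loop. The sketch below only illustrates that pattern; whether scripts/make_dataset.py concatenates in a loop has not been checked, and the helper name and placeholder frames are hypothetical.

```python
import pandas as pd


def combine_frames(frames) -> pd.DataFrame:
    """Concatenate all frames in a single call instead of one call per frame."""
    return pd.concat(list(frames), ignore_index=True)


# Placeholder frames standing in for per-source or per-country chunks.
chunks = [pd.DataFrame({"year": [2000 + i], "co2": [float(i)]}) for i in range(5)]
combined = combine_frames(chunks)
```

A single call copies each chunk once, whereas repeated concatenation in a loop copies the accumulated data on every iteration.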

Drop in most recent row of continental aggregates in GCP data

Variable trade_co2, imported from Annual CO2 emissions embedded in trade in our GCP grapher dataset, shows a drop of -20241964122 for Asia in 2019. This is potentially the case for other variables as well, and seems to affect all continental aggregates for the last calculated year.
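
One possible explanation, not confirmed here, is that some member countries have no value yet for the latest year, so a sum that skips missing values comes out too low. The sketch below contrasts such a sum with a stricter aggregate that is left empty unless every member row has a value. All names and numbers are placeholders, and the approach only works if missing members appear as rows with an empty value rather than being absent altogether.

```python
import numpy as np
import pandas as pd

# Placeholder data: country B has no value for the last year.
data = pd.DataFrame(
    {
        "continent": ["X", "X", "X", "X"],
        "country": ["A", "B", "A", "B"],
        "year": [2018, 2018, 2019, 2019],
        "trade_co2": [10.0, 20.0, 11.0, np.nan],
    }
)

grouped = data.groupby(["continent", "year"])["trade_co2"]

# Default behaviour: NaN is skipped, so 2019 looks like a sharp drop.
naive = grouped.sum()

# Strict variant: keep the total only where every row in the group has a value.
complete = grouped.count() == grouped.size()
strict = grouped.sum(min_count=1).where(complete)
```

With the strict variant the incomplete year shows up as missing instead of as a misleading drop.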

Please add Land-use co2 emissions file from OWID to repository

Hello, related to #25 (comment)

Could you please kindly add the separate file: global-co2-fossil-plus-land-use.csv from:
https://ourworldindata.org/co2-emissions to this repository?

This would enable the file to be called from code and tracked. Thanks in advance, I hope this addresses your helpful comment.

gdp data question

Thank you for this resource.

The codebook indicates that the gdp data is: Total real gross domestic product, inflation-adjusted.

However, it is not clear whether the gdp data is in a common currency, so that values for different countries can be added together. Could you please clarify whether all the gdp data is in the same currency (and the same base year)?

Also, the latest GDP data is from 2016. Would it be possible to get a more recent update?

thank you

known quantity not filled in

owid_co2-data.csv

I don't understand this. The co2_growth_prct is shown as ,, for 1952, even though it must be 0 according to the co2 values for 1951 and 1952. The same happens between 1953 and 1954. I would expect empty values ,, to mean NULL or unknown, not 0 (zero).

iso_code,country,year,co2,co2_growth_prct,co2_growth_abs,consumption_co2,trade_co2,trade_co2_share,co2_per_capita,consumption_co2_per_capita,share_global_co2,cumulative_co2,share_global_cumulative_co2,co2_per_gdp,consumption_co2_per_gdp,co2_per_unit_energy,cement_co2,coal_co2,flaring_co2,gas_co2,oil_co2,other_industry_co2,cement_co2_per_capita,coal_co2_per_capita,flaring_co2_per_capita,gas_co2_per_capita,oil_co2_per_capita,other_co2_per_capita,share_global_coal_co2,share_global_oil_co2,share_global_gas_co2,share_global_flaring_co2,share_global_cement_co2,cumulative_coal_co2,cumulative_oil_co2,cumulative_gas_co2,cumulative_flaring_co2,cumulative_cement_co2,share_global_cumulative_coal_co2,share_global_cumulative_oil_co2,share_global_cumulative_gas_co2,share_global_cumulative_flaring_co2,share_global_cumulative_cement_co2,total_ghg,ghg_per_capita,methane,methane_per_capita,nitrous_oxide,nitrous_oxide_per_capita,primary_energy_consumption,energy_per_capita,energy_per_gdp,population,gdp
AFG,Afghanistan,1949,0.015,,,,,,0.002,,0,0.015,0,,,,,0.015,,,,,,0.002,,,,,0,,,,,0.015,,,,,0,,,,,,,,,,,,,,7663783,
AFG,Afghanistan,1950,0.084,475,0.07,,,,0.011,,0.001,0.099,0,0.004,,,,0.021,,,0.063,,,0.003,,,0.008,,0.001,0.004,,,,0.036,0.063,,,,0,0,,,,,,,,,,,,,7752000,19494799360
AFG,Afghanistan,1951,0.092,8.696,0.007,,,,0.012,,0.001,0.191,0,0.005,,,,0.026,,,0.066,,,0.003,,,0.008,,0.001,0.004,,,,0.061,0.129,,,,0,0,,,,,,,,,,,,,7840000,20063848448
AFG,Afghanistan,1952,0.092,,,,,,0.012,,0.001,0.282,0,0.004,,,,0.032,,,0.06,,,0.004,,,0.008,,0.001,0.003,,,,0.093,0.189,,,,0,0.001,,,,,,,,,,,,,7936000,20742350848
AFG,Afghanistan,1953,0.106,16,0.015,,,,0.013,,0.002,0.388,0,0.005,,,,0.038,,,0.068,,,0.005,,,0.008,,0.001,0.003,,,,0.131,0.257,,,,0,0.001,,,,,,,,,,,,,8040000,22015463424
AFG,Afghanistan,1954,0.106,,,,,,0.013,,0.002,0.495,0,0.005,,,,0.043,,,0.064,,,0.005,,,0.008,,0.001,0.003,,,,0.174,0.321,,,,0,0.001,,,,,,,,,,,,,8151000,22483329024
AFG,Afghanistan,1955,0.154,44.828,0.048,,,,0.019,,0.002,0.649,0,0.007,,,,0.062,,,0.092,,,0.008,,,0.011,,0.001,0.004,,,,0.236,0.413,,,,0,0.001,,,,,,,,,,,,,8271000,22929889280
....
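
For comparison, a minimal sketch of what the two growth columns would contain if computed directly from the co2 values quoted above. It uses the rounded figures from the rows shown and is not the code that generates the dataset, so years other than the flat ones will not reproduce the published column exactly.

```python
import pandas as pd

# co2 values for Afghanistan, 1949 to 1955, copied from the rows above.
co2 = pd.Series(
    [0.015, 0.084, 0.092, 0.092, 0.106, 0.106, 0.154],
    index=range(1949, 1956),
    name="co2",
)

# Absolute and percentage change from the previous year.
growth_abs = co2.diff()
growth_prct = co2.pct_change(fill_method=None) * 100
```

Only the first year, which has no predecessor, comes out empty here. The flat years 1952 and 1954 come out as 0.0, which is the value this issue expects to see in the file.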

Negative co2 emission values

The current database contains some negative values in the co2 column, which cannot be correct in my understanding.
See the following query (written in R):

library(dplyr)
OWID.CO2 = read.csv("https://github.com/owid/co2-data/raw/master/owid-co2-data.csv")
df = OWID.CO2 %>% subset(co2<0) %>% select(iso_code, country, year, co2)
df
iso_code country year co2
10621 IRQ Iraq 1948 -0.095
11681 KWT Kuwait 1952 -0.436
11682 KWT Kuwait 1953 -0.051
18235 SAU Saudi Arabia 1951 -0.348
18236 SAU Saudi Arabia 1952 -0.172
18314 SEN Senegal 1968 -0.081
22945 VEN Venezuela 1930 -1.165
22946 VEN Venezuela 1931 -0.256

Missing data when running `co2_emissions.py`

It looks like the changes made in February may have broken something in co2_emissions.py: when running the script, the generated output (co2_emissions.csv) has many missing columns.

Potential issues: Czechoslovakia disaggregation, long-term population series

External feedback:

I was going to ask about countries such as Czechoslovakia, which are present in the underlying CDIAC dataset, but which you have disaggregated into Czechia and Slovakia…

-> Check if/how the disaggregation is performed in our current data, and whether everything is fine.

I saw that some countries in your co2 dataset don't have population data all the way back in time eg China's population record only goes back to 1899 – but I found another dataset where you have figures for China's population (and other countries) back to 1800…?

-> We should have all data for populations back to 1800 (up-to-date time series: https://ourworldindata.org/grapher/population)

ussr in 2015?

continents.csv
.....
USSR,,2015,Europe (excl. EU-28)
USSR,,2015,Europe (excl. EU-28)

There is no USSR in 2015, but maybe one of these should be excl. EU-27?
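
Exact duplicate rows like the two above can be listed with a short check. This is a sketch: the rows are the ones quoted in this issue, while the column names are assumed because the excerpt has no header.

```python
import io

import pandas as pd

# Rows copied from the issue; column names are hypothetical.
ROWS = """USSR,,2015,Europe (excl. EU-28)
USSR,,2015,Europe (excl. EU-28)
"""

continents = pd.read_csv(
    io.StringIO(ROWS), names=["entity", "code", "year", "continent"]
)

# Rows that repeat an earlier row exactly (empty codes compare as equal).
duplicates = continents[continents.duplicated(keep="first")]

# The same table with each repeated row kept only once.
deduplicated = continents.drop_duplicates()
```

This only catches verbatim repeats. If one of the two rows was meant to carry a different region label, that has to be decided by hand.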

Readme link to "Our World in Data standard entity names" is broken

The link to "Our World in Data standard entity names" in the readme file is broken. It leads to: https://github.com/owid/co2-data/blob/master/scripts/input/shared/continents.csv

Not sure where it should direct instead, otherwise would have offered a PR.