Comments (28)
I've downloaded the 2020 CDL layer from here: https://www.nass.usda.gov/Research_and_Science/Cropland/Release/index.php
from multi-temporal-crop-classification-training-data.
@mcecil I have the data in #1. I will crop this to our AOI and send you the final GeoTIFF with the bounding boxes for each chip.
One thing I need though is a sample HLS file. We need to project these two datasets to the same CRS. Let's discuss in our meeting.
We've decided that we will use the CDL crs for all geospatial layers.
- Hamed will create chip boundaries (geojson) based on the CDL layer.
- Mike will include a raster transform to project HLS data to the CDL crs. This will occur during the step when converting from HDF to single-layer TIF.
Following the update in #4 we will use this ticket for chip generation:
- Loop over chip aois
- Load all three scenes of HLS and clip them to the target chip aoi
- Export one file per time with all bands merged together.
- Clip the CDL for the corresponding chip aoi and export it as a tif as well.
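The steps above can be sketched in numpy, treating each HLS scene as a (bands, height, width) array and each chip AOI as a pixel window. The function names and the 224-pixel chip size are illustrative assumptions, not the actual script.

```python
import numpy as np

CHIP_SIZE = 224  # assumed chip size in pixels

def clip_chip(scene: np.ndarray, row: int, col: int,
              size: int = CHIP_SIZE) -> np.ndarray:
    """Clip one (bands, H, W) scene to the chip window at (row, col)."""
    return scene[:, row:row + size, col:col + size]

def chip_all_times(scenes, cdl, row, col):
    """Return one merged-band chip per time plus the matching CDL clip."""
    hls_chips = [clip_chip(s, row, col) for s in scenes]  # one per date
    cdl_chip = clip_chip(cdl[np.newaxis], row, col)[0]    # single-band CDL
    return hls_chips, cdl_chip
```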
other questions:
- do we want to scale reflectance values?
- do we want other bands? (like NIR?) I think band 5 is red-edge
- no scaling of reflectance values. Let's keep them as integers (float values will take much more disk space to store).
- Good catch on the bands. It was my bad to suggest band B05. Let's go with B08 for now, which is NIR. We can only feed 4 bands at this time.
@mcecil two updates:
- Following our discussion today, I reviewed the percentage of pixels with various unacceptable QA flags. Given the noise in the QA band, I suggest we only discard a scene (and consequently the chip) if any one of the unacceptable flags is present in more than 5% of the pixels of a single scene (per time). So don't look at this cumulatively over time or across all flags; only individual flags, at each time.
- Let's use the following QA values as accepted ones (I have also pasted the code that I used to derive this).
[0, 4, 32, 36, 64, 68, 96, 100, 128, 132, 160, 164, 192, 196, 224, 228]
import nasa_hls

# Look-up table mapping each QA byte value to its decoded flags
qa_table = nasa_hls.get_qa_look_up_table()
# Drop any QA value that has an unacceptable flag set
qa_table = qa_table[~qa_table["cloud"]]
qa_table = qa_table[~qa_table["cirrus"]]
qa_table = qa_table[~qa_table["snow"]]
qa_table = qa_table[~qa_table["cloud_shadow"]]
qa_table.index  # the accepted QA values listed above
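The per-flag 5% rule can be sketched as below. The bit positions are an assumption based on the HLS v1.4 QA layout implied by the accepted-value list (cirrus = bit 0, cloud = bit 1, cloud shadow = bit 3, snow/ice = bit 4) and should be verified against the product documentation.

```python
import numpy as np

# Assumed HLS v1.4 QA bit positions for the unacceptable flags
FLAG_BITS = {"cirrus": 0, "cloud": 1, "cloud_shadow": 3, "snow": 4}

def scene_passes_qa(qa_band: np.ndarray, threshold: float = 0.05) -> bool:
    """Check each flag independently, per scene (per time): fail if any
    single flag is set in more than `threshold` of the pixels."""
    n = qa_band.size
    for bit in FLAG_BITS.values():
        if ((qa_band >> bit) & 1).sum() / n > threshold:
            return False
    return True
```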
output file names:
chips/<chip_id>_merged.tif
chips/<chip_id>.mask.tif
chips_qa/<chip_id>_qa.tif
Check band values for -1000 (bad values). If any pixels are bad, then discard the chip.
Any negative values per band get converted to 0.
Mike to create binary output chips in separate folder, "chips_binary". This will include both HLS and binary crop/non-crop.
CDL crop classes [1,2,3,4,5,6,10,11,12,13,14,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,66,67,68,69,70,71,72,74,75,76,77,92,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,236,237,238,240,241,242,243,244,245,246,247,248,249,250,254]
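The binary crop/non-crop chips mentioned earlier can be derived from this class list with `np.isin`; the function name here is an illustrative sketch, not the actual script.

```python
import numpy as np

# CDL crop class codes from the list above
CROP_CLASSES = np.array([
    1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28,
    29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46,
    47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 66, 67,
    68, 69, 70, 71, 72, 74, 75, 76, 77, 92, 204, 205, 206, 207, 208, 209,
    210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
    224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 236, 237, 238, 240,
    241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254])

def binarize_cdl(cdl: np.ndarray) -> np.ndarray:
    """Map a CDL chip to 1 (crop) / 0 (non-crop)."""
    return np.isin(cdl, CROP_CLASSES).astype(np.uint8)
```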
@HamedAlemo the NA counts for each chip: it looks like half of the chips have at least some NA values.
In the one example I looked at, the NA value was in a very dark area (cloud shadow?), so clipping the band value to 0 (current output) might be reasonable.
Another example, with QA flag.
We would remove 14 of 30 chips based on the 5% QA threshold.
Of the remaining 16 chips, 5 have some NA values.
@mcecil the 16 out of 30 for QA flags is reasonable. I had almost the same number when I picked 5%. For now let's go with the most restrictive option to generate a v1 of the dataset and drop all chips that have NaN values. You should clip any pixel that is negative (not -1000 though) to 0, but no data should be kept as no data until we better understand why this is so common.
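A minimal numpy sketch of this v1 rule, assuming -1000 is the nodata/fill value and that any chip containing nodata is dropped entirely; returning None for a dropped chip is an illustrative convention, not the actual script.

```python
import numpy as np

NODATA = -1000  # HLS fill value discussed above

def clean_chip(bands: np.ndarray):
    """Drop any chip containing nodata; otherwise clip other negative
    reflectance values to 0. Returns None for a dropped chip."""
    if np.any(bands == NODATA):
        return None
    return np.clip(bands, 0, None)
```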
Well, this means we will just drop any chips that have no data, correct?
So there is no question of keeping pixels with no data; we discard those chips entirely.
Yes.
Great, just checked 3 tiles, and we would now keep 84 of 102, so a bit better.
Also, do we need to keep all the HDFs? They are taking a lot of space and will soon fill the C drive. If I keep only the 3 HDFs we use, it should be a lot better.
Oh no, delete all the extra HDFs.
Ok great, thought we might be saving them for later.
We also need to update the hls_hdf_to_cog.py script to include the 'QA' bands. I've been changing this manually. This script is in the Docker file.
Summarizing outstanding issues:
- Update hls_hdf_to_cog.py script to include QA bands.
- Check on chips that are assigned tile '01SBU'.
- Consider what to do with HLS tiles that do not have 3 images with 100% spatial coverage and < 5% cloud cover. (skipping for now)
- The 'workflow' notebook does not run well when you have to stop and start again, and when you only run a subset of tiles. I need to fix this.
- Need to calculate per-band mean, sd across all chips.
Mike tasks
- clean up repo DONE
- implement reducing spatial coverage threshold 100 to 90 to 80 etc. DONE
- include per-tile tracking for the 5 tasks (hdf download, tif conversion, tif reprojection, chipping, filtering) DONE
- continue to exclude chips that have any NA values DONE
- means and standard deviations should be pooled for all dates, and calculated per band (so 4 total). DONE
- fix chip tracking DONE
- record dates of images for chips DONE
update readme with
- instructions to run workflow notebook DONE
- checking HLS tiles for weird things like "01SBU" DONE
- add section called "Assumptions" (separate from "Instructions") that includes logic for chip generation DONE
- anything else unclear
@HamedAlemo tasks:
- Update hls_hdf_to_cog.py script to include QA bands.
- Confirm means and standard deviations are per band (4 vs 12) in Thursday meeting
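For the pooled per-band statistics task above, a sketch assuming each chip is stored as a (times, bands, height, width) array, so with 4 bands the result is exactly 4 means and 4 standard deviations regardless of the 3 dates. The shapes and function name are assumptions, not the actual script.

```python
import numpy as np

def per_band_stats(chips):
    """Pool all dates and all chips, then reduce per band.

    Each chip is assumed shaped (times, bands, H, W); returns one mean
    and one standard deviation per band.
    """
    # Move bands first, flatten times/rows/cols, then concatenate chips
    pixels = np.concatenate(
        [c.transpose(1, 0, 2, 3).reshape(c.shape[1], -1) for c in chips],
        axis=1)
    return pixels.mean(axis=1), pixels.std(axis=1)
```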
@mcecil I submitted a PR for the QA band naming fix in hls_hdf_to_cog.py
(here). I will let you know when it's merged; then you should be able to rerun your container, and it will automatically pull the updated code.
Ok, I've updated the scripts as mentioned, and will push this to GitHub today.
Things I have not done:
- Changing the code to download TIFs instead of HDF files. Because we use the HDF metadata for cloud coverage and spatial coverage, I'm not sure we actually want to download TIFs directly.
Sounds good, thanks @mcecil. We can address the new TIFs-instead-of-HDFs question this week.
It seems likely that we will have to use a different band number depending on which HLS product we are using.
The HLSL30.002 product has to use band 5, because there is no band 8 and its NIR (narrow) band is equivalent to band 5, while the HLSS30.002 product has to use band 8A.
There is some ambiguity regarding the Band 8 vs. Band 8A selection. Sentinel-2 Band 8A offers a spectral range that is fully compatible with Landsat 8 Band 5, so I think Band 8A should be used.
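The product-dependent NIR choice above can be recorded in a small lookup; the names here are just a sketch of that convention.

```python
# NIR band per HLS product, per the discussion above
NIR_BAND = {
    "HLSL30.002": "B05",  # Landsat 8 OLI NIR
    "HLSS30.002": "B8A",  # Sentinel-2 NIR narrow, compatible with Landsat B05
}

def nir_band(product: str) -> str:
    """Return the NIR band name for a given HLS product."""
    return NIR_BAND[product]
```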
https://lpdaac.usgs.gov/resources/e-learning/getting-started-cloud-native-hls-data-python/
Thanks @kordi1372. We will use the HLSS30.002 product (which has bands based on the Sentinel-2 sensor). So we need to select whatever the NIR band is in HLSS30.002, which seems to be B8A as indicated in the link you shared.