Comments (28)
I've downloaded the 2020 CDL layer from here: https://www.nass.usda.gov/Research_and_Science/Cropland/Release/index.php
from multi-temporal-crop-classification-training-data.
@mcecil I have the data in #1. I will crop this to our AOI and send you the final GeoTIFF with the bounding boxes for each chip.
One thing I need though is a sample HLS file. We need to project these two datasets to the same CRS. Let's discuss in our meeting.
We've decided that we will use the CDL crs for all geospatial layers.
- Hamed will create chip boundaries (geojson) based on the CDL layer.
- Mike will include a raster transform to project HLS data to the CDL crs. This will occur during the step when converting from HDF to single-layer TIF.
Following the update in #4 we will use this ticket for chip generation:
- Loop over chip aois
- Load all three scenes of HLS and clip them to the target chip aoi
- Export one file per time with all bands merged together.
- Clip the CDL for the corresponding chip aoi and export it as a tif as well.
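The steps above can be sketched in numpy, treating each HLS scene as a (bands, height, width) array and each chip AOI as a pixel window. The function names and the 224-pixel chip size are illustrative assumptions, not the actual script.

```python
import numpy as np

CHIP_SIZE = 224  # assumed chip size in pixels

def clip_chip(scene: np.ndarray, row: int, col: int,
              size: int = CHIP_SIZE) -> np.ndarray:
    """Clip one (bands, H, W) scene to the chip window at (row, col)."""
    return scene[:, row:row + size, col:col + size]

def chip_all_times(scenes, cdl, row, col):
    """Return one merged-band chip per time plus the matching CDL clip."""
    hls_chips = [clip_chip(s, row, col) for s in scenes]  # one per date
    cdl_chip = clip_chip(cdl[np.newaxis], row, col)[0]    # single-band CDL
    return hls_chips, cdl_chip
```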
other questions:
- do we want to scale reflectance values?
- do we want other bands? (like NIR?) I think band 5 is red-edge
- no scaling of reflectance values. Let's keep them as integers (float values will take much more disk space to store).
- Good catch on the bands. It was my bad to suggest band B05. Let's go with B08 for now, which is NIR. We can only feed 4 bands at this time.
@mcecil two updates:
- Following our discussion today, I reviewed the percentage of pixels with various unacceptable QA flags. Given the noise in the QA band, I suggest we only discard a scene (and consequently the chip) if any one of the unacceptable flags is present in more than 5% of the pixels of a single scene (per time). So don't look at this cumulatively over time or across all flags; only individual flags, at each time.
- Let's use the following QA values as accepted ones (I have also pasted the code that I used to derive this).
[0, 4, 32, 36, 64, 68, 96, 100, 128, 132, 160, 164, 192, 196, 224, 228]
import nasa_hls

# Look-up table mapping each QA byte value to its decoded flags
qa_table = nasa_hls.get_qa_look_up_table()
# Drop any QA value that has an unacceptable flag set
qa_table = qa_table[~qa_table["cloud"]]
qa_table = qa_table[~qa_table["cirrus"]]
qa_table = qa_table[~qa_table["snow"]]
qa_table = qa_table[~qa_table["cloud_shadow"]]
qa_table.index  # the accepted QA values listed above
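The per-flag 5% rule can be sketched as below. The bit positions are an assumption based on the HLS v1.4 QA layout implied by the accepted-value list (cirrus = bit 0, cloud = bit 1, cloud shadow = bit 3, snow/ice = bit 4) and should be verified against the product documentation.

```python
import numpy as np

# Assumed HLS v1.4 QA bit positions for the unacceptable flags
FLAG_BITS = {"cirrus": 0, "cloud": 1, "cloud_shadow": 3, "snow": 4}

def scene_passes_qa(qa_band: np.ndarray, threshold: float = 0.05) -> bool:
    """Check each flag independently, per scene (per time): fail if any
    single flag is set in more than `threshold` of the pixels."""
    n = qa_band.size
    for bit in FLAG_BITS.values():
        if ((qa_band >> bit) & 1).sum() / n > threshold:
            return False
    return True
```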
output file names:
chips/<chip_id>_merged.tif
chips/<chip_id>.mask.tif
chips_qa/<chip_id>_qa.tif
Check band values for -1000 (bad values). If any pixels are bad, then discard the chip.
Any negative values per band get converted to 0.
Mike to create binary output chips in separate folder, "chips_binary". This will include both HLS and binary crop/non-crop.
CDL crop classes [1,2,3,4,5,6,10,11,12,13,14,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,66,67,68,69,70,71,72,74,75,76,77,92,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,236,237,238,240,241,242,243,244,245,246,247,248,249,250,254]
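The binary crop/non-crop chips mentioned earlier can be derived from this class list with `np.isin`; the function name here is an illustrative sketch, not the actual script.

```python
import numpy as np

# CDL crop class codes from the list above
CROP_CLASSES = np.array([
    1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28,
    29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46,
    47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 66, 67,
    68, 69, 70, 71, 72, 74, 75, 76, 77, 92, 204, 205, 206, 207, 208, 209,
    210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
    224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 236, 237, 238, 240,
    241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254])

def binarize_cdl(cdl: np.ndarray) -> np.ndarray:
    """Map a CDL chip to 1 (crop) / 0 (non-crop)."""
    return np.isin(cdl, CROP_CLASSES).astype(np.uint8)
```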
@HamedAlemo the NA counts for each chip: it looks like half of the chips have at least some NA values.
In the one example I looked at, the NA value was in a very dark area (cloud shadow?), so clipping the band value to 0 (current output) might be reasonable.
Another example, with QA flag.
We would remove 14 of 30 chips based on the 5% QA threshold.
Of the remaining 16 chips, 5 have some NA values.
@mcecil the 16 out of 30 for QA flags is reasonable. I had almost the same number when I picked 5%. For now let's go with the most restrictive option to generate a v1 of the dataset and drop all chips that have NaN values. You should clip any pixel that is negative (not -1000 though) to 0, but no data should be kept as no data until we better understand why this is so common.
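A minimal numpy sketch of this v1 rule, assuming -1000 is the nodata/fill value and that any chip containing nodata is dropped entirely; returning None for a dropped chip is an illustrative convention, not the actual script.

```python
import numpy as np

NODATA = -1000  # HLS fill value discussed above

def clean_chip(bands: np.ndarray):
    """Drop any chip containing nodata; otherwise clip other negative
    reflectance values to 0. Returns None for a dropped chip."""
    if np.any(bands == NODATA):
        return None
    return np.clip(bands, 0, None)
```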
Well, this means we will just drop any chips that have no data, correct?
So there is no question of keeping pixels with no data; we discard those chips entirely.
Yes.
Great, just checked 3 tiles, and we would now keep 84 of 102, so a bit better.
Also, do we need to keep all the HDFs? They are taking a lot of space and will soon fill the C drive. If I keep only the 3 HDFs we use, it should be a lot better.
Oh no, delete all the extra HDFs.
Ok great, thought we might be saving them for later.
We also need to update the hls_hdf_to_cog.py script to include the 'QA' bands. I've been changing this manually. This script is in the Docker file.
Summarizing outstanding issues:
- Update hls_hdf_to_cog.py script to include QA bands.
- Check on chips that are assigned tile '01SBU'.
- Consider what to do with HLS tiles that do not have 3 images with 100% spatial coverage and < 5% cloud cover. (skipping for now)
- The 'workflow' notebook does not run well when you have to stop and start again, and when you only run a subset of tiles. I need to fix this.
- Need to calculate per-band mean, sd across all chips.
Mike tasks
- clean up repo DONE
- implement reducing spatial coverage threshold 100 to 90 to 80 etc. DONE
- include per-tile tracking for the 5 tasks (hdf download, tif conversion, tif reprojection, chipping, filtering) DONE
- continue to exclude chips that have any NA values DONE
- means and standard deviations should be pooled for all dates, and calculated per band (so 4 total). DONE
- fix chip tracking DONE
- record dates of images for chips DONE
update readme with
- instructions to run workflow notebook DONE
- checking HLS tiles for weird things like "01SBU" DONE
- add section called "Assumptions" (separate from "Instructions") that includes logic for chip generation DONE
- anything else unclear
@HamedAlemo tasks:
- Update hls_hdf_to_cog.py script to include QA bands.
- Confirm means and standard deviations are per band (4 vs 12) in Thursday meeting
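For the pooled per-band statistics task above, a sketch assuming each chip is stored as a (times, bands, height, width) array, so with 4 bands the result is exactly 4 means and 4 standard deviations regardless of the 3 dates. The shapes and function name are assumptions, not the actual script.

```python
import numpy as np

def per_band_stats(chips):
    """Pool all dates and all chips, then reduce per band.

    Each chip is assumed shaped (times, bands, H, W); returns one mean
    and one standard deviation per band.
    """
    # Move bands first, flatten times/rows/cols, then concatenate chips
    pixels = np.concatenate(
        [c.transpose(1, 0, 2, 3).reshape(c.shape[1], -1) for c in chips],
        axis=1)
    return pixels.mean(axis=1), pixels.std(axis=1)
```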
@mcecil I submitted a PR for the QA band naming fix in hls_hdf_to_cog.py
(here). I will let you know when it's merged; then you should be able to rerun your container, and it will automatically pull the updated code.
Ok, I've updated the scripts as mentioned, and will push this to GitHub today.
Things I have not done:
- Changing the code to download TIFs instead of HDF files. Because we use the HDF metadata for cloud coverage and spatial coverage, I'm not sure we actually want to download TIFs directly.
Sounds good, thanks @mcecil. We can address the new TIFs-instead-of-HDFs question this week.
It seems likely that we will have to use a different band number depending on which HLS product we are using.
The HLSL30.002 product has to use band 5, because there is no band 8 and its NIR (narrow) band is equivalent to band 5, while the HLSS30.002 product has to use band 8A.
There is some ambiguity regarding the Band 8 vs. Band 8A selection. Sentinel-2 Band 8A offers a spectral range that is fully compatible with Landsat 8 Band 5, so I think Band 8A should be used.
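The product-dependent NIR choice above can be recorded in a small lookup; the names here are just a sketch of that convention.

```python
# NIR band per HLS product, per the discussion above
NIR_BAND = {
    "HLSL30.002": "B05",  # Landsat 8 OLI NIR
    "HLSS30.002": "B8A",  # Sentinel-2 NIR narrow, compatible with Landsat B05
}

def nir_band(product: str) -> str:
    """Return the NIR band name for a given HLS product."""
    return NIR_BAND[product]
```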
https://lpdaac.usgs.gov/resources/e-learning/getting-started-cloud-native-hls-data-python/
Thanks @kordi1372. We will use the HLSS30.002 product (which has bands based on the Sentinel-2 sensor). So we need to select whatever the NIR band is in HLSS30.002, which seems to be B8A as indicated in the link you shared.