serapieum-of-alex / pyramids Goto Github PK

View Code? Open in Web Editor NEW

4.0 2.0 0.0 23 MB

Geospatial data package

Home Page: https://github.com/Serapieum-of-alex/pyramids

License: GNU General Public License v3.0

Python 100.00%

geospatial-data gis python raster vector gdal geopandas ogr

pyramids's Introduction

Current release info

Name	Downloads	Version	Platforms

pyramids - GIS utility package

pyramids is a GIS utility package using gdal, ....

pyramids

Main Features

GIS modules to enable the modeler to fully prepare the meteorological inputs and do all the preprocessing needed to build the model (align rasters with the DEM), in addition to various methods to manipulate and convert different forms of distributed data (rasters, NetCDF, shapefiles)

Future work

Developing a DEM processing module for generating the river network at different DEM spatial resolutions.

Installing pyramids

Installing pyramids from the conda-forge channel can be achieved by:

conda install -c conda-forge pyramids=0.7.0

It is possible to list all the versions of pyramids available on your platform with:

conda search pyramids --channel conda-forge

Install from GitHub

to install the last development to time, you can install the library from GitHub

pip install git+https://github.com/Serapieum-of-alex/pyramids

pip

to install the last release, you can easily use pip

pip install pyramids-gis==0.7.0

Quick start

  >>> import pyramids

pyramids's People

Contributors

Stargazers

Watchers

pyramids's Issues

upload the precip.1979.nc file

upload the examples/data/precip.1979.nc file to figshare and write an API code to retrieve it from there

https://colab.research.google.com/drive/1r2srfFvFQ6Suc_6N9hIuTFHfUMcaPYc2?usp=sharing#scrollTo=impressed-guarantee

add getitem and setitem to the Dataset object

the Dataset object has two sub groups that can be iterated on the subset (of variables as in netcdf files) or the bands.

so decide which one to iterate on using the getitem and setitem, and implement it

[pyramids] pyramids/dataset.py (Lines 87-88)


self.subsets = src.GetSubDatasets()
        self._variables = self.get_variables()

Open in IDE · Open on GitHub

^{Created from JetBrains using CodeStream}

inplace parameter for the crop method

add a parameter to replace the object with the new cropped raster

[pyramids] pyramids/dataset.py (Line 2395)


def crop(self, mask: Union[GeoDataFrame, FeatureCollection]):

Open in IDE · Open on GitHub
[pyramids] pyramids/dataset.py (Lines 3762-3764)


def crop(
        self, mask: Union[Dataset, str], inplace: bool = False
    ) -> Union[None, Dataset]:

Open in IDE · Open on GitHub

^{Created from JetBrains using CodeStream}

Extent of cropped raster is wrong

Describe the bug
the extent of the cropped raster is not correct, the source raster covers the entire world and when I crop it using a polygon the values in the raster are cropped correctly but the extent of the raster is not correct, and when trying to calculate stats an error appears that no valid pixels found.

To Reproduce
Steps to reproduce the behavior:

Go to '...'
raster to be cropped: 02_RFCF.zip
mask: coello-basin-extended.zip

resulted raster: mmmmmmmm.zip
2. Click on '....'

dataset = Dataset.read_file("02_RFCF.tif")
dataset.crop(gdf, inplace=True)
vals = dataset.stats()

the following error appears

File "C:\Miniconda3\envs\hapi-new\Lib\site-packages\pyramids\dataset.py", line 576, in stats
    df.iloc[i, :] = self._get_stats(i)
                    ^^^^^^^^^^^^^^^^^^
  File "C:\Miniconda3\envs\hapi-new\Lib\site-packages\pyramids\dataset.py", line 593, in _get_stats
    vals = band_i.GetStatistics(True, True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Miniconda3\envs\hapi-new\Lib\site-packages\osgeo\gdal.py", line 5037, in GetStatistics
    return _gdal.Band_GetStatistics(self, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to compute statistics, no valid pixels found in sampling.

Debugging

vals = dataset.stats()
f = dataset.read_array()
f 
array([[-999999., -999999., -999999., ..., -999999., -999999., -999999.],
       [-999999., -999999., -999999., ..., -999999., -999999., -999999.],
       [-999999., -999999., -999999., ..., -999999., -999999., -999999.],
       ...,
       [-999999., -999999., -999999., ..., -999999., -999999., -999999.],
       [-999999., -999999., -999999., ..., -999999., -999999., -999999.],
       [-999999., -999999., -999999., ..., -999999., -999999., -999999.]],
      dtype=float32)
dataset.to_file("mmmmmmmm.tif")

Scroll down to '....'
See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
the source raster covers the whole world

Here is the cropped raster, the arrow points at the new domain

fix error in documentation build

recheck the configuration of the readthedocs.yml from https://docs.readthedocs.io/en/stable/config-file/v2.html

[pyramids] .readthedocs.yml (Lines 3-15)


# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

conda:
  environment: docs/environment.yml
# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py
#Build documentation with MkDocs
#mkdocs:
#  configuration: mkdocs.yml

Open in IDE · Open on GitHub

^{Created from JetBrains using CodeStream}

smooth DEM

Smooth DEM
a function that takes the dem and smooths the values based on a moving window and a statistical function

Window

could be defined by the number of cells or defined by distance (i.e. 1km in each direction)
Statistical function
use the mean/median of all the cells in the window and assign it to the cell

move Hapi related functionality to Hapi repository

the functions convert_flow_direction_to_cell_indices, flow_direction_index and flow_direction_table converte the flow direction indices (1, 2, 4, 8, 16, 32, 64, 128) into the down stream cells and then generate the flow direction table.

these functions are very specific for Hapi and should be moved to Hapi

[pyramids] pyramids/dem.py (Lines 168-306)


def convert_flow_direction_to_cell_indices(self):
        """D8 method generates flow direction raster from DEM and fills sinks.

        Returns
        -------
        flow_direction_cell: [numpy array]
            with the same dimensions of the raster and 2 layers
            first layer for rows index and second rows for column index
        """
        no_columns = self.columns
        no_rows = self.rows

        flow_direction = self.flow_direction()

        # convert index of the flow direction to the index of the cell
        flow_direction_cell = np.ones((no_rows, no_columns, 2)) * np.nan

        for i in range(no_rows):
            for j in range(no_columns):
                if not np.isnan(flow_direction[i, j]):
                    ind = int(flow_direction[i, j])
                    flow_direction_cell[i, j, 0] = i + X_INDEX[ind]
                    flow_direction_cell[i, j, 1] = j + Y_INDEX[ind]

        return flow_direction_cell

    def flow_direction_index(self) -> np.ndarray:
        """flow_direction_index.

            flow_direction_index takes flow direction raster and converts codes for the 8 directions
            (1,2,4,8,16,32,64,128) into indices of the Downstream cell.

        flow_direct:
            [gdal.dataset] flow direction raster obtained from catchment delineation
            it only contains values [1,2,4,8,16,32,64,128]

        Returns
        -------
        [numpy array]:
            with the same dimensions of the raster and 2 layers
            first layer for rows index and second rows for column index
        """
        # check flow direction input raster
        no_val = self.no_data_value[0]
        cols = self.columns
        rows = self.rows

        fd = self.read_array(band=0)
        fd_val = np.unique(fd[~np.isclose(fd, no_val, rtol=0.00001)])
        fd_should = [1, 2, 4, 8, 16, 32, 64, 128]
        if not all(fd_val[i] in fd_should for i in range(len(fd_val))):
            raise ValueError(
                "flow direction raster should contain values 1,2,4,8,16,32,64,128 only "
            )

        fd_cell = np.ones((rows, cols, 2)) * np.nan

        for i in range(rows):
            for j in range(cols):
                if fd[i, j] == 1:
                    # index of the rows
                    fd_cell[i, j, 0] = i
                    # index of the column
                    fd_cell[i, j, 1] = j + 1
                elif fd[i, j] == 128:
                    fd_cell[i, j, 0] = i - 1
                    fd_cell[i, j, 1] = j + 1
                elif fd[i, j] == 64:
                    fd_cell[i, j, 0] = i - 1
                    fd_cell[i, j, 1] = j
                elif fd[i, j] == 32:
                    fd_cell[i, j, 0] = i - 1
                    fd_cell[i, j, 1] = j - 1
                elif fd[i, j] == 16:
                    fd_cell[i, j, 0] = i
                    fd_cell[i, j, 1] = j - 1
                elif fd[i, j] == 8:
                    fd_cell[i, j, 0] = i + 1
                    fd_cell[i, j, 1] = j - 1
                elif fd[i, j] == 4:
                    fd_cell[i, j, 0] = i + 1
                    fd_cell[i, j, 1] = j
                elif fd[i, j] == 2:
                    fd_cell[i, j, 0] = i + 1
                    fd_cell[i, j, 1] = j + 1

        return fd_cell

    def flow_direction_table(self) -> Dict:
        """Flow Direction Table.

            - flow_direction_table takes flow direction indices created by FlowDirectِِIndex function and creates a
            dictionary with the cells' indices as a key and indices of directly upstream cells as values
            (list of tuples).


            flow_direct:
                [gdal.dataset] flow direction raster obtained from catchment delineation
                it only contains values [1,2,4,8,16,32,64,128]

        Returns
        -------
        flowAccTable:
            [Dict] dictionary with the cells indices as a key and indices of directly
            upstream cells as values (list of tuples)
        """
        flow_direction_index = self.flow_direction_index()

        rows = self.rows
        cols = self.columns

        cell_i = []
        cell_j = []
        celli_content = []
        cellj_content = []
        for i in range(rows):
            for j in range(cols):
                if not np.isnan(flow_direction_index[i, j, 0]):
                    # store the indexes of not empty cells and the indexes stored inside these cells
                    cell_i.append(i)
                    cell_j.append(j)
                    # store the index of the receiving cells
                    celli_content.append(flow_direction_index[i, j, 0])
                    cellj_content.append(flow_direction_index[i, j, 1])

        flow_acc_table = {}
        # for each cell store the directly giving cells
        for i in range(rows):
            for j in range(cols):
                if not np.isnan(flow_direction_index[i, j, 0]):
                    # get the indexes of the cell and use it as a key in a dictionary
                    name = str(i) + "," + str(j)
                    flow_acc_table[name] = []
                    for k in range(len(celli_content)):
                        # search if any cell are giving this cell
                        if i == celli_content[k] and j == cellj_content[k]:
                            flow_acc_table[name].append((cell_i[k], cell_j[k]))

        return flow_acc_table

Open in IDE · Open on GitHub
[pyramids] tests/dem/test_dem.py (Lines 69-98)


def test_d8(
        coello_dem_4000: gdal.Dataset,
        flow_direction_array_cells_indices: np.ndarray,
):
    dem = DEM(coello_dem_4000)
    fd_cell = dem.convert_flow_direction_to_cell_indices()
    assert isinstance(fd_cell, np.ndarray)
    assert fd_cell.shape == (dem.rows, dem.columns, 2)
    assert np.array_equal(fd_cell, flow_direction_array_cells_indices, equal_nan=True)


def test_flowDirectionIndex(coello_df_4000: gdal.Dataset):
    dem = DEM(coello_df_4000)
    fd_cell = dem.flow_direction_index()
    assert isinstance(fd_cell, np.ndarray)
    assert fd_cell.shape == (dem.rows, dem.columns, 2)


def test_flowDirectionTable(coello_df_4000: gdal.Dataset):
    dem = DEM(coello_df_4000)
    fd_cell = dem.flow_direction_table()
    assert isinstance(fd_cell, np.ndarray)
    assert fd_cell.shape == (dem.rows, dem.columns, 2)


def test_flowDirectionTable(coello_df_4000: gdal.Dataset, coello_fdt):
    dem = DEM(coello_df_4000)
    fd_table = dem.flow_direction_table()
    assert fd_table == coello_fdt

Open in IDE · Open on GitHub

^{Created from JetBrains using CodeStream}

nearestNeighbour bug

Describe the bug
the MatchTwoRasters code gives an error in line 46.

To Reproduce
Steps to reproduce the behaviour:

Go to MatchTwoRasters.py code
run the code line by line of the whole code at one
See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
File "<ipython-input-5-dc2eb691f1ba>", line 1, in <cell line: 1> NewRasterB_ND = Raster.cropAlligned(src, NewRasterB) File "C:\MyComputer\01Algorithms\gis\pyramids\pyramids\raster.py", line 1100, in cropAlligned src_array = Raster.nearestNeighbour(src_array, mask_noval, rows, cols) File "C:\MyComputer\01Algorithms\gis\pyramids\pyramids\raster.py", line 1777, in nearestNeighbour if array[rows[i], cols[i] + 1] != nodatavalue and cols[i] + 1 <= no_cols: IndexError: index 93 is out of bounds for axis 1 with size 93

Additional context
Add any other context about the problem here.

Crop any touched cells

currently the function crops only the cells that lies entirely inside the polygon mask, we need to create a parameter to control this behavior

[pyramids] pyramids/dataset.py (Lines 2360-2391)


def _crop_with_polygon_warp(self, feature: Union[FeatureCollection, GeoDataFrame]):
        """Crop raster with polygon.

            - do not convert the polygon into a raster but rather use it directly to crop the raster using the
            gdal.warp function.

        Parameters
        ----------
        feature: [FeatureCollection]

        Returns
        -------
        gdal.Dataset
        """
        if isinstance(feature, GeoDataFrame):
            feature = FeatureCollection(feature)
        else:
            if not isinstance(feature, FeatureCollection):
                raise TypeError(
                    f"The function takes only a FeatureCollection or GeoDataFrame, given {type(feature)}"
                )

        feature = feature._gdf_to_ds()
        warp_options = gdal.WarpOptions(
            format="VRT",
            # outputBounds=feature.total_bounds,
            cropToCutline=True,
            cutlineDSName=feature.file_name,
            # cutlineLayer=feature.layer_names[0],
            multithread=True,
        )
        dst = gdal.Warp("", self.raster, options=warp_options)

Open in IDE · Open on GitHub

^{Created from JetBrains using CodeStream}

calculate the domain cells for a given band

Currently the method assumes that the dataset has only one band, but if the dataset is a multi-band the method will miscalculate or give an error.

add a band parameter to the method and use the read_array method to retrieve the array values for each band.

[pyramids] pyramids/dataset.py (Lines 1054-1067)


def count_domain_cells(self):
        """Count cells inside the domain

        Returns
        -------
        int:
            Number of cells
        """
        # count cells inside the domain
        arr = self.raster.ReadAsArray()
        domain_count = np.size(arr[:, :]) - np.count_nonzero(
            (arr[np.isclose(arr, self.no_data_value[0], rtol=0.001)])
        )
        return domain_count

Open in IDE · Open on GitHub

^{Created from JetBrains using CodeStream}

extend the functionality of the `change_no_data_value` method to operate on the tiff file on desk

Is your feature request related to a problem? Please describe.
now the change_no_data_value method copies the dataset to memory and then change the value in the no_data_value property and in the bands of the dataset.

The method should be able to change the no_data_value in both the property and in the bands inplace without copying the dataset in memory

Describe the solution you'd like

open the dataset in write mode and change the no_data_value in the dataset without copying it

feature to dataset

FeatureCollection.to_dataset
when converting a feature collection object (vector) to a dataset and using a dataset as a parameter, not a cell_size, if the dataset does not have the same top left corner as the polygon, the conversion will not work.

To Reproduce
Steps to reproduce the behavior:

try to convert the top brown polygon into a raster and use the bottom raster as a parameter to the to_dataset method

the result dataset will be identical to the input dataset

Expected behavior
the resulting dataset should cover only the polygon

Additional context
possible error, when using a dataset as an input, the function uses it as a source for the number of columns, rows, cell_size, and the pivot point (top left corner, point 1), so if the original pivot point (point 2 ) of the polygon is out of the created domain (blue) of the newly created raster gdal can not create an array that fits the original polygon (green)

extent of raster is not used in the plot when plotting sentinel raster with grb but raster the array indices are used

Describe the bug
when plotting a sentinel raster with 3 bands the extent of the raster is not used in the plot but the array indices are used instead
as you see in the image the axes should have the x/y or lat/lon but they have here 0 -> 10000

To Reproduce

plot any sentinel dataset dataset.plot() without providing a band

combine `extract` and `read_array`

Is your feature request related to a problem? Please describe.

The read_array method can read the whole band, read a window using top_left_corner and width and height, or a polygon geodataframe.
The extract method extracts the values at the location of point geometry, the extract now reads the entire dataset to extract the values and used array slicing to access the array, which is not efficient if the dataset is big.
the extract method could be merged into the read_array method where a new parameter can be added points and the values can be extracted using the same way in the extract method now if the dataset is small, In case that the dataset is big the method can locate the points in the array and then use the src.ReadAsArray(xoff, yoff, xwindow, ywindow) to read the values while the dataset is on desk.

clipRasterWithPolygon

Describe the bug
rasterio can not delete the temporary file in the temp environment path

To Reproduce
Steps to reproduce the behavior:
`
aligned_raster = "examples/data/Evaporation_ECMWF_ERA-Interim_mm_daily_2009.01.01.tif"
Basinshp = "examples/data/basin.geojson"

shp = gpd.read_file(Basinshp)
src = gdal.Open(aligned_raster)

dst = Raster.clipRasterWithPolygon(aligned_raster, Basinshp, save=False, output_path=None)
dst = Raster.clip2(aligned_raster, Basinshp, save=False, output_path=None)
Map.plot(dst, title="After Cropping-Evapotranspiration by a shapefile", color_scale=1,
ticks_spacing=0.01)
`

Screenshots
C:\Users\eng_m\anaconda\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6 in_crs_string = _prepare_from_proj_string(in_crs_string) Traceback (most recent call last): File "C:\Users\eng_m\anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-14-991bc4843a84>", line 36, in <cell line: 36> dst = Raster.clipRasterWithPolygon(aligned_raster, Basinshp, save=False, output_path=None) File "C:\MyComputer\01Algorithms\gis\pyramids\pyramids\raster.py", line 1452, in clipRasterWithPolygon os.remove(temp_path) PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\eng_m\\AppData\\Local\\Temp/cropped.tif'

Statistics for cells within a given polygon/raster

Add a mask parameter (raster/polygon) to be used to calculate the stats of the cells that lies within this given raster/polygon.

[pyramids] pyramids/dataset.py (Lines 540-561)


def stats(self, band: int = None) -> DataFrame:
        """stats.

            - Get statistics of a band.
                [min, max, mean, std]

        Parameters
        ----------
        band: [int]
            band index, if None the statistics of all bands will be returned.

        Returns
        -------
        DataFrame:
            DataFrame of statistics values of each band, the dataframe has the following columns:
            [min, max, mean, std], the index of the dataframe is the band names.
                           min         max        mean       std
            Band_1  270.369720  270.762299  270.551361  0.154270
            Band_2  269.611938  269.744751  269.673645  0.043788
            Band_3  273.641479  274.168823  273.953979  0.198447
            Band_4  273.991516  274.540344  274.310669  0.205754
        """

Open in IDE · Open on GitHub

^{Created from JetBrains using CodeStream}

serapieum-of-alex / pyramids Goto Github PK

pyramids's Introduction

Current release info

pyramids - GIS utility package

Main Features

Future work

Installing pyramids

Install from GitHub

pip

Quick start

pyramids's People

Contributors

Stargazers

Watchers

pyramids's Issues

Recommend Projects

Recommend Topics

Recommend Org