dgketchum / landsat578
Very simple API to download Landsat [1-5, 7, 8] data from Google
License: Apache License 2.0
The "all_in_one_pending" files (https://landsat.usgs.gov/landsat/all_in_one_pending_acquisition/L7/Pend_Acq/y1999/Jul/Jul-11-1999.txt) don't exist before July 11th, 1999, which means that Landsat 7 images between June 28th and July 10th can't be downloaded (I didn't see any L7 images before this in the metadata).
This is a pretty minor issue (only 2 weeks of missing images) but it might be a good impetus for switching to using the bulk metadata XML (https://landsat.usgs.gov/download-entire-collection-metadata) like we talked about. For a quick fix it might be possible to "write an exception handler that finds a future date and subtracts in 16-day increments".
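The quick fix quoted above could be sketched roughly as follows. This is only an illustration of the idea, assuming the 16-day repeat cycle mentioned in the quote; the function name `overpasses_before` and the reference date used below are hypothetical and are not part of Landsat578.

```python
from datetime import date, timedelta

REPEAT_CYCLE = timedelta(days=16)  # revisit interval named in the issue text


def overpasses_before(reference, earliest):
    """Step back from a known overpass date in 16-day increments.

    Returns every candidate date d with earliest <= d < reference,
    newest first. Both arguments are datetime.date objects.
    """
    dates = []
    current = reference - REPEAT_CYCLE
    while current >= earliest:
        dates.append(current)
        current -= REPEAT_CYCLE
    return dates


# Hypothetical example: pretend a path/row is known to be imaged on
# 1999-07-20 (a date that has a pending-acquisition file) and walk
# back toward June 28th, 1999.
candidates = overpasses_before(date(1999, 7, 20), date(1999, 6, 28))
print(candidates)
```

An exception handler around the metadata lookup could call something like this when the "all_in_one_pending" file for the requested date is missing.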
When using Landsat578 to download images, Landsat 7 scenes don't include the QA band.
When building and running landsat with Python 3.8.6 and pandas 1.1.4 and executing the suggested command line
landsat -sat 7 --start 2007-05-01 --end 2007-05-31 --lat -51.5 --lon 71.25
an error is raised and the script stops:
Traceback (most recent call last):
File "........./anaconda3/envs/tmp/bin/landsat", line 33, in <module>
sys.exit(load_entry_point('Landsat578==0.5.1', 'console_scripts', 'landsat')())
File "........./anaconda3/envs/tmp/lib/python3.7/site-packages/pkg_resources/__init__.py", line 488, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "........./anaconda3/envs/tmp/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2872, in load_entry_point
return ep.load()
File "........./anaconda3/envs/tmp/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2472, in load
return self.resolve()
File "........./anaconda3/envs/tmp/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2478, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "........./anaconda3/envs/tmp/lib/python3.7/site-packages/landsat/landsat_cli.py", line 26, in <module>
from landsat.google_download import GoogleDownload
File "........./anaconda3/envs/tmp/lib/python3.7/site-packages/landsat/google_download.py", line 36, in <module>
from landsat.update_landsat_metadata import update_metadata_lists, get_wrs_shapefiles
File "........./anaconda3/envs/tmp/lib/python3.7/site-packages/landsat/update_landsat_metadata.py", line 24, in <module>
from pandas.io.common import EmptyDataError
ImportError: cannot import name 'EmptyDataError' from 'pandas.io.common' (........./anaconda3/envs/tmp/lib/python3.7/site-packages/pandas/io/common.py)
Apparently, in the meantime, EmptyDataError moved from pandas.io.common to pandas.errors (ref. pandas-dev/pandas#37978).
Changing line 24 of landsat/update_landsat_metadata.py from
from pandas.io.common import EmptyDataError
to
from pandas.errors import EmptyDataError
clears the issue.
I think it would be nice if the user could list multiple satellite options in the command line call. I've done this before using the nargs='+' or nargs='*' options in argparse (https://docs.python.org/3/library/argparse.html#nargs).
So the command might look something like: landsat --satellite LE7 LC8 --start ...
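A minimal sketch of what the nargs='+' approach might look like. The parser below is a stand-in written for illustration, not the actual Landsat578 argument parser; only the `--satellite`, `--start`, and `--end` option names are taken from the command shown above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="landsat")
    # nargs="+" collects one or more values into a list.
    parser.add_argument("--satellite", nargs="+", required=True,
                        help="one or more satellite codes, e.g. LE7 LC8")
    parser.add_argument("--start")
    parser.add_argument("--end")
    return parser


args = build_parser().parse_args(
    ["--satellite", "LE7", "LC8", "--start", "2015-06-01", "--end", "2015-06-30"])

# The download code would then loop over the requested satellites.
for sat in args.satellite:
    print(sat, args.start, args.end)
```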
I was trying to download Landsat 7 images for the Harney Basin in Oregon (43/30) for 2015. The default download URLs generated by the tool don't work for me. If I find the corresponding image in Earth Explorer the download URLs have different "identifiers" than are listed in the code (https://github.com/dgketchum/Landsat578/blob/master/core/usgs_download.py#L88).
For example, running the tool for "LE7 2015-06-01 2015-06-30 --path 43 --row 30" will try to download the following image but return a 500 error (https://earthexplorer.usgs.gov/download/3373/LE70430302015161EDC00/STANDARD/EE). The corresponding image in Earth Explorer for me has an identifier of 12267 (instead of 3373) and a download URL of https://earthexplorer.usgs.gov/download/12267/LE70430302015161EDC00/STANDARD/EE
First, does the original download (with 3373) work for anyone else? I'll try some other areas and date ranges and see if I have the same issue.
I think the tool could benefit from fully implementing the logging module (https://docs.python.org/3/library/logging.html). I really like having the ability to control the amount of text that is returned to the console, especially for quickly identifying major problems and for debugging.
This could be done a number of ways, but at a minimum I would add a simple "debug" command line argument to turn on debug-level logging (similar to how the gdal utilities work). I also like how the conda tools (install, update, etc.) have a "verbose" flag (https://conda.io/docs/commands/conda-install.html) for quickly changing the level.
Within the code, you just have to determine what level each print statement is. For example, within image_download() the prints for "Authentication failed" and "CSRF_Token not found" should probably be error or critical level messages, and I think the actual download URL should be logged at the debug level. In unzip_image(), the two print statements would probably be at the info level.
This is something I would be happy to make a branch for and start if you think it would help.
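To make the proposal concrete, here is a rough sketch of how the flags could map onto logging levels. The flag names `--debug` and `-v/--verbose`, the logger name, and the example URL are assumptions for illustration; the messages mirror the print statements mentioned above.

```python
import argparse
import logging

logger = logging.getLogger("landsat_sketch")  # hypothetical logger name


def configure_logging(argv):
    """Pick a logging level from hypothetical --debug / --verbose flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    args = parser.parse_args(argv)
    if args.debug:
        level = logging.DEBUG
    elif args.verbose:
        level = logging.INFO
    else:
        level = logging.WARNING
    logging.basicConfig(format="%(levelname)s: %(message)s")
    logger.setLevel(level)
    return level


level = configure_logging(["--debug"])
logger.error("Authentication failed")                         # always shown
logger.info("Unzipping image")                                # shown with -v or --debug
logger.debug("Download URL: %s", "https://example.invalid/")  # shown with --debug only
```

Each existing print call would then become a logger call at whichever level fits it.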
Similar to issue #24: when I ran landsat578 I had to pip install "pyarrow" and "fastparquet" manually. Maybe include them in 'install_requires' on line 59 of 'setup.py', like the following:
install_requires=['pyyaml', 'pandas', 'requests', 'lxml', 'future','pyarrow', 'fastparquet'],
Use --config only for a file; the functionality should also include a --default-config option followed by a directory, where the default configuration can be created.
I'm assuming this is because the code is calling mkdir instead of makedirs (https://docs.python.org/2/library/os.html#os.makedirs)
I got the error when I tried setting the output to "-o .\landsat\foo\bar" where only the folder "landsat" was present.
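The difference between the two calls can be reproduced with a short sketch. It uses a temporary directory in place of the real output folder, and the folder names simply mirror the example above; it does not touch the actual Landsat578 code.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "landsat"))  # only the top-level folder exists
target = os.path.join(base, "landsat", "foo", "bar")

# os.mkdir creates a single level and fails because "foo" is missing.
try:
    os.mkdir(target)
    mkdir_ok = True
except OSError:
    mkdir_ok = False

# os.makedirs creates every missing intermediate folder.
if not os.path.isdir(target):
    os.makedirs(target)

print(mkdir_ok, os.path.isdir(target))
```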
landsat --update-scenes LANDSAT_8
Please wait while Landsat578 updates Landsat metadata files...
Please wait while scene metadata is split
LANDSAT_1
LANDSAT_2
LANDSAT_3
LANDSAT_4
LANDSAT_5
LANDSAT_7
LANDSAT_8
Traceback (most recent call last):
File "/opt/scripts/descargasMonitoreo/agrosiris/bin/landsat", line 11, in <module>
load_entry_point('Landsat578==0.5.1', 'console_scripts', 'landsat')()
File "/opt/scripts/descargasMonitoreo/agrosiris/lib/python3.6/site-packages/landsat/landsat_cli.py", line 139, in cli_runner
return main(args)
File "/opt/scripts/descargasMonitoreo/agrosiris/lib/python3.6/site-packages/landsat/landsat_cli.py", line 126, in main
g = GoogleDownload(**cfg)
TypeError: __init__() missing 3 required positional arguments: 'start', 'end', and 'satellite'
Packages installed:
certifi (2020.4.5.1)
chardet (3.0.4)
click (7.1.1)
cloudpickle (1.3.0)
cycler (0.10.0)
dask (2.14.0)
fastparquet (0.3.3)
fsspec (0.7.1)
future (0.18.2)
geojson (2.5.0)
geomet (0.2.1.post1)
html2text (2020.1.16)
idna (2.6)
joblib (0.14.1)
kiwisolver (1.2.0)
Landsat578 (0.5.1)
llvmlite (0.31.0)
lxml (4.5.0)
matplotlib (3.2.1)
numba (0.48.0)
numpy (1.18.3)
pandas (1.0.3)
pip (9.0.1)
pkg-resources (0.0.0)
pyarrow (0.16.0)
pyparsing (2.4.7)
python-dateutil (2.8.1)
pytz (2020.1)
PyYAML (5.1)
requests (2.20.0)
scikit-learn (0.22.2.post1)
scipy (1.4.1)
scorecardpy (0.1.9.2)
sentinelsat (0.13)
setuptools (39.0.1)
six (1.14.0)
thrift (0.13.0)
toolz (0.10.0)
tqdm (4.45.0)
urllib3 (1.22)
wheel (0.34.2)
I am getting the error/traceback listed below when the script tries to download OLI-only images (e.g. LO80430302015041LGN01). This is using the latest version on PyPi (0.3.86) and the following command line call. I'm not sure of the most logical way to handle it, but you could allow the user to specify "LO8" as a satellite option and then exclude LO8 images from LC8.
landsat --satellite LC8 --start 2015-02-01 --end 2015-02-15 -p 43 -r 30 -o .\ -cred .\usgs.txt
Namespace(configuration=None, credentials='.\\usgs.txt', end='2015-02-15', latitude=None, longitude=None, max_cloud_percent=100, output='.\\', path='43', return_list=False, row='30', satellite='LC8', start='2015-02-01', zipped=False)
Starting download with pathrow...
LO80430302015041LGN01
Traceback (most recent call last):
File "c:\miniconda3\envs\landsat\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\miniconda3\envs\landsat\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Miniconda3\envs\landsat\Scripts\landsat.exe\__main__.py", line 9, in <module>
File "c:\miniconda3\envs\landsat\lib\site-packages\landsat\landsat.py", line 133, in cli_runner
return main(args)
File "c:\miniconda3\envs\landsat\lib\site-packages\landsat\landsat.py", line 124, in main
scenes = download_landsat(start, end, sat, **cfg)
File "c:\miniconda3\envs\landsat\lib\site-packages\landsat\download_composer.py", line 68, in download_landsat
usgs_creds, zipped)
File "c:\miniconda3\envs\landsat\lib\site-packages\landsat\usgs_download.py", line 144, in down_usgs_by_list
identifier, stations = get_station_list_identifier(product)
File "c:\miniconda3\envs\landsat\lib\site-packages\landsat\usgs_download.py", line 104, in get_station_list_identifier
raise NotImplementedError('Must provide valid product string...')
NotImplementedError: Must provide valid product string...
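One way to keep OLI-only scenes out of an LC8 request is to filter the candidate list by its three-character prefix before downloading. The sketch below assumes that scene IDs start with the sensor/satellite code, as the ID printed above does; `filter_scenes` is a hypothetical helper and the second scene ID is made up for the example.

```python
def filter_scenes(scene_ids, allowed_prefixes=("LC8",)):
    """Keep only scene IDs whose first three characters are allowed."""
    return [s for s in scene_ids if s[:3] in allowed_prefixes]


scenes = [
    "LO80430302015041LGN01",  # OLI-only scene from the output above
    "LC80430302015057LGN00",  # hypothetical combined OLI/TIRS scene
]

print(filter_scenes(scenes))                           # LC8 only
print(filter_scenes(scenes, allowed_prefixes=("LO8",)))  # OLI-only
```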
It would be nice to be able to control the folder structure where the images are saved. For example, I would really like to have the images be saved by path, then row, then year (i.e. /43/30/2015), but I might also want to have them just be by path and row, similar to how they are saved in the Google Storage Bucket (https://console.cloud.google.com/storage/browser/gcp-public-data-landsat/LC08/01/043/033/), or maybe I want to use three digit path and row numbers for some reason (i.e. /043/030).
It probably wouldn't be pretty but you could allow the user to define a format string with specific key words. Something like: "/{PATH:02d}/{ROW:02d}/{YEAR}" where the default could be what you currently have: "/{SENSOR}_{PATH}_{ROW}".
Just a thought. Feel free to ignore!
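As a rough sketch of the format-string idea, Python's str.format can fill in the key words directly. The helper name `build_output_dir` and its arguments are hypothetical; the templates are the ones suggested above, plus a three-digit variant.

```python
import os


def build_output_dir(root, template, sensor, path, row, year):
    """Expand a user-supplied folder template below the output root."""
    relative = template.format(SENSOR=sensor, PATH=path, ROW=row, YEAR=year)
    # Split on "/" so the result uses the platform's own separator.
    parts = [p for p in relative.split("/") if p]
    return os.path.join(root, *parts)


print(build_output_dir("out", "/{PATH:02d}/{ROW:02d}/{YEAR}", "LC8", 43, 30, 2015))
print(build_output_dir("out", "/{PATH:03d}/{ROW:03d}", "LC8", 43, 30, 2015))
print(build_output_dir("out", "/{SENSOR}_{PATH}_{ROW}", "LC8", 43, 30, 2015))
```

Note that the `:02d` and `:03d` format specifiers require path and row to be integers, not strings.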
As it is, the package only downloads T1 images from Google and there are many more images that only appear in the PRE bucket.
The start and end date strings should be converted to datetime objects within the download-specific code rather than in the app-specific code.
This is another minor issue, but it would be nice if the default output folder was the current working directory. I haven't dug into how this part of the code is structured, but on some of my other tools I have got this to work by setting the default value in the argparser add_argument call (https://docs.python.org/3/library/argparse.html#default).
Right now, if "-o" is not set, I get the following traceback:
Traceback (most recent call last):
File "c:\miniconda3\envs\landsat\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\miniconda3\envs\landsat\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Miniconda3\envs\landsat\Scripts\landsat.exe\__main__.py", line 9, in <module>
File "c:\miniconda3\envs\landsat\lib\site-packages\core\landsat.py", line 81, in __main__
exit(main(args))
File "c:\miniconda3\envs\landsat\lib\site-packages\core\landsat.py", line 69, in main
dry_run=args.return_list, zipped=args.zipped)
File "c:\miniconda3\envs\landsat\lib\site-packages\core\download_composer.py", line 64, in download_landsat
satellite, tile[0], tile[1]))
File "c:\miniconda3\envs\landsat\lib\ntpath.py", line 75, in join
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
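A minimal sketch of the suggested default, assuming the option is registered with argparse as "-o" and stored as `output` (the attribute name that appears in the Namespace printed by the tool). The parser and the folder name below are stand-ins, not the real Landsat578 code.

```python
import argparse
import os

parser = argparse.ArgumentParser(prog="landsat")
# With a default, args.output is never None when "-o" is omitted.
parser.add_argument("-o", dest="output", default=os.getcwd(),
                    help="output folder (default: current working directory)")

args = parser.parse_args([])  # simulate a call without "-o"
scene_dir = os.path.join(args.output, "LC8_43_30")  # no TypeError now
print(scene_dir)
```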
Currently the script returns the following traceback and error:
Traceback (most recent call last):
File "c:\miniconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\miniconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Miniconda3\Scripts\landsat.exe\__main__.py", line 9, in <module>
File "c:\miniconda3\lib\site-packages\landsat\landsat.py", line 78, in __main__
exit(main(args))
File "c:\miniconda3\lib\site-packages\landsat\landsat.py", line 70, in main
raise NotImplementedError('Was not executed.')
NotImplementedError: Was not executed.
In a brand new Python 2.7 Conda environment on windows, a basic call to the landsat CLI raises the following ImportError. It works fine for Python 3.6 though.
(landsat-test) D:\>landsat
Traceback (most recent call last):
File "c:\miniconda3\envs\pymetric\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "c:\miniconda3\envs\pymetric\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Miniconda3\envs\pyMETRIC\Scripts\landsat.exe\__main__.py", line 5, in <module>
File "c:\miniconda3\envs\pymetric\lib\site-packages\landsat\landsat.py", line 28, in <module>
from landsat.download_composer import download_landsat
ImportError: No module named download_composer
Simplifying the download_landsat import block in landsat.py to the following line fixed the problem for me on both 2.7 and 3.6, but I didn't check to see if this breaks the tests or would work on mac/linux.
from .download_composer import download_landsat
I then changed the imports in download_composer.py to the following:
from .usgs_download import get_candidate_scenes_list, down_usgs_by_list
from .web_tools import convert_lat_lon_wrs2pr
Also, it seems like the v0.3.86 code on PyPi doesn't match the 0.3.86 release on Github and includes some of the code in the dev branch.
Dask and fastparquet are dependencies that aren't installed during the pip install.
Instead of hard-coded station identifiers in core.usgs_download.get_station_list_identifier(), use a call to the Google API to get the latest station identifier for each download batch.
Is it just a maintenance thing? I have extensive experience automatically publishing to PyPI using CI or via a one-command script. Let me know if that's the issue.
If it's not, I'm curious why it's only hosted on GitLab now.
Could I also get other files, such as the LandsatLook Natural Color Image and the QA file, in addition to the GeoTIFF files?
If you have an incorrect username or password in the credentials file, the script is not catching that in download_image(). Instead it seems like the script is returning the invalid login HTML as a tgz file (~9KB). You should be able to recreate this by transposing two letters in your password (at least that is how I found it...)
I think the issue is that the exact text string "You must sign in as a registered user to download data or place orders for USGS EROS products" (https://github.com/dgketchum/Landsat578/blob/master/landsat/usgs_download.py#L48) is not in the invalid login HTML. The closest thing I am seeing is "sign in with your existing USGS registered username and password" or "Invalid username/password" (with a div id of "pageError" which might be easier to search for).
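A possible check, sketched under the assumption that the response body is available as bytes before it is written to disk. It relies on two things: a real .tgz always starts with the gzip magic bytes 0x1f 0x8b, and the invalid-login page contains the "pageError" id or the "Invalid username/password" text described above. The function and exception names are hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream, including .tgz
LOGIN_ERROR_MARKERS = (b"pageError", b"Invalid username/password")


class DownloadError(Exception):
    """Raised when the response is not the expected archive."""


def check_download(payload):
    """Return payload if it looks like a gzip archive, else raise."""
    if payload[:2] == GZIP_MAGIC:
        return payload
    if any(marker in payload for marker in LOGIN_ERROR_MARKERS):
        raise DownloadError("Authentication failed: check the credentials file")
    raise DownloadError("Response is not a gzip archive")


html = b'<div id="pageError">Invalid username/password</div>'
try:
    check_download(html)
except DownloadError as err:
    message = str(err)
    print(message)
```

Checking the magic bytes as well as the error text means the download still fails loudly if the wording of the login page changes again.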