We should update to: PyTorch 1.0, Python 3.6

Update software stack -- and start geo-deep-learning release numbering

ymoisan commented on August 31, 2024

from geo-deep-learning.

Comments (8)

epeterson12 commented on August 31, 2024

Implicit numpy conversion no longer implicitly moves a tensor to CPU. Therefore, you may have to explicitly move a CUDA tensor to CPU (tensor.to('cpu')) before an implicit conversion. (#10553).
Pytorch 1.0.0 Release Notes

Because of this, in the training and classification steps, we will need to move our tensors to the CPU with Tensor.cpu() before converting them into numpy arrays. This will need to be done in metrics.py and in image_classification.py:

metrics.py

class_report = classification_report(label.cpu(), pred.cpu(), output_dict=True)  # line 56
iou = jaccard_similarity_score(target.cpu(), pred.cpu(), normalize=True)  # line 78

inference.py

output_np[row_from:row_to, col_from:col_to, 0] = useful_classification.cpu()  # line 158
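
A minimal sketch of the pattern, assuming a toy tensor in place of the real predictions (the names pred and pred_np are illustrative and not taken from metrics.py). It falls back to the CPU when no GPU is present, in which case .cpu() simply returns the tensor unchanged:

```python
import numpy as np
import torch

# Use the GPU when one is available; otherwise everything stays on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for a batch of predicted class indices.
pred = torch.tensor([[0, 1], [2, 1]], device=device)

# Explicitly move the tensor to host memory, then convert it to NumPy.
# Calling .numpy() directly on a CUDA tensor raises an error.
pred_np = pred.cpu().numpy()

print(type(pred_np), pred_np.shape)
```

The same .cpu() call is what the three lines above add before the values reach sklearn or a numpy array.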

epeterson12 commented on August 31, 2024

If we upgrade to Python 3.7, other packages will need to be updated in our environment:

gdal: version 2.2.2 doesn't support Python 3.7. We could upgrade to the most recent version 2.4.0. I did some quick tests with Python 3.7 and gdal 2.4.0 and didn't have any issues. Further validations will have to be done.
h5py: the 2.8.0 build that we are currently using (py36h39dcb92_0) isn't compatible with Python 3.7. We could update to a more recent build or to version 2.9.0, which supports Python 3.7. The changes between versions shouldn't affect us.

ymoisan commented on August 31, 2024

TL;DR

I think we all agree migrating to PyTorch 1.0 is a must. Migrating to Python 3.7, however, is not a must but we should do it.

Details

My base rationale for migrating to 3.7 is that:

3.6 is already 3.5 yrs old, with the latest release being 3.6.8 in Dec. 2018 and 3.6.9 announced as "Security fixes only"; "3.6 will receive bugfix updates approximately every 3 months for approximately 24 months. After the release of 3.7.0 final, two more 3.6 bugfix updates will be released. After that, it is expected that security updates (source only) will be released until 5 years after the release of 3.6 final, so until approximately December 2021." So in a nutshell, 3.6 is now at the end of its bugfix phase.
3.7 has been developed for close to 2.5 yrs and is already at point release 3.7.2 with 3.7.3 expected end March 2019. As for 3.6, 3.7's life cycle is tied to the next major release : "3.7 will receive bugfix updates approximately every 1-3 months for approximately 18 months. After the release of 3.8.0 final ..." 3.8.0 is expected to be released end Oct. 2019

What's new and better

The "What's new" page lists bugfixes and enhancements with respect to 3.6. I guess easier debugging might be interesting for us. Other potentially interesting modifications for us include:

argparse
itertools
sqlite3: useful for opening GeoPackage files?
Various optimizations: reduced Python startup time by 10% on Linux ... Method calls are now up to 20% faster ... sorted() and list.sort() have been optimized for common cases to be up to 40-75% faster ... dict.copy() is now up to 5.5 times faster ... the creation of named tuples 4 to 6 times faster ... os.fwalk() function is now up to 2 times faster ... The speed of comparison of array.array instances has been improved considerably in certain cases. It is now from 10x to 70x faster ...

It does not look like there are huge departures from 3.6 that may affect us.
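
On the sqlite3 question in the list above: a GeoPackage file is an SQLite database, so the standard sqlite3 module can at least read its metadata tables. The sketch below is only an illustration under stated assumptions: list_gpkg_layers is a hypothetical helper, and an in-memory database with a cut-down gpkg_contents table stands in for a real .gpkg file. Nothing here is specific to Python 3.7.

```python
import sqlite3


def list_gpkg_layers(conn):
    """Return (table_name, data_type) rows from the GeoPackage contents table."""
    cur = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return cur.fetchall()


# Stand-in for sqlite3.connect("some_file.gpkg"): an in-memory database with a
# reduced gpkg_contents table (the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [("roads", "features"), ("ortho", "tiles")],
)

layers = list_gpkg_layers(conn)
print(layers)
conn.close()
```

Geometries are stored as binary blobs, so sqlite3 on its own only gets us to the attribute and metadata tables; decoding geometries would still need GDAL or a similar library.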

As far as dependent libraries are concerned, here is a table of the main "non standard" imported modules in both our current Python 3.6 (PyTorch 0.4.1) environment and a Python 3.7 environment, with their supported versions as available from conda install. Newer versions are marked with an asterisk (*).

Note: all standard modules, like os, subprocess, argparse, fnmatch, etc. are inherently supported in Python 3.7

module         3.6 env         3.7 env
pytorch        0.4.1           1.0.0 *
               cuda 9.0.176    cuda 9.0.176
               cudnn 7.1.2_2   cudnn 7.4.1_1 *
[lib]gdal      2.2.4           2.4.0 *
geos           3.6.2           3.7.1 *
h5py           2.8.0           2.9.0 *
hdf5           1.8.18          1.10.4 *
libtiff        4.0.9           4.0.10 *
numpy          1.15.4          1.15.4
proj4          5.0.1           5.2.0 *
ruamel_yaml    0.15.46         0.15.46
scikit-image   0.14.1          0.14.1
scikit-learn   0.20.2          0.20.2
scipy          1.1.0           1.2.0 *
torchvision    0.2.1           0.2.1
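
To compare an actual environment against the table, something like the following could be run inside it. This is a rough sketch: report_versions is a hypothetical helper, the import names are my assumption of how the conda packages are imported (e.g. gdal as osgeo, scikit-learn as sklearn), and the non-Python libraries (geos, hdf5, libtiff, proj4, cuda, cudnn) are not covered.

```python
import importlib
import sys

# Import names for the Python modules in the table; several differ from the
# conda package names (gdal -> osgeo, scikit-image -> skimage, ...).
MODULES = ["torch", "torchvision", "osgeo", "h5py", "numpy",
           "ruamel.yaml", "skimage", "sklearn", "scipy"]


def report_versions(names):
    """Map each import name to its version string, or to the import failure."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # missing package or broken shared library
            report[name] = "not importable ({})".format(type(exc).__name__)
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


print("python", "{}.{}.{}".format(*sys.version_info[:3]))
for name, version in report_versions(MODULES).items():
    print(name, version)
```

Catching the exception rather than letting it propagate means one broken package does not hide the versions of the others.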

Potential issues for us

The library versions that change between the two environments in the table above may introduce issues in the code. Starting from the top:

pytorch: we migrate, whatever the issues
gdal: GDAL releases show that 2.2.4 was released on March 22, 2018, 2.3.0 in May 2018 and 2.4.0 in Dec. 2018. There do not seem to be any severe backward incompatibilities that we should be worried about; most drivers were improved, including the two most important ones for us: GTiff and GPKG

We should check that libraries we are likely to import later that depend on GDAL, e.g. rasterio, are OK

geos: this module is not imported directly in geo-deep-learning; no expected issues
h5py: it looks like there are a few interesting enhancements we could use, especially given ticket #51.
hdf5: Since 2.8, h5py is built against HDF5 1.10.x; however in our current environment we only have HDF5 1.8.18; there should be goodies in HDF5 1.10.4
One such functionality : SWMR: "Data acquisition and computer modeling systems often need to analyze and visualize data while it is being written. It is not unusual, for example, for an application to produce results in the middle of a run that suggest some basic parameters be changed, sensors be adjusted, or the run be scrapped entirely. To enable users to check on such systems, we have been developing a concurrent read/write file access pattern we call SWMR (pronounced swimmer). SWMR is short for single-writer/multiple-reader. SWMR functionality allows a writer process to add data to a file while multiple reader processes read from the file." Maybe quite important in the context of #55.
Another is Virtual Datasets (VDS)
libtiff: no surprises expected here, apart from better performance
proj4: 5.2.0 was released in Sep. 2018; 5.0.1 was released in March 2018 and is only in bugfix mode. We should definitely go to 5.2.0
scipy: 1.2.0 released Dec 2018; we use scipy.stats for which there have been improvements in 1.2.0 since scipy 1.1.0 (May 2018)

epeterson12 commented on August 31, 2024

As long as we install rasterio through the conda-forge channel, there don't appear to be any issues with dependencies.

We can use the tests_python3_7 environment in order to test how the stack update affects our deep-learning program. So far I haven't found any issues or modifications that we need to make other than explicitly moving tensors to the CPU before converting to numpy as mentioned above.

mpelchat04 commented on August 31, 2024

@epeterson12 - Concerning "moving tensors to the CPU before converting to numpy": would it be possible to use built-in PyTorch functions to calculate our metrics (e.g. replicate the classification_report function using PyTorch functions)?
1- Are those functions available in PyTorch 1.0?
2- Would it be wise to implement them, if not?

epeterson12 commented on August 31, 2024

In metrics.py, I don't see an alternative to moving tensors to the CPU manually: the operations where we convert CUDA tensors directly to numpy arrays are sklearn functions, and sklearn doesn't support GPU operations. Unlike other deep learning packages such as TensorFlow, PyTorch doesn't provide its own training metrics.

Since the behavior in releases prior to 1.0.0 was to transfer tensors to the cpu implicitly before converting them to numpy arrays, converting them explicitly shouldn't affect the efficiency of our program.
I don't think we would gain much by implementing them.
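
For a sense of what implementing one of them would involve, here is a minimal sketch of a per-class IoU computed with PyTorch operations only. It is not a drop-in replacement for the sklearn calls in metrics.py: confusion_matrix and per_class_iou are hypothetical helpers, and the toy tensors stand in for real predictions and labels.

```python
import torch


def confusion_matrix(pred, target, num_classes):
    """Return a (num_classes, num_classes) count matrix, rows = target class."""
    # Encode each (target, pred) pair as one integer, then histogram the codes.
    codes = target.reshape(-1) * num_classes + pred.reshape(-1)
    counts = torch.bincount(codes, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """Intersection over union for each class, from a confusion matrix."""
    intersection = torch.diag(cm).float()
    union = (cm.sum(dim=0) + cm.sum(dim=1)).float() - intersection
    # clamp avoids 0/0 for classes absent from both tensors (their IoU reads 0).
    return intersection / union.clamp(min=1)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pred = torch.tensor([0, 1, 1, 2], device=device)
target = torch.tensor([0, 1, 2, 2], device=device)

iou = per_class_iou(confusion_matrix(pred, target, num_classes=3))
print(iou.cpu().tolist())
```

The computation stays on whichever device the tensors live on, but the result still has to come back to the CPU for logging or reporting, which is consistent with the point above that little would be gained.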

epeterson12 commented on August 31, 2024

I am running into problems with undefined symbols when trying to use caffe2 for training models and for inference. There seem to be some dependency issues when using gdal and onnx (the intermediate step between pytorch models and caffe2 models). Once I installed the onnx package, I started getting this error while trying to import gdal:

ImportError: [...] python3.7/site-packages/osgeo/../../../libgdal.so.20: undefined symbol: _ZNK6libdap5Error17get_error_messageB5cxx11Ev

Once the cause and/or workaround is identified, I will update this comment with the fix.

Update

Reverting the packages to their version before installing onnx seems to have solved the problem. The packages are:

conda install openssl=1.0.2 libdap4=3.19.1=hd48c02d_1000 expat=2.2.5

ymoisan commented on August 31, 2024

Done with v1.0
