ubarsc / pylidar

A set of Python modules which makes it easy to write lidar processing code in Python
Home Page: http://www.pylidar.org
License: GNU General Public License v3.0
Original report by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).
We need data and test scripts to test the following functionality:
Original report by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
The current code for handling the Riegl coordinate transformation does the actual calculation in float32. When the world coordinates are UTM, the numbers are large enough that there is a loss of precision at the sub-metre level. For TLS scans such as those the Riegl gives us, we need the precision to be more like millimetre, so this calculation should be done in float64.
I have put in place a hack to get around it for now, as simply changing the calculation to float64 had too many implications elsewhere in the file. However, this is what ought to be done, ultimately. I have had to circumvent the use of the pylmatrix code, which is a shame.
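The precision loss can be demonstrated in a few lines of numpy. The transform values below are hypothetical, but representative of an identity rotation plus a large UTM translation:

```python
import numpy

# Hypothetical 4x4 transform: identity rotation plus a large UTM
# translation, as a TLS registration might supply. The large easting
# and northing are where float32 runs out of sub-metre precision.
transform = numpy.array([
    [1.0, 0.0, 0.0, 500000.123456],
    [0.0, 1.0, 0.0, 6945000.654321],
    [0.0, 0.0, 1.0, 56.789],
    [0.0, 0.0, 0.0, 1.0],
], dtype=numpy.float64)

# One point about 1.23 m from the scanner origin (homogeneous coords)
point = numpy.array([1.2345678, 0.0, 0.0, 1.0])

out64 = transform @ point
out32 = transform.astype(numpy.float32) @ point.astype(numpy.float32)

# The float32 easting differs from the float64 one at the centimetre
# level or worse, well outside a millimetre tolerance
diff = abs(out64[0] - numpy.float64(out32[0]))
```

With a float32 ulp of about 3 cm near an easting of 500000, the error is already far too large for TLS work, and it is worse again in the northing.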
Original report by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
(Note - also posted this to the discussion group, but I don't know if anyone sees that)
Now that a few users have gained some experience in using pylidar for real tasks (it has taken a while), it is starting to become apparent that there are some real limitations with the spatial processing mode of pylidar. This is the mode in which each block of data is constructed as an array of binned data, with the first two dimensions representing X and Y position, and the third dimension being a ragged array giving each point which falls within that bin. The ragged array is implemented using a numpy masked array, so that for each bin, the third dimension only extends to as many points as actually occur in that bin, and the remaining elements are masked.
The main problems are the time taken to construct this array and the size it can occupy in memory. Both seem to be tied to the fact that point density can be quite variable across a scan. For a particular block, a single bin can have many points, and this count sets the size of the third dimension. If this number is considerably larger than the average across the block, it makes the whole array very large in that direction, leaving considerable "empty" space in the rest of the array. Since the block size (in terms of X and Y) is fixed across a whole call to doProcessing(), it must be small enough that the block with the largest single-bin point count can be accommodated in memory. For some scans (TLS in particular), this can impose a very small block size in order to fit within available memory.
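The memory overhead described above is easy to quantify. The bin counts below are made up, but are the right shape for a TLS scan with one dense bin:

```python
import numpy

nrows = ncols = 200          # bins in one spatial block (hypothetical)
avg_pts = 4                  # average points per bin across the block
max_pts = 500                # points in the single fullest bin

# The ragged third dimension is sized for the fullest bin, so the
# masked array allocates max_pts slots for every bin in the block...
dense_bytes = max_pts * nrows * ncols * 8    # float64 data
# ...even though the points themselves only need this much:
needed_bytes = avg_pts * nrows * ncols * 8

overhead = dense_bytes / needed_bytes        # 125x in this example
```

The overhead is simply max_pts / avg_pts, so a single dense bin inflates the whole block by orders of magnitude.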
I am not sure, but I think this also ties in to the run time: the construction of this array takes considerable time, and with a small block size, much of the data may have to be handled multiple times to select which points apply to a given block. I am less certain about that part.
In any case, on TLS scans (where point density is highest in the centre), the memory requirements can go into many Gb, for files which are not that big in the first place. The run times can be many hours to process across such a file.
Jasmine and I put together a workflow using the non-spatial processing mode, in which the user function handles whatever it needs in spatial terms on the fly. In this case, the operation was a simple min(Z), implemented with a numba jit function for the per-point section. The memory requirement was tiny, and the run time went from on the order of 12 hours down to around 5 minutes.
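The actual implementation used a numba jit function, but the on-the-fly idea can be sketched in plain numpy: read blocks non-spatially and fold each one into a running min(Z) grid, so no per-block masked array is ever built. The grid-geometry arguments here are hypothetical:

```python
import numpy

def update_min_z(x, y, z, min_z_grid, x_min, y_max, bin_size):
    """Fold one non-spatial block of points into a running min(Z) grid."""
    cols = ((x - x_min) / bin_size).astype(numpy.int64)
    rows = ((y_max - y) / bin_size).astype(numpy.int64)
    nrows, ncols = min_z_grid.shape
    ok = (rows >= 0) & (rows < nrows) & (cols >= 0) & (cols < ncols)
    # Unbuffered in-place minimum handles repeated bin indices correctly
    numpy.minimum.at(min_z_grid, (rows[ok], cols[ok]), z[ok])

# Usage on a tiny 2x2 grid of 1 m bins, initialised to +inf
grid = numpy.full((2, 2), numpy.inf)
x = numpy.array([0.5, 0.5, 1.5])
y = numpy.array([1.5, 1.5, 0.5])
z = numpy.array([3.0, 1.0, 2.0])
update_min_z(x, y, z, grid, x_min=0.0, y_max=2.0, bin_size=1.0)
# grid[0, 0] is now 1.0, grid[1, 1] is 2.0; untouched bins stay inf
```

Each block only touches the bins its points fall in, so memory is bounded by the output grid rather than by the densest bin.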
Obviously this is a vast improvement.
My feeling is that the spatial processing mode, with the masked arrays, was worth testing out, but I suspect it is not a feasible model for doing this sort of thing, and we are better off using the non-spatial mode and handling the spatial aspects in the particular case, for a given task. Using numba will help greatly in this.
With this in mind, I would like to suggest that we change the default control setting to non-spatial, and shelve all further plans for development on the spatial-mode parts of the code.
It also raises the question (which Nick has raised before) about what purpose the indexing of a file is actually serving for us. The TLS example mentioned above made no use of the index, and runs vastly faster, so it is not clear that indexing files is a useful thing to do.
These are just my impressions, and I recognise that there could be a lot else I don't know about. As always, I emphasize that I am just helping out here, and don't really know anything about lidar, but these results suggest that there are some important questions to consider.
Any thoughts are welcome.
Neil
Original report by Daniel Clewley (Bitbucket: danclewley, GitHub: danclewley).
I think there is a problem applying the LiDAR processor directly to LAS files: the processor example from http://pylidar.org/en/latest/processorexamples.html produces a blank image when applied directly to a LAS file, but the expected result when the file is first converted to SPD format.
I've tested on a couple of LAS files I have, and on apl1dr_x509000ys6945000z56_2009_ba1m6_pbrisba.las from the pylidar test data, and get the same result. For all files I generated a spatial index first using lasindex. I'm using pylidar 0.3.0 installed via conda on Python 3.5 under Linux.
It looks like the problem is that getPointsByBins always returns 0 points for the LAS file but not for the SPD version. The code below reproduces the problem by running the same function on a LAS file and an SPD V4 file created from it. Adding a line to print zValues.shape shows no points being found for the LAS file.
```python
import os
import numpy
from rios import cuiprogress
from pylidar import lidarprocessor
from pylidar.toolbox.translate.las2spdv4 import translate
from pylidar.lidarformats import generic

def writeImageFunc(data):
    zValues = data.input_points.getPointsByBins(colNames='Z')
    (maxPts, nRows, nCols) = zValues.shape
    nullval = 0
    if maxPts > 0:
        minZ = zValues.min(axis=0)
        stack = numpy.ma.expand_dims(minZ, axis=0)
    else:
        stack = numpy.empty((1, nRows, nCols), dtype=zValues.dtype)
        stack.fill(nullval)
    data.output_image.setData(stack)

if __name__ == '__main__':
    input_las = 'testdata/apl1dr_x509000ys6945000z56_2009_ba1m6_pbrisba.las'
    imported_spd = os.path.splitext(input_las)[0] + '.spd'

    info = generic.getLidarFileInfo(input_las)
    translate(info, input_las, imported_spd, epsg=27700,
              pulseIndex='FIRST_RETURN', spatial=True, binSize=2.0,
              buildPulses=True)

    for in_points in [input_las, imported_spd]:
        out_img = in_points + '.img'
        dataFiles = lidarprocessor.DataFiles()
        dataFiles.input_points = lidarprocessor.LidarFile(in_points,
                                                          lidarprocessor.READ)
        dataFiles.output_image = lidarprocessor.ImageFile(out_img,
                                                          lidarprocessor.CREATE)
        if os.path.splitext(in_points)[-1].lower() != '.spd':
            dataFiles.input_points.setLiDARDriverOption('BIN_SIZE', 2.0)
        controls = lidarprocessor.Controls()
        controls.setProgress(cuiprogress.CUIProgressBar())
        lidarprocessor.doProcessing(writeImageFunc, dataFiles,
                                    controls=controls)
```
Original report by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
It seems there is some mysterious state information being preserved between calls to lidarprocessor.doProcessing(), within the SPDV4 driver and/or the h5py libraries.
The attached program pylidarFileCloseBug.py demonstrates the problem. It runs two successive processes which operate on the same file. The intended use case is one where a first pass through a file is required to calculate some numbers, and a second pass then updates a field with the resulting values. In this small example, we just read some points and do nothing.
The first pass opens the file with READ mode, and works fine. The second pass opens with UPDATE, and fails. If both passes use the same mode (either READ or UPDATE), they both succeed.
I do not know whether the problem is in h5py or pylidar, but it is not purely an h5py problem, because the attached h5py fragment pureH5py.py works fine.
It does not modify the file (as shown by the file modification times), so the problem is internal to the program, not in the file itself.
I was testing this on the latest tip in the repository (fbea23379b02 - dated Sun Nov 27 15:14:44 2016 +1000).
Any thoughts?
Original report by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
Running pylidar_index on an input SPD V4 file creates an output file which has a different number of points. This does not seem good. It may be harmless, but it could also be a sign of something more sinister. Example counts for a single TLS file show the number of points change from 28363782 to 28362382, a loss of 1400 points. It is not yet clear whether the points are missing from the file, or simply that the count is incorrectly recorded, but neither seems like a good thing.
Original report by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
This probably should be an issue under pynninterp, but here will do for now.
Nick has some evidence to suggest a memory leak when running interpolations. He was running something Jaz set up to do interpolations across a large number of tiles (with a suitably small window size), and its memory usage grew and grew, to many GB. This suggests there is a memory leak, with something not being freed between windows.
I should try to set up a suitable test loop over the guts of pynninterp, but wanted to log this here, so others might be able to look into it, too.
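A minimal harness for that test loop could simply watch the process high-water RSS across repeated calls. The run_interpolation callable here is a stand-in for whatever pynninterp call is under test:

```python
import gc
import resource  # Unix-only

def check_for_leak(run_interpolation, n_iters=50):
    """Run the suspect operation repeatedly and record peak RSS (KB).

    ru_maxrss is a high-water mark, so a trace that keeps climbing
    across iterations suggests memory is not freed between windows,
    while a trace that plateaus early suggests it is.
    """
    rss = []
    for _ in range(n_iters):
        run_interpolation()
        gc.collect()   # rule out unreachable-but-uncollected objects
        rss.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return rss
```

Calling gc.collect() between iterations distinguishes a true leak in the C extension from Python objects that merely have not been collected yet.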
Original report by muirj (Bitbucket: muirj, ).
Error when converting from an SPD file to a LAS file. The error states "Last argument needs to be numpy dtype".
Original report by Caio Hamamura (Bitbucket: caiohamamura, GitHub: caiohamamura).
Is there an SPDV4 driver for C/C++? I'd like to use it in my algorithms to convert to and from the SPDV4 format, as I'm currently working with a custom binary implementation and would like to switch to a more standard format. SPDV4 could be the standard format if it had a solid C/C++ library, and, in the future, support for compression algorithms.
Hi,
I am trying to open a lidar file (LAZ and LAS formats), but the function spatial.readLidarPoints cannot read it. The file is not corrupted, because it opens correctly in ArcMap and QGIS.
Do you have any advice or suggestion? Thanks in advance, JL
```
/usr/local/lib/python3.7/dist-packages/pylidar/lidarformats/generic.py in getReaderForLiDARFile(fname, mode, controls, userClass, verbose)
    625         # none worked
    626         msg = 'Cannot open LiDAR file %s' % fname
--> 627         raise LiDARFormatDriverNotFound(msg)
    628
    629 def getLidarFileInfo(fname, verbose=False):

LiDARFormatDriverNotFound: Cannot open LiDAR file ./data/lidar_data.las
```
Original report by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
Somewhere in the conversion from a Riegl rxp file to an SPD V4 file, the NUMBER_OF_POINTS gets converted to a float64. This is likely to be problematic. It is defined in the format as uint64, so at some point in the process there is an implicit type conversion going on. This is unlikely to handle precision in the right ways, and I suspect it is responsible for a loss of precision when the number of points is large. The same type conversion appears to also occur in conversion from .las to SPD V4, and NUMBER_OF_PULSES appears to undergo the same conversion.
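The danger of the implicit conversion is the 53-bit float64 mantissa: counts above 2**53 cannot all be represented exactly, as this sketch shows (the count itself is hypothetical):

```python
import numpy

# NUMBER_OF_POINTS is defined as uint64, but a round-trip through
# float64 keeps only 53 bits of mantissa
n_points = numpy.uint64(2**53 + 1)     # hypothetical very large count
as_float = numpy.float64(n_points)     # implicit-conversion stand-in
round_trip = numpy.uint64(as_float)

# round_trip is 2**53: the +1 has been silently lost
```

Counts in the tens of millions survive the round-trip, but storing an integer field as float64 is still wrong by definition, and any downstream arithmetic inherits the risk.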
Original report by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
Running pylidar_index on a TLS file seems to double the number of points in the resulting file. The number is almost exactly double (e.g. from 28363782 to 56726164), when reported by pylidar_info (NUMBER_OF_POINTS field). Seems weird.
When run on a tile of airborne lidar, the number of points changes a little, e.g. from 665022 to 690044, which is nothing like as large an increase, but still weird.
Original report by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).
Currently when exporting to LAS, if there are columns that aren't part of the LAS spec they get saved as extra columns or 'attributes'. However, when reading the file back in with PyLidar you get a segmentation fault. Changing the version to LAS 1.4 seems to help a bit but other errors emerge.
It seems that using LAStools for this isn't easy. I've tried a number of things, but we need someone who has experience with the library to give us some assistance.