Comments (17)
Thank you Max. We will look into this but first, @fscottfoti do you have any comments on the initial architecture of this, rational, or best way to move forward?
from pandana.
Not off the top of my head but I will take a look in depth later today...
from pandana.
Hmm, this is complicated and I'm not sure I have this all sorted out in my head, but this is what I remember...
Basically in the internal C++ code (in contraction hierarchies) node_ids are actually indexes into an array so yeah internally they have to be contiguous starting at 0, but they shouldn't be externally so the user never has to know about that.
I'm pretty sure net.set deals with that correctly here by mapping external ids to internal ids. That's pretty well tested code so I think there's more to it than that. Not saying there's no bug, but I mean, it's been used a lot without 0-based indexes so I think it's mostly working.
I'm also a bit confused. You're doing poi searches right, not aggregations? If that's the case you don't use set
at all - you use set_pois??
from pandana.
Sorry for the confusion, shouldn't have said POIs. This is an aggregation, specifically the regional_vars calculation which involves regional POIs, thus the confusion.
It looks like I may have overlooked this line where the index of self.node_idx
is defined as the index of the nodes dataframe. That makes sense now. Let's leave the issue open while I keep digging for the root cause then, because my issue resolved itself as soon as I converted my node IDs to 0-based sequential integers. I'll try to put together a script or a notebook or something that demonstrates the issue.
from pandana.
Yeah, a simple test would be great - I could take a look
from pandana.
OK, I put together an IPython notebook here along with the data files you'll need to run it. Warning, its a a fairly big network, and the aggregations and precompute steps each take about 3-5 minutes to run on my laptop with 16GB of RAM. Check it out when you get a chance and let me know what you think.
from pandana.
This example is super duper useful - thanks for putting together and especially for sharing the data. Turns out the bug is that when you write the buildings table to csv and read it back in, tmnode_id becomes a string, whereas when nodes is stored in the hdf5 the tmnode_id stays in int. This means you can't align the two indexes (strings != ints). On top of that the notebook appears to hide that from you, whereas although it took me a while to see it, the interpreter shows it clearly. Anyway, bottom line is it's not a bug in pandana, which I figured it couldn't be since I've never used consecutive node_ids. The pandas index bug strikes again!
from pandana.
Interesting, although slightly unsatisfactory! This explains why the problem occurs when reading the buildings table from .csv, but it doesn't explain how/why the issue was happening during the course of a regular UrbanSim run. I only wrote out the buildings table to .csv for the purpose of debugging. I guess the next step would be to write the buildings table out to hdf5 instead, and go from there. I'll report back if I make any progress there, but feel free to close the issue in the meantime if you want. Thanks for taking a look, though. Nice catch!
from pandana.
I think I got that wrong - osm_nodes.index is a string and osm_buildings.tmnode_id is an int.
I'm also seeing something strange.
In [27]: osm_buildings.tmnode_id.isin(osm_nodes.index).value_counts()
Out[27]:
True 1843351
Name: tmnode_id, dtype: int64
This seems to indicate that it does align, even with the different types.
But that's not how pandana does it - it uses a pd.merge, which clearly isn't working. So I'm not sure yet that that's the whole story...
from pandana.
Incidentally, you don't want to see Removed 1843351 rows because they contain missing values
- that means it didn't align.
from pandana.
I can confirm that doing this osm_nodes.index = osm_nodes.index.astype('int')
before the pdna.Network call fixes the alignment issue, so those types are the problem.
I'm at a loss as to why isin returns all true - that seems like a pandas bug?
from pandana.
Sorry I've got it now. Pandana casts to int when it initializes. This keeps the series from aligning. I think the cast is unnecessary though. It has to be an int internally but not externally. We've probably only used ints before.
from pandana.
OK I think that makes sense. The working hypothesis is that if I were to recast the node ID column of my non-zero-indexed nodes table, then everything should work? I'll try that in a full urbansim run and report back.
from pandana.
Yes, turns out @sablanchard already knew that pandana had an issue with using strings as nodeids. It should be fixable.
from pandana.
Oh OK, cool. I'll go ahead and mark this issue closed then?
from pandana.
I'd leave it open as a reminder to fix Pandana to support strings. But you should be fine with the current version if you convert to ints.
from pandana.
Actually, I'm going to close this since the heading is misleading. Will open another issue with a better name for the issue.
from pandana.
Related Issues (20)
- Support for M1 Macs HOT 3
- [Question] How to get different Key|Value combinations in osm.node_query
- [Feature Request] Aggreggation queries to subset of nodes
- Binary incompatibility error with older versions of NumPy HOT 1
- Shortest path length exists but shortest path is empty HOT 5
- shortest path distance between non connected nodes
- Allow specifying mapping distance for POIs
- Issue: the POI Accessibility computation is taking forever
- OSM loader - POIs that are ways in OSM HOT 1
- Pandana network initiation slow for large network HOT 4
- how to increase the performance of the aggregate when impedance is changed to time?
- net.shortest_paths doesn't work with multiprocessing
- tables sub-dependency versions conflict with python 3.10 compatibility when installing with pip HOT 1
- incorrect shortest paths solutions when impendance values are particularly small or large HOT 1
- Plotting with Pandana does not render title HOT 1
- osm.pdna_network_from_bbox error HOT 6
- Pulling network object from OSM bugged HOT 2
- New release ?
- Google Colab install pandana error HOT 1
- query the OSM API for the street network within the specified bounding box error HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pandana.