I ran into this problem while running an instance of bayarea_urbansim using my own net

Network.set() fails if network node IDs are not sequentially ordered integers starting at 0 (pandana issue, closed)

udst commented on July 19, 2024

Network.set() fails if network node IDs are not sequentially ordered integers starting at 0

from pandana.

Comments (17)

sablanchard commented on July 19, 2024

Thank you Max. We will look into this, but first: @fscottfoti, do you have any comments on the initial architecture of this, the rationale behind it, or the best way to move forward?

fscottfoti commented on July 19, 2024

Not off the top of my head but I will take a look in depth later today...

fscottfoti commented on July 19, 2024

Hmm, this is complicated and I'm not sure I have this all sorted out in my head, but this is what I remember...

Basically, in the internal C++ code (in contraction hierarchies), node_ids are actually indexes into an array, so yeah, internally they have to be contiguous starting at 0. But they shouldn't have to be externally, so the user never has to know about that.

I'm pretty sure net.set deals with that correctly here by mapping external ids to internal ids. That's pretty well tested code so I think there's more to it than that. Not saying there's no bug, but I mean, it's been used a lot without 0-based indexes so I think it's mostly working.

I'm also a bit confused. You're doing poi searches right, not aggregations? If that's the case you don't use set at all - you use set_pois??
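The external-to-internal mapping described above can be sketched roughly as follows. This is only an illustration on made-up IDs, not pandana's actual code; the variable names (including `node_idx`, which merely echoes the attribute discussed in this thread) are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical external node IDs: neither sorted nor starting at 0.
external_ids = pd.Index([1005, 42, 77000], name="node_id")

# Internal IDs are just positions 0..n-1, keyed by the external ID.
node_idx = pd.Series(np.arange(len(external_ids)), index=external_ids)

# Node IDs attached to some records (e.g. buildings), given as external IDs.
building_nodes = pd.Series([42, 42, 1005])

# Translate them to the contiguous indexes an array-based backend would need.
internal = node_idx.loc[building_nodes].to_numpy()
print(internal)  # [1 1 0]
```

With a lookup like this, callers only ever deal with their own IDs, and the contiguous 0-based numbering stays an internal detail.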

mxndrwgrdnr commented on July 19, 2024

Sorry for the confusion, shouldn't have said POIs. This is an aggregation, specifically the regional_vars calculation which involves regional POIs, thus the confusion.

It looks like I may have overlooked this line where the index of self.node_idx is defined as the index of the nodes dataframe. That makes sense now. Let's leave the issue open while I keep digging for the root cause then, because my issue resolved itself as soon as I converted my node IDs to 0-based sequential integers. I'll try to put together a script or a notebook or something that demonstrates the issue.

fscottfoti commented on July 19, 2024

Yeah, a simple test would be great - I could take a look

mxndrwgrdnr commented on July 19, 2024

OK, I put together an IPython notebook here along with the data files you'll need to run it. Warning: it's a fairly big network, and the aggregation and precompute steps each take about 3-5 minutes to run on my laptop with 16GB of RAM. Check it out when you get a chance and let me know what you think.

fscottfoti commented on July 19, 2024

This example is super duper useful - thanks for putting it together and especially for sharing the data. Turns out the bug is that when you write the buildings table to csv and read it back in, tmnode_id becomes a string, whereas when nodes is stored in the hdf5, tmnode_id stays an int. This means you can't align the two indexes (strings != ints). On top of that, the notebook appears to hide that from you, whereas the interpreter shows it clearly, although it took me a while to see it. Anyway, bottom line is it's not a bug in pandana, which I figured it couldn't be since I've never used consecutive node_ids. The pandas index bug strikes again!

mxndrwgrdnr commented on July 19, 2024

Interesting, although slightly unsatisfactory! This explains why the problem occurs when reading the buildings table from .csv, but it doesn't explain how/why the issue was happening during the course of a regular UrbanSim run. I only wrote out the buildings table to .csv for the purpose of debugging. I guess the next step would be to write the buildings table out to hdf5 instead, and go from there. I'll report back if I make any progress there, but feel free to close the issue in the meantime if you want. Thanks for taking a look, though. Nice catch!

fscottfoti commented on July 19, 2024

I think I got that wrong - osm_nodes.index is a string and osm_buildings.tmnode_id is an int.

I'm also seeing something strange.

In [27]: osm_buildings.tmnode_id.isin(osm_nodes.index).value_counts()
Out[27]: 
True    1843351
Name: tmnode_id, dtype: int64

This seems to indicate that it does align, even with the different types.

But that's not how pandana does it - it uses a pd.merge, which clearly isn't working. So I'm not sure yet that that's the whole story...

fscottfoti commented on July 19, 2024

Incidentally, you don't want to see the message "Removed 1843351 rows because they contain missing values" - that means it didn't align.

fscottfoti commented on July 19, 2024

I can confirm that doing this osm_nodes.index = osm_nodes.index.astype('int') before the pdna.Network call fixes the alignment issue, so those types are the problem.

I'm at a loss as to why isin returns all true - that seems like a pandas bug?
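The type mismatch and the cast that resolves it can be reproduced on toy data. This is a minimal sketch, assuming a nodes table with a string index and a buildings table with integer node IDs; the table contents and the `node_id` column name are made up, and the lookup uses a plain `reindex` rather than whatever pandana does internally.

```python
import pandas as pd

# Hypothetical nodes table whose index holds the IDs as strings.
nodes = pd.DataFrame(
    {"x": [0.0, 1.0, 2.0]},
    index=pd.Index(["10", "20", "30"], dtype=object),
)

# Hypothetical buildings table whose node IDs are integers.
buildings = pd.DataFrame({"node_id": [10, 30, 30]})
wanted = buildings["node_id"].to_numpy()

# Looking up integer labels in a string index finds nothing.
before = nodes["x"].reindex(wanted)
print(before.isna().all())  # True

# After casting the index to integers, the same lookup succeeds.
nodes.index = nodes.index.astype("int64")
after = nodes["x"].reindex(wanted)
print(after.tolist())  # [0.0, 2.0, 2.0]
```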

fscottfoti commented on July 19, 2024

Sorry I've got it now. Pandana casts to int when it initializes. This keeps the series from aligning. I think the cast is unnecessary though. It has to be an int internally but not externally. We've probably only used ints before.

mxndrwgrdnr commented on July 19, 2024

OK I think that makes sense. The working hypothesis is that if I were to recast the node ID column of my non-zero-indexed nodes table, then everything should work? I'll try that in a full urbansim run and report back.

fscottfoti commented on July 19, 2024

Yes, turns out @sablanchard already knew that pandana had an issue with using strings as node IDs. It should be fixable.

mxndrwgrdnr commented on July 19, 2024

Oh OK, cool. I'll go ahead and mark this issue closed then?

fscottfoti commented on July 19, 2024

I'd leave it open as a reminder to fix Pandana to support strings. But you should be fine with the current version if you convert to ints.
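Until string IDs are supported, a small pre-flight check along these lines could catch the mismatch before the network is built. The helper `check_node_ids` is hypothetical and not part of pandana; it only uses standard pandas calls and assumes integer IDs on both sides.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def check_node_ids(nodes: pd.DataFrame, node_ids: pd.Series) -> None:
    """Raise if node_ids cannot be matched against nodes.index (hypothetical helper)."""
    if not is_integer_dtype(nodes.index.dtype):
        raise TypeError(f"nodes.index has dtype {nodes.index.dtype}, expected integers")
    if not is_integer_dtype(node_ids.dtype):
        raise TypeError(f"node_ids has dtype {node_ids.dtype}, expected integers")
    missing = ~node_ids.isin(nodes.index)
    if missing.any():
        raise ValueError(f"{int(missing.sum())} node IDs are not present in nodes.index")


# Made-up example: a string index is rejected before it can fail to align.
nodes = pd.DataFrame({"x": [0.0, 1.0]}, index=pd.Index(["10", "20"], dtype=object))
try:
    check_node_ids(nodes, pd.Series([10, 20]))
except TypeError as err:
    print(err)
```

Failing loudly here is a design choice: a dtype error at setup time is easier to diagnose than rows silently dropped as missing values later on.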

fscottfoti commented on July 19, 2024

Actually, I'm going to close this since the heading is misleading. I'll open another issue with a better name.
