Coder Social home page Coder Social logo

Comments (17)

sablanchard avatar sablanchard commented on July 19, 2024

Thank you Max. We will look into this but first, @fscottfoti do you have any comments on the initial architecture of this, rational, or best way to move forward?

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

Not off the top of my head but I will take a look in depth later today...

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

Hmm, this is complicated and I'm not sure I have this all sorted out in my head, but this is what I remember...

Basically in the internal C++ code (in contraction hierarchies) node_ids are actually indexes into an array so yeah internally they have to be contiguous starting at 0, but they shouldn't be externally so the user never has to know about that.

I'm pretty sure net.set deals with that correctly here by mapping external ids to internal ids. That's pretty well tested code so I think there's more to it than that. Not saying there's no bug, but I mean, it's been used a lot without 0-based indexes so I think it's mostly working.

I'm also a bit confused. You're doing poi searches right, not aggregations? If that's the case you don't use set at all - you use set_pois??

from pandana.

mxndrwgrdnr avatar mxndrwgrdnr commented on July 19, 2024

Sorry for the confusion, shouldn't have said POIs. This is an aggregation, specifically the regional_vars calculation which involves regional POIs, thus the confusion.

It looks like I may have overlooked this line where the index of self.node_idx is defined as the index of the nodes dataframe. That makes sense now. Let's leave the issue open while I keep digging for the root cause then, because my issue resolved itself as soon as I converted my node IDs to 0-based sequential integers. I'll try to put together a script or a notebook or something that demonstrates the issue.

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

Yeah, a simple test would be great - I could take a look

from pandana.

mxndrwgrdnr avatar mxndrwgrdnr commented on July 19, 2024

OK, I put together an IPython notebook here along with the data files you'll need to run it. Warning, its a a fairly big network, and the aggregations and precompute steps each take about 3-5 minutes to run on my laptop with 16GB of RAM. Check it out when you get a chance and let me know what you think.

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

This example is super duper useful - thanks for putting together and especially for sharing the data. Turns out the bug is that when you write the buildings table to csv and read it back in, tmnode_id becomes a string, whereas when nodes is stored in the hdf5 the tmnode_id stays in int. This means you can't align the two indexes (strings != ints). On top of that the notebook appears to hide that from you, whereas although it took me a while to see it, the interpreter shows it clearly. Anyway, bottom line is it's not a bug in pandana, which I figured it couldn't be since I've never used consecutive node_ids. The pandas index bug strikes again!

from pandana.

mxndrwgrdnr avatar mxndrwgrdnr commented on July 19, 2024

Interesting, although slightly unsatisfactory! This explains why the problem occurs when reading the buildings table from .csv, but it doesn't explain how/why the issue was happening during the course of a regular UrbanSim run. I only wrote out the buildings table to .csv for the purpose of debugging. I guess the next step would be to write the buildings table out to hdf5 instead, and go from there. I'll report back if I make any progress there, but feel free to close the issue in the meantime if you want. Thanks for taking a look, though. Nice catch!

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

I think I got that wrong - osm_nodes.index is a string and osm_buildings.tmnode_id is an int.

I'm also seeing something strange.

In [27]: osm_buildings.tmnode_id.isin(osm_nodes.index).value_counts()
Out[27]: 
True    1843351
Name: tmnode_id, dtype: int64

This seems to indicate that it does align, even with the different types.

But that's not how pandana does it - it uses a pd.merge, which clearly isn't working. So I'm not sure yet that that's the whole story...

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

Incidentally, you don't want to see Removed 1843351 rows because they contain missing values - that means it didn't align.

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

I can confirm that doing this osm_nodes.index = osm_nodes.index.astype('int') before the pdna.Network call fixes the alignment issue, so those types are the problem.

I'm at a loss as to why isin returns all true - that seems like a pandas bug?

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

Sorry I've got it now. Pandana casts to int when it initializes. This keeps the series from aligning. I think the cast is unnecessary though. It has to be an int internally but not externally. We've probably only used ints before.

from pandana.

mxndrwgrdnr avatar mxndrwgrdnr commented on July 19, 2024

OK I think that makes sense. The working hypothesis is that if I were to recast the node ID column of my non-zero-indexed nodes table, then everything should work? I'll try that in a full urbansim run and report back.

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

Yes, turns out @sablanchard already knew that pandana had an issue with using strings as nodeids. It should be fixable.

from pandana.

mxndrwgrdnr avatar mxndrwgrdnr commented on July 19, 2024

Oh OK, cool. I'll go ahead and mark this issue closed then?

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

I'd leave it open as a reminder to fix Pandana to support strings. But you should be fine with the current version if you convert to ints.

from pandana.

fscottfoti avatar fscottfoti commented on July 19, 2024

Actually, I'm going to close this since the heading is misleading. Will open another issue with a better name for the issue.

from pandana.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.