
Slow getRoutes by stop_id · node-gtfs · 6 comments · CLOSED

groenroos commented on August 19, 2024
Slow getRoutes by stop_id


Comments (6)

brendannee commented on August 19, 2024

Thanks for the detailed description. Which agency are you using this with?

I think the getRoutes method isn't very efficient when searching by stop_id: when stop_id is present, it has to look up stoptimes by stop_id to get trip_ids, then look up trips by trip_id to get route_ids. Doing this for many stops results in a large number of queries.
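In rough mongoose terms, the chain looks something like this (an illustrative sketch, not node-gtfs's actual internals, with the models trimmed to the relevant fields):

    const mongoose = require('mongoose');

    // Minimal illustrative models - node-gtfs defines its own, richer schemas.
    const StopTime = mongoose.model('StopTime', new mongoose.Schema({ stop_id: String, trip_id: String }));
    const Trip = mongoose.model('Trip', new mongoose.Schema({ trip_id: String, route_id: String }));
    const Route = mongoose.model('Route', new mongoose.Schema({ route_id: String }));

    // Roughly the chain a stop_id -> routes lookup has to walk:
    async function routesForStop(stopId) {
      // 1. stoptimes by stop_id -> trip_ids (the scan an index on stop_id would speed up)
      const stoptimes = await StopTime.find({ stop_id: stopId }, 'trip_id').lean();
      const tripIds = [...new Set(stoptimes.map((st) => st.trip_id))];

      // 2. trips by trip_id -> route_ids
      const trips = await Trip.find({ trip_id: { $in: tripIds } }, 'route_id').lean();
      const routeIds = [...new Set(trips.map((t) => t.route_id))];

      // 3. routes by route_id
      return Route.find({ route_id: { $in: routeIds } }).lean();
    }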

I have a few ideas for how this could be improved:

  • Change your query (see the sketch after this list):
      ◦ Use getRoutes to search for all routes within your radius and keep these in memory: https://github.com/BlinkTagInc/node-gtfs/blob/eda855458517b7bae18b00e9a7ddf6aaf3950df2/README.md#gtfsgetroutesquery-projection-options
      ◦ For each route, get a list of all stops that it serves and keep these in memory.
      ◦ Search getStops for all stops within your radius. At that point, you have the lists of routes and their stops to match up.

  • We could update node-gtfs to preprocess stops during the GTFS import and add a new field to each stop listing the routes that serve it. Knowing which routes serve each stop is useful in its own right.

  • Analyze the queries being run to see if additional indexes could be created to speed up your existing logic.

  • Revisit #93 and see if this can speed things up.
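Here is a rough sketch of that first idea (the `within` radius query and the route_id query on getStops are assumptions based on this version's README, so treat the shapes as illustrative):

    const gtfs = require('gtfs');

    // In-memory matching: radius-filtered routes x radius-filtered stops.
    async function routesNearPoint(lat, lon, radius) {
      // 1. All routes within the radius, kept in memory.
      const routes = await gtfs.getRoutes({ within: { lat, lon, radius } });

      // 2. For each route, the set of stop_ids it serves.
      const stopsByRoute = new Map();
      for (const route of routes) {
        const stops = await gtfs.getStops({ route_id: route.route_id });
        stopsByRoute.set(route.route_id, new Set(stops.map((s) => s.stop_id)));
      }

      // 3. All stops within the radius, matched against the per-route sets.
      const nearbyStops = await gtfs.getStops({ within: { lat, lon, radius } });
      const nearbyIds = new Set(nearbyStops.map((s) => s.stop_id));

      return routes.filter((route) =>
        [...stopsByRoute.get(route.route_id)].some((id) => nearbyIds.has(id))
      );
    }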

Let me know what you think.


groenroos commented on August 19, 2024

Thanks for the quick response!

I'm working with the TfNSW Open Data: https://opendata.transport.nsw.gov.au/node/332/exploreapi

Their combined bus data is enormous; nearly 6,000 routes, 38,000 stops and 3.7 million stop times. It occurred to me that instead of importing all the records from the /buses interface under a single agency_key, I could import each bus operator's data separately under separate agency_keys as offered by their API. Could this potentially improve the performance?

If not, I'll have a go at your suggested alternative logic. Sounds like an intriguing approach!

As for the longer term, ideally the improvements would be built into gtfs-import, whether the right solution is adding indexes and/or relationships, or preprocessing the data (probably my favourite). That would be the simplest approach, since nothing would be lost or overwritten when my cronjobbed gtfs-import downloads updates from the data provider.


groenroos commented on August 19, 2024

> Their combined bus data is enormous; nearly 6,000 routes, 38,000 stops and 3.7 million stop times. It occurred to me that instead of importing all the records from the /buses interface under a single agency_key, I could import each bus operator's data separately under separate agency_keys as offered by their API. Could this potentially improve the performance?

Never mind; it made no discernible difference to query speed, and it had unintended side effects in a different part of my app.

What sort of indexes could I create to speed up the existing logic? I could also look into writing a custom MongoDB query with a $lookup so there aren't as many round trips to the database.
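Something along these lines, perhaps (an untested sketch; the stoptimes collection name matches what the importer creates, but the trips/routes names and the pipeline itself are my assumptions):

    // One aggregation instead of three round trips:
    // stop_id -> distinct trip_ids -> distinct route_ids -> route documents
    db.stoptimes.aggregate([
      { $match: { stop_id: "200060" } },            // illustrative stop_id
      { $group: { _id: "$trip_id" } },              // distinct trip_ids
      { $lookup: { from: "trips", localField: "_id",
                   foreignField: "trip_id", as: "trip" } },
      { $unwind: "$trip" },
      { $group: { _id: "$trip.route_id" } },        // distinct route_ids
      { $lookup: { from: "routes", localField: "_id",
                   foreignField: "route_id", as: "route" } },
      { $unwind: "$route" },
      { $replaceRoot: { newRoot: "$route" } }
    ]);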


groenroos commented on August 19, 2024
> Analyze the queries being run to see if additional indexes could be created to speed up your existing logic.

It seems that db.stoptimes.createIndex({ "stop_id": 1 }) has solved all my woes - the 40+ second response time for the above code is now well under 500 ms.
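For anyone verifying the same fix, explain() should show the query planner using an IXSCAN stage instead of a COLLSCAN (the stop_id value here is just illustrative):

    // Look for "IXSCAN" in winningPlan and a small totalDocsExamined:
    db.stoptimes.find({ stop_id: "200060" }).explain("executionStats");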

Might be a good idea to add this index to the importer?


brendannee commented on August 19, 2024

Wow - thanks for the notes.

I added a stop_id index to the mongoose schema: 918d5bb
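For reference, a schema-level index in mongoose looks roughly like this - the actual change is in the commit above, and the fields here are trimmed for illustration:

    const mongoose = require('mongoose');

    // stop_id gets a secondary index, built as documents are inserted on import.
    const stopTimeSchema = new mongoose.Schema({
      stop_id: { type: String, index: true },
      trip_id: String,
      arrival_time: String,
      departure_time: String
    });

    module.exports = mongoose.model('StopTime', stopTimeSchema);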

If you have a chance, delete your database and run the import again to confirm that this is the exact index needed - let me know.

Also, let me know if you notice or can think of any other indexes that would be useful in speeding up the processing.


groenroos commented on August 19, 2024

I dropped the database and re-ran the import on 1.8.4; it creates the appropriate index, and the query is lightning fast! As far as I'm concerned, this issue is fixed. Thank you very much for your help!

