
Fatal Error · node-gtfs (18 comments, closed)

MentatGhola commented on July 19, 2024
Fatal Error

Comments (18)

ianemcallister commented on July 19, 2024

I ran into similar file-size issues with a 108.5 MB stop_times.txt file from TriMet (https://developer.trimet.org/GTFS.shtml) in Portland, OR. I've been building my own alternative process to write out train-line-specific JSON files and bypass Mongo entirely. If this functionality would be helpful to others, I'd be happy to submit a PR once it's working properly.
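
For illustration, a minimal sketch of that kind of bypass (not the code behind the PR; the csv-parse package, the out/ directory, and grouping by trip_id are all assumptions here) could stream stop_times.txt straight into per-group JSON-lines files:

```js
const fs = require('fs');
const parse = require('csv-parse');

const streams = {}; // one append-mode write stream per output file

fs.createReadStream('stop_times.txt')
  .pipe(parse({ columns: true })) // emits one object per CSV row
  .on('data', (row) => {
    const key = row.trip_id; // a real version would join trips.txt to map trips to lines
    if (!streams[key]) {
      streams[key] = fs.createWriteStream(`out/${key}.ndjson`, { flags: 'a' });
    }
    streams[key].write(JSON.stringify(row) + '\n');
  })
  .on('end', () => {
    // note: one open stream per trip can exhaust file descriptors on big feeds
    Object.keys(streams).forEach((k) => streams[k].end());
  });
```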

ianemcallister commented on July 19, 2024

Cool. So far the algorithm has been very specific to my project, but I've been thinking about refactoring it for the larger community. Glad you're interested; I'll keep you updated!

derhuerst commented on July 19, 2024

When you do npm run download, the scripts/download.js file will be run.

This file contains a few bad practices in how it deals with asynchronicity and streams (blocking the event loop as well as accessing the fs synchronously). This is what I think leads to too much data being read into memory.

Can you please verify how consistently you get this error?

As a temporary workaround, you can call node with something like --optimize_for_size --max_old_space_size=2000 to increase the memory limit.
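
A rough sketch of what the non-blocking, streamed approach could look like (this is not the current scripts/download.js; the csv-parse package here stands in for whatever parser the script actually uses):

```js
const fs = require('fs');
const parse = require('csv-parse');

// Instead of fs.readFileSync on the whole file, pipe it through the parser:
// rows arrive one at a time and stream backpressure keeps memory bounded.
fs.createReadStream('stop_times.txt')
  .pipe(parse({ columns: true }))
  .on('data', (row) => {
    // handle one row here, e.g. queue it for a database insert
  })
  .on('error', (err) => console.error(err))
  .on('end', () => console.log('parse complete'));
```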

MentatGhola commented on July 19, 2024

derhuerst,
I get that error every time I run the download script. I did come across some info earlier about the --max_old_space_size=2000 flag but got the same error. I then ran it again with the memory size increased to 3 GB, and it hung on "post processing", with no stop_times inserted in my database. I just updated node and got the error again, but I will try again with --optimize_for_size. What exactly does --optimize_for_size do?

derhuerst commented on July 19, 2024

@MentatGhola

> What exactly does --optimize_for_size do?

I don't know, but man node | grep memory gave me "Enables optimizations which favor memory size over execution speed".

MentatGhola commented on July 19, 2024

Well, the good news is I no longer get a fatal error. The bad news is I can't get past "Post Processing data". I have tried twice, and both times let it run for 3+ hours. The stoptimes, stops, and trips collections are empty.

brendannee commented on July 19, 2024

Thanks for reporting this.

I'm going to try to recreate this and also see what I can do to clean up the issues that @derhuerst brought up.

Pull requests are welcome, including readme updates about the --optimize_for_size option.

derhuerst commented on July 19, 2024

> I'm going to try to recreate this and also see what I can do to clean up the issues that @derhuerst brought up.
>
> Pull requests are welcome, including readme updates about the --optimize_for_size option.

I think once the script uses streams in a non-blocking way everywhere, --optimize_for_size won't be necessary anymore. I'll see if I have time to create a PR.

sign0 commented on July 19, 2024

Hi,

I have the same problem: I have a GTFS feed whose stop_times.txt is 1.4 GB. I extended the memory with the flag --max-old-space-size=32768, but it's not enough to parse my GTFS.

I think the problem is async.forEach. Large files should be processed row by row: count the rows in the file, then iterate with i++ until the end. That way the process is not memory-intensive at all.
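
A sketch of that row-at-a-time idea (insertRow is a placeholder, not a node-gtfs function): pause the parser while each row is written and resume afterwards, so only one row is in flight at any moment:

```js
const fs = require('fs');
const parse = require('csv-parse');

const parser = parse({ columns: true });

parser.on('data', (row) => {
  parser.pause();              // stop pulling rows while this one is written
  insertRow(row, (err) => {    // insertRow: placeholder for the actual db write
    if (err) throw err;
    parser.resume();           // fetch the next row only once this one is done
  });
});
parser.on('end', () => console.log('all rows processed'));

fs.createReadStream('stop_times.txt').pipe(parser);
```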

akaplo commented on July 19, 2024

Also having trouble with this. It fails on my StopTimes file as well, which is 31.1 MB (PVTA's GTFS data, Massachusetts, USA).

I'll look into this and see if I can pinpoint the error; if so, I'll attempt to fix it and throw up a pull request if I manage to.

For your reference, my stack trace is as follows:

Security context: 0x3e17400b4629
1: NonStringToString [native runtime.js:~571] [pc=0x3b337225e069](this=0x32675ec1c229 ,i=0x112292b61 <Number: 1.45962e+09>)
2: parseInt(aka parseInt) [native v8natives.js:~37] [pc=0x3b3371d2b489](this=0x3e17400041b9 ,q=0x112292b61 <Number: 1.45962e+09>,r=10)
3: new constructor(aka ObjectID) [/Users/akaplo/Documents/Transit/node-gtfs/node_modules/bs...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6

Aghassi commented on July 19, 2024

So, I've been looking into this (and failing constantly) for the past couple of days. I agree that this problem stems from the async calls that insert data into Mongo while reading from a readable stream. I'm now going to try this package: https://github.com/theelee13/node-gtfs2mongo.git

I'm dealing with 85 MB of data, so hopefully this works. If it does, it may be viable to work with that framework to help make the import happen.

derhuerst commented on July 19, 2024

@brendannee any news on this?

Aghassi commented on July 19, 2024

@ianemcallister I'd be interested in seeing that.

tyleragreen commented on July 19, 2024

Just wanted to chime in and say I had the same issue with the 240 MB stop_times.txt from MBTA, even when using --optimize_for_size --max_old_space_size=2000.

ianemcallister commented on July 19, 2024

It's not pretty and it's not 100% complete, but if it helps, here's what I used to parse the files into JSON files: https://github.com/ianemcallister/Public_Transportation_App/blob/master/server/lib/parseGTFS.js

I watched my activity monitor while it was crunching, and memory usage got into the 2 GB range; it's a lot of data to crunch.

Also, I'm not as familiar with mongo, but there's got to be a way for this repo to write to the database more frequently than once per file; even every 10,000 lines would make a big difference.
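
A sketch of that batching idea (`parser` and `db` are assumed to be a csv-parse stream and an open mongodb connection from the surrounding script; the stoptimes collection name comes from the thread above):

```js
// `parser` is a csv-parse stream and `db` an open mongodb connection (assumed).
const BATCH_SIZE = 10000;
let batch = [];

// insertMany writes the whole batch in one round trip to MongoDB.
function flush(done) {
  if (batch.length === 0) return done();
  const rows = batch;
  batch = [];
  db.collection('stoptimes').insertMany(rows, done);
}

parser.on('data', (row) => {
  batch.push(row);
  if (batch.length >= BATCH_SIZE) {
    parser.pause();                      // backpressure while the batch writes
    flush(() => parser.resume());
  }
});
parser.on('end', () => flush(() => console.log('import complete')));
```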

tyleragreen commented on July 19, 2024

I'm wondering why the csv library has a 'readable' event inside which you still have to call parser.read(). Most other csv libraries emit a 'data' event for each line after you set up the pipe.
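
For comparison, the two styles look roughly like this (handleRow is a placeholder; attach only one of the two handlers to a given parser):

```js
// Style 1: 'readable' plus an explicit read loop, as in scripts/download.js.
parser.on('readable', () => {
  let row;
  while ((row = parser.read()) !== null) {
    handleRow(row); // handleRow: placeholder for the per-row work
  }
});

// Style 2: flowing mode, where the stream emits one 'data' event per row.
parser.on('data', (row) => handleRow(row));
```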

I've converted the synchronous file accesses to asynchronous ones in this fork. I also moved the floatFields and integerFields outside the loop body in an effort to save memory. Neither fix has been enough to get past the issue.

==== JS stack trace =========================================

Security context: 0x18a526b4629 <JS Object>
    2: new constructor(aka ObjectID) [/home/ubuntu/workspace/node-gtfs/node_modules/mongodb/node_modules/mongodb-core/node_modules/bson/lib/bson/objectid.js:~28] [pc=0x211040617374] (this=0x183e431ca3e9 <an ObjectID with map 0x8fd849e23b9>,id=0x18a526041b9 <undefined>)
    3: arguments adaptor frame: 0->1
    5: createPk [/home/ubuntu/workspace/node-gtfs/node_modules/mongodb/node_modules/mongodb...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted

Has anyone had any success with this?

gerlacdt commented on July 19, 2024

Should be fixed by #56.

brendannee commented on July 19, 2024

I pushed a new update in node-gtfs 0.6.0 that should help with importing large GTFS files.

I'm closing this issue, as it is probably the same as #55; please post there if you have large GTFS files that still fail to import.
