Comments (18)
I ran into similar file size issues with a 108.5MB stop_times.txt file from TriMet (https://developer.trimet.org/GTFS.shtml) in Portland, OR. I've been constructing my own alternative process to write out train-line-specific JSON files and bypass Mongo entirely. If this functionality would be helpful to others, I'd be happy to submit a PR once it's working properly.
from node-gtfs.
Cool. So far the algorithm has been very specific to my project, but I've been thinking about refactoring it for the larger community, glad you're interested, I'll keep you updated!
from node-gtfs.
When you run `npm run download`, the `scripts/download.js` file gets executed.
This file contains a few bad practices for dealing with asynchronicity and streams (blocking the event loop as well as accessing the fs synchronously). This is what I think leads to too much data being read into memory.
Can you please verify how consistently you get this error?
As a temporary workaround, you can call `node` with something like `--optimize_for_size --max_old_space_size=2000` to increase the memory limit.
from node-gtfs.
derhuerst,
I get that error every time I run the download script. I did come across some info earlier about the `--max_old_space_size=2000` flag but got the same error. I then ran it again with the memory size increased to 3 GB and it hung on "post processing", with no stop_times inserted in my database. I just updated Node and got the error again, but will try again with `--optimize_for_size`. What exactly does `--optimize_for_size` do?
from node-gtfs.
> What exactly does `--optimize_for_size` do?

I don't know, but `man node | grep memory` gave me "Enables optimizations which favor memory size over execution speed".
from node-gtfs.
Well, the good news is I no longer get a fatal error. The bad news is I can't get past "Post Processing data". I have tried twice, and both times let it run for 3+ hours. The stoptimes, stops, and trips collections are empty.
from node-gtfs.
Thanks for reporting this.
I'm going to try to recreate this and also see what I can do to clean up the issues that @derhuerst brought up.
Pull requests are welcome, including readme updates about the `--optimize_for_size` option.
from node-gtfs.
> I'm going to try to recreate this and also see what I can do to clean up the issues that @derhuerst brought up.
> Pull requests are welcome, including readme updates about the `--optimize_for_size` option.

I think once the script uses streams in a non-blocking way everywhere, `--optimize_for_size` won't be necessary anymore. I'll see if I have time to create a PR.
from node-gtfs.
Hi,
I have the same problem: my GTFS feed's stop_times.txt is 1.4 GB. I extended memory with the flag `--max-old-space-size=32768`, but it's not enough to parse my GTFS.
I think the problem is `async.forEach`. Large files must be processed row by row: count the rows in the file, then iterate with `i++` until the end, so the process is not memory-intensive.
from node-gtfs.
Also having trouble with this. It fails on my StopTimes file as well, which is 31.1 MB (PVTA's GTFS data, from Massachusetts, USA).
I'll look into this and see if I can pinpoint the error; if so, I'll attempt to fix it and throw up a pull request if I manage to.
For your reference, my stack trace is as follows:

```
Security context: 0x3e17400b4629
1: NonStringToString [native runtime.js:~571] [pc=0x3b337225e069](this=0x32675ec1c229 ,i=0x112292b61 <Number: 1.45962e+09)
2: parseInt(aka parseInt) [native v8natives.js:~37] [pc=0x3b3371d2b489](this=0x3e17400041b9 ,q=0x112292b61 <Number: 1.45962e+09>,r=10)
3: new constructor(aka ObjectID) [/Users/akaplo/Documents/Transit/node-gtfs/node_modules/bs...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
```
from node-gtfs.
So, I've been looking into this (and failing constantly) for the past couple of days. I agree that this problem stems from the `async` calls for inserting data into Mongo via a readable stream. I'm now going to try this package: https://github.com/theelee13/node-gtfs2mongo.git
I'm dealing with 85 MB of data, so hopefully this works. If it does, it may be viable to work with that framework to help make the import happen.
from node-gtfs.
@brendannee any news on this?
from node-gtfs.
@ianemcallister I'd be interested in seeing that.
from node-gtfs.
Just wanted to chime in and say I had the same issue with the 240 MB stop_times.txt from MBTA, even when using `--optimize_for_size --max_old_space_size=2000`.
from node-gtfs.
It's not pretty and it's not 100% complete, but if it helps, here's what I used to parse the files into JSON files: https://github.com/ianemcallister/Public_Transportation_App/blob/master/server/lib/parseGTFS.js
I watched my activity monitor while it was crunching, and it gets into the 2 GB of RAM range; it's a lot of data to crunch.
Also, I'm not as familiar with Mongo, but there's gotta be a way for this repo to write to the database more frequently than once per file; even once per 10,000 lines would make a big difference.
from node-gtfs.
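The "flush every 10,000 lines" idea can be sketched like this; `insertMany` here is a hypothetical stand-in for a real `collection.insertMany` bulk write, and the batcher itself is illustrative, not node-gtfs code:

```javascript
const BATCH_SIZE = 10000;

const inserted = [];
function insertMany(docs) {
  // Stand-in: a real implementation would call collection.insertMany(docs).
  inserted.push(...docs);
}

// Accumulate rows and flush them in fixed-size batches, so at most
// BATCH_SIZE rows are ever held in memory at once.
function makeBatcher(flush, size) {
  let batch = [];
  return {
    push(doc) {
      batch.push(doc);
      if (batch.length >= size) {
        flush(batch);
        batch = [];
      }
    },
    end() {
      if (batch.length > 0) flush(batch); // flush the final partial batch
      batch = [];
    },
  };
}

const batcher = makeBatcher(insertMany, BATCH_SIZE);
for (let i = 0; i < 25000; i++) batcher.push({ stop_sequence: i });
batcher.end();
console.log(inserted.length); // → 25000
```

Batches this size keep memory bounded while still amortizing the per-insert round-trip cost.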
I'm wondering why the csv library has a 'readable' event inside which you still have to call `parser.read()`. Most other CSV libraries emit a 'data' event for each line after you set up the pipe.
I've converted the synchronous file accesses to asynchronous in this fork. I also moved the `floatFields` and `integerFields` outside the loop body in an effort to save memory. Neither fix has been enough to get past the issue.
```
==== JS stack trace =========================================

Security context: 0x18a526b4629 <JS Object>
2: new constructor(aka ObjectID) [/home/ubuntu/workspace/node-gtfs/node_modules/mongodb/node_modules/mongodb-core/node_modules/bson/lib/bson/objectid.js:~28] [pc=0x211040617374] (this=0x183e431ca3e9 <an ObjectID with map 0x8fd849e23b9>,id=0x18a526041b9 <undefined>)
3: arguments adaptor frame: 0->1
5: createPk [/home/ubuntu/workspace/node-gtfs/node_modules/mongodb/node_modules/mongodb...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted
```
Has anyone had any success with this?
from node-gtfs.
Should be fixed with #56.
from node-gtfs.
I pushed a new update in node-gtfs 0.6.0 that should help importing large GTFS files.
I'm closing this issue as it is probably the same as #55 - please post there if you have large GTFS files that still have issues importing.
from node-gtfs.