Comments (18)
I ran into similar file size issues with a 108.5MB stop_times.txt file from TriMet (https://developer.trimet.org/GTFS.shtml) in Portland, OR. I've been constructing my own alternative process to write out train-line-specific JSON files and bypass Mongo entirely. If this functionality would be helpful to others, I'd be happy to submit a PR once it's working properly.
from node-gtfs.
Cool. So far the algorithm has been very specific to my project, but I've been thinking about refactoring it for the larger community, glad you're interested, I'll keep you updated!
from node-gtfs.
When you run `npm run download`, the `scripts/download.js` file gets executed.
This file contains a few bad practices for dealing with asynchronicity and streams (blocking the event loop as well as accessing the fs synchronously). This is what I think leads to too much data being read into memory.
Can you please verify how consistently you get this error?
As a temporary workaround, you can call `node` with something like `--optimize_for_size --max_old_space_size=2000` to increase the memory limit.
from node-gtfs.
derhuerst,
I get that error every time I run the download script. I did come across some info earlier about the `--max_old_space_size=2000` flag but got the same error. I then ran it again with the memory size increased to 3 GB and it hung on "post processing", with no stop_times inserted in my database. I just updated Node and got the error again, but will try again with `--optimize_for_size`. What exactly does `--optimize_for_size` do?
from node-gtfs.
> What exactly does `--optimize_for_size` do?

I don't know, but `man node | grep memory` gave me "Enables optimizations which favor memory size over execution speed".
from node-gtfs.
Well, the good news is I no longer get a fatal error. The bad news is I can't get past "Post Processing data". I have tried twice, and both times let it run for 3+ hours. The stoptimes, stops, and trips collections are empty.
from node-gtfs.
Thanks for reporting this.
I'm going to try to recreate this and also see what I can do to clean up the issues that @derhuerst brought up.
Pull requests are welcome, including readme updates about the `--optimize_for_size` option.
from node-gtfs.
> I'm going to try to recreate this and also see what I can do to clean up the issues that @derhuerst brought up.
> Pull requests are welcome, including readme updates about the `--optimize_for_size` option.

I think once the script uses streams in a non-blocking way everywhere, `--optimize_for_size` won't be necessary anymore. I'll see if I have time to create a PR.
from node-gtfs.
Hi,
I have the same problem: my GTFS feed's stop_times.txt is 1.4 GB. I extended memory with the flag `--max-old-space-size=32768`, but it's not enough to parse my GTFS.
I think the problem is `async.forEach`. Large files must be processed row by row: count the rows in the file, then iterate with `i++` until the end, so the process is not memory-intensive.
from node-gtfs.
Also having trouble with this. It fails on my StopTimes file as well, which is 31.1 MB (PVTA's GTFS data, from Massachusetts, USA).
I'll look into this and see if I can pinpoint the error; if so, I'll attempt to fix it and throw up a pull request if I manage to.
For your reference, my stack trace is as follows:

```
Security context: 0x3e17400b4629
1: NonStringToString [native runtime.js:~571] [pc=0x3b337225e069](this=0x32675ec1c229 ,i=0x112292b61 <Number: 1.45962e+09)
2: parseInt(aka parseInt) [native v8natives.js:~37] [pc=0x3b3371d2b489](this=0x3e17400041b9 ,q=0x112292b61 <Number: 1.45962e+09>,r=10)
3: new constructor(aka ObjectID) [/Users/akaplo/Documents/Transit/node-gtfs/node_modules/bs...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
```
from node-gtfs.
So, I've been looking into this (and failing constantly) for the past couple of days. I agree that this problem stems from the `async` calls for inserting data into Mongo via a readable stream. I'm now going to try this package: https://github.com/theelee13/node-gtfs2mongo.git
I'm dealing with 85 MB of data, so hopefully this works. If it does, it may be viable to work with that framework to help make the import happen.
from node-gtfs.
@brendannee any news on this?
from node-gtfs.
@ianemcallister I'd be interested in seeing that.
from node-gtfs.
Just wanted to chime in and say I had the same issue with the 240 MB stop_times.txt from MBTA, even when using `--optimize_for_size --max_old_space_size=2000`.
from node-gtfs.
It's not pretty and it's not 100% complete, but if it helps, here's what I used to parse the files into JSON files: https://github.com/ianemcallister/Public_Transportation_App/blob/master/server/lib/parseGTFS.js
I watched my activity monitor while it was crunching, and it gets into the 2 GB of RAM range; it's a lot of data to crunch.
Also, I'm not as familiar with Mongo, but there's gotta be a way for this repo to write to the database more frequently than once per file; even once per 10,000 lines would make a big difference.
from node-gtfs.
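The "flush every 10,000 lines" idea can be sketched like this; `insertMany` here is a hypothetical stand-in for a real `collection.insertMany` bulk write, and the batcher itself is illustrative, not node-gtfs code:

```javascript
const BATCH_SIZE = 10000;

const inserted = [];
function insertMany(docs) {
  // Stand-in: a real implementation would call collection.insertMany(docs).
  inserted.push(...docs);
}

// Accumulate rows and flush them in fixed-size batches, so at most
// BATCH_SIZE rows are ever held in memory at once.
function makeBatcher(flush, size) {
  let batch = [];
  return {
    push(doc) {
      batch.push(doc);
      if (batch.length >= size) {
        flush(batch);
        batch = [];
      }
    },
    end() {
      if (batch.length > 0) flush(batch); // flush the final partial batch
      batch = [];
    },
  };
}

const batcher = makeBatcher(insertMany, BATCH_SIZE);
for (let i = 0; i < 25000; i++) batcher.push({ stop_sequence: i });
batcher.end();
console.log(inserted.length); // → 25000
```

Batches this size keep memory bounded while still amortizing the per-insert round-trip cost.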
I'm wondering why the csv library has a 'readable' event inside which you still have to call `parser.read()`. Most other CSV libraries emit a 'data' event for each line after you set up the pipe.
I've converted the synchronous file accesses to asynchronous in this fork. I also moved the `floatFields` and `integerFields` outside the loop body in an effort to save memory. Neither fix has been enough to get past the issue.
```
==== JS stack trace =========================================

Security context: 0x18a526b4629 <JS Object>
2: new constructor(aka ObjectID) [/home/ubuntu/workspace/node-gtfs/node_modules/mongodb/node_modules/mongodb-core/node_modules/bson/lib/bson/objectid.js:~28] [pc=0x211040617374] (this=0x183e431ca3e9 <an ObjectID with map 0x8fd849e23b9>,id=0x18a526041b9 <undefined>)
3: arguments adaptor frame: 0->1
5: createPk [/home/ubuntu/workspace/node-gtfs/node_modules/mongodb/node_modules/mongodb...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted
```
Has anyone had any success with this?
from node-gtfs.
Should be fixed with #56.
from node-gtfs.
I pushed a new update in node-gtfs 0.6.0 that should help importing large GTFS files.
I'm closing this issue as it is probably the same as #55 - please post there if you have large GTFS files that still have issues importing.
from node-gtfs.