lbdremy / node-csv-stream
Simple CSV stream for node.js
License: MIT License
Hi,
I'm running into an issue where an escape character and the enclosed character that follows it (both a double quote " in my case) are split across two sequential chunks of a stream.
For example, in this simple bit of code:
var Parser = require('csv-stream/lib/parser');
var p = new Parser({
    escapeChar : '"',
    enclosedChar : '"'
});
p.parse('"..."');
p.parse('"quoted"""');
The full string is actually "...""quoted""", but because the two characters are split, it breaks this line (https://github.com/lbdremy/node-csv-stream/blob/master/lib/parser.js#L52), because the parser doesn't have access to the next character in the full string to determine whether it's actually an enclosed character.
A suggested fix would be to keep track of whether we are currently "escaping" between chunks, and treat the first character of the next chunk a bit differently.
I will create a PR for the fix.
Thanks
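One way to sketch the suggested fix: carry the "pending quote" state across parse() calls, so the escape-vs-close decision for a quote seen at the end of one chunk can be made when the next chunk arrives. This is an illustrative standalone parser, not the module's code (all names here are made up), and it only handles a single row with ',' delimiters:

```javascript
// Illustrative sketch only: a '"' seen inside a quoted field sets a
// pending flag; the NEXT character (possibly in a later chunk) decides
// whether it was an escaped quote or the closing quote.
class ChunkSafeParser {
  constructor() {
    this.field = '';
    this.fields = [];
    this.inQuotes = false;
    this.pendingQuote = false; // saw '"' inside quotes; next char decides
  }
  parse(chunk) {
    for (const ch of chunk) {
      if (this.pendingQuote) {
        this.pendingQuote = false;
        if (ch === '"') {          // escaped quote: emit a literal '"'
          this.field += '"';
          continue;
        }
        this.inQuotes = false;     // the pending quote closed the field
      }
      if (this.inQuotes) {
        if (ch === '"') this.pendingQuote = true;
        else this.field += ch;
      } else if (ch === '"') {
        this.inQuotes = true;
      } else if (ch === ',') {
        this.fields.push(this.field);
        this.field = '';
      } else {
        this.field += ch;
      }
    }
  }
  end() {
    this.fields.push(this.field); // a still-pending quote closes the field
    return this.fields;
  }
}
```

Feeding '"..."' and then '"quoted"""' as separate chunks yields the single field '..."quoted"', which is what parsing the unsplit string would give.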
This is probably not a use case this module was designed for, but I am hoping there may be an easy way to make it work...
I have an input stream that has the following format:
Col1,Col2,Col3
"Abc",1,2
"Xyz",3,4
Cl1,Cl2
"bla",1
"Bla bla",2
"Bla bla bla", 3
C1,C2,C3,C4
1,2,3,4
5,6,7,8
7,8,9,10
Any suggestions?
Thanks!
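csv-stream only expects one header row, so one workaround (not a library feature, just a sketch with a made-up heuristic) is to split the raw text into sections wherever a line "looks like" a header, then run each section through its own parser:

```javascript
// Hypothetical heuristic: a line whose comma-separated fields are all
// bare identifiers (no quotes, not purely numeric) starts a new section.
function splitSections(text) {
  const sections = [];
  let current = null;
  for (const line of text.split('\n')) {
    if (!line.trim()) continue;
    const looksLikeHeader = line.split(',')
      .every(f => /^[A-Za-z][A-Za-z0-9]*$/.test(f.trim()));
    if (looksLikeHeader || !current) {
      current = [];
      sections.push(current);
    }
    current.push(line);
  }
  return sections.map(lines => lines.join('\n'));
}
```

On the sample input above this produces three sections, each beginning with its own header line; the heuristic would need tuning for data whose values can also look like bare identifiers.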
Hi, thanks for your handy little lib first of all :)
I have a question though: why is there no documentation available on the write part?
Looking at your documentation, I would assume it would consume object hashes and internally convert them to whatever it needs.
So I tried using:
csvStream.write({some: object})
but the code expects a buffer as input, which seems very inconsistent to me.
What are your ideas?
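Until the write side is documented, one workaround (a sketch of my own, not part of the library) is to serialize each object to a CSV line yourself and pass the resulting string to write():

```javascript
// Serialize one object to a CSV line for a given column order.
// Values containing the delimiter, quotes or newlines are enclosed in
// double quotes, with embedded quotes doubled.
function toCsvLine(obj, columns) {
  return columns
    .map(c => {
      const v = String(obj[c] == null ? '' : obj[c]);
      return /[",\n]/.test(v) ? '"' + v.replace(/"/g, '""') + '"' : v;
    })
    .join(',') + '\n';
}
```

So instead of csvStream.write({some: object}) you would call csvStream.write(toCsvLine(row, columns)).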
My understanding of pause() with respect to streaming a CSV is that it should pause the stream at the current record. This would allow async processing to occur before resuming the stream to get the next record.
What's currently happening is that the buffer is fully processed before _paused is taken into account.
var count = 0;
var csvStream = csv.createStream({
    endLine : '\n',
    escapeChar : '"',
    enclosedChar : '"'
});
var readStream = fs.createReadStream(targetPath);
csvStream.on('error', function(err){
    console.log(err);
});
csvStream.on('data', function(data){
    count++;
    console.log('pre pause', count);
    csvStream.pause();
    // I would normally have async code here with a csvStream.resume() in a callback/promise
});
csvStream.on('end', function(){
    console.log("we're done");
    // cb()
});
readStream.pipe(csvStream);
I would expect the stream to pause at each record, but it goes through until the end of file (or buffer in this case) before actually entering the paused state.
Is the intended behaviour to process the entire file/buffer before allowing pause() to occur?
When there's more than one header line things get weird.
I am trying to parse key/value CSV data from a streaming API that's coming in the following format:
|unit=835636,unittype=5,address=197.178.58.229,kind=1,pending=0,mileage=23026,odometer=0,logic_state=1,reason=20,eventid=0,response=0,longitude=37.93065,latitude=-2.40789,altitude=956,gps_valid=1,gps_connected=1,satellites=7,velocity=46,heading=113,emergency=0,driver=0,ignition=1,door=0,arm=0,disarm=0,extra1=0,extra2=0,extra3=1,siren=0,lock=0,immobilizer=0,unlock=0,fuel=0,rpm=0,modemsignal=0,main_voltage=28.11,backup_voltage=100.00,analog1=0.00,analog2=0.00,analog3=0.29,datetime_utc=2013/08/28 08:08:25,datetime_actual=2013/08/28 08:08:22,network=TCPIP.1|
Using the following example from your readme.md:
var http = require('http'),
    request = require('request'),
    csv = require('csv-stream');

var options = {
    delimiter : '=', // default is ,
    endLine : '|', // default is \n
    columns : ['unit','unittype','address','kind','pending','mileage','odometer','logic_state','reason','longitude','latitude','altitude','gps_valid','gps_connected','satellites','velocity','heading','emergency','driver','ignition','door','arm','disarm','extra1','extra2','extra3','siren','lock','immobilizer','unlock','fuel','rpm','modemsignal','main_voltage','analog1','analog','analog3','datetime_utc','datetime_actual', 'network'],
    escapeChar : '',
    enclosedChar : ''
};

var csvStream = csv.createStream(options);
request('http://127.0.0.1:7777/user=me&pass=1234').pipe(csvStream)
    .on('error', function(err){
        console.error(err);
    })
    .on('data', function(data){
        // outputs an object containing a set of key/value pairs representing a line found in the csv file
        console.log(data);
    })
    .on('column', function(key, value){
        // outputs the column name associated with the value found
        console.log(key + ' : ' + value);
    });
However I'm getting the following response:
{ unit: 'unit',
unittype: '856511,unittype',
address: '5,address',
kind: '197.180.82.170,kind',
pending: '1,pending',
mileage: '0,mileage',
odometer: '13918,odometer',
logic_state: '0,logic_state',
reason: '1,reason',
longitude: '20,eventid',
latitude: '0,response',
altitude: '0,longitude',
gps_valid: '38.43246,latitude',
gps_connected: '-2.95994,altitude',
satellites: '544,gps_valid',
velocity: '1,gps_connected',
heading: '1,satellites',
emergency: '7,velocity',
driver: '8,heading',
ignition: '0,emergency',
door: '0,driver',
arm: '0,ignition',
disarm: '1,door',
extra1: '0,arm',
extra2: '0,disarm',
extra3: '0,extra1',
siren: '0,extra2',
lock: '0,extra3',
immobilizer: '1,siren',
unlock: '0,lock',
fuel: '0,immobilizer',
rpm: '0,unlock',
modemsignal: '0,fuel',
main_voltage: '0,rpm',
analog1: '0,modemsignal',
analog: '0,main_voltage',
analog3: '28.94,backup_voltage',
datetime_utc: '86.00,analog1',
datetime_actual: '0.00,analog2',
network: '0.00,analog3',
undefined: 'TCPIP.1' }
Do I have to extend the parser to accomplish the above scenario?
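Rather than extending the parser: the shifted output suggests two things. The columns array appears to be missing some keys present in the data (e.g. eventid, response, backup_voltage, analog2), and, more fundamentally, with delimiter '=' the commas between pairs end up inside the values. Since the format is really "comma-separated key=value pairs per '|'-terminated record" rather than CSV, a small dedicated parser may be simpler (this is my own sketch, not a csv-stream feature):

```javascript
// Parse one '|'-delimited record of comma-separated key=value pairs.
// Values in this feed contain no commas, so splitting on ',' is safe.
function parseRecord(record) {
  const obj = {};
  for (const pair of record.replace(/^\||\|$/g, '').split(',')) {
    const i = pair.indexOf('=');
    if (i === -1) continue;           // skip malformed fragments
    obj[pair.slice(0, i)] = pair.slice(i + 1);
  }
  return obj;
}
```

You could keep csv-stream with endLine: '|' just to chunk the stream into records, then run each record through parseRecord(), so column names come from the data itself instead of a hand-maintained list.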
I am trying to terminate the parsing and emit the end event.
Here is my code; any help?
var csvStream = csv.createStream({ escapeChar: '"', enclosedChar: '"' });
let csvData = { headers: {}, rows: [] };
request(req.body.fileUrl).pipe(csvStream)
    .on('error', function (err) {
        console.error(err);
    })
    .on('header', function (columns) {
        csvData.headers = columns;
        console.log(columns);
    })
    .on('data', async function (data) {
        console.log(data);
        csvData.rows.push(data);
        if (csvData.rows.length >= 100) {
            csvStream.emit('end'); // I need only the top 100 rows
        }
    })
    .on('end', async function () {
        // run the last of the operations
        csvData.rows = csvData.rows.slice(0, 10);
        return res.send(csvData);
    })
    .on('column', function (key, value) {
        // outputs the column name associated with the value found
        console.log('#' + key + ' = ' + value);
    });
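Note that emitting 'end' yourself does not stop the upstream request from continuing to feed the parser. A workaround sketch (helper and names are mine, not the library's): detach the parser from its source and destroy the source once enough rows have arrived:

```javascript
// Collect at most `limit` rows from `parser`, then stop the `source`
// stream that is piping into it. `done` receives the collected rows.
function takeRows(source, parser, limit, done) {
  const rows = [];
  let finished = false;
  function finish() {
    if (finished) return;
    finished = true;
    done(rows.slice(0, limit));
  }
  parser.on('data', function onRow(row) {
    rows.push(row);
    if (rows.length >= limit) {
      source.unpipe(parser);                 // stop feeding the parser
      parser.removeListener('data', onRow);  // ignore buffered rows
      if (typeof source.destroy === 'function') source.destroy();
      finish();
    }
  });
  parser.on('end', finish); // the file had fewer than `limit` rows
}
```

In the code above that would be takeRows(request(req.body.fileUrl), csvStream, 100, rows => res.send({ headers: csvData.headers, rows })), with the request stream kept in a variable instead of being piped inline.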
See discussion in: dominictarr/excel-stream#12
I'm using csv-stream with express to parse a request stream (file).
This is the code I'm using:
const csv = require('csv-stream')

module.exports = (req, res) => {
    const csvStream = csv.createStream({ enclosedChar: '"' })
    req.pipe(csvStream)
        .on('error', (err) => { console.error(err) })
        .on('data', (data) => { console.log(data) })
    req.on('end', () => { res.send('END') })
}
When I run this, I get this error:
buffer.js:557
throw new TypeError('Unknown encoding: ' + encoding);
^
TypeError: Unknown encoding:
at stringSlice (buffer.js:557:9)
at Buffer.toString (buffer.js:593:10)
at CSVStream.write (/(...)/node_modules/csv-stream/index.js:59:34)
at IncomingMessage.ondata (_stream_readable.js:626:20)
at emitOne (events.js:115:13)
at IncomingMessage.emit (events.js:210:7)
at IncomingMessage.Readable.read (_stream_readable.js:462:10)
at flow (_stream_readable.js:833:34)
at resume_ (_stream_readable.js:815:3)
at _combinedTickCallback (internal/process/next_tick.js:102:11)
I can fix this by adding the following to my code:
csvStream._encoding = 'utf8'
But this is an ugly hack that I want to avoid at all costs.
Is there any way to fix this from my side? Am I missing something? Or should csv-stream set a value for this._encoding from options, or something similar?
Can we get top 10 rows only?