
gocsv's People

Contributors

acls, ahmedalhulaibi, andrei-m, arcanechimp, bcbolt, benjamintrapani, bhainesva, chrisbroome, cskr, ferhatelmas, fwwieffering, goryudyuma, haton14, hchang-clypd, hesidoryn, iain17, jonathanpicques, jozseftiborcz, kangoo13, kidandcat, marcsantiago, moorereason, mschmidt-onecause, nmlgc, pikanezi, samv, sn00011, stmichaelis, woody1193, zaneli

gocsv's Issues

Trim trailing spaces?

I'm dealing with fixed-width CSV data, and I'd like a simple way to trim trailing spaces. I know that leading-space trimming is handled by encoding/csv, but I was looking for a way to do it by providing a custom CSVReader that wraps LazyCSVReader, or something along those lines. I can't seem to get my head around how to do that; is it possible to provide implementations of the Decoder or CSVReader interfaces?
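
For what it's worth, here is a minimal sketch of one approach, assuming a gocsv version with the SetCSVReader hook shown in other issues on this page; the trimReader type is hypothetical:

import (
	"encoding/csv"
	"io"
	"strings"

	"github.com/gocarina/gocsv"
)

// trimReader wraps a csv.Reader and trims trailing spaces from every field,
// satisfying gocsv's CSVReader interface (Read / ReadAll).
type trimReader struct{ r *csv.Reader }

func (t trimReader) Read() ([]string, error) {
	record, err := t.r.Read()
	for i := range record {
		record[i] = strings.TrimRight(record[i], " ")
	}
	return record, err
}

func (t trimReader) ReadAll() ([][]string, error) {
	records, err := t.r.ReadAll()
	for _, record := range records {
		for i := range record {
			record[i] = strings.TrimRight(record[i], " ")
		}
	}
	return records, err
}

// Install it globally, the same way the delimiter examples on this page do.
gocsv.SetCSVReader(func(in io.Reader) gocsv.CSVReader {
	return trimReader{r: csv.NewReader(in)}
})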

How about custom delimiter instead of commas

What I am asking may be somewhat trivial for the CSV format.
Because my customer supplies tab-delimited input, I am currently modifying gocsv to support a tab delimiter instead of the comma.

So, why not support a custom delimiter, like the standard library's encoding/csv?

type Reader struct {
        Comma rune // field delimiter (set to ',' by NewReader)
        ...
}
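
For reference, a minimal sketch of how this can be done without modifying gocsv, assuming a version that provides the SetCSVReader hook (the same hook appears in later issues on this page):

import (
	"encoding/csv"
	"io"

	"github.com/gocarina/gocsv"
)

// Replace the default reader with one whose field delimiter is a tab.
gocsv.SetCSVReader(func(in io.Reader) gocsv.CSVReader {
	r := csv.NewReader(in)
	r.Comma = '\t'
	return r
})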

[feature request] Handle duplicate headers

We have a situation where we have a CSV file with duplicate headers, as in the following example:

client_id,client_name,client_age,class,class,class
1,Jose,42,maths,chem,
2,Daniel,26,chem,,
3,Vincent,32,maths,physics,chem

Race condition in UnmarshalToCallback()

Hi,

I am facing a race condition when using UnmarshalToCallback in my code. Running

go test -v -race

detected a race condition when the same code was accessed after the goroutines were spawned.

See the attached log for more details:
RaceCondition.txt

Content containing commas

Hi, this is not really an issue with your software, but with using it against a dumb data source.

There are four columns in the CSV; one of them contains ordinary text data that may itself contain commas. Horrible. There is a phone number, a date, sent/received, Subject, Content.

An example row is like this
,+447755505585,08-22-2013 08:43:12,Send,,Some text, which may contain commas

Does your library provide any way that I can handle this case?
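
For what it's worth, encoding/csv (which gocsv uses underneath) already handles embedded commas when the field is quoted. A minimal check, assuming the producer could quote the free-text column:

package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	row := `,+447755505585,08-22-2013 08:43:12,Send,,"Some text, which may contain commas"`
	r := csv.NewReader(strings.NewReader(row))
	record, err := r.Read()
	if err != nil {
		panic(err)
	}
	fmt.Println(record[5]) // Some text, which may contain commas
}

If the source cannot quote the field, the row is genuinely ambiguous, and no CSV parser can split it reliably.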

Many thanks

setting gocsv.TagSeparator causes marshalling to use that rune as a separator for the CSV data

I have some csv field names which contain commas and I can't control that.
To deal with this, before unmarshalling the csv I will do:

gocsv.TagSeparator = "#"

But for some reason that causes the marshaller to use this # as the field separator in the CSV. AFAIK these are totally separate concepts: TagSeparator is regarding struct tags and the field separator is regarding the format of the marshalled/unmarshalled CSV.

So I expected to be able to parse a CSV using this tag, which works:

DateOfCreation *CivilDate `bigquery:"DateOfCreation,nullable" csv:"Date, of creation#omitempty"`

But then when I marshal the struct back to a CSV, I get

field1#field2#field3

And what I want is

field1,field2,field3
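
A hedged workaround sketch: keep TagSeparator = "#" for the struct tags, but pin the output field separator back to a comma through the SetCSVWriter hook shown elsewhere on this page (assuming the writer's Comma is what ends up being used):

gocsv.TagSeparator = "#"

gocsv.SetCSVWriter(func(out io.Writer) *gocsv.SafeCSVWriter {
	w := csv.NewWriter(out)
	w.Comma = ',' // explicit, independent of TagSeparator
	return gocsv.NewSafeCSVWriter(w)
})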

UnmarshalText return

I noticed a missing return (see between the ## markers in the code below) within the unmarshall function in types.go.
When UnmarshalText is used rather than UnmarshalCSV, the function falls through to the error on the last line instead of returning nil (even though UnmarshalText succeeded).

unMarshallIt := func(finalField reflect.Value) error {
        if finalField.CanInterface() && finalField.Type().Implements(unMarshallerType) {
            if err := finalField.Interface().(TypeUnmarshaller).UnmarshalCSV(value); err != nil {
                return err
            }
            return nil
        } else if finalField.CanInterface() && finalField.Type().Implements(textUnMarshalerType) { // Otherwise try to use TextMarshaller
            if err := finalField.Interface().(encoding.TextUnmarshaler).UnmarshalText([]byte(value)); err != nil {
                return err
            }
            ## return nil ##
        }
        return fmt.Errorf("No known conversion from string to " + field.Type().String() + ", " + field.Type().String() + " does not implements TypeUnmarshaller")
    }

Allow optional default value

Would it be reasonable to add functionality that would allow you to specify a default value for a given field? For example:

type User struct {
    Email  string `csv:"email"`
    Name   string `csv:"name"`
    Status string `csv:"active,default=inactive"`
}
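
Until something like default= exists, one workaround sketch is a custom type implementing the UnmarshalCSV method quoted in other issues on this page; the Status type and its default value are illustrative:

// Status substitutes a default when the csv field is empty.
type Status string

func (s *Status) UnmarshalCSV(value string) error {
	if value == "" {
		*s = "inactive" // the default
		return nil
	}
	*s = Status(value)
	return nil
}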

Unable to serialise []string

Error generated:

native: err: No known conversion from []string to string, []string does not implements TypeMarshaller nor Stringer
ref: err: No known conversion from []string to string, []string does not implements TypeMarshaller nor Stringer
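
A hedged workaround: wrap the slice in a named type implementing the TypeMarshaller named in the error, assuming its method is MarshalCSV() (string, error); the ';' join separator is an arbitrary choice for this sketch:

import "strings"

// StringList flattens its elements into a single CSV field.
type StringList []string

func (l StringList) MarshalCSV() (string, error) {
	return strings.Join(l, ";"), nil
}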

Accept custom csv reader for Unmarshal function?

How would you feel about a PR that would change this function to take in a csv Reader interface instead of the type itself?

This interface would be defined in the package and look something like:

type CSVReader interface {
    Read() ([]string, error)
    ReadAll() ([][]string, error) // maybe even without this one, because I don't think it's used
}

How to deal with an embedded struct?

I want to unmarshal CSV into a struct with an embedded struct.
It does not seem to work.

type Identity struct {
    Supplier string `csv:"supplier"`
    Id       string `csv:"id"`
}
type Product struct {
    Identity
    Price int `csv:"price"`
}

Do you have any pointers to make it work?
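
A minimal end-to-end check using the Identity and Product types above, assuming a gocsv version that flattens embedded (non-pointer) structs into their tagged columns:

import (
	"fmt"
	"log"

	"github.com/gocarina/gocsv"
)

func main() {
	data := "supplier,id,price\nacme,42,10\n"
	var products []*Product
	if err := gocsv.UnmarshalString(data, &products); err != nil {
		log.Fatal(err)
	}
	// Expected if embedding is supported: &{{acme 42} 10}
	fmt.Println(products[0])
}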

Backward compatibility: add a v2 branch mirroring the last stable /v2 lib version.

Right now it is impossible to use github.com/gocarina/gocsv/v2 from a go.mod-enabled library that may itself be used within a non-go.mod project, a dep-enabled project for example.

For example, using github.com/gocarina/gocsv/v2 in github.com/gramework/gramework breaks any dep-enabled project into which gramework is imported.

Solving failure: No versions of github.com/gocarina/gocsv met constraints:
        master: Could not introduce github.com/gocarina/gocsv@master, as its subpackage github.com/gocarina/gocsv/v2 is missing. (Package is required by github.com/gramework/[email protected].)
        v1: Could not introduce github.com/gocarina/gocsv@v1, as its subpackage github.com/gocarina/gocsv/v2 is missing. (Package is required by github.com/gramework/[email protected].)

The Go Modules wiki describes the backward-compatible solution: https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher

Major branch: Update the go.mod file to include a /v3 at the end of the module path in the module directive (e.g., module github.com/my/module/v3). Update import statements within the module to also use /v3 (e.g., import "github.com/my/module/v3/foo"). Tag the release with v3.0.0.

Go versions 1.9.7+, 1.10.3+, and 1.11 are able to properly consume and build a v2+ module created using this approach without requiring updates to consumer code that has not yet opted in to modules (as described in the "Semantic Import Versioning" section above).
A community tool github.com/marwan-at-work/mod helps automate this procedure. See the repository or the community tooling FAQ below for an overview.
To avoid confusion with this approach, consider putting the v3.* commits for the module on a separate v3 branch.
If instead you have been previously releasing on master and would prefer to tag v3.0.0 on master, that is a viable option, but consider creating a v1 branch for any future v1 bug fixes.
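
As a minimal illustration of the wiki's major-branch approach applied to this repository (the /v2 path comes from the issue; the go directive is illustrative), the go.mod on the v2 branch would read:

module github.com/gocarina/gocsv/v2

go 1.12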

[feature request] Export getCSVRows and readTo

I have a use case where I already have the csv rows in memory as [][]string. So I no longer need to parse strings.

And primarily I would like to be able to implement a custom decoder and leverage the readTo function to populate fields into my structs.

csv tags in structs cannot be a variable

var operation = "operation_date"

type LatamHeaders struct {
    time_zone      string
    Operation_date string `csv:operation`
}

I want to do something like this, where the csv tag value comes from a variable, but it does not work at all.

How to use two different comma types?

Hello.
I am trying to parse, in one program, two csv files that use different separators: "," and "\t".
I injected the code below to change the default separator.

        gocsv.SetCSVReader(func(in io.Reader) gocsv.CSVReader {
                r := csv.NewReader(in)
                r.Comma = '\t'
                return r
        })

It seems to work, but only partially.
The tab-separated csv parses well, but the comma-separated one no longer does.
My guess is that the same reader is used for the whole program.

How can I use different reader on each csv file?
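
One hedged way around the global hook is to build a per-file csv.Reader, assuming a gocsv version that exposes UnmarshalCSV taking a gocsv.CSVReader directly:

tabReader := csv.NewReader(tabFile)
tabReader.Comma = '\t'
if err := gocsv.UnmarshalCSV(tabReader, &tabRecords); err != nil {
	// handle error
}

commaReader := csv.NewReader(commaFile) // encoding/csv defaults Comma to ','
if err := gocsv.UnmarshalCSV(commaReader, &commaRecords); err != nil {
	// handle error
}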

optional header

It would be good to make the writing of the field header line optional

UTF8 Byte Order Marker causes me to lose first field

I had a bizarre issue where I was always losing the first field in my unmarshaled CSV files. I dug in and figured out that it was because the first header string, when printed as a byte array, had the UTF-8 BOM as part of it. I was able to work around it by tagging my first fields like this:

Code         string `csv:"\xEF\xBB\xBFCODE,CODE"`
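
An alternative to the tag workaround, assuming you control the input reader: strip the three BOM bytes before gocsv sees them. A minimal sketch:

import (
	"bufio"
	"bytes"
)

br := bufio.NewReader(file)
if head, err := br.Peek(3); err == nil && bytes.Equal(head, []byte{0xEF, 0xBB, 0xBF}) {
	br.Discard(3) // drop the UTF-8 BOM
}
err := gocsv.Unmarshal(br, &records)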

Customizable CSV Writer example doesn't compile

Here's the sample code:

gocsv.SetCSVWriter(func(out io.Writer) *SafeCSVWriter {
    return csv.NewWriter(out)
})

csv.NewWriter returns a *csv.Writer, not a *SafeCSVWriter. Perhaps it's supposed to be:

gocsv.SetCSVWriter(func(out io.Writer) *SafeCSVWriter {
	writer := csv.NewWriter(out)
	writer.Comma = '|'
	return NewSafeCSVWriter(writer)
})

?

No suitable API for millions of records.

Sometimes we need to generate millions of records for testing.
func Marshal(in interface{}, out io.Writer) (err error) needs all the records generated up front, which spends lots of memory.

Can we support an API like func Marshal(c <-chan interface{}, out io.Writer) (err error)?

Delimiters other than comma

Pipe-delimited files are extremely common, yet there doesn't appear to be a way to override the delimiter; that would be a necessary feature for us to be able to use this package.

Consider http://labix.org/gopkg.in usage

Hello!

We would like you to add a tag (e.g. v1.0) for the current master head, so we would be able to use gopkg.in and lock our code to a certain version/commit of gocsv.

Thanks in advance!

Cannot ignore structs

#94 created a bug where you can no longer ignore structs when marshalling

type Quote struct {
	Symbol 	string
	Sector 	sql.NullString 	 `csv:"-"` // should be ignored
}

from sql package:

type NullString struct {
        String string
        Valid  bool // Valid is true if String is not NULL
}

Will always give you:

AAPL,-SEC,true
GOOGL,,false

column 0: wrong number of fields in line

I am using the gocsv package to read a csv file. It works fine with comma-separated files, but it throws column 0: wrong number of fields in line for a '|'-separated file.

[feature request]Handle unmarshaling without CSV headers

I need to unmarshal some CSVs without headers. As far as I understand, there is no way to do this with this library currently.

It seems we need a new function like UnmarshalWithoutHeaders, just like the MarshalWithoutHeaders that was created in issue #3. I would think that the order of fields in the struct you are unmarshalling to could be used to map columns, or we could use an index struct tag similar to this library: https://github.com/yunabe/easycsv.

I'm a Go noob so please tell me if I have things very wrong. If this feature does make sense, I'd be happy to try taking a stab at this, but I think I might need some guidance!

Let unmarshalling succeed if there are errors in just part of the lines

Currently, if one column in one line cannot be deserialized (e.g. failed conversion), the whole operation fails. It would be useful to have an option to continue the deserialization with the remaining lines. Collected errors could then be returned and the application could check if the whole operation failed.

Cancelling unmarshalling

My use case is reading an extremely large CSV coming from s3, processing the rows, and then outputting the results to a new CSV.

My problem is gracefully exiting early. I don't see any way of stopping the CSV unmarshalling.
I was hoping that closing the input reader would work, but it does not.

Any suggestions?
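
One hedged pattern for exiting early: wrap the input in a reader that fails once a context is cancelled, so the unmarshalling stops with an error instead of draining the whole file. A sketch:

import (
	"context"
	"io"
)

// ctxReader stops yielding data once its context is cancelled.
type ctxReader struct {
	ctx context.Context
	r   io.Reader
}

func (c ctxReader) Read(p []byte) (int, error) {
	if err := c.ctx.Err(); err != nil {
		return 0, err // surfaces as an unmarshalling error
	}
	return c.r.Read(p)
}

// Usage sketch: gocsv.UnmarshalToChan(ctxReader{ctx, s3Body}, ch)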

[feature request] Access rows one by one

Hello!

I am piping csv data into my program, and I would like to be able to access the records one by one.
I know that the first row will contain headers (or I can supply a string containing the headers), so maybe something like this might work:

scanner := gocsv.NewScanner(os.Stdin)
for scanner.Scan() {
    err := scanner.UnmarshalString(scanner.Text(), &record)
    if err != nil {
        // process error
        continue
    }
    fmt.Println(record)
}


EDIT: After reading the source files, I made this, but I am still not sure that this is the correct (or only) way to access records one by one:

	c := make(chan logRecord)
	go processRecords(c)

	for {
		err = gocsv.UnmarshalToChan(os.Stdin, c)
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Println(err)
		}
	}

Thank you!
Andrei

Streaming CSV

Is there any way to read the headers and then start streaming, reading one record at a time?

I'd like to do that on the write side as well; the idea is to be able to parse a file and filter it with a small memory footprint, since only a few rows would be kept around at any given time.

Is this possible?

make column optional

  • We have a situation where a column is present in some csv files and absent in others. UnmarshalCSV() gives an error if we try to parse a file that is missing this column.
  • It errors out with the following message: wrong number of fields in line.
  • Is there any way we can make this column optional? If the column is not present, the default value for this column should be used.

Proposal: Unmarshalling into pointer fields

Currently, unmarshalling pointers to built-in types (like *string) doesn't work without implementing an interface. This feature should support that natively.

Rationale

Unmarshalling into pointer types allows us to make fields optional; otherwise, unmarshalling a partial record would set zero values for the missing fields in the resulting struct, and you wouldn't know if it was because the field was blank or absent.

I have a scenario where I need to accept a subset of supported fields and update a database. I want to make most of the fields optional and only update database fields that are present in a given CSV file.

Example Struct

Given the following User struct:

type User struct {
    ID         int     `csv:"user_id"`
    Username   string  `csv:"username"`
    LocationID int     `csv:"location_id"`
    ExternalID *int    `csv:"external_id"` // optional
    Nickname   *string `csv:"nickname"`    // optional
}

Example Input

user_id,username,external_id
1,userA,1000
2,,

Notice that the location_id and nickname fields are absent.

Example Result

Any fields present in the CSV with empty values would receive a Go zero-value in the resulting struct. Any missing non-pointer fields would also receive a Go zero-value. Any missing pointer fields would retain a nil value.

[]User{
    {
        ID: 1,
        Username: "userA",
        LocationID: 0,
        ExternalID: (*int)1000,
        Nickname: nil,
    },
    {
        ID: 2,
        Username: "",
        LocationID: 0,
        ExternalID: (*int)0,
        Nickname: nil,
    },
}

How to deal with pointers?

I have structures with the format:

type Customer struct {
    Email      *string    `json:"email,omitempty" csv:"email"`
    FirstName  *string    `json:"firstName,omitempty" csv:"firstName"`
    LastName   *string    `json:"lastName,omitempty" csv:"lastName"`
}

The values are pointers so that they can be nil. This works fine with Go's json marshaller, but gocsv returns the error:

No known conversion from *string to string, *string does not implements TypeMarshaller nor Stringer

Read empty file ⇒ panic

I just tried to deserialize an empty CSV file, and 💥!

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/gocarina/gocsv.readTo(0x7f35f4f833c8, 0xc82000a7b0, 0x5497a0, 0xc82000e820, 0x0, 0x0)
        /home/liam/go/src/github.com/gocarina/gocsv/decode.go:73 +0x984
github.com/gocarina/gocsv.Unmarshal(0x7f35f4f833a0, 0xc82002c028, 0x5497a0, 0xc82000e820, 0x0, 0x0)
        /home/liam/go/src/github.com/gocarina/gocsv/csv.go:121 +0xbb
github.com/gocarina/gocsv.UnmarshalFile(0xc82002c028, 0x5497a0, 0xc82000e820, 0x0, 0x0)
        /home/liam/go/src/github.com/gocarina/gocsv/csv.go:106 +0x6e

I think this should return an error rather than causing a panic.

        headers := csvRows[0]
        body := csvRows[1:]

should probably be preceded by:

if len(csvRows) == 0 {
  return errors.New("header row not found")
}

I"ll open a PR soon.

Embedded struct pointers do not marshal

Given the types from sample_structs_test,

type Sample struct {
	Foo  string  `csv:"foo"`
	Bar  int     `csv:"BAR"`
	Baz  string  `csv:"Baz"`
	Frop float64 `csv:"Quux"`
	Blah *int    `csv:"Blah"`
	SPtr *string `csv:"SPtr"`
	Omit *string `csv:"Omit,omitempty"`
}

type EmbedSample struct {
	Qux string `csv:"first"`
	Sample
	Ignore string  `csv:"-"`
	Grault float64 `csv:"garply"`
	Quux   string  `csv:"last"`
}

we know that this will work correctly:

sampleA := Sample{"hellofoo", 2, "hellobaz", 52.0, nil, nil, nil}
a := EmbedSample{"hello", sampleA, "helloignore", 42.0, "helloquux"}
sampleObjects := []EmbedSample{a}
resultNonPtr, err := gocsv.MarshalBytes(sampleObjects)

However, if you change the EmbedSample Sample field to *Sample (perfectly valid Go), then:

sampleA := Sample{"hellofoo", 2, "hellobaz", 52.0, nil, nil, nil}
a := EmbedSample{"hello", &sampleA, "helloignore", 42.0, "helloquux"}
sampleObjects := []EmbedSample{a}
resultWithPtr, err := gocsv.MarshalBytes(sampleObjects)

resultWithPtr is missing all of the Sample struct's fields and differs unexpectedly from resultNonPtr.

Provide a simple synchronous/blocking for-each function to iterate the reader

UnmarshalToCallback first creates a channel, starts a goroutine to fill it, then loops reading from the channel and calls the callback to handle each object.
I think it should be the user's choice whether to handle the process synchronously or asynchronously.

So please give us a function like filepath.Walk, so we can read the csv as a stream, with a callback that can return an error to stop the iteration.

Working with map

Would it be possible to create a map from two fields? For example:

client_id,client_name
4,Jose
2,Daniel
5,Vincent

I'm looking to create a map[client_id]client_name

Of course, after reading it into a struct, I could create a map myself. But it would be easier and simpler if gocsv could also do this automatically. Any thoughts?
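
For reference, the two-step version is short; a sketch assuming a hypothetical Client struct tagged with the headers above:

type Client struct {
	ID   int    `csv:"client_id"`
	Name string `csv:"client_name"`
}

var clients []Client
if err := gocsv.UnmarshalString(data, &clients); err != nil { // data holds the csv shown above
	log.Fatal(err)
}

byID := make(map[int]string, len(clients))
for _, c := range clients {
	byID[c.ID] = c.Name // map[client_id]client_name
}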

structMap is useless

in reflect.go:

func getStructInfo(rType reflect.Type) *structInfo {
	structMapMutex.RLock()
	stInfo, ok := structMap[rType]
	structMapMutex.RUnlock()
...

I think you should take the write lock and assign structMap[rType].
As it stands, structMap is useless in the current version.
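
A sketch of the suggested fix, with buildStructInfo standing in as a hypothetical helper for the field-scanning code the snippet elides:

func getStructInfo(rType reflect.Type) *structInfo {
	structMapMutex.RLock()
	stInfo, ok := structMap[rType]
	structMapMutex.RUnlock()
	if ok {
		return stInfo
	}

	stInfo = buildStructInfo(rType) // hypothetical helper for the elided code

	structMapMutex.Lock()
	structMap[rType] = stInfo // the assignment the current code is missing
	structMapMutex.Unlock()
	return stInfo
}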

interface is not handled correctly

Consider the following example:

package main

import (
	"github.com/gocarina/gocsv"
	"fmt"
	"encoding/json"
)

type Foo struct {
	Id    int			`json:"id" csv:"id"`
	Value interface{}	`json:"value" csv:"value"`
}

func main() {
	foo := Foo{Id: 1, Value:"xyz"}

	out, err := gocsv.MarshalString([]Foo{foo})
	if err != nil {
		panic(err)
	}

	fmt.Println(out)

	bytes, err := json.Marshal(foo)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(bytes))
}

Output:

id,value
1,

Expected:

id,value
1,xyz

By contrast, the json tags produce the expected behaviour:

{"id":1,"value":"xyz"}

handle null sql interfaces

I'm trying to use MarshalString with a struct slice made up of sql.Null* types, but it outputs a csv with all the fields of the sql.Null* struct (so for each value it emits both the column value and the Valid flag). How would I go about handling that?
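
A hedged workaround: embed sql.NullString in a local type that implements the MarshalCSV method named in other error messages on this page, so only the value (or an empty field for NULL) is written:

import "database/sql"

type NullString struct{ sql.NullString }

func (n NullString) MarshalCSV() (string, error) {
	if n.Valid {
		return n.String, nil
	}
	return "", nil // NULL becomes an empty CSV field
}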

First column cannot be populated

csv data:

number,column_a,column_b
1,a,b
2,c,d
3,e,f

my struct:

type Whatever struct {
    JustNumber int    `csv:"number"`
    ColumnA    string `csv:"column_a"`
    ColumnB    string `csv:"column_b"`
}

the result will be:

{0   a   b}
{0   c   d}
{0   e   f}

but if csv data like this:

,number,column_a,column_b
,1,a,b
,2,c,d
,3,e,f

the result will be:

{1   a   b}
{2   c   d}
{3   e   f}

Why can't the first column be populated into the struct? (This looks like the UTF-8 BOM problem described in an earlier issue on this page: a BOM glued to the first header keeps it from matching the csv tag.)
