
gocsv's People

Contributors

acls, ahmedalhulaibi, andrei-m, arcanechimp, bcbolt, benjamintrapani, bhainesva, chrisbroome, cskr, ferhatelmas, fwwieffering, goryudyuma, haton14, hchang-clypd, hesidoryn, iain17, jonathanpicques, jozseftiborcz, kangoo13, kidandcat, marcsantiago, moorereason, mschmidt-onecause, nmlgc, pikanezi, samv, sn00011, stmichaelis, woody1193, zaneli

gocsv's Issues

Trim trailing spaces?

I'm dealing with fixed-width CSV data, and I'd like a simple way to trim trailing spaces. I know that leading-space trimming is handled by encoding/csv, but I was looking for a way to do it by providing a custom CSVReader that wraps LazyCSVReader, or something along those lines. I can't seem to get my head around how to do that; is it possible to provide implementations of the Decoder or CSVReader interfaces?
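
For what it's worth, here is a minimal sketch of one approach, assuming a gocsv version with the SetCSVReader hook shown in other issues on this page; the trimReader type is hypothetical:

import (
	"encoding/csv"
	"io"
	"strings"

	"github.com/gocarina/gocsv"
)

// trimReader wraps a csv.Reader and trims trailing spaces from every field,
// satisfying gocsv's CSVReader interface (Read / ReadAll).
type trimReader struct{ r *csv.Reader }

func (t trimReader) Read() ([]string, error) {
	record, err := t.r.Read()
	for i := range record {
		record[i] = strings.TrimRight(record[i], " ")
	}
	return record, err
}

func (t trimReader) ReadAll() ([][]string, error) {
	records, err := t.r.ReadAll()
	for _, record := range records {
		for i := range record {
			record[i] = strings.TrimRight(record[i], " ")
		}
	}
	return records, err
}

// Install it globally, the same way the delimiter examples on this page do.
gocsv.SetCSVReader(func(in io.Reader) gocsv.CSVReader {
	return trimReader{r: csv.NewReader(in)}
})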

How about custom delimiter instead of commas

What I am asking may be somewhat trivial for the CSV format.
Because my customer supplies tab-delimited input, I am currently modifying gocsv to support a tab delimiter instead of the comma.

So, why not support a custom delimiter, like the standard library's encoding/csv?

type Reader struct {
        Comma rune // field delimiter (set to ',' by NewReader)
        ...
}
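
For reference, a minimal sketch of how this can be done without modifying gocsv, assuming a version that provides the SetCSVReader hook (the same hook appears in later issues on this page):

import (
	"encoding/csv"
	"io"

	"github.com/gocarina/gocsv"
)

// Replace the default reader with one whose field delimiter is a tab.
gocsv.SetCSVReader(func(in io.Reader) gocsv.CSVReader {
	r := csv.NewReader(in)
	r.Comma = '\t'
	return r
})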

[feature request] Handle duplicate headers

We have a situation where we have a CSV file with duplicate headers, as in the following example:

client_id,client_name,client_age,class,class,class
1,Jose,42,maths,chem,
2,Daniel,26,chem,,
3,Vincent,32,maths,physics,chem

Race condition in UnmarshalToCallback()

Hi,

I am facing a race condition when using UnmarshalToCallback in my code. Running

go test -v -race

detected a race condition when the same code was accessed after the goroutines were spawned.

See the attached log for more details:
RaceCondition.txt

Content containing commas

Hi, this is not really an issue with your software, but with using it against a dumb data source.

There are four columns in the CSV; one of them contains ordinary text data that may itself contain commas. Horrible. There is a phone number, a date, sent/received, Subject, Content.

An example row is like this
,+447755505585,08-22-2013 08:43:12,Send,,Some text, which may contain commas

Does your library provide any way that I can handle this case?
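
For what it's worth, encoding/csv (which gocsv uses underneath) already handles embedded commas when the field is quoted. A minimal check, assuming the producer could quote the free-text column:

package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	row := `,+447755505585,08-22-2013 08:43:12,Send,,"Some text, which may contain commas"`
	r := csv.NewReader(strings.NewReader(row))
	record, err := r.Read()
	if err != nil {
		panic(err)
	}
	fmt.Println(record[5]) // Some text, which may contain commas
}

If the source cannot quote the field, the row is genuinely ambiguous, and no CSV parser can split it reliably.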

Many thanks

setting gocsv.TagSeparator causes marshalling to use that rune as a separator for the CSV data

I have some csv field names which contain commas and I can't control that.
To deal with this, before unmarshalling the csv I will do:

gocsv.TagSeparator = "#"

But for some reason that causes the marshaller to use this # as the field separator in the CSV. AFAIK these are totally separate concepts: TagSeparator is regarding struct tags and the field separator is regarding the format of the marshalled/unmarshalled CSV.

So I expected to be able to parse a CSV using this tag, which works:

DateOfCreation *CivilDate `bigquery:"DateOfCreation,nullable" csv:"Date, of creation#omitempty"`

But then when I marshal the struct back to a CSV, I get

field1#field2#field3

And what I want is

field1,field2,field3
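
A hedged workaround sketch: keep TagSeparator = "#" for the struct tags, but pin the output field separator back to a comma through the SetCSVWriter hook shown elsewhere on this page (assuming the writer's Comma is what ends up being used):

gocsv.TagSeparator = "#"

gocsv.SetCSVWriter(func(out io.Writer) *gocsv.SafeCSVWriter {
	w := csv.NewWriter(out)
	w.Comma = ',' // explicit, independent of TagSeparator
	return gocsv.NewSafeCSVWriter(w)
})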

UnmarshalText return

I noticed a missing return (see between the ## markers in the code below) within the unmarshall function in types.go.
When UnmarshalText is used rather than UnmarshalCSV, the function falls through to the error on the last line instead of returning nil (even though UnmarshalText succeeded).

unMarshallIt := func(finalField reflect.Value) error {
        if finalField.CanInterface() && finalField.Type().Implements(unMarshallerType) {
            if err := finalField.Interface().(TypeUnmarshaller).UnmarshalCSV(value); err != nil {
                return err
            }
            return nil
        } else if finalField.CanInterface() && finalField.Type().Implements(textUnMarshalerType) { // Otherwise try to use TextMarshaller
            if err := finalField.Interface().(encoding.TextUnmarshaler).UnmarshalText([]byte(value)); err != nil {
                return err
            }
            ## return nil ##
        }
        return fmt.Errorf("No known conversion from string to " + field.Type().String() + ", " + field.Type().String() + " does not implements TypeUnmarshaller")
    }

Allow optional default value

Would it be reasonable to add functionality that would allow you to specify a default value for a given field? For example:

type User struct {
    Email  string `csv:"email"`
    Name   string `csv:"name"`
    Status string `csv:"active,default=inactive"`
}
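
Until something like default= exists, one workaround sketch is a custom type implementing the UnmarshalCSV method quoted in other issues on this page; the Status type and its default value are illustrative:

// Status substitutes a default when the csv field is empty.
type Status string

func (s *Status) UnmarshalCSV(value string) error {
	if value == "" {
		*s = "inactive" // the default
		return nil
	}
	*s = Status(value)
	return nil
}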

Unable to serialise []string

Error generated:

native: err: No known conversion from []string to string, []string does not implements TypeMarshaller nor Stringer
ref: err: No known conversion from []string to string, []string does not implements TypeMarshaller nor Stringer
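
A hedged workaround: wrap the slice in a named type implementing the TypeMarshaller named in the error, assuming its method is MarshalCSV() (string, error); the ';' join separator is an arbitrary choice for this sketch:

import "strings"

// StringList flattens its elements into a single CSV field.
type StringList []string

func (l StringList) MarshalCSV() (string, error) {
	return strings.Join(l, ";"), nil
}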

Accept custom csv reader for Unmarshal function?

How would you feel about a PR that would change this function to take in a csv Reader interface instead of the type itself?

This interface would be defined in the package and look something like:

type CSVReader interface {
    Read() ([]string, error)
    ReadAll() ([][]string, error) // maybe even without this one, because I don't think it's used
}

How to deal with an embedded struct?

I want to unmarshal CSV into a struct with an embedded struct.
It does not seem to work.

type Identity struct {
    Supplier string `csv:"supplier"`
    Id       string `csv:"id"`
}
type Product struct {
    Identity
    Price int `csv:"price"`
}

Do you have any pointers to make it work?
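
A minimal end-to-end check using the Identity and Product types above, assuming a gocsv version that flattens embedded (non-pointer) structs into their tagged columns:

import (
	"fmt"
	"log"

	"github.com/gocarina/gocsv"
)

func main() {
	data := "supplier,id,price\nacme,42,10\n"
	var products []*Product
	if err := gocsv.UnmarshalString(data, &products); err != nil {
		log.Fatal(err)
	}
	// Expected if embedding is supported: &{{acme 42} 10}
	fmt.Println(products[0])
}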

Backward compatibility: add a v2 branch mirroring the last stable /v2 lib version.

Right now it is impossible to use github.com/gocarina/gocsv/v2 from a go.mod-enabled library that may itself be used within a non-go.mod project, a dep-enabled project for example.

For example, using github.com/gocarina/gocsv/v2 in github.com/gramework/gramework breaks any dep-enabled project into which gramework is imported.

Solving failure: No versions of github.com/gocarina/gocsv met constraints:
        master: Could not introduce github.com/gocarina/gocsv@master, as its subpackage github.com/gocarina/gocsv/v2 is missing. (Package is required by github.com/gramework/[email protected].)
        v1: Could not introduce github.com/gocarina/gocsv@v1, as its subpackage github.com/gocarina/gocsv/v2 is missing. (Package is required by github.com/gramework/[email protected].)

The Go Modules wiki describes the backward-compatible solution: https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher

Major branch: Update the go.mod file to include a /v3 at the end of the module path in the module directive (e.g., module github.com/my/module/v3). Update import statements within the module to also use /v3 (e.g., import "github.com/my/module/v3/foo"). Tag the release with v3.0.0.

Go versions 1.9.7+, 1.10.3+, and 1.11 are able to properly consume and build a v2+ module created using this approach without requiring updates to consumer code that has not yet opted in to modules (as described in the "Semantic Import Versioning" section above).
A community tool github.com/marwan-at-work/mod helps automate this procedure. See the repository or the community tooling FAQ below for an overview.
To avoid confusion with this approach, consider putting the v3.* commits for the module on a separate v3 branch.
If instead you have been previously releasing on master and would prefer to tag v3.0.0 on master, that is a viable option, but consider creating a v1 branch for any future v1 bug fixes.
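
As a minimal illustration of the wiki's major-branch approach applied to this repository (the /v2 path comes from the issue; the go directive is illustrative), the go.mod on the v2 branch would read:

module github.com/gocarina/gocsv/v2

go 1.12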

[feature request] Export getCSVRows and readTo

I have a use case where I already have the csv rows in memory as [][]string. So I no longer need to parse strings.

And primarily I would like to be able to implement a custom decoder and leverage the readTo function to populate fields into my structs.

csv tags in structs cannot be a variable

var operation = "operation_date"

type LatamHeaders struct {
    time_zone      string
    Operation_date string `csv:operation`
}

I want to do something like this, where the csv tag value comes from a variable, but it does not work at all.

How to use two different comma types?

Hello.
I am trying to parse, in one program, two csv files that use different separators: "," and "\t".
I injected the code below to change the default separator.

        gocsv.SetCSVReader(func(in io.Reader) gocsv.CSVReader {
                r := csv.NewReader(in)
                r.Comma = '\t'
                return r
        })

It seems to work, but only partially.
The tab-separated csv parses well, but the comma-separated one no longer does.
My guess is that the same reader is used for the whole program.

How can I use different reader on each csv file?
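
One hedged way around the global hook is to build a per-file csv.Reader, assuming a gocsv version that exposes UnmarshalCSV taking a gocsv.CSVReader directly:

tabReader := csv.NewReader(tabFile)
tabReader.Comma = '\t'
if err := gocsv.UnmarshalCSV(tabReader, &tabRecords); err != nil {
	// handle error
}

commaReader := csv.NewReader(commaFile) // encoding/csv defaults Comma to ','
if err := gocsv.UnmarshalCSV(commaReader, &commaRecords); err != nil {
	// handle error
}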

optional header

It would be good to make the writing of the field header line optional

UTF8 Byte Order Marker causes me to lose first field

I had a bizarre issue where I was always losing the first field in my unmarshaled CSV files. I dug in and figured out that it was because the first header string, when printed as a byte array, had the UTF-8 BOM as part of it. I was able to work around it by tagging my first fields like this:

Code         string `csv:"\xEF\xBB\xBFCODE,CODE"`
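
An alternative to the tag workaround, assuming you control the input reader: strip the three BOM bytes before gocsv sees them. A minimal sketch:

import (
	"bufio"
	"bytes"
)

br := bufio.NewReader(file)
if head, err := br.Peek(3); err == nil && bytes.Equal(head, []byte{0xEF, 0xBB, 0xBF}) {
	br.Discard(3) // drop the UTF-8 BOM
}
err := gocsv.Unmarshal(br, &records)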

Customizable CSV Writer example doesn't compile

Here's the sample code:

gocsv.SetCSVWriter(func(out io.Writer) *SafeCSVWriter {
    return csv.NewWriter(out)
})

csv.NewWriter returns a *csv.Writer, not a *SafeCSVWriter. Perhaps it's supposed to be:

gocsv.SetCSVWriter(func(out io.Writer) *SafeCSVWriter {
	writer := csv.NewWriter(out)
	writer.Comma = '|'
	return NewSafeCSVWriter(writer)
})

?

No suitable API for millions of records.

Sometimes we need to generate millions of records for testing.
func Marshal(in interface{}, out io.Writer) (err error) needs all the records generated up front, which spends lots of memory.

Can we support an API like func Marshal(c <-chan interface{}, out io.Writer) (err error)?

Delimiters other than comma

Pipe-delimited files are extremely common, yet there doesn't appear to be a way to override the delimiter; that would be a necessary feature for us to be able to use this package.

Consider http://labix.org/gopkg.in usage

Hello!

We would like you to add a tag (e.g. v1.0) for the current master head, so we would be able to use gopkg.in and lock our code to a certain version/commit of gocsv.

Thanks in advance!

Cannot ignore structs

#94 created a bug where you can no longer ignore structs when marshalling

type Quote struct {
	Symbol 	string
	Sector 	sql.NullString 	 `csv:"-"` // should be ignored
}

from sql package:

type NullString struct {
        String string
        Valid  bool // Valid is true if String is not NULL
}

Will always give you:

AAPL,-SEC,true
GOOGL,,false

column 0: wrong number of fields in line

I am using the gocsv package to read a csv file. It works fine with comma-separated files, but it throws column 0: wrong number of fields in line for a '|'-separated file.

[feature request]Handle unmarshaling without CSV headers

I need to unmarshal some CSVs without headers. As far as I understand, there is no way to do this with this library currently.

It seems we need a new function like UnmarshalWithoutHeaders, just like the MarshalWithoutHeaders that was created in issue #3. I would think that the order of fields in the struct you are unmarshalling to could be used to map columns, or we could use an index struct tag similar to this library: https://github.com/yunabe/easycsv.

I'm a Go noob so please tell me if I have things very wrong. If this feature does make sense, I'd be happy to try taking a stab at this, but I think I might need some guidance!

Let unmarshalling succeed if there are errors in just part of the lines

Currently, if one column in one line cannot be deserialized (e.g. failed conversion), the whole operation fails. It would be useful to have an option to continue the deserialization with the remaining lines. Collected errors could then be returned and the application could check if the whole operation failed.

Cancelling unmarshalling

My use case is reading an extremely large CSV coming from s3, processing the rows, and then outputting the results to a new CSV.

My problem is gracefully exiting early. I don't see any way of stopping the CSV unmarshalling.
I was hoping that closing the input reader would work, but it does not.

Any suggestions?
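
One hedged pattern for exiting early: wrap the input in a reader that fails once a context is cancelled, so the unmarshalling stops with an error instead of draining the whole file. A sketch:

import (
	"context"
	"io"
)

// ctxReader stops yielding data once its context is cancelled.
type ctxReader struct {
	ctx context.Context
	r   io.Reader
}

func (c ctxReader) Read(p []byte) (int, error) {
	if err := c.ctx.Err(); err != nil {
		return 0, err // surfaces as an unmarshalling error
	}
	return c.r.Read(p)
}

// Usage sketch: gocsv.UnmarshalToChan(ctxReader{ctx, s3Body}, ch)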

[feature request] Access rows one by one

Hello!

I am piping csv data into my program, and I would like to be able to access the records one by one.
I know that the first row will contain headers (or I can supply a string containing the headers), so maybe something like this might work:

scanner := gocsv.NewScanner(os.Stdin)
for scanner.Scan() {
    err := scanner.UnmarshalString(scanner.Text(), &record)
    if err != nil {
        // process error
        continue
    }
    fmt.Println(record)
}


EDIT: After reading the source files, I made this, but I am still not sure that this is the correct (or only) way to access records one by one:

	c := make(chan logRecord)
	go processRecords(c)

	for {
		err = gocsv.UnmarshalToChan(os.Stdin, c)
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Println(err)
		}
	}

Thank you!
Andrei

Streaming CSV

Is there any way to read the headers and then start streaming, reading one record at a time?

I'd like to do that on the write side as well; the idea is to be able to parse a file and filter it with a small memory footprint, since only a few rows would be kept around at any given time.

Is this possible?

make column optional

  • We have a situation where a column is present in some csv files and absent in others. UnmarshalCSV() gives an error if we try to parse a file that is missing this column.
  • It errors out with the following message: wrong number of fields in line.
  • Is there any way we can make this column optional? If the column is not present, the default value for this column should be used.

Proposal: Unmarshalling into pointer fields

Currently, unmarshalling pointers to built-in types (like *string) doesn't work without implementing an interface. This feature should support that natively.

Rationale

Unmarshalling into pointer types allows us to make fields optional; otherwise, unmarshalling a partial record would set zero values for the missing fields in the resulting struct, and you wouldn't know if it was because the field was blank or absent.

I have a scenario where I need to accept a subset of supported fields and update a database. I want to make most of the fields optional and only update database fields that are present in a given CSV file.

Example Struct

Given the following User struct:

type User struct {
    ID         int     `csv:"user_id"`
    Username   string  `csv:"username"`
    LocationID int     `csv:"location_id"`
    ExternalID *int    `csv:"external_id"` // optional
    Nickname   *string `csv:"nickname"`    // optional
}

Example Input

user_id,username,external_id
1,userA,1000
2,,

Notice that the location_id and nickname fields are absent.

Example Result

Any fields present in the CSV with empty values would receive a Go zero-value in the resulting struct. Any missing non-pointer fields would also receive a Go zero-value. Any missing pointer fields would retain a nil value.

[]User{
    {
        ID: 1,
        Username: "userA",
        LocationID: 0,
        ExternalID: (*int)1000,
        Nickname: nil,
    },
    {
        ID: 2,
        Username: "",
        LocationID: 0,
        ExternalID: (*int)0,
        Nickname: nil,
    },
}

How to deal with pointers?

I have structures with the format:

type Customer struct {
    Email      *string    `json:"email,omitempty" csv:"email"`
    FirstName  *string    `json:"firstName,omitempty" csv:"firstName"`
    LastName   *string    `json:"lastName,omitempty" csv:"lastName"`
}

The values are pointers so that they can be nil. This works fine with Go's json marshaller, but gocsv returns the error:

No known conversion from *string to string, *string does not implements TypeMarshaller nor Stringer

Read empty file ⇒ panic

I just tried to deserialize an empty CSV file, and 💥!

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/gocarina/gocsv.readTo(0x7f35f4f833c8, 0xc82000a7b0, 0x5497a0, 0xc82000e820, 0x0, 0x0)
        /home/liam/go/src/github.com/gocarina/gocsv/decode.go:73 +0x984
github.com/gocarina/gocsv.Unmarshal(0x7f35f4f833a0, 0xc82002c028, 0x5497a0, 0xc82000e820, 0x0, 0x0)
        /home/liam/go/src/github.com/gocarina/gocsv/csv.go:121 +0xbb
github.com/gocarina/gocsv.UnmarshalFile(0xc82002c028, 0x5497a0, 0xc82000e820, 0x0, 0x0)
        /home/liam/go/src/github.com/gocarina/gocsv/csv.go:106 +0x6e

I think this should return an error rather than causing a panic.

        headers := csvRows[0]
        body := csvRows[1:]

should probably be preceded by:

if len(csvRows) == 0 {
  return errors.New("header row not found")
}

I"ll open a PR soon.

Embedded struct pointers do not marshal

Given the types from sample_structs_test,

type Sample struct {
	Foo  string  `csv:"foo"`
	Bar  int     `csv:"BAR"`
	Baz  string  `csv:"Baz"`
	Frop float64 `csv:"Quux"`
	Blah *int    `csv:"Blah"`
	SPtr *string `csv:"SPtr"`
	Omit *string `csv:"Omit,omitempty"`
}

type EmbedSample struct {
	Qux string `csv:"first"`
	Sample
	Ignore string  `csv:"-"`
	Grault float64 `csv:"garply"`
	Quux   string  `csv:"last"`
}

we know that this will work correctly:

sampleA := Sample{"hellofoo", 2, "hellobaz", 52.0, nil, nil, nil}
a := EmbedSample{"hello", sampleA, "helloignore", 42.0, "helloquux"}
sampleObjects := []EmbedSample{a}
resultNonPtr, err := gocsv.MarshalBytes(sampleObjects)

However, if you change the EmbedSample Sample field to *Sample (perfectly valid Go), then:

sampleA := Sample{"hellofoo", 2, "hellobaz", 52.0, nil, nil, nil}
a := EmbedSample{"hello", &sampleA, "helloignore", 42.0, "helloquux"}
sampleObjects := []EmbedSample{a}
resultWithPtr, err := gocsv.MarshalBytes(sampleObjects)

resultWithPtr is missing all of the Sample struct's fields and differs unexpectedly from resultNonPtr.

Provide a simple synchronous/blocking for-each function to iterate the reader

UnmarshalToCallback first creates a channel, starts a goroutine to fill it, then loops reading from the channel and calls the callback to handle each object.
I think it should be the user's choice whether to handle the process synchronously or asynchronously.

So please give us a function like filepath.Walk, so we can read the csv as a stream, with a callback that can return an error to stop the iteration.

Working with map

Would it be possible to create a map from two fields? For example:

client_id,client_name
4,Jose
2,Daniel
5,Vincent

I'm looking to create a map[client_id]client_name

Of course, after reading it into a struct, I could create a map myself. But it would be easier and simpler if gocsv could also do this automatically. Any thoughts?
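
For reference, the two-step version is short; a sketch assuming a hypothetical Client struct tagged with the headers above:

type Client struct {
	ID   int    `csv:"client_id"`
	Name string `csv:"client_name"`
}

var clients []Client
if err := gocsv.UnmarshalString(data, &clients); err != nil { // data holds the csv shown above
	log.Fatal(err)
}

byID := make(map[int]string, len(clients))
for _, c := range clients {
	byID[c.ID] = c.Name // map[client_id]client_name
}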

structMap is useless

in reflect.go:

func getStructInfo(rType reflect.Type) *structInfo {
	structMapMutex.RLock()
	stInfo, ok := structMap[rType]
	structMapMutex.RUnlock()
...

I think you should take the write lock and assign structMap[rType].
As it stands, structMap is useless in the current version.
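
A sketch of the suggested fix, with buildStructInfo standing in as a hypothetical helper for the field-scanning code the snippet elides:

func getStructInfo(rType reflect.Type) *structInfo {
	structMapMutex.RLock()
	stInfo, ok := structMap[rType]
	structMapMutex.RUnlock()
	if ok {
		return stInfo
	}

	stInfo = buildStructInfo(rType) // hypothetical helper for the elided code

	structMapMutex.Lock()
	structMap[rType] = stInfo // the assignment the current code is missing
	structMapMutex.Unlock()
	return stInfo
}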

interface is not handled correctly

Consider the following example:

package main

import (
	"github.com/gocarina/gocsv"
	"fmt"
	"encoding/json"
)

type Foo struct {
	Id    int			`json:"id" csv:"id"`
	Value interface{}	`json:"value" csv:"value"`
}

func main() {
	foo := Foo{Id: 1, Value:"xyz"}

	out, err := gocsv.MarshalString([]Foo{foo})
	if err != nil {
		panic(err)
	}

	fmt.Println(out)

	bytes, err := json.Marshal(foo)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(bytes))
}

Output:

id,value
1,

Expected:

id,value
1,xyz

By contrast, the json tags produce the expected behaviour:

{"id":1,"value":"xyz"}

handle null sql interfaces

I'm trying to use MarshalString with a struct slice made up of sql.Null* types, but it outputs a csv with all the fields of the sql.Null* struct (so for each value it emits both the column value and the Valid flag). How would I go about handling that?
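
A hedged workaround: embed sql.NullString in a local type that implements the MarshalCSV method named in other error messages on this page, so only the value (or an empty field for NULL) is written:

import "database/sql"

type NullString struct{ sql.NullString }

func (n NullString) MarshalCSV() (string, error) {
	if n.Valid {
		return n.String, nil
	}
	return "", nil // NULL becomes an empty CSV field
}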

First column cannot be populated

csv data:

number,column_a,column_b
1,a,b
2,c,d
3,e,f

my struct:

type Whatever struct {
    JustNumber int    `csv:"number"`
    ColumnA    string `csv:"column_a"`
    ColumnB    string `csv:"column_b"`
}

the result will be:

{0   a   b}
{0   c   d}
{0   e   f}

but if csv data like this:

,number,column_a,column_b
,1,a,b
,2,c,d
,3,e,f

the result will be:

{1   a   b}
{2   c   d}
{3   e   f}

Why can't the first column be populated into the struct? (This looks like the UTF-8 BOM problem described in an earlier issue on this page: a BOM glued to the first header keeps it from matching the csv tag.)
