dehesa / CodableCSV
Read and write CSV files row-by-row or through Swift's Codable interface.
License: MIT License
Hi! I'm looking for the correct configuration to handle a CSV file that has an empty line after the header, like so:
"Item Code","ItemStatus"
"ABC","In Stock"
"DEF","Unavailable"
The callsite looks like this:
let decoder = CSVDecoder { config in
    config.headerStrategy = .firstLine
}
let items = try decoder.decode([Item].self, from: csvString)
// items == []
I've used various combinations of delimiters.row = "\n", escapingStrategy = "\n", and trimStrategry = .whitespaces, but the decoder either throws an error or returns an empty array. Is there a way to ignore empty lines?
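One possible workaround, independent of any CodableCSV configuration, is to drop blank lines from the string before handing it to the decoder. The following is a minimal sketch: the helper name `removingBlankLines(from:)` is hypothetical and not part of the library, and it assumes that no quoted field legitimately contains a blank line.

```swift
/// Returns `csv` without empty or whitespace-only lines.
/// Assumption: no quoted field spans multiple lines, otherwise
/// blank lines inside that field would be removed as well.
func removingBlankLines(from csv: String) -> String {
    csv.split(whereSeparator: \.isNewline)            // "\n", "\r\n" and "\r" all count as newlines
        .filter { !$0.allSatisfy(\.isWhitespace) }    // drop whitespace-only lines
        .joined(separator: "\n")                      // normalize row delimiter to "\n"
}
```

The cleaned string could then be passed to `decoder.decode([Item].self, from:)` as in the call site above.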
When using CodableCSV to load user-provided CSV files, one currently needs to ask the user which field delimiter is used in their file.
It would be nice if CodableCSV had an option to automatically infer the field delimiter from the provided file.
I saw that this feature is on the roadmap, along with row delimiter detection and header detection. There are also already some references to it in the code, with the idea to use auto-detection when the field delimiter is set to nil in the reader's configuration.
I'd be happy to contribute this feature. My idea was to port the dialect detection code from the CleverCSV Python library to Swift.
An alternative would be to use the library directly, however that would introduce a dependency to the project, and, more importantly, I'm not quite sure how good Swift's support is for calling Python code. I guess it wouldn't work on iOS, for example?
@dehesa what do you think?
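To make the idea concrete, here is a deliberately naive sketch of field delimiter inference. It is not the CleverCSV algorithm and not part of CodableCSV: it simply picks the candidate that occurs the same non-zero number of times on every sampled line. The function name, the candidate list, and the 20-line sample size are all assumptions, and quoted fields are ignored.

```swift
/// Guesses the field delimiter from the first lines of `sample`.
/// Heuristic (assumption): a real delimiter appears equally often on every line.
func guessFieldDelimiter(in sample: String,
                         candidates: [Character] = [",", ";", "\t", "|"]) -> Character? {
    let lines = sample.split(whereSeparator: \.isNewline).prefix(20)
    var best: (delimiter: Character, count: Int)? = nil
    for candidate in candidates {
        // Count occurrences of the candidate on each sampled line.
        let counts = lines.map { line -> Int in line.filter { $0 == candidate }.count }
        guard let first = counts.first, first > 0,
              counts.allSatisfy({ $0 == first }) else { continue }
        // Prefer the candidate that splits lines into the most fields.
        if let current = best, current.count >= first { continue }
        best = (delimiter: candidate, count: first)
    }
    return best?.delimiter
}
```

A production-quality detector would also need to account for quoting and escaping, which is where a port of an established dialect detector would help.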
After decoding the CSV file I noticed that many objects created by the decoder are still in memory. After a quick look, it looks like there is a retain cycle here that causes a memory leak:
DecodingRecordOrdered -> decoder -> chain -> state -> DecodingRecordOrdered
I think one of those references needs to be weak to break this cycle.
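The general fix can be illustrated with a small standalone sketch. The types below (`RecordNode`, `SharedState`) are hypothetical stand-ins, not the library's actual classes: one direction of the cycle is held weakly, so the objects are released once the last outside reference goes away.

```swift
/// Hypothetical stand-in for the object at the end of the chain.
final class SharedState {
    // Weak back-reference: this is what breaks the cycle.
    weak var record: RecordNode?
}

/// Hypothetical stand-in for the record that owns the chain.
final class RecordNode {
    let state: SharedState          // strong reference in one direction only
    init(state: SharedState) { self.state = state }
}

/// Returns true if the record is deallocated once it goes out of scope.
func recordIsReleasedAfterUse() -> Bool {
    weak var observed: RecordNode?
    do {
        let state = SharedState()
        let record = RecordNode(state: state)
        state.record = record       // would create a retain cycle if `record` were strong
        observed = record
    }
    return observed == nil
}
```

Which link in the real chain should become weak (or unowned) depends on ownership semantics that only the library's author can confirm.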
Using CSVDecoder.Lazy() on a CSV file with CRLF line endings, when the row delimiter is not configured to be CRLF (the default is only '\n'), decoding rows will silently fail.
Steps to reproduce the behavior:
Either throw an error when an invalid row delimiter is encountered, or successfully parse both \n and \r\n line endings without additional configuration.
This is the call stack from lazy decoding down to where the internal error is thrown. In between, the error is eaten by a try? and the Lazy iterator terminates without error, despite not processing the CSV.
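Until the library handles both line endings, a caller could sniff the row delimiter from the start of the file and set it explicitly. The sketch below is an assumption-laden helper (the name `detectRowDelimiter(in:)` is hypothetical): it looks at the first line feed and checks whether a carriage return precedes it, ignoring the possibility of newlines inside quoted fields.

```swift
/// Returns "\r\n" if the first line break in `sample` is CRLF, otherwise "\n".
/// Works on Unicode scalars because Swift treats "\r\n" as a single Character.
func detectRowDelimiter(in sample: String) -> String {
    var previousWasCarriageReturn = false
    for scalar in sample.unicodeScalars {
        if scalar == "\n" {
            return previousWasCarriageReturn ? "\r\n" : "\n"
        }
        previousWasCarriageReturn = (scalar == "\r")
    }
    return "\n"   // no line break found: fall back to the documented default
}
```

The result could then be assigned to `delimiters.row` in the decoder configuration before calling `lazy(from:)`.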
#0 0x0000000102e0ba48 in CSVReader._parseEscapedField(rowIndex:escaping:) at CodableCSV/sources/imperative/reader/Reader.swift:278
#1 0x0000000102e091a0 in CSVReader._parseLine(rowIndex:) at CodableCSV/sources/imperative/reader/Reader.swift:165
#2 0x0000000102e0a1f8 in CSVReader.readRow() at CodableCSV/sources/imperative/reader/Reader.swift:112
#3 0x0000000102ddb160 in ShadowDecoder.Source.isRowAtEnd(index:) at CodableCSV/sources/declarative/decodable/internal/Source.swift:109
#4 0x0000000102dc1678 in CSVDecoder.Lazy.next() at CodableCSV/sources/declarative/decodable/DecoderLazy.swift:48
I'm trying to access the columns property, but it doesn't seem to be available at all.
let result = try CSVReader(input: url) {
    $0.encoding = .utf8
    $0.delimiters.row = "\r\n"
    $0.headerStrategy = .firstLine
    $0.trimStrategy = .whitespaces
}
let columns = result.columns
From the README:
The previous example will work if the CSV file has a header row and the header titles match exactly the property names (name, age, and hasPet). A more efficient and detailed implementation:
struct Student: Decodable {
    let name: String
    let age: Int
    let hasPet: Bool

    init(from decoder: Decoder) throws {
        var row = try decoder.unkeyedContainer()
        self.name = try row.decode(String.self)
        self.age = try row.decode(Int.self)
        self.hasPet = try row.decode(Bool.self)
    }
}
What makes this implementation more efficient than the implementation listed above it?
struct School: Decodable {
    // The custom CSV file is a list of all the students.
    let people: [Student]
}

struct Student: Decodable {
    let name: String
    let age: Int
    let hasPet: Bool
}
Are there benchmarks to show the implementation with init(from:) is more efficient? Should I implement all of my Codable structures with a manual init(from:)?
It is just that.
CocoaPods says:
[!] The platform of the target `<TARGET_NAME>` (macOS 10.15) is not compatible with `CodableCSV (0.4.0)`, which does not support `macOS`.
Is there any specific reason for that?
Could you perhaps update the podspec?
Hey @dehesa
I am fairly new to this package and I have a question.
I want to skip a column during export and import.
Export: Given a CSVEncoder and struct Pet
struct Pet: Codable {
    let name: String
    let age: Int
}
let pets = ...
let encoder = CSVEncoder { $0.headers = ["name", "age"] }
let data = try encoder.encode(pets)
Is it possible to skip a particular column, that is, encode only a single column "name" into a csv file?
Import: Given a CSVDecoder,
let decoder = CSVDecoder()
let result = try decoder.decode([Pet].self, from: data)
Can I import data into an array of Pet, if data does not contain an age column (and perhaps give it a default value if the column does not exist)?
Many thanks for your help!
Roman
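A general Codable technique that may apply here is to write the coding methods by hand: encode only the keys that should be exported, and use decodeIfPresent with a fallback on import. The sketch below is plain Codable code and is an assumption as far as CodableCSV is concerned: whether its containers report a missing column in a way that decodeIfPresent can handle has not been verified, and the default of 0 is arbitrary.

```swift
import Foundation

struct Pet: Codable {
    let name: String
    let age: Int

    private enum CodingKeys: String, CodingKey {
        case name, age
    }

    init(name: String, age: Int = 0) {
        self.name = name
        self.age = age
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.name = try container.decode(String.self, forKey: .name)
        // Fall back to a default when the "age" column/key is absent.
        self.age = try container.decodeIfPresent(Int.self, forKey: .age) ?? 0
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)
        // `age` is intentionally not encoded, so it is skipped on export.
    }
}
```

With this approach the encoder's headers would list only "name", matching the keys that are actually written.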
Hi all. Not sure if this is pilot error or if it's a bug, but it appears that the last column in our CSV consistently decodes to blank. We've got a correct header line and I'm using a .firstLine strategy. I have also confirmed that my data model has the same number of columns as vars. The only fix appears to be adding a dummy column at the end.
With the new Combine framework, one can chain the CSVDecoder into a publisher pipeline if it conforms to the TopLevelDecoder protocol, and likewise the CSVEncoder with TopLevelEncoder. See examples from other Codable frameworks such as XMLCoder and Yams.
How do I configure CodableCSV to omit selected properties from the object when creating the CSV file?
Model objects contain fields that are Codable but which I do not wish to have included in the CSV file. I have not located a method in this package that enables me to do this, though perhaps it would simply omit any properties for which there was no matching label in the header row. (That would be a user friendly and simple way to provide this functionality if it does not exist.)
I have a large csv file (> 400,000 lines) which is too big to decode in one blob, so I loop through each line of the file by calling readLine(), and then for each line:
convert the line to Data
let obj = try decoder.decode([Obj].self, from: data).first
It seems that if you call decoder.decode repeatedly that it leaks memory. It looks like the allocation of Buffer() from the initialiser of CSVReader has a retain cycle.
I haven't yet had time to dive into this, but I am willing to do so if there is no other advice. See attached memory graph.
Having an extra comma in a data line (which is usually caused by the CSV creator failing to quote a field) causes that line and all subsequent lines to fail to parse. Having an extra comma at the end of the header line causes all subsequent data lines to fail to parse.
Please see the attached test file (it is really a .swift file, but I changed the extension to .txt in order to attach it).
DecodingBadInputTests.txt
Both of these situations (additional commas in either header or data line) are forbidden by rfc4180, so I would expect an exception to be raised.
I encountered both of these instances of ill-formed CSV in files I downloaded from my banks. I'm using CodableCSV in a Swift app I've written to take the differently formatted CSV from each bank and create a standard format which I then import into a spreadsheet for further analysis.
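As a stopgap for such files, the rows can be checked against the header's field count before decoding, so that malformed lines are reported instead of silently failing. This is a minimal sketch with a hypothetical helper name; it assumes the rows have already been split into fields (for example by reading them one by one).

```swift
/// Returns the indices of all rows whose number of fields differs from `expected`.
/// RFC 4180 says each line should contain the same number of fields throughout the file.
func rowsWithUnexpectedFieldCount(expected: Int, rows: [[String]]) -> [Int] {
    rows.indices.filter { rows[$0].count != expected }
}
```

A caller could throw its own error when the returned array is not empty, or skip just the offending rows.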
Decoding a CSV file with CRLF line endings fails with an error, if the last field in a row is quoted.
The error:
Invalid input
Reason: The last field is escaped (through quotes) and an EOF (End of File) was encountered before the field was properly closed (with a final quote character).
Help: End the targeted field with a quote.
Steps to reproduce the behavior:
Using a CSV file with CRLF line endings (url), decode with this:
let decoder = try CSVDecoder(configuration: {
    $0.encoding = .utf8
    $0.bufferingStrategy = .sequential
    $0.headerStrategy = .firstLine
    $0.trimStrategy = .whitespaces
    $0.delimiters.row = "\r\n" // or "\n", also fails
}).lazy(from: url)
No error
This was introduced in v0.6.6
In a large scale streaming situation, the csv is being used to 'chunk' rows. I'd like to be able to pass in headers, but not send them to the CSV (since the header is already out there).
Is this possible? I can't use CodingKeys - because they are already being used as 'string' for a JSON decoder.
I've been trying to find a form of 'Lazy' where I could 'flushEncoding()' which reset the rows and left a usable lazy encoder.
Keeping a root encoder and making a new lazy() as needed also works great, except that it lacks the ability to suppress the header after the first lazy instance. (Any way I've tried removing it after the fact breaks the CodingKeys lookup, as expected.)
With the default config, how can I escape commas and line returns within a field in order to ensure the resulting CSV is readable?
I'm hacking together an app with Swift and Xcode and I'm a complete novice. To provide the app with some basic data I have provided it with csv files, which are parsed with CodableCSV. Many thanks for the package!
Using basic data it's working fine. I have tried not to fiddle with the configuration. Delimiters are commas and end of line is "\r".
However, for one of my tables I now need to expand one of the fields to include sentences or even paragraphs of text, which contain commas and newlines. Initially I understood from the documentation that the way to do this is to enclose the whole field in double quotes ("..."). That crashed the app, and so did escaping the individual offending characters with double quotes (",) or with a backslash (\,).
Many thanks for any pointers!
Example extract of table:
id,title,introduction
reg,Regular Models,"The good news for learners of Spanish..."
irr_i,Essentials I,
irr_ii,Essentials II,
Error message:
CodableCSV/Reader.swift:75: Fatal error: 'try!' expression unexpectedly raised an error: [CSVReader] Invalid input
Reason: The targeted field parsed successfully. However, the character right after it was not a field nor row delimiter.
Help: If your CSV is CRLF, change the row delimiter to "\r\n" or add a trim strategy for "\r".
User info: Row index: 1, Field: The good news for learners of Spanish...
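For reference, the quoting convention described in RFC 4180 is to wrap the whole field in double quotes and to double any quote character inside it; backslashes have no special meaning. The helper below is a hypothetical sketch of that rule for producing such fields, not a CodableCSV API.

```swift
import Foundation

/// Quotes `field` following RFC 4180 if it contains the delimiter, a quote, or a line break.
func escapedCSVField(_ field: String, delimiter: Character = ",") -> String {
    let needsQuoting = field.contains(where: { character in
        character == delimiter || character == "\"" || character.isNewline
    })
    guard needsQuoting else { return field }
    // Inner quotes are escaped by doubling them, then the field is wrapped in quotes.
    return "\"" + field.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}
```

Note that the error message above also points at the row delimiter, so a correctly quoted field may still fail if the configured row delimiter does not match the file's line endings.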
Hey @dehesa!
I think I was too slow, it looks like you already implemented the sequential buffering strategy for 0.5.2. I was taking some time to learn about the Decoder protocol internals.
What I learned is that it's possible to decode an UnkeyedDecodingContainer into any sequence without buffering. ShadowDecoder.UnkeyedContainer seems to do a good job of iteratively decoding each item.
The README demos decoding into a preallocated array.
let decoder = CSVDecoder { $0.bufferingStrategy = .sequential }
let content: [Student] = try decoder.decode([Student].self, from: URL(fileURLWithPath: "~/Desktop/Student.csv"))
Instead of an Array, I created a custom sequence wrapper, with the added benefit of customizing how the result is wrapped. I had my fingers crossed that AnySequence was Decodable, but it's not.
class DecodableSequence<T: Decodable>: Sequence, IteratorProtocol, Decodable {
    private var container: UnkeyedDecodingContainer

    required init(from decoder: Decoder) throws {
        container = try decoder.unkeyedContainer()
    }

    func next() -> Result<T, Error>? {
        if container.isAtEnd {
            return nil
        }
        // or could use a try! here
        return Result { try container.decode(T.self) }
    }
}
Then:
let decoder = CSVDecoder { $0.bufferingStrategy = .sequential }
let url = URL(fileURLWithPath: "Student.csv")
let results = try decoder.decode(DecodableSequence<Student>.self, from: url)
for result in results {
print(try result.get())
}
Any thoughts on this technique, or alternatives? Would a sequence wrapper like this be useful to include as part of the library?
Thanks!
@josh
The pod name "CodableCSV" is occupied by a different project with the same name as this one: https://github.com/pauljohanneskraft/CodableCSV.
Following the pod install instructions for this repo will fail.
The error message from pod is:
[!] CocoaPods could not find compatible versions for pod "CodableCSV":
In Podfile:
CodableCSV (~> 0.6.1)
None of your spec sources contain a spec satisfying the dependency: `CodableCSV (~> 0.6.1)`.
Performing a pod search CodableCSV:
-> CodableCSV (0.4.0)
CodableCSV allows you to encode and decode CSV files using Codable model types.
pod 'CodableCSV', '~> 0.4.0'
- Homepage: https://github.com/pauljohanneskraft/CodableCSV
- Source: https://github.com/pauljohanneskraft/CodableCSV.git
- Versions: 0.4.0, 0.2.0, 0.1.1 [trunk repo]
Steps to reproduce the behavior:
Add pod "CodableCSV", "~> 0.6.1" to Podfile, as per the readme.
Perform:
pod install
This package will be installed by pod.
I needed to support iOS 10 in my app but the library only supports iOS >= 12. So, I had to replace the library.
But I was curious to know why the library requires iOS 12. So I added the source code directly to a test project that targets iOS 10, and it builds successfully!
It looks like the library actually supports iOS 10 as it is now and no need for extra work. I suggest changing the requirements for the library to the minimum version of every OS to allow it to be used in a wider range of projects.
I am a fan of the concept of Codable declarative CSV parsing, but am running into the edges of it a little with my current use case. I'm parsing a nutrient database (a UK public health source), and in their dataset they either offer a floating point value for a quantity of a nutrient, or special codes representing trace amounts: e.g. they use "N" to represent "significant but unmeasured quantity" or "Tr" to represent "trace amounts".
Here's an example subset of an input:
Water (g),Protein (g),Fat (g),Carbohydrate (g),Energy (kJ) (kJ),Starch (g),Total sugars (g),Glucose (g)
76.7,2.9,15.2,0.8,625,Tr,0.8,0.1
9.7,1.3,1.2,Tr,67,0.0,Tr,0.0
84.2,0.2,0.1,Tr,7,0.0,Tr,0.0
93.4,4.0,0.7,0.4,100,Tr,0.3,0.1
8.5,6.1,8.7,N,N,N,N,N
In my use case, I'd basically like to ignore N or Tr values (defaulting them to 0 in the parsed type, maybe), but the parser throws an exception and exits when it encounters a non-parseable Double value.
Similar to the customisation point for a Decimal parser, It'd be great if we could customise the parsing for types such as Double to be able to handle for edge cases in our input data. In my case I'd be able to Double cast values that aren't "N" or "Tr", and return 0.0 for those edge cases.
I've been able to resolve my issues using the imperative parser, or by pre-processing the CSV whenever I parse it, but it ceases to be a nice declarative interface at that point (and requires loading the whole thing into memory, as my old SwiftCSV implementation did).
The Decimal parser option works, but results in Decimal values - in my case I want simple Doubles.
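One way to stay declarative without a library change is a small wrapper type that decodes the raw field as a String and maps the special codes itself. The sketch below uses only standard Codable APIs; the type name `LenientDouble` and the choice of 0 for "N" and "Tr" are assumptions taken from the use case described above, and it relies on the decoder being able to hand out each field as a String.

```swift
import Foundation

/// Decodes a field as Double, treating the codes "N" and "Tr" as 0.
struct LenientDouble: Decodable {
    let value: Double

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let raw = try container.decode(String.self).trimmingCharacters(in: .whitespaces)
        if raw == "N" || raw == "Tr" {
            // Special codes from the dataset: unmeasured or trace amounts.
            self.value = 0
            return
        }
        guard let parsed = Double(raw) else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Expected a number, \"N\" or \"Tr\", found \"\(raw)\"")
        }
        self.value = parsed
    }
}
```

A row type would then declare its nutrient properties as `LenientDouble` and read `.value` where a plain Double is needed.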
When a CSV file contains a string with a leading -, +, or =, Excel will treat it as a formula field and throw an error.
I'd love an option to auto-detect these leading characters and escape them properly with a single leading single-quote '.
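Until such an option exists, the fields can be sanitized before encoding. This is a minimal sketch with a hypothetical helper name; it prefixes a single quote whenever the field starts with one of the three characters named above, which also affects legitimate values such as negative numbers.

```swift
/// Prefixes `field` with a single quote if it starts with "=", "+" or "-",
/// so that spreadsheet applications treat it as text rather than a formula.
func neutralizingFormulaPrefix(_ field: String) -> String {
    guard let first = field.first, "=+-".contains(first) else { return field }
    return "'" + field
}
```

Callers that export numeric columns may want to apply this only to free-text fields.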
trimStrategy characters are not trimmed from a quoted string field.
Use a CSVDecoder with trimStrategy = .whitespaces and a CSV like:
Name,Value
" Foo ","1"
The Name field is parsed as " Foo " - spaces are not trimmed from the string.
Characters in trimStrategy trimmed from the result
Sorry if this is available already, but I couldn't find it in the sources except in CSVWriter; it is not in CSVEncoder or in the README.
Simply put, for a live time series serialization I would want to append new data to the URL instead of overwriting it, with a bufferingStrategy of sequential (but I can also see where users wouldn't want it to append, so it should be a separate strategy).
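As an interim approach, already-encoded rows can be appended to the file with Foundation alone. The sketch below is independent of CodableCSV; the helper name `appendLine(_:to:)` is hypothetical, and it assumes the caller takes care of writing the header only once.

```swift
import Foundation

/// Appends `line` plus a trailing "\n" to the file at `url`, creating the file if needed.
func appendLine(_ line: String, to url: URL) throws {
    let data = Data((line + "\n").utf8)
    if FileManager.default.fileExists(atPath: url.path) {
        let handle = try FileHandle(forWritingTo: url)
        defer { handle.closeFile() }
        handle.seekToEndOfFile()    // move the write position past the existing rows
        handle.write(data)
    } else {
        try data.write(to: url)     // first write creates the file
    }
}
```

Each new sample would be encoded to a single row string and passed to this helper.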
Could I help you supporting Encodable?
When I change the floatStrategy to .convert I get a fatal error in this code:
case .throw: throw CSVEncoder.Error._invalidFloatingPoint(value, codingPath: self.codingPath)
case .convert(let positiveInfinity, let negativeInfinity, let nan):
if value.isNaN {
return nan
} else if value.isInfinite {
return (value < 0) ? negativeInfinity : positiveInfinity
} else { fatalError() }
}
So with either strategy I either get the thrown error or a fatal error if a valid Double is attempted to be encoded. Is this expected or is there another configuration I'm missing?
Similarly, when I try to decode a double, I get an error thrown, but when I decode it as a string and convert the string to a double in my structs init(from: Decoder) I process the field correctly.
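For comparison, here is a standalone sketch of how such a strategy switch could treat finite values first, so that neither branch is reached for a valid Double. The enum and function below are hypothetical re-creations for illustration only; it is not clear from the snippet above whether the library handles finite values elsewhere.

```swift
enum FloatEncodingError: Error {
    case nonConformingValue(Double)
}

/// Hypothetical mirror of the two strategies discussed above.
enum NonConformingFloatStrategy {
    case `throw`
    case convert(positiveInfinity: String, negativeInfinity: String, nan: String)
}

/// Returns the textual field for `value`, applying `strategy` only to NaN and infinities.
func encodedField(for value: Double, strategy: NonConformingFloatStrategy) throws -> String {
    // Finite values never consult the strategy.
    guard !value.isFinite else { return value.description }
    switch strategy {
    case .throw:
        throw FloatEncodingError.nonConformingValue(value)
    case .convert(let positiveInfinity, let negativeInfinity, let nan):
        if value.isNaN { return nan }
        return value < 0 ? negativeInfinity : positiveInfinity
    }
}
```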
Encoding a struct with a nil value results in an infinite loop.
Steps to reproduce the behavior:
import Foundation
import CodableCSV

struct Employee: Encodable {
    let id: Int
    let name: String
    let supervisorId: Int?

    enum CodingKeys: String, CodingKey, CaseIterable {
        case id = "Employee ID"
        case name = "Name"
        case supervisorId = "Supervisor ID"
    }
}

let oneEmployee = Employee(id: 1, name: "Roy", supervisorId: nil)
let encoder = CSVEncoder {
    $0.headers = Employee.CodingKeys.allCases.map(\.rawValue)
}
let result = try! encoder.encode([oneEmployee], into: String.self)
print(result)
I'm actually not sure how a nil value is represented in CSV but I think you just put nothing in between the two commas. In any case, this should at least throw an error instead of entering into an infinite loop.
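CSV has no dedicated null marker, and leaving the field empty between two delimiters is indeed the common convention. The sketch below shows that convention with a hypothetical helper, independent of CodableCSV; it does no quoting, so it assumes the values contain no delimiters, quotes, or line breaks.

```swift
/// Joins `fields` into one CSV row, writing nil as an empty field.
func csvRow(from fields: [String?], delimiter: String = ",") -> String {
    fields.map { $0 ?? "" }.joined(separator: delimiter)
}
```

For the employee above this would yield a row ending in a trailing delimiter, with nothing after it for the missing supervisor ID.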
Configuration.trimStrategry should be Configuration.trimStrategy
When trying to encode a single object, the following error occurs:
struct Student: Encodable {
let name: String, age: Int?, country: String?, hasPet: Bool?
}
let student = Student(name: "Marcos", age: 1, country: "Spain", hasPet: true)
let encoder = CSVEncoder { $0.headers = ["name", "age", "country", "hasPet"] }
let result = try encoder.encode(student, into: String.self)
print(result)
[CSVEncoder] Invalid coding path
Reason: The coding key identifying a CSV row couldn't be transformed into an integer value.
Help: The provided coding key identifying a CSV row must implement `intValue`.
User info: Coding path: [CodingKeys(stringValue: "name", intValue: nil)], Key: CodingKeys(stringValue: "name", intValue: nil)
It works fine as long as the object is wrapped in an array, however. Browsing the source for a few minutes didn't make it clear to me why this is the case.
Given a row like:
20 May 2021,"some description","$1,090"
the "," in "$1,090" is treated as a delimiter when it shouldn't be, because it's between the quotes.
do {
    try parsedResults = CSVReader.decode(input: row)
} catch {
    print("ERROR")
}
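For reference, the expected behavior can be stated as a small quote-aware splitter. This is a standalone sketch of RFC 4180-style field splitting for a single line, with a hypothetical function name; it is not how CSVReader is implemented and it does not handle fields that span several lines.

```swift
/// Splits one CSV line into fields, honoring double-quoted fields and "" escapes.
func splitCSVLine(_ line: String, delimiter: Character = ",") -> [String] {
    var fields: [String] = []
    var current = ""
    var insideQuotes = false
    var iterator = line.makeIterator()
    var pending = iterator.next()
    while let character = pending {
        pending = iterator.next()               // one-character lookahead
        if insideQuotes {
            if character == "\"" {
                if pending == "\"" {            // "" inside quotes is a literal quote
                    current.append("\"")
                    pending = iterator.next()
                } else {
                    insideQuotes = false        // closing quote
                }
            } else {
                current.append(character)       // delimiters inside quotes are plain text
            }
        } else if character == "\"" {
            insideQuotes = true
        } else if character == delimiter {
            fields.append(current)
            current = ""
        } else {
            current.append(character)
        }
    }
    fields.append(current)
    return fields
}
```

Applied to the row above, the third field keeps its comma because the comma sits between the opening and closing quotes.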
How can I install it on Linux? I can't make it work.
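On Linux, Swift libraries are normally pulled in through the Swift Package Manager rather than CocoaPods. The manifest below is a sketch under assumptions: the package name `MyTool` is hypothetical, the tools version and the version requirement are examples (0.6.7 is the release mentioned elsewhere on this page), and whether every feature of the library builds on Linux is not confirmed here.

```swift
// swift-tools-version:5.3
import PackageDescription

let package = Package(
    name: "MyTool",   // hypothetical package name
    dependencies: [
        // Assumed version requirement; adjust to the release you need.
        .package(url: "https://github.com/dehesa/CodableCSV.git", from: "0.6.7")
    ],
    targets: [
        .target(name: "MyTool", dependencies: ["CodableCSV"])
    ]
)
```

Running `swift build` in the package directory would then fetch and compile the dependency.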
It is currently possible to create an invalid CSVWriter.Configuration by supplying nil as a field- or row-delimiter. nil means "infer the delimiter from the CSV data", which only makes sense for the CSVReader. This error is reported at runtime.
I'd suggest having separate Delimiter.Pair types for CSVReader.Configuration and CSVWriter.Configuration so that we can prevent invalid configuration at compile-time. The Delimiter.Pair for the writer's configuration would simply not have an API for specifying inference.
Alternatively we can keep it as it is currently, and raise a run-time error when inference is requested from the CSVWriter. This does spare us from having two very similar Delimiter.Pair types.
If we add a more explicit API for delimiter inference, as suggested in #44, I think this might become even more important, as the auto-completion will otherwise include .infer and multiple overloads of .infer(options:) in its suggestions, which would be quite confusing in the context of the CSVWriter.
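The first option could look roughly like the following. All type names here are hypothetical and only illustrate the shape of the idea: the reader's delimiter type has an inference case, while the writer's type has no way to express it, so the invalid configuration cannot be written at all.

```swift
/// Reader side: a delimiter is either inferred from the data or given explicitly.
enum ReaderDelimiter: Equatable {
    case infer
    case exact(String)
}

/// Writer side: a delimiter must always be a concrete, non-empty string.
struct WriterDelimiter: Equatable {
    let value: String

    init?(_ value: String) {
        guard !value.isEmpty else { return nil }
        self.value = value
    }
}

/// Separate pair types keep `.infer` out of the writer's auto-completion.
struct ReaderDelimiterPair {
    var field: ReaderDelimiter
    var row: ReaderDelimiter
}

struct WriterDelimiterPair {
    var field: WriterDelimiter
    var row: WriterDelimiter
}
```

The cost is the duplication mentioned above: two pair types that differ only in whether inference can be expressed.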
Question
Hi all. Not sure if this is pilot error or if it's a bug, but it appears that the last column in our CSV consistently decodes to blank. We've got a correct header line and I'm using a .firstLine strategy. I have also confirmed that my data model has the same number of columns as vars. The only fix appears to be adding a dummy column at the end.
System
OS: macOS 12.3.1, Xcode 12.3
CodableCSV 0.6.7