csvimporter's People

Contributors

chrisleversuch, christiansteffens, jbehrens94, jeehut, phoney

csvimporter's Issues

Swift Package Manager support broken

When I try to install using SPM I get

warning: target 'Supporting Files' in package 'CSVImporter' contains no valid source files
'CSVImporter' project/.build/checkouts/CSVImporter.git--809956689264780610: error: invalid target name at 'Tests/Code'; name of test targets must end in 'Tests'
'JIRATool' project: error: product dependency 'CSVImporter' not found

CSV custom parsing

As of now, by default, the library takes the first row as the header row. I need to parse the CSV in such a way that the second row should be taken as the Header row. Is there any way to do that?
Also, is there any way to skip unwanted rows while parsing?
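
One possible workaround, sketched here without relying on any CSVImporter API, is to import the rows as plain string arrays first and build the keyed records afterwards. The helper `records(from:headerIndex:shouldSkip:)` and the sample data are hypothetical names made up for illustration.

```swift
/// Builds keyed records from already-split rows, using the row at
/// `headerIndex` as the header and dropping rows rejected by `shouldSkip`.
/// Rows before the header are ignored. (Hypothetical helper.)
func records(
    from rows: [[String]],
    headerIndex: Int,
    shouldSkip: ([String]) -> Bool = { _ in false }
) -> [[String: String]] {
    guard rows.indices.contains(headerIndex) else { return [] }
    let header = rows[headerIndex]
    return rows[(headerIndex + 1)...]
        .filter { !shouldSkip($0) }
        .map { row in
            // zip stops at the shorter side, so ragged rows do not crash.
            Dictionary(zip(header, row), uniquingKeysWith: { first, _ in first })
        }
}

let sampleRows = [
    ["Exported 2018-03-21"],   // preamble line to ignore
    ["name", "score"],         // real header (second row)
    ["Ann", "10"],
    ["# comment"],             // unwanted row
    ["Bob", "7"],
]
let keyedRecords = records(from: sampleRows, headerIndex: 1) {
    $0.first?.hasPrefix("#") ?? false
}
```

The trade-off is that all rows pass through as `[String]` first, so the header handling happens in your own code rather than in the importer.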

Would like to see init with NSURL

Apple is recommending that files be referenced using fileURLs. It would be helpful to have an init method that takes a fileURL. Something like this, or it could also validate that the URL is a fileURL.

init(url: NSURL, delimiter: String = ",") {
    self.init(path: url.path!, delimiter: delimiter)
}
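
The validation mentioned above could look roughly like this. It is only a sketch: `localPath(for:)` is a hypothetical helper, and it uses the newer `URL` value type rather than `NSURL`.

```swift
import Foundation

/// Returns the file-system path for `url`, or nil when the URL does not
/// use the file scheme (for example an https URL).
func localPath(for url: URL) -> String? {
    return url.isFileURL ? url.path : nil
}

let localURL = URL(fileURLWithPath: "/tmp/teams.csv")
let remoteURL = URL(string: "https://example.com/teams.csv")!
```

An initializer taking a URL could call such a check and fail early for anything that is not a file URL.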

Add Codable support

Hey there,
It would be great to be able to use the native way of decoding (Codable) as an alternative data mapping.
Other libraries like CSV.swift and CodableCSV already do this, but they lack stream-based data import.

Is it safe to assume that no more frameworks will be imported

I really like Carthage but there are some libraries that are not supported yet.
I use Carthage for frameworks that won't import other frameworks and cause issues with Cocoapods.

I saw that you only import HandySwift. Is it safe to assume that you will not import other frameworks in the future?

Reporting parsing errors

Is there any way to skip a row or report a parsing error (aborting parsing) if a field can't be parsed? Can I return an optional mapped record?
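
To illustrate the two behaviours being asked about, here is a library-independent sketch. `Player`, `mapPlayer`, `ImportError`, `importSkipping`, and `importStrict` are hypothetical names; whether CSVImporter's own record mapper may return an optional is not something this sketch assumes.

```swift
struct Player {
    let name: String
    let score: Int
}

/// Maps a keyed row to a Player, or nil when a field is missing or malformed.
func mapPlayer(_ row: [String: String]) -> Player? {
    guard let name = row["name"], let raw = row["score"], let score = Int(raw) else {
        return nil
    }
    return Player(name: name, score: score)
}

enum ImportError: Error {
    case badRow(index: Int)
}

/// Lenient: silently drops rows that fail to map.
func importSkipping(_ rows: [[String: String]]) -> [Player] {
    return rows.compactMap(mapPlayer)
}

/// Strict: aborts at the first row that fails to map.
func importStrict(_ rows: [[String: String]]) throws -> [Player] {
    var players: [Player] = []
    for (index, row) in rows.enumerated() {
        guard let player = mapPlayer(row) else {
            throw ImportError.badRow(index: index)
        }
        players.append(player)
    }
    return players
}

let sampleRecords = [
    ["name": "Ann", "score": "10"],
    ["name": "Bob", "score": "n/a"],   // cannot be parsed as Int
]
```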

Any way to get the column headers before parsing the rest?

I'd like to automatically determine the type of CSV file I'm importing by examining the first line. Is there any way to get the first line as [String] so that I can test their values (without parsing the whole file)?

Ideally, I could then parameterize the record type as part of the startImport call, but I can also just instantiate two different importers.
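
A rough sketch of the idea, independent of the library: look only at the text up to the first line break and decide from there. `headerFields(of:delimiter:)`, `FileKind`, and `kind(forHeader:)` are hypothetical, and the split is naive (it does not handle quoted delimiters). For a file on disk one would presumably read just the first chunk rather than the whole file.

```swift
/// Returns the fields of the first line of `text`, split naively on `delimiter`.
func headerFields(of text: String, delimiter: Character = ",") -> [String] {
    // prefix(while:) stops at the first line break, so the rest is never scanned.
    let firstLine = text.prefix(while: { !$0.isNewline })
    return firstLine
        .split(separator: delimiter, omittingEmptySubsequences: false)
        .map { String($0) }
}

enum FileKind {
    case orders, teams, unknown
}

/// Guesses the kind of CSV file from characteristic column names.
func kind(forHeader header: [String]) -> FileKind {
    if header.contains("OrderUuid") { return .orders }
    if header.contains("teamID") { return .teams }
    return .unknown
}

let teamsHeader = headerFields(of: "yearID,lgID,teamID\r\n1871,NA,BS1\r\n")
```

The result of `kind(forHeader:)` could then decide which of the two importers to instantiate.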

Debug logs or Error Propagation

Error propagation or even any feedback on errors while reading files is really lacking.
I'm looking at a tight deadline right now so I can't at the moment, but I'll try to get back to it eventually and submit a pull request.

Code Signing Fails in Xcode 10

I have the same issue with CSVImporter as another user did with another framework installed with Carthage (alejandro-isaza/Upsurge#97). After archiving, I go to upload to the App Store. If I choose to manually select the provisioning profiles, this screen shows:

[Screenshot: no provisioning profile required]

However, whether I use automatic or manual management of certificates, when I go to validate/distribute I get the error "Code signing "CSVImporter.framework" failed."

[Screenshot: code signing error]

In the above examples, I removed the .dSYM files from the Build Phases, per the advice from a Stack Overflow page (https://stackoverflow.com/questions/34529583/couldnt-find-platform-family-in-info-plist-cfbundlesupportedplatforms-or-mach-o). However, when I keep the .dSYM file in the "Copy Files" section of the "Build Phases" and follow the instructions from the Carthage site for adding frameworks to a project (https://github.com/Carthage/Carthage#adding-frameworks-to-an-application), I get the following error:

"Couldn't find platform family in Info.plist CFBundleSupportedPlatforms or Mach-O LC_VERSION_MIN for CSVImporter"

[Screenshot: CFBundleSupportedPlatforms error]

Any help with this would be most appreciated.

Thanks!

Swift 3.1 compile errors

Admitted novice here.

I've been using CSVImporter in my project for a while now without issue. Tried to make a new build today and ran into the following error:
Module compiled with Swift 3.0.2 cannot be imported in Swift 3.1
For the line:
import CSVImporter

So I re-downloaded the CSVImporter as well as HandySwift, opened the .xcworkspace in Xcode, and made new frameworks. I replaced the old frameworks in my project with the new ones and this fixed the previous error but introduced new ones:
Cannot specialize non-generic type 'module<CSVimporter'
Use of undeclared type 'CSVimporter'
For the line:
var fileURLImporter = CSVImporter<[String: String]>(url: destinationFileUrl)

This now seems to be well above my head. Is this a problem with the CSVImporter code, or is this my issue? Any guidance would be appreciated.

Could not install with CocoaPods

This is my Podfile

# Uncomment the next line to define a global platform for your project
# platform :ios, '9.0'

target 'My Awesome App' do
  # Comment the next line if you're not using Swift and don't want to use dynamic frameworks
  use_frameworks!

  # Pods for Insane Diet
  pod 'CSVImporter', '~> 1.4'

end

and this is the output

$ pod install
Analyzing dependencies
[!] Unable to find a specification for `CSVImporter (~> 1.4)`

I see that even a pod search csvimporter does not return any available pod.

Cannot init CSVImporter

My delimiter is a semicolon (;).

I couldn't find a way to simply initialize the CSVImporter class with a method like:
init(path:String,delimiter:String)

So I went for the following:

let importer = CSVImporter.init(path: path, delimiter: ";", lineEnding: .newLine, encoding: .utf8, workQosClass: .default, callbacksQosClass: nil)

But the compiler complains:

[Screenshot: screen shot 2018-03-21 at 11 40 50 am]

"Fatal error: Index out of range" when both ',' and '\n' are within a quoted cell

I get an index out of range error when I try to import a CSV file which has both a newline and a comma within a single (quoted) cell. It does not crash when I only have a newline, or I only have a comma.

See this Swift Playground for a full reproduction of the error. The failing code is also copied below:

let csvFileWithCommaAndNewline = "Column 1,Column 2,Column 3\n"
    + "cell 1.1,cell 1.2,cell 1.3\n"
    + "cell 2.1,\"cell 2.2, with comma and \n newline\",cell 2.3"

let importerWithCommaAndNewline = CSVImporter<Row>(contentString: csvFileWithCommaAndNewline)
importerWithCommaAndNewline.importRecords(structure: {print($0)}, recordMapper: {
    return Row(cell1: $0["Column 1"]!, cell2: $0["Column 2"]!, cell3: $0["Column 3"]!)
})
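
For comparison, a character-by-character scanner handles this input without index arithmetic on pre-split lines. This is a minimal sketch and not CSVImporter's implementation; `parseCSV(_:delimiter:)` is a hypothetical function that assumes RFC 4180 style quoting with doubled quotes as the escape.

```swift
/// Splits CSV text into rows of fields, honouring double-quoted fields that
/// may contain the delimiter, line breaks, and doubled quotes ("").
func parseCSV(_ text: String, delimiter: Character = ",") -> [[String]] {
    var rows: [[String]] = []
    var row: [String] = []
    var field = ""
    var inQuotes = false
    var index = text.startIndex

    while index < text.endIndex {
        let character = text[index]
        let next = text.index(after: index)
        if inQuotes {
            if character == "\"" {
                if next < text.endIndex, text[next] == "\"" {
                    field.append("\"")              // escaped quote inside a quoted field
                    index = text.index(after: next)
                    continue
                }
                inQuotes = false                    // closing quote
            } else {
                field.append(character)             // delimiter or newline kept verbatim
            }
        } else if character == "\"" {
            inQuotes = true
        } else if character == delimiter {
            row.append(field)
            field = ""
        } else if character.isNewline {
            row.append(field)
            field = ""
            rows.append(row)
            row = []
        } else {
            field.append(character)
        }
        index = next
    }
    // Flush the final record when the text does not end with a line break.
    if !field.isEmpty || !row.isEmpty {
        row.append(field)
        rows.append(row)
    }
    return rows
}

let quotedSample = "Column 1,Column 2,Column 3\n"
    + "cell 1.1,cell 1.2,cell 1.3\n"
    + "cell 2.1,\"cell 2.2, with comma and \n newline\",cell 2.3"
let quotedRows = parseCSV(quotedSample)
```

The sketch parses a whole in-memory string, so it demonstrates the quoting logic only and says nothing about streaming.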

Speed and Memory Usage Questions

Hey Flinesoft! Thank you for posting CSVImporter! It has been a great learning experience for me. But I have some questions and observations.

First, what I was looking for: I recently worked on a Python app for a client to parse a 10 million line CSV file for importing to a MySQL database. I was pleased at how easy it was to write a Python script in less than a day to do this.

But then I got curious about how quickly I could do it in Swift. The first thing I noticed was there is no NSCSV-like framework! So a search of GitHub was in order, which led me to CSVImporter, which led to learning about Carthage and dependent libraries, etc. It was a day of drinking from the fire hose! :-) I picked CSVImporter over the other CSV projects on GitHub because I wanted to use Swift, and because of your readme rationale for yet another CSV importer: "a really large CSV file".

OK, Carthage installed, FileKit and HandySwift plugged into my Xcode project, and everything up and running (two days) and parsing my 10 million line CSV file.

Now onto the question/observation!

My Python CSV parser (which uses the "import csv" package) parses my 10 million line file in 7 seconds. My CSVImporter Swift application takes 14 minutes to do the same thing!

So, fired up Instruments and see that the readValuesInLine function is taking up all the time in NSRegularExpression. I moved the instantiation of startPartRegex, middlePartRegex, and endPartRegex out of the function and into the class init. This cut the time to just over 6 minutes. Good but not great. Instruments now shows that stringByReplacingOccurrencesOfString is the time sink.

Any suggestions on how to speed this up?

I assume that the Python csv module is just a wrapper around the good old libcsv written in C and as such does not handle Unicode correctly, whereas Swift String does handle Unicode, and this may be the problem: Swift string operations like stringByReplacingOccurrencesOfString are probably not optimal for parsing a 10 million line file.

Another observation is that my CSVImporter app memory usage steadily increases as it runs and takes over 10 gigabytes of RAM before it is done. Your readme notes claim that the file is read line by line instead of loading the entire string into memory. Are you sure about this? Your importLines function is doing:

for line in csvStreamReader
{
    let valuesInLine = readValuesInLine(line)
    closure(valuesInLine: valuesInLine)
}

My understanding of the Swift Sequence object is that it does load all of the elements into memory in order to provide the "for _ in object". This would certainly explain the ever increasing memory.
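
As a side note on that assumption, a Swift Sequence is not required to hold its elements in memory; a `for ... in` loop simply calls `next()` on the iterator each round. The sketch below uses a hypothetical `CountingLines` type to show this. It says nothing about what csvStreamReader itself does.

```swift
/// A sequence that produces "line N" strings on demand and counts how many
/// it has actually created.
final class CountingLines: Sequence, IteratorProtocol {
    private(set) var produced = 0
    let limit: Int

    init(limit: Int) {
        self.limit = limit
    }

    func next() -> String? {
        guard produced < limit else { return nil }
        produced += 1
        return "line \(produced)"
    }
}

let lazyLines = CountingLines(limit: 10_000_000)
for line in lazyLines {
    if line == "line 3" { break }   // stop early
}
// Only three elements were ever created; nothing was buffered up front.
```

So the growth in memory may have a different cause than the `for` loop itself.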

Oh, as to memory, your startImportingRecords appends all records onto importedRecords. I added my own "startReader" function that does not accumulate the records to avoid the huge array at the end. Even with this removed the app is still taking many gigabytes of RAM which convinces me even more that csvStreamReader is reading in the entire file.

Do you have any advice on how to make Swift comparable to Python for CSV parsing?

Thanks!

-Allan

Doesn't properly handle CRLF line endings

Files with CRLF line endings aren't handled correctly. As an example, the CSVImporter sample app unit tests demonstrate this problem. All of the Baseball files use Windows line endings. I just ran the unit tests. At the end of each test they print out the importedRecords. I see the below output for some of the tests. Notice the \r in the last team. The CRs shouldn't be in the results. I think that the underlying File class is set to use \n as the line separator.

Test Case '-[CSVImporter_iOS_Tests.CSVImporterSpec imports_data_from_CSV_file_with_headers]' started.
["yearID", "lgID", "teamID", "franchID", "divID", "Rank", "G", "Ghome", "W", "L", "DivWin", "WCWin", "LgWin", "WSWin", "R", "AB", "H", "2B", "3B", "HR", "BB", "SO", "SB", "CS", "HBP", "SF", "RA", "ER", "ERA", "CG", "SHO", "SV", "IPouts", "HA", "HRA", "BBA", "SOA", "E", "DP", "FP", "name", "park", "attendance", "BPF", "PPF", "teamIDBR", "teamIDlahman45", "teamIDretro\r"]

And another test has \r in several of the values.

Test Case '-[CSVImporter_iOS_Tests.CSVImporterSpec imports_data_from_CSV_file_with_headers]' started.
["yearID", "lgID", "teamID", "franchID", "divID", "Rank", "G", "Ghome", "W", "L", "DivWin", "WCWin", "LgWin", "WSWin", "R", "AB", "H", "2B", "3B", "HR", "BB", "SO", "SB", "CS", "HBP", "SF", "RA", "ER", "ERA", "CG", "SHO", "SV", "IPouts", "HA", "HRA", "BBA", "SOA", "E", "DP", "FP", "name", "park", "attendance", "BPF", "PPF", "teamIDBR", "teamIDlahman45", "teamIDretro\r"]
Progress: 1
Progress: 63
//...
Did finish import, first array: Optional(["H": "426", "SOA": "23", "SO": "19", "WCWin": "", "AB": "1372", "BPF": "103", "IPouts": "828", "PPF": "98", "3B": "37", "BB": "60", "HBP": "", "lgID": "NA", "ER": "109", "CG": "22", "name": "Boston Red Stockings", "teamIDretro\r": "BS1\r", "yearID": "1871", "divID": "", "FP": "0.83", "R": "401", "G": "31", "BBA": "42", "HA": "367", "RA": "303", "park": "South End Grounds I", "DivWin": "", "WSWin": "", "HR": "3", "E": "225", "ERA": "3.55", "franchID": "BNA", "DP": "", "L": "10", "LgWin": "N", "W": "20", "SV": "3", "SHO": "1", "Rank": "3", "Ghome": "", "teamID": "BS1", "teamIDlahman45": "BS1", "HRA": "2", "SF": "", "attendance": "", "CS": "", "teamIDBR": "BOS", "SB": "73", "2B": "70"])

Needless to say the unit tests don't really validate the expected values.
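
One way to avoid the stray \r, sketched independently of the library's File class, is to split on any newline character rather than on "\n" alone. `splitLines(_:)` is a hypothetical helper; note that it also drops empty lines.

```swift
/// Splits text into lines, treating "\n", "\r\n", and "\r" alike.
func splitLines(_ text: String) -> [String] {
    // In Swift, "\r\n" is a single Character whose isNewline is true,
    // so no stray "\r" is left on the last field of a line.
    return text
        .split(omittingEmptySubsequences: true, whereSeparator: { $0.isNewline })
        .map { String($0) }
}

let windowsText = "teamIDlahman45,teamIDretro\r\nBS1,BS1\r\n"
let cleanLines = splitLines(windowsText)
```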

Different UTF encoding?

Hi,

How can I use CSVImporter with different UTF encodings, e.g. UTF-16?

default: NSUTF8StringEncoding

Bye

Moe

Remote URL formats?

How do I use the built-in remote URL function?

let importer = CSVImporter<[String]>(url: sourceURL)

What format does the "url" take? I have asked a question on SO where someone asked to use a fileURL(?) which sounds like it still requires the CSV file to be parsed to be locally on the device.

I could use your framework in conjunction with separately downloading it from the remote source, I was simply hoping the framework could handle it all...

Add option for processing data in batches

Hey there again,
Your library does a great job importing data line by line to keep memory usage low.
But it seems like there is no way to process the imported lines in batches. Correct me if I overlooked this feature.

In my case, my CSV file is huge so adding all the data to an array exceeds the memory limit of some devices.

It would be good to have an option to set a batch size.
When the importer has imported as many elements as defined, a callback is fired where I can process the batch of processed elements (e.g. save them to a database) and free up memory, so the importer can continue.

There must also be an onFinish callback that doesn't pass the whole array, because the array might not fit into memory.
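
The requested behaviour could be sketched like this, outside the library. `Batcher` is a hypothetical type: it buffers records, hands every full batch to a callback, and releases the buffer contents afterwards.

```swift
/// Collects items into fixed-size batches and hands each full batch to `flush`.
struct Batcher<Record> {
    let batchSize: Int
    let flush: ([Record]) -> Void
    private var buffer: [Record] = []

    init(batchSize: Int, flush: @escaping ([Record]) -> Void) {
        precondition(batchSize > 0, "batchSize must be positive")
        self.batchSize = batchSize
        self.flush = flush
        buffer.reserveCapacity(batchSize)
    }

    mutating func add(_ record: Record) {
        buffer.append(record)
        if buffer.count == batchSize { drain() }
    }

    /// Call once at the end to deliver the last, possibly partial, batch.
    mutating func finish() {
        if !buffer.isEmpty { drain() }
    }

    private mutating func drain() {
        flush(buffer)                              // e.g. save the batch to a database
        buffer.removeAll(keepingCapacity: true)    // free the records, keep the storage
    }
}

var batchSizes: [Int] = []
var batcher = Batcher<String>(batchSize: 3) { batchSizes.append($0.count) }
for n in 1...7 {
    batcher.add("record \(n)")
}
batcher.finish()
```

With seven records and a batch size of three, the callback fires for two full batches and once more from `finish()` for the remainder.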

Tracking progress via Progress Bar

Is there a good way to track the progress of an import while it is running? Right now it is only possible to see the number of lines imported so far, but a UIProgressView needs a total to measure against, i.e. the total number of lines. With that, we could simply divide the current line count by the total. However, I understand that CSVImporter reads the file one line at a time, which makes it hard to know the total until the import ends. Is there any workaround for this?
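
One possible workaround is to measure progress in bytes instead of lines, since the file size is known before the import starts (for a file on disk it can be read via FileManager's attributesOfItem(atPath:)). The sketch below is an assumption-laden illustration: `ByteProgress` is hypothetical, and it assumes UTF-8 text with one-byte "\n" line endings.

```swift
/// Tracks progress as consumed bytes over total bytes.
struct ByteProgress {
    let totalBytes: Int
    private(set) var consumedBytes = 0

    init(totalBytes: Int) {
        self.totalBytes = totalBytes
    }

    /// Records one line plus its line terminator (assumed to be one byte, "\n").
    mutating func consume(line: String) {
        consumedBytes = min(totalBytes, consumedBytes + line.utf8.count + 1)
    }

    /// Value between 0 and 1, suitable for a progress bar.
    var fraction: Double {
        return totalBytes == 0 ? 1.0 : Double(consumedBytes) / Double(totalBytes)
    }
}

let csvContents = "a,b\n1,2\n3,4\n"
var byteProgress = ByteProgress(totalBytes: csvContents.utf8.count)
for line in csvContents.split(separator: "\n") {
    byteProgress.consume(line: String(line))
    // byteProgress.fraction could be passed to a UIProgressView here.
}
```

With CRLF files or other encodings the per-line byte count would need adjusting, so the fraction is an estimate rather than an exact value.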

Linux Support

Currently, this package doesn't build on Linux. For example, the current stable branch won't even start compiling when downloaded through SPM, since the source location apparently changed from Sources to Framework/Sources.

When using the 1.9.0 tag instead, the code won't compile:

/home/vagrant/.build/checkouts/-7124877466912694924/Sources/Code/FileSource.swift:37:39: error: cannot convert value of type 'Data' to type 'NSData' in coercion
            if let data = (fileHandle.readData(ofLength: chunkSize) as NSData).mutableCopy() as? NSMutableData {
                           ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/vagrant/.build/checkouts/-7124877466912694924/Sources/Code/CSVImporter.swift:235:13: error: use of unresolved identifier 'autoreleasepool'
            autoreleasepool {
            ^~~~~~~~~~~~~~~
/home/vagrant/.build/checkouts/-7124877466912694924/Sources/Code/CSVImporter.swift:235:29: error: closure use of non-escaping parameter 'closure' may allow it to escape
            autoreleasepool {
                            ^

If fixing this for Linux would be too hard (which would be understandable), I would recommend adding an explicit note to the README that Linux is unsupported and maybe even removing the Package.swift for the time being (it's currently broken anyway).

Import CSV

Hi,

I tried to import a CSV file, and the output of print(record) looks like this:

["O\0r\0d\0e\0r\0U\0u\0i\0d\0", "\0E\0x\0c\0h\0a\0n\0g\0e\0", "\0T\0y\0p\0e\0", "\0Q\0u\0a\0n\0t\0i\0t\0y\0", "\0L\0i\0m\0i\0t\0", "\0C\0o\0m\0m\0i\0s\0s\0i\0o\0n\0P\0a\0i\0d\0", "\0P\0r\0i\0c\0e\0", "\0O\0p\0e\0n\0e\0d\0", "\0C\0l\0o\0s\0e\0d\0\r\0"]

I tried it first with another CSV and everything worked smoothly. Here, a \0 is added after each character. How can I fix this?

Bye

Moe

Output full of "\0"

Thanks for your frameworks.

I have been trying to use your framework.

But I keep getting output I can not use, do you know why I get an output like this:

["O\0r\0d\0e\0r\0U\0u\0i\0d\0", "\0E\0x\0c\0h\0a\0n\0g\0e\0", "\0T\0y\0p\0e\0", "\0Q\0u\0a\0n\0t\0i\0t\0y\0", "\0L\0i\0m\0i\0t\0", "\0C\0o\0m\0m\0i\0s\0s\0i\0o\0n\0P\0a\0i\0d\0", "\0P\0r\0i\0c\0e\0", "\0O\0p\0e\0n\0e\0d\0", "\0C\0l\0o\0s\0e\0d\0\r\0"]

If you take the "\0" away, it would be the output I need. Why does this occur?
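
Output like this is what UTF-16 text looks like when its bytes are decoded as UTF-8: every ASCII character is followed by a zero byte. Whether that is the cause here is an assumption, but it can be checked with a sketch like the following. `decodeCSV(_:)` is a hypothetical helper that looks for a UTF-16 byte-order mark and falls back to UTF-8.

```swift
import Foundation

/// Decodes CSV bytes, choosing UTF-16 when a UTF-16 byte-order mark is present
/// and falling back to UTF-8 otherwise.
func decodeCSV(_ data: Data) -> String? {
    let bom = [UInt8](data.prefix(2))
    if bom == [0xFF, 0xFE] || bom == [0xFE, 0xFF] {
        // String.Encoding.utf16 uses the BOM to pick the byte order.
        return String(data: data, encoding: .utf16)
    }
    return String(data: data, encoding: .utf8)
}

// UTF-16 little-endian bytes with a BOM, as some export tools produce.
let utf16Bytes = Data([0xFF, 0xFE]) + "OrderUuid,Exchange".data(using: .utf16LittleEndian)!
let decodedHeader = decodeCSV(utf16Bytes)

// The same kind of bytes (without BOM) misread as UTF-8 reproduce the symptom.
let misread = String(data: "Or".data(using: .utf16LittleEndian)!, encoding: .utf8)
```

Files without a byte-order mark would still need the encoding to be specified explicitly.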

Can't find CSVImporter.framework

I'm using Swift 4.2 and want to install CSVImporter via Carthage.
I added 'github "Flinesoft/CSVImporter" == 1.9.1' to my Cartfile.
Then I ran "carthage update --platform iOS" on the command line, and no errors occurred.
But I can't find CSVImporter.framework in /Carthage/Build/iOS. What am I doing wrong?

Doesn't properly handle empty lines in the data

I have a large CSV file whose data includes all kinds of special characters.
If the file contains data in this format:

"This is test Data
added for testing empty line in file next is empty line

This is end"

then after the import finishes I do not get this record, although it is present in my CSV file.

Hi, I can't use the importer library

My environment is Xcode 8 with Swift 2.3 (legacy Swift setting checked).

I made a sample project and added the importer library with CocoaPods,
but it didn't work.

Publish 1.9.0 to Cocoapods

Hi,

It doesn't seem like the 1.9.0 podspec has been published to the Cocoapods trunk?

In case it helps, a sample command would be

git tag 1.9.0
git push --tags
pod repo push CSVImporter.podspec --swift-version='3.2' --allow-warnings --sources='https://github.com/CocoaPods/Specs'

Closures called on main queue

I noticed that the closures for completion come back on the main queue. I was hoping to have the callbacks on a background queue but I don't see any infra for that in the code or documentation. Is there a reason for this? If I were to just remove the DispatchQueue.main.async would you anticipate any adverse effects?

Issue Structuring line

Morning,

I'm receiving the following error.
Warning: Couldn't structurize line.

I log each record and it looks fine. Any idea on what the issue could be?

Thanks,

Jonathan

Automatically determine the encoding of the file

Hi there, again thanks for making this since it saves tons of time.

Could you point me to the code, or explain how the importer determines which encoding a file is in when importing? I need to extract this information somehow and am not sure how to do that. Maybe you can give me a hint where to look. This is not a bug, more a request for information. And is there actually automatic encoding detection, or am I misinterpreting things?

guard let csv = CSVImporter<[String: String]>(url: fileURL) else {
    return
}

csv.startImportingRecords(structure: { (headerValues) -> Void in
    print(headerValues)
}) { $0 }.onFinish { importedRecord in
    print(importedRecord)
}

trim whitespace from headers

Is there any way that the structure lambda could transform the headers? E.g. my headers have whitespace around them and I'd like to trim that whitespace.
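
The trimming itself is a one-liner in Foundation. The sketch below uses a hypothetical `trimmedHeaders(_:)` helper and does not assume anything about whether the library lets the structure closure change the keys it uses.

```swift
import Foundation

/// Returns the header names with surrounding whitespace removed.
func trimmedHeaders(_ headers: [String]) -> [String] {
    return headers.map { $0.trimmingCharacters(in: .whitespaces) }
}

let cleanedHeaders = trimmedHeaders([" name ", "score\t", "team"])
```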

Can't update to 1.8.0

Interesting, but CocoaPods doesn't pull version 1.8.0, whether via manual version specification, update, or fresh install. :(
