Coder Social home page Coder Social logo

xlsx's Introduction

Xlsx

Hackage Build Status

Simple xlsx parser/writer, only basic functionality at the moment

Hacking

For all new changes please use the formatting rules provided by the included .stylish-haskell.yaml file. Older commits may still use old styling practices but these can be overwritten where necessary.

xlsx's People

Contributors

9999years avatar alanz avatar anarchist666 avatar awkure avatar bradediger avatar cohei avatar dten avatar dvekeman avatar edsko avatar emilaxelsson avatar flhorizon avatar goodlyrottenapple avatar heimdell avatar igrep avatar jappeace avatar luketemp avatar manojgudi avatar marcelozabini avatar mulderr avatar nikitarazmakhnin avatar parsonsmatt avatar qnikst avatar qrilka avatar romildo avatar shimuuar avatar smurphy8 avatar stla avatar taktoa avatar teofilc avatar timds avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

xlsx's Issues

Invalid zip archive error when opening xlsx file

Error

CallStack (from HasCallStack):
  error, called at src/Codec/Xlsx/Parser.hs:51:18 in xlsx-0.2.4-9SiA9fpnKskAS2JNdYtqrA:Codec.Xlsx.Parser

Link to file
http://fsaeonline.com/content/tblMaterials_2015.xls

Code

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE OverloadedStrings #-}

module Main where

import           Control.Lens
import           Codec.Xlsx
import qualified Data.ByteString.Lazy as L

main :: IO ()
main = do
    bs <- L.readFile "tblMaterials_2015.xls"
    let value = toXlsx bs ^? ixSheet "List1" .
              ixCell (3,2) . cellValue . _Just
    putStrLn $ "Cell B3 contains " ++ show value

Strange ordering in rows/cells

In Ampersand we gratefully use xlsx to import information from Excel sheets. One in a while however, we get strange results.

During diagnosis, I have discovered that reading excel sheets contains a bug. I have stripped an excel sheet to get a tiny example of a sheet that has this bug:

ExcelBug.xlsx

If I run the following haskell, the strange behaviour appears:

{-# LANGUAGE OverloadedStrings #-}
module Main where
import Codec.Xlsx
import qualified Data.ByteString.Lazy as L
import Control.Lens

main :: IO ()
main = do
  bs <- L.readFile "ExcelBug.xlsx"

  let value = toXlsx bs ^? ixSheet "Handelingen" .
              ixCell (3,2) . cellValue . _Just
  putStrLn $ "Cell B3 contains " ++ show value

As a result, I get: Cell B3 contains Just (CellText "55555")

As you can see in the excel sheet, cell B3 contains programmeren; coordineren.

This obviously is wrong.

cwStyle causes an Excel repair

Hello,

I tried to set column widths. The field cwStyle is mandatory. I don't know what it refers to. I set it to 1, so that I got

	<cols>
		<col max="4" min="1" style="1" width="30"/>
	</cols>

in my sheet1.xml. Excel does not like that, it has to repair the file:

Repaired Records: Column information from /xl/worksheets/sheet1.xml part

I took a look at the repaired file. Excel did several modifications but the important one is that it removed the style="1". Indeed, if I remove it:

	<cols>
		<col max="4" min="1" width="30"/>
	</cols>

then Excel opens the file without issue.

Sheet ordering in Spreadsheet

A spreadsheet is defined as

data Xlsx = Xlsx
    { _xlSheets :: Map Text Worksheet
    , _xlStyles :: Styles
    } deriving (Eq, Show)

Because the sheets are a Map, I cannot generate a spreadsheet having sheets in a specific order.

Update references from #36

Those were from the 2nd edition, it's better to use the latest 4th edition instead.
Also border elements need order slightly fixed

Doesn't work under windows.

I run parser under linux on file, and it went just fine. When operating same file (any one produces an error) under windows, it fails with the next message:

Xlsx {_xlSheets = fromList [("\1051\1080\1089\1090\&1",Worksheet {_wsColumns =
*** Exception: Maybe.fromJust: Nothing

I can't upload an xlsx here, but its has the next contents: [[123, 234], [what, the]] in a square formation.
The office has russian locale (that does that unicode in xlSheet name, its a Sheet1 in russian - Лист1).

toXlsx does not preserve the bold font in comments

Hello,

When I comment a cell in Excel, Excel automatically adds my name at the top of the comment, in bold font.

This is what I get in comment1.xml:

<commentList>
    <comment ref="A1" authorId="0">
        <text>
            <r>
                <rPr><b/><sz val="9"/><color indexed="81"/><rFont val="Tahoma"/><charset val="1"/></rPr>
                <t>Stéphane Laurent:</t>
            </r>
            <r>
                <rPr><sz val="9"/><color indexed="81"/><rFont val="Tahoma"/><charset val="1"/></rPr>
                <t xml:space="preserve">
                    hello</t>
            </r>
        </text>
    </comment>
</commentList>

When I read the xlsx file with toXlsx and when I rewrite it, without modifying anything, my name is not in bold font anymore.

This is the XlsxText I get when I read the file:

XlsxRichText
  [ RichTextRun
      { _richTextRunProperties =
          Just
            RunProperties
              { _runPropertiesBold = Nothing
              , _runPropertiesCharset = Just 1
              , _runPropertiesColor =
                  Just
                    Color
                      { _colorAutomatic = Nothing
                      , _colorARGB = Nothing
                      , _colorTheme = Nothing
                      , _colorTint = Nothing
                      }
              , _runPropertiesCondense = Nothing
              , _runPropertiesExtend = Nothing
              , _runPropertiesFontFamily = Nothing
              , _runPropertiesItalic = Nothing
              , _runPropertiesOutline = Nothing
              , _runPropertiesFont = Just "Tahoma"
              , _runPropertiesScheme = Nothing
              , _runPropertiesShadow = Nothing
              , _runPropertiesStrikeThrough = Nothing
              , _runPropertiesSize = Just 9.0
              , _runPropertiesUnderline = Nothing
              , _runPropertiesVertAlign = Nothing
              }
      , _richTextRunText = "St\233phane Laurent:"
      }
  , RichTextRun
      { _richTextRunProperties =
          Just
            RunProperties
              { _runPropertiesBold = Nothing
              , _runPropertiesCharset = Just 1
              , _runPropertiesColor =
                  Just
                    Color
                      { _colorAutomatic = Nothing
                      , _colorARGB = Nothing
                      , _colorTheme = Nothing
                      , _colorTint = Nothing
                      }
              , _runPropertiesCondense = Nothing
              , _runPropertiesExtend = Nothing
              , _runPropertiesFontFamily = Nothing
              , _runPropertiesItalic = Nothing
              , _runPropertiesOutline = Nothing
              , _runPropertiesFont = Just "Tahoma"
              , _runPropertiesScheme = Nothing
              , _runPropertiesShadow = Nothing
              , _runPropertiesStrikeThrough = Nothing
              , _runPropertiesSize = Just 9.0
              , _runPropertiesUnderline = Nothing
              , _runPropertiesVertAlign = Nothing
              }
      , _richTextRunText = "\r\nhello"
      }
  ]

I think that _runPropertiesBold should be Just True for the first line.

library cannot parse comment created by libreoffice calc

Attempt to parse XLSX file with comment which was created by libreoffcie 5.2.3 (from debian testing) results in

testcase: Safe.fromJustNote Nothing, authorId                                                                                         
CallStack (from HasCallStack):                                                                                                        
  error, called at ./Safe/Util.hs:18:44 in safe-0.3.9-61ab156570e2b884f7dd8f14bc7f08646c0c1ff9b7315fb92c2ac7d05ee27df0:Safe.Util

Here is file which triggers bug. Simply a single cell with comment
testcase2..xlsx

Worksheet -> [[Text]] function?

This is not really urgent, but I was thinking of adding xlsx support (in addition to csv) to the pandoc-placetable filter, so I'd need a Worksheet -> [[Text]] function.

I guess I could come up with the fold Map (Int, Int) Cell -> [[Cell]] myself somehow (although there might be a neater way using Lens.Traversal?), but how about a Cell -> Text function that preserves some styling for numbers (like number of significant digits, currency, scientific notation etc.)?

MS Excel 2010 has claims to generated by test executable test.xlsx file

Excel 2010 говорит что test.xlsx поврежден и после восставновления выдает пояснения: "Восстановленный компонент: часть /xl/worksheets/sheet1.xml с ошибкой XML. Разрушительный сбой Строка 1, столбец 3282483.".
OpenOffice Calc открывает без проблем и если пересохранить открытый файл в xlsx формате, то уже Excel его открывает без нареканий.

Support setting print options

Stuff like landscape and repeated rows/columns. Opening an issue only so that you know I'm working on this; PR to follow.

Parsec combinators?

I am writing some Parsec combinators for my own use on top of this.

Do you want a pull request for them when I am done? I am not sure if they belong in this library.

Use (x,y) coordinates instead of (y,x)

Even cell names in Excel itself are referred to using (x,y) (though it uses a slimmed down syntax and substitutes letters for numbers in the x coordinate).

On a cartesian plane (or anything similar), everyone is taught always that coordinates will be of the form (x,y) where x is the “run” and y is the “rise”. There are only a few cases out there which are exceptions; in fact, I can only think of two: this library and ncurses.

This would be a massive API change (so, I expect it would be for the next release or further down the road), but it would certainly be a good one to make in my humble opinion.

Switch to Data.Time instead of System.Time

Using the old-time package is becoming more and more of a chore. It would really be great to see this package transition over to using the more modern Data.Time from the time package.

Problem reading fonts

When run over the attached Excel file, the StyleSheet returned by parseStyleSheet shows only two Font structures when the xml from ex/xl/styles.xml shows five.
ex.xlsx

Add support for SheetViews

PR forthcoming for this as well.

(I'm submitting all these PRs in one go because I'm still polishing some of the code occassionally and don't want to bother you will a PR only then to subsequently send more PRs to change the code again.)

Cannot parse an empty workbook

Creating a new workbook and saving as 'Book0.xlsx', then executing

            toXlsx <$> Data.ByteString.Lazy.readFile "Book0.xlsx" 

in GHCi results in

*** Exception: MissingFile "xl/sharedStrings.xml"

Book0.xlsx

Perhaps move Lens functions to a separate package?

Including the lens library brings in a lot of dependencies and makes the package brittle to build.

I looked a the functions and types, and it seems that a base library that excludes lens would still be very useable, with lens only adding additional convenience functions.

Would it make sense for a more minimal base package that excludes lens, and then another package that adds lens functionality?

This kind of separation is also followed by Data.Aeson (aeson) and Data.Aeson.Lens (aeson-lens).

write file example to be repaired (Windows)

Hello,

On Windows, using Microsoft Excel 2010, this is what I get when I run this example and open the file:

capture1

capture2

<?xml version="1.0" encoding="UTF-8" standalone="true"?>

-<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">

<logFileName>error027600_01.xml</logFileName>

<summary>Errors were detected in file 'C:\HaskellProjects\test_xlsx\example2.xlsx'</summary>


-<repairedParts summary="Following is a list of repairs:">

<repairedPart>Repaired Part: /xl/workbook.xml part with XML error. (Workbook) Catastrophic failure Line 1, column 323.</repairedPart>

<repairedPart>Repaired Part: /xl/worksheets/sheet1.xml part with XML error. Catastrophic failure Line 1, column 517.</repairedPart>

</repairedParts>

</recoveryLog>

Support cell comments and cell formulas.

I would like Cell to have two additional attributes: comment and formula. I need this for a project I am working on, and I am reading the code to figure how to implement them myself. If you could briefly describe how to go about adding these, that would save me much time. I hope to have this by Tuesday, April 26th, so if this is a simple change that you could complete by then, even better!

Thanks!

Release a new version to hackage?

It looks like this library has changed a lot since it was last released, almost 2 years ago. Any way it could get released again? Obviously it can be used from git, but it's more of a pain (and having docs on hackage is really convenient).

Three R packages cannot read the files created by xlsx

Hello @qrilka

You might be interested to know that three very good R packages fail to read the files created by your library: nacnudus/tidyxl#13 (tidyxl, openxlsx, and the current development version of readxl).
Since these files can be opened in Excel and LibreOffice without issue, I agree that these packages should be able to read them. But perhaps you could do something to make these files more standard. Since these packages fail to read these files, we can suspect that other packages/softwares fail as well.

Check `contents` use in other places

#68 is already the 2nd time when contents not accepting empty tag resulted in library error.
So it makes perfect sense to recheck other places where it's currently used

ExitFailure 11 from cabal install xlsx on Mac OS X

I just installed the haskell platform on this MacBook (OS X 10.6.8) and ran cabal install xlsx. After 20 minutes or so, it failed. I can't make much sense of the diagnostics.

I captured the full output.

Other relevant diagnostic info I can think of:

bash-3.2$ cabal --version
cabal-install version 1.16.0.2
using version 1.16.0 of the Cabal library 

bash-3.2$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3

Default values from the spec

Reading through the spec I found some default values that don't agree with those in StyleSheet.hs. I seem to be able to rationalize all but the Font defaults - they seem to not agree with empirical evidence. Please see my notes, notes.txt.

Further, I wonder if it might be worth considering modeling the record fields with default values without a Maybe type? Perhaps the parser could insert the proper default value upon detecting the missing tag.

Improve relationships support

As in #48 new relationship files were introduced it makes sense to move them into a separate entity with proper serialization and deserialization

Why can't we use the 0-th CellXf ?

Hello,

I wanted to create a xlsx file with a date column. And no other column, just to try.
In my first attempt (see below), I set the number format id in the 0-th component of the field _styleSheetCellXfs of the StyleSheet. That does not work: the xlsx file is created, but the numbers are not formatted as dates.

My second attempt (see below) was successful. I let the 0-th component of _styleSheetCellXfs as an empty CellXf, and I put the CellXf with the number format id in the 1-th component.

Is it something expected ? What is the role of the 0-th CellXf if we can't use it to define a style ?

First attempt:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
module Testdate where
import Codec.Xlsx
import qualified Data.Map.Lazy as DML
import qualified Data.ByteString.Lazy as L
import Data.Time.Clock.POSIX (getPOSIXTime)
import Control.Lens
import Data.Dates 
import Data.Dates.Formats (parseDateFormat)
import Data.Either.Extra (fromRight)

 -- convert a date given as "YYYY/MM/DD" to its corresponding Excel number value
excelDate :: String -> Double
excelDate date = fromIntegral $ datesDifference origin datetime 
    where origin = DateTime{year = 1899, month=12, day=30, hour=0, minute=0, second=0}
          datetime = fromRight $ parseDateFormat "YYYY/MM/DD" date

-- input some dates
dates = ["2017/10/01", "2017/10/02"]

-- input locations of the cells
locations = [(1,1), (2,1)]

-- create cells for these dates
emptyCell = Cell { _cellStyle = Nothing, 
                   _cellValue = Nothing, 
                   _cellComment = Nothing, 
                   _cellFormula = Nothing }
cells = map (\x -> set cellStyle (Just 0) $ 
                     set cellValue (Just $ CellDouble $ excelDate x) $ 
                       emptyCell) dates 

-- create worksheet
emptyWorksheet = Worksheet { _wsColumns = [], 
                             _wsRowPropertiesMap = DML.empty, 
                             _wsCells = DML.empty, 
                             _wsDrawing = Nothing, 
                             _wsMerges = [], 
                             _wsSheetViews = Nothing, 
                             _wsPageSetup = Nothing, 
                             _wsConditionalFormattings = DML.empty, 
                             _wsDataValidations = DML.empty, 
                             _wsPivotTables = [] }
worksheet = set wsCells (DML.fromList $ zip locations cells) emptyWorksheet

-- create styleSheet
emptyStyleSheet = minimalStyleSheet
emptyCellXf =  (_styleSheetCellXfs emptyStyleSheet) !! 0
cellXf = set cellXfApplyNumberFormat (Just True) $ 
           set cellXfNumFmtId (Just (stdNumberFormatId NfDMmmYy)) $ 
             emptyCellXf
stylesheet = set styleSheetCellXfs [cellXf] emptyStyleSheet

-- create xlsx 
emptyXlsx = Xlsx { _xlSheets = [], 
                   _xlStyles = emptyStyles, 
                   _xlDefinedNames = DefinedNames [], 
                   _xlCustomProperties = DML.fromList [] }
xlsx = set xlStyles (renderStyleSheet stylesheet) $ 
         set xlSheets [("Sheet1", worksheet)] emptyXlsx

--
main :: IO ()
main = do
  ct <- getPOSIXTime
  L.writeFile "mydatexlsx1.xlsx" $ fromXlsx ct xlsx

Correction:

cells = map (\x -> set cellStyle (Just 1) $ 
                     set cellValue (Just $ CellDouble $ excelDate x) $ 
                       emptyCell) dates 
stylesheet = set styleSheetCellXfs [emptyCellXf, cellXf] emptyStyleSheet

Missing columns bug

In some cases there are missing data in the parsed Excel sheet. To reproduce:

Get small example sheet

in ghci:

import Codec.Xlsx
import qualified Data.ByteString.Lazy as L
import qualified Data.Map as Map

e <- L.readFile "TNM.xlsx"
let lx = _wsCells $ snd $ head $ Map.toList $ _xlSheets $ toXlsx e
map (\c -> (Map.!) lx (2,c)) [2..6] -- this is ok
map (\c -> (Map.!) lx (5,c)) [2..6] -- column is missing

Compare the result of the last command to the actual Excel file. Column E is missing and all other data shifted left.

Support more per column properties

there is (not quite detailed) _wsRowPropertiesMap but for columns there are only withd in _wsColumns while <col> element supports more options (e.g. hidden or collapsed) and e.g. cwStyle doesn't look like something about column width

Reading/writing images

Would be wonderful to have a function which takes an image (FilePath) and embeds it in the xlsx, such that it is visible when opened by Excel. The function signature might look something like

addImage :: FilePath - > Dimensions -> Worksheet -> Worksheet

My understanding is that this image would be stored as a separate file in the xlsx, then referenced by pathname in the document XML. Ideally, there would also be functionality for reading images from an existing XLS? That is, the inverse function

retrieveImages :: Worksheet -> [(FilePath, Dimensions)]

Preserve Column Width formatting (in Parser and Writer)

I have now gotten to the point where I am programmatically generating a .xlsx correctly using this library, which is a great start! Now, I am needing to set some column widths so that the spreadsheet displays sensibly on opening in Excel.

When I attempt to set the _wsColumns record field in my Xlsx, everything compiles correctly and the .xlsx is created as-expected. However, when I open the file with Excel, I get the following error:

Excel found unreadable content in 'document_name.xlsx'. Do you want to recover the contents of this workbook? If you trust the source of this workbook, click Yes.

I assumed that I simply set the field incorrectly, so I took a copy of the file without the column data, opened it in Excel, set the column-widths to their correct sizes and saved it. I then attempted to parse in this corrected file to audit how it performed the column widths, but the _wsColumns record field for the read-in document is an empty list (which makes me think that the column width data was lost). Is this a bug in xlsx's writer/parser, a version incompatibility between the library and the version of Excel I am using (Office Pro 2010), or just a failure on my part to set the field correctly?

Is there documentation somewhere (the haddocks seem a bit lacking) of a minimal working exampe for setting column widths?

Generic instances

It would be nice if we could have Generic instances for all the datatypes in this library. This would make it easier for people using the library to define orphan instances (e.g.: for serialization).

In particular this would be useful because sometimes this library takes a long time (and a lot of memory) to parse large spreadsheets, so for development purposes I like to parse the spreadsheet and then serialize it to the disk, and then deserialize it every time I :r in GHCi. Here's the file I had to write to do this: gist.

You may also consider providing serialization instances (any of the standard libraries, binary/cereal/ store, should work) in this library proper, to enable such a workflow.

BTW: It is also often useful from an end-user perspective to provide instances of Data and NFData whenever possible. Generic is really an essential one though.

Replace <variant> with <vt:variant> ?

Hello,

The XLSX files created by the xlsx library cannot be read by the Javascript library js-xlsx nor by the R package readxl. The first one returns an error, the second one returns an error or totally crashes R.

I don't know the cause for readxl. I investigated a bit what happens with js-xlsx, and I found that it is looking for XML tags <vt:variant>...</vt:variant> while xlsx creates tags <variant>....</variant>. Maybe it needs vt:vector instead of vector as well.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.