jferard / fastods

A very fast and lightweight (no dependency) library for creating ODS (Open Document Spreadsheet, mainly for Calc) files in Java. It is a fork of Martin Schulz's SimpleODS.

License: GNU General Public License v3.0

Java 99.80% Python 0.20%
java spreadsheet speed libreoffice calc ods opendocument od-files no-dependencies

fastods's Issues

Favor 'object composition' over 'class inheritance' in style builders

There is still a lot of class inheritance. A refactoring should limit the use of inheritance, since it is error-prone.

EDIT: Inheritance is acceptable for styles, because it is easy to manage, but the builders rely on a dirty F-bound trick to keep the fluent style alive.
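
For readers unfamiliar with the F-bound trick mentioned above, here is a minimal sketch of the pattern. The class names, fields and setters are hypothetical and are not taken from the FastODS code base; the point is only to show why the base builder needs a self-referencing type parameter and an unchecked cast to keep chaining alive.

```java
// Hypothetical base builder: B is the concrete builder type (the "F-bound").
abstract class StyleBuilderSketch<B extends StyleBuilderSketch<B>> {
    protected final String name;
    protected boolean hidden;

    StyleBuilderSketch(String name) {
        this.name = name;
    }

    // Returns the concrete builder type so that subclass setters can still be chained.
    @SuppressWarnings("unchecked")
    B hidden() {
        this.hidden = true;
        return (B) this; // the unchecked cast is the "dirty" part of the trick
    }
}

// Hypothetical concrete builder: it passes itself as the type parameter.
class CellStyleBuilderSketch extends StyleBuilderSketch<CellStyleBuilderSketch> {
    private String color = "none";

    CellStyleBuilderSketch(String name) {
        super(name);
    }

    CellStyleBuilderSketch color(String color) {
        this.color = color;
        return this;
    }

    // Returns a plain description instead of a real style object, to keep the sketch short.
    String build() {
        return name + "[hidden=" + hidden + ", color=" + color + "]";
    }
}
```

With this layout, new CellStyleBuilderSketch("s").hidden().color("red").build() compiles because hidden() returns the concrete type. A composition-based design would avoid the type parameter by letting each concrete builder delegate to a shared helper object.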

TableCellStyle and DataStyle

Presently, HeavyTableRow uses a fake TableCellStyle to set the cell DataStyle. This exposes one <style:table-cell> definition per data style type in the common styles of styles.xml. These fake styles are visible to the LibreOffice (LO) user (press F11).

We need another way of setting a data style for a cell.

Create an in memory ZipUTF8Writer

To help test some features, it would be nice to have a precise view of what the zip entries (content.xml, ...) contain. To do so, we need a ZipUTF8WriterTester with the same API as ZipUTF8Writer and the following methods:

String ZipUTF8WriterTester.getEntryAsString(String entryName)
org.w3c.dom.Document ZipUTF8WriterTester.getEntryAsDocument(String entryName)
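
A minimal sketch of the idea, using only java.util.zip and an in-memory byte array. The class name InMemoryZipSketch and the putEntry method are assumptions made for illustration; the real ZipUTF8Writer API is not reproduced here. Only the string accessor is shown; a document accessor could parse the returned string with a DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical test helper: writes zip entries to memory and reads them back.
class InMemoryZipSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final ZipOutputStream zip = new ZipOutputStream(bytes);
    private boolean finished = false;

    // Writes one entry with UTF-8 encoded content.
    void putEntry(String entryName, String content) {
        try {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }

    // Finishes the archive on first call, then returns the entry content, or null if absent.
    String getEntryAsString(String entryName) {
        try {
            if (!finished) {
                zip.finish();
                finished = true;
            }
            ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                if (e.getName().equals(entryName)) {
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                        content.write(buffer, 0, n);
                    }
                    return new String(content.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```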

Gather non unit tests in a misc folder

There are four fake tests:

  • src/test/.../OdsFileWithHeaderAndFooterCreation.java (kind of example)
  • src/test/.../OdsFileCreation.java (kind of example)
  • src/test/.../ProfileFastODS.java (for profiling)
  • src/bench/.../Benchmark.java (for benchmarks)

All those files could be in a new src/misc/java folder (a misc folder would replace the bench one). The first two could have a ...Test.java ending to make them easier to run from Maven.

Allow page layout share between master pages

Currently, PageStyle handles two tags:

  • style:master-page in styles.xml > master-styles
  • style:page-layout in styles.xml > automatic-styles

If we split PageStyle into MasterPageStyle and PageLayout, a page layout could be shared between two master pages.

Footer and header builders are incomplete

There are two builders: RegionFooterHeaderBuilder, which builds a RegionFooterHeader, and SimpleFooterHeaderBuilder, which builds a SimpleFooterHeader. Neither of them allows the user to set margins or a minimum height.

Threading

Reminder:

  • one should write ODS files sequentially;
  • LO needs the styles in content.xml to be defined before they are used (that is, a style-name attribute must refer to a previously defined style);
  • the basic use case of FastODS implies threading: one thread gathers data (e.g. performs a query on a database) and puts it on a bus; another thread reads the data from the bus and writes it to a file (this is the FastODS part).

Now, under some assumptions, it is possible to split FastODS work into two threads:

  • one thread builds HeavyTableRows and puts them on a bus;
  • one thread reads the rows from the bus and writes them to the file.

These assumptions are:

  • all styles are defined when the first table row is written;
  • a row is never modified after it has been put on the bus (there is a kind of flush).

I think there is only one way to find out whether this will really speed up file creation: try!
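
The two-thread split can be sketched with a bounded BlockingQueue acting as the bus. This is only an illustration under the assumptions listed above: rows are represented as immutable strings rather than HeavyTableRows, the output is a StringBuilder rather than a file, and the names RowPipelineSketch and writeRows are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class RowPipelineSketch {
    // Sentinel object telling the writer that no more rows will come (compared by identity).
    private static final String END = new String("END");

    // Builds rowCount rows in one thread and "writes" them in another; returns the output.
    static String writeRows(final int rowCount) {
        final BlockingQueue<String> bus = new ArrayBlockingQueue<String>(16);
        final StringBuilder out = new StringBuilder();

        Thread producer = new Thread(() -> {
            try {
                for (int r = 0; r < rowCount; r++) {
                    // A row is immutable once it is on the bus.
                    bus.put("<row n=\"" + r + "\"/>");
                }
                bus.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread writer = new Thread(() -> {
            try {
                // Rows are taken in FIFO order, so the output stays sequential.
                for (String row = bus.take(); row != END; row = bus.take()) {
                    out.append(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        writer.start();
        try {
            producer.join();
            writer.join(); // join makes the writer's appends visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }
}
```

Whether this is faster than a single thread depends on the relative cost of building and writing rows, which only a benchmark can tell.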

Improve Javadoc

The quality of the Javadoc is dropping. With the new JDK 8 doclint, it does not even pass a basic mvn site.

An effort on the doc is necessary.

The example in the README is outdated

The API has been modified a couple of times since the example was written. To avoid the README and the API drifting apart, the simple example should have its own test class in the misc dir.

Add formula

It is sometimes useful to create formulas programmatically. It should not be too difficult to implement in HeavyTableColdRow.

Clarify styles destinations

There are four possible destinations for style definitions:

  • content.xml/automatic-styles
  • styles.xml/styles
  • styles.xml/automatic-styles
  • styles.xml/master-styles

LO seems to follow some kind of "group rule". E.g. a page layout is in styles.xml/automatic-styles, therefore all styles referenced in that page layout have to be in styles.xml/automatic-styles. If FastODS does not follow that rule, LO won't be able to read FastODS-generated files.

See #24
See #19

Move StyleTag to StylesEntry

The StyleTag XML is written in content.xml > automatic-styles. That's the wrong destination, since FastODS does not allow anonymous styles. The output has to be moved to styles.xml > styles.

Handle duplicate styles at builder level

Most of the styles are created with a <Style>.builder(<name>).<some setters...>.build().

It should be possible to avoid name conflicts at that level, with a kind of styles dictionary (name -> style). Currently, name conflicts are not handled by FastODS: it is possible to have multiple styles with the same name in the document.

It may take some digging in the OASIS Standard to see what happens then, but it's better to avoid it.
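
One possible shape for such a dictionary, as a rough sketch. Styles are represented here by their XML definition as a plain string, and the class name StyleRegistrySketch and its policy (accept identical duplicates, reject conflicting ones) are assumptions, not the FastODS behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical name -> style dictionary that detects name conflicts.
class StyleRegistrySketch {
    private final Map<String, String> definitionByName = new HashMap<String, String>();

    // Returns true if the style was added, false if an identical style was already
    // registered under that name. Throws if the name is taken by a different style.
    boolean register(String name, String definition) {
        String existing = definitionByName.get(name);
        if (existing == null) {
            definitionByName.put(name, definition);
            return true;
        }
        if (existing.equals(definition)) {
            return false;
        }
        throw new IllegalArgumentException("A different style is already named " + name);
    }

    int size() {
        return definitionByName.size();
    }
}
```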

What happens when a file cannot be saved?

Currently, there is no way to distinguish the following cases:

  • the file cannot be saved because the filename is a directory name
  • the file can be saved but the current file will be overwritten
  • the file can be saved without problem

(I do not consider usual I/O problems.)

The library may be used in an application which has to handle the cases above (e.g. in the GUI), so it's important to distinguish those cases (with an exception or a boolean return value).
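
A minimal sketch of how the three cases could be told apart with java.io.File before saving. The enum and the check method are hypothetical names, and an enum is used here as one alternative to the exception or boolean mentioned above.

```java
import java.io.File;

class SaveTargetSketch {
    // Hypothetical result type covering the three cases listed above.
    enum Status { IS_DIRECTORY, WILL_OVERWRITE, OK }

    // Inspects the target without touching it.
    static Status check(File target) {
        if (target.isDirectory()) {
            return Status.IS_DIRECTORY;
        }
        if (target.exists()) {
            return Status.WILL_OVERWRITE;
        }
        return Status.OK;
    }
}
```

Note that the answer can become stale between the check and the actual save if another process creates or removes the file in the meantime.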

Use a specific FullList for raw types

A basic implementation of a list of bytes or ints with 0 as the default value would probably be faster than an ArrayList<Integer> with auto boxing/unboxing, and would avoid the problem of null values.
Some profiling is needed to determine whether it is worth it.
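
As a rough illustration, such a list could be backed by a growable int array. The class name IntFullListSketch and its methods are hypothetical; whether it actually beats ArrayList<Integer> remains to be measured.

```java
import java.util.Arrays;

// Hypothetical list of ints where every unset position reads as 0.
class IntFullListSketch {
    private int[] values = new int[8];
    private int size = 0;

    // Stores value at index, growing the backing array if needed.
    void set(int index, int value) {
        if (index >= values.length) {
            // Arrays.copyOf pads the new positions with 0, the default value.
            values = Arrays.copyOf(values, Math.max(index + 1, values.length * 2));
        }
        values[index] = value;
        if (index >= size) {
            size = index + 1;
        }
    }

    // Returns the stored value, or 0 for any position that was never set.
    int get(int index) {
        return index < size ? values[index] : 0;
    }

    int size() {
        return size;
    }
}
```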

Fix copyright date

We're in 2017, the copyright date should be 2016-2017:

  • in the README.md
  • in the header of files

Merge Util and XMLUtil

Both classes have the same purpose: give helper methods for the library. The merge could be a facade pattern, with the two classes remaining the same, or a full merge.

Before merging, the argument order must be the same in every method or constructor: XMLUtil then Util.

Make ConfigItem accessible

The Table objects inits a bunch of ConfigItems, but there is currently no way to change their values.

A ConfigItems class, where each ConfigItem would be accessible by name, seems a good way to expose setters:

ConfigItems.set("zoomValue", "60");
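
A sketch of what a name-based accessor might look like. It uses instance methods rather than the static call shown above, only "zoomValue" comes from this issue, and its default of "100" is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder giving access to config items by name.
class ConfigItemsSketch {
    private final Map<String, String> valueByName = new LinkedHashMap<String, String>();

    ConfigItemsSketch() {
        // Assumed default value, for illustration only.
        valueByName.put("zoomValue", "100");
    }

    // Changes the value of a known item; unknown names are rejected to catch typos.
    void set(String name, String value) {
        if (!valueByName.containsKey(name)) {
            throw new IllegalArgumentException("Unknown config item: " + name);
        }
        valueByName.put(name, value);
    }

    String get(String name) {
        return valueByName.get(name);
    }
}
```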

New cell pseudo-type text

Some cells may contain complex text, similar to the footer/header text. With a new cell pseudo-type, it would be easy to do that.
String version:
TableCell.setStringValue("string")
Outputs a very short XML:

<table:table-cell office:value-type="string" office:string-value="string"/>

Text version (see syntax for builder here #14):
TableCell.setTextValue(TextValue.builder().<some methods to build the text value>.build())
Outputs a longer XML:

<table:table-cell office:value-type="string" calcext:value-type="string">
    <text:p>
        <text:span text:style-name="...">te</text:span><text:span text:style-name="...">xt</text:span>
    </text:p>
</table:table-cell>

Move object type inference in a specific class

The use of TableCellWalker.setObjectValue should be avoided. FastODS has to infer the "ods type" of the object, which is not always obvious.

Instead of TableCellWalker.setObjectValue, we should use a new class CellValue, and a new TableCellWalker.setCellValue, which is unambiguous.

More details on CellValue:

  • CellValue.fromObject: where the inference is done
  • CellValue.fromDate, CellValue.fromBoolean, etc.
  • CellValue.fromTypeAndObject, unambiguous but slow.

This class will be a little cumbersome, but useful for helpers.
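
A minimal sketch of the inference step, assuming a CellValue that simply pairs an ODF value type name with the raw object. The class name CellValueSketch, the fromNumber and fromString factories, and the treatment of null as an empty string are assumptions.

```java
import java.util.Date;

// Hypothetical value holder: an ODF value type name plus the raw value.
class CellValueSketch {
    final String odsType;
    final Object value;

    private CellValueSketch(String odsType, Object value) {
        this.odsType = odsType;
        this.value = value;
    }

    // Unambiguous factories: the caller states the type.
    static CellValueSketch fromBoolean(boolean b) { return new CellValueSketch("boolean", b); }
    static CellValueSketch fromNumber(Number n) { return new CellValueSketch("float", n); }
    static CellValueSketch fromDate(Date d) { return new CellValueSketch("date", d); }
    static CellValueSketch fromString(String s) { return new CellValueSketch("string", s); }

    // The only place where the type is inferred from the runtime class.
    static CellValueSketch fromObject(Object o) {
        if (o == null) {
            return fromString(""); // assumption: null becomes an empty string cell
        }
        if (o instanceof Boolean) {
            return fromBoolean((Boolean) o);
        }
        if (o instanceof Number) {
            return fromNumber((Number) o);
        }
        if (o instanceof Date) {
            return fromDate((Date) o);
        }
        return fromString(o.toString());
    }
}
```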

Maybe create a length class

This should be carefully tested, since it could slow down the generation of files if used intensively, but would be useful.

A sign that such a class is missing is the javadoc: the comments are polluted with explanations on what a length is (see 18.3.18 length, 18.3.23 percent).

A new API for FooterHeader building

Let's make it like this:
FooterHeader.builder("fh").par().styledText("A", ts1).styledText("B", ts2).styledPar(ps1).text("C").text("D").text("E").
The output would be:

AB

CDE

Improve benchmarking results presentation

The benchmarks should present results like this:

FastODS 10 tables of 5000 rows, 20 columns without warmup:
avg time: ... ms
best time: ... ms
worst time: ... ms

FastODS 10 tables of 5000 rows, 20 columns with warmup:
avg time: ... ms
best time: ... ms
worst time: ... ms

The same goes for the SimpleODS and jOpenDocument libraries.
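
A small sketch of a formatter producing the layout above from a list of measured times. The class name BenchReportSketch and the report method are hypothetical, and the average is truncated to whole milliseconds.

```java
// Hypothetical helper that formats benchmark timings in the layout proposed above.
class BenchReportSketch {
    static String report(String title, long[] timesMs) {
        if (timesMs.length == 0) {
            throw new IllegalArgumentException("At least one measurement is required");
        }
        long best = Long.MAX_VALUE;
        long worst = Long.MIN_VALUE;
        long sum = 0;
        for (long t : timesMs) {
            best = Math.min(best, t);
            worst = Math.max(worst, t);
            sum += t;
        }
        long avg = sum / timesMs.length; // integer division: truncated average
        return title + ":\n"
                + "avg time: " + avg + " ms\n"
                + "best time: " + best + " ms\n"
                + "worst time: " + worst + " ms\n";
    }
}
```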

Create a OdsDocument.getOrAddTable()

The common idiom to access a table when one does not know whether it exists is:

Table t;
try {
    t = document.getTable("t");
} catch (FastOdsException e) {
    t = document.addTable("t");
}

A document.getOrAddTable() method would be welcome.
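
A sketch of the proposed method on a stand-in document class. DocumentSketch and TableSketch are hypothetical and only model a map from table name to table; the real OdsDocument is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a table: it only knows its name.
class TableSketch {
    final String name;

    TableSketch(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for the document, keeping tables in insertion order.
class DocumentSketch {
    private final Map<String, TableSketch> tableByName = new LinkedHashMap<String, TableSketch>();

    // Returns the existing table with that name, or creates and registers a new one.
    TableSketch getOrAddTable(String name) {
        TableSketch table = tableByName.get(name);
        if (table == null) {
            table = new TableSketch(name);
            tableByName.put(name, table);
        }
        return table;
    }

    int tableCount() {
        return tableByName.size();
    }
}
```

This removes the need to use an exception for control flow in the calling code.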

Java1.6 compliance

The code is Java 1.6 compliant. It would be useful to set source & target to 1.6 in the pom.xml, in order to have a jre1.6 compatible jar.

Escape tooltip text

The setTooltip(c, text) method does not escape the text. This is a bug and will make LO crash if the text contains < or > characters.

A simple workaround is to escape the text before passing it to setTooltip with an XMLUtil instance.

EDIT: I've seen that newlines aren't correctly rendered under LO. Try to fix this too.
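
For reference, a minimal sketch of the kind of escaping that is needed, written as a standalone function rather than with XMLUtil (whose exact API is not shown here). It replaces the five characters with predefined XML entities and does not address the newline rendering problem.

```java
// Hypothetical standalone escaper for XML text and attribute values.
class XmlEscapeSketch {
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```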

Create a TextProperties class

The TextStyle.appendAnonymousXMLToContentEntry method outputs a <style:text-properties> tag. A TextProperties object in the TextStyle class, with delegation, would clarify the API.

A separate DomTester module

The DomTester class provides different tests for XML code equivalence. It is implemented in the test dir and is not that simple. Perhaps a separate module would be better?

The name of the module would be fastods-testlib (like in Guava), in a multimodule structure:

fastods/fastods/... <- current module without DomTester
fastods/fastods-testlib/... <- DomTester

Allow ZipOutputStream configuration

Currently, the ZipOutputStream has the BEST_SPEED level. It seems good for most use cases, but sometimes one could have a good reason to choose BEST_COMPRESSION or NO_COMPRESSION.
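
A short sketch showing how the level could be passed through to java.util.zip.ZipOutputStream.setLevel. The class ZipLevelSketch and its zip methods are hypothetical; only setLevel and the Deflater constants are standard JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipLevelSketch {
    // Keeps the current behaviour as the default.
    static byte[] zip(String entryName, byte[] data) {
        return zip(entryName, data, Deflater.BEST_SPEED);
    }

    // Writes a single-entry archive to memory with the requested compression level.
    static byte[] zip(String entryName, byte[] data, int level) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.setLevel(level); // applies to the entries written afterwards
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Callers who care more about file size than speed could then pass Deflater.BEST_COMPRESSION, and Deflater.NO_COMPRESSION when the data is already compressed.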

Improve unit tests coverage

According to Cobertura, the coverage of unit tests is 64 %, which is low. Coverage should be increased.

Autofilter

This should be relatively easy to setup:

table:calculation-settings
table:named-expressions
<table:database-ranges>
	<table:database-range table:name="__Anonymous_Sheet_DB__0" table:target-range-address="Sheet1.A1:Sheet1.A3" table:display-filter-buttons="true"/>
</table:database-ranges>

Add a smaller file generation for profiling

When profiling all methods (including java. ...), the profiling is very slow and blocks on a weak laptop. Adding a smaller, less greedy test would make profiling possible in that case.

Add a logger for exceptions

This logger will trace exceptions instead of ignoring them or calling e.printStackTrace(). It could also provide valuable debug information.

SimpleODS dependency

The POM contains a SimpleODS dependency, used only in Benchmark.java, which is not part of unit tests, but has to be run manually. Since SimpleODS is not available in Maven central repo, this leads to a compilation failure.

Three options:

  • install SimpleODS manually;
  • remove dependency;
  • manage to keep dependency, but avoid failure (how?)

Spans

When merging a cell with its neighbours, the generated XML should look like:

<table:table-row>
	cells
	<table:table-cell table:number-columns-spanned="A" table:number-rows-spanned="B">
		value of the cell
	</table:table-cell>
	<table:covered-table-cell table:number-columns-repeated="A-1"/>
	cells
</table:table-row>

Followed, B-1 times (once for each remaining spanned row), by:

<table:table-row>
	cells
	<table:covered-table-cell table:number-columns-repeated="A"/>
	cells
</table:table-row>

Currently, all covered cells are ignored, and the XML looks like:

<table:table-row>
	cells
	<table:table-cell table:number-columns-spanned="A" table:number-rows-spanned="B">
		value of the cell
	</table:table-cell>
	cells
</table:table-row>

And nothing on the following rows.

Adding the covered cells on the row should be easy, since the XML of the row is created cell by cell. But it's more difficult to "cover the cells" of the next rows, because each HeavyTableRow is independent.

One solution would be:

  • create, for each Table, two methods setCovered(row, col) and isCovered(row, col).
  • call table.setCovered(row, col) when one cell covers neighbor cells.
  • call table.isCovered(row, col) when outputting the XML.
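
The bookkeeping proposed above could be sketched as follows, with one BitSet of covered columns per row. The class name CoveredCellsSketch and the setSpan helper are hypothetical; setCovered and isCovered follow the signatures suggested in the list.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-table registry of covered cells.
class CoveredCellsSketch {
    private final Map<Integer, BitSet> coveredColsByRow = new HashMap<Integer, BitSet>();

    void setCovered(int row, int col) {
        BitSet cols = coveredColsByRow.get(row);
        if (cols == null) {
            cols = new BitSet();
            coveredColsByRow.put(row, cols);
        }
        cols.set(col);
    }

    boolean isCovered(int row, int col) {
        BitSet cols = coveredColsByRow.get(row);
        return cols != null && cols.get(col);
    }

    // Marks every cell of the span as covered, except the top-left cell that holds the value.
    void setSpan(int row, int col, int rowsSpanned, int colsSpanned) {
        for (int r = row; r < row + rowsSpanned; r++) {
            for (int c = col; c < col + colsSpanned; c++) {
                if (r != row || c != col) {
                    setCovered(r, c);
                }
            }
        }
    }
}
```

When a row is written, the writer would ask isCovered for each position and emit a covered cell element instead of a regular cell.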

Separate Footer and Header

The FooterHeader class stores both header and footer, but we need to check the type (h or f) in some places. This check could be avoided with two classes.
Perhaps a design like this: keep the FooterHeader class but let it store an instance of a FooterOrHeader interface that gathers the specific code of each "type".

Check printable header and footer

Some features have not been tested, like header and footer for printer, because they were not used. They should be tested and fixed if necessary now.

Check DomTester usage

Every time XML code is tested, DomTester should be used to avoid spurious failures caused by attribute order:
<a b="1" c="2" />
is equivalent to:
<a c="2" b="1" />
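
To illustrate why a DOM comparison is more robust than a string comparison, here is a minimal sketch based on the standard Node.isEqualNode method, which compares attributes regardless of their order. It is not the DomTester implementation; the class name XmlEquivalenceSketch is hypothetical, and whitespace between elements is still significant in this version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlEquivalenceSketch {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML: " + xml, e);
        }
    }

    // Compares the root elements structurally: same names, same attribute
    // name/value pairs in any order, same children in the same order.
    static boolean equivalent(String xml1, String xml2) {
        return parse(xml1).getDocumentElement().isEqualNode(parse(xml2).getDocumentElement());
    }
}
```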
