Coder Social home page Coder Social logo

minamotorin / exshell2csv Goto Github PK

View Code? Open in Web Editor NEW
4.0 2.0 1.0 67 KB

Small script to convert Excel to CSV, written in shell script only. No additional packages are required.

License: GNU General Public License v3.0

Shell 75.60% Awk 24.40%
excel-to-csv excel converter exshell shell-script sed awk csv

exshell2csv's Introduction

README of exshell2csv, Version 0.2.1.

exshell2csv: Small script to convert Excel to CSV, written in shell script only. No additional packages are required.

I wanted convert Excel (*.xlsx file) to CSV in command line. I found softwares and packages to do this but these sowtwares are too large. Who want to install new packages just to convert? Why there is no small script to do that? So I wrote this.

Suggestion:

STOP Using EXCEL

I want to write documents:Use Markdown.
I want to do something like a spreadsheet:Use R.
I want to write documents with a spreadsheet:Use R Markdown.
I surely have to use a spreadsheet:Use (GNU) Emacs Org-mode.

Clone repository and put exshell2csv on a PATH.

And add permission: Run chmod +x /path/to/exshell2csv.

Show Usage:Run exshell2csv or exshell2csv -h.
Show list of sheet ids and sheet names:Run exshell2csv /path/to/excel.xlsx. Output is [SHEET ID]: [SHEET NAME].
Convert Excel file’s sheet number [SHEET ID] to CSV:
 Run exshell2csv /path/to/excel.xlsx [SHEET ID]. Not [SHEET NAMS] but [SHEET ID]. Output to STDOUT.

If you are Microsoft Windows user, maybe you have to run exshell2csv [APGUMENTS] | awk "{gsub("$", "\\r"); print}" due to carriage ruturn difference. I’ve never checked if this code is required or not.

I’ve never read documentation of Open Document Format. Some features will be wrong.

Some values will be not formated because some format features are not implemented. It is hard to check default format styles of Ecxcel, so there are only 2 format styles are supported.

And custom styles (defined in xl/styles.xml) is also unsupported. I don’t know how to read xl/styles.xml. (xl/ will be created by unzip /path/to/excel.xlsx.)

You can add format style yourself:

  1. Record cell number of line and column as [LINE] and [COLUMN] where value you want to format is on.
  2. Run exshell2csv.sh /path/to/excel.xlsx [SHEET ID] | sed -n '/^l/d; /^[COLUMN] [LINE]/p'. You have to replace arguments of exshell2csv.sh and [COLUMN] and [LINE] to 1.’s cell’s ones.

Output is [COLUMN] [LINE] [FORMAT ID] [VALUES]. It means format style [FORMAT ID] will format [VALUES] as which you want.

  1. Write format awk script in [CUSTOM FORMAT STYLE] file like this:

    else if ($3==[FORMAT ID]) {
      VALUES = after(4);
      VALUES = YourFormatScript(VALUES);
      cell[$2, $1] = VALUES;
    }
    

You can define new functions in [USER FUNCTIONS] file and use it. See Build section. Function after, ALPH_advance, ALPH_lt, and fdate were already defined.

  • /bin/sh (Bourne Shell)

    I don’t know does this script work on Ubuntu or not because Ubuntu’s /bin/sh is dash.

  • sed

  • awk (nawk or gawk)

  • unzip

This section shows how to make one script file from exshell2csv.sh, exshell2csv.awk, [CUSTOM FORMAT STYLE] (option), and [USER FUNCTIONS] (option).

Following commands are required run in same directory with exshell2csv.sh and exshell2csv.awk.

make.sh’s first argument is output path. WARNING: If output path is exist, path will be overwritten.

plain:

Run make.sh [OUTPUT PATH] and make customizeless exshell2csv. [OUTPUT PATH] will be overwritten.

test:

You shold test if scripts work fine before build.

Run sh exshell2csv.sh /path/to/excel.xlsx [SHEET ID] | awk -f exshell2csv.awk to test plain exshell2csv.

For using custom format style, run followings:

sed -e '/# CUSTOMIZE AREA #/r[CUSTOM FORMAT STYLE]' -e '$r[USER FUNCTIONS]' exshell2csv.awk > yourexshell2csv.awk
sh exshell2csv.sh /path/to/excel.xlsx [SHEET ID] | awk -f yourexshell2csv.awk

yourexshell2csv.awk will be overwritten.

customize:

If you want to add format style, run make.sh [OUTPUT PATH] [CUSTOM FORMAT STYLE].

Or if you want to add functions, run make.sh [OUTPPUT PATH] [CUSTOM FORMAT STYLE] [USER FUNCTIONS].

[OUTPUT PATH] will be overwritten.

See also: Customize section in NOTE section.

Why couldn’t I use a sheet name to select the sheet?:
 Due to risk of a number sheet name.
There are cells which have diference between original Excel and output CSV:
 CSV’s value on the cells are inner expression of Excel. The feature to format inner expression to string as same as Excel is not implemented. See NOTE section.

There are many softwares or packages to convert Excel to CSV.

Microsoft Excel:Excel can convert Excel file to CSV.
TODO
Add similar projects and hyper links.

If you have questions or feedbacks, or found bugs, typographical errors, wrong English or codes, or something else, pleas use GitHub issue feel free.

leap year:Excel judges year 1900 is a leap year. But this script is not. This is Excel’s bug (due to compatibility). I didn’t implement this because I don’t know the details.

This project is under the GNU General Public License Version 3.

exshell2csv's People

Contributors

minamotorin avatar rdv-dev avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

rdv-dev

exshell2csv's Issues

Values directly in sheetX.xml are not properly formatted

When there are values directly in sheet1.xml for example, this is the output of the sed scripts before being processed by the awk script:

 <c r="A1"> 7589437
B 1 s 0
C 1 s 1
D 1 s 2
E 1 s 3
F 1 s 4
G 1 s 5
H 1 s 6
I 1 s 7
J 1 s 8
K 1 s 9
L 1 s 10
M 1 s 11
N 1 s 12
O 1 s 13
P 1 s 14
Q 1 s 15
R 1 s 16
S 1 s 17
T 1 s 18
U 1 s 19
V 1 s 20
W 1 s 21
 <c r="A2"> 7589440
B 2 s 22

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.