roytam1 / bsdconv

This project forked from buganini/bsdconv

A simple but powerful DSL for charset/encoding conversion and transformation; pure C implementation with no extra dependencies.

Home Page: https://bsdconv.io/bsdconv/

License: BSD 2-Clause "Simplified" License

Shell 0.11% Python 9.39% C 78.93% Makefile 5.57% HTML 0.74% Batchfile 0.07% Roff 5.18%

bsdconv's Introduction

Documentation & Support

http://www.slideshare.net/buganini/bsdconv

http://www.slideshare.net/Buganini/journey-of-bsdconv

API Reference: http://buganini.github.io/bsdconv/

Use bsdconv-man to show the manual page for each module.

IRC: irc://irc.freenode.net#bsdconv

Compilation & Installation

make PREFIX=${prefix} # default to /usr/local
sudo make install PREFIX=${prefix} # default to /usr/local
sudo ldconfig ${prefix}/lib # Linux
sudo ldconfig -m ${prefix}/lib # FreeBSD

Add codec alias

Update modules/{from,inter,to}/alias
make alias

Example

Convert Traditional Chinese Big5 to Simplified Chinese UTF-8

bsdconv big5:zhcn:utf-8 in.txt > out.txt
bsdconv big5:zhcn:utf-8 -i in.txt # convert in place

Convert Traditional Chinese UTF-8 to Simplified Chinese GB2312 with transliteration

bsdconv utf-8:zhcn:cp936,cp936-trans in.txt > out.txt

Convert Simplified Chinese to Traditional Chinese

bsdconv utf-8:zhtw:zhtw-words:utf-8

The same conversion, ignoring whitespace mixed into words

bsdconv utf-8:whitespace-derail:zhtw:zhtw-words:whitespace-rerail:utf-8

Convert Big5 data from Traditional Chinese to Simplified Chinese, normalize CRLF/CR/LF line endings to CRLF, uppercase the ASCII characters, and write the result as Big5, emitting Simplified Chinese characters that are not in Big5 as HTML entities.

bsdconv big5:zhcn:win:upper:big5,htmlentity in.txt > out.txt
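In `big5,htmlentity`, the comma-separated codecs appear to act as a fallback chain: what the first codec cannot encode is handed to the next. The sketch below is a rough Python analogue of the last three stages only (line-ending normalization, ASCII uppercasing, Big5 output with an entity fallback). It uses the Python standard library, not bsdconv; the function name is hypothetical, the zhcn step is omitted, and Python's `xmlcharrefreplace` handler emits decimal references, which may differ from what the htmlentity codec produces.

```python
import re


def to_big5_with_entities(text: str) -> bytes:
    """Illustrative stand-in for the win:upper:big5,htmlentity stages."""
    # Normalize CRLF, lone CR, and lone LF to CRLF.
    text = re.sub(r"\r\n|\r|\n", "\r\n", text)
    # Uppercase ASCII letters only; leave all other characters untouched.
    text = "".join(ch.upper() if ch.isascii() else ch for ch in text)
    # Encode to Big5; characters Big5 cannot represent become
    # numeric character references instead of raising an error.
    return text.encode("big5", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+6C49 is a Simplified Chinese character that Big5 does not contain.
    print(to_big5_with_entities("abc\n\u6c49"))
```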

Counting character width

echo -n "aａ" | bsdconv utf-8:width:null
FULL: 1
HALF: 1

echo -n "aａˇ" | bsdconv utf-8:width:null
FULL: 1
HALF: 1
AMBI: 1
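The same kind of tally can be approximated with the Unicode East Asian Width property. The following is a minimal sketch using Python's `unicodedata`, not bsdconv's width codec. The function name is hypothetical, and the mapping of width classes onto the FULL/HALF/AMBI labels (in particular, counting neutral characters as half width) is an assumption.

```python
import unicodedata
from collections import Counter


def count_widths(text: str) -> dict:
    """Tally characters by East Asian Width class."""
    counts = Counter()
    for ch in text:
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("F", "W"):
            counts["FULL"] += 1   # fullwidth and wide
        elif eaw == "A":
            counts["AMBI"] += 1   # ambiguous width
        else:
            counts["HALF"] += 1   # "H", "Na", "N" (assumed half width)
    return dict(counts)


if __name__ == "__main__":
    # ASCII "a", FULLWIDTH LATIN SMALL LETTER A (U+FF41), CARON (U+02C7)
    print(count_widths("a\uff41\u02c7"))
```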

Very useful for migrating a MySQL database from Big5 to UTF-8

bsdconv htmlentity,big5-5c,big5:utf-8 in.sql > out.sql

Recover from mis-decoding/encoding (Big5 data mistakenly treated as ISO-8859-1 and converted to UTF-8)

bsdconv 'utf-8:iso-8859-1|big5:utf-8'
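The two pipeline stages joined by `|` undo the damage in reverse order: first turn the UTF-8 back into the bytes ISO-8859-1 would have produced, then decode those bytes as Big5. A minimal sketch of the same round trip with Python's built-in codecs (the function name is hypothetical, and this is not bsdconv's implementation):

```python
def recover_big5(garbled_utf8: bytes) -> bytes:
    """Repair Big5 text that was misread as ISO-8859-1 and saved as UTF-8."""
    # Stage 1 (utf-8:iso-8859-1): undo the bogus conversion,
    # recovering the original Big5 byte sequence.
    original_bytes = garbled_utf8.decode("utf-8").encode("iso-8859-1")
    # Stage 2 (big5:utf-8): decode with the correct charset
    # and re-encode as UTF-8.
    return original_bytes.decode("big5").encode("utf-8")


if __name__ == "__main__":
    text = "\u4e2d\u6587"  # "Chinese (language)" in Chinese
    # Reproduce the mistake: Big5 bytes read as ISO-8859-1, written as UTF-8.
    garbled = text.encode("big5").decode("iso-8859-1").encode("utf-8")
    print(recover_big5(garbled).decode("utf-8") == text)
```

This only works when the wrong decoding was lossless, which ISO-8859-1 is because it maps every byte value to a character.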

Decode escaped data (mixed byte and Unicode escapes) such as %u9644%20

bsdconv 'escape,byte:unicode,byte|skip,ascii:utf-8'
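To make the input format concrete, here is a minimal Python sketch that decodes the same mix of `%uXXXX` (Unicode code unit) and `%XX` (byte) escapes. It is an illustration, not bsdconv's escape codec: the function name is hypothetical, only ASCII-range `%XX` bytes are decoded (others are left as-is), and surrogate pairs are not combined.

```python
import re

# %uXXXX (four hex digits) or %XX (two hex digits)
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def decode_escapes(data: str) -> str:
    """Decode %uXXXX and ASCII-range %XX escapes; pass everything else through."""
    def replace(match):
        if match.group(1) is not None:
            # Unicode escape: the hex digits are a code point in the BMP.
            return chr(int(match.group(1), 16))
        byte = int(match.group(2), 16)
        # Byte escape: decode only ASCII; keep non-ASCII escapes unchanged.
        return chr(byte) if byte < 0x80 else match.group(0)

    return _ESCAPE.sub(replace, data)


if __name__ == "__main__":
    print(repr(decode_escapes("%u9644%20")))
```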

Generate a string for fuzzy comparison

echo ¼ℌăDžⓐ⁹灣湾ド鬒鬒æß | bsdconv UTF-8:ZH-FUZZY-TW:KANA-PHONETIC:NFKD-CASEFOLD:UTF-8
    1⁄4hădža9灣灣do鬒鬒æss
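The compatibility-decomposition and case-folding part of this pipeline can be approximated with the Python standard library. The sketch below covers only that part; it has no equivalent of ZH-FUZZY-TW or KANA-PHONETIC, the function name is hypothetical, and it is not guaranteed to match bsdconv's NFKD-CASEFOLD output character for character.

```python
import unicodedata


def fuzzy_key(text: str) -> str:
    """Build a comparison key via NFKD decomposition and case folding."""
    # Decompose compatibility characters (e.g. circled letters, superscripts).
    decomposed = unicodedata.normalize("NFKD", text)
    # Case-fold, then normalize again because folding can produce
    # characters that are not in decomposed form.
    return unicodedata.normalize("NFKD", decomposed.casefold())


if __name__ == "__main__":
    # Two spellings that should compare equal under the fuzzy key.
    print(fuzzy_key("Stra\u00dfe") == fuzzy_key("STRASSE"))
```

Two strings are then treated as a fuzzy match when their keys are equal.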

Translate text to HTML

bsdconv big5:nl2br:ascii,html-img in.txt > out.htm

Use glyph images from http://www.cns11643.gov.tw

bsdconv utf-8:ascii,ascii-html-cns11643-img in.txt > out.htm

Maintain an inter map:

bsdconv bsdconv-keyword,bsdconv:bsdconv-keyword,utf-8 inter/FOO.txt > edit.tmp
vi edit.tmp
bsdconv bsdconv-keyword,utf-8:bsdconv-keyword,bsdconv edit.tmp > inter/FOO.txt

Windows

Use MinGW with Makefile.win to build it, then copy everything in build/ to c:\bsdconv.
The path of the executable will be c:\bsdconv\bsdconv.exe.

To install to a directory other than the default path, set the BSDCONV_PATH environment variable to that directory.

Running setEnvVar.bat as administrator can set the proper environment variables for you.

Bindings

Python

Perl

PHP

Ruby

Go

Java

Haskell

Elasticsearch

PostgreSQL

MySQL

bsdconv's People

Contributors

buganini, godfat, mmcco, pkmx, roytam1
