boinkor-net / chars

cha(rs) is a commandline tool to display information about unicode characters

Home Page: https://github.com/boinkor-net/chars

License: MIT License

Rust 99.95% Makefile 0.02% Shell 0.03%
characters cli rust unicode

Contributors

antifuchs, bors[bot], ctsrc, dependabot-preview[bot], dependabot[bot], evanmcc, github-actions[bot], jayman2000, notriddle, progval, vladimyr

chars's Issues

Suggestion: Unicode version codepoint was added

I deal with Unicode a fair bit and chars is a handy tool. Sometimes it would be convenient to know which Unicode version assigned a particular codepoint.

E.g., the output from chars might look something like this. The version information need not be shown by default; it could require a command-line flag if it were deemed too noisy.

$ chars party
U+0001F973, 🥳 0x0001F973, \0374563, UTF-8: f0 9f a5 b3, UTF-16BE: d83edd73
Width: 2, prints as 🥳
Quotes as \u{1f973}
Unicode name: FACE WITH PARTY HORN AND PARTY HAT
Unicode version: 11.0

U+0001F389, 🎉 0x0001F389, \0371611, UTF-8: f0 9f 8e 89, UTF-16BE: d83cdf89
Width: 2, prints as 🎉
Quotes as \u{1f389}
Unicode name: PARTY POPPER
Unicode version: 6.0

I think the information is available via the DerivedAge.txt file in the UCD.
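
To make the idea concrete, here is a minimal sketch of a lookup built on DerivedAge.txt-style data. It assumes the usual UCD layout of `<codepoint or range> ; <version> # comment`. The names `AgeEntry`, `parse_age_line`, `parse_age_table`, `unicode_version` and `SAMPLE_AGES` are hypothetical and not part of chars, and the sample lines are illustrative rather than a verbatim excerpt of the file.

```rust
use std::ops::RangeInclusive;

/// One parsed entry: a codepoint range and the Unicode version that assigned it.
struct AgeEntry {
    range: RangeInclusive<u32>,
    version: String,
}

/// Illustrative lines in the DerivedAge.txt layout (not a verbatim excerpt).
const SAMPLE_AGES: &str = "\
# comment lines and blank lines are skipped
1F389         ; 6.0
1F973..1F976  ; 11.0 # sample range
";

/// Parse one data line; returns None for blank, comment-only, or malformed lines.
fn parse_age_line(line: &str) -> Option<AgeEntry> {
    // Drop the trailing comment, if any.
    let data = line.split('#').next()?.trim();
    let (cps, version) = data.split_once(';')?;
    let cps = cps.trim();
    // A single codepoint is treated as a one-element range.
    let (start, end) = cps.split_once("..").unwrap_or((cps, cps));
    let start = u32::from_str_radix(start.trim(), 16).ok()?;
    let end = u32::from_str_radix(end.trim(), 16).ok()?;
    Some(AgeEntry {
        range: start..=end,
        version: version.trim().to_string(),
    })
}

/// Parse every usable line of the given text.
fn parse_age_table(text: &str) -> Vec<AgeEntry> {
    text.lines().filter_map(parse_age_line).collect()
}

/// Find the version that assigned `c`, if the table covers it.
fn unicode_version(table: &[AgeEntry], c: char) -> Option<&str> {
    table
        .iter()
        .find(|entry| entry.range.contains(&(c as u32)))
        .map(|entry| entry.version.as_str())
}

fn main() {
    let table = parse_age_table(SAMPLE_AGES);
    for c in ['\u{1F973}', '\u{1F389}'] {
        match unicode_version(&table, c) {
            Some(version) => println!("Unicode version: {}", version),
            None => println!("Unicode version: unknown"),
        }
    }
}
```

A linear scan keeps the sketch short; a real table would probably be generated at build time and searched with a binary search over sorted ranges.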

Allow effective searching for flags and other zwj-joined symbols

Turns out we can't find, e.g., the transgender flag (new in Unicode 13!). Its codepoints are:

U+1F3F3
U+FE0F
U+200D
U+26A7
U+FE0F

...meaning we can only find the constituent codepoints, but not the whole. That's a problem for all kinds of flags, family configurations and other glyphs composed of multiple codepoints.

The sequences have names, so we ought to be able to retrieve them.
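
As a rough sketch of how named sequences could be searched, the code below parses lines laid out as `<hex codepoints> ; <type> ; <name> # comment`, which is my assumption about the layout of the Unicode emoji sequence data files (e.g. emoji-zwj-sequences.txt). `NamedSequence`, `parse_sequence_line`, `find_sequences` and `SAMPLE_SEQUENCES` are hypothetical names, and the sample line is illustrative only.

```rust
/// A named multi-codepoint sequence, e.g. a ZWJ-joined emoji.
struct NamedSequence {
    text: String,
    name: String,
}

/// Illustrative data in the assumed layout (not a verbatim excerpt).
const SAMPLE_SEQUENCES: &str = "\
# codepoints ; type ; name
1F3F3 FE0F 200D 26A7 FE0F ; RGI_Emoji_ZWJ_Sequence ; transgender flag # sample
";

/// Parse one line; returns None for blank, comment-only, or malformed lines.
fn parse_sequence_line(line: &str) -> Option<NamedSequence> {
    // Drop the trailing comment, if any.
    let data = line.split('#').next()?.trim();
    let mut fields = data.split(';').map(str::trim);
    let codepoints = fields.next()?;
    let _type_field = fields.next()?;
    let name = fields.next()?;
    if codepoints.is_empty() || name.is_empty() {
        return None;
    }
    // Every hex field must be a valid scalar value, or the whole line is rejected.
    let text = codepoints
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32))
        .collect::<Option<String>>()?;
    Some(NamedSequence {
        text,
        name: name.to_string(),
    })
}

/// Case-insensitive substring search over sequence names.
fn find_sequences<'a>(table: &'a [NamedSequence], query: &str) -> Vec<&'a NamedSequence> {
    let query = query.to_lowercase();
    table
        .iter()
        .filter(|seq| seq.name.to_lowercase().contains(&query))
        .collect()
}

fn main() {
    let table: Vec<NamedSequence> = SAMPLE_SEQUENCES
        .lines()
        .filter_map(parse_sequence_line)
        .collect();
    for seq in find_sequences(&table, "transgender") {
        let cps: Vec<String> = seq.text.chars().map(|c| format!("U+{:04X}", c as u32)).collect();
        println!("{} ({}): {}", seq.text, seq.name, cps.join(" "));
    }
}
```

The point of the design is that the whole sequence is stored as one string, so a hit can print the composed glyph and still list its constituent codepoints.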

Searches with many results

For searches like chars arrow (2820 lines!) or chars box (875), the results are not easy to read. It would be nice to have a one-line-per-result mode that outputs U+XXXX, prints as X, Unicode name: XXX.

I see two possible approaches:

  • Automatically switch to single-line results if more than some number match (more than one?)
  • Add a command-line argument to enable this mode
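
Both approaches can share the same formatter. The following is a minimal sketch, not chars's actual code: `format_compact`, `use_compact` and `COMPACT_THRESHOLD` are hypothetical names, and the threshold value is only a placeholder for "more than one".

```rust
/// Hypothetical cutoff: switch to compact output above this many results.
const COMPACT_THRESHOLD: usize = 1;

/// Render one result on a single line, in the layout proposed above.
fn format_compact(c: char, name: &str) -> String {
    format!("U+{:04X}, prints as {}, Unicode name: {}", c as u32, c, name)
}

/// Compact mode is on if the (hypothetical) flag was given or there are many results.
fn use_compact(result_count: usize, flag_given: bool) -> bool {
    flag_given || result_count > COMPACT_THRESHOLD
}

fn main() {
    let results = [
        ('\u{25BE}', "BLACK DOWN-POINTING SMALL TRIANGLE"),
        ('\u{25BF}', "WHITE DOWN-POINTING SMALL TRIANGLE"),
    ];
    if use_compact(results.len(), false) {
        for (c, name) in results {
            println!("{}", format_compact(c, name));
        }
    }
}
```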

Pull in unicode_names as an internal crate

We appear to depend on a very recent unicode_names (or at least a synced-up one). Since that hasn't been updated since Unicode 8 (and that update took a year), maybe we could pull in a slimmed-down version of https://github.com/ProgVal/unicode_names2 and release that as a workspace crate. It does use the same data file as we do, after all!

Maybe the same might apply to the unicode-width crate too, but it's less noticeable for my use case. Let's try unicode_names first.

Difficulty searching for small triangles

I can see that there are several small triangles that exist:

$ chars 'DOWN-POINTING TRIANGLE'
U+0001F783, 🞃 0x0001F783, \0373603, UTF-8: f0 9f 9e 83, UTF-16BE: d83ddf83
Width: 1, prints as 🞃
Quotes as \u{1f783}
Unicode name: BLACK DOWN-POINTING ISOSCELES RIGHT TRIANGLE

U+0001F53D, 🔽 0x0001F53D, \0372475, UTF-8: f0 9f 94 bd, UTF-16BE: d83ddd3d
Width: 2, prints as 🔽
Quotes as \u{1f53d}
Unicode name: DOWN-POINTING SMALL RED TRIANGLE

U+0001F53B, 🔻 0x0001F53B, \0372473, UTF-8: f0 9f 94 bb, UTF-16BE: d83ddd3b
Width: 2, prints as 🔻
Quotes as \u{1f53b}
Unicode name: DOWN-POINTING RED TRIANGLE

U+2BC6, ⯆ 0x2BC6, \025706, UTF-8: e2 af 86, UTF-16BE: 2bc6
Width: 1, prints as ⯆
Quotes as \u{2bc6}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE CENTRED

U+29E9, ⧩ 0x29E9, \024751, UTF-8: e2 a7 a9, UTF-16BE: 29e9
Width: 1, prints as ⧩
Quotes as \u{29e9}
Unicode name: DOWN-POINTING TRIANGLE WITH RIGHT HALF BLACK

U+29E8, ⧨ 0x29E8, \024750, UTF-8: e2 a7 a8, UTF-16BE: 29e8
Width: 1, prints as ⧨
Quotes as \u{29e8}
Unicode name: DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK

U+26DB, ⛛ 0x26DB, \023333, UTF-8: e2 9b 9b, UTF-16BE: 26db
Width: 1 (2 in CJK context), prints as ⛛
Quotes as \u{26db}
Unicode name: HEAVY WHITE DOWN-POINTING TRIANGLE

U+25BF, ▿ 0x25BF, \022677, UTF-8: e2 96 bf, UTF-16BE: 25bf
Width: 1, prints as ▿
Quotes as \u{25bf}
Unicode name: WHITE DOWN-POINTING SMALL TRIANGLE

U+25BE, ▾ 0x25BE, \022676, UTF-8: e2 96 be, UTF-16BE: 25be
Width: 1, prints as ▾
Quotes as \u{25be}
Unicode name: BLACK DOWN-POINTING SMALL TRIANGLE

U+25BD, ▽ 0x25BD, \022675, UTF-8: e2 96 bd, UTF-16BE: 25bd
Width: 1 (2 in CJK context), prints as ▽
Quotes as \u{25bd}
Unicode name: WHITE DOWN-POINTING TRIANGLE

U+25BC, ▼ 0x25BC, \022674, UTF-8: e2 96 bc, UTF-16BE: 25bc
Width: 1 (2 in CJK context), prints as ▼
Quotes as \u{25bc}
Unicode name: BLACK DOWN-POINTING TRIANGLE

U+23F7, ⏷ 0x23F7, \021767, UTF-8: e2 8f b7, UTF-16BE: 23f7
Width: 1, prints as ⏷
Quotes as \u{23f7}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE

U+23EC, ⏬ 0x23EC, \021754, UTF-8: e2 8f ac, UTF-16BE: 23ec
Width: 2, prints as ⏬
Quotes as \u{23ec}
Unicode name: BLACK DOWN-POINTING DOUBLE TRIANGLE

$

But, when I try to look at only the small triangles:

$ chars 'SMALL TRIANGLE'
$ 

I get nothing. If I search for medium triangles:

$ chars 'MEDIUM TRIANGLE'
U+0001F827, 🠧 0x0001F827, \0374047, UTF-8: f0 9f a0 a7, UTF-16BE: d83edc27
Width: 1, prints as 🠧
Quotes as \u{1f827}
Unicode name: DOWNWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F826, 🠦 0x0001F826, \0374046, UTF-8: f0 9f a0 a6, UTF-16BE: d83edc26
Width: 1, prints as 🠦
Quotes as \u{1f826}
Unicode name: RIGHTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F825, 🠥 0x0001F825, \0374045, UTF-8: f0 9f a0 a5, UTF-16BE: d83edc25
Width: 1, prints as 🠥
Quotes as \u{1f825}
Unicode name: UPWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F824, 🠤 0x0001F824, \0374044, UTF-8: f0 9f a0 a4, UTF-16BE: d83edc24
Width: 1, prints as 🠤
Quotes as \u{1f824}
Unicode name: LEFTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F807, 🠇 0x0001F807, \0374007, UTF-8: f0 9f a0 87, UTF-16BE: d83edc07
Width: 1, prints as 🠇
Quotes as \u{1f807}
Unicode name: DOWNWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+0001F806, 🠆 0x0001F806, \0374006, UTF-8: f0 9f a0 86, UTF-16BE: d83edc06
Width: 1, prints as 🠆
Quotes as \u{1f806}
Unicode name: RIGHTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+0001F805, 🠅 0x0001F805, \0374005, UTF-8: f0 9f a0 85, UTF-16BE: d83edc05
Width: 1, prints as 🠅
Quotes as \u{1f805}
Unicode name: UPWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+0001F804, 🠄 0x0001F804, \0374004, UTF-8: f0 9f a0 84, UTF-16BE: d83edc04
Width: 1, prints as 🠄
Quotes as \u{1f804}
Unicode name: LEFTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+2BC8, ⯈ 0x2BC8, \025710, UTF-8: e2 af 88, UTF-16BE: 2bc8
Width: 1, prints as ⯈
Quotes as \u{2bc8}
Unicode name: BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED

U+2BC7, ⯇ 0x2BC7, \025707, UTF-8: e2 af 87, UTF-16BE: 2bc7
Width: 1, prints as ⯇
Quotes as \u{2bc7}
Unicode name: BLACK MEDIUM LEFT-POINTING TRIANGLE CENTRED

U+2BC6, ⯆ 0x2BC6, \025706, UTF-8: e2 af 86, UTF-16BE: 2bc6
Width: 1, prints as ⯆
Quotes as \u{2bc6}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE CENTRED

U+2BC5, ⯅ 0x2BC5, \025705, UTF-8: e2 af 85, UTF-16BE: 2bc5
Width: 1, prints as ⯅
Quotes as \u{2bc5}
Unicode name: BLACK MEDIUM UP-POINTING TRIANGLE CENTRED

U+23F7, ⏷ 0x23F7, \021767, UTF-8: e2 8f b7, UTF-16BE: 23f7
Width: 1, prints as ⏷
Quotes as \u{23f7}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE

U+23F6, ⏶ 0x23F6, \021766, UTF-8: e2 8f b6, UTF-16BE: 23f6
Width: 1, prints as ⏶
Quotes as \u{23f6}
Unicode name: BLACK MEDIUM UP-POINTING TRIANGLE

U+23F5, ⏵ 0x23F5, \021765, UTF-8: e2 8f b5, UTF-16BE: 23f5
Width: 1, prints as ⏵
Quotes as \u{23f5}
Unicode name: BLACK MEDIUM RIGHT-POINTING TRIANGLE

U+23F4, ⏴ 0x23F4, \021764, UTF-8: e2 8f b4, UTF-16BE: 23f4
Width: 1, prints as ⏴
Quotes as \u{23f4}
Unicode name: BLACK MEDIUM LEFT-POINTING TRIANGLE

$

I still get plenty of results.
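
For reference, this is the behaviour I would expect, written as a small sketch. It is not chars's actual search code and says nothing about why 'SMALL TRIANGLE' comes back empty; `name_words` and `matches_all_words` are hypothetical helpers that treat a query as a set of words which must all occur in the character name.

```rust
/// Split a name or query into upper-cased words, breaking on spaces and hyphens.
fn name_words(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .map(|word| word.to_uppercase())
        .collect()
}

/// True if every word of the query occurs as a word of the name.
/// Note that an empty query matches everything.
fn matches_all_words(query: &str, name: &str) -> bool {
    let words = name_words(name);
    name_words(query).iter().all(|q| words.contains(q))
}

fn main() {
    let names = [
        "WHITE DOWN-POINTING SMALL TRIANGLE",
        "BLACK DOWN-POINTING SMALL TRIANGLE",
        "BLACK DOWN-POINTING TRIANGLE",
    ];
    for name in names {
        if matches_all_words("SMALL TRIANGLE", name) {
            println!("{}", name);
        }
    }
}
```

With this rule, 'MEDIUM TRIANGLE' still matches the TRIANGLE-HEADED arrows shown above, and 'SMALL TRIANGLE' finds U+25BE and U+25BF.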

Suggestion: Make output colorful

Hello.

chars works pretty well and really helps a lot. However, its output looks a little plain, and it takes a while to tell the different parts of the output apart.

So it would be great if the output were colorful. What do you think?
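
A minimal sketch of what I have in mind, using plain ANSI escape codes and the standard library only. `colorize_label` and the color choice are hypothetical; honoring the NO_COLOR environment variable and disabling color when stdout is not a terminal are my assumptions about sensible defaults, not existing chars behaviour.

```rust
use std::io::IsTerminal;

/// ANSI escape sequences: bold cyan, and reset to the default style.
const BOLD_CYAN: &str = "\x1b[1;36m";
const RESET: &str = "\x1b[0m";

/// Highlight the label in front of the first colon, e.g. "Width:" in "Width: 2".
/// Lines without a colon, or any line when color is disabled, are returned unchanged.
fn colorize_label(line: &str, enabled: bool) -> String {
    match line.split_once(':') {
        Some((label, rest)) if enabled => {
            format!("{}{}:{}{}", BOLD_CYAN, label, RESET, rest)
        }
        _ => line.to_string(),
    }
}

fn main() {
    // Only emit color when writing to a terminal and NO_COLOR is unset.
    let enabled = std::io::stdout().is_terminal() && std::env::var_os("NO_COLOR").is_none();
    for line in ["Width: 2, prints as \u{1F389}", "Unicode name: PARTY POPPER"] {
        println!("{}", colorize_label(line, enabled));
    }
}
```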

Suggestion: output something when there are no results

When I ran,

chars --help

I was confused because chars gave me no output. From what I can tell, chars searched for “--help”, didn’t find anything and printed nothing as a result. It would be nice if chars printed something along the lines of “No results for ‘--help’.” That would make what chars is doing clearer.
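
Something along these lines would do. This is only a sketch; `render_results` is a hypothetical function, and the results are assumed to already be formatted as strings.

```rust
/// Join formatted results, or say explicitly that the query matched nothing.
fn render_results(query: &str, results: &[String]) -> String {
    if results.is_empty() {
        format!("No results for ‘{}’.", query)
    } else {
        // Separate results with a blank line.
        results.join("\n\n")
    }
}

fn main() {
    let no_hits: Vec<String> = Vec::new();
    println!("{}", render_results("--help", &no_hits));
}
```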

[Feature] unicode character lookup by description

Being able to quickly look up a unicode character from your terminal could prove very useful (being able to call cha from vim, for example).

Is this in scope? If so, I might submit a pull request trying to implement this.

`cargo install` fails

Naively running cargo install as described in the README fails with:

13:25~/git/chars(master)$ cargo install
error: found a virtual manifest at `/data/data/com.termux/files/home/git/chars/Cargo.toml` instead of a package manifest

cargo install chars --git https://github.com/antifuchs/chars.git works fine.

AUR Package

Hi, thanks for building this.

Just thought I'd let you know that I've added an AUR package for chars to make it easy to install on Arch Linux with the system packaging tools.

Might be worth including a link to the package in the installation section of the README.

Suggestion: Include HTML character entity reference names in output and in search

With your tool it is possible to look up unicode characters by various criteria as you've stated in your readme, including "unicode name" and "also known as".

In HTML, named character escape sequences are available for things like the less than and the greater than signs, but also for quite a few other characters.

Back in the day, before UTF-8 encoding support was widespread, we'd use the ISO-8859-1 encoding for our HTML and we'd use named character escape sequences for characters like æ, ø, å for example.

Some of those names stuck with me, and I sometimes search for those characters by those names on Google if I am on a machine where inputting said characters directly is not possible or just too cumbersome.

Even on my MacBook Air, where I can generally long-press certain keys to access other characters, some applications implement text input that does not support long-press. In that case I switch to another window and either long-press there or search on Google, whichever is more convenient at the time (which depends on the other windows I happen to have on screen at that moment).

I pretty much always have at least one terminal window open at any time, and if I don't then opening the terminal is fast and simple.

Prior to purchasing my MacBook Air, when I was running Linux on a ThinkPad, I made a few simple shell scripts named after the HTML character entity references for the characters I most commonly needed: æ, ø, å, Æ, Ø, Å, i.e. aelig, oslash, aring, AElig, Oslash, Aring. When executed, each would print the UTF-8 encoded byte sequence for the character in question.

oslash
ø

A full list of all HTML character entity references can be found at https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML

Most notable for me personally, aside from the six mentioned above, are laquo, raquo, ndash, mdash, eacute and Eacute. They are all useful IMO, though, and if you agree to include HTML character entity reference names, it would make the most sense to include them all.

So to get to the point, my suggestion is that based upon the table at https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML, an additional field be added for applicable characters in the output for chars.

Some examples of what the output of chars would look like:

Example 1

chars U+002A
ASCII 2/a,  42, 0x2a, 0052, bits 00101010
Width: 1, prints as *
Unicode name: ASTERISK
Also known as: Star, Splat, Aster, Times, Gear, Dingle, Bug, Twinkle, Glob
HTML entity names: ast, midast

Example 2

chars U+00AE
LATIN1 ae, 174, 0xae, 0256, bits 10101110
Width: 1 (2 in CJK context), prints as ®
Quotes as \u{ae}
Unicode name: REGISTERED SIGN
HTML entity names: reg, circledR, REG

Example 3

chars U+00C6
LATIN1 c6, 198, 0xc6, 0306, bits 11000110
Width: 1 (2 in CJK context), prints as Æ
Upper case. Downcases to æ
Quotes as \u{c6}
Unicode name: LATIN CAPITAL LETTER AE
HTML entity name: AElig

In the examples above, a field named "HTML entity names" (where multiple names exist) or "HTML entity name" (where only one name exists) has been added.

Furthermore, I request that a case-sensitive search be performed on this field where present, so that one can search for these names and get results like those shown in the following examples:

Example 1

chars Oslash
LATIN1 d8, 216, 0xd8, 0330, bits 11011000
Width: 1 (2 in CJK context), prints as Ø
Upper case. Downcases to ø
Quotes as \u{d8}
Unicode name: LATIN CAPITAL LETTER O WITH STROKE
HTML entity name: Oslash

Example 2

chars oslash
LATIN1 f8, 248, 0xf8, 0370, bits 11111000
Width: 1 (2 in CJK context), prints as ø
Lower case. Upcases to Ø
Quotes as \u{f8}
Unicode name: LATIN SMALL LETTER O WITH STROKE
HTML entity name: oslash
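
To sketch how this could work, here is a small table-driven lookup. It is not an implementation proposal for chars's internals: `HTML_ENTITIES`, `char_for_entity`, `entity_names` and `entity_field` are hypothetical names, and the table holds only the handful of entities used in the examples above rather than the full list.

```rust
/// A tiny excerpt of the entity table: (entity name, character).
const HTML_ENTITIES: &[(&str, char)] = &[
    ("ast", '*'),
    ("midast", '*'),
    ("reg", '\u{AE}'),
    ("circledR", '\u{AE}'),
    ("REG", '\u{AE}'),
    ("AElig", '\u{C6}'),
    ("Oslash", '\u{D8}'),
    ("oslash", '\u{F8}'),
];

/// Case-sensitive lookup, so that Oslash and oslash resolve to different characters.
fn char_for_entity(name: &str) -> Option<char> {
    HTML_ENTITIES
        .iter()
        .find(|(entity, _)| *entity == name)
        .map(|&(_, c)| c)
}

/// All entity names known for a character, in table order.
fn entity_names(c: char) -> Vec<&'static str> {
    HTML_ENTITIES
        .iter()
        .filter(|(_, ch)| *ch == c)
        .map(|&(entity, _)| entity)
        .collect()
}

/// The extra output line, using the singular or plural label as appropriate.
fn entity_field(c: char) -> Option<String> {
    let names = entity_names(c);
    match names.len() {
        0 => None,
        1 => Some(format!("HTML entity name: {}", names[0])),
        _ => Some(format!("HTML entity names: {}", names.join(", "))),
    }
}

fn main() {
    for query in ["Oslash", "oslash"] {
        if let Some(c) = char_for_entity(query) {
            println!("{} prints as {}", query, c);
        }
    }
    if let Some(field) = entity_field('\u{AE}') {
        println!("{}", field);
    }
}
```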
