a-merezhanyi / voca_rs

Voca_rs is the ultimate Rust [unicode] string library, implemented as independent functions and on Foreign Types (String and str).

Home Page: https://crates.io/crates/voca_rs

License: Other

Rust 100.00%
string rust utf-8 utf8 string-manipulation string-matching string-formatter unicode

voca_rs's Introduction

voca_rs

Voca_rs is a Rust library for manipulating [unicode] strings.

Voca_rs is implemented on foreign types (String and str) and respects Unicode.

Voca_rs is inspired by Voca.js (JavaScript), string.py (Python), Inflector (Rust), and Grafite (PHP).

TL;DR

Using functions:

use voca_rs::*;

let input_string = "LazyLoad with XMLHttpRequest and snake_case";
let string_in_words = split::words(&input_string);
// => ["Lazy", "Load", "with", "XML", "Http", "Request", "and", "snake", "case"]
let words_in_string = &string_in_words.join(" ");
// => "Lazy Load with XML Http Request and snake case"
let truncated_string = chop::prune(&words_in_string, 21, "");
// => "Lazy Load with XML..."
let sliced_string = chop::slice(&truncated_string, 5, -2);
// => "Load with XML."
let snaked_string = case::snake_case(&sliced_string);
// => "load_with_xml"

Using traits (all method names start with an underscore):

use voca_rs::Voca;

"LazyLoad with XMLHttpRequest and snake_case"
    ._words()
    // => ["Lazy", "Load", "with", "XML", "Http", "Request", "and", "snake", "case"]
    .join(" ")
    // => "Lazy Load with XML Http Request and snake case"
    ._prune(21, "")
    // => "Lazy Load with XML..."
    ._slice(5, -2)
    // => "Load with XML."
    ._snake_case();
    // => "load_with_xml"

Documentation

See the complete documentation at https://docs.rs/voca_rs/

Run tests: cargo test
Build docs: cargo doc -> ./target/doc/voca_rs/index.html
Build a project: cargo build -> ./target/debug

Functions

Case

Chop

Count

Escape

Index

Manipulate

Query

Split

Strip

Utils

Copyright

Coded by A. Merezhanyi

License

Licensed under MIT License

voca_rs's People

Contributors

a-merezhanyi, codacy-badger, fretn, mark-i-m, rarepally, rillian

voca_rs's Issues

Rewrite: strip::strip_tags

\src\strip.rs - strip::strip_tags

Rewrite the strip_tags function to remove the "dissolve" module and use only a clean function.

Refactoring: HEXDIGITS

\src\utils.rs - utils::HEXDIGITS

Build it as a combination of slices of DIGITS, ASCII_LOWERCASE, and ASCII_UPPERCASE.
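
A minimal sketch of what this refactoring could look like. The constant values below are assumptions modelled on Python's string module, and hexdigits() is a hypothetical helper name; the crate's own definitions may differ.

```rust
// Assumed values; the crate's actual constants may differ.
const DIGITS: &str = "0123456789";
const ASCII_LOWERCASE: &str = "abcdefghijklmnopqrstuvwxyz";
const ASCII_UPPERCASE: &str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// Hypothetical helper: build the hex digit set from slices of the
/// existing constants instead of repeating the characters by hand.
fn hexdigits() -> String {
    // The first six letters of each alphabet are the hex letters a-f / A-F.
    [DIGITS, &ASCII_LOWERCASE[..6], &ASCII_UPPERCASE[..6]].concat()
}
```

This version builds the value at run time, so it returns a String rather than a &'static str constant.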

README.md example does not compile

When I compile this line from README.md:

let snake_string = case::snake_case(chop::slice(&words_in_string, 13, 28));

rustc reports that slice() returns a String, while snake_case() must take a &str.

I rewrote this as

let snake_string = case::snake_case(&*chop::slice(&words_in_string, 13, 28));

and it worked.
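
The same type mismatch can be reproduced without the crate. In the sketch below, slice_like and snake_like are hypothetical stand-ins that only copy the shapes described in the report (one returns a String, the other takes a &str); they are not the library's implementations. A plain & is enough, because &String deref-coerces to &str.

```rust
/// Hypothetical stand-in: returns an owned String, like the slice() in the report.
fn slice_like(s: &str, start: usize, end: usize) -> String {
    s.chars().skip(start).take(end.saturating_sub(start)).collect()
}

/// Hypothetical stand-in: accepts a borrowed &str, like the snake_case() in the report.
fn snake_like(s: &str) -> String {
    s.to_lowercase().replace(' ', "_")
}

fn demo() -> String {
    let words = "Lazy Load with XML Http Request";
    // Passing the String by value would not compile; borrowing it does,
    // because `&String` coerces to `&str` at the call site.
    snake_like(&slice_like(words, 5, 18))
}
```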

U+200D (zero-width joiner) breaks the parsing

Long story short:

fn main() {
    assert_eq!(voca_rs::strip::strip_tags("<p>\u{200D}</p>after"), "after");
}

Leads to

thread 'main' panicked at src/main.rs:2:5:
assertion `left == right` failed
  left: ""
 right: "after"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I believe it is caused by the following fact:

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let graphemes = "<p>\u{200D}</p>".graphemes(true).collect::<Vec<_>>();
    assert_eq!(graphemes, ["<", "p", ">\u{200d}", "<", "/", "p", ">"]);
}

It is very hard to work correctly with Unicode, and it is even harder to rely on non-trivial assumptions (like "a grapheme is a character or something like that", or "nothing would be attached to a normal character in a grapheme") 😢
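
For comparison, here is a minimal sketch of a tag stripper that scans by char (Unicode scalar value) instead of by grapheme, so a joiner can never fuse with a delimiter. The name strip_tags_by_char is hypothetical and this is not the crate's algorithm; it ignores comments, scripts, and '>' inside attribute values.

```rust
/// Hypothetical sketch: drop everything between '<' and '>' while scanning
/// code point by code point, so '>' is always seen as its own unit.
fn strip_tags_by_char(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            // Text outside a tag is copied through unchanged.
            _ if !in_tag => out.push(c),
            // Anything inside a tag is discarded.
            _ => {}
        }
    }
    out
}
```

In this sketch the zero-width joiner is treated as text content of the paragraph, so it stays in the output ahead of "after" rather than swallowing the rest of the input.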

Add scientific numbers validation

src/query - query::is_numeric

Add validation for numbers written in scientific notation, for example:
assert_eq!(query::is_numeric("1.5E+2"), true);

Probably via regexp
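
A regular expression is one option; the check can also be written by hand with the standard library only. The sketch below is an assumption about what "scientific" should accept (optional sign, digits with at most one decimal point, optional e/E exponent with its own sign), and both function names are hypothetical rather than part of the crate.

```rust
/// Hypothetical validator: true for plain decimals and for scientific notation.
fn is_scientific_numeric(input: &str) -> bool {
    // Split at the first 'e' or 'E'; both are ASCII, so `pos + 1` is a valid boundary.
    let (mantissa, exponent) = match input.find(|c: char| c == 'e' || c == 'E') {
        Some(pos) => (&input[..pos], Some(&input[pos + 1..])),
        None => (input, None),
    };
    // The mantissa may contain a decimal point; the exponent may not.
    is_signed_decimal(mantissa, true)
        && exponent.map_or(true, |e| is_signed_decimal(e, false))
}

/// Optional sign, then at least one digit and at most one '.' (if allowed).
fn is_signed_decimal(s: &str, allow_point: bool) -> bool {
    let body = s.strip_prefix(|c: char| c == '+' || c == '-').unwrap_or(s);
    let mut digits = 0;
    let mut points = 0;
    for c in body.chars() {
        match c {
            '0'..='9' => digits += 1,
            '.' if allow_point => points += 1,
            _ => return false,
        }
    }
    digits > 0 && points <= 1
}
```

Unlike str::parse::<f64>(), this rejects special values such as "inf" and "nan", which may or may not be the behaviour wanted for is_numeric.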

Refactoring: PRINTABLE

\src\utils.rs - utils::PRINTABLE

Build it as a combination of DIGITS, ASCII_LETTERS, PUNCTUATION, and WHITESPACE.
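
If the combined value should remain a compile-time &str constant, one possible approach is to keep each piece in a small macro and join them with concat!. The macro names and character sets below are assumptions modelled on Python's string module, not the crate's actual definitions.

```rust
// Each macro expands to a string literal, which lets `concat!` join the
// pieces at compile time. All values here are assumed, not the crate's.
macro_rules! digits { () => { "0123456789" }; }
macro_rules! ascii_letters { () => { "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" }; }
macro_rules! punctuation { () => { "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~" }; }
macro_rules! whitespace { () => { " \t\n\r\x0b\x0c" }; }

const DIGITS: &str = digits!();
const WHITESPACE: &str = whitespace!();
// The combined constant is derived from the parts instead of being retyped.
const PRINTABLE: &str = concat!(digits!(), ascii_letters!(), punctuation!(), whitespace!());
```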

split::words alternative that keeps punctuation

I currently have to singularise whole sentences. The code that I came up with splits a string using voca_rs::split::words and maps each word using inflector::string::singularize::to_singular. I then stitch it back together using Vec#join(" ").

This does remove punctuation like , and . though. It would be great if there was a method that keeps the punctuation intact. I saw that you wanted to add support for inflector. Did you already have an idea on how to handle this specific case here?
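
Until such a method exists, one workaround is to transform only the alphanumeric runs and copy everything else through unchanged. The sketch below uses the standard library only; map_words and naive_singular are hypothetical names, and naive_singular is a toy stand-in for a real singulariser such as the one in inflector. Unlike split::words, it does not split camelCase or snake_case words.

```rust
/// Hypothetical helper: apply `f` to every alphanumeric run and keep
/// punctuation and whitespace exactly where they were.
fn map_words<F: Fn(&str) -> String>(text: &str, f: F) -> String {
    let mut out = String::with_capacity(text.len());
    let mut word = String::new();
    for c in text.chars() {
        if c.is_alphanumeric() {
            word.push(c);
        } else {
            // A non-word character ends the current word, if any.
            if !word.is_empty() {
                out.push_str(&f(&word));
                word.clear();
            }
            out.push(c);
        }
    }
    // Flush a trailing word that is not followed by punctuation.
    if !word.is_empty() {
        out.push_str(&f(&word));
    }
    out
}

/// Toy stand-in for a real singulariser: strips one trailing 's'.
fn naive_singular(word: &str) -> String {
    word.strip_suffix('s').unwrap_or(word).to_string()
}
```

For example, map_words("Cats, dogs. Birds!", naive_singular) yields "Cat, dog. Bird!".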

Encode/decode URI?

I didn't see anything at a glance, but wanted to check if there's any support to encode/decode URIs with this library. I know some other Rust libraries exist that do this, but I'm already using voca for html escaping and also need URI encoding. Would love to maintain a single dependency instead of multiple if it makes sense.

Example in JavaScript is the encodeURI function.

At the moment, I'm using percent-encoding to achieve this.
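
For reference, a minimal percent-encoder needs only the standard library. The sketch below is not part of voca_rs, and percent_encode is a hypothetical name. It escapes every byte except the RFC 3986 unreserved characters, which makes it stricter than JavaScript's encodeURI (that function also leaves reserved characters such as / and ? untouched).

```rust
/// Hypothetical helper: percent-encode every byte that is not an
/// RFC 3986 unreserved character (letters, digits, '-', '.', '_', '~').
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // Work on UTF-8 bytes so multi-byte characters become several %XX groups.
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}
```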

strip_tags panic

let s = voca_rs::strip::strip_tags("<span style=\"color: rgb(51, 51, 51); font-family: \" microsoft=\"\" yahei=\"\" stheiti=\"\" wenquanyi=\"\" micro=\"\" hei=\"\" simsun=\"\" sans-serif=\"\" font-size:=\"\" 16px=\"\">】มีมี่’ เด็กสาวที่นอนไม่ค่อยหลับเนื่องจากกลัวผี ขี้เหงา และอะไรหลายๆ อย่างทำให้เธอมึนได้โล่เพราะไม่ค่อยได้นอน การที่เธอ นอนไม่หลับทำให้เธอได้เจอกับ ‘ดีเจไททัน’ แห่งคลื่น 99.99 MHzเขาจัดรายการในช่วง Midnight Fantasy ตีสามถึงตีห้า และมีมี่ก็เป็นผู้ฟังเพียงคนเดียวของเขาจากที่ตอนแรกเธอฟังดีเจไททันเพื่อช่วยปลอบประโลมการที่เธอต้องมาอยู่หอเพียงลำพัง แต่ไปๆ มาๆกลับกลายเป็นว่าเธออยู่รอฟังเขาทุกคืนทำให้เธอไปเรียนแบบมึนๆ จนบังเอิญไปนอนหลับซบ ‘ธรรม’ผู้ชายจอมกวนที่บังเอิญมานอนให้เธอซบ! จนอาจารย์สั่งให้ไปทำรายงานคู่กัน และนั่นก็เป็นที่มาของการที่เธอเริ่มไม่แน่ใจแล้วว่าเธอปลื้มดีเจไททัน หรือแอบหวั่นไหวกับนายจอมกวนคนนี้กันแน่</span><br />");
println!("{}",s);

Got panic:

thread 'main' panicked at 'index 779 out of range for slice of length 664', src/libcore/slice/mod.rs:2674:5
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:77
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1052
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:204
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:224
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:472
  11: rust_begin_unwind
             at src/libstd/panicking.rs:380
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  13: core::slice::slice_index_len_fail
             at src/libcore/slice/mod.rs:2674
  14: <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/slice/mod.rs:2838
  15: core::slice::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/slice/mod.rs:2656
  16: <alloc::vec::Vec<T> as core::ops::index::Index<I>>::index
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/liballoc/vec.rs:1883
  17: voca_rs::strip::unicode_string_range
             at /Users/gembin/.cargo/registry/src/github.com-1ecc6299db9ec823/voca_rs-1.10.0/src/strip.rs:68
  18: voca_rs::strip::strip_html_tags
             at /Users/gembin/.cargo/registry/src/github.com-1ecc6299db9ec823/voca_rs-1.10.0/src/strip.rs:87
  19: voca_rs::strip::strip_tags
             at /Users/gembin/.cargo/registry/src/github.com-1ecc6299db9ec823/voca_rs-1.10.0/src/strip.rs:55
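
The message "index 779 out of range for slice of length 664" would be consistent with an offset counted in one unit (bytes or code points) being used to index a sequence counted in another (for example graphemes), although the report does not confirm this. The sketch below only illustrates that general hazard: both helper names are hypothetical, and neither reflects the crate's internals.

```rust
/// Byte length and code-point count of the same text; for non-ASCII input
/// such as Thai these differ, so they must not be used interchangeably.
fn byte_and_char_len(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

/// Hypothetical helper: join `units[start..end]`, returning `None` when the
/// range does not fit instead of panicking like direct indexing would.
fn checked_range(units: &[&str], start: usize, end: usize) -> Option<String> {
    units.get(start..end).map(|slice| slice.concat())
}
```

For the five Thai code points "\u{0E21}\u{0E35}\u{0E21}\u{0E35}\u{0E48}", byte_and_char_len returns (15, 5), and checked_range(&["a", "b", "c"], 2, 7) returns None rather than panicking.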
