Coder Social home page Coder Social logo

Comments (4)

assarbad avatar assarbad commented on May 29, 2024

The following .tar file contains the file names that cause the error renamer-error.tar.gz.

This should hopefully ensure that the file names appear on the target system the same as on mine. All files have been truncated to zero size for convenience.

The issue appears to be around the second code point in the file name starting with Z.

After unpacking this .tar file on a n ext4 and btrfs partition, respectively, I am seeing the same issue with renamer, so it's not specific to NTFS or the ufsd (commercial) driver from Paragon.

For reference, this is the renaming I am attempting:

From

$ xxd -g 1 before-renaming.txt
00000000: 44 69 65 20 49 6e 76 65 73 74 6d 65 6e 74 62 61  Die Investmentba
00000010: 6e 6b 20 42 61 72 69 6e 67 73 20 42 61 6e 6b 20  nk Barings Bank
00000020: 5b 31 37 31 30 31 38 5f 62 61 72 69 6e 67 73 5f  [171018_barings_
00000030: 69 6e 66 5d 2e 6d 70 34 0a 44 69 65 20 49 6e 76  inf].mp4.Die Inv
00000040: 65 73 74 6d 65 6e 74 62 61 6e 6b 20 4c 65 68 6d  estmentbank Lehm
00000050: 61 6e 20 42 72 6f 74 68 65 72 73 20 5b 31 37 31  an Brothers [171
00000060: 30 31 38 5f 6c 65 68 6d 61 6e 5f 69 6e 66 5d 2e  018_lehman_inf].
00000070: 6d 70 34 0a 5a c3 b6 6c 69 62 61 74 20 2d 20 44  mp4.Z..libat - D
00000080: 65 72 20 6b 61 74 68 6f 6c 69 73 63 68 65 20 4c  er katholische L
00000090: 65 69 64 65 6e 73 77 65 67 20 5b 31 30 38 30 34  eidensweg [10804
000000a0: 36 34 35 5d 2e 6d 70 34 0a                       645].mp4.

To

$ xxd -g 1 after-renaming.txt
00000000: 44 69 65 20 49 6e 76 65 73 74 6d 65 6e 74 62 61  Die Investmentba
00000010: 6e 6b 20 42 61 72 69 6e 67 73 20 42 61 6e 6b 20  nk Barings Bank
00000020: 5b 31 37 31 30 31 38 5f 62 61 72 69 6e 67 73 5f  [171018_barings_
00000030: 69 6e 66 5d 2e 6d 70 34 0a 44 69 65 20 49 6e 76  inf].mp4.Die Inv
00000040: 65 73 74 6d 65 6e 74 62 61 6e 6b 20 4c 65 68 6d  estmentbank Lehm
00000050: 61 6e 20 42 72 6f 74 68 65 72 73 20 5b 31 37 31  an Brothers [171
00000060: 30 31 38 5f 6c 65 68 6d 61 6e 5f 69 6e 66 5d 2e  018_lehman_inf].
00000070: 6d 70 34 0a 5a c3 b6 6c 69 62 61 74 20 2d 20 44  mp4.Z..libat - D
00000080: 65 72 20 6b 61 74 68 6f 6c 69 73 63 68 65 20 4c  er katholische L
00000090: 65 69 64 65 6e 73 77 65 67 2e 6d 70 34 0a        eidensweg.mp4.

The respective part (Zölibat) seems not to change at all, yet it's the part renamer complains about:

  • 5a c3 b6 6c 69 62 61 74
  • 5a c3 b6 6c 69 62 61 74

As far as I could ascertain the c3 b6 translates fine to ö (in HTML ö).

from pipe-rename.

assarbad avatar assarbad commented on May 29, 2024

Found it. rust-gdb was a real help.

Okay, so the problem was introduced by #48. I am still trying to reason about it, but the author - @mtimkovich - is probably better equipped to help out here. It looks to me as if the code makes undue assumptions about what the (renamed) string ought to look like.


The underlying issue

Preparations

  1. Update to master (currently at 56055f4)
  2. Append the following lines to Cargo.toml (especially [profile.dev] is relevant)
  3. Run cargo build
  4. Go into the folder where you unpacked renamer-error.tgz
  5. Run env RUST_BACKTRACE=1 rust-gdb -ex "b Rename::new" -ex run --args /path/to/target/debug/renamer *
  6. Edit interactively and save and quit
  7. Use n (for next) when back at the GDB prompt (note: after issuing a command once, you can hit Enter and it will get repeated, so instead of n plus Enter again you can simply hit Enter to run next again)

Lines to append to Cargo.toml

# xref: https://bitshifter.github.io/rr+rust/#11
# The development profile, used for `cargo build`
[profile.dev]
opt-level = 0  # Controls the --opt-level the compiler builds with
debug = true   # Controls whether the compiler passes `-g`

# The release profile, used for `cargo build --release`
[profile.release]
opt-level = 3
debug = false

Observing the panic

I am cutting out some redundant text output for brevity. Replaced lines have been substituted by a [...]:

$ env RUST_BACKTRACE=1 rust-gdb --args ../target/debug/renamer *
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
[...]
Reading symbols from ../target/debug/renamer...
[...]
Breakpoint 1 at 0x4f908: file src/main.rs, line 55.
[...]
Starting program: /home/oliver/pipe-rename/target/debug/renamer Die\ Investmentbank\ Barings\ Bank\ \[171018_barings_inf\].mp4 Die\ Investmentbank\ Lehman\ Brothers\ \[171018_lehman_inf\].mp4
Zölibat\ -\ Der\ katholische\ Leidensweg\ \[10804645\].mp4
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after vfork from child process 34368]

Breakpoint 1, renamer::Rename::new (original="Zölibat - Der katholische Leidensweg [10804645].mp4", new="Zölibat - Der katholische Leidensweg.mp4") at src/main.rs:55
55              let mut new = new.to_string();
(gdb) n
56              if let Ok(home) = env::var("HOME") {
(gdb)
57                  if &new[..2] == "~/" {
(gdb)
thread 'main' panicked at 'byte index 2 is not a char boundary; it is inside 'ö' (bytes 1..3) of `Zölibat - Der katholische Leidensweg.mp4`', library/core/src/str/mod.rs:127:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
   2: core::str::slice_error_fail_rt
   3: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
   4: core::intrinsics::const_eval_select
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/intrinsics.rs:2696:5
   5: core::str::slice_error_fail
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/str/mod.rs:86:9
   6: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeTo<usize>>::index
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/str/traits.rs:294:21
   7: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/str/traits.rs:65:9
   8: <alloc::string::String as core::ops::index::Index<core::ops::range::RangeTo<usize>>>::index
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/string.rs:2324:10
   9: renamer::Rename::new
             at /home/oliver/pipe-rename/src/main.rs:57:17
  10: renamer::find_renames::{{closure}}
             at /home/oliver/pipe-rename/src/main.rs:147:22
  11: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:290:13
  12: core::iter::traits::iterator::Iterator::find_map::check::{{closure}}
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:2732:32
  13: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:2238:21
  14: core::iter::traits::iterator::Iterator::find_map
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:2738:9
  15: <core::iter::adapters::filter_map::FilterMap<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/adapters/filter_map.rs:61:9
  16: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
  17: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/spec_from_iter.rs:33:9
  18: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/mod.rs:2648:9
  19: core::iter::traits::iterator::Iterator::collect
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/iter/traits/iterator.rs:1836:9
  20: renamer::find_renames
             at /home/oliver/pipe-rename/src/main.rs:140:27
  21: renamer::main
             at /home/oliver/pipe-rename/src/main.rs:351:28
  22: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[Inferior 1 (process 34363) exited with code 0145]

What this shows us is that this is the problematic line.

In particular the issue seems to be that the slice created is a slice of bytes and therefore attempts to cut a Unicode codepoint right in its middle, which makes the underlying code barf.

example.tar.gz (Rust Playgound link) shows the issue a little better. Perhaps String::from_utf8() could be used to make the types in that condition on line 57 more compatible without having to care too much about what exactly is in the string at runtime?!

fn print_type_of<T>(name: &str, _: &T) {
    println!("{}: {}", name, std::any::type_name::<T>());
}

fn main() {
    let new: &str = "Zölibat"; // input as per line 53
    print_type_of(&"new", &new);
    let new = new.to_string(); // corresponds to line 55 (just without mut)
    print_type_of(&"mut new", &new);
    print_type_of(&"'~/'", &"~/");
    let first_two = &new[..2]; // the problematic part of line 57 (!!!)
    print_type_of(&"first_two", &first_two);
    println!("{}", new);
}

Debug with rust-gdb -ex 'b main.rs:11' -ex run --args ./target/debug/teststr

This answer appears to explain it even better and more succinctly than I ever could.

from pipe-rename.

assarbad avatar assarbad commented on May 29, 2024

The answer appears to be to use if new.starts_with("~/") { instead of that slice of bytes.

Rust Playground link

I think the tests could use some more wacky codepoints, too. Be it German umlauts, Icelandic characters (Þ,Ð ...), Cyrillic or Chinese characters 😉 ... just thinking out loud here.

from pipe-rename.

mtimkovich avatar mtimkovich commented on May 29, 2024

Good idea with getting weirder with the test cases.

from pipe-rename.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.