TL;DR Currently tofi is only case insensitive for latin. <p di

Ok, I think I've got this mostly working on the <a href="https://github.com/philj56/to

To make sure I thoroughly tested everything I've also replaced <code class="notranslat

Case insensitivity for non-latin,about philj56/tofi

Comments (23)

MahouShoujoMivutilde commented on May 26, 2024 1

Thanks for being so responsive and making tofi even better!

from tofi.

philj56 commented on May 26, 2024

Hey, thanks for the detailed write up! Yeah, non-ascii stuff is generally a bit busted in C. Seems people use menus like this for much more than just running programs like I do, so I agree proper Unicode handling would be nice to have. On that note, I think I may also need to look into allowing multiple font files to be passed to --font as well at some point.

A general note: I'd like to have as few dependencies as possible, vendored or otherwise. Therefore, instead of using utf8.h (which does look pretty good), I'd rather use GLib as tofi already depends on it, and it includes all the necessary functions. According to the note in the documentation of g_strncasecmp, the correct way to do this with GLib is g_utf8_casefold followed by strcmp or similar. islower and isupper can be done with a combination of g_utf8_get_char and g_unichar_islower / g_unichar_isupper.

For your specific problem, I think you're exactly right in saying

I suspect this line
https://github.com/philj56/tofi/blob/ef585fcbe156e6622222cd4a018776a5e1fb194b/src/fuzzy_match.c#L122

That line's grabbing the first byte of pattern, and sticking a null terminator after it. For utf8 input, the characters are more than 1 byte long, which means that search is not a valid utf8 string, and utf8casestr probably balks at it. The correct way is to search for a whole Unicode character - with GLib, that'd be with g_utf8_strchr.

It's late here now, but I'll take a look at this tomorrow and see how it works in practice.

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

Cool, thank you for the best dmenu alternative I've used, btw.

from tofi.

philj56 commented on May 26, 2024

Ok, I think I've got this mostly working on the utf8 branch (at least I haven't broken Latin text). I don't have an easy way to type non-Latin text into tofi at the moment, so could you give that branch a go and let me know if it works alright (with and without fuzzy matching)? I'll also look into ways to automate typing text into tofi, which would help with testing stuff like this.

Cool, thank you for the best dmenu alternative I've used, btw.

No problem, I'm glad you like it!

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

So far it seems to work just fine with fuzzy matching and without, but I'll use it for a few days to make sure.

I'll also look into ways to automate typing text into tofi, which would help with testing stuff like this.

Wtype can do that.

01.08.15.2022.mp4

sleep 5; wtype 'Something about fox; Что-то там про лису; you can even autotype this 不規則性エントロピー'

But I think just headless test for search functions with predefined "input" strings, patterns and corresponding expected return values would be more robust in term of automation.

from tofi.

philj56 commented on May 26, 2024

But I think just headless test for search functions with predefined "input" strings, patterns and corresponding expected return values would be more robust in term of automation.

Definitely, I was more thinking of something to quickly check it's roughly working.

Wtype can do that.

Nice, thanks!

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

To make sure I thoroughly tested everything I've also replaced fzf with tofi --fuzzy-match true (and I use it a lot when in terminal). (In addition to clipboard/drun/etc).

General notes

This is just a few remarks, not a critique.

Tofi works faster with larger number of shorter lines than fewer number of longer lines, even if the actual number of characters is the same. (Relevant and noticeable only when there really a lot of text in the input - e.g. when used for file search)
Tofi's interface is strongly tied to the search, so if search takes long enough, prompt can ignore your input.
There are some weird edge cases where fuzzy search will display the line with some obscure phrase that simply has the same letters as the prompt as the best match, and as the second place, right after it, - the expected result, exact match for the prompt. It happens once in a blue moon, so it doesn't matter that much, and it looks like the original algo's problem, rather than utf8 port problem (considering testing still matches expected results after initial score is set back to 100).
Would be nice if hotkeys like Ctrl-j/k also ignored keyboard layout (maybe relevant as an example: rofi wayland fork (written in C) does that).

UTF8

score was initially 100 and now 0. It works, results' order is the same, but resulting score is different (tested via https://github.com/philj56/fuzzy-match ported to utf8).

Overall, I haven't seen any practical problems with new utf8 search backend.

from tofi.

philj56 commented on May 26, 2024

Nice! I've merged the utf8 handling into master.

To make sure I thoroughly tested everything I've also replaced fzf with tofi --fuzzy-match true (and I use it a lot when in terminal). (In addition to clipboard/drun/etc).

This is well beyond what I thought it would ever be used for 😅 Thanks for being so thorough! I just tried it myself and it's pretty bad with find * | tofi. It probably needs to work similarly to fzf and start drawing before all input is loaded for something like that to be usable.

Tofi works faster with larger number of shorter lines than fewer number of longer lines, even if the actual number of characters is the same. (Relevant and noticeable only when there really a lot of text in the input - e.g. when used for file search)

Is that just with fuzzy match, or for simple matching as well? For fuzzy matching it makes sense, as there's likely lots of potential matches for a long input string. Off the top of my head, I'd expect the time to find the best match to grow exponentially with string length. I guess there could be some heuristic, to say "Only ever try to find the n next characters", so accuracy can be traded for speed.

Tofi's interface is strongly tied to the search, so if search takes long enough, prompt can ignore your input.

Yes, it only draws a new frame every keypress, and will hang until it's finished. I can't get it to miss input with some quick testing, but I can get it to exhibit an old bug: rendering becomes out-of-sync with input somehow, so what's displayed lags behind your input by 1 keypress. Solving this probably requires some re-architecturing.

There are some weird edge cases ...

Interesting, I guess this is due to the camel-case / word separator bonuses being larger than the letter adjacency bonus:

tofi/src/fuzzy_match.c

Lines 170 to 173 in 3406922

    
           const int adjacency_bonus = 15; 
        
           const int separator_bonus = 30; 
        
           const int camel_bonus = 30; 
        
           const int first_letter_bonus = 15;

I can reproduce this with something like:

printf "think\nTxHxIxNxK" | tofi --fuzzy-match true

typing "think" (or even just "th") then preferentially matches "TxHxIxNxK". This could be solved by making the fuzzy match heuristics customisable, but I don't want to add too many seldom-used options. Maybe this is worth it though, as the original algorithm's really just optimised for code searching.

Would be nice if hotkeys like Ctrl-j/k also ignored keyboard layout

I can look into it, but just to be clear, you mean selecting a keyboard layout with e.g. j in a different place shouldn't change which keyboard key you have to hit? (Also, keyboard handling in general needs sorting out, it's on my todo list)

Overall, I haven't seen any practical problems with new utf8 search backend.

👍 Great! After you suggested it, I actually added a tiny bit of unit testing in 3406922 and found one minor bug: an accented character will fuzzily match against a string containing the same character and accent, but not together, e.g.
ạ will match against aọ (but not against just a).

Thanks again for all the discussion and feedback, it's very helpful!

from tofi.

philj56 commented on May 26, 2024

Off the top of my head, I'd expect the time to find the best match to grow exponentially with string length.

Actually, I remembered I've thought about this before and did some benchmarking. The time for one match should be O(n*m), where n is the length of the search pattern and m is the length of the target string, so it should be the same. Rendering is likely to be slower with long lines though - I'll have to do some testing.

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

It probably needs to work similarly to fzf and start drawing before all input is loaded for something like that to be usable

It is already surprisingly usable for me, I use it as general media opener when I know exactly what I want to pull up and don't want to go down the directory tree.

But yeah, I can see it being slow to open with large enough collection of files.

Here is the script

#!/usr/bin/env sh

fd -t f -t l \
    -e '.bmp' \
    -e '.gif' \
    -e '.jpeg' \
    -e '.jpg' \
    -e '.jxr' \
    -e '.heif' \
    -e '.webp' \
    -e '.png' \
    -e '.svg' \
    -e '.tif' \
    -e '.tiff' \
    -e '.cbz' \
    -e '.cbr' \
    -e '.flac' \
    -e '.m4a' \
    -e '.mp3' \
    -e '.ogg' \
    -e '.opus' \
    -e '.wav' \
    -e '.wma' \
    -e '.m4v' \
    -e '.avi' \
    -e '.mkv' \
    -e '.mov' \
    -e '.mp4' \
    -e '.mpeg' \
    -e '.wmv' \
    -e '.webm' \
    -e '.djvu' \
    -e '.epub' \
    -e '.fb2' \
    -e '.pdf' \
    -e '.docx' \
    -e '.xlsx' \
    . ~ |
    sed "s#^$HOME#~#g" | tofi | sed "s#^~#$HOME#g" |
    xargs -d '\n' -r xdg-open

# and -E and --ignore-file for fd to not go in places you don't expect to find relevant files

Also fd is faster than find, so there is that.

Is that just with fuzzy match, or for simple matching as well?

Just the fuzzy match, yes.

Tested via tr -d '\n' < alice_in_wonderland.txt | fold -s -w1000 | tofi --fuzzy-match true.

Try with 1000 and more reasonable 100 - the difference is very noticeable.

alice_in_wonderland.txt.

old bug: rendering becomes out-of-sync with input somehow, so what's displayed lags behind your input by 1 keypress.

This is exactly what I saw and meant, yes.

Weird edge cases
TxHxIxNxK

Yeah, I guess that's distilled version of what I saw.

Maybe this is worth it though, as the original algorithm's really just optimised for code searching.

I think those kind of options would likely be configured once by user and left alone after, but it'll be definitely nice to have.

I'll play around with the values to see maybe there is a better default, idk.

but just to be clear, you mean selecting a keyboard layout with e.g. j in a different place shouldn't change which keyboard key you have to hit?

No, more like j is the same physical key as о (cyrillic) and k is л (cyrillic).

This is a long shot, but I know that GUI apps don't care about your keyboard layout when there is some mod mask, e.g. ctrl+t will open new tab in chromium regardless of my keyboard layout.

EDIT: That's seems to be called "control transformation":

If the Control modifier is active and was not consumed by the translation process, the string produced is transformed to its matching ASCII control character (if applicable). Keysyms are not affected.

Same for fzf (somehow) with ctrl-j/k.

Great! After you suggested it, I actually added a tiny bit of unit testing in 3406922

Nice!

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

In regards to ignoring layout:

Here is what wev thinks about ctrl-j in different layouts

# Pressed Ctrl-j
[14:     wl_keyboard] key: serial: 2930; time: 5602413; key: 37; state: 1 (pressed)
                      sym: Control_L    (65507), utf8: ''
[14:     wl_keyboard] modifiers: serial: 0; group: 16
                      depressed: 00000004: Control 
                      latched: 00000000
                      locked: 00000010: Mod2 
[14:     wl_keyboard] key: serial: 2932; time: 5603933; key: 44; state: 1 (pressed)
                      sym: j            (106), utf8: '
'
[14:     wl_keyboard] key: serial: 2933; time: 5604061; key: 44; state: 0 (released)
                      sym: j            (106), utf8: ''
[14:     wl_keyboard] key: serial: 2934; time: 5606685; key: 37; state: 0 (released)
                      sym: Control_L    (65507), utf8: ''
[14:     wl_keyboard] modifiers: serial: 0; group: 16
                      depressed: 00000000
                      latched: 00000000
                      locked: 00000010: Mod2 

# Switched layout
[14:     wl_keyboard] key: serial: 2936; time: 5608053; key: 108; state: 1 (pressed)
                      sym: ISO_Next_Group (65032), utf8: ''
[14:     wl_keyboard] modifiers: serial: 1; group: 16
                      depressed: 00000000
                      latched: 00000000
                      locked: 00000010: Mod2 
[14:     wl_keyboard] key: serial: 2938; time: 5608141; key: 108; state: 0 (released)
                      sym: ISO_Next_Group (65032), utf8: ''

# Pressed Ctrl-j again
[14:     wl_keyboard] key: serial: 2939; time: 5610821; key: 37; state: 1 (pressed)
                      sym: Control_L    (65507), utf8: ''
[14:     wl_keyboard] modifiers: serial: 1; group: 16
                      depressed: 00000004: Control 
                      latched: 00000000
                      locked: 00000010: Mod2 
[14:     wl_keyboard] key: serial: 2941; time: 5612221; key: 44; state: 1 (pressed)
                      sym: Cyrillic_o   (1743), utf8: '
'
[14:     wl_keyboard] key: serial: 2942; time: 5612381; key: 44; state: 0 (released)
                      sym: Cyrillic_o   (1743), utf8: ''
[14:     wl_keyboard] key: serial: 2943; time: 5614469; key: 37; state: 0 (released)
                      sym: Control_L    (65507), utf8: ''
[14:     wl_keyboard] modifiers: serial: 1; group: 16
                      depressed: 00000000
                      latched: 00000000
                      locked: 00000010: Mod2

Notice that while sym is either j or Cyrillic_o, key number stays 44.

https://git.sr.ht/~sircmpwn/wev

j is 44,
k is 45.

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

Just quick and dirty

patch

diff --git a/src/main.c b/src/main.c
index 043f6a8..e02bd08 100644
--- a/src/main.c
+++ b/src/main.c
@@ -182,7 +182,7 @@ static void handle_keypress(struct tofi *tofi, xkb_keycode_t keycode)
 		}
 	} else if (entry->input_length > 0
 			&& (sym == XKB_KEY_BackSpace
-				|| (sym == XKB_KEY_w
+				|| (keycode == 25 /* `w` physical key */
 					&& xkb_state_mod_name_is_active(
 						tofi->xkb_state,
 						XKB_MOD_NAME_CTRL,
@@ -214,7 +214,7 @@ static void handle_keypress(struct tofi *tofi, xkb_keycode_t keycode)
 		} else {
 			entry->results = string_vec_filter(&entry->commands, entry->input_mb, tofi->fuzzy_match);
 		}
-	} else if (sym == XKB_KEY_u
+	} else if (keycode == 30 /* `u` physical key */
 			&& xkb_state_mod_name_is_active(
 				tofi->xkb_state,
 				XKB_MOD_NAME_CTRL,
@@ -233,7 +233,7 @@ static void handle_keypress(struct tofi *tofi, xkb_keycode_t keycode)
 			entry->results = string_vec_filter(&entry->commands, entry->input_mb, tofi->fuzzy_match);
 		}
 	} else if (sym == XKB_KEY_Escape
-			|| (sym == XKB_KEY_c
+			|| (keycode == 54 /* `c` physical key */
 				&& xkb_state_mod_name_is_active(
 					tofi->xkb_state,
 					XKB_MOD_NAME_CTRL,
@@ -250,7 +250,7 @@ static void handle_keypress(struct tofi *tofi, xkb_keycode_t keycode)
 
 	uint32_t nsel = MAX(MIN(entry->num_results_drawn, entry->results.count), 1);
 	if (sym == XKB_KEY_Up || sym == XKB_KEY_Left || sym == XKB_KEY_ISO_Left_Tab
-			|| (sym == XKB_KEY_k
+			|| (keycode == 45 /* `k` physical key */
 				&& xkb_state_mod_name_is_active(
 					tofi->xkb_state,
 					XKB_MOD_NAME_CTRL,
@@ -270,7 +270,7 @@ static void handle_keypress(struct tofi *tofi, xkb_keycode_t keycode)
 			}
 		}
 	} else if (sym == XKB_KEY_Down || sym == XKB_KEY_Right || sym == XKB_KEY_Tab
-			|| (sym == XKB_KEY_j
+			|| (keycode == 44 /* `j` physical key */
 				&& xkb_state_mod_name_is_active(
 					tofi->xkb_state,
 					XKB_MOD_NAME_CTRL,

...seems to actually work just fine! Now hotkeys are layout-agnostic.

Will test further.

from tofi.

philj56 commented on May 26, 2024

Sorry for the slow response, been a busy couple of days.

Here is the script

Pretty cool - on my desktop, running that in my home folder gives ~15000 files, which takes 100ms to open. Not bad, but would definitely be improved by the early start to rendering I mentioned. Nice to know about fd.

Try with 1000 and more reasonable 100 - the difference is very noticeable.

Wow that's bad! Sway actually completely locked up for me. I tried checking just how many times fuzzy_match_recurse() is called when entering a pattern:

Pattern	fold width	`fuzzy_match_recurse()` calls
`s`	100	7996
`ss`	100	29859
`sss`	100	70713
`s`	1000	6648
`ss`	1000	159426
`sss`	1000	2512656

Safe to say, 2.5 million function calls is never going to be very quick - I'll do some more investigation. Maybe the "Only ever try to find the n next characters" strategy isn't a bad one to have as a failsafe.

This is exactly what I saw and meant, yes.

Cool, I'll look into it.

I think those kind of options would likely be configured once by user and left alone after, but it'll be definitely nice to have.

Yeah probably, it's more just about making the man page / command line help more awkward to read. I'll think about it at some point.

...seems to actually work just fine! Now hotkeys are layout-agnostic.

Nice! Yeah I've come across this in gamedev stuff before, keycodes vs. keysyms was never that clear to me. My first thought is, does this still work on a physical AZERTY/Dvorak/whatever keyboard? If so, then yeah looks like keycodes are the way to go.

from tofi.

philj56 commented on May 26, 2024

Very briefly, here's the performance of fuzzy matching sss against Alice in Wonderland with different line lengths:

so yeah, some exponential / power-law is going on.

Edit: Did some more thinking, some theoretical background: asking how many ways $N$ there are to match $n$ copies of a character in a string with $k$ of that character is the same question as asking how many $k$-bit numbers there are with $n$ bits set, which turns out to be:

$$ N = \frac{k!}{n!(k-n)!} $$

For example, in the first 1000 characters of Alice in Wonderland, there are 39 s's, so

$$ \begin{aligned} N &= \frac{39!}{3! \cdot 36!} \\ &= 9139 \end{aligned}$$

This scales very poorly with $k$. For the full text wrapped to 1000 character lines, the total number of matches for sss is (excuse the overkill one-liner):

$ tr -d '\n' < alice_in_wonderland.txt | fold -s -w1000 | sed "s/[^s]//g" | awk '{ print length }' | sort -n | python -c 'import sys; import math; [print(math.factorial(int(line)) / (math.factorial(3) * math.factorial(int(line) - 3))) for line in sys.stdin]' | paste -s -d+ | bc -l
2000025.0

So yeah, 2 million possible matches to search (plus ~500000 function calls just to check the null byte at the end of the pattern I guess).

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

Sorry for the slow response, been a busy couple of days.

No worries!

Sway actually completely locked up for me.

I am not sure it was a real lock up.

I had similar effect (running Hyprland) with really long lines, but I was able to just click outside the tofi's window (which really became locked up), press hotkey to open the terminal, and pkill tofi.

It'll actually lock up with just 2 1000 character lines

tr -d '\n' < alice_in_wonderland.txt | fold -s -w1000 | head -n 2 | tofi --fuzzy-match true

The score is -700-something for alice...

Maybe the "Only ever try to find the n next characters" strategy isn't a bad one to have as a failsafe.

...So another way to approach this is that if penalty for unmatched characters is large enough, we could just drop fuzzy search for this particular line, and use regular search, or maybe find n characters, then give up and try regular search.

That way we won't miss a match at the end of the long line, but still avoid taking too long.

keycodes vs. keysyms, AZERTY/Dvorak

From arch wiki, keyboard_input:

The keyboard sends a scancode to the computer.

The Linux kernel maps the scancode to a keycode, see Map scancodes to keycodes.

The keyboard layout maps the keycode to a symbol or keysym, depending on what modifier keys are pressed.

For the Linux console, see Linux console/Keyboard configuration.

For Xorg and Wayland, see Xorg/Keyboard configuration.

From map scancodes to keycodes:

Mapping scancodes to keycodes is achieved in a layer lower than Xorg and Linux console, which means that changes to this mapping will be effective in both
...
The preferred method is to use udev because it uses hardware information (which is a quite reliable source) to choose the keyboard model in a database. It means that if your keyboard model has been found in the database, your keys are recognized out of the box.

At the same time, keycode = scancode + 8, https://unix.stackexchange.com/a/364652

Or, more like keycode = mapping_stuff(scancode) + 8.

So, the way I understand it works is:

Case 1

You have QWERTY keyboard with QWERTY firmware that is known to udev.
You press physical key Q, which generates scancode 16 (maybe?), which is sent to udev, which knows it is a QWERTY keyboard, and so it translates it to 24 keycode, then your compositor looks at what layout you have, and translates the keycode to corresponding keysym, which is XKB_KEY_q.

Case 2

You have AZERTY keyboard
You press A, which is in the same place as Q for QWERTY, which generates scancode (16? idk), but since udev knows this is AZERTY, it translates it to keycode 38, which gives you desired XKB_KEY_a in the end.

However, if you have layout level AZERTY or Dvorak, you'll get different keys for the same keycodes, so of course, hotkeys wont work as expected.

So my guess:

alternative hardware-based layouts will work fine
alternative software-based layouts (set in compositor) will have bindings work as if it was QWERTY

I could be wrong, I don't fully get how scancode to keycode translation happens, and AW covers it in terms of remapping, not deep dive "what will break when".

Ideally, we could use some virtual keyboard, that goes through all those layers, has its own /dev/input/... device, etc, to check.

Oh, there are also canonical key names, like AC07 for j. It's a thing. I don't know what to do with it, yet. Yeaaaah.

Maybe they are the proper way to do layout-agnostic stuff, idk.

e.g.

/* keycode = 44, `j` physical key */

/* this will always, independent of selected layout return "AC07" for keycode = 44 */
xkb_keymap_key_get_name(tofi->xkb_keymap, keycode);

from tofi.

philj56 commented on May 26, 2024

I am not sure it was a real lock up.

Yeah haha, but foolishly I'd used a fullscreen tofi window, and after switching to a different tty to kill tofi, sway wouldn't show again on changing back 😅.

That way we won't miss a match at the end of the long line, but still avoid taking too long.

Ah I meant "the first n matches per character", so for each pattern character, you'd only look at the next "n" matching characters - you'd still find a match at the end of the string if that was the only one.

...So another way to approach this is that if penalty for unmatched characters is large enough, we could just drop fuzzy search for this particular line, and use regular search, or maybe find n characters, then give up and try regular search.

Yeah I started to think this was the best approach too. It can still be fuzzy, but just return only the first match if the line's too long, rather than trying to find the best possible match.

So, the way I understand it works is:

👍 I'll have to read this again tomorrow when I'm more awake, but this seems like a good understanding of things. I'll do some more searching first, but I guess the easiest solution is just to use keycodes for now, as that fixes software layout stuff, and worry about other things if someone brings it up in the future. I also keep meaning to overhaul the input stuff at some point, at which point I may add customisable keybinds, and then we could also make this behaviour customisable as well. May decide not to go that route though, who knows.

Oh, there are also canonical key names, like AC07 for j. It's a thing. I don't know what to do with it, yet. Yeaaaah.

Haha, oh dear.

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

so for each pattern character, you'd only look at the next "n" matching characters - you'd still find a match at the end of the string if that was the only one.

Oh, makes sense.

from tofi.

philj56 commented on May 26, 2024

Alright, after some more reading I think you're right and keycodes are the way to go for the CTRL-J style shortcuts. These are defined in <linux/input-event-codes.h> (without the +8 that XKB uses). Initially I wasn't sure what this would mean for *BSD, but it seems that at least FreeBSD uses the same definitions. OpenBSD is different, but Wayland isn't even available there yet so it's not an issue for now.

I've made the change and done some rearranging in d1b94b4, hopefully it should work fine. This is off-topic, but I think seeing as we're already using GLib functions for UTF-8 stuff, soon I'll set about ripping out any mention of wchar_t and make everything explicitly either UTF-32 or UTF-8 (This is partly motivated by seeing that C23 is finally adding some standard library stuff for them, and partly just because I've always hated the ambiguity around wchar_t).

Maybe they are the proper way to do layout-agnostic stuff, idk.

Ah I didn't see this edit before. I think the keycode stuff is probably fine, but that's good to know about.

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

I've made the change and done some rearranging in https://github.com/philj56/tofi/commit/d1b94b4487c72b13320a281ad6a2fc291e4c79c7, hopefully it should work fine.

It does, thanks, looks a lot cleaner too!

This is off-topic, but I think seeing as we're already using GLib functions for UTF-8 stuff, soon I'll set about ripping out any mention of wchar_t...

I am not a C programmer, but yeah, wide characters always seemed weird to me.

I think the keycode stuff is probably fine, but that's good to know about.

Yes. I've also found where AC07 for j comes from - /usr/share/X11/xkb/keycodes/evdev (which is owned by xkeyboard-config, which is a dependency of libxkbcommon).

// translation from evdev scancodes to something resembling xfree86 keycodes.

default xkb_keycodes "evdev" {
	minimum = 8;
	maximum = 255;
....
	<AC07> = 44;
...
}

> ... evdev scancodes ...

I don't see them in linux source (just the numbers from <linux/input-event-codes.h>), but I haven't looked very thoroughly, they could be defined in different ways, or it could be just a Xorg thing pulled in libxkbcommon.

The meaning of AC07 is defined in grep -r AC07 /usr/share/X11/xkb/symbols for different layouts.

Also

means the first key (01) in the third row up (C) of the alphanumeric area (A). So why does it use that?

(from here)

So you'll probably be fine with either of keycodes or "canonical names", idk.

from tofi.

philj56 commented on May 26, 2024

It does, thanks, looks a lot cleaner too!

👍 Yeah that's the first step towards customisable keybindings, if that ends up being a thing.

means the first key (01) in the third row up (C) of the alphanumeric area (A). So why does it use that?

Nice find! So yeah, seems keycodes and "canonical names" are both fine.

I've limited the line length for "perfect" fuzzy matching to 100 characters in 820fb11, with a comment explaining the logic:

tofi/src/fuzzy_match.c

Lines 97 to 119 in 820fb11

    
                   /* 
        
                    * If the string is more than 100 characters, just find the first fuzzy 
        
                    * match rather than the best. 
        
                    * 
        
                    * This is required as the number of possible matches (for patterns and 
        
                    * strings all consisting of one letter) scales something like: 
        
                    * 
        
                    * slen! / (plen! (slen - plen)!)  ~  slen^plen for plen << slen 
        
                    * 
        
                    * This quickly grinds everything to a halt. 100 is chosen fairly 
        
                    * arbitrarily from the following logic: 
        
                    * 
        
                    * - e is the most common character in English, at around 13% of 
        
                    *   letters. Depending on the context, let's say this be up to 20%. 
        
                    * - 100 * 0.20 = 20 repeats of the same character. 
        
                    * - In the worst case here, 20! / (10! 10!) ~200,000 possible matches, 
        
                    *   which is "slow but not frozen" for my machine. 
        
                    * 
        
                    * In reality, this worst case shouldn't be hit, and finding the "best" 
        
                    * fuzzy match in lines of text > 100 characters isn't really in scope 
        
                    * for a dmenu clone. 
        
                    */ 
        
                   bool first_match_only = slen > 100;

For the Alice in Wonderland text wrapped to 100 character lines, repeatedly typing e still causes tofi to slow down a lot, but not hang for more than a second (and as mentioned, that's by far the worst case - normal searches shouldn't see slowdowns now).

This issue's mostly off the original topic and getting a bit long to search through now, so I'm gonna mark it as resolved and open some new ones for the other things you mentioned. Thanks again for all the input, it's very useful!

from tofi.

philj56 commented on May 26, 2024

One final word on this actually: I've done the conversion from wchar_t to explicit UTF-8 / UTF-32 in 3bbd8ff. This shouldn't change behaviour at all, and I'll make a new release in a couple of days if no problems present themselves, as there's been a lot of changes by now.

from tofi.

estnml commented on May 26, 2024

Hello, i would be very happy if u could help me how can i launch tofi with clipboard content from terminal.

from tofi.

MahouShoujoMivutilde commented on May 26, 2024

I use copyq as an actual clipboard manager, tofi is just an interface for it.

https://gist.github.com/MahouShoujoMivutilde/7113baadb13208d91f4271a28debc58b

Replace dmenu with tofi --fuzzy-match true.

from tofi.

Case insensitivity for non-latin about tofi HOT 23 CLOSED

Comments (23)

General notes

UTF8

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	const int adjacency_bonus = 15;
	const int separator_bonus = 30;
	const int camel_bonus = 30;
	const int first_letter_bonus = 15;

	/*
	* If the string is more than 100 characters, just find the first fuzzy
	* match rather than the best.
	*
	* This is required as the number of possible matches (for patterns and
	* strings all consisting of one letter) scales something like:
	*
	* slen! / (plen! (slen - plen)!) ~ slen^plen for plen << slen
	*
	* This quickly grinds everything to a halt. 100 is chosen fairly
	* arbitrarily from the following logic:
	*
	* - e is the most common character in English, at around 13% of
	* letters. Depending on the context, let's say this be up to 20%.
	* - 100 * 0.20 = 20 repeats of the same character.
	* - In the worst case here, 20! / (10! 10!) ~200,000 possible matches,
	* which is "slow but not frozen" for my machine.
	*
	* In reality, this worst case shouldn't be hit, and finding the "best"
	* fuzzy match in lines of text > 100 characters isn't really in scope
	* for a dmenu clone.
	*/
	bool first_match_only = slen > 100;