qarmin / czkawka
Multi functional app to find duplicates, empty folders, similar images etc.
License: Other
This would make it easy to check the changes introduced by a PR.
https://docs.github.com/en/free-pro-team@latest/actions/guides/storing-workflow-data-as-artifacts
For now development is stopped due to this bug in IntelliJ Rust: intellij-rust/intellij-rust#5146
You could use the approach of https://github.com/kornelski/dupe-krill and hash only as little as necessary instead of hashing whole files.
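One way to picture "hash as little as necessary" is to hash only a short prefix first and read further only when prefixes match. The sketch below is not dupe-krill's actual algorithm; `prefix_hash` and `PREFIX_LEN` are hypothetical names, and the standard library's `DefaultHasher` stands in for a real content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical prefix length; files that differ within it never need a full hash.
const PREFIX_LEN: u64 = 4096;

/// Hash at most `PREFIX_LEN` bytes from the start of the file.
fn prefix_hash(path: &Path) -> io::Result<u64> {
    let mut buf = Vec::new();
    File::open(path)?.take(PREFIX_LEN).read_to_end(&mut buf)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&buf);
    Ok(hasher.finish())
}
```

Files whose prefix hashes differ cannot be duplicates, so only the remaining candidates would need to be read in full.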
Delete button can be clicked too easily.
For now arguments are parsed manually, but some people find my code hard to read, so maybe a better idea is to use a lightweight argument parser.
Some possible libraries:
For now the CLI help messages are very simple and already provide a large amount of useful information, but the rest could be written into an instruction file (Instruction.md).
When doing a recursive search over my home directory the GUI crashes with the following:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "H\xE2\xDE\xECuX\xAE\xF2\xC0\xDC2Xpsf\xD0~L(\u{1}"', czkawka_core/src/duplicate.rs:256:92
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I cloned from GitHub on 2020-10-06T16:45.
Program was started with cargo run --bin czkawka_gu
Is this enough, or do you need more debug info?
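The `Err` value in the panic looks like a file name that is not valid UTF-8. Assuming the crash comes from unwrapping an `OsString` to `String` conversion (the code at `duplicate.rs:256` is not shown here, so this is a guess), a non-panicking conversion could look like the sketch below. `display_name` is a hypothetical helper.

```rust
use std::ffi::OsString;

/// Convert a file name to a `String` without panicking on invalid UTF-8.
/// Invalid sequences are replaced with U+FFFD instead of aborting the scan.
fn display_name(name: OsString) -> String {
    name.into_string()
        .unwrap_or_else(|raw| raw.to_string_lossy().into_owned())
}
```

The lossy string is only suitable for display; file operations should keep using the original `OsString` or `PathBuf`, since the replaced bytes cannot be recovered.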
For now development is focused on Linux, but with little (or even no) changes it should also work on macOS.
GTK is probably the biggest issue for providing multi-OS support.
A Windows port will require some changes in the code, because it currently only works with directories whose paths start with `/`.
For now this issue is very low on my TODO list, but maybe someone wants to push changes to fix that.
State of cross-compilation from Linux (needed for providing official binaries)
Tool | Windows | macOS |
---|---|---|
Czkawka CLI | X | X |
Czkawka GUI Orbtk | X | ? |
Czkawka GUI GTK | X | ? |
Compilation state when building Czkawka for the native OS
Tool | Linux | Windows | macOS |
---|---|---|---|
Czkawka CLI | X | X | X |
Czkawka GUI Orbtk | X | X | X |
Czkawka GUI GTK | X | - | ? |
Legend
X - works
- - does not work
? - not checked
Blake3 is already really fast, but using hashes with a smaller number of bits, e.g. 64, could also improve performance.
With the current implementation, collisions can only happen when file sizes are equal, so this is really unlikely (0.000000000001%, I think).
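To put a rough number on the collision risk, the birthday bound can be evaluated for a group of same-size files. This is only a sketch that assumes the hash behaves like a uniform random function; `collision_probability` is a hypothetical helper, and it does not attempt to reproduce the percentage quoted above.

```rust
/// Approximate probability that at least two of `n` distinct files share a
/// `bits`-wide hash, using the birthday bound 1 - exp(-n(n-1) / 2^(bits+1)).
fn collision_probability(n: u64, bits: u32) -> f64 {
    let pairs = (n as f64) * (n.saturating_sub(1) as f64) / 2.0;
    let space = 2f64.powi(bits as i32);
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    -(-pairs / space).exp_m1()
}
```

Because only files of equal size are compared, `n` is the size of one such group rather than the total number of scanned files.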
I wanted to search for duplicates but the program crashes when processing a file with a special char (ü).
I used czkawka version 1.0.0
I also had FSLint installed and tried the same search with it to rule out hardware or filesystem issues, and it worked fine.
RUST_BACKTRACE=full ./czkawka_gui
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "\x81ber die Firma.txt"', czkawka_core/src/duplicate.rs:253:92
stack backtrace:
0: 0x5573ba7259b5 -
1: 0x5573ba74979c -
2: 0x5573ba7239b2 -
3: 0x5573ba7282b0 -
4: 0x5573ba727ffc -
5: 0x5573ba7288f3 -
6: 0x5573ba7284eb -
7: 0x5573ba748131 -
8: 0x5573ba747f53 -
9: 0x5573ba6ed09c -
10: 0x5573ba6dda16 -
11: 0x7f698b113346 -
12: 0x7f698b12e9ff - g_signal_emit_valist
13: 0x7f698b12f12f - g_signal_emit
14: 0x7f698c4b0add -
15: 0x7f698c4b0b35 -
16: 0x7f698b113346 -
17: 0x7f698b12e9ff - g_signal_emit_valist
18: 0x7f698b12f12f - g_signal_emit
19: 0x7f698c4aef90 -
20: 0x7f69855f3dae - ffi_call_unix64
21: 0x7f69855f371f - ffi_call
22: 0x7f698b113ced - g_cclosure_marshal_generic_va
23: 0x7f698b113346 -
24: 0x7f698b12e9ff - g_signal_emit_valist
25: 0x7f698b12f12f - g_signal_emit
26: 0x7f698c56ba36 -
27: 0x7f698b116008 - g_cclosure_marshal_VOID__BOXEDv
28: 0x7f698b113346 -
29: 0x7f698b12e9ff - g_signal_emit_valist
30: 0x7f698b12f12f - g_signal_emit
31: 0x7f698c568d0e -
32: 0x7f698c56a2fb -
33: 0x7f698c56cf5e -
34: 0x7f698c53a721 - gtk_event_controller_handle_event
35: 0x7f698c6fa26b -
36: 0x7f698c5b48f7 -
37: 0x7f698b113346 -
38: 0x7f698b12e3cd - g_signal_emit_valist
39: 0x7f698b12f12f - g_signal_emit
40: 0x7f698c6fc534 -
41: 0x7f698c5b186e -
42: 0x7f698c5b3948 - gtk_main_do_event
43: 0x7f698c0c4765 -
44: 0x7f698c0f4f92 -
45: 0x7f698ae38417 - g_main_context_dispatch
46: 0x7f698ae38650 -
47: 0x7f698ae38962 - g_main_loop_run
48: 0x7f698c5b2a25 - gtk_main
49: 0x5573ba6cca20 -
50: 0x5573ba6e7033 -
51: 0x5573ba728cc3 -
52: 0x5573ba6ce592 -
53: 0x7f698a3e1b97 - __libc_start_main
54: 0x5573ba6c11ae -
55: 0x0 -
über die Firma.txt: Non-ISO extended-ASCII text, with CRLF line terminators
OS: Linux Mint 19.3 Tricia x86_64
Kernel: 5.4.0-48-generic
FileSystem: ext4
Upper tabs should be remembered, because not every tab is available in some modes.
Sometimes users use very complicated settings, which are very hard to set up again and again.
Sometimes they are just the result of an error or a forgotten operation.
A file should never be deleted without knowing that it is unused.
For now Czkawka is only compiled on CI, but it would be really great to be able to run it inside CI to find possible problems (`xvfb-run` should help with the GUI).
A step with Valgrind should also be helpful (only with the CLI, because the GUI shows a lot of leaks from GTK and SDL (Orbtk)).
FSlint has this.
I'm not sure how exactly to check whether a file has debug symbols.
Probably the best idea will be to use the `file` and `strip` commands.
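A minimal sketch of the `file`-based check, assuming the tool is installed and that its output for ELF binaries contains the phrases "not stripped" or "with debug_info" (the exact wording can vary between `file` versions). Both function names are hypothetical.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Interpret one line of `file` output for an ELF binary.
/// "not stripped" means the symbol table is still present;
/// "with debug_info" means debug sections are present.
fn is_unstripped(file_output: &str) -> bool {
    file_output.contains("with debug_info") || file_output.contains("not stripped")
}

/// Run `file --brief <path>` and classify its output.
fn check_binary(path: &Path) -> io::Result<bool> {
    let out = Command::new("file").arg("--brief").arg(path).output()?;
    Ok(is_unstripped(&String::from_utf8_lossy(&out.stdout)))
}
```

Matching "not stripped" before the bare word "stripped" matters, because the second is a substring of the first.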
Translation page - https://crowdin.com/project/czkawka
Anyone can translate strings directly on this page, or download the file https://github.com/qarmin/czkawka/blob/master/i18n/en/czkawka_gui.ftl, change it offline and later upload it to Crowdin.
Only some of the most popular languages will be supported due to Crowdin limits.
gtk-rs is a mess, but it is still probably the only good solution for a rich UI in Rust.
It requires reading C code and documentation with C examples.
Todo
Here is the log when I call `cargo install czkawka_gui`. I executed it on my Mac with the latest OS version.
error: failed to run custom build command for `pango-sys v0.10.0`
Caused by:
process didn't exit successfully: `/var/folders/nj/9ppfyhkj0dlfjcjdl0kk3cc80000gn/T/cargo-installpksMWD/release/build/pango-sys-a23ee1b9605e9182/build-script-build` (exit code: 1)
--- stdout
cargo:rerun-if-env-changed=PANGO_NO_PKG_CONFIG
cargo:rerun-if-env-changed=PKG_CONFIG
cargo:rerun-if-env-changed=PANGO_STATIC
cargo:rerun-if-env-changed=PANGO_DYNAMIC
cargo:rerun-if-env-changed=PKG_CONFIG_ALL_STATIC
cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64-apple-darwin
cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64_apple_darwin
cargo:rerun-if-env-changed=HOST_PKG_CONFIG_PATH
cargo:rerun-if-env-changed=PKG_CONFIG_PATH
cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64-apple-darwin
cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64_apple_darwin
cargo:rerun-if-env-changed=HOST_PKG_CONFIG_LIBDIR
cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR
cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64-apple-darwin
cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64_apple_darwin
cargo:rerun-if-env-changed=HOST_PKG_CONFIG_SYSROOT_DIR
cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR
--- stderr
`"pkg-config" "--libs" "--cflags" "pango" "pango >= 1.36"` did not exit successfully: exit code: 1
--- stderr
Package pango was not found in the pkg-config search path.
Perhaps you should add the directory containing `pango.pc'
to the PKG_CONFIG_PATH environment variable
No package 'pango' found
Package pango was not found in the pkg-config search path.
Perhaps you should add the directory containing `pango.pc'
to the PKG_CONFIG_PATH environment variable
No package 'pango' found
warning: build failed, waiting for other jobs to finish...
error: failed to compile `czkawka_gui v1.1.0`, intermediate artifacts can be found at `/var/folders/nj/9ppfyhkj0dlfjcjdl0kk3cc80000gn/T/cargo-installpksMWD`
Caused by:
build failed
For now, if the user checks for duplicates twice, the hashes of files which still have the same size need to be computed again, which is a little slow.
This option should be switchable from the GUI.
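A minimal in-memory sketch of such a cache, assuming a file can be treated as unchanged when its path, size and modification time are unchanged. `CacheKey`, `HashCache` and the `enabled` switch are hypothetical names; persisting the cache between runs is left out.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Key that changes whenever the file is likely to have changed.
#[derive(Clone, PartialEq, Eq, Hash)]
struct CacheKey {
    path: PathBuf,
    size: u64,
    modified_secs: u64,
}

#[derive(Default)]
struct HashCache {
    entries: HashMap<CacheKey, String>,
    /// Switch that a GUI checkbox could control.
    enabled: bool,
}

impl HashCache {
    /// Return the cached hash, or call `compute` and remember its result.
    fn get_or_compute(&mut self, key: CacheKey, compute: impl FnOnce(&Path) -> String) -> String {
        if !self.enabled {
            return compute(&key.path);
        }
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        let hash = compute(&key.path);
        self.entries.insert(key, hash.clone());
        hash
    }
}
```

With the cache enabled, a second scan only pays for files whose key changed.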
It is needed by a lot of things.
Single selection will be used to select and delete included and excluded directories.
Multi selection will be used when selecting multiple entries in the duplicate or empty folder finder.
Selecting the oldest, newest etc. entries with a button will also become available once this is implemented.
For now I couldn't find any gtk-rs examples which I could use.
Files whose names end with `~` or `#` (and a lot of others) are temporary files which can be deleted.
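A suffix check like this can be sketched in a few lines. `TEMP_SUFFIXES` and `is_temporary_name` are hypothetical, and only the two suffixes named above are listed; the "lot of others" would have to be added by whoever implements it.

```rust
/// Hypothetical suffix list; only `~` and `#` come from the text above.
const TEMP_SUFFIXES: &[&str] = &["~", "#"];

/// True if the file name ends with one of the known temporary suffixes.
fn is_temporary_name(file_name: &str) -> bool {
    TEMP_SUFFIXES.iter().any(|suffix| file_name.ends_with(suffix))
}
```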
- Support for file names with non Unicode characters #44
czkawka dies when encountering a file date that it doesn't like, but it also doesn't say which file that may be. Running it with BACKTRACE options does not give any additional info.
You may want to enhance the exception handling to output files that it has problems with.
Full output:
me@mymachine:~/my_huge_directory$ RUST_BACKTRACE=full ~/Downloads/czkawka_gui
thread 'main' panicked at 'Invalid file date: SystemTimeError(1132781810s)', czkawka_core/src/duplicate.rs:289:71
stack backtrace:
0: 0x55f20392b9b5 -
1: 0x55f20394f79c -
2: 0x55f2039299b2 -
3: 0x55f20392e2b0 -
4: 0x55f20392dffc -
5: 0x55f20392e8f3 -
6: 0x55f20392e4eb -
7: 0x55f20394e131 -
8: 0x55f20394df53 -
9: 0x55f2038f303c -
10: 0x55f2038e3a16 -
11: 0x7f795d3b0a56 -
12: 0x7f795d3cfb28 - g_signal_emit_valist
13: 0x7f795d3d00d3 - g_signal_emit
14: 0x7f795d9c82ae -
15: 0x7f795d9c8318 -
16: 0x7f795d3b0802 - g_closure_invoke
17: 0x7f795d3c4962 -
18: 0x7f795d3cfb9e - g_signal_emit_valist
19: 0x7f795d3d00d3 - g_signal_emit
20: 0x7f795d9c6754 -
21: 0x7f795dc77ae1 -
22: 0x7f795d3b0a56 -
23: 0x7f795d3cfb28 - g_signal_emit_valist
24: 0x7f795d3d00d3 - g_signal_emit
25: 0x7f795da8efcc -
26: 0x7f795d3b3c56 - g_cclosure_marshal_VOID__BOXEDv
27: 0x7f795d3b0a56 -
28: 0x7f795d3cfb28 - g_signal_emit_valist
29: 0x7f795d3d00d3 - g_signal_emit
30: 0x7f795da8c012 -
31: 0x7f795da8d65b -
32: 0x7f795da90646 -
33: 0x7f795da57bb0 - gtk_event_controller_handle_event
34: 0x7f795dc1a16d -
35: 0x7f795dc715ef -
36: 0x7f795d3b0a56 -
37: 0x7f795d3cedd1 - g_signal_emit_valist
38: 0x7f795d3d00d3 - g_signal_emit
39: 0x7f795dc1bc23 -
40: 0x7f795dad7128 -
41: 0x7f795dad93db - gtk_main_do_event
42: 0x7f795d7c1f79 -
43: 0x7f795d7f5106 -
44: 0x7f795d2c4fbd - g_main_context_dispatch
45: 0x7f795d2c5240 -
46: 0x7f795d2c5533 - g_main_loop_run
47: 0x7f795dad837d - gtk_main
48: 0x55f2038d2a20 -
49: 0x55f2038ed033 -
50: 0x55f20392ecc3 -
51: 0x55f2038d4592 -
52: 0x7f795d0620b3 - __libc_start_main
53: 0x55f2038c71ae -
54: 0x0 -
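The `SystemTimeError` in the panic is what `SystemTime::duration_since` returns when a timestamp lies before the reference point, which suggests a modification date earlier than the Unix epoch. A sketch of reporting the file instead of panicking is shown below, assuming that this is indeed the failing call; `modified_secs` is a hypothetical helper.

```rust
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Seconds since the Unix epoch, or a message naming the offending file.
fn modified_secs(path: &Path, modified: SystemTime) -> Result<u64, String> {
    modified
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .map_err(|e| {
            format!(
                "{}: modification date is {} s before the Unix epoch",
                path.display(),
                e.duration().as_secs()
            )
        })
}
```

The caller can then log the message and skip the file, so a single odd timestamp does not end the whole scan.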
Files filled with 0x00 are usually broken.
There is not too much there, but it can still be useful.
czkawka/czkawka_core/src/lib.rs
Line 14 in d9bfb41
You can get the version of the package directly from cargo.
pub const CZKAWKA_VERSION: &str = env!("CARGO_PKG_VERSION");
I would like to see support in Czkawka for finding the biggest files in a provided location.
By default I think that the 50 biggest files should be displayed, but with the ability for the user to change this value.
The only problem is what to do if the user chooses 50 files but there are 100 files with the same size: should everything be displayed, or just 50?
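Both answers to the tie question can be supported with a flag. The sketch below is one possible approach, not a decision made in the project; `biggest_files` and `keep_ties` are hypothetical names.

```rust
use std::path::PathBuf;

/// Return the `limit` biggest files. If `keep_ties` is true, files that share
/// the size of the last selected entry are included too, so the result may be
/// longer than `limit`.
fn biggest_files(mut files: Vec<(u64, PathBuf)>, limit: usize, keep_ties: bool) -> Vec<(u64, PathBuf)> {
    // Biggest first; equal sizes are ordered by path to keep output stable.
    files.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(&b.1)));
    if limit == 0 {
        return Vec::new();
    }
    if files.len() <= limit {
        return files;
    }
    let cutoff = files[limit - 1].0;
    let mut end = limit;
    if keep_ties {
        while end < files.len() && files[end].0 == cutoff {
            end += 1;
        }
    }
    files.truncate(end);
    files
}
```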
This makes it easy to check what a file contains.
I don't know how and where to kill the thread when it is no longer needed.
Pressing stop should immediately stop the search.
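Rust threads cannot be killed from outside, so a common pattern is cooperative cancellation: the worker checks a shared flag between items and returns early. The sketch below assumes the search can be split into small steps; `search` and `spawn_search` are hypothetical names, and the `u32` items stand in for files.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Process items until done or until `stop` is set; returns how many were handled.
fn search(items: Vec<u32>, stop: Arc<AtomicBool>) -> usize {
    let mut handled = 0;
    for _item in items {
        if stop.load(Ordering::Relaxed) {
            break;
        }
        handled += 1; // real work on the item would go here
    }
    handled
}

/// Run `search` on a worker thread so the caller (e.g. a GUI) stays responsive.
fn spawn_search(items: Vec<u32>, stop: Arc<AtomicBool>) -> thread::JoinHandle<usize> {
    thread::spawn(move || search(items, stop))
}
```

A stop button handler would call `stop.store(true, Ordering::Relaxed)`; the search then ends after the item it is currently processing.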
For now Czkawka uses GTK 3 because GTK 4 isn't publicly available and bindings for it are not ready, but I still want to upgrade to the newer version as fast as possible, since GTK 4 is only needed when building the app, not at runtime.
When I run `$ czkawka_cli dups -d $PWD`, czkawka finds duplicate files and deletes them without asking for permission, giving a warning or showing that files were deleted in its output:
$ dd if=/dev/urandom of=file-1 bs=64M count=16
$ cp file-1 file-2
$ czkawka_cli dup -d $PWD
Found 2 duplicated files in 1 groups with same content which took 512.00 MiB:
Size - 512.00 MiB (536870896) - 2 files
/home/she3sha3y/dups/file-1
/home/she3sha3y/dups/file-2
----
czkawka deleted `file-2` without saying so in the output. Help does not say that this command deletes files:
$ czkawka_cli dup -h
czkawka_cli-dup 1.0.0
Finds duplicate files
This is very VERY dangerous. If I run this from the home directory, or worse, as root from the root directory, it could break a system and I would not even know. Many language libraries have duplicate files e.g. use the same package manager, Python with `__init__.py`. I could have two similar binaries with different filenames. Breaking a system is as easy as (DON'T TRY THIS!) `sudo czkawka_cli dups -d/`.
IMO the command should just print out the dups, and an option should have to be typed explicitly to invoke deletion (after asking permission). Preferably it should refuse to run at all as the root user or in directories like `/usr/`, `/etc/`, `/bin/`, and exit gracefully with a message such as `Can't run this command as root` or `Can't run this command in /bin/ as it could potentially break your system`.
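The requested guard could be sketched as a check that runs before any deletion. Everything here is hypothetical: `PROTECTED_DIRS` only lists the directories named in this report, `check_delete_allowed` is not an existing czkawka function, and detecting root is left to the caller because the standard library has no portable call for it.

```rust
use std::path::Path;

/// Hypothetical deny-list built from the directories named in the report.
const PROTECTED_DIRS: &[&str] = &["/usr", "/etc", "/bin"];

/// Return an error message if deleting inside `dir` looks too risky.
/// `dir` is expected to be an absolute, already canonicalized path.
fn check_delete_allowed(dir: &Path, running_as_root: bool) -> Result<(), String> {
    if running_as_root {
        return Err("Can't run this command as root".to_string());
    }
    // `Path::starts_with` compares whole components, so "/usrlocal" is not "/usr".
    if dir == Path::new("/") || PROTECTED_DIRS.iter().any(|p| dir.starts_with(p)) {
        return Err(format!(
            "Can't run this command in {} as it could potentially break your system",
            dir.display()
        ));
    }
    Ok(())
}
```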
This should be fixed urgently! Thank you in advance. 😀
For now, checking file hashes and sizes is done in one thread, which does not seem to work well with very fast disks: they are capable of reading more data, but throughput is limited by the processor.
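One way to use more cores is to split the list of items over several worker threads. The sketch below uses `std::thread::scope` (Rust 1.63 or newer) and hashes in-memory buffers to stay self-contained; a real implementation would read files, and `DefaultHasher` is only a stand-in for the actual hash. `hash_bytes` and `hash_all` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

/// Hash one buffer; stands in for hashing one file.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Hash every buffer, splitting the work over up to `workers` scoped threads.
/// Results keep the order of `inputs`.
fn hash_all(inputs: &[Vec<u8>], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    let chunk = ((inputs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = inputs
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|d| hash_bytes(d)).collect::<Vec<u64>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

Whether this helps in practice depends on the disk: on slow or rotating disks, parallel reads may be slower than sequential ones.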
For now the temporary files finder only finds files whose names end with certain extensions.
czkawka/czkawka_core/src/temporary.rs
Line 195 in d3652c1
~/home/cache
available from GUI
Now users don't know how much data has been processed and how much remains.
It would be really great if Czkawka were available in the Debian and Ubuntu repositories.
Maybe someday.
Add a feature to search for and list symlinks that point to a nonexistent file or that point in an infinite loop.
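A minimal sketch of such a check using only the standard library: `fs::symlink_metadata` inspects the link itself, while `fs::metadata` follows it and fails when the chain cannot be resolved. `is_broken_symlink` is a hypothetical name, and the check cannot tell a missing target from a loop or a permission error without inspecting the error further.

```rust
use std::fs;
use std::path::Path;

/// True if `path` is a symlink whose target cannot be resolved
/// (missing target, or a loop that the OS refuses to follow).
fn is_broken_symlink(path: &Path) -> bool {
    let is_link = fs::symlink_metadata(path)
        .map(|m| m.file_type().is_symlink())
        .unwrap_or(false);
    // `fs::metadata` follows links, so it fails when the chain cannot be resolved.
    is_link && fs::metadata(path).is_err()
}
```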
For now, pressing the Search button freezes the entire GUI until the search ends, which is really bad.
Running the search in another thread would probably allow a smoother UI and also make it possible to pause and stop the search without needing to kill the app.
This is implemented in dupeguru
First of all, thank you so much for the program! I used to do this cleaning with tedious, slow, and imperfect multiline shell commands. This is my first time using a program like czkawka. I never tried `fslint` and would like to leave that job for a compiled and highly performant program like czkawka.
I have some suggestions to improve the CLI command:
Add an option to separate output by a null character: similar to how `fd` and `find` have a `--print0` / `-0`. That way any output, even directories and files with spaces in their names, could be piped into `xargs -0` to be used inside a larger script.
$ czkawka_cli empty-files -i $HOME -0 | xargs -0I% touch %/not-empty-now.html
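On the implementation side, null-separated output mostly means writing raw path bytes followed by a NUL instead of a newline. The sketch below is Unix-only (it relies on `OsStrExt::as_bytes`), and `write_null_separated` is a hypothetical helper rather than existing czkawka code.

```rust
use std::io::{self, Write};
use std::os::unix::ffi::OsStrExt;
use std::path::PathBuf;

/// Write each path followed by a NUL byte, the format `xargs -0` reads.
/// Raw bytes are used so names that are not valid UTF-8 survive unchanged.
fn write_null_separated<W: Write>(out: &mut W, paths: &[PathBuf]) -> io::Result<()> {
    for path in paths {
        out.write_all(path.as_os_str().as_bytes())?;
        out.write_all(b"\0")?;
    }
    Ok(())
}
```

Passing `&mut io::stdout().lock()` as `out` would send the list to a pipe.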
Infer `-i` from the current directory if not present: if I run `czkawka empty-files` from `~/some/directory`, I get the error `FATAL ERROR: Parameter -i with set of included files is required`. It would be much better if it could just run as if `-i ~/some/directory` had been given and issue a warning that it is running from there. I could live without `-i` at all if I had that and `cd`.
Use long and short subcommands instead of single-letter arguments with two dashes: i.e.
$ czkawka ef -i $HOME -delete
$ czkawka empty-files -i $HOME -delete
This seems more natural to me. I rarely see CLI programs that use the current argument style of czkawka. That will also make it hard to write shell completions, since most shells expect behaviour similar to GNU coreutils. Of course, I don't know Rust, so I can't tell how hard that would be or whether the current CLI was made like that for a certain reason.
The output of the `big` subcommand is perfect, but it cannot be used inside larger commands because of the prefix `???.?? MiB (?????? bytes) - *`
420.73 MiB (441170444 bytes) - /home/user/Documents/file-1
418.49 MiB (438823124 bytes) - /home/user/Documents/file-102
387.79 MiB (406631076 bytes) - /home/user/Music/file-293
375.57 MiB (393813512 bytes) - /home/user/Downloads/file-555
Maybe you could add `--short` / `-S` options (or even better: `--output` / `-O <quiet|minimal|verbose|debug>`). In that case I could, for example, find the largest videos with `czkawka big` and pipe it to `ffmpeg` for compression. For now my easiest solution would be to use awk or sed in the middle:
# Does not work
~ $ czkawka big -i $HOME -x 'mp4' | xargs ffmpeg {ffmpeg options}
#Works
~ $ czkawka big -i $HOME -x 'mp4' | awk '{$1="";$2="";$3="";$4="";$5=""; print $0}' | xargs ffmpeg {ffmpeg options}
# Proposed solution, works with spaced filenames too.
~ $ czkawka big -x 'mp4' -S -0 | xargs -0 ffmpeg {ffmpeg options}
Help is not too helpful: maybe with a separate subcommand for each task you could add help to individual subcommands in a GNU-generic format. As of now, the help message dumps everything czkawka could do, with very long lines for every subcommand, and there is no man page. It took me a while longer to read compared to other help messages.
czkawka is not safe against .git folders. i.e.
~/Documents/project $ czkawka_cli empty-folders -d $(pwd)
Found 4 empty folders
/home/user/Documents/project/.git/branches/
/home/user/Documents/project/.git/objects/info/
/home/user/Documents/project/.git/objects/pack/
/home/user/Documents/project/.git/refs/tags/
Add a `find -xdev` equivalent. Add a function to only search on one filesystem.
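On Unix, staying on one filesystem can be approximated by comparing device IDs while walking the tree. The sketch below is Unix-only and `same_filesystem` is a hypothetical helper; a directory walker would skip any entry for which it returns `false`.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// True if `entry` lives on the same device as the search root.
fn same_filesystem(root: &Path, entry: &Path) -> io::Result<bool> {
    // The root is followed if it is a symlink; entries are inspected as-is.
    let root_dev = fs::metadata(root)?.dev();
    let entry_dev = fs::symlink_metadata(entry)?.dev();
    Ok(root_dev == entry_dev)
}
```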
There should be an easy way to install and upgrade the app on Linux.
This project should have at least a basic website, on GitHub or standalone.