
dharple / detox

Stars: 293, watchers: 12, forks: 17, size: 1.22 MB

Tames problematic filenames

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.23% C 81.62% Lex 0.41% Yacc 2.24% M4 0.86% Shell 13.17% Roff 0.23% PHP 0.24%
Topics: c, filenames-change

detox's People

Contributors

a1346054, chrysle, dharple, gy-mate, ninedotnine, sanjaymsh


detox's Issues

configure: error: cannot find install-sh ...

Hi Doug!
Thanks for your email regarding the updated version of detox. I tried to create an updated PKGBUILD for Arch Linux, but the message below comes up during the build process. This is my rewritten PKGBUILD for v1.3.0:

==> Starting build()...
configure: error: cannot find install-sh, install.sh, or shtool in "." "./.." "./../.."
==> ERROR: A failure occurred in build().
Aborting...

man: detox -c

The example in the man page, detox -c my_detoxrc -L -v, does not work: the -c option is documented neither in the man page nor in --help.

EXAMPLES
...
     detox -c my_detoxrc -L -v
                 Will list the sequences within my_detoxrc, showing their filters and options.
$ detox -c my_detoxrc -L -v
detox: invalid option -- 'c'
usage: detox [-hLnrvV] [-f configfile] [-s sequence] [--dry-run] [--inline] [--special]
	  file [file ...]

Should it be -f instead?

Empty default "eats up" valid characters

I want to reconfigure detox to use a less opinionated "safe" character subset: essentially, I want to keep Latin-1 and Cyrillic characters but remove characters that are unsafe in sh -c input (such as ' " $) or on SMB shares and HFS volumes. I have the following table:

default

start


0x23	_	# '#'
0x25	_	# %
0x2b		+
0x2c	_	# ,
0x2d		-
0x2e		.
0x3d	_	# =
0x5e	_	# ^
0x5f		_
0x7e		~

#0x20		_	# space
0x21		_	# !
0x22		_	# "
0x24		_	# $
0x27		_	# '
0x2a		_	# *
0x2f		_	# /
0x3a		_	# :
0x3b		_	# ;
0x3c		_	# <
0x3e		_	# >
0x3f		_	# ?
#0x40			# @
0x5c		_	# \
0x60		_	# `
0x7c		_	# |

#0x28		-	# (
#0x29		-	# )
#0x5b		-	# [
#0x5d		-	# ]
0x7b		-	# {
0x7d		-	# }

0x26		_and_	# &

end

And a simple detoxrc:

sequence default {
  safe {
	filename "/home/driib/mysafe.tbl";
  };
};

However, the dry run gives the following output, where allowed and unsafe characters are stripped alike (it looks like every second character):

nas-now/downloads/z2020P2/01 5G Core Networks.pdf -> nas-now/downloads/z2020P2/0 GCr ewrspf1

Would you be so kind as to point me in the right direction? If I add a non-empty default, the dry run produces correct output (albeit not the output I want):

nas-now/downloads/macOS High Sierra Patcher.dmg -> nas-now/downloads/_________________________.___

I will fall back to something like this but I would really like to take advantage of writing more complex sequences in detox instead:

find . -type d -print0 | xargs -0 -I '{}' sh -c "rename -n \"s/[\\\$!'\\\"]/_/g\" {}/*"

Thank you, Merry Christmas and happy holidays!

Move CP1252 to its own table

It probably only makes sense in the context of single byte characters, and it should be a separate filter using the ISO 8859-1 base.
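To illustrate what "a separate filter using the ISO 8859-1 base" could mean, here is a minimal sketch, assuming a hypothetical cp1252_to_codepoint() helper (not part of detox). CP1252 differs from ISO 8859-1 only in the 0x80 to 0x9F range; every other byte maps to the Unicode code point with the same value. Only a few of the overridden entries are listed.

```c
#include <stdint.h>

/* Hypothetical helper: map one CP1252 byte to a Unicode code point.
 * Bytes outside 0x80-0x9F are identical to ISO 8859-1, where each byte
 * value equals its code point. Only a sample of the 0x80-0x9F overrides
 * is listed; unlisted bytes in that range fall through to the ISO 8859-1
 * C1 control codes, which a complete table would replace. */
static uint32_t cp1252_to_codepoint(unsigned char byte)
{
    switch (byte) {
    case 0x80: return 0x20AC; /* EURO SIGN */
    case 0x85: return 0x2026; /* HORIZONTAL ELLIPSIS */
    case 0x91: return 0x2018; /* LEFT SINGLE QUOTATION MARK */
    case 0x92: return 0x2019; /* RIGHT SINGLE QUOTATION MARK */
    case 0x93: return 0x201C; /* LEFT DOUBLE QUOTATION MARK */
    case 0x94: return 0x201D; /* RIGHT DOUBLE QUOTATION MARK */
    case 0x96: return 0x2013; /* EN DASH */
    case 0x97: return 0x2014; /* EM DASH */
    case 0x99: return 0x2122; /* TRADE MARK SIGN */
    default:   return byte;   /* ISO 8859-1 base: byte value == code point */
    }
}
```

Because the function only makes sense one byte at a time, it fits the observation that CP1252 belongs with single-byte character handling rather than in the UTF-8 table.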

Fix relative link recursion or remove support for it

While testing detox I accidentally ran it across my entire projects directory (all of my dev projects) when I created a symlink in /tmp that pointed at ../.. and then ran detox in a test with -r --special.
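One way to "remove support for it" is to never follow symlinks while recursing. The following is a minimal POSIX sketch, assuming a hypothetical should_descend() helper that a recursive walk would call before entering each subdirectory; it is not detox's code. lstat() inspects the link itself rather than its target, so a link such as /tmp/x -> ../.. is never traversed.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical policy: only descend into real directories, never into
 * symlinks, whatever they point at. Returns 1 to descend, 0 otherwise. */
static int should_descend(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return 0;           /* cannot stat: do not recurse */
    if (S_ISLNK(st.st_mode))
        return 0;           /* symlink: skip, even if it targets a directory */
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

If following links has to stay supported, the usual alternative is to record the (st_dev, st_ino) pair of every directory entered and refuse to enter one twice, which stops loops but still lets a link lead outside the starting tree.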

A lot of disturbing compiler complaints (char vs. unsigned char)

Dear Doug,

thanks for providing and sharing this nice tool. I discovered it today thanks to an article in c't, a highly regarded computer magazine here in Germany. To spread the joy, I packaged v1.3.3 for openSUSE right away, in a form that is ready for inclusion in the official distribution, building on the groundwork from Antoine Ginies. In the course of that, I noticed a lot of rather disturbing compiler warnings during the build:

[    5s] gcc -DHAVE_CONFIG_H -I.  -DDATADIR=\"/usr/share\" -DSYSCONFDIR=\"/etc\"   -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -c -o detox.o detox.c
[    5s] detox.c: In function 'main':
[    5s] detox.c:339:29: warning: pointer targets in passing argument 1 of 'parse_file' differ in signedness [-Wpointer-sign]
[    5s]   339 |      file_work = parse_file(*file_walk, main_options);
[    5s]       |                             ^~~~~~~~~~
[    5s]       |                             |
[    5s]       |                             char *
[    5s] In file included from detox.c:46:
[    5s] file.h:38:49: note: expected 'unsigned char *' but argument is of type 'char *'
[    5s]    38 | extern unsigned char *parse_file(unsigned char *filename, struct detox_options *options);
[    5s]       |                                  ~~~~~~~~~~~~~~~^~~~~~~~
[    5s] detox.c:339:16: warning: pointer targets in assignment from 'unsigned char *' to 'char *' differ in signedness [-Wpointer-sign]
[    5s]   339 |      file_work = parse_file(*file_walk, main_options);
[    5s]       |                ^
[    5s] detox.c:340:16: warning: pointer targets in passing argument 1 of 'parse_dir' differ in signedness [-Wpointer-sign]
[    5s]   340 |      parse_dir(file_work, main_options);
[    5s]       |                ^~~~~~~~~
[    5s]       |                |
[    5s]       |                char *
[    5s] In file included from detox.c:46:
[    5s] file.h:40:38: note: expected 'unsigned char *' but argument is of type 'char *'
[    5s]    40 | extern void parse_dir(unsigned char *indir, struct detox_options *options);
[    5s]       |                       ~~~~~~~~~~~~~~~^~~~~
[    5s] detox.c:344:17: warning: pointer targets in passing argument 1 of 'parse_file' differ in signedness [-Wpointer-sign]
[    5s]   344 |      parse_file(*file_walk, main_options);
[    5s]       |                 ^~~~~~~~~~
[    5s]       |                 |
[    5s]       |                 char *
[    5s] In file included from detox.c:46:
[    5s] file.h:38:49: note: expected 'unsigned char *' but argument is of type 'char *'
[    5s]    38 | extern unsigned char *parse_file(unsigned char *filename, struct detox_options *options);
[    5s]       |                                  ~~~~~~~~~~~~~~~^~~~~~~~
[    5s] detox.c:347:20: warning: pointer targets in passing argument 1 of 'parse_special' differ in signedness [-Wpointer-sign]
[    5s]   347 |      parse_special(*file_walk, main_options);
[    5s]       |                    ^~~~~~~~~~
[    5s]       |                    |
[    5s]       |                    char *
[    5s] In file included from detox.c:46:
[    5s] file.h:42:42: note: expected 'unsigned char *' but argument is of type 'char *'
[    5s]    42 | extern void parse_special(unsigned char *in, struct detox_options *options);
[    5s]       |                           ~~~~~~~~~~~~~~~^~
[    5s] detox.c:366:20: warning: pointer targets in passing argument 1 of 'parse_inline' differ in signedness [-Wpointer-sign]
[    5s]   366 |       parse_inline(*file_walk, main_options);
[    5s]       |                    ^~~~~~~~~~
[    5s]       |                    |
[    5s]       |                    char *
[    5s] In file included from detox.c:46:
[    5s] file.h:44:41: note: expected 'unsigned char *' but argument is of type 'char *'
[    5s]    44 | extern void parse_inline(unsigned char *filename, struct detox_options *options);
[    5s]       |                          ~~~~~~~~~~~~~~~^~~~~~~~

You can see the full build log by clicking on the succeeded link.

Yes, our compilers are configured to be rather strict, but that often helps to discover issues early. That works less well when a project triggers this many warnings, though...

Of course, we could silence the compiler, but it would be nice if you could look into this issue yourself. I'm sure eliminating these signedness issues will improve the overall value of this fine project even more.

Modernize Makefile and configure

Eriberto, the Debian maintainer, has requested that I replace the existing Makefile.in and configure.in with more modern ones, Makefile.am and configure.ac.

Reduce casting between char and unsigned char

See issue #31. Address the underlying issue: convert all internal strings to either signed or unsigned char *, and deal with the consequences of whichever choice is made.

Using signed chars causes the character math to get wonky.

Using unsigned chars causes warnings with the standard C library functions.
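A common middle ground, sketched here under the assumption that strings stay plain char * (so the standard library accepts them without casts), is to convert each byte to unsigned char at the point where it is inspected. With plain char, which is signed on many platforms, a byte such as 0xE9 compares as a negative number, so range checks on high-bit characters silently fail. Both helpers below are hypothetical and only illustrate the pattern.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count bytes with the high bit set. The cast keeps
 * 0x80..0xFF positive regardless of whether plain char is signed. */
static size_t count_high_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: <ctype.h> functions require a value representable
 * as unsigned char (or EOF), so the same per-byte cast is needed here. */
static size_t count_alnum(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (isalnum((unsigned char)*s))
            n++;
    return n;
}
```

The cast is local and explicit, so the function signatures (and the -Wpointer-sign warnings about them) never involve unsigned char *.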

Update wipeup filter

The wipeup filter should trim _ and - off the end of the filename.

The remove_trailing option should probably be the normal behavior.
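A minimal sketch of the proposed trimming, assuming a hypothetical trim_trailing_separators() helper that works in place on the final filename. It only looks at the very end of the string; whether the trim should also apply before an extension (for example foo_.txt) is left open, as in the issue.

```c
#include <string.h>

/* Hypothetical sketch: strip any run of '_' and '-' from the end of a
 * filename, in place. The len > 1 guard keeps at least one character,
 * so a name made entirely of separators never becomes empty. */
static void trim_trailing_separators(char *name)
{
    size_t len = strlen(name);

    while (len > 1 && (name[len - 1] == '_' || name[len - 1] == '-'))
        name[--len] = '\0';
}
```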

Refactor config_file_spoof and add other sequences

Running detox without a config file leaves the user with only the default sequence. This is OK, but doesn't agree with the man pages at all.

Refactor the config_file_spoof logic so that we can easily build extra sequences, and then build out a full set of sequences.

Alternatively, do something like what was done in #21, converting the stock detoxrc into C code that can be loaded at will.

Remove hackish inline detection

Right now we're building two different .o files from several .c files, based on the INLINE_MODE flag. This is overkill, and hackish. Create separate source files where appropriate, and clean up this mess.

Remove dot character from file

I think it would be a nice improvement. For example, I want to rename foo.bar.baz.pdf to foo_bar_baz.pdf.

Thanks for your work!
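The requested behavior could be sketched as a filter that replaces every dot except the last one. This assumes a hypothetical replace_inner_dots() helper (detox has no such filter as described here) and chooses to leave a leading dot alone so hidden files such as .bashrc keep their meaning.

```c
#include <string.h>

/* Hypothetical filter: replace every '.' except the last with '_', so
 * "foo.bar.baz.pdf" becomes "foo_bar_baz.pdf". A leading dot is kept. */
static void replace_inner_dots(char *name)
{
    char *last = strrchr(name, '.');
    char *p;

    if (last == NULL)
        return;                        /* no dots at all */
    for (p = name + 1; p < last; p++)  /* start at 1: keep a leading dot */
        if (*p == '.')
            *p = '_';
}
```

Note that this treats .tar.gz as two extensions; a name like archive.tar.gz would become archive_tar.gz.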

Specify character set from the command line

From Eriberto (the Debian package maintainer):

I would like to suggest two features. Something as:

-c '^~': detox would also replace the characters ^ and ~ with _.
-d '^~': detox would delete those characters if found.

This fits nicely in with my vision of v2, pushing all of the actual sequencing to the command line and away from config files and custom conversion tables.
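The two suggested options boil down to a pair of character sets applied to each filename. The sketch below assumes a hypothetical apply_char_sets() helper; neither -c nor -d exists in detox today, and the choice that deletion wins when a character is in both sets is an assumption.

```c
#include <string.h>

/* Hypothetical sketch of the proposed options, applied in place:
 *   replace_set: characters turned into '_'  (like the proposed -c '^~')
 *   delete_set:  characters dropped entirely (like the proposed -d '^~')
 * Either set may be NULL. Deletion takes precedence over replacement. */
static void apply_char_sets(char *name, const char *replace_set,
                            const char *delete_set)
{
    char *src = name, *dst = name;

    for (; *src != '\0'; src++) {
        if (delete_set != NULL && strchr(delete_set, *src) != NULL)
            continue;                  /* -d: skip this character */
        if (replace_set != NULL && strchr(replace_set, *src) != NULL)
            *dst++ = '_';              /* -c: substitute underscore */
        else
            *dst++ = *src;             /* keep unchanged */
    }
    *dst = '\0';
}
```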

Safe filter behaves significantly differently when the table is missing

The basic safe filter removes any character that doesn't match. This differs significantly from the behavior when the safe table can be loaded: there, no default is set, so any unmatched character is left alone.

So, any UTF-8 characters that get passed through the basic safe filter get removed, while the same character passed through the table-based safe filter is left alone.
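One way to make both code paths agree is to give them a single lookup rule in which "no default" always means pass-through. The following is a minimal sketch, assuming a table that maps each byte to an optional replacement string plus an optional default; struct safe_table and safe_translate() are hypothetical and are not detox's actual types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical translation table: one optional replacement per byte
 * value plus an optional default. */
struct safe_table {
    const char *entry[256];   /* NULL = no rule for this byte */
    const char *def;          /* NULL = no default configured */
};

/* Translate in into out (capacity outsz). Unmatched bytes use the default
 * if one is set, otherwise they pass through unchanged, so UTF-8 bytes
 * survive when no default exists. Returns 0 on success, -1 if out is too
 * small. */
static int safe_translate(const struct safe_table *t, const char *in,
                          char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        const char *rep = t->entry[c];
        char self[2] = { *in, '\0' };

        if (rep == NULL)
            rep = (t->def != NULL) ? t->def : self;  /* pass-through */
        size_t n = strlen(rep);
        if (used + n + 1 > outsz)
            return -1;
        memcpy(out + used, rep, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

With this rule, a fallback used when the table cannot be loaded would simply be a built-in struct safe_table with def left NULL, rather than a separate filter that deletes unmatched characters.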

crash on directory with carriage returns and spaces

Something like this


$  stat *Syphilect*
  File: '     '$'\n\n''Epidemic Consummation'$'\n''by Syphilectomy'$'\n'

I did a mkdir from a clipboard copy of a web page. That's why it has the line breaks (0x0a, line feed) in it.

The filename is exactly:

anliot@ace ~/music/     Epidemic Consummationby Syphilectomy $ pwd |od -t x1 
0000000 2f 68 6f 6d 65 2f 61 6e 6c 69 6f 74 2f 6d 75 73
0000020 69 63 2f 20 20 20 20 20 0a 0a 45 70 69 64 65 6d
0000040 69 63 20 43 6f 6e 73 75 6d 6d 61 74 69 6f 6e 0a
0000060 62 79 20 53 79 70 68 69 6c 65 63 74 6f 6d 79 0a
0000100 0a
0000101

This didn't crash detox:
$ detox *Syphil* -s uncgi -v
I have built (and retested with) your current detox code from GitHub. I don't know what the issue is, but I wouldn't be a bit surprised if it were a library error.
I don't see any Unicode in the filename, really.

cheers!

Update regression tests to ignore system files

When a copy of detox is installed on the system (from an OS package or manually), it interferes with the regression tests, because detox will pick up on the translation tables or system config file and change the behavior being tested.

Update branches

Rename master to main (after a notice period) and create a 1.x branch to have a place for bug fixes on v1.x.

UTF-8 Filter behaves like the safe filter

The UTF-8 filter behaves like the safe filter, in that many characters between 0x20 and 0x3F are converted to _ or -. This should be done by the safe filter, and the UTF-8 filter should only be for transliterating UTF-8 to ASCII.

Add tests for --special

Add tests for the --special flag. Add tests for the --special argument with --recursive, when confronted with a symlink loop. Add symlinks that reference . or .. on tests that use --special --recursive.

Remove libpopt

popt was added years ago to support a build on OS X (I think). libpopt is no longer maintained. Remove support for it.

Add a regression test for max_length with multiple extensions

Add a regression test confirming that the max_length filter does the right thing when confronted with .tar.gz or similar extension.

For instance, super long filename.tar.gz should be reduced to super_lon.tar.gz rather than super_long_fil.gz.

Noticed while working on detoxrc.5 for #22.
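A possible shape for the expected behavior, sketched with a hypothetical truncate_keep_ext() helper. It assumes the safe filter has already turned spaces into underscores, and it uses an assumed heuristic (not detox's documented rule): the extension starts at the second-to-last dot when the component between the last two dots is 3 bytes or shorter, so .tar.gz is kept as a unit. With max_len set to 16 (a value chosen here so the example from the issue works), super_long_filename.tar.gz becomes super_lon.tar.gz.

```c
#include <string.h>

/* Hypothetical sketch: truncate name in place to at most max_len bytes
 * while keeping the extension intact. The ".tar.gz" heuristic (a middle
 * component of at most 3 bytes) is an assumption and misfires on names
 * such as "report.v2.final". */
static void truncate_keep_ext(char *name, size_t max_len)
{
    size_t len = strlen(name);
    char *ext = strrchr(name, '.');

    if (len <= max_len)
        return;
    if (ext != NULL && ext != name) {
        char *prev = ext - 1;
        while (prev > name && *prev != '.')
            prev--;
        if (*prev == '.' && prev != name && (size_t)(ext - prev - 1) <= 3)
            ext = prev;                 /* treat ".tar.gz" as one extension */
    }
    if (ext == NULL || ext == name) {
        name[max_len] = '\0';           /* no usable extension: plain cut */
        return;
    }
    size_t ext_len = strlen(ext);
    if (ext_len >= max_len) {
        name[max_len] = '\0';           /* extension alone is too long */
        return;
    }
    /* Move the extension (and its terminator) right after the kept base. */
    memmove(name + (max_len - ext_len), ext, ext_len + 1);
}
```

A regression test for this issue would feed such a name through the real max_length filter and compare against the expected super_lon.tar.gz.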

Update the default runtime behavior

Update the default behavior of detox (with or without a config file) so that the only tables run are safe and wipeup. Add notes to the config file explaining that the iso8859_1 and utf_8 sequences perform transliteration.

Tag v1.4.0

Create a new version, 1.4.0, to release the current set of changes, and give the 1.x branch the ability to do some regression testing. Ties in with issue #25.

Relative paths not working

Running

detox -r -v .

doesn't find anything, while

detox -r -v $(pwd)

works as expected.

Detox version 1.3.0, Fedora 27.
