Comments (10)
One solution I can think of would be to add an optional preprocess command that would be invoked on each file somewhere in the fclones pipeline (if it doesn't affect the size of the files, preferably after matching by size). However, that could be quite slow, because it would require launching a new process for every file.
Another option would be to make ignoring exif a built-in function, but in a modular way so different "on the fly preprocessors" could be easily added (even as dynamic library plugins). That would be less universal, but potentially more performant. We probably wouldn't even need to save a copy of the file.
from fclones.
Piping is more universal because it would allow using any program written in any language as a preprocessor, and these programs are already available and people know how to use them (or they can easily look them up). So it is also much less work to get the job done.
Apologies if my questions above are a distraction.
No, not at all. This is an interesting idea and when I have a bit more time, I plan to investigate it.
The second approach sounds cleaner - partly since it would not require a lot of additional writes to disk. The trade off on it being less universal is interesting - what is the constraint that concerns you there?
With that said, piping files/data is the Linux way - whether this should be a flag in fclones or whether it should just be able to interface with the likes of exiftool (or others) is probably another question.
Still, the second approach you mentioned does sound nice (and a fair bit beyond chapter 10 of the book - which is where I am now in Rust).
Apologies if my questions above are a distraction. What you have made works well as it stands - I am not sure if you want to be thinking about any enhancements at this early stage.
Exciting to see the feature added to the 0.4.0 milestone. I will be looking forward to doing a diff between versions to try and grok how you pulled it together.
I went through your source code the other day and appreciated the commenting you had put through it - for yourself I am sure, but I appreciated it too. Thanks.
I also noticed how large the build directories got - I think it was sitting at about 1GB of space (I think that was on macOS) - for some reason I didn't expect it all to take that much space (not that it matters at all - I was just surprised).
@stuzenz I pushed an early preview. There is a new option: --transform <command>. It transforms files on the fly, through the provided external command, without making a copy.
Silly examples that do nothing:
fclones . -R --transform dd
fclones . -R --transform 'dd if=$IN of=$OUT'
Unfortunately, as I expected, this is very slow compared to grouping without transformation. I believe spawning a new process for each file is the biggest cost. Anyways, this probably would be useful if you need to process a smaller number of bigger files.
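To make the idea concrete, here is a minimal sketch of what comparing files "through" a piping command amounts to. It is not how fclones implements --transform; hash_transformed is a hypothetical helper name, and cksum stands in for whatever hash fclones really uses. It also shows where the cost comes from: every file spawns at least one new process.

```shell
#!/bin/sh
# hash_transformed FILE CMD [ARGS...]   (hypothetical helper, not part of fclones)
# Pipe FILE through CMD (stdin -> stdout) and print a checksum of the result.
# Two files count as duplicates "after transformation" when the printed
# values are equal. Note: a failure of CMD itself is not detected here,
# because the pipeline's exit status is that of cksum.
hash_transformed() {
  file=$1
  shift
  "$@" < "$file" | cksum
}
```

For example, `hash_transformed a.txt tr a-z A-Z` and `hash_transformed b.txt tr a-z A-Z` print the same value when a.txt and b.txt differ only in letter case.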
Hi Piotr
That was quick work!
I look forward to checking out the early release - I am a little tight for time for a couple of days, I will give you some feedback at the end of the week or in the weekend (with that said - I bet I ignore my other jobs and have a look at this earlier).
A pity about the performance hit, but as you said to be expected - and it isn't really an issue for my use case.
I look forward to looking at the source as well.
Have a good week!
Cheers,
Stu
Hi Piotr,
After reading the --help I was still a little confused about the correct syntax to use. I created a test directory along with some subdirectories (on Arch Linux).
I tried a number of different approaches with exiv2.
The example below seemed like the simplest one I could come up with, and it didn't work. To troubleshoot, I would like to try using the $IN and $OUT variables, but I couldn't get the syntax to work. Can you give an example of what the syntax should be for the case below?
(from the help)
"If the program does not support piping, but requires its input and/or output file path to be specified in the argument list, denote these paths by $IN and $OUT special variables. If $IN is specified in the command string, the file will not be piped to the standard input. If $OUT is specified in the command string, the result will not be read from the standard output, but fclones will set up a named pipe $OUT and read from that pipe instead"
Eg1
(base) ➜ fclone-test ls
dif Hooters-Tokyo-2012-11-09.jpg _IGP4130.JPG _IMG_0007.JPG _IMG_0031.JPG _IMG_0085.JPG _IMG_0301.JPG
'Hooters-Tokyo-2012-11-09 (copy).jpg' '_IGP4130 (copy).JPG' '_IMG_0007 (copy).JPG' '_IMG_0031 (copy).JPG' '_IMG_0085 (copy).JPG' '_IMG_0301 (copy).JPG' same
(base) ➜ fclone-test exiv2 _IGP4130.JPG
File name : _IGP4130.JPG
File size : 10750017 Bytes
MIME type : image/jpeg
Image size : 4928 x 3264
Camera make : PENTAX
Camera model : PENTAX K-5
Image timestamp : 2012:07:29 15:44:14
Image number :
Exposure time : 1/5000 s
Aperture : F3.5
Exposure bias : 0 EV
Flash : No, compulsory
Flash bias :
Focal length : 55.0 mm (35 mm equivalent: 82.0 mm)
Subject distance:
ISO speed : 1600
Exposure mode : Aperture priority
Metering mode : Multi-segment
Macro mode :
Image quality :
Exif Resolution : 4928 x 3264
White balance : Manual
Thumbnail : image/jpeg, 7853 Bytes
Copyright :
Exif comment :
(base) ➜ fclone-test exiv2 _IGP4130\ \(copy\).JPG
File name : _IGP4130 (copy).JPG
File size : 10687393 Bytes
MIME type : image/jpeg
Image size : 4928 x 3264
_IGP4130 (copy).JPG: No Exif data found in the file
(base) ➜ fclone-test fclones . * --transform 'exiv2 -d -a *'
[2020-07-05 15:04:23.331] fclones: info: Scanned 15 file entries
[2020-07-05 15:04:23.331] fclones: info: Found 12 (71.8 MB) files matching selection criteria
[2020-07-05 15:04:23.331] fclones: info: Found 12 (71.8 MB) candidates after grouping by size
[2020-07-05 15:04:23.331] fclones: info: Found 12 (71.8 MB) candidates after pruning hard-links
[2020-07-05 15:04:23.335] fclones: error: Failed to transform /home/stuart/Development/2020/07-July/fclone-test/_IMG_0031 (copy).JPG: exiv2 returned non-zero status code: 255
------------------------- STDERR -------------------------
*: Failed to open the file
Thanks!
Looking at how exiv2 works, I can't find a way to tell it to write a copy of an image without metadata, rather than dropping the metadata and saving the image back to the original file. If it can only modify files "in-place", then it won't work with --transform. On the other hand, if you're OK with losing metadata, there is probably no need to use --transform at all: you can just manually find + exiv2 your files before running fclones.
By default, --transform expects a command that reads data on its standard input and writes to its standard output; this way a temporary on-disk copy can be avoided.
The $IN and $OUT parameters can be used for commands that accept the syntax <command> <input path> <output path>, reading a file from the input path and writing the result to the output path. In this case you run such a command as follows:
fclones --transform '<command> $IN $OUT' <options> <paths>..
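The named-pipe mechanism that the help text describes for $OUT can be sketched in plain shell. This only illustrates the technique under my own assumptions, not fclones' actual code: run_with_fifo is a hypothetical name, the command is run via sh -c with IN and OUT exported as environment variables, and cksum stands in for the reader that hashes the result.

```shell
#!/bin/sh
# run_with_fifo INPUT 'COMMAND STRING'   (hypothetical helper, not part of fclones)
# Runs COMMAND with $IN set to INPUT and $OUT set to a fresh named pipe,
# then reads the transformed data from the pipe and prints its checksum.
# Caveat: if COMMAND never opens $OUT for writing, the reader blocks forever.
run_with_fifo() {
  src=$1
  cmd=$2
  dir=$(mktemp -d) || return 1
  out="$dir/out"
  mkfifo "$out" || { rm -rf "$dir"; return 1; }
  # The writer runs in the background; it blocks until a reader opens the pipe.
  IN=$src OUT=$out sh -c "$cmd" &
  pid=$!
  # The reader consumes the result straight from the pipe, so no transformed
  # copy of the file is ever stored on disk.
  cksum < "$out"
  wait "$pid"
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

For example, `run_with_fifo photo.jpg 'cat "$IN" > "$OUT"'` prints the same value as `cksum < photo.jpg`, since cat changes nothing.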
I managed to do it with the following script:
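#!/bin/bash
# Usage: strip_exif.sh <input path> <output path>
TEMP_FILE="/tmp/fclones.$RANDOM"
# Copy the input to a temporary file so the original is never modified
dd if="$1" of="$TEMP_FILE" bs=64k
# Delete all metadata from the temporary copy, in place
exiv2 -d a "$TEMP_FILE"
# Write the stripped copy to the output path
dd if="$TEMP_FILE" of="$2" bs=64k
rm "$TEMP_FILE"
Save this to strip_exif.sh and then run: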
fclones --transform '/bin/bash strip_exif.sh $IN $OUT' --names '*.JPG' -R ~/Images/Test
Hmm, what about making another --transform-copy option that would do this automatically, without a script?
So then you could just directly call:
fclones --transform-copy 'exiv2 -d a $IN' ...
and $IN would be a copy of the file?
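Until something like that exists, the same effect can be approximated with a generic wrapper that turns any in-place tool into a stdin-to-stdout filter, which is the form --transform expects by default. This is only a sketch: in_place_filter is a hypothetical name, and it uses mktemp rather than $RANDOM so that wrappers running at the same time cannot pick the same temporary file.

```shell
#!/bin/sh
# in_place_filter CMD [ARGS...]   (hypothetical helper, not part of fclones)
# Reads stdin into a temporary copy, runs "CMD ARGS... <copy>" so that CMD
# modifies the copy in place, then writes the modified copy to stdout.
# Anything CMD prints is redirected to stderr to keep stdout clean.
in_place_filter() {
  tmp=$(mktemp) || return 1
  cat > "$tmp" &&
    "$@" "$tmp" >&2 &&
    cat "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Saved in a script that ends with a line such as `in_place_filter exiv2 -d a`, it could be passed to --transform without $IN or $OUT, at the cost of one temporary copy per file.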