Comments (17)
I think this has more to do with using TIF as an extension than with using a period. The validation error is coming from https://github.com/MarcusBarnes/mik/blob/master/src/inputvalidators/CsvBooks.php#L140. Coincidentally, I opened a PR (#496) two weeks ago that will let you indicate TIF as a valid file extension.
Just to be sure, can you change the filenames back to using periods as separators but leave the TIF extension (currently invalid until we merge #496) as is and see what happens?
from mik.
Sorry, my logic is wrong. Change the extension to tif and try using periods as separators.
from mik.
OK, a couple of tests:
Separator _, extension tif = success
Separator _, extension TIF = fail
Separator ., extension TIF = fail
Separator ., extension tif = success
So it looks like the uppercase/lowercase extension is the culprit.
But interestingly, it behaves very unpredictably in my set -- because the first record is not .0001.tif but instead .0000.tif.
The presence of this file results in weird behaviours. Typically directory 0 just isn't generated, but it can also cause MIK to skip, say, directory 2, or more directories.
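The extension-case result from the four tests above can be illustrated with a small sketch. This is not MIK's validator and not what PR #496 does; hasAllowedExtension is a hypothetical helper, and the filenames are made up for illustration. It only shows one way a check could accept both tif and TIF by comparing case-insensitively.

```php
<?php
// Hypothetical helper, not part of MIK: compares a filename's extension
// against an allowed list without regard to case.
function hasAllowedExtension(string $filename, array $allowed = ['tif']): bool
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($ext, array_map('strtolower', $allowed), true);
}

// Illustrative filenames only.
var_dump(hasAllowedExtension('1987.0019.0039.0001.tif')); // bool(true)
var_dump(hasAllowedExtension('1987_0019_0039_0001.TIF')); // bool(true)
var_dump(hasAllowedExtension('1987.0019.0039.0001.jpg')); // bool(false)
```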
from mik.
@bondjimbond when you say "can also", are you saying that it behaves differently across runs of MIK with the same input data?
from mik.
when you say "can also", are you saying that it behaves differently across runs of MIK with the same input data?
Yes... I was running with a test directory of just four images (0000.tif through 0003.tif).
The first few runs, it produced directories 1, 2, and 3.
Later runs, just 1 and 3.
Another later run, just 3.
After removing 0000.tif, it consistently produced 1, 2, and 3.
from mik.
Could you zip up all your data and config files and send them to me so I can try to replicate that?
from mik.
Sure, here's my test directory and ini file: https://vault.sfu.ca/index.php/s/ChGaez7NLOygY3w
from mik.
OK, got it, I'll give it a try this evening.
from mik.
Can you send me your mappings file?
from mik.
Ack, of course, sorry
barkerville_mapping.txt
from mik.
@bondjimbond Strangely, I can't replicate the behavior you are seeing. I ran MIK about 10 times and always got the same thing: page objects for pages 1-3 and an error indicating a problem with the 0000 file ([...] added by me):
"message":"mkdir(): File exists" [...] "filename_segments":["1987","0019","0039","0000"],"page_number":""
The problem is coming from https://github.com/MarcusBarnes/mik/blob/master/src/writers/CsvBooks.php#L132-L134: since we trim all leading 0s, a page number of 0000 is reduced to an empty string, so we need something other than 0000 as the page number. I'm not sure a fix to allow 0000 as a page number would be trivial.
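A minimal sketch of why the trimming trips over 0000, using the segments from the error message above. The book directory path is a made-up placeholder, and this only mirrors the trimming step, not the rest of the writer.

```php
<?php
// Segments as reported in the error message for the 0000 file.
$filename_segments = ['1987', '0019', '0039', '0000'];

// Stripping leading zeros from '0000' leaves nothing at all.
$page_number = ltrim(end($filename_segments), '0');
var_dump($page_number); // string(0) ""

// Hypothetical book directory, for illustration only.
$book_level_output_dir = '/tmp/example_book';
$page_level_output_dir = $book_level_output_dir . DIRECTORY_SEPARATOR . $page_number;

// The page path is just the book directory plus a trailing separator.
echo $page_level_output_dir, PHP_EOL;
```

Since that path points at the book directory itself, which already exists by then, this would presumably account for the "mkdir(): File exists" message.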
from mik.
Although checking whether $page_number is an empty string and, if it is, assigning it a value of 0 to create a 0 directory might be a simple fix. But do you want 0 to be the first page number instead of 1?
Something like:
$page_number = ltrim(end($filename_segments), '0');
if (strlen($page_number) === 0) {
    $page_number = '0';
}
$page_level_output_dir = $book_level_output_dir . DIRECTORY_SEPARATOR . $page_number;
mkdir($page_level_output_dir);
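To see what directory names that check would produce, here is a small sketch that wraps the same logic in a function. The name pageDirName is hypothetical and not from MIK, and the sketch leaves out the mkdir() call so it can be run anywhere.

```php
<?php
// Hypothetical wrapper around the snippet above; the name is not from MIK.
function pageDirName(array $filename_segments): string
{
    $page_number = ltrim(end($filename_segments), '0');
    // An all-zero segment trims to '', so fall back to '0'.
    return $page_number === '' ? '0' : $page_number;
}

foreach (['0000', '0001', '0002', '0010'] as $last) {
    echo $last, ' => ', pageDirName(['1987', '0019', '0039', $last]), PHP_EOL;
}
// 0000 => 0
// 0001 => 1
// 0002 => 2
// 0010 => 10
```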
from mik.
Just tried that, it worked:
/tmp/brandon_books/
└── 1987
    ├── 0
    │   ├── MODS.xml
    │   └── OBJ.tif
    ├── 1
    │   ├── MODS.xml
    │   └── OBJ.tif
    ├── 2
    │   ├── MODS.xml
    │   └── OBJ.tif
    ├── 3
    │   ├── MODS.xml
    │   └── OBJ.tif
    └── MODS.xml
MODS.xml for page 0 is:
<titleInfo>
<title>This is a title, page 0</title>
</titleInfo>
</mods>
MODS.xml for page 1 is:
<titleInfo>
<title>This is a title, page 1</title>
</titleInfo>
</mods>
from mik.
That's exactly what I need! :)
from mik.
OK, I can open a PR for this if you want.
from mik.
Please do!
from mik.
I've made the same change to the CsvNewspapers writer and pushed up the issue-498 branch. I'll need to assemble some test data later but once I do that I'll open a PR.
from mik.