Comments (10)
This is caused by FIDO's default buffer size, which is 128 KB.
Your example file seems to have the PS subset header at an offset of ~478 KB, so FIDO never sees this header and skips to the EOF part of the signature.
If you increase the buffer size to, say, 512 KB, FIDO will correctly recognise the file.
Example:
fido.py -bufsize 512000
You might also want to raise the default buffer size by changing the default settings in the code.
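The effect described above can be illustrated with a small sketch. This is not FIDO's own code: `marker_visible` is a hypothetical helper, the sample bytes are synthetic, and the marker string is the `%AI5_FileFormat` comment mentioned later in this thread. It only shows why a header sitting beyond the read buffer cannot be matched.

```python
import io

# Comment string mentioned later in this thread; used here as a stand-in marker.
AI_MARKER = b"%AI5_FileFormat"


def marker_visible(stream, marker, bufsize):
    """Return True if `marker` occurs within the first `bufsize` bytes of `stream`."""
    head = stream.read(bufsize)
    return marker in head


# Synthetic stand-in for the reported file:
# a PDF header, roughly 478 KB of filler, then the AI marker.
sample = b"%PDF-1.5\n" + b"\0" * (478 * 1024) + AI_MARKER + b" 10.0\n"

print(marker_visible(io.BytesIO(sample), AI_MARKER, 128 * 1024))  # False
print(marker_visible(io.BytesIO(sample), AI_MARKER, 512000))      # True
```

With the smaller buffer the marker is never read, so only the PDF header is available to match against.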
from fido.
Interesting. This would be the first example I’ve seen of a file that needs more than the default 128 KB to identify. I wonder if there is a better signature for AI 14? I’ve never looked at the format, but it would be surprising if one actually needed to read 500 KB before knowing that a file really is an AI 14 one.
Cheers,
Adam.
From: Maurice de Rooij [mailto:[email protected]]
Sent: 03 October 2013 10:58
To: openplanets/fido
Subject: Re: [fido] Adobe Illustrator 14 file identified as PDF 1.5, not AI (#41)
Adam Farquhar
Head of Digital Scholarship
Collections Division
T:+44 (0)20 7412 7832
[email protected]
The British Library
London
NW1 2DB
http://www.bl.uk/
The British Library’s latest Annual Report and Accounts
http://www.bl.uk/aboutus/annrep/index.html
http://www.bl.uk/knowledge
http://www.bl.uk/emaildisclaimer.html
from fido.
Indeed interesting.
Unfortunately, Adobe has not published specifications for this format (or maybe I just did not find them...)
After further examination, it looks like the section between the PDF header and the AI subset header consists of
- a JPG thumbnail
- XMP/RDF metadata with audit trail information (saved date, etc)
- XMP metadata with information about swatches, colormodes and fonts
- inline font streams
Based on this, we can assume that the distance in bytes between the PDF header and the AI subset header is highly variable and depends heavily on the presence, number and size of the items listed above.
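Since that distance varies per file, it may help to measure it before choosing a buffer size. The following is a minimal sketch, not part of FIDO: `find_marker_offset` is a hypothetical helper that scans a binary stream in chunks, and the `%AI5_FileFormat` marker and the synthetic sample are assumptions for illustration.

```python
import io


def find_marker_offset(stream, marker=b"%AI5_FileFormat", chunk_size=1 << 16):
    """Return the byte offset of the first occurrence of `marker`, or -1."""
    overlap = len(marker) - 1  # bytes to keep so a match split across chunks is found
    offset = 0                 # file offset of the first byte in `window`
    window = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window += chunk
        pos = window.find(marker)
        if pos != -1:
            return offset + pos
        # Drop everything except the tail that could still begin a match.
        drop = max(len(window) - overlap, 0)
        offset += drop
        window = window[drop:]


# Synthetic stand-in: PDF header, about 478 KB of filler, then the marker.
sample = b"%PDF-1.5\n" + b"\0" * (478 * 1024) + b"%AI5_FileFormat 10.0\n"
print(find_marker_offset(io.BytesIO(sample)))  # 489481
```

For a real file, the stream would come from `open(path, "rb")`; a buffer size larger than the reported offset plus the marker length would then cover the header.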
from fido.
Reopened for discussion
from fido.
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145433, 2012/01/17-15:11:19 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/vnd.adobe.illustrator</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">Looking For Adventure</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>Yogesh Sharma</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:xmpGImg="http://ns.adobe.com/xap/1.0/g/img/">
<xmp:MetadataDate>2012-02-06T17:31:28+05:30</xmp:MetadataDate>
<xmp:ModifyDate>2012-02-06T17:31:28+05:30</xmp:ModifyDate>
<xmp:CreateDate>2012-01-12T16:09:39+05:30</xmp:CreateDate>
<xmp:CreatorTool>Adobe Illustrator CS6 (Macintosh)</xmp:CreatorTool>
from fido.
Heh, adventure indeed 👍
from fido.
Updated the section about read buffers in the FIDO Usage Guide.
from fido.
Would picking the format out of the XMP payload be more reliable than looking for the "%AI5_FileFormat" comment?
from fido.
Possibly, Andy.
I am currently experimenting with this format in CS6, and looking at the XMP payload seems more reliable.
If the XMP payload proves to be more reliable, the advanced signature should be submitted to PRONOM.
Of course it will be added to the extension file for the time being...
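As a rough illustration of the XMP-based idea, the sketch below looks for the `dc:format` element shown in the XMP excerpt earlier in this thread. It is not a FIDO or PRONOM signature: `xmp_dc_format` and the regular expression are hypothetical, and it assumes the XMP packet is stored uncompressed in a single-byte encoding and lies within the bytes being scanned.

```python
import re

# Hypothetical pattern: captures the value of the first <dc:format> element.
DC_FORMAT_RE = re.compile(rb"<dc:format>\s*([^<\s]+)\s*</dc:format>")


def xmp_dc_format(data):
    """Return the dc:format value found in `data` (bytes), or None if absent."""
    match = DC_FORMAT_RE.search(data)
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace")


# Fragment modelled on the XMP excerpt quoted in this thread.
sample = (
    b"%PDF-1.5\n"
    b'<rdf:Description rdf:about=""\n'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
    b"<dc:format>application/vnd.adobe.illustrator</dc:format>\n"
)
print(xmp_dc_format(sample))  # application/vnd.adobe.illustrator
```

Whether this is more reliable than the `%AI5_FileFormat` comment would still need testing against a range of real files.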
from fido.
Closed due to lack of recent activity.
from fido.