Comments (10)

techmaurice commented on September 27, 2024

This is caused by FIDO's default buffer size, which is 128 kb.

Your example file has the PS subset header at an offset of ~478 kb, so FIDO never sees this header and skips to the EOF part of the signature.

If you increase the buffer size to, say, 512 kb, FIDO will correctly recognise the file.

Example:
fido.py -bufsize 512000

You also might want to increase the default buffersize by changing the default settings in the code.
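
To illustrate why the buffer size matters, here is a minimal sketch, not FIDO's actual matching code. It assumes a matcher that only inspects the first `bufsize` bytes of a file, and it uses a synthetic byte string as a stand-in for the problem file. The helper name `marker_in_head` is hypothetical; the marker is the "%AI5_FileFormat" comment mentioned in this thread.

```python
import io

# Marker taken from this thread; the rest of the sample is synthetic.
AI_MARKER = b"%AI5_FileFormat"


def marker_in_head(stream, bufsize):
    """Return True if AI_MARKER occurs within the first `bufsize` bytes."""
    head = stream.read(bufsize)
    return AI_MARKER in head


# Stand-in for the example file: a PDF header, ~478 kb of filler,
# and only then the AI marker.
sample = b"%PDF-1.5\n" + b"\0" * (478 * 1024) + AI_MARKER + b"\n"

small = marker_in_head(io.BytesIO(sample), 128 * 1024)  # default-sized buffer
large = marker_in_head(io.BytesIO(sample), 512000)      # enlarged buffer

print("128 kb buffer sees marker:", small)
print("512000 byte buffer sees marker:", large)
```

With the smaller buffer the marker lies beyond the bytes that were read, so only the PDF header is visible to the matcher.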

from fido.

adamfarquhar commented on September 27, 2024

Interesting. This would be the first example that I’ve seen of a file that needs more than the default 128kb to identify. I wonder if there is a better signature for AI 14? I’ve never looked at the format, but it would be surprising if one actually needed to look at 500kb before knowing a file really is an AI 14 one.

Cheers,

Adam.

From: Maurice de Rooij [mailto:[email protected]]
Sent: 03 October 2013 10:58
To: openplanets/fido
Subject: Re: [fido] Adobe Illustrator 14 file identified as PDF 1.5, not AI (#41)

Adam Farquhar
Head of Digital Scholarship
Collections Division
T:+44 (0)20 7412 7832

[email protected]
The British Library
London

NW1 2DB

http://www.bl.uk/
The British Library’s latest Annual Report and Accounts

http://www.bl.uk/aboutus/annrep/index.html
http://www.bl.uk/knowledge

http://www.bl.uk/emaildisclaimer.html

techmaurice commented on September 27, 2024

Indeed interesting.

Unfortunately, Adobe has not published specifications for this format (or maybe I just did not find them).

After further examination, it looks like the section between the PDF header and the AI subset header consists of:

  • a JPG thumbnail
  • XMP/RDF metadata with audit trail information (saved date, etc)
  • XMP metadata with information about swatches, colormodes and fonts
  • inline font streams

Based on this, we can assume that the byte distance between the PDF header and the AI subset header varies widely and depends heavily on whether these items are present and on their number and size.
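
To measure how far into a given file the AI header actually sits, something like the following sketch could be used. It is an illustration only and not part of FIDO: the function name `find_marker_offset` and the chunk size are assumptions, and the marker is the "%AI5_FileFormat" comment mentioned in this thread. The file is scanned in chunks so that large files need not be loaded into memory.

```python
import io


def find_marker_offset(stream, marker, chunk_size=64 * 1024):
    """Return the byte offset of the first `marker` in `stream`, or -1."""
    overlap = len(marker) - 1
    tail = b""      # end of the previous window, kept to catch split markers
    consumed = 0    # number of bytes read before the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        idx = window.find(marker)
        if idx != -1:
            # The window starts len(tail) bytes before the current chunk.
            return consumed - len(tail) + idx
        tail = window[-overlap:] if overlap else b""
        consumed += len(chunk)


# Synthetic example: PDF header, filler, then the marker.
data = b"%PDF-1.5\n" + b"x" * 200000 + b"%AI5_FileFormat"
offset = find_marker_offset(io.BytesIO(data), b"%AI5_FileFormat")
print(offset)
```

For a real file, the stream would come from `open(path, "rb")`. Running this over a sample set would show how much the offset varies in practice.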

techmaurice commented on September 27, 2024

Reopened for discussion

adamfarquhar commented on September 27, 2024

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145433, 2012/01/17-15:11:19 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:format>application/vnd.adobe.illustrator</dc:format>
     <dc:title>
        <rdf:Alt>
           <rdf:li xml:lang="x-default">Looking For Adventure</rdf:li>
        </rdf:Alt>
     </dc:title>
     <dc:creator>
        <rdf:Seq>
           <rdf:li>Yogesh Sharma</rdf:li>
        </rdf:Seq>
     </dc:creator>
  </rdf:Description>
  <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmlns:xmpGImg="http://ns.adobe.com/xap/1.0/g/img/">
     <xmp:MetadataDate>2012-02-06T17:31:28+05:30</xmp:MetadataDate>
     <xmp:ModifyDate>2012-02-06T17:31:28+05:30</xmp:ModifyDate>
     <xmp:CreateDate>2012-01-12T16:09:39+05:30</xmp:CreateDate>
     <xmp:CreatorTool>Adobe Illustrator CS6 (Macintosh)</xmp:CreatorTool>

techmaurice commented on September 27, 2024

Heh, adventure indeed 👍

techmaurice commented on September 27, 2024

Updated the section about read buffers in the FIDO Usage Guide.

anjackson commented on September 27, 2024

Would picking the format out of the XMP payload be more reliable than looking for the "%AI5_FileFormat" comment?

techmaurice commented on September 27, 2024

Possibly, Andy.
I am currently experimenting with this format in CS6, and looking at the XMP payload does seem more reliable.

If the XMP payload proves to be more reliable, the advanced signature should be submitted to PRONOM.
In the meantime it will be added to the extension file.
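
As a rough sketch of what an XMP-based check could look like (an illustration, not a PRONOM or FIDO signature): the code below searches raw bytes for a `dc:format` element and compares its value with the MIME type seen in the XMP excerpt posted earlier in this thread. It assumes the XMP packet is stored uncompressed and uses the `dc` prefix as in that excerpt; the function names are hypothetical.

```python
import re

# Matches <dc:format>value</dc:format>, allowing whitespace around the value.
DC_FORMAT_RE = re.compile(rb"<dc:format>\s*([^<\s]+)\s*</dc:format>")

AI_MIME = "application/vnd.adobe.illustrator"


def xmp_dc_format(data):
    """Return the first dc:format value found in `data`, or None."""
    match = DC_FORMAT_RE.search(data)
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace")


def looks_like_illustrator(data):
    """True if the XMP dc:format value is the Illustrator MIME type."""
    return xmp_dc_format(data) == AI_MIME


# Fragment of the XMP excerpt quoted in this thread.
xmp = (
    b'<rdf:Description rdf:about=""\n'
    b'      xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
    b"   <dc:format>application/vnd.adobe.illustrator</dc:format>\n"
)

print(xmp_dc_format(xmp))
print(looks_like_illustrator(xmp))
```

Note that the XMP packet can itself sit beyond the default read buffer, so a check like this would still depend on the buffer size discussed above.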

carlwilson commented on September 27, 2024

Closed due to lack of recent activity.
