xls format give two results (fido #37, closed)

openpreserve commented on September 26, 2024
xls format give two results

from fido.

Comments (7)

anjackson commented on September 26, 2024

I believe this is the correct result, as PRONOM/DROID/Fido cannot distinguish between these two formats. Your document could be Excel 97 or Excel 2000-2003.

adamfarquhar commented on September 26, 2024

I guess that the signature ‘does not’ distinguish between the two formats. Perhaps someone could look at improving the signature for this case?

From: Andy Jackson [mailto:[email protected]]
Sent: 06 September 2013 15:34
To: openplanets/fido
Subject: Re: [fido] xls format give two results (#37)

I believe this is the correct result, as PRONOM/DROID/Fido cannot distinguish between these two formats. Your document could be Excel 97 or Excel 2000-2003.


techmaurice commented on September 26, 2024

Will look into this. If there is a way to improve the signature, I am going to submit it to PRONOM.

anjackson commented on September 26, 2024

According to this relevant StackOverflow question, this looks like a rather difficult case, similar in nature to the issues around the TIFF tiff. The base file format is the same for all (BIFF8), and in the same way as for TIFF, the only way to determine the particular version of the format is to fully parse the file and look for data that indicates the use of features added by later versions. In some cases, this is manageable via binary signatures (e.g. PNG versions are differentiated by feature usage).
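
To make the problem concrete, here is a minimal Python sketch. It assumes only the well-known 8-byte magic number of the OLE2 compound file container that wraps BIFF8 workbooks; the function name and the sample byte strings are invented for illustration, and this is not how Fido or DROID implement matching.

```python
# The 8-byte magic number that opens every OLE2 compound file,
# the container used for BIFF8 .xls workbooks.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the data starts with the OLE2 container magic."""
    return data.startswith(OLE2_MAGIC)


# Hypothetical stand-ins for workbooks saved by two different Excel versions.
# The payloads differ, but the leading bytes a header signature tests do not.
excel_97_like = OLE2_MAGIC + b"payload written by an older Excel"
excel_2003_like = OLE2_MAGIC + b"payload written by a newer Excel"

print(looks_like_ole2(excel_97_like), looks_like_ole2(excel_2003_like))
```

Both samples pass the same check, so a signature of this kind reports both candidate formats; telling them apart would need the feature-level parsing described above.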

Using a hierarchical format structure can avoid this issue, by having a single BIFF8 format which has two known sub-formats. As PRONOM is not hierarchical, the only options are to move to a new record for BIFF8 and deprecate the others (a la TIFF), or to invest a lot of energy in doing feature-based format distinction within the BIFF8 payload.
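
A rough Python sketch of what the hierarchical option could look like, assuming a simple child-to-parent table. The format labels are placeholders invented for this example (they are not PUIDs), and nothing like this exists in PRONOM or Fido today.

```python
from __future__ import annotations

# Hypothetical hierarchy: each format label maps to its parent (None = root).
PARENT: dict[str, str | None] = {
    "excel-97": "biff8",
    "excel-2000-2003": "biff8",
    "biff8": "ole2",
    "ole2": None,
}


def ancestors(fmt: str) -> list[str]:
    """Return fmt followed by its parents, most specific first."""
    chain = []
    current: str | None = fmt
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def most_specific_common(matches: list[str]) -> str | None:
    """Collapse several candidate formats into their closest shared ancestor."""
    if not matches:
        return None
    others = [set(ancestors(m)) for m in matches[1:]]
    for candidate in ancestors(matches[0]):
        if all(candidate in other for other in others):
            return candidate
    return None


print(most_specific_common(["excel-97", "excel-2000-2003"]))
```

With a structure like this, an ambiguous match could be reported once as the shared BIFF8 parent instead of as two separate results.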

techmaurice commented on September 26, 2024

Thanks Andy.

Feature based identification would be beyond the scope of Fido IMHO.
As for PRONOM not being hierarchical, it might be time for a revision of it. That would take time and money, but I think it is well worth it.

Another option, of course, is to urge institutions that need a broader and more actively maintained registry to finally do something about the format registry issue. From what I know about former initiatives such as GDFR and UDFR, they stall because of 'tomato, tomatoe' type discussions about naming, semantics and who would be responsible.

anjackson commented on September 26, 2024

Back at the DROID 7 consultation event, TNA seemed pretty unlikely to change PRONOM significantly. It's hooked into SDB as well as DROID and I think it's quite expensive to modify. However, that was a while ago now.

FWIW, I feel that the issues behind the failures of UDFR, CRISP and to a lesser extent, Just Solve The Problem are not really about technology or modeling. It simply isn't clear to me that we found the people who will spend time adding the info and then consulting them, prototyping with them, building with them.

I am willing to spend a few hours every week or two on the Archive Team wiki, and I do, because I consider it a part of my job. Who else thinks the same? I don't know.

As for hierarchical format identifiers, we already have them in the form of MIME types plus the mime-info specification. Just add a version parameter and you can cross-walk to PRONOM if you wish. Tika supports this approach, is simple enough to extend, and can also be used for feature detection if necessary, so most of my effort goes there right now. Fido is also essentially mime-info based, so it might be possible to patch the hierarchy on top of the PRONOM data, I guess.
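
A minimal Python sketch of the cross-walk idea, assuming a MIME type string with a `version` parameter. The version value, the crosswalk table and the identifiers in it are placeholders invented for this example (they are not real PUIDs), and this is not Tika's or Fido's actual API.

```python
from __future__ import annotations


def parse_mime(value: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype; key=value' into the base type and its parameters."""
    base, *raw_params = [part.strip() for part in value.split(";")]
    params: dict[str, str] = {}
    for raw in raw_params:
        if "=" in raw:
            key, _, val = raw.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


# Hypothetical crosswalk from (MIME type, version) to registry identifiers.
# The identifiers are illustrative placeholders, not real PUIDs.
CROSSWALK: dict[tuple[str, str | None], list[str]] = {
    ("application/vnd.ms-excel", "8"): ["excel-97", "excel-2000-2003"],
}


def to_registry_ids(value: str) -> list[str]:
    """Look up the registry identifiers for a versioned MIME type string."""
    base, params = parse_mime(value)
    return CROSSWALK.get((base, params.get("version")), [])


print(to_registry_ids('application/vnd.ms-excel; version="8"'))
```

The base type plays the role of the parent format and the parameter narrows it, so an ambiguous file can still be given a single versioned type that maps to several registry records.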

carlwilson commented on September 26, 2024

Closed due to lack of recent activity.
