Comments (7)
I believe this is the correct result, as PRONOM/DROID/Fido cannot distinguish between these two formats. Your document could be Excel 97 or Excel 2000-2003.
from fido.
I guess that the signature ‘does not’ distinguish between the two formats. Perhaps someone could look at improving the signature for this case?
From: Andy Jackson
Sent: 06 September 2013 15:34
To: openplanets/fido
Subject: Re: [fido] xls format give two results (#37)
I believe this is the correct result, as PRONOM/DROID/Fido cannot distinguish between these two formats. Your document could be Excel 97 or Excel 2000-2003.
Will look into this. If there is a way to improve the signature, I am going to submit it to PRONOM.
According to this relevant StackOverflow question, it looks like being a rather difficult case, similar in nature to the issues around the TIFF tiff. The base file format is the same for all (BIFF8), and in the same way as for TIFF, the only way to determine the particular version of the format is to fully parse the file and look for data that indicates the use of features added by later versions. In some cases, this is manageable via binary signatures (e.g. PNG versions are differentiated by feature usage).
Using a hierarchical format structure can avoid this issue, by having a single BIFF8 format which has two known sub-formats. As PRONOM is not hierarchical, the only options are to move to a new record for BIFF8 and deprecate the others (a la TIFF), or to invest a lot of energy in doing feature-based format distinction within the BIFF8 payload.
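To make the hierarchical option concrete, here is a minimal sketch of how an identifier could collapse two ambiguous matches into their shared parent format. The `Format` class, the `common_ancestor` helper and the `x-example/...` identifiers are all hypothetical and made up for illustration; none of this is PRONOM's data model or Fido's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Format:
    ident: str                          # hypothetical identifier, not a real PUID
    name: str
    parent: Optional["Format"] = None   # None for a top-level format

    def ancestors(self):
        """Return this format followed by its parents, most specific first."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


def common_ancestor(matches):
    """Most specific format shared by every match, or None if there is none."""
    if not matches:
        return None
    shared = matches[0].ancestors()
    for fmt in matches[1:]:
        lineage = set(fmt.ancestors())
        # Keep only the formats that also appear in this match's lineage,
        # preserving the most-specific-first order.
        shared = [f for f in shared if f in lineage]
    return shared[0] if shared else None


# Illustrative hierarchy: one BIFF8 record with two known sub-formats.
BIFF8 = Format("x-example/biff8", "Excel workbook (BIFF8)")
EXCEL_97 = Format("x-example/excel-97", "Excel 97", parent=BIFF8)
EXCEL_2000_2003 = Format("x-example/excel-2000-2003", "Excel 2000-2003", parent=BIFF8)

result = common_ancestor([EXCEL_97, EXCEL_2000_2003])
print(result.name)
```

With a structure like this, a file that matches both sub-formats would be reported once, as BIFF8, instead of producing two competing results.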
Thanks Andy.
Feature-based identification would be beyond the scope of Fido IMHO.
As for PRONOM not being hierarchical, it might be time for a revision of it. That would take time and money, but I think it is well worth it.
Another option of course is to urge institutions that have need for a broader and more actively maintained registry to finally do something about the format registry issue. From what I know about former initiatives regarding GDFR and UDFR, they stalled because of the 'tomato, tomatoe' type of discussions about naming, semantics and who would be responsible.
Back at the DROID 7 consultation event, TNA seemed pretty unlikely to change PRONOM significantly. It's hooked into SDB as well as DROID and I think it's quite expensive to modify. However, that was a while ago now.
FWIW, I feel that the issues behind the failures of UDFR, CRISP and to a lesser extent, Just Solve The Problem are not really about technology or modeling. It simply isn't clear to me that we found the people who will spend time adding the info and then consulting them, prototyping with them, building with them.
I am willing to spend a few hours every week or two on the Archive Team wiki, and I do, because I consider it a part of my job. Who else thinks the same? I don't know.
As for hierarchical format identifiers, we already have them in the form of mime types plus the mime-info specification. Just add a version parameter and you can cross-walk to PRONOM if you wish. Tika supports this approach, is simple enough to extend, and can also be used for feature detection if necessary, so most of my effort goes there right now. Fido is also essentially mime-info based, so it might be possible to patch the hierarchy on top of the PRONOM data, I guess.
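As a rough sketch of the version-parameter idea, the snippet below parses a MIME type with an optional `version` parameter and looks it up in a small cross-walk table. The table is an assumption for illustration only: the PUIDs `fmt/61` and `fmt/62` and the version strings `8` and `8X` are what I understand the PRONOM records for Excel 97 and Excel 2000-2003 to use, and should be checked against PRONOM before relying on them. The `parse_mime` and `puids_for` helpers are hypothetical and are not part of Fido or Tika.

```python
from email.message import Message

# Assumed cross-walk from (MIME type, version parameter) to a PRONOM PUID.
# Verify these values against the PRONOM registry before use.
CROSSWALK = {
    ("application/vnd.ms-excel", "8"): "fmt/61",
    ("application/vnd.ms-excel", "8X"): "fmt/62",
}


def parse_mime(value):
    """Split a MIME string into (base type, version parameter or None)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("version")


def puids_for(value):
    """Return the PUIDs consistent with a MIME string.

    With a version parameter this is at most one PUID; without it, every
    PUID registered under the base type is returned, which is the
    "parent format" answer for an ambiguous file.
    """
    base, version = parse_mime(value)
    if version is not None:
        puid = CROSSWALK.get((base, version))
        return [puid] if puid else []
    return sorted(p for (b, _), p in CROSSWALK.items() if b == base)


print(puids_for("application/vnd.ms-excel; version=8"))
print(puids_for("application/vnd.ms-excel"))
```

Dropping the parameter is what gives the hierarchy: the bare type stands for the whole family, and each versioned type is a child of it.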
Closed due to lack of recent activity.