drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

License: Apache License 2.0

CSS 0.58% Java 99.41% DIGITAL Command Language 0.01%
java exif iptc xmp metadata icc jpeg webp quicktime mp4

metadata-extractor's Introduction

metadata-extractor is a Java library for reading metadata from media files.

Installation

The easiest way is to install the library via its Maven package.

<dependency>
  <groupId>com.drewnoakes</groupId>
  <artifactId>metadata-extractor</artifactId>
  <version>2.19.0</version>
</dependency>

Alternatively, download it from the releases page.

Usage

Metadata metadata = ImageMetadataReader.readMetadata(imagePath);

With that Metadata instance, you can iterate or query the various tag values that were read from the image.

Features

The library understands several formats of metadata, including Exif, IPTC, XMP and ICC, many of which may be present in a single image.

It will process files of type:

  • JPEG
  • TIFF
  • WebP
  • WAV
  • AVI
  • PSD
  • PNG
  • BMP
  • GIF
  • HEIF (HEIC & AVIF)
  • ICO
  • PCX
  • QuickTime
  • MP4
  • Camera Raw
    • NEF (Nikon)
    • CR2 (Canon)
    • ORF (Olympus)
    • ARW (Sony)
    • RW2 (Panasonic)
    • RWL (Leica)
    • SRW (Samsung)

Camera-specific "makernote" data is decoded for cameras manufactured by:

  • Agfa
  • Apple
  • Canon
  • Casio
  • Epson
  • Fujifilm
  • Kodak
  • Kyocera
  • Leica
  • Minolta
  • Nikon
  • Olympus
  • Panasonic
  • Pentax
  • Reconyx
  • Sanyo
  • Sigma/Foveon
  • Sony

Read getting started for an introduction to the basics of using this library.

Questions & Feedback

The quickest way to have your questions answered is via Stack Overflow. Check whether your question has already been asked, and if not, ask a new one tagged with both metadata-extractor and java.

Bugs and feature requests should be provided via the project's issue tracker. Please attach sample images where possible as most issues cannot be investigated without an image.

Contributing

If you want to get your hands dirty, making a pull request is a great way to enhance the library. In general it's best to create an issue first that captures the problem you want to address. You can discuss your proposed solution in that issue. This gives others a chance to provide feedback before you spend your valuable time working on it.

An easier way to help is to contribute to the sample image file library used for research and testing.

Credits

This library is developed by Drew Noakes.

Thanks are due to the many users who sent in suggestions, bug reports, sample images from their cameras as well as encouragement. Wherever possible, they have been credited in the source code and commit logs.

metadata-extractor's People

Contributors

amerson8s, audynamo, baelec, cowwoc, danielsz, drewnoakes, drmorr0, intellidevpeep, jibee, jonasvoelcker, kerenby, kwhopper, laughmetal, lzaruba, nadahar, nagix, normana10, palantir0, payton, perballing, rcketscientist, ricardobochnia, saurabheights, sergiusthebest, sksamuel, skyfish1, stefanoltmann, taher-ghaleb, theefer, tsmock

metadata-extractor's Issues

Don't give access to non-public final references to mutable objects

Example from Directory:

protected final Collection<Tag> _definedTagList = new ArrayList<Tag>();

 public Collection<Tag> getTags()
 {
        return _definedTagList;
 }

Now look at the following code snippet:

 System.out.println(directory.getTagCount());
 directory.getTags().removeAll(directory.getTags());
 System.out.println(directory.getTagCount());

Example output:
21
0

The fix would be to return a copy, i.e. return new ArrayList<Tag>(_definedTagList), because clients should not be able to modify _definedTagList: it is part of the implementation rather than the API.

Exposing non-public final references to mutable objects is never a good idea. For an in-depth discussion of this topic, I recommend chapter 4 of "Effective Java" by Joshua Bloch.
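
To make the two usual remedies concrete, here is a minimal sketch. TagHolder is a hypothetical stand-in for Directory (it stores plain strings rather than Tag objects), and the method names getTagsCopy and getTagsView are made up for illustration; this is not the library's actual code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Directory; tags are plain strings here.
final class TagHolder {
    private final List<String> definedTags = new ArrayList<>();

    void addTag(String name) {
        definedTags.add(name);
    }

    int getTagCount() {
        return definedTags.size();
    }

    // Option 1: return an independent copy that callers may modify freely.
    Collection<String> getTagsCopy() {
        return new ArrayList<>(definedTags);
    }

    // Option 2: return a read-only view. No copy is made, and any attempt
    // to modify it throws UnsupportedOperationException.
    Collection<String> getTagsView() {
        return Collections.unmodifiableList(definedTags);
    }
}

final class DefensiveCopyDemo {
    public static void main(String[] args) {
        TagHolder holder = new TagHolder();
        holder.addTag("Make");
        holder.addTag("Model");

        Collection<String> copy = holder.getTagsCopy();
        copy.clear(); // empties only the copy
        System.out.println(holder.getTagCount()); // prints 2
    }
}
```

The copy costs an allocation per call, while the read-only view is cheaper but still reflects later changes made by the owner. Which trade-off fits depends on how callers use the result.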

Deploy to Maven Central

Maven is the de facto dependency management system.

Either get builds into Maven Central, or use an alternative repository such as Sonatype.

Some old versions are already in Maven Central.

(migrated from Google Code)

Review whether the bit masks are needed in SequentialReader and RandomAccessReader

The getUIntX and getIntX methods can be written in a more elegant way. If you look closely, you will see that even the second version below can be improved further, but I don't have time for that now. I may do it tomorrow or on Monday.

        //u16
        (getByte() & 0xFF ) << 8 | getByte() & 0xFF; 
        //s16
        getByte() << 8 | getByte() & 0xFF; 
        //u32
        (getByte() & 0xFFL) << 24 | (getByte() & 0xFFL) << 16 | (getByte() & 0xFFL) << 8 | getByte() & 0xFFL
        //s32
        getByte() << 24 | (getByte() & 0xFFL) << 16 | (getByte() & 0xFFL) << 8 | getByte() & 0xFFL

Or even better:

private static long getUnsignedByte() 
{
    return  getByte() & 0xFFL;
}

        //u16
        getUnsignedByte()  << 8 | getUnsignedByte() ; 
        //s16
        getByte() << 8 | getUnsignedByte() ;
        //u32
        getUnsignedByte()  << 24 | getUnsignedByte() << 16 | getUnsignedByte()  << 8 | getUnsignedByte() 
        //s32
        getByte()  << 24 | getUnsignedByte() << 16 | getUnsignedByte()  << 8 | getUnsignedByte() 
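
On the question in the title: Java widens a byte with sign extension, so every byte except the most significant byte of a signed value does need the mask. The following is a self-contained sketch of the idea over a plain byte array. The class BigEndianBytes and its method names are assumptions for illustration and do not mirror SequentialReader or RandomAccessReader.

```java
// Hypothetical big-endian reader over a byte array.
final class BigEndianBytes {
    private final byte[] data;
    private int pos;

    BigEndianBytes(byte[] data) {
        this.data = data;
    }

    // Next raw byte; Java sign-extends it whenever it is widened.
    private byte getByte() {
        return data[pos++];
    }

    // Next byte as a value in the range 0..255.
    private int getUnsignedByte() {
        return getByte() & 0xFF;
    }

    int getUInt16() {
        return getUnsignedByte() << 8 | getUnsignedByte();
    }

    // The (short) cast restores the sign of the 16-bit value.
    short getInt16() {
        return (short) (getUnsignedByte() << 8 | getUnsignedByte());
    }

    int getInt32() {
        return getUnsignedByte() << 24 | getUnsignedByte() << 16
             | getUnsignedByte() << 8 | getUnsignedByte();
    }

    // Mask with a long constant so values above Integer.MAX_VALUE stay positive.
    long getUInt32() {
        return getInt32() & 0xFFFFFFFFL;
    }
}

final class BigEndianBytesDemo {
    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFE, (byte) 0xFF, (byte) 0xFE};
        BigEndianBytes reader = new BigEndianBytes(bytes);
        System.out.println(reader.getUInt16()); // prints 65534
        System.out.println(reader.getInt16());  // prints -2
    }
}
```

Java evaluates operands left to right, so the bytes are consumed in stream order even though all reads appear in one expression.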

Unable to read GIF files

I have been testing out metadata-extractor for the past few hours. I have been able to get it to read all file types except for GIF. As soon as it hits that type of file, I get the following stack trace:

Caused by: com.drew.imaging.ImageProcessingException: File format is not supported
at com.drew.imaging.ImageMetadataReader.readMetadata(Unknown Source)
at com.drew.imaging.ImageMetadataReader.readMetadata(Unknown Source)
at Find$Finder.find(Find.java:81)
... 10 more

I am using version 2.6.4. Is this a known issue?

File format is not supported [png special case]

This one is a bit tricky. Most applications detect the sample image as a valid PNG file. However, metadata-extractor does not recognize it as PNG.

Original:
facebook

Modified version with metadata:
facebook1

Certain classes should override toString()

Overriding the toString() method from Object should be done where it seems appropriate. An example of a missing toString() method is the Metadata class.

Excerpt from the toString() javadoc: "The result should be a concise but informative representation that is easy for a person to read."

Review raised exceptions

Currently ImageProcessingException covers a wide range of exceptional circumstances. Review these and determine whether subclassing this exception would make sense.

For example UnsupportedImageFormatException.

(Adapted from Google Code issue 91)
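
As a rough sketch of what subclassing could look like: the ImageProcessingException below is a local stand-in declared only so the example compiles on its own, and the constructor shapes and the detectFormat helper are assumptions rather than the library's API.

```java
// Local stand-in for com.drew.imaging.ImageProcessingException.
class ImageProcessingException extends Exception {
    ImageProcessingException(String message) {
        super(message);
    }
}

// The subclass named in the issue; its constructor is an assumption.
class UnsupportedImageFormatException extends ImageProcessingException {
    UnsupportedImageFormatException(String message) {
        super(message);
    }
}

final class ExceptionHierarchyDemo {
    // Hypothetical detector that only recognizes the JPEG start-of-image marker (FF D8).
    static String detectFormat(byte[] header) throws ImageProcessingException {
        if (header.length < 2) {
            throw new ImageProcessingException("File is too short to identify");
        }
        if ((header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8) {
            return "JPEG";
        }
        throw new UnsupportedImageFormatException("File format is not supported");
    }

    public static void main(String[] args) {
        try {
            detectFormat(new byte[] {'G', 'I'});
        } catch (UnsupportedImageFormatException e) {
            // Callers can now skip unsupported files without parsing messages.
            System.out.println("Skipping unsupported file: " + e.getMessage());
        } catch (ImageProcessingException e) {
            System.out.println("Processing failed: " + e.getMessage());
        }
    }
}
```

Because the subclass still extends ImageProcessingException, existing catch blocks keep working unchanged.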

Possible OutOfMemoryError when reading certain large TIFF files

TIFF files can store huge data buffers in tags which TiffReader happily loads into memory.

For example:

[Exif IFD0] Unknown tag (0x935c) = [443064764 bytes]

This can cause an error such as:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
  at com.drew.lang.RandomAccessFileReader.getBytes(Unknown Source)
  at com.drew.metadata.exif.ExifReader.processTag(Unknown Source)
  at com.drew.metadata.exif.ExifReader.processDirectory(Unknown Source)
  at com.drew.metadata.exif.ExifReader.extractIFD(Unknown Source)
  at com.drew.metadata.exif.ExifReader.extractTiff(Unknown Source)
  at com.drew.imaging.tiff.TiffMetadataReader.readMetadata(Unknown Source)
  at com.drew.imaging.ImageMetadataReader.readMetadata(Unknown Source)
  at com.drew.imaging.ImageMetadataReader.readMetadata(Unknown Source)

It'd be sensible to allow specifying a maximum tag size, or perhaps a list of tags to ignore, or maybe something else.

(Migrated from a Google Code issue)
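
One way the "maximum tag size" idea could work is sketched below. BoundedTagReader, its constructor parameter and the readTag signature are all hypothetical; the sketch reads from a DataInputStream rather than the library's own reader classes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical reader that refuses to buffer tag payloads above a limit.
final class BoundedTagReader {
    private final int maxTagBytes;

    BoundedTagReader(int maxTagBytes) {
        this.maxTagBytes = maxTagBytes;
    }

    // Returns the payload, or null when the declared length exceeds the limit.
    // Oversized payloads are skipped so the stream stays positioned correctly.
    byte[] readTag(DataInputStream in, int declaredLength) throws IOException {
        if (declaredLength < 0) {
            throw new IOException("Negative tag length: " + declaredLength);
        }
        if (declaredLength > maxTagBytes) {
            skipFully(in, declaredLength);
            return null;
        }
        byte[] payload = new byte[declaredLength];
        in.readFully(payload);
        return payload;
    }

    private static void skipFully(DataInputStream in, int count) throws IOException {
        int remaining = count;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException("Stream ended while skipping a tag");
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        BoundedTagReader reader = new BoundedTagReader(4);
        System.out.println(reader.readTag(in, 6) == null); // prints true
        System.out.println(reader.readTag(in, 4).length);  // prints 4
    }
}
```

Returning null keeps the sketch short; a real design might instead record the tag with a placeholder value so that callers can still see it was present.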

IPTC character encoding

IPTC character encoding is assumed to match the system default (via system property file.encoding), which is incorrect.

It may be possible to use the IPTC CodedCharacterSet tag to determine the encoding. Otherwise the user should be able to specify an encoding at read time.

There is quite some discussion about this problem at Google Code issue 38, from which this was migrated.
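
A minimal sketch of both suggestions follows. It assumes the raw CodedCharacterSet bytes are available to the caller, and it only recognizes the ISO 2022 escape sequence ESC % G (bytes 1B 25 47), which announces UTF-8; any other value falls back to a charset supplied by the caller. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for choosing the charset of IPTC string values.
final class IptcCharsetSketch {
    // ESC % G: the escape sequence that announces UTF-8.
    private static final byte[] UTF8_ESCAPE = {0x1B, 0x25, 0x47};

    // Uses the CodedCharacterSet value when it is recognized,
    // otherwise the charset chosen by the caller at read time.
    static Charset chooseCharset(byte[] codedCharacterSet, Charset callerFallback) {
        if (Arrays.equals(codedCharacterSet, UTF8_ESCAPE)) {
            return StandardCharsets.UTF_8;
        }
        return callerFallback;
    }

    static String decode(byte[] raw, byte[] codedCharacterSet, Charset callerFallback) {
        return new String(raw, chooseCharset(codedCharacterSet, callerFallback));
    }

    public static void main(String[] args) {
        byte[] raw = "Z\u00fcrich".getBytes(StandardCharsets.UTF_8);
        // Decoded as UTF-8 because the escape sequence is present.
        System.out.println(decode(raw, UTF8_ESCAPE, StandardCharsets.ISO_8859_1));
        // Without the tag, the fallback is used and the umlaut is garbled.
        System.out.println(decode(raw, null, StandardCharsets.ISO_8859_1));
    }
}
```

Neither path consults the file.encoding system property, so the result no longer depends on the machine running the code.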

Malformed Javadoc comments

The JDK 8 Javadoc tool has a new feature called DocLint. DocLint can check for malformed Javadoc comments and is enabled by default. We should use DocLint to correct the existing Javadoc comments.

Roadmap for 2.8 – Suggestions are welcome!

If you as a user want something to be done in the next release, feel free to post below.

@drewnoakes
If I recall correctly, you wanted to release 2.8 in January. I would prefer not to release before 11.1. As I already mentioned, I think that for 2.8 we should focus on supporting more formats and on improving support for those already supported, e.g. GIF and PNG.

Support Canon CRW camera RAW format

The older CRW file format (superseded by CR2) is treated as a TIFF file, however it does not meet the library's expectations of TIFF files.

When TiffMetadataReader attempts to verify them, it falls over with ExifReader expecting 49492a but getting 49491a due to the difference in file format.

CRW data is stored as CIFF (Camera Image File Format) which is similar to TIFF, but differs.

Specifications at:

http://www.sno.phy.queensu.ca/~phil/exiftool/canon_raw.html

(Migrated from Google Code issue 42)
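
The byte difference quoted above can be detected before handing a file to the TIFF reader. The sketch below only looks at the leading header bytes: 49 49 2A 00 and 4D 4D 00 2A are the little-endian and big-endian TIFF headers from the TIFF specification, and 49 49 1A is the prefix reported in this issue. HeaderSniffer and its Kind enum are hypothetical names, and a real CIFF check would need to validate more of the header.

```java
// Hypothetical header check that separates TIFF from the CRW prefix seen in this issue.
final class HeaderSniffer {
    enum Kind { TIFF, CIFF_CANDIDATE, UNKNOWN }

    static Kind sniff(byte[] header) {
        if (header == null || header.length < 4) {
            return Kind.UNKNOWN;
        }
        int b0 = header[0] & 0xFF;
        int b1 = header[1] & 0xFF;
        int b2 = header[2] & 0xFF;
        int b3 = header[3] & 0xFF;

        // "II" followed by 42 in little-endian order.
        if (b0 == 0x49 && b1 == 0x49 && b2 == 0x2A && b3 == 0x00) {
            return Kind.TIFF;
        }
        // "MM" followed by 42 in big-endian order.
        if (b0 == 0x4D && b1 == 0x4D && b2 == 0x00 && b3 == 0x2A) {
            return Kind.TIFF;
        }
        // "II" followed by 0x1A, the prefix reported for CRW files.
        if (b0 == 0x49 && b1 == 0x49 && b2 == 0x1A) {
            return Kind.CIFF_CANDIDATE;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {0x49, 0x49, 0x2A, 0x00})); // prints TIFF
        System.out.println(sniff(new byte[] {0x49, 0x49, 0x1A, 0x00})); // prints CIFF_CANDIDATE
    }
}
```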

Be more permissive when encountering invalid TIFF format codes

When an invalid TIFF format code is observed, earlier versions of the library would attempt to continue processing TIFF data. However, this sometimes caused random bytes to be interpreted as meaningful data, producing random and misleading output. So in version 2.6.4, an invalid TIFF format code was considered a significant enough indication that processing should halt.

At the time this didn't adversely affect any of the images in the database.

However, one was found and reported in Google Code issue 94.

ExifTool is able to process this file successfully.

The task here is to determine whether there is a safe way to allow processing to continue in the face of such an error. In general, sticking to the spec is useful and defensible, but in practice it can inconvenience some users.

Support Sony ARW camera RAW format

Sony cameras such as the NEX-7 produce ARW files. metadata-extractor parses the TIFF successfully, but the tags are unknown or even incorrectly presented.

Note that ExifTool can process these files.

(Migrated from Google Code issue 35)

MetadataReader Interface rework

The MetadataReader interface is only used by PsdReader and not by any of the classes which actually contain MetadataReader in their name. Thus the MetadataReader interface requires a rework.

Observed multiple instances of PNG chunk 'mkBT', for which multiples are not allowed

It seems that multiple instances of chunk 'mkBT' in PNG files are not that uncommon. I got this particular message for many PNG files. I also got some for other chunks, but they were very rare compared to 'mkBT'.

We should investigate whether multiple instances of 'mkBT' always indicate an error. We would also need a way to deal with them. ExifTool can read the metadata without any error.

Sample:
hammer-icon

Improve Makernote support

There are a lot of different makernote tags out there. Many are documented online:

Hand-coding classes for all of these formats may not be the best approach. Some analysis could be done to see whether this could be data-driven (say from an XML file, for example), either at runtime or design time via codegen.

Adding support for these makernotes is quite easy, and a great place to get started if you want to contribute to this library.

(Migrated from Google Code issue 8)

Support Panasonic RW2 camera RAW format

Panasonic cameras such as the Lumix DMC-GF1 and GF3 produce RW2 files. metadata-extractor parses the TIFF successfully, but the tags are unknown or even incorrectly presented.

Note that ExifTool can process these files.

(Migrated from Google Code issue 35)

Produce values derived from one or more tags

There are many cases where answering a question about an image may involve reading multiple different tags, possibly from different directories.

Dealing with redundancy

Examples:

  • Image width (and likewise height) may be obtained from the JpegDirectory and ExifIFD0Directory
  • There are often multiple ways to obtain exposure time
  • XMP duplicates a lot of existing tags

Devise a strategy that sits on top of the directories and tags for extracting certain commonly used values according to well tested heuristics. One challenge here is that tags may not agree and it may be unclear which to trust.

(Migrated from Google Code issue 26)

Grouping values

Sometimes multiple tags should be combined to produce one logical 'value':

  • GPS lat / lng
  • Date & time values (i.e. in IPTC data)
  • Aspect ratio (#494)
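
Taking the GPS case as an example: Exif stores a coordinate as degrees, minutes and seconds plus a reference letter (N, S, E or W), and most callers want a single signed decimal value. The sketch below shows that combination with plain doubles; GpsGrouping and toDecimalDegrees are hypothetical names, not part of the library.

```java
// Hypothetical helper that groups the parts of a GPS coordinate into one value.
final class GpsGrouping {
    // Combines degrees, minutes and seconds into decimal degrees.
    // Southern latitudes and western longitudes are returned as negative values.
    static double toDecimalDegrees(double degrees, double minutes, double seconds, String ref) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        boolean negative = "S".equalsIgnoreCase(ref) || "W".equalsIgnoreCase(ref);
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        double latitude = toDecimalDegrees(51, 30, 0, "N");
        double longitude = toDecimalDegrees(0, 7, 30, "W");
        System.out.println(latitude + ", " + longitude);
    }
}
```

A fuller version would also need to decide what to return when one of the contributing tags is missing.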

Specify source code encoding in ant build

Building in some environments can fail when trying to map UTF-8 characters to ASCII:

[javac] /path/Source/com/drew/lang/GeoLocation.java:81: error: unmappable character for encoding ASCII
[javac]         return dms[0] + "?? " + dms[1] + "' " + dms[2] + '"';
[javac]                          ^

@rosset.filipe suggests adding the following to all javac tasks of build.xml:

encoding="UTF-8"

(Migrated from Google Code issue 90)

Release date for 2.7

I would suggest the following release cycle:
1st week of December 2.7 release candidate
2nd week of December 2.7 official release

Open issues which should be resolved before release: 1, 3-9, 12, 36, 38

What do you think?

Support writing metadata

Currently metadata-extractor provides a read-only view onto the metadata within files.

Several use cases would benefit from or require the ability to write data back to files, such as comments, GPS location, image orientation, image size...

The implementation of this feature is non-trivial. Not all types of metadata can or should be modified, and of course there is a high cost associated with bugs that occur when people are overwriting their files, should images be lost.

Given that the library supports many types of metadata, it's more realistic to roll out support for writing different types of metadata incrementally. The first type to be attempted should probably be Exif and the first container type would likely be JPEG.

(Migrated from Google Code issue 66)

Additional tags for ExifSubIFDDirectory

New tags for ExifSubIFDDirectory:

public static final int TAG_RELATED_IMAGE_FILE_FORMAT = 0x1000; 
public static final int TAG_RELATED_IMAGE_WIDTH = 0x1001;
public static final int TAG_RELATED_IMAGE_LENGTH = 0x1002;
public static final int TAG_TRANSFER_RANGE = 0x0156 ;
public static final int TAG_JPEG_PROC = 0x0200;
public static final int TAG_MAKER_NOTE = 0x927C;
public static final int TAG_INTEROPERABILITY_OFFSET = 0xA005;

_tagNameMap.put(TAG_RELATED_IMAGE_FILE_FORMAT, "Related Image File Format"); 
_tagNameMap.put(TAG_RELATED_IMAGE_WIDTH, "Related Image Width");
_tagNameMap.put(TAG_RELATED_IMAGE_LENGTH, "Related Image Length");
_tagNameMap.put(TAG_TRANSFER_RANGE, "Transfer Range");
_tagNameMap.put(TAG_JPEG_PROC, "JPEG Proc");
_tagNameMap.put(TAG_COMPRESSED_AVERAGE_BITS_PER_PIXEL, "Compressed Bits Per Pixel");
_tagNameMap.put(TAG_MAKER_NOTE, "Maker Note");
_tagNameMap.put(TAG_INTEROPERABILITY_OFFSET, "Interoperability Offset");

(Migrated from a Google Code issue)

Review shared tags between various Exif directories

The ExifIFD0Directory, ExifSubIFDDirectory and ExifIFD1Directory classes share some common tags. These were once merged into a single directory, however there would be conflicts between values from, for example, the image and its thumbnail.

These directories have now been split as described above, but I'm not convinced the code here is quite right. The Exif spec needs a thorough read and the code a review.

(migrated from a Google Code issue)

Additional tags for ExifIFD0Directory

New tags for ExifIFD0Directory:

public static final int TAG_NEW_SUBFILE_TYPE = 0x00fe; 
public static final int TAG_IMAGE_WIDTH = 0x0100;
public static final int TAG_IMAGE_HEIGHT = 0x0101; 
public static final int TAG_BITS_PER_SAMPLE = 0x0102; 
public static final int TAG_COMPRESSION = 0x0103; 
public static final int TAG_PHOTOMETRIC_INTERPRETATION = 0x0106; 

public static final int TAG_SAMPLES_PER_PIXEL = 0x0115;
public static final int TAG_ROWS_PER_STRIP = 0x0116;
public static final int TAG_STRIP_BYTE_COUNTS = 0x0117;
public static final int TAG_STRIP_OFFSETS = 0x0111;

public static final int TAG_PLANAR_CONFIGURATION = 0x011C;
public static final int TAG_SUB_IFDS = 0x014A;

public static final int TAG_DATE_TIME_ORIGINAL = 0x9003;
public static final int TAG_TIFF_EP_STANDARD_ID = 0x9216;

_tagNameMap.put(TAG_NEW_SUBFILE_TYPE, "New Subfile Type");
_tagNameMap.put(TAG_IMAGE_WIDTH, "Image Width");
_tagNameMap.put(TAG_IMAGE_HEIGHT, "Image Height");
_tagNameMap.put(TAG_BITS_PER_SAMPLE, "Bits Per Sample");
_tagNameMap.put(TAG_COMPRESSION, "Compression");
_tagNameMap.put(TAG_PHOTOMETRIC_INTERPRETATION, "Photometric Interpretation");

_tagNameMap.put(TAG_SAMPLES_PER_PIXEL, "Samples Per Pixel");
_tagNameMap.put(TAG_ROWS_PER_STRIP, "Rows Per Strip");
_tagNameMap.put(TAG_STRIP_BYTE_COUNTS, "Strip Byte Counts");
_tagNameMap.put(TAG_STRIP_OFFSETS, "Strip Offsets");

_tagNameMap.put(TAG_PLANAR_CONFIGURATION, "Planar Configuration");
_tagNameMap.put(TAG_SUB_IFDS, "Sub IFDs");

_tagNameMap.put(TAG_DATE_TIME_ORIGINAL, "Date Time Original");
_tagNameMap.put(TAG_TIFF_EP_STANDARD_ID, "Tiff EP Standard ID");

(Migrated from a Google Code issue)

Buffer over/underflows in PhotoshopReader

[Photoshop] Number of requested bytes cannot be negative
[Photoshop] Attempt to read from beyond end of underlying data source

These two errors are really common and should not happen that often. This issue may be related to other existing issues.

Include version information in manifest

The library should include vendor, implementation title and implementation version metadata for the package. This is achieved via the META-INF/MANIFEST.MF file within the JAR file.

Such data may then be obtained via code such as:

Package p = com.drew.imaging.ImageMetadataReader.class.getPackage();

String title = p.getImplementationTitle();
String vendor = p.getImplementationVendor();
String version = p.getImplementationVersion();

Manifest entries:

Implementation-Title: metadata-extractor
Implementation-Version: 2.6.4
Implementation-Vendor: drewnoakes.com

Review whether it's feasible to unit test that these values are present. Builds won't always be run from JAR files however.

(Migrated from Google Code issue 78)

Unit tests should pass in all locales

Some unit tests involve culture-sensitive formatting of values and as such can fail in different cultures/locales.

  • NikonType2MakernoteTest1#testGetAutoFlashCompensationDescription fails with 0,67 EV not 0.67 EV (in Germany)
  • PngMetadataReaderTest#testGimpGreyscaleWithManyChunks fails with Mon Dec 31 23:08:30 EST 2012 instead of Tue Jan 01 04:08:30 GMT 2013 (in EST).

This may also be the case for other tests.

Preference is to force the culture during unit testing rather than modifying the code under test to always use the en-GB culture. Users should get a format that's suited to their culture.

http://stackoverflow.com/questions/8190124/junit-testing-double-tostring-in-multiple-cultures
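
A minimal sketch of forcing the culture from the test side: the helper below sets the JVM default locale for the duration of one action and restores it afterwards, leaving the code under test locale-sensitive. LocalePinning, withLocale and describeCompensation are hypothetical names, and a similar wrapper around TimeZone.setDefault would be needed for the date-based failure.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical test helper for pinning the default locale.
final class LocalePinning {
    // Stand-in for code under test: formats using the default locale.
    static String describeCompensation(double ev) {
        return String.format("%.2f EV", ev);
    }

    // Runs the action with the given default locale, then restores the previous one.
    static String withLocale(Locale locale, Supplier<String> action) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            return action.get();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        System.out.println(withLocale(Locale.UK, () -> describeCompensation(0.67)));      // prints 0.67 EV
        System.out.println(withLocale(Locale.GERMANY, () -> describeCompensation(0.67))); // prints 0,67 EV
    }
}
```

In a JUnit suite the same set-and-restore pair would typically live in setup and teardown methods so that every test in the class runs under the pinned locale.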

(Migrated from Google Code issues 29 and 92)
