Coder Social home page Coder Social logo

Comments (7)

barasher avatar barasher commented on May 25, 2024 1

I created #56 to migrate from Scanner to Reader (and to return a cleaner response).
Anyway, thanks a lot for the bug report !
Fix will be released in v1.7.1

from go-exiftool.

barasher avatar barasher commented on May 25, 2024

Hi @jnathanh !

Since providing a folder makes the library bug, I think that the first thing to do is to reject folder paths.

I'm not convinced that handling folders is usefull. As a CLI it is (for exiftool), but as a library, if it does not increase performance, I think that keeping the library as simple as possible is better. What to you think ?

from go-exiftool.

jnathanh avatar jnathanh commented on May 25, 2024

@barasher I agree it's best to stick with the simplest fix, and I don't personally need to pass the directory so I think your first instinct to leave out directory support for now is probably the right call.

I did a little more digging and found the root issue, the buffer is getting exhausted. In ExtractMetadata, when the Scan() returns false, we currently assume there are no errors and continue (not reporting the error and leaving the Scanner in a bad state). I happened to find it via loading multiple files (from a directory), but it could theoretically manifest for a single large metadata file.

I started on a fix, but it gets messy without breaking changes to the api. The obvious change is to report actual scan errors first before the "no content" error. What is not as straight-forward is how to address the bad scanner state. Here are the different options I have explored:

  1. replace bufio.Scanner with bufio.Reader (or really any io.Reader implementation)
    • The bufio package recommends using bufio.Reader for the sequential read use case (see the end of the description for Scanner). We can see why with the scanner state issue we are running into here.
    • This breaks the Buffer() option (how important is the use case for this option?)
  2. Reset the scanner
    • Breaks the Buffer() option, or you have to add a bunch of logic to check if the buffer was configured and somehow clear that byte array
  3. Exit early and report a bad client state
    • panic (kind of heavy-handed)
    • add an error return value to the signature (breaking change)
    • depend on the file-level error (this does not make it clear the client is no longer usable)

I suggest option 1 for the more reliable design, or option 3 for the simplest solution (and least api disruption).

from go-exiftool.

barasher avatar barasher commented on May 25, 2024

Thanks for the analysis !

Backward compatibility is very important to me. I hate when libraries I use have breaking changes in their API :p
Since I think that handling a directory is not that useful, I've introduced a new sentinel error ErrNotFile that can be return when invoking extractMetadata and I've chosen to implement the option 3 you proposed.

Here is the PR : #55

What do you think ?

from go-exiftool.

jnathanh avatar jnathanh commented on May 25, 2024

@barasher the directory check seems like a sensible feature given this library's lack of support for directories. However, it only addresses a symptom of the issue I was trying to raise. You could also trigger the same issue with a single file, if it is big enough to fill the scanner's buffer (see test below). This is likely an edge case since metadata files are usually small (at least the photo metadata I'm used to working with). So, you may want to leave it unresolved for now, however, I want to make sure you're aware that is the decision you are making by only addressing the issue with the directory check.

func TestLargeMetadata(t *testing.T) {
	t.Parallel()

	bigFile, cleanup := createFileLargerThan(t, bufio.MaxScanTokenSize)
	defer cleanup()

	e, err := NewExiftool()
	require.NoError(t, err)
	defer e.Close()

	// big file should not error, but if it does, it should output the actual source of the error
	bigResults := e.ExtractMetadata(bigFile)
	require.Len(t, bigResults, 1)
	assert.NoError(t, bigResults[0].Err)

	// subsequent small file should also not be affected by previous file reads
	smallResults := e.ExtractMetadata("testdata/20190404_131804.jpg")
	require.Len(t, smallResults, 1)
	assert.NoError(t, smallResults[0].Err)
}

func createFileLargerThan(t *testing.T, byteCount int) (path string, cleanup func()) {
	t.Helper()

	dirPath := t.TempDir()
	cleanup = func() { os.RemoveAll(dirPath) }

	path = filepath.Join(dirPath, "20190404_131804.jpg")
	require.Nil(t, copyFile("testdata/20190404_131804.jpg", path))

	e, err := NewExiftool()
	require.NoError(t, err)
	defer e.Close()

	mds := e.ExtractMetadata(path)
	require.Len(t, mds, 1)
	require.NoError(t, mds[0].Err)

	valueLargerThanBuffer := ""
	for i := 0; i < byteCount+1; i++ {
		valueLargerThanBuffer += "x"
	}

	mds[0].SetString("Comment", valueLargerThanBuffer)
	mds[0].Err = nil

	e.WriteMetadata(mds)
	require.NoError(t, mds[0].Err)

	return
}

from go-exiftool.

barasher avatar barasher commented on May 25, 2024

I totally agree.

Nevertheless, currently, the Buffer(...) option can be used to customize go-exiftool's buffer. This way, we can extract any "amount" of metadata.

The only advantage I see using a bufio.Reader instead of a bufio.Scanner is the "streaming" (gradually) consumption of the buffer. This way, we might need a smaller buffer but the implementation will be really tricky.

from go-exiftool.

jnathanh avatar jnathanh commented on May 25, 2024

Ya, I think using a larger buffer is an acceptable solution. The problem is that a new user, using the program defaults, would never find out that is the issue without extensive debugging. Ideally the program should report that the buffer filled up so a user know they need to use a larger buffer. Even worse, the output actually reports that the metadata was empty, so without investigating further, a user would think that there was just not metadata in the file.

Based on your feedback so far, I suggest adding a panic. That would make it clear what the issue is and not report back incorrect metadata results.

from go-exiftool.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.