Coder Social home page Coder Social logo

Comments (4)

vthib avatar vthib commented on June 5, 2024

This is probably the same issue and root-case that i stumbled upon, so i'm hijacking this issue with my analysis :)

I found that this stems from a subnode VCN value of 0 inside a NtfsAttributeListNonResidentAttributeValue.
In index.rs:177, the call to subnode_vcn looks like this:

index_allocation: NtfsIndexAllocation { ntfs: Ntfs { cluster_size: 4096, sector_size: 512, size: 106701438976, mft_position: 3221225472, file_record_size: 1024, serial_number: 9370544425783817375, upcase_table: None }, value: AttributeLis
tNonResident(NtfsAttributeListNonResidentAttributeValue { ntfs: Ntfs { cluster_size: 4096, sector_size: 512, size: 106701438976, mft_position: 3221225472, file_record_size: 1024, serial_number: 9370544425783817375, upcase_table: None }, initial_attribute_
list_entries: NtfsAttributeListEntries { attribute_list: Resident([160, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 1, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84
, 0, 0, 0, 0, 1, 0, 2, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 3, 0, 36, 0, 68, 0, 83, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 48, 0, 9, 26, 0, 0, 0, 0, 0, 0, 0, 0, 69, 2
2, 0, 0, 0, 0, 1, 0, 12, 0, 36, 0, 84, 0, 88, 0, 70, 0, 95, 0, 68, 0, 65, 0, 84, 0, 65, 0, 0, 0, 0, 0], 3227063552) }, connected_entries: AttributeListConnectedEntries { attribute_list_entries: Some(NtfsAttributeListEntries { attribute_list: Resident([160
, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 1, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 2, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0
, 0, 1, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 3, 0, 36, 0, 68, 0, 83, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 48, 0, 9, 26, 0, 0, 0, 0, 0, 0, 0, 0, 69, 22, 0, 0, 0, 0, 1, 0, 12, 0, 36, 0, 84, 0, 88, 0, 70, 0, 95, 0, 68, 0,
65, 0, 84, 0, 65, 0, 0, 0, 0, 0], 3227063552) }), instance: 1, ty: IndexAllocation }, data_size: 10747904, attribute_state: None, stream_state: StreamState { stream_data_run: None, stream_position: 0, data_size: 10747904 } }) },
index_record_size: 4096,
vcn: 0

The issue, afaict, is that the seek implementation for NtfsAttributeListNonResidentAttributeValue does not handle a vcn value of 0. It expects data_to_seek to be strictly positive, but do not handle it being 0 and walking inside the next data run of the current attribute, or the first data run of the next connect attribute.
I don't know enough about NTFS whether this is a valid case or if some parsing failed earlier, but it looks legit from what i've seen, and changing the check on data_to_seek into a loop seems to fix the issue.
I have attached a tentative PR for this: #11

from ntfs.

leofidus avatar leofidus commented on June 5, 2024

You're right, I just tested #11 and it does fix it the issue for me too.

from ntfs.

ColinFinck avatar ColinFinck commented on June 5, 2024

@leofidus @vthib Thanks a lot for your bug report and the analysis!
I dug into the related code again and I'm afraid this requires more than a 1-line fix to handle all such cases:

NtfsNonResidentAttributeValue and NtfsAttributeListNonResidentAttributeValue both share code in StreamState to read and seek in Data Runs.
When StreamState is created via StreamState::new, its stream_data_run field is initialized to None. The field is only filled with an actual Data Run when the first non-zero read or seek is performed. That way, StreamState::new remains simple and we only deal with Data Runs when actually required. There are also other cases when stream_data_run may be None:

  • We seek past the end of all Data Runs
  • We encounter a non-resident attribute with zero Data Runs (may not be valid as per NTFS spec, but the crate must not panic here)

The design above gets problematic as soon as I want to return the absolute filesystem byte position of the current seek position. I only have a valid value for that if stream_data_run is filled with a Some value.
Additionally, the Data Run itself can only know about its absolute seek position if we haven't seeked past its bounds and it's not a sparse Data Run. This is why all data_position functions return an Option<u64> instead of a simple definite u64.

Now I have multiple other parts of the code, which urgently need to know the absolute filesystem byte position, and can't just deal with an Option.
They all use .data_position().unwrap() and are supposed to previously check the conditions, so that .unwrap() cannot panic. That was thought to be easy: As that code also performs the seeks, it can make sure that it seeks within the bounds. It can also check for sparse Data Runs.
However, I must have missed that the data_position function of NtfsNonResidentAttributeValue and NtfsAttributeListNonResidentAttributeValue may also return None when their stream_data_run is None. The function-level comment doesn't mention that possibility.
Actually, I didn't entirely miss it. NtfsNonResidentAttributeValue::new has extra code to mitigate that:

// Get the first Data Run already here to let `data_position` return something meaningful.
if let Some(stream_data_run) = stream_data_runs.next() {
let stream_data_run = stream_data_run?;
stream_state.set_stream_data_run(stream_data_run);
}

This mitigation fails though when you seek to SeekFrom::Start(0).

NtfsAttributeListNonResidentAttributeValue misses any such workarounds, which is why you're encountering this bug in the first place.
PR #11 mitigates this bug, but only if you seek before calling data_position. It panics the same way if you don't seek at all.
None of the mitigations work when a non-resident attribute has no Data Runs at all.

I need some more time to think this through and come up with a definite solution that works in all cases.

  • Perhaps I can actually get the callers of data_position to handle an Option instead of unwrapping it. They usually don't work on it themselves, but pass it through to other functions. Could require quite some refactoring.
  • Maybe some of them don't even need the absolute byte position at all. I mostly use it to provide informative error messages.
  • I might also be able to just add some missing checks before unwrapping, and can thereby ensure that the Option is a Some value.

This may take some time. In any case, thanks for uncovering this bug and the related #9!

from ntfs.

leofidus avatar leofidus commented on June 5, 2024

I think #9 is a good example of how the error types should have position as an Option<u64>. Then the code would generate a error message

NtfsError::InvalidStructuredValueSize {
  position: None,
  ty: NtfsAttributeType::VolumeName,
  expected: 1,
  actual: 0,
}

which is completely reasonable (why would there be a data run if there is no data). From a cursory look changing the error types to an option and passing the option along as long as possible would also get rid of a lot of unwraps. It wouldn't solve #9 or #10 on its own, but I think it's a reasonable starting point.

from ntfs.

Related Issues (18)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.