microsoft / Microsoft.IO.RecyclableMemoryStream


A library to provide pooling for .NET MemoryStream objects to improve application performance.

License: MIT License

C# 99.89% Batchfile 0.11%

microsoft.io.recyclablememorystream's Issues

Plan to support more than 2GB?

Hello,

I'm from the Dicom Server team, and our service uses RecyclableMemoryStream heavily for performance.

One issue we are facing is handling files larger than 2 GB. Our service deals with medical images, and sometimes a file can be very large and exceed 2 GB.

I see the 2 GB limitation here, so I'm wondering: is there a plan to support more than 2 GB?

Thanks
Peng Chen

Nuget package contains debugging symbols without source

In the [Publish Symbols] step during a build on our build server, the following error occurs:

##[error]Indexed source information could not be retrieved from 'F:\tfsagent_work\13\s\src[project]\bin\Microsoft.IO.RecyclableMemoryStream.pdb'. Symbol indexes could not be retrieved.

This seems to be caused by the presence of a .pdb file in the Nuget package and the absence of the corresponding source files.

GetBuffer() throws System.UnauthorizedAccessException on .NET Core

When calling GetBuffer() in a .NET Core app, an UnauthorizedAccessException ("MemoryStream's internal buffer cannot be accessed.") is thrown.

The reason is that GetBuffer() is not marked as override when compiled for netstandard. Therefore, the original implementation of MemoryStream.GetBuffer() will be called, unless the call is explicitly made on a variable of type RecyclableMemoryStream.

#if NETSTANDARD1_4
   public byte[] GetBuffer()
#else
   public override byte[] GetBuffer()
#endif

I don't know what the motivation for this conditional compilation was, since MemoryStream.GetBuffer() has always been virtual.
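The workaround implied above can be illustrated as follows (a minimal sketch; the variable names are illustrative). Calling through a variable typed as the derived class reaches the correct implementation, while calling through the MemoryStream base type hits the non-overriding base method:

```csharp
var manager = new RecyclableMemoryStreamManager();

// Calling through the base type resolves to MemoryStream.GetBuffer(),
// which throws because the internal buffer is not publicly exposable:
MemoryStream asBase = manager.GetStream();
// asBase.GetBuffer();  // UnauthorizedAccessException on netstandard builds

// Calling through the derived type resolves to the non-override
// RecyclableMemoryStream.GetBuffer(), which works as expected:
var asDerived = (RecyclableMemoryStream)manager.GetStream();
byte[] buffer = asDerived.GetBuffer();
```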

Typo in NuGet package title

According to the .nuspec file, this project's <title> is Micrisift.IO.RecyclableMemoryStream. This typo won't affect an Install-Package (since only the <id> is relevant for that), but the incorrect title can be seen, e.g., on nuget.org.

Why does Close call Dispose?

A closed MemoryStream still supports operations such as GetBuffer, but these now throw NullReferenceException after migrating to RecyclableMemoryStream. Is it possible not to dispose on Close?

Copying part of a RecyclableMemoryStream to another Stream

Hello,

The WriteTo method allows me to copy an entire RecyclableMemoryStream to another Stream without any buffer allocation. However, if I need to copy only part of a RecyclableMemoryStream to another Stream, I have to use a buffer.

What is your opinion about creating an overload of WriteTo that takes an offset and length?
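A possible shape for such an overload might look like the following. This is a hypothetical sketch, not the library's API: it assumes internal members (`blocks`, `GetBlockAndRelativeOffset`, `length`) like those visible in other snippets on this page, and it ignores the large-buffer case for brevity.

```csharp
// Hypothetical overload -- not part of the current API.
// Writes `count` bytes starting at `offset` to the target stream,
// walking the internal blocks directly so no intermediate buffer is needed.
public void WriteTo(Stream stream, int offset, int count)
{
    if (stream == null) throw new ArgumentNullException(nameof(stream));
    if (offset < 0 || count < 0 || offset + count > this.length)
        throw new ArgumentOutOfRangeException(nameof(count));

    var blockAndOffset = this.GetBlockAndRelativeOffset(offset);
    int blockIndex = blockAndOffset.Block;
    int offsetInBlock = blockAndOffset.Offset;
    int bytesRemaining = count;

    while (bytesRemaining > 0)
    {
        byte[] block = this.blocks[blockIndex];
        int amountToCopy = Math.Min(block.Length - offsetInBlock, bytesRemaining);
        stream.Write(block, offsetInBlock, amountToCopy);
        bytesRemaining -= amountToCopy;
        blockIndex++;
        offsetInBlock = 0; // subsequent blocks are copied from their start
    }
}
```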

Add a setting to disallow ToArray

Calling ToArray on a RecyclableMemoryStream should be considered a bug because it wipes out all the benefits of using the library. While the method does work as intended, we could add a setting on RecyclableMemoryStreamManager to cause an exception to be thrown when called.

Questions about defaults

I'm a little surprised that the default block size is 128 KB, effectively allocating these blocks directly on the large object heap. Isn't one of the purposes of this library to avoid LOH allocations?

I know it's configurable so that's not a big deal. :) I'm just curious to know how these defaults were chosen.

Stream could track current buffer / offset to improve performance

Every time a write happens, we convert the stream's position to a block/offset tuple to know where we need to start writing.

For a few long writes, this is fine. For many short writes, the overhead could add up.

Instead, we could track the current block and offset index so we already know where we need to start writing.

Incorrect/superfluous MaxStreamLength check in Write(byte[] buffer, int offset, int count) method

The lines 553-558 in the method RecyclableMemoryStream.Write(byte[] buffer, int offset, int count) implement a check whether the required capacity for the write operation exceeds MaxStreamLength:

long requiredBuffers = (end + blockSize - 1) / blockSize;

if (requiredBuffers * blockSize > MaxStreamLength)
{
    throw new IOException("Maximum capacity exceeded");
}

This check is not only broken, but also superfluous.

It is broken for any scenario where MaxStreamLength is not a multiple of blockSize and the stream's capacity is exactly at or close to MaxStreamLength. When a write operation involves the last block, an IOException is incorrectly thrown, since the expression requiredBuffers * blockSize necessarily becomes greater than MaxStreamLength in such cases.

It is also superfluous, and it should be safe to simply delete it. The code just above it performs the same check on MaxStreamLength (and that one appears to be correct).
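If the check were kept at all, a corrected version (a sketch based only on the snippet quoted above) would compare the actual required end position of the write, rather than the capacity rounded up to a whole number of blocks:

```csharp
// Sketch of a corrected check: compare the write's actual end position,
// not requiredBuffers * blockSize, which rounds up past MaxStreamLength
// whenever MaxStreamLength is not a multiple of blockSize.
if (end > MaxStreamLength)
{
    throw new IOException("Maximum capacity exceeded");
}
```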

Additional Memory<byte> support

#68 added some support for RecyclableMemoryStream to read and write Span<byte>/Memory<byte>. It would be good to also add support for creating a RecyclableMemoryStream from an existing Memory<byte>.

RecyclableMemoryStream.GetStream already has an overload that takes an existing byte[] buffer:

MemoryStream GetStream(string tag, byte[] buffer, int offset, int count)

So the suggestion here is to add a new overload that takes an existing Memory<byte> buffer:

MemoryStream GetStream(string tag, Memory<byte> buffer)

Add ReadFully() method

Can you add a ReadFully(Stream stream) method to RecyclableMemoryStream?

This could save an intermediate buffer allocation and a lot of memory copying.

I took a stab at it with the caveat that the RecyclableMemoryStream can't be using a large buffer. You may want to remove this constraint.

You'll also probably want to have RecyclableMemoryStreamManager.GetStream() return RecyclableMemoryStream so that the method is accessible without a cast.

public void ReadFully(Stream stream)
{
    if (this.largeBuffer != null)
    {
        throw new InvalidOperationException();
    }

    while (true)
    {
        EnsureCapacity(this.length + 1);
        var blockAndOffset = GetBlockAndRelativeOffset(this.position);
        var block = this.blocks[blockAndOffset.Block];

        int count = stream.Read(block, blockAndOffset.Offset, block.Length - blockAndOffset.Offset);
        if (count == 0)
        {
            break;
        }

        long end = (long)this.position + count;
        this.position = (int)end;
        this.length = Math.Max(this.position, this.length);
    }
}

RecyclableMemoryStream class should override CopyToAsync

Because RecyclableMemoryStream does not override CopyToAsync, it uses the default implementation in the Stream class, which allocates a byte[] buffer each time it's called.

This kind of defeats the purpose of using RecyclableMemoryStream.

I'm not sure why Stream doesn't support overriding CopyTo as well, as this would also benefit from being overridden.

Assembly on NuGet in Debug release?

I'm testing something with Benchmark.NET, and when I attempt to benchmark code referencing this library, I get a message that Microsoft.IO.RecyclableMemoryStream is non-optimized. Is it possible that the public NuGet package is accidentally including the Debug configuration?

[screenshot]

Documentation on library beyond blog post?

The work I am doing for my project deals with decompressing a file (e.g., 10 MB into a 40 MB stream) and then running those streams through a patch program, which in turn may output a 40 MB stream that is fed into the patch program again, many times over. In practice this means I am using five or six 40 MB streams within a few seconds.

I've found this library significantly reduces memory usage, but I can't really figure out what the options do. The only documentation is a blog post, and it doesn't really explain what any of the options actually do (I don't deal with a lot of memory-related things).

I have also found that the memory allocated for the pools doesn't seem to be returned, or returnable, unless I'm missing something. E.g., the app seems to allocate about 600 MB of data (on top of 200 MB idle), but after the task ends the app still sits at 800 MB used. I understand you want to keep these pools around and allocated, but is there a way to get rid of them? I only use them for a certain task, so once that task has finished, keeping them around is not beneficial. The documentation doesn't cover this kind of scenario at all.

The lack of IntelliSense documentation makes using this library quite difficult, as I have almost no idea what some of the options do.

Why is there no default static pool manager?

I do not know if this is just a silly question or a feature request. Suppose there are multiple separately maintained assemblies allocating RecyclableMemoryStreams. Should each assembly declare its own static RecyclableMemoryStreamManager? This goes somewhat against the grain of the idea of pooling, although I understand that block size requirements might be different. But still I assume that the defaults may be quite sane for a sizable range of applications.

So I wonder: why is there no default, per-AppDomain RecyclableMemoryStreamManager? It would be accessed through a static property (RecyclableMemoryStreamManager.Default), and RecyclableMemoryStream could then have a parameterless constructor (and other manager-less constructors) that use the default manager. As with many other things in the BCL, the default pool could also be made configurable through app.config.

RecyclableMemoryStream exhausts all available RAM if requestedSize > (int.MaxValue - BlockSize)

When a RecyclableMemoryStream is created where requestedSize satisfies the condition:

(numberOfRequiredBlocks * BlockSize) > int.MaxValue
where
numberOfRequiredBlocks = Ceil( requestedSize / blockSize )

then the method RecyclableMemoryStream.EnsureCapacity loops until all available virtual address space is exhausted (running x64 code with much more than 10GB RAM available; thus it is not a problem with regard to available RAM or 32bit virtual address space).

The problem occurs at the execution of the following while loop:

while (this.Capacity < newCapacity)
{
    blocks.Add(this.memoryManager.GetBlock());
}

When given a requestedSize that satisfies the condition shown at the beginning of my post, the Capacity property will eventually overflow, preventing the while loop from exiting. (1)

Proposed fix: Either make the Capacity property a long type (2), or add a sanity check that prevents Capacity from overflowing...


Foot notes:

(1) Strictly speaking, whether the while-loop eventually exits depends -- aside from the virtual memory size -- on the chosen block size and an "appropriate" requestedSize value. With the default block size of 128K, the Capacity property will overflow and eventually reach the value 0 again, thus turning the while-loop effectively into an infinite loop. When choosing some other 'odd' block size, Capacity property will still overflow, but not necessarily reach precisely 0 again, thus possibly allowing the while loop to eventually exit -- but chances are good that all RAM has been exhausted already before that would happen.

(2) It would be nice if RecyclableMemoryStream would support stream sizes larger than 2GB (i.e. not using int types but long types for all concerned method arguments/variables/fields/properties). It is not really a show stopper though, as one can split large data blobs into multiple MemoryStream objects that are wrapped into a custom Stream class representing those multiple MemoryStreams as one continuous stream...

IntelliSense Documentation Not Shown

Hi,

When installing RMS from NuGet in VS the IntelliSense doesn't show the comments/documentation. I think this is just a case of enabling the XML Documentation file in the Project Settings.

Cheers,
Indy

CLS-compliance

Can RecyclableMemoryStreamManager be made CLS-Compliant?

Clarification needed

The RecyclableMemoryStreamManager exposes an event called "StreamCreated". Is that triggered when a stream from the pool is used or only when a new stream is actually allocated?

Performance opportunities

I have just played with a profiler and massively improved the performance of my RMS fork for my primary use case of writing/reading small buffers of a few KB. Spreads/Spreads@cbeac8f

  • Aggressively inlining all methods that are used internally. E.g. Capacity.
  • Making all methods related to event tracing conditional to a compiler symbol to avoid needless method calls.
  • Using System.Buffers array pool (modified to optionally return exact buffer sizes when requested) instead of ConcurrentBag.
  • Using vectorized memory copy instead of Buffer.BlockCopy.
  • Pooling of RMS instances using Roslyn's ObjectPool implementation.
  • Manual implementation of WriteByte instead of redirecting to Write via a temporary byte[1]. I also added non-virtual SafeWriteByte.
  • Using ThrowHelper for better inlining and less code size.

Some of this is directly applicable to the original implementation. My fork is now incompatible with upstream, so I cannot create a PR; I'm just putting it here FYI and for discussion.

I've had the idea of integrating RMS with the System.Buffers shared pool for a long time - not only is this faster, it also reduces memory by avoiding a separate pool. But the default shared pool implementation can return a buffer larger than the requested size, so I created a custom implementation with an exactSize parameter. Without shared pool modifications, RMS could work with ArraySegments internally - that shouldn't be slower. And given that the shared pool returns a buffer that can be larger only by a power of two, that larger buffer could be split into several blocks, and RMS would just increase capacity by more than one block. In Dispose() we just need to check whether the ArraySegment currently being returned shares its buffer with the previous one, and not return it twice.

Make the library portable

Unless there is a strong dependency on the desktop CLR, can this library be made portable? In particular, can a CoreCLR target be added?

Are the event invocations inside RecyclableMemoryStreamManager susceptible to race conditions?

I noticed that RecyclableMemoryStreamManager raises events in the following manner:

if (this.BlockCreated != null)
{
    this.BlockCreated();
}

which appears to be susceptible to a race condition: what if another thread unsubscribes the last handler from the event just before the invocation, but after the null check? Since the buffer manager class is supposed to be thread-safe, this should probably be fixed.

In his book "CLR via C#" (see pp. 264-265 in the 3rd edition), Jeffrey Richter recommends the following pattern for safely raising an event:

EventHandler blockCreated = Interlocked.CompareExchange(ref this.BlockCreated, null, null);
if (blockCreated != null)
{
    blockCreated();
}

Alternatively, if C# 6 syntax can be used in this project:

this.BlockCreated?.Invoke();

Questions on memory usage and configuration

After spending some time reading the code, it appears the default behavior is to pool everything with no upper limit. I wasn't able to find that information in the documentation.

Can you check my understanding?

MaximumFreeLargePoolBytes is never set, so the buffer gets added to the pool on dispose here.

The max potential memory usage of the large pool is

(maximumBufferSize/largeBufferMultiple) * MaximumFreeLargePoolBytes

or, stated another way:

the number of pools * MaximumFreeLargePoolBytes

If I don't call GetBuffer() and I don't call GetStream(asContiguousBuffer=true), I will only ever use small blocks.

So, if you rarely call GetBuffer(), a valid sizing strategy would be to create medium-sized small blocks (say 1/4 of the size of your expected common stream size), a large MaximumFreeSmallPoolBytes, and a MaximumFreeLargePoolBytes size of 1 byte to force unpooled large buffer allocation in the rare case you need it (if it was set to 0 large buffers would be pooled and retained indefinitely).
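The sizing strategy described above might be expressed like this (a sketch with illustrative values only; the constructor and properties are those of RecyclableMemoryStreamManager, but the specific numbers are assumptions for the example):

```csharp
// Illustrative configuration for the strategy described above:
// medium-sized small blocks, a generous small-block pool, and a 1-byte
// large-pool cap so that rare large buffers are allocated unpooled
// (a cap of 0 would pool large buffers and retain them indefinitely).
int blockSize = 16 * 1024;               // ~1/4 of an expected 64 KB common stream size
int largeBufferMultiple = 1024 * 1024;
int maxBufferSize = 16 * 1024 * 1024;

var manager = new RecyclableMemoryStreamManager(blockSize, largeBufferMultiple, maxBufferSize)
{
    MaximumFreeSmallPoolBytes = 64 * 1024 * 1024, // large small-block pool
    MaximumFreeLargePoolBytes = 1                 // force unpooled large-buffer allocation
};
```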

Use in HttpClient

Is there any way of making HttpClient use this?

I'm processing files from blob storage of up to 100 MB in a WebJob, and I'm trying to minimise the amount of disk I/O and memory churn.

Any other suggestions gratefully received.

The following piece of code hangs in the Write call:

RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();
manager.GenerateCallStacks = true;
RecyclableMemoryStream stream = new RecyclableMemoryStream(manager, "Tag1");
StreamWriter writer = new StreamWriter(stream);
for (long i = 0; i < 1024 * 1024 * 1024; i++)
{
    writer.Write('c');
}
for (long i = 0; i < 1024 * 1024 * 1024; i++)
{
    writer.Write('c');
}
writer.Flush();

The second loop hangs at i = 1073611776.

ETW is not supported on all platforms

For example, Unity on some platforms (e.g. Android) replaces the body of methods marked with EventAttribute with throw new NotSupportedException("linked away"), which results in runtime exceptions.

It would be nice to have automatic detection of ETW support (or, at worst, the ability to disable ETW from application code, e.g. by setting Events.Writer to null).

Make ToArray Obsolete

Since using the ToArray method is not recommended (it defeats the purpose of using this library), it should be marked with the [Obsolete] attribute.

Is the Dispose(false) implementation safe?

Hey implementors, great job on this project!

I had one question while looking at the code: I notice that you are "touching" another managed object (the manager) from the stream's Dispose method, even when it is invoked from the finalizer. I was under the impression that this was verboten, since the manager could have been GC'ed by the time the finalizer runs.

Is that no longer the case, generally?

Buffer size absurdly large

I was benchmarking this library when I noticed that the buffer length from GetBuffer was very large. E.g., a 2056-character string resulted in a buffer with a length of 131072. Moreover, my serialization of several floats (length 146) also resulted in a buffer with a length of 131072.

I really would not like memory chunks of 128 KB. That scares me.

I feel like I am missing a very basic implementation detail. Am I possibly doing it wrong, or is this a bug?

Guid generation is expensive

Hello

I've recently started using this library to optimise a hot path and reduce allocations, but I've found that creating a new RecyclableMemoryStream is quite expensive:

[screenshot]

The id field (Guid) on the RecyclableMemoryStream is being initialized on the constructor regardless of whether ETW tracing is enabled or not and generating a new Guid is expensive (interop call + 16 bytes allocated).

I've changed the constructor code to only initialize the field if ETW tracing is enabled and it now looks like this:

[screenshot]

I've pushed my changes in case you find this solution sensible.

Possible memory leak

I am using RecyclableMemoryStream inside a using block. When I profile a long-running process, I see it has ~100K instances of RecyclableMemoryStream and ~200K instances of byte[].
Is it possible this is due to a memory leak? I am not using more than a few instances at a time.

Allow multiple Dispose calls OR expose Disposed state

A MemoryStream can be disposed twice or multiple times, but the RecyclableMemoryStream throws an exception.

Consider this example:

var stream = storage.MemoryStreamManager.GetStream();
blob.DownloadToStream(stream); // may throw a StorageException
using (var reader = new StreamReader(stream, Encoding.UTF8)) {
    // read data
}

Here, the stream should be disposed if DownloadToStream fails. However, when put in a using block, an exception will be raised when the reader is disposed (and configured to close the underlying stream, which is the default).

According to the Dispose docs disposing multiple times should be allowed:

If an object's Dispose method is called more than once, the object must ignore all calls after the first one. The object must not throw an exception if its Dispose method is called multiple times. Instance methods other than Dispose can throw an ObjectDisposedException when resources are already disposed.

Alternatively the disposed field could be exposed with a property to allow for explicit checks:

var stream = storage.MemoryStreamManager.GetStream();
try { ... } finally { if (!stream.Disposed) stream.Dispose(); }

... although supporting multiple Dispose calls would be more convenient and less clunky.

Unit tests: Replace `[ExpectedException]` with `Assert.Throws`

There are several unit tests in the UnitTests project that check whether exceptions are thrown when they should be. This is done in two different ways:

  • By placing an [ExpectedException(typeof(TException))] custom attribute on the test method, or
  • by wrapping a method call or short code block with Assert.Throws<TException>(() => …);.

Using [ExpectedException] is problematic for at least two reasons:

  • It is too coarse-grained. It doesn't matter which part of a test method throws. However, speaking in terms of Arrange-Act-Assert, only the Act part of a test method should be tested for exceptions.
  • This attribute is no longer supported starting with NUnit version 3, so it would stand in your way if you wanted to migrate to a more recent version of NUnit.

Suggestion: For these two reasons (and, to a minor degree, for consistency's sake) all test methods using [ExpectedException] should be converted to Assert.Throws.

NetStandard Support?

It would be helpful if this library were re-targeted to build for .NET Standard, so we could use it in .NET Core projects and other places.

Thanks!

Stream disposed too soon

using (var ms = memoryStreamManager.GetStream(nameof(ZipOutputStreamHelper)))
{
    using (StreamWriter writer = new StreamWriter(ms))
    using (JsonTextWriter jsonWriter = new JsonTextWriter(writer))
    {
        var ser = new JsonSerializer();
        ser.Serialize(jsonWriter, sequence);
        jsonWriter.Flush();
    }
    return ms.ToArray();
}

We use a pattern like this in our code. Basically, the implementation of JsonTextWriter is such that it doesn't finish writing to the stream until it is disposed; calling Flush doesn't seem to finalise it. This worked fine with a MemoryStream, but not with a RecyclableMemoryStream. Because the writers call Dispose on their underlying streams, we end up disposing the RecyclableMemoryStream before we access the array.

Is there a way to change this at all? Or is it going to be a case of finding a way to correct the behaviour of the JsonTextWriter so that we can finalise the stream before calling ToArray?
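One possible workaround (a sketch, assuming the .NET 4.5+ StreamWriter constructor with a leaveOpen parameter and Json.NET's JsonWriter.CloseOutput property) is to stop the writers from disposing the stream, so they can be disposed first to flush everything, and ToArray called afterwards while the stream is still alive:

```csharp
using (var ms = memoryStreamManager.GetStream(nameof(ZipOutputStreamHelper)))
{
    // leaveOpen: true stops StreamWriter from disposing the stream;
    // CloseOutput = false stops JsonTextWriter from closing its TextWriter.
    using (var writer = new StreamWriter(ms, Encoding.UTF8, 1024, leaveOpen: true))
    using (var jsonWriter = new JsonTextWriter(writer) { CloseOutput = false })
    {
        var ser = new JsonSerializer();
        ser.Serialize(jsonWriter, sequence);
    }
    // The writers are now disposed (and therefore fully flushed),
    // but ms has not been disposed, so ToArray is still valid here.
    return ms.ToArray();
}
```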

Trying to read a huge file

Hi all,

I am trying to read a huge (2.5 gigabyte) file, but it always ends with "Unhandled Exception: System.IO.IOException: Maximum capacity exceeded", and RAM usage is always around 100%. What am I doing wrong?

int blockSize = 10;
int largeBufferMultiple = 1024 * 1024;
int maxBufferSize = 16 * largeBufferMultiple;

var manager = new RecyclableMemoryStreamManager(blockSize,
                                                largeBufferMultiple,
                                                maxBufferSize);

manager.GenerateCallStacks = true;
manager.AggressiveBufferReturn = true;
manager.MaximumFreeLargePoolBytes = maxBufferSize * 4;
manager.MaximumFreeSmallPoolBytes = 100 * blockSize;
RecyclableMemoryStream memoryStream = new RecyclableMemoryStream(manager);
using (FileStream fileStream = File.OpenRead(@"C:\Temp\test.bin"))
{
    // MemoryStream memoryStream = new MemoryStream();
    fileStream.CopyTo(memoryStream);
}
