microsoft / Microsoft.IO.RecyclableMemoryStream
A library to provide pooling for .NET MemoryStream objects to improve application performance.
License: MIT License
Do not throw an exception if the stream is disposed more than once, because this violates the default Dispose() behavior.
According to MSDN https://msdn.microsoft.com/en-us/library/system.idisposable.dispose.aspx
If an object's Dispose method is called more than once, the object must ignore all calls after the first one. The object must not throw an exception if its Dispose method is called multiple times.
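For reference, a minimal sketch of the idempotent pattern these guidelines describe (the class and field names here are illustrative, not the library's actual members):

```csharp
using System;

public sealed class PooledStream : IDisposable
{
    private bool disposed; // set on the first Dispose call

    public void Dispose()
    {
        if (this.disposed)
        {
            // Second and later calls are silently ignored, per the IDisposable guidelines.
            return;
        }
        this.disposed = true;
        // ... return pooled buffers to the manager here ...
    }
}
```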
Hello,
I'm from the DICOM Server team, and our service heavily uses RecyclableMemoryStream for its performance benefits.
One issue we are facing is handling files larger than 2 GB. Our service deals with medical images, and the file size can sometimes be very large and exceed 2 GB.
I see the 2 GB limitation from here, so I am wondering if there is a plan to support more than 2 GB?
Thanks
Peng Chen
In the [Publish Symbols] step during a build on our build server, the following error occurs:
##[error]Indexed source information could not be retrieved from 'F:\tfsagent_work\13\s\src[project]\bin\Microsoft.IO.RecyclableMemoryStream.pdb'. Symbol indexes could not be retrieved.
This seems to be caused by the presence of a .pdb file in the NuGet package and the absence of the corresponding source files.
When calling GetBuffer() in a .NET Core app, an UnauthorizedAccessException ("MemoryStream's internal buffer cannot be accessed.") is thrown.
The reason is that GetBuffer() is not marked as override when compiled for netstandard. Therefore, the original implementation of MemoryStream.GetBuffer() will be called, unless the call is explicitly made on a variable of type RecyclableMemoryStream.
#if NETSTANDARD1_4
public byte[] GetBuffer()
#else
public override byte[] GetBuffer()
#endif
I don't know what the motivation for this conditional compilation was, since MemoryStream.GetBuffer() has always been virtual.
According to the .nuspec file, this project's <title> is Micrisift.IO.RecyclableMemoryStream. This typo won't affect an Install-Package (since only the <id> is relevant for that), but the incorrect title can be seen, e.g., on nuget.org:
A closed MemoryStream still supports operations such as GetBuffer, which now throw NullReferenceException when migrating to RecyclableMemoryStream. Is it possible not to dispose on close?
Please create an abstract interface for the RecyclableMemoryStreamManager public surface to facilitate unit testing.
Summary: for better memory reuse, I suggest using the BufferManager type (which has been available since .NET 3.0).
New byte arrays are created here: https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream/blob/master/src/RecyclableMemoryStreamManager.cs#L289
Pros:
Cons:
Hello,
The WriteTo method allows me to copy an entire RecyclableMemoryStream to another Stream without any buffer allocation. However, if I need to copy only part of a RecyclableMemoryStream to another Stream, I have to use a buffer.
What is your opinion about creating an overload of WriteTo that takes an offset and length?
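As a sketch of what such an overload could look like, assuming internal members similar to those mentioned elsewhere in these issues (blocks, GetBlockAndRelativeOffset); this is illustrative, not the library's actual code:

```csharp
// Hypothetical overload: copies [offset, offset + count) to the target
// stream block by block, without any intermediate buffer allocation.
public void WriteTo(Stream stream, int offset, int count)
{
    if (offset < 0 || count < 0 || (long)offset + count > this.length)
    {
        throw new ArgumentOutOfRangeException();
    }
    var cursor = this.GetBlockAndRelativeOffset(offset);
    int blockIndex = cursor.Block;
    int blockOffset = cursor.Offset;
    while (count > 0)
    {
        byte[] block = this.blocks[blockIndex];
        int toWrite = Math.Min(block.Length - blockOffset, count);
        stream.Write(block, blockOffset, toWrite);
        count -= toWrite;
        blockIndex++;
        blockOffset = 0; // subsequent blocks are written from their start
    }
}
```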
Calling ToArray on a RecyclableMemoryStream should be considered a bug because it wipes out all the benefits of using the library. While the method does work as intended, we could add a setting on RecyclableMemoryStreamManager to cause an exception to be thrown when it is called.
I'm a little surprised that the default block size is 128 KB, effectively allocating these blocks directly on the large object heap. Isn't one of the purposes of this library avoiding LOH allocations?
I know it's configurable so that's not a big deal. :) I'm just curious to know how these defaults were chosen.
Unfortunately, the current version cannot be referenced from signed assemblies.
Every time a write happens, we convert the stream's position to a block/offset tuple to know where we need to start writing.
For a few long writes, this is fine. For many short writes, the overhead can add up.
Instead, we could track the current block and offset index so we already know where we need to start writing.
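A sketch of the cached-cursor idea, using hypothetical field names (currentBlock and currentOffset are not the library's actual fields):

```csharp
// Hypothetical cached write cursor, kept in sync with Position.
private int currentBlock;  // index into this.blocks
private int currentOffset; // offset within this.blocks[currentBlock]

// After writing bytesWritten bytes, advance the cursor so the next
// write does not need to recompute block/offset from Position.
private void AdvanceCursor(int bytesWritten)
{
    this.currentOffset += bytesWritten;
    while (this.currentOffset >= this.blockSize)
    {
        this.currentOffset -= this.blockSize;
        this.currentBlock++;
    }
}
```

Seeks would still need to reset the cursor, but sequential writes would pay only an increment instead of a division per call.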
Lines 553-558 in the method RecyclableMemoryStream.Write(byte[] buffer, int offset, int count) implement a check of whether the required capacity for the write operation exceeds MaxStreamLength:
long requiredBuffers = (end + blockSize - 1) / blockSize;
if (requiredBuffers * blockSize > MaxStreamLength)
{
throw new IOException("Maximum capacity exceeded");
}
This check is not only broken, but also superfluous.
It is broken for any scenario where MaxStreamLength is not a multiple of blockSize and the stream's capacity is exactly at or close to MaxStreamLength. When a write operation involves the last block, an IOException will be incorrectly thrown, since the expression requiredBuffers * blockSize will necessarily become greater than MaxStreamLength in such cases.
It is also superfluous, and it should be safe to simply delete it. The code lines just above it perform the same check of whether MaxStreamLength is exceeded (and that one appears to be correct).
#68 added some support for RecyclableMemoryStream to read and write Span<byte>/Memory<byte>. It would be good to also add support for creating a RecyclableMemoryStream from an existing Memory<byte>.
RecyclableMemoryStream.GetStream already has an overload that takes an existing byte[] buffer:
MemoryStream GetStream(string tag, byte[] buffer, int offset, int count)
So the suggestion here is to add a new overload that takes an existing Memory<byte> buffer:
MemoryStream GetStream(string tag, Memory<byte> buffer)
Can you add a ReadFully(Stream stream) method to RecyclableMemoryStream?
This could save an intermediate buffer allocation and a lot of memory copying.
I took a stab at it, with the caveat that the RecyclableMemoryStream can't be using a large buffer. You may want to remove this constraint.
You'll also probably want to have RecyclableMemoryStreamManager.GetStream() return RecyclableMemoryStream so that the method is accessible without a cast.
public void ReadFully(Stream stream) {
    if (this.largeBuffer != null) {
        // Only block-based streams are supported by this sketch.
        throw new InvalidOperationException();
    }
    while (true) {
        // Ensure there is at least one writable byte at the current position.
        EnsureCapacity(this.length + 1);
        var blockAndOffset = GetBlockAndRelativeOffset(this.position);
        var block = this.blocks[blockAndOffset.Block];
        // Read directly into the pooled block; no intermediate buffer.
        int count = stream.Read(block, blockAndOffset.Offset, block.Length - blockAndOffset.Offset);
        if (count == 0) {
            break; // end of source stream
        }
        long end = (long)this.position + count;
        this.position = (int)end;
        this.length = Math.Max(this.position, this.length);
    }
}
This repository does not provide any revisions that are tagged as a release: https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream/releases
I believe it is worthwhile to:
Because RecyclableMemoryStream does not override CopyToAsync, it uses the default implementation in the Stream base class, which allocates a byte[] buffer each time it is called.
This somewhat defeats the purpose of using RecyclableMemoryStream.
I'm not sure why Stream doesn't support overriding CopyTo as well, since that would also benefit from being overridden.
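A sketch of what such an override might look like, writing each pooled block straight to the destination (the member names here are assumptions based on the issue text, not the library's actual implementation):

```csharp
// Hypothetical override: avoids the temporary copy buffer that the
// default Stream.CopyToAsync allocates by writing each block directly.
public override async Task CopyToAsync(Stream destination, int bufferSize, CancellationToken cancellationToken)
{
    int remaining = this.length - this.position;
    while (remaining > 0)
    {
        var cursor = this.GetBlockAndRelativeOffset(this.position);
        byte[] block = this.blocks[cursor.Block];
        int toCopy = Math.Min(block.Length - cursor.Offset, remaining);
        await destination.WriteAsync(block, cursor.Offset, toCopy, cancellationToken);
        this.position += toCopy;
        remaining -= toCopy;
    }
}
```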
The work I am doing for my project deals with decompressing a file (e.g., 10 MB into a 40 MB stream) and then running those streams through a patch program, which in turn may output a 40 MB stream that is fed back into the patch program many times over. In this instance it means I am using five or six 40 MB streams within a few seconds.
I've found this library significantly reduces memory usage, but I can't really figure out what the options do. The only documentation is a blog post, and it doesn't really explain what any of the options actually do (I don't deal with a lot of memory-related things). I have also found that the memory allocated for the pools doesn't seem to be returned, or returnable, unless I'm missing something. E.g., the app seems to allocate about 600 MB of data (on top of 200 MB idle), but after the task ends the app still sits at 800 MB used. I understand you want to keep these pools around and allocated, but is there a way to get rid of them? I only use them for a certain task, so once that task has finished, keeping them around is not beneficial. But the documentation has nothing that addresses this kind of scenario.
The lack of IntelliSense makes using this library extremely difficult, as I have almost no idea what some of the options do.
I do not know if this is just a silly question or a feature request. Suppose there are multiple separately maintained assemblies allocating RecyclableMemoryStreams. Should each assembly declare its own static RecyclableMemoryStreamManager? This goes somewhat against the grain of the idea of pooling, although I understand that block size requirements might differ. But still, I assume that the defaults may be quite sane for a sizable range of applications.
So I wonder: why is there no default, per-appdomain RecyclableMemoryStreamManager? It would be accessed through a static property (RecyclableMemoryStreamManager.Default), and RecyclableMemoryStream could then have a parameterless constructor (and other managerless constructors) that would use the default manager. As with many other things in the BCL, the default pool could be made configurable through app.config as well.
When a RecyclableMemoryStream is created where requestedSize satisfies the condition:
(numberOfRequiredBlocks * BlockSize) > int.MaxValue
where
numberOfRequiredBlocks = Ceil( requestedSize / blockSize )
then the method RecyclableMemoryStream.EnsureCapacity loops until all available virtual address space is exhausted (running x64 code with much more than 10GB RAM available; thus it is not a problem with regard to available RAM or 32bit virtual address space).
The problem occurs at the execution of the following while loop:
while (this.Capacity < newCapacity)
{
blocks.Add(this.memoryManager.GetBlock());
}
When given a requestedSize that satisfies the condition shown at the beginning of my post, the Capacity property will eventually overflow, thus not allowing the while loop to exit. (1)
Proposed fix: Either make the Capacity property a long type (2), or add a sanity check that prevents Capacity from overflowing...
Foot notes:
(1) Strictly speaking, whether the while-loop eventually exits depends -- aside from the virtual memory size -- on the chosen block size and an "appropriate" requestedSize value. With the default block size of 128K, the Capacity property will overflow and eventually reach the value 0 again, thus turning the while-loop effectively into an infinite loop. When choosing some other 'odd' block size, Capacity property will still overflow, but not necessarily reach precisely 0 again, thus possibly allowing the while loop to eventually exit -- but chances are good that all RAM has been exhausted already before that would happen.
(2) It would be nice if RecyclableMemoryStream would support stream sizes larger than 2GB (i.e. not using int types but long types for all concerned method arguments/variables/fields/properties). It is not really a show stopper though, as one can split large data blobs into multiple MemoryStream objects that are wrapped into a custom Stream class representing those multiple MemoryStreams as one continuous stream...
Hi,
When installing RMS from NuGet in VS the IntelliSense doesn't show the comments/documentation. I think this is just a case of enabling the XML Documentation file in the Project Settings.
Cheers,
Indy
Can RecyclableMemoryStreamManager be made CLS-compliant?
The RecyclableMemoryStreamManager exposes an event called "StreamCreated". Is that triggered when a stream from the pool is used or only when a new stream is actually allocated?
Hello guys,
is this project still maintained?
Please override TryGetBuffer (to call GetBuffer() internally) to create the appropriate ArraySegment object.
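A minimal sketch of such an override (MemoryStream.TryGetBuffer is virtual on .NET Framework 4.6 and later; the Length cast assumes the stream fits in an int):

```csharp
// Sketch: expose the pooled contiguous buffer through the TryGetBuffer
// API as well, so callers get a valid ArraySegment instead of 'false'.
public override bool TryGetBuffer(out ArraySegment<byte> buffer)
{
    buffer = new ArraySegment<byte>(this.GetBuffer(), 0, (int)this.Length);
    return true;
}
```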
I have just played with a profiler and massively increased the performance of my RMS fork for my primary use case of writing/reading small buffers of several KBs: Spreads/Spreads@cbeac8f
I use Buffer.BlockCopy, and WriteByte now writes directly instead of redirecting to Write via a temporary byte[1]. I also added a non-virtual SafeWriteByte. Some of this is directly applicable to the original implementation. My fork is now incompatible with upstream, so I cannot create a PR; I'm just putting it here FYI and for discussion.
I have had the idea of integrating RMS with the System.Buffers shared pool for a long time: not only is this faster, it also reduces memory by avoiding a separate pool. But the default shared pool implementation can return a buffer larger than the requested size, so I created a custom implementation with a parameter exactSize. Without shared pool modifications, RMS could work with ArraySegments internally; that shouldn't be slower. And given that the shared pool returns a buffer that can be larger only by a power of two, that larger buffer could be split into several blocks, and RMS would just increase capacity by more than one block. In Dispose() we just need to check whether the ArraySegment currently being returned has the same buffer as the previous one, and not return it twice.
Unless there is a strong dependency on the desktop CLR, can this library be made portable? In particular, can a CoreCLR target be added?
I noticed that RecyclableMemoryStreamManager
raises events in the following manner:
if (this.BlockCreated != null)
{
this.BlockCreated();
}
which appears to be susceptible to a race condition: what if another thread unsubscribes the last handler from the event just before the invocation, but after the null check? Since the buffer manager class is supposed to be thread-safe, this should probably be fixed.
In his book "CLR via C#" (see pp. 264-265 in the 3rd edition), Jeffrey Richter recommends the following pattern for safely raising an event:
EventHandler blockCreated = Interlocked.CompareExchange(ref this.BlockCreated, null, null);
if (blockCreated != null)
{
blockCreated();
}
Alternatively, if C# 6 syntax can be used in this project:
this.BlockCreated?.Invoke();
After spending some time reading the code, it appears the default behavior is to pool everything with no upper limit. I wasn't able to find that information in the documentation.
Can you check my understanding?
MaximumFreeLargePoolBytes is never set, so the buffer gets added to the pool on dispose here.
The max potential memory usage of the large pool is (maximumBufferSize / largeBufferMultiple) * MaximumFreeLargePoolBytes, or, stated another way: the number of pools * MaximumFreeLargePoolBytes.
If I don't call GetBuffer() and I don't call GetStream(asContiguousBuffer: true), I will only ever use small blocks.
So, if you rarely call GetBuffer(), a valid sizing strategy would be to create medium-sized small blocks (say, 1/4 of your expected common stream size), a large MaximumFreeSmallPoolBytes, and a MaximumFreeLargePoolBytes of 1 byte to force unpooled large-buffer allocation in the rare case you need it (if it were set to 0, large buffers would be pooled and retained indefinitely).
Is there any way of making HttpClient use this?
I'm processing files from blob storage of up to 100 MB in a WebJob, and I'm trying to minimise the amount of disk I/O and also memory churn.
Any other suggestions gratefully received.
RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();
manager.GenerateCallStacks = true;
RecyclableMemoryStream stream = new RecyclableMemoryStream(manager, "Tag1");
StreamWriter writer = new StreamWriter(stream);
for (long i = 0; i < 1024 * 1024 * 1024; i++)
{
    writer.Write('c');
}
for (long i = 0; i < 1024 * 1024 * 1024; i++)
{
    writer.Write('c');
}
writer.Flush();
The second loop hangs at an i value of 1073611776.
For example, on some platforms (e.g., Android), Unity replaces the body of methods marked with EventAttribute with throw new NotSupportedException("linked away"), which results in runtime exceptions.
It would be nice to have automatic detection of ETW support (or, at worst, the ability to disable ETW from application code, e.g., by setting Events.Writer to null).
Since use of the ToArray method is not recommended (it defeats the purpose of using this library), it should be marked with the Obsolete attribute.
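Marking the method would be a one-attribute change; a sketch (the message text is illustrative):

```csharp
// Passing 'false' as the second argument makes this a compiler
// warning rather than an error, so existing callers still build.
[Obsolete("ToArray allocates a full copy and defeats stream pooling; prefer GetBuffer or WriteTo.", false)]
public override byte[] ToArray()
{
    return base.ToArray(); // sketch; the existing implementation stays unchanged
}
```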
Hey implementors, great job on this project!
I had one question while looking at the code: I notice that you are "touching" another managed object (the manager) from the Dispose method on the stream, even when it is invoked from the finalizer. I was under the impression that was verboten, since the manager could have been gc'ed by the time the finalizer runs.
Is that no longer the case, generally?
I was benchmarking this library when I noticed that the buffer length from GetBuffer was very large. E.g., a 2056-character string resulted in a buffer with a length of 131072. Moreover, my serialization of several floats (length 146) also resulted in a buffer with a length of 131072.
I really would not like memory chunks of 128 KB. That scares me.
I feel like I am missing a very basic implementation detail. Am I possibly doing it wrong or is this a bug ?
Hello
I've recently started using this library to optimise a hot path in order to reduce allocations, but I've found that creating a new RecyclableMemoryStream is quite expensive:
The id field (a Guid) on the RecyclableMemoryStream is initialized in the constructor regardless of whether ETW tracing is enabled, and generating a new Guid is expensive (an interop call plus 16 bytes allocated).
I've changed the constructor code to only initialize the field if ETW tracing is enabled and it now looks like this:
I've pushed my changes in case you find this solution sensible.
I am using the RecyclableMemoryStream inside a using clause.
When I profile a long-running process, I see it has ~100K instances of RecyclableMemoryStream and ~200K instances of byte[].
Is it possible this is due to a memory leak? I am not using more than a few instances at a time.
A MemoryStream can be disposed twice or multiple times, but RecyclableMemoryStream throws an exception.
Consider this example:
var stream = storage.MemoryStreamManager.GetStream();
blob.DownloadToStream(stream); // may throw a StorageException
using (var reader = new StreamReader(stream, Encoding.UTF8)) {
// read data
}
Here, the stream should be disposed if DownloadToStream fails. However, when put in a using block, an exception will be raised when the reader is disposed (and it is configured to close the underlying stream, which is the default).
According to the Dispose docs, disposing multiple times should be allowed:
If an object's Dispose method is called more than once, the object must ignore all calls after the first one. The object must not throw an exception if its Dispose method is called multiple times. Instance methods other than Dispose can throw an ObjectDisposedException when resources are already disposed.
Alternatively, the disposed field could be exposed via a property to allow for explicit checks:
var stream = storage.MemoryStreamManager.GetStream();
try {... } finally { if (!stream.Disposed) stream.Dispose() }
...although supporting multiple Dispose calls would be more convenient and less clunky.
This avoids allocations and copies in the Stream base class.
There are several unit tests in the UnitTests project that check whether exceptions are thrown when they should be. This is done in two different ways:
the [ExpectedException(typeof(TException))] custom attribute on the test method, or
Assert.Throws<TException>(() => ...);
Using [ExpectedException] is problematic for at least two reasons:
Suggestion: For these two reasons (and, to a minor degree, for consistency's sake), all test methods using [ExpectedException] should be converted to Assert.Throws.
add implementation with int64 capacity
The manager by itself can be quite a useful thing without using streams. An example use case could be reusing buffers with socket io.
It would be helpful if this library were re-targeted to build for .NET Standard, so we could use it in .NET Core projects and other places.
Thanks!
As the .NET Standard code is already merged into master, can you please publish a NuGet package that supports .NET Standard?
using (var ms = memoryStreamManager.GetStream(nameof(ZipOutputStreamHelper)))
{
using (StreamWriter writer = new StreamWriter(ms))
using (JsonTextWriter jsonWriter = new JsonTextWriter(writer))
{
var ser = new JsonSerializer();
ser.Serialize(jsonWriter, sequence);
jsonWriter.Flush();
}
return ms.ToArray();
}
We use a pattern like this in our code. Basically, the implementation of JsonTextWriter is such that it doesn't finish writing to the stream until it is disposed; calling Flush doesn't seem to finalise it. This worked fine with a MemoryStream, but not with a RecyclableMemoryStream. Because stream wrappers call Dispose on their underlying streams, we end up disposing the RecyclableMemoryStream before we access the array.
Is there a way to change this at all? Or is it going to be a case of finding a way to correct the behaviour of the JsonTextWriter so that we can finalise the stream before calling ToArray?
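One workaround that doesn't require changing the library: StreamWriter has a constructor overload with a leaveOpen parameter, so the writer chain can be disposed (which makes JsonTextWriter finish writing) without closing the underlying stream. A sketch based on the snippet above:

```csharp
using (var ms = memoryStreamManager.GetStream(nameof(ZipOutputStreamHelper)))
{
    // leaveOpen: true keeps ms alive after the writers are disposed,
    // so JsonTextWriter can finish writing before we read the stream.
    using (var writer = new StreamWriter(ms, Encoding.UTF8, 1024, leaveOpen: true))
    using (var jsonWriter = new JsonTextWriter(writer))
    {
        var ser = new JsonSerializer();
        ser.Serialize(jsonWriter, sequence);
    }
    return ms.ToArray();
}
```

Setting CloseOutput = false on the JsonTextWriter is another way to keep the inner writer from closing its target.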
Hi, I was wondering why an int was chosen as the data type for the position and length of the stream, when a long would allow streams larger than 2 GB.
Hi all,
I am trying to read a huge (2.5 gigabyte) file, but it always ends up with "Unhandled Exception: System.IO.IOException: Maximum capacity exceeded", and RAM is always around 100% used. What am I doing wrong?
int blockSize = 10;
int largeBufferMultiple = 1024 * 1024;
int maxBufferSize = 16 * largeBufferMultiple;
var manager = new RecyclableMemoryStreamManager(blockSize,
    largeBufferMultiple,
    maxBufferSize);
manager.GenerateCallStacks = true;
manager.AggressiveBufferReturn = true;
manager.MaximumFreeLargePoolBytes = maxBufferSize * 4;
manager.MaximumFreeSmallPoolBytes = 100 * blockSize;
RecyclableMemoryStream memoryStream = new RecyclableMemoryStream(manager);
using (FileStream fileStream = File.OpenRead(@"C:\Temp\test.bin"))
{
    // MemoryStream memoryStream = new MemoryStream();
    fileStream.CopyTo(memoryStream);
}