Update LZMA SDK (managed-lzma, open issue)

weltkante commented on June 20, 2024
Update LZMA SDK

from managed-lzma.

Comments (8)

weltkante commented on June 20, 2024

Yeah I probably should provide Stream wrappers around the new API in case someone wants to use LZMA without 7z. The API design goal was to avoid as many unnecessary copies as possible, in particular when chaining encoders/decoders in the 7z case, because the previous version lost a lot of performance by copying data around.
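To illustrate what a stream wrapper around a buffer-based decoder involves, here is a minimal sketch. It is written in Python on top of the standard-library `lzma` module, not in C# against managed-lzma's API, and `DecoderStream`, `source` and `chunk_size` are hypothetical names chosen for this example. It also shows the trade-off mentioned above: the wrapper keeps an intermediate `_pending` buffer, so the convenience of a stream interface costs one extra copy.

```python
import io
import lzma


class DecoderStream(io.RawIOBase):
    """Hypothetical read-only stream over a buffer-based XZ/LZMA decoder."""

    def __init__(self, source, chunk_size=8192):
        super().__init__()
        self._source = source              # any object with read(n) -> bytes
        self._decoder = lzma.LZMADecompressor()
        self._chunk_size = chunk_size
        self._pending = b""                # decoded bytes not yet handed out

    def readable(self):
        return True

    def readinto(self, buffer):
        if len(buffer) == 0:
            return 0
        # Pull compressed input only when the decoder asks for it.
        while not self._pending and not self._decoder.eof:
            if self._decoder.needs_input:
                data = self._source.read(self._chunk_size)
                if not data:
                    break                  # source ended early (truncated input)
            else:
                data = b""                 # decoder still holds buffered input
            self._pending = self._decoder.decompress(data, max_length=len(buffer))
        count = min(len(buffer), len(self._pending))
        buffer[:count] = self._pending[:count]   # the extra copy
        self._pending = self._pending[count:]
        return count


if __name__ == "__main__":
    payload = b"managed-lzma " * 1000
    reader = io.BufferedReader(DecoderStream(io.BytesIO(lzma.compress(payload))))
    print(reader.read() == payload)
```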

The project itself should be pretty clean to compile: there is a shared source project and a lot of other projects referencing it. Just unload all projects except the shared source and the instantiation of the library you want to use.

About LZMA2: it's just a multithreaded extension on top of LZMA (i.e. no change in algorithm), separating the data stream into (large) chunks and compressing them in parallel. You lose compression (because each chunk begins with a fresh context) but gain speed (using all CPU cores). If you prefer compression over speed you should always use LZMA and not LZMA2, unless the source data is not compressible: LZMA doesn't handle that well, while LZMA2 can include incompressible data literally.
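The cost of starting each chunk with a fresh context can be seen in a small sketch. This is an illustration using Python's standard-library `lzma` module and a thread pool, not managed-lzma's implementation; `compress_whole`, `compress_chunked` and `decompress_chunked` are hypothetical helper names. Each chunk is compressed as an independent stream here, so part of the size difference also comes from per-stream container overhead. The test data is built so that all redundancy lies across chunk boundaries, which is the worst case for independent chunks.

```python
import lzma
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_whole(data: bytes) -> bytes:
    """Compress everything as one stream, so later data can refer to earlier data."""
    return lzma.compress(data)


def compress_chunked(data: bytes, chunk_size: int) -> List[bytes]:
    """Compress fixed-size chunks independently; each starts with a fresh context."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lzma.compress, chunks))


def decompress_chunked(parts: List[bytes]) -> bytes:
    """Decompress the independent chunks and join them in order."""
    return b"".join(lzma.decompress(part) for part in parts)


if __name__ == "__main__":
    rng = random.Random(0)
    block = bytes(rng.getrandbits(8) for _ in range(4096))
    data = block * 8  # the only redundancy is the repetition of whole blocks

    whole = compress_whole(data)
    parts = compress_chunked(data, chunk_size=4096)

    print("one stream:", len(whole), "bytes")
    print("independent chunks:", sum(len(p) for p in parts), "bytes")
```

With this input the single stream can encode the repeated blocks as back-references, while every independent chunk sees only incompressible data, so the chunked total comes out larger.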

Fair warning, since you mentioned playing with the BCJ2 encoder/decoder: there is issue #24, which I consider pretty worrying. Unfortunately I haven't had the time to figure out what the cause is yet. There is some unknown bug or incompatibility in the 7z and/or BCJ layers in my codebase. LZMA shouldn't be affected as far as I can tell.

weltkante commented on June 20, 2024

Seems I forgot to update this issue. When I looked into it back then, it appeared that none of the changes to the LZMA code were necessary to include, so I dropped the priority on this issue. Creating a stable public API has had higher priority since then.

daPhie79 commented on June 20, 2024

hi weltkante! i'm really happy to have finally found your project, as i've been trying to find someone who's properly ported 7zip archive support to c#. somehow i couldn't find you on google or other search engines. the project only popped up after a very precise search on github, and on like the second page, lol

so, to the point. i've also undertaken the task of making a 7zip read/write archive library for c#. i was pretty much done with header parsing and streams processing when i found your library, so i'm not giving up.

i just wanted to tell you, don't know if you've tested it, but using your translated code, i get slower decompression, but faster compression, compared to the official c# lzma sdk. on my test files, 81 nes rom files, i get:

  • compression in 10 seconds with lzma sdk, and ~7 seconds with your code
  • decompression in 1.6 seconds with lzma sdk and 2.6 seconds with your code

i'm currently weighing the pros and cons of using both libraries, especially since my goal was to have a tiny library, and the more code i keep, the bigger the lib gets. it's at roughly 170kb with your code only, and 130kb with the sdk's lzma decoder/encoder. it's such a dilemma, lol

weltkante commented on June 20, 2024

Last time I looked, the "official" C# LZMA SDK was actually not official. As far as I understand, it's a contribution re-ported from the Java port and is not maintained, so it's quite outdated compared to the C/C++ codebase. If the timestamps in the source archives are to be trusted, this situation hasn't changed.

The fact that my decompression runs slower is interesting. I'll try and see if I can reproduce that. (I assume you have taken enough care to feed both decoders the same stream, to make sure the performance is comparable?)
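A comparison of this kind is only meaningful if both decoders receive identical input and produce identical output. A minimal sketch of such a harness follows, in Python with the standard-library `lzma` module. The two decoders here (`decode_one_shot` and `decode_incremental`) are stand-ins invented for the example, not the SDK decoder or managed-lzma's `Decoder` class, and `best_time` is a hypothetical helper.

```python
import lzma
import time
from typing import Callable, Tuple


def decode_one_shot(stream: bytes) -> bytes:
    """Stand-in decoder A: decode the whole stream in one call."""
    return lzma.decompress(stream)


def decode_incremental(stream: bytes, step: int = 4096) -> bytes:
    """Stand-in decoder B: feed the same stream in small pieces."""
    decoder = lzma.LZMADecompressor()
    out = []
    for i in range(0, len(stream), step):
        if decoder.eof:
            break
        out.append(decoder.decompress(stream[i:i + step]))
    return b"".join(out)


def best_time(decode: Callable[[bytes], bytes], stream: bytes,
              repeats: int = 5) -> Tuple[float, bytes]:
    """Return the best wall-clock time over several runs, plus the decoded output."""
    best = float("inf")
    result = b""
    for _ in range(repeats):
        start = time.perf_counter()
        result = decode(stream)
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    payload = bytes(range(256)) * 4000
    stream = lzma.compress(payload)          # one stream, shared by both decoders

    time_a, out_a = best_time(decode_one_shot, stream)
    time_b, out_b = best_time(decode_incremental, stream)

    # Only compare timings once both outputs are known to match the original.
    assert out_a == out_b == payload
    print(f"one-shot: {time_a:.4f}s, incremental: {time_b:.4f}s")
```

Taking the best of several runs reduces noise from warm-up effects, which matters even more under a JIT compiler than it does here.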

As far as size is concerned, my library includes LZMA2 and 7z, neither of which is included in the LZMA SDK, so obviously the size is a bit larger; can't help this. If a 50kb size difference is that important for you, then you should consider tools like ILMerge to inline dependencies and throw away any classes you don't need (other tools also allow additional size savings by renaming private/internal symbols).

daPhie79 commented on June 20, 2024

you're totally right, i tried tracking to see if there were any actual updates to this "official lzma sdk", but the more recent updates date back to 2009. it's not "up to date" indeed haha!

as for decompression speed, it seems to be the case. i don't know if it's because the code is simpler (it looks simpler to me, i'm more intimidated by the c/c++ source code than that one, but often times it doesn't mean anything, lol) but yeah, that code seems to agree more with C#'s JIT compiler. i get almost twice the performance out of it.

i know your code is multi-threaded optimized when it's possible though, like with LZMA2 right? in my case though it's not a priority, and i'm still a bit shy of doing heavy multi-threaded stuff. i implemented your code in my project in a single threaded manner.

as for size, i was comparing your project integrated in my project versus SharpCompress' adaptation of the official LZMA sdk also integrated in my project. their version of it can decompress LZMA2, although it does not compress with it, only LZMA. i would've been happy with it, if it wasn't so slow at compressing! i also tested decompressing an EXE compressed with BCJ2 + LZMA streams, and the cumulative difference was striking.

all in all though, your project seems to have been quite the undertaking! i have difficulty just compiling it since it has so many subprojects, and i had an issue with file security preventing the native project from compiling, etc. took me a while, but i'm now able to understand, at least partially, how you implemented the decoders in your new approach, even though i personally still prefer the basic stream approach of your legacy code. it's funny because i built my own 7zip parser before finding out about your project and before i was able to understand the actual source code, so i took a slightly different approach from all the other projects, but in the end it's almost the same result :)

daPhie79 commented on June 20, 2024

trying to picture this in my mind, but i think using simple stream chain, there is no useless copying done? i mean every decoder will read a chunk and decompress it in supplied buffer, but this can't really be avoided?

thanks for the tip, if i use your library i will keep that in mind! it is quite the extensive project, seriously!

i didn't experience issues decompressing a BCJ2 -> 3 x LZMA + 1 x LZMA stream yet. i am using your legacy filters though. i back-converted your PPMd decoder to a stream form as well to fit with the other decoders :P

if you have a bit of spare time, tell me what you think of the way i implemented 7zip header parsing and streams decompression ^_^

weltkante commented on June 20, 2024

I've managed to repro the decoder performance issue with the Decoder class. I assume you were testing that and not the AsyncDecoder? I'm having deadlocks in that one. Anyway, I'll try to figure out why it's slower than the SDK code; it shouldn't have to be, since both are based on the same algorithm after all. I'll open another issue to track it.

Coming back to the code I realize how much of it was left unfinished. The decoder API is indeed not great, had to rush it to get something usable out. It basically just exposes the underlying mechanics in a safe way.

A lot of the async API is also currently placeholder. It works (or is supposed to work), but the currently implemented locking is crude and I know better techniques for this kind of async programming (store a delegate and call it when you are done, writing custom awaitables). Unfortunately it's more work and I never had the time to finish it properly.

daPhie79 commented on June 20, 2024

I'll keep tabs on your progress :) I'm not very good with async code at the moment myself, it's so confusing! And the deadlocks are so frickin' everywhere, lol
