Bitmaps? (numsharp issue, closed)

scisharp avatar scisharp commented on June 19, 2024
Bitmaps?

from numsharp.

Comments (44)

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@fdncred There is a sample for loading MNIST image dataset into NDArray. 10K image data, works very well.

fdncred avatar fdncred commented on June 19, 2024

@Oceania2018 Thanks for the info but there aren't a lot of 10k images that need processing. For instance, the one I'm working on now is 2531 x 2081 at 3 bytes per pixel, which makes for > 15.8 million bytes in a 3-dimensional array. Obviously, anything that big would be slower, but I'm not sure I want to find out how much slower. I may have to code it up just to see.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@fdncred Hi, I haven't tested a dataset that large. There is definitely a way to optimize it, such as using the yield keyword: read the images one by one and feed them into the neural network.
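The yield suggestion can be sketched roughly as follows. This is only an illustration: `LazyImageReader` and `ReadImages` are hypothetical names, not part of NumSharp, and the sketch yields raw file bytes rather than decoded pixels.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LazyImageReader
{
    // Hypothetical helper (not a NumSharp API): yields the raw bytes of one
    // file at a time, so the reader never holds the whole dataset in memory.
    public static IEnumerable<byte[]> ReadImages(IEnumerable<string> paths)
    {
        foreach (var path in paths)
        {
            // Execution pauses here until the caller requests the next item.
            yield return File.ReadAllBytes(path);
        }
    }
}
```

A caller would consume it with a plain `foreach` over `LazyImageReader.ReadImages(Directory.EnumerateFiles(folder))`, processing each image before the next one is read.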

fdncred avatar fdncred commented on June 19, 2024

@Oceania2018 I experimented with Bitmaps and it just crashes Visual Studio when I try to inspect the np variable. I take that to mean that either the arrays are so large that NumSharp can't handle them, or I've constructed the numpy array incorrectly. I suspect I'm not using NumSharp correctly. Any ideas?

My system is pretty beefy - 16GB Ram, 12-CPUs.

This is what I did.

  1. Make your project multitargeting following this guide.
  2. Build the net472 version of numsharp and include the assembly in my other project.
  3. Add this code and the image below and set a break point at the end so I can inspect the np variable.

Note: If you uncomment the byteImage code, my code creates a 3-dimensional byte array. I use this in other places and it works great. This code was meant for testing and only really handles 32-bpp and 24-bpp images.

The intent of the code was to create a 2d array of image data by following the example of Array2Dim TestMethod. I know it's not right since np[0] only has two values where it should have 3. I'd like to create a 3d array like with byteImage but I'm not sure how to do that with NumSharp.

private void BitmapToArray(string notes1a)
{
    var bmp = (System.Drawing.Bitmap)System.Drawing.Image.FromFile(notes1a);
    var bmpd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
    var dataSize = bmpd.Stride * bmpd.Height;
    byte[] data = new byte[dataSize];
    Marshal.Copy(bmpd.Scan0, data, 0, data.Length);
    bmp.UnlockBits(bmpd);

    var includeAlpha = false;
    var stride = bmpd.Stride;
    //var byteImage = new byte[bmpd.Height][][];
    var w = bmpd.Width;
    var dataLen = data.Length / 4;

    var np = new NumSharp.NDArray<List<int>>();
    var list = new List<List<int>>();

    for (int i = 0; i < dataLen; i++)
    {
        var x = i % w;
        var y = i / w;
        //if (x == 0)
        //    byteImage[y] = new byte[w][];
        var o = (y * stride + x * 4);
        if (includeAlpha)
        {
            //byteImage[y][x] = new byte[] { data[o], data[o + 3], data[o + 2], data[o + 1] };
            list.Add(new List<int>() { data[o], data[o + 3], data[o + 2], data[o + 1] });
        }
        else // FYI - Data is in BGR layout
        {
            //byteImage[y][x] = new byte[] { data[o + 3], data[o + 2], data[o + 1] };
            list.Add(new List<int>() { data[o + 3], data[o + 2], data[o + 1] });
        }
    }
    np = np.Array(list);
}

[image: notesa1]

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred Hm, I will check your code on Visual Studio Code, Windows, .NET Core 2.1, and maybe try to use NDArray<double[]> somehow. I am not 100% sure that List is the best data type. We use it often in tests because the lists (arrays) there are small, but it is possible to use double[] (plain C# arrays) instead. They have much better performance.

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred Not sure if it is important, but which operating system do you use? Regular Windows?

fdncred avatar fdncred commented on June 19, 2024

Windows 10 1809 Build 17763.55 64-bit

fdncred avatar fdncred commented on June 19, 2024

This may be closer but still not right because the shape is wrong.

private void BitmapToArray(string notes1a)
{
    var bmp = (System.Drawing.Bitmap)System.Drawing.Image.FromFile(notes1a);
    var bmpd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
    var dataSize = bmpd.Stride * bmpd.Height;
    byte[] data = new byte[dataSize];
    Marshal.Copy(bmpd.Scan0, data, 0, data.Length);
    bmp.UnlockBits(bmpd);

    var includeAlpha = false;
    var stride = bmpd.Stride;
    //var byteImage = new byte[bmpd.Height][][];
    var w = bmpd.Width;
    var h = bmpd.Height;
    var dataLen = data.Length / 4;

    var arr = new NumSharp.NDArray<NumSharp.NDArray<NumSharp.NDArray<byte>>>();
    arr.Data = new NumSharp.NDArray<NumSharp.NDArray<byte>>[h];
    for (int i = 0; i < dataLen; i++)
    {
        var x = i % w;
        var y = i / w;
        if (x == 0)
        {
            //byteImage[y] = new byte[w][];
            arr[y] = new NumSharp.NDArray<NumSharp.NDArray<byte>>();
            arr[y].Data = new NumSharp.NDArray<byte>[w];
        }
        var o = (y * stride + x * 4);
        if (includeAlpha)
        {
            //byteImage[y][x] = new byte[] { data[o], data[o + 3], data[o + 2], data[o + 1] };
            arr[y][x] = new NumSharp.NDArray<byte>();
            arr[y][x].Data = new byte[4];
            arr[y][x].Data[0] = data[o];
            arr[y][x].Data[1] = data[o + 3];
            arr[y][x].Data[2] = data[o + 2];
            arr[y][x].Data[3] = data[o + 1];
        }
        else // FYI - Data is in BGR layout
        {
            //byteImage[y][x] = new byte[] { data[o + 3], data[o + 2], data[o + 1] };
            arr[y][x] = new NumSharp.NDArray<byte>();
            arr[y][x].Data = new byte[3];
            arr[y][x].Data[0] = data[o + 3];
            arr[y][x].Data[1] = data[o + 2];
            arr[y][x].Data[2] = data[o + 1];
        }
    }
}

I'm trying to match the array from this Python code, which has shape (2531, 2081, 3).

from PIL import Image
import numpy as np
pil_img = Image.open(filename)
img = np.array(pil_img)

dotChris90 avatar dotChris90 commented on June 19, 2024

Understood. When I'm home I may try to extend the Array method for this. By the way, thanks for showing us the Python code. It is important that we match the APIs as closely as possible.

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred Did you build the code from GitHub, or did you take the NuGet package? Just so I know how best to support your case.

fdncred avatar fdncred commented on June 19, 2024

I downloaded and compiled the code from Github.

That's what I meant when I said this above.

This is what I did.

1. Make your project multitargeting following this guide.
2. Build the net472 version of numsharp and include the assembly in my other project.
3. Add this code and the image below and set a break point at the end so I can inspect the np variable.

dotChris90 avatar dotChris90 commented on June 19, 2024

Ah yes. Lol sorry my fault. Ok will test it at home.

fdncred avatar fdncred commented on June 19, 2024

No worries, thanks for your help.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@dotChris90 Do you think we should refactor the NDArray class for each specific generic type? That is, separate NDArray into NDArray<double[]> or NDArray<T> where T is limited to value types, and change

public IList<TData> Data { get; set; }

to

public T[] Data { get; set; }

For 3 dimensions it would be

public class NDArray3<T> 
{
    public T[,,] Data { get; set; }
}

I think this will definitely give the best performance.

dotChris90 avatar dotChris90 commented on June 19, 2024

@Oceania2018 Yes, maybe we should consider some restructuring.

Performance
I was really surprised to read that the jagged array double[][] should be faster than double[,]. This is often mentioned on Stack Overflow, and in http://www.monitis.com/blog/improving-net-application-performance-part-13-arrays/ the author gives a reason for it. I am not 100% sure this is really true; I just want to mention it here.
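One rough way to check the jagged versus rectangular claim on a given machine is a small timing harness like the one below. It is only a sketch: `ArrayLayoutTiming` and its methods are hypothetical names, a single Stopwatch pass is noisy, and results depend on the runtime and JIT, so treat any numbers as indicative at best.

```csharp
using System;
using System.Diagnostics;

public static class ArrayLayoutTiming
{
    // Sums a jagged array row by row.
    public static double SumJagged(double[][] a)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double[] row = a[i];
            for (int j = 0; j < row.Length; j++)
                sum += row[j];
        }
        return sum;
    }

    // Sums a rectangular (multidimensional) array in the same row-major order.
    public static double SumRectangular(double[,] a)
    {
        double sum = 0;
        int rows = a.GetLength(0), cols = a.GetLength(1);
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                sum += a[i, j];
        return sum;
    }

    // Fills both layouts with identical values and times one pass over each.
    public static void Compare(int rows, int cols)
    {
        var jagged = new double[rows][];
        var rect = new double[rows, cols];
        for (int i = 0; i < rows; i++)
        {
            jagged[i] = new double[cols];
            for (int j = 0; j < cols; j++)
            {
                jagged[i][j] = i + j;
                rect[i, j] = i + j;
            }
        }

        var sw = Stopwatch.StartNew();
        double s1 = SumJagged(jagged);
        long jaggedTicks = sw.ElapsedTicks;

        sw.Restart();
        double s2 = SumRectangular(rect);
        long rectTicks = sw.ElapsedTicks;

        Console.WriteLine($"jagged: {jaggedTicks} ticks, rectangular: {rectTicks} ticks, sums equal: {s1 == s2}");
    }
}
```

Calling `ArrayLayoutTiming.Compare(2081, 2531)` a few times in a Release build gives a first impression; a proper benchmark would need warm-up runs and repeated measurements.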

Generic aspect
Honestly, I was wondering whether it would be more user friendly (this is just a consideration, I am not 100% sure) if the generic type T were exactly our array storage (the member "Data"). We could restrict the generic type to classes which implement IList, or which are children of Array and implement IList (not sure whether multiple restrictions are possible). Users could then be 100% sure what they are constructing. At the moment I think it is complex for many people, and this would reduce that complexity. Look at this example:

var A = new NDArray<double[][]>().Array( .... );
var b = new NDArray<double[]> ().Array ( .... );

var c = A.Dot(b);

It is quite clean, since you can see "ok, A is an array of arrays, so a matrix" and "ok, b is an array".
In the end, NDArray is just an adapter class which extends the existing arrays in the C# world.

So an NDArray would look like this

public partial class NDArray<TData> where TData : IList
{
    public TData Data { get; set; }
}

dotChris90 avatar dotChris90 commented on June 19, 2024

@Oceania2018 What do you think? Honestly, I do not want to start something like "NDArray2" or "NDArray3" because it is not the numpy API ;)

fdncred avatar fdncred commented on June 19, 2024

An alternate approach is to compile the numpy source code into a C++ DLL and then P/Invoke into it. This is kind of what Python does: numpy isn't written in Python, only the wrapper is. Then you'd have all the speed of numpy, and one would only have to figure out how to marshal data back and forth.

Update
I take it back. After looking at the numpy source code and libopenblas I'm not sure p/invoking would even be possible. What a mess. No wonder no one else has done it.

fdncred avatar fdncred commented on June 19, 2024

But I did find this. Looks like it could be helpful.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@dotChris90 I like the jagged array.

var A = new NDArray<double[][]>()

dotChris90 avatar dotChris90 commented on June 19, 2024

@Oceania2018 OK, if you do not mind I would do a total restructuring on Friday and over the weekend (I have some holidays). I suggest that just one of us (so me) does this, because it also includes changing the unit tests etc.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

I have another idea. What about creating a new class named NumSharp? It would be the equivalent of np: you do var np = new NumSharp(), then np.arange(10). NumSharp will act like a router.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

NumSharp will hide the mess of NDArray. I agree with you: you do the restructuring. Much appreciated.

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred Interesting. It seems NumSharp is not the only project trying to reconstruct numpy, lol. Thanks for the post. I think .NET Core 3.x will bring a lot more for machine learning, array performance and so on. That is the reason I am avoiding C at the moment. But if we find out in 2019 that .NET Core 3.0 does not bring what we hope for, maybe we will go with C. About the numpy project: I think at the moment they rely on Python's internal mechanisms by including Python.h in their files. If we wanted to integrate this into .NET, it feels like a bit too much wrapping, and we would still have to implement the classes. I would really like it if the numpy team wrote their code in plain C, compiled it to a shared object, and linked their Python object code against it. Anyway, maybe we can have a look at their GitHub repo :D

dotChris90 avatar dotChris90 commented on June 19, 2024

@Oceania2018 Honestly, np = new NumSharp() is a fantastic idea. Lol, this makes everything look more like numpy. We could try to use .NET script or PowerShell and make some examples, after restructuring the array stuff.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@dotChris90 Sounds great, let's do it. I will add a NumSharp class, you do the NDArray restructuring.
Another advantage of a NumSharp class is that it makes our API more stable for high-level usage: we can change the NDArray implementation behind it, which is better OOP encapsulation.

var np = new NumSharp();
np.arange(10);

@fdncred Are you interested in joining this project?

Created a new issue #34

fdncred avatar fdncred commented on June 19, 2024

@dotChris90 I posted that C++ link in order to help port to C#. For me, at least, it's easier to read C/C++ and turn it into C# than it is to read Python and turn it into C#. Here's another C++ port of the NumPy functionality with help. Again, it may just be useful to see how other people reinterpret numpy.

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred I made a test method for your case to play around with the byte[][][] use case. In Visual Studio Code the debugger works fine for this image: slow, but fine. But not for our NDArray; so far I have only tried it with byte[][][].

I suggest that I do the restructuring of our NDArray this week and extend the Array method. Honestly, until now we did not think about tensor types like byte[][][]. Maybe that is the reason the shaping method does not work properly. I will let you know when the restructuring is finished. Then you can try something like

var myArray = new NDArray<byte[][][]>().Array(new Bitmap("pathToImage"));

For now, if you want to play around with the code, maybe you could try:

var myArray = new NDArray<byte[][]>(); // an NDArray of byte array of byte array; it looks like a matrix, which is why I want to restructure

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred Sure. Honestly, the link was interesting, and I totally agree with you. C++ and C# are much closer to each other than either is to Python. Python is a nice language, but a lot of things are missing. Generics, just as an example. Maybe I will have a deeper look at these C++ projects.

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred Just a question out of curiosity: what API would you suggest implementing for byte[][][]? In other words, what would be good to have for images?

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred @Oceania2018 I checked your link https://xtensor.readthedocs.io/en/latest/numpy.html and it is amazing! But I asked myself: it is not possible to have a generic like array<double,2> in C#, am I right? It looks extremely nice for users, but I have never seen it in C# or in the .NET world in general.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@dotChris90 I just discussed this with someone else. We have another solution. Please hold on and don't make any changes.

dotChris90 avatar dotChris90 commented on June 19, 2024

@Oceania2018 OK, I will do nothing for today. But what was the discussion about: NDArray<double[][]>, the Bitmap, or np = new NumSharp()? :D

Oceania2018 avatar Oceania2018 commented on June 19, 2024

I pushed code. Please refer to NDArrayOptimized. All data is persisted in a one-dimensional array. NDArrayOptimized will parse the 1D array into an array of any dimension only when the data is used.

[TestMethod]
public void arange()
{
    var np = new NDArrayOptimized<int>();

    // SequenceEqual only returns a bool, so assert on it; otherwise the test can never fail.
    np.arange(3);
    Assert.IsTrue(Enumerable.SequenceEqual(np.Data, new int[] { 0, 1, 2 }));

    np.arange(7, 3);
    Assert.IsTrue(Enumerable.SequenceEqual(np.Data, new int[] { 3, 4, 5, 6 }));

    np.arange(7, 3, 2);
    Assert.IsTrue(Enumerable.SequenceEqual(np.Data, new int[] { 3, 5 }));
}

NDArrayOptimized will return the corresponding form for a given Shape, like shape(3, 5), by parsing the 1D array into an n-dimensional array.
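The idea of one flat array plus a shape can be illustrated with a row-major offset calculation. This is a sketch of the general technique, assuming C-order layout as numpy uses by default; `FlatIndex.Offset` is a hypothetical helper and not the actual NDArrayOptimized code.

```csharp
using System;

public static class FlatIndex
{
    // Row-major (C-order) offset of a multi-dimensional index into a 1D buffer.
    // Hypothetical helper for illustration, not a NumSharp API.
    public static int Offset(int[] shape, params int[] index)
    {
        if (shape.Length != index.Length)
            throw new ArgumentException("index rank must match shape rank");

        int offset = 0;
        for (int d = 0; d < shape.Length; d++)
        {
            if (index[d] < 0 || index[d] >= shape[d])
                throw new IndexOutOfRangeException();
            // Horner-style accumulation: the last dimension varies fastest.
            offset = offset * shape[d] + index[d];
        }
        return offset;
    }
}
```

For shape (3, 5), element [1, 2] sits at flat position 1 * 5 + 2 = 7. For an image of shape (height, width, 3), pixel (y, x), channel c sits at (y * width + x) * 3 + c.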

fdncred avatar fdncred commented on June 19, 2024

@dotChris90 I don't understand your question about what API for byte[][][]. Sorry. Having an image in a byte[][] or byte[][][] is only useful as it relates to numpy's algorithms. If you look at this python project you'll see how they're using it. This python project is where I got the idea to use NumSharp when I converted it to C#.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@fdncred I think we would do it like this:

var np = new NDArrayOptimized<byte>();
np.reshape(n1, n2, n3);
// load image bytes into np

fdncred avatar fdncred commented on June 19, 2024

@Oceania2018 That seems intuitive to me as long as it returns np[height][width][byte[3]]. I think that's what python is doing but maybe it's returning tuples - I can never tell with imaging on python.

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred The project page is enough. Until now I have mostly worked with time series and not much with images :D So I don't know well which functions are used most. I was just looking for some inspiration or use cases.

Oceania2018 avatar Oceania2018 commented on June 19, 2024

@fdncred I created a 1M-byte array; it took 38 ms.

[image]

[image]

@Oceania2018
I get similar performance.
[image: perf]
With more realistic bitmap dimensions:
[image: bmpsim]
Not sure why my shape & ndim is different.
[image: bmpsim-expand]

dotChris90 avatar dotChris90 commented on June 19, 2024

@Oceania2018 Do I understand you correctly? You want to store everything (1D, 2D, 3D, ... ND) in a single array, and properties like Shape decide the dimension? Do not get me wrong, but I think this will lead to some problems.

1 ) Our methods will get longer and less well structured. Until now we can have "MethodName(NDArray<double>)" and "MethodName(NDArray<double[]>)" to distinguish between vector and matrix. Thanks to polymorphism (method overloading) we can have 2 different methods with the same name but different parameters. If our objects are always the same NDArray type you cannot do this; instead you always need a huge "if else" structure: if the method sees a vector do this, if a matrix do that. This leads to fewer files but also increases the danger that people have to work on the same file at the same time.

2 ) It is not fully OOP in my opinion. In OOP we say "this is a matrix and it has these methods and properties" and "this is a vector with these properties and methods". But here we say it is an array that can be anything. That is dynamic interpreter style, not compiler style. It is Python, not C#.

3 ) Performance. Sorry to say it, but I am not sure a huge array brings better performance. We should run some tests to find out what is best, but I am very sure jagged arrays are faster than 1 huge array where you have to locate each element first on every access.

4 ) Do you really, really want to rewrite all the operations and methods? It will be hard, because on Stack Overflow you will find code examples with double[][] and so on, but never an example with double[] for a matrix.

5 ) Why do you want to create your own array? The .NET world already has very fine and optimised array types. Python does not, so they developed theirs from scratch. So we should stay with the .NET array type system.
That's why I suggested NDArray<double[][]>. I know that in the end it is a jagged array, so I know the corresponding type, and most importantly we can keep NDArray as an adapter class and not a fancy new class of our own. .NET does not need this.

So please give me some reason why NDArray<double> matrix = new NDArray<double>().Array(...) is better than NDArray<double[][]> matrix = new NDArray<double[][]>().Array(...)?

I know QuantStack does it this way, and I find it a little weird since I CANNOT SEE FROM MY CODE WHAT NUMERIC TYPE I HAVE. Have a look again:

var matrix = new NDArray<double[][]>() // I can see 100% it must be a matrix
var matrix = new NDArray< double >() // Is it a vector, a matrix? no! it is something we do not know -.-

So give me some arguments and pros. I do not want something like "because QuantStack does it like this". I want something like "better performance for matrix multiplication", because honestly all the points I listed make me feel uncomfortable with the QuantStack solution.

dotChris90 avatar dotChris90 commented on June 19, 2024

@Oceania2018 I will open another issue to discuss this. "Bitmaps" is not the best name for it ;)

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred I pushed an Array method to NumSharp which accepts a Bitmap object as input parameter. You can try it and play around. :)

With the new 1D array strategy we could simply take the byte array from the Marshal.Copy call and put it into the NDArray Data property. We just need to set the shape to (height, width, 3).

The only thing I don't understand is that the order of the RGB vector differs between numpy and Marshal.Copy. Shall we correct this?
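One caveat when reusing the Marshal.Copy buffer directly: rows returned by LockBits are Stride bytes apart, and GDI+ rounds the stride up to a multiple of 4 bytes, so a 24-bpp image whose width * 3 is not a multiple of 4 carries padding at the end of each row. Below is a minimal sketch of dropping that padding before assigning a (height, width, 3) shape. `PixelPacking` is a hypothetical helper, and it assumes a positive stride (a top-down bitmap).

```csharp
using System;

public static class PixelPacking
{
    // Copies `height` rows of `rowBytes` meaningful bytes each out of a buffer
    // whose rows start every `stride` bytes, returning a tightly packed array.
    // Hypothetical helper; assumes stride > 0 (top-down bitmap).
    public static byte[] RemoveStridePadding(byte[] data, int stride, int rowBytes, int height)
    {
        if (stride <= 0 || rowBytes <= 0 || rowBytes > stride)
            throw new ArgumentException("expected 0 < rowBytes <= stride");
        if (height > 0 && data.Length < (height - 1) * stride + rowBytes)
            throw new ArgumentException("buffer too small for the given dimensions");

        var packed = new byte[rowBytes * height];
        for (int y = 0; y < height; y++)
        {
            // Skip the padding bytes at the end of each source row.
            Buffer.BlockCopy(data, y * stride, packed, y * rowBytes, rowBytes);
        }
        return packed;
    }
}
```

With the buffer from the earlier code this would be called as `RemoveStridePadding(data, bmpd.Stride, bmpd.Width * 3, bmpd.Height)` for a 24-bpp image.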

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred I took your image and example code for the method and unit test.

Hope you don't mind :)

fdncred avatar fdncred commented on June 19, 2024

@dotChris90 I'll take a look at it. I have no problem with you using any of the code I've pushed or put in issues, so feel free to use it without question.

The thing about dotnet bitmaps is they're stored in BGR format. So that may be why the vector is different. So, typically there's a byte swap of R & B to get them aligned properly.

I see some things I'd change but this is definitely a good start. We just have to figure out which BPP values we will support and be able to handle those flavors of bitmap.

For speed purposes we could also use unsafe calls on bmp.Scan0 instead of marshaling. Marshaling isn't exactly fast, but we can decide that later.
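The R and B swap mentioned above can be sketched on a tightly packed pixel buffer (no stride padding) as follows. `ChannelOrder.SwapRedBlue` is a hypothetical helper, shown with safe array indexing rather than the unsafe pointer approach.

```csharp
using System;

public static class ChannelOrder
{
    // Swaps the first and third byte of every pixel in place, turning
    // BGR into RGB (or BGRA into RGBA). Hypothetical helper for illustration.
    public static void SwapRedBlue(byte[] packed, int bytesPerPixel)
    {
        if (bytesPerPixel < 3)
            throw new ArgumentException("need at least 3 bytes per pixel");
        if (packed.Length % bytesPerPixel != 0)
            throw new ArgumentException("buffer length is not a whole number of pixels");

        for (int o = 0; o < packed.Length; o += bytesPerPixel)
        {
            byte blue = packed[o];
            packed[o] = packed[o + 2];   // red moves to the front
            packed[o + 2] = blue;        // blue moves to the third slot
        }
    }
}
```

After the swap, the byte order per pixel matches what `np.array(pil_img)` produces for an RGB image.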

dotChris90 avatar dotChris90 commented on June 19, 2024

@fdncred Totally agree. First let's make a nice start for NumSharp.

:)
