Comments (8)

austinksmith commented on May 22, 2024

I was expecting the workload to be distributed among these 10 threads. But after some console logging, I see they are getting the exact same array and they are doing the exact same work, just now 10 times. How does that help? Maybe the example code just doesn't explain this but I was expecting either:

Testing in my console with v4.1.0 shows that the result of your supplied function

const foo = () => {
  var params = {
    'array':[0,1,2,3,4,5,6,7,8,9]
  };
  hamsters.run(params, function() {
      var arr = params.array;
      arr.forEach(function(item) {
        rtn.data.push((item * 120)/10);
      });
  }, function(output) {
     console.log(output);
  }, 10);
};
foo();

is exactly what it should be: an array containing 10 subarrays as your final output, since you have not asked the library to aggregate your results back together. If you want a single output, you need to change your logic to

const foo = () => {
  var params = {
    'array':[0,1,2,3,4,5,6,7,8,9]
  };
  hamsters.run(params, function() {
      var arr = params.array;
      arr.forEach(function(item) {
        rtn.data.push((item * 120)/10);
      });
  }, function(output) {
     console.log(output);
  }, 10, true);
};
foo();

As for

But after some console logging, I see they are getting the exact same array and they are doing the exact same work, just now 10 times.

This isn't the case. Assuming you specify multiple threads, your array WILL be split across as many threads as you have specified, and the operation you defined will be executed on every item in the array regardless of which thread handles it. Inspecting http://hamsters.io/performance while pasting your function into the console shows me that each thread is in fact getting a subarray of size array.length / threads. Perhaps the problem is that you aren't using a real worker implementation and are seeing the behavior of legacy mode, which honestly should follow the exact same process, so I'm at a loss as to what problem you're seeing.
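To make the expected split concrete, here is a minimal plain-JavaScript sketch of dividing an array into array.length / threads sized subarrays and applying the same operation to each. `splitIntoChunks` is a hypothetical helper written for illustration; it is not part of Hamsters.js and may differ from how the library splits work internally.

```javascript
// Hypothetical helper (not a Hamsters.js API): split `array` into at most
// `threads` contiguous chunks of roughly equal size.
function splitIntoChunks(array, threads) {
  const size = Math.ceil(array.length / threads);
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

const input = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

// Each "thread" only sees its own chunk and runs the same operation on it.
const perThread = splitIntoChunks(input, 10).map((chunk) =>
  chunk.map((item) => (item * 120) / 10)
);

console.log(perThread);
```

With 10 items and 10 threads, every chunk holds one item, so the operation runs 10 times in total rather than 100.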

Could you help me understand how this library should be used? If you need an example, say we have a big array of size n (n > 10k) of positive integers and we are trying to find (x) => x * (x - 1) * (x - 2) * ... * 2 * 1 for every item in the array. Since the data items have no dependency on each other, ideally we could spawn n threads where each one would grab one number and start calculating. Could you illustrate how this could be done with this library?

That's exactly what is illustrated above; you've just forgotten to tell the library you want a single output.
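For reference, the per-item operation described in the question, x * (x - 1) * ... * 2 * 1, can be sketched in plain JavaScript as below. This uses no library calls; `factorial` is an illustrative name, and BigInt is used here only as an assumption so that large results do not lose precision.

```javascript
// Illustrative helper: factorial of a non-negative integer, computed with
// BigInt so results beyond Number.MAX_SAFE_INTEGER stay exact.
function factorial(x) {
  let result = 1n;
  for (let i = 2n; i <= BigInt(x); i++) {
    result *= i;
  }
  return result;
}

// Items are independent of each other, so any chunk of the array can be
// processed on its own.
const results = [1, 5, 10].map((x) => factorial(x));
console.log(results.map(String));
```

Because no item depends on another, this body is what would run inside each thread over that thread's share of the array.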

from hamsters.js.

Zodiase commented on May 22, 2024

Sorry, maybe I didn't make it clear in my first statement. By "some logging" I meant more console.log calls than I showed in the code; I removed all of them to make the code simpler and closer to the original example code in the readme. In my testing it looked more like this:

const foo = () => {
  var params = {
    'array':[0,1,2,3,4,5,6,7,8,9]
  };
  hamsters.run(params, function() {
      console.log('params', params);
      var arr = params.array;
      arr.forEach(function(item) {
        console.log('run');
        rtn.data.push((item * 120)/10);
      });
  }, function(output) {
     console.log(output);
  }, 10);
};
foo();

I saw "params" in the logs 10 times, each with identical content, and "run" 100 times, which is exactly the size of the array times the thread count I provided, while I was expecting "run" only 10 times.

Zodiase commented on May 22, 2024

As for the log from the output function, for me that was:

[ [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ],
  [ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108 ] ]

while I was expecting something like:

[ [ 0 ],
  [ 12 ],
  [ 24 ],
  [ 36 ],
  [ 48 ],
  [ 60 ],
  [ 72 ],
  [ 84 ],
  [ 96 ],
  [ 108 ] ]

austinksmith commented on May 22, 2024

You need to provide your entire example source for me to understand why you are seeing different behavior from both the provided benchmark example logic and jasmine tests.

const foo = () => {
  var params = {
    'array':[0,1,2,3,4,5,6,7,8,9]
  };
  hamsters.run(params, function() {
      var arr = params.array;
      arr.forEach(function(item) {
        rtn.data.push((item * 120)/10);
      });
  }, function(output) {
     console.log(output);
  }, 10, true);
};

Using 4.0.0, this functions exactly as it should, and my final output is in fact what it should be: a single array that looks like [0, 12, 24, 36, 48, 60, 72, 84, 96, 108]

As mentioned before, you are not telling your output to be aggregated into a single output. Pay special attention to #5 in the "How it works" section of the readme.

This optional argument will tell the library whether or not we want to aggregate our individual thread outputs together after execution, this is only relevant if you are executing across multiple threads and defaults to false.
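As a rough illustration of what aggregating per-thread outputs means, the sketch below concatenates subarrays in order into one array. `aggregate` is a hypothetical name used only for this example; how the library itself merges results is not shown in this thread.

```javascript
// Hypothetical helper (not a Hamsters.js API): merge the per-thread
// subarrays, in thread order, into a single flat array.
function aggregate(threadOutputs) {
  return threadOutputs.reduce((all, part) => all.concat(part), []);
}

// Two threads, each having processed half of [0..9] with (item * 120) / 10.
const threadOutputs = [
  [0, 12, 24, 36, 48],
  [60, 72, 84, 96, 108],
];

const single = aggregate(threadOutputs);
console.log(single);
```

Without aggregation the callback would receive something shaped like `threadOutputs`; with it, something shaped like `single`.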

Additionally, the documentation does not say to declare hamsters as a constant. The hamsters object should never be treated as immutable because the library modifies itself during runtime.

var hamsters = require('hamsters.js');

Zodiase commented on May 22, 2024

As mentioned before, you are not telling your output to be aggregated into a single output, pay special attention to #5 on the how it works section of the read me.

I'm aware of that, and I didn't expect the result to be aggregated (as I plan to do the aggregation myself). What I was saying is that the result contains redundant data that is apparently produced by redundant work.

I'll use the aggregation flag; hopefully that makes it easier for you to see what I mean.

image

I'm printing the amount of data each thread received, as well as the number of work cycles each one did. With 2 threads, the total number of work cycles should always equal the input data size, which is 10, but we are seeing 20 cycles (runs).

Also, with the aggregation flag set, the output contains redundant data.

austinksmith commented on May 22, 2024

So I've managed to reproduce this issue, but only under the following conditions:

  1. The library is running in legacy mode, which means it is using the main thread.
  2. The function invoked is using more than 1 thread.
  3. The function invoked is using a non-typed array. Typed arrays do not suffer from the same issue.

My debugging so far leads me to believe this is an inheritance issue combined with a race condition: the time-slicing behavior of setTimeout causes multiple "threads" to modify the same array. I'll have a fix ready in the next release, which shouldn't be too long.
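The symptom reported earlier (20 runs for 10 items on 2 threads) can be reproduced in a small simulation, sketched below. This is not the library's code: `simulate` is a hypothetical function, and it assumes the problem amounts to every setTimeout-driven "thread" seeing the same full array instead of its own slice, which may be a simplification of the actual root cause.

```javascript
// Hypothetical simulation (not Hamsters.js code): run `threads` tasks on the
// main thread via setTimeout and count how many items are processed in total.
function simulate(array, threads, giveEachItsOwnSlice) {
  const size = Math.ceil(array.length / threads);
  let runs = 0;
  const tasks = [];
  for (let t = 0; t < threads; t++) {
    // Assumption: in the faulty case every task shares the same full array.
    const data = giveEachItsOwnSlice
      ? array.slice(t * size, (t + 1) * size)
      : array;
    tasks.push(
      new Promise((resolve) => {
        setTimeout(() => {
          data.forEach(() => {
            runs += 1;
          });
          resolve();
        }, 0);
      })
    );
  }
  // Resolves with the total number of work cycles once all tasks finish.
  return Promise.all(tasks).then(() => runs);
}

const input = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
simulate(input, 2, false).then((runs) => console.log('shared array:', runs));
simulate(input, 2, true).then((runs) => console.log('own slice:', runs));
```

With a shared array the total is array length times thread count; with a private slice per task it equals the array length.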

In the meantime, my recommendation is to use a third-party worker implementation with Node.js, as this only affects the legacy mode of the library, and only within Node or browsers that do not support web workers, e.g. IE9.

Thanks for spotting this. It's very hard to debug multithreaded logic, so please understand that my requests for more info were because I wasn't seeing the problem until I recreated every condition of your setup.

Zodiase commented on May 22, 2024

I'm absolutely glad to help. I was looking for easy-to-use node parallelism solutions and came across this library and I think it is very promising.

By the way, could you comment on my side note there? I'd assume there's some limitation I'm not aware of that forces you to design the API this way, but wouldn't it be easier for people to understand if the thread function signature looked like this:

function threadFunc (params, report) {
  // Do work here with data provided in `params`.
  var data = params.array.map((v) => v + 1);
  // Send results back with the `report` function.
  report(data);
}

Or simply:

function threadFunc (params) {
  // Do work here with data provided in `params`.
  var data = params.array.map((v) => v + 1);
  // `return`ed data is automatically collected by the main process.
  return data;
}

I know in the second case it's probably harder to allow async work, in which case a Promise could probably be used. My general suggestion is to make the thread function look more like a normal function, instead of using some seemingly undefined "internal variables".
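To show the call shape this suggestion implies, here is a minimal sketch in which a plain function receives its chunk as an argument and returns its result, with Promises collecting the outputs. `runInChunks` is hypothetical and runs everything on the main thread; it says nothing about whether the same signature can be supported across a real worker boundary.

```javascript
// Hypothetical runner (not a Hamsters.js API): call `threadFunc` once per
// chunk and collect the returned (or promised) results in chunk order.
function runInChunks(params, threadFunc, threads) {
  const size = Math.ceil(params.array.length / threads);
  const jobs = [];
  for (let i = 0; i < params.array.length; i += size) {
    // Each call gets a copy of params with only its own slice of the array.
    const chunkParams = Object.assign({}, params, {
      array: params.array.slice(i, i + size),
    });
    // Promise.resolve lets threadFunc return either a value or a Promise.
    jobs.push(Promise.resolve(threadFunc(chunkParams)));
  }
  return Promise.all(jobs);
}

function threadFunc(params) {
  return params.array.map((v) => v + 1);
}

runInChunks({ array: [0, 1, 2, 3] }, threadFunc, 2).then((output) => {
  console.log(output);
});
```

An async `threadFunc` would work unchanged here, since its returned Promise is awaited by `Promise.all`.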

austinksmith commented on May 22, 2024

I've gone ahead and pushed out v4.1.1. Please update to v4.1.1 and your issues should be resolved.

https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.1

Unfortunately, it's not possible to do it that way unless that functionality were added as a first-class part of the language.
