
Comments (23)

Zodiase commented on May 22, 2024

FYI, using [email protected] is fine, while 4.1.0 throws the error.

from hamsters.js.

austinksmith commented on May 22, 2024

Please take a look at #23. Node.js does not ship with a native Web Worker implementation, which is why you are seeing the Worker is not defined error. The library should be working in legacy fallback mode, assuming you haven't modified the package.json to point to the ES6 version. In that mode you won't see the performance benefits you're looking for, since everything will still be executed on the main thread until you add a third-party worker implementation. I'll go through the library to ensure it's working properly in legacy mode for the next release; let me know if that clarifies things for you.
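The fallback behavior described here boils down to a feature check along these lines (a plain-JS sketch for illustration, not the actual hamsters.js source):

```javascript
// Sketch of the environment check behind the legacy fallback described above
// (an illustration only, not the actual hamsters.js source).
const hasWorker = typeof Worker !== 'undefined';

if (hasWorker) {
  // spawn real worker threads
} else {
  // legacy fallback: run the task function sequentially on the main thread
}
console.log('native Worker available:', hasWorker);
```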

Zodiase commented on May 22, 2024

For some reason I was under the impression that on Node this library would not use Worker but something else, such as child_process. I guess I drank too much coffee at night...

Since I don't see Web Workers coming to Node any time soon, it would be very beneficial to state this dependency more clearly in the documentation. Pointing to a preferred Worker implementation in the documentation, at least for Node users, would be even better.

The current readme, this part:

Environment Support

  • All major browsers IE9+
  • Inside of existing web workers (threads inside threads)
  • Javascript shell environments
  • React Native
  • Node.js

reads as though the use of "web workers" does not apply to Node.

austinksmith commented on May 22, 2024

The library does not make use of child_process, since that is not available in many of the places the library runs. As the project maintainer, I didn't feel it was necessary to explain what Node.js does and does not support out of the box, since anyone using Node.js should already understand what it comes with. That being said, the library definitely does function within Node.js, and future documentation will try to cover Node's shortcomings when it comes to native threading. Any third-party worker implementation that adheres to the worker spec will function with Hamsters.js.

As I mentioned in #23, the third-party library I recommend at the moment is https://www.npmjs.com/package/webworker-threads. Just make sure you define Worker before you define hamsters. Additionally, for the sake of sanity, do NOT declare hamsters as a constant, as I don't want immutability to interfere with the library's functionality; you can freely declare hamsters using either let or var, but you should avoid const.
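The load order described above can be sketched like this (a sketch that assumes hamsters.js v4.x and the webworker-threads package are installed; the try/catch only guards environments where they are not):

```javascript
// Node.js setup sketch for the advice above: define Worker globally
// BEFORE requiring hamsters.js, and declare hamsters with var/let, not const.
// Assumption: the webworker-threads and hamsters.js packages are installed.
var hamsters;
try {
  global.Worker = require('webworker-threads').Worker; // 1. Worker first
  hamsters = require('hamsters.js');                   // 2. then hamsters
} catch (e) {
  // Packages not installed in this environment; hamsters stays undefined.
  console.log('worker modules unavailable:', e.message);
}
```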

austinksmith commented on May 22, 2024

I've gone ahead and pushed out v4.1.1, please update to v4.1.1 and your issues should be resolved.

https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.1

Zodiase commented on May 22, 2024

Same as csterritt said in #23 (comment), I'm not able to get [email protected] working with webworker-threads in Node.

There's an issue blocking the installation of webworker-threads on Node v7.7.0, so I'm using Node v6.10.2.

with these modules:

  "dependencies": {
    "hamsters.js": "^4.1.0",
    "webworker-threads": "^0.7.11"
  }

Case 1

Without using webworker-threads, it unsurprisingly complains that Worker is not defined.
(screenshot omitted)

Case 2

With require("webworker-threads");, still says Worker not defined.
(screenshot omitted)

Case 3

With global.Worker = require("webworker-threads").Worker;, the main process runs non-stop and never finishes; I waited well over a minute before killing it. The job should finish in less than 1 ms when run sequentially...
(screenshot omitted)

Do you know what might be going on? Since you claim the library supports Node, it must have worked for you. With what combination of packages and platforms did you successfully achieve parallelism?

austinksmith commented on May 22, 2024

Please create a new ticket for the new issue you're encountering, v4.1.1 solves the problem related to this ticket.

austinksmith commented on May 22, 2024

Reopening this, I mixed it up with the other one you had open.

austinksmith commented on May 22, 2024

I'm implementing a fix, to be released in 4.1.2, that will allow this to work with any third-party implementation; the solution I had previously isn't working currently. If you use 4.1.1 for now, you can write your logic and have it in place for when 4.1.2 drops.

austinksmith commented on May 22, 2024

@Zodiase take a look at this release please and let me know if you encounter more issues.

https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.2

austinksmith commented on May 22, 2024

@Zodiase this issue will close within 48 hours without a response.

Zodiase commented on May 22, 2024

Process still won't terminate or yield any result. Tested with node v6.10.2 and this project setup: https://gist.github.com/Zodiase/3c8b9d9862ec3ac9c486ec75b99f341a

austinksmith commented on May 22, 2024

@Zodiase I've gone ahead and used your example and resolved the issue you mention; it looks like a change I made back in v3.9.* caused the recent incompatibilities. This is now resolved, and I've successfully tested your example with this release. Please give v4.1.3 a try and report back, thanks.

https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.3

Hizoul commented on May 22, 2024

(Sorry for barging in; although I'm not Zodiase, I encountered the same problem with previous builds, but since this issue was already open I didn't report a duplicate.)
I can confirm that v4.1.3 fixed the issue of not being able to use Node web worker implementations.
I confirmed this using Zodiase's example and with custom code.
webworker-threads works, but I ended up using tiny-worker, which also works and supports require() statements.
Tested on Linux with Node v7.10.

austinksmith commented on May 22, 2024

That's great to hear @Hizoul, as I had only tested it with webworker-threads so far; I think the implementation should be fairly consistent across the board. In the future, don't worry about commenting on an open issue: the more people able to shed light on an issue, the better. If you don't mind me asking, what is your current use case for the library, and what performance are you seeing in Node? Node is the least tested and least performance-profiled platform for the library, as support has been a bit of a hassle to get right.

Hizoul commented on May 22, 2024

Well, my use case is rather an edge case for now: just playing around with isomorphic multi-core usage in JS for some existing point calculations. In the beginning I tried hamsters.run-ning everything, but realized very quickly that if you do not use TypedArrays, copy times are a HUGE bottleneck. In other words, single-threaded Node is much faster than calling hamsters.run 1000x with huge objects that don't take advantage of TypedArrays. The code I now have could be achieved with Node's cluster as well, but the goal was portable code that could theoretically run on web/RN too (which I don't actually need yet) :P
So in the end it was the classic arc, from "omg this library looks awesome, it will solve all the performance problems there might be, beautiful site, fast benchmark, readable docs, etc." to "the lib is awesome, but my data structures and the way I process them might not be the best fit for this kind of parallelization, and the library does not auto-solve all my problems" :P
But even with non-intended usage, I did drop from 20 to 18 seconds, and I'm confident that bigger data will yield larger improvements even with the increased copying times.

Here's an abstract of my Usage:

const users = [] // ~2k users
const userActions = [] // ~20k action objects
hamsters.run({array: users, userActions}, () => {
  // params and rtn are injected by hamsters.js inside the thread;
  // closure variables are not available there, so read params.userActions
  for (let user of params.array) {
    rtn.data.push(makeScoreObjectForUser(user, params.userActions))
  }
}, (res) => {
  // synchronous sort + insert to db
}, hamsters.maxThreads, true)

austinksmith commented on May 22, 2024

Yes, serialization can be a pretty big bottleneck. You might be able to get around this by manually stringifying your data and parsing it within a thread; in some browsers that provides a pretty decent performance boost. In older versions I manually created array buffers to pass normal arrays using transferables, but I don't think that will get you past the userActions array: since it contains objects, I'm not sure it's as easy to simply pass a reference over. I might push a release today that restores that array buffer functionality, if you're willing to report back on whether it's a good improvement.

const users = new Float32Array([]) // ~2k users
const userActions = [] // ~20k action objects
hamsters.run({array: users, userActions}, () => {
  // params and rtn are injected by hamsters.js inside the thread
  for (let user of params.array) {
    rtn.data.push(makeScoreObjectForUser(user, params.userActions))
  }
}, (res) => {
  // synchronous sort + insert to db
}, hamsters.maxThreads, true, null, null, 'ascAlpha');
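The manual stringify/parse workaround mentioned above can be sketched in plain JavaScript (no library needed; the data here is made up for illustration):

```javascript
// Pay the serialization cost once, explicitly: stringify on the main
// thread, ship a single string, and parse it inside the thread function.
// The data below is made up purely for illustration.
const userActions = [{ id: 1, score: 3 }, { id: 2, score: 5 }];
const payload = JSON.stringify(userActions); // main thread, once

// ...inside the thread function, recover the objects:
const parsed = JSON.parse(payload);
console.log(parsed.length); // 2
```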

You might be able to have the library do your sorting for you by making use of https://github.com/austinksmith/Hamsters.js/wiki/Sorting. And if it doesn't impact your input data, making any of your input arrays a typed array will bring significant performance improvements, even if you are not defining dataType to change your output array.

The library has some growing pains, and I'm working on improving the way it's used to make less easily parallelizable tasks more performant. On a side note, you might try switching your for (let user of params.array) into a native for (var i = 0; ...) loop, as a native for loop is an order of magnitude faster than the for...of you are using right now. You can also probably lower the number of threads you scale across, either in your function or in your initialization, to a reasonable thread count like 2 or 4 and see performance improvements, since you will spend less time communicating between threads and more time executing logic within a thread, which helps maximize the improvement you can see.

austinksmith commented on May 22, 2024

Just to provide some more clarity: in your params object, your input array is going to be split into smaller chunks, and each thread receives one small chunk of that input array to work with. The problem is that your userActions array is not broken into smaller pieces; it is instead duplicated to every thread, so your serialization costs grow linearly with the number of threads you add. Keeping it to 2 or 4 threads will keep those costs at a reasonable level.

So basically, with userActions and 1 thread you pay 1 serialization to the thread and 1 serialization back from the thread; multiply that by how many threads you have, and at 8 threads you are paying that serialization cost 16 times in total.
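That arithmetic can be written down directly (a trivial sketch of the cost model described above):

```javascript
// Cost model from the explanation above: the non-split userActions array
// is copied once into and once out of every thread.
const serializationsFor = threads => threads * 2;

console.log(serializationsFor(1)); // 2
console.log(serializationsFor(8)); // 16
```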

austinksmith commented on May 22, 2024

This limitation is only temporary: once JavaScript adds support for shared atomics, we will be able to completely avoid the cost of duplicating that data to every thread. It's an unfortunate side effect of just how poorly (in my opinion) the worker features were implemented in JavaScript, which likely stemmed from a small-minded belief system that is very much against the idea of threading in general... why people like Douglas Crockford hate multithreading and made it as difficult as possible is honestly beyond me.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics

Hizoul commented on May 22, 2024

Wow, thank you very much for the very detailed explanation and performance improvement suggestions. I'm very happy to see such detailed help, and I agree with everything you say.
Serialization is a good pointer; I also saw the background stringify and parse methods in the docs.

Unfortunately, makeScoreObjectForUser produces a very deeply nested structure ({a: {b: {c: 3}}}) with varying keys per user. All those numbers (per deepest key) are then aggregated into an array at that deep key again, synchronously in the callback function; those small arrays are also sorted (and then tagged with a position value, which is why I sort), converted to DB objects, and inserted. Since that last step in the callback only takes about 1 second, there's no need for improvement there; the initial filtering + calculation, which I'm now splitting across 4 rather than 1 "thread", is the slow part.
I am aware of how the data splitting works, and that passing this huge userActions array is dumb for "threading" performance.
I iterate fully through that array for each user, filtering relevant actions, and while filtering (for speed) I already calculate the numbers into the nested structure. (I'm not doing that in the database itself because my queries were too complicated and the DB became unresponsive for normal usage during production, which was not acceptable; that's why I'm trying to do everything purely in Node. There's also more than just userActions, because I also search other DB collections in makeScoreObjectForUser. Like I said, my usage of the library is horrible and not at all the way it was intended, and cloning MongoDB into RAM is also something that does not scale at all and will explode in my face. I'm just splitting up at the very beginning across my CPUs and happy that I'm now reasonably fast :P)

I'm also very probably overcomplicating the data structuring and all. But it works now, and this is the fourth reimplementation, going from:

  1. ~3h
  2. ~30min
  3. ~3min
  4. ~20sec (after a few runs, thanks to the JIT, it even goes down to ~9-13 secs)

runtime for the full calculation (2. was already done directly in the DB, and in 3. I made all MongoDB find calls in parallel with complicated queries, which is why it locked up), so for now there's no need for further improvements :/

Regarding the growing pains: I found the library quite easy to understand and easy to apply to existing code. I have worked with web workers directly before, so that probably helped. I guess some people might be confused about:

  1. scoping of the executed function
  2. why one has to aggregate the data into a single array (because web workers don't share a heap and use messaging), when one could usually aggregate directly into a custom structure via locking.

And I do agree that JavaScript "threading" with web workers is unnecessarily complicated. It has always been a hassle, and Node's cluster also sucks because you're executing the same code twice and need to check for Cluster.isMaster x) (although I think I recently saw a webworker-like API for clustering in Node's own APIs).
I'm coming from a Java background, where threading makes sense since the heap stays the same. But I also think that things like goroutines are better and easier to use.

And regarding atomics, thanks for the pointer; I was not aware that was coming, but it sounds like a nice feature. I will keep an eye out for it.

Yes, the web worker API is very crippled. I think that might also be because it was invented at a time when browsers tried to limit their capabilities while still enhancing the available APIs, unlike now, when the goal is to let the Web operate as a unified operating-system API (e.g. Battery Status 🤢).

Zodiase commented on May 22, 2024

@austinksmith v4.1.3 works yay :D.

Although I noticed one minor issue: in my example provided previously, hamsters.init stated maxThreads: 3, but the thread count argument is omitted in hamsters.run, and from examining the logs it appears only one thread is used. I'm not sure of the purpose of configuring maxThreads in hamsters.init. My assumption was that if the thread count argument in hamsters.run is omitted, maxThreads should be used. There's not much about maxThreads in the documentation. (It doesn't really prevent me from using the library, but I thought I'd report it in case it was caused by some unintended changes during these patches.)

austinksmith commented on May 22, 2024

@Zodiase The maxThreads you are defining is a global setting limiting the number of threads the library may use within its thread pool. If you set maxThreads to 4 in your hamsters.init and then later call a function using hamsters.run that invokes 16 threads, the library will split your work into 16 smaller pieces, just as it would if it had 16 threads to work with, but it will pool the remaining work and keep the 4 threads active until all the work is complete.

If you do not define a thread count in your hamsters.run call, the library defaults to a thread count of 1 for that function; it will not automatically scale a function to the maximum number of threads for you. The option is exposed to let you change thread use at the individual function level.

This also allows the library to scale work across any number of clients regardless of their logical core count. Since maxThreads defaults to the logical core count detected on the machine if you do not modify it within hamsters.init, there is no logical reason to tie specific functions to that value unless you as the developer want it that way. You can define a function to run across 200 threads while limiting maxThreads to 1, and the library will pool all that work and keep that single thread fully occupied until all 200 tasks have completed.
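The split-then-pool behavior described above can be illustrated without the library (splitIntoChunks is a hypothetical helper for this sketch, not a hamsters.js API):

```javascript
// Illustration of the behavior described above: work is always split into
// the requested number of pieces; only maxThreads pieces run concurrently.
// splitIntoChunks is a hypothetical helper, not part of hamsters.js.
function splitIntoChunks(arr, pieces) {
  const size = Math.ceil(arr.length / pieces);
  const chunks = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  return chunks;
}

const work = Array.from({ length: 160 }, (_, i) => i);
const chunks = splitIntoChunks(work, 16); // function asked for 16 threads
console.log(chunks.length);               // 16 pieces of work
// With maxThreads = 4, the pool would run 4 of these 16 chunks at a time
// until all of them have completed.
```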

austinksmith commented on May 22, 2024

I'm going to close this issue ticket, thanks @Zodiase and @Hizoul
