Comments (23)
FYI, using [email protected] is fine, while 4.1.0 throws the error.
from hamsters.js.
Please take a look at #23. Node.js does not ship with a native web worker implementation, which is why you are seeing the `Worker is not defined` error. The library should be working in legacy fallback mode, assuming you haven't modified the package.json to point to the ES6 version; in that case you won't see the performance benefits you're looking for, since everything will still be executed on the main thread until you add a third-party worker implementation. I'll go through the library to ensure it's working properly in legacy mode for the next release. Let me know if that clarifies things for you.
For some reason I was under the impression that for Node this library would not be using Worker but something else, like child_process. I guess I drank too much coffee at night...
Since I don't see Web Workers coming to Node any time soon, it would be very beneficial to state this dependency more clearly in the documentation. Pointing to a preferred Worker implementation in the documentation, at least for Node users, would be even better.
The current readme, this part:
Environment Support
- All major browsers IE9+
- Inside of existing web workers (threads inside threads)
- Javascript shell environments
- React Native
- Node.js
reads as if the use of "web workers" does not apply to Node.
The library does not make use of child_process, since that is not relevant in many of the places the library is executed. As the project maintainer, I didn't feel it would be necessary to explain what Node.js does and does not support out of the box, since anyone using Node.js should already have an understanding of what it comes with. That being said, the library definitely does function within Node.js, and future documentation will try to cover the shortcomings Node.js has when it comes to native threading. Any third-party worker implementation that adheres to the worker spec will function with Hamsters.js.
As I mentioned in #23, the third-party library recommendation I have at the moment is this one: https://www.npmjs.com/package/webworker-threads. Just make sure you define `Worker` before you define `hamsters`. Additionally, for the sake of sanity, do NOT declare hamsters as a constant, as I don't want immutability to interfere with the library functionality; you can freely declare hamsters using either `let` or `var`, but you should probably avoid `const`.
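A minimal sketch of that load order, assuming webworker-threads as the third-party implementation. The mock `Worker` below is a hypothetical stand-in so the snippet can run without the npm package installed; it is not a working thread:

```javascript
// In a real Node app you would attach the third-party implementation first:
//   global.Worker = require("webworker-threads").Worker;
// Here a minimal mock stands in so the load ordering can be shown standalone.
global.Worker = function Worker(script) {
  this.script = script; // hypothetical stand-in, not a working thread
};

// Only AFTER global.Worker exists should hamsters be loaded, and with
// let/var rather than const, per the maintainer's note:
//   let hamsters = require("hamsters.js");

console.log(typeof global.Worker); // "function"
```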
I've gone ahead and pushed out v4.1.1, please update to v4.1.1 and your issues should be resolved.
https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.1
Same as csterritt said here in #23 (comment), I'm not able to get [email protected] working with webworker-threads in Node. There's an issue blocking installing webworker-threads on Node v7.7.0, so I'm using Node v6.10.2.
with these modules:

```json
"dependencies": {
  "hamsters.js": "^4.1.0",
  "webworker-threads": "^0.7.11"
}
```
Case 1: Without webworker-threads, it obviously complains about `Worker` not being defined.
Case 2: With `require("webworker-threads");`, it still says `Worker` is not defined.
Case 3: With `global.Worker = require("webworker-threads").Worker;`, the main process runs non-stop and won't end for as long as I've waited (definitely longer than one minute, at which point I killed it). The job should finish in less than 1 ms sequentially...
Do you know what might be going on? Since you are claiming it supports Node, the library must have worked for you. With what combination of packages and platforms did you successfully achieve parallelism?
Please create a new ticket for the new issue you're encountering, v4.1.1 solves the problem related to this ticket.
Reopening this, I mixed it up with the other one you had open.
I'm implementing a fix, to be released in 4.1.2, that will allow this to work with any third-party implementation; the solution I had previously isn't working currently. If you use 4.1.1 for now, you will be able to write your logic and have it in place for when 4.1.2 drops.
@Zodiase take a look at this release please and let me know if you encounter more issues.
https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.2
@Zodiase this issue will close within 48 hours without a response.
The process still won't terminate or yield any result. Tested with Node v6.10.2 and this project setup: https://gist.github.com/Zodiase/3c8b9d9862ec3ac9c486ec75b99f341a
@Zodiase I've gone ahead and used your example and resolved the issue you are mentioning; it looks like a change I made back in v3.9.* caused the recent incompatibilities. This is now resolved, and I've successfully tested your example with this release. Please give v4.1.3 a try and report back, thanks.
https://github.com/austinksmith/Hamsters.js/releases/tag/v4.1.3
(Sorry for barging in; although I'm not Zodiase, I did encounter the same problem with previous builds, but since this issue was already open I did not report a duplicate.)
I can confirm that v4.1.3 fixed the issue of not being able to use Node web worker implementations. I confirmed this using Zodiase's example and with custom code.
webworker-threads works, but I ended up using tiny-workers, which works too, because it supports require() statements.
Tested on Linux with Node v7.10.
That's great to hear, @Hizoul, as I had only tested it with webworker-threads so far; I think the implementation should be fairly consistent across the board. In the future, don't worry about commenting on an open issue: the more people able to shed light on an issue, the better. If you don't mind me asking, what is your current use case for the library, and how is the performance you're seeing in Node? Node is the least tested and least performance-profiled platform for the library, as support has been a bit of a hassle to get right.
Well, my use case is rather an edge case for now: just playing around with isomorphic multi-core usage in JS for some existing point calculations. In the beginning I tried hamsters.run-ing everything, but realized very quickly that if you do not use TypedArrays, copy times are a HUGE bottleneck, meaning single-threaded Node is much faster than calling hamsters.run 1000x with huge objects that don't take advantage of TypedArrays. The code I now have could be achieved with Node's cluster as well, but the goal was portable code that could theoretically run on web/RN too (which I actually don't need yet) :P
So in the end it's the classic arc: from "omg, this library looks awesome, it will solve all the performance problems there might be, beautiful site, fast benchmark, readable docs, etc." to "the lib is awesome, but the data structures and the way I'm processing might not be the best fit for this kind of parallelization, and the library does not auto-solve all my problems" :P
But even with non-intended usage, I did drop from 20 to 18 seconds, and I'm confident that even bigger data will yield more performance improvement, even with increased copying times.
Here's an abstract of my usage:

```javascript
const users = [] // ~2k users
const userActions = [] // ~20k action objects
hamsters.run({array: users, userActions}, () => {
  for (let user of params.array) {
    rtn.data.push(makeScoreObjectForUser(user, userActions))
  }
}, (res) => {
  // synchronous sort + insert into db
}, hamsters.maxThreads, true)
```
Yes, serialization can be a pretty big bottleneck. You might be able to get around this by manually stringifying your data and parsing it within a thread; in some browsers that provides a pretty decent performance boost. In older versions I was manually creating array buffers to pass normal arrays using transferables, but I don't think that's going to get you past the userActions array, since it contains objects and I'm not sure it's as easy to simply pass a reference over. I might push a release today that restores that array buffer functionality, if you are willing to report back on whether or not it was a good improvement.
```javascript
const users = new Float32Array([]) // ~2k users
const userActions = [] // ~20k action objects
hamsters.run({array: users, userActions}, () => {
  for (let user of params.array) {
    rtn.data.push(makeScoreObjectForUser(user, userActions))
  }
}, (res) => {
  // synchronous sort + insert into db
}, hamsters.maxThreads, true, null, null, 'ascAlpha');
```
You might be able to get away with having the library do your sorting for you by making use of https://github.com/austinksmith/Hamsters.js/wiki/Sorting. And if it doesn't impact your input data, making any of your input arrays a typed array will bring significant performance improvements, even if you are not defining `dataType` to change your output array.
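The manual-serialization workaround can be sketched as follows. This is illustrative only: the payload shape is made up, and in a real hamsters.js call the string would travel through the params object rather than a closure:

```javascript
// Pre-serialize the heavy payload once on the main thread...
var payload = JSON.stringify({ userActions: [{ id: 1 }, { id: 2 }] });

// ...and parse it back inside the thread body before using it.
// Passing one string avoids structured-cloning a large object graph.
var parsed = JSON.parse(payload);
console.log(parsed.userActions.length); // 2
```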
The library has some growing pains, and I'm working on improving the way it's used to make less easily parallelizable tasks more performant. On a side note, you might try switching your `for (let user of params.array)` into a native `for (var i = 0; ...)` loop, as a native for loop is an order of magnitude faster than the one you are using right now. You can also probably lower the number of threads you are scaling across, either in your function or in your initialization, to a reasonable thread count like 2 or 4, and see performance improvements: you will spend less time communicating between threads and more time executing logic within a thread, which helps maximize the improvement you can see.
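The suggested loop change might look like this. makeScoreObjectForUser is stubbed with stand-in logic, and params/rtn are mocked so the sketch runs outside a hamsters thread body:

```javascript
// Stand-in for the real scoring function (hypothetical logic).
function makeScoreObjectForUser(user, actions) {
  return { user: user, score: actions.length };
}

// Mocked versions of the objects available inside a thread body.
var params = { array: ["a", "b", "c"], userActions: [] };
var rtn = { data: [] };

// Index-based native for loop instead of for...of:
for (var i = 0; i < params.array.length; i++) {
  rtn.data.push(makeScoreObjectForUser(params.array[i], params.userActions));
}

console.log(rtn.data.length); // 3
```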
Just to provide some more clarity: in your params object, your input array is going to be split into smaller chunks, and each thread is going to receive a small chunk of that input array to work with. The problem is that your userActions array is not going to be broken into smaller pieces; it is instead going to be duplicated to every thread, so your serialization costs are going to go up linearly with the threads you add. Keeping it to 2 or 4 threads will keep that cost at a reasonable level.
So basically, userActions on 1 thread costs 1 serialization into the thread and 1 serialization back out of the thread; multiply that by however many threads you have, and at 8 threads you are paying that serialization cost 16 times in total.
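As a back-of-the-envelope check of that cost, where each thread pays one serialization in and one out:

```javascript
// Total serializations of the duplicated userActions payload:
// one copy into each thread plus one result copy back out.
function serializationCount(threads) {
  return threads * 2;
}

console.log(serializationCount(1)); // 2
console.log(serializationCount(8)); // 16
```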
This limitation is only temporary: once JavaScript adds support for shared memory and atomics, we will be able to completely avoid the cost of duplicating that data to every thread. It's an unfortunate side effect of how poorly (in my opinion) the worker features in JavaScript were implemented, which likely stemmed from a small-minded belief system that is very much against the idea of threading in general. Why people like Douglas Crockford hate multithreading and made it as difficult as possible is beyond me, honestly.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics
Wow, thank you very much for the very detailed explanation and performance improvement suggestions. Very happy to see such detailed help, and I do agree with everything you say.
Serialization is a good pointer; I also saw the background stringify and parse methods in the docs.
Unfortunately, makeScoreObjectForUser produces a very deeply nested structure ({a: {b: {c: 3}}}) with varying keys per user. All those numbers (per deepest key) are then aggregated into an array at that deep key again, synchronously in the callback function; these small arrays are also sorted (and then tagged with a position value, which is why I sort), then converted to db objects and inserted. Since that last step in the callback only takes about 1 second, there's no need for improvement there, but the initial filtering + calculation, which I'm now splitting across 4 rather than 1 "thread", is the slow part.
I am aware of the way data splitting works, and that passing this huge userActions array is dumb for "threading" performance.
I'm iterating fully through that array for each user, filtering relevant actions and, while filtering (for speed purposes), already calculating the number into the nested structure. (I'm not doing that in the database itself because my queries were too complicated and the db became unresponsive for normal usage during production, which was not acceptable; that's why I'm trying to do everything purely in Node. There's also more than just userActions, because I also search other db collections in makeScoreObjectForUser. Like I said, my usage of the library is horrible and not at all the way it was intended, and cloning MongoDB into RAM is also something that does not scale at all and will explode in my face. I'm just splitting up at the very beginning across my CPUs, happy that I'm now reasonably fast :P)
I'm also very probably overcomplicating the data structuring and all. But it works now, and this is the fourth reimplementation, going from:
- ~3 h
- ~30 min
- ~3 min
- ~20 sec (after a few runs, thanks to the JIT, it even goes down to ~9-13 sec)
of runtime for the full calculation (no. 2 was already directly in the db, and in no. 3 I made all MongoDB find calls in parallel with complicated queries, which is why it locked up), so for now there's no need for further improvements :/
Regarding the growing pains, I found the library quite easy to understand and easy to apply to existing code. I have worked with web workers directly before, so that probably helped. I guess some people might be confused about:
- the scoping of the executed function
- why one has to aggregate the data into a single array (because web workers don't share a heap and use messaging), when one usually could aggregate directly into a custom structure via locking
And I do agree that JavaScript "threading" with web workers is unnecessarily complicated. It has always been a hassle, and Node's cluster also sucks because you're executing the same code twice and need to check for Cluster.isMaster x) (although I think I recently saw a webworker-like API for clustering among Node's own APIs).
I'm coming from a Java background, where threading makes sense since the heap stays the same. But I also think that things like goroutines are better and easier to use.
And regarding atomics, thanks for the pointer; I was not aware that it was coming, but it sounds like a nice feature, and I will keep an eye out for it.
Yes, the web worker API is very crippled. I think that might also be because it was invented at a time when the goal was to limit the browser's capabilities while still enhancing the available APIs, unlike now, when they try to let the Web operate as a unified operating-system API (e.g. Battery Status 🤢).
@austinksmith v4.1.3 works, yay :D.
Although I noticed one minor issue: in my example provided previously, `hamsters.init` stated `maxThreads: 3`, but the thread count argument is omitted in `hamsters.run`, and from examining the logs it appears only one thread is used. I'm not sure of the purpose of configuring `maxThreads` in `hamsters.init`. In my assumption, if the thread count argument in `hamsters.run` is omitted, shouldn't `maxThreads` be used? There's not much about `maxThreads` in the documentation. (This doesn't really prevent me from using the library, but I thought I'd report it in case it was caused by some unintended changes during these patches.)
@Zodiase The maximum threads you are defining is a global setting limiting the maximum number of threads the library will make use of within the thread pool. If you set `maxThreads` in your `hamsters.init` to 4 and then later call a function using `hamsters.run` that invokes 16 threads, the library is going to split your work into 16 smaller pieces, just like it would if it had access to 16 threads, but it's going to pool the remaining work and keep the 4 threads active until all the work is complete.
If you do not define a thread count in your `hamsters.run` call, the library defaults to a thread count of 1 for your function; it will not automatically scale a function to the maximum number of threads. That is an option exposed to allow you to change thread use at the individual function level.
This also allows the library to scale the work across any number of clients regardless of their logical core count. Since `maxThreads` defaults to the logical core count detected on a machine if you do not modify it within `hamsters.init`, there is no logical reason to tie specific functions to that value unless you as the developer want it that way. You can define a function to run across 200 threads while limiting `maxThreads` to 1, and the library will pool all that work and keep that single thread completely occupied until all 200 tasks have completed.
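A rough sketch of that split-vs-pool behavior (illustrative only, not the library's actual scheduler):

```javascript
// Work is split into `requested` chunks regardless of how many threads
// actually exist; only up to maxThreads chunks would run concurrently,
// while the rest wait in the pool.
function planChunks(workSize, requested) {
  var chunkSize = Math.ceil(workSize / requested);
  var chunks = [];
  for (var start = 0; start < workSize; start += chunkSize) {
    chunks.push([start, Math.min(start + chunkSize, workSize)]);
  }
  return chunks;
}

// Requesting 16 threads over 1600 items yields 16 chunks of 100,
// even if maxThreads is only 4.
console.log(planChunks(1600, 16).length); // 16
```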
I'm going to close this issue ticket, thanks @Zodiase and @Hizoul