
How to configure particular test suites to be distributed to different shards in the most efficient way? (karma-parallel, closed)

joeljeske commented on April 20, 2024
How to configure particular test suites to be distributed to different shards in the most efficient way?


Comments (9)

esolCrusador-betclic commented on April 20, 2024

I think it can be done in a way similar to the [always] prefix. We could have something like a [shard:1] prefix and then distribute across shards by shardNumber % shardsCount.
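A minimal sketch of how such a [shard:N] prefix could be resolved to a shard index (the prefix syntax, function name, and parameters here are hypothetical, not an existing karma-parallel feature):

```js
// Hypothetical sketch: pin a top-level describe block to a shard via a
// [shard:N] prefix in its description, wrapping by shardNumber % shardsCount.
function targetShard(description, executors) {
  const match = /\[shard:(\d+)\]/.exec(description);
  if (!match) {
    return null; // no prefix: leave the suite to the default strategy
  }
  return parseInt(match[1], 10) % executors;
}

// Example: with 3 executors, "[shard:4] slow integration specs" lands on shard 1.
console.log(targetShard('[shard:4] slow integration specs', 3)); // -> 1
```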


joeljeske commented on April 20, 2024

Interesting. This would be very valuable for better organizing the test suites, specifically for suites whose specs vary greatly in execution time.

I cannot think of any other use case for indicating which shard to run a spec in, except to better distribute the load. If that is the case, I wonder if a more direct approach would be better suited.

Perhaps specs could indicate a relative “weight” or “time” so that this project could attempt to distribute all the specs evenly. All specs could have a weight of 1 by default, which could be changed by adding a substring such as [weight: 10] to the description, similar to [always].
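A rough sketch of how such a [weight: N] marker could be parsed and used for a greedy distribution (the marker syntax and all function names here are hypothetical, not part of karma-parallel):

```js
// Hypothetical sketch: parse an optional [weight: N] marker from a describe
// description (default weight 1) and greedily assign each suite to the
// currently lightest shard.
function parseWeight(description) {
  const match = /\[weight:\s*(\d+)\]/.exec(description);
  return match ? parseInt(match[1], 10) : 1;
}

function distributeByWeight(descriptions, executors) {
  const loads = new Array(executors).fill(0);
  const assignment = new Map();
  // Assigning the heaviest suites first gives a better greedy balance.
  const sorted = [...descriptions].sort((a, b) => parseWeight(b) - parseWeight(a));
  for (const description of sorted) {
    const shard = loads.indexOf(Math.min(...loads));
    assignment.set(description, shard);
    loads[shard] += parseWeight(description);
  }
  return assignment;
}
```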

It could be tricky to figure out the relative weight, but I also think it would be tricky to directly indicate which shard a spec should run in, especially since it would be configured in the source code while the number of executors is dynamic.

Thoughts?


KhalipskiSiarhei commented on April 20, 2024

You are right about the initial goal of the requested feature: better test distribution. Ideally, all shards should finish in roughly the same time. In our project we can't achieve that: the fastest shard takes ~3 min, the slowest ~5-6 min. Recently added long-running tests ended up on the slowest shard and its execution time increased to 8-9 min, while the fastest one is still at ~3 min. We tried different test suite names and could more or less mitigate it, but that solution is not stable, because newly added or refactored tests can break this hard-won distribution. Additionally, we still see shard execution times ranging from 3 min to 7 min.

Your idea with weight or time would fix the test distribution issue for the developers/teams who care about it and want roughly equal execution time in every shard. In my view both solutions (weight or time) would work, but the time approach would be more accurate, and we always have the exact test execution time from the appropriate test reporter, so it would be very easy to introduce.


joeljeske commented on April 20, 2024

I like this idea, but it would seem very cumbersome to require marking each spec with a weight or time. It would be prone to misconfiguration, since developers rarely have real insight into how long each spec takes relative to the others.

It would also be difficult, but I wonder if it would be more useful to install a timing mechanism and save the results to a local file for use during the next run. That would be much more automatic and efficient, with the least amount of user configuration, but it would also likely produce new results on every run, causing frequent diffs in the file that would be unwanted and annoying to track in git. Of course, the timings file could be git-ignored, but then CI would be inefficient on every run. Perhaps if there were enough leeway in the timings file (for example, only changing the file when a timing is off by >= 20%), it could produce generally consistent, deterministic results without lots of diffs.

Also, timings would have to be converted to relative weights so that they remain somewhat consistent even when running on different hardware.
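A sketch of what such a timings cache could look like, assuming a plain JSON file keyed by top-level describe description; the file name, drift threshold, and normalization are all illustrative, not existing karma-parallel behavior:

```js
// Hypothetical sketch: persist per-suite timings, only rewrite the file when a
// timing drifts by >= 20% (to keep git diffs rare), and normalize times into
// hardware-independent relative weights.
const fs = require('fs');

const TIMINGS_FILE = '.karma-timings.json'; // assumed location
const DRIFT_THRESHOLD = 0.2;                // the 20% leeway discussed above

function loadTimings() {
  try {
    return JSON.parse(fs.readFileSync(TIMINGS_FILE, 'utf8'));
  } catch (e) {
    return {}; // first run: no previous timings
  }
}

function saveTimingsIfDrifted(previous, measured) {
  let drifted = false;
  for (const [suite, ms] of Object.entries(measured)) {
    const old = previous[suite];
    if (old === undefined || Math.abs(ms - old) / old >= DRIFT_THRESHOLD) {
      drifted = true;
      break;
    }
  }
  if (drifted) {
    fs.writeFileSync(TIMINGS_FILE, JSON.stringify(measured, null, 2));
  }
}

// Convert absolute times into relative weights (fastest suite = weight 1).
function toWeights(timings) {
  const fastest = Math.min(...Object.values(timings));
  const weights = {};
  for (const [suite, ms] of Object.entries(timings)) {
    weights[suite] = Math.max(1, Math.round(ms / fastest));
  }
  return weights;
}
```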

Do you have any thoughts?


KhalipskiSiarhei commented on April 20, 2024

If we are speaking about an internal mechanism that produces a good distribution based on previously calculated metrics (execution times), that is the best solution. In that case it should work with weights rather than raw times, so a mechanism to convert times to weights would need to be introduced. Regarding the threshold for updating the previous metrics: it would be nice if that value (20% by default) could be configured. A few more details to take into account:

  • If there are any focused tests (fdescribe or fit), the calculated distribution should not be applied.
  • If the calculated distribution is missing, the current approach with the configured strategy (round-robin or description length) should be used.
  • If the shard count has changed, the previously calculated distribution should be discarded as well and the default strategy used.

I think that with all these details taken into account, karma-parallel will still work as well as it does now. But the main question about such advanced functionality is the complexity and time to implement it. What do you think?

P.S. In general, regarding weight vs time: I think weights are the preferable solution, because the same tests run in different hardware environments (developers' PCs and CI), so weights will be more accurate. I was wrong in my previous comment that time is the best metric.
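A compact sketch of the fallback rules listed above (all names are hypothetical; none of this is an existing karma-parallel API):

```js
// Hypothetical sketch: use the cached distribution only when it exists, no
// tests are focused, and the shard count has not changed since it was written.
function chooseStrategy({ cachedDistribution, hasFocusedTests, executors }) {
  if (hasFocusedTests) {
    return 'default'; // fdescribe/fit present: ignore the cached distribution
  }
  if (!cachedDistribution) {
    return 'default'; // no previous metrics: round-robin / description length
  }
  if (cachedDistribution.executors !== executors) {
    return 'default'; // shard count changed: the cache is stale
  }
  return 'cached';    // otherwise apply the saved distribution
}
```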


joeljeske commented on April 20, 2024

Yea, I would think that an automatic solution would be best, but there would be the following issues:

  • Need to track the file in git to optimize CI scenarios
  • If tracked in git, there would be potential conflicts when merging in branches with changed unit tests
  • If tracked in git, there would have to be a mechanism to remove noise from the calculated weights to avoid changing the file too frequently.
  • Potentially longer to implement.

I do think your initial approach of adding a group name to a set of tests, so that they can be distributed explicitly, would also work. I have a couple of concerns about that:

  • What would be the best name for this: "id", "group", "distribution group"?
  • Currently this project only looks at the top-level describes and distributes them accordingly. That means that only a single shard executes a given top-level describe. If you have a number of expensive specs inside that single top-level describe, this project will still not be able to distribute those tests (see the sketch after this list). Do you think that would be a problem?
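For illustration, a suite shaped like this is treated as a single unit, so all of its inner specs run on whichever shard gets the outer describe (plain Jasmine, nothing karma-parallel-specific):

```js
// Only the top-level describe is considered for sharding, so every spec
// below runs in the same executor, even if some of them are expensive.
describe('checkout flow', () => {
  it('renders the cart', () => { /* fast */ });
  it('computes taxes for 50 locales', () => { /* slow */ });
  it('replays a full order history', () => { /* slow */ });
});
```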


KhalipskiSiarhei commented on April 20, 2024

In my view, all the mentioned issues with git are not critical, because in that case we get a very good dynamic distribution. But complexity and time to implement are the main questions here.

With regard to top-level describe blocks: that is OK for us, and it is exactly how karma-parallel works now. We place our long-running tests in separate *.spec files.

Regarding naming, I am not sure, but I like "group" or "distribution group".


joeljeske commented on April 20, 2024

I have added support for a custom shard strategy function that you can configure in the karma conf. It will run for each top-level describe block and is expected to return true or false to determine whether the describe block should run in the current executor instance. Please read the notes in the README for more info.
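A sketch of wiring this up in karma.conf.js; the exact option names (parallelOptions, shardStrategy: 'custom', customShardStrategy) and the fields passed to the function should be verified against the README mentioned above:

```js
// Sketch, assuming the option names from the karma-parallel README; the hash
// helper and the fields read from `args` are illustrative only.
module.exports = function (config) {
  config.set({
    frameworks: ['parallel', 'jasmine'],
    parallelOptions: {
      executors: 4,
      shardStrategy: 'custom',
      // Called for each top-level describe block; return true to run it
      // in the current executor instance.
      customShardStrategy: function (args) {
        // args is assumed to expose the executor count, this executor's
        // index, and the describe block's description.
        return hashOf(args.description) % args.executors === args.shardIndex;
      },
    },
  });
};

// Simple deterministic string hash used above (illustrative only).
function hashOf(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}
```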


skhalipski commented on April 20, 2024

@joeljeske thanks! We will look at it soon and try to integrate it into our build process. I will keep you updated about the status and any issues, if there are any.


