vivekjoshy / openskill.py
Multiplayer Rating System. No Friction.
Home Page: https://openskill.me
License: MIT License
Raising as part of JOSS review openjournals/joss-reviews/issues/5901
As the data files are stored on Git LFS, and the free LFS quota for this account seems to be regularly exceeded (see openjournals/joss-reviews#5901 (comment)), it would be useful to document an alternative approach for accessing the data, ideally one which uses an open data repository that doesn't require creating an account to download. While the datasets have been made available on Kaggle (openjournals/joss-reviews#5901 (comment)), this is not currently documented in this repository, and a Kaggle account is required to download them. An open research data repository / archive like Zenodo would seem to be a better fit with the JOSS requirement that the software should be stored in a repository that can be cloned without registration. While I don't think this strictly extends to data associated with the software, from a FAIR data and reproducibility perspective a service like Zenodo is much better than Kaggle.
A potentially even nicer approach would be to use a tool like pooch to automate getting the data from a remote repository as part of running the benchmarks.
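As a sketch of what that could look like (assuming the data were deposited somewhere like Zenodo; the record URL, filename, and checksum below are placeholders, not real values):
import pooch

# Download (and cache) a benchmark dataset from a remote archive.
# The record URL and checksum are placeholders for illustration only.
data_path = pooch.retrieve(
    url="https://zenodo.org/records/<record-id>/files/matches.csv",
    known_hash=None,  # ideally a real "sha256:..." value pinned in the repo
)
print(data_path)  # local path the benchmark scripts could read from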
I'm using openskill for a game where we sometimes have uneven teams, for example 6 vs 7. When making teams we put the better players on the team with fewer players. Openskill's estimates are way off from actual results when dealing with uneven teams. It seems to value the extra player much more than the specific game I'm using it for does.
Does anyone have any insight on how to tune a parameter that makes team-size disparity less important?
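For reference, here is a minimal illustration of what I mean, with every player at the default rating and only the team sizes differing:
from openskill.models import PlackettLuce

model = PlackettLuce()
six = [model.rating() for _ in range(6)]
seven = [model.rating() for _ in range(7)]
# The seven-player team gets a noticeably higher win probability,
# even though every individual player is identical.
print(model.predict_win([six, seven]))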
Thanks!
Is your feature request related to a problem? Please describe.
Currently, it's not explicitly clear how to determine convergence in the Bradley-Terry model implemented in openskill.py. Users might struggle to ascertain whether the model has converged, leading to uncertainty in the validity of the results.
Describe the solution you'd like
I propose adding documentation or guidance on determining convergence criteria for the Bradley-Terry model in openskill.py. This could include recommended thresholds or methods for assessing convergence, such as examining parameter estimates or likelihood changes over iterations.
Describe alternatives you've considered
One alternative is leaving the determination of convergence criteria to individual users, which could lead to inconsistency and confusion. Another option is relying solely on default convergence settings, but this might not be suitable for all use cases and datasets.
Additional context
Convergence is a crucial aspect of model fitting, particularly in iterative algorithms like those used in the Bradley-Terry model. Providing clear guidelines on determining convergence will enhance the usability and reliability of openskill.py for researchers and practitioners utilizing the Bradley-Terry model for skill estimation.
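As a concrete illustration of the kind of guidance I mean, here is a rough sketch of one informal check (not an official criterion from the library; the window size and threshold are arbitrary): track how much a player's mu still moves per match and treat the rating as settled once that drift stays small.
from collections import deque
from openskill.models import BradleyTerryFull

model = BradleyTerryFull()
player = model.rating(name="player")
# Opponents are reset each cycle and kept fixed for simplicity.
opponents = [model.rating(mu=m, name=f"opponent_{m}") for m in (22, 25, 28)]
recent_moves = deque(maxlen=10)
threshold = 0.05  # arbitrary illustrative value
for game, opponent in enumerate(opponents * 20, start=1):
    mu_before = player.mu
    # The player is listed first, i.e. wins each match in this toy example.
    [[player], [opponent]] = model.rate([[player], [opponent]])
    recent_moves.append(abs(player.mu - mu_before))
    if len(recent_moves) == recent_moves.maxlen and sum(recent_moves) / len(recent_moves) < threshold:
        print(f"mu drift stayed below {threshold} after {game} games")
        break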
Please consider the following in drafting the software manuscript:
This issue is related to this submission: openjournals/joss-reviews#5901
Describe the bug
mu=0 results in mu=25. Same goes for sigma and potentially other optional parameters that I did not investigate.
To Reproduce
To reproduce simply do:
from openskill.models import PlackettLuce

model = PlackettLuce()
player = model.rating(mu=0, sigma=1)
print(player.mu)  # prints 25.0 but expected is 0.0
Expected behavior
When mu is not None, take whatever the user provides.
Additional context
This can lead to unexpected behaviour AND wrong predictions. The issue comes from the incorrect initialization of the Rating objects.
# Replace this:
return self.PlackettLuceRating(mu or self.mu, sigma or self.sigma, name)

# With an explicit None check, so a user-provided 0 or 0.0 is not discarded:
if mu is None:
    mu = self.mu
if sigma is None:
    sigma = self.sigma
return self.PlackettLuceRating(mu, sigma, name)
# and the same for the other parameters
Raising as part of JOSS review openjournals/joss-reviews#5901
The documentation / top-level README need a clearer statement of need / what problems the software is designed to solve and who the intended audience is.
The current summary at the beginning of the README
A faster and open license asymmetric multi-team, multiplayer rating system comparable to TrueSkill.
assumes knowledge of what a multiplayer rating system is and what TrueSkill is. There is also no specific mention of online gaming communities, which, reading between the lines, seem to be the primary target audience.
The summary in the documentation index page
This advanced rating system is faster, accurate, and flexible with an open license. It supports multiple teams, factions, and predicts outcomes. Elevate your gaming experience with OpenSkill - the superior alternative to TrueSkill.
similarly needs a bit more context and explanation.
Describe the bug
predict_win and predict_rank do not work properly on 3x3x3 games.
To Reproduce
Step 1:
from openskill.models import PlackettLuce
model = PlackettLuce()
p1 = model.rating(mu=34, sigma=0.25)
p2 = model.rating(mu=34, sigma=0.25)
p3 = model.rating(mu=34, sigma=0.25)
p4 = model.rating(mu=32, sigma=0.5)
p5 = model.rating(mu=32, sigma=0.5)
p6 = model.rating(mu=32, sigma=0.5)
p7 = model.rating(mu=30, sigma=1)
p8 = model.rating(mu=30, sigma=1)
p9 = model.rating(mu=30, sigma=1)
team1, team2, team3 = [p1, p2, p3], [p4, p5, p6], [p7, p8, p9]
r = model.predict_win([team1, team2, team3])
print(r)
Results in:
[0.439077174955099, 0.3330210112526078, 0.2279018137922932]
Step 2: change p9's mu to 40:
from openskill.models import PlackettLuce
model = PlackettLuce()
p1 = model.rating(mu=34, sigma=0.25)
p2 = model.rating(mu=34, sigma=0.25)
p3 = model.rating(mu=34, sigma=0.25)
p4 = model.rating(mu=32, sigma=0.5)
p5 = model.rating(mu=32, sigma=0.5)
p6 = model.rating(mu=32, sigma=0.5)
p7 = model.rating(mu=30, sigma=1)
p8 = model.rating(mu=30, sigma=1)
p9 = model.rating(mu=40, sigma=1)
team1, team2, team3 = [p1, p2, p3], [p4, p5, p6], [p7, p8, p9]
print([team1, team2, team3])
r = model.predict_win([team1, team2, team3])
print(r)
Results are the same:
[0.439077174955099, 0.3330210112526078, 0.2279018137922932]
Expected behavior
After the increase of p9's mu, team3 is expected to have a bigger chance of victory.
Platform Information
Additional context
https://github.com/OpenDebates/openskill.py/blob/f76df19c3e388f31050c988a0059367bd1dadc76/openskill/models/weng_lin/bradley_terry_full.py#L765
I have no idea what is going on here, or why it selects the rating of only the first player, but it just does not work as intended.
Looks like my poor spelling propagated here. It should be Thurstone-Mosteller.
Raising as part of JOSS review openjournals/joss-reviews/issues/5901
Ideally you should provide some clear and easily findable guidelines for how to report issues and seek support with the software.
This section in the user manual page in the documentation
already partially fits the bill, but
The Rating objects currently can be mixed and used between models. This may or may not make sense depending on the models under consideration. It is definitely erring on the side of caution to disallow this. It also allows us to have different values (instead of mu and sigma) for different models (perhaps Glicko? Standard Elo?).
Note: Added a spoiler to not cause bias from my recommendation.
Example code:
from openskill.models import BradleyTerryFull
# Initialize Rating System
system = BradleyTerryFull(tau=0.3, constrain_sigma=True)
# Team 1
a1 = system.Rating()
a2 = system.Rating(mu=32.444, sigma=5.123)
# Team 2
b1 = system.Rating(43.381, 2.421)
b2 = system.Rating(mu=25.188, sigma=6.211)
# Rate with BradleyTerryFull
[[x1, x2], [y1, y2]] = system.rate([[a1, a2], [b1, b2]]) # No need to pass tau and so on again.
All functions that can be, will be converted to methods under the model. All constants in the methods can be manually overridden as normal. A variable called custom will be set to True if models are mixed or constants are changed within a system after ratings have taken place.
Rating objects will contain a public attribute (Rating.model) that references the model with which they were created. So, if the user tries to use a rating in a function belonging to a different model, it will produce an error.
If there are no active objections from users or any other implementation developers by the time I get to this issue in the Project Release Board (which should be a while still), then it will be shipped in the next major release.
If someone has another API idea, you are also free to suggest it in this issue.
Mentions: @philihp
Relevant Issues: philihp/openskill.js#231
Going to add contributors to the CONTRIBUTORS.md file. If there is a problem with credit, mention it here.
When you enter scores into rate(), the difference between the scores has no effect on the rating - meaning rate([team1, team2], score(1, 0)) == rate([team1, team2], score(100, 0)) is true.
They have exactly the same rating effect on team1 and team2.
I don't know if it is mathematically possible or what it would look like. But it would be great if the difference could somehow be factored into the calculation, as it is (if your game has a score) quite an important datapoint for skill evaluation.
First of all, congrats and thanks for the great repo!
In a scenario that Player A has 2x the rating of Player B, the predicted win probability is 60% vs 40%. This seems strange.
players = [ [Rating(50)], [Rating(25)] ]
predict_win(teams=players)
[ 1 ]: [0.6002914159316424, 0.39970858406835763]
If I use this function implementation, I get 97% vs 3%, which sounds more reasonable to me.
Maybe the predict_win function has some flaw?
If I understand it correctly, those two functions seem to perform calculations using the equations numbered (65) in the paper. However, those equations seem to be specific to the Thurstone-Mosteller model, and as far as I can tell, the proper way to calculate probabilities for the Bradley-Terry model would be to use equations (48) and (51) (also seen as p_iq in equation (49)). Is this intended? Or am I misunderstanding either the paper or the code of these functions?
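For concreteness, this is my reading of the pairwise Bradley-Terry probability p_iq from equation (49), as a rough sketch rather than the library's actual code (beta below is the library's default 25/6):
import math

def bradley_terry_win_probability(mu_i, sigma_i, mu_q, sigma_q, beta=25 / 6):
    # c_iq combines both teams' uncertainty with the performance variance.
    c_iq = math.sqrt(sigma_i ** 2 + sigma_q ** 2 + 2 * beta ** 2)
    return math.exp(mu_i / c_iq) / (math.exp(mu_i / c_iq) + math.exp(mu_q / c_iq))

# With this formula a 50 vs 25 matchup comes out far more lopsided than 60/40.
print(bradley_terry_win_probability(50, 25 / 3, 25, 25 / 3))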
Describe the bug
Player rating parameters mu and sigma can't be set to 0.0; they are overwritten by the default values 25 and 8.333.
The issue is in the file openskill/rate.py on lines 28 and 29:
self.mu = mu if mu else default_mu(**options)
self.sigma = sigma if sigma else default_sigma(**options)
The conditions mu if mu and sigma if sigma treat 0 as falsy, so the defaults are used when mu or sigma is set to 0.
Also, one cosmetic thing: you have the wrong typing in openskill/constants.py for the functions z and mu. When the default z or mu value is used, you return 3 or 25 (int) instead of float.
To Reproduce
from openskill import Rating
Rating(mu=0.0, sigma=5)
Rating(mu=25, sigma=0.0)
Rating(mu=0.0, sigma=0.0)
Expected behavior
It should be possible to set them to a value of 0.0.
Possible solution
if isinstance(mu, (int, float)):
    self.mu = mu
else:
    self.mu = default_mu(**options)
Platform Information
Additional context
Is your feature request related to a problem? Please describe.
When a model is rewritten or improved, due to changes internally, the expected API outputs will change significantly. Re-entering the correct values into the test suite to verify determinism is wasted effort on the developer's part long term.
Describe the solution you'd like
Use Hypothesis to generate tests and pytest parameterization.
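A rough sketch of the kind of property-based test this could enable (the property here, determinism of rate for identical inputs, is just one illustrative example and not a committed design):
from hypothesis import given, strategies as st
from openskill.models import PlackettLuce

team_strategy = st.lists(
    st.tuples(st.floats(10.0, 40.0), st.floats(1.0, 8.0)),
    min_size=1,
    max_size=4,
)

@given(first=team_strategy, second=team_strategy)
def test_rate_is_deterministic(first, second):
    def run():
        model = PlackettLuce()
        teams = [
            [model.rating(mu=mu, sigma=sigma) for mu, sigma in first],
            [model.rating(mu=mu, sigma=sigma) for mu, sigma in second],
        ]
        return [[(r.mu, r.sigma) for r in team] for team in model.rate(teams)]
    # The same generated inputs must always produce the same outputs,
    # without the test suite hard-coding any expected numbers.
    assert run() == run()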
Tasks:
Apologies for being a noob - it seems that the score margin doesn't have any effect on how the ratings are updated, and it's effectively the same as the rank option, just that a higher score is better. If that is true, is there a way to consider the score margin for games where it is important?
hello,
congrats on your work! I was wondering if there is a predict function for the rankings?
greetings
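For anyone landing here with the same question, the current openskill.models API exposes a predict_rank method alongside predict_win. A small usage sketch (output values illustrative, not exact):
from openskill.models import PlackettLuce

model = PlackettLuce()
team_a = [model.rating(mu=30, sigma=4)]
team_b = [model.rating(mu=25, sigma=4)]
team_c = [model.rating(mu=20, sigma=4)]
# Returns a (rank, probability) pair for each team, in the order the teams were passed.
print(model.predict_rank([team_a, team_b, team_c]))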
This is obviously a very difficult problem that relies on a few parts being successful.
| No. | Dependency Changes | Strict Typing | Implementation | OS | Performance Gains | Implementation Difficulty |
|---|---|---|---|---|---|---|
| 1 | None | Possible | CPython, PyPy | Windows, Ubuntu, macOS | Insignificant | Easy |
| 2 | NumPy | Partial | CPython | Windows, Ubuntu, macOS | Significant | Difficult |
| 3 | SciPy | Not Possible | CPython | Windows, Ubuntu, macOS | Significant | Normal |
| 4 | Conditional NumPy | Partial | CPython, PyPy | Windows, Ubuntu, macOS | Significant | Very Difficult |
Option 4 is ideal for best compatibility and performance but is a huge undertaking at the end of which strict typing may still end up being not possible.
Regardless of which option is being pursued, these tasks need to be completed first:
Is your feature request related to a problem? Please describe.
Creating models of tournaments is hard, since you have to parse the data using another library (depending on the format) and then pass everything into rate and predict manually. It's a lot of effort to predict the entire outcome of, say, the 2022 FIFA World Cup.
Describe the solution you'd like
It would be nice if there was a tournament class of some kind that allowed us to pass in rounds which themselves contain matches. Then, using an exhaustive approach, it could predict winners and move them along each bracket/round. Especially now that #74 has landed, it would be easier to predict whole matches and in turn tournaments.
The classes should be customizable to allow our own logic. For instance, allow using the munkres algorithm and other such methods.
Describe alternatives you've considered
I don't know any other libraries that do this already.
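To make the idea concrete, here is a rough sketch (not a proposed API, just the flavour of it) of a single-elimination helper built on the existing predict_win:
from openskill.models import PlackettLuce

model = PlackettLuce()

def predict_bracket(teams):
    # teams: bracket seeds for round 1; each team is a list of ratings.
    round_teams = list(teams)
    while len(round_teams) > 1:
        next_round = []
        for i in range(0, len(round_teams), 2):
            a, b = round_teams[i], round_teams[i + 1]
            p_a, p_b = model.predict_win([a, b])
            next_round.append(a if p_a >= p_b else b)  # favourite advances
        round_teams = next_round
    return round_teams[0]

# Four single-player teams seeded into a two-round bracket.
seeds = [[model.rating(mu=m)] for m in (32, 28, 26, 24)]
print(predict_bracket(seeds)[0].mu)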
Would it be possible to add some sort of system to weight player performance, like how the official trueskill module does it? I'm trying to create a system that weights a player's overall performance relative to their team's to get a more accurate skill rating.
Is your feature request related to a problem? Please describe.
This is a request to add a section to the documentation on how matches should be arranged.
Do the models make any assumptions on how matches should be arranged? For example should matches avoid playing the same players or teams back to back, or should matches avoid players arranging their own opponents? Should matches always try to balance teams based on the latest ratings?
In my use case I plan on using the rating algorithm for in person matches, where the player pool to make matches at any given time would be less than 20, with random teams.
It would be great if there was documentation giving guidance on how to arrange matches to make rating convergence faster.