I noticed that the implementation of the Kruskal-Wallis test is somewhat different fro

Kruskal-Wallis with Posthoc Dunn implementation about statannotations HOT 20 OPEN

trevismd commented on July 19, 2024

Kruskal-Wallis with Posthoc Dunn implementation

from statannotations.

Comments (20)

sepro commented on July 19, 2024

Finally got around to writing up a post about how to use the current version of statannotations with scikit-posthocs: http://blog.4dcu.be/programming/2021/12/30/Posthoc-Statannotations.html. If there are no short-term plans to support post-hoc tests, it might be an option to include these as an example in the documentation.

trevismd commented on July 19, 2024

Hello!
Thank you for your message!

You are right on several points, here are my thoughts:

1. There is a mistake in the output message of the test; it is indeed for independent samples. You may submit a PR or I can fix that quickly, as you prefer.
2. This test, as it is currently supported by statannotations, should only be applied to two samples, no more. This is not explicit in the code/documentation, and maybe it should be for the less statistically inclined users (could be in the same PR). It is basically the reason why a chi-square test is not yet implemented (see discussion in #32). Statannotations would probably benefit from these "cross-multi-samples" omnibus tests, as I suggested there, and you are one to confirm the interest in this.
3. Of course, post-hoc tests will then be interesting (it is a relatively big project for the package).
To answer your question about implementing those, I'd say it could already be useful to people who calculated an omnibus test and then want to annotate using a post-hoc test. However, it would likely be more useful if available as a complete feature together with the initial test (ANOVA, KW, etc.).
If I read correctly, you may also prepare a function to be used as statannotations test function, to perform all the steps in your code, such as the example here:

statannotations/statannotations/stats/StatTest.py

Line 9 in bbab408

def wilcoxon(group_data1, group_data2, verbose=1, **stats_params):

Maybe a repository of additional custom functions would be appreciated by the community?
Additionally, we should be mindful of managing the dependencies. For example, Dunn's test is on the roadmap for scipy's development, and it would be easier not to add scikit to our mix. That said, if the rest of the features are developed and scipy is not ready, we would of course use it. As for the rest of the codebase, statannotations code should therefore not depend much on the specific implementation.

What do you think?

sepro commented on July 19, 2024

1. Yes, this one should be tackled. It is an easy fix, just the message that needs to be corrected. I've created a small PR correcting the message.
2. Thomas Wiecki wrote somewhere in a PyMC3 tutorial "Most people doing statistics aren't statisticians" (that applies to me as well). I think it would be fair to show a warning in case Kruskal-Wallis is used with more than two groups; you can expect people to use this library without an in-depth knowledge of the underlying statistics.
3. A few things to unpack here:
- Kruskal-Wallis + Dunn is one example; ANOVA + TukeyHSD and others would indeed be useful additions. These are available from scikit_posthocs.
- These cannot be implemented like the current tests in StatTest.py, as the current tests expect exactly two groups to compare, not the full set of groups to be compared (which would be a prerequisite for KW, ANOVA, ...). _get_results runs all (or all selected) pairs and stores the p-values, if I see it correctly. I think an option to include post hoc tests is to adapt _get_results so that it checks whether a post hoc test is required: if so, run this style of test (which would need to be added); if not, run the current implementation.
- Despite the name, scikit_posthocs does not depend on scikit-learn; its dependencies are pandas, numpy, matplotlib, seaborn, scipy, and statsmodels. So with the exception of the latter, I think every project running statannotations will already need those.
- It also depends on which direction you want to go with this project. I can also see another package wrapping this one specifically to do posthoc tests, or simply adding a few examples (like the one I made) to the documentation/tutorial. In which case there is no need to include this here.
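
To make the dispatch idea in the second bullet concrete, here is a minimal sketch in plain Python. It is not statannotations code: `get_results`, `fake_pairwise` and `fake_posthoc` are hypothetical names, and the two test functions are stand-ins that return dummy p-values. It only illustrates the difference between a test that sees two groups at a time and one that needs every group at once.

```python
from itertools import combinations


def get_results(groups, pairwise_test=None, posthoc_test=None, pairs=None):
    """Return a p-value per pair, using either style of test (hypothetical helper).

    groups: dict mapping a group label to its list of values.
    pairwise_test: callable taking two samples and returning a p-value.
    posthoc_test: callable taking ALL groups and returning {pair: p-value}.
    """
    if pairs is None:
        pairs = list(combinations(groups, 2))
    if posthoc_test is not None:
        # Post hoc style: one call that sees the full set of groups.
        all_p = posthoc_test(groups)
        results = {}
        for a, b in pairs:
            # Accept the pair in either orientation.
            results[(a, b)] = all_p[(a, b)] if (a, b) in all_p else all_p[(b, a)]
        return results
    if pairwise_test is None:
        raise ValueError("provide pairwise_test or posthoc_test")
    # Current style: one call per pair, each seeing only two groups.
    return {(a, b): pairwise_test(groups[a], groups[b]) for a, b in pairs}


def fake_pairwise(x, y):
    """Stand-in for a two-sample test; returns a dummy p-value."""
    return 0.5


def fake_posthoc(groups):
    """Stand-in for a post hoc test; returns a dummy p-value for every pair."""
    return {pair: 0.01 for pair in combinations(groups, 2)}


demo_groups = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
pairwise_results = get_results(demo_groups, pairwise_test=fake_pairwise)
posthoc_results = get_results(demo_groups, posthoc_test=fake_posthoc, pairs=[("c", "a")])
```

The only design point is that the post hoc branch receives the whole `groups` mapping, which the current two-sample signature cannot express.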

trevismd commented on July 19, 2024

Hello,
So, in the same order,

Point 1: Thanks for taking care of that, I'll tweak a few things and it should be merged soon.
Point 2: Agreed, but it should be a generic message that can be shown for all tests with a specific attribute set.
Point 3:
1/2: I meant that pairwise posthoc tests such as Dunn's test can be performed with statannotations. I'll make a gist to show you. So if you ran KW with another tool and then wanted to annotate pairs on a chart with Dunn's test, you can.
3: I wrote scipy but the same applies to scikit_posthocs (although much smaller), i.e. if you don't copy the code (bad idea), it's an additional dependency. (Note: statsmodels is already an optional dependency to have multiple comparisons correction support, so indeed we could do the same for scikit_posthocs for some tests available only there.)
4: I definitely would like to have omnibus tests and posthoc options available here. I just have to find some time to make it right. That's why I see the examples repo as a temporary solution to make these features available sooner.

trevismd commented on July 19, 2024

Point 1 covered in #40/#42, thanks!

rorraro commented on July 19, 2024

Guys, molecular biologist here, impressed by the huge work you guys are all doing. My project requires using the same code for repeated experimental variations, and automatically annotating significance in sns plots following KW + Dunn's for 3 or more groups is exactly what I am looking for (KW + Bonferroni does not really gear well statistically, apparently...).

So really looking forward to it if you guys come up with a user-friendly solution, in the style of "...test='Kruskal', comparisons_correction="Dunn's"..." for 3 or more groups, for guys like me with very limited Python expertise.

Thanks again for all your effort!

sepro commented on July 19, 2024

I can't take credit for the work on statannotations, but I'm happy to help where I can as this is a feature that would be used all over imho.

I have been adding some examples of how to do things you often see in papers on my blog under the tag Code Nugget, especially for people with some but not a lot of experience. I will add the combination of KW, Dunn and statannotations there as well, probably with ANOVA and TukeyHSD too (eventually; we just had a baby so time is limited right now).

rorraro commented on July 19, 2024

That is very kind of you, sir. I will be eagerly waiting for you to release this Kruskal+Dunn for 3 or more groups with statannotations.
And my best wishes for the newborn member of the family!

rorraro commented on July 19, 2024

Many thanks Sepro!
Very nice tutorial. It is a bit over my current level of Python, but I am sure that by following your steps I will be able to make it through.

Very, very, very grateful to you. You are a legend, sir!

Looking forward to the day someone makes this a one-liner code package, as you point out in your conclusion.

rorraro commented on July 19, 2024

Unfortunately I haven't been able to get the posthoc_dunn() command right... I am trying to understand why.

"LCRsHamburgDMW" is my df, "Disease" holds my categories and "SUM all Bands" holds the numerical values that I want to plot against the "Disease" categories.

However when I create this following your tutorial I get an error.

I attach two images of the

sepro commented on July 19, 2024

The argument group_col in the function posthoc_dunn should be set to "Disease" .

The variables DiseaseHZDMW and dataHZDMW are created for the Kruskal-Wallis test, you don't need them later on (you try to reference the former but this is incorrect as you should specify a column in the original dataframe).
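
To illustrate why the grouping argument must be a column name rather than a separately extracted variable, here is a self-contained sketch of an unadjusted Dunn's test in plain Python. It is not the scikit_posthocs implementation: `dunn_pairwise` is a hypothetical helper, a list of dicts stands in for the long-format dataframe, and the numbers are made up. Only the column names "Disease" and "SUM all Bands" come from the question above.

```python
import math
from collections import defaultdict
from itertools import combinations


def dunn_pairwise(rows, val_col, group_col):
    """Unadjusted Dunn's test p-values for every pair of groups (sketch).

    rows: list of dicts, one per observation (stand-in for a long-format table).
    val_col / group_col: names of the value and grouping columns.
    """
    values = [row[val_col] for row in rows]
    labels = [row[group_col] for row in rows]
    n = len(values)

    # Rank the pooled data; tied values share their average rank.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_sum = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        t = end - start + 1
        tie_sum += t**3 - t
        start = end + 1

    # Collect the ranks of each group, looked up through the grouping column.
    rank_lists = defaultdict(list)
    for label, rank in zip(labels, ranks):
        rank_lists[label].append(rank)

    # Variance term shared by all pairs, with the usual tie correction.
    base_var = n * (n + 1) / 12 - tie_sum / (12 * (n - 1))
    p_values = {}
    for a, b in combinations(sorted(rank_lists), 2):
        ra, rb = rank_lists[a], rank_lists[b]
        diff = abs(sum(ra) / len(ra) - sum(rb) / len(rb))
        z = diff / math.sqrt(base_var * (1 / len(ra) + 1 / len(rb)))
        p_values[(a, b)] = math.erfc(z / math.sqrt(2))  # two-sided normal p-value
    return p_values


# Made-up data in long format: one row per measurement.
made_up = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
rows = [
    {"Disease": disease, "SUM all Bands": value}
    for disease, measurements in made_up.items()
    for value in measurements
]
dunn_p = dunn_pairwise(rows, val_col="SUM all Bands", group_col="Disease")
```

Both arguments are plain strings naming columns of the same table, so the function can pool all values for ranking and still know which group each rank belongs to.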

rorraro commented on July 19, 2024

Many thanks @sepro, that worked beautifully!!

I´ll keep the pathway. Hopefully no more blocks because of my lack of expertise.

Cheers!

rorraro commented on July 19, 2024

Your magic worked, Sebastian! Thanks so much!
Now I have another challenge for the future. Trimming the data to only significant results. haha

rorraro commented on July 19, 2024

I think I got it by creating a molten_df_trim like

molten_df_trim = molten_df[molten_df["value"] < 0.05]

Would that be conceptually correct?

sepro commented on July 19, 2024

yes, that is the way to do this. Do mention that only significant results are shown, but all pairwise combinations were tested (e.g. in the figure caption) as the number of comparisons does affect the correction for multiple testing.
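
As a small illustration of why the number of comparisons matters, the sketch below applies a Bonferroni correction to made-up p-values. Bonferroni is used here only because it is the simplest correction to read; it is not necessarily the one used in the tutorial, and `bonferroni` is a hypothetical helper.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p-values for six pairwise comparisons.
raw = [0.004, 0.012, 0.030, 0.200, 0.450, 0.700]

adjusted_all = bonferroni(raw)        # corrected for all six tests
adjusted_three = bonferroni(raw[:3])  # what you would get if only three tests had been run

significant_all = sum(p < 0.05 for p in adjusted_all)
significant_three = sum(p < 0.05 for p in adjusted_three)
```

The same raw p-value of 0.012 ends up on different sides of 0.05 depending on how many tests it is corrected for, which is why the caption should state how many comparisons were actually tested.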

rorraro commented on July 19, 2024

Oh yes, good point! I'll remember that.

By the way, I am also trying to eliminate the annotation pairs that I do not need because they are comparisons that do not make sense in my study.

I have come up with this solution. My understanding of Python is very low, so I know this is not an elegant solution, but I think it worked, just in case anyone else needs it. It comes after "molten_df" has been established.

# Reset the index
molten_df_ri = molten_df.reset_index()
molten_df_ri

# List the "non-useful" pairs and drop the rows that contain them
nonuseful = []  # fill in the row numbers of the pairs that are not needed, separated by commas
molten_df_ri_trim = molten_df_ri.drop(molten_df_ri.index[nonuseful])
molten_df_ri_trim

# Keep only the significant pairs
molten_df_final = molten_df_ri_trim[molten_df_ri_trim["value"] < 0.05]
molten_df_final

"molten_df_final" will go in here as provided by @sepro

pairs = [(i[1]["index"], i[1]["variable"]) for i in molten_df_final.iterrows()]
p_values = [i[1]["value"] for i in molten_df_final.iterrows()]

And here are the results

sepro commented on July 19, 2024

This discussion is getting a little far off topic, could you send me an email? There are contact details on my blog.

rorraro commented on July 19, 2024

Thanks Sebastian! I apologise. I thought it could be useful for newbies like me.

Over and out!

sepro commented on July 19, 2024

#Create a list of "non-useful" pairs and trim the list removing rows with unuseful pairs
nonuseful=[here the row numbers with the pairs that are not neededseparated by coma]
molten_df_ri_trim = molten_df_ri.drop(molten_df_ri.index[nonuseful])
molten_df_ri_trim

Comparisons that don't make sense should be excluded before correcting for multiple testing. So the quoted part is incorrect. There are ways to do this, though I don't think this is the place to discuss this as this is far beyond the scope of statannotations.
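
A minimal sketch of the ordering issue, using made-up p-values and a plain-Python Holm correction (`holm` is a hypothetical helper, and Holm is chosen only for illustration). It assumes the set of meaningful pairs was fixed before looking at the results, and it leaves aside how the raw p-values were obtained.

```python
def holm(p_by_pair):
    """Holm step-down adjusted p-values, keyed by pair."""
    m = len(p_by_pair)
    ordered = sorted(p_by_pair.items(), key=lambda item: item[1])
    adjusted = {}
    running_max = 0.0
    for rank, (pair, p) in enumerate(ordered):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p))
        adjusted[pair] = running_max
    return adjusted


# Made-up raw (uncorrected) p-values for all six pairs of four groups.
raw = {
    ("ctrl", "A"): 0.010,
    ("ctrl", "B"): 0.020,
    ("ctrl", "C"): 0.300,
    ("A", "B"): 0.015,
    ("A", "C"): 0.500,
    ("B", "C"): 0.600,
}

# Pairs assumed to be the only meaningful ones, decided in advance.
planned = [("ctrl", "A"), ("ctrl", "B"), ("ctrl", "C")]

# Select first, then correct: only three tests are counted.
adjusted_planned = holm({pair: raw[pair] for pair in planned})

# Correct first, then drop rows: the unwanted pairs still inflate the correction.
adjusted_then_dropped = {pair: p for pair, p in holm(raw).items() if pair in planned}
```

With these numbers, selecting first leaves two pairs below 0.05, while dropping rows after the correction leaves none.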

rorraro commented on July 19, 2024

Many thanks, Sebastian! You have been extremely helpful!
