tjunlp-lab / m3ke
A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
In the test set, the answer field is an empty string (""). Without the correct answers, how can we evaluate and test the model's responses?
This evaluation data also appears to be a set of multiple-choice tasks similar to MMLU. However, MMLU-style evaluation is implemented in many slightly different ways, and those details significantly affect the resulting scores. The accuracy computation in the original MMLU code normalizes the model's probabilities over the four options and takes the most probable option as the prediction. Many other implementations instead have the model generate an answer and then extract the A/B/C/D option from the generated text; in that setup, both the prompt and the answer-extraction method affect the final result.
So which evaluation setup (scoring method, prompt, and answer extraction) was used to produce the results table reported in your repository?
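To make the difference between the two scoring styles concrete, here is a minimal Python sketch. It assumes the caller has already obtained one log-probability (or logit) per option letter from the model. The function names and the answer-extraction regex are hypothetical illustrations, not code from the M3KE or MMLU repositories.

```python
import math
import re
from typing import Optional, Sequence

OPTIONS = ("A", "B", "C", "D")


def predict_by_option_probs(option_logprobs: Sequence[float]) -> str:
    """MMLU-style scoring: softmax-normalize one score per option letter
    and return the letter with the highest probability."""
    if len(option_logprobs) != len(OPTIONS):
        raise ValueError("expected exactly one score per option")
    m = max(option_logprobs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in option_logprobs]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(OPTIONS)), key=lambda i: probs[i])
    return OPTIONS[best]


# Hypothetical extraction heuristics; real evaluation harnesses differ here.
_EXPLICIT = re.compile(r"(?:[Aa]nswer|答案)\s*(?:is|:|：|是|为)?\s*\(?([ABCD])(?![A-Za-z])")
_STANDALONE = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def extract_choice(generated: str) -> Optional[str]:
    """Generation-style scoring: pull an option letter out of free-form text."""
    # Prefer an explicit "Answer: X" / "答案是X" pattern.
    m = _EXPLICIT.search(generated)
    if m:
        return m.group(1)
    # Otherwise fall back to the first uppercase A-D not touching other Latin letters.
    m = _STANDALONE.search(generated)
    return m.group(1) if m else None


def accuracy(predictions: Sequence[Optional[str]], golds: Sequence[str]) -> float:
    """Fraction of exact letter matches; unparseable predictions (None) count as wrong."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    if any(g == "" for g in golds):
        raise ValueError("some gold answers are empty; accuracy cannot be computed")
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)
```

For example, `predict_by_option_probs([-2.0, -0.5, -1.0, -3.0])` picks `"B"`, and `extract_choice("答案是B。")` also yields `"B"`. Note that the softmax does not change which letter wins; it only turns the four scores into a distribution. In the generation style, by contrast, the extraction rules decide which outputs count as answers at all (e.g. whether the leading "A" in "A reasonable guess is C" is read as the choice), which is exactly why results from the two styles are hard to compare.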
The data I downloaded contains only the test and validation sets.
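The README says there are more than 20,000 questions, but the copy I got with git clone doesn't seem to contain that many. How can I obtain the complete dataset?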
Hi, how can we access the complete dataset? Thanks!