thunlp / wantwords

An open-source online reverse dictionary.

Home Page: https://wantwords.net/

Python 15.01% JavaScript 44.07% HTML 40.54% CSS 0.38%

reverse-dictionary word nlp natural-language-processing

wantwords's Introduction

An Open-source Online Reverse Dictionary [link]

News

The WantWords MiniProgram has been launched. Scan the following QR code to try it!

What Is a Reverse Dictionary?

Unlike a regular (forward) dictionary, which provides definitions for query words, a reverse dictionary returns words that semantically match a query description.
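
To make the contrast concrete, here is a minimal sketch of the two lookup directions. It is a toy only: the mini dictionary and the word-overlap scoring are assumptions made for illustration, whereas WantWords itself relies on a neural model.

```python
# Toy illustration only: the dictionary below and the word-overlap scoring
# are hypothetical and are NOT how WantWords ranks words.
FORWARD = {  # word -> definition
    "ocean": "a very large body of salt water",
    "lake": "a large body of fresh water surrounded by land",
    "desert": "a dry area of land with little rain",
}

def reverse_lookup(description, dictionary, top_k=2):
    """Rank words by how many description words occur in their definition."""
    query = set(description.lower().split())
    scored = []
    for word, definition in dictionary.items():
        overlap = len(query & set(definition.lower().split()))
        scored.append((overlap, word))
    # Highest overlap first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for overlap, word in scored[:top_k] if overlap > 0]

print(FORWARD["ocean"])                                    # forward: word -> definition
print(reverse_lookup("huge body of salt water", FORWARD))  # reverse: description -> words
```

A forward lookup is a plain key access, while the reverse direction has to rank every candidate word against the description.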

What Can a Reverse Dictionary Do?

Solve the tip-of-the-tongue problem, the phenomenon of failing to retrieve a word from memory
Help new language learners
Help patients with word selection (or word dictionary) anomia, people who can recognize and describe an object but fail to name it due to a neurological disorder

Our System

Workflow

Core Model

The core model of WantWords is based on our proposed Multi-channel Reverse Dictionary Model [paper] [code], as illustrated in the following figure.

Pre-trained Models and Data

You can download the pre-trained models and data and decompress them to BASE_PATH/website_RD/ to reproduce the system.

Key Requirements

Django==2.2.5
django-cors-headers==3.5.0
numpy==1.17.2
pytorch-transformers==1.2.0
requests==2.22.0
scikit-learn==0.22.1
scipy==1.4.1
thulac==0.2.0
torch==1.2.0
urllib3==1.25.6
uWSGI==2.0.18
uwsgitop==0.11
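
Because the pins above are exact, it can help to compare them with what is actually installed before starting the site. The following is a minimal sketch, not part of the project: `check_pins` is a hypothetical helper, only three of the pins are listed, and on Python older than 3.8 it assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib import metadata          # Python 3.8+
except ImportError:                         # older interpreters need the backport
    import importlib_metadata as metadata

# A few of the pins listed above; extend the dict as needed.
PINS = {
    "Django": "2.2.5",
    "numpy": "1.17.2",
    "torch": "1.2.0",
}

def check_pins(pins, get_version=metadata.version):
    """Return {package: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches

for name, (pinned, installed) in check_pins(PINS).items():
    print(f"{name}: expected {pinned}, found {installed}")
```

An empty result means every listed package matches its pin.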

Cite

If the code or data help you, please cite the following two papers.

@inproceedings{qi2020wantwords,
  title={WantWords: An Open-source Online Reverse Dictionary System},
  author={Qi, Fanchao and Zhang, Lei and Yang, Yanhui and Liu, Zhiyuan and Sun, Maosong},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={175--181},
  year={2020}
}

@inproceedings{zhang2020multi,
  title={Multi-channel reverse dictionary model},
  author={Zhang, Lei and Qi, Fanchao and Liu, Zhiyuan and Wang, Yasheng and Liu, Qun and Sun, Maosong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  pages={312--319},
  year={2020}
}

wantwords's Issues

Chinese-English search for "发明" shows error: the input characters are unrecognizable.

Suggest renaming to WantsWords: it is closer to the pronunciation of 万磁王 (Magneto) and easier to remember and spread

Missing data file from given link in README.md

Hi syheliel,
First of all, your project is amazing, but the data downloaded from the link given in README.md is insufficient to run the project (e.g., 'wordTrans_Ch_En_Sort.json', 'wordTrans_En_Ch_Sort.json', 'word2synset_synset.txt', etc. are missing).
It would be great if you could provide a link to the missing files.

Thank you so much.
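
A quick way to see which of these files are absent after unpacking the download is sketched below. The three file names come from this issue; `missing_files` is a hypothetical helper, and the assumption that the files sit directly under the data directory is not confirmed by the README.

```python
from pathlib import Path

# File names quoted in the issue; the full list the site needs is not known here.
EXPECTED_FILES = [
    "wordTrans_Ch_En_Sort.json",
    "wordTrans_En_Ch_Sort.json",
    "word2synset_synset.txt",
]

def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]

# "BASE_PATH/website_RD" is the README's placeholder; substitute the real path.
print(missing_files("BASE_PATH/website_RD"))
```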

Where can I find wd_def_for_website_xhzd+ch+xh.json?

Where can I find the file that this line of code loads?
wd_data_ = json.load(open(BASE_DIR + 'wd_def_for_website_xhzd+ch+xh.json',encoding='utf-8'))

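Neither the web version nor the MiniProgram loads

The web version returns 403, and searching for a word in the MiniProgram shows no results.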

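Bug: querying on the "汉-英" (Chinese-English) page after opening it directly returns empty results; another page has to be visited once first

Clear the cache and open the page directly through this link

Click the "汉语" (Chinese) tab, then switch back to the "汉-英" page

There is also a related issue: after several visits the site seems to remember the most frequently visited tab. When I open https://wantwords.net/ in Edge InPrivate mode, it does not jump straight to the "汉-英" page, but with a cache present it does.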

Hmm... I suspect that quite a lot of online erotic fiction has made its way into the training corpus, and in large volume.

Searching for "google" in English-Chinese mode causes a page error; reproducible.

Feedback from @liu: https://meta.appinn.net/t/topic/27375/

data download

The data download address may be wrong, and the data cannot be downloaded. (https://cloud.tsinghua.edu.cn/...)

Can individuals use this?

I would like to add this feature to my own app. May I use your code?

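The vocabulary is too outdated

Chinese-English reverse lookup:
Searching for "新冠病毒" (novel coronavirus) gives the following results: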
1.retrovirus
2.HIV
3.viral
4.cytomegalovirus
5.virus
6.herpes zoster
7.antiviral
8.acyclovir
9.H1N1
10.virology

The correct result should be: Coronavirus

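Too few characters are recognized

I just checked whether each Chinese character from 0x4e00 to 0x9fff is recognized (a character counts as recognized if related synonyms are returned, and as unrecognized if {"error": 1} is returned).

Verification code (may take several hours):

import requests as r

# Count the characters in U+4E00..U+9FFF that the API rejects.
result = 0    # total unrecognizable characters
result_2 = 0  # unrecognizable characters in the current block of 256
for i in range(0x4e00, 0x9fff + 1):
    t = r.get(f"https://wantwords.thunlp.org/ChineseRD/?description={chr(i)}&mode=CC")
    if t.text == '{"error": 1}':
        result += 1
        result_2 += 1
    if i % 256 == 255:  # last character of a 256-character block
        print(f"There are {result_2} unrecognizable characters in 256 characters({hex(i-255)}~{hex(i)}).")
        result_2 = 0
print(result)

Result: of all 20992 characters, 9033 cannot be recognized; only 11959 can!

So I think the software supports too few characters (coverage of the basic CJK set is only 57%, and the extension blocks fare even worse); many characters that are not particularly rare are not recognized. Consider expanding the vocabulary (it is surely feasible, since some unsupported characters can be found with a Baidu search).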
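
The percentage quoted in this issue can be re-derived from the reported counts. The short sketch below only redoes the arithmetic; the count of 9033 is taken from the issue as reported and is not re-measured here.

```python
first, last = 0x4E00, 0x9FFF      # range checked in the issue
total = last - first + 1          # number of code points in that range
unrecognized = 9033               # figure reported in the issue, not re-measured
recognized = total - unrecognized
coverage = recognized / total

print(total, recognized, f"{coverage:.1%}")
```

The totals agree with the issue: 20992 code points, 11959 of them recognized, or roughly 57%.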

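Display suggestion: put very low-relevance words in a separate section

Take "老婆" (wife) as an example: the results after "老公" (husband) are not strongly relevant. Perhaps the display could be optimized a little for non-expert users, for example by splitting the results into "high relevance", "medium relevance", and "low relevance" sections or three columns, with a note on how trustworthy each one is.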

婆娘
女人
妻子
媳妇儿
太太
娘儿们
妻
妻室
娘子
爱人
婆姨
老伴
老婆子
夫人
老小
内助
老公
小老婆
丈母娘
*
小姨子
儿媳妇
岳母
*货
二奶
一男半女
二婚
小姑子
外遇
*
戴绿帽子
小叔子
公婆
媳妇
大老婆
嫂子
闺女
女朋友
三妻四妾
公爹
婊子
大姨子
绿帽子
老娘
前妻
独守空房
情夫
沾花惹草
舅妈
贤惠
打光棍
丈夫
拈花惹草
富婆
贱货
男方
奶子
贤妻良母
荡妇
娶
嫁人
偷情
女方
守寡
守活寡
红杏出墙
娇妻
女友
鬼混
糟糠之妻
上床
淫妇
偷人
色鬼
知冷知热
男友
后妈
鸡巴
姐夫
妞
前夫
吃醋
孙媳妇
百依百顺
少奶奶
臭钱
嫂嫂
养老送终
女婿
小姑
花心
情妇
复婚
弟媳
爸
老大不小
再婚
家务活
千依百顺
婶子
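
The grouping proposed in this issue could be sketched as follows. Everything numeric here is an assumption: the source does not say what scores the site exposes, so the sample scores and both thresholds are made up, and `bucket_by_relevance` is a hypothetical helper.

```python
def bucket_by_relevance(results, high=0.8, mid=0.5):
    """Split (word, score) pairs into high/medium/low relevance sections.

    Both thresholds are hypothetical and would need tuning on real scores.
    """
    sections = {"high": [], "medium": [], "low": []}
    for word, score in results:
        if score >= high:
            sections["high"].append(word)
        elif score >= mid:
            sections["medium"].append(word)
        else:
            sections["low"].append(word)
    return sections

# Made-up scores for a few words from the result list above.
sample = [("妻子", 0.93), ("太太", 0.88), ("老公", 0.61), ("家务活", 0.22)]
for name, words in bucket_by_relevance(sample).items():
    print(name, words)
```

Fixed thresholds are the simplest choice; cutting at the largest gaps between consecutive scores would be an alternative when score ranges vary from query to query.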

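Support access via GET requests

For example, support access via GET https://wantwords.net/GetChDefis?m={q}.

Almost everything that supports adding a custom search engine, whether Chrome/Firefox extensions or dictionary apps such as Eudic (欧路), only supports definitions via URL and does not support POST or its parameters.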
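
For reference, this is roughly how a `{q}`-style template is filled in by such tools. The sketch uses the GET endpoint that appears in the verification script of the character-coverage issue above; whether that endpoint is still served is not confirmed here, and `search_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Endpoint and parameters copied from the verification script in the
# character-coverage issue; its current availability is an assumption.
TEMPLATE = "https://wantwords.thunlp.org/ChineseRD/?description={q}&mode=CC"

def search_url(query, template=TEMPLATE):
    """Fill a {q}-style search-engine template with a percent-encoded query."""
    return template.replace("{q}", quote(query, safe=""))

print(search_url("发明"))
```

Percent-encoding the query keeps spaces and non-ASCII characters valid inside the URL.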

something funny

I firmly believe it will get better.

The output of the pre-trained model is different from the results on the website

I am not sure if I am running the model correctly, but I have noticed a significant difference in the output results of the pre-trained model compared to the results provided on your website. Has the model been updated? If so, is there any way that I can get access to the updated model? Thank you!
