association-words-with-wiki's Introduction

Associated-Word Search

  • Find words associated with a keyword using Wikipedia

Usage

  • Download the Wikipedia data package and place it under the wikidata directory

Wikipedia data package download

  • Import KeyMatch
from KeyMatch import KeyMatch
km = KeyMatch()
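  • Split the file and preprocess it (needed only on first use)
km.split(jsonDataPath=jsonFile, blackFlags=blackFlags)
# jsonDataPath: str, the wiki JSON file to split
# blackFlags: list, part-of-speech tags to filter out
# Part-of-speech tag list: https://gist.github.com/luw2007/6016931

Pre-split files are also available; place them under the splitdata directory and they are ready to use.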
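
To make the role of blackFlags more concrete, here is a minimal sketch of part-of-speech filtering. It is not KeyMatch's actual implementation, which the README does not show. The function name `filter_by_flags` and the hand-tagged sample are hypothetical; in practice the (word, flag) pairs would come from a tagger such as `jieba.posseg`, which is an assumption about the project.

```python
from typing import Iterable, List, Tuple

TaggedToken = Tuple[str, str]  # (word, part-of-speech flag)


def filter_by_flags(tokens: Iterable[TaggedToken],
                    black_flags: Iterable[str]) -> List[str]:
    """Return the words whose POS flag is not in black_flags (hypothetical helper)."""
    blocked = set(black_flags)
    return [word for word, flag in tokens if flag not in blocked]


# Hand-tagged sample standing in for real tagger output (illustrative only).
tagged = [
    ("華碩", "nt"),  # organization name
    ("是", "v"),     # verb
    ("臺灣", "ns"),  # place name
    ("的", "uj"),    # particle
    ("電腦", "n"),   # noun
    ("廠商", "n"),   # noun
]

kept = filter_by_flags(tagged, black_flags=["v", "uj"])
print(kept)  # ['華碩', '臺灣', '電腦', '廠商']
```

Filtering out verbs and particles this way leaves mostly nouns and names, which is the kind of vocabulary seen in the example output.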

  • Match a keyword
km.match(key=key, blackWords=blackWords)
# key: str, the keyword
# blackWords: list, words to filter out
  • Get the results
km.getTop(n)
# n: int, number of results to return
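
The match and getTop steps can be pictured as co-occurrence counting followed by a top-n cut. The sketch below is an assumption about the general technique, not the library's code: `count_associations` and the toy `docs` are hypothetical, and KeyMatch may count differently (for example per article rather than per sentence).

```python
from collections import Counter
from typing import Iterable, List, Sequence


def count_associations(docs: Iterable[Sequence[str]],
                       key: str,
                       black_words: Iterable[str] = ()) -> Counter:
    """Count words that appear in the same token list as `key` (hypothetical helper)."""
    blocked = set(black_words) | {key}  # never report the key itself
    counts: Counter = Counter()
    for tokens in docs:
        if key in tokens:
            counts.update(t for t in tokens if t not in blocked)
    return counts


# Toy pre-tokenized corpus (illustrative only).
docs: List[List[str]] = [
    ["華碩", "電腦", "廠商"],
    ["華碩", "電腦", "廠商", "公司"],
    ["宏碁", "電腦"],               # skipped: does not contain the key
    ["華碩", "電腦", "手機", "公司"],
]

counts = count_associations(docs, key="華碩", black_words=["公司"])
top = counts.most_common(2)  # same (word, count) shape as the example output
print(top)  # [('電腦', 3), ('廠商', 2)]
```

`Counter.most_common(n)` returns a list of (word, count) tuples sorted by descending count, which matches the shape of the results shown in the example output.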

Example output

# key = 華碩

[('電腦', 89), ('宏碁', 40), ('ZenFone', 40), ('臺灣', 26), ('ASUS', 24), ('廠商', 23), ('公司', 21), ('產品', 20), ('手機', 18), ('科技', 17), ('系列', 17), ('PC', 16), ('有限公司', 16), ('金獎', 16), ('銀獎', 16), ('Eee', 15), ('臺', 14), ('集團', 13), ('Pad', 13), ('電', 13), ('Nexus', 12), ('微星科技', 11), ('鄭州', 11), ('佳作', 11), ('製造業', 11), ('黃靜', 10), ('技嘉科技', 10), ('金控', 10), ('品牌', 9), ('董事長', 9), ('廣告', 9), ('企業', 9), ('友達光電', 9), ('主機板', 8), ('平板電腦', 8), ('代工', 8), ('中華電信', 8), ('宏達國際電子', 8), ('華科技', 8), ('銅獎', 8)]
# key = 蔡依林

[('歌曲', 256), ('專輯', 242), ('巡迴演唱', 192), ('上海站', 164), ('臺灣歌手', 154), ('音樂', 116), ('臺灣', 111), ('演唱會', 88), ('歌手', 80), ('表演', 74), ('MV', 64), ('zh', 60), ('單曲', 59), ('排行榜', 57), ('嘉賓', 51), ('錄音室專輯', 50), ('唱片', 46), ('演唱', 45), ('羅志祥', 42), ('場次', 40), ('周杰倫', 37), ('Play', 34), ('上海', 34), ('藝人', 33), ('張惠妹', 31), ('唱片公司', 30), ('媒體', 30), ('舞蹈', 30), ('冠軍', 29), ('收錄於', 29), ('臺北', 28), ('電影', 27), ('E', 26), ('級', 26), ('S', 25), ('女歌手', 25), ('舞曲', 25), ('香港', 24), ('美國', 23), ('安可', 23)]
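
The second result list contains some noise tokens such as 'zh', 'E', 'S', and '級'. A minimal sketch of cleaning a result list after the fact follows; `drop_noise` is a hypothetical helper, and `results` is a short excerpt of the output above. Passing the same words as blackWords to km.match would presumably have a similar effect, though that is an assumption.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, int]  # (word, count)


def drop_noise(pairs: Iterable[Pair],
               black_words: Iterable[str] = (),
               min_len: int = 2) -> List[Pair]:
    """Drop listed words and words shorter than min_len characters (hypothetical helper)."""
    blocked = set(black_words)
    return [(w, c) for w, c in pairs if len(w) >= min_len and w not in blocked]


# Excerpt of the example output for key = 蔡依林.
results = [("歌曲", 256), ("專輯", 242), ("MV", 64), ("zh", 60),
           ("E", 26), ("級", 26), ("S", 25)]

cleaned = drop_noise(results, black_words=["zh"])
print(cleaned)  # [('歌曲', 256), ('專輯', 242), ('MV', 64)]
```

The length threshold is a blunt rule: it also removes legitimate single-character words, so it may need tuning per keyword.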

Environment requirements

  • Python 3.8+

association-words-with-wiki's People

Contributors

p208p2002
