pawa- / lingua-ja-webidf
WebIDF calculator
NAME
    Lingua::JA::WebIDF - WebIDF calculator

SYNOPSIS
      use Lingua::JA::WebIDF;

      my $webidf = Lingua::JA::WebIDF->new(%config);

      print $webidf->idf("東京"); # low
      print $webidf->idf("スリジャヤワルダナプラコッテ"); # high

DESCRIPTION
    Lingua::JA::WebIDF calculates the WebIDF weight of a word.

    The WebIDF (Inverse Document Frequency) weight represents the rarity of
    a word on the Web. A rare word has a high WebIDF weight; a common word
    has a low one.

    IDF is based on the intuition that a query term which occurs in many
    documents is not a good discriminator and should be given less weight
    than one which occurs in few documents.

METHODS
  new( %config || \%config )
    Creates a new Lingua::JA::WebIDF instance.

    The following defaults are used for any key you do not set in %config.

      KEY            DEFAULT VALUE
      -----------    ---------------
      idf_type       1
      api            'YahooPremium'
      appid          undef
      driver         'TokyoCabinet'
      df_file        './df.tch'
      fetch_df       0
      expires_in     365
      documents      250_0000_0000
      Furl_HTTP      undef
      verbose        1

    idf_type => 1 || 2 || 3
        Type 1 is the most commonly cited form of IDF.

                            N
          idf(t_i) = log  -----                 (1)
                           n_i

          N  : the number of documents
          n_i: the number of documents which contain term t_i
          t_i: term

        Type 2 is a simple version of the RSJ weight.

                           N - n_i + 0.5
          idf(t_i) = log  ---------------       (2)
                             n_i + 0.5

        Type 3 is a modification of (2).

                             N + 0.5
          idf(t_i) = log  -------------         (3)
                            n_i + 0.5

    api => 'Yahoo' || 'YahooPremium'
        Uses the specified Web API when fetching WebDF (Document
        Frequency).

    driver => 'Storable' || 'TokyoCabinet'
        Fetches and saves WebDF with the specified driver.

    df_file => $path
        Saves WebDF to the specified path.

        To reduce the number of Web API requests, please download a large
        df file from <http://misc.pawafuru.com/webidf/>.

        I recommend using a different file for each type of Web API you
        specify, because WebDF may differ between APIs.

    fetch_df => 0
        If 0 is specified, WebDF is never fetched from the Web. If the
        WebDF you want has already been saved, the saved value is used;
        otherwise undef is returned.

    expires_in => $days
        If 365 is specified, WebDF expires 365 days after it is fetched.

    Furl_HTTP => \%option
        Sets the options passed to Furl::HTTP->new.

        If you want to use a proxy server, you have to use this option.

    verbose => 1 || 0
        If 1 is specified, verbose error messages are shown.

  idf($word)
    Calculates the WebIDF weight of $word via the df($word) method.

  df($word)
    Fetches the WebDF of $word.

    If the WebDF of $word has not been saved yet or has expired, it is
    fetched through the Web API you specified and then saved.

    If the WebDF of $word has expired and fetch_df is 0, the expired WebDF
    is used.

  db_open($mode)
    Opens the database file located at the df_file path.

    If you use TokyoCabinet, you have to open the database file with this
    method before calling idf, df, db_close, or purge.

    $mode is 'read' or 'write'.

  db_close
    Closes the database file located at the df_file path.

    This method is called automatically when the object is destroyed, so
    you might not need to call it explicitly.

  purge($expires_in)
    Purges old data in df_file. If 365 is specified, data older than 365
    days is purged.

AUTHOR
    pawa <[email protected]>

SEE ALSO
    Lingua::JA::TFWebIDF

    Lingua::JA::WebIDF::Driver::TokyoTyrant

    Yahoo API: <http://developer.yahoo.co.jp/>

    Tokyo Cabinet: <http://fallabs.com/tokyocabinet/>

    S. Robertson, Understanding inverse document frequency: on theoretical
    arguments for IDF. Journal of Documentation 60, 503-520, 2004.

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.
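
The three idf_type formulas listed under new() can be sketched in plain Perl as below. This is only an illustration and not the module's own code: idf_sketch is a hypothetical helper, the natural logarithm (Perl's built-in log) is an assumption because the formulas do not state a base, the document frequency of 250_000_000 is a made-up value, and dying on a zero document frequency for type 1 is likewise an assumption about how to treat that case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::JA::WebIDF).
# $type   : 1, 2 or 3, matching formulas (1), (2) and (3)
# $n_docs : N, the number of documents
# $df     : n_i, the number of documents which contain the term
# Assumption: natural logarithm, since the formulas do not name a base.
sub idf_sketch {
    my ($type, $n_docs, $df) = @_;

    if ($type == 1) {
        # Formula (1) divides by n_i, so it is undefined for n_i = 0.
        die "type 1 is undefined for df = 0\n" if $df == 0;
        return log($n_docs / $df);
    }
    return log(($n_docs - $df + 0.5) / ($df + 0.5)) if $type == 2;
    return log(($n_docs + 0.5) / ($df + 0.5))       if $type == 3;

    die "unknown idf_type: $type\n";
}

my $n_docs = 25_000_000_000;  # same value as the documented 'documents' default
my $df     = 250_000_000;     # made-up document frequency for illustration

printf "idf_type %d: %.4f\n", $_, idf_sketch($_, $n_docs, $df) for 1 .. 3;
```

One difference between the types is visible directly from the formulas: type 2 becomes negative once n_i exceeds half of N, whereas types 1 and 3 stay non-negative as long as n_i does not exceed N.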