This repository contains Perl code I no longer use, including:
- a collection of Dirichlet/Pitman-Yor process implementations,
- a character-bigram-based zerogram word model, and
- unigram/bigram word models with token-based, block, and type-based sampling.
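As a rough illustration of what a Pitman-Yor process computes, here is a sketch of the standard Chinese-restaurant predictive probability of a word: P(w) = (c_w - d·t_w + (θ + d·T)·P0(w)) / (N + θ). Here c_w and t_w are the customers and tables for w, N and T are the totals, d is the discount and θ the concentration; setting d = 0 gives the Dirichlet process. The helper `py_predictive` and its hash-based seating representation are hypothetical and are not taken from this repository. P0 is passed in as a plain number here, whereas in this code it would presumably come from the character-bigram zerogram model (an assumption).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this repository): predictive probability
# of $word under a Pitman-Yor Chinese restaurant process.
#   $counts->{$w}  number of customers seated at tables serving $w
#   $tables->{$w}  number of tables serving $w
#   $d             discount (0 <= $d < 1; $d = 0 gives a Dirichlet process)
#   $theta         concentration parameter
#   $p0            base-distribution probability P0($word)
sub py_predictive {
    my ($word, $counts, $tables, $d, $theta, $p0) = @_;
    my ($n, $t) = (0, 0);
    $n += $_ for values %$counts;    # total customers N
    $t += $_ for values %$tables;    # total tables T
    my $c_w = $counts->{$word} // 0;
    my $t_w = $tables->{$word} // 0;
    # Mass from existing tables for $word, plus mass for opening a new
    # table whose dish is drawn from the base distribution.
    return ($c_w - $d * $t_w + ($theta + $d * $t) * $p0) / ($n + $theta);
}

my %counts = (the => 3, cat => 1);
my %tables = (the => 1, cat => 1);
printf "%.4f\n", py_predictive('the', \%counts, \%tables, 0.5, 1.0, 0.01);  # prints 0.5040
```

If P0 sums to one over the vocabulary, these predictive probabilities also sum to one, since the discounted counts contribute N - d·T and the new-table term contributes θ + d·T.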
The following CPAN modules are required:
- Math::GSL
- Math::Cephes
- Regexp::Assemble
- Carp::Assert
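To confirm the modules are installed before running anything, a small check along these lines should work. `missing_modules` is a hypothetical helper, not part of this repository. It converts each module name to its file path and tries `require`, reporting any module that fails to load.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Math::GSL -> Math/GSL.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(qw(Math::GSL Math::Cephes Regexp::Assemble Carp::Assert));
if (@missing) {
    print "Missing: @missing\n";
} else {
    print "All required modules found.\n";
}
```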
Example usage:
% perl -Ilib scripts/sample-token.pl --seed=1 --type=Dirichlet --input=samples/alice.unseg --iter=100 --nested --debug --randInit=0.1