hit-scir / scir-training-day

137.0 137.0 113.0 16.1 MB

A small training program for new members of HIT-SCIR.

Python 45.20% Jupyter Notebook 54.80%

scir-training-day's People

carfly jiangfeng1124 snakehunt2012 antiherochen hitwsl dragonls brenir xxxwtc88 oop96000 ruoshui1126 niuox xjswmc fseasy endyul wyx931021 dontpanic92 luochuwei yingxingdechibang xiaoyangren redreamality zrh091110225 dengwc suncj xubenben zhanggang01 seedsquall eva-n27 wyjss2015 javelir ethan1214 liu946 adoni darcyyaoting goingcoder git2191866109 redfox9 huayet uniphix000 sudazzk limkim bicongwang sdzhangbo curtainsky qgzang thenamek hitercs windinwillows nlpjoe kikihiter albertwy learningprml marisuki ksboy ranpox amshb001 shengyudingli laurence-042 hseaweed shulisong spico197 paulpig lifangd cins-china roger40 wen-min xk503775229 yanyan007 shamy1997 buerkobe wp0517 cristianezsq wurentidai xhc19930714 zhangjiantong evemaximem roger0227-nlp 914295860 hjw1 jerryten sherlockhoatszx hzyang95 zhaoxingniu sudacn rp1124 jeremy8080 huangk4 longwind98 yuminmmm yabx jialeguo fcpluto richardhgl cytsinghua zhangfeiyu5610 logicsense1 sun-yi-heng pengshi27 wz9917 gaoxiaoqian2021 hmzo

scir-training-day's Issues

4-HMM exercise problem

I've hit another error in the HMM section: running the command given in the write-up fails, and I still can't work out why:

$python eval_gene_tagger.py gene.key gene.dev
Could not align gold standard and predictions in line 1.
Gold standard: BACKGROUND  Prediction file:
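The message means the evaluator could not pair line 1 of gene.key with line 1 of the second file. The second argument is presumably meant to be a prediction file in "word TAG" format (one token per line, a blank line between sentences, tags such as O and I-GENE), not the untagged input gene.dev; that format is an assumption here, based on the gene-tagging exercise this script appears to come from. Below is a minimal sketch of producing such a file so that it stays line-aligned with the input. `write_predictions` and `tag_fn` are hypothetical names, and `tag_fn` stands in for whatever HMM tagger you trained.

```python
def write_predictions(words_lines, tag_fn):
    """Return one output line per input line so gold and prediction files align.

    words_lines: lines of the untagged dev file (assumed: one token per line,
                 blank line between sentences).
    tag_fn: hypothetical callable mapping a list of words to a list of tags.
    """
    out, sentence = [], []

    def flush():
        # Tag the buffered sentence and emit "word TAG" lines for it.
        if sentence:
            tags = tag_fn(sentence)
            assert len(tags) == len(sentence), "tagger must return one tag per word"
            out.extend(f"{w} {t}" for w, t in zip(sentence, tags))
            sentence.clear()

    for raw in words_lines:
        word = raw.strip()
        if word:
            sentence.append(word)
        else:
            flush()
            out.append("")  # keep the sentence boundary so line numbers match
    flush()  # handle a final sentence with no trailing blank line
    return out
```

For example, write `"\n".join(write_predictions(lines, my_tagger)) + "\n"` to a file such as gene_dev.p1.out (a hypothetical name) and pass that file, rather than gene.dev, as the second argument to eval_gene_tagger.py.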

Forward maximum matching algorithm: test error

Hi, in the forward maximum matching segmentation exercise, running eval.py fails with the following error:

Traceback (most recent call last):
  File "eval.py", line 175, in <module>
    num_recall, num_pred, num_gold = evaluate(pred_inst, gold_inst, opt.mode)
  File "eval.py", line 34, in evaluate
    assert (pred.raw == gold.raw)
AssertionError

My maximum matching segmentation code is as follows:

def max_match_segment(line, dic):
    # write your code here
    # line = line.decode('utf-8')
    s = "" # normal pattern window
    s_f = "" # pattern window shifted one character ahead
    ret = []
    tmp = set()
    for cur_word in line:  # line is a str
        s = s_f
        s_f += cur_word # advance s_f by one character
        if len(tmp) == 0: # candidate set is empty: rebuild it from every dictionary word that contains s_f as a substring
            tmp = set([word for word in dic if s_f in word])
        else: # candidate set is not empty: drop the words that no longer contain s_f
            tmp = set([elem for elem in tmp if s_f in elem])

        if len(tmp) == 0: # longest match found; add it to the list
            ret.append(s)
            s_f = "" + cur_word # reset the look-ahead window
    return ret

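I'm on macOS. The text written to output.dat was garbled at first; after decoding it and re-encoding it as UTF-8 the characters look correct, but running python eval.py --format=segment --mode=segment --eval=output.dat --gold=eval.dat still gives the same error. Does anyone know what causes it?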
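Assuming `assert (pred.raw == gold.raw)` checks that the predicted words concatenate back to the original sentence, the code above has two likely problems. First, it only appends `s` when the candidate set becomes empty, and nothing is appended after the loop ends, so the tail of every sentence is lost. It can also append an empty string when the first character appears in no dictionary word. Second, `s_f in word` tests for a substring anywhere in a word rather than a dictionary match starting at the current position. Below is a minimal sketch of standard forward maximum matching. It assumes `dic` is a set of words. The `max_len` parameter is an addition for illustration, and characters covered by no dictionary word fall back to single-character tokens so no input is dropped.

```python
def max_match_segment(line, dic, max_len=None):
    """Forward maximum matching (FMM) word segmentation (sketch).

    line: the sentence to segment (a str, trailing newline already stripped).
    dic: a set of dictionary words.
    max_len: longest window to try; defaults to the longest word in dic.
    Unmatched characters become single-character tokens, so
    "".join(result) == line always holds.
    """
    if max_len is None:
        max_len = max([len(w) for w in dic] + [1])  # at least 1, avoids an empty window
    result = []
    i = 0
    while i < len(line):
        # Try the longest window first, shrinking it one character at a time.
        for size in range(min(max_len, len(line) - i), 0, -1):
            piece = line[i:i + size]
            if size == 1 or piece in dic:
                result.append(piece)
                i += size
                break
    return result
```

With `dic = {"研究", "研究生", "生命", "命", "的", "起源"}`, segmenting "研究生命的起源" yields ["研究生", "命", "的", "起源"], the well-known case where greedy forward matching picks the longer prefix. When applying it to a file, strip the trailing newline from each line first, and open both input and output with `encoding="utf-8"` so the result does not depend on the platform's default encoding, which may be related to the garbled output.dat.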

Add an IBM Model I training project

Add a perceptron-based tagging training project

Thank you for using Fashion-MNIST in your lecture

It means a lot to us and future machine learning scientists.
