Chinese-French Sentence-Aligned Parallel Corpus and Dictionary of Political Publicity Texts
First-level directory | Second-level directory/file | Contents |
---|---|---|
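CPCF_all | CFPC_all_metadata.xlsx | Summary of all metadata |
CPCF_all | CPCF_all_bilingual_utf-8.txt | Complete Chinese-French sentence-aligned bilingual corpus |
CPCF_all | CPCF_all_chinese_utf-8.txt | Complete Chinese corpus |
CPCF_all | CPCF_all_french_utf-8.txt | Complete French corpus |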
CPCF_separate | 01.CPCF_bilingual_sents_txt_utf-8 | Chinese-French sentence-aligned bilingual corpus |
CPCF_separate | 02.CPCF_chinese_sents_txt_utf-8 | Chinese corpus |
CPCF_separate | 03.CPCF_french_sents_txt_utf_8 | French corpus |
CPCF_separate | 04.CPCF_aligner_ata | Chinese-French sentence-aligned corpus produced with ABBYY Aligner |
CPCF_separate | 05.CPCF_metadata_txt_utf-8 | Corpus metadata |
CPCF_separate | 06.CPCF_chinese_tokenization_jieba_txt_utf-8 | Chinese corpus segmented with jieba |
CPCF_separate | 07.CPCF_chinese_tokenization_spacy_txt_utf-8 | Chinese corpus tokenized with spaCy |
CPCF_separate | 08.CPCF_french_tokenization_spacy_txt_utf-8 | French corpus tokenized with spaCy |
CPCF_separate | 09.CPCF_chinese_pos_txt_utf-8 | POS-tagged Chinese corpus |
CPCF_separate | 10.CPCF_french_pos_lemma_txt_utf-8 | POS-tagged and lemmatized French corpus |
CPCF_separate | 11.CPCF_french_semantic_txt_utf-8 | Semantically tagged French corpus |
CPCF_separate | 12.CPCF_french_sentiment_txt_utf-8 | Sentiment-tagged French corpus |
CPCF_separate | 13.CPCF_chinese_parser_txt_utf-8 | Syntactically parsed Chinese corpus |
CPCF_separate | 14.CPCF_french_parser_txt_utf-8 | Syntactically parsed French corpus |
CPCF_separate | 15.CPCF_french_morphology_txt_utf-8 | Morphologically tagged French corpus |
CPCF_separate | 16.CPCF_chinese_named_entity_txt_utf-8 | Chinese corpus with named-entity annotation |
CPCF_separate | 17.CPCF_french_named_entity_txt_utf-8 | French corpus with named-entity annotation |
CPCF_statistics | CPCF_all_statistics | Overall Chinese word frequencies, overall French word frequencies, Chinese sentence length, overall TTR/sTTR statistics, overall Chinese POS statistics, overall French POS statistics |
CPCF_statistics | 01.CPCF_statistics_chinese_ttr_txt_utf-8 | TTR statistics for the Chinese corpus |
CPCF_statistics | 02.CPCF_statistics_french_ttr_txt_utf-8 | TTR statistics for the French corpus |
CPCF_statistics | 03.CPCF_statistics_chinese_sttr_txt_utf-8 | sTTR statistics for the Chinese corpus |
CPCF_statistics | 04.CPCF_statistics_french_sttr_txt_utf-8 | sTTR statistics for the French corpus |
CPCF_statistics | 05.CPCF_statistics_chinese_freq_raw_txt_utf-8 | Raw word frequencies of the Chinese corpus |
CPCF_statistics | 06.CPCF_statistics_chinese_freq_without_stop_words_txt_utf-8 | Chinese word frequencies after stop-word removal |
CPCF_statistics | 07.CPCF_statistics_french_freq_raw_txt_utf-8 | Raw word frequencies of the French corpus |
CPCF_statistics | 08.CPCF_statistics_french_freq_lemma_txt_utf-8 | French word frequencies after lemmatization |
CPCF_statistics | 09.CPCF_statistics_french_freq_without_stop_words_txt_utf-8 | French word frequencies after lemmatization and stop-word removal |
CPCF_statistics | 10.CPCF_statistics_chinese_pos_freq_txt_utf-8 | POS statistics for the Chinese corpus |
CPCF_statistics | 11.CPCF_statistics_french_pos_freq_txt_utf-8 | POS statistics for the French corpus |
CPCF_statistics | 12.CPCF_statistics_french_semantic_freq_txt_utf-8 | Semantic tag statistics for the French corpus |
CPCF_statistics | 13.CPCF_statistics_french_mean_length_txt_utf-8 | Mean sentence length of the French corpus |
CPCF_statistics | 14.CPCF_statistics_french_sentiment_freq_txt_utf-8 | Sentiment tag statistics for the French corpus |
CPCF_statistics | 15.CPCF_statistics_chinese_named_entity_freq_txt_utf-8 | Named-entity statistics for the Chinese corpus |
CPCF_statistics | 16.CPCF_statistics_french_named_entity_freq_txt_utf-8 | Named-entity statistics for the French corpus |
CPCF_statistics | 17.CPCF_statistics_chinese_punctuation_freq_txt_utf-8 | Punctuation statistics for the Chinese corpus |
CPCF_statistics | 18.CPCF_statistics_french_punctuation_freq_txt_utf-8 | Punctuation statistics for the French corpus |
CPCF_python_scripts | 01.CPCF_tokenization_fr_zh.py | chinese_tokenization: Chinese tokenization with spaCy |
CPCF_python_scripts | 01.CPCF_tokenization_fr_zh.py | french_tokenization: French tokenization with spaCy |
CPCF_python_scripts | 01.CPCF_tokenization_fr_zh.py | chinese_tokenization_jieba: Chinese word segmentation with jieba |
CPCF_python_scripts | 02.CPCF_pos_tagging_fr_zh.py | get_pos_fr: French POS tagging |
CPCF_python_scripts | 02.CPCF_pos_tagging_fr_zh.py | get_pos_zh: Chinese POS tagging |
CPCF_python_scripts | 03.CPCF_sentiment_tagging_fr.py | get_emotions_fr: French sentiment tagging |
CPCF_python_scripts | 03.CPCF_sentiment_tagging_fr.py | get_emotions_res_fr: French sentiment tag statistics |
CPCF_python_scripts | 04.CPCF_morphology_tagging_fr.py | get_morph_fr: French morphological tagging |
CPCF_python_scripts | 04.CPCF_morphology_tagging_fr.py | get_certain_morph_fr: retrieve one specific French morphological feature |
CPCF_python_scripts | 05.CPCF_ner_tagging_fr_zh.py | get_ner_fr: French NER and its statistics |
CPCF_python_scripts | 05.CPCF_ner_tagging_fr_zh.py | get_ner_zh: Chinese NER and its statistics |
CPCF_python_scripts | 06.CPCF_parser_tagging_fr_zh.py | get_parser_fr: French syntactic parsing |
CPCF_python_scripts | 06.CPCF_parser_tagging_fr_zh.py | get_parser_zh: Chinese syntactic parsing |
CPCF_python_scripts | 07.CPCF_semantic_tagging_fr.py | get_semantic_fr: French semantic tagging and its statistics |
CPCF_python_scripts | 07.CPCF_semantic_tagging_fr.py | get_type_words: retrieve all words under a given semantic category |
CPCF_python_scripts | 08.CPCF_statistics_mean_length_fr.py | sent_len_in_token_mean_fr: compute the mean length of French sentences in tokens |
CPCF_python_scripts | 09.CPCF_statistics_pos_freq_fr_zh.py | pos_freq_fr: French POS statistics |
CPCF_python_scripts | 09.CPCF_statistics_pos_freq_fr_zh.py | pos_freq_zh: Chinese POS statistics |
CPCF_python_scripts | 10.CPCF_statistics_punctuation_freq_fr_zh.py | get_punctuation_fr: French punctuation statistics |
CPCF_python_scripts | 10.CPCF_statistics_punctuation_freq_fr_zh.py | get_punctuation_zh: Chinese punctuation statistics |
CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | type_token_ratio_fr: TTR of the French corpus |
CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | standardized_type_token_ratio_fr: sTTR of the French corpus |
CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | type_token_ratio_zh: TTR of the Chinese corpus |
CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | standardized_type_token_ratio_zh: sTTR of the Chinese corpus |
CPCF_python_scripts | 12.CPCF_statistics_word_freq_fr_zh.py | word_freq_fr: raw French word frequencies, frequencies after lemmatization, and frequencies after lemmatization and stop-word removal |
CPCF_python_scripts | 12.CPCF_statistics_word_freq_fr_zh.py | word_freq_zh: raw Chinese word frequencies and frequencies after stop-word removal |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | CPCF_electronic_dictionary: class for building the electronic dictionary |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_1_all_sentences: find every corpus entry containing a given Chinese word |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_2_translation_freq: retrieve the entries that use one or more given French translations and count their frequencies |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_3_translation_except: retrieve the entries that use none of one or more given French translations and count their frequencies |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_4_entries: build the senses of a French dictionary entry |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_5_electronic_dictionary_normal: merge all senses of an entry (ordinary words) |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_5_electronic_dictionary_explain: merge all senses of an entry (words whose abbreviations need explaining) |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_6_example_sentences: retrieve all example sentences for a given sense |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | transfer_example_sentences: convert raw sentences into example-sentence format |
CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | merge_electronic_dictionary: merge all entries and example sentences into a single file, completing the dictionary conversion |
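CPCF_electronic_dictionary | CPCF_all_entries.xlsx | The 30 entries built in the first phase (the words marked "√" in the spreadsheet) |
CPCF_electronic_dictionary | CPCF_all_electronic_dictionary.txt | Electronic dictionary (plain-text format) |
CPCF_electronic_dictionary | CPCF_all_electronic_dictionary.eudic | Electronic dictionary (法语助手/Frhelper format) |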
CPCF_reference | french_semantic_lexicon.txt | Reference: French semantic tagging lexicon |
CPCF_reference | french_semantic_terms.txt | Reference: French semantic tagging categories |
CPCF_reference | french_sentiment_dict.txt | Reference: French sentiment tagging lexicon |
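Several of the statistics files above report TTR, sTTR, mean sentence length, and word frequencies with and without stop words. The sketch below illustrates these measures in plain Python; it is not the code of `11.CPCF_statistics_ttr_sttr_fr_zh.py` or the other scripts. It assumes the text has already been tokenized (for example with jieba or spaCy) into lists of tokens, and it assumes the common definition of sTTR as the mean TTR over consecutive, non-overlapping chunks of a fixed size, discarding the final incomplete chunk. The function names `ttr`, `sttr`, `mean_sentence_length`, and `word_freq` and the `chunk_size` default of 1000 are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def ttr(tokens: List[str]) -> float:
    """Type-token ratio: distinct tokens divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def sttr(tokens: List[str], chunk_size: int = 1000) -> float:
    """Standardized TTR (assumed definition): mean TTR over consecutive,
    non-overlapping chunks of chunk_size tokens. The trailing incomplete
    chunk is ignored; with no full chunk, fall back to plain TTR."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(c) for c in chunks) / len(chunks)


def mean_sentence_length(sentences: List[List[str]]) -> float:
    """Average sentence length measured in tokens."""
    if not sentences:
        return 0.0
    return sum(len(s) for s in sentences) / len(sentences)


def word_freq(tokens: Iterable[str],
              stop_words: Optional[Set[str]] = None) -> Counter:
    """Token frequencies, optionally excluding a set of stop words."""
    stop_words = stop_words or set()
    return Counter(t for t in tokens if t not in stop_words)


if __name__ == "__main__":
    # Two toy French sentences, already tokenized.
    sents = [["le", "peuple", "chinois"], ["le", "développement"]]
    tokens = [t for s in sents for t in s]
    print(ttr(tokens))                     # 0.8 (4 types / 5 tokens)
    print(sttr(tokens, chunk_size=2))      # 1.0 (two chunks, each TTR 1.0)
    print(mean_sentence_length(sents))     # 2.5
    print(word_freq(tokens, {"le"}))       # "le" is excluded
```

Plain TTR falls as a text grows longer, because frequent words keep repeating; averaging over equal-sized chunks is what makes sTTR comparable across sub-corpora of different lengths, such as the Chinese and French halves.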