cpcf's Introduction

CPCF

A Chinese-French sentence-aligned parallel corpus and dictionary of political texts for international communication

| Top-level directory | Subdirectory / file | Contents |
| --- | --- | --- |
| CPCF_all | CFPC_all_metadata.xlsx | Metadata summary |
| CPCF_all | CPCF_all_bilingual_utf-8.txt | Complete Chinese-French sentence-aligned bilingual corpus |
| CPCF_all | CPCF_all_chinese_utf-8.txt | Complete Chinese corpus |
| CPCF_all | CPCF_all_french_utf-8.txt | Complete French corpus |
| CPCF_separate | 01.CPCF_bilingual_sents_txt_utf-8 | Chinese-French sentence-aligned bilingual corpus |
| CPCF_separate | 02.CPCF_chinese_sents_txt_utf-8 | Chinese corpus |
| CPCF_separate | 03.CPCF_french_sents_txt_utf_8 | French corpus |
| CPCF_separate | 04.CPCF_aligner_ata | Chinese-French sentence-aligned corpus produced with ABBYY Aligner |
| CPCF_separate | 05.CPCF_metadata_txt_utf-8 | Corpus metadata |
| CPCF_separate | 06.CPCF_chinese_tokenization_jieba_txt_utf-8 | Chinese corpus tokenized with jieba |
| CPCF_separate | 07.CPCF_chinese_tokenization_spacy_txt_utf-8 | Chinese corpus tokenized with spaCy |
| CPCF_separate | 08.CPCF_french_tokenization_spacy_txt_utf-8 | French corpus tokenized with spaCy |
| CPCF_separate | 09.CPCF_chinese_pos_txt_utf-8 | POS-tagged Chinese corpus |
| CPCF_separate | 10.CPCF_french_pos_lemma_txt_utf-8 | POS-tagged and lemmatized French corpus |
| CPCF_separate | 11.CPCF_french_semantic_txt_utf-8 | Semantically annotated French corpus |
| CPCF_separate | 12.CPCF_french_sentiment_txt_utf-8 | Sentiment-annotated French corpus |
| CPCF_separate | 13.CPCF_chinese_parser_txt_utf-8 | Syntactically parsed Chinese corpus |
| CPCF_separate | 14.CPCF_french_parser_txt_utf-8 | Syntactically parsed French corpus |
| CPCF_separate | 15.CPCF_french_morphology_txt_utf-8 | Morphologically annotated French corpus |
| CPCF_separate | 16.CPCF_chinese_named_entity_txt_utf-8 | Chinese corpus with named-entity annotation |
| CPCF_separate | 17.CPCF_french_named_entity_txt_utf-8 | French corpus with named-entity annotation |
| CPCF_statistics | CPCF_all_statistics | Overall Chinese word frequencies, overall French word frequencies, Chinese sentence lengths, overall TTR/sTTR statistics, overall Chinese POS statistics, overall French POS statistics |
| CPCF_statistics | 01.CPCF_statistics_chinese_ttr_txt_utf-8 | Chinese corpus TTR statistics |
| CPCF_statistics | 02.CPCF_statistics_french_ttr_txt_utf-8 | French corpus TTR statistics |
| CPCF_statistics | 03.CPCF_statistics_chinese_sttr_txt_utf-8 | Chinese corpus sTTR statistics |
| CPCF_statistics | 04.CPCF_statistics_french_sttr_txt_utf-8 | French corpus sTTR statistics |
| CPCF_statistics | 05.CPCF_statistics_chinese_freq_raw_txt_utf-8 | Raw (unprocessed) Chinese word frequencies |
| CPCF_statistics | 06.CPCF_statistics_chinese_freq_without_stop_words_txt_utf-8 | Chinese word frequencies after stop-word removal |
| CPCF_statistics | 07.CPCF_statistics_french_freq_raw_txt_utf-8 | Raw (unprocessed) French word frequencies |
| CPCF_statistics | 08.CPCF_statistics_french_freq_lemma_txt_utf-8 | French word frequencies after lemmatization |
| CPCF_statistics | 09.CPCF_statistics_french_freq_without_stop_words_txt_utf-8 | French word frequencies after lemmatization and stop-word removal |
| CPCF_statistics | 10.CPCF_statistics_chinese_pos_freq_txt_utf-8 | Chinese corpus POS statistics |
| CPCF_statistics | 11.CPCF_statistics_french_pos_freq_txt_utf-8 | French corpus POS statistics |
| CPCF_statistics | 12.CPCF_statistics_french_semantic_freq_txt_utf-8 | French corpus semantic-tag statistics |
| CPCF_statistics | 13.CPCF_statistics_french_mean_length_txt_utf-8 | French corpus mean sentence length statistics |
| CPCF_statistics | 14.CPCF_statistics_french_sentiment_freq_txt_utf-8 | French corpus sentiment-tag statistics |
| CPCF_statistics | 15.CPCF_statistics_chinese_named_entity_freq_txt_utf-8 | Chinese corpus named-entity statistics |
| CPCF_statistics | 16.CPCF_statistics_french_named_entity_freq_txt_utf-8 | French corpus named-entity statistics |
| CPCF_statistics | 17.CPCF_statistics_chinese_punctuation_freq_txt_utf-8 | Chinese corpus punctuation statistics |
| CPCF_statistics | 18.CPCF_statistics_french_punctuation_freq_txt_utf-8 | French corpus punctuation statistics |
| CPCF_python_scripts | 01.CPCF_tokenization_fr_zh.py | chinese_tokenization: Chinese tokenization with spaCy |
| CPCF_python_scripts | 01.CPCF_tokenization_fr_zh.py | french_tokenization: French tokenization with spaCy |
| CPCF_python_scripts | 01.CPCF_tokenization_fr_zh.py | chinese_tokenization_jieba: Chinese tokenization with jieba |
| CPCF_python_scripts | 02.CPCF_pos_tagging_fr_zh.py | get_pos_fr: French POS tagging |
| CPCF_python_scripts | 02.CPCF_pos_tagging_fr_zh.py | get_pos_zh: Chinese POS tagging |
| CPCF_python_scripts | 03.CPCF_sentiment_tagging_fr.py | get_emotions_fr: French sentiment annotation |
| CPCF_python_scripts | 03.CPCF_sentiment_tagging_fr.py | get_emotions_res_fr: French sentiment annotation statistics |
| CPCF_python_scripts | 04.CPCF_morphology_tagging_fr.py | get_morph_fr: French morphological annotation |
| CPCF_python_scripts | 04.CPCF_morphology_tagging_fr.py | get_certain_morph_fr: retrieve one specific French morphological tag |
| CPCF_python_scripts | 05.CPCF_ner_tagging_fr_zh.py | get_ner_fr: French named-entity recognition and statistics |
| CPCF_python_scripts | 05.CPCF_ner_tagging_fr_zh.py | get_ner_zh: Chinese named-entity recognition and statistics |
| CPCF_python_scripts | 06.CPCF_parser_tagging_fr_zh.py | get_parser_fr: French syntactic parsing |
| CPCF_python_scripts | 06.CPCF_parser_tagging_fr_zh.py | get_parser_zh: Chinese syntactic parsing |
| CPCF_python_scripts | 07.CPCF_semantic_tagging_fr.py | get_semantic_fr: French semantic annotation and statistics |
| CPCF_python_scripts | 07.CPCF_semantic_tagging_fr.py | get_type_words: retrieve all words under a given semantic category |
| CPCF_python_scripts | 08.CPCF_statistics_mean_length_fr.py | sent_len_in_token_mean_fr: compute the mean length of French sentences |
| CPCF_python_scripts | 09.CPCF_statistics_pos_freq_fr_zh.py | pos_freq_fr: French POS statistics |
| CPCF_python_scripts | 09.CPCF_statistics_pos_freq_fr_zh.py | pos_freq_zh: Chinese POS statistics |
| CPCF_python_scripts | 10.CPCF_statistics_punctuation_freq_fr_zh.py | get_punctuation_fr: French punctuation statistics |
| CPCF_python_scripts | 10.CPCF_statistics_punctuation_freq_fr_zh.py | get_punctuation_zh: Chinese punctuation statistics |
| CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | type_token_ratio_fr: French corpus TTR statistics |
| CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | standardized_type_token_ratio_fr: French corpus sTTR statistics |
| CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | type_token_ratio_zh: Chinese corpus TTR statistics |
| CPCF_python_scripts | 11.CPCF_statistics_ttr_sttr_fr_zh.py | standardized_type_token_ratio_zh: Chinese corpus sTTR statistics |
| CPCF_python_scripts | 12.CPCF_statistics_word_freq_fr_zh.py | word_freq_fr: French word frequencies (raw, lemmatized, and lemmatized with stop words removed) |
| CPCF_python_scripts | 12.CPCF_statistics_word_freq_fr_zh.py | word_freq_zh: Chinese word frequencies (raw and with stop words removed) |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | CPCF_electronic_dictionary: class for building the electronic dictionary |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_1_all_sentences: match all corpus sentences containing a given Chinese word |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_2_translation_freq: retrieve the sentences that use one or more given French translations and count them |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_3_translation_except: retrieve the sentences that use translations other than one or more given French translations and count them |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_4_entries: build the senses of a French entry |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_5_electronic_dictionary_normal: assemble all senses of an entry (ordinary words) |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_5_electronic_dictionary_explain: assemble all senses of an entry (words whose abbreviation needs explaining) |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | get_6_example_sentences: retrieve all example sentences for a given sense |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | transfer_example_sentences: convert unconverted sentences into the example-sentence format |
| CPCF_python_scripts | 13.CPCF_electronic_dictionary.py | merge_electronic_dictionary: merge all entries and example sentences into a single file to complete the dictionary conversion |
| CPCF_electronic_dictionary | CPCF_all_entries.xlsx | The 30 entries built in the first phase (the words marked "√" in the sheet) |
| CPCF_electronic_dictionary | CPCF_all_electronic_dictionary.txt | Electronic dictionary (plain-text format) |
| CPCF_electronic_dictionary | CPCF_all_electronic_dictionary.eudic | Electronic dictionary (format of the 法语助手 dictionary app) |
| CPCF_reference | french_semantic_lexicon.txt | Reference: French semantic annotation lexicon |
| CPCF_reference | french_semantic_terms.txt | Reference: French sentiment annotation items |
| CPCF_reference | french_sentiment_dict.txt | Reference: French sentiment annotation lexicon |
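
The statistics listed above include TTR (type-token ratio) and sTTR (standardized type-token ratio). As a rough illustration of what these two measures usually mean, here is a minimal sketch. It is not taken from 11.CPCF_statistics_ttr_sttr_fr_zh.py: the function names, the whitespace-tokenized input, and the default chunk size of 1000 tokens are all assumptions made for this example.

```python
from typing import Sequence


def type_token_ratio(tokens: Sequence[str]) -> float:
    """Return the number of distinct tokens (types) divided by the token count."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def standardized_type_token_ratio(tokens: Sequence[str], chunk_size: int = 1000) -> float:
    """Average the TTR over consecutive full chunks of `chunk_size` tokens.

    A trailing partial chunk is ignored. If the text is shorter than one
    chunk, the plain TTR is returned instead (an assumption of this sketch).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks = len(tokens) // chunk_size
    if full_chunks == 0:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i * chunk_size:(i + 1) * chunk_size])
        for i in range(full_chunks)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical input: text that has already been tokenized, tokens separated by spaces.
sample_tokens = "le monde change et le monde avance".split()
print(type_token_ratio(sample_tokens))
print(standardized_type_token_ratio(sample_tokens, chunk_size=3))
```

Because plain TTR falls as a text gets longer, averaging over fixed-size chunks makes values more comparable across files of different lengths. The chunk size should be stated alongside any reported sTTR figure.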

cpcf's People

Contributors

wenjiaoyin
