yongzhuo,Macropodus,github

bert

bert分类, classify, classifier. TensorFlow code and pre-trained models for BERT

汉字字形/拼音/语义相似度(单字, 可用于数据增强, CSC错别字检测识别任务(构建混淆集)) Chinese character font/pinyin/semantic similarity (single character, can be used for data augmentation, CSC misclassified character detection and recognition tasks (building confusion sets))

char_cnn_text_classification_chinese2pinyin

char_CNN_text_classification_Chinese2Pinyin，中文转拼音实例-基于字符的卷积神经网络-超短文本分类-主要代码为lc222的github项目，有HTTP访问等

chatglm-maths

chatglm-6b微调/LORA/PPO/推理, 样本为自动生成的整数/小数加减乘除运算, 可gpu/cpu

chatglm2-sft

ChatGLM2-6B微调, SFT/LoRA, instruction finetune

chatglm3-sft

chatglm3-6b, 微调/LORA/推理/单机多卡/deepspeed/支持多轮对话

complexeventextraction

A concept and obvious expression pattern collection of Chinese compound event extraction which then be evolved into ComplexEventGraph，本项目提出了中文复合事件的概念与显式模式，包括条件事件、因果事件、顺承事件、反转事件等事件抽取，并形成事理图谱。

gemma-sft

Gemma-SFT, gemma-2b/gemma-7b微调(finetune,transformers)/LORA(peft)/推理(inference)

internlm-sft

InternLM-7B微调, SFT/LoRA, instruction finetune

kashgari

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

keras-textclassification

中文长文本分类、短句子分类、多标签分类、两句子相似度（Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short），字词句向量嵌入层（embeddings）和网络层（graph）构建基类，FastText，TextCNN，CharCNN，TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, 胶囊网络-CapsuleNet, Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN

layoutlmv3-layoutxlm-chinese

chinese document classification of layoutlmv3 and layoutxlm

leetcode-in-out

leetcode一些热门题型的python代码，包括输入输出。leetcode of hot, which Includes input and output.

llama2-sft

Llama2-SFT, Llama-2-7B微调(transformers)/LORA(peft)/推理

llm-sft

中文大模型微调(LLM-SFT), 数学指令数据集MWP-Instruct, 支持模型(ChatGLM-6B, LLaMA, Bloom-7B, baichuan-7B), 支持(LoRA, QLoRA, DeepSpeed, UI, TensorboardX), 支持(微调, 推理, 测评, 接口)等.

macadam

Macadam是一个以Tensorflow(Keras)和bert4keras为基础，专注于文本分类、序列标注和关系抽取的自然语言处理工具包。支持RANDOM、WORD2VEC、FASTTEXT、BERT、ALBERT、ROBERTA、NEZHA、XLNET、ELECTRA、GPT-2等EMBEDDING嵌入; 支持FineTune、FastText、TextCNN、CharCNN、BiRNN、RCNN、DCNN、CRNN、DeepMoji、SelfAttention、HAN、Capsule等文本分类算法; 支持CRF、Bi-LSTM-CRF、CNN-LSTM、DGCNN、Bi-LSTM-LAN、Lattice-LSTM-Batch、MRC等序列标注算法。

macrogpt-pretrain

macrogpt大模型全量预训练(1b3,32层), 多卡deepspeed/单卡adafactor

macropodus

自然语言处理工具Macropodus，基于Albert+BiLSTM+CRF深度学习网络架构，中文分词，词性标注，命名实体识别，新词发现，关键词，文本摘要，文本相似度，科学计算器，中文数字阿拉伯数字(罗马数字)转换，中文繁简转换，拼音转换。tookit(tool) of NLP，CWS(chinese word segnment)，POS(Part-Of-Speech Tagging)，NER(name entity recognition)，Find(new words discovery)，Keyword(keyword extraction)，Summarize(text summarization)，Sim(text similarity)，Calculate(scientific calculator)，Chi2num(chinese number to arabic number)

near-synonym

near-synonym, 中文反义词/近义词(antonym/synonym)工具包, 包括25w+反义词/近义词词典.

nlg-yongzhuo

中文文本生成（NLG）之文本摘要（text summarization）工具包, 语料数据(corpus data), 抽取式摘要 Extractive text summary of Lead3、keyword、textrank、text teaser、word significance、LDA、LSI、NMF。（graph，feature，topic model，summarize tool or tookit）

nlp_xiaojiang

自然语言处理（nlp），小姜机器人（闲聊检索式chatbot），BERT句向量-相似度（Sentence Similarity），XLNET句向量-相似度（text xlnet embedding），文本分类（Text classification），实体提取（ner，bert+bilstm+crf），数据增强（text augment, data enhance），同义句同义词生成，句子主干提取（mainpart），中文汉语短文本相似度，文本特征工程，keras-http-service调用

nni

An open source AutoML toolkit for automate machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

open-information-extraction-system

中文开放信息抽取系统, open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

pytorch-loss

pytorch版损失函数，改写自科学空间文章，【通过互信息**来缓解类别不平衡问题】、【将“softmax+交叉熵”推广到多标签分类问题】

pytorch-model-to-tensorflow

transformers-model of pytorch1.x to tensorflow2.x, deploy for tf-serving

pytorch-nlu

Pytorch-NLU，一个中文文本分类、序列标注工具包，支持中文长文本、短文本的多类、多标签分类任务，支持中文命名实体识别、词性标注、分词、抽取式文本摘要等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of spee

yongzhuo Goto Github PK

Macropodus's Projects

Recommend Projects

Recommend Topics

Recommend Org