
bytepiece's People

Contributors

bojone, eggqq007, hscspring


bytepiece's Issues

Loading fails after converting to sentencepiece

Converting to an sp model via the class method convert_to_sentencepiece and then loading it raises an error:

import sentencepiece as spm

sp_model = spm.SentencePieceProcessor()
sp_model.Load("sp.model")
libc++abi: terminating due to uncaught exception of type Darts::Details::Exception: /Users/runner/work/sentencepiece/sentencepiece/third_party/darts_clone/darts.h:1143: exception: failed to insert key: zero-length key

Related issue: google/sentencepiece#156

The model contains "\0" pieces. Should they be removed during conversion, and would removing them have any side effects?
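One possible direction, sketched here untested: filter out such pieces before conversion. This assumes the bytepiece model file is a JSON dict mapping base64-encoded byte pieces to their metadata (the same format the conversion script further down this page assumes); drop_nul_pieces is a hypothetical helper, not part of bytepiece.

```python
import base64
import json

def drop_nul_pieces(model_path, out_path):
    """Drop pieces whose bytes are empty or all NUL, which darts_clone
    rejects as zero-length keys when sentencepiece loads the model."""
    with open(model_path, encoding='utf-8') as f:
        model = json.load(f)
    cleaned = {
        key: value for key, value in model.items()
        if base64.b64decode(key).strip(b'\x00')  # keep only non-NUL pieces
    }
    with open(out_path, 'w', encoding='utf-8') as f:
        json.dump(cleaned, f, ensure_ascii=False)
```

Whether removing the "\0" piece changes segmentation behavior would still need to be checked against the trainer.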

pip install fails

(screenshot omitted)

As shown in the screenshot, the README was not included when the package was built, so installation fails.

Training error with multiple workers

Hey, I hit an error during training:

trainer = Trainer(order=6, max_vocab_size=100000, min_count=32)
trainer.train(w, workers=2, batch_size=1000)

but I got: AttributeError: Can't pickle local object 'Trainer.pcount.<locals>.worker_func'

This is my env version:
python 3.8.16
multiprocess 0.70.14
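For context, this is standard behavior of Python's pickle: a function defined inside another function cannot be serialized, and multiprocessing must pickle the worker to ship it to child processes. A minimal reproduction, independent of bytepiece (make_worker and top_level_worker are illustrative names). Note also that the multiprocess package in your environment uses dill, which can serialize nested functions, so the error may indicate the stdlib multiprocessing was used instead; I have not verified which one bytepiece imports.

```python
import pickle

def make_worker():
    def worker_func(x):  # nested ("local") function, like Trainer.pcount.<locals>.worker_func
        return x + 1
    return worker_func

def top_level_worker(x):  # module-level functions pickle fine, by reference
    return x + 1

# The stdlib pickle cannot serialize nested functions:
try:
    pickle.dumps(make_worker())
    print("nested function pickled")
except (AttributeError, pickle.PicklingError) as e:
    print("failed:", e)

# A module-level function round-trips without trouble:
assert pickle.loads(pickle.dumps(top_level_worker))(1) == 2
```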

Error when loading a model

Running the following tokenization test fails when loading the model.
Environment: model file bytepiece.id.eu.plus.80k.model

from bytepiece import Tokenizer

tokenizer = Tokenizer('bytepiece.model')

This line raises an error (error screenshots omitted). Is it an encoding problem during loading? Looking at the code, the key K here is binary-encoded.

convert_to_sentencepiece fails

bytepiece-0.6.3

line 356, in convert_to_sentencepiece
p = re.sub(' ', '▁', p.decode())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
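Since bytepiece pieces are raw byte strings, not every piece decodes as UTF-8 on its own (0x80 is a lone continuation byte). A defensive sketch of that line: to_sp_piece is a hypothetical wrapper, and skipping undecodable pieces (returning None) is only one option; mapping them to sentencepiece's <0xNN> byte tokens would be another.

```python
import re

def to_sp_piece(p: bytes):
    """Return the sentencepiece text form of a byte piece, or None when
    the bytes are not valid UTF-8 by themselves (e.g. a lone 0x80)."""
    try:
        return re.sub(' ', '\u2581', p.decode('utf-8'))  # '\u2581' is the visible '▁'
    except UnicodeDecodeError:
        return None

print(to_sp_piece(b'a b'))   # spaces become the '▁' marker
print(to_sp_piece(b'\x80'))  # None
```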

Suggestion: add a demo or tutorial on vocabulary extension

The bytepiece algorithm and implementation are already excellent, but compared with sentencepiece the documentation is sparse and there are few demos, so at the moment it feels more like a research toy than a production tool. I suggest the author flesh out the documentation and add a demo or tutorial on extending the vocabulary. Thanks!

Redundant vocab?

There seems to be redundancy in the model you released.

Please correct me if I am wrong. 😁

For example, when I run:
grep -A 2 -B 2 -n '","' bytepiece_80k.model > test.txt

I got:

67637-    "77yM": [
67638-        13530,
67639:        ",",
67640-        52141276
67641-    ],
--
114017-    "gO+8jA==": [
114018-        22806,
114019:        ",",
114020-        27651
114021-    ],
--
114062-    "ge+8jA==": [
114063-        22815,
114064:        ",",
114065-        30470
114066-    ],
--
114077-    "gu+8jA==": [
114078-        22818,
114079:        ",",
114080-        17876
114081-    ],
--
114092-    "g++8jA==": [
114093-        22821,
114094:        ",",
114095-        15013
114096-    ],
--
114112-    "hO+8jA==": [
114113-        22825,
114114:        ",",
114115-        28134
114116-    ],
--
114127-    "he+8jA==": [
114128-        22828,
114129:        ",",
114130-        24900
114131-    ],
--
114142-    "hu+8jA==": [
114143-        22831,
114144:        ",",
114145-        33913
114146-    ],
--
114157-    "h++8jA==": [
114158-        22834,
114159:        ",",
114160-        19583
114161-    ],
--
114177-    "iO+8jA==": [
114178-        22838,
114179:        ",",
114180-        29143
114181-    ],
--
114232-    "ie+8jA==": [
114233-        22849,
114234:        ",",
114235-        56579
114236-    ],
--
114252-    "iu+8jA==": [
114253-        22853,
114254:        ",",
114255-        27238
114256-    ],
--
114307-    "i++8jA==": [
114308-        22864,
114309:        ",",
114310-        55373
114311-    ],
--
114397-    "jO+8jA==": [
114398-        22882,
114399:        ",",
114400-        27059
114401-    ],
--
114427-    "je+8jA==": [
114428-        22888,
114429:        ",",
114430-        26193
114431-    ],
--
114437-    "ju+8jA==": [
114438-        22890,
114439:        ",",
114440-        24179
114441-    ],
--
114462-    "j++8jA==": [
114463-        22895,
114464:        ",",
114465-        38457
114466-    ],
--
114477-    "kO+8jA==": [
114478-        22898,
114479:        ",",
114480-        23832
114481-    ],
--
114517-    "ke+8jA==": [
114518-        22906,
114519:        ",",
114520-        45492
114521-    ],
114522-    "ku+8jA==": [
114523-        22907,
114524:        ",",
114525-        14602
114526-    ],
--
114537-    "k++8jA==": [
114538-        22910,
114539:        ",",
114540-        16818
114541-    ],
--
114557-    "lO+8jA==": [
114558-        22914,
114559:        ",",
114560-        25217
114561-    ],
--
114572-    "le+8jA==": [
114573-        22917,
114574:        ",",
114575-        22115
114576-    ],
--
114592-    "lu+8jA==": [
114593-        22921,
114594:        ",",
114595-        35881
114596-    ],
--
114612-    "l++8jA==": [
114613-        22925,
114614:        ",",
114615-        20337
114616-    ],
--
114632-    "mO+8jA==": [
114633-        22929,
114634:        ",",
114635-        20429
114636-    ],
--
114657-    "me+8jA==": [
114658-        22934,
114659:        ",",
114660-        23524
114661-    ],
--
114682-    "mu+8jA==": [
114683-        22939,
114684:        ",",
114685-        25417
114686-    ],
--
114707-    "m++8jA==": [
114708-        22944,
114709:        ",",
114710-        39254
114711-    ],
--
114722-    "nO+8jA==": [
114723-        22947,
114724:        ",",
114725-        21866
114726-    ],
--
114737-    "ne+8jA==": [
114738-        22950,
114739:        ",",
114740-        30855
114741-    ],
114742-    "nu+8jA==": [
114743-        22951,
114744:        ",",
114745-        13346
114746-    ],
--
114777-    "n++8jA==": [
114778-        22958,
114779:        ",",
114780-        19853
114781-    ],
--
114797-    "oO+8jA==": [
114798-        22962,
114799:        ",",
114800-        15771
114801-    ],
--
114812-    "oe+8jA==": [
114813-        22965,
114814:        ",",
114815-        15774
114816-    ],
--
114827-    "ou+8jA==": [
114828-        22968,
114829:        ",",
114830-        21457
114831-    ],
--
114837-    "o++8jA==": [
114838-        22970,
114839:        ",",
114840-        19483
114841-    ],
--
114847-    "pO+8jA==": [
114848-        22972,
114849:        ",",
114850-        21556
114851-    ],
--
114877-    "pe+8jA==": [
114878-        22978,
114879:        ",",
114880-        31149
114881-    ],
--
114892-    "pu+8jA==": [
114893-        22981,
114894:        ",",
114895-        24183
114896-    ],
--
114907-    "p++8jA==": [
114908-        22984,
114909:        ",",
114910-        27728
114911-    ],
--
114922-    "qO+8jA==": [
114923-        22987,
114924:        ",",
114925-        28618
114926-    ],
--
114947-    "qe+8jA==": [
114948-        22992,
114949:        ",",
114950-        19995
114951-    ],
--
114957-    "qu+8jA==": [
114958-        22994,
114959:        ",",
114960-        13975
114961-    ],
--
114982-    "q++8jA==": [
114983-        22999,
114984:        ",",
114985-        20687
114986-    ],
--
114997-    "rO+8jA==": [
114998-        23002,
114999:        ",",
115000-        32861
115001-    ],
--
115012-    "re+8jA==": [
115013-        23005,
115014:        ",",
115015-        28320
115016-    ],
--
115032-    "ru+8jA==": [
115033-        23009,
115034:        ",",
115035-        20423
115036-    ],
--
115047-    "r++8jA==": [
115048-        23012,
115049:        ",",
115050-        35625
115051-    ],
--
115072-    "sO+8jA==": [
115073-        23017,
115074:        ",",
115075-        25750
115076-    ],
--
115082-    "se+8jA==": [
115083-        23019,
115084:        ",",
115085-        26685
115086-    ],
--
115092-    "su+8jA==": [
115093-        23021,
115094:        ",",
115095-        11848
115096-    ],
--
115107-    "s++8jA==": [
115108-        23024,
115109:        ",",
115110-        17627
115111-    ],
--
115127-    "tO+8jA==": [
115128-        23028,
115129:        ",",
115130-        28616
115131-    ],
115132-    "te+8jA==": [
115133-        23029,
115134:        ",",
115135-        12874
115136-    ],
--
115147-    "tu+8jA==": [
115148-        23032,
115149:        ",",
115150-        23628
115151-    ],
--
115182-    "t++8jA==": [
115183-        23039,
115184:        ",",
115185-        25204
115186-    ],
--
115212-    "ue+8jA==": [
115213-        23045,
115214:        ",",
115215-        37675
115216-    ],
--
115252-    "uu+8jA==": [
115253-        23053,
115254:        ",",
115255-        39151
115256-    ],
--
115282-    "u++8jA==": [
115283-        23059,
115284:        ",",
115285-        26261
115286-    ],
115287-    "vO+8jA==": [
115288-        23060,
115289:        ",",
115290-        20389
115291-    ],
--
115312-    "ve+8jA==": [
115313-        23065,
115314:        ",",
115315-        32564
115316-    ],
--
115332-    "vu+8jA==": [
115333-        23069,
115334:        ",",
115335-        15513
115336-    ],
--
115352-    "v++8jA==": [
115353-        23073,
115354:        ",",
115355-        27048
115356-    ],
--
118962-    "77yM5A==": [
118963-        23795,
118964:        ",",
118965-        9190
118966-    ],
118967-    "77yM5Q==": [
118968-        23796,
118969:        ",",
118970-        205451
118971-    ],
118972-    "77yM5g==": [
118973-        23797,
118974:        ",",
118975-        172391
118976-    ],
118977-    "77yM5w==": [
118978-        23798,
118979:        ",",
118980-        67781
118981-    ],
118982-    "77yM6A==": [
118983-        23799,
118984:        ",",
118985-        128861
118986-    ],
118987-    "77yM6Q==": [
118988-        23800,
118989:        ",",
118990-        55129
118991-    ],
--
158192-    "uK3vvIw=": [
158193-        31641,
158194:        ",",
158195-        9172
158196-    ],
--
160082-    "77yM5Lg=": [
160083-        32019,
160084:        ",",
160085-        113663
160086-    ],
160087-    "77yM5Lk=": [
160088-        32020,
160089:        ",",
160090-        47189
160091-    ],
160092-    "77yM5Lo=": [
160093-        32021,
160094:        ",",
160095-        27242
160096-    ],
160097-    "77yM5Ls=": [
160098-        32022,
160099:        ",",
160100-        61788
160101-    ],
160102-    "77yM5Lw=": [
160103-        32023,
160104:        ",",
160105-        14111
160106-    ],
160107-    "77yM5L0=": [
160108-        32024,
160109:        ",",
160110-        36471
160111-    ],
160112-    "77yM5YU=": [
160113-        32025,
160114:        ",",
160115-        45865
160116-    ],
160117-    "77yM5YY=": [
160118-        32026,
160119:        ",",
160120-        20102
160121-    ],
160122-    "77yM5Yc=": [
160123-        32027,
160124:        ",",
160125-        18068
160126-    ],
160127-    "77yM5Yg=": [
160128-        32028,
160129:        ",",
160130-        22664
160131-    ],
160132-    "77yM5Y0=": [
160133-        32029,
160134:        ",",
160135-        45655
160136-    ],
160137-    "77yM5Y8=": [
160138-        32030,
160139:        ",",
160140-        78634
160141-    ],
160142-    "77yM5ZA=": [
160143-        32031,
160144:        ",",
160145-        40057
160146-    ],
160147-    "77yM5ZI=": [
160148-        32032,
160149:        ",",
160150-        10479
160151-    ],
160152-    "77yM5Zs=": [
160153-        32033,
160154:        ",",
160155-        10167
160156-    ],
160157-    "77yM5aQ=": [
160158-        32034,
160159:        ",",
160160-        35717
160161-    ],
160162-    "77yM5aU=": [
160163-        32035,
160164:        ",",
160165-        24338
160166-    ],
160167-    "77yM5aY=": [
160168-        32036,
160169:        ",",
160170-        13069
160171-    ],
160172-    "77yM5a4=": [
160173-        32037,
160174:        ",",
160175-        31083
160176-    ],
160177-    "77yM5a8=": [
160178-        32038,
160179:        ",",
160180-        22719
160181-    ],
160182-    "77yM5bA=": [
160183-        32039,
160184:        ",",
160185-        60831
160186-    ],
160187-    "77yM5bw=": [
160188-        32040,
160189:        ",",
160190-        12414
160191-    ],
160192-    "77yM5b4=": [
160193-        32041,
160194:        ",",
160195-        15229
160196-    ],
160197-    "77yM5b8=": [
160198-        32042,
160199:        ",",
160200-        11786
160201-    ],
160202-    "77yM5oM=": [
160203-        32043,
160204:        ",",
160205-        9275
160206-    ],
160207-    "77yM5og=": [
160208-        32044,
160209:        ",",
160210-        25038
160211-    ],
160212-    "77yM5ok=": [
160213-        32045,
160214:        ",",
160215-        30394
160216-    ],
160217-    "77yM5oo=": [
160218-        32046,
160219:        ",",
160220-        19145
160221-    ],
160222-    "77yM5os=": [
160223-        32047,
160224:        ",",
160225-        18043
160226-    ],
160227-    "77yM5o4=": [
160228-        32048,
160229:        ",",
160230-        10899
160231-    ],
160232-    "77yM5pU=": [
160233-        32049,
160234:        ",",
160235-        12918
160236-    ],
160237-    "77yM5pc=": [
160238-        32050,
160239:        ",",
160240-        9034
160241-    ],
160242-    "77yM5pg=": [
160243-        32051,
160244:        ",",
160245-        16571
160246-    ],
160247-    "77yM5pw=": [
160248-        32052,
160249:        ",",
160250-        21640
160251-    ],
160252-    "77yM5p0=": [
160253-        32053,
160254:        ",",
160255-        13906
160256-    ],
160257-    "77yM5q0=": [
160258-        32054,
160259:        ",",
160260-        7468
160261-    ],
160262-    "77yM5q8=": [
160263-        32055,
160264:        ",",
160265-        12216
160266-    ],
160267-    "77yM5rI=": [
160268-        32056,
160269:        ",",
160270-        11066
160271-    ],
160272-    "77yM55Q=": [
160273-        32057,
160274:        ",",
160275-        11502
160276-    ],
160277-    "77yM55s=": [
160278-        32058,
160279:        ",",
160280-        11973
160281-    ],
160282-    "77yM57s=": [
160283-        32059,
160284:        ",",
160285-        16988
160286-    ],
160287-    "77yM6IA=": [
160288-        32060,
160289:        ",",
160290-        15794
160291-    ],
160292-    "77yM6Jk=": [
160293-        32061,
160294:        ",",
160295-        8488
160296-    ],
160297-    "77yM6K4=": [
160298-        32062,
160299:        ",",
160300-        18199
160301-    ],
160302-    "77yM6K8=": [
160303-        32063,
160304:        ",",
160305-        33792
160306-    ],
160307-    "77yM6LA=": [
160308-        32064,
160309:        ",",
160310-        13559
160311-    ],
160312-    "77yM6LU=": [
160313-        32065,
160314:        ",",
160315-        14349
160316-    ],
160317-    "77yM6L8=": [
160318-        32066,
160319:        ",",
160320-        51748
160321-    ],
160322-    "77yM6YA=": [
160323-        32067,
160324:        ",",
160325-        23202
160326-    ],
160327-    "77yM6YE=": [
160328-        32068,
160329:        ",",
160330-        10723
160331-    ],
160332-    "77yM6YI=": [
160333-        32069,
160334:        ",",
160335-        25998
160336-    ],
160337-    "77yM6Zk=": [
160338-        32070,
160339:        ",",
160340-        8209
160341-    ],

Training hangs on large datasets

Training on 20GB of data works fine, but on 100GB it hangs at some step. bytepiece==0.6.3

(screenshot omitted)

Below is the stack trace of one of the threads. I can't tell much from it; asking GPT suggests it is a multiprocessing issue:

#0  0x00007f168f6207a4 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f168f620898 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f1683848699 in semlock_acquire ()
   from /opt/rh/rh-python38/root/usr/lib64/python3.8/lib-dynload/_multiprocessing.cpython-38-x86_64-linux-gnu.so
#3  0x00007f168f7ed4e6 in PyCFunction_Call () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#4  0x00007f168f7ac932 in _PyObject_MakeTpCall () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#5  0x00007f168f862c5c in _PyEval_EvalFrameDefault () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#6  0x00007f168f84fe05 in _PyFunction_Vectorcall () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#7  0x00007f168f7ab7bd in PyObject_Call () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#8  0x00007f168f860081 in _PyEval_EvalFrameDefault () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#9  0x00007f168f84fe05 in _PyFunction_Vectorcall () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#10 0x00007f168f85e323 in _PyEval_EvalFrameDefault () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#11 0x00007f168f84fe05 in _PyFunction_Vectorcall () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#12 0x00007f168f85e323 in _PyEval_EvalFrameDefault () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#13 0x00007f168f84fe05 in _PyFunction_Vectorcall () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#14 0x00007f168f8507cb in method_vectorcall () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#15 0x00007f168f7ab7bd in PyObject_Call () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#16 0x00007f168f8ad6d1 in t_bootstrap () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#17 0x00007f168f86bbc4 in pythread_wrapper () from /opt/rh/rh-python38/root/usr/lib64/libpython3.8.so.rh-python38-1.0
#18 0x00007f168f6174e2 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f168f3f25b3 in clone () from /lib64/libc.so.6

Loading bytepiece model files with huggingface tokenizers

After converting the model file with the code below, bytepiece models can be loaded directly through the huggingface tokenizers API. The converted model still supports byte_fallback and related features.

The code below is offered as a starting point and has not been fully tested; hopefully the author can give it a proper test.

# convert.py  (note: requires Python 3.10+ for dataclass(slots=True) and `str | Path` unions)
from itertools import product
from pathlib import Path
from dataclasses import dataclass
import math
import json


@dataclass(slots=True)
class WordItem:
    word: str
    log_probability: float

    def aslist(self):
        return [self.word, self.log_probability]


def convert_bytepiece_model_to_hugging_face(bytepiece_model_file: str | Path, output_file: str | Path):
    bytepiece_model_file = Path(bytepiece_model_file)
    if not bytepiece_model_file.exists():
        raise FileNotFoundError(f'{bytepiece_model_file} not found.')

    output_file = Path(output_file)
    if output_file.exists():
        raise FileExistsError(f'{output_file} already exists.')

    initial_word_items = [WordItem(f'<0x{a}{b}>', 0.0) for a, b in product('0123456789ABCDEF', '0123456789ABCDEF')]
    bytepiece_model_data = json.load(bytepiece_model_file.open())
    log_sum = math.log(sum(item[2] for item in bytepiece_model_data.values()))
    
    vocab_word_items: list[WordItem] = []
    for item in bytepiece_model_data.values():
        _, word, frequence = item
        if not word.strip():
            continue

        word_item = WordItem(word, math.log(frequence) - log_sum)
        vocab_word_items.append(word_item)
    word_items = initial_word_items + vocab_word_items
    
    hugging_face_dict = {
        'version': "1.0",
        "pre_tokenizer": {
            "type": "CharDelimiterSplit",
            "delimiter": "\x00",
        },
        "model": {
            "type": "Unigram",
            "unk_id": 0,
            "vocab": [item.aslist() for item in word_items],
            "byte_fallback": True,
        }
    }

    json.dump(hugging_face_dict, output_file.open('w'), indent=2, ensure_ascii=False)


if __name__ == '__main__':
    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument('bytepiece_model_file', type=str)
    parser.add_argument('output_file', type=str)
    args = parser.parse_args()
    convert_bytepiece_model_to_hugging_face(args.bytepiece_model_file, args.output_file)
Run the conversion:

python convert.py bytepiece.model bytepiece.json

Then load the result with tokenizers:
from tokenizers import Tokenizer

bytepiece_tokenizer = Tokenizer.from_file('bytepiece_plus_240k.json')
sentences = [
'中外科学名著',
'提高产品质量',
'鞭炮声响彻夜空',
'这事的确定不下来',
'邓颖超生前使用过的物品',
'쯈',
]
for sentence in sentences:
    print(bytepiece_tokenizer.encode(sentence).tokens)

for sentence in sentences:
    print(bytepiece_tokenizer.encode(sentence).offsets)

# OUTPUT:
# ['中外', '科学', '名著']
# ['提高', '产品', '质量']
# ['鞭炮', '声', '响彻', '夜空']
# ['这事', '的确', '定', '不', '下来']
# ['邓', '颖', '超', '生前', '使用', '过的', '物品']
# ['<0xEC>', '<0xAF>', '<0x88>']
# [(0, 2), (2, 4), (4, 6)]
# [(0, 2), (2, 4), (4, 6)]
# [(0, 2), (2, 3), (3, 5), (5, 7)]
# [(0, 2), (2, 4), (4, 5), (5, 6), (6, 8)]
# [(0, 1), (1, 2), (2, 3), (3, 5), (5, 7), (7, 9), (9, 11)]
# [(0, 1), (0, 1), (0, 1)]

Installation problems and how I solved them

I spent a whole day on installation and hit a few problems; sharing them here for anyone who runs into the same ones.

1. Python version:

  • I tested several Python versions; 3.11 was the one that worked perfectly.
  • In 3.12 the imp module has been removed. There is an official replacement, but after fiddling with it for a while I gave up. :(

2. Installing pyahocorasick:

  • AHOCORASICK_BYTES=1 pip install git+https://github.com/WojciechMula/pyahocorasick.git did not work for me inside a virtual environment.
  • My workaround: first git clone https://github.com/WojciechMula/pyahocorasick.git
  • Then enter the directory, open setup.py, and set the build_as_bytes variable directly to True, which guarantees the AHOCORASICK_BYTES build.
  • Then run python setup.py install. You will also need a C++ toolchain; just install it as the error messages suggest. That part is straightforward, only a large download.

3. A GBK encoding error (this probably only occurs on Windows):

  • pieces = json.load(open(pieces)) needs the encoding specified explicitly: pieces = json.load(open(pieces, encoding="utf-8"))
