sooftware / ksponspeech



Preprocessing for the KsponSpeech corpus (a Korean speech dataset) provided by AI Hub.

License: MIT License

Shell 5.83% Python 94.17%

aihub korean-speech kospeech ksponspeech
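The console logs in the issues below show the preprocessing running `create_char_labels` and then `create_script`, and `create_script` loads a character-to-id table from `aihub_labels.csv` before converting each transcript with `sentence_to_target`. The sketch below illustrates that idea under stated assumptions: ids are assigned by descending character frequency, each label row is `(id, char, freq)`, and a target is the space-separated list of ids. `build_char_labels` and `encode_sentence` are hypothetical names, not the repository's functions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_char_labels(sentences: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Return (id, char, freq) rows; the most frequent character gets id 0 (assumed ordering)."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(sentence)  # counts every character, including spaces
    # Sort by descending frequency; ties are broken by the character itself so ids are deterministic
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(idx, char, freq) for idx, (char, freq) in enumerate(ordered)]


def encode_sentence(sentence: str, char2id: Dict[str, int]) -> str:
    """Map each character to its id and join the ids with spaces; unknown characters raise KeyError."""
    return " ".join(str(char2id[char]) for char in sentence)
```

For example, `build_char_labels(["아아이", "이 아"])` gives `[(0, "아", 3), (1, "이", 2), (2, " ", 1)]`, and with `char2id = {char: idx for idx, char, _ in rows}` the sentence `"이아"` encodes to `"1 0"`. Breaking frequency ties by character keeps the ids stable across runs, which matters if the label file is regenerated.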

ksponspeech's Introduction

I'm Soohwan Kim

Career

Co-founder & A.I. team leader at TUNiB 2021.03 ~ present
A.I. Engineer at Kakao Brain 2020.08 ~ 2021.03

Service

Dearmate: an A.I. SNS platform with a variety of characters, each with a unique persona
TUNiBridge: NLP cloud API services

ksponspeech's People

Contributors

Stargazers

Watchers

Forkers

rheehot changnaman joovvhan yeonnie1010 appleholic usha451 nguyenvulong sangkwonlim-haii lian6605 poveteen jaekookang zw76859420 kang7367 k-juyeon webstorage119 hgong paulsunnypark gedebabin

ksponspeech's Issues

KeyError: 'id' while running main.py

`(base) C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master>python main.py
preprocess started..
create_char_labels started..
create_script started..
Traceback (most recent call last):
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 22, in <module>
create_script(opt.dataset_path, opt.script_prefix)
File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\preprocess.py", line 64, in create_script
char2id, id2char = load_label('aihub_labels.csv')
File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\functional.py", line 9, in load_label
id_list = ch_labels["id"]
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'`
I have been trying to fix it, but with no luck so far. I hope you can help me with it.
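One possible cause, offered as an assumption rather than a confirmed diagnosis: the header line of `aihub_labels.csv` does not literally contain a column named `id`, for example because the file was saved with a byte-order mark or in a different encoding, or was regenerated with a different header. The sketch below uses the standard-library `csv` module instead of pandas. It takes already-decoded text, strips a leading BOM and whitespace from each header name, and, if a required column is still missing, reports which columns were actually found. The column names `id` and `char` and the helper `parse_labels` are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

REQUIRED_COLUMNS = ("id", "char")  # assumed header names of aihub_labels.csv


def normalize_header(name: str) -> str:
    """Strip a UTF-8 byte-order mark and surrounding whitespace from a column name."""
    return name.lstrip("\ufeff").strip()


def parse_labels(text: str) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build (char2id, id2char) from label CSV text, failing with a descriptive KeyError."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None:
        raise ValueError("label file is empty")
    # Replace the parsed header with cleaned names before reading any data rows
    reader.fieldnames = [normalize_header(name) for name in reader.fieldnames]
    missing = [col for col in REQUIRED_COLUMNS if col not in reader.fieldnames]
    if missing:
        raise KeyError(f"missing columns {missing}; found {reader.fieldnames}")
    char2id: Dict[str, int] = {}
    id2char: Dict[int, str] = {}
    for row in reader:
        idx = int(row["id"])
        char2id[row["char"]] = idx
        id2char[idx] = row["char"]
    return char2id, id2char
```

With pandas, printing `ch_labels.columns.tolist()` right after `pd.read_csv` in `load_label` gives the same information: it shows the header names pandas actually parsed, which makes a mangled or renamed `id` column easy to spot.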

KsponSpeech corpus

Hi,
I cannot find the KsponSpeech corpus on the AI Hub website. Is it no longer available? Thank you.

UnicodeDecodeError: 'cp949' codec can't decode byte 0xb8 in position 64: illegal multibyte sequence

(base) C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master>python main.py
preprocess started..
create_char_labels started..
create_script started..
Traceback (most recent call last):
File "main.py", line 22, in <module>
create_script(opt.dataset_path, opt.script_prefix)
File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\preprocess.py", line 64, in create_script
char2id, id2char = load_label('aihub_labels.csv')
File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\functional.py", line 8, in load_label
ch_labels = pd.read_csv(filepath, encoding="cp949")
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 529, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 720, in pandas._libs.parsers.TextReader._get_header
File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 2063, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'cp949' codec can't decode byte 0xb8 in position 64: illegal multibyte sequence
Hello. Could you please help with this error?
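The bytes starting at position 64 do not form a valid cp949 sequence, which suggests (but does not prove) that `aihub_labels.csv` on this machine is not cp949-encoded, for example because it was saved as UTF-8. A minimal sketch, assuming the file is either UTF-8 (with or without a BOM) or cp949: read the raw bytes and try each candidate encoding in turn. The function names and the encoding order are assumptions, not part of the repository.

```python
from pathlib import Path
from typing import Sequence, Tuple

# Assumed candidates: UTF-8 (a leading BOM is stripped if present) first, then cp949.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp949")


def decode_with_fallback(raw: bytes, encodings: Sequence[str] = CANDIDATE_ENCODINGS) -> Tuple[str, str]:
    """Return (text, encoding) for the first encoding that decodes `raw` without errors."""
    failures = []
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError as exc:
            failures.append(f"{encoding}: {exc}")
    # None of the candidates worked; report every attempt instead of only the last one
    raise ValueError("could not decode file; tried " + "; ".join(failures))


def read_label_text(filepath: str) -> Tuple[str, str]:
    """Hypothetical helper: read a label CSV from disk and decode it with fallback."""
    return decode_with_fallback(Path(filepath).read_bytes())
```

UTF-8 is tried first because its strict byte structure makes it unlikely to decode cp949 data by accident, whereas the reverse is less clear-cut. The decoded text can then be passed to `pd.read_csv(io.StringIO(text))`. If the file does turn out to be UTF-8, changing `encoding="cp949"` to `encoding="utf-8-sig"` in `load_label` may be enough on its own.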

Please, how should I apply for the KsponSpeech dataset?

I am a newbie, and I want to apply for the KsponSpeech dataset. How should I apply? Thanks.

file type error

In my directory, 'pcm' files and 'txt' files are mixed.

So the code below raises an error (an encoding error).

`def create_script(dataset_path, script_prefix):
    print('create_script started..')
    char2id, id2char = load_label('aihub_labels.csv')

    for directory in os.listdir(dataset_path):
        path = os.path.join(dataset_path, directory)
        for file in os.listdir(path):
            sentence, target = None, None

            with open(os.path.join(path, file), "r") as f:
                sentence = f.read()

            with open(os.path.join(path, script_prefix + file[12:]), "w") as f:
                target = sentence_to_target(sentence, char2id)
                f.write(target)`

I think it would be better to add a line like the one below:

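`def create_script(dataset_path, script_prefix):
    print('create_script started..')
    char2id, id2char = load_label('aihub_labels.csv')

    for directory in os.listdir(dataset_path):
        path = os.path.join(dataset_path, directory)
        for file in os.listdir(path):
            if not file.endswith('.txt'):  # like this line: skip the .pcm audio files
                continue
            with open(os.path.join(path, file), "r") as f:
                sentence = f.read()
            with open(os.path.join(path, script_prefix + file[12:]), "w") as f:
                f.write(sentence_to_target(sentence, char2id))`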

I am using the AI Hub data.

Have a good day.
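Since each directory mixes audio and transcript files, it can also help to look at what a directory actually contains before converting anything. A minimal sketch, assuming (not confirmed by this issue) that every utterance has a `.pcm` file and a `.txt` file sharing the same stem; `split_by_extension` and `audio_without_transcript` are hypothetical helpers that work on plain file names such as those returned by `os.listdir`.

```python
import os
from typing import Dict, Iterable, List


def split_by_extension(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names by lowercase extension, e.g. '.pcm' and '.txt'."""
    groups: Dict[str, List[str]] = {}
    for name in sorted(filenames):
        ext = os.path.splitext(name)[1].lower()
        groups.setdefault(ext, []).append(name)
    return groups


def audio_without_transcript(filenames: Iterable[str]) -> List[str]:
    """Return .pcm files whose matching .txt file (same stem) is missing."""
    names = list(filenames)
    stems_with_txt = {os.path.splitext(n)[0] for n in names if n.lower().endswith(".txt")}
    return sorted(
        n for n in names
        if n.lower().endswith(".pcm") and os.path.splitext(n)[0] not in stems_with_txt
    )
```

Inside the directory loop, `split_by_extension(os.listdir(path)).get(".txt", [])` yields only the transcripts, so the `.pcm` files are never opened in text mode. Comparing extensions case-insensitively avoids silently skipping files named `.PCM` or `.TXT`.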

OSError: [Errno 5] Input/output error:

When I preprocess the KsponSpeech data, I get OSError: [Errno 5] Input/output error. What can I do to solve this problem? I can't find any solution.
Thank you.
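`[Errno 5]` is the operating system's generic I/O error (`EIO`), so it usually points at the storage the files are read from (a failing disk, or a flaky network or cloud-drive mount) rather than at the preprocessing logic itself. This is a general observation, not a diagnosis of this report. One way to narrow it down is to keep going and record which files trigger the error. A minimal sketch; `process_all` and the per-file `handle` callback are hypothetical names.

```python
import errno
from typing import Callable, Iterable, List, Tuple


def process_all(paths: Iterable[str], handle: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Call handle(path) for every path; collect (path, message) for files that fail with EIO."""
    failed: List[Tuple[str, str]] = []
    for path in paths:
        try:
            handle(path)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # other OS errors (missing file, permissions) are not swallowed
            failed.append((path, str(exc)))
    return failed
```

If the same files fail on every run, those copies may be damaged and worth re-downloading; if the failing files change from run to run, the storage or mount itself is the more likely suspect.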

Hi, where can I download the KsponSpeech dataset? The address https://aihub.or.kr/aidata/105 is not valid anymore.

Hi,
where can I download the KsponSpeech dataset? The address https://aihub.or.kr/aidata/105 is no longer valid.
Thanks
