I'm Soohwan Kim
- Co-founder & A.I. team leader at TUNiB
2021.03 ~ present
- A.I. Engineer at Kakao Brain
2020.08 ~ 2021.03
- Dearmate : A.I. SNS platform with a variety of characters with unique personas
- TUNiBridge : NLP Cloud API Services
Pre-processing KsponSpeech corpus (Korean Speech dataset) provided by AI Hub.
License: MIT License
`(base) C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master>python main.py
preprocess started..
create_char_labels started..
create_script started..
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "main.py", line 22, in <module>
    create_script(opt.dataset_path, opt.script_prefix)
  File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\preprocess.py", line 64, in create_script
    char2id, id2char = load_label('aihub_labels.csv')
  File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\functional.py", line 9, in load_label
    id_list = ch_labels["id"]
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'`
I have been trying to fix it but no luck so far. I hope you can help me with it.
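A `KeyError: 'id'` raised at `ch_labels["id"]` means the table loaded from `aihub_labels.csv` has no column named `id`. One way to narrow this down is to look at the header row of the file directly. The sketch below does that with the standard `csv` module; the helper name `check_label_header`, the `required` tuple, and the choice of `cp949` as the default encoding are assumptions made for illustration, not part of the repository.

```python
import csv


def check_label_header(filepath, required=("id",), encoding="cp949"):
    """Return (header, missing) for a labels CSV.

    header  -- column names found on the first row of the file
    missing -- names from `required` that are not in the header

    Hypothetical helper: the name and defaults are assumptions.
    """
    with open(filepath, "r", encoding=encoding, newline="") as f:
        header = next(csv.reader(f), [])  # [] when the file is empty
    missing = [name for name in required if name not in header]
    return header, missing


if __name__ == "__main__":
    header, missing = check_label_header("aihub_labels.csv")
    print("header :", header)
    print("missing:", missing)
```

If `missing` contains `id`, the header that was printed usually shows why: the file may be empty, use a different delimiter, or carry differently named columns.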
Hi,
I cannot find the KsponSpeech corpus on the AI Hub website. Is it no longer available? Thank you.
`(base) C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master>python main.py
preprocess started..
create_char_labels started..
create_script started..
Traceback (most recent call last):
  File "main.py", line 22, in <module>
    create_script(opt.dataset_path, opt.script_prefix)
  File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\preprocess.py", line 64, in create_script
    char2id, id2char = load_label('aihub_labels.csv')
  File "C:\Users\Admin\ELYOR\kospeech\preprocessing\KsponSpeech-preprocess-master\preprocess\functional.py", line 8, in load_label
    ch_labels = pd.read_csv(filepath, encoding="cp949")
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 529, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 720, in pandas._libs.parsers.TextReader._get_header
  File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2063, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'cp949' codec can't decode byte 0xb8 in position 64: illegal multibyte sequence`
Hello. Could you help with this error please?
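The `UnicodeDecodeError` above says the bytes in `aihub_labels.csv` are not valid `cp949`, so one possible cause is that the file was saved in another encoding such as UTF-8. A minimal sketch for checking this is to try a few candidate encodings and report the first one that decodes the whole file. The function name `first_working_encoding` and the candidate list are assumptions for illustration; a successful decode only shows that the bytes are valid in that encoding, not that the text is what was intended.

```python
def first_working_encoding(filepath, candidates=("utf-8", "cp949", "euc-kr")):
    """Return the first candidate encoding that decodes the whole file, or None.

    Hypothetical helper: the name and candidate list are assumptions.
    """
    with open(filepath, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
        return encoding
    return None


if __name__ == "__main__":
    print(first_working_encoding("aihub_labels.csv"))
```

UTF-8 is tried first because it is strict: Korean text saved as `cp949` almost never happens to be valid UTF-8, whereas the reverse check is less reliable. If a different encoding is reported, passing it to `pd.read_csv(filepath, encoding=...)` in place of `"cp949"` is the natural thing to try.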
I am a newbie and I want to use the KsponSpeech dataset. How should I apply for it? Thanks.
In my directory, 'pcm' files and 'txt' files are mixed.
So the code below raises an encoding error.
`def create_script(dataset_path, script_prefix):
    print('create_script started..')
    char2id, id2char = load_label('aihub_labels.csv')
    for directory in os.listdir(dataset_path):
        path = os.path.join(dataset_path, directory)
        for file in os.listdir(path):
            sentence, target = None, None
            with open(os.path.join(path, file), "r") as f:
                sentence = f.read()
            with open(os.path.join(path, script_prefix + file[12:]), "w") as f:
                target = sentence_to_target(sentence, char2id)
                f.write(target)`
I think it is better to add one line, like below:
`def create_script(dataset_path, script_prefix):
    print('create_script started..')
    char2id, id2char = load_label('aihub_labels.csv')
    for directory in os.listdir(dataset_path):
        path = os.path.join(dataset_path, directory)
        for file in os.listdir(path):
            if file.endswith('txt'):  # like this line
                ...  # rest of the loop body unchanged`
I use the AI Hub data.
Have a good day.
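The suggestion above can be sketched as a small generator that walks the dataset one directory level deep and yields only transcript files, so binary `.pcm` audio is never opened in text mode. The function name `iter_transcript_paths` and the `extension` parameter are assumptions for illustration and are not part of the repository.

```python
import os


def iter_transcript_paths(dataset_path, extension=".txt"):
    """Yield paths of transcript files one directory level below dataset_path.

    Hypothetical helper: the name and the `extension` default are assumptions.
    """
    for directory in sorted(os.listdir(dataset_path)):
        path = os.path.join(dataset_path, directory)
        if not os.path.isdir(path):
            continue  # skip stray files at the top level
        for file in sorted(os.listdir(path)):
            if file.endswith(extension):  # ignore .pcm audio and anything else
                yield os.path.join(path, file)
```

A loop such as `for txt_path in iter_transcript_paths(dataset_path): ...` could then replace the nested `os.listdir` calls in `create_script`, with the read and write steps left as they are.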
When I preprocess the KsponSpeech data, I get OSError: [Errno 5] Input/output error. What can I do to solve this problem? I can't find any solution to this.
Thank you.
Hi,
Where can I download the KsponSpeech dataset? The address https://aihub.or.kr/aidata/105 is no longer valid.
Thanks