mozillaitalia / deepspeech-italian-model
Tooling for producing an Italian model (public release available) for DeepSpeech and a text corpus.
License: GNU General Public License v3.0
I think we have a lot of issues here.
There are issues related to the syntax users adopt in chat rooms, messages with fancy characters from chatbots, and symbols used for text graphics. I'll leave some recurring examples here.
Chat room info messages:
DCC CHAT ip <ip.address.format.here>
CoTiDie is [email protected] * CoTiDieMoRi
U:\>tracert 195.94.177.137
Tracing route to TARAS [195.94.177.137]
over a maximum of 30 hops:
e questa è la sua stazione
NetBIOS Remote Machine Name Table
Name Type Status
TARAS UNIQUE Registered\
EULOGOS GROUP Registered
TARAS UNIQUE Registered
TARAS UNIQUE Registered
EULOGOS GROUP Registered
FRANCESCO UNIQUE Registered
MAC Address = 52-54-AB-DD-22-98
wolvie is AWAY since Mon Oct 13 11:00:29 1997 Reason: 4OKKUPATO: lavoro
El_Diablo ----==>>>>------>12,11 EL-GRECO ----==>>>>------>
] [Time/0h 0m] [Log/On] [Page/On]
free-join 1,15 -The Most Advanced Script Ever Seen-
free-join 15,15 14,14 15,15
free-join 15,15 14,14 14,1-16=14º15 14°14S15ho16wD15ow14N 14P15r16O14°15,1 14º16=14-14,14 15,15
free-join 15,15 14,14 15,15
E_D-away Set Away: Tuesday 10/14/97 Pager: On MsgLog: Off Beeper: Off Reason: not chatting
Type /ctcp E_D-away PAGE REASON to get my attention
CI-WUGY (AWAY:scripting...) gØne since: 4:34pm
<U+0081> -=º °ShowDowN v6.5 PrO° º=- <U+0081>
Junes [^?Auto-Set Back^?] at (2:47:51pm) Away for: 46mins 28secs -LOG OFF-
/ctcp nextphase This_Is_Not_A_Fucking_CTCP___This_Is_A_CoCoNuts_Island_CTCP_ :D
Deth got [<U+008D>(8Lemon)<U+008D>] [<U+008D>(8Lemon)<U+008D>] [<U+008D>(--2-7---)<U+008D>]
Symbols and text noise produced by nicknames or by chatting:
ciao a tutti ;9
.... ....
\\olverin pensa che sia il caso di sganciare
c e qualcuno????????????????????????????,,
kimy [Pitch], usate le query please
/msg drago ciao
adios agua.......................- - ->
Pannella usa la tromba§
Mannaccia **§§°ç°ç§é*ç°é*ç°é°çé°ç°
etupensicheseiofossiilpresidentedellajamaicastareiacazzeggiarequicontuttelefighechecisonola'???????????????' :> [08]
/ ___| |_ _| / \ / _ \ | |
| | | | / _ \ | | | | | |
\____| |___| /_/ \_\ \___/ (_)
italia .'..'.
Que|o io avrei il crack del kali
/Msg LiveFast, sono il suo agente
che ca^?^?o hai capito
Different languages:
nothing je t ai dit que je suis la
If Anyone Speaks Too Long Texts He Will Be Kicked
hello th^?^?e girls
Just align with https://github.com/Common-Voice/commonvoice-fr/pull/97/files
Hi there,
(do I have to write in English, or can I write in Italian?)
I have tested the Docker instance and I've found these errors:
wget https://lingualibre.fr/datasets/Q385-ita-Italian.zip -O /mnt/source/lingua_libre_Q385-ita-Italian_train.zip
change "/mnt/source/" into "/mnt/sources/"
sed -i s/#//g '/mnt/extracted/data/*test.csv'
sed: can't read /mnt/extracted/data/*test.csv: No such file or directory
My workaround was to specify the directories like this:
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*test.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*train.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*dev.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*test.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*train.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*dev.csv
+ rm /mnt/lm/lm.arpa
+ '[' '!' -f /mnt/lm/trie ']'
+ curl -sSL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.ba56407376f1e1109be33ac87bcb6eb9709b18be.cpu/artifacts/public/native_client.tar.xz
+ pixz -d
+ tar -xf -
can not seek in input: Illegal seek
Not an XZ file
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
Browsing that URL gives: ResourceNotFound
P.S.
Every time, pixz returns:
can not seek in input: Illegal seek
I hope this is only a warning.
P.P.S.
I've tried to format this thread as best as possible, but it seems I can't... sorry if it is too chaotic.
Hope this helps in some way.
Regards
Massimo
I Initializing variables...
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:42 | Steps: 45 | Loss: 184.312358
Epoch 0 | Validation | Elapsed Time: 0:00:08 | Steps: 31 | Loss: 165.173247 | Dataset: /mnt/extracted/data/cv-it/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_dev.csv
I Saved new best validating model with loss 165.173247 to: /mnt/checkpoints/best_dev-45
Epoch 1 | Training | Elapsed Time: 0:00:39 | Steps: 45 | Loss: 171.796456
Epoch 1 | Validation | Elapsed Time: 0:00:07 | Steps: 31 | Loss: 162.115896 | Dataset: /mnt/extracted/data/cv-it/clips/dev.csv
Epoch 1 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_dev.csv
WARNING:tensorflow:From /home/trainer/ds-train/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W0925 18:24:51.043290 140513397847872 deprecation.py:323] From /home/trainer/ds-train/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
I Saved new best validating model with loss 162.115896 to: /mnt/checkpoints/best_dev-90
Epoch 2 | Training | Elapsed Time: 0:00:39 | Steps: 45 | Loss: 150.745584
Epoch 2 | Validation | Elapsed Time: 0:00:07 | Steps: 31 | Loss: 136.187367 | Dataset: /mnt/extracted/data/cv-it/clips/dev.csv
Epoch 2 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_dev.csv
I Saved new best validating model with loss 136.187367 to: /mnt/checkpoints/best_dev-135
Epoch 3 | Training | Elapsed Time: 0:00:39 | Steps: 45 | Loss: 127.614623
Epoch 3 | Validation | Elapsed Time: 0:00:08 | Steps: 31 | Loss: 123.730088 | Dataset: /mnt/extracted/data/cv-it/clips/dev.csv
Epoch 3 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_dev.csv
I Saved new best validating model with loss 123.730088 to: /mnt/checkpoints/best_dev-180
Epoch 4 | Training | Elapsed Time: 0:00:39 | Steps: 45 | Loss: 114.725798
Epoch 4 | Validation | Elapsed Time: 0:00:08 | Steps: 31 | Loss: 115.417479 | Dataset: /mnt/extracted/data/cv-it/clips/dev.csv
Epoch 4 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_dev.csv
I Saved new best validating model with loss 115.417479 to: /mnt/checkpoints/best_dev-225
Epoch 5 | Training | Elapsed Time: 0:00:39 | Steps: 45 | Loss: 104.686398
Epoch 5 | Validation | Elapsed Time: 0:00:08 | Steps: 31 | Loss: 136.464502 | Dataset: /mnt/extracted/data/cv-it/clips/dev.csv
Epoch 5 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /mnt/extracted/data/lingualibre/lingua_libre_Q385-ita-Italian_dev.csv
I Early stop triggered as (for last 4 steps) validation loss: 136.464502 with standard deviation: 8.535361 and mean: 125.111644
I FINISHED optimization in 0:04:52.388711
E While processing /mnt/extracted/data/cv-it/clips/common_voice_it_17894238.wav:
E "ERROR: Your transcripts contain characters (e.g. '#') which do not occur in data/alphabet.txt! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to data/alphabet.txt."
As things stand, during development or debugging, but also for normal usage, it can be handy to execute the various model-generation scripts manually.
https://github.com/mozilla/TTS is a completely different project, but we can release a model for it as well.
Hi all,
thanks for the hard work that you've put into creating this fork. I'm in the process of creating a reduced dictionary to control a robot, but I'm having some issues with the trie file generation. I've tried to generate the trie file using the generate_trie script of the 0.7.0a1 native client without success. I've even tried to run it on the lm_binary that comes with the "2020.03.13" release, but it still fails to recognize words. This is, for instance, the result if I try to get "cinque sei sette otto nove dieci" recognized:
Obviously everything works fine if I use the lm.binary and trie provided together with the model.
The idea is a script that executes the others and, at the end, generates a single txt file and does some sanitization on it, like:
Respira.
[^A-Za-z0-9àÈèìéòù,;:'.! ]
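A minimal sketch of that wrapper in Python, assuming the single importers have already produced plain-text outputs (the file names here are hypothetical):

import re
from pathlib import Path

# Characters outside this set are stripped, per the class above.
ALLOWED = re.compile(r"[^A-Za-z0-9àÈèìéòù,;:'.! ]")

def merge_and_sanitize(sources, output="corpus.txt"):
    with open(output, "w", encoding="utf-8") as out:
        for source in sources:
            for line in Path(source).read_text(encoding="utf-8").splitlines():
                cleaned = ALLOWED.sub("", line).strip()
                if cleaned:
                    out.write(cleaned + "\n")

# hypothetical outputs of the single importer scripts
merge_and_sanitize(["ted_output.txt", "wiki_output.txt"])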
The exporter currently has two problems that severely limit its parallelization:
Align the repo
I think it is important to define some rules for processing sentences from all importers.
These checks can be done either in the wrapper script or in sanitize.py (the latter can be more efficient).
My proposal is:
[^\s'abcdefghijklmnopqrstuvwxyzàèéìíòóôùú,\.!?:;]
If a sentence matches this regex it contains invalid characters, so it should be discarded. The discard will be done after trying to clean the sentence (like removing trailing dashes or unescaping HTML).
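A sketch of how these checks could look in sanitize.py, under the assumption that sentences are lowercased beforehand (the character class above only lists lowercase letters):

import html
import re

# Any character this matches is invalid, per the regex proposed above.
INVALID = re.compile(r"[^\s'abcdefghijklmnopqrstuvwxyzàèéìíòóôùú,\.!?:;]")

def clean(sentence):
    # Try to rescue the sentence first: unescape HTML entities
    # and strip trailing dashes and whitespace.
    return html.unescape(sentence).strip().strip("-").strip()

def keep(sentence):
    # Discard only after cleaning; assumes lowercased input.
    return not INVALID.search(clean(sentence).lower())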
After discussing in the community (especially with @paolo-losi), the model needs a better text corpus; the ones available right now have licensing issues for our needs.
CC0 or public domain, or a CC license that allows commercial use; but we want to release just the scripts and not the final dataset, to avoid any trouble.
The point is to create a corpus built not from Wikipedia, encyclopedic sources, manuals and so on, but from colloquial resources like chats, discussions/emails, and quotes, which are closer to what voice recognition actually needs.
So we need to replace https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/DeepSpeech/it/build_lm.sh#L12 with something else.
So our idea is to generate a static txt file on the fly with a billion words from this kind of text.
We need material from after 1920 to get a more modern Italian.
Every resource needs to be sanitized and cleaned to remove symbols and other unneeded content.
MITADS
= Mozilla Italia DeepSpeech (we can change the codename of the corpus, refer to #65). Considering that the DeepSpeech model is executed on Linux machines we could use Bash, but it is not very fast, so we have to use Python.
Also, this corpus doesn't need to be generated at every model generation; we generate it once for all of them.
Just write in the readme about how we are releasing the model.
My idea is 2019.2-0.1: this means using the whole set of scripts with the 2nd CV Italian dataset, but generated as version 0.1 because of different testing or other reasons.
What do you think? @astrastefania @mone27
There is an Italian project of Wikipedia pages read aloud in Italian.
The link for all the various audio and pages: https://it.wikipedia.org/wiki/Categoria:Voci_parlate
Some of these recordings are in the public domain, like https://it.wikipedia.org/wiki/File:Itwiki-Barile_(unit%C3%A0_di_misura).ogg
Another problem is that those recordings are of old versions of the pages, so we need to recover the revision that was read and associate it with the recording.
List of resources we can implement to add more datasets for DeepSpeech (maybe generating a custom dataset based on the Common Voice dataset organization, of which there is a sample in the readme, or generating one on the fly to avoid license issues):
Check also: #34
Otherwise we can evaluate these tools to generate a dataset based on YouTube:
Another solution is to use https://github.com/srinivr/kaldi-long-audio-alignment with the Italian model to automatically split text+audio into small fragments to speed things up.
The most important part is that the data needs to be aggregated to avoid license issues; this means the files must be merged together so that it is not possible to recreate the original files.
The purpose of this ticket is to find a name for the text corpus we are working on; it will be used as a reference everywhere, probably also outside this project.
Basically it will be like a brand name, so it is important that the origin/author is easy to recognize (at least for me).
I personally don't like the MITADS name because it is difficult to pronounce and to understand what it means.
Here I propose some other random ideas:
Looking forward to hearing your input.
Originally posted by @mone27 in #36 (comment)
I am wondering if we can speed up the text corpus generation scripts using Python threads.
It is something we can do once we have all the scripts working; then we can hack each of them to split its work between reading and cleaning data.
Our estimate is that it can take around 4 hours now.
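One caveat: because of the CPython GIL, threads won't speed up CPU-bound regex cleaning, but processes will. A minimal sketch with multiprocessing, where clean_line is a placeholder for the real cleaning rules of each script:

from multiprocessing import Pool

def clean_line(line):
    # placeholder for the real cleaning rules
    return line.strip().lower()

def clean_file(path, workers=4):
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    with Pool(workers) as pool:
        return pool.map(clean_line, lines, chunksize=1000)

if __name__ == "__main__":
    cleaned = clean_file("corpus_raw.txt")  # hypothetical input file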
https://github.com/MozillaItalia/commonvoice-it/blob/master/DeepSpeech/import_lingualibre.sh needs a way to automatically download and unzip https://lingualibre.fr/datasets/Q385-ita-Italian.zip if the file doesn't exist, like https://github.com/MozillaItalia/commonvoice-it/blob/master/DeepSpeech/import_cvit.sh does.
This way execution will be faster, because we save the downloaded text instead of querying the website, which has a delay, every time.
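A minimal Python sketch of that download-if-missing logic; the paths mirror the ones used in the scripts but are assumptions here:

import os
import urllib.request
import zipfile

URL = "https://lingualibre.fr/datasets/Q385-ita-Italian.zip"
ZIP_PATH = "/mnt/sources/lingua_libre_Q385-ita-Italian_train.zip"
EXTRACT_DIR = "/mnt/extracted/data/lingualibre"  # assumed layout

# Download only if the archive is not already cached locally.
if not os.path.exists(ZIP_PATH):
    urllib.request.urlretrieve(URL, ZIP_PATH)

with zipfile.ZipFile(ZIP_PATH) as archive:
    archive.extractall(EXTRACT_DIR)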
deepspeech version 0.5.1 installed with pip in a fresh virtualenv cannot properly load the Italian model. It seems that the KenLM version used to train the model is more recent than the version linked into deepspeech 0.5.1.
$ deepspeech --model italian/output_graph.pbmm --audio test.wav --lm italian/lm.binary --trie italian/trie --alphabet italian/alphabet.txt
Loading model from file italian/output_graph.pbmm
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.1-0-g4b29b78
2019-11-13 11:02:34.378997: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-13 11:02:34.387460: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-11-13 11:02:34.387495: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-11-13 11:02:34.387508: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-11-13 11:02:34.387609: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.0109s.
Loading language model from files italian/lm.binary italian/trie
Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
Loaded language model in 0.0439s.
Running inference.
Error running session: Not found: PruneForTargets: Some target nodes not found: initialize_state
Segmentation fault (core dumped)
We need to migrate all the Bash scripts and the Docker file, replacing the hard-coded French references in the files with parameters pointing to Italian resources.
Right now, localizing the readme in that folder into English or Italian is not a priority: https://github.com/MozillaItalia/commonvoice-it/tree/master/DeepSpeech
The scripts in that folder download various packages from other resources, like lingualibre, to add more data to the model generation, then package and generate the model for DeepSpeech.
Until this is done we cannot generate the Italian model to use with DeepSpeech.
The PRs to add to this:
As the title says, we need to explain the new scripts better at https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/MITADS/README.md
Currently in build_lm.sh there is:
curl -sSL https://github.com/Common-Voice/commonvoice-fr/releases/download/lm-0.1/wiki.txt.xz | pixz -d | tr '[:upper:]' '[:lower:]' > wiki_it_lower.txt
pointing at the French version of the wiki.
If it can be helpful, I have used:
https://raw.githubusercontent.com/alesarrett/CostituzioneItaliana/master/costituzione.txt
edit: change batch size to 128. edit 2: never mind, it crashes.
I think it is better to define a training pipeline, as the official DeepSpeech releases do.
We don't have the same amount of hours and video cards as the DeepSpeech folks, so let's start with the 0.6 release hyperparameters.
I was thinking of some kind of pipeline to apply when training a model from scratch or when starting from a pretrained checkpoint (transfer learning). What do you think?
generate the scorer with LM_ALPHA and LM_BETA = 0
EPOCHS=30
BATCH_SIZE=64
N_HIDDEN=2048
LEARNING_RATE=0.0001
DROPOUT=0.4
EARLY_STOP
ES_EPOCHS (early stop after)=10
MAX_TO_KEEP=3 (we can keep more checkpoint when we will have more disk space)
DROP_SOURCE_LAYERS=1 (if using transfer learning)
USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
or:
generate the scorer with LM_ALPHA and LM_BETA = 0
EPOCHS=100
BATCH_SIZE=64
N_HIDDEN=2048
LEARNING_RATE=0.0001
DROPOUT=0.4
EARLY_STOP
ES_EPOCHS (early stop after)=25 (default value)
MAX_TO_KEEP=3
REDUCE_LR_ON_PLATEAU=1 (when learning got stuck, LR will be reduced)
PLATEAU_EPOCHS=10 (default; number of epochs to consider for RLROP, smaller than ES_EPOCHS)
DROP_SOURCE_LAYERS=1 (if using transfer learning)
USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
Ref: http://www.clips.unina.it/it/index.jsp
Tasks:
We need to parse the txt of every recording to generate a single CSV, package this CSV with all the WAV files, and remove the rest of the files.
New package name: Clips-Mitads, just as a reference.
wav_filename,wav_filesize,transcript
common_voice_it_19574474.wav,175148,ben degna di ammirazione
common_voice_it_19574387.wav,291884,noi possiamo benissimo non ritrovarci in quello che facciamo
Scripts unfinished: https://gist.github.com/Mte90/116e5d8a17973b7bd9bd9050662736dd
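A sketch of the CSV generation step in the format shown above, assuming each recording comes as a .wav file plus a .txt transcript with the same basename (the layout and paths are assumptions):

import csv
import os
from pathlib import Path

def build_csv(wav_dir, out_csv):
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav in sorted(Path(wav_dir).glob("*.wav")):
            txt = wav.with_suffix(".txt")  # assumed transcript location
            if txt.exists():
                transcript = txt.read_text(encoding="utf-8").strip()
                writer.writerow([wav.name, os.path.getsize(wav), transcript])

build_csv("clips", "clips_mitads.csv")  # hypothetical paths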
We want to document the variables in https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/DeepSpeech/Dockerfile_it.train and also the 4 env files we provide now.
As of now some variables are not very clear, and others, like only_export, were just added by us,
so @lissyx can help us:
english_compatible
amp
max_to_keep
n_hidden
lm_beta
lm_alpha
These are some problems that I've found looking at the ted_importer.py output. I'll write them down starting from the most serious, at least for me :)
code issues:
output issues:
symbols: ♪ ; ♫ ; T ∇ Sτ ; E=mc² ; 31¼%
HTML escapes (sanitize.py's escapehtml() could be useful):
&
"
Nessuno aveva mai studiato l'�involucro
L'�immagine alle mie spalle mostra
Ed ecco come userò il mio premio di 10
00 dollari
falo´
È un infuso creato
E'il drone
E'l'applicazione
E'stata
realta'virtuale
E'più
L'immagine che venne un po'dopo aveva una spiegazione semplice
La natura nel senso del IXX secolo, giusto
VV: Sì, tre persone sono scese sul fondo dell'Oceano Pacifico
and about this last one: a lot of sentences start with 2 letters followed by ':'; a possible fix is sketched after the examples below.
AC: Se dimagrisci un po'
AG: Ci sono certamente implicazioni tecniche
ZF: Scendi tu dal palco
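A sketch of a rule that could strip those speaker tags, assuming they are always two or three capital letters followed by a colon at the start of the sentence:

import re

# Leading speaker tag such as "AC:" or "VV:" in the TED transcripts.
SPEAKER = re.compile(r"^[A-Z]{2,3}:\s*")

def strip_speaker(sentence):
    return SPEAKER.sub("", sentence)

assert strip_speaker("AC: Se dimagrisci un po'") == "Se dimagrisci un po'"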
Missing files:
Originally posted by @alex179ohm in #2 (comment)
The idea is to add audiobooks (not poetry/prose) written after 1930 from Librivox (they are released as CC0), whose author has been dead for at least 70 years.
Also, the recordings should be longer than 1 hour.
We can also accept books released earlier, as long as they are in modern Italian.
The big point is that we also need a DeepSpeech importer for them, like https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/DeepSpeech/it/import_m-ailabs.sh.
List (books from D'Annunzio, Pascoli, Pirandello, Verga):
We need to generate a dataset for https://www.comune.borgomanero.no.it/audio/audio.aspx
In order to simplify development of the Italian model, I propose to remove the lingualibre dataset.
The reason is that the Italian lingualibre data is only 4 minutes long, so it does little to improve the dataset.
The main issue, as pointed out in #17, is that the maximum test batch size is 16 due to the small lingualibre dataset, and I have not found an easy way to specify it correctly in the Docker image (making test_batch_size and batch_size different). Moreover it would force a suboptimal test batch size for the other datasets.
There is a dataset where the same Italian text is read in various different ways: www.mspkacorpus.it/
We probably need to write an adapter like https://github.com/MozillaItalia/DeepSpeech-Italian-Model/blob/master/DeepSpeech/it/import_m-ailabs.sh
Found these issues:
e` a` o` i` u`
eg: sai dire il nome della squadra che giochera` contro il Pine`
Those could be fixed by adding some regex rules to the mapping_normalization list, which currently only removes the text in square brackets:
[re.compile('\[.*?\]'), u''],
and then:
[re.compile('a`'), u'à'],
[re.compile('u`'), u'ù'],
[re.compile('i`'), u'ì'],
[re.compile('o`'), u'ò'],
and about e`:
[re.compile('perche`'), u'perché'],
[re.compile(' ne`'), u'né'],
... list of other words that need é instead of è ...
[re.compile(' e`'), u'è'],
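Put together, and assuming mapping_normalization is a list of [pattern, replacement] pairs applied in order (word-specific fixes before the generic rule, and with the leading space kept in the replacements so words are not glued together), the rules would look like:

import re

mapping_normalization = [
    [re.compile(r'\[.*?\]'), u''],
    [re.compile(u'a`'), u'à'],
    [re.compile(u'u`'), u'ù'],
    [re.compile(u'i`'), u'ì'],
    [re.compile(u'o`'), u'ò'],
    [re.compile(u'perche`'), u'perché'],
    [re.compile(u' ne`'), u' né'],  # leading space kept in the replacement
    [re.compile(u' e`'), u' è'],    # generic rule applied last
]

def normalize(text):
    for pattern, replacement in mapping_normalization:
        text = pattern.sub(replacement, text)
    return text

assert normalize('giochera` e` tardi') == 'giocherà è tardi'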
I tried this code:
but it failed; I resolved it by following this:
There are different scripts in CommonVoice-Data that are used to download CC0 material and test the generated model:
We need to replace them because they don't exist in Italian, so we can create similar aggregators from:
In https://github.com/MozillaItalia/commonvoice-it/blob/master/DeepSpeech/import_cvit.sh the script looks for the dataset.
It would be more useful to check for it and, if it is not available, download the package from the URL https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-3/it.tar.gz
We also need to check the SHA1 and update it in the script.
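A Python sketch of that check; the SHA1 value and destination path are placeholders that have to be updated together with the corpus release:

import hashlib
import os
import urllib.request

URL = ("https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4"
       ".s3.amazonaws.com/cv-corpus-3/it.tar.gz")
TARGET = "/mnt/sources/it.tar.gz"  # assumed destination
EXPECTED_SHA1 = "0" * 40  # placeholder, update with the real checksum

def sha1_of(path):
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

if not os.path.exists(TARGET):
    urllib.request.urlretrieve(URL, TARGET)
if sha1_of(TARGET) != EXPECTED_SHA1:
    raise SystemExit("checksum mismatch: delete the file and retry")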
As written at #45 (comment)
Requires a computer with an NVIDIA card.
Hi everyone,
I would like to know whether there is a forum, chat, or any other official or unofficial channel where I can discuss the development of Italian DeepSpeech and give/get support during training or while developing a custom language model.
Thanks
It would be great if the instructions in the README were dumb-proof.
I just tried to follow them and the results were nonsensical.
It may clearly be due to an error on our side or to the environment (WSL), but looking at the release I suspect that some data is missing (I just followed strictly what's in the README).
This is what happens with parlareitaliano:
...17066%, 0 MB, 58052 KB/s, 0 seconds passedDownloading in ./parsing/parlareitaliano/b01f001f.hsw
...12047%, 0 MB, 45343 KB/s, 0 seconds passedDownloading in ./parsing/parlareitaliano/b01f003f.hsw
...24824%, 0 MB, 59074 KB/s, 0 seconds passedDownloading in ./parsing/parlareitaliano/b01f005f.hsw
We need to test the status bars of all the importers, because sometimes they don't work flawlessly.
We have the issue that the text corpus includes Roman numerals; we need to convert them to ordinary numbers, while also spotting false positives and so on.
We need a way to detect Roman numerals without matching other text that contains the same letters.
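A sketch of a strict detector, which only accepts well-formed numerals; note that short Italian words like DI, MI, or VI are themselves valid numerals, so a whitelist or context check would still be needed on top of this:

import re

# Only well-formed Roman numerals; uppercase input assumed.
ROMAN = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

VALUES = {"M": 1000, "D": 500, "C": 100, "L": 50, "X": 10, "V": 5, "I": 1}

def roman_to_int(token):
    total = 0
    for current, following in zip(token, token[1:] + " "):
        value = VALUES[current]
        # Subtract when a smaller symbol precedes a bigger one (e.g. IX).
        total += -value if VALUES.get(following, 0) > value else value
    return total

assert ROMAN.match("XIX") and roman_to_int("XIX") == 19
assert not ROMAN.match("CIAO")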
We need to migrate https://github.com/MozillaItalia/commonvoice-it/blob/master/CommonVoice-Data/names.py
We need a list of generic street names.