rasbt / llms-from-scratch
Implementing a ChatGPT-like LLM from scratch, step by step
Home Page: https://www.manning.com/books/build-a-large-language-model-from-scratch
License: Other
Hi @rasbt,
I found that the implementation of the MultiHeadAttention class has the following line:
mask_unsqueezed = mask_bool.unsqueeze(0).unsqueeze(0)
But the notebook has only one unsqueeze operation:
mask_unsqueezed = mask_bool.unsqueeze(0)
As I understand it, we can skip the unsqueeze operation entirely, because the masked_fill_() method supports broadcasting.
Thank you.
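The broadcasting claim above can be checked with a small sketch. The tensor sizes and the names b, num_heads, and num_tokens are assumptions chosen for illustration; only mask_bool and the unsqueeze calls come from the issue text.

```python
import torch

torch.manual_seed(0)
b, num_heads, num_tokens = 2, 3, 4  # illustrative sizes, not taken from the notebook
attn_scores = torch.randn(b, num_heads, num_tokens, num_tokens)

# Causal mask: True above the diagonal marks positions to hide.
mask_bool = torch.triu(torch.ones(num_tokens, num_tokens), diagonal=1).bool()

# Variant 1: add two leading singleton dimensions explicitly.
explicit = attn_scores.clone().masked_fill_(
    mask_bool.unsqueeze(0).unsqueeze(0), -torch.inf)

# Variant 2: pass the 2D mask directly and rely on broadcasting.
implicit = attn_scores.clone().masked_fill_(mask_bool, -torch.inf)

results_match = torch.equal(explicit, implicit)
print(results_match)
```

If both variants agree, the unsqueeze calls mainly serve to make the intended 4D shape explicit to the reader.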
Hi @rasbt,
I found the following statement in the mentioned section:
Figure 3.24 illustrates the structure of a multi-head attention module, which consists of
multiple single-head attention modules, as previously depicted in Figure 3.24, stacked on
top of each other.
Did you mean Figure 3.18 in the second case?
Thank you.
Hi @rasbt,
There is the following description in this section:
Previously, we have seen how to convert a single token ID into a three-dimensional
embedding vector. Let's now apply that to all four input IDs we defined earlier (torch.tensor([5, 1, 3, 2])):
But probably there is a typo in the notebook and you specified only 3 tokens for the same code (after cell [47]):
To embed all three input_ids values above, we do
Thank you.
Thanks for the great work. I have several questions about class MHAPyTorchScaledDotProduct in mha-implementations.ipynb:
class MHAPyTorchScaledDotProduct(nn.Module):
    def __init__(self, d_in, d_out, num_heads, context_length, dropout=0.0, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "embed_dim is indivisible by num_heads"
        self.num_heads = num_heads
        self.context_length = context_length
        self.head_dim = d_out // num_heads
        self.d_out = d_out
        self.qkv = nn.Linear(d_in, 3 * d_out, bias=qkv_bias)
        self.proj = nn.Linear(d_in, d_out)
        self.dropout = dropout
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        batch_size, num_tokens, embed_dim = x.shape
        # (b, num_tokens, embed_dim) --> (b, num_tokens, 3 * embed_dim)
        qkv = self.qkv(x)
        # (b, num_tokens, 3 * embed_dim) --> (b, num_tokens, 3, num_heads, head_dim)
        qkv = qkv.reshape(batch_size, num_tokens, 3, self.num_heads, self.head_dim)
        # (b, num_tokens, 3, num_heads, head_dim) --> (3, b, num_heads, num_tokens, head_dim)
        qkv = qkv.permute(2, 0, 3, 1, 4)
        # (3, b, num_heads, num_tokens, head_dim) -> 3 times (b, num_heads, num_tokens, head_dim)
        queries, keys, values = qkv.unbind(0)
        use_dropout = 0. if not self.training else self.dropout
        context_vec = nn.functional.scaled_dot_product_attention(
            queries, keys, values, attn_mask=None, dropout_p=use_dropout, is_causal=True)
        # Combine heads, where self.d_out = self.num_heads * self.head_dim
        context_vec = context_vec.transpose(1, 2).contiguous().view(batch_size, num_tokens, self.d_out)
        return context_vec
1. Should this use .reshape() or .view()? The current code is
    # (b, num_tokens, 3 * embed_dim) --> (b, num_tokens, 3, num_heads, head_dim)
    qkv = qkv.reshape(batch_size, num_tokens, 3, self.num_heads, self.head_dim)
versus
    # (b, num_tokens, 3 * embed_dim) --> (b, num_tokens, 3, num_heads, head_dim)
    qkv = qkv.view(batch_size, num_tokens, 3, self.num_heads, self.head_dim)
2. .unbind(0) is not necessary (the shapes of queries, keys, and values do not change without it). Is it there for speed? The current code is
    # (3, b, num_heads, num_tokens, head_dim) -> 3 times (b, num_heads, num_tokens, head_dim)
    queries, keys, values = qkv.unbind(0)
versus
    # (3, b, num_heads, num_tokens, head_dim) -> 3 times (b, num_heads, num_tokens, head_dim)
    queries, keys, values = qkv
3. self.proj() is missing at the end. The current code is
    # Combine heads, where self.d_out = self.num_heads * self.head_dim
    context_vec = context_vec.transpose(1, 2).contiguous().view(batch_size, num_tokens, self.d_out)
    return context_vec
versus
    # Combine heads, where self.d_out = self.num_heads * self.head_dim
    context_vec = context_vec.transpose(1, 2).contiguous().view(batch_size, num_tokens, self.d_out)
    context_vec = self.proj(context_vec)
    return context_vec
4. Should this use .reshape() or .view()?
    # Combine heads, where self.d_out = self.num_heads * self.head_dim
    context_vec = context_vec.transpose(1, 2).contiguous().view(batch_size, num_tokens, self.d_out)
    return context_vec
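The .reshape()/.view() and .unbind(0) questions can be explored with a small experiment. This is a sketch with made-up sizes, and random tensors stand in for the outputs of self.qkv(x) and of scaled_dot_product_attention; it says nothing about speed.

```python
import torch

torch.manual_seed(0)
b, num_tokens, num_heads, head_dim = 2, 5, 4, 8  # illustrative sizes

# Stand-in for self.qkv(x): a freshly created tensor is contiguous.
qkv = torch.randn(b, num_tokens, 3 * num_heads * head_dim)

# On a contiguous tensor, .view() and .reshape() give the same result.
viewed = qkv.view(b, num_tokens, 3, num_heads, head_dim)
reshaped = qkv.reshape(b, num_tokens, 3, num_heads, head_dim)
same_split = torch.equal(viewed, reshaped)

# Tuple unpacking iterates over dim 0, which yields the same tensors as .unbind(0).
permuted = reshaped.permute(2, 0, 3, 1, 4)
q1, k1, v1 = permuted.unbind(0)
q2, k2, v2 = permuted
same_unpack = torch.equal(q1, q2) and torch.equal(k1, k2) and torch.equal(v1, v2)

# Stand-in for the attention output, shape (b, num_heads, num_tokens, head_dim).
context = torch.randn(b, num_heads, num_tokens, head_dim).transpose(1, 2)

# After transpose the strides no longer allow merging the last two dims in place.
try:
    context.view(b, num_tokens, num_heads * head_dim)
    view_failed = False
except RuntimeError:
    view_failed = True

merged_a = context.contiguous().view(b, num_tokens, num_heads * head_dim)
merged_b = context.reshape(b, num_tokens, num_heads * head_dim)
same_merge = torch.equal(merged_a, merged_b)
```

Under these assumptions, the choice in questions 1 and 2 looks like a matter of style, while in question 4 a bare .view() would need the preceding .contiguous() call that .reshape() handles implicitly.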
Hi @rasbt,
Could you please clarify this sentence:
In fact, the BPE tokenizer that was used to train models such as GPT-2, GPT-3,
and ChatGPT has a total vocabulary size of 50,257, with <|endoftext|> being assigned
the largest token ID.
Which model do you mean by 'ChatGPT'?
I have seen different definitions of this term, and the vocabulary size differs depending on the definition:
Thank you.
In the notebook ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb, the parameter d_out is not divided by num_heads. As a result, the shape differs from other implementations: [8, 1024, 9216] versus [8, 1024, 768]. Additionally, the implementation lacks the final projection.
It is correctly implemented in ch03\01_main-chapter-code\multihead-attention.ipynb, cells 6 and 7.
This inconsistency leads to a significant performance gap in the subsequent cells.
Hi @rasbt,
Cell [28] in this notebook shows an output, but the code does not specify which variable to output (it was probably linear.weight, deleted after the cell was executed):
torch.manual_seed(123)
linear = torch.nn.Linear(num_idx, out_dim, bias=False)
---
Parameter containing:
tensor([[-0.2039, 0.0166, -0.2483, 0.1886],
[-0.4260, 0.3665, -0.3634, -0.3975],
[-0.3159, 0.2264, -0.1847, 0.1871],
[-0.4244, -0.3034, -0.1836, -0.0983],
[-0.3814, 0.3274, -0.1179, 0.1605]], requires_grad=True)
Thank you.
A FileNotFoundError occurred when trying to instantiate the bpe_openai_gpt2 encoder as follows:
--------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[20], line 1
----> 1 orig_tokenizer = get_encoder(model_name="gpt2", models_dir=".")
File ~/localdev/python/LLMs-from-scratch/ch02/02_bonus_bytepair-encoder/bpe_openai_gpt2.py:140, in get_encoder(model_name, models_dir)
139 def get_encoder(model_name, models_dir):
--> 140 with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:
141 encoder = json.load(f)
142 with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './gpt2/encoder.json'
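The traceback shows get_encoder opening encoder.json and vocab.bpe under models_dir/model_name, so the error means those files are not on disk yet. A minimal pre-flight check might look like the sketch below; missing_gpt2_files is a hypothetical helper name and not part of bpe_openai_gpt2.py.

```python
import os

def missing_gpt2_files(models_dir=".", model_name="gpt2"):
    """List the tokenizer files that get_encoder would fail to open.

    Hypothetical helper: the two file names are the ones visible in the traceback.
    """
    expected = ("encoder.json", "vocab.bpe")
    folder = os.path.join(models_dir, model_name)
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]

missing = missing_gpt2_files(models_dir=".", model_name="gpt2")
if missing:
    print("Missing before get_encoder can run:", missing)
```

The same module also exposes download_vocab (it is imported alongside get_encoder elsewhere on this page), so running it first is presumably what creates these files; its exact signature is not shown here.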
Dear Dr. Sebastian Raschka,
Greetings! I am a researcher passionate about machine learning and artificial intelligence. As a native Chinese speaker, I would like to extend my deepest respect and gratitude for the open-source repository of "Build a Large Language Model From Scratch" that you have made available on GitHub. This book is not only comprehensive and beautifully illustrated but also organized in such a manner that beginners like myself find it both intuitive and easy to understand. Your work showcases profound expertise while being incredibly accessible to newcomers, from which I have greatly benefited.
Above all, I am inspired by your passion for AI and open-source software. Motivated by this passion, I have embarked on a project to translate your book and its associated code into Chinese. This effort aims to assist Chinese-speaking learners, like me, in better understanding the process of building large language models. To date, I have completed the translation of the first four chapters. During this process, I have made a concerted effort to clarify any contextual differences and added some foundational knowledge to help beginners grasp the material more effectively.
I am eager to contribute my translated version to the project and wonder if it would be possible to do so by including a link to my forked version in the official GitHub repository's readme or through another method you deem appropriate. My forked version is located at Intelligence-Manifesto/LLMs-from-scratch, which contains the translation work completed so far.
With this letter, I wish to express not only my admiration and thanks for this invaluable book but also seek your guidance and assistance on how I might integrate my work into this admirable open-source project in a suitable manner. How might I contribute my translation so that more Chinese readers can benefit?
Thank you again for your outstanding work and contributions to the open-source community. I look forward to your response.
Sincerely,
Intelligence-Manifesto
When reading the README.md for this repository, it's not immediately clear what this repository contains or what it is for. I think this should be clarified.
First off, great book!
Second, I noticed a small issue in Section 5.1.1 that stumped me for a bit.
"ctx_len": 256, # Shortened context length (orig: 1024)
If this is set to 1024, the val_loader will fail to load with the train_ratio of 0.90. Adjusting to 0.80 will load the data, but the shape is mismatched.
Restoring the ctx_len to 256 fixes the issue.
I'm curious why this occurs.
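One plausible explanation is that the validation split is simply too short to hold even one window of 1024 tokens. The sketch below counts how many samples a sliding window yields. It assumes the dataset iterates over range(0, num_tokens - max_length, stride), and total_tokens is a hypothetical size for a short text, not a measured value.

```python
def count_windows(num_tokens: int, max_length: int, stride: int) -> int:
    """Number of (input, target) pairs a sliding window yields.

    Assumes window starts run over range(0, num_tokens - max_length, stride),
    so each sample needs max_length input tokens plus one shifted target token.
    """
    return len(range(0, num_tokens - max_length, stride))

total_tokens = 5000  # hypothetical size of a short training text
for ctx_len in (256, 1024):
    for train_ratio in (0.90, 0.80):
        val_tokens = total_tokens - int(train_ratio * total_tokens)
        n = count_windows(val_tokens, max_length=ctx_len, stride=ctx_len)
        print(f"ctx_len={ctx_len} train_ratio={train_ratio:.2f} "
              f"val_tokens={val_tokens} val_samples={n}")
```

With these assumed numbers, a ctx_len of 1024 leaves the validation split with zero samples, which would make the loader appear to fail. The shape mismatch reported for a ratio of 0.80 is not explained by this count.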
Hi @rasbt,
I noticed that when we decode the following encoded sentence:
"It's the last he painted, you know," Mrs. Gisburn said with
pardonable pride.
After decoding, there are additional spaces at the start of the sentence and after the apostrophe in the word It' s:
"It' s the last he painted, you know," Mrs. Gisburn said with
pardonable pride.
Formally, this does not matter in our case because we do not take spaces into account, but in general we do not precisely restore the original text here, right?
Could you please tell me whether you are interested in minor feedback like this, or whether it is not worth opening notes or new issues for?
Thank you.
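The behaviour described above can be reproduced with a small round trip. The two regular expressions are assumptions modelled on a simple regex tokenizer, not confirmed to be the notebook's exact code.

```python
import re

def tokenize(text):
    # Split on punctuation, double dashes, and whitespace, then drop empty items.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in parts if p.strip()]

def detokenize(tokens):
    # Join with single spaces, then remove the space in front of punctuation.
    text = " ".join(tokens)
    return re.sub(r'\s+([,.?!"()\'])', r'\1', text)

original = '"It\'s the last he painted, you know," Mrs. Gisburn said with pardonable pride.'
restored = detokenize(tokenize(original))
print(restored)
print(restored == original)
```

The clean-up step only removes spaces that appear before punctuation. Spaces that the join inserts after an opening quote or after an apostrophe survive, so the round trip is lossy.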
Hello Razbt,
Nice to meet you! I've been enjoying your book so far (LLMs from scratch), but I find the examples hard to follow because the text does not mention which versions of the tools you used. I tried to follow along, but packages like tiktoken and pytorch refuse to work or even to install. I tried using conda to create environments with both Python 3.9 and 3.10. Both successfully install tiktoken but fail to import it in the Jupyter notebook. The command I ran to attempt installation was pip install tiktoken.
Can you let me know which version of Python / tiktoken / pytorch you were using? Is there any intermediate step I missed?
I am running Windows 11 and a (non-NVIDIA) GPU.
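When a package installs but cannot be imported in a notebook, one common cause is that the notebook kernel runs a different interpreter than the one pip installed into. The sketch below, run inside the notebook, reports which interpreter is active and which versions it can see; installed_versions is a hypothetical helper name.

```python
import sys
from importlib import metadata

def installed_versions(packages):
    """Return {package: version or None} for the interpreter running this code."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

print("Interpreter:", sys.executable)
print("Python:", sys.version.split()[0])
for name, version in installed_versions(["tiktoken", "torch"]).items():
    print(f"{name}: {version or 'not installed in this environment'}")
```

If the interpreter path printed here is not inside the conda environment that was set up, the package was likely installed somewhere the kernel cannot see.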
hi, @rasbt~
This project is awesome and the tutorial structure is very clear. I was able to get up and running quickly, and I'm learning a lot from it. I really appreciate your work! Would you be interested in having a Chinese version of your project, so that LLM learners from China can refer to your work more efficiently? Maybe I can begin with README-zh.md?
Hi @rasbt,
This notebook contains the following implementation of CausalAttention:
class CausalAttention(nn.Module):
    def __init__(self, d_in, d_out, block_size, dropout, qkv_bias=False):
        super().__init__()
        self.d_out = d_out
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout) # New
        self.register_buffer('mask', torch.triu(torch.ones(block_size, block_size), diagonal=1)) # New

    def forward(self, x):
        b, num_tokens, d_in = x.shape # New batch dimension b
        keys = self.W_key(x)
        queries = self.W_query(x)
        values = self.W_value(x)
        attn_scores = queries @ keys.transpose(1, 2) # Changed transpose
        attn_scores.masked_fill_( # New, _ ops are in-place
            self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights) # New
        context_vec = attn_weights @ values
        return context_vec
I have a question: why do we need the following two lines in the forward() method implementation?
def forward(self, x):
    b, num_tokens, d_in = x.shape # New batch dimension b
    ...
    attn_scores.masked_fill_( # New, _ ops are in-place
        self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
    ...
Can we remove the first line and replace the second with the following code?
attn_scores.masked_fill_(self.mask.bool(), -torch.inf)
As I understand it, num_tokens = batch_size, and we provide the batch_size value as the argument, so neither calculating x.shape nor indexing [:num_tokens, :num_tokens] is required.
Is that correct?
Thank you.
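One way to test the proposal is to feed a sequence shorter than block_size. In the sketch below the sizes are assumptions for illustration. It treats num_tokens as the sequence length (the second dimension of x), which in general differs from both the batch size and block_size.

```python
import torch

torch.manual_seed(0)
block_size = 6        # maximum sequence length; the mask is block_size x block_size
b, num_tokens = 2, 4  # batch size and actual sequence length (illustrative)

mask = torch.triu(torch.ones(block_size, block_size), diagonal=1).bool()
attn_scores = torch.randn(b, num_tokens, num_tokens)

# Slicing the mask to the actual sequence length broadcasts over the batch.
masked = attn_scores.masked_fill(mask[:num_tokens, :num_tokens], -torch.inf)

# The full (6, 6) mask cannot broadcast to scores of shape (2, 4, 4).
try:
    attn_scores.masked_fill(mask, -torch.inf)
    full_mask_ok = True
except RuntimeError:
    full_mask_ok = False

print(masked.shape, full_mask_ok)
```

Under these assumptions, the unsliced mask only works when every input has exactly block_size tokens, which is why the slice is kept.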
Hi @rasbt - very much enjoying your book! Just a heads up about a difference I found between the book and the repo. It results in the same value, and the code in the repo is what I expected. Screenshot attached. I think d_k = keys.shape[1].
In the file https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py#L95, the '<|endoftext|>' token always ends up in val_data_set, and train_data_set never contains it.
I have an issue running pretraining_simple.py. I have downloaded ca. 50% of the files from Project Gutenberg via the gutenberg repo and then ran your scripts:
The text data preparation works fine so far:
prepare_dataset.py
root@9db1a84319a3:/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg# python prepare_dataset.py --data_dir gutenberg/data --max_size_mb 500 --output_dir gutenberg_preprocessed
16697 file(s) to process.
But training the model fails with a shape mismatch. It looks as if the data is not being processed batch-wise:
pretraining_simple.py
root@9db1a84319a3:/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg# python pretraining_simple.py --data_dir "gutenberg_preprocessed" --n_epochs 1 --batch_size 4 --output_dir model_checkpoints
Total files: 16
Tokenizing file 1 of 16: gutenberg_preprocessed/combined_1.txt
Training ...
Traceback (most recent call last):
File "/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py", line 200, in
train_losses, val_losses, tokens_seen = train_model_simple(
File "/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py", line 110, in train_model_simple
loss = calc_loss_batch(input_batch, target_batch, model, device)
File "/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg/previous_chapters.py", line 247, in calc_loss_batch
loss = torch.nn.functional.cross_entropy(logits.flatten(0, -1), target_batch.flatten())
File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: size mismatch (got input: [205852672], target: [4096])
I believe the issue comes from the flatten call. In calc_loss_batch() in previous_chapters.py, what do you think about replacing flatten() with view()?
loss = torch.nn.functional.cross_entropy(logits.view(-1, logits.size(-1)), target_batch.view(-1))
Please double-check whether this idea and the output are correct.
I have run the updated script locally on my RTX 3080 Ti, the output is:
root@9db1a84319a3:/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg# python pretraining_simple.py --data_dir "gutenberg_preprocessed" --n_epochs 1 --batch_size 4 --output_dir model_checkpoints
Total files: 16
Tokenizing file 1 of 16: gutenberg_preprocessed/combined_1.txt
Training ...
Ep 1 (Step 0): Train loss 9.952, Val loss 9.663
Every effort moves you
Ep 1 (Step 100): Train loss 6.567, Val loss 6.906
Ep 1 (Step 200): Train loss 6.468, Val loss 6.637
Ep 1 (Step 300): Train loss 6.170, Val loss 6.578
Ep 1 (Step 400): Train loss 5.560, Val loss 6.485
Ep 1 (Step 500): Train loss 5.874, Val loss 6.381
Ep 1 (Step 600): Train loss 5.481, Val loss 6.449
Ep 1 (Step 700): Train loss 5.620, Val loss 6.314
...
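The sizes in the error message are consistent with the logits being flattened across all three dimensions. The shape-only sketch below uses plain Python and mirrors how Tensor.flatten merges a range of dimensions. The logits shape (4, 1024, 50257) is an assumption pieced together from the batch size in the command, a context length of 1024, and a vocabulary size of 50,257.

```python
from math import prod

def flatten_shape(shape, start_dim, end_dim):
    """Shape after merging dims start_dim..end_dim (inclusive), like Tensor.flatten."""
    ndim = len(shape)
    start, end = start_dim % ndim, end_dim % ndim
    return shape[:start] + (prod(shape[start:end + 1]),) + shape[end + 1:]

logits_shape = (4, 1024, 50257)  # assumed: (batch_size, num_tokens, vocab_size)
target_shape = (4, 1024)         # assumed: (batch_size, num_tokens)

all_dims = flatten_shape(logits_shape, 0, -1)   # what flatten(0, -1) produces
batch_only = flatten_shape(logits_shape, 0, 1)  # what flatten(0, 1) would produce
targets = flatten_shape(target_shape, 0, -1)

print(all_dims, batch_only, targets)
```

Under these assumptions, flatten(0, -1) yields the one-dimensional input from the error message, while flatten(0, 1) gives the same two-dimensional shape as the proposed view(-1, logits.size(-1)).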
"dataloader = create_dataloader_v1(raw_text, batch_size=8, max_length=4, stride=5, shuffle=False)\n",
This code skips one word, which differs from the book's text, which says that we neither skip a word nor overlap. Setting stride=4 makes it consistent with the book.
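The effect of the stride can be seen on a toy sequence. This is a sketch in plain Python; sliding_windows is a hypothetical stand-in for the dataloader's chunking and assumes that targets are the inputs shifted by one position.

```python
def sliding_windows(token_ids, max_length, stride):
    """Return (input, target) chunks; targets are the inputs shifted by one."""
    pairs = []
    for i in range(0, len(token_ids) - max_length, stride):
        inputs = token_ids[i:i + max_length]
        targets = token_ids[i + 1:i + max_length + 1]
        pairs.append((inputs, targets))
    return pairs

token_ids = list(range(12))  # stand-in IDs 0..11 so positions are easy to read

for stride in (4, 5):
    pairs = sliding_windows(token_ids, max_length=4, stride=stride)
    inputs = [inp for inp, _ in pairs]
    print(f"stride={stride}: inputs={inputs}")
```

With stride=4 the input chunks tile the sequence without gaps or overlap. With stride=5 the token at position 4 never appears in any input chunk, although it still appears as the last target of the first chunk.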
Hi @rasbt,
I found that in the latest book version (v5) there is an incorrect code output in the section "2.2 Tokenizing text":
result = re.split(r'([,.]|\s)', text)
print(result)
We can see that the words and punctuation characters are now separate list entries just as we wanted:
['Hello', ',', '', ' ', 'world.', ' ', 'This', ',', '', ' ', 'is', ' ', 'a', ' ', 'test.']
and
The resulting whitespace-free output looks like as follows:
['Hello', ',', 'world.', 'This', ',', 'is', 'a', 'test.']
But if we execute the provided notebook, the output is correct.
P.S. It is a great pleasure to explore your next new book, especially about LLMs, thank you! :)
Thank you.
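The discrepancy can be checked directly with the standard library. The input string below is inferred from the quoted output and may differ slightly from the book's exact text.

```python
import re

text = "Hello, world. This, is a test."  # inferred from the quoted output

# The capturing group keeps the separators (comma, period, whitespace) in the result.
result = re.split(r'([,.]|\s)', text)
print(result)

# Drop empty strings and whitespace-only entries.
tokens = [item for item in result if item.strip()]
print(tokens)
```

Because the period is part of the character class, 'world' and '.' come out as separate entries, so an output showing 'world.' as one item would not match this pattern.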
Hi,
Can you please add a requirements.txt to the repo as well (to set up the environment for the book in one go, without needing to install every package manually)?
Hi @rasbt,
I tried to run your DDP script and found that there is an error while executing this script "as-is":
PyTorch version: 2.2.1+cu121
CUDA available: True
Number of GPUs available: 2
Traceback (most recent call last):
File "/home/user/app/DDP-script.py", line 178, in <module>
mp.spawn(main, args=(world_size, num_epochs), nprocs=world_size)
File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 241, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 158, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
fn(i, *args)
File "/home/user/app/DDP-script.py", line 128, in main
features, labels = features.to(rank), labels.to(rank) # New: use rank
AttributeError: 'int' object has no attribute 'to'
The reason is the following incorrect line:
for features, labels in enumerate(train_loader):
which should be like that:
for idx, (features, labels) in enumerate(train_loader):
or like this (because idx was not used):
for features, labels in train_loader:
Thank you.
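The cause of the AttributeError can be reproduced without PyTorch. In this sketch a plain list of (features, labels) tuples stands in for train_loader.

```python
# Stand-in for a DataLoader that yields (features, labels) batches.
train_loader = [([0.1, 0.2], 0), ([0.3, 0.4], 1)]

# Buggy: enumerate yields (index, batch), so "features" is the int index
# and "labels" is the whole (features, labels) tuple.
buggy = [(type(features).__name__, labels)
         for features, labels in enumerate(train_loader)]

# Fixed: unpack the batch itself.
fixed = [(features, labels) for features, labels in train_loader]

print(buggy[0])
print(fixed[0])
```

Since the first unpacked name is bound to an int in the buggy loop, calling .to(rank) on it fails exactly as in the traceback.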
Hi @rasbt,
I found that the solution to Exercise 2.1 also already exists in the notebook with the main code (section "Experiments with unknown words").
Thank you.
Hi @rasbt,
I am trying to explore and reproduce Chapter 3 and found that I can't reproduce results that you specified in the notebook and the book, even if I download notebook and run without any changes.
The difference appears only starting with the following 2 cells (I haven't checked the next cells yet):
Cell [31]
torch.manual_seed(123)
dropout = torch.nn.Dropout(0.5) # dropout rate of 50%
example = torch.ones(6, 6) # create a matrix of ones
print(dropout(example))
Your output
tensor([[2., 2., 0., 2., 2., 0.],
[0., 0., 0., 2., 0., 2.],
[2., 2., 2., 2., 0., 2.],
[0., 2., 2., 0., 0., 2.],
[0., 2., 0., 2., 0., 2.],
[0., 2., 2., 2., 2., 0.]])
My output
tensor([[2., 2., 2., 2., 2., 2.],
[0., 2., 0., 0., 0., 0.],
[0., 0., 2., 0., 2., 0.],
[2., 2., 0., 0., 0., 2.],
[2., 0., 0., 0., 0., 2.],
[0., 2., 0., 0., 0., 0.]])
Cell [32]
torch.manual_seed(123)
print(dropout(attn_weights))
Your output
tensor([[2.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.7599, 0.6194, 0.6206, 0.0000, 0.0000, 0.0000],
[0.0000, 0.4921, 0.4925, 0.0000, 0.0000, 0.0000],
[0.0000, 0.3966, 0.0000, 0.3775, 0.0000, 0.0000],
[0.0000, 0.3327, 0.3331, 0.3084, 0.3331, 0.0000]],
grad_fn=<MulBackward0>)
My output
tensor([[2.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.8966, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.6206, 0.0000, 0.0000, 0.0000],
[0.5517, 0.4921, 0.0000, 0.0000, 0.0000, 0.0000],
[0.4350, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.3327, 0.0000, 0.0000, 0.0000, 0.0000]],
grad_fn=<MulBackward0>)
Thank you.
This book is a wonderful read. I just wanted to submit one small comment on the notebooks, which could just be personal learning style. It's nice to have to run the actual notebook to get the output, so that block by block it's easier to focus without being distracted by output that is already rendered. So maybe there could be two notebooks per chapter, a clean one and a completed one? In the meantime I'm just using nbstripout locally, but I wanted to pass along the feedback.
Hi @rasbt, what an amazing job. But the definition of stride confuses me, as follows:
We use a sliding window approach where we slide the window one word at a time (this is also known as stride=1):
An example using stride equal to the context length (here: 4) as shown below:
I think stride is the distance between two consecutive inputs. In fig. 1, the distance between the two inputs is actually four words. But here, stride appears to mark the distance between input and target.
Hi @rasbt: fantastic work, and the code is clean and readable.
One small piece of feedback: I noticed in the early access book that the manual seed of 789 is missing in chapter 3, which is what brought me here :)
By convention, the unnormalized attention weights are referred to as "attention scores" whereas the normalized attention scores, which sum to 1, are referred to as "attention weights"
The attention weights and context vector calculation are summarized in the figure below:
In 3.3.1, there seems to be a missing image between "The attention weights and context vector calculation are summarized in the figure below:" and "The code below walks through the figure above step by step."
Perhaps the sentence needs to be modified
Hi @rasbt,
There is probably a typo in the description of the torch.arange() function here:
As shown in the preceding code example, the input to the pos_embeddings is usually a
placeholder vector torch.arange(block_size), which contains a sequence of numbers
1, 2, ..., up to the maximum input length.
I think you mean the range 0, 1, ..., up to the maximum input length - 1?
Thank you.
I just checked out the code of appendix-A/01_main-chapter-code/DDP-script.py. How about adding
from torch.profiler import profile
with profile() as prof:
    ...  # the main function's training code
if rank == 0:
    print("exporting trace")
    prof.export_chrome_trace("trace_ddp_simple.json")
Then we can view the profiling trace JSON file in Google Chrome.
Hi @rasbt,
I noticed that in the book you provide the following code with the function name create_dataloader and the argument stride = max_length + 1 to avoid overlap in the data, even for targets:
dataloader = create_dataloader(raw_text, batch_size=8, max_length=4,
stride=5)
data_iter = iter(dataloader)
inputs, targets = next(data_iter)
print("Inputs:\n", inputs)
print("\nTargets:\n", targets)
But in the Jupyter notebook with the main code (cell [43]) and in the Jupyter notebook with only the dataloader (cell [2]), you use a function named create_dataloader_v1 with the argument stride = max_length.
Could you please tell me whether I understand correctly that we need to use stride = max_length + 1 to avoid overfitting? Does the overlap in the targets (when stride = max_length) seriously increase the risk of overfitting?
Thank you.
Hi @rasbt,
I don't know if packages from the notebooks with bonus materials, like this notebook with the tokenizer comparison, are intended to be included in requirements.txt, but there are 2 missing libraries:
tqdm (which is required by the import from bpe_openai_gpt2 import get_encoder, download_vocab)
transformers
To simplify control of the libraries used for this project, I use poetry, which is great for tracking all explicit and implicit dependencies, so if you want, I can send you my configuration for it.
Thank you.