Comments (29)
I was using Windows 11, Cursor, and Python 3.10 through WSL, and it worked for me. So it may be an issue with native Windows rather than WSL. I gotta say, it would be hard for me to imagine not working in WSL on a Windows machine at this point.
from ragatouille.
Hey @timothepearce, thanks for flagging! I believe this is a very separate problem (the multiprocessing in your case runs fine, but there seems to be another problem). Could you create a new issue so I can look into it a bit more? And could you try out the notebooks in examples/ ? I think there might be something wrong with the README, which is (probably) that there aren't enough documents in the example (which I could fix by adopting a separate logic for n_docs that are far too small).
@bclavie You're right, the code doesn't work either in the Python CLI, and seems related to the ColBERT library.
I'll open a new issue and dig a little bit more.
Hey @vanetreg, for your other issue, the partial init -- no idea what's going on there, it seems like something weird happened when initialising nltk?
I've tested some things on my end and I can confirm this is due to how ColBERT does multiprocessing, which causes the issue in some environments (seemingly Colab and Windows 10). This will eventually be fixed once the multiprocessing handling is changed upstream but sadly there doesn't seem to be a good in-notebook workaround on those two platforms at the moment.
If you use RAGatouille in a Python script (making sure to have it inside if __name__ == "__main__":), it should hopefully run fine (though again, not tested on Windows)!
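A minimal sketch of the standalone-script approach suggested above, assuming the RAGatouille calls that appear elsewhere in this thread (from_pretrained, index, search). The file name, the helper names, the query, and the sample documents are illustrative assumptions, and this has not been verified on Windows:

```python
# index_script.py (hypothetical file name)

def load_documents():
    # Placeholder corpus; swap in your own documents.
    return [
        "This is a great excerpt from my wealth of documents",
        "Once upon a time, there was a great document",
    ]


def build_and_query(documents, query, k=3):
    # Imported inside the function so that merely importing this module
    # (which freshly started worker processes may do) has no side effects.
    from ragatouille import RAGPretrainedModel

    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    RAG.index(index_name="my_index", collection=documents)
    return RAG.search(query=query, k=k)


def main():
    try:
        results = build_and_query(load_documents(), "What is a great document?")
    except ImportError:
        print("RAGatouille is not installed; nothing to run.")
        return
    print(results)


# Without this guard, platforms that start workers by re-importing the main
# module (such as Windows) would re-run the top-level code in every worker.
if __name__ == "__main__":
    main()
```

Run it with `python index_script.py` from a terminal rather than from a notebook cell.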
I was using Windows 11, Cursor, and Python 3.10 through WSL, and it worked for me. So it may be an issue with native Windows rather than WSL. I gotta say, it would be hard for me to imagine not working in WSL on a Windows machine at this point.
@jponline77 I tested it in both VSC and Cursor, with the WSL extension installed in both. Maybe the Windows version (10 / 11) matters?
Yeah, maybe it's a Windows 10 issue. Just be sure, if you are using WSL, that it's actually running in WSL. If you are set up to run in WSL, then you should be able to run it from the command line in WSL directly, without using VSC or Cursor. My experience with WSL is that it runs everything that runs in Ubuntu in much the same way a standalone Linux system would. So it would surprise me a little if it mattered whether you are on Windows 10 or 11. That said, is there any reason you aren't interested in upgrading to 11? I've now got RAGatouille running on two different systems with Windows 11 and WSL. One was a laptop with a low-end integrated GPU and 16GB of memory. It did take 10 minutes to index a small file, but it worked.
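One quick way to confirm that the interpreter is really running inside WSL is to inspect the kernel release string, which on WSL contains "microsoft". This is a heuristic sketch, and the helper name looks_like_wsl is made up for illustration:

```python
import platform


def looks_like_wsl(kernel_release: str) -> bool:
    # WSL kernels report a release string containing "Microsoft" (WSL1)
    # or "microsoft-standard-WSL2" (WSL2); native Windows and plain Linux do not.
    return "microsoft" in kernel_release.lower()


release = platform.uname().release
print(f"kernel release: {release!r}, looks like WSL: {looks_like_wsl(release)}")
```

If this reports False from the notebook kernel in VSC or Cursor, the notebook is probably using a native Windows interpreter rather than the WSL one.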
Hey, thanks for this @jponline77 -- indexing is slow sadly, taking a while to create the index is the tradeoff to querying very large corpuses at near-constant time. It can maybe be optimised though (that'd require work on the upstream ColBERT repo), but that's something for the future!
I'm working on a feature to do index-free search. It's not very scalable, at least at the moment (you could query maybe up to 1k documents in >1s on a T4 GPU, and obviously much slower every time you add something), but for smaller corpuses it will make it easy to try it out!
@vanetreg I think (not sure) you could try it out in a standalone script like I mentioned earlier? Wrap it in if __name__ == "__main__": ... It's not ideal for interactivity but it could work! (At least it does on every non-Windows platform I've tried). Anyhow, the Mac Mini is an excellent choice 😄
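To illustrate why index-free search gets slower with every added document, here is a rough sketch of brute-force late-interaction (MaxSim) scoring, the scoring rule ColBERT is known for. It assumes the index-free feature works along these lines, which the thread does not confirm; the function names and toy vectors are invented for illustration:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    # For each query token, keep its best-matching document token, then sum.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def brute_force_search(query_vecs, corpus, k=3):
    # Every document is scored for every query, so the cost grows
    # linearly with the number of documents (no index to prune candidates).
    scored = sorted(
        ((maxsim_score(query_vecs, vecs), doc_id) for doc_id, vecs in corpus.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


# Toy token embeddings (assumed values, two dimensions for readability).
toy_corpus = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0]],
}
toy_query = [[1.0, 0.0], [0.0, 1.0]]
print(brute_force_search(toy_query, toy_corpus, k=1))
```

An index trades this per-query linear scan for a one-off build cost, which matches the tradeoff described above.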
Multiprocessing is no longer enforced for indexing when using no GPU or a single GPU thanks to @Anmol6's excellent upstream work on stanford-futuredata/ColBERT#290 & propagated by #51.
This is likely to fix the indexing problems on Windows (or at least, one of the problems). Please let me know if the latest version of RAGatouille fixes it for you!
The CUBLAS errors turned out to be faiss incompatible driver issues for most people. This should be fixed by the new experimental default indexing in 0.0.8, which skips using faiss (does K-means in pure PyTorch) as long as you're indexing fewer than ~100k documents!
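For readers unfamiliar with the clustering step that faiss normally handles, below is a small sketch of K-means (Lloyd's algorithm) in plain Python. It is only an illustration of the technique, not RAGatouille's or ColBERT's implementation (which, per the comment above, uses PyTorch); the function name, defaults, and sample points are assumptions:

```python
import random


def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

The two well-separated groups end up in different clusters; which group gets label 0 depends on the random initialisation.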
Hi @vanetreg, sorry about that! I believe this is related to the issue making it not work on Google Colab -- everything is currently multiprocessing even when it doesn't need to be, and it hangs in certain environments (outside __main__ in scripts and in some notebook environments like Colab).
I don't have a windows machine to try this, but it might be the Windows + Cursor combo. We'll be looking at fixing this shortly (cc @okhat)
I think indexing and search should definitely work on colab?
https://colab.research.google.com/github/stanford-futuredata/ColBERT/blob/main/docs/intro2new.ipynb
I did notice that it works on the main repo, but doesn't with RAGatouille, must be how we handle the Run... I need to track down exactly why, but it actually hangs:
https://colab.research.google.com/drive/1S3s_5FUjzjOCxuwRhdfEdrZoa2LcEsME?usp=sharing
Same issue here, same environment (even Cursor!)
@bclavie @okhat
Please note that I never referred to (Google) Colab.
I tested again, and while executing the first cell:
from ragatouille import RAGPretrainedModel
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
I got:
d:\***\ragatouille-venv\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
So after installing ipywidgets (should it be in the requirements?!) and restarting Cursor, the warning above is gone,
but I again had to stop the index creation cell after 10+ minutes, with only this output:
[Jan 05, 23:16:42] #> Note: Output directory .ragatouille/colbert\indexes/Miyazaki already exists
#> Starting...
After trying to run the next cell:
k = 3 # How many documents you want to retrieve, defaults to 10, we set it to 3 here for readability
results = RAG.search(query="What animation studio did Miyazaki found?", k=k)
results
I got this error:
NameError                                 Traceback (most recent call last)
d:\Projects\AI_testing\RAGatouille\01-basic_indexing_and_search.ipynb Cell 14 line 2
      1 k = 3 # How many documents you want to retrieve, defaults to 10, we set it to 3 here for readability
----> 2 results = RAG.search(query="What animation studio did Miyazaki found?", k=k)
      3 results
NameError: name 'RAG' is not defined
I always check the execution timestamp of every cell, so all previous cells (especially the one where RAG is defined) ran without errors this time.
Hey, thanks for confirming @MikeRenwick-ICG and shining some light on it being a Windows (non-WSL) issue @jponline77.
@vanetreg While you're not using Google Colab, this is definitely the same multiprocessing issue that's causing it to hang in colab. I believe the issue you're seeing is still the same problem -- RAG isn't defined because the previous cell never actually ran and just timed out. The likely issue is identified (#13), I'll ping you when we get a fix out for it!
so after installing ipywidgets (requirements?!)
d:\***\ragatouille-venv\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
This is a bit of an annoying warning, but it doesn't negatively impact running anything. To avoid overloading the lib with dependencies one wouldn't use outside a notebook, we don't generally add ipython/notebook related dependencies to requirements, but definitely do install it if you're going to be running notebooks a lot!
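If you want to know up front whether the notebook extras are present, a small check like the following can help. It only uses the standard library; the helper name has_module is an assumption made for this sketch:

```python
import importlib.util


def has_module(name: str) -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(name) is not None


if not has_module("ipywidgets"):
    print("ipywidgets not found: expect the TqdmWarning in notebooks; "
          "`pip install ipywidgets` should silence it.")
```

As noted above, the warning is cosmetic, so this check is optional.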
@bclavie
Aren't you using the terms "Colab" and "Jupyter notebook" interchangeably? :)
I tested this again today, and it really must be Windows related:
after switching on the PC and restarting Windows, the first cell, where RAG is defined, raised the following error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
d:\Projects\AI_testing\RAGatouille\01-basic_indexing_and_search.ipynb Cell 2 line 1
----> 1 from ragatouille import RAGPretrainedModel
      3 RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\__init__.py:2
      1 __version__ = "0.0.1c"
----> 2 from .RAGPretrainedModel import RAGPretrainedModel
      3 from .RAGTrainer import RAGTrainer
      5 __all__ = ["RAGPretrainedModel", "RAGTrainer"]

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\RAGPretrainedModel.py:3
      1 from typing import Callable, Optional, Union
      2 from pathlib import Path
----> 3 from ragatouille.data.corpus_processor import CorpusProcessor
      4 from ragatouille.data.preprocessors import llama_index_sentence_splitter
      5 from ragatouille.models import LateInteractionModel, ColBERT

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\data\__init__.py:1
----> 1 from .corpus_processor import CorpusProcessor
      2 from .preprocessors import llama_index_sentence_splitter
      3 from .training_data_processor import TrainingDataProcessor

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\data\corpus_processor.py:2
      1 from typing import Callable, Optional, Union
----> 2 from ragatouille.data.preprocessors import llama_index_sentence_splitter
      5 class CorpusProcessor:
      6 def __init__(
      7 self,
      8 document_splitter_fn: Optional[Callable] = llama_index_sentence_splitter,
      9 preprocessing_fn: Optional[Union[Callable, list[Callable]]] = None,
     10 ):

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\data\preprocessors.py:1
----> 1 from llama_index import Document
      2 from llama_index.text_splitter import SentenceSplitter
      5 def llama_index_sentence_splitter(documents: list[str], chunk_size=256):

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\__init__.py:13
     10 from typing import Callable, Optional
     12 # import global eval handler
---> 13 from llama_index.callbacks.global_handlers import set_global_handler
     14 from llama_index.data_structs.struct_type import IndexStructType
     16 # embeddings

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\callbacks\__init__.py:7
      5 from .open_inference_callback import OpenInferenceCallbackHandler
      6 from .schema import CBEvent, CBEventType, EventPayload
----> 7 from .token_counting import TokenCountingHandler
      8 from .utils import trace_method
      9 from .wandb_callback import WandbCallbackHandler

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\callbacks\token_counting.py:6
      4 from llama_index.callbacks.base_handler import BaseCallbackHandler
      5 from llama_index.callbacks.schema import CBEventType, EventPayload
----> 6 from llama_index.utilities.token_counting import TokenCounter
      7 from llama_index.utils import get_tokenizer
     10 @dataclass
     11 class TokenCountingEvent:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\utilities\token_counting.py:6
      1 # Modified from:
      2 # https://github.com/nyno-ai/openai-token-counter
      4 from typing import Any, Callable, Dict, List, Optional
----> 6 from llama_index.llms import ChatMessage, MessageRole
      7 from llama_index.utils import get_tokenizer
     10 class TokenCounter:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\llms\__init__.py:1
----> 1 from llama_index.llms.ai21 import AI21
      2 from llama_index.llms.anthropic import Anthropic
      3 from llama_index.llms.anyscale import Anyscale

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\llms\ai21.py:6
      4 from llama_index.callbacks import CallbackManager
      5 from llama_index.llms.ai21_utils import ai21_model_to_context_size
----> 6 from llama_index.llms.base import llm_chat_callback, llm_completion_callback
      7 from llama_index.llms.custom import CustomLLM
      8 from llama_index.llms.generic_utils import (
      9 completion_to_chat_decorator,
     10 get_from_param_or_env,
     11 )

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\llms\base.py:25
     14 from llama_index.callbacks import CallbackManager, CBEventType, EventPayload
     15 from llama_index.llms.types import (
     16 ChatMessage,
     17 ChatResponse,
   (...)
     23 LLMMetadata,
     24 )
---> 25 from llama_index.schema import BaseComponent
     28 def llm_chat_callback() -> Callable:
     29 def wrap(f: Callable) -> Callable:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\schema.py:16
     13 from typing_extensions import Self
     15 from llama_index.bridge.pydantic import BaseModel, Field, root_validator
---> 16 from llama_index.utils import SAMPLE_TEXT, truncate_text
     18 if TYPE_CHECKING:
     19 from haystack.schema import Document as HaystackDocument

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\utils.py:89
     85 self._stopwords = stopwords.words("english")
     86 return self._stopwords
---> 89 globals_helper = GlobalsHelper()
     92 # Global Tokenizer
     93 @runtime_checkable
     94 class Tokenizer(Protocol):

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\utils.py:45, in GlobalsHelper.__init__(self)
     43 def __init__(self) -> None:
     44 """Initialize NLTK stopwords and punkt."""
---> 45 import nltk
     47 self._nltk_data_dir = os.environ.get(
     48 "NLTK_DATA",
     49 os.path.join(
   (...)
     52 ),
     53 )
     55 if self._nltk_data_dir not in nltk.data.path:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\__init__.py:180
    177 else:
    178 from nltk import cluster
--> 180 from nltk.downloader import download, download_shell
    182 try:
    183 import tkinter

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\downloader.py:2479
   2469 pass
   2472 ######################################################################
   2473 # Main:
   2474 ######################################################################
   (...)
   2477
   2478 # Aliases
-> 2479 _downloader = Downloader()
   2480 download = _downloader.download
   2483 def download_shell():

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\downloader.py:515, in Downloader.__init__(self, server_index_url, download_dir)
    513 # decide where we're going to save things to.
    514 if self._download_dir is None:
--> 515 self._download_dir = self.default_download_dir()

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\downloader.py:1072, in Downloader.default_download_dir(self)
   1069 # Check if we have sufficient permissions to install in a
   1070 # variety of system-wide locations.
   1071 for nltkdir in nltk.data.path:
-> 1072 if os.path.exists(nltkdir) and nltk.internals.is_writable(nltkdir):
   1073 return nltkdir
   1075 # On Windows, use %APPDATA%

AttributeError: partially initialized module 'nltk' has no attribute 'internals' (most likely due to a circular import)
Note the line
1075 # On Windows, use %APPDATA%
at the end of the error message.
I was using Windows 11, Cursor, and Python 3.10 through WSL, and it worked for me. So it may be an issue with native Windows rather than WSL. I gotta say, it would be hard for me to imagine not working in WSL on a Windows machine at this point.
@jponline77
I tested it in both VSC and Cursor, with the WSL extension installed in both.
Maybe the Windows version (10 / 11) matters?
Has anyone tried this outside of Jupyter on Windows? I'm keen to drop this in as a direct replacement for single-vector RAG.
Hi @bclavie,
I'm not sure the problem is related to Colab: I also get an error using Jupyter locally on my Ubuntu server.
The basic readme.md example doesn't work and the cell never finishes executing.
Here's the code and stack trace, if that helps:
from ragatouille import RAGPretrainedModel
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
my_documents = [
"This is a great excerpt from my wealth of documents",
"Once upon a time, there was a great document"
]
index_path = RAG.index(index_name="my_index", collection=my_documents)
This outputs the following:
[Jan 06, 10:41:35] #> Creating directory .ragatouille/colbert/indexes/my_index
#> Starting...
#> Starting...
nranks = 2 num_gpus = 2 device=1
[Jan 06, 10:41:38] [1] #> Encoding 0 passages..
nranks = 2 num_gpus = 2 device=0
[Jan 06, 10:41:38] [0] #> Encoding 2 passages..
File "/home/np/miniconda3/envs/np-ml/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 101, in setup
avg_doclen_est = self._sample_embeddings(sampled_pids)
File "/home/np/miniconda3/envs/np-ml/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 140, in _sample_embeddings
self.num_sample_embs = torch.tensor([local_sample_embs.size(0)]).cuda()
AttributeError: 'NoneType' object has no attribute 'size'
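The log above shows two ranks (one per GPU), with one rank receiving 0 of the 2 passages. One possible workaround, offered as an untested assumption rather than a confirmed fix, is to expose only a single GPU to the process so that the multi-rank path is not taken. CUDA_VISIBLE_DEVICES is a standard CUDA environment variable; the helper name below is invented for this sketch:

```python
import os


def restrict_to_single_gpu(device_index=0, env=None):
    # Hide every GPU except one from CUDA. This must happen before
    # anything initialises CUDA (so before RAGatouille is imported and used).
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


restrict_to_single_gpu(0)
# Only after this point: from ragatouille import RAGPretrainedModel
```

In a notebook this needs to run in the very first cell, after a kernel restart, since the setting is ignored once CUDA has been initialised.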
@runonthespot Feel free to try https://github.com/bclavie/RAGatouille/blob/main/examples/01-basic_indexing_and_search.ipynb, it's fully plug-and-play!
from ragatouille.
Hey @bclavie ,
I'm gonna test it on Replit during weekend and be back with the result.
from ragatouille.
Hey @bclavie , I'm gonna test it on Replit during weekend and be back with the result.
Hey @bclavie
After 2 days I found out that Replit doesn't handle Jupyter notebooks properly, so I can't test RAGatouille there. Since Google Colab isn't an option either, I'll have to wait for the Windows fix :)
from ragatouille.
@jponline77
I won't upgrade this PC, neither the hardware nor Windows 10, so if I don't find a proper online IDE / runtime with optional GPUs (I paid for Replit Core and am considering Google Colab Pro), I'll go for an M2 Mac mini :)
from ragatouille.
Hey, thanks for this @jponline77 -- indexing is slow, sadly: taking a while to create the index is the tradeoff for querying very large corpora in near-constant time.
I was actually a little surprised it worked at all on the laptop. Indexing speed was much faster on my RTX4080 system with 128GB of ram :)
from ragatouille.
I'm getting it hanging on WSL2 Ubuntu (Win 11) as well, both in a notebook and as a standalone Python script (including when wrapped in main). I've been using CUDA + PyTorch in WSL2 for a long time and this is the first time I've seen this NCCL issue pop up, so I'm trying to trace where it might be coming from.
Pretty sure it's something to do with NCCL, and likely ColBERT (edit: although the ColBERT notebook posted by @okhat above works fine).
My best guess so far is https://github.com/stanford-futuredata/ColBERT/blob/03fb1becb30c1d01e83d210ba0c4a25108543809/colbert/utils/distributed.py#L27
Edit:
I get this error after running RAG.index in example 1, as well as with any other RAG.index call.
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.18.1
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error.
Last error:
socketStartConnect: Connect to 11.16.94.50<60757> failed : Software caused connection abort
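For anyone reproducing this as a standalone script, a minimal sketch of the "wrapped in main" layout mentioned above, with NCCL_DEBUG=INFO set as the error message suggests. The find_spec check is a hypothetical convenience so the script exits quietly when the package is missing, and the document list is just the README example.

```python
import importlib.util
import os

# Ask NCCL for verbose logs, as the error message itself recommends.
# Set before torch is imported so the setting is picked up.
os.environ["NCCL_DEBUG"] = "INFO"


def main():
    # Hypothetical guard: exit quietly if ragatouille is not installed.
    if importlib.util.find_spec("ragatouille") is None:
        print("ragatouille is not installed in this environment")
        return

    from ragatouille import RAGPretrainedModel

    rag = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    documents = [
        "This is a great excerpt from my wealth of documents",
        "Once upon a time, there was a great document",
    ]
    rag.index(index_name="my_index", collection=documents)


# The guard keeps spawned worker processes from re-running the indexing code.
if __name__ == "__main__":
    main()
```

The extra NCCL output should show which network interface and address the socket connection is attempted on, which may help explain the failed connect above.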
from ragatouille.
@bclavie
I updated to the latest 0.0.4b2 (on Win 10, Python 3.11.6) and, after loading dotenv (setting HF_HUB_DISABLE_SYMLINKS_WARNING=true), I still get errors while running:
from ragatouille import RAGPretrainedModel
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
with the following error messages:
CalledProcessError Traceback (most recent call last)
Cell In[3], line 3
1 from ragatouille import RAGPretrainedModel
----> 3 RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
...
File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\torch\utils\cpp_extension.py:2382, in _write_ninja_file(path, cflags, post_cflags, cuda_cflags, cuda_post_cflags, cuda_dlink_post_cflags, sources, objects, ldflags, library_target, with_cuda)
2380 link_rule = ['rule link']
2381 if IS_WINDOWS:
-> 2382 cl_paths = subprocess.check_output(['where',
2383 'cl']).decode(*SUBPROCESS_DECODE_ARGS).split('\r\n')
2384 if len(cl_paths) >= 1:
2385 cl_path = os.path.dirname(cl_paths[0]).replace(':', '$:')
File ~\AppData\Local\Programs\Python\Python311\Lib\subprocess.py:466, in check_output(timeout, *popenargs, **kwargs)
463 empty = b''
464 kwargs['input'] = empty
--> 466 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
467 **kwargs).stdout
File ~\AppData\Local\Programs\Python\Python311\Lib\subprocess.py:571, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
569 retcode = process.poll()
570 if check and retcode:
--> 571 raise CalledProcessError(retcode, process.args,
572 output=stdout, stderr=stderr)
573 return CompletedProcess(process.args, retcode, stdout, stderr)
CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.
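The traceback shows PyTorch's cpp_extension running "where cl" to locate the MSVC compiler, and that command failing. A small sketch of how one might check for cl up front, before loading the model; the function name find_msvc_compiler is made up for illustration, and shutil.which is the standard-library equivalent of that lookup.

```python
import shutil


def find_msvc_compiler():
    """Return the full path to cl (the MSVC compiler) if it is on PATH, else None."""
    return shutil.which("cl")


cl_path = find_msvc_compiler()
if cl_path is None:
    # This is the situation in which "where cl" exits with a non-zero status.
    print("cl was not found on PATH; building the C++ extension will likely fail")
else:
    print(f"cl found at {cl_path}")
```

On a machine with no C++ build tools on PATH this prints the "not found" message, which matches the CalledProcessError above.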
from ragatouille.
I think this is an issue with Windows 10 and loading cpp extensions in PyTorch? Saw a few similar issues on other projects floating around... I think the current stance will be that the lib doesn't support Win10 unless someone can figure out a solid fix to this 😞
from ragatouille.
In case others are trying to get it working on Windows 10: I did get past the cl error with non-zero exit status above (by installing the C++ parts of VS 2022 Build Tools), but I then ran into issues with pthread.h not being found. I tried vcpkg to install it (which was possible) but I still couldn't get it to work with the compiler. When I saw that cpp_extensions now seems archived, that, along with the time and effort it took to get to that point, made me give up on Windows directly (for now at least!).
However, I didn't have any problems with ragatouille using WSL on Windows (Ubuntu 20.04) via pip install ragatouille within a conda env with Python 3.11.7.
from ragatouille.
I'm having similar issues. I'm using WSL2 Windows 10 with faiss-gpu installed and faiss-cpu uninstalled. The basic script below has been running for 30 minutes...
I have 256GB RAM and 24GB of GPU RAM.
[Feb 13, 18:25:18] [0] #> Encoding 81 passages..
[Feb 13, 18:25:20] [0] avg_doclen_est = 129.82716369628906 len(local_sample) = 81
[Feb 13, 18:25:20] [0] Creating 1,024 partitions.
[Feb 13, 18:25:20] [0] *Estimated* 10,516 embeddings.
[Feb 13, 18:25:20] [0] #> Saving the indexing plan to .ragatouille/colbert/indexes/Miyazaki/plan.json ..
After about 30 minutes, I got the error:
WARNING clustering 9991 points to 1024 centroids: please provide at least 39936 training points
Clustering 9991 points in 128D to 1024 clusters, redo 1 times, 20 iterations
Preprocessing in 0.00 s
Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (512, 128) x (1024, 128)' = (512, 1024) gemm params m 1024 n 512 k 128 trA T trB N lda 128 ldb 128 ldc 1024
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
By contrast, the same script on a T4 GPU Colab ran very quickly, in around 5 minutes.
Any ideas?
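The numbers in the log above can be reproduced with a little arithmetic, which at least explains the Faiss warning (though not the cuBLAS failure). This is a sketch under two assumptions: that ColBERT picks the partition count as the largest power of two not exceeding 16 * sqrt(estimated embeddings), and that Faiss uses its default of 39 minimum training points per centroid. Both happen to match the logged values exactly.

```python
import math

# Values copied from the log above.
num_passages = 81
avg_doclen_est = 129.82716369628906
sampled_points = 9991

# "*Estimated* 10,516 embeddings"
estimated_embeddings = int(num_passages * avg_doclen_est)

# Assumption: partitions = largest power of two <= 16 * sqrt(n).
num_partitions = 2 ** math.floor(math.log2(16 * math.sqrt(estimated_embeddings)))

# Assumption: Faiss default of 39 training points per centroid.
MIN_POINTS_PER_CENTROID = 39
min_training_points = MIN_POINTS_PER_CENTROID * num_partitions

print(estimated_embeddings)   # 10516
print(num_partitions)         # 1024
print(min_training_points)    # 39936
print(sampled_points < min_training_points)  # True, hence the warning
```

In other words, 81 passages give far fewer points than k-means would like for 1,024 centroids, which is consistent with the thread's observation that very small collections are a poor fit for this index type.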
from ragatouille.
(Copy/pasting this message in a few related issues)
Hey guys!
Thanks a lot for bearing with me as I juggle everything and try to diagnose this. It’s complicated to fix with relatively little time to dedicate to it, as it seems like the dependencies causing issues aren’t the same for everyone, with no clear platform pattern as of yet. Overall, the issues center around the usual suspects of faiss and CUDA.
While because of this I can’t fix the issue with PLAID optimised indices just yet, I’m also noticing that most of the bug reports here are about relatively small collections (100s-to-low-1000s). To lower the barrier to entry as much as possible, #137 is introducing a second index format, which doesn’t actually build an index, but performs an exact search over all documents (as a stepping stone towards #110, which would use an HNSW index to be an in-between compromise between PLAID optimisation and exact search).
This approach doesn’t scale, but offers the best possible search accuracy & is still performed in a few hundred milliseconds at most for small collections. Ideally, it’ll also open up the way to shipping lower-dependency versions (#136)
The PR above (#137) is still a work in progress, as it needs CRUD support, tests, documentation, better precision routing (fp32/bfloat16) etc… (and potentially searching only a subset of document ids).
However, it’s working in a rough state for me locally. If you’d like to give it a try (with the caveat that it might very well break!), please feel free to install the library directly from the feat/full_vectors_indexing branch and add the following argument to your index() call:
index(…
    index_type="FULL_VECTORS",
)
Any feedback is appreciated, as always, and thanks again!
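To illustrate what "exact search over all documents" means for a late-interaction model like ColBERT, here is a toy sketch in plain Python. It is not the code from #137; it only shows the general MaxSim idea (for each query token, take the best-matching document token, then sum), with tiny hand-made 2-D vectors standing in for real token embeddings. All function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def exact_search(query_vecs, collection, k=2):
    """Score every document (no index) and return the top-k (doc_id, score) pairs."""
    scored = [(maxsim_score(query_vecs, doc), doc_id)
              for doc_id, doc in enumerate(collection)]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy data: each document is a list of token vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
collection = [
    [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    [[0.0, 0.5]],              # weak match on the second query token
]

print(exact_search(query, collection, k=2))  # [(0, 2.0), (1, 1.0)]
```

Every query touches every document, so cost grows linearly with collection size; that is why this is only attractive for the small collections discussed in this thread.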
from ragatouille.