Comments (9)
Hi @AmenRa, thanks for your help! I ran the sample code you provided and, sadly (?), everything worked fine: the output was as expected in both the console and the log file. It seems tqdm isn't the culprit here. I agree that the problem may lie in the fact that I'm using Windows. As I was saying, this kind of behavior seems to be linked to how files are opened under the hood. I'll try to dig deeper and keep you updated if I find anything interesting, since this behavior is strange and, I guess, not intended!
from retriv.
Hi @AmenRa, thanks for your support, I finally solved the issue.
The problem doesn't lie in retriv per se, but in the fact that it uses multiprocessing.
The logger was instantiated outside the if __name__ == "__main__" section of the code, so multiprocessing executed the instantiation more than once, leading to unexpected output. Once the instantiation of the logger was moved under the if, everything worked as expected. Leaving this here so other Windows users won't be caught out by this OS difference.
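For anyone landing here, a minimal sketch of the fix described above, using only the standard library. The names build_logger and main, the logger name, and the log path are hypothetical, and the Pool call is only a stand-in for whatever parallel work the library does; this is not retriv's code. The point is that the file handler, which opens the log in "w" mode, is created only inside the guarded block:

```python
import logging
import multiprocessing as mp
import os
import tempfile


def build_logger(log_path):
    """Hypothetical helper: log to the console and to a file opened in "w" mode."""
    logger = logging.getLogger("indexing_demo")
    logger.setLevel(logging.INFO)
    # Drop handlers from earlier calls so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    fmt = logging.Formatter("[%(asctime)s] %(levelname)s - %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_path, mode="w")):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def main(log_path):
    logger = build_logger(log_path)
    logger.info("Building index...")
    # Placeholder for the parallel work; abs() stands in for the real task.
    with mp.Pool(processes=2) as pool:
        pool.map(abs, [-1, -2, 3])
    logger.info("Index built.")


if __name__ == "__main__":
    # Child processes started with "spawn" re-import this module, but they
    # skip this block, so the log file is opened (and truncated) only once.
    main(os.path.join(tempfile.gettempdir(), "indexing_demo.log"))
```

On Windows, multiprocessing starts children with the "spawn" method, which re-imports the main module, so anything at module top level runs again in every child. If build_logger were called at top level, each child would reopen the file in "w" mode and truncate it.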
from retriv.
Could you provide a reproducible snippet?
from retriv.
Hi @AmenRa,
Here is a script that reproduces the error. The logger class only has this problem when SearchEngine().index() is run; if that call is commented out, the logs are printed correctly.
I may have done something wrong, but I can't seem to find the cause for this since every other test I've run didn't reproduce the issue.
from retriv.
Hi, I removed from commons.logger import Logger as I do not know what commons is.
Everything seems to work fine on my end.
I get this in both my terminal and the log file:
[2023-09-02 09:16:28,031] {binary_log.py:55} INFO - Building index...
[2023-09-02 09:16:28,303] {binary_log.py:57} INFO - Index built.
from retriv.
@AmenRa I checked and confirmed that even without the import the console log works fine, but the log file still has the problem. I tested this script on two different PCs, both running Windows. Maybe this is related to the operating system?
Also, could I ask about your configuration? Things like OS, Python version, and so on.
Thanks a lot
from retriv.
Hi @AmenRa I did some further testing and I found something interesting. If I change the order of the lines
logger.info("Building index...")
SearchEngine("new-index").index(collection, show_progress=False)
logger.info("Index built.")
to
logger.info("Building index...")
logger.info("Index built.")
SearchEngine("new-index").index(collection, show_progress=False)
the log file becomes empty. So, from the tests I conducted, it seems that retriv is somehow still sending something to stdout even with show_progress=False. If I comment out the indexing line, the log is shown correctly in the file. I'm running Windows 11 with Python 3.10.0. I don't know if I can give you any more information, but please tell me if you need something else to try to reproduce the error in your environment.
from retriv.
Hi @AmenRa, quick update: I tried replacing the logging library with loguru, and the issue still occurs when the file is opened in "w" mode. From further digging, I found that the log file contains NUL (yes, with just one "L") bytes before the last line. This seems to be related to the underlying file open call combined with multithreading/multiprocessing, which, given the high speed of retriv, I guess is being used under the hood. I'll see if I can dig up more information.
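One possible mechanism for those NUL bytes, sketched under assumptions rather than confirmed for this issue: if a second open of the same file in "w" mode truncates it while the first handle is still open, the first handle keeps its old write offset, and its next write leaves a gap that the OS fills with NUL bytes. The snippet below reproduces that in a single process (the function names and file name are hypothetical; I have only reasoned about this on POSIX-style semantics, and behavior on other platforms may differ):

```python
import os
import tempfile


def count_nul_bytes(path):
    """Return how many NUL (0x00) bytes the file contains."""
    with open(path, "rb") as fh:
        return fh.read().count(b"\x00")


def simulate_truncated_log(path):
    """Write with one handle, truncate via a second "w" open, then write again."""
    first = open(path, "w", newline="\n")
    first.write("first line\n")
    first.flush()  # the first handle's offset is now past "first line\n"

    # A second open in "w" mode truncates the file to zero length,
    # similar to what a re-executed logger setup would do.
    second = open(path, "w", newline="\n")
    second.close()

    # The first handle still writes at its old offset, so the bytes
    # before it are filled in with NULs.
    first.write("last line\n")
    first.flush()
    first.close()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    simulate_truncated_log(demo_path)
    print("NUL bytes found:", count_nul_bytes(demo_path))
```

count_nul_bytes can also be pointed at a real log file to check whether it shows the same symptom.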
from retriv.
Hi, show_progress is passed to a tqdm progress bar in this fashion:
tqdm(
    ...
    disable=not show_progress,
)
Try running a loop with tqdm like this to verify whether tqdm is (one of) the causes:
from tqdm import tqdm
logger.info("Building index...")
for _ in tqdm(range(10_000), disable=True):
    continue
logger.info("Index built.")
Other tqdm options that I use are:
desc="something"
dynamic_ncols=True
mininterval=0.5
Honestly, I am not aware of other things that may interfere with a logger.
Also, I do not have a Windows machine, so maybe the issue is related to Windows / Windows + tqdm.
from retriv.