Comments (6)
Writing down some notes:
- Eyeballing the output, the T5-based highlighter as implemented seems to yield worse results (e.g., more seemingly random highlights of completely unrelated material, such as extraneous CC-BY attribution text).
- We tried a dynamic query representation as well as a fixed query representation, i.e., `f'Query: {query} Document: {document} Relevant:'` vs. `'{query}'` and `'{document}'`.
- If we want to take advantage of caching the reranker, the T5-based highlighter is limited to a maximum sequence length of 256, whereas BioBERT is limited to 512.
- The T5-based highlighter is 0-25% faster with reranker caching and 10-20% slower without.
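To make the two representations and the token budgets above concrete, here is a minimal sketch. It assumes that the "dynamic" representation is the joint `Query: ... Document: ... Relevant:` string and that the "fixed" one truncates `{query}` and `{document}` separately. Whitespace splitting is a hypothetical stand-in for the real T5 and BioBERT subword tokenizers, and `tokenize`, `joint_input`, and `separate_inputs` are illustrative names, not Covidex code.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a real subword tokenizer (T5 or BioBERT);
    # whitespace splitting keeps the sketch self-contained.
    return text.split()


TEMPLATE_TOKENS = tokenize("Query: Document: Relevant:")


def joint_input(query: str, document: str, max_len: int) -> str:
    """Dynamic representation (assumed): query and document share one sequence,
    so the document only gets the budget left after the template and query."""
    query_tokens = tokenize(query)[: max(max_len - len(TEMPLATE_TOKENS), 0)]
    budget = max_len - len(TEMPLATE_TOKENS) - len(query_tokens)
    doc_tokens = tokenize(document)[: max(budget, 0)]
    return f"Query: {' '.join(query_tokens)} Document: {' '.join(doc_tokens)} Relevant:"


def separate_inputs(query: str, document: str, max_len: int) -> Tuple[str, str]:
    """Fixed representation (assumed): query and document are truncated
    independently, so each side gets the full max_len budget."""
    query_text = " ".join(tokenize(query)[:max_len])
    doc_text = " ".join(tokenize(document)[:max_len])
    return query_text, doc_text


if __name__ == "__main__":
    doc = "word " * 600
    for limit in (256, 512):
        kept = len(tokenize(joint_input("covid vaccine efficacy", doc, limit)))
        print(limit, kept)
```

With the joint string, every template and query token eats into the same 256- or 512-token budget, so the lower limit truncates more of the document; with separate inputs, each side is truncated on its own.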
from covidex.
I think the main problem is that we are using 256 tokens for the reranker. Could you please try increasing it to 512 tokens? The increase in latency might be small, because we were underutilizing the GPU when feeding it 256 tokens.
Also, since we will then have a spare GPU, we can use it to cut the inference time in half (but we can leave that for another PR).
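Splitting the work across two GPUs could be sketched as below, assuming the candidate documents can be scored independently of each other. `score_on`, the device names, and the helper functions are hypothetical placeholders for the real model call; the ideal 2x speedup only holds if both devices are equally fast and per-shard scoring dominates any dispatch overhead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def shard(items: Sequence[str], n_shards: int) -> List[List[str]]:
    """Split items into n_shards contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        end = start + size + (1 if i < extra else 0)
        shards.append(list(items[start:end]))
        start = end
    return shards


def score_in_parallel(
    docs: Sequence[str],
    devices: Sequence[str],
    score_on: Callable[[str, List[str]], List[float]],
) -> List[float]:
    """Score each shard on its own device concurrently, then flatten in input order."""
    shards = shard(docs, len(devices))
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        # pool.map yields results in submission order, so contiguous shards
        # flatten back into the original document order.
        results = pool.map(score_on, devices, shards)
    return [score for shard_scores in results for score in shard_scores]


if __name__ == "__main__":
    # Hypothetical scorer: a real one would run the reranker on the given device.
    fake_score = lambda device, batch: [float(len(d)) for d in batch]
    print(score_in_parallel(["a", "bb", "ccc"], ["cuda:0", "cuda:1"], fake_score))
```

Contiguous shards keep the merge trivial; round-robin sharding would balance uneven document lengths better but would need an explicit re-ordering step.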
Sure, but the results won't be the same as the TensorFlow implementation. Is that okay?
I guess I can evaluate it on R04.
Yeah, evaluating on R04 is an even better idea.
Can you shed some light on what you mean by highlighting?
Can I get a link to this BioBERT highlighter?
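As a rough illustration of what "highlighting" refers to in this thread (marking the sentences of a retrieved document that best match the query), a minimal sketch follows. It ranks sentences by cosine similarity to the query, using a bag-of-words vector as a hypothetical stand-in for BioBERT embeddings; it is not Covidex's actual highlighter, and `bow`, `cosine`, and `highlight` are illustrative names.

```python
import math
import re
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Bag-of-words vector; a hypothetical stand-in for a BioBERT embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def highlight(query: str, document: str, k: int = 2) -> List[str]:
    """Return the k sentences most similar to the query, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    q = bow(query)
    ranked = sorted(range(len(sentences)), key=lambda i: cosine(q, bow(sentences[i])), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = ("Vaccines reduce severe COVID-19. This article is licensed under CC-BY. "
           "Masks reduce transmission.")
    print(highlight("covid vaccines", doc, k=1))
```

With `k=1`, the query "covid vaccines" selects only the vaccine sentence, because the CC-BY attribution sentence shares no terms with the query; a highlighter that still picks such sentences is showing the failure mode described in the first comment.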
Related Issues (20)
- The Covidex ai website is offline for a few days now
- Is Covidex now Cydex?
- Is covidex.ai down?
- a query works for basic but fails for neural
- Add link to our arxiv paper on landing page
- add the acknowledgement of CIFAR
- Refactoring and integration with pygaggle
- Discussion: faceting and pagination in multistage ranking
- sorting
- Doesn't retrieve important article
- Explore Bing search dataset for Coronavirus Intent
- Update index of basic covidex
- Retrieve related articles metadata from Pyserini index
- Add x65han to landing page
- Refactor to use pygaggle
- Build HNSW index in covidex
- No requirements.txt and .env.sample in repo
- No update-anserini.sh in repo.
- E tensorflow/core/platform/cloud/curl_http_request.cc:611]
- Docker image + index modification request