
algoprog / quin

69.0 10.0 7.0 53 KB

An easy to use framework for large-scale fact-checking and question answering

License: MIT License

Python 100.00%
semantic-search search-engine question-answering fact-checking retrieval nlp qr-bert sentence-embeddings

quin's Introduction

Hi, I'm Chris 👋

I'm a PhD student in Information Retrieval at the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst, supervised by Hamed Zamani. I'm currently doing research in conversational search.

👔 linkedin | 🐤 twitter | 📰 google scholar | 🏠 website


📑 Latest Research


💻 Latest Projects

  • Faspect: A library with various model implementations for open domain query facet extraction and generation
  • Quin: A framework for large-scale fact-checking and question answering

quin's People

Contributors

algoprog, dependabot[bot]

quin's Issues

Python version for installation

Hi, sorry to disturb haha. When I cloned the repo and tried to follow the instructions in the README, installing requirements.txt always failed. I tried Python 2.7, Python 3.6 and Python 3.8: with Python 3.8 the install of scipy==1.2.1 failed, and with Python 2.7 and Python 3.6 the install of faiss_cpu==1.6.3 failed.

Could you please share more details about the installation, such as the required Python version? Thank you so much! : )

System: Windows 10
IDE: PyCharm Community 2020.01

Demo interface

Hi,
Nice project! I am working on a similar problem and I was wondering if your Web interface used in the demo is available anywhere. Cheers!

ModuleNotFoundError: No module named 'models.text_encoder'

I have downloaded the NLI model weights and the Dense Encoder FC weights into the model/weights folder.
After extracting them, the model/weights directory looks like this:
image

I also had to modify the following line, otherwise it would not detect my model:
self.text_embedding_model = SentenceTransformer('{}/encoder/qrbert-factcheck'.format(models_path), device=device)

Now, I am getting the following error:
image

Can you help resolve this error and show exactly how to configure your codebase?
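A generic way to narrow down a ModuleNotFoundError like this one is to check whether the interpreter can locate the module from the directory the script is launched in. The sketch below is only a diagnostic, not a fix confirmed by the repository author. `can_import` and its `project_root` argument are hypothetical names introduced for illustration.

```python
import importlib
import importlib.util
import sys


def can_import(module_name, project_root=None):
    """Return True if `module_name` can be located by the import system.

    `project_root` is a hypothetical path to the directory that should
    contain the top-level package (for example, the cloned repository).
    """
    if project_root is not None and project_root not in sys.path:
        # Make the candidate root visible to the import system.
        sys.path.insert(0, project_root)
    importlib.invalidate_caches()
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # find_spec imports parent packages first; a missing parent
        # (e.g. `models`) surfaces here as ModuleNotFoundError.
        return False


if __name__ == "__main__":
    # Run this from the directory you normally start the project from.
    print("models.text_encoder importable:", can_import("models.text_encoder"))
```

If this prints False when run from the repository root, the package directory name or the working directory is a likely place to look; that is an assumption to verify against the actual repository layout.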

Questions about datasets used to train dense encoder models

Hello,
I read both of your papers that are linked in this repo and went through the code, and I'm a little bit confused about what datasets were used to train each of the dense encoder models that are listed in the README.
From reading your papers, I'm assuming that (1) the dense encoder FC model was trained on the NLI and Factual NLI datasets; (2) the dense encoder QA model was pretrained on NLI and Factual NLI and fine-tuned on MSMARCO; and (3) the dense encoder M model was trained only on the Factual NLI+ dataset. Could you please confirm whether this is correct? Thanks a lot!

Sparse Index not building properly

Hello,

I have found a small error in how the sparse index is built: there is a mistake in the _calc_idf function for sparse indexes.

In this function you use an eps value to replace negative idf values, but the code is slightly faulty.

  1. First, you use r_freq to detect a negative idf value. Why not check the idf value itself?
  2. Your statement:
if r_freq > 0.5:
       continue

is not correct. When r_freq > 0.5 (that is, when the idf is negative), the continue skips the rest of the loop body, including:

if idf < 0:
   negative_idfs.append(word)

So words whose idf is negative never appear in the idf dictionary. You can confirm this by looking up, in your idf dictionary, a word that occurs in more than 50% of the documents in the corpus.

I would suggest changing the _calc_idf function to:

    def _calc_idf(self, nd):
        """
        Calculates frequencies of terms in documents and in corpus.
        This algorithm sets a floor on the idf values to eps * average_idf
        """
        # collect idf sum to calculate an average idf for epsilon value
        idf_sum = 0
        # collect words with negative idf to set them a special epsilon value.
        # idf can be negative if word is contained in more than half of documents
        negative_idfs = []
        for word, freq in nd.items():
            idf = math.log(self.corpus_size - freq + 0.5) - math.log(freq + 0.5)
            #r_freq = float(freq) / self.corpus_size
            if idf < 0:
                negative_idfs.append(word)
                continue
            self.idf[word] = idf
            idf_sum += idf
        self.average_idf = idf_sum / len(self.idf)

        eps = self.epsilon * self.average_idf
        for word in negative_idfs:
            self.idf[word] = eps

This ensures that negative idf values are replaced with eps (epsilon * average_idf) and that the affected words are present in the idf dictionary.
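To make the proposed behaviour easy to check outside the codebase, here is a minimal standalone sketch of the same logic. It is written as a plain function instead of a method. The name `calc_idf`, the default `epsilon=0.25`, and the guard against an empty dictionary are assumptions added for illustration, not part of the repository.

```python
import math


def calc_idf(doc_freqs, corpus_size, epsilon=0.25):
    """Compute IDF values, flooring negative ones at epsilon * average_idf.

    doc_freqs maps each word to the number of documents containing it.
    The default epsilon is an assumed value for this sketch.
    """
    idf = {}
    negative_words = []
    idf_sum = 0.0
    for word, freq in doc_freqs.items():
        value = math.log(corpus_size - freq + 0.5) - math.log(freq + 0.5)
        if value < 0:
            # Word occurs in more than half of the documents.
            negative_words.append(word)
            continue
        idf[word] = value
        idf_sum += value
    # As in the suggestion above, the average covers non-negative IDFs only.
    # The empty-dictionary guard is an addition to avoid division by zero.
    average_idf = idf_sum / len(idf) if idf else 0.0
    eps = epsilon * average_idf
    for word in negative_words:
        idf[word] = eps
    return idf, average_idf


# "the" appears in 9 of 10 documents, so its raw IDF is negative.
idf, average_idf = calc_idf({"the": 9, "quin": 1, "fact": 5}, corpus_size=10)
print("the" in idf, idf["the"] == 0.25 * average_idf)
```

With this version, a word that occurs in more than half of the documents still gets an entry in the dictionary, which is the behaviour the issue asks for.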
