
KeyError: ')' about articlepairmatching (closed)

bangliu commented on August 17, 2024
KeyError: ')'

from articlepairmatching.

Comments (7)

BangLiu commented on August 17, 2024
  1. I tried different keyword-extraction strategies. You can simply use keywords or main_keywords.
  2. The time is the document's publish time. Category is the topic category, such as "current events", "entertainment", or "economy", and it is assigned by a topic classification model. Because publish time is not available in every dataset, I don't use it.
  3. You can compute the IDF from the dataset itself or from a news-article corpus. I may have removed some low-frequency words; you can check the code.
  4. That is what my previous project does:
    https://arxiv.org/abs/1803.00189
    The code is also in my Git repository.
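Point 3 leaves the IDF computation open. Below is a minimal Python sketch of one standard way to do it, assuming each document is already tokenized into a list of words. The function name `compute_idf` and the `min_df` cut-off for dropping low-frequency words are hypothetical and not taken from the repository, so the values it produces may not match the IDF.txt shipped with the code.

```python
import math
from collections import Counter


def compute_idf(docs, min_df=2):
    """Return {term: idf} with idf = log(N / df), where N is the number of
    documents and df is the number of documents containing the term.

    Terms appearing in fewer than min_df documents are dropped
    (a hypothetical low-frequency cut-off, not the repository's setting).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n_docs / count)
            for term, count in df.items() if count >= min_df}


docs = [["economy", "market"], ["economy", "election"], ["economy", "market"]]
idf = compute_idf(docs)
print(sorted(idf.items()))
```

Here "economy" appears in every document and gets an IDF of 0, "market" is kept with a positive IDF, and "election" (only one document) is dropped by the `min_df=2` cut-off.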

BangLiu commented on August 17, 2024

This means word_to_ix doesn't contain the key ")". Can you check what it contains? ")" should be in this dictionary.

BangLiu commented on August 17, 2024

In main.py:
word_to_ix = {word: i for i, word in enumerate(W2V)}
You may have a decoding problem: different platforms can handle the encoding and decoding of Chinese text differently. Try removing `.decode("utf-8")`, or inspect word_to_ix to find the problem.
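One way such a KeyError can arise, assuming Python 3: if vocabulary words are loaded as bytes while the tokens being looked up are str, then b")" and ")" are different dictionary keys. The sketch below is hypothetical (`build_word_to_ix` and `missing_tokens` are illustrative names, not functions in main.py); it normalizes keys to str and lists the tokens that would fail the lookup.

```python
def build_word_to_ix(vocab):
    """Map each vocabulary word to its index, decoding bytes keys as UTF-8.

    Hypothetical helper: normalizes keys so that str tokens can be looked up.
    """
    word_to_ix = {}
    for i, word in enumerate(vocab):
        if isinstance(word, bytes):
            word = word.decode("utf-8")
        word_to_ix[word] = i
    return word_to_ix


def missing_tokens(tokens, word_to_ix):
    """Return the tokens that would raise KeyError in word_to_ix[token]."""
    return [token for token in tokens if token not in word_to_ix]


# Mixed bytes/str vocabulary, as might come from a file read without decoding.
vocab = [b")", b"(", "新闻"]
naive = {word: i for i, word in enumerate(vocab)}
print(missing_tokens([")", "新闻"], naive))  # [')']
fixed = build_word_to_ix(vocab)
print(missing_tokens([")", "新闻"], fixed))  # []
```

Running `missing_tokens` on a sample of real input tokens is a quick way to see whether the problem is encoding (bytes vs. str keys) or a genuinely missing word.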

tingwt commented on August 17, 2024

Thanks, I have solved the problem.

BangLiu commented on August 17, 2024

Is it due to encoding?

tingwt commented on August 17, 2024

Yes, and I also have some other questions.
1. The data contains keywords, main_keywords, and ner_keywords. What are the differences, especially for ner_keywords?
2. The data contains the category and time of both documents, but the code only uses category1. Why? Would these two features influence the model? And how are the categories defined?
dataset2featurefile(
"../../../../data/raw/event-story-cluster/same_event_doc_pair.txt",
"../../../../data/processed/event-story-cluster/same_event_doc_pair.cd.debug.json",
"label", "category1", "time1", "time2", "content1", "content2",
["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
col_title1=None, col_title2=None, use_cd=True,
draw_fig=True, parallel=False, extract_range=range(2), print_fig=True)
3. Does IDF.txt contain all words? And what method did you use to get the model?
4. Can it be used to organize how a news story develops over time?

tingwt commented on August 17, 2024

Thanks for your explanations.
Great job!
