kip_einfacherklaert's People

Contributors

ben-qre, felixdigitalis, larsaars, simesway

kip_einfacherklaert's Issues

.

what can we do about it?

Invalid characters from the MDR scraper cause problems with pandas DataFrames

2024-06-01 12:30:56,376 - INFO - Saving https://mdr.de/nachrichten/deutschland/panorama/regen-starkregen-wetter-hochwasserwarnung-sachsen-anhalt-thueringen-100.html
Traceback (most recent call last):
  File "c:\Users\felix\OneDrive\Vorlesungen\6_Semester\KIP\KIP_EinfachErklaert\scrapers\mdr\current_news_scraper.py", line 93, in <module>
    MDRCurrentScraper().scrape()
  File "c:\Users\felix\OneDrive\Vorlesungen\6_Semester\KIP\KIP_EinfachErklaert\scrapers\mdr\current_news_scraper.py", line 87, in scrape
    self.matcher.match_by_hand(easy_article_url, hard_article_url)
  File "C:\Users/felix/OneDrive/Vorlesungen/6_Semester/KIP/KIP_EinfachErklaert\matchers\SimpleMatcher.py", line 29, in match_by_hand
    hard = self.data_handler.search_by("h", "url", hard)
  File "C:\Users/felix/OneDrive/Vorlesungen/6_Semester/KIP/KIP_EinfachErklaert\datahandler\DataHandler.py", line 122, in search_by
    return self.helper._search_url_in_lookup(dir, attribute_value)
  File "C:\Users/felix/OneDrive/Vorlesungen/6_Semester/KIP/KIP_EinfachErklaert\datahandler\DataHandler.py", line 296, in _search_url_in_lookup
    df = pd.read_csv(table)
  File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1723, in _make_engine
    return mapping[engine](f, **self.options)
  File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "parsers.pyx", line 579, in pandas._libs.parsers.TextReader.__cinit__
  File "parsers.pyx", line 668, in pandas._libs.parsers.TextReader._get_header
  File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2050, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 1190: invalid start byte

This may be a Windows-specific error, but I am not sure.
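A note on the byte in the error: 0x96 is the en dash in Windows-1252 (cp1252), and it is never a valid start byte in UTF-8. That fits the guess that the lookup CSV was written on Windows without an explicit encoding, although the traceback alone does not prove it. Below is a minimal sketch of a fallback decoder using only the standard library. The function name `decode_with_fallback` and the sample row are made up for illustration and are not part of the project.

```python
import csv
import io


def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Try each encoding in order; return the decoded text and the encoding used."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode the data")


# Hypothetical lookup row containing an en dash, written the way a Windows
# default-encoded open() would write it (cp1252 turns the dash into byte 0x96).
raw = "url,title\nhttps://example.org/a,Regen \u2013 Warnung\n".encode("cp1252")

text, used = decode_with_fallback(raw)
rows = list(csv.DictReader(io.StringIO(text)))
```

The detected encoding name can also be passed to `pd.read_csv` through its `encoding` parameter. The more durable fix is probably to pass `encoding="utf-8"` wherever the lookup file is written, so the file is the same on Windows and Linux.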

scraper_Deutschlandfunk.py or DataHandler error

Running scraper_Deutschlandfunk.py produces this error message:

benreher@im-kigs:/data/projects/einfach/KIP_EinfachErklaert/scrapers/dlf$ python3 scrape_Deutschlandfunk.py
Traceback (most recent call last):
  File "/data/projects/einfach/KIP_EinfachErklaert/scrapers/dlf/scrape_Deutschlandfunk.py", line 19, in <module>
    DeutschlandfunkScraper().scrape()
  File "/data/projects/einfach/KIP_EinfachErklaert/scrapers/dlf/DLFScrapers.py", line 66, in scrape
    if not self.data_handler.is_already_saved(self.difficulty_level, article_url):
  File "/data/projects/einfach/KIP_EinfachErklaert/services/DataHandler.py", line 137, in is_already_saved
    if self.search_by(dir, "url", url) == None:
  File "/data/projects/einfach/KIP_EinfachErklaert/services/DataHandler.py", line 111, in search_by
    return self.helper._search_url_in_lookup(dir, attribute_value)
  File "/data/projects/einfach/KIP_EinfachErklaert/services/DataHandler.py", line 265, in _search_url_in_lookup
    res = df.loc[df["url"].str.contains(url), "path"]
  File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 925, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 1100, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 862, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
    elif com.is_bool_indexer(key):
  File "/usr/lib/python3/dist-packages/pandas/core/common.py", line 139, in is_bool_indexer
    raise ValueError(na_msg)
ValueError: Cannot mask with non-boolean array containing NA / NaN values
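The ValueError is raised when the mask from `df["url"].str.contains(url)` contains missing values, which happens if at least one row of the lookup table has an empty `url` cell. One possible fix, sketched below, is to pass `na=False` so those rows count as non-matches, and `regex=False` so that characters such as `.` and `?` in a URL are compared literally. The `lookup` frame and the function name `search_url` are invented for this example; only the `url` and `path` column names come from the traceback.

```python
import pandas as pd

# Hypothetical lookup table with one broken row (missing url).
lookup = pd.DataFrame(
    {
        "url": ["https://example.org/a.html", None, "https://example.org/b.html"],
        "path": ["data/a", "data/broken", "data/b"],
    }
)


def search_url(df, url):
    """Return the path of the first row whose url contains `url`, else None."""
    # na=False maps missing urls to False instead of NaN, so the mask is boolean.
    # regex=False treats the url as a plain substring, not a regular expression.
    mask = df["url"].str.contains(url, na=False, regex=False)
    res = df.loc[mask, "path"]
    return res.iloc[0] if not res.empty else None
```

This only hides the symptom. It may still be worth checking why a row without a URL ended up in the lookup file in the first place.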

I integrated the simple matcher into my scraper and again got a file-not-found error from the matcher:

Traceback (most recent call last):
  File "/home/me/projects/KIP_EinfachErklaert/scrapers/mdr/./current_news_scraper.py", line 93, in <module>
    MDRCurrentScraper().scrape()
  File "/home/me/projects/KIP_EinfachErklaert/scrapers/mdr/./current_news_scraper.py", line 87, in scrape
    self.matcher.match_by_hand(easy_article_url, hard_article_url)
  File "/home/me/projects/KIP_EinfachErklaert/matchers/SimpleMatcher.py", line 36, in match_by_hand
    self.write_match(easy, hard)
  File "/home/me/projects/KIP_EinfachErklaert/matchers/BaseMatcher.py", line 28, in write_match
    with open(self.file, "a", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/mdr/matches_mdr.csv'
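Opening a file in append mode creates the file itself but not its parent directories, and a path starting with `./` is resolved against the directory the script was started from (here apparently `scrapers/mdr`). Either of these could explain the error. A minimal sketch of a more forgiving writer follows. `append_match` is a hypothetical helper, not the project's `write_match`, and the temporary directory only stands in for the real data folder.

```python
import csv
import tempfile
from pathlib import Path


def append_match(match_file, easy, hard):
    """Append one easy/hard pair, creating missing parent directories first."""
    match_file = Path(match_file)
    # open(..., "a") creates the file but fails if the directory is missing.
    match_file.parent.mkdir(parents=True, exist_ok=True)
    with open(match_file, "a", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow([easy, hard])
    return match_file


# Demo in a throwaway directory where data/mdr does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "mdr" / "matches_mdr.csv"
    append_match(target, "easy-url", "hard-url")
    append_match(target, "easy-url-2", "hard-url-2")
    lines = target.read_text(encoding="utf-8").splitlines()
```

To make the location independent of the working directory, the base path could be built from `Path(__file__).resolve()` inside the matcher module instead of from `./data`. Whether that matches the intended project layout is an assumption.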
