larsaars / kip_einfacherklaert
AI Project - Group Einfach Erklärt
License: Apache License 2.0
What can we do about it?
2024-06-01 12:30:56,376 - INFO - Saving https://mdr.de/nachrichten/deutschland/panorama/regen-starkregen-wetter-hochwasserwarnung-sachsen-anhalt-thueringen-100.html
Traceback (most recent call last):
File "c:\Users\felix\OneDrive\Vorlesungen\6_Semester\KIP\KIP_EinfachErklaert\scrapers\mdr\current_news_scraper.py", line 93, in <module>
MDRCurrentScraper().scrape()
File "c:\Users\felix\OneDrive\Vorlesungen\6_Semester\KIP\KIP_EinfachErklaert\scrapers\mdr\current_news_scraper.py", line 87, in scrape
self.matcher.match_by_hand(easy_article_url, hard_article_url)
File "C:\Users/felix/OneDrive/Vorlesungen/6_Semester/KIP/KIP_EinfachErklaert\matchers\SimpleMatcher.py", line 29, in match_by_hand
hard = self.data_handler.search_by("h", "url", hard)
File "C:\Users/felix/OneDrive/Vorlesungen/6_Semester/KIP/KIP_EinfachErklaert\datahandler\DataHandler.py", line 122, in search_by
return self.helper._search_url_in_lookup(dir, attribute_value)
File "C:\Users/felix/OneDrive/Vorlesungen/6_Semester/KIP/KIP_EinfachErklaert\datahandler\DataHandler.py", line 296, in _search_url_in_lookup
df = pd.read_csv(table)
File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 948, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 611, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1448, in __init__
self._engine = self._make_engine(f, self.engine)
File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1723, in _make_engine
return mapping[engine](f, **self.options)
File "C:\Users\felix\anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "parsers.pyx", line 579, in pandas._libs.parsers.TextReader.__cinit__
File "parsers.pyx", line 668, in pandas._libs.parsers.TextReader._get_header
File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2050, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 1190: invalid start byte
This might be a Windows-specific error, but I'm not sure.
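A likely cause, not confirmed in this thread: byte 0x96 is not valid UTF-8, but in Windows-1252 (cp1252), the usual default text encoding on German and Western European Windows installs, it is an en dash (U+2013). That fits the Windows suspicion: the lookup CSV read by `_search_url_in_lookup` was probably written or edited with cp1252 at some point. Below is a minimal sketch of a hypothetical helper, `read_lookup_csv`, that tries UTF-8 first and falls back to cp1252; the encoding list is an assumption, not something the project defines.

```python
import pandas as pd


def read_lookup_csv(path, encodings=("utf-8", "cp1252")):
    """Read a CSV, trying each encoding in order (hypothetical helper).

    The default order is an assumption: UTF-8 first, then Windows-1252,
    in which byte 0x96 decodes to an en dash.
    """
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as err:
            # Remember the failure and try the next candidate encoding.
            last_error = err
    raise ValueError("no encodings given") if last_error is None else last_error
```

The fallback is a stopgap. If one file mixes both encodings (for example UTF-8 rows appended to a cp1252 file), decoding it all as cp1252 turns UTF-8 umlauts into mojibake such as "Ã¤". The cleaner fix is to re-save the file once as UTF-8 and always pass `encoding="utf-8"` when writing it.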
When scrape_Deutschlandfunk.py is run, this error message appears:
benreher@im-kigs:/data/projects/einfach/KIP_EinfachErklaert/scrapers/dlf$ python3 scrape_Deutschlandfunk.py
Traceback (most recent call last):
File "/data/projects/einfach/KIP_EinfachErklaert/scrapers/dlf/scrape_Deutschlandfunk.py", line 19, in <module>
DeutschlandfunkScraper().scrape()
File "/data/projects/einfach/KIP_EinfachErklaert/scrapers/dlf/DLFScrapers.py", line 66, in scrape
if not self.data_handler.is_already_saved(self.difficulty_level, article_url):
File "/data/projects/einfach/KIP_EinfachErklaert/services/DataHandler.py", line 137, in is_already_saved
if self.search_by(dir, "url", url) == None:
File "/data/projects/einfach/KIP_EinfachErklaert/services/DataHandler.py", line 111, in search_by
return self.helper._search_url_in_lookup(dir, attribute_value)
File "/data/projects/einfach/KIP_EinfachErklaert/services/DataHandler.py", line 265, in _search_url_in_lookup
res = df.loc[df["url"].str.contains(url), "path"]
File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 925, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 1100, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 862, in _getitem_lowerdim
return getattr(section, self.name)[new_key]
File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 931, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/usr/lib/python3/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
elif com.is_bool_indexer(key):
File "/usr/lib/python3/dist-packages/pandas/core/common.py", line 139, in is_bool_indexer
raise ValueError(na_msg)
ValueError: Cannot mask with non-boolean array containing NA / NaN values
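`str.contains` returns NaN for rows whose `url` cell is empty, and `.loc` refuses a mask containing NaN, which is exactly the `ValueError` above; so the lookup CSV presumably has at least one row without a URL. pandas' `Series.str.contains` has an `na` parameter for this case, and `na=False` treats missing values as non-matches. The function below is a hypothetical standalone version of `_search_url_in_lookup` (only its line 265 is visible here), assuming a DataFrame with `url` and `path` columns and that `None` means "not found", as the `== None` check in `is_already_saved` suggests. It also passes `regex=False`, because `str.contains` treats the pattern as a regular expression by default and URLs contain `.` and `?`.

```python
import pandas as pd


def search_url_in_lookup(df: pd.DataFrame, url: str):
    """Return the stored path for `url`, or None if no row contains it.

    Hypothetical sketch; assumes `df` has "url" and "path" columns.
    """
    # na=False: rows with a missing "url" count as "no match" instead of
    # putting NaN into the boolean mask.
    # regex=False: match the URL literally, so "." and "?" are not regex syntax.
    mask = df["url"].str.contains(url, na=False, regex=False)
    res = df.loc[mask, "path"]
    return None if res.empty else res.iloc[0]
```

With the default `regex=True`, a lookup for `https://b.de/y?z=1` would not even match an identical stored URL, because `?` is read as a quantifier. If exact matching is what's intended, `df["url"] == url` avoids both problems, since comparing NaN to a string yields `False`.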
Traceback (most recent call last):
File "/home/me/projects/KIP_EinfachErklaert/scrapers/mdr/./current_news_scraper.py", line 93, in <module>
MDRCurrentScraper().scrape()
File "/home/me/projects/KIP_EinfachErklaert/scrapers/mdr/./current_news_scraper.py", line 87, in scrape
self.matcher.match_by_hand(easy_article_url, hard_article_url)
File "/home/me/projects/KIP_EinfachErklaert/matchers/SimpleMatcher.py", line 36, in match_by_hand
self.write_match(easy, hard)
File "/home/me/projects/KIP_EinfachErklaert/matchers/BaseMatcher.py", line 28, in write_match
with open(self.file, "a", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/mdr/matches_mdr.csv'
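`open(path, "a")` creates a missing file but not missing directories, and a relative path like `./data/mdr/matches_mdr.csv` is resolved against the current working directory, not the repository. The `/./current_news_scraper.py` in the traceback suggests the script was started from `scrapers/mdr`, so Python likely looked for `scrapers/mdr/data/mdr/`. Running the script from the repository root may already be enough. A more robust variant is sketched below as a hypothetical helper, `append_match`, which resolves the file against an explicit project root and creates parent folders first; the two-column CSV layout is an assumption.

```python
import csv
from pathlib import Path


def append_match(root: Path, relative: str, easy_url: str, hard_url: str) -> Path:
    """Append one (easy, hard) URL pair to a CSV below `root` (hypothetical helper)."""
    # Resolve against an explicit project root instead of the current working
    # directory, so the result does not depend on where the script is started.
    file = Path(root) / relative
    # open(..., "a") creates the file itself, but not its parent folders.
    file.parent.mkdir(parents=True, exist_ok=True)
    with open(file, "a", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow([easy_url, hard_url])
    return file
```

Inside the repository, the root could be derived from the module's own location, e.g. `Path(__file__).resolve().parents[1]` when called from `matchers/BaseMatcher.py`.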