crccheck / atx-bandc
Scrape Austin, TX Boards and Commissions into RSS feeds
Home Page: https://bandc.crccheck.com/
License: BSD 3-Clause "New" or "Revised" License
because 'date' is usually in the future for new items
that way I can Google things
so we can use it in the url for the document detail.
non-documents like videos won't work though
Heroku won't work since I can't make thumbnails there. Maybe DigitalOcean.
I think the scraper should go back in time once in a while
only an issue for the rss feed since html pages just use dates
convert -transparent white -fuzz 10% 211162.pdf z.png
convert is even available on Heroku. Use DreamObjects to host.
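A minimal sketch of how the convert invocation above could be wrapped for use from the scraper, assuming ImageMagick's convert binary is on the PATH. The function names build_convert_command and make_thumbnail are hypothetical and not taken from the project. The argument order mirrors the command in the note.

```python
import subprocess


def build_convert_command(pdf_path, png_path, fuzz_percent=10):
    """Build the argument list for the ImageMagick call from the note.

    Hypothetical helper: the flags and their order are copied from the
    command above; only the paths and fuzz value are parameterized.
    """
    return [
        "convert",
        "-transparent", "white",
        "-fuzz", f"{fuzz_percent}%",
        str(pdf_path),
        str(png_path),
    ]


def make_thumbnail(pdf_path, png_path, fuzz_percent=10):
    """Run convert; raises CalledProcessError if it exits non-zero.

    Assumes the convert binary is installed on the host.
    """
    subprocess.run(
        build_convert_command(pdf_path, png_path, fuzz_percent),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting issues with the % in the fuzz value.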
ERROR bandc.apps.agenda.pdf PDF scrape error on EDIMS: 327897 Error: %d format: a number is required, not bytes
ERROR bandc.apps.agenda.pdf PDF scrape error on EDIMS: 334453 Error: int() argument must be a string, a bytes-like object or a number, not 'PSKeyword'
dataset uses SQLAlchemy under the hood; I should give it another shot.
Scraping "Urban Transportation Commission" id #50
Scraped "Urban Transportation Commission"
Scraping "Low Income Consumer Advisory Task Force" id #130
Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/query.py", line 581, in get_or_create
    return self.get(**kwargs), False
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/query.py", line 435, in get
    raise self.model.DoesNotExist(
bandc.apps.agenda.models.Document.DoesNotExist: Document matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/app/.venv/lib/python3.10/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: UNIQUE constraint failed: agenda_document.edims_id

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/manage.py", line 21, in <module>
    main()
  File "/app/manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/app/.venv/lib/python3.10/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/app/.venv/lib/python3.10/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/app/.venv/lib/python3.10/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/app/.venv/lib/python3.10/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/app/bandc/apps/agenda/management/commands/scrape.py", line 48, in handle
    bandc.pull()
  File "/app/bandc/apps/agenda/models.py", line 92, in pull
    return pull_bandc(self)
  File "/app/bandc/apps/agenda/utils.py", line 200, in pull_bandc
    should_process_next = _save_page(meeting_data, doc_data, bandc=bandc)
  File "/app/bandc/apps/agenda/utils.py", line 140, in _save_page
    doc, created = Document.objects.get_or_create(
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/query.py", line 588, in get_or_create
    return self.create(**params), True
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/base.py", line 739, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/base.py", line 776, in save_base
    updated = self._save_table(
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/base.py", line 881, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/base.py", line 919, in _do_insert
    return manager._insert(
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/app/.venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/app/.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/app/.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/app/.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute
    with self.db.wrap_database_errors:
  File "/app/.venv/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/app/.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/app/.venv/lib/python3.10/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: UNIQUE constraint failed: agenda_document.edims_id
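The traceback shows get_or_create failing its lookup and then hitting the UNIQUE constraint on agenda_document.edims_id during the insert. One common way this pair of errors arises is a lookup that filters on more than the unique column, so an existing row with the same edims_id but different other values is missed. The traceback does not confirm that this is the cause here. The sketch below illustrates the failure-safe pattern with plain sqlite3: look up by the unique key only and treat the other values as defaults for the insert. The title column and the helper names are hypothetical. In Django, the analogous call would be get_or_create(edims_id=..., defaults={...}).

```python
import sqlite3


def make_db():
    """Create an in-memory table with a unique edims_id (title is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agenda_document ("
        "id INTEGER PRIMARY KEY, edims_id INTEGER UNIQUE, title TEXT)"
    )
    return conn


def fetch_by_edims_id(conn, edims_id):
    return conn.execute(
        "SELECT id, edims_id, title FROM agenda_document WHERE edims_id = ?",
        (edims_id,),
    ).fetchone()


def get_or_create_document(conn, edims_id, title):
    """Return (row, created), looking up by the unique key only.

    title is applied only when a new row is inserted, so a changed title
    on an already-scraped document cannot trigger a duplicate insert.
    """
    row = fetch_by_edims_id(conn, edims_id)
    if row is not None:
        return row, False
    try:
        cur = conn.execute(
            "INSERT INTO agenda_document (edims_id, title) VALUES (?, ?)",
            (edims_id, title),
        )
    except sqlite3.IntegrityError:
        # Another writer inserted the same edims_id between the SELECT
        # and the INSERT; fall back to reading the existing row.
        return fetch_by_edims_id(conn, edims_id), False
    return (cur.lastrowid, edims_id, title), True
```

With this shape, scraping the same document twice with different metadata returns the existing row instead of raising.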
Maybe even make it the default, since the PDF is usually not the action I personally want.
It's abandoned and does not support Django 4 https://github.com/jmcclell/django-bootstrap-pagination
to silence cron output. I don't need it now that I added Sentry back
I should be able to see all recent scrapes, what boards were scraped, what docs were found, and what scrape errors happened
Heck, should probably just gather data intelligence on everything.
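A rough sketch of what a per-run scrape record could hold, assuming one record per scrape with one entry per board. The class and field names (ScrapeLog, BoardScrape, documents_found, errors) are hypothetical and are not from the project; in the real app this would more likely be a Django model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BoardScrape:
    """What happened for one board during a scrape (hypothetical shape)."""
    board_name: str
    documents_found: int = 0
    errors: list[str] = field(default_factory=list)


@dataclass
class ScrapeLog:
    """One scrape run: when it started and what each board produced."""
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    boards: list[BoardScrape] = field(default_factory=list)

    def record(self, board_name, documents_found=0, error=None):
        """Append the outcome for one board."""
        entry = BoardScrape(board_name, documents_found)
        if error is not None:
            entry.errors.append(error)
        self.boards.append(entry)
        return entry

    def summary(self):
        """Totals that a 'recent scrapes' page could display."""
        return {
            "boards_scraped": len(self.boards),
            "documents_found": sum(b.documents_found for b in self.boards),
            "errors": sum(len(b.errors) for b in self.boards),
        }
```

Keeping errors per board rather than per run makes it possible to show which board failed, as in the "Low Income Consumer Advisory Task Force" log above.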
would be nice context. Go ahead and make a field instead of a meta object. We don't care about needing lots of migrations.