cnumr / ecoindex_python_fullstack
Refactoring of ecoindex into one monorepo using the polylith pattern
License: Other
Also find a way to apply it to:
Use common webpage categorization to classify analyzed webpages
https://www.kaggle.com/code/bpmtips/iab-classification-of-text/notebook
Sometimes we want to exclude some hosts from analysis. For example, when the API is deployed in production, we don't want to allow analysis of localhost.
...
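Such an exclusion could be little more than a parsed-hostname lookup against a configured deny list. A minimal sketch, not the API's actual implementation; `EXCLUDED_HOSTS` and `is_host_allowed` are hypothetical names:

```python
from urllib.parse import urlparse

# Hypothetical deny list; a real deployment would load this from configuration.
EXCLUDED_HOSTS = {"localhost", "127.0.0.1", "host.docker.internal"}


def is_host_allowed(url: str) -> bool:
    """Return False when the URL targets an excluded host."""
    hostname = urlparse(url).hostname or ""
    return hostname.lower() not in EXCLUDED_HOSTS


print(is_host_allowed("https://www.ecoindex.fr/page"))       # True
print(is_host_allowed("http://localhost:8080/index.html"))   # False
```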
Docker images do not work on the Mac ARM platform, for both the API and CLI images.
Need to automate building / publishing the images with a GitHub Action.
Ecoindex API
Mac
No response
No response
Create a simple SDK to interact with the API
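Such an SDK could start as a thin wrapper that builds request URLs. A hedged sketch: the base URL, class name, and default version below are assumptions for illustration; only the `/{version}/ecoindexes/latest` path appears elsewhere in this tracker.

```python
from urllib.parse import urlencode


class EcoindexClient:
    """Sketch of a minimal API client; base_url is a hypothetical default."""

    def __init__(self, base_url: str = "https://bff.ecoindex.fr", version: str = "v1"):
        self.base_url = base_url.rstrip("/")
        self.version = version

    def latest_url(self, **params) -> str:
        # Endpoint path reported in this tracker: /{version}/ecoindexes/latest
        url = f"{self.base_url}/{self.version}/ecoindexes/latest"
        return f"{url}?{urlencode(params)}" if params else url


client = EcoindexClient()
print(client.latest_url(host="www.ecoindex.fr"))
```

A real SDK would add an HTTP layer and response models on top of this URL building.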
Error when trying to analyze an mp3 file
Ecoindex Scraper
No response
https://www.pascalfaure.com/hyper_relax/01_relaxation_corporelle.mp3
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/celery/app/trace.py", line 477, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/celery/app/trace.py", line 760, in __protected_call__
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/celery/app/autoretry.py", line 60, in run
ret = task.retry(exc=exc, **retry_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/celery/app/task.py", line 736, in retry
raise_with_context(exc)
File "/usr/local/lib/python3.12/site-packages/celery/app/autoretry.py", line 38, in run
return task._orig_run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ecoindex/worker/tasks.py", line 36, in ecoindex_task
queue_task_result = run(
^^^^
File "/usr/local/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/base_events.py", line 684, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ecoindex/worker/tasks.py", line 52, in async_ecoindex_task
ecoindex = await EcoindexScraper(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ecoindex/scraper/scrap.py", line 48, in get_page_analysis
page_metrics = await self.scrap_page()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ecoindex/scraper/scrap.py", line 90, in scrap_page
return PageMetrics(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pydantic/main.py", line 164, in __init__
__pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
ValueError
Steps to reproduce:
alias ecoindex-cli="docker run -it --rm --add-host=host.docker.internal:host-gateway -v /tmp/ecoindex-cli:/tmp/ecoindex-cli vvatelot/ecoindex-cli:2.26.0 ecoindex-cli"
ecoindex-cli analyze --url https://simbios.fr --sitemap https://simbios.fr/sitemap.xml --export-format json
Only 1 URL was found, while the sitemap contains more than 50 entries: https://simbios.fr/sitemap.xml
Ecoindex CLI
Linux
https://simbios.fr/sitemap.xml
No response
Create a simple desktop version using html + CSS
Instead of crawling all the website, try to get urls from sitemap
When trying to analyze a localhost page, it fails when using Docker.
The host network configuration should be documented in the README.
Ecoindex CLI
No response
No response
No response
Based on:
cd projects/ecoindex_cli
poetry lock
poetry build-project
docker build -t ecoindex-cli:playwright --build-arg="wheel=ecoindex_cli-2.23.0-py3-none-any.whl" .
cd projects/ecoindex_api
poetry lock
poetry build-project
docker build -t ecoindex-api-backend --build-arg="wheel=ecoindex_api-3.1.0-py3-none-any.whl" -f docker/backend/dockerfile .
docker build -t ecoindex-api-worker --build-arg="wheel=ecoindex_api-3.1.0-py3-none-any.whl" -f docker/worker/dockerfile .
docker compose up
cd projects/ecoindex_api
poetry run alembic revision --autogenerate -m "Migration name"
bump_type: major, minor, patch
pip install poetry
poetry config virtualenvs.create false
poetry self add poetry-multiproject-plugin
echo "previous_version=$(poetry version)" >> $GITHUB_OUTPUT
ecoindex_compute_version
in components/ecoindex/data/__init__.py
poetry version ${{ inputs.bump_type }}
echo "new_version=$(poetry version)" >> $GITHUB_OUTPUT
chore(compute): bump version to ${version}
poetry build-project
v${version}-compute
${{ github.ref }}
v${version}-compute
v${previous_version}-compute
[compute] ${version}
The total count matches the number of analyses and not the number of hosts...
Ecoindex API
No response
No response
No response
Hello,
Can you clarify which elements are taken into account by ecoindex-cli when computing the page weight metric from the command line?
Is everything included in the computation: JS, CSS, images (eager, lazy, and everything referenced in srcset, even when the analysis runs at a specific resolution?), the HTML page, internal fonts?
Internal resources only? Internal and external?
Thanks!
Hello,
I am not familiar with poetry tasks and I could not figure out how to launch the ecoindex_cli client for my analyses, as on the standalone repo https://github.com/cnumr/ecoindex_cli.
I tried
`ubuntu@xxxx:~/ecoindex_python_fullstack/projects/ecoindex_cli$ task docker-build
Updating dependencies
Resolving dependencies... (5.1s)
The command "build-project" does not exist.
exit status 1
ubuntu@xxxx:~/ecoindex_python_fullstack/projects/ecoindex_cli$ `
without much success.
Could you help me?
Thanks!
For SEO purposes, most sites have a ready-to-use XML document containing all their URLs.
Here is the example on my website: https://simbios.fr/sitemap.xml
It might be a good idea to use this document (by default?) instead of scraping the whole website.
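Extracting the URLs from such a sitemap needs only the standard library, since sitemap `<loc>` entries live in the standard `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. A sketch, not the CLI's implementation:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Extract every <loc> entry from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# Hypothetical sample standing in for https://simbios.fr/sitemap.xml
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://simbios.fr/</loc></url>
  <url><loc>https://simbios.fr/contact</loc></url>
</urlset>"""

print(urls_from_sitemap(sample))  # ['https://simbios.fr/', 'https://simbios.fr/contact']
```

A full implementation would also follow `<sitemapindex>` documents, which point at nested sitemaps.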
The library works correctly for some sites, but unfortunately I ran into a problem with certain URLs.
Exception message: "'charmap' codec can't decode byte 0x9d in position 766503: character maps to <undefined>"
Test code used to run the EcoIndex Scraper:
import asyncio
from pprint import pprint

from ecoindex.scraper import EcoindexScraper


def main():
    print("")
    print("ECOINDEX ANALYSIS")
    print("")
    url = "https://www.orange.com/fr"
    try:
        page_analysis = asyncio.run(EcoindexScraper(url=url).get_page_analysis())
        print(page_analysis.score)
        print(page_analysis.ges)
        print(page_analysis.water)
    except Exception as e:
        print("Error on execute EcoIndex scrapper")
        print(e)


if __name__ == "__main__":
    main()
Ecoindex Scraper
Windows
https://www.orange.com/fr
https://www.businessdecision.com/fr-fr
*************************
ECOINDEX ANALYSIS
*************************
Error on execute EcoIndex scrapper
'charmap' codec can't decode byte 0x9d in position 766503: character maps to <undefined>
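The "charmap" codec in the message is Windows' cp1252, which suggests some content is being decoded with the platform default encoding rather than UTF-8; byte 0x9d simply has no character assigned in cp1252. A minimal stdlib reproduction of the failure mode (an assumption about the root cause, not the library's actual code path):

```python
raw = b"ok so far \x9d"  # 0x9d: no character assigned in cp1252 ("charmap")

try:
    raw.decode("cp1252")
except UnicodeDecodeError as e:
    print(e)  # 'charmap' codec can't decode byte 0x9d ...

# A tolerant decode keeps the analysis going at the cost of replacement chars:
print(raw.decode("utf-8", errors="replace"))
```

Forcing `encoding="utf-8"` (or `errors="replace"`) wherever page content is decoded would avoid depending on the host locale.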
A bug happened with the BFF
/{version}/ecoindexes/latest
Ecoindex API
No response
No response
ecoindex-api-backend | Expected `datetime` but got `str` - serialized value may not be as expected
ecoindex-api-backend | Expected `uuid` but got `str` - serialized value may not be as expected
(the two warnings above are repeated 20 times in the log)
ecoindex-api-backend | return self.serializer.to_python(
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/celery/app/trace.py", line 477, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/celery/app/trace.py", line 760, in __protected_call__
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/celery/app/autoretry.py", line 60, in run
ret = task.retry(exc=exc, **retry_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/celery/app/task.py", line 736, in retry
raise_with_context(exc)
File "/usr/local/lib/python3.12/site-packages/celery/app/autoretry.py", line 38, in run
return task._orig_run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ecoindex/worker/tasks.py", line 38, in ecoindex_task
queue_task_result = run(async_ecoindex_task(self, url, width, height))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/base_events.py", line 664, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ecoindex/worker/tasks.py", line 47, in async_ecoindex_task
await check_quota(host=urlparse(url=url).netloc)
File "/usr/local/lib/python3.12/site-packages/ecoindex/backend/utils/__init__.py", line 86, in check_quota
count_daily_request_per_host = await get_count_daily_request_per_host(host=host)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ecoindex/database/repositories/ecoindex.py", line 110, in get_count_daily_request_per_host
results = await db.execute(statement)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlmodel/ext/asyncio/session.py", line 145, in execute
return await super().execute(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/ext/asyncio/session.py", line 455, in execute
result = await greenlet_spawn(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 190, in greenlet_spawn
result = context.throw(*sys.exc_info())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlmodel/orm/session.py", line 129, in execute
return super().execute(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2308, in execute
return self._execute_internal(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2190, in _execute_internal
result: Result[Any] = compile_state_cls.orm_execute_statement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
result = conn.execute(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1416, in execute
return meth(
^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1848, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1988, in _exec_single_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2346, in _handle_dbapi_exception
raise exc_info[1].with_traceback(exc_info[2])
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
self.dialect.do_execute(
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/dialects/mysql/aiomysql.py", line 93, in execute
return self.await_(self._execute_async(operation, parameters))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 125, in await_only
return current.driver.switch(awaitable) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 185, in greenlet_spawn
value = await result
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sqlalchemy/dialects/mysql/aiomysql.py", line 102, in _execute_async
result = await self._cursor.execute(operation, parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiomysql/cursors.py", line 239, in execute
await self._query(query)
File "/usr/local/lib/python3.12/site-packages/aiomysql/cursors.py", line 457, in _query
await conn.query(q)
File "/usr/local/lib/python3.12/site-packages/aiomysql/connection.py", line 469, in query
await self._read_query_result(unbuffered=unbuffered)
File "/usr/local/lib/python3.12/site-packages/aiomysql/connection.py", line 683, in _read_query_result
await result.read()
File "/usr/local/lib/python3.12/site-packages/aiomysql/connection.py", line 1164, in read
first_packet = await self.connection._read_packet()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiomysql/connection.py", line 609, in _read_packet
packet_header = await self._read_bytes(4)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/aiomysql/connection.py", line 657, in _read_bytes
data = await self._reader.readexactly(num_bytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/streams.py", line 734, in readexactly
await self._wait_for_data('readexactly')
File "/usr/local/lib/python3.12/asyncio/streams.py", line 527, in _wait_for_data
await self._waiter
RuntimeError: Task <Task pending name='Task-268' coro=<async_ecoindex_task() running at /usr/local/lib/python3.12/site-packages/ecoindex/worker/tasks.py:47> cb=[_run_until_complete_cb() at /usr/local/lib/python3.12/asyncio/base_events.py:180]> got Future <Future pending> attached to a different loop
Ecoindex API
No response
No response
No response
Add a way to evaluate best practices in scraper and also apply to CLI and API
ecoindex_cli does not take the content of iframes into account when calculating the ecoindex of a page.
I'm using the following files (testcase.zip), served locally on port 8080.
I analyze the page using the following command and get this result:
ecoindex-cli analyze --url http://127.0.0.1:8080/testEcoIndexIFrame.html --export-format json --outputfile ./result.json
result:
[
{
"width": 1920,
"height": 1080,
"url": "http://127.0.0.1:8080/testEcoIndexIFrame.html",
"size": 2.511,
"nodes": 6,
"requests": 3,
"grade": "A",
"score": 97.0,
"ges": 1.06,
"water": 1.59,
"ecoindex_version": "5.4.1",
"date": "2023-02-27 17:50:40.593434",
"page_type": null
}
]
The number of DOM nodes detected by ecoindex-cli is 6, which matches the content of the main HTML file but does not include the content of the HTML inside the iframe.
The number of requests and the size seem to be correct though: when I increase the subPage.html page's size, the size reported by ecoindex-cli increases as well.
On some real-life pages, this can make a huge difference in the final rank obtained when analyzing a page (from G to E).
Above 3.6
Linux
No response
No response
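The undercount can be illustrated with a plain stdlib node counter: counting only the top document misses everything inside the iframe. This is just an illustration, not Ecoindex's implementation (a real fix would more likely sum node counts across browser frames, e.g. via Playwright's `page.frames`); the HTML snippets are hypothetical stand-ins for testEcoIndexIFrame.html and subPage.html.

```python
from html.parser import HTMLParser


class NodeCounter(HTMLParser):
    """Count element (start tag) nodes in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def count_nodes(html: str) -> int:
    counter = NodeCounter()
    counter.feed(html)
    return counter.count


main_page = "<html><head></head><body><p>hi</p><iframe src='subPage.html'></iframe></body></html>"
sub_page = "<html><body><ul><li>a</li><li>b</li></ul></body></html>"

# Counting only the top document misses everything rendered inside the iframe:
print(count_nodes(main_page))                          # 5
print(count_nodes(main_page) + count_nodes(sub_page))  # 10
```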