jdillard / sphinx-sitemap

Sphinx extension to generate a multi-lingual, multi-version sitemap for HTML builds

Home Page: https://sphinx-sitemap.readthedocs.io/en/latest/index.html

License: MIT License

Python 100.00%
sphinx-extension conda-forge sphinx

sphinx-sitemap's People

Contributors

arxanas, bollwyvl, bstrdsmkr, delazj, dependabot[bot], fabricesalvaire, jdetaeye, jdillard, larsoner, liborjelinek, mart-e, penguinpee, pre-commit-ci[bot], ruksi, sabotageandi

sphinx-sitemap's Issues

Sphinx 6.1.3 error: Sitemap file generates, but with no entries

I have a problem generating a sitemap for my website. Sphinx 6.1.3 (the newest version) generates a sitemap.xml file, but without any entries. Unfortunately, there is no log output besides:

sphinx-sitemap: sitemap.xml was generated for URL https://python3.info in /home/docs/checkouts/readthedocs.org/user_builds/workshop-python/checkouts/latest/_readthedocs/html/sitemap.xml

In my conf.py I have:

html_baseurl = 'https://python3.info'
sitemap_url_scheme = '{link}'
sitemap_filename = 'sitemap.xml'
sitemap_locales = [None]
html_extra_path = ['robots.txt']

Generated sitemap.xml (yes, that's the whole file):

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://python3.info/</loc>
<lastmod>2023-04-03T23:49:44.379481+00:00</lastmod>
<changefreq>weekly</changefreq>
<priority>1</priority>
</url>
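
One way to confirm how many pages a generated sitemap really lists is to parse it and collect the <loc> values. The sketch below is only an illustration: the list_sitemap_locs helper and the inline sample document are assumptions, not part of sphinx-sitemap, and a real check would read sitemap.xml from the build directory instead.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol (the same one that appears in the file above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def list_sitemap_locs(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter("{%s}loc" % SITEMAP_NS)]


# Hypothetical sample with a single entry, mirroring the report above.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://python3.info/</loc></url>"
    "</urlset>"
)
locs = list_sitemap_locs(sample)
print(len(locs), locs)
```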

PyPI build doesn't seem to have latest version

I installed your extension using pip install --user sphinx-sitemap. version.py is updated but not __init__.py. See this diff:

diff ~/.local/lib/python2.7/site-packages/sphinx_sitemap/__init__.py __init__.py

19c19
<     """Setup conntects events to the sitemap builder"""
---
>     """Setup connects events to the sitemap builder"""
22c22
<         default='https://my-site.com/docs/',
---
>         default=None,

, etc.

Sitemap customization for locale paths

Hello!

Looking at the sitemap_url_scheme customization for multi-language sites, and I'm trying to figure out how to pass a specific configuration to enable the following behavior:

In conf.py:

language = 'en'
locale_dirs = ['locale/']

and then in locale/:

-> de
-> fr
-> es

and their associated files.

With a configuration of sitemap_url_scheme = "{lang}{link}" I get correct URLs for every locale except en, which on my site has no language prefix in the URL path.

Am I doing it wrong? How might I pass "use the locale subpath for anything other than en"?
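
The requested behavior can be sketched in a few lines. This is not how sphinx-sitemap builds its URLs; localized_path and its default_lang parameter are hypothetical names used only to illustrate "use the locale subpath for anything other than en".

```python
def localized_path(lang, link, default_lang="en"):
    """Build the path part of a sitemap URL for one locale.

    Hypothetical helper: the default language is served from the site root,
    every other locale from a "<lang>/" subpath.
    """
    prefix = "" if lang == default_lang else lang + "/"
    return prefix + link


for lang in ["en", "de", "fr", "es"]:
    print(localized_path(lang, "index.html"))
```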

Generate the <lastmod> tag

There currently is no <lastmod> tag for each URL in the sitemap.xml. To implement this will likely require a different extension.

example: https://bitbucket.org/dhellmann/pymotw-3/src/17b6ea3b657b93ad45b6ccd5c295e767f4f4be71/source/conf.py?at=master&fileviewer=file-view-default#conf.py-449

import datetime
import os
import subprocess


def html_page_context(app, pagename, templatename, context, doctree):
    # Use the last modified date from git instead of applying a single
    # value to the entire site.
    context['last_updated'] = _get_last_updated(app, pagename)


def _get_last_updated(app, pagename):
    # Ask git for the date of the last commit that touched the page's
    # source file. Returns None if the file is missing or git fails.
    last_updated = None
    src_file = app.builder.env.doc2path(pagename)
    if os.path.exists(src_file):
        try:
            last_updated_t = subprocess.check_output(
                [
                    'git', 'log', '-n1', '--format=%ad', '--date=short',
                    '--', src_file,
                ]
            ).decode('utf-8').strip()
            last_updated = datetime.datetime.strptime(last_updated_t,
                                                      '%Y-%m-%d')
        except (ValueError, subprocess.CalledProcessError):
            pass
    return last_updated

Caveats to consider:

  1. Included files may have a later updated date than the parent page.
  2. Files included for substitution purposes won't reflect a change that only affects a substitution on that page, making it hard to determine whether the change date of the included files is accurate for that page.
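
Once a per-page date is available, writing it into the sitemap is a small step. The following is a minimal sketch using only the standard library; add_url and the example.com URLs are assumptions for illustration and do not describe the extension's actual code.

```python
import datetime
import xml.etree.ElementTree as ET


def add_url(urlset, loc, last_updated=None):
    """Append a <url> entry, adding <lastmod> only when a date is known."""
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    if last_updated is not None:
        # The sitemap protocol accepts W3C Datetime; YYYY-MM-DD is its short form.
        ET.SubElement(url, "lastmod").text = last_updated.strftime("%Y-%m-%d")
    return url


urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
add_url(urlset, "https://example.com/index.html", datetime.datetime(2023, 4, 3))
add_url(urlset, "https://example.com/untracked.html")  # no date known, no <lastmod>
print(ET.tostring(urlset, encoding="unicode"))
```

Leaving <lastmod> out when no date is known avoids publishing a misleading build-time timestamp for pages without git history.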

Provide wheels on new releases

For a while now, installing a package with pip has produced the following warning when no wheel is available for it:

DEPRECATION: sphinx-sitemap is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559

Would it be possible to release new versions of sphinx-sitemap with wheels (the binary format)? 🙏🏾

Only references pages that are new or have changed since the last build

Issue

The builder currently doesn't have access to the saved environment, so it can only see pages that are new or have changed since the last build, possibly leading to an incomplete sitemap.

Work around

To get around this you can manually clean the build directory or use the -E flag to not use the saved environment and rebuild completely as part of your deployment process.
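
As a rough sketch of that workaround in a deployment script, the command can be assembled like this. The full_rebuild_command helper and the docs / build/html paths are assumptions; only the -E and -b options are taken from sphinx-build itself.

```python
import sys


def full_rebuild_command(source_dir, output_dir):
    """Return a Sphinx invocation that ignores the saved environment."""
    return [
        sys.executable, "-m", "sphinx",
        "-E",          # don't use a saved environment, always read all files
        "-b", "html",  # HTML builder
        source_dir, output_dir,
    ]


# In a deployment script this list could be passed to subprocess.run(..., check=True).
print(" ".join(full_rebuild_command("docs", "build/html")))
```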

Possible fix

The saved environment is pickled after the parsing stage, so accessing it may make it possible to always produce a full sitemap. Using doctree-resolved instead might be a possibility.

Temp fix

Test if sphinx is running a partial build and output a warning.

Loading the pickled environment fails, possibly because of `multiprocessing.Manager`

I initially reported this issue in the sphinx-doc repository (sphinx-doc/sphinx#11463), but upon further investigation, I've come to realize that the error might stem from an extension. I've narrowed down the culprit to sphinx-sitemap, which utilizes multiprocessing.Manager.

It appears that the code leveraging multiprocessing.Manager cannot be pickled.

Any insights or solutions would be greatly appreciated. Thank you!

2.5.0: pytest warnings

I'm packaging your module as an rpm package, so I'm using the typical PEP 517 based build, install, and test cycle used when building packages from a non-root account.

  • python3 -sBm build -w --no-isolation
  • because I'm calling build with --no-isolation, only locally installed modules are used throughout the process
  • install the .whl file in </install/prefix>
  • run pytest with $PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>
  • the build is performed in an environment that is cut off from the public network (pytest is executed with -m "not network")

Here is the pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-sphinx-sitemap-2.5.0-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-sphinx-sitemap-2.5.0-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra -m 'not network'
============================= test session starts ==============================
platform linux -- Python 3.8.16, pytest-7.2.1, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/sphinx-sitemap-2.5.0
collected 4 items

tests/test_parallel_mode.py .                                            [ 25%]
tests/test_simple.py ...                                                 [100%]

=============================== warnings summary ===============================
tests/test_parallel_mode.py::test_parallel
tests/test_parallel_mode.py::test_parallel
  /usr/lib64/python3.8/importlib/__init__.py:127: RemovedInSphinx80Warning: The alias 'sphinx.util.SkipProgressMessage' is deprecated, use 'sphinx.util.display.SkipProgressMessage' instead. Check CHANGES for Sphinx API modifications.
    return _bootstrap._gcd_import(name[level:], package, level)

tests/test_parallel_mode.py::test_parallel
tests/test_parallel_mode.py::test_parallel
tests/test_parallel_mode.py::test_parallel
tests/test_parallel_mode.py::test_parallel
  /usr/lib64/python3.8/importlib/__init__.py:127: RemovedInSphinx80Warning: The alias 'sphinx.util.progress_message' is deprecated, use 'sphinx.util.display.progress_message' instead. Check CHANGES for Sphinx API modifications.
    return _bootstrap._gcd_import(name[level:], package, level)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 4 passed, 6 warnings in 1.47s =========================

Here is the list of modules installed in the build env:

Package                       Version
----------------------------- -----------------
alabaster                     0.7.12
appdirs                       1.4.4
attrs                         22.2.0
Babel                         2.11.0
build                         0.9.0
charset-normalizer            3.0.1
cssselect                     1.1.0
distro                        1.8.0
docutils                      0.19
exceptiongroup                1.0.0
extras                        1.0.0
fixtures                      4.0.0
gpg                           1.18.0-unknown
idna                          3.4
imagesize                     1.4.1
importlib-metadata            6.0.0
iniconfig                     2.0.0
Jinja2                        3.1.2
libcomps                      0.1.19
lxml                          4.9.2
MarkupSafe                    2.1.1
numpy                         1.24.1
olefile                       0.46
packaging                     21.3
pbr                           5.9.0
pep517                        0.13.0
Pillow                        9.4.0
pip                           22.3.1
pluggy                        1.0.0
Pygments                      2.14.0
PyGObject                     3.42.2
pyparsing                     3.0.9
pytest                        7.2.1
python-dateutil               2.8.2
pytz                          2022.4
requests                      2.28.2
rpm                           4.17.0
scour                         0.38.2
setuptools                    65.6.3
six                           1.16.0
snowballstemmer               2.2.0
Sphinx                        6.1.3
sphinx_contributors           0.2.7
sphinxcontrib-applehelp       1.0.2.dev20221204
sphinxcontrib-devhelp         1.0.2.dev20221204
sphinxcontrib-htmlhelp        2.0.0
sphinxcontrib-jsmath          1.0.1.dev20230128
sphinxcontrib-qthelp          1.0.3.dev20230128
sphinxcontrib-serializinghtml 1.1.5
sphinxemoji                   0.2.0
testtools                     2.5.0
tomli                         2.0.1
urllib3                       1.26.12
wheel                         0.38.4
zipp                          3.11.0

I also have two small patches that I use when building my package with the Sphinx documentation as a man page.
The patch below allows building the documentation straight from the source tree, without the sphinx-sitemap module installed:

--- a/docs/conf.py
+++ b/docs/conf.py
@@ -11,9 +11,13 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #
+import os
+import sys
 import re
 import subprocess

+sys.path.insert(0, os.path.abspath(".."))
+
 # -- Project information -----------------------------------------------------

 project = "Sphinx Sitemap"

The second patch obtains the version from the module itself instead of from the git tag:

--- a/docs/conf.py
+++ b/docs/conf.py
@@ -15,22 +15,18 @@
 import sys
 import re
 import subprocess
-
 sys.path.insert(0, os.path.abspath(".."))

+import sphinx_sitemap
+
 # -- Project information -----------------------------------------------------

 project = "Sphinx Sitemap"
 copyright = "Jared Dillard"
 author = "Jared Dillard"

-# check if the current commit is tagged as a release (vX.Y.Z)
-GIT_TAG_OUTPUT = subprocess.check_output(["git", "tag", "--points-at", "HEAD"])
-current_tag = GIT_TAG_OUTPUT.decode().strip()
-if re.match(r"^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$", current_tag):
-    version = current_tag
-else:
-    version = "latest"
+version = sphinx_sitemap.__version__
+
 # The full version, including alpha/beta/rc tags
 release = ""

The second one is useful when building the module from a tarball autogenerated from a git tag, which has no .git metadata.
With the above patches it is possible to generate the documentation with:

+ /usr/bin/sphinx-build -n -T -b man docs build/sphinx/man
Running Sphinx v6.1.3
making output directory... done
loading intersphinx inventory from https://www.sphinx-doc.org/en/master/objects.inv...
building [mo]: targets for 0 po files that are out of date
writing output...
building [man]: all manpages
updating environment: [new config] 7 added, 0 changed, 0 removed
reading sources... [ 14%] advanced-configuration
reading sources... [ 28%] changelog
reading sources... [ 42%] configuration-values
reading sources... [ 57%] contributing
reading sources... [ 71%] getting-started
reading sources... [ 85%] index
reading sources... [100%] search-optimization

looking for now-outdated files... none found
pickling environment... done
checking consistency... done
writing... python-sphinx-sitemap.3 { getting-started advanced-configuration search-optimization configuration-values contributing changelog } done
sphinx-sitemap: No pages generated for sitemap.xml
build succeeded.

Feel free to commit those patches, or let me know if you want them as PRs.

Is it possible to remove the generated language, e.g. "/en/" from the generated sitemap URLs?

Sorry, I couldn't figure this out. My documentation site is in a single language, English, and I prefer to leave the "/en/" out of all my URLs. However, I can't figure out how to tell this library to remove that path from the URLs. I have tried setting language = None and sitemap_locales = [None], but it still adds /en/ to all the paths, which then point to 404 pages.

Is this possible to do with this package?

It seems like no matter what I do these lines:

        if app.builder.config.language:
            lang = app.builder.config.language + "/"
        else:
            lang = ""

Always cause the language to get set.

Forced URL scheme for multilanguage and multiversion

Hello,

Reading the code, I understand that you assume the page is published, using the version and language configuration, following the scheme:

  • https://<html_baseurl>/<language>/<version>/<page>.html

ET.SubElement(url, "loc").text = site_url + \
app.builder.config.language + '/' + version + link

Is there a reason to force this construction? For instance, on our website, we use the construction:

  • https://<base>/<version>/<language>/<page>/html

What would be the way to make it work for our scheme?

Thanks
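
One possible direction is a format-string template in which the order of the parts is up to the user. The sketch below only illustrates the idea; build_url is a hypothetical helper, and the placeholder names lang, version, and link are assumptions chosen for the example.

```python
def build_url(base_url, scheme, lang, version, link):
    """Fill a URL scheme template and join it to the base URL.

    `lang` and `version` are expected to carry their own trailing slash
    (for example "en/" and "v2/"), or to be empty strings when unused.
    """
    path = scheme.format(lang=lang, version=version, link=link)
    # Join with exactly one slash, whatever the base URL or template contains.
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Language first, then version.
print(build_url("https://example.com", "{lang}{version}{link}", "en/", "v2/", "page.html"))
# Version first, then language.
print(build_url("https://example.com/", "{version}{lang}{link}", "en/", "v2/", "page.html"))
```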

Missing slash in sitemap URL scheme when upgrading from sphinx-sitemap 2.2 to 2.3

Hi!

I was on Sphinx<5 and sphinx-sitemap==2.2 and updated my project dependencies to Sphinx>5 and sphinx-sitemap>2.2. When I did so, I noticed that the URLs generated in sitemap.xml were missing a slash before {lang}, which made the URLs incorrect. See the picture below for the diff:
[image]

I can solve this by setting sitemap_url_scheme = "/{lang}{version}{link}", but that doesn't look like a good solution to me. Do you know if there is any other source of error here, possibly within sphinx-sitemap? I've seen https://sphinx-sitemap.readthedocs.io/en/latest/advanced-configuration.html, but this error appears when just changing between sphinx-sitemap 2.2 to 2.3 (and keeping Sphinx>5 constant), so it looks like it was introduced by upgrading sphinx-sitemap.

"default" config broken in sphinx 5

With no language configured, the sitemap of a monolingual site suddenly gains /en/ in its URLs after updating to Sphinx 5, breaking all paths.

Set priority attribute based on version

Set <priority> as:

  • 1 for the pages of the latest or stable version.
  • for each following version, decrease the priority by 0.1
  • 0.1 for the pages of all remaining versions if there are more than 9 versions.

The priority could possibly be a config value, sitemap_priority, manually set/configured in conf.py with the ability to be changed based on the versioning method (change the value for each tag/branch).

Taken from here: readthedocs/readthedocs.org#557
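
The rule above can be written as a small function. This is only a sketch of the proposal, not an existing feature; version_priority and the sample version list are assumptions.

```python
def version_priority(index):
    """Priority for the version at position `index` (0 = latest or stable).

    Drops by 0.1 per older version and never goes below 0.1.
    """
    return max(0.1, round(1.0 - 0.1 * index, 1))


# Hypothetical version list, newest first.
versions = ["stable", "2.4", "2.3", "2.2"]
for i, name in enumerate(versions):
    print(name, version_priority(i))
```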

Root dir is missing

Hi,

when using the domain my-site.com, sphinx-sitemap adds the page my-site.com/index.html to the sitemap.
However, we would like to index the root (my-site.com) instead.
Is that configurable?
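
For illustration, collapsing index pages onto their directory could look like the sketch below. canonical_link is a hypothetical helper and not an option of sphinx-sitemap; it assumes page names without a file suffix, as Sphinx passes them to event handlers.

```python
def canonical_link(pagename):
    """Map a page name to a sitemap path, collapsing index pages.

    Hypothetical helper: "index" becomes "" (the site root) and
    "guide/index" becomes "guide/"; other pages keep the .html suffix.
    """
    if pagename == "index":
        return ""
    if pagename.endswith("/index"):
        return pagename[: -len("index")]
    return pagename + ".html"


for name in ["index", "guide/index", "guide/install"]:
    print("https://my-site.com/" + canonical_link(name))
```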

2.3.0: test suite is installed

It looks like the test suite is installed in the latest version.
Here is the content of the generated .whl archive:

 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    6037  Defl:N     2156  64% 12-21-2022 22:29 9dd4c9eb  sphinx_sitemap/__init__.py
     484  Defl:N      286  41% 12-21-2022 22:29 0c4d7010  tests/conftest.py
     608  Defl:N      336  45% 12-21-2022 22:29 6f943e6f  tests/test_simple.py
      32  Defl:N       34  -6% 12-21-2022 22:29 527a47a5  tests/roots/test-root/conf.py
    1070  Defl:N      641  40% 12-26-2022 00:24 df67ef30  sphinx_sitemap-2.3.0.dist-info/LICENSE
    8750  Defl:N     2716  69% 12-26-2022 00:24 0fcb35d4  sphinx_sitemap-2.3.0.dist-info/METADATA
      92  Defl:N       92   0% 12-26-2022 00:24 ee31a5a1  sphinx_sitemap-2.3.0.dist-info/WHEEL
      26  Defl:N       28  -8% 12-26-2022 00:24 f69adfab  sphinx_sitemap-2.3.0.dist-info/top_level.txt
     737  Defl:N      454  38% 12-26-2022 00:24 1c92e7f6  sphinx_sitemap-2.3.0.dist-info/RECORD
--------          -------  ---                            -------
   17836             6743  62%                            9 files
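
A check like this can be automated, since a wheel is an ordinary zip archive. The sketch below uses only the standard library; top_level_entries is a hypothetical helper, and the in-memory archive merely mimics part of the listing above.

```python
import io
import zipfile


def top_level_entries(wheel_file):
    """Return the sorted top-level directory names inside a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_file) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})


# Build a tiny in-memory archive that mimics the listing above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sphinx_sitemap/__init__.py", "")
    archive.writestr("tests/test_simple.py", "")
    archive.writestr("sphinx_sitemap-2.3.0.dist-info/METADATA", "")

entries = top_level_entries(buffer)
print(entries)
print("tests" in entries)  # True means the test suite was packaged
```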

Doesn't reference pages in subdirectories

Support internationalization / multiple languages

More information on multilingual sitemaps can be found here: https://en.wikipedia.org/wiki/Sitemaps#Multilingual_and_multinational_Sitemaps

Setting the language flag in conf.py: http://www.sphinx-doc.org/en/master/usage/configuration.html#confval-language

Setting the location of the locales in conf.py: http://www.sphinx-doc.org/en/master/usage/configuration.html#confval-locale_dirs

Building multiple languages: http://www.sphinx-doc.org/en/master/intl.html

Note: All the sitemaps will need to be manually added to a sitemapindex.
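
A sitemap index is a short XML file that lists the individual sitemaps, so it can also be generated with a few lines of script. The sketch below is an assumption about how one might do that by hand; build_sitemap_index and the example.com URLs are illustrative only.

```python
import xml.etree.ElementTree as ET


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document listing the given sitemap URLs."""
    index = ET.Element(
        "sitemapindex", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical per-language sitemap locations.
print(build_sitemap_index([
    "https://example.com/en/sitemap.xml",
    "https://example.com/fr/sitemap.xml",
]))
```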

Sitemap rules not respected for multilingual documentation

Hi,
According to Google's documentation, each language variation should be listed with an <xhtml:link> element, including the default language.

Current Sitemap:

    <url>
        <loc>https://super_doc.com/fr/index.html</loc>
        <xhtml:link rel="alternate" hreflang="en" href="https://super_doc.com/en/index.html" />
    </url>

Expected Sitemap:

    <url>
        <loc>https://super_doc.com/fr/index.html</loc>
        <xhtml:link rel="alternate" hreflang="en" href="https://super_doc.com/en/index.html" />
        <xhtml:link rel="alternate" hreflang="fr" href="https://super_doc.com/fr/index.html" />
    </url>

https://developers.google.com/search/docs/advanced/crawling/localized-versions#sitemap
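
The expected output can be produced by looping over every configured language, including the page's own. This is a minimal sketch with the standard library, not the extension's implementation; url_with_alternates and the <base>/<lang>/<link> layout are assumptions taken from the example above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Make the serializer emit the conventional "xhtml:" prefix.
ET.register_namespace("xhtml", XHTML_NS)


def url_with_alternates(base_url, lang, link, languages):
    """Build a <url> element that lists every language version, itself included."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = f"{base_url}/{lang}/{link}"
    for alt in languages:
        ET.SubElement(
            url,
            "{%s}link" % XHTML_NS,
            rel="alternate",
            hreflang=alt,
            href=f"{base_url}/{alt}/{link}",
        )
    return url


entry = url_with_alternates("https://super_doc.com", "fr", "index.html", ["en", "fr"])
print(ET.tostring(entry, encoding="unicode"))
```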

Add developer documentation

How to install and test the extension as a developer.

Install:

  1. If your project doesn't have an extensions directory, create _exts and point conf.py to it:
    sys.path.append(os.path.abspath('../_exts'))
  2. Add sphinx-sitemap as a directory in your project's extensions directory, and rename it to sphinx-sitemap-dev.

Test:

  1. Run pep8 on the changed Python files.
