
algolia-docsearch-action's People

Contributors

aditya-mitra, darrenjennings, psi-4ward


algolia-docsearch-action's Issues

Decoding error

Hi, I'm getting an error when running an indexing job:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb6 in position 8: ordinal not in range(128)

Do you have any hints?

Here is the full log:

(...)
Successfully installed certifi-2022.5.18.1 distlib-0.3.4 filelock-3.4.1 importlib-metadata-4.8.3 importlib-resources-5.4.0 pipenv-2022.4.8 platformdirs-2.4.0 six-1.16.0 typing-extensions-4.1.1 virtualenv-20.14.1 virtualenv-clone-0.5.7 zipp-3.6.0
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
Installing dependencies from Pipfile.lock (aabb41)...
2022-05-31 16:57:57 [scrapy.core.scraper] ERROR: Spider error processing <GET https://dev.decipad.com/docs/language/numbers/> (referer: https://dev.decipad.com/docs/sitemap.xml)
Traceback (most recent call last):
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 40, in get_dom
    body = response.body.decode(response.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position 4: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 169, in parse_from_sitemap
    self.add_records(response, from_sitemap=True)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 148, in add_records
    records = self.strategy.get_records_from_response(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/default_strategy.py", line 39, in get_records_from_response
    self.dom = self.get_dom(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 43, in get_dom
    result = lxml.html.fromstring(response.body)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 764, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
2022-05-31 16:57:57 [scrapy.core.scraper] ERROR: Spider error processing <GET https://dev.decipad.com/docs/language/> (referer: https://dev.decipad.com/docs/sitemap.xml)
Traceback (most recent call last):
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 40, in get_dom
    body = response.body.decode(response.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb6 in position 8: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 169, in parse_from_sitemap
    self.add_records(response, from_sitemap=True)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 148, in add_records
    records = self.strategy.get_records_from_response(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/default_strategy.py", line 39, in get_records_from_response
    self.dom = self.get_dom(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 43, in get_dom
    result = lxml.html.fromstring(response.body)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 764, in document_fromstring
    "Document is empty")

ValueError: CONFIG is not a valid JSON

For the past two days, the workflow has been crashing with the following error:

Cloning into 'docsearch-scraper'...
Collecting pipenv
  Downloading pipenv-2021.5.29-py2.py3-none-any.whl (3.9 MB)
Collecting virtualenv-clone>=0.2.5
  Downloading virtualenv_clone-0.5.7-py3-none-any.whl (6.6 kB)
Requirement already satisfied: pip>=18.0 in /usr/local/lib/python3.6/site-packages (from pipenv) (21.2.4)
Collecting certifi
  Downloading certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Requirement already satisfied: setuptools>=36.2.1 in /usr/local/lib/python3.6/site-packages (from pipenv) (57.5.0)
Collecting virtualenv
  Downloading virtualenv-20.10.0-py2.py3-none-any.whl (5.6 MB)
Collecting importlib-resources>=1.0
  Downloading importlib_resources-5.4.0-py3-none-any.whl (28 kB)
Collecting filelock<4,>=3.2
  Downloading filelock-3.3.2-py3-none-any.whl (9.7 kB)
Collecting six<2,>=1.9.0
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting distlib<1,>=0.3.1
  Downloading distlib-0.3.3-py2.py3-none-any.whl (496 kB)
Collecting platformdirs<3,>=2
  Downloading platformdirs-2.4.0-py3-none-any.whl (14 kB)
Collecting importlib-metadata>=0.12
  Downloading importlib_metadata-4.8.1-py3-none-any.whl (17 kB)
Collecting backports.entry-points-selectable>=1.0.4
  Downloading backports.entry_points_selectable-1.1.0-py2.py3-none-any.whl (6.2 kB)
Collecting zipp>=0.5
  Downloading zipp-3.6.0-py3-none-any.whl (5.3 kB)
Collecting typing-extensions>=3.6.4
  Downloading typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Installing collected packages: zipp, typing-extensions, importlib-metadata, six, platformdirs, importlib-resources, filelock, distlib, backports.entry-points-selectable, virtualenv-clone, virtualenv, certifi, pipenv
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Successfully installed backports.entry-points-selectable-1.1.0 certifi-2021.10.8 distlib-0.3.3 filelock-3.3.2 importlib-metadata-4.8.1 importlib-resources-5.4.0 pipenv-2021.5.29 platformdirs-2.4.0 six-1.16.0 typing-extensions-3.10.0.2 virtualenv-20.10.0 virtualenv-clone-0.5.7 zipp-3.6.0
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
Installing dependencies from Pipfile.lock (aabb41)...
WARNING: The directory '/github/home/.cache/pipenv' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Collecting incremental==21.3.0
  Downloading incremental-21.3.0-py2.py3-none-any.whl (15 kB)
Installing collected packages: incremental
WARNING: Ignoring invalid distribution -mportlib-metadata (/usr/local/lib/python3.6/site-packages)
Successfully installed incremental-21.3.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Traceback (most recent call last):
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/config/config_loader.py", line 101, in _load_config
    data = json.loads(config, object_pairs_hook=OrderedDict)
  File "/usr/local/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "docsearch", line 5, in <module>
    run()
  File "/github/workspace/docsearch-scraper/cli/src/index.py", line 161, in run
    exit(command.run(sys.argv[2:]))
  File "/github/workspace/docsearch-scraper/cli/src/commands/run_config.py", line 21, in run
    return run_config(args[0])
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/index.py", line 33, in run_config
    config = ConfigLoader(config)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/config/config_loader.py", line 69, in __init__
    data = self._load_config(config)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/config/config_loader.py", line 106, in _load_config
    raise ValueError('CONFIG is not a valid JSON')
ValueError: CONFIG is not a valid JSON

Here is my config:

name: Docsearch Scrap

on:
  schedule:
    - cron: "0 8 * * *"

jobs:
  scrap:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - uses: darrenjennings/algolia-docsearch-action@master
        with:
          algolia_application_id: "MY_ID"
          algolia_api_key: ${{secrets.DOCSEARCH_API_KEY}}
          file: "$GITHUB_WORKSPACE/docsearch-scrapper-config.json"

About the Algolia name configuration key

Thank you very much for making this GitHub Action. I have a couple of questions.
My configuration contains a name setting. Does it also need to be entered somewhere in the repository? In addition, an error is reported here; please take a look when you have time.

[screenshot of the error]
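
If the question is about the JSON config file that the action passes to the scraper, the index is identified by an index_name field in the legacy docsearch-scraper that this action runs. The sketch below shows the general shape only; the values are placeholders and the selectors depend on the site being indexed.

# Sketch of the general shape of a config for the legacy docsearch-scraper that this
# action runs. Values are placeholders; selectors depend on the site being indexed.
import json

config = {
    "index_name": "my-docs",                      # the Algolia index the records go into
    "start_urls": ["https://example.com/docs/"],  # where the crawl starts
    "selectors": {
        "lvl0": "h1",
        "lvl1": "h2",
        "lvl2": "h3",
        "text": "p, li",
    },
}

print(json.dumps(config, indent=2))  # paste the output into the JSON file the action reads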

Exception: Env CHROMEDRIVER_PATH='/usr/bin/chromedriver' is not a path to a file

Please see the action log: https://github.com/ant-design-blazor/ant-design-blazor/runs/6823621463?check_suite_focus=true

Installing dependencies from Pipfile.lock (aabb41)...
Traceback (most recent call last):
  File "docsearch", line 5, in <module>
    run()
  File "/github/workspace/docsearch-scraper/cli/src/index.py", line 161, in run
    exit(command.run(sys.argv[2:]))
  File "/github/workspace/docsearch-scraper/cli/src/commands/run_config.py", line 21, in run
    return run_config(args[0])
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/index.py", line 33, in run_config
    config = ConfigLoader(config)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/config/config_loader.py", line 78, in __init__
    self.user_agent)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/config/browser_handler.py", line 34, in init
    CHROMEDRIVER_PATH))
Exception: Env CHROMEDRIVER_PATH='/usr/bin/chromedriver' is not a path to a file
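
The exception is raised by the scraper's browser_handler, which checks that the CHROMEDRIVER_PATH environment variable points to an existing chromedriver binary before starting a browser-backed crawl (typically required when the config enables js_render). A minimal reproduction of that check, as a sketch:

# Minimal reproduction (sketch) of the check that raises in browser_handler.py: the
# CHROMEDRIVER_PATH environment variable must point to an existing file inside the
# container the action runs in.
import os

path = os.environ.get("CHROMEDRIVER_PATH", "/usr/bin/chromedriver")
if not os.path.isfile(path):
    raise Exception(f"Env CHROMEDRIVER_PATH='{path}' is not a path to a file")

print(f"chromedriver found at {path}")

If that check fails in the action's container, likely causes are a config that enables js_render while the image does not ship chromedriver at that path, or a CHROMEDRIVER_PATH override coming from the workflow.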
