Comments (26)

oliwarner avatar oliwarner commented on July 20, 2024 1

@mib1185 Docker Compose... Nothing too strange. My config, as well as system dbus and localtime are bound in volumes.

version: '3'
services:
  homeassistant:
    image: "ghcr.io/home-assistant/home-assistant"
    volumes:
      - /home/pi/ha-config:/config
      - /etc/localtime:/etc/localtime:ro
      - /run/dbus:/run/dbus:ro
      - /srv/homeassistant/quirks/:/srv/homeassistant/quirks/
    restart: unless-stopped
    privileged: true
    network_mode: host
    devices:
      - "/dev/ttyUSB0:/dev/ttyUSB0"

from core.

bartv avatar bartv commented on July 20, 2024 1

I attached gdb from my host OS. I do not have debug symbols but I was able to get this:

Thread 1 "hass" received signal SIGBUS, Bus error.
0xf5f6f9f6 in <orjson::serialize::per_type::unicode::StrSerializer as serde::ser::Serialize>::serialize ()
   from target:/usr/local/lib/python3.12/site-packages/orjson/orjson.cpython-312-arm-linux-musleabihf.so
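The frame above is in orjson's string serializer, so the crash appears to happen while writing JSON rather than while parsing it. A minimal sketch for narrowing down which entry triggers it, assuming the fault is reproducible per entry: each entry is serialized in its own child interpreter, because a SIGBUS kills the process and cannot be caught with try/except. The helper names `serialize_in_child` and `find_crashing_entries` are hypothetical, and the stdlib `json` fallback is only there so the sketch runs where orjson is not installed.

```python
import json
import subprocess
import sys

# Child program: read one JSON document from stdin and re-serialize it.
# Assumption: orjson is importable in the environment under test; the
# stdlib json module is a fallback so the sketch still runs elsewhere.
CHILD = """
import sys, json
try:
    import orjson as codec
except ImportError:
    codec = json
codec.dumps(json.loads(sys.stdin.read()))
"""


def serialize_in_child(obj):
    """Serialize obj in a separate interpreter and return its exit code.

    On POSIX a negative return code means the child was killed by that
    signal number.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(obj),
        text=True,
        capture_output=True,
    )
    return proc.returncode


def find_crashing_entries(entries):
    """Return the indexes of entries whose serialization did not exit cleanly."""
    return [i for i, entry in enumerate(entries) if serialize_in_child(entry) != 0]
```

Inside the affected container this could be pointed at the "entries" list loaded from /config/.storage/core.config_entries; it prints nothing secret, only indexes.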

from core.

ccMatrix avatar ccMatrix commented on July 20, 2024 1

The just released version 2024.6.1 works and fixed the issue for me. Tag stable can be used again :)

from core.

mib1185 avatar mib1185 commented on July 20, 2024

Do you have any more details, like log messages, command outputs, or anything else helpful?

from core.

oliwarner avatar oliwarner commented on July 20, 2024

Painfully little I can see. It starts, stops and repeats. No obvious error messaging.

homeassistant_1  | s6-rc: info: service s6rc-oneshot-runner: starting
homeassistant_1  | s6-rc: info: service s6rc-oneshot-runner successfully started
homeassistant_1  | s6-rc: info: service fix-attrs: starting
homeassistant_1  | s6-rc: info: service fix-attrs successfully started
homeassistant_1  | s6-rc: info: service legacy-cont-init: starting
homeassistant_1  | s6-rc: info: service legacy-cont-init successfully started
homeassistant_1  | s6-rc: info: service legacy-services: starting
homeassistant_1  | services-up: info: copying legacy longrun home-assistant (no readiness notification)
homeassistant_1  | s6-rc: info: service legacy-services successfully started
homeassistant_1  | [18:58:25] INFO: Home Assistant Core finish process exit code 256
homeassistant_1  | [18:58:25] INFO: Home Assistant Core finish process received signal 7
homeassistant_1  | s6-rc: info: service legacy-services: stopping
homeassistant_1  | s6-rc: info: service legacy-services successfully stopped
homeassistant_1  | s6-rc: info: service legacy-cont-init: stopping
homeassistant_1  | s6-rc: info: service legacy-cont-init successfully stopped
homeassistant_1  | s6-rc: info: service fix-attrs: stopping
homeassistant_1  | s6-rc: info: service fix-attrs successfully stopped
homeassistant_1  | s6-rc: info: service s6rc-oneshot-runner: stopping
homeassistant_1  | s6-rc: info: service s6rc-oneshot-runner successfully stopped

Oh I've just found home-assistant.log.fault. Looks like a python "Bus error"
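The "received signal 7" line in the log above can be mapped to a name with the standard library. A minimal sketch, assuming Linux signal numbering on ARM or x86, where 7 is SIGBUS (numbers differ on some other platforms); `describe_signal` is a hypothetical helper:

```python
import signal


def describe_signal(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {number}"


# Signal number taken from the s6 "finish process received signal 7" log line.
print(describe_signal(7))
```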

from core.

mib1185 avatar mib1185 commented on July 20, 2024

@oliwarner from your initial post I gather you're using the homeassistant Docker image directly? But your log from #118507 (comment) shows some service handling around it, so how exactly do you run HA?

from core.

mib1185 avatar mib1185 commented on July 20, 2024

Could you please provide the log of the container itself?

from core.

agners avatar agners commented on July 20, 2024

This sounds like an OS-level error: a SIGBUS killed the Python process. From what I read, this can happen in various circumstances, ranging from accessing /dev/mem (do you use RPi GPIOs?) and memory-related issues (unaligned access) to potential hardware problems.

What OS are you using? Are there errors showing up in the kernel log (dmesg)?

from core.

oliwarner avatar oliwarner commented on July 20, 2024

Thanks for follow-up @agners

  • Raspbian 11 (bullseye)
  • No GPIOs but I do passthrough a USB device for zigbee (works fine in stable)
  • No errors in dmesg

This is testing with the latest RC (which is somewhat further along than the initial reported one). Reverting back to stable still works.

Am I at the point where I need to start culling my existing config until I find what's wrong? Has there been a major Python environment upgrade in this HA release? Could it be an import error of a plugin that's not 3.12-compatible? I thought we were over that hill already but happy to be corrected.

from core.

agners avatar agners commented on July 20, 2024

Raspbian 11 (bullseye)

Is this 32-bit or 64-bit?

FWIW, in my test setup 2024.6.0b5 runs fine on a Raspberry Pi 3 Model B with HAOS 12.3 (32-bit) and Raspberry Pi 3 Model B+ with HAOS 12.3 (64-bit).

What I would try is cleaning the image completely. Sometimes the layers get corrupted in mysterious ways, especially on Raspberry Pis. So make sure to stop and remove the container, clean up/prune all the image layers, and download the image again.

from core.

oliwarner avatar oliwarner commented on July 20, 2024

aarch64. I'm so sorry, I've accidentally misled you: this is a Raspi 4b. I completely forgot I upgraded it. I'll update the title.

I've deleted and re-downloaded the entire image stack. Same behaviour.
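A minimal sketch for checking, from inside the container, whether the Python userland is 32-bit or 64-bit, since the kernel architecture alone does not settle that. `describe_runtime` is a hypothetical helper; it uses only the standard library:

```python
import platform
import struct


def describe_runtime():
    """Report the machine type and the pointer width of this interpreter."""
    return {
        # Architecture as reported by the kernel, e.g. "aarch64" or "armv7l".
        "machine": platform.machine(),
        # Size of a C pointer in this Python build: 32 or 64 bits.
        "pointer_bits": struct.calcsize("P") * 8,
    }


print(describe_runtime())
```

It could be run with docker compose run homeassistant python, the same way as the other snippets in this thread.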

from core.

agners avatar agners commented on July 20, 2024

What container image/tag do you use exactly?

from core.

oliwarner avatar oliwarner commented on July 20, 2024

ghcr.io/home-assistant/home-assistant:rc for testing this, ghcr.io/home-assistant/home-assistant normally. (I've just noticed I've posted the non-rc version above - that's what I fall back to when the WAF dips too low and the family want their automatic lights back). My compose file is currently:

services:
  homeassistant:
    image: "ghcr.io/home-assistant/home-assistant:rc"
    volumes:
      - /home/pi/ha-config:/config
      - /etc/localtime:/etc/localtime:ro
      - /run/dbus:/run/dbus:ro
    restart: unless-stopped
    privileged: true
    network_mode: host
    devices:
      - "/dev/ttyUSB0:/dev/ttyUSB0"

  • If I run it directly in debug mode with no config supplied (docker compose run homeassistant hass --debug) it starts up. That's obviously not storing anything anywhere and it has no plugins or existing configuration. I'm falling back to the working idea that the problem is a compatibility issue with a configured integration.

  • If I run it in --recovery-mode with the right config, it's crashing still.

  • I've turned on verbose logging, -v and I now see INFO output I didn't see before, and the last thing to show (after the last python WARNING) is:

    INFO (MainThread) [homeassistant.helpers.storage] Migrating core.config_entries storage from 1.1 to 1.2
    

It's still crashing with a bus error. The last trace is full of things happening around the storage helpers, which I didn't appreciate before. Can't be a coincidence, right? The JSON file .storage/core.config_entries parses in Python okay but it's way too big (36k) to manually spot what the problem might be, and it's full of secrets so I can't upload it.

Thread 0xf7e89f24 (most recent call first):
  File "/usr/local/lib/python3.12/linecache.py", line 72 in checkcache
  File "/usr/local/lib/python3.12/traceback.py", line 434 in _extract_from_extended_frame_gen
  File "/usr/local/lib/python3.12/traceback.py", line 395 in extract
  File "/usr/local/lib/python3.12/traceback.py", line 232 in extract_stack
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 448 in create_future
  File "/usr/local/lib/python3.12/asyncio/futures.py", line 417 in wrap_future
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 860 in run_in_executor
  File "/usr/src/homeassistant/homeassistant/core.py", line 876 in async_add_executor_job
  File "/usr/src/homeassistant/homeassistant/helpers/storage.py", line 545 in _async_write_data
  File "/usr/src/homeassistant/homeassistant/helpers/storage.py", line 540 in _async_handle_write_data
  File "/usr/src/homeassistant/homeassistant/helpers/storage.py", line 436 in async_save
  File "/usr/src/homeassistant/homeassistant/helpers/storage.py", line 419 in _async_load_data
  File "/usr/src/homeassistant/homeassistant/helpers/storage.py", line 309 in _async_load
  File "/usr/src/homeassistant/homeassistant/helpers/storage.py", line 289 in async_load
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 1770 in async_initialize
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88 in _run
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1980 in _run_once
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 639 in run_forever
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 672 in run_until_complete
  File "/usr/src/homeassistant/homeassistant/runner.py", line 188 in run
  File "/usr/src/homeassistant/homeassistant/__main__.py", line 209 in main
  File "/usr/local/bin/hass", line 8 in <module>

from core.

agners avatar agners commented on July 20, 2024

ghcr.io/home-assistant/home-assistant:rc for testing this, ghcr.io/home-assistant/home-assistant normally. (I've just noticed I've posted the non-rc version above - that's what I fall back to when the WAF dips too low and the family want their automatic lights back). My compose file is currently:

This is using the multi-platform image. I guess when you use docker inspect on the image it says indeed aarch64, correct?

It's still crashing with a bus error. The last trace is full of things happening around the storage helpers, which I didn't appreciate before. Can't be a coincidence, right? The JSON file .storage/core.config_entries parses in Python okay but it's way too big (36k) to manually spot what the problem might be, and it's full of secrets so I can't upload it.

Hm, yeah this makes it sound like this is an issue with orjson, the json parser used in HA. It has native parts which can cause crashes like this. Can you try to parse the file using orjson 3.10.3?

from core.

agners avatar agners commented on July 20, 2024

FWIW, Core 2024.6.0b5 runs fine here on Raspberry Pi 4/aarch64 with HAOS, but I guess this is related to the exact data at play 🤔

from core.

oliwarner avatar oliwarner commented on July 20, 2024

Nice idea but orjson 3.10.3 can parse /config/.storage/core.config_entries

# running in $ docker compose run homeassistant python
import orjson
from pathlib import Path
orjson.loads(Path('/config/.storage/core.config_entries').read_text())  # outputs a parsed copy

Do you know where this 1.1→1.2 migration code is? The storage classes are a bit overwhelming for somebody looking at them for the first time but if you can point me at the code that's actually running on here, perhaps I can slip a few debug statements into my copy and see what it outputs.

from core.

agners avatar agners commented on July 20, 2024

Nice idea but orjson 3.10.3 can parse /config/.storage/core.config_entries

You tried that on the target platform, correct?

Doesn't docker compose run homeassistant python use the latest tag (instead of rc)? 🤔

Do you know where this 1.1→1.2 migration code is? The storage classes are a bit overwhelming for somebody looking at them for the first time but if you can point me at the code that's actually running on here, perhaps I can slip a few debug statements into my copy and see what it outputs.

Not sure, maybe @bdraco can help out here?

from core.

bartv avatar bartv commented on July 20, 2024

I have the same issue on a Raspberry Pi with 8 GB of RAM. I am running Rocky 8 and use Podman (as root) to run the container.

When using verbose logging the last line is indeed:

2024-06-05 21:53:33.051 INFO (MainThread) [homeassistant.helpers.storage] Migrating core.config_entries storage from 1.1 to 1.2

from core.

bdraco avatar bdraco commented on July 20, 2024

Can you try downgrading orjson to 3.9.15 in the container?

from core.

mib1185 avatar mib1185 commented on July 20, 2024

Do you know where this 1.1→1.2 migration code is?

called in

stored = await self._async_migrate_func(

the migration method itself

async def _async_migrate_func(
    self,
    old_major_version: int,
    old_minor_version: int,
    old_data: dict[str, Any],
) -> dict[str, Any]:
    """Migrate to the new version."""
    data = old_data
    if old_major_version == 1 and old_minor_version < 2:
        # Version 1.2 implements migration and freezes the available keys
        for entry in data["entries"]:
            # Populate keys which were introduced before version 1.2
            pref_disable_new_entities = entry.get("pref_disable_new_entities")
            if pref_disable_new_entities is None and "system_options" in entry:
                pref_disable_new_entities = entry.get("system_options", {}).get(
                    "disable_new_entities"
                )
            entry.setdefault("disabled_by", entry.get("disabled_by"))
            entry.setdefault("minor_version", entry.get("minor_version", 1))
            entry.setdefault("options", entry.get("options", {}))
            entry.setdefault("pref_disable_new_entities", pref_disable_new_entities)
            entry.setdefault(
                "pref_disable_polling", entry.get("pref_disable_polling")
            )
            entry.setdefault("unique_id", entry.get("unique_id"))
    if old_major_version > 1:
        raise NotImplementedError
    return data

saving migrated data to disk

await self.async_save(stored)
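To see what the 1.1→1.2 step does to a single entry without starting Home Assistant, the per-entry loop quoted above can be copied into a standalone function. This is only a sketch: `migrate_1_1_to_1_2` is a hypothetical name, and the sample entry is made up for illustration rather than taken from a real core.config_entries file.

```python
def migrate_1_1_to_1_2(data):
    """Standalone copy of the per-entry loop quoted above (no Home Assistant imports)."""
    for entry in data["entries"]:
        # Prefer the newer key; fall back to the legacy system_options value.
        pref_disable_new_entities = entry.get("pref_disable_new_entities")
        if pref_disable_new_entities is None and "system_options" in entry:
            pref_disable_new_entities = entry.get("system_options", {}).get(
                "disable_new_entities"
            )
        # setdefault only fills keys that are missing; existing values are kept.
        entry.setdefault("disabled_by", entry.get("disabled_by"))
        entry.setdefault("minor_version", entry.get("minor_version", 1))
        entry.setdefault("options", entry.get("options", {}))
        entry.setdefault("pref_disable_new_entities", pref_disable_new_entities)
        entry.setdefault("pref_disable_polling", entry.get("pref_disable_polling"))
        entry.setdefault("unique_id", entry.get("unique_id"))
    return data


# Hypothetical pre-1.2 entry; the field values are invented for this example.
old = {"entries": [{"domain": "demo", "system_options": {"disable_new_entities": True}}]}
new = migrate_1_1_to_1_2(old)
print(new["entries"][0])
```

The migration itself only touches plain dict keys, which fits the observation elsewhere in this thread that the crash is in the native JSON serializer rather than in this Python code.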

from core.

bdraco avatar bdraco commented on July 20, 2024

@bartv Since you seem to be able to reproduce it on demand, can you get a backtrace with gdb?

https://wiki.python.org/moin/DebuggingWithGdb

from core.

bartv avatar bartv commented on July 20, 2024

I was able to get past the migration by running the python.org python:3.12 container, creating a venv and installing the latest hass. On startup it performed the migration. Once that was done, I started the official 2024.6.0 container, and now it gets a lot further in the startup process before I again get the "Bus error":

2024-06-05 23:40:58.337 INFO (MainThread) [homeassistant.setup] Setup of domain rest_command took 0.00 seconds
2024-06-05 23:40:58.337 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event component_loaded[L]: component=rest_command>
2024-06-05 23:40:58.345 INFO (MainThread) [homeassistant.setup] Setting up application_credentials
2024-06-05 23:40:58.347 INFO (MainThread) [homeassistant.setup] Setup of domain application_credentials took 0.00 seconds
2024-06-05 23:40:58.347 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event component_loaded[L]: component=application_credentials>
Bus error (core dumped)

GDB is not working at the moment because it crashes much earlier:

(gdb) run /usr/local/bin/hass -c /config -v
Starting program: /usr/local/bin/python3 /usr/local/bin/hass -c /config -v

Program received signal SIGILL, Illegal instruction.
0xf73bb41e in ?? () from /lib/libcrypto.so.3

from core.

bartv avatar bartv commented on July 20, 2024

Downgrading to orjson 3.10.1 fixes the issue. It occurs with both 3.10.2 and 3.10.3 (the dependency version that 2024.6.0 shipped with).
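A minimal sketch for checking which orjson version a given container has, compared against the two versions reported as crashing in this comment. The set below reflects only what is reported here, not a complete list of affected releases, and both function names are hypothetical:

```python
from importlib import metadata

# Versions reported in this thread as crashing; an assumption, not an exhaustive list.
REPORTED_AFFECTED = {"3.10.2", "3.10.3"}


def installed_orjson_version():
    """Return the installed orjson version string, or None if it is not installed."""
    try:
        return metadata.version("orjson")
    except metadata.PackageNotFoundError:
        return None


def is_reported_affected(version):
    """True if the version is one of those reported as crashing in this thread."""
    return version in REPORTED_AFFECTED


version = installed_orjson_version()
print(version, is_reported_affected(version))
```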

I tried running it from a python:3.12 container (debian based with full libc) installed from pypi.org and that seems to start fine, with the side note that this install is not complete and it does not start all custom components.

from core.

bdraco avatar bdraco commented on July 20, 2024

https://github.com/ijl/orjson/blob/master/src/serialize/per_type/unicode.rs

from core.

bdraco avatar bdraco commented on July 20, 2024

It looks like the original issue that caused us to revert was fixed, but a new issue was introduced in 3.10.2, so I opened a PR to revert to the last known good version.

from core.

TheArtizan avatar TheArtizan commented on July 20, 2024

Just letting you know I had the same issue. Luckily I made a complete backup a couple of months ago on a new drive. I will not update until this is fixed. Thanks, guys.

from core.
