Just installed tzindex (extracted binary from the public latest image and inserted in

Cannot kill tzindex and restart it (tzindex issue, closed)

blockwatch-cc commented on June 14, 2024

Cannot kill tzindex and restart it

from tzindex.

Comments (5)

echa commented on June 14, 2024

The "block not indexed" error indicates an inconsistent database state. This happens after an unclean shutdown and, in rare circumstances, when the indexer hits an internal error in the middle of indexing a block.

What I usually do to prevent this is set a stop_grace_period. Shutdown is usually fast (a second or two) unless you have open HTTP connections; then it's bounded by server.shutdown_timeout in the config (defaults to 15 sec). So a grace period of 20-30 sec or more should be safe.

When the database is corrupted, there's no easy way to repair it right now (an fsck-like command is not implemented yet). The only solution is to remove all database files and reindex from scratch.

To avoid long wait times you can take database snapshots. Right now I advise stopping the indexer and copying all database files manually. A live snapshot feature exists, but it is not production-ready yet.


flaupretre commented on June 14, 2024

Hi, thanks for your quick reply. According to the kubernetes documentation, especially with a terminationGracePeriod, it should send a SIGTERM and wait for a graceful shutdown. Actually, if I watch the logs, the connection is closed without displaying any message indicating a graceful shutdown. Everything indicates that the processes are destroyed with a SIGKILL without receiving a gentle SIGTERM before. That's strange.

Anyway, I solved the problem with the addition of a 'preStop' script which just sends a SIGTERM to the tzindex process. With this script, the database is never corrupted.
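The preStop script itself is not shown in this thread. As a rough illustration only, the following Python sketch does what the paragraph above describes: it sends SIGTERM to a given PID and then polls until the process is gone or a grace period runs out. The function names, the 30-second default and the polling interval are all hypothetical choices, not part of tzindex or Kubernetes.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Return True while a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_gracefully(pid: int, grace_seconds: float = 30.0,
                    poll_interval: float = 0.2) -> bool:
    """Send SIGTERM and wait; return True if the process exited in time."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True
        time.sleep(poll_interval)
    return not pid_alive(pid)
```

A hook built on this would look up the indexer's PID and exit non-zero when `stop_gracefully` returns False. Note that a terminated child that has not been reaped by its parent still counts as alive for this check.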

For backups, as it is deployed on AWS/EKS, I will implement something based on EBS snapshots, as long as you don't provide the possibility to backup/snapshot without stopping the program.


echa commented on June 14, 2024

EBS snapshots may or may not work. The database files are mmapped and new data gets flushed after each block is processed, but sometimes also during block processing when database journals run full. When a filesystem snapshot runs during block processing, the database state may be corrupted because some database files may already have been updated by a journal flush, some may be in the middle of a flush, and others may not have been flushed yet. It's similar to Postgres or other databases where you get no guarantee about data consistency at the filesystem layer.

The not-production-ready feature for snapshotting might actually be better. Call PUT /system/snapshot (with an empty body) which makes the indexer copy consistent db files to a subdirectory inside the snapshot directory (see config crawler.snapshot_path).
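For illustration, the call could be issued from Python roughly as follows. Only the method and path (PUT /system/snapshot with an empty body) come from the comment above; the host, port and generous timeout are assumptions, and the function name is hypothetical.

```python
import http.client


def trigger_snapshot(host: str, port: int, timeout: float = 600.0) -> int:
    """PUT /system/snapshot with an empty body; return the HTTP status code."""
    # host, port and timeout are placeholders for your own deployment.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PUT", "/system/snapshot", body=b"")
        response = conn.getresponse()
        response.read()  # drain the (normally empty) body
        return response.status
    finally:
        conn.close()
```

The caller decides what counts as success from the returned status; the timeout is kept long because the indexer copies database files before it answers.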

The missing piece of this feature is that it's not mutually exclusive with block processing right now. I'll push this feature a bit up the priority stack. Are you on a commercial license or do you use OSS?


flaupretre commented on June 14, 2024

I'm currently testing tzindex for a potential commercial license. I am working for the Ledger company and we are implementing a platform to support the tezos coin on our hardware/software. Nothing is decided yet and that's why I'm currently testing 'infrastructure' aspects of different tezos indexers (mostly backup/restore and monitoring).

When I was talking about EBS snapshots, I was talking of cold snapshots, after a graceful shutdown of the indexer. Apart from snapshots (I will try this too), it seems to be the only way to guarantee DB consistency. Even if the database is rather fast to regenerate, it takes hours and we'll need a few backups (maybe 4~5 daily snapshots). Hot snapshots are interesting, and I'll try it, but they will need to be transferred to an external location (S3 for instance).


echa commented on June 14, 2024

A faster way (50 min on mainnet currently) is to use the RPC proxy that comes with the pro version. Even with a regular node it should be sufficient to do one snapshot a day or even one a week. Catching up the 1440 blocks a day is rather fast in my experience.
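To put the 1440 blocks a day in perspective, here is a small arithmetic sketch of the backlog an indexer has to replay after restoring a snapshot of a given age. The 1440 figure is taken from the comment above; the real block rate depends on the network's protocol parameters, and the function names are hypothetical.

```python
def block_interval_seconds(blocks_per_day: int = 1440) -> float:
    """Average time between blocks implied by a daily block count."""
    return 86_400 / blocks_per_day


def blocks_to_catch_up(snapshot_age_days: float,
                       blocks_per_day: int = 1440) -> int:
    """Approximate number of blocks produced since the snapshot was taken."""
    return round(snapshot_age_days * blocks_per_day)


print(block_interval_seconds())   # 60.0 seconds per block
print(blocks_to_catch_up(1))      # 1440 blocks after a daily snapshot
print(blocks_to_catch_up(7))      # 10080 blocks after a weekly snapshot
```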

As for hot snapshots, you can orchestrate them with the PUT /system/snapshot call, which is synchronous (it blocks until done and then returns a 204, unless you set a very short HTTP write timeout and have very slow storage). You can configure the snapshot directory and the indexer will create a subdir with the current height, but you'd still have to run the S3 copy when done.
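Once the call has returned, the copy step needs to know which subdirectory was just written. A minimal sketch, assuming each snapshot subdirectory is named with the plain decimal block height (the exact naming is not spelled out in this thread) and using a hypothetical helper name:

```python
from pathlib import Path
from typing import Optional


def latest_snapshot_dir(snapshot_root: str) -> Optional[Path]:
    """Return the subdirectory with the highest block height, or None."""
    candidates = [
        p for p in Path(snapshot_root).iterdir()
        # Assumption: snapshot subdirectories are named by decimal height.
        if p.is_dir() and p.name.isascii() and p.name.isdigit()
    ]
    # Compare heights numerically so "1000000" sorts after "999999".
    return max(candidates, key=lambda p: int(p.name), default=None)
```

The returned path can then be handed to whatever tool performs the upload to external storage.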

