echa commented on June 14, 2024

The "block not indexed" error indicates an inconsistent database state. This happens after an unclean shutdown and, in rare cases, when the indexer hits an internal error in the middle of indexing a block.

What I usually do to prevent this is set a stop_grace_period (e.g. in Docker Compose). Shutdown is usually fast (a second or two) unless there are open HTTP connections; in that case it is bounded by server.shutdown_timeout in the config (default 15 sec). So anything beyond 20-30 sec should be safe.

When the database is corrupted, there's no easy way to repair it right now (an fsck-like command is not implemented yet). The only solution is to remove all database files and reindex from scratch.

To avoid waiting for a full reindex, you can take database snapshots. Right now I advise stopping the indexer and copying all database files manually. A live snapshot feature exists, but it is not production-ready yet.

flaupretre commented on June 14, 2024

Hi, thanks for your quick reply. According to the Kubernetes documentation, Kubernetes should send a SIGTERM and wait for a graceful shutdown, up to the pod's terminationGracePeriodSeconds, before killing the process. However, when I watch the logs, the stream ends without any message indicating a graceful shutdown. Everything suggests that the processes are killed with SIGKILL without first receiving a SIGTERM. That's strange.

Anyway, I solved the problem by adding a preStop hook script that just sends a SIGTERM to the tzindex process. With this script, the database is never corrupted.

For backups, since it is deployed on AWS EKS, I will implement something based on EBS snapshots until you provide a way to back up/snapshot without stopping the program.

echa commented on June 14, 2024

EBS snapshots may or may not work. The database files are mmapped, and new data is flushed after each block is processed, but sometimes also during block processing when database journals fill up. If a filesystem snapshot runs during block processing, the captured database state may be corrupted: some database files may already have been updated by a journal flush, some may be in the middle of one, and others may not have been flushed yet. It's similar to Postgres and other databases, where you get no guarantee of data consistency at the filesystem layer.

The not-yet-production-ready snapshot feature might actually be the better option. Call PUT /system/snapshot (with an empty body), which makes the indexer copy consistent database files into a subdirectory of the snapshot directory (see the config option crawler.snapshot_path).

The missing piece of this feature is that it is not yet mutually exclusive with block processing. I'll move this feature up the priority stack. Are you on a commercial license or do you use the OSS version?
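
One common way to make snapshots mutually exclusive with block processing is a single lock held for the whole of each block update and for the whole snapshot copy, so every snapshot corresponds to exactly one block height. The Go sketch below illustrates that idea with a hypothetical `Indexer` type; it is not how tzindex implements (or will implement) the feature. `runConcurrently` exercises both paths from two goroutines and records whether any two critical sections ever overlapped.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Indexer is a hypothetical stand-in, not tzindex's actual type. One mutex
// guards the database files: a block update (including any journal flush it
// triggers) and a snapshot copy can never run at the same time.
type Indexer struct {
	mu     sync.Mutex
	height int64
}

// ProcessBlock applies one block while holding the lock.
func (ix *Indexer) ProcessBlock(apply func()) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	apply()
	ix.height++
}

// Snapshot runs copyFiles between two blocks, so the copy is consistent
// with exactly one height. copyFiles is a hypothetical callback.
func (ix *Indexer) Snapshot(copyFiles func(height int64) error) error {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	return copyFiles(ix.height)
}

// runConcurrently processes blocks and takes snapshots from two goroutines
// and reports whether any two critical sections ever overlapped.
func runConcurrently(blocks, snapshots int) (final int64, overlap bool, heights []int64) {
	ix := &Indexer{}
	var active atomic.Int32
	var overlapped atomic.Bool
	critical := func() {
		if active.Add(1) != 1 {
			overlapped.Store(true) // another critical section was running
		}
		active.Add(-1)
	}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < blocks; i++ {
			ix.ProcessBlock(critical)
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < snapshots; i++ {
			_ = ix.Snapshot(func(h int64) error {
				critical()
				heights = append(heights, h)
				return nil
			})
		}
	}()
	wg.Wait()
	return ix.height, overlapped.Load(), heights
}

func main() {
	final, overlap, heights := runConcurrently(1000, 10)
	fmt.Println("final height:", final, "overlap:", overlap, "snapshot heights:", heights)
}
```

The trade-off is that block processing pauses while a snapshot copy runs, which is acceptable if snapshots are taken rarely.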

flaupretre commented on June 14, 2024

I'm currently evaluating tzindex for a potential commercial license. I work for Ledger, and we are building a platform to support the Tezos coin on our hardware and software. Nothing has been decided yet, which is why I'm currently testing the "infrastructure" aspects of different Tezos indexers (mostly backup/restore and monitoring).

When I mentioned EBS snapshots, I meant cold snapshots, taken after a graceful shutdown of the indexer. Apart from your snapshot feature (I will try it too), this seems to be the only way to guarantee DB consistency. Even though the database is fairly fast to regenerate, it still takes hours, and we'll need several backups (maybe 4 to 5 daily snapshots). Hot snapshots are interesting and I'll try them, but they will need to be transferred to an external location (S3, for instance).

echa commented on June 14, 2024

A faster way to rebuild the index (currently 50 min on mainnet) is to use the RPC proxy that comes with the Pro version. Even with a regular node, one snapshot a day or even one a week should be sufficient. In my experience, catching up on 1440 blocks a day is rather fast.

As for hot snapshots, you can orchestrate them with the PUT /system/snapshot call, which is synchronous: it blocks until the snapshot is done and then returns a 204 (unless you set a very short HTTP write timeout and have very slow storage). You can configure the snapshot directory, and the indexer will create a subdirectory named after the current height, but you'd still have to run the S3 copy when it's done.
