One of results of executing transaction in stellar protocol is txINTERNAL_ERROR - this

Ah, OK. <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard

Reconsider txINTERNAL_ERROR result of transaction about stellar-protocol HOT 15 CLOSED

stellar commented on July 21, 2024

Reconsider txINTERNAL_ERROR result of transaction

from stellar-protocol.

Comments (15)

nullstyle commented on July 21, 2024

This result is stored in history and is part of input of given ledger hash.

This seems totally incorrect. An internal error should indicate an implementation error or an operational error... it shouldn't be expressed at the protocol level, right? The fact that it can leak into history just seems like an implementation bug itself.

So it will be possible that some servers will handle given transaction just right and some will return txINTERNAL_ERROR. It will result with different ledger hashes and some nodes will get desynchronized and became unusable.

Perhaps the mechanism is wrong, but it seems unavoidable that implementations of the protocol that have bugs would become desynchronized, right? Well, I'm guessing the preferred behavior would be they would halt progress until a fix could for their faulty implementation could be deployed.

I think it would be better to just crash in this case (with lots of logged info, of course).

This makes for a really poor integration experience for transaction submitters. Simply having the server drop off the network is not a good way to communicate with downstream clients. I'm very against this route unless it is unavoidable.

Could you perhaps explain why txINTERNAL_ERROR is ending up in history?

from stellar-protocol.

vogel commented on July 21, 2024

Of course, from a code point of view at least:

It all starts in LedgerManagerImpl::applyTransactions where tx->apply is invoked.
If an exception is thrown from it (other than InvariantDoesNotHold), it is catched just below:

catch (std::runtime_error& e)
{
    CLOG(ERROR, "Ledger") << "Exception during tx->apply: " << e.what();
    tx->getResult().result.code(txINTERNAL_ERROR);
}

So we get txINTERNAL_ERROR stored as result code of transaction. Then with this call:

tx->storeTransaction(*this, tm, ++index, txResultSet);

It is both stored in database and it txResultSet structure. Going back to LedgerManagerImpl::closeLedger which called LedgerManagerImpl::applyTransactions we see the following:

applyTransactions(txs, ledgerDelta, txResultSet);

ledgerDelta.getHeader().txSetResultHash =
    sha256(xdr::xdr_to_opaque(txResultSet));

So hash of txResultSet which contains txINTERNAL_ERROR result code becomes is stored in txSetResultHash and then it becomes part of ledger header hash.

If you mean "why txINTERNAL_ERROR is ending up in history" from design point of view then I don't know why it was done.

Most of problems in transaction handling is taken care of before trying to applying them in HerderImpl::recvTransaction - transaction rejected at this level never make it into ledger as part of scpValue.txSetHash. But transaction that are accepted at HerderImpl::recvTransaction are included in scpValue.txSetHash and in txSetResultHash which unfortunately requires that it supports txINTERNAL_ERROR.

I don't know other solution that simply crashing stellar-core in that case. All other solutions require synchronization of clients bugs.

from stellar-protocol.

nullstyle commented on July 21, 2024

If you mean "why txINTERNAL_ERROR is ending up in history" from design point of view then I don't know why it was done.

Yeah, this is what I was asking. It seems unintentional that txINTERNAL_ERROR would be included in history. IMO, when the system encounters a txINTERNAL_ERROR it should try to recover such that the system behaves as if the offending transaction had never even been submitted. Right @jedmccaleb?

from stellar-protocol.

jedmccaleb commented on July 21, 2024

It is too late at that point though since it happens at apply time. So the tx has already been included in the tx set. We are sort of faced with two bad choices here, A) crash stellar-core (and probably the network) or, B) pollute history with txINTERNAL_ERROR

from stellar-protocol.

nullstyle commented on July 21, 2024

Ah, OK. @vogel is right, IMO: it'd be better to crash than encode txINTERNAL_ERROR into history. I'd love to hear rationale for polluting history if anyone would like to advocate for that design choice. I tend to agree that it will be tough to ask other implementors to deal with polluted history and if we can avoid it we should.

from stellar-protocol.

MonsieurNicolas commented on July 21, 2024

Like Jed said: we can't exclude the transaction as it has been voted by SCP all around.

The tradeoff we made is to keep the network live: as there is only one implementation of the protocol around this seems to be the only reasonable choice, otherwise we would have to turnaround a fix for this bug and deploy it to all validators in order to get the network back (as the transaction has to be included).

from stellar-protocol.

vogel commented on July 21, 2024

We can remove txINTERNAL_ERROR from protocol version 9. For previous versions we can just keep a set of transaction hashes that should result in txINTERNAL_ERROR. It would be much easier for other implementation to just get that set than to replicate errors that caused the txINTERNAL_ERROR to happen.

from stellar-protocol.

nullstyle commented on July 21, 2024

The tradeoff we made is to keep the network live: as there is only one implementation of the protocol around this seems to be the only reasonable choice, otherwise we would have to turnaround a fix for this bug and deploy it to all validators in order to get the network back (as the transaction has to be included).

Ah, that makes sense... thanks for the explanation.

from stellar-protocol.

jedmccaleb commented on July 21, 2024

We can remove txINTERNAL_ERROR from protocol version 9. For previous versions we can just keep a set of transaction hashes that should result in txINTERNAL_ERROR. It would be much easier for other implementation to just get that set than to replicate errors that caused the txINTERNAL_ERROR to happen.

I don't think we can get rid of the txINTERNAL_ERROR result but we can fix the things that have caused txINTERNAL_ERRORs in the past and do the hash of tx thing like you suggest. That is a good idea.

from stellar-protocol.

vogel commented on July 21, 2024

All the things that caused txINTERNAL_ERROR are now fixed. There is special code in ManageDataOpFrame::doApply to trigger that error when needed:

    if (app.getLedgerManager().getCurrentLedgerVersion() == 3)
    {
        throw std::runtime_error(
            "MANAGE_DATA not supported on ledger version 3");
    }

I'm still voting for getting rid of txINTERNAL_ERROR in newer versions of protocol and just crash stellar-core in case something unexpected happens - this way other implementations of our protocol won't have to redo our bugs.

from stellar-protocol.

MonsieurNicolas commented on July 21, 2024

All the things that caused txINTERNAL_ERROR are now fixed

this is incorrect, it should read "all known things" - and it's a point in time statement, who knows the kind of bugs that will be added to the implementation in the future.

re-read what I wrote earlier on the choice we're making here - I don't think crashing the network is something we want to do in this situation.

from stellar-protocol.

piersy commented on July 21, 2024

I'm not familiar with the internals of the stellar codebase, so please forgive me if my suggestion doesn't make sense. But wouldn't it be possible to make a note of the offending transaction, and restart the round of consensus but filter out that particular transaction from the transaction set. You can log a warning with all the info required to investigate the problem and start working on a fix, but in the meantime the network can continue.

from stellar-protocol.

MonsieurNicolas commented on July 21, 2024

trying to exclude transactions after the fact creates a bunch of problems - I described this in more detail (under "other approaches") in #125 where, interestingly I am advocating for using txINTERNAL_ERROR even more than today

from stellar-protocol.

piersy commented on July 21, 2024

So having looked at #125 my suggestion is exactly the "Skip the entire transaction set" approach, which suffers from an attacker being able to DoS the network by constantly submitting transactions that fail.

I'm thinking this could be avoided by banning the account after a certain number of bad transactions.

One thing that I'm wondering about now is, how is a new node able to catch up if it is running code that is not producing the same internal errors for past transactions, that were entered into the ledger by older node implementations? I am assuming here that a node while catching up replays transactions into it's own ledger and validates that its result matches the history it has been given.

from stellar-protocol.

jedmccaleb commented on July 21, 2024

Moving this discussion to #125

from stellar-protocol.

Reconsider txINTERNAL_ERROR result of transaction about stellar-protocol HOT 15 CLOSED

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent