From: Jeremy Rubin
Date: Fri, 4 Mar 2022 23:21:41 +0000
To: Bitcoin development mailing list
Subject: [bitcoin-dev] Annex Purpose Discussion: OP_ANNEX, Turing Completeness, and other considerations

I've seen some discussion of what the Annex can be used for in Bitcoin. For example, some people have discussed using the annex as a data field for something like CHECKSIGFROMSTACK type stuff (additional authenticated data) or for something like delegation (the delegation is to the annex). I think before devs get too excited, we should have an open discussion about what this is actually for, and figure out if there are any constraints to using it however we may please.

The BIP is tight-lipped about its purpose, saying mostly only:

*What is the purpose of the annex? The annex is a reserved space for future extensions, such as indicating the validation costs of computationally expensive new opcodes in a way that is recognizable without knowing the scriptPubKey of the output being spent. Until the meaning of this field is defined by another softfork, users SHOULD NOT include annex in transactions, or it may lead to PERMANENT FUND LOSS.*

*The annex (or the lack of thereof) is always covered by the signature and contributes to transaction weight, but is otherwise ignored during taproot validation.*

*Execute the script, according to the applicable script rules[11], using the witness stack elements excluding the script s, the control block c, and the annex a if present, as initial stack.*

Essentially, I read this as saying: the annex is the ability to pad a transaction with an additional string of 0's that contributes to the virtual weight of a transaction but has no validation cost itself. Therefore, somehow, if you needed to validate more signatures than 1 per 50 virtual weight units, you could add padding to buy extra gas. Or, we might somehow make the witness a small language (e.g., run length encoded zeros) such that we can very quickly compute an equivalent number of zeros to 'charge' without actually consuming the space but still consuming a linearizable resource... or something like that. We might also e.g. want to use the annex to reserve something else, like the amount of memory. In general, we are using the annex to express a resource constraint efficiently. This might be useful for e.g. Simplicity one day.

Generating an Annex: One should write a tracing executor for a script, run it, measure the resource costs, and then generate an annex that captures any externalized costs.
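
To make the "generate an annex" step a bit more concrete, here is a rough sketch. It assumes the BIP 342 sigops budget (50 plus the serialized witness size, with each executed signature check costing 50) and the BIP 341 convention that the annex starts with byte 0x50. Everything else is an assumption for illustration: the "script" is just a list of opcode names, `trace_sigchecks` and `padding_annex` are hypothetical helpers, and the idea that zero padding in the annex is how you would buy extra budget is only the reading described above, not a defined rule.

```python
SIGOP_COST = 50    # budget consumed per executed signature check (BIP 342)
BASE_BUDGET = 50   # allowance added on top of the witness size (BIP 342)
ANNEX_TAG = 0x50   # first byte marking a witness element as the annex (BIP 341)

SIG_OPCODES = ("OP_CHECKSIG", "OP_CHECKSIGVERIFY", "OP_CHECKSIGADD")


def trace_sigchecks(script):
    """Toy tracing executor: count signature-checking opcodes in the script.

    A real tracer would execute the script and only count checks that run
    with a non-empty signature; this sketch counts every occurrence.
    """
    return sum(1 for op in script if op in SIG_OPCODES)


def padding_annex(script, witness_size):
    """Return a zero-padded annex covering the traced cost, or None if not needed.

    witness_size is the serialized witness size in bytes without any annex.
    The annex's own length prefix is ignored, which only makes the estimate
    slightly conservative.
    """
    needed = SIGOP_COST * trace_sigchecks(script)
    shortfall = needed - (BASE_BUDGET + witness_size)
    if shortfall <= 0:
        return None
    # One tag byte plus zero padding, so len(annex) == shortfall.
    return bytes([ANNEX_TAG]) + bytes(shortfall - 1)
```

For example, four signature checks against a 100-byte witness leave a shortfall of 50, so the sketch emits a 50-byte annex; a single check with a 64-byte witness needs no annex at all.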

-------------------

Introducing OP_ANNEX: Suppose there were some sort of annex pushing opcode, OP_ANNEX, which puts the annex on the stack as well as a 0 or 1 (to differentiate annex is 0 from no annex, e.g. 0 1 means annex was 0 and 0 0 means no annex). This would be equivalent to something based on <annex flag> OP_TXHASH <has annex flag> OP_TXHASH.

Now suppose that I have a computation that I am running in a script as follows:

OP_ANNEX
OP_IF
    `some operation that requires annex to be <1>`
OP_ELSE
    OP_SIZE
    `some operation that requires annex to be len(annex) + 1 or does a checksig`
OP_ENDIF

Now every time you run this, it requires one more resource unit than the last time you ran it, which makes your satisfier use the annex as some sort of "scratch space" for a looping construct, where you compute a new annex, loop with that value, and see if that annex is now accepted by the program.

In short, it kinda seems like being able to read the annex off of the stack makes witness construction somehow Turing complete, because we can use it as a register/tape for some sort of computational model.

-------------------

This seems at odds with using the annex as something that just helps you heuristically guess computation costs; now it's somehow something that acts to make script satisfiers recursive.

Because the Annex is signed, and must be the same, this can also be inconvenient:

Suppose that you have a Miniscript that is something like: and(or(PK(A), PK(A')), X, or(PK(B), PK(B'))).

A or A' should sign with B or B'. X is some sort of fragment that might require a value that is unknown (and maybe recursively defined?), so if we send the PSBT to A first, which commits to the annex, and then X reads the annex and says it must be something else, A must sign again. So you might say, run X first, and then sign with A (or A') and B (or B').
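
The re-signing problem can be sketched as a loop. This is a toy model under loud assumptions, not real Bitcoin signing: `sign` is a stand-in that commits to the annex by hashing it together with the message, and `x_required_annex` is a hypothetical fragment X that, in the spirit of the OP_SIZE branch above, wants an annex one byte longer than the one it sees until some target length is reached.

```python
import hashlib


def sign(key: bytes, message: bytes, annex: bytes) -> bytes:
    """Stand-in for a signature: commits to both the message and the annex."""
    return hashlib.sha256(key + message + annex).digest()


def x_required_annex(seen: bytes, target_len: int = 3) -> bytes:
    """Hypothetical fragment X: wants one more byte than it sees, up to target_len."""
    return seen if len(seen) >= target_len else seen + b"\x00"


def satisfy(key: bytes, message: bytes):
    """Loop until the annex A signed is the annex X accepts; count signing rounds."""
    annex, rounds = b"", 0
    while True:
        signature = sign(key, message, annex)
        rounds += 1
        required = x_required_annex(annex)
        if required == annex:
            return signature, annex, rounds
        # X wants a different annex, so the signature just made is useless
        # and A has to sign again over the new value.
        annex = required
```

With the default target of 3 bytes, A ends up signing four times before X is happy. Evaluating X before asking A to sign collapses this to a single round, which is the "run X first" workaround.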
However, what if the script somehow detects the bitstring WHICH_A WHICH_B and has a different Annex per selection (e.g., interpret the bitstring as an int and annex must == that int)? Now, given and(or(K1, K1'),... or(Kn, Kn')) we end up needing to pre-sign 2**n annex values somehow... this seems problematic theoretically.

Of course this wouldn't be miniscript then. Because miniscript is just for the well behaved subset of script, and this seems ill behaved. So maybe we're OK?

But I think the issue still arises. Suppose I have a simple thing like and(COLD_LOGIC, HOT_LOGIC), where both contain a signature. If COLD_LOGIC and HOT_LOGIC can both have different costs, I need to decide what logic each satisfier for the branch is going to use in advance, or sign all possible sums of both our annex costs? This could come up if cold/hot e.g. use different numbers of signatures / use checksigCISAadd which maybe requires an annex argument.

------------

It seems like one good option is if we just go on and banish the OP_ANNEX. Maybe that solves some of this? I sort of think so. It definitely seems like we're not supposed to access it via script, given the quote from above:

*Execute the script, according to the applicable script rules[11], using the witness stack elements excluding the script s, the control block c, and the annex a if present, as initial stack.*

If we were meant to have it, we would have not nixed it from the stack, no? Or would have made the opcode for it as a part of taproot...

But recall that the annex is committed to by the signature.

So it's only a matter of time till we see some sort of Cat and Schnorr Tricks III the Annex Edition that lets you use G cleverly to get the annex onto the stack again, and then it's like we had OP_ANNEX all along, or without CAT, at least something that lets us detect that the value has changed and cause this satisfier looping issue somehow.

Not to mention if we just got OP_TXHASH.

-----------

Is the annex bad?
After writing this I sort of think so?

One solution would be to... just soft-fork it out. Always must be 0. When we come up with a use case for something like an annex, we can find a way to add it back. Maybe this means somehow pushing multiple annexes and having an annex stack, where only sub-segments are signed for the last executed signature? That would solve looping... but would it break some aggregation thing? Maybe.

Another solution would be to make it so the annex is never committed to and unobservable from the script, but that the annex is always something that you can run get_annex(stack) to generate. Thus it is a hint for validation rules, but not directly readable, and if it is modified you figure out the txn was cheaper sometime after you execute the scripts and can decrease the value when you relay. But this sounds like something that needs to be a p2p only annex, because for consensus we may not care (unless it's something like preallocating memory for validation?).

-----------------------

Overall my preference is -- perhaps sadly -- looking like we should soft-fork it out of our current Checksig (making the policy that it must be 0 a consensus rule) and redesign the annex technique later, with a new checksig or other mechanism, when we actually know what it is for. But it's not a hard opinion! It just seems like you can't practically use the annex for this worklimit type thing *and* observe it from the stack meaningfully.

Thanks for coming to my TED talk,

Jeremy

--
@JeremyRubin