Numeric Outcome DLCs (#110)

* Began work on Numeric Multi-Nonce Outcome spec, wrote compression algorithm

* More progress, specifically on curve serialization and polynomial interpolation

* Separated precision from function points and added note about polynomial evaluation optimizations when precision is not 1

* Wrote section for putting everything together into a CET set computation

* Wrote CET signature validation section

* Filled in all remaining holes!

* Added table of contents

* Added clarification about why base 2 is best, removed some first person

* Added concrete example

* Added subsections to general example

* Added note on non-generality of concrete example

* Clarified optimizations

* Fixed optimizations

* Fixed algorithm typos

* Responded to review, renamed precision_range -> rounding_interval

* Replaced paragraph about accepter's payout_function with new recommendation

* Split NumericOutcome.md into three files and added some design discussion/intentions

* Added extra precision to interpolation points in general payout functions

* Responded to Ben's review
This commit is contained in:
Nadav Kohen
2021-02-05 02:13:59 -06:00
committed by GitHub
parent a7009ec5d6
commit b9ac02c5aa
3 changed files with 688 additions and 0 deletions

379
CETCompression.md Normal file
View File

@@ -0,0 +1,379 @@
# Contract Execution Transaction Compression
## Introduction
When constructing a DLC for a [numeric outcome](NumericOutcome.md), there are often an unreasonably large number of
possible outcomes to construct a unique CET for every outcome.
We remedy this fact with a CET compression mechanism specified in this document which allows
any flat portions of the DLC's [payout curve](PayoutCurve.md) to be covered with only a logarithmic number of CETs.
It is common for payout curves to have constant extremal payouts for a large number of cases
representing all outcomes considered sufficiently unlikely.
These intervals with constant extremal payouts are often called "collars" and these collars
can be compressed to negligible size making the remaining number of CETs proportional
to the number of sufficiently likely outcomes.
Furthermore, through the use of [rounding intervals](NumericOutcome.md#rounding-intervals), even portions of the payout curve which are not
completely flat can be compressed to some extent, normally causing the total number of CETs to be
divided by some power of two.
This is accomplished through the use of digit decomposition where oracles attesting to
numeric outcomes sign each digit of the outcome individually.
There are as many nonces as there are possible digits required and CETs are claimed using
only some of these signatures, not necessarily all of them.
When not all of the signatures are used, then that corresponding CET represents all events
which agree on the digits for which signatures were used and may have any value at all other
digits where signatures were ignored.
## Table of Contents
* [Adaptor Points with Multiple Signatures](#adaptor-points-with-multiple-signatures)
* [CET Compression](#cet-compression)
* [Concrete Example](#concrete-example)
* [Abstract Example](#abstract-example)
* [Analysis of CET Compression](#analysis-of-cet-compression)
* [Counting CETs](#counting-cets)
* [Optimizations](#optimizations)
* [Algorithms](#algorithms)
* [Reference Implementations](#reference-implementations)
* [Authors](#authors)
## Adaptor Points with Multiple Signatures
Given public key `P` and nonces `R1, ..., Rn` we can compute `n` individual signature points for
a given event `(d1, ..., dn)` in the usual way: `si * G = Ri + H(P, Ri, di)*P`.
To compute a composite adaptor point for all events which agree on the first `m` digits, where
`m` is any positive number less than or equal to `n`, the sum of the corresponding signature
points is used: `s(1..m) * G = (s1 + s2 + ... + sm) * G = s1 * G + s2 * G + ... + sm * G`.
When the oracle broadcasts its `n` signatures `s1, ..., sn`, the corresponding adaptor secret can be
computed as `s(1..m) = s1 + s2 + ... + sm` which can be used to broadcast the CET.
#### Rationale
This design allows implementations to re-use all [transaction construction code](Transactions.md) without modification
because every CET needs as input exactly one adaptor point just like in the single-nonce setting.
Another design that was considered was adding keys to the funding output so that parties could collaboratively
construct `m` adaptor signatures and where `n` signatures are put on-chain in every CET which would reveal
all oracle signatures to both parties when a CET is published.
This design's major drawbacks is that it creates a very distinct fingerprint and makes CET fees significantly worse.
Additionally it leads to extra complexity in contract construction.
This design's only benefit is that it results in simpler and slightly more informative (although larger) fraud proofs.
The large multi-signature design was abandoned because the above proposal is sufficient to generate fraud proofs.
If an oracle incorrectly signs for an event, then only the sum of the digit signatures `s(1..m)`
is recoverable on-chain using the adaptor signature which was given to one's counter-party.
This sum is sufficient information to determine what was signed however as one can iterate through
all possible composite adaptor points until they find one whose pre-image is the signature sum found on-chain.
This will determine what digits `(d1, ..., dm)` were signed and these values along with the oracle
announcement and `s(1..m)` is sufficient information to generate a fraud proof in the multi-nonce setting.
## CET Compression
Anytime there is a range of numeric outcomes `[start, end]` which result in the same payouts for all parties,
then a compression function described in this section can be run to reduce the number of CETs from `O(L)` to `O(log(L))`
where `L = end - start + 1` is the length of the interval being compressed.
Because this compression of CETs only works for intervals which result in the same payout, the [CET calculation algorithm](NumericOutcome.md#contract-execution-transaction-calculation)
first splits the domain into buckets of equal payout, and then applies the compression algorithm from this
document to individual intervals, `[start, end]` where all values in this interval have some fixed payout.
Most contracts are expected to be concerned with some subset of the total possible domain and every
outcome before or after that range will result in some constant maximal or minimal payout.
This means that compression will drastically reduce the number of CETs to be of the order of the size
of the probable domain, with further optimizations available when parties are willing to do some [rounding](NumericOutcome.md#rounding-intervals).
The compression algorithm takes as input a range `[start, end]`, a base `B`, and the number of digits
`n` (being signed by the oracle) and returns an array of arrays of integers (which will all be in the range `[0, B-1]`).
An array of integers corresponds to a single event equal to the concatenation of these integers (interpreted in base `B`).
### Concrete Example
Before generalizing or specifying the algorithm, let us run through a concrete example.
We will consider the range `[135677, 138621]`, in base `10`.
Note that they both begin with the prefix `13` which must be included in every CET, for this purpose I omit these digits for
the remainder of this example as we can simply examine the range `[5677, 8621]` and prepend a `13` to all results.
To cover all cases while looking at as few digits as possible in this range we need only consider
`5677`, `8621` and the following cases:
```
5678, 5679,
568_, 569_,
57__, 58__, 59__,
6_, 7_,
80__, 81__, 82__, 83__, 84__, 85__,
860_, 861_,
8620
```
where `_` refers to an ignored digit (an omission from the array of integers).
(Recall that all of these are prefixed by `13`).
Thus, we are able to cover the entire interval of `2944` outcomes using only `20` CETs!
Here it is again in binary (specifically the range `[5677, 8621]`, not the original range with the `13` prefix in base 10):
Outliers are `5677 = 01011000101101` and `8621 = 10000110101101` with cases:
```
0101100010111_,
0101100011____,
01011001______,
0101101_______,
010111________,
011___________,
100000________,
1000010_______,
100001100_____,
10000110100___,
100001101010__,
10000110101100
```
And so again we are able to cover the entire interval of `2944` outcomes using only `14` CETs this time.
### Abstract Example
Before specifying the algorithm, let us run through a general example and do some analysis.
Consider the range `[(prefix)wxyz, (prefix)WXYZ]` where `prefix` is some string of digits in base `B` which
`start` and `end` share and `w, x, y, and z` are the unique digits of `start` in base `B` while `W, X, Y, and Z`
are the unique digits of `end` in base `B`.
To cover all cases while looking at as few digits as possible in this (general) range we need only consider
`(prefix)wxyz`, `(prefix)WXYZ` and the following cases:
```
wxy(z+1), wxy(z+2), ..., wxy(B-1),
wx(y+1)_, wx(y+2)_, ..., wx(B-1)_,
w(x+1)__, w(x+2)__, ..., w(B-1)__,
(w+1)___, (w+2)___, ..., (W-1)___,
W0__, W1__, ..., W(X-1)__,
WX0_, WX1_, ..., WX(Y-1)_,
WXY0, WXY1, ..., WXY(Z-1)
```
where `_` refers to an ignored digit (an omission from the array of integers) and all of these cases have the `prefix`.
### Analysis of CET Compression
This specification refers to the first three rows of the abstract example above as the **front groupings** the fourth row
in the example as the **middle grouping** and the last three rows in the example as the **back groupings**.
Notice that the patterns for the front and back groupings are nearly identical.
#### Counting CETs
Also note that in total the number of elements in each row of the front groupings is equal to `B-1` minus the corresponding digit.
That is to say, `B-1` minus the last digit is the number of elements in the first row and then the second to last digit and so on.
Likewise the number of elements in each row of the back groupings is equal to the corresponding digit.
That is to say, the last digit corresponds to the last row, second to last digit is the second to last row and so on.
This covers all but the first digit of both `start` and `end` (as well as the two outliers `wxyz` and `WXYZ`).
Thus the total number of CETs required to cover the range will be equal to the sum of the unique digits of `end` except the first,
plus the sum of the unique digits of `start` except for the first subtracted from `B-1` plus the difference of the first digits plus one.
A corollary of this is that the number of CETs required to cover a range of length `L` will be `O(B*log_B(L))` because `log_B(L)`
corresponds to the number of unique digits between the start and end of the range and for each unique digit a row is
generated in both the front and back groupings of length at most `B-1 ` which corresponds to the coefficient in the order bound.
This counting also shows us that base 2 is the optimal base to be using in general cases as it will, in general, outperform all larger bases
in both large and small intervals.
Note that the concrete example above was chosen to be easy to write down in base 10 (large digits in `start`, small digits in `end`) and so it should not
be thought of as a general candidate for this particular consideration.
To help with intuition on this matter, consider an arbitrary range of three digit numbers in base 10.
To capture the same range in base 2 we need 10 digit binary numbers.
However, a random three digit number in base 10 is expected to have a digit sum of 15, while a random ten digit binary number expects a digit sum of only 5!
Thus we should expect base 2 to outperform base 10 by around 3x on average.
This is because using binary results in a compression where each row in the diagram above has only a single element, which corresponds
to binary compression's ability to efficiently reach the largest possible number of digits ignored which itself covers the largest number of cases.
Meanwhile in a base like 10, each row can take up to 9 CETs before moving to a larger number of digits ignored (and cases covered).
Another way to put this is that the inefficiency of base 10 which seems intuitive at small scales is actually equally present at *all scales*!
One final abstract way of intuiting that base 2 is optimal is the following:
We wish to maximize the amount of information that we may ignore when constructing CETs, because abstractly every bit of information ignored
in a CET doubles the number of cases covered with a single transaction and signature.
Thus, if we use any base other than 2, say 10, then we will almost always run into situations where redundant information is needed because we can
only ignore a decimal digit at a time where a decimal digit has 3.3 bits of information.
Meanwhile in binary where every digit encodes exactly a single bit of information, we are able to perfectly ignore all redundant bits of information
resulting in some number near 3.3 times fewer CETs on average.
#### Optimizations
Because `start` and `end` are outliers to the general grouping pattern, there are optimizations that could potentially be made when they are added.
Consider the example in base 10 of the range `[2200, 4999]` which has the endpoints `2200` and `4999` along with the groupings
```
2201, 2202, 2203, 2204, 2205, 2206, 2207, 2208, 2209,
221_, 222_, 223_, 224_, 225_, 226_, 227_, 228_, 229_,
23__, 24__, 25__, 26__, 27__, 28__, 29__,
3___,
40__, 41__, 42__, 43__, 44__, 45__, 46__, 47__, 48__,
490_, 491_, 492_, 493_, 494_, 495_, 496_, 497_, 498_,
4990, 4991, 4992, 4993, 4994, 4995, 4996, 4997, 4998
```
This grouping pattern captures the exclusive range `(2200, 4999)` and then adds the endpoints in ad-hoc to get the inclusive range `[2200, 4999]`.
But this method misses out on a good amount of compression as re-introducing the endpoints allows us to replace the first two rows with
a single `22__` and the last 3 rows with just `4___`.
This optimization is called the **endpoint optimization**.
More generally, let the unique digits of `start` be `start = (x1)(x2)...(xn)` then if `x(n-i+1) = x(n-i+2) = ... = xn = 0`, then the
first `i` rows of the front groupings can be replaced by `(x1)(x2)...(x(n-i))_..._`.
In the example above, `start` ends with two zeros so that the first two rows are replaced by `22__`.
Likewise, let the unique digits of `end` be `end = (y1)(y2)...(yn)` then if `y(n-j+1) = y(n-j+2) = ... = yn = B-1`, then the last `j` rows
of the back groupings can be replaced by `(y1)(y2)...(y(n-j))_..._`.
In the example above, `end` ends with three nines so that the last three rows can are replaced by `4___`.
There is one more optimization that can potentially be made.
If the unique digits of `start` are all `0` and the unique digits of `end` are all `B-1` then we will have no need for a middle grouping as we can cover
this whole interval with just a single CET of `(prefix)_..._`.
This optimization is called the **total optimization**.
### Algorithms
We will first need a function to decompose any number into its digits in a given base.
```scala
def decompose(num: Long, base: Int, numDigits: Int): Vector[Int] = {
var currentNum: Long = num
// Note that (0 until numDigits) includes 0 and excludes numDigits
val backwardsDigits = (0 until numDigits).toVector.map { _ =>
val digit = currentNum % base
currentNum = currentNum / base // Note that this is integer division
digit.toInt
}
backwardsDigits.reverse
}
```
We will use this function for the purposes of this specification but note that when iterating through a range of sequential numbers,
there are faster algorithms for computing sequential decompositions.
We will then need a function to compute the shared prefix and the unique digits of `start` and `end`.
```scala
def separatePrefix(start: Long, end: Long, base: Int, numDigits: Int): (Vector[Int], Vector[Int], Vector[Int]) = {
val startDigits = decompose(start, base, numDigits)
val endDigits = decompose(end, base, numDigits)
val prefixDigits = startDigits
.zip(endDigits)
.takeWhile { case (startDigit, endDigit) => startDigit == endDigit }
.map(_._1)
(prefixDigits, startDigits.drop(prefixDigits.length), endDigits.drop(prefixDigits.length))
}
```
Now we need the algorithms for computing the groupings.
```scala
def frontGroupings(
digits: Vector[Int], // The unique digits of the range's start
base: Int): Vector[Vector[Int]] = {
val nonZeroDigits = digits.reverse.zipWithIndex.dropWhile(_._1 == 0) // Endpoint Optimization
if (nonZeroDigits.isEmpty) { // All digits are 0
Vector(Vector(0))
} else {
val fromFront = nonZeroDigits.init.flatMap { // Note the flatMap collapses the rows of the grouping
case (lastImportantDigit, unimportantDigits) =>
val fixedDigits = digits.dropRight(unimportantDigits + 1)
(lastImportantDigit + 1).until(base).map { lastDigit => // Note that this range excludes lastImportantDigit and base
fixedDigits :+ lastDigit
}
}
nonZeroDigits.map(_._1).reverse +: fromFront // Add Endpoint
}
}
def backGroupings(
digits: Vector[Int], // The unique digits of the range's end
base: Int): Vector[Vector[Int]] = {
val nonMaxDigits = digits.reverse.zipWithIndex.dropWhile(_._1 == base - 1) // Endpoint Optimization
if (nonMaxDigits.isEmpty) { // All digits are max
Vector(Vector(base - 1))
} else {
// Here we compute the back groupings in reverse so as to use the same iteration as in front groupings
val fromBack = nonMaxDigits.init.flatMap { // Note the flatMap collapses the rows of the grouping
case (lastImportantDigit, unimportantDigits) =>
val fixedDigits = digits.dropRight(unimportantDigits + 1)
0.until(lastImportantDigit).reverse.toVector.map { // Note that this range excludes lastImportantDigit
lastDigit =>
fixedDigits :+ lastDigit
}
}
fromBack.reverse :+ nonMaxDigits.map(_._1).reverse // Add Endpoint
}
}
def middleGrouping(
firstDigitStart: Int, // The first unique digit of the range's start
firstDigitEnd: Int): Vector[Vector[Int]] = { // The first unique digit of the range's end
(firstDigitStart + 1).until(firstDigitEnd).toVector.map { firstDigit => // Note that this range excludes firstDigitEnd
Vector(firstDigit)
}
}
```
Finally we are able to use all of these pieces to compress a range to an approximately minimal number of outcomes (by ignoring digits).
```scala
def groupByIgnoringDigits(start: Long, end: Long, base: Int, numDigits: Int): Vector[Vector[Int]] = {
val (prefixDigits, startDigits, endDigits) = separatePrefix(start, end, base, numDigits)
if (start == end) { // Special Case: Range Length 1
Vector(prefixDigits)
} else if (startDigits.forall(_ == 0) && endDigits.forall(_ == base - 1)) {
if (prefixDigits.nonEmpty) {
Vector(prefixDigits) // Total Optimization
} else {
throw new IllegalArgumentException("DLCs with only one outcome are not supported.")
}
} else if (prefixDigits.length == numDigits - 1) { // Special Case: Front Grouping = Back Grouping
startDigits.last.to(endDigits.last).toVector.map { lastDigit =>
prefixDigits :+ lastDigit
}
} else {
val front = frontGroupings(startDigits, base)
val middle = middleGrouping(startDigits.head, endDigits.head)
val back = backGroupings(endDigits, base)
val groupings = front ++ middle ++ back
groupings.map { digits =>
prefixDigits ++ digits
}
}
}
```
## Reference Implementations
* [bitcoin-s](https://github.com/bitcoin-s/bitcoin-s/blob/adaptor-dlc/core/src/main/scala/org/bitcoins/core/protocol/dlc/CETCalculator.scala)
## Authors
Nadav Kohen <nadavk25@gmail.com>
![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png "License CC-BY")
<br>
This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

143
NumericOutcome.md Normal file
View File

@@ -0,0 +1,143 @@
# Numeric Outcome DLCs
## Introduction
This document combines the [CET Compression](CETCompression.md) and [Payout Curve](PayoutCurve.md) specifications, along with
independently introduced [Rounding Intervals](#rounding-intervals) to specify the complete procedure for CET
construction, adaptor signing, and signature verification for DLCs over numeric outcomes.
When dealing with enumerated outcomes, DLCs require a single nonce and Contract Execution
Transactions (CETs) are claimed using a single oracle signature.
This scheme results in DLCs which contain a unique CET for every possible outcome, which is
only feasible if the number of possible outcomes is of manageable size.
If an outcome can be any of a large range of numbers, then using a simple enumeration of
all possible numbers in this range is unwieldy.
We optimize for this case by using numeric decomposition in which the oracle signs each digit of the outcome
individually so that many possible outcomes can be [compressed](CETCompression.md) into a single CET by ignoring certain digits.
We also compress the information needed to communicate all outcomes, as this can usually be viewed as a
[payout curve](PayoutCurve.md) parameterized by only a few numbers which determine payouts for the entire possible range.
Lastly, we introduce a method of deterministic rounding which allows DLC participants to increase CET
compression where they are willing to allow some additional rounding error in their payouts.
We put all of these pieces together to specify CET calculation and signature validation procedures
for Numeric Outcome DLCs.
This specification, as well as the [payout curve](PayoutCurve.md) and [CET compression](CETCompression.md) specifications are primarily concerned
with the protocol-level deterministic reproduction and concise communication of generic higher-level information.
These documents are not likely to concern application-level and UI/UX developers, who should operate at
their own levels of abstraction, only to compile application-level information into the formats specified here
when interacting with lowest-level core DLC logic.
## Table of Contents
* [Rounding Intervals](#rounding-intervals)
* [Reference Implementations](#reference-implementations)
* [Rounding Interval Serialization](#rounding-interval-serialization)
* [Contract Execution Transaction Calculation](#contract-execution-transaction-calculation)
* [Contract Execution Transaction Signature Validation](#contract-execution-transaction-signature-validation)
* [Authors](#authors)
## Rounding Intervals
As detailed in the [CET compression](CETCompression.md#cet-compression), any time some continuous interval of the domain results in the same payout value, we can
compress the CETs required by that interval to be logarithmic in size compared to using one CET per outcome on that interval.
As such, it can be beneficial to round the outputs of the payout function to allow for bounded approximation of pieces of the payout
curve by constant-payout intervals.
For example, if two parties are both willing to round payout values to the nearest 100 satoshis, they can have significant savings
on the number of CETs required to enforce their contract.
To this end, we allow parties to negotiate rounding intervals which may vary along the curve, allowing for less rounding near more
probable outcomes and allowing for more rounding to occur near extremes.
Each party has their own minimum `rounding_intervals` and the rounding to be used at a given `event_outcome` is the minimum
of both party's rounding moduli.
If `R` is the rounding modulus to be used for a given `event_outcome` and the result of function evaluation for that `event_outcome` is `value`,
then the amount to be used in the CET output for this party will be the closer of `value - (value % R)` or `value - (value % R) + R`, rounding
up in the case of a tie.
#### Reference Implementations
* [bitcoin-s](https://github.com/bitcoin-s/bitcoin-s/blob/adaptor-dlc/core/src/main/scala/org/bitcoins/core/protocol/dlc/RoundingIntervals.scala)
#### Rounding Interval Serialization
1. type: 42788 (`rounding_intervals_v0`)
2. data:
* [`u16`:`num_rounding_intervals`]
* [`bigsize`:`begin_interval_1`]
* [`bigsize`:`rounding_mod_1`]
* ...
* [`bigsize`:`begin_interval_num_rounding_intervals`]
* [`bigsize`:`rounding_mod_num_rounding_intervals`]
`num_rounding_intervals` is the number of rounding intervals specified in this function and can be
zero in which case a rounding modulus of `1` is used everywhere.
Each serialized rounding interval consists of two `bigsize` integers.
The first integer is called `begin_interval` and refers to the x-coordinate (`event_outcome`) at which this range begins.
The second integer is called `rounding_mod` and contains the rounding modulus to be used in this range.
If `begin_interval_1` is strictly greater than `0`, then the interval between `0` and `begin_interval_1` has a precision of `1`.
#### Requirements
* `begin_interval_1`, if it exists, MUST be non-negative.
* `begin_interval` MUST strictly increase.
## Contract Execution Transaction Calculation
Given the offerrer's [payout function](PayoutCurve.md), a `total_collateral` amount and [rounding intervals](#rounding-intervals), we wish to compute a list of pairs
of digits (i.e. arrays of integers) and Satoshi values.
Each of these pairs will then be turned into a CET whose adaptor point is [computed from the list of integers](CETCompression.md#adaptor-points-with-multiple-signatures) and whose
output values will be equal to the Satoshi payout and `total_collateral` minus that payout.
We must first modify the pure function given to us (e.g. by interpolating points) by applying rounding, as well as setting all
negative payouts to `0` and all computed payouts above `total_collateral` to equal `total_collateral`.
Next, we split the function's domain into two kinds of intervals:
1. Intervals in which the modified function's value is constant.
2. Intervals in which the modified function's values are changing at every point.
This can be done by evaluating the modified function at every point in the domain and keeping track of whether or not the value has
changed to construct the intervals, but this is not a particularly efficient solution.
There are countless ways to go about making this process more efficient such as binary searching for function value changes or looking
at the unmodified function's derivatives.
Regardless of how these intervals are computed, it is required that the constant-valued intervals be as large as possible.
For example if you have two constant-valued intervals in a row with the same value, these must be merged.
Finally, once these intervals have been computed, the [CET compression](CETCompression.md#cet-compression) algorithm is run on each constant-valued interval which generates
a list of integers to be paired with the (constant) payout for that interval.
For variable-payout intervals, a unique CET is constructed for every `event_outcome` where all digits of that `event_outcome` are included
in the array of integers and the Satoshi payout is equal to the output of the modified function for that `event_outcome`.
## Contract Execution Transaction Signature Validation
To validate the adaptor signatures for CETs given in a `dlc_accept` or `dlc_sign` message, do the [process above](#contract-execution-transaction-calculation[) of computing the list of pairs of
arrays of digits and payout values to construct the CETs and their adaptor points and then run the `adaptor_verify` function.
However, if `adaptor_verify` results in a failed validation, do not terminate the CET signing process.
Instead, you must look at whether you rounded up (to `value - (value % rounding_mod) + rounding_mod`)
or down (to `value - (value % rounding_mod)`).
If you rounded up, compute the CET resulting from rounding down or if you rounded down, compute the CET resulting from rounding up.
Call the `adaptor_verify` function against this new CET and if it passes verification, consider that adaptor signature valid and continue.
This extra step is necessary because there is no way to introduce deterministic floating point computations into this specification without also
introducing complexity of magnitude much larger than that of this entire specification.
This is because there are no guarantees provided by hardware, operating systems, or programming language compilers that doing the same
floating point computation twice will yield the same results.
Thus, this specification instead takes the stance that clients must be resilient to off-by-precision (off-by-one * rounding_mod) differences between machines.
## Authors
Nadav Kohen <nadavk25@gmail.com>
![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png "License CC-BY")
<br>
This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

166
PayoutCurve.md Normal file
View File

@@ -0,0 +1,166 @@
# Payout Curve Serialization
## Introduction
When constructing a DLC for a [numeric outcome](NumericOutcome.md), there are often an unreasonably large number of
possible outcomes to practically enumerate them all in an offer, along with their associated payouts.
Often times, there exists some macroscopic structure whereby a few parameters determine the
payouts for all possible outcomes, making these parameters a more succinct serialization for
numeric outcome DLC payout curves.
This document begins by specifying serialization and deserialization (aka evaluation) of so-called
[General Payout Curves](#general-payout-curves) which should be sufficient for any simple payout curves, such as those
composed of some combination of lines, as well as for custom payout curves which do not
warrant their own types to be created.
## Table of Contents
* [General Payout Curves](#general-payout-curves)
* [Design](#design)
* [Curve Serialization](#curve-serialization)
* [General Function Evaluation](#general-function-evaluation)
* [Reference Implementations](#reference-implementations)
* [Optimized Evaluation During CET Calculation](#optimized-evaluation-during-cet-calculation)
* [Authors](#authors)
## General Payout Curves
### Design
The goal of this specification is to enable general payout curve shapes efficiently and compactly while also ensuring that
simpler and more common payout curves (such as a straight line for a forward contract) do not become complex
while conforming to the generalized structure.
Specifically, the general payout curve specification supports the set of piecewise polynomial functions (with no continuity
requirements between pieces).
If there is some payout curve "shape" which you wish to support which is not efficiently represented as a piecewise polynomial
function, then you should propose a new `payout_function` type which efficiently specifies a minimal set of parameters from
which the entire payout curve can be determined.
An example of one such candidate would be curves with the "shape" `1/outcome` which are fully determined by only a couple
parameters but can require thousands of interpolation points when using polynomial interpolation.
Since lines are polynomials, simple curves remain simple when represented in the language of piecewise polynomial functions.
Any interesting (e.g. non-random) payout curve can be closely approximated using a cleverly constructed [polynomial interpolation](https://en.wikipedia.org/wiki/Polynomial_interpolation).
And lastly, serializing these functions can be done compactly by providing only a few points of each polynomial piece so as to
enable the receiving party in the communication to interpolate the polynomials from this minimal amount of information.
It is important to note however, that due to [Runge's phenomenon](https://en.wikipedia.org/wiki/Runge%27s_phenomenon), it will usually be preferable for clients to construct their payout curves
using some choice of [spline interpolation](https://en.wikipedia.org/wiki/Spline_interpolation) instead of directly using polynomial interpolation (unless linear approximation is sufficient)
where a spline is made up of polynomial pieces so that the resulting interpolation can be written as a piecewise polynomial one.
Please note that payout curves are a protocol-level abstraction, and that at the application layer users will likely be
interacting with some set of parameters on some contract templates.
They will not be directly interfacing with General Payout Curves and their serialization, which are meant for efficient
serialization and deterministic reproduction between core DLC logic implementations.
General Payout Curves also handle a large number of common application use-cases' interpolation logic,
which is to say that applications likely do not have to compute all outcome points to serialize to this format
but instead can usually use only the user-provided parameters to directly compute a small number of relevant
interpolation points.
### Curve Serialization
In this section we detail the TLV serialization for a general `payout_function`.
#### Version 0 payout_function
1. type: 42790 (`payout_function_v0`)
2. data:
* [`u16`:`num_pts`]
* [`boolean`:`is_endpoint_1`]
* [`bigsize`:`event_outcome_1`]
* [`bigsize`:`outcome_payout_1`]
* [`u16`:`extra_precision_1`]
* ...
* [`boolean`:`is_endpoint_num_pts`]
* [`bigsize`:`event_outcome_num_pts`]
* [`bigsize`:`outcome_payout_num_pts`]
* [`u16`:`extra_precision_num_pts`]
`num_pts` is the number of points on the payout curve that will be provided for interpolation.
Each point consists of a `boolean` and two `bigsize` integers and a `u16`.
The `boolean` is called `is_endpoint` and if this is true, then this point marks the end of a
polynomial piece and the beginning of a new one. If this is false then this is a midpoint
between the previous and next endpoints.
The first integer is called `event_outcome` and contains the actual number that could be signed by the oracle
(note: not in the serialization used by the oracle) which corresponds to an x-coordinate on the payout curve.
The second integer is called `outcome_payout` and is set equal to the local party's payout should
`event_outcome` be signed which corresponds to a y-coordinate on the payout curve.
The third integer, a `u16`, is called `extra_precision` and is set to be the first 16 bits of the payout after
the binary point which were rounded away.
This extra precision ensures that interpolation does not contain large errors due to error in the
`outcome_payout`s due to rounding.
To be precise, the points used for interpolation should be:
(`event_outcome`, `outcome_payout + double(extra_precision) >> 16`).
For the remainder of this document, the value `outcome_payout + double(extra_precision) >> 16`
is refered to as `outcome_payout`.
Note that this `payout_function` is from the offerer's point of view.
To evaluate the accepter's `payout_function`, you must evaluate the offerer's `payout_function` at a given
`event_outcome` and subtract the resulting payout from `total_collateral`.
It is important that you do NOT construct the accepter's `payout_function` by replacing all `outcome_payout`s in the
offerer's `payout_function` with `total_collateral - outcome_payout` and interpolating the resulting points.
This does not work because, due to rounding, the sum of the outputs of both parties' `payout_function`s could then be
`total_collateral - 1` and including checking this case (with one satoshi missing from either outcome) to the verification algorithm
could make verification times up to four times slower and adds complexity to the protocol by breaking a reasonable assumption.
#### Requirements
* `num_pts` MUST be at least `2`.
* `event_outcome` MUST strictly increase.
If a discontinuity is desired, a sequential `event_outcome` should be used in the second point.
This is done to avoid ambiguity about the value at a discontinuity.
### General Function Evaluation
There are many ways to compute the unique polynomial determined by some set of interpolation points.
I choose to detail [Lagrange Interpolation](https://en.wikipedia.org/wiki/Lagrange_polynomial) here due to its relative simplicity, but any algorithm should work so long is it does
not result in approximations with an error too large so as to fail [validation](NumericOutcome.md#contract-execution-transaction-signature-validation).
To name only a few other algorithms, if you are interested in alternatives you may wish to use a [Vandermonde matrix](https://en.wikipedia.org/wiki/Polynomial_interpolation#Constructing_the_interpolation_polynomial) or
another alternative, the method of [Divided Differences](https://en.wikipedia.org/wiki/Newton_polynomial#Divided-Difference_Methods_vs._Lagrange).
Please note that while the following may seem complex, it should boil down to very few lines of code.
Furthermore this problem is well-known and solutions in most languages likely exist on sites such as
Stack Overflow from which they can be copied so that only minor aesthetic modifications are required.
Given a potential `event_outcome` compute the `outcome_payout` as follows:
1. Binary search the endpoints by `event_outcome`
* If found, return `outcome_payout`
* Else let `points` be all of the interpolation points between (inclusive)
the previous and next endpoint
2. Let `lagrange_line(i, j) = (event_outcome - points(j).event_outcome)/(points(i).event_outcome - points(j).event_outcome)`
3. Let `lagrange(i) := PROD(j = 0, j < points.length && j != i, lagrange_line(i, j))`
4. Return `SUM(i = 0, i < points.length, points(i).outcome_payout * lagrange(i))`
#### Reference Implementations
* [bitcoin-s](https://github.com/bitcoin-s/bitcoin-s/blob/adaptor-dlc/core/src/main/scala/org/bitcoins/core/protocol/dlc/DLCPayoutCurve.scala#L324)
### Optimized Evaluation During CET Calculation
There are many optimizations to this piecewise interpolation function that can be made
when repeatedly and sequentially evaluating an interpolation as is done during CET calculation.
* The binary search in step 1 can be avoided when computing for sequential inputs.
* The value ` points(i).outcome_payout / PROD(j = 0, j < points.length && j != i, points(i).event_outcome - points(j).event_outcome)`
can be cached for each `i` in a polynomial piece, call this `coef_i`.
For a given `event_outcome` let `all_prod = PROD(i = 0, i < points.length, event_outcome - points(i).event_outcome)`.
The sum can then be computed as `SUM(i = 0, i < points.length, coef_i * all_prod / (event_outcome - points(i).event_outcome))`.
* When precision ranges are introduced, derivatives of the polynomial can be used to reduce the number of calculations needed.
For example, when dealing with a cubic piece, if you are going left to right and enter a new value modulo precision while the first derivative (slope) is positive and the second derivative (concavity) is negative then you can take the tangent line to the curve at this point and find it's intersection with the next value boundary modulo precision.
If the derivatives' signs are the same, then the interval from the current x-coordinate to the x-coordinate of the intersection is guaranteed to all be the same value modulo precision.
## Authors
Nadav Kohen <nadavk25@gmail.com>
![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png "License CC-BY")
<br>
This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).