Survey of Time Preference, Delay Discounting Models

The paper surveys over twenty models of delay discounting (also known as temporal discounting, time preference, or time discounting) that psychologists and economists have put forward to explain the way people actually trade off time and money. Using little more than the basic algebra of powers and logarithms, we show how the models are derived, what assumptions they are based upon, and how different models relate to each other. Rather than concentrate only on discount functions themselves, we show how discount functions may be manipulated to isolate rate parameters for each model. This approach, consistently applied, helps focus attention on the three main components in any discounting model: subjectively perceived money; subjectively perceived time; and how these elements are combined. We group models by the number of parameters that have to be estimated, which means our exposition follows a trajectory of increasing model complexity. However, as the story unfolds it becomes clear that most models fall into a smaller number of families. We also show how new models may be constructed by combining elements of different models. The surveyed models are: Exponential; Hyperbolic; Arithmetic; Hyperboloid (Green & Myerson, Rachlin); Loewenstein and Prelec's Generalized Hyperboloid; quasi-Hyperbolic (β-δ); Benhabib et al.'s fixed cost; Benhabib et al.'s Exponential/Hyperbolic/quasi-Hyperbolic; Read's discounting fractions; Roelofsma's exponential time; Scholten and Read's discounting-by-intervals (DBI); Ebert and Prelec's constant sensitivity (CS); Bleichrodt et al.'s constant absolute decreasing impatience (CADI); Bleichrodt et al.'s constant relative decreasing impatience (CRDI); Green, Myerson, and Macaux's hyperboloid over intervals models; Killeen's additive utility; size-sensitive additive utility; Yi, Reid, and Bickel's memory trace models; McClure et al.'s two exponentials; Scholten and Read's trade-off model; and Decision-by-Sampling (DbS). For a convenient overview, a single "cheat sheet" table captures the notation and essential mathematics behind all but one of the models.

In this survey we classify models somewhat superficially by the number of additional free parameters that an experimental design would need to estimate over and above the rate parameter. We build up the complexity of the models. In the end, however, we are able to show that most models belong to a few families, of which the simpler models are special cases. We also speculate on future models.

Stevens, Weber, powers, logs, ratios, differences, and started time
Most of the models described in this survey can be classified by their treatment of three main issues: (i) how (if at all) is the subjective perception of money to be treated; (ii) how (if at all) is the subjective perception of time to be treated; and (iii) how is money to be traded off against time? In deciding these questions, we will be making repeated use of a few psychological principles and related mathematical relationships. Though they appear throughout the paper, it is worth collecting them together before we begin.

Stevens versus Weber
A long-standing debate in psychology has been whether basic psychophysical phenomena, such as the subjective brightness of light, heaviness, sound, and pain, are best modeled by using logs or powers. The Weber-Fechner "law" dates back to the 18th century and promotes a log treatment. The law assumes that the subjective change in a stimulus (ΔS) has a constant relation with the change in objective energy (ΔE) of a stimulus relative to the existing energy (E):

ΔS = k·ΔE / E (x1)

As an example, many candles will be needed at noon on a sunny day to give the same subjective increase in brightness experienced by lighting a single candle at twilight. Integrating, we have

S = k·log(E), (x2)

which is the Weber-Fechner law, or just Weber's law. (Log is assumed base e throughout this paper.) Its main challenger is the power "law", which instead suggests that:

S = K·E^c, (x3)

where c depends on the context. For instance, Stevens (1957; 1961) documented different power laws for different psychophysical phenomena. Subjective loudness is proportional to (objective sound pressure)^.67, whereas subjective pain is proportional to (electrical shock to finger)^3.5, and so on. It is worth noting that in the cases investigated by Stevens, there was approximately an even split between exponents > 1 and exponents < 1. The models surveyed in this paper assume that the abstract concepts of time and money (and even time intervals) may be treated in the same way as physical stimuli, by using either Weber's law or Stevens' power law. Eisler's (1976) comprehensive review of studies spanning a hundred years settled on a Stevens power exponent for time perception of c ≈ .9. Classical economics, for instance, expects a decreasing marginal utility of money and therefore that the exponent should be < 1. Surveying his own and others' work, Stewart (2009) noted a stable exponent for money that was just under .5, i.e. approximately a square root law.

Powers and logs (Stevens ⊃ Weber?)
Undoubtedly Stevens' power law is more flexible than Weber's log law, though at the expense of having to estimate an additional parameter. If c = 1 we are treating time or money as if objective. As c decreases towards zero, E^c becomes more curved, in fact more logarithmic-like. This follows from a mathematical result that is important for this paper, which is that if x > 0:

(x^c − 1) / c → log(x), as c → 0. (x4)

In many applications that compute relative differences (including here), constant scaling factors do not matter just as long as they are applied to all observations. Consequently, the c in the denominator can usually be ignored. Therefore for small c > 0, x^c becomes log-like around 1, rather than around 0, which would be more useful.
Sometimes this doesn't matter. For instance, often we are interested in difference relationships such as v = (x^c − y^c). In this particular case, Stevens' power law nests Weber's log law as a special case. This follows because we can re-write the equation as:

v/c = (x^c − 1)/c − (y^c − 1)/c, (x5)

then, using the mathematical result,

v/c → log(x) − log(y) = log(x/y) as c → 0. (x6)

Therefore, in order to build in modeling flexibility, rather than forcing a relationship to be log(x/y) and thus fixing on a Weber law, we can let it be the more general x^c − y^c and let an optimizing program determine which c best fits the empirical data. If c is close to zero, then a Weber-like log law applies; if c is close to 1, then x and y are linear; if c = .5, then we have a square-root law; and so on.
Sometimes it does matter. For instance, when a single variable is being considered, x^c → 1 as c → 0, which implies that the x variable is redundant. In order to avoid this problem, and to be consistent with the log behavior of (x^c − y^c), the form (x^c − 1)/c is often used to express powers, instead of x^c.
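To see the limit in (x4) behave numerically, here is a brief sketch (ours, in plain Python, not taken from any of the surveyed papers); the values of x and y are arbitrary illustrations:

```python
import math

def power_form(x, c):
    """(x**c - 1) / c, which tends to log(x) as c -> 0."""
    return (x ** c - 1.0) / c

x, y = 20.0, 5.0
for c in (1.0, 0.5, 0.1, 0.01, 0.001):
    approx_log = power_form(x, c)
    diff_form = (x ** c - y ** c) / c        # (x^c - y^c)/c -> log(x/y)
    print(f"c={c:6.3f}  (x^c-1)/c={approx_log:7.4f}  log(x)={math.log(x):7.4f}  "
          f"(x^c-y^c)/c={diff_form:7.4f}  log(x/y)={math.log(x / y):7.4f}")
```

As c shrinks, both columns converge on their logarithmic counterparts, which is the sense in which the power law nests the log law.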

Started values
Unfortunately, it may happen that x takes the value 0, either always or sometimes. Typically this happens when an amount of money is offered now (at t = 0) compared with an amount offered later. This is then a problem, because log(0) is undefined. The problem also occurs for any power law whose exponent may be nonpositive. So, any model that wants to treat time as logarithmic, either explicitly or implicitly as the limit of a power law, throws up an anomaly for t = 0, which is actually the most frequently used value of t. One way out of this difficulty is to treat t = 0 as a special (sometimes absent) case. Another is to use "started time" instead of time itself. Started counts, started reciprocals, started logs, etc. were recommended by Tukey (1977) and Mosteller and Tukey (1977) to overcome just this kind of problem when dealing with troublesome zeros. Started time translates all measurements t into t + ε, where ε is small relative to the values that t will take. Equivalently, in our calculations we can translate time from days, as may have been stated in the questionnaire, into hours (or even minutes), and just let ε = 1. Under this translation, t = 0 would become t* = 1; t = 2 days would become t* = 49; 7 days would become t* = 169 (all started hours), and so on, with the * emphasizing that we are dealing with started time. Although this tweak is motivated by avoiding a mathematical problem, started time may pragmatically reflect reality better than the nominal time appearing on the questionnaire, because when a reward is offered "now" participants would not assume they would get the reward instantaneously, but rather that they would get it after the experiment was over; i.e., quite possibly about an hour later. This delay between mathematical and actual now is even more apparent if the amount is offered "today".
Generally this would not be an issue for modeling money because in any realistic choice F and P will both be non-zero. If it ever did become an issue, money could similarly be "started" by adding one cent to both P and F.

Treatment of time and money.
In almost all models surveyed here money and time are combined multiplicatively. A ratio is formed between the subjective value of additional money and the subjective dis-value of additional delay: that ratio is the rate parameter. We can distinguish two very general views on time and money. In one, money and time are seen as qualitatively distinct, which means that treating them differently is not an issue. For instance, money can be compounded or subject to inflation, which thus change its face value and / or its real worth: time cannot. Money is traded: time is generally not, and so on. In the other view subjective perceptions of money and subjective time are modeled as if they were standard psychophysical stimuli (pain, brightness, loudness, etc.). Now if psychophysics itself treats its stimuli via consistent functional forms, for instance by Stevens' power law (or Weber's law), it sets a strong precedent for treating time and money via similar functional forms. Accordingly, if a model treats subjective time and money inconsistently, the burden of proof should be on the model's creator to justify why it does so.
Some pairings of functional forms are not consistent (e.g., a power law on money combined with a log on time), whereas others are (e.g., power laws on both, or logs on both). This is not to suppose that the parameters need match up (e.g., that the money and time exponents must be equal). We propose that a useful characteristic to observe in a model is whether it treats time and money symmetrically or not. If it does not, has any special justification been given why not?

A note on notation.
In defining parameters for the different models, we have tried to obey the following guidelines. We obey common usage where it is standard in general algebra (e.g., r is the common ratio of a geometric progression, d the common difference of an arithmetic progression). We have tried to use mnemonic devices (e.g., h for hyperbolic, m for the exponent on money), and use capital letters for the models themselves (e.g., E for the exponential model, H for the simple hyperbolic). Generally speaking, the rate parameter is the lowercase of the uppercase model: e.g., H, h. We have also respected past literature where terminology is widely and standardly used, so that it would be confusing to do otherwise (the β-δ of the quasi-hyperbolic model). We have tried to use Roman letters (but not e, l, o, or p), rather than Greek. Because of the sheer number of parameters used in this survey, inevitably we have failed. We have used the convention that F is a future gain, and P is the present gain, which follows standard practice in accounting and finance, though not elsewhere. Where we wish to contrast two gains in the future, we have used the symbol f to refer to the less distant one. Thus, because of the premise that people discount delays, generally F > f. To carry the capitalization mnemonic into time, we have used T to refer to a more distant time than t (T > t). So, a typical choice might be to ask which alternative is preferred: (f at time t) or (F at time T). Although that may seem quite natural, one awkward consequence of this usage is that, since most models are based around the special case of t = 0, the standard symbol for time becomes T, rather than the more natural t. Also, because both t and T are now in use, to maintain mnemonic value we have resorted to using the Greek τ (tau) to subscript parameters and models that focus on a time issue, as well as for exponents used to deform objective time into subjective time. Consequently, we have the presbyopic nightmare of T^τ and t^τ. Old eyes be warned! Table 1 is provided to help with the proliferation of symbols.
Table 1 (excerpt). Notation and rate parameters for selected models.
Hyperboloid over intervals: H_mω; parameters m, ω
Generalized hyperboloid: H_mτ; parameters m, τ; h_mτ = ((F/P)^m − 1) / T^τ
Generalized E, H, qH, X: G; parameters β, χ; g = log(βF / (P + χ)) / T
Size-sensitive K: J; parameters m, τ, z; j = P^−z (F^m − P^m) / (T*^τ − t*^τ)
Discounting by intervals (DBI): I; parameters m, τ, θ
Note. m is the money exponent; τ (tau) is the time exponent. β is the quasi-hyperbolic non-present scaling parameter, and γ = 1/β is the present premium. z is the size exponent in J; θ is the interval exponent in I (DBI). F (future value) is the more distant future amount, delivered at time T; f is the less distant amount, delivered at time t. If t = 0, f = P, the present value. T* and t* are "started" times (Tukey). %(.) is the percentile of the remembered distribution. W and w are weights placed on different processes.

Simple rate parameter models
In accounting and finance, discounting can be described as a method of valuing a future cash flow (F) as a present value (P); or in terms of the Internal Rate of Return (IRR) that would generate F from P. Present values may be used in preference to IRR to consolidate several future cash flows at different times into a single net present value. However, IRR is often used as a threshold or "hurdle rate" to make investment appraisals (Ryan, 2007), and it is the IRR view of the problem which turns out to be more useful during our analyses. We generalize the concept of IRR from r, the interest rate in E, to cover model rate parameters which capture the intensity of discount operating in each of the other models we consider.

Exponential discounting (E, r)
The standard financial method of summarizing growth rates by which to translate present values into future values and vice versa is by a geometric mean taken on the ratio of increase. Thus, the compounding growth model for discrete time is:

F = P(1 + r)^T, (1)

with F (future value), P (present value), r (growth rate) and T (compounding periods) taking their usual meanings, so that

r = (F/P)^(1/T) − 1. (2)

As an example, if $200 (P) becomes $400 (F) over a period of 4 years (T), then r = (400/200)^(1/4) − 1 = .19, or a 19% increase per annum. We can also compute P as:

P = F [1 / (1 + r)]^T, (3)

where [1 / (1 + r)]^T is known as the discounting factor which allows us to translate a future cash flow into its present value. When faced with a choice between: (1) P, an instantaneous payoff now, and (2) F at time T, the decision maker, operating normatively as an exponential discounter, would apply the discounting factor to F using a given r, and choose the larger of P and the discounted F. Another way of describing this process is to focus on equation (2) and compute r for a given choice of P, F, and T. This is equivalent to the internal rate of return (IRR) as used by accountants, for instance to judge whether a planned project will meet a criterion level of return. If r exceeds a given criterion rate r_0, then the accountant would accept F. We posit that r_0 is not a commercially available rate of return, but a person-specific one. Each person uses their own r_0 as a criterion rate of return by which to judge whether r, computed from choice {P, F, T}, exceeds criterion (hence choose F) or not (hence choose P). We do not, however, claim that r_0 is impervious to context, manipulation, and simple fluctuation with time.
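As a concrete illustration of equations (1)-(3) and the decision rule just described, here is a minimal sketch (ours; the function names and the criterion value r0 are illustrative assumptions, not anything proposed in the surveyed papers):

```python
def exponential_rate(P, F, T):
    """Discrete-time growth rate implied by {P, F, T}: r = (F/P)**(1/T) - 1 (equation 2)."""
    return (F / P) ** (1.0 / T) - 1.0

def present_value(F, r, T):
    """Present value of F under geometric discounting: P = F / (1 + r)**T (equation 3)."""
    return F / (1.0 + r) ** T

# The worked example from the text: $200 growing to $400 over 4 years.
r = exponential_rate(P=200, F=400, T=4)
print(round(r, 3))                       # 0.189, i.e. about 19% per annum

# Decision rule: choose F if the implied rate beats a person-specific criterion r0.
r0 = 0.10                                # illustrative criterion rate
choice = "F" if exponential_rate(70, 100, 1) > r0 else "P"
print(choice)
```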
If we increase the number of compounding periods to n per annum, equation (3) becomes:

F = P (1 + r/n)^(nT) (4)

and if n increases without limit so that compounding is continuous, this becomes

F = P e^(rT). (5)

The discount factor therefore becomes e^(−rT), hence the name "exponential discounting", though the geometric form in (4) is often still used. The growth rate parameter r is determined from (5) as:

r = [log(F/P)] / T (6)

In practice, the exponential and geometric views of growth and discounting do not differ much in their computed values of r. In the above example, equation (6) gives r = 17%.

Hyperbolic discounting (H, h)
Accountants also frequently calculate the (arithmetic) average rate of return (Ryan, 2007, p. 12). Similarly, if a quantity increases by x% over T years, then the arithmetic average of the rate of return should be (x/T)%. In the example cited above there was a 100% increase, which averages out to 25% per annum. Although frequently used to appraise investments and set a benchmark figure for project acceptance, this figure cannot be used in the compound growth formula. Instead of doubling the original as required by a 100% increase, its use would lead to a calculated increase of (1.25^4 = 2.44) times the original. In algebra this is equivalent to computing a growth rate of h by:

h = (F/P − 1) / T (7)

In the example given in 2.1, h = (2 − 1) / 4 = .25. Equation (7) also implies:

P = F [1 / (1 + hT)] (8)

According to equation (8), the discount factor [1 / (1 + hT)] has a hyperbolic form, hence the name "hyperbolic discounting" (Mazur, 1987). Also, whereas E is underpinned by a model of compound interest, H is underpinned by a model of simple interest, with no compounding (Rachlin, 2006). This is also apparent if we let n → 1/T in equation (4) for a single compounding period, which reduces to the hyperbolic model. Thus geometric compounding is a half-way house between the extremes of E and H under the control of the frequency of compounding, n.
The fact that the 2.44 exceeds 2 (and 25% exceeds 19%) in the above example is no accident, and is due to the mathematical relationship that the arithmetic mean ≥ geometric mean for positive numbers. It follows that in the period up to T, hyperbolic discounting (which uses the arithmetic mean) will always make a higher estimate of future growth than exponential discounting: therefore, the immediate future will be discounted more heavily in the hyperbolic model. After T, the situation reverses.
We can also use equation (7) to determine the hyperbolic discounting rate parameter h, implicit in any choice {P, F, T}, which we then use to compare against someone's internal h_0. As with exponential discounting, if people apply a hyperbolic model, then if h > h_0 they will choose F; if h < h_0 they will choose P.
Finally, note that the word "hyperbolic" is often used loosely to cover the empirical phenomenon of behavioral discounting that is more exaggerated than E at short delays, but less exaggerated at longer delays. In that usage, any number of models might explain "hyperbolic discounting". We use the term more precisely to refer to the particular model described in (7) and (8).

Arithmetic discounting (A, d)
In arithmetic discounting, future and present values are compared by constant increments d, exactly as in an arithmetic progression:

d = (F − P) / T (9)

P = F − dT (10)

Instead of multiplying F by a discount factor to yield P, we subtract a discount decrement (Td). In one choice people are offered P; in the second they are offered P + (F − P). If the participant decomposes the choice in this way, then the decision is whether it is worth waiting for the excess (F − P). If they feel their time is worth d_0 per day to wait, then if (F − P) > Td_0 they choose F. Whereas r and h are dimensionless, being essentially expressions of interest rates, d is measured in units of money per time (e.g. dollars per day).
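For comparison, a similar sketch (ours) computes the hyperbolic and arithmetic rate parameters of equations (7) and (9) for the same worked example; unlike r and h, d comes out in units of money per period:

```python
def hyperbolic_rate(P, F, T):
    """Simple-interest (hyperbolic) rate: h = (F/P - 1) / T (equation 7)."""
    return (F / P - 1.0) / T

def arithmetic_rate(P, F, T):
    """Arithmetic rate: d = (F - P) / T, in money units per period (equation 9)."""
    return (F - P) / T

# $200 becoming $400 over 4 years: h = .25 per annum and d = $50 per annum,
# versus the exponential r of about .19 computed earlier.
print(hyperbolic_rate(200, 400, 4))    # 0.25
print(arithmetic_rate(200, 400, 4))    # 50.0
```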
There is another justification for (9) and (10). Killeen (2009) started from the assumption that the marginal change in utility with respect to time follows a power law; also, that utility and value are themselves related by a power law. Taken together these were shown to yield an equation of the form: P = (F^m − T^τ d)^(1/m) (Killeen, 2009, equation 6, p. 605). We examine the general form of this equation in a later section dealing with the robustness of our findings. But for now, notice that if τ = 1, subjective time is objective time; and that if m = 1, the utility of money is just its face value, giving equation (10). Doyle and Chen (2010) found empirical support for A, relative to E and H, both in other researchers' past data, and in their own. They surmised that people were treating delay discounting by analogy with "wages for waiting" rather than by analogy with "investment growth" as implied by E and H.

Rate parameter + 1 models
Each of the models in this section has one additional (free) parameter that must be estimated ("recovered") from the data. The next sub-section, 3.1.1, for instance, models utility of money with a power law. An additional parameter offers the possibility of more accurately modeling behavioral discounting, and of also "recovering" additional information, such as how money is subjectively perceived, as captured in the exponent used in section 3.1.1. But additional parameters also need additional data to avoid the problems of over-fitting; and this data must be distributed to provide sufficient variety in order to constrain the additional parameter.

Hyperboloid models

3.1.1 Green and Myerson (H_m, h_m; m)

Myerson and Green (1995) proposed a model which generalizes the hyperbolic. Whereas in this and other publications these authors consistently use the symbol s for the exponent, we use 1/m for consistency with the rest of this article. Their model is:

P = F / (1 + h_m T)^(1/m) (11)

This model is a special case of Loewenstein and Prelec's (1992) generalized hyperbola formulation. Further simplifying, if m = 1, we have the hyperbolic. When 1/m (= s) < 1, the tendency of H to discount more steeply than E at short durations (but less so over longer durations) is exaggerated. Equation (11) may be re-arranged to give a rate parameter that is the equivalent of r, h, or d:

h_m = ((F/P)^m − 1) / T (12)

The subscript m on the rate parameter emphasizes that money is being treated subjectively, and requires the researcher to estimate m (= 1/s) empirically. Typical values that are reported in the literature for s (= 1/m) are in the range [.45, .78] (McKerchar, Green, Myerson, Pickford, Hill, & Stout, 2009; McKerchar, Green & Myerson, 2010), and these authors find that modeling the additional parameter leads to statistically significant improvements in fit over the hyperbolic. Notably, s is typically less than 1, which means that the ratio (F/P) in equation (12) is being raised to a power greater than 1, thus acting to exaggerate the relative difference between F and P in calculating h_m. At the other extreme, when s becomes large, the exponent acts to shrink the ratio (F/P) towards 1, and in so doing (12) begins to approximate the exponential model.

To see why, we use the mathematical relationship that (x^n − 1)/n → log(x) as n → 0. Here x is (F/P) and n is m. The smaller m becomes, the closer (F/P)^m − 1 comes to m·log(F/P). Once the value of m is fixed at some small value, all calculations in (12) are multiplied by the same m, which in this way just acts as a scaling constant, and can be ignored. It follows that the exponential model is a special case of (12) when s gets to be very large. The take-home point is that since m > 1 in empirical data, actual behavioral discounting is not a compromise between E and H, but lies even further from E than H had suggested.

3.1.2 Rachlin (H_τ, h_τ; τ)
An alternative way to generalize the hyperbolic was proposed by Rachlin (2006):

P = F / (1 + h_τ T^τ) (13)

Here, the exponent τ applies specifically to T rather than to the whole brackets. This minor modification leads to a subtly different model. Rearranging (13) to isolate the rate parameter, we get:

h_τ = ((F/P) − 1) / T^τ (14)

In this version of the hyperboloid, the numerator (the treatment of money) is the same as for the hyperbolic. However, unlike in the previous four models, time is treated subjectively by means of a Stevens-like power law. Similar to before, if τ = 1 we have the hyperbolic. But once again, τ < 1 in empirical data. Thus, subjective time is increasingly compressed at longer periods. The typical range for τ is [.67, .90] (McKerchar et al., 2009; McKerchar et al., 2010), which is just slightly lower than other estimates of the subjective time exponent (Eisler, 1976). This is the first model we have considered in which the rate parameter cannot be expressed in straightforward units of measurement.
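The two hyperboloid rate parameters, equations (12) and (14), differ only in whether the free exponent acts on money or on time. A short sketch (ours; the parameter values are merely illustrative, drawn loosely from the ranges cited above):

```python
def rate_green_myerson(P, F, T, m):
    """Green & Myerson hyperboloid: h_m = ((F/P)**m - 1) / T (equation 12)."""
    return ((F / P) ** m - 1.0) / T

def rate_rachlin(P, F, T, tau):
    """Rachlin hyperboloid: h_tau = (F/P - 1) / T**tau (equation 14)."""
    return (F / P - 1.0) / T ** tau

# Illustrative values: m = 1.5 (i.e. s = 1/m ~ .67) and tau = .8.
print(rate_green_myerson(70, 100, 30, m=1.5))   # money treated subjectively
print(rate_rachlin(70, 100, 30, tau=0.8))       # time treated subjectively
# Both reduce to the simple hyperbolic h when m = 1 or tau = 1, respectively.
```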

m, τ, or m relative to τ?
Comparing the money and time parameters from H_m and H_τ: in 3.1.1 and 3.1.2 we cited typical (m, τ) exponents as being, very approximately, (1.5, 1) when H_m was the model, and (1, .7) when H_τ was the model. The utility of money is generally conceded to have m < 1. This may lead us to believe one of two things. Either the estimates of m and τ are discrepant with each other, and in the case of m, highly discrepant with past research. Alternatively, what may count behaviorally, over and above the absolute values of m and τ, is their sizes relative to each other. Suppose it is a psychological fact that m > τ, i.e. that subjective money is less concave (more convex) than subjective time; then in fitting m and τ relative to each other, both models are consistent. But if one model constrains τ = 1 (as H_m does), then the behavioral requirement that m > τ will force m to be greater than 1. Likewise, if the other model constrains m = 1 (as H_τ does), the behavioral requirement m > τ will force τ to be less than 1. This is what we find.
The requirement m > τ is consistent with Zauberman, Kim, Malkoc and Bettman's (2009) conclusion that perceptions of time are more labile than perceptions of money, but it is not consistent with exponential discounting, for which m << τ.

Present bias / present premium models

3.2.1 Quasi-hyperbolic (beta-delta) discounting (qH, q; β)
Quasi-hyperbolic discounting (qH) is rarely used in psychological research, though it is used extensively by economists. In discrete time, the sequence of discounting factors is usually stated as {1, βδ, βδ^2, βδ^3, …, βδ^n} for t = 0, 1, 2, 3, …, n, respectively, with 0 < β, δ < 1 (Laibson, 1997). If β = 1 we would have a straightforward compounding model with δ = 1/(1 + r), as described in equation (3). But of course, β < 1, and this acts to give a one-off boost to discounting over the first compounding period. The purpose of β is to help tick the box of unexpectedly steep discounting at short durations, which is the hallmark of actual behavioral data, and which first motivated the use of hyperbolic discounting. Thereafter, the model assumes that discounting follows the standard normative model. Paralleling the development in section 2.1, the quasi-hyperbolic can also be used for continuous discounting (e.g., Benhabib, Bisin, & Schotter, 2010). The discount factor D is:

D = 1 for T = 0; D = β e^(−qT) for T > 0 (15)

The rate parameter for quasi-hyperbolic discounting is therefore:

q = [log(βF / P)] / T (T > 0) (16)

All F first get scaled by the additional parameter β before being treated by the exponential model (obviously, if β = 1, then q = r as in equation 6). It is thus clear that qH discounting is hyperbolic only in its intent to mimic the steep initial discounting of the hyperbolic model, but in every other sense it is an exponential model.
The qH model may have been motivated pragmatically to preserve exponential discounting and all the useful mathematics that goes with it, while also modeling initial steep discounting. Nonetheless, the idea that F (but not P) gets a special "tax" simply because it is not received right now does have a more constructive justification. There is a qualitative difference between now and any time to come which we all appreciate intuitively, and which language mirrors in verb tenses. From this perspective, the discontinuity in qH between t = 0 and t > 0 is not an ugly kludge to work around bothersome behavioral evidence, but a clever modeling device that allows what happens now (P) to be treated differently from all future events (F). In this way qH, a model devised by economists for economists, has the capacity to capture a psychological distinction that the other models cannot.
To help rescue this insight from the limitation of being identified exclusively with the exponential model, let γ = 1/β. Whereas β captures the idea that F is less than it should be, γ captures the idea that P is more than it should be, and we call γ the present premium. Analogous ideas are met in risky decision making, where a certain reward (p = 1) is valued at a premium over rewards with p < 1; and also in the premium due to mere possession or the endowment effect (Thaler, 1980; Sen & Johnson, 1997). Rewards that are certain, mine, and now are all over-valued. Potentially, the models we have considered, or are about to consider, could incorporate a present-premium parameter by simply replacing P with γP (or F with βF) in all calculations. Benhabib, Bisin and Schotter (2004) formulate exactly such a hybrid between H_m and qH in this way (see also section 4.4, and Tanaka, Camerer & Nguyen, 2010).
Nonetheless, there are still issues to be confronted with qH. In particular, if β < P/F in a particular choice, the model predicts P should always be preferred to F, no matter how short the delay in F, unless someone's internal rate parameter is negative. Laibson (2003) suggested "calibrating" qH with β ≈ .5, meaning that 21 of Kirby et al.'s (1999) 27 choice questions would be in the negative rate parameter category. Even β = .90 would put 8 questions into that category. Summarizing several empirical studies in economics, van de Ven and Weale (2010) noted βs of .296, .308, .674, .687, .846, and .942, and .825 in their own work; and in Albrecht, Volz, Sutter, Laibson and von Cramon (2010) the median β from individual experimental data was .86, with 24 of 27 participants having β < 1. If these estimates are close to what individuals use in binary choice, then β < P/F would not be an occasional occurrence in typical data.
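The β < P/F problem is easy to see numerically. A minimal sketch (ours) of equation (16); the item and the β values are illustrative:

```python
import math

def quasi_hyperbolic_rate(P, F, T, beta):
    """Quasi-hyperbolic rate: q = log(beta * F / P) / T (equation 16), for T > 0."""
    return math.log(beta * F / P) / T

# A typical Kirby-style item: $55 today versus $75 in 61 days.
for beta in (1.0, 0.9, 0.5):                       # illustrative betas
    q = quasi_hyperbolic_rate(P=55, F=75, T=61, beta=beta)
    print(f"beta={beta}: q per day = {q:.5f}")
# With beta = .5, beta*F = 37.5 < P = 55, so q < 0: P is preferred at any delay.
```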
The quasi-hyperbolic is popular with behavioral economists. It lends itself to convenient testing against the normative model E, by testing whether β is significantly different from (less than) 1. Unfortunately, in the literature qH has rarely been tested against the kinds of models surveyed here which are more potent challengers in that they themselves have consistently proven better than E. Furthermore, Kable and Glimcher (2010) argued that the present bias is more strictly a soon-as-possible bias, implying that a deflationary β should be applied to all f (t > 0), and not just P (t = 0).

Benhabib et al.'s fixed cost model (X, x; χ)
As an alternative to the percentage decrement for non-present rewards, Benhabib, Bisin and Schotter (2010) suggested using a fixed cost to model present bias; that is, an absolute decrement: use F − χ in place of F. Thereafter use the exponential model, exactly as in qH:

x = log((F − χ)/P) / T (T > 0) (17)

They found a present bias / present premium of about $4 among amounts of $10 through $100. The fixed cost model fitted their data better than the qH model. A different, though very similar, model is obtained if χ is added to P rather than subtracted from F (see section 4.4). Hardisty, Appelt, and Weber (2012) found that χ was positive for both gains and losses, suggesting a general "want-it-now" bias.

Read's interval model (B, b; θ)
Read (2001) presents his model in terms of a discount fraction D that occurs within the time window [t, T]. D is the fraction by which F is reduced by being at the end of the interval (T), rather than the start (t): namely, f/F. The exponent θ applies to the time interval, rather than to time itself (as would be the case with the exponent τ, which is not used here):

f/F = exp(−b(T − t)^θ), or equivalently b = log(F/f) / (T − t)^θ (18)

From equation (18) we see that Read's is an exponential model performed on the subjectively (because θ ≠ 1) perceived time interval (T − t). It nests the standard exponential model E when the interval of inspection starts from t = 0 (at which point f is to be known as P), and when subjective time is assumed to be linear with objective time (θ = 1). While this model seeks to understand cognitive processes through the use of interval data (t > 0), it still must compete with other models at the particular case of t = 0, even though these other models have not been generalized to the case of t > 0.

Roelofsma's exponential time (V, v; υ)
Arguing from the Weber-Fechner law of logarithmic sensitivity to psychophysical stimuli, Roelofsma (1996) presented his model as:

P = U(F) / (1 + v)^(υ·log(T)) (19)

If we take U(F), the utility of F, as itself being logarithmic, as in the exponential model E, and move from discrete to continuous time, we derive the rate parameter:

v = log(F/P) / [log(υ) + log(T)] (20)

Whether log(υ) is an intrinsic part of this model is not really discussed by the author. If it is not, the obvious choice is to set υ = 1, so that this model becomes a simple rate parameter model in log money and log time, with no additional parameters to be estimated. In treating both money and time logarithmically, among the models we have surveyed, Roelofsma's is the most enthusiastic adherent of Weber's law. Assuming υ = 1, and writing (20) as v = [log(F) − log(P)] / log(T), we see that Roelofsma's is a version of the arithmetic model in which all elements of the rate parameter equation have been logged.

Synopsis of the models so far
E is the normatively correct model to value investments. It assumes a continuously compounding interest model of growth. But E fails to capture important aspects of how laypeople actually evaluate intertemporal choices. qH extends E by making a distinction between present choices and future choices. In qH, all choices in the future are devalued by a constant scalar β before treating them as in E, which itself is the special case of qH when β = 1. Model X, like qH, maintains that there is a special difference between P and all future values F, but whereas qH treats the present premium multiplicatively, X does so additively.
For over two decades, H has been the psychologists' default alternative to E in that it consistently estimates people's actual discounting behaviors better than E. H is equivalent to a simple interest model of growth. The hyperboloid models are extensions of H that treat money (H_m) and time (H_τ) as non-linear. H_m is equivalent to H when its additional parameter m = 1, and to E when m → 0. But past evidence estimates m to be greater than 1, which implies increasing marginal utility of money, contrary to classical theory and common intuition. Also, H_τ is equivalent to H when its additional parameter τ = 1. The mental operations involved in A are simpler than those in all other models, which all start by forming a ratio between (possibly transformed) F and P. Model A starts by forming a difference between F and P.
The special case of t = 0 in Read's model B (all time intervals start from now; hence f = P) presents as a hybrid between E and H_τ. It is an exponential model (logarithmic money) calculated on subjective (power-law) time. Of course, in modeling time intervals in general, not just ones that start from the present, Read's model aims to examine phenomena that the others cannot. Roelofsma's model V treats both money and time logarithmically, which makes it equivalent to A where all monies and times have first been logged.
Finally, because of the reciprocal T in the calculations of r, h, d, h_m, q, and x for a given P and F, all of these rate parameters decline hyperbolically with time. In fact, models E, H, A, H_m, qH, and X are distinguished from each other only by their treatment of money, and once that is determined in the numerators of their respective rate parameter equations, r, h, d, h_m, q, and x all decline identically with objective time. This neglected perspective suggests that the essential difference between the models of intertemporal choice A, A_z, H, E, H_τ, qH, and X has nothing to do with time per se; the models could even be tested against each other by considering a single fixed time gap between P and F.

Time (im)patience models
From the point of view of modeling rate parameters, most of the above models have focused on their treatment of money, making few inroads into modeling the subjectivity of time. In this section we collect together research that takes a serious look at how time is perceived -while incidentally accepting the log(F/P) of model E as a given.

Ebert & Prelec's constant sensitivity (CS) function (C, a; b)
"Given that unit elasticity defines compounding discounting [E], it is natural to interpret lowerthan-unit elasticities as indicating insufficient time-sensitivity relative to compound discounting. If elasticity is constant and equal to b > 0, the discount function has the constant-sensitivity form" (Ebert & Prelec, 2007, p.1425, which is: When b → 0 we have diminishing time sensitivity of the eternal now. If b >> 1, then the CS function begins to appear as a step function, meaning that time is ever more clearly dichotomized into "near future" and "far future" (at the boundary 1/α). See their Figure 1. We can isolate the rate parameter associated with the CS model from equation (21) as: a = [log(F/P)] 1/b / T (22) Clearly, when b = 1 (unit elasticity), we have model E as the special case. Ebert and Prelec note, but avoid, the similar discount function: P / F = exp(-(a x T b )), leading to the rate parameter: a x = log(F/P) / T b (23) which is a special case of the generalized hyperboloid to be examined in section 4.3, with m → 0.

Bleichrodt et al.'s Constant Relative Decreasing Impatience (CRDI) functions (M, ρ; ψ, (β = 1))
CRDI is explicitly described as an analog of the constant relative risk aversion (CRRA) functions that appear in risky choice models. Bleichrodt, Rohde, and Wakker (2009) observed that hyperbolic discount functions were developed to account for decreasing impatience. That is, relative to exponential discounting, people consistently show greater impatience (for P) in the short term, but greater patience in the longer term. They also noted that such models failed to accommodate "increasing impatience or strongly decreasing impatience" (p. 27), particularly given the attempt to model individuals' discounting behavior: hence their search for discounting functions to fulfill this need. Let D(T) be the discount function (i.e., D(T) = P/F). Then their CRDI functions are:

(i) D(T) = β·exp(−ρT^ψ) for ψ > 0
(ii) D(T) = β·T^(−ρ) for ψ = 0, and T ≠ 0
(iii) D(T) = β·exp(ρT^ψ) for ψ < 0, and T ≠ 0

Examining each in turn, for ψ > 0 we have log(P/βF) = −ρT^ψ, so that:

ρ = log(βF / P) / T^ψ (24)

The following points should be noted. First, money is treated as in qH, with β fulfilling the same role in both models. Second, there is a power law on time; if β = 1, the model is exactly the route not taken by Ebert and Prelec (2007), as in equation (23). Third, if ψ = 1, we have objective clock time, and thus "constant impatience"; if 0 < ψ < 1, we have the conventional view that time becomes increasingly contracted the more distant in the future it is, and thus we have "decreasing impatience"; but if ψ > 1, the more distant the time, the more stretched it becomes, which therefore models "increasing impatience". Examining ψ = 0, we have:

ρ = log(βF / P) / log(T) (25)

If β = 1, we have Roelofsma's model (presuming υ = 1), described in section 3.4.
Finally, if ψ < 0, we have:

ρ = log(βF / P) / (−T^ψ) (26)

In this case the denominator asymptotes to zero, but from below. It is designed to model impatience that decreases more rapidly than in the first case with 0 < ψ < 1. In order for ρ to be positive (Bleichrodt et al., 2009, Definition 4.1, p. 31), we must have β < P/F, so that the log in the numerator is negative, allowing the minus signs to cancel.
As ψ moves through zero from above, for a given T > 0, the denominators in (24), (25) and (26) become 1, log(T), and −1, respectively. This means that any algorithm that hopes to recover the impatience parameter ψ from data has a discontinuity to negotiate at ψ = 0. It may therefore be better to think of (i), (ii) and (iii), or equations 24, 25, and 26, as three distinct sub-models, rather than a single model.
The same kind of remarks made for the CRDI functions, concerning algorithms and betas, apply equally to Bleichrodt et al.'s companion constant absolute decreasing impatience (CADI) functions.

Size-sensitivity in the arithmetic model (A_z, d_z; z)
Returning to the arithmetic model, one shortcoming is its insensitivity to size. That is, if someone prefers to receive $11 in 2 days rather than $1 today, model A assumes that person would also prefer $1,000,011 in 2 days to $1,000,001 today. Both imply d = 5, so that if d exceeds the internal d_0 in the first scenario, implying that F should be preferred to P, d will also exceed d_0 in the second. However, in the second scenario, just contemplating the impact of becoming a millionaire right now is likely to affect the internal criterion d_0, in the sense of requiring greater "wages for waiting." A more flexible model is therefore to suggest that the rate parameter may vary with the size of P. For instance, d may be given by a power law scaling of a rate parameter d_z which is stable and does not vary with the size of the choices offered. Hence, d = P^z d_z. Therefore, we have:

d_z = P^(−z) (F − P) / T (30)

If z = 0, d_z is not size-sensitive at all, and we have model A. If z = 1, d_z is highly size-sensitive and in fact we have the simple hyperbolic model H. Thus the size parameter z links A with H.
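A brief sketch (ours) of equation (30), using the $1 versus $11 example from the text, shows how the size exponent z interpolates between A and H:

```python
def size_sensitive_rate(P, F, T, z):
    """Size-sensitive arithmetic rate: d_z = P**(-z) * (F - P) / T (equation 30)."""
    return P ** (-z) * (F - P) / T

# z = 0 gives model A; z = 1 gives the simple hyperbolic h; intermediate z interpolates.
for P, F in ((1, 11), (1_000_001, 1_000_011)):
    print([round(size_sensitive_rate(P, F, T=2, z=z), 6) for z in (0.0, 0.5, 1.0)])
# With z = 0 both choices imply d = 5; with z > 0 the millionaire version implies a
# much smaller rate, capturing the intuition that an extra $10 matters less at that scale.
```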
An alternative is to scale d by F -z (Kirby, 1997), though see section 5.1 for arguments why P is preferred to F.

Rate parameter + 2 models

4.1 Killeen's additive utility model (K, k; m, τ)
Killeen (2009) started from the assumption that the marginal change in utility with respect to time follows a power law; also, that utility and value are themselves related by a power law. Taken together these were shown to yield an equation of the form P = (F^m − kT^τ)^(1/m) (Killeen, 2009, equation 6, p. 605), where his v_t, v, α, β, t, and λ are respectively our P, F, m, τ, T, and k. He explained as follows: "this additive utility discount function is the central contribution of this article. It is additive because the (negative) utility of a delay is added to the nominal utility of the deferred good." (p. 605). Some rearrangement isolates the rate parameter to give our canonical form:

k = (F^m − P^m) / T^τ (31)

Clearly, when both m = 1 and τ = 1, we have the simple arithmetic model. The numerator can be written as (F^m − 1) − (P^m − 1), so that as m → 0 it becomes m·log(F) − m·log(P) = m·log(F/P). But since m is a common factor, it can be treated as a scaling constant, meaning that K nests E when m → 0 and τ = 1. It also nests the particular t = 0 case of Read's model, which occurs in K when m → 0 and τ is a free parameter. Killeen used his model to survey aggregated data from past research, finding that m ≈ .15 and τ ≈ .53. Both parameters are smaller than other researchers have estimated, and in particular m < τ, which is contrary to evidence collected by researchers in connection with H_m and H_τ. Note, however, that within the datasets that Killeen examined, the number of observations used to fit the model was never greater than eight, which is rather small for non-linear estimation.
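A minimal sketch (ours) of Killeen's rate parameter in equation (31); the second call uses his aggregate estimates m ≈ .15 and τ ≈ .53 purely as illustrative inputs:

```python
def killeen_rate(P, F, T, m, tau):
    """Killeen's additive-utility rate: k = (F**m - P**m) / T**tau (equation 31)."""
    return (F ** m - P ** m) / T ** tau

# m = tau = 1 gives the arithmetic model; as m -> 0 (with tau = 1) the numerator
# approaches m*log(F/P), i.e. a rescaled exponential rate.
print(killeen_rate(200, 400, 4, m=1.0, tau=1.0))    # 50.0, as in model A
print(killeen_rate(200, 400, 4, m=0.15, tau=0.53))  # Killeen's aggregate estimates
```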

Killeen, intervals, and started time (K_τ, k_τ; m, τ)
Killeen's additive utility model can be generalized to time intervals that start in the future (at time t ≠ 0), either with a power law on the interval, as modeled in Read (2001) and section 3.4:

k_θ = (F^m − f^m) / (T − t)^θ (32)

or with a power on each point in time itself:

k_τ = (F^m − f^m) / (T^τ − t^τ) (33)

There is an important distinction between these models for time exponents θ or τ if they are << 1, which exaggeratedly condense distant time. In our first generalization of Killeen, as θ → 0, the denominator (T − t)^θ → 1, which means that people would be completely time-insensitive, and therefore presumably always choose F over f. However, in the second generalization, as τ → 0, the denominator (T^τ − t^τ) → τ·log(T/t), meaning that at this extreme people become only logarithmically (in)sensitive to time. Started time does not affect (32), since the start is added to both T and t and so cancels out.
Because of the symmetry in the treatment of money and time in the k_τ generalization, and to allow the possibility that subjective time is logarithmic, as argued in recent work (Zauberman et al., 2009), we now focus attention on k_τ. But immediately we hit a problem, which is that if t = 0 (as it usually is in intertemporal choice questionnaires), equation (33) cannot be distinguished from (32), so that we end up with the same time insensitivity as in k_θ when τ → 0. Persisting with the logarithmic limit idea doesn't help either, because of divide-by-zero problems.
As argued earlier, because of log(0) problems a useful generalization of Killeen is to use started time:

k_τ = (F^m − f^m) / (T*^τ − t*^τ) (34)

where T* = T + ε, t* = t + ε, and ε is small. This form is flexible enough to capture many possible states of the world, as reflected in the recovered exponents m and τ. If m → 0, subjective money (F, f) would be logarithmic. If τ → 0, started time (T*, t*) would be logarithmic. If m and τ were close to 1, money and (started) time would be linear (objective), while intermediate values of m and τ would indicate that subjective time and money have less-than-logarithmic curvature. It is also possible that both m and / or τ could be > 1.
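A short numerical sketch (ours) of the contrast just described: the denominator of (32) collapses towards 1 as its exponent shrinks, whereas the started-time denominator of (34) approaches τ·log(T*/t*). The 7-days-versus-now example, in started hours, is taken from the earlier section on started values.

```python
import math

# Contrast the two interval generalizations of Killeen as the time exponent shrinks:
# with an exponent on the interval, (T - t)**exp -> 1 (complete time insensitivity),
# whereas with started time points, T*^exp - t*^exp -> exp * log(T*/t*) (log-like sensitivity).
T, t, eps = 168.0, 0.0, 1.0            # 7 days versus now, measured in hours; started with eps = 1
T_s, t_s = T + eps, t + eps            # started times (Tukey)
for ex in (1.0, 0.5, 0.1, 0.01):
    interval_denom = (T - t) ** ex
    started_denom = T_s ** ex - t_s ** ex
    print(f"exponent={ex:5.2f}  (T-t)^exp={interval_denom:8.3f}  "
          f"T*^exp - t*^exp={started_denom:8.3f}  exp*log(T*/t*)={ex * math.log(T_s / t_s):7.3f}")
```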

General hyperboloid with subjective money and time (H_mτ, h_mτ; m, τ)
As a robustness analysis, Doyle and Chen (2010) used power exponents to model time and money for A, which then becomes model K, and for the hyperbolic, which then becomes:

h_mτ = ((F/P)^m − 1) / T^τ (35)

This nests a number of models already considered. The rate parameter h_m for Green and Myerson's hyperboloid model is the special case when τ = 1. The rate parameter h_τ for Rachlin's hyperboloid occurs when m = 1; and the hyperbolic rate parameter h occurs when both m = 1 and τ = 1. Furthermore, when m → 0 we have:

h_mτ = log(F/P) / T^τ

which is model B (Read), and b = h_mτ. Note also that the subjective-time exponential only has one additional parameter, because any attempt to introduce a new exponent for money results in log(F^m/P^m) = log((F/P)^m) = m·log(F/P), so that m just becomes a scaling constant. The model Ebert and Prelec (2007) reject (equation 23) occurs when m → 0.
In conclusion, most of the models considered so far can be generated from equation (35) as special cases. However, the CS model Ebert and Prelec do propose, equation (22), has the exponent acting on all of log(F/P), not just on what is inside the brackets. Therefore, CS cannot be derived from (35).
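A sketch (ours) of equation (35) and its special cases; dividing the numerator by m (a scaling constant, as discussed above) makes the m → 0 limit exact in code:

```python
import math

def general_hyperboloid_rate(P, F, T, m, tau):
    """h_mtau of equation (35), with the numerator divided by m so that the m -> 0 limit
    (logarithmic money, as in Read's model at t = 0) is exact rather than approximate."""
    numerator = math.log(F / P) if m == 0 else ((F / P) ** m - 1.0) / m
    return numerator / T ** tau

# Special cases nested by (35); dividing by m only rescales the rate parameter.
print(general_hyperboloid_rate(70, 100, 30, m=1.0, tau=1.0))  # simple hyperbolic h
print(general_hyperboloid_rate(70, 100, 30, m=1.5, tau=1.0))  # Green & Myerson h_m (rescaled by 1/m)
print(general_hyperboloid_rate(70, 100, 30, m=1.0, tau=0.8))  # Rachlin h_tau
print(general_hyperboloid_rate(70, 100, 30, m=0.0, tau=0.8))  # Read's b at t = 0
```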
We begin to see here how elements of different models may be combined to good effect. In Benhabib, Bisin and Schotter's generalization (model G in Table 1, with rate parameter g = log(βF / (P + χ)) / T), the authors were able to test whether the multiplier β or the additive element χ was more predictive of behavior. In their analyses, it was the latter. They also noted that their particular data had little power to distinguish between the functional forms H (n = −1) and E (n = 0), though the main point to note here is that their generalized model does allow for testing this possibility.

Hyperboloid over intervals (H_mω, h_mω; m, ω)
Green, Myerson and Macaux (2005) proposed a hyperboloid model defined over time intervals, which in rate parameter form can be written as:

h_mω = M / [(T − t) − ωtM], where M = (F/f)^m − 1 (37)

Special cases of (37) correspond to the three models they considered. Equation (37a): ω = 0 is their elimination-by-aspects model. One aspect that is common to the two choices (f, t) and (F, T) is the wait to the first choice, and so it is eliminated, reducing the time component to (T − t). We should also note that f is common to the monetary choices, but this aspect is left untouched in 37a. Equation (37b): m = ω = 1 is their present value comparison model. In this model, people are assumed to discount both F and f to present values, using the simple hyperbolic model in section 2.2 (i.e., m = 1). Equation (37c): m = 1 is their common aspect attenuation model. Once again, F and f are assumed to be hyperbolically discounted to a present value, but this model assumes that an extra attenuation factor, operationalized by the weight ω, may act on f.
One point to note from the general statement in (37) is that although a money-only component can be identified in the numerator, the subjective time component in the denominator is not independent of money, because the denominator in (37) has the additional term ωtM, and M has terms involving F and f. This contrasts with all other models presented thus far. Finally, equation (37) implies that if F >> f, M could become large enough that (T − t) < ωtM, thus turning the rate parameter negative.
The size-sensitive additive utility model J (see Table 1) combines the size exponent z of A_z with Killeen's money and time exponents, giving the rate parameter j = P^(−z)(F^m − P^m) / (T*^τ − t*^τ). In being able to generate these alternative models, J has a number of useful properties. First, there is the simple matter of parsimony. We need hold in mind only a single equation from which many others may be generated. Second, the generation process encourages us to consider and investigate other models that could be generated from the triplets, but have not been. For instance, there are eight possible fully-constrained, simple rate parameter models, each a combination of 0 or 1 for each of the triplet values z, m, and τ. Only four of these models have been considered: what of the other four? Similarly, though less mechanically, new models may be considered with partial constraints. Last, but not least, one may use model J with no constraints to recover parameters for z, m, and τ. The fact that so many models may be generated from this form hints that the true model may be a compromise between all of them, as would be evident if the parameters z, m, and τ were all to fall between 0 and 1. This model scales by P^(−z) rather than F^(−z) because the latter is less productive in generating other models as special cases, and also because it is more likely that P will be used as the given situation against which F will be compared, rather than vice versa. After all, P is mentioned first in the typical question frame (e.g. "Would you prefer to receive $70 now, or $100 in 20 days?"), and is on offer right now. Thus F will be evaluated in terms of P. Nonetheless, it may be possible to prime F to fulfill the role of given information, as in the question frame: "Assuming you can receive $100 in 20 days' time, would you accept $70 now instead?", thus biasing P to be evaluated in terms of F.

Discounting by intervals, DBI (I, i; m, τ, θ)
Like Read's model, Scholten and Read's (2006) discounting by intervals (DBI) model is designed to highlight phenomena which occur over intervals that do not necessarily include the special one bounded at t = 0. Nonetheless, as a viable model it must be able to stand comparison with other models when t = 0. The rate parameter for DBI is:

i = ((F/f)^m − 1) / (T^τ − t^τ)^θ (39)

This model extends Read's in two ways. First, money is modeled as in Green and Myerson with a to-be-estimated exponent m, rather than with log(F/f), as in Read. This is more flexible because the power form nests the logarithmic money form. Second, although there is still an exponent on the time interval, the times in the interval are themselves subjective. Therefore, Read's model is nested in DBI as the special case where m → 0 and τ = 1. Note that if t = 0, the denominator becomes T^(τθ). Given that τ and θ then occur only as the product τθ, they can be treated as a single parameter, and DBI becomes the generalized hyperboloid H_mτ. Also, if θ = 1, we approximate J(m, m, τ), i.e. J with z = m, provided τ is sufficiently far from 0 that started and objective time are effectively equivalent.
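A minimal sketch (ours) of the DBI rate parameter in equation (39), illustrating that at t = 0 only the product τθ is identified:

```python
def dbi_rate(F, f, T, t, m, tau, theta):
    """Discounting-by-intervals rate: i = ((F/f)**m - 1) / (T**tau - t**tau)**theta (equation 39)."""
    return ((F / f) ** m - 1.0) / (T ** tau - t ** tau) ** theta

# With t = 0 the denominator collapses to T**(tau*theta), so only the product tau*theta
# is identified and DBI behaves like the generalized hyperboloid H_mtau.
print(dbi_rate(100, 70, T=30, t=0, m=1.5, tau=0.9, theta=0.8))
print(dbi_rate(100, 70, T=30, t=0, m=1.5, tau=0.72, theta=1.0))   # same value: tau*theta = .72
```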
Finally, the nesting of the time and time-interval exponents in the denominator (T^τ − t^τ)^θ means that care must be taken in experimentally estimating the separate parameters.
6. Discounting models with no closed-form rate parameter

6.1 Two rate parameters - two exponentials discounting (E_βδ, r_β, r_δ; w)

This model is related to the quasi-hyperbolic model in its intent to separate short-term from long-term processes. But whereas qH separates events at t = 0 into a qualitatively distinct category from those at t > 0, this model deals with the same short/long-term issue in a more graded manner. Carrying over the same terminology as used in qH, McClure, Laibson, Loewenstein, and Cohen (2007) suggested a β-system, associated with limbic areas of the brain, that is impulsive, myopic, and discounts at a high rate; and a δ-system, associated with prefrontal and parietal cortical regions ("higher man"), which discounts at lower rates. Each of these sub-systems is assumed to discount according to the exponential model, with the overall discounting being a weighted sum of the two sub-systems:

D(T) = w·e^(−r_β T) + (1 − w)·e^(−r_δ T) (40)

where, if the first term represents the β-system, then r_β > r_δ. Quite explicitly, there are two rate parameters.
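A minimal sketch (ours) of the weighted two-exponentials discount factor in (40); the weight and the two rates are illustrative values, not estimates from McClure et al.:

```python
import math

def two_exponential_discount(T, w, r_beta, r_delta):
    """Weighted mixture of two exponential discount factors (equation 40), with r_beta > r_delta."""
    return w * math.exp(-r_beta * T) + (1.0 - w) * math.exp(-r_delta * T)

# Illustrative parameters: an impulsive beta-system and a patient delta-system.
for T in (0, 1, 7, 30, 180, 365):
    print(T, round(two_exponential_discount(T, w=0.5, r_beta=0.05, r_delta=0.001), 3))
# The mixture falls steeply at short delays (beta-system) but flattens out at long
# delays (delta-system), mimicking hyperbolic-like behavior with two exponentials.
```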
6.2 Scholten and Read's tradeoff model - 7 parameters (Y, γ, τ, ε_m, ε_τ, and three of {a_1, a_2, b_1, b_2})

Scholten and Read (2010) provide a densely argued justification for a complex model which they present in terms of an indifference point between the perk of additional compensation and the irk of waiting for it. They define the money value function on a future payment F to be:

v(F) = (1/γ) log(1 + γF) (41)

and a time-weighting function to be:

w(T) = (1/τ) log(1 + τT) (42)

They define an "effective compensation" to be:

V_m = v(F) − v(f), where F > f (43)

and an "effective interval" between t and T to be:

W_τ = w(T) − w(t) (44)

The point of indifference is when:

Q_τ(W_τ) = Q_m(V_m) (45)

such that if Q_τ(W_τ) > Q_m(V_m) then the irk of waiting is greater than the perk of compensation, so the DM chooses f at t rather than F at T, and vice versa if Q_τ(W_τ) < Q_m(V_m). However, Q_τ and Q_m are not simple scaling factors, but two-part linear functions:

Q_τ(x) = a_1·x if x < ε_τ; Q_τ(x) = a_2·x if x ≥ ε_τ, with a_1 > a_2 (46)

ε_τ is a threshold below which effective time intervals (here just x) are weighted more steeply than above threshold. Note the similarity of intent with the two exponentials model. Similarly, Q_m is a two-part linear function of effective compensation, with parameters b_1, b_2 and threshold ε_m. If the indifference equation is written out in full, it is possible to see that a discount function of the form f/F cannot be extracted; nor does limiting the analysis to t = 0 overcome the problem. Similarly, no simple rate parameter emerges from this model. It follows that the indifference equation is the only simple way to present the model. The model requires three parameters to specify the two-part linear function for time (a_1, a_2, ε_τ), three more to specify the equivalent parameters for money (b_1, b_2, ε_m), as well as τ and γ to specify the time-weighting and value functions, respectively. However, the authors note that "if only the relative magnitude of scaled effective differences is of interest, one of these scaling constants {a_1, a_2, b_1, b_2} can be set to unity." (p. 934). This model therefore requires seven parameters to be estimated. So clearly, one of the biggest problems in testing this model is the number of parameters to be recovered, which means designing choice problems over which the parameters are sufficiently independent of each other, and the number of observations sufficiently large, in order to estimate the parameters adequately.
The tradeoff model treats time and money symmetrically, which we argued in 1.3.4 should be the default position for a quasi-psychophysical theory of time and money. Indeed, the authors explicitly deny that people are discounting in any way that an accountant would recognize, espousing instead a thoroughly psychological perspective on intertemporal choice. It is of note that the next model, though motivated on altogether different grounds, is also a psychological model and is also symmetric in time and money.

Decision by sampling (S, λ; W)
Decision by sampling, or DbS (Stewart, Chater, and Brown, 2006; Stewart, 2009), is a radically different way of thinking about judgment and decision problems, which justifies its having its own section. In DbS, people are presumed not to walk around with ready-made psychophysical functions for time or money or risk that are largely invariant when applied to different problems. If the metaphor of a yardstick or ruler applies to objective stimuli, then in classical psychophysics people apply their own warped ruler, which probably looks much like everyone else's. But it is at least the same warped ruler that a person applies from situation to situation. In DbS, there is no person-specific but ultimately rigid ruler: instead, DbS assumes people construct their perceptions from memory and what amounts to a ranking principle. As an example, suppose my entire experience of receiving money is limited to five occasions on which I received $3, $6, $12, $24, and $48, in any order. Then according to DbS, to evaluate a further receipt of $24, I would compare it against the remembered distribution, and use the rank of the amount received (in this case, rank = 4, where rank 1 is smallest) as its utility: u($24) = 4. If instead I had received $6, my utility would be given by the rank of $6 in my remembered distribution, namely u($6) = 2. Receipts lying between distributional values can receive an interpolated rank, or more simply a mid-rank; these and other details of implementation become less important as the number of items in memory increases beyond five to thousands. Ranks can be normalized to percentiles of the distribution so that in an idealized form of DbS utilities are "read off" as percentiles of the recalled cdf (cumulative distribution function), much as we would read off percentage points of the normal distribution from z-scores (in statistics the percentage points of a distribution are also known as the inverse of that distribution, or its quantile function).
One difficulty with testing DbS is finding the relevant distribution that reflects what people have in their heads. By looking at people's bank account transactions, the authors found cdfs which, when inverted, would give typical utility functions of money. The sizes of payments, in comparison with receipts, were also distributed in a way that would be consistent with loss aversion. Payments (losses) tend to be more numerous and smaller than receipts (gains), hence a gain of $100 will have a smaller rank in its remembered distribution than a payment of $100 has in its distribution; it will therefore have a smaller percentile, and thus a smaller utility (than the disutility of $100 lost). Consequently, $100 paid will loom larger than $100 received. In general, positively skewed distributions, such as in the {$3, $6, $12, $24, $48} example above, will lead to concave "psychophysical" functions, as in a typical utility curve.
The central role of context and memory in DbS suggests that subjective value may be more malleable than classical accounts admit. People are assumed to sample both from the immediate local context of what is around them, what they happen to be thinking, and so on, and from the global context of long-term memory for typical sums of money, time durations, and probabilities that have been met in the past (Stewart, Chater, & Brown, 2006), or fatalities in catastrophes (Olivola & Sagara, 2009). While local context is relatively easy to manipulate within the lab, global context must be sought in correlational data. For instance, DbS predicts that monthly-salaried people, who typically receive one large gain per month (their salary) against multiple small losses, should exhibit much greater loss aversion than shopkeepers and traders, who experience distributions of many small receipts and a few large payments (for delivery of goods). Similarly, the daily experience of travel agents is with longer-than-usual time periods, and astronomers' experience is of rare events. Whole cultures also differ in the shapes of the reference distributions to which they are exposed (Olivola & Sagara, 2009). In this way DbS predicts the presence of sub-populations whose "psychophysics" of time, money, and probability may be systematically different from the wider population. Unfortunately, the world seems to contain few highly familiar but negatively skewed distributions, which would predict increasing sensitivity to gains and which could be pitted against the positive skews of time, money, and so on. Nonetheless, the authors did find that probability phrases in a corpus of English had a bimodal distribution. Its shape, when inverted, matched the form that the probability weighting function is assumed to have in prospect theory (Stewart, Chater, & Brown, 2006). In practice, although people find it difficult and tedious to rank anything more than a handful of objects, approximate percentiles can be derived from very simple cognitive mechanisms, as we now see.
At its core DbS is a general theory about the derivation and form of "psychophysical" functions, typically for abstract concepts such as money, time, and probability. No DbS model of intertemporal choice has yet been formally stated in the literature. However, it is possible to use Stewart and Simpson's (2009) model of risky choice as a template, and infer from it what a DbS-inspired model of intertemporal choice might look like. A simple example of risky choice might be: Prospect A is $A offered with probability pA; Prospect B is $B offered with probability pB. Stewart and Simpson (2009) assume that people randomly sample one of {$A, pA, $B, pB}. If $A has been chosen from A, then they randomly sample memory for money amounts. Suppose $X is retrieved. If $A is more favorable than $X ($A > $X), then a decision accumulator for A (call it #A) gets incremented. If instead pA had been sampled, then the DM would compare it with a probability retrieved from memory (say pX), and increment #A if pA is more favorable than pX (pA > pX). Likewise, prospect B may have been sampled, and its accumulator incremented or not according to whether the sampled attribute is (isn't) more favorable than the item retrieved from memory. Sampling continues, both from memory and from the problem, until one or other accumulator, #A or #B, reaches a threshold separation Δ, though other stopping criteria are possible. This process description of accumulator dynamics amounts to finding noisy versions of the percentiles for each of $A, pA, $B, and pB, because repeatedly testing whether $A > $X, and recording the proportion of times it does, will asymptote to the percentile of $A in the remembered distribution. Therefore, A is chosen if: Percentile($A) + Percentile(pA) > Percentile($B) + Percentile(pB), (47) where Percentile(.) refers to the percentile of the distribution in long-term or recent memory that happens to be serving as context.
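A rough procedural sketch of these accumulator dynamics (the function name, the uniform sampling scheme, the reading of the stopping rule as a difference between accumulators, and the toy memory samples are all simplifying assumptions of ours):

```python
import random

def dbs_risky_choice(A_amt, A_prob, B_amt, B_prob,
                     money_memory, prob_memory, delta=5, max_steps=10_000):
    """Sample attributes of the two prospects, compare each against a random
    item from the relevant memory distribution, and accumulate favorable
    comparisons until the accumulators separate by delta."""
    acc_A, acc_B = 0, 0
    for _ in range(max_steps):
        attr = random.choice(["A_amt", "A_prob", "B_amt", "B_prob"])
        if attr == "A_amt":
            acc_A += A_amt > random.choice(money_memory)
        elif attr == "A_prob":
            acc_A += A_prob > random.choice(prob_memory)
        elif attr == "B_amt":
            acc_B += B_amt > random.choice(money_memory)
        else:
            acc_B += B_prob > random.choice(prob_memory)
        if abs(acc_A - acc_B) >= delta:       # threshold separation reached
            return "A" if acc_A > acc_B else "B"
    return "A" if acc_A >= acc_B else "B"     # fall-back if no separation occurs

money_memory = [3, 6, 12, 24, 48]
prob_memory = [0.05, 0.1, 0.25, 0.5, 0.9]
print(dbs_risky_choice(40, 0.3, 10, 0.8, money_memory, prob_memory))
```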
We translate DbS into a model of intertemporal choice by suggesting that #A and #B accumulate favorable times rather than favorable probabilities. When represented on paper rather than in the head, we need to change signs, because whereas large probabilities of payoffs are desirable in risky choice, it is small delays in rewards that are desirable in intertemporal choice. The equivalent of a rate parameter is: r = [Percentile(F) - Percentile(f)] - [Percentile(T) - Percentile(t)], (48) where F, f, T, and t take the usual meanings we have given to them in this survey, namely ($f at t) versus ($F at T), with f < F and t < T.
One assumption in Stewart and Simpson's implementation of DbS is that people spend the same mental effort accessing time-memory as money-memory, and similarly for the risky choice formulation. This assumption may be unnecessarily restrictive, because the DM's focus of interest is likely to be on money rather than on time or probability. Also, money may be a more concrete concept than time, making it more accessible. The upshot of any attentional bias is that #A and #B may be more influenced by money than by time. We can model this by a free parameter W that weights the contributions of money and time: r = [Percentile(F) - Percentile(f)] - W.[Percentile(T) - Percentile(t)], (49) If, contrary to supposition or by manipulation of context, people weight time more heavily than money, then W will turn out to be greater than 1. The implicit default setting for W within DbS is 1, but in the more general formulation W may vary systematically from context to context and from person to person, making it an indicator of interest in its own right. Stewart and Simpson (2009) also investigated the possibility of weighting the distributional contributions from long-term memory and immediate context (in risky choice).
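An idealized (noise-free) sketch of this decision quantity, with illustrative memory samples and with the sign convention that a positive value favors the larger-later option (the helper names, samples, and that sign-reading are our own assumptions):

```python
def percentile(x, memory):
    # proportion of remembered items that x equals or exceeds
    return sum(m <= x for m in memory) / len(memory)

def dbs_rate(f, t, F, T, money_memory, time_memory, W=1.0):
    money_part = percentile(F, money_memory) - percentile(f, money_memory)
    time_part = percentile(T, time_memory) - percentile(t, time_memory)
    return money_part - W * time_part   # read here as: > 0 favors $F at T

money_memory = [3, 6, 12, 24, 48]
time_memory = [1, 7, 30, 90, 365]       # delays in days
print(dbs_rate(f=12, t=7, F=24, T=90,
               money_memory=money_memory, time_memory=time_memory, W=1.0))
```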
DbS treats money and time symmetrically, which we have argued is probably a good default for a psychological theory. But it has a number of features besides that set it apart from other models in this survey. DbS combines time and money as additive effects, whereas other models combine them multiplicatively. DbS has a theory about where power laws come from: they are reflections of the DM's environment. In Stewart and Simpson's implementation, it also models the basic cognitive operations used to make comparisons, whereas in other models comparison just somehow happens. DbS therefore tells us exactly where to look to understand how people will behave. Change the context by surrounding the DM with particular distributions of money or time or probability, and the DM's subjective evaluations of time, money, and probability will align to reflect that context, whether that occurs transiently in the lab or more enduringly through life experience. Change the way memory is accessed, and again the DM's "psychophysics" will change.
One other unique feature is that, in trading off #A against #B and allowing both money and time to contribute to #A (or #B), DbS requires that time is not segregated from money at an early stage of processing, whereas all other models implicitly assume that time and money are segregated early on. The order of the algebraic operations formally set out in any of the surveyed models must, in some idealized way, mirror (however approximately) the processing done by the DM who is supposed to operate the model. The final algebraic operation in non-DbS models is the division of a money component by a time component, with the implication that money and time must have been processed separately up to that point. These models must therefore parse the choice "$f at t, or $F at T?" into "$f or $F, at t or T, respectively?" By contrast, DbS segregates information by choice alternative (prospect), more or less as written on the page. In summary, DbS is a model of frugal but general processing, which contrasts with the highly optimized, problem-specific, multi-parameter models that are increasingly evident in the delay discounting literature.

Model families and newborns
8.1 Models have similar forms
The world is simpler than a list of twenty or so models might suggest. Our presentation of models via their rate parameters emphasizes the following common structure: r = s(money) ⊗ S(time), (50) where s(.) and S(.) are functions that return subjective perceptions of the money and time aspects, respectively. According to this view, if someone operates according to the exponential model it is because their subjective perception of money is logarithmic, while their subjective perception of time is linear. In (50), subjective perceptions then get combined by the operation ⊗. In most cases, ⊗ is arithmetic division, but it can be subtraction (DbS), or conceivably something else. The separability of s and S in the mathematics implies that subjective perceptions of time and money should take place independently of each other. In most models, objective time and money are transformed into subjective equivalents by power laws or logs, but other transformations are possible, as in DbS.
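To make the common structure concrete, here is a minimal sketch in which s, S, and the combining operation are supplied as arguments. The difference-based reading below (s applied to F and f, S applied to T and t, then differenced) is one natural way to spell out s(money) and S(time); instantiating it with logarithmic money, linear time, and division gives the exponential model's rate parameter r = log(F/f) / (T - t).

```python
import math

def rate(f, t, F, T, s=math.log, S=lambda x: x, combine=lambda m, w: m / w):
    money_part = s(F) - s(f)   # subjective money difference
    time_part = S(T) - S(t)    # subjective time difference
    return combine(money_part, time_part)

# Exponential reading: r = log(F/f) / (T - t)
print(rate(f=50, t=0, F=80, T=12))
```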
To further simplify the map of the models surveyed here, note that half are special cases of model J, and the present-bias models qH and X could also easily be incorporated into J. Two further themes that lie outside J are: the time-impatience models of section 3.6, which go to town on the modelling of time itself; and models that place power laws on time intervals rather than on points of time (sections 3.3 and 5.2). Finally, we are left with just four models that are not captured by these themes. First there is the two-exponentials model, which is just a doubling up of the reference model E, as if we had two normative discounters in the same head: one patient (δ), the other impatient (β). Then there is Scholten and Read's (2010) trade-off model I, which for all its complexity is still recognisable as an assemblage of parameter-based treatments of subjective money, time, and intervals. Hmω, although an innocuous extension of the hyperboloid to time intervals, turns out to be the one model for which time and money cannot be cleanly separated. Finally, DbS is the only model to eschew the parameter-based treatment of time and money.

8.2 Speculative models
We end with a sample of speculative models (newborns) that have clearly been assembled from the components of existing models. Being able to mix and match almost indefinitely emphasizes the family resemblances that exist between models, and helps to compare and contrast treatments of time and money. Some of these examples are semi-serious; others are intended to provoke thought.

8.2.1 Arithmetic interval model
One of the innovations in B and I is that intervals and subjective intervals are treated as psychophysical elements in their own right, which may then be investigated using power laws, and so on. We cross this with Killeen (extended) to give the hybrid: r = (F^m - f^m)^μ / (T^τ - t^τ)^θ (52) For mnemonic value, note that μ is "mu", the Greek m, and θ ("theta") sounds vaguely like T.
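A direct transcription of (52) as a function, with arbitrary illustrative parameter values:

```python
def r_arithmetic_interval(f, t, F, T, m=0.5, mu=1.0, tau=0.9, theta=1.0):
    # (52): power-transformed money difference over power-transformed time difference
    return (F**m - f**m)**mu / (T**tau - t**tau)**theta

print(r_arithmetic_interval(f=50, t=1, F=80, T=12))
```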

8.2.2 Fully additive model
Instead of treating ⊗ as a division in 8.2.1, we could treat it as a subtraction: r = (F^m - f^m)^μ - w(T^τ - t^τ)^θ (53) The weighting coefficient w cannot simply be interpreted as the perceived relative importance of time, because it also rescales for m, μ, τ, θ, and the units of time and money.
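The corresponding transcription of (53), again with arbitrary parameter values; as noted above, the numerical value of w is only interpretable relative to the other parameters and the units chosen:

```python
def r_fully_additive(f, t, F, T, m=0.5, mu=1.0, tau=0.9, theta=1.0, w=0.1):
    # (53): the time term is subtracted rather than divided into the money term
    return (F**m - f**m)**mu - w * (T**tau - t**tau)**theta

print(r_fully_additive(f=50, t=1, F=80, T=12))
```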

8.2.3 Fully multiplicative hyperbolic model
Analogously, if H treats money as a percentage increase (that gets averaged over time), then a model that also treated time as a percentage increase would look like: r = [(F/f)^m - 1] / [(T*/t*)^τ - 1] (54) This example highlights the point that the default setting in models is to treat time as additive, whereas money is treated as multiplicative. From the point of view of Stevens' (1946) levels of measurement, both are ratio-scale measurements, with properly defined zeros. So if in theory we can treat time and money symmetrically, what is to stop the naïve behavioral theorist thinking likewise? Still, the treatment of time looks distinctly odd.
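A transcription of (54); here the "started" times T* and t* are assumed, purely for illustration, to be T and t shifted by a positive constant t0 so that the ratio is defined when t = 0:

```python
def r_fully_multiplicative(f, t, F, T, m=0.5, tau=0.9, t0=1.0):
    # (54): both money and time enter as (power-transformed) percentage increases
    return ((F / f)**m - 1) / (((T + t0) / (t + t0))**tau - 1)

print(r_fully_multiplicative(f=50, t=0, F=80, T=12))
```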

8.2.4 Time-size and money-size sensitive model
Model J only treated money as size-sensitive. A symmetrical treatment of time and money is instead: r = t*^g (F^m - f^m)^μ / [f^z (T*^τ - t*^τ)^θ], (55) where g is the time analog of z for money, implying that someone's criterion rate parameter may vary not just with the size of f, but also with how far in the future f is offered.
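A transcription of (55) with arbitrary parameter values; as above, the started times T* and t* are assumed here to be T and t shifted by a positive constant t0:

```python
def r_size_sensitive(f, t, F, T, m=0.5, mu=1.0, tau=0.9, theta=1.0,
                     z=0.2, g=0.1, t0=1.0):
    # (55): size sensitivity in money (via z on f) and in time (via g on t*)
    t_star, T_star = t + t0, T + t0
    return (t_star**g * (F**m - f**m)**mu) / (f**z * (T_star**tau - t_star**tau)**theta)

print(r_size_sensitive(f=50, t=0, F=80, T=12))
```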

8.2.5 Arithmetic present premium model
A present premium can be incorporated into other models. Using both the multiplicative and additive components of Benhabib et al's (2004) generalizing model in section 4.4, the arithmetic model would become, for instance: r = (βF -(P + χ)) / T (56)
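A literal transcription of (56); on our reading, β and χ are the multiplicative and additive present-bias components from Benhabib et al.'s model, and P denotes the immediate amount (section 4.4 is not reproduced here, so this reading, like the parameter values, is an assumption):

```python
def r_present_premium(F, T, beta=0.9, P=50.0, chi=2.0):
    # (56): arithmetic discounting with a multiplicative (beta) and additive (chi)
    # present premium, set against the immediate amount P (our reading of the notation)
    return (beta * F - (P + chi)) / T

print(r_present_premium(F=80, T=12))
```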

8.2.6 DbS interval model
In our extension of DbS to intertemporal choice, we have assumed that the order of operations is first to translate F and f into percentiles of their remembered distribution, and then to subtract the ranks; similarly for T and t. But an alternative is to reverse this order of operations, so that F and f are first differenced, and only then is the difference checked against the internal distribution. Given a symmetrical treatment for time, we would then have: r = percentile(F - f) - w.percentile(T - t) (57) It seems implausible that people would have a reference distribution for amounts of money received, but a quite separate reference distribution for differences in amounts received. For now, we suggest that people would access either the putative point distribution to derive a rank, or a distribution composed of both points and differences.
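A sketch of (57); the reference samples below, standing in for a remembered mixture of experienced amounts (or differences) and delays, are purely illustrative:

```python
def percentile(x, sample):
    # proportion of remembered items that x equals or exceeds
    return sum(s <= x for s in sample) / len(sample)

def r_dbs_interval(f, t, F, T, money_sample, time_sample, w=1.0):
    # (57): difference first, then locate the difference in a reference distribution
    return percentile(F - f, money_sample) - w * percentile(T - t, time_sample)

money_sample = [3, 6, 12, 24, 30, 48]
time_sample = [1, 5, 7, 30, 83, 365]      # delays in days
print(r_dbs_interval(f=12, t=7, F=24, T=90,
                     money_sample=money_sample, time_sample=time_sample))
```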

Conclusions
This paper has surveyed delay discounting models that have appeared in several literatures. Our presentation emphasizes that models have a clearly separable time component, a separable money component, and a method of combining these components into a rate parameter / decision parameter. Within the components that transform objective monies, times, and intervals into subjective equivalents, we have seen that the power law occupies a special place in delay discounting models, particularly given that log laws may be derived as special cases. While researchers have been inventive in their modeling of the time and money components, there has been less interest (DbS excepted) in exploring alternatives to power laws, which have been taken as the basic nuts and bolts of discounting models, or in how the time and money components should be combined. Models of individual decision making should more explicitly model the "fuzzy math" (Stango & Zinman, 2009) of the typical, insufficiently competent mathematician who is required to think about financial and other numerically presented choices. Given the large number of models and variants that have appeared in the literature, the re-usability of components, and the ease with which speculative models can be assembled from them, a chief problem for future research is to heed Occam's razor by limiting the number of parameters a model employs to only those that are strictly necessary. Finally, by gathering these models into one place and thereby shining a light on them, it is hoped that future research will be more proactive in testing models against each other, and in manipulating choice behavior by selectively manipulating the parameters that are supposed to drive those behaviors.