The Montgomery ladder for x-coordinate-based scalar multiplication

   All calculations are performed in GF(p), i.e., they are performed
   modulo p.  The constant a24 is (486662 - 2) / 4 = 121665 for
   curve25519/X25519 and (156326 - 2) / 4 = 39081 for curve448/X448.

   (Excerpt from RFC 7748, "Elliptic Curves for Security", Langley, et al., Informational, January 2016.)

   x_1 = u
   x_2 = 1
   z_2 = 0
   x_3 = u
   z_3 = 1
   swap = 0

   For t = bits-1 down to 0:
       k_t = (k >> t) & 1
       swap ^= k_t
       // Conditional swap; see text below.
       (x_2, x_3) = cswap(swap, x_2, x_3)
       (z_2, z_3) = cswap(swap, z_2, z_3)
       swap = k_t

       A = x_2 + z_2
       AA = A^2
       B = x_2 - z_2
       BB = B^2
       E = AA - BB
       C = x_3 + z_3
       D = x_3 - z_3
       DA = D * A
       CB = C * B
       x_3 = (DA + CB)^2
       z_3 = x_1 * (DA - CB)^2
       x_2 = AA * BB
       // AA, not BB: a24 here is (486662 - 2) / 4; a BB variant would need (486662 + 2) / 4.
       z_2 = E * (AA + a24 * E)

   // Conditional swap; see text below.
   (x_2, x_3) = cswap(swap, x_2, x_3)
   (z_2, z_3) = cswap(swap, z_2, z_3)
   Return x_2 * (z_2^(p - 2))

   (Note that these formulas are slightly different from Montgomery's
   original paper.  Implementations are free to use any correct
   formulas.)
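
The ladder above translates almost line by line into Python. What follows is a minimal sketch for X25519 only, assuming the scalar clamping and u-coordinate masking described in RFC 7748. The helper names (`cswap`, `decode_scalar25519`, `decode_u25519`, `x25519`) are chosen here for illustration. Python integers are not constant-time, so this shows the data flow rather than a side-channel-safe implementation.

```python
P = 2**255 - 19
A24 = (486662 - 2) // 4  # 121665, as in the RFC text


def cswap(swap, a, b):
    """Swap a and b when swap == 1, using masks instead of a branch.

    Illustrative only: Python big integers give no timing guarantees.
    """
    mask = -swap              # 0 when swap == 0, all ones when swap == 1
    t = mask & (a ^ b)
    return a ^ t, b ^ t


def decode_scalar25519(k: bytes) -> int:
    b = bytearray(k)
    b[0] &= 248               # clear the three low bits (multiple of 8)
    b[31] &= 127              # clear the top bit
    b[31] |= 64               # set bit 254
    return int.from_bytes(b, "little")


def decode_u25519(u: bytes) -> int:
    b = bytearray(u)
    b[31] &= 127              # mask the unused most significant bit
    return int.from_bytes(b, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    k_int = decode_scalar25519(k)
    x_1 = decode_u25519(u)
    x_2, z_2 = 1, 0
    x_3, z_3 = x_1, 1
    swap = 0
    for t in reversed(range(255)):
        k_t = (k_int >> t) & 1
        swap ^= k_t
        x_2, x_3 = cswap(swap, x_2, x_3)
        z_2, z_3 = cswap(swap, z_2, z_3)
        swap = k_t

        A = (x_2 + z_2) % P
        AA = A * A % P
        B = (x_2 - z_2) % P
        BB = B * B % P
        E = (AA - BB) % P
        C = (x_3 + z_3) % P
        D = (x_3 - z_3) % P
        DA = D * A % P
        CB = C * B % P
        x_3 = pow(DA + CB, 2, P)
        z_3 = x_1 * pow(DA - CB, 2, P) % P
        x_2 = AA * BB % P
        z_2 = E * (AA + A24 * E) % P

    x_2, x_3 = cswap(swap, x_2, x_3)
    z_2, z_3 = cswap(swap, z_2, z_3)
    # Inversion by Fermat's little theorem: z^(p - 2) = 1 / z mod p.
    result = x_2 * pow(z_2, P - 2, P) % P
    return result.to_bytes(32, "little")
```

A quick sanity check is that two parties derive the same value: with base point `(9).to_bytes(32, "little")`, `x25519(a, x25519(b, base))` should equal `x25519(b, x25519(a, base))` for any two 32-byte scalars `a` and `b`.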

   Finally, encode the resulting value as 32 or 56 bytes in
   little-endian order.  For X25519, the unused, most significant bit
   MUST be zero.

 

 

https://tools.ietf.org/html/rfc7748

https://www.researchgate.net/publication/277940984_High-speed_Curve25519_on_8-bit_16-bit_and_32-bit_microcontrollers

 

In short, the Montgomery ladder has two advantages: it is simple and it is fast.

If you look at X25519, the Diffie-Hellman algorithm applied to Curve25519 and described in RFC 7748, you will see that multiplying a point on an n-bit Montgomery curve by an n-bit scalar takes about 10n multiplications of field elements. In more detail, the loop runs n iterations, each of which involves:

  • 4 multiplications with two varying field element values
  • 4 squarings
  • 1 multiplication with a fixed and usually small constant (called a24 in the RFC)
  • 1 multiplication with the base point "u" coordinate (fixed throughout the algorithm, but may vary between invocations)

To that, a final field inversion must be added, which is normally done with a modular exponentiation. If the field modulus is well chosen, that inversion will add a bit more than n squarings. So the value of "10" given above is an estimate that depends on how squarings and multiplications by constants can be optimised; in practice, overall cost will be between 8n and 12n.
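
The tally above can be reproduced with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, every operation is counted as one "multiplication", and the inversion is costed with plain square-and-multiply, which is an assumption rather than what optimised implementations do (they use a tuned addition chain that keeps the squarings but needs far fewer multiplications).

```python
P = 2**255 - 19


def ladder_cost(n_bits, mul=4, sqr=4, mul_a24=1, mul_u=1):
    """Field operations in the ladder loop, counting each one as 1."""
    return n_bits * (mul + sqr + mul_a24 + mul_u)


def inverse(z):
    """Field inversion by modular exponentiation: z^(p - 2) mod p."""
    return pow(z, P - 2, P)


def naive_exponentiation_cost(exponent):
    """(squarings, multiplications) for left-to-right square-and-multiply."""
    squarings = exponent.bit_length() - 1
    multiplications = bin(exponent).count("1") - 1
    return squarings, multiplications


if __name__ == "__main__":
    n = 255
    print("ladder loop:", ladder_cost(n), "operations, i.e. 10n")
    print("naive inversion (squarings, multiplications):",
          naive_exponentiation_cost(P - 2))
```

The exponent p - 2 is 255 bits long, hence the roughly n squarings mentioned above. The large multiplication count reported by the naive method is what a well-chosen addition chain avoids.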

Now, with "classic" curves with Weierstraß equation y^2 = x^3 - 3x + b (for some constant b), the usual implementation method is to use Jacobian coordinates, in which a point (x, y) is represented by the triplet (X, Y, Z), with:

  • x = X / Z^2
  • y = Y / Z^3
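
To make the representation concrete, here is a small sketch of the conversion in both directions, assuming the P-256 field prime as modulus. The function names are hypothetical, and the sample coordinates used in the usage note are arbitrary field elements rather than an actual curve point, since only the coordinate mapping is being illustrated.

```python
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1


def to_jacobian(x, y, z=1, p=P256):
    """Lift affine (x, y) to a Jacobian triplet (X, Y, Z) for a nonzero z."""
    return (x * z * z % p, y * z * z * z % p, z % p)


def to_affine(X, Y, Z, p=P256):
    """Map (X, Y, Z) back to affine: x = X / Z^2, y = Y / Z^3."""
    z_inv = pow(Z, p - 2, p)        # one inversion, by exponentiation
    z_inv2 = z_inv * z_inv % p
    return (X * z_inv2 % p, Y * z_inv2 * z_inv % p)
```

Many triplets represent the same point: `to_jacobian(5, 7, z=2)` and `to_jacobian(5, 7, z=3)` differ, yet `to_affine` maps both back to `(5, 7)`. This freedom is what lets the doubling and addition formulas avoid a field inversion at every step.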

With that representation, a point doubling costs 8 multiplications (4 of which are squarings), while a point addition which is not a doubling uses 16 multiplications (4 of which are squarings). A basic double-and-add algorithm would then need an average of 16 multiplications per scalar bit (one doubling, plus one addition for half of the bits), but in fact 24 if you want a constant-time implementation (which is recommended), since the addition must then be computed for every bit so as not to leak which scalar bits are 0 and which are 1. An extra n multiplications in total are to be added for the final inversion of Z to get back to affine coordinates (there again, classically with a modular exponentiation).
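
The per-bit figures follow from simple arithmetic, sketched below. The function name and parameters are illustrative, and the model counts squarings as full multiplications, as the text does.

```python
def double_and_add_cost(n_bits, dbl=8, add=16, constant_time=False):
    """Estimated field multiplications for Jacobian double-and-add.

    Variable-time: an addition happens only for the 1 bits, about half of
    them on average. Constant-time: an addition is computed for every bit.
    """
    adds_per_bit = 1.0 if constant_time else 0.5
    loop = n_bits * (dbl + adds_per_bit * add)
    inversion = n_bits            # final inversion of Z, roughly n operations
    return loop + inversion


if __name__ == "__main__":
    n = 256
    print("average:      ", double_and_add_cost(n) / n, "per bit")
    print("constant-time:", double_and_add_cost(n, constant_time=True) / n, "per bit")
```

With the inversion included, this gives 17 multiplications per bit on average and 25 per bit in constant time, i.e. the 16 and 24 quoted above plus the extra n.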

Things can be sped up with a window optimisation in which, for base point P, you precompute small multiples of P; for instance, with a 5-bit window, you compute kP for all k from 0 to 31 (which involves 15 doublings and 15 point additions), and then you only make one point addition every five doublings. This requires some extra implementation care (for instance, doing a constant-time lookup in the window, if constant-time execution is sought) but can bring the overall cost down to about 13n field multiplications, i.e. 30% more than for a Montgomery curve. Some further speedups are possible when the base point is known in advance, as is the case for the first half of Diffie-Hellman (using the conventional base point), because the addition formulas in projective coordinates when one of the points has Z = 1 (called "mixed additions") are a bit simpler (11 multiplications instead of 16).
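
The "about 13n" estimate can be checked with the same kind of counting, and the constant-time lookup can be sketched as a scan over the whole table. Both functions below are illustrative assumptions, not taken from any particular library: `windowed_cost` generalises the 15 doublings and 15 additions of the 5-bit case to other window sizes, and `select` uses plain integers as stand-ins for precomputed points. Python itself offers no timing guarantees.

```python
import math


def windowed_cost(n_bits, w=5, dbl=8, add=16):
    """Estimated field multiplications with a w-bit fixed window."""
    # Multiples 2P .. (2^w - 1)P: half obtained by doubling, half by addition.
    precompute = (2 ** (w - 1) - 1) * (dbl + add)
    # One doubling per bit, one addition per window of w bits.
    main_loop = n_bits * dbl + math.ceil(n_bits / w) * add
    inversion = n_bits
    return precompute + main_loop + inversion


def select(table, index):
    """Return table[index] by touching every entry (branch-free pattern).

    Entries and indices are assumed to be non-negative integers below 2^63.
    """
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ index
        # is_equal is 1 when i == index, else 0, without a comparison.
        is_equal = 1 - (((diff | -diff) >> 63) & 1)
        result |= entry & -is_equal
    return result


if __name__ == "__main__":
    n = 256
    print(windowed_cost(n) / n, "multiplications per bit")
```

For n = 256 and w = 5 this comes out between 13 and 14 multiplications per bit, precomputation and final inversion included.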


So, in practice, Montgomery curves get a speed advantage, but not really because they are Montgomery curves:

  • As shown above, the advantage is slight (about 30%). It must be said, though, that implementation of Montgomery curves is much easier and it shows up as smaller code and less room for tricky bugs.

  • A much bigger speedup of Curve25519 over NIST curve P-256 comes from the definition of the base field. NIST P-256 uses integers modulo p = 2^256 - 2^224 + 2^192 + 2^96 - 1. Curve25519 uses p = 2^255 - 19. Both are integers chosen because they allow for faster modular reduction; however, NIST's choice is optimised for computer hardware of the late 1990s, while Curve25519 targets mid-2000s hardware, with much cheaper multiplication opcodes and superscalar architectures. On modern hardware (and this includes modern "small" embedded CPUs), the latter choice seems to be roughly twice as fast as NIST's modulus.

  • An extra interesting feature of Curve25519 is that it is not one curve, but two curves. Every possible source value for the "u" coordinate will define either a point on the curve, or on the "twisted" curve, which is also cryptographically good (its order is a big prime multiplied by a very small cofactor). This allows X25519 to be safe without performing any validation of the incoming point, which again makes for simpler and shorter code. There again, that feature is not intrinsic to Montgomery curves; classic curves in short Weierstraß form can exhibit the same property, but the NIST curve does not.

  • Conversely, there is a complication that comes from Montgomery curves, which is that their order is necessarily a multiple of 4 (for Curve25519, the cofactor is 8); hence, it cannot be prime. This means that a valid curve point is not necessarily a point in the prime-order subgroup in which we perform most operations. The X25519 specification accommodates that issue by forcing scalars to be multiples of 8. When Curve25519 (or its derivative edwards25519) is used in other, more complex cryptographic protocols, this property can be problematic and must be side-stepped appropriately (usually by the same methods of multiplying things by 8 or forcing multipliers to be multiples of 8). In a way, this is a trade-off: the implementation is simpler, but at the expense of a bit of extra complexity in the protocols.
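
Two points from the list above can be made concrete in a few lines. The first function sketches why p = 2^255 - 19 reduces cheaply: since 2^255 ≡ 19 (mod p), the bits above position 255 can be folded back in with a small multiplication. The second shows the X25519 scalar clamping, which forces the scalar to be a multiple of 8. Function names are illustrative, and real implementations work on fixed-size limbs rather than Python big integers.

```python
P = 2**255 - 19
MASK255 = 2**255 - 1


def reduce25519(x):
    """Reduce a non-negative integer modulo 2^255 - 19 by folding high bits."""
    while x >> 255:
        # x = hi * 2^255 + lo, and 2^255 is congruent to 19, so x = lo + 19 * hi.
        x = (x & MASK255) + 19 * (x >> 255)
    # Now 0 <= x < 2^255 < 2p, so a single subtraction finishes the job.
    if x >= P:
        x -= P
    return x


def clamp25519(k: bytes) -> int:
    """Decode a 32-byte X25519 scalar with the clamping from RFC 7748."""
    b = bytearray(k)
    b[0] &= 248      # clear the three low bits: the scalar is a multiple of 8
    b[31] &= 127     # clear bit 255
    b[31] |= 64      # set bit 254
    return int.from_bytes(b, "little")
```

For any input bytes, `clamp25519(k) % 8 == 0`, so multiplying by the clamped scalar sends any small-order component of the input point to the neutral element.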


To summarise: the Montgomery ladder makes for a somewhat faster algorithm which is easier to implement, especially if you aim for constant-time code. But the modern standard curves that happen to be Montgomery curves also come with a few extra features that are quite nice to have and yield even greater speedups; these features are not reserved to Montgomery curves, but the NIST curves don't offer them because they had not been invented, or at least noticed, at that time.

As a historical note: Montgomery curves were first invented and used for the elliptic curve factorization method, in which raw speed is of paramount importance but the curves themselves don't have any actual cryptographic relevance.

posted @ 2020-11-26 13:55  zJanly