comparison tommath.src @ 386:97db060d0ef5 libtommath-orig libtommath-0.40

Update to LibTomMath 0.40
author Matt Johnston <matt@ucc.asn.au>
date Thu, 11 Jan 2007 03:11:15 +0000
parents 91fbc376f010
children
comparison
equal deleted inserted replaced
282:91fbc376f010 386:97db060d0ef5
64 \end{tabular} 64 \end{tabular}
65 %\end{small} 65 %\end{small}
66 } 66 }
67 } 67 }
68 \maketitle 68 \maketitle
69 This text has been placed in the public domain. This text corresponds to the v0.35 release of the 69 This text has been placed in the public domain. This text corresponds to the v0.39 release of the
70 LibTomMath project. 70 LibTomMath project.
71 71
72 \begin{alltt} 72 \begin{alltt}
73 Tom St Denis 73 Tom St Denis
74 111 Banning Rd 74 111 Banning Rd
75 Ottawa, Ontario 75 Ottawa, Ontario
76 K2L 1C3 76 K2L 1C3
77 Canada 77 Canada
78 78
79 Phone: 1-613-836-3160 79 Phone: 1-613-836-3160
80 Email: [email protected] 80 Email: [email protected]
81 \end{alltt} 81 \end{alltt}
82 82
83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} 83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{}
84 {\em book} macro package and the Perl {\em booker} package. 84 {\em book} macro package and the Perl {\em booker} package.
85 85
266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers 266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers
267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve 267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve
268 any form of useful performance in non-trivial applications. 268 any form of useful performance in non-trivial applications.
269 269
270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer 270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer
271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.org}} package is used 271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.com}} package is used
272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field 272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field
273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text 273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text
274 discusses a very large portion of the inner workings of the library. 274 discusses a very large portion of the inner workings of the library.
275 275
276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed 276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed
2188 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$ 2188 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$
2189 left. 2189 left.
2190 2190
2191 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts 2191 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts
2192 required. If it is non-zero a modified shift loop is used to calculate the remaining product. 2192 required. If it is non-zero a modified shift loop is used to calculate the remaining product.
2193 Essentially the loop is a generic version of algorith mp\_mul2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$ 2193 Essentially the loop is a generic version of algorithm mp\_mul\_2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$
2194 variable is used to extract the upper $d$ bits to form the carry for the next iteration. 2194 variable is used to extract the upper $d$ bits to form the carry for the next iteration.
2195 2195
2196 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to 2196 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to
2197 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow. 2197 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow.
2198 2198
2609 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ 2609 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\
2610 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\ 2610 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\
2611 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\ 2611 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\
2612 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\ 2612 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\
2613 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ 2613 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\
2614 6. $W_{pa} \leftarrow \_ \hat W (\mbox{mod }\beta)$ \\
2615 \\ 2614 \\
2616 7. $oldused \leftarrow c.used$ \\ 2615 6. $oldused \leftarrow c.used$ \\
2617 8. $c.used \leftarrow digs$ \\ 2616 7. $c.used \leftarrow digs$ \\
2618 9. for $ix$ from $0$ to $pa$ do \\ 2617 8. for $ix$ from $0$ to $pa$ do \\
2619 \hspace{3mm}9.1 $c_{ix} \leftarrow W_{ix}$ \\ 2618 \hspace{3mm}8.1 $c_{ix} \leftarrow W_{ix}$ \\
2620 10. for $ix$ from $pa + 1$ to $oldused - 1$ do \\ 2619 9. for $ix$ from $pa + 1$ to $oldused - 1$ do \\
2621 \hspace{3mm}10.1 $c_{ix} \leftarrow 0$ \\ 2620 \hspace{3mm}9.1 $c_{ix} \leftarrow 0$ \\
2622 \\ 2621 \\
2623 11. Clamp $c$. \\ 2622 10. Clamp $c$. \\
2624 12. Return MP\_OKAY. \\ 2623 11. Return MP\_OKAY. \\
2625 \hline 2624 \hline
2626 \end{tabular} 2625 \end{tabular}
2627 \end{center} 2626 \end{center}
2628 \end{small} 2627 \end{small}
2629 \caption{Algorithm fast\_s\_mp\_mul\_digs} 2628 \caption{Algorithm fast\_s\_mp\_mul\_digs}
2773 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for 2772 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for
2774 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with 2773 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with
2775 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent. 2774 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent.
2776 2775
2777 \begin{equation} 2776 \begin{equation}
2778 f(x) \cdot g(x) = acx^2 + ((a - b)(c - d) - (ac + bd))x + bd 2777 f(x) \cdot g(x) = acx^2 + ((a + b)(c + d) - (ac + bd))x + bd
2779 \end{equation} 2778 \end{equation}
2780 2779
2781 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying 2780 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying
2782 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns 2781 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns
2783 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points 2782 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points
2784 $\zeta_0$, $\zeta_{\infty}$ and $-\zeta_{-1}$. Consider the resultant system of equations. 2783 $\zeta_0$, $\zeta_{\infty}$ and $\zeta_{1}$. Consider the resultant system of equations.
2785 2784
2786 \begin{center} 2785 \begin{center}
2787 \begin{tabular}{rcrcrcrc} 2786 \begin{tabular}{rcrcrcrc}
2788 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\ 2787 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\
2789 $-\zeta_{-1}$ & $=$ & $-w_2$ & $+$ & $w_1$ & $-$ & $w_0$ \\ 2788 $\zeta_{1}$ & $=$ & $w_2$ & $+$ & $w_1$ & $+$ & $w_0$ \\
2790 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\ 2789 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\
2791 \end{tabular} 2790 \end{tabular}
2792 \end{center} 2791 \end{center}
2793 2792
2794 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity 2793 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity
2795 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.} 2794 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.}
2796 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman. It is worth noting that the point 2795 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman.
2797 $\zeta_1$ could be substituted for $-\zeta_{-1}$. In this case the first and third row are subtracted instead of added to the second row.
2798 2796
2799 \newpage\begin{figure}[!here] 2797 \newpage\begin{figure}[!here]
2800 \begin{small} 2798 \begin{small}
2801 \begin{center} 2799 \begin{center}
2802 \begin{tabular}{l} 2800 \begin{tabular}{l}
2815 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\ 2813 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\
2816 \\ 2814 \\
2817 Calculate the three products. \\ 2815 Calculate the three products. \\
2818 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\ 2816 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\
2819 9. $x1y1 \leftarrow x1 \cdot y1$ \\ 2817 9. $x1y1 \leftarrow x1 \cdot y1$ \\
2820 10. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ 2818 10. $t1 \leftarrow x1 + x0$ (\textit{mp\_add}) \\
2821 11. $x0 \leftarrow y1 - y0$ \\ 2819 11. $x0 \leftarrow y1 + y0$ \\
2822 12. $t1 \leftarrow t1 \cdot x0$ \\ 2820 12. $t1 \leftarrow t1 \cdot x0$ \\
2823 \\ 2821 \\
2824 Calculate the middle term. \\ 2822 Calculate the middle term. \\
2825 13. $x0 \leftarrow x0y0 + x1y1$ \\ 2823 13. $x0 \leftarrow x0y0 + x1y1$ \\
2826 14. $t1 \leftarrow x0 - t1$ \\ 2824 14. $t1 \leftarrow t1 - x0$ (\textit{s\_mp\_sub}) \\
2827 \\ 2825 \\
2828 Calculate the final product. \\ 2826 Calculate the final product. \\
2829 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\ 2827 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\
2830 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\ 2828 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\
2831 17. $t1 \leftarrow x0y0 + t1$ \\ 2829 17. $t1 \leftarrow x0y0 + t1$ \\
2848 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the 2846 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the
2849 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5 2847 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5
2850 compute the lower halves. Step 6 and 7 computer the upper halves. 2848 compute the lower halves. Step 6 and 7 computer the upper halves.
2851 2849
2852 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products 2850 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products
2853 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 - x0$ has been computed. By using $x0$ instead 2851 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 + x0$ has been computed. By using $x0$ instead
2854 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation. 2852 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation.
2855 2853
2856 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations. 2854 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations.
2857 2855
2858 EXAM,bn_mp_karatsuba_mul.c 2856 EXAM,bn_mp_karatsuba_mul.c
3244 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square. 3242 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square.
3245 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a 3243 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a
3246 number with the following equation. 3244 number with the following equation.
3247 3245
3248 \begin{equation} 3246 \begin{equation}
3249 h(x) = a^2x^2 + \left (a^2 + b^2 - (a - b)^2 \right )x + b^2 3247 h(x) = a^2x^2 + \left ((a + b)^2 - (a^2 + b^2) \right )x + b^2
3250 \end{equation} 3248 \end{equation}
3251 3249
3252 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a - b)^2$. As in 3250 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a + b)^2$. As in
3253 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of 3251 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of
3254 $O \left ( n^{lg(3)} \right )$. 3252 $O \left ( n^{lg(3)} \right )$.
3255 3253
3256 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm 3254 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm
3257 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the 3255 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the
3279 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\ 3277 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\
3280 \\ 3278 \\
3281 Calculate the three squares. \\ 3279 Calculate the three squares. \\
3282 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\ 3280 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\
3283 7. $x1x1 \leftarrow x1^2$ \\ 3281 7. $x1x1 \leftarrow x1^2$ \\
3284 8. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ 3282 8. $t1 \leftarrow x1 + x0$ (\textit{s\_mp\_add}) \\
3285 9. $t1 \leftarrow t1^2$ \\ 3283 9. $t1 \leftarrow t1^2$ \\
3286 \\ 3284 \\
3287 Compute the middle term. \\ 3285 Compute the middle term. \\
3288 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\ 3286 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\
3289 11. $t1 \leftarrow t2 - t1$ \\ 3287 11. $t1 \leftarrow t1 - t2$ \\
3290 \\ 3288 \\
3291 Compute final product. \\ 3289 Compute final product. \\
3292 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\ 3290 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\
3293 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\ 3291 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\
3294 14. $t1 \leftarrow t1 + x0x0$ \\ 3292 14. $t1 \leftarrow t1 + x0x0$ \\
3307 3305
3308 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is 3306 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is
3309 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$ 3307 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$
3310 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form. 3308 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form.
3311 3309
3312 By expanding $\left (x1 - x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $x1^2 + x0^2 - (x1 - x0)^2 = 2 \cdot x0 \cdot x1$. 3310 By expanding $\left (x1 + x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $(x0 - x1)^2 - (x1^2 + x0^2) = 2 \cdot x0 \cdot x1$.
3313 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then 3311 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then
3314 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality. 3312 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality.
3315 3313
3316 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or 3314 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or
3317 machine clock cycles.}. 3315 machine clock cycles.}.
3730 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\ 3728 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\
3731 \hline $5$ & $x/2 = 278$ \\ 3729 \hline $5$ & $x/2 = 278$ \\
3732 \hline $6$ & $x/2 = 139$ \\ 3730 \hline $6$ & $x/2 = 139$ \\
3733 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\ 3731 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\
3734 \hline $8$ & $x/2 = 99$ \\ 3732 \hline $8$ & $x/2 = 99$ \\
3733 \hline $9$ & $x + n = 356$, $x/2 = 178$ \\
3735 \hline 3734 \hline
3736 \end{tabular} 3735 \end{tabular}
3737 \end{center} 3736 \end{center}
3738 \end{small} 3737 \end{small}
3739 \caption{Example of Montgomery Reduction (I)} 3738 \caption{Example of Montgomery Reduction (I)}
3740 \label{fig:MONT1} 3739 \label{fig:MONT1}
3741 \end{figure} 3740 \end{figure}
3742 3741
3743 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 8$. The result of the algorithm $r = 99$ is 3742 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 9$ (note $\beta^k = 512$ which is larger than $n$). The result of
3744 congruent to the value of $2^{-8} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^8$ modulo $257$ the correct residue 3743 the algorithm $r = 178$ is congruent to the value of $2^{-9} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^9$ modulo $257$ the correct residue
3745 $r \equiv 158$ is produced. 3744 $r \equiv 158$ is produced.
3746 3745
3747 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts 3746 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts
3748 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful. 3747 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful.
3749 Fortunately there exists an alternative representation of the algorithm. 3748 Fortunately there exists an alternative representation of the algorithm.
3751 \begin{figure}[!here] 3750 \begin{figure}[!here]
3752 \begin{small} 3751 \begin{small}
3753 \begin{center} 3752 \begin{center}
3754 \begin{tabular}{l} 3753 \begin{tabular}{l}
3755 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\ 3754 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\
3756 \textbf{Input}. Integer $x$, $n$ and $k$ \\ 3755 \textbf{Input}. Integer $x$, $n$ and $k$ ($2^k > n$) \\
3757 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\ 3756 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\
3758 \hline \\ 3757 \hline \\
3759 1. for $t$ from $0$ to $k - 1$ do \\ 3758 1. for $t$ from $1$ to $k$ do \\
3760 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\ 3759 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\
3761 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\ 3760 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\
3762 2. Return $x/2^k$. \\ 3761 2. Return $x/2^k$. \\
3763 \hline 3762 \hline
3764 \end{tabular} 3763 \end{tabular}
3782 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\ 3781 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\
3783 \hline $5$ & $8896$ & $10001011000000$ \\ 3782 \hline $5$ & $8896$ & $10001011000000$ \\
3784 \hline $6$ & $8896$ & $10001011000000$ \\ 3783 \hline $6$ & $8896$ & $10001011000000$ \\
3785 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\ 3784 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\
3786 \hline $8$ & $25344$ & $110001100000000$ \\ 3785 \hline $8$ & $25344$ & $110001100000000$ \\
3787 \hline -- & $x/2^k = 99$ & \\ 3786 \hline $9$ & $x + 2^{7}n = 91136$ & $10110010000000000$ \\
3787 \hline -- & $x/2^k = 178$ & \\
3788 \hline 3788 \hline
3789 \end{tabular} 3789 \end{tabular}
3790 \end{center} 3790 \end{center}
3791 \end{small} 3791 \end{small}
3792 \caption{Example of Montgomery Reduction (II)} 3792 \caption{Example of Montgomery Reduction (II)}
3793 \label{fig:MONT2} 3793 \label{fig:MONT2}
3794 \end{figure} 3794 \end{figure}
3795 3795
3796 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 8$. 3796 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 9$.
3797 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the 3797 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the
3798 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is 3798 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is
3799 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero. 3799 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero.
3800 3800
3801 \subsection{Digit Based Montgomery Reduction} 3801 \subsection{Digit Based Montgomery Reduction}
3805 \begin{figure}[!here] 3805 \begin{figure}[!here]
3806 \begin{small} 3806 \begin{small}
3807 \begin{center} 3807 \begin{center}
3808 \begin{tabular}{l} 3808 \begin{tabular}{l}
3809 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\ 3809 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\
3810 \textbf{Input}. Integer $x$, $n$ and $k$ \\ 3810 \textbf{Input}. Integer $x$, $n$ and $k$ ($\beta^k > n$) \\
3811 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\ 3811 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\
3812 \hline \\ 3812 \hline \\
3813 1. for $t$ from $0$ to $k - 1$ do \\ 3813 1. for $t$ from $0$ to $k - 1$ do \\
3814 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\ 3814 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\
3815 2. Return $x/\beta^k$. \\ 3815 2. Return $x/\beta^k$. \\
4033 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\ 4033 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\
4034 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\ 4034 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\
4035 \hline \\ 4035 \hline \\
4036 1. $b \leftarrow n_0$ \\ 4036 1. $b \leftarrow n_0$ \\
4037 2. If $b$ is even return(\textit{MP\_VAL}) \\ 4037 2. If $b$ is even return(\textit{MP\_VAL}) \\
4038 3. $x \leftarrow ((b + 2) \mbox{ AND } 4) << 1) + b$ \\ 4038 3. $x \leftarrow (((b + 2) \mbox{ AND } 4) << 1) + b$ \\
4039 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\ 4039 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\
4040 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\ 4040 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\
4041 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\ 4041 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\
4042 6. Return(\textit{MP\_OKAY}). \\ 4042 6. Return(\textit{MP\_OKAY}). \\
4043 \hline 4043 \hline
4937 By step 13 there are no more digits left in the exponent. However, there may be partial bits in the window left. If $mode = 2$ then 4937 By step 13 there are no more digits left in the exponent. However, there may be partial bits in the window left. If $mode = 2$ then
4938 a Left-to-Right algorithm is used to process the remaining few bits. 4938 a Left-to-Right algorithm is used to process the remaining few bits.
4939 4939
4940 EXAM,bn_s_mp_exptmod.c 4940 EXAM,bn_s_mp_exptmod.c
4941 4941
4942 Lines @26,if@ through @40,}@ determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted 4942 Lines @31,if@ through @45,}@ determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted
4943 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement 4943 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement
4944 on line @32,if@ the value of $x$ is already known to be greater than $140$. 4944 on line @37,if@ the value of $x$ is already known to be greater than $140$.
4945 4945
4946 The conditional piece of code beginning on line @42,ifdef@ allows the window size to be restricted to five bits. This logic is used to ensure 4946 The conditional piece of code beginning on line @42,ifdef@ allows the window size to be restricted to five bits. This logic is used to ensure
4947 the table of precomputed powers of $G$ remains relatively small. 4947 the table of precomputed powers of $G$ remains relatively small.
4948 4948
4949 The for loop on line @49,for@ initializes the $M$ array while lines @59,mp_init@ and @62,mp_reduce@ compute the value of $\mu$ required for 4949 The for loop on line @60,for@ initializes the $M$ array while lines @71,mp_init@ and @75,mp_reduce@ through @85,}@ initialize the reduction
4950 Barrett reduction. 4950 function that will be used for this modulus.
4951 4951
4952 -- More later. 4952 -- More later.
4953 4953
4954 \section{Quick Power of Two} 4954 \section{Quick Power of Two}
4955 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is 4955 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is
5228 5228
5229 \begin{verbatim} 5229 \begin{verbatim}
5230 mp_div(&a, &b, &c, NULL); /* c = [a/b] */ 5230 mp_div(&a, &b, &c, NULL); /* c = [a/b] */
5231 \end{verbatim} 5231 \end{verbatim}
5232 5232
5233 Lines @37,if@ and @42,if@ handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor 5233 Lines @108,if@ and @113,if@ handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor
5234 respectively. After the two trivial cases all of the temporary variables are initialized. Line @76,neg@ determines the sign of 5234 respectively. After the two trivial cases all of the temporary variables are initialized. Line @147,neg@ determines the sign of
5235 the quotient and line @77,sign@ ensures that both $x$ and $y$ are positive. 5235 the quotient and line @148,sign@ ensures that both $x$ and $y$ are positive.
5236 5236
5237 The number of bits in the leading digit is calculated on line @80,norm@. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits 5237 The number of bits in the leading digit is calculated on line @151,norm@. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits
5238 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is 5238 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is
5239 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting 5239 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting
5240 them to the left by $lg(\beta) - 1 - k$ bits. 5240 them to the left by $lg(\beta) - 1 - k$ bits.
5241 5241
5242 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the 5242 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the
5243 leading digit of the quotient. The loop beginning on line @113,for@ will produce the remainder of the quotient digits. 5243 leading digit of the quotient. The loop beginning on line @184,for@ will produce the remainder of the quotient digits.
5244 5244
5245 The conditional ``continue'' on line @114,if@ is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the 5245 The conditional ``continue'' on line @186,continue@ is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the
5246 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits 5246 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits
5247 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}. 5247 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}.
5248 5248
5249 Lines @142,t1@, @143,t1@ and @150,t2@ through @152,t2@ manually construct the high accuracy estimations by setting the digits of the two mp\_int 5249 Lines @214,t1@, @216,t1@ and @222,t2@ through @225,t2@ manually construct the high accuracy estimations by setting the digits of the two mp\_int
5250 variables directly. 5250 variables directly.
5251 5251
5252 \section{Single Digit Helpers} 5252 \section{Single Digit Helpers}
5253 5253
5254 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of 5254 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of
5742 \begin{tabular}{l} 5742 \begin{tabular}{l}
5743 \hline Algorithm \textbf{mp\_gcd}. \\ 5743 \hline Algorithm \textbf{mp\_gcd}. \\
5744 \textbf{Input}. mp\_int $a$ and $b$ \\ 5744 \textbf{Input}. mp\_int $a$ and $b$ \\
5745 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\ 5745 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\
5746 \hline \\ 5746 \hline \\
5747 1. If $a = 0$ and $b \ne 0$ then \\ 5747 1. If $a = 0$ then \\
5748 \hspace{3mm}1.1 $c \leftarrow b$ \\ 5748 \hspace{3mm}1.1 $c \leftarrow \vert b \vert $ \\
5749 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\ 5749 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\
5750 2. If $a \ne 0$ and $b = 0$ then \\ 5750 2. If $b = 0$ then \\
5751 \hspace{3mm}2.1 $c \leftarrow a$ \\ 5751 \hspace{3mm}2.1 $c \leftarrow \vert a \vert $ \\
5752 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\ 5752 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\
5753 3. If $a = b = 0$ then \\ 5753 3. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\
5754 \hspace{3mm}3.1 $c \leftarrow 1$ \\ 5754 4. $k \leftarrow 0$ \\
5755 \hspace{3mm}3.2 Return(\textit{MP\_OKAY}). \\ 5755 5. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
5756 4. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\ 5756 \hspace{3mm}5.1 $k \leftarrow k + 1$ \\
5757 5. $k \leftarrow 0$ \\ 5757 \hspace{3mm}5.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\
5758 6. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 5758 \hspace{3mm}5.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\
5759 \hspace{3mm}6.1 $k \leftarrow k + 1$ \\ 5759 6. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
5760 \hspace{3mm}6.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\ 5760 \hspace{3mm}6.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\
5761 \hspace{3mm}6.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\ 5761 7. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
5762 7. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 5762 \hspace{3mm}7.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
5763 \hspace{3mm}7.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\ 5763 8. While $v.used > 0$ \\
5764 8. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 5764 \hspace{3mm}8.1 If $\vert u \vert > \vert v \vert$ then \\
5765 \hspace{3mm}8.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ 5765 \hspace{6mm}8.1.1 Swap $u$ and $v$. \\
5766 9. While $v.used > 0$ \\ 5766 \hspace{3mm}8.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\
5767 \hspace{3mm}9.1 If $\vert u \vert > \vert v \vert$ then \\ 5767 \hspace{3mm}8.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
5768 \hspace{6mm}9.1.1 Swap $u$ and $v$. \\ 5768 \hspace{6mm}8.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
5769 \hspace{3mm}9.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\ 5769 9. $c \leftarrow u \cdot 2^k$ \\
5770 \hspace{3mm}9.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 5770 10. Return(\textit{MP\_OKAY}). \\
5771 \hspace{6mm}9.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
5772 10. $c \leftarrow u \cdot 2^k$ \\
5773 11. Return(\textit{MP\_OKAY}). \\
5774 \hline 5771 \hline
5775 \end{tabular} 5772 \end{tabular}
5776 \end{center} 5773 \end{center}
5777 \end{small} 5774 \end{small}
5778 \caption{Algorithm mp\_gcd} 5775 \caption{Algorithm mp\_gcd}
5780 \textbf{Algorithm mp\_gcd.} 5777 \textbf{Algorithm mp\_gcd.}
5781 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of 5778 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of
5782 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as 5779 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as
5783 Algorithm B and in practice this appears to be true. 5780 Algorithm B and in practice this appears to be true.
5784 5781
5785 The first three steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the 5782 The first two steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the
5786 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of 5783 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of
5787 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair. 5784 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair.
5788 5785
5789 Step six will divide out any common factors of two and keep track of the count in the variable $k$. After this step two is no longer a 5786 Step five will divide out any common factors of two and keep track of the count in the variable $k$. After this step, two is no longer a
5790 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step 5787 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step
5791 seven and eight ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while loops will iterate since 5788 six and seven ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while--loops will iterate since
5792 they cannot both be even. 5789 they cannot both be even.
5793 5790
5794 By step nine both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to 5791 By step eight both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to
5795 or greater than $u$. This ensures that the subtraction on step 9.2 will always produce a positive and even result. Step 9.3 removes any 5792 or greater than $u$. This ensures that the subtraction on step 8.2 will always produce a positive and even result. Step 8.3 removes any
5796 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd. 5793 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd.
5797 5794
5798 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result 5795 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result
5799 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier. 5796 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier.
5800 5797
5801 EXAM,bn_mp_gcd.c 5798 EXAM,bn_mp_gcd.c
5802 5799
5803 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the 5800 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the
5804 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise 5801 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise
5805 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three 5802 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three
5806 trivial cases of inputs are handled on lines @25,zero@ through @34,}@. After those lines the inputs are assumed to be non-zero. 5803 trivial cases of inputs are handled on lines @23,zero@ through @29,}@. After those lines the inputs are assumed to be non-zero.
5807 5804
5808 Lines @36,if@ and @40,if@ make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two 5805 Lines @32,if@ and @36,if@ make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two
5809 must be divided out of the two inputs. The while loop on line @49,while@ iterates so long as both are even. The local integer $k$ is used to 5806 must be divided out of the two inputs. The block starting at line @43,common@ removes common factors of two by first counting the number of trailing
5810 keep track of how many factors of $2$ are pulled out of both values. It is assumed that the number of factors will not exceed the maximum 5807 zero bits in both. The local integer $k$ is used to keep track of how many factors of $2$ are pulled out of both values. It is assumed that
5811 value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than entries than are accessible by an ``int'' so this is not 5808 the number of factors will not exceed the maximum value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than
5812 a limitation.}. 5809 entries than are accessible by an ``int'' so this is not a limitation.}.
5813 5810
5814 At this point there are no more common factors of two in the two values. The while loops on lines @60,while@ and @65,while@ remove any independent 5811 At this point there are no more common factors of two in the two values. The divisions by a power of two on lines @60,div_2d@ and @67,div_2d@ remove
5815 factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop 5812 any independent factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop
5816 on line @71, while@ performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in 5813 on line @72, while@ performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in
5817 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative. 5814 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative.
5818 5815
5819 \section{Least Common Multiple} 5816 \section{Least Common Multiple}
5820 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the 5817 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the
5821 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$ 5818 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$