Mercurial > dropbear
comparison tommath.src @ 386:97db060d0ef5 libtommath-orig libtommath-0.40
Update to LibTomMath 0.40
author | Matt Johnston <matt@ucc.asn.au> |
---|---|
date | Thu, 11 Jan 2007 03:11:15 +0000 |
parents | 91fbc376f010 |
children |
comparison
equal
deleted
inserted
replaced
282:91fbc376f010 | 386:97db060d0ef5 |
---|---|
64 \end{tabular} | 64 \end{tabular} |
65 %\end{small} | 65 %\end{small} |
66 } | 66 } |
67 } | 67 } |
68 \maketitle | 68 \maketitle |
69 This text has been placed in the public domain. This text corresponds to the v0.35 release of the | 69 This text has been placed in the public domain. This text corresponds to the v0.39 release of the |
70 LibTomMath project. | 70 LibTomMath project. |
71 | 71 |
72 \begin{alltt} | 72 \begin{alltt} |
73 Tom St Denis | 73 Tom St Denis |
74 111 Banning Rd | 74 111 Banning Rd |
75 Ottawa, Ontario | 75 Ottawa, Ontario |
76 K2L 1C3 | 76 K2L 1C3 |
77 Canada | 77 Canada |
78 | 78 |
79 Phone: 1-613-836-3160 | 79 Phone: 1-613-836-3160 |
80 Email: [email protected] | 80 Email: [email protected] |
81 \end{alltt} | 81 \end{alltt} |
82 | 82 |
83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} | 83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} |
84 {\em book} macro package and the Perl {\em booker} package. | 84 {\em book} macro package and the Perl {\em booker} package. |
85 | 85 |
266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers | 266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers |
267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve | 267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve |
268 any form of useful performance in non-trivial applications. | 268 any form of useful performance in non-trivial applications. |
269 | 269 |
270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer | 270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer |
271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.org}} package is used | 271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.com}} package is used |
272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field | 272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field |
273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text | 273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text |
274 discusses a very large portion of the inner workings of the library. | 274 discusses a very large portion of the inner workings of the library. |
275 | 275 |
276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed | 276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed |
2188 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$ | 2188 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$ |
2189 left. | 2189 left. |
2190 | 2190 |
2191 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts | 2191 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts |
2192 required. If it is non-zero a modified shift loop is used to calculate the remaining product. | 2192 required. If it is non-zero a modified shift loop is used to calculate the remaining product. |
2193 Essentially the loop is a generic version of algorith mp\_mul2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$ | 2193 Essentially the loop is a generic version of algorithm mp\_mul\_2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$ |
2194 variable is used to extract the upper $d$ bits to form the carry for the next iteration. | 2194 variable is used to extract the upper $d$ bits to form the carry for the next iteration. |
2195 | 2195 |
2196 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to | 2196 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to |
2197 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow. | 2197 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow. |
2198 | 2198 |
2609 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ | 2609 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ |
2610 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\ | 2610 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\ |
2611 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\ | 2611 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\ |
2612 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\ | 2612 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\ |
2613 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ | 2613 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ |
2614 6. $W_{pa} \leftarrow \_ \hat W (\mbox{mod }\beta)$ \\ | |
2615 \\ | 2614 \\ |
2616 7. $oldused \leftarrow c.used$ \\ | 2615 6. $oldused \leftarrow c.used$ \\ |
2617 8. $c.used \leftarrow digs$ \\ | 2616 7. $c.used \leftarrow digs$ \\ |
2618 9. for $ix$ from $0$ to $pa$ do \\ | 2617 8. for $ix$ from $0$ to $pa$ do \\ |
2619 \hspace{3mm}9.1 $c_{ix} \leftarrow W_{ix}$ \\ | 2618 \hspace{3mm}8.1 $c_{ix} \leftarrow W_{ix}$ \\ |
2620 10. for $ix$ from $pa + 1$ to $oldused - 1$ do \\ | 2619 9. for $ix$ from $pa + 1$ to $oldused - 1$ do \\ |
2621 \hspace{3mm}10.1 $c_{ix} \leftarrow 0$ \\ | 2620 \hspace{3mm}9.1 $c_{ix} \leftarrow 0$ \\ |
2622 \\ | 2621 \\ |
2623 11. Clamp $c$. \\ | 2622 10. Clamp $c$. \\ |
2624 12. Return MP\_OKAY. \\ | 2623 11. Return MP\_OKAY. \\ |
2625 \hline | 2624 \hline |
2626 \end{tabular} | 2625 \end{tabular} |
2627 \end{center} | 2626 \end{center} |
2628 \end{small} | 2627 \end{small} |
2629 \caption{Algorithm fast\_s\_mp\_mul\_digs} | 2628 \caption{Algorithm fast\_s\_mp\_mul\_digs} |
2773 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for | 2772 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for |
2774 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with | 2773 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with |
2775 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent. | 2774 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent. |
2776 | 2775 |
2777 \begin{equation} | 2776 \begin{equation} |
2778 f(x) \cdot g(x) = acx^2 + ((a - b)(c - d) - (ac + bd))x + bd | 2777 f(x) \cdot g(x) = acx^2 + ((a + b)(c + d) - (ac + bd))x + bd |
2779 \end{equation} | 2778 \end{equation} |
2780 | 2779 |
2781 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying | 2780 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying |
2782 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns | 2781 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns |
2783 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points | 2782 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points |
2784 $\zeta_0$, $\zeta_{\infty}$ and $-\zeta_{-1}$. Consider the resultant system of equations. | 2783 $\zeta_0$, $\zeta_{\infty}$ and $\zeta_{1}$. Consider the resultant system of equations. |
2785 | 2784 |
2786 \begin{center} | 2785 \begin{center} |
2787 \begin{tabular}{rcrcrcrc} | 2786 \begin{tabular}{rcrcrcrc} |
2788 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\ | 2787 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\ |
2789 $-\zeta_{-1}$ & $=$ & $-w_2$ & $+$ & $w_1$ & $-$ & $w_0$ \\ | 2788 $\zeta_{1}$ & $=$ & $w_2$ & $+$ & $w_1$ & $+$ & $w_0$ \\ |
2790 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\ | 2789 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\ |
2791 \end{tabular} | 2790 \end{tabular} |
2792 \end{center} | 2791 \end{center} |
2793 | 2792 |
2794 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity | 2793 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity |
2795 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.} | 2794 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.} |
2796 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman. It is worth noting that the point | 2795 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman. |
2797 $\zeta_1$ could be substituted for $-\zeta_{-1}$. In this case the first and third row are subtracted instead of added to the second row. | |
2798 | 2796 |
2799 \newpage\begin{figure}[!here] | 2797 \newpage\begin{figure}[!here] |
2800 \begin{small} | 2798 \begin{small} |
2801 \begin{center} | 2799 \begin{center} |
2802 \begin{tabular}{l} | 2800 \begin{tabular}{l} |
2815 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\ | 2813 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\ |
2816 \\ | 2814 \\ |
2817 Calculate the three products. \\ | 2815 Calculate the three products. \\ |
2818 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\ | 2816 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\ |
2819 9. $x1y1 \leftarrow x1 \cdot y1$ \\ | 2817 9. $x1y1 \leftarrow x1 \cdot y1$ \\ |
2820 10. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ | 2818 10. $t1 \leftarrow x1 + x0$ (\textit{mp\_add}) \\ |
2821 11. $x0 \leftarrow y1 - y0$ \\ | 2819 11. $x0 \leftarrow y1 + y0$ \\ |
2822 12. $t1 \leftarrow t1 \cdot x0$ \\ | 2820 12. $t1 \leftarrow t1 \cdot x0$ \\ |
2823 \\ | 2821 \\ |
2824 Calculate the middle term. \\ | 2822 Calculate the middle term. \\ |
2825 13. $x0 \leftarrow x0y0 + x1y1$ \\ | 2823 13. $x0 \leftarrow x0y0 + x1y1$ \\ |
2826 14. $t1 \leftarrow x0 - t1$ \\ | 2824 14. $t1 \leftarrow t1 - x0$ (\textit{s\_mp\_sub}) \\ |
2827 \\ | 2825 \\ |
2828 Calculate the final product. \\ | 2826 Calculate the final product. \\ |
2829 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\ | 2827 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\ |
2830 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\ | 2828 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\ |
2831 17. $t1 \leftarrow x0y0 + t1$ \\ | 2829 17. $t1 \leftarrow x0y0 + t1$ \\ |
2848 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the | 2846 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the |
2849 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5 | 2847 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5 |
2850 compute the lower halves. Step 6 and 7 computer the upper halves. | 2848 compute the lower halves. Step 6 and 7 computer the upper halves. |
2851 | 2849 |
2852 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products | 2850 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products |
2853 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 - x0$ has been computed. By using $x0$ instead | 2851 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 + x0$ has been computed. By using $x0$ instead |
2854 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation. | 2852 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation. |
2855 | 2853 |
2856 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations. | 2854 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations. |
2857 | 2855 |
2858 EXAM,bn_mp_karatsuba_mul.c | 2856 EXAM,bn_mp_karatsuba_mul.c |
3244 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square. | 3242 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square. |
3245 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a | 3243 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a |
3246 number with the following equation. | 3244 number with the following equation. |
3247 | 3245 |
3248 \begin{equation} | 3246 \begin{equation} |
3249 h(x) = a^2x^2 + \left (a^2 + b^2 - (a - b)^2 \right )x + b^2 | 3247 h(x) = a^2x^2 + \left ((a + b)^2 - (a^2 + b^2) \right )x + b^2 |
3250 \end{equation} | 3248 \end{equation} |
3251 | 3249 |
3252 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a - b)^2$. As in | 3250 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a + b)^2$. As in |
3253 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of | 3251 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of |
3254 $O \left ( n^{lg(3)} \right )$. | 3252 $O \left ( n^{lg(3)} \right )$. |
3255 | 3253 |
3256 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm | 3254 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm |
3257 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the | 3255 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the |
3279 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\ | 3277 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\ |
3280 \\ | 3278 \\ |
3281 Calculate the three squares. \\ | 3279 Calculate the three squares. \\ |
3282 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\ | 3280 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\ |
3283 7. $x1x1 \leftarrow x1^2$ \\ | 3281 7. $x1x1 \leftarrow x1^2$ \\ |
3284 8. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ | 3282 8. $t1 \leftarrow x1 + x0$ (\textit{s\_mp\_add}) \\ |
3285 9. $t1 \leftarrow t1^2$ \\ | 3283 9. $t1 \leftarrow t1^2$ \\ |
3286 \\ | 3284 \\ |
3287 Compute the middle term. \\ | 3285 Compute the middle term. \\ |
3288 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\ | 3286 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\ |
3289 11. $t1 \leftarrow t2 - t1$ \\ | 3287 11. $t1 \leftarrow t1 - t2$ \\ |
3290 \\ | 3288 \\ |
3291 Compute final product. \\ | 3289 Compute final product. \\ |
3292 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\ | 3290 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\ |
3293 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\ | 3291 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\ |
3294 14. $t1 \leftarrow t1 + x0x0$ \\ | 3292 14. $t1 \leftarrow t1 + x0x0$ \\ |
3307 | 3305 |
3308 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is | 3306 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is |
3309 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$ | 3307 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$ |
3310 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form. | 3308 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form. |
3311 | 3309 |
3312 By expanding $\left (x1 - x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $x1^2 + x0^2 - (x1 - x0)^2 = 2 \cdot x0 \cdot x1$. | 3310 By expanding $\left (x1 + x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $(x0 - x1)^2 - (x1^2 + x0^2) = 2 \cdot x0 \cdot x1$. |
3313 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then | 3311 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then |
3314 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality. | 3312 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality. |
3315 | 3313 |
3316 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or | 3314 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or |
3317 machine clock cycles.}. | 3315 machine clock cycles.}. |
3730 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\ | 3728 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\ |
3731 \hline $5$ & $x/2 = 278$ \\ | 3729 \hline $5$ & $x/2 = 278$ \\ |
3732 \hline $6$ & $x/2 = 139$ \\ | 3730 \hline $6$ & $x/2 = 139$ \\ |
3733 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\ | 3731 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\ |
3734 \hline $8$ & $x/2 = 99$ \\ | 3732 \hline $8$ & $x/2 = 99$ \\ |
3733 \hline $9$ & $x + n = 356$, $x/2 = 178$ \\ | |
3735 \hline | 3734 \hline |
3736 \end{tabular} | 3735 \end{tabular} |
3737 \end{center} | 3736 \end{center} |
3738 \end{small} | 3737 \end{small} |
3739 \caption{Example of Montgomery Reduction (I)} | 3738 \caption{Example of Montgomery Reduction (I)} |
3740 \label{fig:MONT1} | 3739 \label{fig:MONT1} |
3741 \end{figure} | 3740 \end{figure} |
3742 | 3741 |
3743 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 8$. The result of the algorithm $r = 99$ is | 3742 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 9$ (note $\beta^k = 512$ which is larger than $n$). The result of |
3744 congruent to the value of $2^{-8} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^8$ modulo $257$ the correct residue | 3743 the algorithm $r = 178$ is congruent to the value of $2^{-9} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^9$ modulo $257$ the correct residue |
3745 $r \equiv 158$ is produced. | 3744 $r \equiv 158$ is produced. |
3746 | 3745 |
3747 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts | 3746 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts |
3748 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful. | 3747 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful. |
3749 Fortunately there exists an alternative representation of the algorithm. | 3748 Fortunately there exists an alternative representation of the algorithm. |
3751 \begin{figure}[!here] | 3750 \begin{figure}[!here] |
3752 \begin{small} | 3751 \begin{small} |
3753 \begin{center} | 3752 \begin{center} |
3754 \begin{tabular}{l} | 3753 \begin{tabular}{l} |
3755 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\ | 3754 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\ |
3756 \textbf{Input}. Integer $x$, $n$ and $k$ \\ | 3755 \textbf{Input}. Integer $x$, $n$ and $k$ ($2^k > n$) \\ |
3757 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\ | 3756 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\ |
3758 \hline \\ | 3757 \hline \\ |
3759 1. for $t$ from $0$ to $k - 1$ do \\ | 3758 1. for $t$ from $1$ to $k$ do \\ |
3760 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\ | 3759 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\ |
3761 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\ | 3760 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\ |
3762 2. Return $x/2^k$. \\ | 3761 2. Return $x/2^k$. \\ |
3763 \hline | 3762 \hline |
3764 \end{tabular} | 3763 \end{tabular} |
3782 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\ | 3781 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\ |
3783 \hline $5$ & $8896$ & $10001011000000$ \\ | 3782 \hline $5$ & $8896$ & $10001011000000$ \\ |
3784 \hline $6$ & $8896$ & $10001011000000$ \\ | 3783 \hline $6$ & $8896$ & $10001011000000$ \\ |
3785 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\ | 3784 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\ |
3786 \hline $8$ & $25344$ & $110001100000000$ \\ | 3785 \hline $8$ & $25344$ & $110001100000000$ \\ |
3787 \hline -- & $x/2^k = 99$ & \\ | 3786 \hline $9$ & $x + 2^{7}n = 91136$ & $10110010000000000$ \\ |
3787 \hline -- & $x/2^k = 178$ & \\ | |
3788 \hline | 3788 \hline |
3789 \end{tabular} | 3789 \end{tabular} |
3790 \end{center} | 3790 \end{center} |
3791 \end{small} | 3791 \end{small} |
3792 \caption{Example of Montgomery Reduction (II)} | 3792 \caption{Example of Montgomery Reduction (II)} |
3793 \label{fig:MONT2} | 3793 \label{fig:MONT2} |
3794 \end{figure} | 3794 \end{figure} |
3795 | 3795 |
3796 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 8$. | 3796 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 9$. |
3797 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the | 3797 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the |
3798 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is | 3798 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is |
3799 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero. | 3799 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero. |
3800 | 3800 |
3801 \subsection{Digit Based Montgomery Reduction} | 3801 \subsection{Digit Based Montgomery Reduction} |
3805 \begin{figure}[!here] | 3805 \begin{figure}[!here] |
3806 \begin{small} | 3806 \begin{small} |
3807 \begin{center} | 3807 \begin{center} |
3808 \begin{tabular}{l} | 3808 \begin{tabular}{l} |
3809 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\ | 3809 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\ |
3810 \textbf{Input}. Integer $x$, $n$ and $k$ \\ | 3810 \textbf{Input}. Integer $x$, $n$ and $k$ ($\beta^k > n$) \\ |
3811 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\ | 3811 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\ |
3812 \hline \\ | 3812 \hline \\ |
3813 1. for $t$ from $0$ to $k - 1$ do \\ | 3813 1. for $t$ from $0$ to $k - 1$ do \\ |
3814 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\ | 3814 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\ |
3815 2. Return $x/\beta^k$. \\ | 3815 2. Return $x/\beta^k$. \\ |
4033 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\ | 4033 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\ |
4034 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\ | 4034 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\ |
4035 \hline \\ | 4035 \hline \\ |
4036 1. $b \leftarrow n_0$ \\ | 4036 1. $b \leftarrow n_0$ \\ |
4037 2. If $b$ is even return(\textit{MP\_VAL}) \\ | 4037 2. If $b$ is even return(\textit{MP\_VAL}) \\ |
4038 3. $x \leftarrow ((b + 2) \mbox{ AND } 4) << 1) + b$ \\ | 4038 3. $x \leftarrow (((b + 2) \mbox{ AND } 4) << 1) + b$ \\ |
4039 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\ | 4039 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\ |
4040 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\ | 4040 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\ |
4041 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\ | 4041 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\ |
4042 6. Return(\textit{MP\_OKAY}). \\ | 4042 6. Return(\textit{MP\_OKAY}). \\ |
4043 \hline | 4043 \hline |
4937 By step 13 there are no more digits left in the exponent. However, there may be partial bits in the window left. If $mode = 2$ then | 4937 By step 13 there are no more digits left in the exponent. However, there may be partial bits in the window left. If $mode = 2$ then |
4938 a Left-to-Right algorithm is used to process the remaining few bits. | 4938 a Left-to-Right algorithm is used to process the remaining few bits. |
4939 | 4939 |
4940 EXAM,bn_s_mp_exptmod.c | 4940 EXAM,bn_s_mp_exptmod.c |
4941 | 4941 |
4942 Lines @26,if@ through @40,}@ determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted | 4942 Lines @31,if@ through @45,}@ determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted |
4943 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement | 4943 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement |
4944 on line @32,if@ the value of $x$ is already known to be greater than $140$. | 4944 on line @37,if@ the value of $x$ is already known to be greater than $140$. |
4945 | 4945 |
4946 The conditional piece of code beginning on line @42,ifdef@ allows the window size to be restricted to five bits. This logic is used to ensure | 4946 The conditional piece of code beginning on line @42,ifdef@ allows the window size to be restricted to five bits. This logic is used to ensure |
4947 the table of precomputed powers of $G$ remains relatively small. | 4947 the table of precomputed powers of $G$ remains relatively small. |
4948 | 4948 |
4949 The for loop on line @49,for@ initializes the $M$ array while lines @59,mp_init@ and @62,mp_reduce@ compute the value of $\mu$ required for | 4949 The for loop on line @60,for@ initializes the $M$ array while lines @71,mp_init@ and @75,mp_reduce@ through @85,}@ initialize the reduction |
4950 Barrett reduction. | 4950 function that will be used for this modulus. |
4951 | 4951 |
4952 -- More later. | 4952 -- More later. |
4953 | 4953 |
4954 \section{Quick Power of Two} | 4954 \section{Quick Power of Two} |
4955 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is | 4955 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is |
5228 | 5228 |
5229 \begin{verbatim} | 5229 \begin{verbatim} |
5230 mp_div(&a, &b, &c, NULL); /* c = [a/b] */ | 5230 mp_div(&a, &b, &c, NULL); /* c = [a/b] */ |
5231 \end{verbatim} | 5231 \end{verbatim} |
5232 | 5232 |
5233 Lines @37,if@ and @42,if@ handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor | 5233 Lines @108,if@ and @113,if@ handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor |
5234 respectively. After the two trivial cases all of the temporary variables are initialized. Line @76,neg@ determines the sign of | 5234 respectively. After the two trivial cases all of the temporary variables are initialized. Line @147,neg@ determines the sign of |
5235 the quotient and line @77,sign@ ensures that both $x$ and $y$ are positive. | 5235 the quotient and line @148,sign@ ensures that both $x$ and $y$ are positive. |
5236 | 5236 |
5237 The number of bits in the leading digit is calculated on line @80,norm@. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits | 5237 The number of bits in the leading digit is calculated on line @151,norm@. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits |
5238 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is | 5238 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is |
5239 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting | 5239 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting |
5240 them to the left by $lg(\beta) - 1 - k$ bits. | 5240 them to the left by $lg(\beta) - 1 - k$ bits. |
5241 | 5241 |
5242 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the | 5242 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the |
5243 leading digit of the quotient. The loop beginning on line @113,for@ will produce the remainder of the quotient digits. | 5243 leading digit of the quotient. The loop beginning on line @184,for@ will produce the remainder of the quotient digits. |
5244 | 5244 |
5245 The conditional ``continue'' on line @114,if@ is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the | 5245 The conditional ``continue'' on line @186,continue@ is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the |
5246 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits | 5246 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits |
5247 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}. | 5247 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}. |
5248 | 5248 |
5249 Lines @142,t1@, @143,t1@ and @150,t2@ through @152,t2@ manually construct the high accuracy estimations by setting the digits of the two mp\_int | 5249 Lines @214,t1@, @216,t1@ and @222,t2@ through @225,t2@ manually construct the high accuracy estimations by setting the digits of the two mp\_int |
5250 variables directly. | 5250 variables directly. |
5251 | 5251 |
5252 \section{Single Digit Helpers} | 5252 \section{Single Digit Helpers} |
5253 | 5253 |
5254 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of | 5254 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of |
5742 \begin{tabular}{l} | 5742 \begin{tabular}{l} |
5743 \hline Algorithm \textbf{mp\_gcd}. \\ | 5743 \hline Algorithm \textbf{mp\_gcd}. \\ |
5744 \textbf{Input}. mp\_int $a$ and $b$ \\ | 5744 \textbf{Input}. mp\_int $a$ and $b$ \\ |
5745 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\ | 5745 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\ |
5746 \hline \\ | 5746 \hline \\ |
5747 1. If $a = 0$ and $b \ne 0$ then \\ | 5747 1. If $a = 0$ then \\ |
5748 \hspace{3mm}1.1 $c \leftarrow b$ \\ | 5748 \hspace{3mm}1.1 $c \leftarrow \vert b \vert $ \\ |
5749 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\ | 5749 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\ |
5750 2. If $a \ne 0$ and $b = 0$ then \\ | 5750 2. If $b = 0$ then \\ |
5751 \hspace{3mm}2.1 $c \leftarrow a$ \\ | 5751 \hspace{3mm}2.1 $c \leftarrow \vert a \vert $ \\ |
5752 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\ | 5752 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\ |
5753 3. If $a = b = 0$ then \\ | 5753 3. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\ |
5754 \hspace{3mm}3.1 $c \leftarrow 1$ \\ | 5754 4. $k \leftarrow 0$ \\ |
5755 \hspace{3mm}3.2 Return(\textit{MP\_OKAY}). \\ | 5755 5. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
5756 4. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\ | 5756 \hspace{3mm}5.1 $k \leftarrow k + 1$ \\ |
5757 5. $k \leftarrow 0$ \\ | 5757 \hspace{3mm}5.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\ |
5758 6. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 5758 \hspace{3mm}5.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\ |
5759 \hspace{3mm}6.1 $k \leftarrow k + 1$ \\ | 5759 6. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
5760 \hspace{3mm}6.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\ | 5760 \hspace{3mm}6.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\ |
5761 \hspace{3mm}6.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\ | 5761 7. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
5762 7. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 5762 \hspace{3mm}7.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ |
5763 \hspace{3mm}7.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\ | 5763 8. While $v.used > 0$ \\ |
5764 8. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 5764 \hspace{3mm}8.1 If $\vert u \vert > \vert v \vert$ then \\ |
5765 \hspace{3mm}8.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ | 5765 \hspace{6mm}8.1.1 Swap $u$ and $v$. \\ |
5766 9. While $v.used > 0$ \\ | 5766 \hspace{3mm}8.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\ |
5767 \hspace{3mm}9.1 If $\vert u \vert > \vert v \vert$ then \\ | 5767 \hspace{3mm}8.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
5768 \hspace{6mm}9.1.1 Swap $u$ and $v$. \\ | 5768 \hspace{6mm}8.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ |
5769 \hspace{3mm}9.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\ | 5769 9. $c \leftarrow u \cdot 2^k$ \\ |
5770 \hspace{3mm}9.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 5770 10. Return(\textit{MP\_OKAY}). \\ |
5771 \hspace{6mm}9.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ | |
5772 10. $c \leftarrow u \cdot 2^k$ \\ | |
5773 11. Return(\textit{MP\_OKAY}). \\ | |
5774 \hline | 5771 \hline |
5775 \end{tabular} | 5772 \end{tabular} |
5776 \end{center} | 5773 \end{center} |
5777 \end{small} | 5774 \end{small} |
5778 \caption{Algorithm mp\_gcd} | 5775 \caption{Algorithm mp\_gcd} |
5780 \textbf{Algorithm mp\_gcd.} | 5777 \textbf{Algorithm mp\_gcd.} |
5781 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of | 5778 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of |
5782 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as | 5779 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as |
5783 Algorithm B and in practice this appears to be true. | 5780 Algorithm B and in practice this appears to be true. |
5784 | 5781 |
5785 The first three steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the | 5782 The first two steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the |
5786 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of | 5783 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of |
5787 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair. | 5784 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair. |
5788 | 5785 |
5789 Step six will divide out any common factors of two and keep track of the count in the variable $k$. After this step two is no longer a | 5786 Step five will divide out any common factors of two and keep track of the count in the variable $k$. After this step, two is no longer a |
5790 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step | 5787 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step |
5791 seven and eight ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while loops will iterate since | 5788 six and seven ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while--loops will iterate since |
5792 they cannot both be even. | 5789 they cannot both be even. |
5793 | 5790 |
5794 By step nine both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to | 5791 By step eight both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to |
5795 or greater than $u$. This ensures that the subtraction on step 9.2 will always produce a positive and even result. Step 9.3 removes any | 5792 or greater than $u$. This ensures that the subtraction on step 8.2 will always produce a positive and even result. Step 8.3 removes any |
5796 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd. | 5793 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd. |
5797 | 5794 |
5798 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result | 5795 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result |
5799 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier. | 5796 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier. |
5800 | 5797 |
5801 EXAM,bn_mp_gcd.c | 5798 EXAM,bn_mp_gcd.c |
5802 | 5799 |
5803 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the | 5800 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the |
5804 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise | 5801 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise |
5805 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three | 5802 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three |
5806 trivial cases of inputs are handled on lines @25,zero@ through @34,}@. After those lines the inputs are assumed to be non-zero. | 5803 trivial cases of inputs are handled on lines @23,zero@ through @29,}@. After those lines the inputs are assumed to be non-zero. |
5807 | 5804 |
5808 Lines @36,if@ and @40,if@ make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two | 5805 Lines @32,if@ and @36,if@ make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two |
5809 must be divided out of the two inputs. The while loop on line @49,while@ iterates so long as both are even. The local integer $k$ is used to | 5806 must be divided out of the two inputs. The block starting at line @43,common@ removes common factors of two by first counting the number of trailing |
5810 keep track of how many factors of $2$ are pulled out of both values. It is assumed that the number of factors will not exceed the maximum | 5807 zero bits in both. The local integer $k$ is used to keep track of how many factors of $2$ are pulled out of both values. It is assumed that |
5811 value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than entries than are accessible by an ``int'' so this is not | 5808 the number of factors will not exceed the maximum value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than |
5812 a limitation.}. | 5809 entries than are accessible by an ``int'' so this is not a limitation.}. |
5813 | 5810 |
5814 At this point there are no more common factors of two in the two values. The while loops on lines @60,while@ and @65,while@ remove any independent | 5811 At this point there are no more common factors of two in the two values. The divisions by a power of two on lines @60,div_2d@ and @67,div_2d@ remove |
5815 factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop | 5812 any independent factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop |
5816 on line @71, while@ performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in | 5813 on line @72, while@ performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in |
5817 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative. | 5814 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative. |
5818 | 5815 |
5819 \section{Least Common Multiple} | 5816 \section{Least Common Multiple} |
5820 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the | 5817 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the |
5821 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$ | 5818 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$ |