Mercurial > dropbear
comparison tommath.tex @ 386:97db060d0ef5 libtommath-orig libtommath-0.40
Update to LibTomMath 0.40
author | Matt Johnston <matt@ucc.asn.au> |
---|---|
date | Thu, 11 Jan 2007 03:11:15 +0000 |
parents | 91fbc376f010 |
children |
comparison
equal
deleted
inserted
replaced
282:91fbc376f010 | 386:97db060d0ef5 |
---|---|
64 \end{tabular} | 64 \end{tabular} |
65 %\end{small} | 65 %\end{small} |
66 } | 66 } |
67 } | 67 } |
68 \maketitle | 68 \maketitle |
69 This text has been placed in the public domain. This text corresponds to the v0.35 release of the | 69 This text has been placed in the public domain. This text corresponds to the v0.39 release of the |
70 LibTomMath project. | 70 LibTomMath project. |
71 | 71 |
72 \begin{alltt} | 72 \begin{alltt} |
73 Tom St Denis | 73 Tom St Denis |
74 111 Banning Rd | 74 111 Banning Rd |
75 Ottawa, Ontario | 75 Ottawa, Ontario |
76 K2L 1C3 | 76 K2L 1C3 |
77 Canada | 77 Canada |
78 | 78 |
79 Phone: 1-613-836-3160 | 79 Phone: 1-613-836-3160 |
80 Email: [email protected] | 80 Email: [email protected] |
81 \end{alltt} | 81 \end{alltt} |
82 | 82 |
83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} | 83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} |
84 {\em book} macro package and the Perl {\em booker} package. | 84 {\em book} macro package and the Perl {\em booker} package. |
85 | 85 |
266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers | 266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers |
267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve | 267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve |
268 any form of useful performance in non-trivial applications. | 268 any form of useful performance in non-trivial applications. |
269 | 269 |
270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer | 270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer |
271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.org}} package is used | 271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.com}} package is used |
272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field | 272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field |
273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text | 273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text |
274 discusses a very large portion of the inner workings of the library. | 274 discusses a very large portion of the inner workings of the library. |
275 | 275 |
276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed | 276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed |
812 037 a->sign = MP_ZPOS; | 812 037 a->sign = MP_ZPOS; |
813 038 | 813 038 |
814 039 return MP_OKAY; | 814 039 return MP_OKAY; |
815 040 \} | 815 040 \} |
816 041 #endif | 816 041 #endif |
817 042 | |
817 \end{alltt} | 818 \end{alltt} |
818 \end{small} | 819 \end{small} |
819 | 820 |
820 One immediate observation of this initializtion function is that it does not return a pointer to a mp\_int structure. It | 821 One immediate observation of this initializtion function is that it does not return a pointer to a mp\_int structure. It |
821 is assumed that the caller has already allocated memory for the mp\_int structure, typically on the application stack. The | 822 is assumed that the caller has already allocated memory for the mp\_int structure, typically on the application stack. The |
900 035 a->alloc = a->used = 0; | 901 035 a->alloc = a->used = 0; |
901 036 a->sign = MP_ZPOS; | 902 036 a->sign = MP_ZPOS; |
902 037 \} | 903 037 \} |
903 038 \} | 904 038 \} |
904 039 #endif | 905 039 #endif |
906 040 | |
905 \end{alltt} | 907 \end{alltt} |
906 \end{small} | 908 \end{small} |
907 | 909 |
908 The algorithm only operates on the mp\_int if it hasn't been previously cleared. The if statement (line 24) | 910 The algorithm only operates on the mp\_int if it hasn't been previously cleared. The if statement (line 24) |
909 checks to see if the \textbf{dp} member is not \textbf{NULL}. If the mp\_int is a valid mp\_int then \textbf{dp} cannot be | 911 checks to see if the \textbf{dp} member is not \textbf{NULL}. If the mp\_int is a valid mp\_int then \textbf{dp} cannot be |
1006 048 \} | 1008 048 \} |
1007 049 \} | 1009 049 \} |
1008 050 return MP_OKAY; | 1010 050 return MP_OKAY; |
1009 051 \} | 1011 051 \} |
1010 052 #endif | 1012 052 #endif |
1013 053 | |
1011 \end{alltt} | 1014 \end{alltt} |
1012 \end{small} | 1015 \end{small} |
1013 | 1016 |
1014 A quick optimization is to first determine if a memory re-allocation is required at all. The if statement (line 24) checks | 1017 A quick optimization is to first determine if a memory re-allocation is required at all. The if statement (line 24) checks |
1015 if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count. If the count is not larger than \textbf{alloc} | 1018 if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count. If the count is not larger than \textbf{alloc} |
1094 039 \} | 1097 039 \} |
1095 040 | 1098 040 |
1096 041 return MP_OKAY; | 1099 041 return MP_OKAY; |
1097 042 \} | 1100 042 \} |
1098 043 #endif | 1101 043 #endif |
1102 044 | |
1099 \end{alltt} | 1103 \end{alltt} |
1100 \end{small} | 1104 \end{small} |
1101 | 1105 |
1102 The number of digits $b$ requested is padded (line 23) by first augmenting it to the next multiple of | 1106 The number of digits $b$ requested is padded (line 23) by first augmenting it to the next multiple of |
1103 \textbf{MP\_PREC} and then adding \textbf{MP\_PREC} to the result. If the memory can be successfully allocated the | 1107 \textbf{MP\_PREC} and then adding \textbf{MP\_PREC} to the result. If the memory can be successfully allocated the |
1181 050 va_end(args); | 1185 050 va_end(args); |
1182 051 return res; /* Assumed ok, if error flagged above. */ | 1186 051 return res; /* Assumed ok, if error flagged above. */ |
1183 052 \} | 1187 052 \} |
1184 053 | 1188 053 |
1185 054 #endif | 1189 054 #endif |
1190 055 | |
1186 \end{alltt} | 1191 \end{alltt} |
1187 \end{small} | 1192 \end{small} |
1188 | 1193 |
1189 This function intializes a variable length list of mp\_int structure pointers. However, instead of having the mp\_int | 1194 This function intializes a variable length list of mp\_int structure pointers. However, instead of having the mp\_int |
1190 structures in an actual C array they are simply passed as arguments to the function. This function makes use of the | 1195 structures in an actual C array they are simply passed as arguments to the function. This function makes use of the |
1266 035 if (a->used == 0) \{ | 1271 035 if (a->used == 0) \{ |
1267 036 a->sign = MP_ZPOS; | 1272 036 a->sign = MP_ZPOS; |
1268 037 \} | 1273 037 \} |
1269 038 \} | 1274 038 \} |
1270 039 #endif | 1275 039 #endif |
1276 040 | |
1271 \end{alltt} | 1277 \end{alltt} |
1272 \end{small} | 1278 \end{small} |
1273 | 1279 |
1274 Note on line 27 how to test for the \textbf{used} count is made on the left of the \&\& operator. In the C programming | 1280 Note on line 27 how to test for the \textbf{used} count is made on the left of the \&\& operator. In the C programming |
1275 language the terms to \&\& are evaluated left to right with a boolean short-circuit if any condition fails. This is | 1281 language the terms to \&\& are evaluated left to right with a boolean short-circuit if any condition fails. This is |
1403 059 b->used = a->used; | 1409 059 b->used = a->used; |
1404 060 b->sign = a->sign; | 1410 060 b->sign = a->sign; |
1405 061 return MP_OKAY; | 1411 061 return MP_OKAY; |
1406 062 \} | 1412 062 \} |
1407 063 #endif | 1413 063 #endif |
1414 064 | |
1408 \end{alltt} | 1415 \end{alltt} |
1409 \end{small} | 1416 \end{small} |
1410 | 1417 |
1411 Occasionally a dependent algorithm may copy an mp\_int effectively into itself such as when the input and output | 1418 Occasionally a dependent algorithm may copy an mp\_int effectively into itself such as when the input and output |
1412 mp\_int structures passed to a function are one and the same. For this case it is optimal to return immediately without | 1419 mp\_int structures passed to a function are one and the same. For this case it is optimal to return immediately without |
1517 023 return res; | 1524 023 return res; |
1518 024 \} | 1525 024 \} |
1519 025 return mp_copy (b, a); | 1526 025 return mp_copy (b, a); |
1520 026 \} | 1527 026 \} |
1521 027 #endif | 1528 027 #endif |
1529 028 | |
1522 \end{alltt} | 1530 \end{alltt} |
1523 \end{small} | 1531 \end{small} |
1524 | 1532 |
1525 This will initialize \textbf{a} and make it a verbatim copy of the contents of \textbf{b}. Note that | 1533 This will initialize \textbf{a} and make it a verbatim copy of the contents of \textbf{b}. Note that |
1526 \textbf{a} will have its own memory allocated which means that \textbf{b} may be cleared after the call | 1534 \textbf{a} will have its own memory allocated which means that \textbf{b} may be cleared after the call |
1568 027 for (n = 0; n < a->alloc; n++) \{ | 1576 027 for (n = 0; n < a->alloc; n++) \{ |
1569 028 *tmp++ = 0; | 1577 028 *tmp++ = 0; |
1570 029 \} | 1578 029 \} |
1571 030 \} | 1579 030 \} |
1572 031 #endif | 1580 031 #endif |
1581 032 | |
1573 \end{alltt} | 1582 \end{alltt} |
1574 \end{small} | 1583 \end{small} |
1575 | 1584 |
1576 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the | 1585 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the |
1577 \textbf{sign} variable is set to \textbf{MP\_ZPOS}. | 1586 \textbf{sign} variable is set to \textbf{MP\_ZPOS}. |
1629 034 b->sign = MP_ZPOS; | 1638 034 b->sign = MP_ZPOS; |
1630 035 | 1639 035 |
1631 036 return MP_OKAY; | 1640 036 return MP_OKAY; |
1632 037 \} | 1641 037 \} |
1633 038 #endif | 1642 038 #endif |
1643 039 | |
1634 \end{alltt} | 1644 \end{alltt} |
1635 \end{small} | 1645 \end{small} |
1636 | 1646 |
1637 This fairly trivial algorithm first eliminates non--required duplications (line 27) and then sets the | 1647 This fairly trivial algorithm first eliminates non--required duplications (line 27) and then sets the |
1638 \textbf{sign} flag to \textbf{MP\_ZPOS}. | 1648 \textbf{sign} flag to \textbf{MP\_ZPOS}. |
1690 031 \} | 1700 031 \} |
1691 032 | 1701 032 |
1692 033 return MP_OKAY; | 1702 033 return MP_OKAY; |
1693 034 \} | 1703 034 \} |
1694 035 #endif | 1704 035 #endif |
1705 036 | |
1695 \end{alltt} | 1706 \end{alltt} |
1696 \end{small} | 1707 \end{small} |
1697 | 1708 |
1698 Like mp\_abs() this function avoids non--required duplications (line 21) and then sets the sign. We | 1709 Like mp\_abs() this function avoids non--required duplications (line 21) and then sets the sign. We |
1699 have to make sure that only non--zero values get a \textbf{sign} of \textbf{MP\_NEG}. If the mp\_int is zero | 1710 have to make sure that only non--zero values get a \textbf{sign} of \textbf{MP\_NEG}. If the mp\_int is zero |
1737 020 mp_zero (a); | 1748 020 mp_zero (a); |
1738 021 a->dp[0] = b & MP_MASK; | 1749 021 a->dp[0] = b & MP_MASK; |
1739 022 a->used = (a->dp[0] != 0) ? 1 : 0; | 1750 022 a->used = (a->dp[0] != 0) ? 1 : 0; |
1740 023 \} | 1751 023 \} |
1741 024 #endif | 1752 024 #endif |
1753 025 | |
1742 \end{alltt} | 1754 \end{alltt} |
1743 \end{small} | 1755 \end{small} |
1744 | 1756 |
1745 First we zero (line 20) the mp\_int to make sure that the other members are initialized for a | 1757 First we zero (line 20) the mp\_int to make sure that the other members are initialized for a |
1746 small positive constant. mp\_zero() ensures that the \textbf{sign} is positive and the \textbf{used} count | 1758 small positive constant. mp\_zero() ensures that the \textbf{sign} is positive and the \textbf{used} count |
1817 039 \} | 1829 039 \} |
1818 040 mp_clamp (a); | 1830 040 mp_clamp (a); |
1819 041 return MP_OKAY; | 1831 041 return MP_OKAY; |
1820 042 \} | 1832 042 \} |
1821 043 #endif | 1833 043 #endif |
1834 044 | |
1822 \end{alltt} | 1835 \end{alltt} |
1823 \end{small} | 1836 \end{small} |
1824 | 1837 |
1825 This function sets four bits of the number at a time to handle all practical \textbf{DIGIT\_BIT} sizes. The weird | 1838 This function sets four bits of the number at a time to handle all practical \textbf{DIGIT\_BIT} sizes. The weird |
1826 addition on line 38 ensures that the newly added in bits are added to the number of digits. While it may not | 1839 addition on line 38 ensures that the newly added in bits are added to the number of digits. While it may not |
1919 046 \} | 1932 046 \} |
1920 047 \} | 1933 047 \} |
1921 048 return MP_EQ; | 1934 048 return MP_EQ; |
1922 049 \} | 1935 049 \} |
1923 050 #endif | 1936 050 #endif |
1937 051 | |
1924 \end{alltt} | 1938 \end{alltt} |
1925 \end{small} | 1939 \end{small} |
1926 | 1940 |
1927 The two if statements (lines 24 and 28) compare the number of digits in the two inputs. These two are | 1941 The two if statements (lines 24 and 28) compare the number of digits in the two inputs. These two are |
1928 performed before all of the digits are compared since it is a very cheap test to perform and can potentially save | 1942 performed before all of the digits are compared since it is a very cheap test to perform and can potentially save |
1985 034 \} else \{ | 1999 034 \} else \{ |
1986 035 return mp_cmp_mag(a, b); | 2000 035 return mp_cmp_mag(a, b); |
1987 036 \} | 2001 036 \} |
1988 037 \} | 2002 037 \} |
1989 038 #endif | 2003 038 #endif |
2004 039 | |
1990 \end{alltt} | 2005 \end{alltt} |
1991 \end{small} | 2006 \end{small} |
1992 | 2007 |
1993 The two if statements (lines 22 and 23) perform the initial sign comparison. If the signs are not the equal then which ever | 2008 The two if statements (lines 22 and 23) perform the initial sign comparison. If the signs are not the equal then which ever |
1994 has the positive sign is larger. The inputs are compared (line 31) based on magnitudes. If the signs were both | 2009 has the positive sign is larger. The inputs are compared (line 31) based on magnitudes. If the signs were both |
2203 100 | 2218 100 |
2204 101 mp_clamp (c); | 2219 101 mp_clamp (c); |
2205 102 return MP_OKAY; | 2220 102 return MP_OKAY; |
2206 103 \} | 2221 103 \} |
2207 104 #endif | 2222 104 #endif |
2223 105 | |
2208 \end{alltt} | 2224 \end{alltt} |
2209 \end{small} | 2225 \end{small} |
2210 | 2226 |
2211 We first sort (lines 27 to 35) the inputs based on magnitude and determine the $min$ and $max$ variables. | 2227 We first sort (lines 27 to 35) the inputs based on magnitude and determine the $min$ and $max$ variables. |
2212 Note that $x$ is a pointer to an mp\_int assigned to the largest input, in effect it is a local alias. Next we | 2228 Note that $x$ is a pointer to an mp\_int assigned to the largest input, in effect it is a local alias. Next we |
2374 080 mp_clamp (c); | 2390 080 mp_clamp (c); |
2375 081 return MP_OKAY; | 2391 081 return MP_OKAY; |
2376 082 \} | 2392 082 \} |
2377 083 | 2393 083 |
2378 084 #endif | 2394 084 #endif |
2395 085 | |
2379 \end{alltt} | 2396 \end{alltt} |
2380 \end{small} | 2397 \end{small} |
2381 | 2398 |
2382 Like low level addition we ``sort'' the inputs. Except in this case the sorting is hardcoded | 2399 Like low level addition we ``sort'' the inputs. Except in this case the sorting is hardcoded |
2383 (lines 24 and 25). In reality the $min$ and $max$ variables are only aliases and are only | 2400 (lines 24 and 25). In reality the $min$ and $max$ variables are only aliases and are only |
2509 044 \} | 2526 044 \} |
2510 045 return res; | 2527 045 return res; |
2511 046 \} | 2528 046 \} |
2512 047 | 2529 047 |
2513 048 #endif | 2530 048 #endif |
2531 049 | |
2514 \end{alltt} | 2532 \end{alltt} |
2515 \end{small} | 2533 \end{small} |
2516 | 2534 |
2517 The source code follows the algorithm fairly closely. The most notable new source code addition is the usage of the $res$ integer variable which | 2535 The source code follows the algorithm fairly closely. The most notable new source code addition is the usage of the $res$ integer variable which |
2518 is used to pass result of the unsigned operations forward. Unlike in the algorithm, the variable $res$ is merely returned as is without | 2536 is used to pass result of the unsigned operations forward. Unlike in the algorithm, the variable $res$ is merely returned as is without |
2621 050 \} | 2639 050 \} |
2622 051 return res; | 2640 051 return res; |
2623 052 \} | 2641 052 \} |
2624 053 | 2642 053 |
2625 054 #endif | 2643 054 #endif |
2644 055 | |
2626 \end{alltt} | 2645 \end{alltt} |
2627 \end{small} | 2646 \end{small} |
2628 | 2647 |
2629 Much like the implementation of algorithm mp\_add the variable $res$ is used to catch the return code of the unsigned addition or subtraction operations | 2648 Much like the implementation of algorithm mp\_add the variable $res$ is used to catch the return code of the unsigned addition or subtraction operations |
2630 and forward it to the end of the function. On line 38 the ``not equal to'' \textbf{MP\_LT} expression is used to emulate a | 2649 and forward it to the end of the function. On line 38 the ``not equal to'' \textbf{MP\_LT} expression is used to emulate a |
2755 073 \} | 2774 073 \} |
2756 074 b->sign = a->sign; | 2775 074 b->sign = a->sign; |
2757 075 return MP_OKAY; | 2776 075 return MP_OKAY; |
2758 076 \} | 2777 076 \} |
2759 077 #endif | 2778 077 #endif |
2779 078 | |
2760 \end{alltt} | 2780 \end{alltt} |
2761 \end{small} | 2781 \end{small} |
2762 | 2782 |
2763 This implementation is essentially an optimized implementation of s\_mp\_add for the case of doubling an input. The only noteworthy difference | 2783 This implementation is essentially an optimized implementation of s\_mp\_add for the case of doubling an input. The only noteworthy difference |
2764 is the use of the logical shift operator on line 51 to perform a single precision doubling. | 2784 is the use of the logical shift operator on line 51 to perform a single precision doubling. |
2855 059 b->sign = a->sign; | 2875 059 b->sign = a->sign; |
2856 060 mp_clamp (b); | 2876 060 mp_clamp (b); |
2857 061 return MP_OKAY; | 2877 061 return MP_OKAY; |
2858 062 \} | 2878 062 \} |
2859 063 #endif | 2879 063 #endif |
2880 064 | |
2860 \end{alltt} | 2881 \end{alltt} |
2861 \end{small} | 2882 \end{small} |
2862 | 2883 |
2863 \section{Polynomial Basis Operations} | 2884 \section{Polynomial Basis Operations} |
2864 Recall from section 4.3 that any integer can be represented as a polynomial in $x$ as $y = f(\beta)$. Such a representation is also known as | 2885 Recall from section 4.3 that any integer can be represented as a polynomial in $x$ as $y = f(\beta)$. Such a representation is also known as |
2975 058 \} | 2996 058 \} |
2976 059 \} | 2997 059 \} |
2977 060 return MP_OKAY; | 2998 060 return MP_OKAY; |
2978 061 \} | 2999 061 \} |
2979 062 #endif | 3000 062 #endif |
3001 063 | |
2980 \end{alltt} | 3002 \end{alltt} |
2981 \end{small} | 3003 \end{small} |
2982 | 3004 |
2983 The if statement (line 23) ensures that the $b$ variable is greater than zero since we do not interpret negative | 3005 The if statement (line 23) ensures that the $b$ variable is greater than zero since we do not interpret negative |
2984 shift counts properly. The \textbf{used} count is incremented by $b$ before the copy loop begins. This elminates | 3006 shift counts properly. The \textbf{used} count is incremented by $b$ before the copy loop begins. This elminates |
3086 063 | 3108 063 |
3087 064 /* remove excess digits */ | 3109 064 /* remove excess digits */ |
3088 065 a->used -= b; | 3110 065 a->used -= b; |
3089 066 \} | 3111 066 \} |
3090 067 #endif | 3112 067 #endif |
3113 068 | |
3091 \end{alltt} | 3114 \end{alltt} |
3092 \end{small} | 3115 \end{small} |
3093 | 3116 |
3094 The only noteworthy element of this routine is the lack of a return type since it cannot fail. Like mp\_lshd() we | 3117 The only noteworthy element of this routine is the lack of a return type since it cannot fail. Like mp\_lshd() we |
3095 form a sliding window except we copy in the other direction. After the window (line 59) we then zero | 3118 form a sliding window except we copy in the other direction. After the window (line 59) we then zero |
3144 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$ | 3167 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$ |
3145 left. | 3168 left. |
3146 | 3169 |
3147 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts | 3170 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts |
3148 required. If it is non-zero a modified shift loop is used to calculate the remaining product. | 3171 required. If it is non-zero a modified shift loop is used to calculate the remaining product. |
3149 Essentially the loop is a generic version of algorith mp\_mul2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$ | 3172 Essentially the loop is a generic version of algorithm mp\_mul\_2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$ |
3150 variable is used to extract the upper $d$ bits to form the carry for the next iteration. | 3173 variable is used to extract the upper $d$ bits to form the carry for the next iteration. |
3151 | 3174 |
3152 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to | 3175 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to |
3153 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow. | 3176 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow. |
3154 | 3177 |
3219 076 \} | 3242 076 \} |
3220 077 mp_clamp (c); | 3243 077 mp_clamp (c); |
3221 078 return MP_OKAY; | 3244 078 return MP_OKAY; |
3222 079 \} | 3245 079 \} |
3223 080 #endif | 3246 080 #endif |
3247 081 | |
3224 \end{alltt} | 3248 \end{alltt} |
3225 \end{small} | 3249 \end{small} |
3226 | 3250 |
3227 The shifting is performed in--place which means the first step (line 24) is to copy the input to the | 3251 The shifting is performed in--place which means the first step (line 24) is to copy the input to the |
3228 destination. We avoid calling mp\_copy() by making sure the mp\_ints are different. The destination then | 3252 destination. We avoid calling mp\_copy() by making sure the mp\_ints are different. The destination then |
3355 088 \} | 3379 088 \} |
3356 089 mp_clear (&t); | 3380 089 mp_clear (&t); |
3357 090 return MP_OKAY; | 3381 090 return MP_OKAY; |
3358 091 \} | 3382 091 \} |
3359 092 #endif | 3383 092 #endif |
3384 093 | |
3360 \end{alltt} | 3385 \end{alltt} |
3361 \end{small} | 3386 \end{small} |
3362 | 3387 |
3363 The implementation of algorithm mp\_div\_2d is slightly different than the algorithm specifies. The remainder $d$ may be optionally | 3388 The implementation of algorithm mp\_div\_2d is slightly different than the algorithm specifies. The remainder $d$ may be optionally |
3364 ignored by passing \textbf{NULL} as the pointer to the mp\_int variable. The temporary mp\_int variable $t$ is used to hold the | 3389 ignored by passing \textbf{NULL} as the pointer to the mp\_int variable. The temporary mp\_int variable $t$ is used to hold the |
3446 t) 1)); | 3471 t) 1)); |
3447 047 mp_clamp (c); | 3472 047 mp_clamp (c); |
3448 048 return MP_OKAY; | 3473 048 return MP_OKAY; |
3449 049 \} | 3474 049 \} |
3450 050 #endif | 3475 050 #endif |
3476 051 | |
3451 \end{alltt} | 3477 \end{alltt} |
3452 \end{small} | 3478 \end{small} |
3453 | 3479 |
3454 We first avoid cases of $b \le 0$ by simply mp\_zero()'ing the destination in such cases. Next if $2^b$ is larger | 3480 We first avoid cases of $b \le 0$ by simply mp\_zero()'ing the destination in such cases. Next if $2^b$ is larger |
3455 than the input we just mp\_copy() the input and return right away. After this point we know we must actually | 3481 than the input we just mp\_copy() the input and return right away. After this point we know we must actually |
3685 081 | 3711 081 |
3686 082 mp_clear (&t); | 3712 082 mp_clear (&t); |
3687 083 return MP_OKAY; | 3713 083 return MP_OKAY; |
3688 084 \} | 3714 084 \} |
3689 085 #endif | 3715 085 #endif |
3716 086 | |
3690 \end{alltt} | 3717 \end{alltt} |
3691 \end{small} | 3718 \end{small} |
3692 | 3719 |
3693 First we determine (line 30) if the Comba method can be used first since it's faster. The conditions for | 3720 First we determine (line 30) if the Comba method can be used first since it's faster. The conditions for |
3694 sing the Comba routine are that min$(a.used, b.used) < \delta$ and the number of digits of output is less than | 3721 sing the Comba routine are that min$(a.used, b.used) < \delta$ and the number of digits of output is less than |
3835 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ | 3862 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ |
3836 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\ | 3863 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\ |
3837 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\ | 3864 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\ |
3838 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\ | 3865 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\ |
3839 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ | 3866 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ |
3840 6. $W_{pa} \leftarrow \_ \hat W (\mbox{mod }\beta)$ \\ | |
3841 \\ | 3867 \\ |
3842 7. $oldused \leftarrow c.used$ \\ | 3868 6. $oldused \leftarrow c.used$ \\ |
3843 8. $c.used \leftarrow digs$ \\ | 3869 7. $c.used \leftarrow digs$ \\ |
3844 9. for $ix$ from $0$ to $pa$ do \\ | 3870 8. for $ix$ from $0$ to $pa$ do \\ |
3845 \hspace{3mm}9.1 $c_{ix} \leftarrow W_{ix}$ \\ | 3871 \hspace{3mm}8.1 $c_{ix} \leftarrow W_{ix}$ \\ |
3846 10. for $ix$ from $pa + 1$ to $oldused - 1$ do \\ | 3872 9. for $ix$ from $pa + 1$ to $oldused - 1$ do \\ |
3847 \hspace{3mm}10.1 $c_{ix} \leftarrow 0$ \\ | 3873 \hspace{3mm}9.1 $c_{ix} \leftarrow 0$ \\ |
3848 \\ | 3874 \\ |
3849 11. Clamp $c$. \\ | 3875 10. Clamp $c$. \\ |
3850 12. Return MP\_OKAY. \\ | 3876 11. Return MP\_OKAY. \\ |
3851 \hline | 3877 \hline |
3852 \end{tabular} | 3878 \end{tabular} |
3853 \end{center} | 3879 \end{center} |
3854 \end{small} | 3880 \end{small} |
3855 \caption{Algorithm fast\_s\_mp\_mul\_digs} | 3881 \caption{Algorithm fast\_s\_mp\_mul\_digs} |
3940 067 iy = MIN(a->used-tx, ty+1); | 3966 067 iy = MIN(a->used-tx, ty+1); |
3941 068 | 3967 068 |
3942 069 /* execute loop */ | 3968 069 /* execute loop */ |
3943 070 for (iz = 0; iz < iy; ++iz) \{ | 3969 070 for (iz = 0; iz < iy; ++iz) \{ |
3944 071 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--); | 3970 071 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--); |
3945 072 \} | 3971 072 |
3946 073 | 3972 073 \} |
3947 074 /* store term */ | 3973 074 |
3948 075 W[ix] = ((mp_digit)_W) & MP_MASK; | 3974 075 /* store term */ |
3949 076 | 3975 076 W[ix] = ((mp_digit)_W) & MP_MASK; |
3950 077 /* make next carry */ | 3976 077 |
3951 078 _W = _W >> ((mp_word)DIGIT_BIT); | 3977 078 /* make next carry */ |
3952 079 \} | 3978 079 _W = _W >> ((mp_word)DIGIT_BIT); |
3953 080 | 3979 080 \} |
3954 081 /* store final carry */ | 3980 081 |
3955 082 W[ix] = (mp_digit)(_W & MP_MASK); | 3981 082 /* setup dest */ |
3956 083 | 3982 083 olduse = c->used; |
3957 084 /* setup dest */ | 3983 084 c->used = pa; |
3958 085 olduse = c->used; | 3984 085 |
3959 086 c->used = pa; | 3985 086 \{ |
3960 087 | 3986 087 register mp_digit *tmpc; |
3961 088 \{ | 3987 088 tmpc = c->dp; |
3962 089 register mp_digit *tmpc; | 3988 089 for (ix = 0; ix < pa+1; ix++) \{ |
3963 090 tmpc = c->dp; | 3989 090 /* now extract the previous digit [below the carry] */ |
3964 091 for (ix = 0; ix < pa+1; ix++) \{ | 3990 091 *tmpc++ = W[ix]; |
3965 092 /* now extract the previous digit [below the carry] */ | 3991 092 \} |
3966 093 *tmpc++ = W[ix]; | 3992 093 |
3967 094 \} | 3993 094 /* clear unused digits [that existed in the old copy of c] */ |
3968 095 | 3994 095 for (; ix < olduse; ix++) \{ |
3969 096 /* clear unused digits [that existed in the old copy of c] */ | 3995 096 *tmpc++ = 0; |
3970 097 for (; ix < olduse; ix++) \{ | 3996 097 \} |
3971 098 *tmpc++ = 0; | 3997 098 \} |
3972 099 \} | 3998 099 mp_clamp (c); |
3973 100 \} | 3999 100 return MP_OKAY; |
3974 101 mp_clamp (c); | 4000 101 \} |
3975 102 return MP_OKAY; | 4001 102 #endif |
3976 103 \} | 4002 103 |
3977 104 #endif | |
3978 \end{alltt} | 4003 \end{alltt} |
3979 \end{small} | 4004 \end{small} |
3980 | 4005 |
3981 As per the pseudo--code we first calculate $pa$ (line 47) as the number of digits to output. Next we begin the outer loop | 4006 As per the pseudo--code we first calculate $pa$ (line 47) as the number of digits to output. Next we begin the outer loop |
3982 to produce the individual columns of the product. We use the two aliases $tmpx$ and $tmpy$ (lines 61, 62) to point | 4007 to produce the individual columns of the product. We use the two aliases $tmpx$ and $tmpy$ (lines 61, 62) to point |
3983 inside the two multiplicands quickly. | 4008 inside the two multiplicands quickly. |
3984 | 4009 |
3985 The inner loop (lines 70 to 72) of this implementation is where the tradeoff come into play. Originally this comba | 4010 The inner loop (lines 70 to 73) of this implementation is where the tradeoff come into play. Originally this comba |
3986 implementation was ``row--major'' which means it adds to each of the columns in each pass. After the outer loop it would then fix | 4011 implementation was ``row--major'' which means it adds to each of the columns in each pass. After the outer loop it would then fix |
3987 the carries. This was very fast except it had an annoying drawback. You had to read a mp\_word and two mp\_digits and write | 4012 the carries. This was very fast except it had an annoying drawback. You had to read a mp\_word and two mp\_digits and write |
3988 one mp\_word per iteration. On processors such as the Athlon XP and P4 this did not matter much since the cache bandwidth | 4013 one mp\_word per iteration. On processors such as the Athlon XP and P4 this did not matter much since the cache bandwidth |
3989 is very high and it can keep the ALU fed with data. It did, however, matter on older and embedded cpus where cache is often | 4014 is very high and it can keep the ALU fed with data. It did, however, matter on older and embedded cpus where cache is often |
3990 slower and also often doesn't exist. This new algorithm only performs two reads per iteration under the assumption that the | 4015 slower and also often doesn't exist. This new algorithm only performs two reads per iteration under the assumption that the |
3991 compiler has aliased $\_ \hat W$ to a CPU register. | 4016 compiler has aliased $\_ \hat W$ to a CPU register. |
3992 | 4017 |
3993 After the inner loop we store the current accumulator in $W$ and shift $\_ \hat W$ (lines 75, 78) to forward it as | 4018 After the inner loop we store the current accumulator in $W$ and shift $\_ \hat W$ (lines 76, 79) to forward it as |
3994 a carry for the next pass. After the outer loop we use the final carry (line 82) as the last digit of the product. | 4019 a carry for the next pass. After the outer loop we use the final carry (line 76) as the last digit of the product. |
3995 | 4020 |
3996 \subsection{Polynomial Basis Multiplication} | 4021 \subsection{Polynomial Basis Multiplication} |
3997 To break the $O(n^2)$ barrier in multiplication requires a completely different look at integer multiplication. In the following algorithms | 4022 To break the $O(n^2)$ barrier in multiplication requires a completely different look at integer multiplication. In the following algorithms |
3998 the use of polynomial basis representation for two integers $a$ and $b$ as $f(x) = \sum_{i=0}^{n} a_i x^i$ and | 4023 the use of polynomial basis representation for two integers $a$ and $b$ as $f(x) = \sum_{i=0}^{n} a_i x^i$ and |
3999 $g(x) = \sum_{i=0}^{n} b_i x^i$ respectively, is required. In this system both $f(x)$ and $g(x)$ have $n + 1$ terms and are of the $n$'th degree. | 4024 $g(x) = \sum_{i=0}^{n} b_i x^i$ respectively, is required. In this system both $f(x)$ and $g(x)$ have $n + 1$ terms and are of the $n$'th degree. |
4093 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for | 4118 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for |
4094 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with | 4119 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with |
4095 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent. | 4120 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent. |
4096 | 4121 |
4097 \begin{equation} | 4122 \begin{equation} |
4098 f(x) \cdot g(x) = acx^2 + ((a - b)(c - d) - (ac + bd))x + bd | 4123 f(x) \cdot g(x) = acx^2 + ((a + b)(c + d) - (ac + bd))x + bd |
4099 \end{equation} | 4124 \end{equation} |
4100 | 4125 |
4101 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying | 4126 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying |
4102 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns | 4127 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns |
4103 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points | 4128 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points |
4104 $\zeta_0$, $\zeta_{\infty}$ and $-\zeta_{-1}$. Consider the resultant system of equations. | 4129 $\zeta_0$, $\zeta_{\infty}$ and $\zeta_{1}$. Consider the resultant system of equations. |
4105 | 4130 |
4106 \begin{center} | 4131 \begin{center} |
4107 \begin{tabular}{rcrcrcrc} | 4132 \begin{tabular}{rcrcrcrc} |
4108 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\ | 4133 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\ |
4109 $-\zeta_{-1}$ & $=$ & $-w_2$ & $+$ & $w_1$ & $-$ & $w_0$ \\ | 4134 $\zeta_{1}$ & $=$ & $w_2$ & $+$ & $w_1$ & $+$ & $w_0$ \\ |
4110 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\ | 4135 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\ |
4111 \end{tabular} | 4136 \end{tabular} |
4112 \end{center} | 4137 \end{center} |
4113 | 4138 |
4114 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity | 4139 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity |
4115 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.} | 4140 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.} |
4116 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman. It is worth noting that the point | 4141 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman. |
4117 $\zeta_1$ could be substituted for $-\zeta_{-1}$. In this case the first and third row are subtracted instead of added to the second row. | |
4118 | 4142 |
4119 \newpage\begin{figure}[!here] | 4143 \newpage\begin{figure}[!here] |
4120 \begin{small} | 4144 \begin{small} |
4121 \begin{center} | 4145 \begin{center} |
4122 \begin{tabular}{l} | 4146 \begin{tabular}{l} |
4135 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\ | 4159 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\ |
4136 \\ | 4160 \\ |
4137 Calculate the three products. \\ | 4161 Calculate the three products. \\ |
4138 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\ | 4162 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\ |
4139 9. $x1y1 \leftarrow x1 \cdot y1$ \\ | 4163 9. $x1y1 \leftarrow x1 \cdot y1$ \\ |
4140 10. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ | 4164 10. $t1 \leftarrow x1 + x0$ (\textit{mp\_add}) \\ |
4141 11. $x0 \leftarrow y1 - y0$ \\ | 4165 11. $x0 \leftarrow y1 + y0$ \\ |
4142 12. $t1 \leftarrow t1 \cdot x0$ \\ | 4166 12. $t1 \leftarrow t1 \cdot x0$ \\ |
4143 \\ | 4167 \\ |
4144 Calculate the middle term. \\ | 4168 Calculate the middle term. \\ |
4145 13. $x0 \leftarrow x0y0 + x1y1$ \\ | 4169 13. $x0 \leftarrow x0y0 + x1y1$ \\ |
4146 14. $t1 \leftarrow x0 - t1$ \\ | 4170 14. $t1 \leftarrow t1 - x0$ (\textit{s\_mp\_sub}) \\ |
4147 \\ | 4171 \\ |
4148 Calculate the final product. \\ | 4172 Calculate the final product. \\ |
4149 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\ | 4173 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\ |
4150 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\ | 4174 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\ |
4151 17. $t1 \leftarrow x0y0 + t1$ \\ | 4175 17. $t1 \leftarrow x0y0 + t1$ \\ |
4168 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the | 4192 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the |
4169 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5 | 4193 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5 |
4170 compute the lower halves. Step 6 and 7 computer the upper halves. | 4194 compute the lower halves. Step 6 and 7 computer the upper halves. |
4171 | 4195 |
4172 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products | 4196 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products |
4173 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 - x0$ has been computed. By using $x0$ instead | 4197 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 + x0$ has been computed. By using $x0$ instead |
4174 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation. | 4198 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation. |
4175 | 4199 |
4176 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations. | 4200 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations. |
4177 | 4201 |
4178 \vspace{+3mm}\begin{small} | 4202 \vspace{+3mm}\begin{small} |
4189 023 * | 4213 023 * |
4190 024 * a = a1 * B**n + a0 | 4214 024 * a = a1 * B**n + a0 |
4191 025 * b = b1 * B**n + b0 | 4215 025 * b = b1 * B**n + b0 |
4192 026 * | 4216 026 * |
4193 027 * Then, a * b => | 4217 027 * Then, a * b => |
4194 028 a1b1 * B**2n + ((a1 - a0)(b1 - b0) + a0b0 + a1b1) * B + a0b0 | 4218 028 a1b1 * B**2n + ((a1 + a0)(b1 + b0) - (a0b0 + a1b1)) * B + a0b0 |
4195 029 * | 4219 029 * |
4196 030 * Note that a1b1 and a0b0 are used twice and only need to be | 4220 030 * Note that a1b1 and a0b0 are used twice and only need to be |
4197 031 * computed once. So in total three half size (half # of | 4221 031 * computed once. So in total three half size (half # of |
4198 032 * digit) multiplications are performed, a0b0, a1b1 and | 4222 032 * digit) multiplications are performed, a0b0, a1b1 and |
4199 033 * (a1-b1)(a0-b0) | 4223 033 * (a1+b1)(a0+b0) |
4200 034 * | 4224 034 * |
4201 035 * Note that a multiplication of half the digits requires | 4225 035 * Note that a multiplication of half the digits requires |
4202 036 * 1/4th the number of single precision multiplications so in | 4226 036 * 1/4th the number of single precision multiplications so in |
4203 037 * total after one call 25% of the single precision multiplications | 4227 037 * total after one call 25% of the single precision multiplications |
4204 038 * are saved. Note also that the call to mp_mul can end up back | 4228 038 * are saved. Note also that the call to mp_mul can end up back |
4285 119 if (mp_mul (&x0, &y0, &x0y0) != MP_OKAY) | 4309 119 if (mp_mul (&x0, &y0, &x0y0) != MP_OKAY) |
4286 120 goto X1Y1; /* x0y0 = x0*y0 */ | 4310 120 goto X1Y1; /* x0y0 = x0*y0 */ |
4287 121 if (mp_mul (&x1, &y1, &x1y1) != MP_OKAY) | 4311 121 if (mp_mul (&x1, &y1, &x1y1) != MP_OKAY) |
4288 122 goto X1Y1; /* x1y1 = x1*y1 */ | 4312 122 goto X1Y1; /* x1y1 = x1*y1 */ |
4289 123 | 4313 123 |
4290 124 /* now calc x1-x0 and y1-y0 */ | 4314 124 /* now calc x1+x0 and y1+y0 */ |
4291 125 if (mp_sub (&x1, &x0, &t1) != MP_OKAY) | 4315 125 if (s_mp_add (&x1, &x0, &t1) != MP_OKAY) |
4292 126 goto X1Y1; /* t1 = x1 - x0 */ | 4316 126 goto X1Y1; /* t1 = x1 - x0 */ |
4293 127 if (mp_sub (&y1, &y0, &x0) != MP_OKAY) | 4317 127 if (s_mp_add (&y1, &y0, &x0) != MP_OKAY) |
4294 128 goto X1Y1; /* t2 = y1 - y0 */ | 4318 128 goto X1Y1; /* t2 = y1 - y0 */ |
4295 129 if (mp_mul (&t1, &x0, &t1) != MP_OKAY) | 4319 129 if (mp_mul (&t1, &x0, &t1) != MP_OKAY) |
4296 130 goto X1Y1; /* t1 = (x1 - x0) * (y1 - y0) */ | 4320 130 goto X1Y1; /* t1 = (x1 + x0) * (y1 + y0) */ |
4297 131 | 4321 131 |
4298 132 /* add x0y0 */ | 4322 132 /* add x0y0 */ |
4299 133 if (mp_add (&x0y0, &x1y1, &x0) != MP_OKAY) | 4323 133 if (mp_add (&x0y0, &x1y1, &x0) != MP_OKAY) |
4300 134 goto X1Y1; /* t2 = x0y0 + x1y1 */ | 4324 134 goto X1Y1; /* t2 = x0y0 + x1y1 */ |
4301 135 if (mp_sub (&x0, &t1, &t1) != MP_OKAY) | 4325 135 if (s_mp_sub (&t1, &x0, &t1) != MP_OKAY) |
4302 136 goto X1Y1; /* t1 = x0y0 + x1y1 - (x1-x0)*(y1-y0) */ | 4326 136 goto X1Y1; /* t1 = (x1+x0)*(y1+y0) - (x1y1 + x0y0) */ |
4303 137 | 4327 137 |
4304 138 /* shift by B */ | 4328 138 /* shift by B */ |
4305 139 if (mp_lshd (&t1, B) != MP_OKAY) | 4329 139 if (mp_lshd (&t1, B) != MP_OKAY) |
4306 140 goto X1Y1; /* t1 = (x0y0 + x1y1 - (x1-x0)*(y1-y0))<<B */ | 4330 140 goto X1Y1; /* t1 = (x0y0 + x1y1 - (x1-x0)*(y1-y0))<<B */ |
4307 141 if (mp_lshd (&x1y1, B * 2) != MP_OKAY) | 4331 141 if (mp_lshd (&x1y1, B * 2) != MP_OKAY) |
4324 158 X0:mp_clear (&x0); | 4348 158 X0:mp_clear (&x0); |
4325 159 ERR: | 4349 159 ERR: |
4326 160 return err; | 4350 160 return err; |
4327 161 \} | 4351 161 \} |
4328 162 #endif | 4352 162 #endif |
4353 163 | |
4329 \end{alltt} | 4354 \end{alltt} |
4330 \end{small} | 4355 \end{small} |
4331 | 4356 |
4332 The new coding element in this routine, not seen in previous routines, is the usage of goto statements. The conventional | 4357 The new coding element in this routine, not seen in previous routines, is the usage of goto statements. The conventional |
4333 wisdom is that goto statements should be avoided. This is generally true, however when every single function call can fail, it makes sense | 4358 wisdom is that goto statements should be avoided. This is generally true, however when every single function call can fail, it makes sense |
4727 275 &b2, &tmp1, &tmp2, NULL); | 4752 275 &b2, &tmp1, &tmp2, NULL); |
4728 276 return res; | 4753 276 return res; |
4729 277 \} | 4754 277 \} |
4730 278 | 4755 278 |
4731 279 #endif | 4756 279 #endif |
4757 280 | |
4732 \end{alltt} | 4758 \end{alltt} |
4733 \end{small} | 4759 \end{small} |
4734 | 4760 |
4735 The first obvious thing to note is that this algorithm is complicated. The complexity is worth it if you are multiplying very | 4761 The first obvious thing to note is that this algorithm is complicated. The complexity is worth it if you are multiplying very |
4736 large numbers. For example, a 10,000 digit multiplication takes approximaly 99,282,205 fewer single precision multiplications with | 4762 large numbers. For example, a 10,000 digit multiplication takes approximaly 99,282,205 fewer single precision multiplications with |
4835 057 \} | 4861 057 \} |
4836 058 c->sign = (c->used > 0) ? neg : MP_ZPOS; | 4862 058 c->sign = (c->used > 0) ? neg : MP_ZPOS; |
4837 059 return res; | 4863 059 return res; |
4838 060 \} | 4864 060 \} |
4839 061 #endif | 4865 061 #endif |
4866 062 | |
4840 \end{alltt} | 4867 \end{alltt} |
4841 \end{small} | 4868 \end{small} |
4842 | 4869 |
4843 The implementation is rather simplistic and is not particularly noteworthy. Line 23 computes the sign of the result using the ``?'' | 4870 The implementation is rather simplistic and is not particularly noteworthy. Line 23 computes the sign of the result using the ``?'' |
4844 operator from the C programming language. Line 47 computes $\delta$ using the fact that $1 << k$ is equal to $2^k$. | 4871 operator from the C programming language. Line 47 computes $\delta$ using the fact that $1 << k$ is equal to $2^k$. |
5004 075 mp_exch (&t, b); | 5031 075 mp_exch (&t, b); |
5005 076 mp_clear (&t); | 5032 076 mp_clear (&t); |
5006 077 return MP_OKAY; | 5033 077 return MP_OKAY; |
5007 078 \} | 5034 078 \} |
5008 079 #endif | 5035 079 #endif |
5036 080 | |
5009 \end{alltt} | 5037 \end{alltt} |
5010 \end{small} | 5038 \end{small} |
5011 | 5039 |
5012 Inside the outer loop (line 33) the square term is calculated on line 36. The carry (line 43) has been | 5040 Inside the outer loop (line 33) the square term is calculated on line 36. The carry (line 43) has been |
5013 extracted from the mp\_word accumulator using a right shift. Aliases for $a_{ix}$ and $t_{ix+iy}$ are initialized | 5041 extracted from the mp\_word accumulator using a right shift. Aliases for $a_{ix}$ and $t_{ix+iy}$ are initialized |
5186 105 \} | 5214 105 \} |
5187 106 mp_clamp (b); | 5215 106 mp_clamp (b); |
5188 107 return MP_OKAY; | 5216 107 return MP_OKAY; |
5189 108 \} | 5217 108 \} |
5190 109 #endif | 5218 109 #endif |
5219 110 | |
5191 \end{alltt} | 5220 \end{alltt} |
5192 \end{small} | 5221 \end{small} |
5193 | 5222 |
5194 This implementation is essentially a copy of Comba multiplication with the appropriate changes added to make it faster for | 5223 This implementation is essentially a copy of Comba multiplication with the appropriate changes added to make it faster for |
5195 the special case of squaring. | 5224 the special case of squaring. |
5203 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square. | 5232 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square. |
5204 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a | 5233 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a |
5205 number with the following equation. | 5234 number with the following equation. |
5206 | 5235 |
5207 \begin{equation} | 5236 \begin{equation} |
5208 h(x) = a^2x^2 + \left (a^2 + b^2 - (a - b)^2 \right )x + b^2 | 5237 h(x) = a^2x^2 + \left ((a + b)^2 - (a^2 + b^2) \right )x + b^2 |
5209 \end{equation} | 5238 \end{equation} |
5210 | 5239 |
5211 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a - b)^2$. As in | 5240 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a + b)^2$. As in |
5212 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of | 5241 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of |
5213 $O \left ( n^{lg(3)} \right )$. | 5242 $O \left ( n^{lg(3)} \right )$. |
5214 | 5243 |
5215 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm | 5244 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm |
5216 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the | 5245 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the |
5238 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\ | 5267 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\ |
5239 \\ | 5268 \\ |
5240 Calculate the three squares. \\ | 5269 Calculate the three squares. \\ |
5241 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\ | 5270 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\ |
5242 7. $x1x1 \leftarrow x1^2$ \\ | 5271 7. $x1x1 \leftarrow x1^2$ \\ |
5243 8. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ | 5272 8. $t1 \leftarrow x1 + x0$ (\textit{s\_mp\_add}) \\ |
5244 9. $t1 \leftarrow t1^2$ \\ | 5273 9. $t1 \leftarrow t1^2$ \\ |
5245 \\ | 5274 \\ |
5246 Compute the middle term. \\ | 5275 Compute the middle term. \\ |
5247 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\ | 5276 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\ |
5248 11. $t1 \leftarrow t2 - t1$ \\ | 5277 11. $t1 \leftarrow t1 - t2$ \\ |
5249 \\ | 5278 \\ |
5250 Compute final product. \\ | 5279 Compute final product. \\ |
5251 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\ | 5280 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\ |
5252 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\ | 5281 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\ |
5253 14. $t1 \leftarrow t1 + x0x0$ \\ | 5282 14. $t1 \leftarrow t1 + x0x0$ \\ |
5266 | 5295 |
5267 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is | 5296 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is |
5268 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$ | 5297 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$ |
5269 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form. | 5298 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form. |
5270 | 5299 |
5271 By expanding $\left (x1 - x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $x1^2 + x0^2 - (x1 - x0)^2 = 2 \cdot x0 \cdot x1$. | 5300 By expanding $\left (x1 + x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $(x0 - x1)^2 - (x1^2 + x0^2) = 2 \cdot x0 \cdot x1$. |
5272 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then | 5301 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then |
5273 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality. | 5302 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality. |
5274 | 5303 |
5275 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or | 5304 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or |
5276 machine clock cycles.}. | 5305 machine clock cycles.}. |
5361 077 if (mp_sqr (&x0, &x0x0) != MP_OKAY) | 5390 077 if (mp_sqr (&x0, &x0x0) != MP_OKAY) |
5362 078 goto X1X1; /* x0x0 = x0*x0 */ | 5391 078 goto X1X1; /* x0x0 = x0*x0 */ |
5363 079 if (mp_sqr (&x1, &x1x1) != MP_OKAY) | 5392 079 if (mp_sqr (&x1, &x1x1) != MP_OKAY) |
5364 080 goto X1X1; /* x1x1 = x1*x1 */ | 5393 080 goto X1X1; /* x1x1 = x1*x1 */ |
5365 081 | 5394 081 |
5366 082 /* now calc (x1-x0)**2 */ | 5395 082 /* now calc (x1+x0)**2 */ |
5367 083 if (mp_sub (&x1, &x0, &t1) != MP_OKAY) | 5396 083 if (s_mp_add (&x1, &x0, &t1) != MP_OKAY) |
5368 084 goto X1X1; /* t1 = x1 - x0 */ | 5397 084 goto X1X1; /* t1 = x1 - x0 */ |
5369 085 if (mp_sqr (&t1, &t1) != MP_OKAY) | 5398 085 if (mp_sqr (&t1, &t1) != MP_OKAY) |
5370 086 goto X1X1; /* t1 = (x1 - x0) * (x1 - x0) */ | 5399 086 goto X1X1; /* t1 = (x1 - x0) * (x1 - x0) */ |
5371 087 | 5400 087 |
5372 088 /* add x0y0 */ | 5401 088 /* add x0y0 */ |
5373 089 if (s_mp_add (&x0x0, &x1x1, &t2) != MP_OKAY) | 5402 089 if (s_mp_add (&x0x0, &x1x1, &t2) != MP_OKAY) |
5374 090 goto X1X1; /* t2 = x0x0 + x1x1 */ | 5403 090 goto X1X1; /* t2 = x0x0 + x1x1 */ |
5375 091 if (mp_sub (&t2, &t1, &t1) != MP_OKAY) | 5404 091 if (s_mp_sub (&t1, &t2, &t1) != MP_OKAY) |
5376 092 goto X1X1; /* t1 = x0x0 + x1x1 - (x1-x0)*(x1-x0) */ | 5405 092 goto X1X1; /* t1 = (x1+x0)**2 - (x0x0 + x1x1) */ |
5377 093 | 5406 093 |
5378 094 /* shift by B */ | 5407 094 /* shift by B */ |
5379 095 if (mp_lshd (&t1, B) != MP_OKAY) | 5408 095 if (mp_lshd (&t1, B) != MP_OKAY) |
5380 096 goto X1X1; /* t1 = (x0x0 + x1x1 - (x1-x0)*(x1-x0))<<B */ | 5409 096 goto X1X1; /* t1 = (x0x0 + x1x1 - (x1-x0)*(x1-x0))<<B */ |
5381 097 if (mp_lshd (&x1x1, B * 2) != MP_OKAY) | 5410 097 if (mp_lshd (&x1x1, B * 2) != MP_OKAY) |
5396 112 X0:mp_clear (&x0); | 5425 112 X0:mp_clear (&x0); |
5397 113 ERR: | 5426 113 ERR: |
5398 114 return err; | 5427 114 return err; |
5399 115 \} | 5428 115 \} |
5400 116 #endif | 5429 116 #endif |
5430 117 | |
5401 \end{alltt} | 5431 \end{alltt} |
5402 \end{small} | 5432 \end{small} |
5403 | 5433 |
5404 This implementation is largely based on the implementation of algorithm mp\_karatsuba\_mul. It uses the same inline style to copy and | 5434 This implementation is largely based on the implementation of algorithm mp\_karatsuba\_mul. It uses the same inline style to copy and |
5405 shift the input into the two halves. The loop from line 53 to line 69 has been modified since only one input exists. The \textbf{used} | 5435 shift the input into the two halves. The loop from line 53 to line 69 has been modified since only one input exists. The \textbf{used} |
5492 049 \} | 5522 049 \} |
5493 050 b->sign = MP_ZPOS; | 5523 050 b->sign = MP_ZPOS; |
5494 051 return res; | 5524 051 return res; |
5495 052 \} | 5525 052 \} |
5496 053 #endif | 5526 053 #endif |
5527 054 | |
5497 \end{alltt} | 5528 \end{alltt} |
5498 \end{small} | 5529 \end{small} |
5499 | 5530 |
5500 \section*{Exercises} | 5531 \section*{Exercises} |
5501 \begin{tabular}{cl} | 5532 \begin{tabular}{cl} |
5825 091 mp_clear (&q); | 5856 091 mp_clear (&q); |
5826 092 | 5857 092 |
5827 093 return res; | 5858 093 return res; |
5828 094 \} | 5859 094 \} |
5829 095 #endif | 5860 095 #endif |
5861 096 | |
5830 \end{alltt} | 5862 \end{alltt} |
5831 \end{small} | 5863 \end{small} |
5832 | 5864 |
5833 The first multiplication that determines the quotient can be performed by only producing the digits from $m - 1$ and up. This essentially halves | 5865 The first multiplication that determines the quotient can be performed by only producing the digits from $m - 1$ and up. This essentially halves |
5834 the number of single precision multiplications required. However, the optimization is only safe if $\beta$ is much larger than the number of digits | 5866 the number of single precision multiplications required. However, the optimization is only safe if $\beta$ is much larger than the number of digits |
5877 025 return res; | 5909 025 return res; |
5878 026 \} | 5910 026 \} |
5879 027 return mp_div (a, b, a, NULL); | 5911 027 return mp_div (a, b, a, NULL); |
5880 028 \} | 5912 028 \} |
5881 029 #endif | 5913 029 #endif |
5914 030 | |
5882 \end{alltt} | 5915 \end{alltt} |
5883 \end{small} | 5916 \end{small} |
5884 | 5917 |
5885 This simple routine calculates the reciprocal $\mu$ required by Barrett reduction. Note the extended usage of algorithm mp\_div where the variable | 5918 This simple routine calculates the reciprocal $\mu$ required by Barrett reduction. Note the extended usage of algorithm mp\_div where the variable |
5886 which would received the remainder is passed as NULL. As will be discussed in~\ref{sec:division} the division routine allows both the quotient and the | 5919 which would received the remainder is passed as NULL. As will be discussed in~\ref{sec:division} the division routine allows both the quotient and the |
5941 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\ | 5974 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\ |
5942 \hline $5$ & $x/2 = 278$ \\ | 5975 \hline $5$ & $x/2 = 278$ \\ |
5943 \hline $6$ & $x/2 = 139$ \\ | 5976 \hline $6$ & $x/2 = 139$ \\ |
5944 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\ | 5977 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\ |
5945 \hline $8$ & $x/2 = 99$ \\ | 5978 \hline $8$ & $x/2 = 99$ \\ |
5979 \hline $9$ & $x + n = 356$, $x/2 = 178$ \\ | |
5946 \hline | 5980 \hline |
5947 \end{tabular} | 5981 \end{tabular} |
5948 \end{center} | 5982 \end{center} |
5949 \end{small} | 5983 \end{small} |
5950 \caption{Example of Montgomery Reduction (I)} | 5984 \caption{Example of Montgomery Reduction (I)} |
5951 \label{fig:MONT1} | 5985 \label{fig:MONT1} |
5952 \end{figure} | 5986 \end{figure} |
5953 | 5987 |
5954 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 8$. The result of the algorithm $r = 99$ is | 5988 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 9$ (note $\beta^k = 512$ which is larger than $n$). The result of |
5955 congruent to the value of $2^{-8} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^8$ modulo $257$ the correct residue | 5989 the algorithm $r = 178$ is congruent to the value of $2^{-9} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^9$ modulo $257$ the correct residue |
5956 $r \equiv 158$ is produced. | 5990 $r \equiv 158$ is produced. |
5957 | 5991 |
5958 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts | 5992 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts |
5959 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful. | 5993 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful. |
5960 Fortunately there exists an alternative representation of the algorithm. | 5994 Fortunately there exists an alternative representation of the algorithm. |
5962 \begin{figure}[!here] | 5996 \begin{figure}[!here] |
5963 \begin{small} | 5997 \begin{small} |
5964 \begin{center} | 5998 \begin{center} |
5965 \begin{tabular}{l} | 5999 \begin{tabular}{l} |
5966 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\ | 6000 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\ |
5967 \textbf{Input}. Integer $x$, $n$ and $k$ \\ | 6001 \textbf{Input}. Integer $x$, $n$ and $k$ ($2^k > n$) \\ |
5968 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\ | 6002 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\ |
5969 \hline \\ | 6003 \hline \\ |
5970 1. for $t$ from $0$ to $k - 1$ do \\ | 6004 1. for $t$ from $1$ to $k$ do \\ |
5971 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\ | 6005 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\ |
5972 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\ | 6006 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\ |
5973 2. Return $x/2^k$. \\ | 6007 2. Return $x/2^k$. \\ |
5974 \hline | 6008 \hline |
5975 \end{tabular} | 6009 \end{tabular} |
5993 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\ | 6027 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\ |
5994 \hline $5$ & $8896$ & $10001011000000$ \\ | 6028 \hline $5$ & $8896$ & $10001011000000$ \\ |
5995 \hline $6$ & $8896$ & $10001011000000$ \\ | 6029 \hline $6$ & $8896$ & $10001011000000$ \\ |
5996 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\ | 6030 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\ |
5997 \hline $8$ & $25344$ & $110001100000000$ \\ | 6031 \hline $8$ & $25344$ & $110001100000000$ \\ |
5998 \hline -- & $x/2^k = 99$ & \\ | 6032 \hline $9$ & $x + 2^{7}n = 91136$ & $10110010000000000$ \\ |
6033 \hline -- & $x/2^k = 178$ & \\ | |
5999 \hline | 6034 \hline |
6000 \end{tabular} | 6035 \end{tabular} |
6001 \end{center} | 6036 \end{center} |
6002 \end{small} | 6037 \end{small} |
6003 \caption{Example of Montgomery Reduction (II)} | 6038 \caption{Example of Montgomery Reduction (II)} |
6004 \label{fig:MONT2} | 6039 \label{fig:MONT2} |
6005 \end{figure} | 6040 \end{figure} |
6006 | 6041 |
6007 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 8$. | 6042 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 9$. |
6008 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the | 6043 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the |
6009 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is | 6044 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is |
6010 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero. | 6045 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero. |
6011 | 6046 |
6012 \subsection{Digit Based Montgomery Reduction} | 6047 \subsection{Digit Based Montgomery Reduction} |
6016 \begin{figure}[!here] | 6051 \begin{figure}[!here] |
6017 \begin{small} | 6052 \begin{small} |
6018 \begin{center} | 6053 \begin{center} |
6019 \begin{tabular}{l} | 6054 \begin{tabular}{l} |
6020 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\ | 6055 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\ |
6021 \textbf{Input}. Integer $x$, $n$ and $k$ \\ | 6056 \textbf{Input}. Integer $x$, $n$ and $k$ ($\beta^k > n$) \\ |
6022 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\ | 6057 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\ |
6023 \hline \\ | 6058 \hline \\ |
6024 1. for $t$ from $0$ to $k - 1$ do \\ | 6059 1. for $t$ from $0$ to $k - 1$ do \\ |
6025 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\ | 6060 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\ |
6026 2. Return $x/\beta^k$. \\ | 6061 2. Return $x/\beta^k$. \\ |
6232 109 \} | 6267 109 \} |
6233 110 | 6268 110 |
6234 111 return MP_OKAY; | 6269 111 return MP_OKAY; |
6235 112 \} | 6270 112 \} |
6236 113 #endif | 6271 113 #endif |
6272 114 | |
6237 \end{alltt} | 6273 \end{alltt} |
6238 \end{small} | 6274 \end{small} |
6239 | 6275 |
6240 This is the baseline implementation of the Montgomery reduction algorithm. Lines 30 to 35 determine if the Comba based | 6276 This is the baseline implementation of the Montgomery reduction algorithm. Lines 30 to 35 determine if the Comba based |
6241 routine can be used instead. Line 48 computes the value of $\mu$ for that particular iteration of the outer loop. | 6277 routine can be used instead. Line 48 computes the value of $\mu$ for that particular iteration of the outer loop. |
6476 163 return s_mp_sub (x, n, x); | 6512 163 return s_mp_sub (x, n, x); |
6477 164 \} | 6513 164 \} |
6478 165 return MP_OKAY; | 6514 165 return MP_OKAY; |
6479 166 \} | 6515 166 \} |
6480 167 #endif | 6516 167 #endif |
6517 168 | |
6481 \end{alltt} | 6518 \end{alltt} |
6482 \end{small} | 6519 \end{small} |
6483 | 6520 |
6484 The $\hat W$ array is first filled with digits of $x$ on line 50 then the rest of the digits are zeroed on line 54. Both loops share | 6521 The $\hat W$ array is first filled with digits of $x$ on line 50 then the rest of the digits are zeroed on line 54. Both loops share |
6485 the same alias variables to make the code easier to read. | 6522 the same alias variables to make the code easier to read. |
6503 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\ | 6540 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\ |
6504 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\ | 6541 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\ |
6505 \hline \\ | 6542 \hline \\ |
6506 1. $b \leftarrow n_0$ \\ | 6543 1. $b \leftarrow n_0$ \\ |
6507 2. If $b$ is even return(\textit{MP\_VAL}) \\ | 6544 2. If $b$ is even return(\textit{MP\_VAL}) \\ |
6508 3. $x \leftarrow ((b + 2) \mbox{ AND } 4) << 1) + b$ \\ | 6545 3. $x \leftarrow (((b + 2) \mbox{ AND } 4) << 1) + b$ \\ |
6509 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\ | 6546 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\ |
6510 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\ | 6547 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\ |
6511 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\ | 6548 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\ |
6512 6. Return(\textit{MP\_OKAY}). \\ | 6549 6. Return(\textit{MP\_OKAY}). \\ |
6513 \hline | 6550 \hline |
6557 045 #ifdef MP_64BIT | 6594 045 #ifdef MP_64BIT |
6558 046 x *= 2 - b * x; /* here x*a==1 mod 2**64 */ | 6595 046 x *= 2 - b * x; /* here x*a==1 mod 2**64 */ |
6559 047 #endif | 6596 047 #endif |
6560 048 | 6597 048 |
6561 049 /* rho = -1/m mod b */ | 6598 049 /* rho = -1/m mod b */ |
6562 050 *rho = (((mp_word)1 << ((mp_word) DIGIT_BIT)) - x) & MP_MASK; | 6599 050 *rho = (unsigned long)(((mp_word)1 << ((mp_word) DIGIT_BIT)) - x) & MP_MAS |
6600 K; | |
6563 051 | 6601 051 |
6564 052 return MP_OKAY; | 6602 052 return MP_OKAY; |
6565 053 \} | 6603 053 \} |
6566 054 #endif | 6604 054 #endif |
6605 055 | |
6567 \end{alltt} | 6606 \end{alltt} |
6568 \end{small} | 6607 \end{small} |
6569 | 6608 |
6570 This source code computes the value of $\rho$ required to perform Montgomery reduction. It has been modified to avoid performing excess | 6609 This source code computes the value of $\rho$ required to perform Montgomery reduction. It has been modified to avoid performing excess |
6571 multiplications when $\beta$ is not the default 28-bits. | 6610 multiplications when $\beta$ is not the default 28-bits. |
6828 085 goto top; | 6867 085 goto top; |
6829 086 \} | 6868 086 \} |
6830 087 return MP_OKAY; | 6869 087 return MP_OKAY; |
6831 088 \} | 6870 088 \} |
6832 089 #endif | 6871 089 #endif |
6872 090 | |
6833 \end{alltt} | 6873 \end{alltt} |
6834 \end{small} | 6874 \end{small} |
6835 | 6875 |
6836 The first step is to grow $x$ as required to $2m$ digits since the reduction is performed in place on $x$. The label on line 51 is where | 6876 The first step is to grow $x$ as required to $2m$ digits since the reduction is performed in place on $x$. The label on line 51 is where |
6837 the algorithm will resume if further reduction passes are required. In theory it could be placed at the top of the function however, the size of | 6877 the algorithm will resume if further reduction passes are required. In theory it could be placed at the top of the function however, the size of |
6883 023 *d = (mp_digit)((((mp_word)1) << ((mp_word)DIGIT_BIT)) - | 6923 023 *d = (mp_digit)((((mp_word)1) << ((mp_word)DIGIT_BIT)) - |
6884 024 ((mp_word)a->dp[0])); | 6924 024 ((mp_word)a->dp[0])); |
6885 025 \} | 6925 025 \} |
6886 026 | 6926 026 |
6887 027 #endif | 6927 027 #endif |
6928 028 | |
6888 \end{alltt} | 6929 \end{alltt} |
6889 \end{small} | 6930 \end{small} |
6890 | 6931 |
6891 \subsubsection{Modulus Detection} | 6932 \subsubsection{Modulus Detection} |
6892 Another algorithm which will be useful is the ability to detect a restricted Diminished Radix modulus. An integer is said to be | 6933 Another algorithm which will be useful is the ability to detect a restricted Diminished Radix modulus. An integer is said to be |
6941 034 \} | 6982 034 \} |
6942 035 return 1; | 6983 035 return 1; |
6943 036 \} | 6984 036 \} |
6944 037 | 6985 037 |
6945 038 #endif | 6986 038 #endif |
6987 039 | |
6946 \end{alltt} | 6988 \end{alltt} |
6947 \end{small} | 6989 \end{small} |
6948 | 6990 |
6949 \subsection{Unrestricted Diminished Radix Reduction} | 6991 \subsection{Unrestricted Diminished Radix Reduction} |
6950 The unrestricted Diminished Radix algorithm allows modular reductions to be performed when the modulus is of the form $2^p - k$. This algorithm | 6992 The unrestricted Diminished Radix algorithm allows modular reductions to be performed when the modulus is of the form $2^p - k$. This algorithm |
7025 052 mp_clear(&q); | 7067 052 mp_clear(&q); |
7026 053 return res; | 7068 053 return res; |
7027 054 \} | 7069 054 \} |
7028 055 | 7070 055 |
7029 056 #endif | 7071 056 #endif |
7072 057 | |
7030 \end{alltt} | 7073 \end{alltt} |
7031 \end{small} | 7074 \end{small} |
7032 | 7075 |
7033 The algorithm mp\_count\_bits calculates the number of bits in an mp\_int which is used to find the initial value of $p$. The call to mp\_div\_2d | 7076 The algorithm mp\_count\_bits calculates the number of bits in an mp\_int which is used to find the initial value of $p$. The call to mp\_div\_2d |
7034 on line 30 calculates both the quotient $q$ and the remainder $a$ required. By doing both in a single function call the code size | 7077 on line 30 calculates both the quotient $q$ and the remainder $a$ required. By doing both in a single function call the code size |
7094 038 *d = tmp.dp[0]; | 7137 038 *d = tmp.dp[0]; |
7095 039 mp_clear(&tmp); | 7138 039 mp_clear(&tmp); |
7096 040 return MP_OKAY; | 7139 040 return MP_OKAY; |
7097 041 \} | 7140 041 \} |
7098 042 #endif | 7141 042 #endif |
7142 043 | |
7099 \end{alltt} | 7143 \end{alltt} |
7100 \end{small} | 7144 \end{small} |
7101 | 7145 |
7102 \subsubsection{Unrestricted Detection} | 7146 \subsubsection{Unrestricted Detection} |
7103 An integer $n$ is a valid unrestricted Diminished Radix modulus if either of the following are true. | 7147 An integer $n$ is a valid unrestricted Diminished Radix modulus if either of the following are true. |
7170 043 \} | 7214 043 \} |
7171 044 return MP_YES; | 7215 044 return MP_YES; |
7172 045 \} | 7216 045 \} |
7173 046 | 7217 046 |
7174 047 #endif | 7218 047 #endif |
7219 048 | |
7175 \end{alltt} | 7220 \end{alltt} |
7176 \end{small} | 7221 \end{small} |
7177 | 7222 |
7178 | 7223 |
7179 | 7224 |
7379 048 | 7424 048 |
7380 049 mp_clear (&g); | 7425 049 mp_clear (&g); |
7381 050 return MP_OKAY; | 7426 050 return MP_OKAY; |
7382 051 \} | 7427 051 \} |
7383 052 #endif | 7428 052 #endif |
7429 053 | |
7384 \end{alltt} | 7430 \end{alltt} |
7385 \end{small} | 7431 \end{small} |
7386 | 7432 |
7387 Line 28 sets the initial value of the result to $1$. Next the loop on line 30 steps through each bit of the exponent starting from | 7433 Line 28 sets the initial value of the result to $1$. Next the loop on line 30 steps through each bit of the exponent starting from |
7388 the most significant down towards the least significant. The invariant squaring operation placed on line 32 is performed first. After | 7434 the most significant down towards the least significant. The invariant squaring operation placed on line 32 is performed first. After |
7618 063 return MP_VAL; | 7664 063 return MP_VAL; |
7619 064 #endif | 7665 064 #endif |
7620 065 \} | 7666 065 \} |
7621 066 | 7667 066 |
7622 067 /* modified diminished radix reduction */ | 7668 067 /* modified diminished radix reduction */ |
7623 068 #if defined(BN_MP_REDUCE_IS_2K_L_C) && defined(BN_MP_REDUCE_2K_L_C) | 7669 068 #if defined(BN_MP_REDUCE_IS_2K_L_C) && defined(BN_MP_REDUCE_2K_L_C) && defin |
7670 ed(BN_S_MP_EXPTMOD_C) | |
7624 069 if (mp_reduce_is_2k_l(P) == MP_YES) \{ | 7671 069 if (mp_reduce_is_2k_l(P) == MP_YES) \{ |
7625 070 return s_mp_exptmod(G, X, P, Y, 1); | 7672 070 return s_mp_exptmod(G, X, P, Y, 1); |
7626 071 \} | 7673 071 \} |
7627 072 #endif | 7674 072 #endif |
7628 073 | 7675 073 |
7658 103 \} | 7705 103 \} |
7659 104 #endif | 7706 104 #endif |
7660 105 \} | 7707 105 \} |
7661 106 | 7708 106 |
7662 107 #endif | 7709 107 #endif |
7710 108 | |
7663 \end{alltt} | 7711 \end{alltt} |
7664 \end{small} | 7712 \end{small} |
7665 | 7713 |
7666 In order to keep the algorithms in a known state the first step on line 28 is to reject any negative modulus as input. If the exponent is | 7714 In order to keep the algorithms in a known state the first step on line 28 is to reject any negative modulus as input. If the exponent is |
7667 negative the algorithm tries to perform a modular exponentiation with the modular inverse of the base $G$. The temporary variable $tmpG$ is assigned | 7715 negative the algorithm tries to perform a modular exponentiation with the modular inverse of the base $G$. The temporary variable $tmpG$ is assigned |
7837 | 7885 |
7838 \vspace{+3mm}\begin{small} | 7886 \vspace{+3mm}\begin{small} |
7839 \hspace{-5.1mm}{\bf File}: bn\_s\_mp\_exptmod.c | 7887 \hspace{-5.1mm}{\bf File}: bn\_s\_mp\_exptmod.c |
7840 \vspace{-3mm} | 7888 \vspace{-3mm} |
7841 \begin{alltt} | 7889 \begin{alltt} |
7842 016 | 7890 016 #ifdef MP_LOW_MEM |
7843 017 #ifdef MP_LOW_MEM | 7891 017 #define TAB_SIZE 32 |
7844 018 #define TAB_SIZE 32 | 7892 018 #else |
7845 019 #else | 7893 019 #define TAB_SIZE 256 |
7846 020 #define TAB_SIZE 256 | 7894 020 #endif |
7847 021 #endif | 7895 021 |
7848 022 | 7896 022 int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmod |
7849 023 int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmod | |
7850 e) | 7897 e) |
7851 024 \{ | 7898 023 \{ |
7852 025 mp_int M[TAB_SIZE], res, mu; | 7899 024 mp_int M[TAB_SIZE], res, mu; |
7853 026 mp_digit buf; | 7900 025 mp_digit buf; |
7854 027 int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize; | 7901 026 int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize; |
7855 028 int (*redux)(mp_int*,mp_int*,mp_int*); | 7902 027 int (*redux)(mp_int*,mp_int*,mp_int*); |
7856 029 | 7903 028 |
7857 030 /* find window size */ | 7904 029 /* find window size */ |
7858 031 x = mp_count_bits (X); | 7905 030 x = mp_count_bits (X); |
7859 032 if (x <= 7) \{ | 7906 031 if (x <= 7) \{ |
7860 033 winsize = 2; | 7907 032 winsize = 2; |
7861 034 \} else if (x <= 36) \{ | 7908 033 \} else if (x <= 36) \{ |
7862 035 winsize = 3; | 7909 034 winsize = 3; |
7863 036 \} else if (x <= 140) \{ | 7910 035 \} else if (x <= 140) \{ |
7864 037 winsize = 4; | 7911 036 winsize = 4; |
7865 038 \} else if (x <= 450) \{ | 7912 037 \} else if (x <= 450) \{ |
7866 039 winsize = 5; | 7913 038 winsize = 5; |
7867 040 \} else if (x <= 1303) \{ | 7914 039 \} else if (x <= 1303) \{ |
7868 041 winsize = 6; | 7915 040 winsize = 6; |
7869 042 \} else if (x <= 3529) \{ | 7916 041 \} else if (x <= 3529) \{ |
7870 043 winsize = 7; | 7917 042 winsize = 7; |
7871 044 \} else \{ | 7918 043 \} else \{ |
7872 045 winsize = 8; | 7919 044 winsize = 8; |
7873 046 \} | 7920 045 \} |
7874 047 | 7921 046 |
7875 048 #ifdef MP_LOW_MEM | 7922 047 #ifdef MP_LOW_MEM |
7876 049 if (winsize > 5) \{ | 7923 048 if (winsize > 5) \{ |
7877 050 winsize = 5; | 7924 049 winsize = 5; |
7878 051 \} | 7925 050 \} |
7879 052 #endif | 7926 051 #endif |
7880 053 | 7927 052 |
7881 054 /* init M array */ | 7928 053 /* init M array */ |
7882 055 /* init first cell */ | 7929 054 /* init first cell */ |
7883 056 if ((err = mp_init(&M[1])) != MP_OKAY) \{ | 7930 055 if ((err = mp_init(&M[1])) != MP_OKAY) \{ |
7884 057 return err; | 7931 056 return err; |
7885 058 \} | 7932 057 \} |
7886 059 | 7933 058 |
7887 060 /* now init the second half of the array */ | 7934 059 /* now init the second half of the array */ |
7888 061 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ | 7935 060 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ |
7889 062 if ((err = mp_init(&M[x])) != MP_OKAY) \{ | 7936 061 if ((err = mp_init(&M[x])) != MP_OKAY) \{ |
7890 063 for (y = 1<<(winsize-1); y < x; y++) \{ | 7937 062 for (y = 1<<(winsize-1); y < x; y++) \{ |
7891 064 mp_clear (&M[y]); | 7938 063 mp_clear (&M[y]); |
7892 065 \} | 7939 064 \} |
7893 066 mp_clear(&M[1]); | 7940 065 mp_clear(&M[1]); |
7894 067 return err; | 7941 066 return err; |
7895 068 \} | 7942 067 \} |
7896 069 \} | 7943 068 \} |
7897 070 | 7944 069 |
7898 071 /* create mu, used for Barrett reduction */ | 7945 070 /* create mu, used for Barrett reduction */ |
7899 072 if ((err = mp_init (&mu)) != MP_OKAY) \{ | 7946 071 if ((err = mp_init (&mu)) != MP_OKAY) \{ |
7900 073 goto LBL_M; | 7947 072 goto LBL_M; |
7901 074 \} | 7948 073 \} |
7902 075 | 7949 074 |
7903 076 if (redmode == 0) \{ | 7950 075 if (redmode == 0) \{ |
7904 077 if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) \{ | 7951 076 if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) \{ |
7905 078 goto LBL_MU; | 7952 077 goto LBL_MU; |
7906 079 \} | 7953 078 \} |
7907 080 redux = mp_reduce; | 7954 079 redux = mp_reduce; |
7908 081 \} else \{ | 7955 080 \} else \{ |
7909 082 if ((err = mp_reduce_2k_setup_l (P, &mu)) != MP_OKAY) \{ | 7956 081 if ((err = mp_reduce_2k_setup_l (P, &mu)) != MP_OKAY) \{ |
7910 083 goto LBL_MU; | 7957 082 goto LBL_MU; |
7911 084 \} | 7958 083 \} |
7912 085 redux = mp_reduce_2k_l; | 7959 084 redux = mp_reduce_2k_l; |
7913 086 \} | 7960 085 \} |
7914 087 | 7961 086 |
7915 088 /* create M table | 7962 087 /* create M table |
7916 089 * | 7963 088 * |
7917 090 * The M table contains powers of the base, | 7964 089 * The M table contains powers of the base, |
7918 091 * e.g. M[x] = G**x mod P | 7965 090 * e.g. M[x] = G**x mod P |
7919 092 * | 7966 091 * |
7920 093 * The first half of the table is not | 7967 092 * The first half of the table is not |
7921 094 * computed though accept for M[0] and M[1] | 7968 093 * computed though accept for M[0] and M[1] |
7922 095 */ | 7969 094 */ |
7923 096 if ((err = mp_mod (G, P, &M[1])) != MP_OKAY) \{ | 7970 095 if ((err = mp_mod (G, P, &M[1])) != MP_OKAY) \{ |
7924 097 goto LBL_MU; | 7971 096 goto LBL_MU; |
7925 098 \} | 7972 097 \} |
7926 099 | 7973 098 |
7927 100 /* compute the value at M[1<<(winsize-1)] by squaring | 7974 099 /* compute the value at M[1<<(winsize-1)] by squaring |
7928 101 * M[1] (winsize-1) times | 7975 100 * M[1] (winsize-1) times |
7929 102 */ | 7976 101 */ |
7930 103 if ((err = mp_copy (&M[1], &M[1 << (winsize - 1)])) != MP_OKAY) \{ | 7977 102 if ((err = mp_copy (&M[1], &M[1 << (winsize - 1)])) != MP_OKAY) \{ |
7931 104 goto LBL_MU; | 7978 103 goto LBL_MU; |
7932 105 \} | 7979 104 \} |
7933 106 | 7980 105 |
7934 107 for (x = 0; x < (winsize - 1); x++) \{ | 7981 106 for (x = 0; x < (winsize - 1); x++) \{ |
7935 108 /* square it */ | 7982 107 /* square it */ |
7936 109 if ((err = mp_sqr (&M[1 << (winsize - 1)], | 7983 108 if ((err = mp_sqr (&M[1 << (winsize - 1)], |
7937 110 &M[1 << (winsize - 1)])) != MP_OKAY) \{ | 7984 109 &M[1 << (winsize - 1)])) != MP_OKAY) \{ |
7938 111 goto LBL_MU; | 7985 110 goto LBL_MU; |
7939 112 \} | 7986 111 \} |
7940 113 | 7987 112 |
7941 114 /* reduce modulo P */ | 7988 113 /* reduce modulo P */ |
7942 115 if ((err = redux (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) \{ | 7989 114 if ((err = redux (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) \{ |
7943 116 goto LBL_MU; | 7990 115 goto LBL_MU; |
7944 117 \} | 7991 116 \} |
7945 118 \} | 7992 117 \} |
7946 119 | 7993 118 |
7947 120 /* create upper table, that is M[x] = M[x-1] * M[1] (mod P) | 7994 119 /* create upper table, that is M[x] = M[x-1] * M[1] (mod P) |
7948 121 * for x = (2**(winsize - 1) + 1) to (2**winsize - 1) | 7995 120 * for x = (2**(winsize - 1) + 1) to (2**winsize - 1) |
7949 122 */ | 7996 121 */ |
7950 123 for (x = (1 << (winsize - 1)) + 1; x < (1 << winsize); x++) \{ | 7997 122 for (x = (1 << (winsize - 1)) + 1; x < (1 << winsize); x++) \{ |
7951 124 if ((err = mp_mul (&M[x - 1], &M[1], &M[x])) != MP_OKAY) \{ | 7998 123 if ((err = mp_mul (&M[x - 1], &M[1], &M[x])) != MP_OKAY) \{ |
7952 125 goto LBL_MU; | 7999 124 goto LBL_MU; |
7953 126 \} | 8000 125 \} |
7954 127 if ((err = redux (&M[x], P, &mu)) != MP_OKAY) \{ | 8001 126 if ((err = redux (&M[x], P, &mu)) != MP_OKAY) \{ |
7955 128 goto LBL_MU; | 8002 127 goto LBL_MU; |
7956 129 \} | 8003 128 \} |
7957 130 \} | 8004 129 \} |
7958 131 | 8005 130 |
7959 132 /* setup result */ | 8006 131 /* setup result */ |
7960 133 if ((err = mp_init (&res)) != MP_OKAY) \{ | 8007 132 if ((err = mp_init (&res)) != MP_OKAY) \{ |
7961 134 goto LBL_MU; | 8008 133 goto LBL_MU; |
7962 135 \} | 8009 134 \} |
7963 136 mp_set (&res, 1); | 8010 135 mp_set (&res, 1); |
7964 137 | 8011 136 |
7965 138 /* set initial mode and bit cnt */ | 8012 137 /* set initial mode and bit cnt */ |
7966 139 mode = 0; | 8013 138 mode = 0; |
7967 140 bitcnt = 1; | 8014 139 bitcnt = 1; |
7968 141 buf = 0; | 8015 140 buf = 0; |
7969 142 digidx = X->used - 1; | 8016 141 digidx = X->used - 1; |
7970 143 bitcpy = 0; | 8017 142 bitcpy = 0; |
7971 144 bitbuf = 0; | 8018 143 bitbuf = 0; |
7972 145 | 8019 144 |
7973 146 for (;;) \{ | 8020 145 for (;;) \{ |
7974 147 /* grab next digit as required */ | 8021 146 /* grab next digit as required */ |
7975 148 if (--bitcnt == 0) \{ | 8022 147 if (--bitcnt == 0) \{ |
7976 149 /* if digidx == -1 we are out of digits */ | 8023 148 /* if digidx == -1 we are out of digits */ |
7977 150 if (digidx == -1) \{ | 8024 149 if (digidx == -1) \{ |
7978 151 break; | 8025 150 break; |
7979 152 \} | 8026 151 \} |
7980 153 /* read next digit and reset the bitcnt */ | 8027 152 /* read next digit and reset the bitcnt */ |
7981 154 buf = X->dp[digidx--]; | 8028 153 buf = X->dp[digidx--]; |
7982 155 bitcnt = (int) DIGIT_BIT; | 8029 154 bitcnt = (int) DIGIT_BIT; |
7983 156 \} | 8030 155 \} |
7984 157 | 8031 156 |
7985 158 /* grab the next msb from the exponent */ | 8032 157 /* grab the next msb from the exponent */ |
7986 159 y = (buf >> (mp_digit)(DIGIT_BIT - 1)) & 1; | 8033 158 y = (buf >> (mp_digit)(DIGIT_BIT - 1)) & 1; |
7987 160 buf <<= (mp_digit)1; | 8034 159 buf <<= (mp_digit)1; |
7988 161 | 8035 160 |
7989 162 /* if the bit is zero and mode == 0 then we ignore it | 8036 161 /* if the bit is zero and mode == 0 then we ignore it |
7990 163 * These represent the leading zero bits before the first 1 bit | 8037 162 * These represent the leading zero bits before the first 1 bit |
7991 164 * in the exponent. Technically this opt is not required but it | 8038 163 * in the exponent. Technically this opt is not required but it |
7992 165 * does lower the # of trivial squaring/reductions used | 8039 164 * does lower the # of trivial squaring/reductions used |
7993 166 */ | 8040 165 */ |
7994 167 if (mode == 0 && y == 0) \{ | 8041 166 if (mode == 0 && y == 0) \{ |
7995 168 continue; | 8042 167 continue; |
7996 169 \} | 8043 168 \} |
7997 170 | 8044 169 |
7998 171 /* if the bit is zero and mode == 1 then we square */ | 8045 170 /* if the bit is zero and mode == 1 then we square */ |
7999 172 if (mode == 1 && y == 0) \{ | 8046 171 if (mode == 1 && y == 0) \{ |
8000 173 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ | 8047 172 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ |
8001 174 goto LBL_RES; | 8048 173 goto LBL_RES; |
8002 175 \} | 8049 174 \} |
8003 176 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ | 8050 175 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8004 177 goto LBL_RES; | 8051 176 goto LBL_RES; |
8005 178 \} | 8052 177 \} |
8006 179 continue; | 8053 178 continue; |
8007 180 \} | 8054 179 \} |
8008 181 | 8055 180 |
8009 182 /* else we add it to the window */ | 8056 181 /* else we add it to the window */ |
8010 183 bitbuf |= (y << (winsize - ++bitcpy)); | 8057 182 bitbuf |= (y << (winsize - ++bitcpy)); |
8011 184 mode = 2; | 8058 183 mode = 2; |
8012 185 | 8059 184 |
8013 186 if (bitcpy == winsize) \{ | 8060 185 if (bitcpy == winsize) \{ |
8014 187 /* ok window is filled so square as required and multiply */ | 8061 186 /* ok window is filled so square as required and multiply */ |
8015 188 /* square first */ | 8062 187 /* square first */ |
8016 189 for (x = 0; x < winsize; x++) \{ | 8063 188 for (x = 0; x < winsize; x++) \{ |
8017 190 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ | 8064 189 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ |
8018 191 goto LBL_RES; | 8065 190 goto LBL_RES; |
8019 192 \} | 8066 191 \} |
8020 193 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ | 8067 192 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8021 194 goto LBL_RES; | 8068 193 goto LBL_RES; |
8022 195 \} | 8069 194 \} |
8023 196 \} | 8070 195 \} |
8024 197 | 8071 196 |
8025 198 /* then multiply */ | 8072 197 /* then multiply */ |
8026 199 if ((err = mp_mul (&res, &M[bitbuf], &res)) != MP_OKAY) \{ | 8073 198 if ((err = mp_mul (&res, &M[bitbuf], &res)) != MP_OKAY) \{ |
8027 200 goto LBL_RES; | 8074 199 goto LBL_RES; |
8028 201 \} | 8075 200 \} |
8029 202 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ | 8076 201 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8030 203 goto LBL_RES; | 8077 202 goto LBL_RES; |
8031 204 \} | 8078 203 \} |
8032 205 | 8079 204 |
8033 206 /* empty window and reset */ | 8080 205 /* empty window and reset */ |
8034 207 bitcpy = 0; | 8081 206 bitcpy = 0; |
8035 208 bitbuf = 0; | 8082 207 bitbuf = 0; |
8036 209 mode = 1; | 8083 208 mode = 1; |
8037 210 \} | 8084 209 \} |
8038 211 \} | 8085 210 \} |
8039 212 | 8086 211 |
8040 213 /* if bits remain then square/multiply */ | 8087 212 /* if bits remain then square/multiply */ |
8041 214 if (mode == 2 && bitcpy > 0) \{ | 8088 213 if (mode == 2 && bitcpy > 0) \{ |
8042 215 /* square then multiply if the bit is set */ | 8089 214 /* square then multiply if the bit is set */ |
8043 216 for (x = 0; x < bitcpy; x++) \{ | 8090 215 for (x = 0; x < bitcpy; x++) \{ |
8044 217 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ | 8091 216 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ |
8045 218 goto LBL_RES; | 8092 217 goto LBL_RES; |
8046 219 \} | 8093 218 \} |
8047 220 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ | 8094 219 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8048 221 goto LBL_RES; | 8095 220 goto LBL_RES; |
8049 222 \} | 8096 221 \} |
8050 223 | 8097 222 |
8051 224 bitbuf <<= 1; | 8098 223 bitbuf <<= 1; |
8052 225 if ((bitbuf & (1 << winsize)) != 0) \{ | 8099 224 if ((bitbuf & (1 << winsize)) != 0) \{ |
8053 226 /* then multiply */ | 8100 225 /* then multiply */ |
8054 227 if ((err = mp_mul (&res, &M[1], &res)) != MP_OKAY) \{ | 8101 226 if ((err = mp_mul (&res, &M[1], &res)) != MP_OKAY) \{ |
8055 228 goto LBL_RES; | 8102 227 goto LBL_RES; |
8056 229 \} | 8103 228 \} |
8057 230 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ | 8104 229 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8058 231 goto LBL_RES; | 8105 230 goto LBL_RES; |
8059 232 \} | 8106 231 \} |
8060 233 \} | 8107 232 \} |
8061 234 \} | 8108 233 \} |
8062 235 \} | 8109 234 \} |
8063 236 | 8110 235 |
8064 237 mp_exch (&res, Y); | 8111 236 mp_exch (&res, Y); |
8065 238 err = MP_OKAY; | 8112 237 err = MP_OKAY; |
8066 239 LBL_RES:mp_clear (&res); | 8113 238 LBL_RES:mp_clear (&res); |
8067 240 LBL_MU:mp_clear (&mu); | 8114 239 LBL_MU:mp_clear (&mu); |
8068 241 LBL_M: | 8115 240 LBL_M: |
8069 242 mp_clear(&M[1]); | 8116 241 mp_clear(&M[1]); |
8070 243 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ | 8117 242 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ |
8071 244 mp_clear (&M[x]); | 8118 243 mp_clear (&M[x]); |
8072 245 \} | 8119 244 \} |
8073 246 return err; | 8120 245 return err; |
8074 247 \} | 8121 246 \} |
8075 248 #endif | 8122 247 #endif |
8123 248 | |
8076 \end{alltt} | 8124 \end{alltt} |
8077 \end{small} | 8125 \end{small} |
8078 | 8126 |
8079 Lines 21 through 40 determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted | 8127 Lines 31 through 45 determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted |
8080 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement | 8128 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement |
8081 on line 32 the value of $x$ is already known to be greater than $140$. | 8129 on line 37 the value of $x$ is already known to be greater than $140$. |
8082 | 8130 |
8083 The conditional piece of code beginning on line 48 allows the window size to be restricted to five bits. This logic is used to ensure | 8131 The conditional piece of code beginning on line 47 allows the window size to be restricted to five bits. This logic is used to ensure |
8084 the table of precomputed powers of $G$ remains relatively small. | 8132 the table of precomputed powers of $G$ remains relatively small. |
8085 | 8133 |
8086 The for loop on line 61 initializes the $M$ array while lines 62 and 77 compute the value of $\mu$ required for | 8134 The for loop on line 60 initializes the $M$ array while lines 71 and 76 through 85 initialize the reduction |
8087 Barrett reduction. | 8135 function that will be used for this modulus. |
8088 | 8136 |
8089 -- More later. | 8137 -- More later. |
8090 | 8138 |
8091 \section{Quick Power of Two} | 8139 \section{Quick Power of Two} |
8092 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is | 8140 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is |
8144 039 a->dp[b / DIGIT_BIT] = ((mp_digit)1) << (b % DIGIT_BIT); | 8192 039 a->dp[b / DIGIT_BIT] = ((mp_digit)1) << (b % DIGIT_BIT); |
8145 040 | 8193 040 |
8146 041 return MP_OKAY; | 8194 041 return MP_OKAY; |
8147 042 \} | 8195 042 \} |
8148 043 #endif | 8196 043 #endif |
8197 044 | |
8149 \end{alltt} | 8198 \end{alltt} |
8150 \end{small} | 8199 \end{small} |
8151 | 8200 |
8152 \chapter{Higher Level Algorithms} | 8201 \chapter{Higher Level Algorithms} |
8153 | 8202 |
8664 283 \} | 8713 283 \} |
8665 284 | 8714 284 |
8666 285 #endif | 8715 285 #endif |
8667 286 | 8716 286 |
8668 287 #endif | 8717 287 #endif |
8718 288 | |
8669 \end{alltt} | 8719 \end{alltt} |
8670 \end{small} | 8720 \end{small} |
8671 | 8721 |
8672 The implementation of this algorithm differs slightly from the pseudo code presented previously. In this algorithm either of the quotient $c$ or | 8722 The implementation of this algorithm differs slightly from the pseudo code presented previously. In this algorithm either of the quotient $c$ or |
8673 remainder $d$ may be passed as a \textbf{NULL} pointer which indicates their value is not desired. For example, the C code to call the division | 8723 remainder $d$ may be passed as a \textbf{NULL} pointer which indicates their value is not desired. For example, the C code to call the division |
8675 | 8725 |
8676 \begin{verbatim} | 8726 \begin{verbatim} |
8677 mp_div(&a, &b, &c, NULL); /* c = [a/b] */ | 8727 mp_div(&a, &b, &c, NULL); /* c = [a/b] */ |
8678 \end{verbatim} | 8728 \end{verbatim} |
8679 | 8729 |
8680 Lines 37 and 44 handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor | 8730 Lines 108 and 113 handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor |
8681 respectively. After the two trivial cases all of the temporary variables are initialized. Line 105 determines the sign of | 8731 respectively. After the two trivial cases all of the temporary variables are initialized. Line 147 determines the sign of |
8682 the quotient and line 76 ensures that both $x$ and $y$ are positive. | 8732 the quotient and line 148 ensures that both $x$ and $y$ are positive. |
8683 | 8733 |
8684 The number of bits in the leading digit is calculated on line 105. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits | 8734 The number of bits in the leading digit is calculated on line 151. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits |
8685 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is | 8735 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is |
8686 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting | 8736 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting |
8687 them to the left by $lg(\beta) - 1 - k$ bits. | 8737 them to the left by $lg(\beta) - 1 - k$ bits. |
8688 | 8738 |
8689 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the | 8739 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the |
8690 leading digit of the quotient. The loop beginning on line 183 will produce the remainder of the quotient digits. | 8740 leading digit of the quotient. The loop beginning on line 184 will produce the remainder of the quotient digits. |
8691 | 8741 |
8692 The conditional ``continue'' on line 114 is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the | 8742 The conditional ``continue'' on line 186 is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the |
8693 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits | 8743 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits |
8694 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}. | 8744 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}. |
8695 | 8745 |
8696 Lines 130, 130 and 134 through 134 manually construct the high accuracy estimations by setting the digits of the two mp\_int | 8746 Lines 214, 216 and 222 through 225 manually construct the high accuracy estimations by setting the digits of the two mp\_int |
8697 variables directly. | 8747 variables directly. |
8698 | 8748 |
8699 \section{Single Digit Helpers} | 8749 \section{Single Digit Helpers} |
8700 | 8750 |
8701 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of | 8751 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of |
8755 037 res = mp_sub_d(a, b, c); | 8805 037 res = mp_sub_d(a, b, c); |
8756 038 | 8806 038 |
8757 039 /* fix sign */ | 8807 039 /* fix sign */ |
8758 040 a->sign = c->sign = MP_NEG; | 8808 040 a->sign = c->sign = MP_NEG; |
8759 041 | 8809 041 |
8760 042 return res; | 8810 042 /* clamp */ |
8761 043 \} | 8811 043 mp_clamp(c); |
8762 044 | 8812 044 |
8763 045 /* old number of used digits in c */ | 8813 045 return res; |
8764 046 oldused = c->used; | 8814 046 \} |
8765 047 | 8815 047 |
8766 048 /* sign always positive */ | 8816 048 /* old number of used digits in c */ |
8767 049 c->sign = MP_ZPOS; | 8817 049 oldused = c->used; |
8768 050 | 8818 050 |
8769 051 /* source alias */ | 8819 051 /* sign always positive */ |
8770 052 tmpa = a->dp; | 8820 052 c->sign = MP_ZPOS; |
8771 053 | 8821 053 |
8772 054 /* destination alias */ | 8822 054 /* source alias */ |
8773 055 tmpc = c->dp; | 8823 055 tmpa = a->dp; |
8774 056 | 8824 056 |
8775 057 /* if a is positive */ | 8825 057 /* destination alias */ |
8776 058 if (a->sign == MP_ZPOS) \{ | 8826 058 tmpc = c->dp; |
8777 059 /* add digit, after this we're propagating | 8827 059 |
8778 060 * the carry. | 8828 060 /* if a is positive */ |
8779 061 */ | 8829 061 if (a->sign == MP_ZPOS) \{ |
8780 062 *tmpc = *tmpa++ + b; | 8830 062 /* add digit, after this we're propagating |
8781 063 mu = *tmpc >> DIGIT_BIT; | 8831 063 * the carry. |
8782 064 *tmpc++ &= MP_MASK; | 8832 064 */ |
8783 065 | 8833 065 *tmpc = *tmpa++ + b; |
8784 066 /* now handle rest of the digits */ | 8834 066 mu = *tmpc >> DIGIT_BIT; |
8785 067 for (ix = 1; ix < a->used; ix++) \{ | 8835 067 *tmpc++ &= MP_MASK; |
8786 068 *tmpc = *tmpa++ + mu; | 8836 068 |
8787 069 mu = *tmpc >> DIGIT_BIT; | 8837 069 /* now handle rest of the digits */ |
8788 070 *tmpc++ &= MP_MASK; | 8838 070 for (ix = 1; ix < a->used; ix++) \{ |
8789 071 \} | 8839 071 *tmpc = *tmpa++ + mu; |
8790 072 /* set final carry */ | 8840 072 mu = *tmpc >> DIGIT_BIT; |
8791 073 ix++; | 8841 073 *tmpc++ &= MP_MASK; |
8792 074 *tmpc++ = mu; | 8842 074 \} |
8793 075 | 8843 075 /* set final carry */ |
8794 076 /* setup size */ | 8844 076 ix++; |
8795 077 c->used = a->used + 1; | 8845 077 *tmpc++ = mu; |
8796 078 \} else \{ | 8846 078 |
8797 079 /* a was negative and |a| < b */ | 8847 079 /* setup size */ |
8798 080 c->used = 1; | 8848 080 c->used = a->used + 1; |
8799 081 | 8849 081 \} else \{ |
8800 082 /* the result is a single digit */ | 8850 082 /* a was negative and |a| < b */ |
8801 083 if (a->used == 1) \{ | 8851 083 c->used = 1; |
8802 084 *tmpc++ = b - a->dp[0]; | 8852 084 |
8803 085 \} else \{ | 8853 085 /* the result is a single digit */ |
8804 086 *tmpc++ = b; | 8854 086 if (a->used == 1) \{ |
8805 087 \} | 8855 087 *tmpc++ = b - a->dp[0]; |
8806 088 | 8856 088 \} else \{ |
8807 089 /* setup count so the clearing of oldused | 8857 089 *tmpc++ = b; |
8808 090 * can fall through correctly | 8858 090 \} |
8809 091 */ | 8859 091 |
8810 092 ix = 1; | 8860 092 /* setup count so the clearing of oldused |
8811 093 \} | 8861 093 * can fall through correctly |
8812 094 | 8862 094 */ |
8813 095 /* now zero to oldused */ | 8863 095 ix = 1; |
8814 096 while (ix++ < oldused) \{ | 8864 096 \} |
8815 097 *tmpc++ = 0; | 8865 097 |
8816 098 \} | 8866 098 /* now zero to oldused */ |
8817 099 mp_clamp(c); | 8867 099 while (ix++ < oldused) \{ |
8818 100 | 8868 100 *tmpc++ = 0; |
8819 101 return MP_OKAY; | 8869 101 \} |
8820 102 \} | 8870 102 mp_clamp(c); |
8821 103 | 8871 103 |
8822 104 #endif | 8872 104 return MP_OKAY; |
8873 105 \} | |
8874 106 | |
8875 107 #endif | |
8876 108 | |
8823 \end{alltt} | 8877 \end{alltt} |
8824 \end{small} | 8878 \end{small} |
8825 | 8879 |
8826 Clever use of the letter 't'. | 8880 Clever use of the letter 't'. |
8827 | 8881 |
8927 070 mp_clamp(c); | 8981 070 mp_clamp(c); |
8928 071 | 8982 071 |
8929 072 return MP_OKAY; | 8983 072 return MP_OKAY; |
8930 073 \} | 8984 073 \} |
8931 074 #endif | 8985 074 #endif |
8986 075 | |
8932 \end{alltt} | 8987 \end{alltt} |
8933 \end{small} | 8988 \end{small} |
8934 | 8989 |
8935 In this implementation the destination $c$ may point to the same mp\_int as the source $a$ since the result is written after the digit is | 8990 In this implementation the destination $c$ may point to the same mp\_int as the source $a$ since the result is written after the digit is |
8936 read from the source. This function uses pointer aliases $tmpa$ and $tmpc$ for the digits of $a$ and $c$ respectively. | 8991 read from the source. This function uses pointer aliases $tmpa$ and $tmpc$ for the digits of $a$ and $c$ respectively. |
9072 101 | 9127 101 |
9073 102 return res; | 9128 102 return res; |
9074 103 \} | 9129 103 \} |
9075 104 | 9130 104 |
9076 105 #endif | 9131 105 #endif |
9132 106 | |
9077 \end{alltt} | 9133 \end{alltt} |
9078 \end{small} | 9134 \end{small} |
9079 | 9135 |
9080 Like the implementation of algorithm mp\_div this algorithm allows either of the quotient or remainder to be passed as a \textbf{NULL} pointer to | 9136 Like the implementation of algorithm mp\_div this algorithm allows either of the quotient or remainder to be passed as a \textbf{NULL} pointer to |
9081 indicate the respective value is not required. This allows a trivial single digit modular reduction algorithm, mp\_mod\_d to be created. | 9137 indicate the respective value is not required. This allows a trivial single digit modular reduction algorithm, mp\_mod\_d to be created. |
9258 123 LBL_T2:mp_clear (&t2); | 9314 123 LBL_T2:mp_clear (&t2); |
9259 124 LBL_T1:mp_clear (&t1); | 9315 124 LBL_T1:mp_clear (&t1); |
9260 125 return res; | 9316 125 return res; |
9261 126 \} | 9317 126 \} |
9262 127 #endif | 9318 127 #endif |
9319 128 | |
9263 \end{alltt} | 9320 \end{alltt} |
9264 \end{small} | 9321 \end{small} |
9265 | 9322 |
9266 \section{Random Number Generation} | 9323 \section{Random Number Generation} |
9267 | 9324 |
9334 046 \} | 9391 046 \} |
9335 047 | 9392 047 |
9336 048 return MP_OKAY; | 9393 048 return MP_OKAY; |
9337 049 \} | 9394 049 \} |
9338 050 #endif | 9395 050 #endif |
9396 051 | |
9339 \end{alltt} | 9397 \end{alltt} |
9340 \end{small} | 9398 \end{small} |
9341 | 9399 |
9342 \section{Formatted Representations} | 9400 \section{Formatted Representations} |
9343 The ability to emit a radix-$n$ textual representation of an integer is useful for interacting with human parties. For example, the ability to | 9401 The ability to emit a radix-$n$ textual representation of an integer is useful for interacting with human parties. For example, the ability to |
9423 018 int mp_read_radix (mp_int * a, const char *str, int radix) | 9481 018 int mp_read_radix (mp_int * a, const char *str, int radix) |
9424 019 \{ | 9482 019 \{ |
9425 020 int y, res, neg; | 9483 020 int y, res, neg; |
9426 021 char ch; | 9484 021 char ch; |
9427 022 | 9485 022 |
9428 023 /* make sure the radix is ok */ | 9486 023 /* zero the digit bignum */ |
9429 024 if (radix < 2 || radix > 64) \{ | 9487 024 mp_zero(a); |
9430 025 return MP_VAL; | 9488 025 |
9431 026 \} | 9489 026 /* make sure the radix is ok */ |
9432 027 | 9490 027 if (radix < 2 || radix > 64) \{ |
9433 028 /* if the leading digit is a | 9491 028 return MP_VAL; |
9434 029 * minus set the sign to negative. | 9492 029 \} |
9435 030 */ | 9493 030 |
9436 031 if (*str == '-') \{ | 9494 031 /* if the leading digit is a |
9437 032 ++str; | 9495 032 * minus set the sign to negative. |
9438 033 neg = MP_NEG; | 9496 033 */ |
9439 034 \} else \{ | 9497 034 if (*str == '-') \{ |
9440 035 neg = MP_ZPOS; | 9498 035 ++str; |
9441 036 \} | 9499 036 neg = MP_NEG; |
9442 037 | 9500 037 \} else \{ |
9443 038 /* set the integer to the default of zero */ | 9501 038 neg = MP_ZPOS; |
9444 039 mp_zero (a); | 9502 039 \} |
9445 040 | 9503 040 |
9446 041 /* process each digit of the string */ | 9504 041 /* set the integer to the default of zero */ |
9447 042 while (*str) \{ | 9505 042 mp_zero (a); |
9448 043 /* if the radix < 36 the conversion is case insensitive | 9506 043 |
9449 044 * this allows numbers like 1AB and 1ab to represent the same value | 9507 044 /* process each digit of the string */ |
9450 045 * [e.g. in hex] | 9508 045 while (*str) \{ |
9451 046 */ | 9509 046 /* if the radix < 36 the conversion is case insensitive |
9452 047 ch = (char) ((radix < 36) ? toupper (*str) : *str); | 9510 047 * this allows numbers like 1AB and 1ab to represent the same value |
9453 048 for (y = 0; y < 64; y++) \{ | 9511 048 * [e.g. in hex] |
9454 049 if (ch == mp_s_rmap[y]) \{ | 9512 049 */ |
9455 050 break; | 9513 050 ch = (char) ((radix < 36) ? toupper (*str) : *str); |
9456 051 \} | 9514 051 for (y = 0; y < 64; y++) \{ |
9457 052 \} | 9515 052 if (ch == mp_s_rmap[y]) \{ |
9458 053 | 9516 053 break; |
9459 054 /* if the char was found in the map | 9517 054 \} |
9460 055 * and is less than the given radix add it | 9518 055 \} |
9461 056 * to the number, otherwise exit the loop. | 9519 056 |
9462 057 */ | 9520 057 /* if the char was found in the map |
9463 058 if (y < radix) \{ | 9521 058 * and is less than the given radix add it |
9464 059 if ((res = mp_mul_d (a, (mp_digit) radix, a)) != MP_OKAY) \{ | 9522 059 * to the number, otherwise exit the loop. |
9465 060 return res; | 9523 060 */ |
9466 061 \} | 9524 061 if (y < radix) \{ |
9467 062 if ((res = mp_add_d (a, (mp_digit) y, a)) != MP_OKAY) \{ | 9525 062 if ((res = mp_mul_d (a, (mp_digit) radix, a)) != MP_OKAY) \{ |
9468 063 return res; | 9526 063 return res; |
9469 064 \} | 9527 064 \} |
9470 065 \} else \{ | 9528 065 if ((res = mp_add_d (a, (mp_digit) y, a)) != MP_OKAY) \{ |
9471 066 break; | 9529 066 return res; |
9472 067 \} | 9530 067 \} |
9473 068 ++str; | 9531 068 \} else \{ |
9474 069 \} | 9532 069 break; |
9475 070 | 9533 070 \} |
9476 071 /* set the sign only if a != 0 */ | 9534 071 ++str; |
9477 072 if (mp_iszero(a) != 1) \{ | 9535 072 \} |
9478 073 a->sign = neg; | 9536 073 |
9479 074 \} | 9537 074 /* set the sign only if a != 0 */ |
9480 075 return MP_OKAY; | 9538 075 if (mp_iszero(a) != 1) \{ |
9481 076 \} | 9539 076 a->sign = neg; |
9482 077 #endif | 9540 077 \} |
9541 078 return MP_OKAY; | |
9542 079 \} | |
9543 080 #endif | |
9544 081 | |
9483 \end{alltt} | 9545 \end{alltt} |
9484 \end{small} | 9546 \end{small} |
9485 | 9547 |
9486 \subsection{Generating Radix-$n$ Output} | 9548 \subsection{Generating Radix-$n$ Output} |
9487 Generating radix-$n$ output is fairly trivial with a division and remainder algorithm. | 9549 Generating radix-$n$ output is fairly trivial with a division and remainder algorithm. |
9597 066 mp_clear (&t); | 9659 066 mp_clear (&t); |
9598 067 return MP_OKAY; | 9660 067 return MP_OKAY; |
9599 068 \} | 9661 068 \} |
9600 069 | 9662 069 |
9601 070 #endif | 9663 070 #endif |
9664 071 | |
9602 \end{alltt} | 9665 \end{alltt} |
9603 \end{small} | 9666 \end{small} |
9604 | 9667 |
9605 \chapter{Number Theoretic Algorithms} | 9668 \chapter{Number Theoretic Algorithms} |
9606 This chapter discusses several fundamental number theoretic algorithms such as the greatest common divisor, least common multiple and Jacobi | 9669 This chapter discusses several fundamental number theoretic algorithms such as the greatest common divisor, least common multiple and Jacobi |
9726 \begin{tabular}{l} | 9789 \begin{tabular}{l} |
9727 \hline Algorithm \textbf{mp\_gcd}. \\ | 9790 \hline Algorithm \textbf{mp\_gcd}. \\ |
9728 \textbf{Input}. mp\_int $a$ and $b$ \\ | 9791 \textbf{Input}. mp\_int $a$ and $b$ \\ |
9729 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\ | 9792 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\ |
9730 \hline \\ | 9793 \hline \\ |
9731 1. If $a = 0$ and $b \ne 0$ then \\ | 9794 1. If $a = 0$ then \\ |
9732 \hspace{3mm}1.1 $c \leftarrow b$ \\ | 9795 \hspace{3mm}1.1 $c \leftarrow \vert b \vert $ \\ |
9733 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\ | 9796 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\ |
9734 2. If $a \ne 0$ and $b = 0$ then \\ | 9797 2. If $b = 0$ then \\ |
9735 \hspace{3mm}2.1 $c \leftarrow a$ \\ | 9798 \hspace{3mm}2.1 $c \leftarrow \vert a \vert $ \\ |
9736 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\ | 9799 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\ |
9737 3. If $a = b = 0$ then \\ | 9800 3. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\ |
9738 \hspace{3mm}3.1 $c \leftarrow 1$ \\ | 9801 4. $k \leftarrow 0$ \\ |
9739 \hspace{3mm}3.2 Return(\textit{MP\_OKAY}). \\ | 9802 5. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
9740 4. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\ | 9803 \hspace{3mm}5.1 $k \leftarrow k + 1$ \\ |
9741 5. $k \leftarrow 0$ \\ | 9804 \hspace{3mm}5.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\ |
9742 6. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 9805 \hspace{3mm}5.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\ |
9743 \hspace{3mm}6.1 $k \leftarrow k + 1$ \\ | 9806 6. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
9744 \hspace{3mm}6.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\ | 9807 \hspace{3mm}6.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\ |
9745 \hspace{3mm}6.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\ | 9808 7. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
9746 7. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 9809 \hspace{3mm}7.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ |
9747 \hspace{3mm}7.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\ | 9810 8. While $v.used > 0$ \\ |
9748 8. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 9811 \hspace{3mm}8.1 If $\vert u \vert > \vert v \vert$ then \\ |
9749 \hspace{3mm}8.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ | 9812 \hspace{6mm}8.1.1 Swap $u$ and $v$. \\ |
9750 9. While $v.used > 0$ \\ | 9813 \hspace{3mm}8.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\ |
9751 \hspace{3mm}9.1 If $\vert u \vert > \vert v \vert$ then \\ | 9814 \hspace{3mm}8.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ |
9752 \hspace{6mm}9.1.1 Swap $u$ and $v$. \\ | 9815 \hspace{6mm}8.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ |
9753 \hspace{3mm}9.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\ | 9816 9. $c \leftarrow u \cdot 2^k$ \\ |
9754 \hspace{3mm}9.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ | 9817 10. Return(\textit{MP\_OKAY}). \\ |
9755 \hspace{6mm}9.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ | |
9756 10. $c \leftarrow u \cdot 2^k$ \\ | |
9757 11. Return(\textit{MP\_OKAY}). \\ | |
9758 \hline | 9818 \hline |
9759 \end{tabular} | 9819 \end{tabular} |
9760 \end{center} | 9820 \end{center} |
9761 \end{small} | 9821 \end{small} |
9762 \caption{Algorithm mp\_gcd} | 9822 \caption{Algorithm mp\_gcd} |
9764 \textbf{Algorithm mp\_gcd.} | 9824 \textbf{Algorithm mp\_gcd.} |
9765 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of | 9825 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of |
9766 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as | 9826 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as |
9767 Algorithm B and in practice this appears to be true. | 9827 Algorithm B and in practice this appears to be true. |
9768 | 9828 |
9769 The first three steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the | 9829 The first two steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the |
9770 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of | 9830 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of |
9771 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair. | 9831 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair. |
9772 | 9832 |
9773 Step six will divide out any common factors of two and keep track of the count in the variable $k$. After this step two is no longer a | 9833 Step five will divide out any common factors of two and keep track of the count in the variable $k$. After this step, two is no longer a |
9774 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step | 9834 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step |
9775 seven and eight ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while loops will iterate since | 9835 six and seven ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while--loops will iterate since |
9776 they cannot both be even. | 9836 they cannot both be even. |
9777 | 9837 |
9778 By step nine both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to | 9838 By step eight both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to |
9779 or greater than $u$. This ensures that the subtraction on step 9.2 will always produce a positive and even result. Step 9.3 removes any | 9839 or greater than $u$. This ensures that the subtraction on step 8.2 will always produce a positive and even result. Step 8.3 removes any |
9780 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd. | 9840 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd. |
9781 | 9841 |
9782 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result | 9842 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result |
9783 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier. | 9843 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier. |
9784 | 9844 |
9792 019 \{ | 9852 019 \{ |
9793 020 mp_int u, v; | 9853 020 mp_int u, v; |
9794 021 int k, u_lsb, v_lsb, res; | 9854 021 int k, u_lsb, v_lsb, res; |
9795 022 | 9855 022 |
9796 023 /* either zero than gcd is the largest */ | 9856 023 /* either zero than gcd is the largest */ |
9797 024 if (mp_iszero (a) == 1 && mp_iszero (b) == 0) \{ | 9857 024 if (mp_iszero (a) == MP_YES) \{ |
9798 025 return mp_abs (b, c); | 9858 025 return mp_abs (b, c); |
9799 026 \} | 9859 026 \} |
9800 027 if (mp_iszero (a) == 0 && mp_iszero (b) == 1) \{ | 9860 027 if (mp_iszero (b) == MP_YES) \{ |
9801 028 return mp_abs (a, c); | 9861 028 return mp_abs (a, c); |
9802 029 \} | 9862 029 \} |
9803 030 | 9863 030 |
9804 031 /* optimized. At this point if a == 0 then | 9864 031 /* get copies of a and b we can modify */ |
9805 032 * b must equal zero too | 9865 032 if ((res = mp_init_copy (&u, a)) != MP_OKAY) \{ |
9806 033 */ | 9866 033 return res; |
9807 034 if (mp_iszero (a) == 1) \{ | 9867 034 \} |
9808 035 mp_zero(c); | 9868 035 |
9809 036 return MP_OKAY; | 9869 036 if ((res = mp_init_copy (&v, b)) != MP_OKAY) \{ |
9810 037 \} | 9870 037 goto LBL_U; |
9811 038 | 9871 038 \} |
9812 039 /* get copies of a and b we can modify */ | 9872 039 |
9813 040 if ((res = mp_init_copy (&u, a)) != MP_OKAY) \{ | 9873 040 /* must be positive for the remainder of the algorithm */ |
9814 041 return res; | 9874 041 u.sign = v.sign = MP_ZPOS; |
9815 042 \} | 9875 042 |
9816 043 | 9876 043 /* B1. Find the common power of two for u and v */ |
9817 044 if ((res = mp_init_copy (&v, b)) != MP_OKAY) \{ | 9877 044 u_lsb = mp_cnt_lsb(&u); |
9818 045 goto LBL_U; | 9878 045 v_lsb = mp_cnt_lsb(&v); |
9819 046 \} | 9879 046 k = MIN(u_lsb, v_lsb); |
9820 047 | 9880 047 |
9821 048 /* must be positive for the remainder of the algorithm */ | 9881 048 if (k > 0) \{ |
9822 049 u.sign = v.sign = MP_ZPOS; | 9882 049 /* divide the power of two out */ |
9823 050 | 9883 050 if ((res = mp_div_2d(&u, k, &u, NULL)) != MP_OKAY) \{ |
9824 051 /* B1. Find the common power of two for u and v */ | 9884 051 goto LBL_V; |
9825 052 u_lsb = mp_cnt_lsb(&u); | 9885 052 \} |
9826 053 v_lsb = mp_cnt_lsb(&v); | 9886 053 |
9827 054 k = MIN(u_lsb, v_lsb); | 9887 054 if ((res = mp_div_2d(&v, k, &v, NULL)) != MP_OKAY) \{ |
9828 055 | 9888 055 goto LBL_V; |
9829 056 if (k > 0) \{ | 9889 056 \} |
9830 057 /* divide the power of two out */ | 9890 057 \} |
9831 058 if ((res = mp_div_2d(&u, k, &u, NULL)) != MP_OKAY) \{ | 9891 058 |
9832 059 goto LBL_V; | 9892 059 /* divide any remaining factors of two out */ |
9833 060 \} | 9893 060 if (u_lsb != k) \{ |
9834 061 | 9894 061 if ((res = mp_div_2d(&u, u_lsb - k, &u, NULL)) != MP_OKAY) \{ |
9835 062 if ((res = mp_div_2d(&v, k, &v, NULL)) != MP_OKAY) \{ | 9895 062 goto LBL_V; |
9836 063 goto LBL_V; | 9896 063 \} |
9837 064 \} | 9897 064 \} |
9838 065 \} | 9898 065 |
9839 066 | 9899 066 if (v_lsb != k) \{ |
9840 067 /* divide any remaining factors of two out */ | 9900 067 if ((res = mp_div_2d(&v, v_lsb - k, &v, NULL)) != MP_OKAY) \{ |
9841 068 if (u_lsb != k) \{ | 9901 068 goto LBL_V; |
9842 069 if ((res = mp_div_2d(&u, u_lsb - k, &u, NULL)) != MP_OKAY) \{ | 9902 069 \} |
9843 070 goto LBL_V; | 9903 070 \} |
9844 071 \} | 9904 071 |
9845 072 \} | 9905 072 while (mp_iszero(&v) == 0) \{ |
9846 073 | 9906 073 /* make sure v is the largest */ |
9847 074 if (v_lsb != k) \{ | 9907 074 if (mp_cmp_mag(&u, &v) == MP_GT) \{ |
9848 075 if ((res = mp_div_2d(&v, v_lsb - k, &v, NULL)) != MP_OKAY) \{ | 9908 075 /* swap u and v to make sure v is >= u */ |
9849 076 goto LBL_V; | 9909 076 mp_exch(&u, &v); |
9850 077 \} | 9910 077 \} |
9851 078 \} | 9911 078 |
9852 079 | 9912 079 /* subtract smallest from largest */ |
9853 080 while (mp_iszero(&v) == 0) \{ | 9913 080 if ((res = s_mp_sub(&v, &u, &v)) != MP_OKAY) \{ |
9854 081 /* make sure v is the largest */ | 9914 081 goto LBL_V; |
9855 082 if (mp_cmp_mag(&u, &v) == MP_GT) \{ | 9915 082 \} |
9856 083 /* swap u and v to make sure v is >= u */ | 9916 083 |
9857 084 mp_exch(&u, &v); | 9917 084 /* Divide out all factors of two */ |
9858 085 \} | 9918 085 if ((res = mp_div_2d(&v, mp_cnt_lsb(&v), &v, NULL)) != MP_OKAY) \{ |
9859 086 | 9919 086 goto LBL_V; |
9860 087 /* subtract smallest from largest */ | 9920 087 \} |
9861 088 if ((res = s_mp_sub(&v, &u, &v)) != MP_OKAY) \{ | 9921 088 \} |
9862 089 goto LBL_V; | 9922 089 |
9863 090 \} | 9923 090 /* multiply by 2**k which we divided out at the beginning */ |
9864 091 | 9924 091 if ((res = mp_mul_2d (&u, k, c)) != MP_OKAY) \{ |
9865 092 /* Divide out all factors of two */ | 9925 092 goto LBL_V; |
9866 093 if ((res = mp_div_2d(&v, mp_cnt_lsb(&v), &v, NULL)) != MP_OKAY) \{ | 9926 093 \} |
9867 094 goto LBL_V; | 9927 094 c->sign = MP_ZPOS; |
9868 095 \} | 9928 095 res = MP_OKAY; |
9869 096 \} | 9929 096 LBL_V:mp_clear (&u); |
9870 097 | 9930 097 LBL_U:mp_clear (&v); |
9871 098 /* multiply by 2**k which we divided out at the beginning */ | 9931 098 return res; |
9872 099 if ((res = mp_mul_2d (&u, k, c)) != MP_OKAY) \{ | 9932 099 \} |
9873 100 goto LBL_V; | 9933 100 #endif |
9874 101 \} | 9934 101 |
9875 102 c->sign = MP_ZPOS; | |
9876 103 res = MP_OKAY; | |
9877 104 LBL_V:mp_clear (&u); | |
9878 105 LBL_U:mp_clear (&v); | |
9879 106 return res; | |
9880 107 \} | |
9881 108 #endif | |
9882 \end{alltt} | 9935 \end{alltt} |
9883 \end{small} | 9936 \end{small} |
9884 | 9937 |
9885 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the | 9938 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the |
9886 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise | 9939 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise |
9887 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three | 9940 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three |
9888 trivial cases of inputs are handled on lines 24 through 37. After those lines the inputs are assumed to be non-zero. | 9941 trivial cases of inputs are handled on lines 23 through 29. After those lines the inputs are assumed to be non-zero. |
9889 | 9942 |
9890 Lines 34 and 40 make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two | 9943 Lines 32 and 36 make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two |
9891 must be divided out of the two inputs. The while loop on line 80 iterates so long as both are even. The local integer $k$ is used to | 9944 must be divided out of the two inputs. The block starting at line 43 removes common factors of two by first counting the number of trailing |
9892 keep track of how many factors of $2$ are pulled out of both values. It is assumed that the number of factors will not exceed the maximum | 9945 zero bits in both. The local integer $k$ is used to keep track of how many factors of $2$ are pulled out of both values. It is assumed that |
9893 value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than entries than are accessible by an ``int'' so this is not | 9946 the number of factors will not exceed the maximum value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than |
9894 a limitation.}. | 9947 entries than are accessible by an ``int'' so this is not a limitation.}. |
9895 | 9948 |
9896 At this point there are no more common factors of two in the two values. The while loops on lines 80 and 80 remove any independent | 9949 At this point there are no more common factors of two in the two values. The divisions by a power of two on lines 61 and 67 remove |
9897 factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop | 9950 any independent factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop |
9898 on line 80 performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in | 9951 on line 72 performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in |
9899 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative. | 9952 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative. |
9900 | 9953 |
9901 \section{Least Common Multiple} | 9954 \section{Least Common Multiple} |
9902 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the | 9955 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the |
9903 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$ | 9956 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$ |
9972 051 LBL_T: | 10025 051 LBL_T: |
9973 052 mp_clear_multi (&t1, &t2, NULL); | 10026 052 mp_clear_multi (&t1, &t2, NULL); |
9974 053 return res; | 10027 053 return res; |
9975 054 \} | 10028 054 \} |
9976 055 #endif | 10029 055 #endif |
10030 056 | |
9977 \end{alltt} | 10031 \end{alltt} |
9978 \end{small} | 10032 \end{small} |
9979 | 10033 |
9980 \section{Jacobi Symbol Computation} | 10034 \section{Jacobi Symbol Computation} |
9981 To explain the Jacobi Symbol we shall first discuss the Legendre function\footnote{Arrg. What is the name of this?} off which the Jacobi symbol is | 10035 To explain the Jacobi Symbol we shall first discuss the Legendre function\footnote{Arrg. What is the name of this?} off which the Jacobi symbol is |
10216 096 LBL_P1:mp_clear (&p1); | 10270 096 LBL_P1:mp_clear (&p1); |
10217 097 LBL_A1:mp_clear (&a1); | 10271 097 LBL_A1:mp_clear (&a1); |
10218 098 return res; | 10272 098 return res; |
10219 099 \} | 10273 099 \} |
10220 100 #endif | 10274 100 #endif |
10275 101 | |
10221 \end{alltt} | 10276 \end{alltt} |
10222 \end{small} | 10277 \end{small} |
10223 | 10278 |
10224 As a matter of practicality the variable $a'$ as per the pseudo-code is reprensented by the variable $a1$ since the $'$ symbol is not valid for a C | 10279 As a matter of practicality the variable $a'$ as per the pseudo-code is reprensented by the variable $a1$ since the $'$ symbol is not valid for a C |
10225 variable name character. | 10280 variable name character. |
10364 034 #endif | 10419 034 #endif |
10365 035 | 10420 035 |
10366 036 return MP_VAL; | 10421 036 return MP_VAL; |
10367 037 \} | 10422 037 \} |
10368 038 #endif | 10423 038 #endif |
10424 039 | |
10369 \end{alltt} | 10425 \end{alltt} |
10370 \end{small} | 10426 \end{small} |
10371 | 10427 |
10372 \subsubsection{Odd Moduli} | 10428 \subsubsection{Odd Moduli} |
10373 | 10429 |
10465 041 \} | 10521 041 \} |
10466 042 | 10522 042 |
10467 043 return MP_OKAY; | 10523 043 return MP_OKAY; |
10468 044 \} | 10524 044 \} |
10469 045 #endif | 10525 045 #endif |
10526 046 | |
10470 \end{alltt} | 10527 \end{alltt} |
10471 \end{small} | 10528 \end{small} |
10472 | 10529 |
10473 The algorithm defaults to a return of $0$ in case an error occurs. The values in the prime table are all specified to be in the range of a | 10530 The algorithm defaults to a return of $0$ in case an error occurs. The values in the prime table are all specified to be in the range of a |
10474 mp\_digit. The table \_\_prime\_tab is defined in the following file. | 10531 mp\_digit. The table \_\_prime\_tab is defined in the following file. |
10516 052 0x05F3, 0x05FB, 0x0607, 0x060D, 0x0611, 0x0617, 0x061F, 0x0623, | 10573 052 0x05F3, 0x05FB, 0x0607, 0x060D, 0x0611, 0x0617, 0x061F, 0x0623, |
10517 053 0x062B, 0x062F, 0x063D, 0x0641, 0x0647, 0x0649, 0x064D, 0x0653 | 10574 053 0x062B, 0x062F, 0x063D, 0x0641, 0x0647, 0x0649, 0x064D, 0x0653 |
10518 054 #endif | 10575 054 #endif |
10519 055 \}; | 10576 055 \}; |
10520 056 #endif | 10577 056 #endif |
10578 057 | |
10521 \end{alltt} | 10579 \end{alltt} |
10522 \end{small} | 10580 \end{small} |
10523 | 10581 |
10524 Note that there are two possible tables. When an mp\_digit is 7-bits long only the primes upto $127$ may be included, otherwise the primes | 10582 Note that there are two possible tables. When an mp\_digit is 7-bits long only the primes upto $127$ may be included, otherwise the primes |
10525 upto $1619$ are used. Note that the value of \textbf{PRIME\_SIZE} is a constant dependent on the size of a mp\_digit. | 10583 upto $1619$ are used. Note that the value of \textbf{PRIME\_SIZE} is a constant dependent on the size of a mp\_digit. |
10604 053 err = MP_OKAY; | 10662 053 err = MP_OKAY; |
10605 054 LBL_T:mp_clear (&t); | 10663 054 LBL_T:mp_clear (&t); |
10606 055 return err; | 10664 055 return err; |
10607 056 \} | 10665 056 \} |
10608 057 #endif | 10666 057 #endif |
10667 058 | |
10609 \end{alltt} | 10668 \end{alltt} |
10610 \end{small} | 10669 \end{small} |
10611 | 10670 |
10612 \subsection{The Miller-Rabin Test} | 10671 \subsection{The Miller-Rabin Test} |
10613 The Miller-Rabin (citation) test is another primality test which has tighter error bounds than the Fermat test specifically with sequentially chosen | 10672 The Miller-Rabin (citation) test is another primality test which has tighter error bounds than the Fermat test specifically with sequentially chosen |
10739 094 LBL_R:mp_clear (&r); | 10798 094 LBL_R:mp_clear (&r); |
10740 095 LBL_N1:mp_clear (&n1); | 10799 095 LBL_N1:mp_clear (&n1); |
10741 096 return err; | 10800 096 return err; |
10742 097 \} | 10801 097 \} |
10743 098 #endif | 10802 098 #endif |
10803 099 | |
10744 \end{alltt} | 10804 \end{alltt} |
10745 \end{small} | 10805 \end{small} |
10746 | 10806 |
10747 | 10807 |
10748 | 10808 |