comparison tommath.tex @ 386:97db060d0ef5 libtommath-orig libtommath-0.40

Update to LibTomMath 0.40
author Matt Johnston <matt@ucc.asn.au>
date Thu, 11 Jan 2007 03:11:15 +0000
parents 91fbc376f010
children
comparison
equal deleted inserted replaced
282:91fbc376f010 386:97db060d0ef5
64 \end{tabular} 64 \end{tabular}
65 %\end{small} 65 %\end{small}
66 } 66 }
67 } 67 }
68 \maketitle 68 \maketitle
69 This text has been placed in the public domain. This text corresponds to the v0.35 release of the 69 This text has been placed in the public domain. This text corresponds to the v0.39 release of the
70 LibTomMath project. 70 LibTomMath project.
71 71
72 \begin{alltt} 72 \begin{alltt}
73 Tom St Denis 73 Tom St Denis
74 111 Banning Rd 74 111 Banning Rd
75 Ottawa, Ontario 75 Ottawa, Ontario
76 K2L 1C3 76 K2L 1C3
77 Canada 77 Canada
78 78
79 Phone: 1-613-836-3160 79 Phone: 1-613-836-3160
80 Email: [email protected] 80 Email: [email protected]
81 \end{alltt} 81 \end{alltt}
82 82
83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} 83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{}
84 {\em book} macro package and the Perl {\em booker} package. 84 {\em book} macro package and the Perl {\em booker} package.
85 85
266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers 266 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers
267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve 267 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve
268 any form of useful performance in non-trivial applications. 268 any form of useful performance in non-trivial applications.
269 269
270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer 270 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer
271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.org}} package is used 271 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.com}} package is used
272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field 272 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field
273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text 273 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text
274 discusses a very large portion of the inner workings of the library. 274 discusses a very large portion of the inner workings of the library.
275 275
276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed 276 The algorithms that are presented will always include at least one ``pseudo-code'' description followed
812 037 a->sign = MP_ZPOS; 812 037 a->sign = MP_ZPOS;
813 038 813 038
814 039 return MP_OKAY; 814 039 return MP_OKAY;
815 040 \} 815 040 \}
816 041 #endif 816 041 #endif
817 042
817 \end{alltt} 818 \end{alltt}
818 \end{small} 819 \end{small}
819 820
820 One immediate observation of this initializtion function is that it does not return a pointer to a mp\_int structure. It 821 One immediate observation of this initializtion function is that it does not return a pointer to a mp\_int structure. It
821 is assumed that the caller has already allocated memory for the mp\_int structure, typically on the application stack. The 822 is assumed that the caller has already allocated memory for the mp\_int structure, typically on the application stack. The
900 035 a->alloc = a->used = 0; 901 035 a->alloc = a->used = 0;
901 036 a->sign = MP_ZPOS; 902 036 a->sign = MP_ZPOS;
902 037 \} 903 037 \}
903 038 \} 904 038 \}
904 039 #endif 905 039 #endif
906 040
905 \end{alltt} 907 \end{alltt}
906 \end{small} 908 \end{small}
907 909
908 The algorithm only operates on the mp\_int if it hasn't been previously cleared. The if statement (line 24) 910 The algorithm only operates on the mp\_int if it hasn't been previously cleared. The if statement (line 24)
909 checks to see if the \textbf{dp} member is not \textbf{NULL}. If the mp\_int is a valid mp\_int then \textbf{dp} cannot be 911 checks to see if the \textbf{dp} member is not \textbf{NULL}. If the mp\_int is a valid mp\_int then \textbf{dp} cannot be
1006 048 \} 1008 048 \}
1007 049 \} 1009 049 \}
1008 050 return MP_OKAY; 1010 050 return MP_OKAY;
1009 051 \} 1011 051 \}
1010 052 #endif 1012 052 #endif
1013 053
1011 \end{alltt} 1014 \end{alltt}
1012 \end{small} 1015 \end{small}
1013 1016
1014 A quick optimization is to first determine if a memory re-allocation is required at all. The if statement (line 24) checks 1017 A quick optimization is to first determine if a memory re-allocation is required at all. The if statement (line 24) checks
1015 if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count. If the count is not larger than \textbf{alloc} 1018 if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count. If the count is not larger than \textbf{alloc}
1094 039 \} 1097 039 \}
1095 040 1098 040
1096 041 return MP_OKAY; 1099 041 return MP_OKAY;
1097 042 \} 1100 042 \}
1098 043 #endif 1101 043 #endif
1102 044
1099 \end{alltt} 1103 \end{alltt}
1100 \end{small} 1104 \end{small}
1101 1105
1102 The number of digits $b$ requested is padded (line 23) by first augmenting it to the next multiple of 1106 The number of digits $b$ requested is padded (line 23) by first augmenting it to the next multiple of
1103 \textbf{MP\_PREC} and then adding \textbf{MP\_PREC} to the result. If the memory can be successfully allocated the 1107 \textbf{MP\_PREC} and then adding \textbf{MP\_PREC} to the result. If the memory can be successfully allocated the
1181 050 va_end(args); 1185 050 va_end(args);
1182 051 return res; /* Assumed ok, if error flagged above. */ 1186 051 return res; /* Assumed ok, if error flagged above. */
1183 052 \} 1187 052 \}
1184 053 1188 053
1185 054 #endif 1189 054 #endif
1190 055
1186 \end{alltt} 1191 \end{alltt}
1187 \end{small} 1192 \end{small}
1188 1193
1189 This function intializes a variable length list of mp\_int structure pointers. However, instead of having the mp\_int 1194 This function intializes a variable length list of mp\_int structure pointers. However, instead of having the mp\_int
1190 structures in an actual C array they are simply passed as arguments to the function. This function makes use of the 1195 structures in an actual C array they are simply passed as arguments to the function. This function makes use of the
1266 035 if (a->used == 0) \{ 1271 035 if (a->used == 0) \{
1267 036 a->sign = MP_ZPOS; 1272 036 a->sign = MP_ZPOS;
1268 037 \} 1273 037 \}
1269 038 \} 1274 038 \}
1270 039 #endif 1275 039 #endif
1276 040
1271 \end{alltt} 1277 \end{alltt}
1272 \end{small} 1278 \end{small}
1273 1279
1274 Note on line 27 how to test for the \textbf{used} count is made on the left of the \&\& operator. In the C programming 1280 Note on line 27 how to test for the \textbf{used} count is made on the left of the \&\& operator. In the C programming
1275 language the terms to \&\& are evaluated left to right with a boolean short-circuit if any condition fails. This is 1281 language the terms to \&\& are evaluated left to right with a boolean short-circuit if any condition fails. This is
1403 059 b->used = a->used; 1409 059 b->used = a->used;
1404 060 b->sign = a->sign; 1410 060 b->sign = a->sign;
1405 061 return MP_OKAY; 1411 061 return MP_OKAY;
1406 062 \} 1412 062 \}
1407 063 #endif 1413 063 #endif
1414 064
1408 \end{alltt} 1415 \end{alltt}
1409 \end{small} 1416 \end{small}
1410 1417
1411 Occasionally a dependent algorithm may copy an mp\_int effectively into itself such as when the input and output 1418 Occasionally a dependent algorithm may copy an mp\_int effectively into itself such as when the input and output
1412 mp\_int structures passed to a function are one and the same. For this case it is optimal to return immediately without 1419 mp\_int structures passed to a function are one and the same. For this case it is optimal to return immediately without
1517 023 return res; 1524 023 return res;
1518 024 \} 1525 024 \}
1519 025 return mp_copy (b, a); 1526 025 return mp_copy (b, a);
1520 026 \} 1527 026 \}
1521 027 #endif 1528 027 #endif
1529 028
1522 \end{alltt} 1530 \end{alltt}
1523 \end{small} 1531 \end{small}
1524 1532
1525 This will initialize \textbf{a} and make it a verbatim copy of the contents of \textbf{b}. Note that 1533 This will initialize \textbf{a} and make it a verbatim copy of the contents of \textbf{b}. Note that
1526 \textbf{a} will have its own memory allocated which means that \textbf{b} may be cleared after the call 1534 \textbf{a} will have its own memory allocated which means that \textbf{b} may be cleared after the call
1568 027 for (n = 0; n < a->alloc; n++) \{ 1576 027 for (n = 0; n < a->alloc; n++) \{
1569 028 *tmp++ = 0; 1577 028 *tmp++ = 0;
1570 029 \} 1578 029 \}
1571 030 \} 1579 030 \}
1572 031 #endif 1580 031 #endif
1581 032
1573 \end{alltt} 1582 \end{alltt}
1574 \end{small} 1583 \end{small}
1575 1584
1576 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the 1585 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the
1577 \textbf{sign} variable is set to \textbf{MP\_ZPOS}. 1586 \textbf{sign} variable is set to \textbf{MP\_ZPOS}.
1629 034 b->sign = MP_ZPOS; 1638 034 b->sign = MP_ZPOS;
1630 035 1639 035
1631 036 return MP_OKAY; 1640 036 return MP_OKAY;
1632 037 \} 1641 037 \}
1633 038 #endif 1642 038 #endif
1643 039
1634 \end{alltt} 1644 \end{alltt}
1635 \end{small} 1645 \end{small}
1636 1646
1637 This fairly trivial algorithm first eliminates non--required duplications (line 27) and then sets the 1647 This fairly trivial algorithm first eliminates non--required duplications (line 27) and then sets the
1638 \textbf{sign} flag to \textbf{MP\_ZPOS}. 1648 \textbf{sign} flag to \textbf{MP\_ZPOS}.
1690 031 \} 1700 031 \}
1691 032 1701 032
1692 033 return MP_OKAY; 1702 033 return MP_OKAY;
1693 034 \} 1703 034 \}
1694 035 #endif 1704 035 #endif
1705 036
1695 \end{alltt} 1706 \end{alltt}
1696 \end{small} 1707 \end{small}
1697 1708
1698 Like mp\_abs() this function avoids non--required duplications (line 21) and then sets the sign. We 1709 Like mp\_abs() this function avoids non--required duplications (line 21) and then sets the sign. We
1699 have to make sure that only non--zero values get a \textbf{sign} of \textbf{MP\_NEG}. If the mp\_int is zero 1710 have to make sure that only non--zero values get a \textbf{sign} of \textbf{MP\_NEG}. If the mp\_int is zero
1737 020 mp_zero (a); 1748 020 mp_zero (a);
1738 021 a->dp[0] = b & MP_MASK; 1749 021 a->dp[0] = b & MP_MASK;
1739 022 a->used = (a->dp[0] != 0) ? 1 : 0; 1750 022 a->used = (a->dp[0] != 0) ? 1 : 0;
1740 023 \} 1751 023 \}
1741 024 #endif 1752 024 #endif
1753 025
1742 \end{alltt} 1754 \end{alltt}
1743 \end{small} 1755 \end{small}
1744 1756
1745 First we zero (line 20) the mp\_int to make sure that the other members are initialized for a 1757 First we zero (line 20) the mp\_int to make sure that the other members are initialized for a
1746 small positive constant. mp\_zero() ensures that the \textbf{sign} is positive and the \textbf{used} count 1758 small positive constant. mp\_zero() ensures that the \textbf{sign} is positive and the \textbf{used} count
1817 039 \} 1829 039 \}
1818 040 mp_clamp (a); 1830 040 mp_clamp (a);
1819 041 return MP_OKAY; 1831 041 return MP_OKAY;
1820 042 \} 1832 042 \}
1821 043 #endif 1833 043 #endif
1834 044
1822 \end{alltt} 1835 \end{alltt}
1823 \end{small} 1836 \end{small}
1824 1837
1825 This function sets four bits of the number at a time to handle all practical \textbf{DIGIT\_BIT} sizes. The weird 1838 This function sets four bits of the number at a time to handle all practical \textbf{DIGIT\_BIT} sizes. The weird
1826 addition on line 38 ensures that the newly added in bits are added to the number of digits. While it may not 1839 addition on line 38 ensures that the newly added in bits are added to the number of digits. While it may not
1919 046 \} 1932 046 \}
1920 047 \} 1933 047 \}
1921 048 return MP_EQ; 1934 048 return MP_EQ;
1922 049 \} 1935 049 \}
1923 050 #endif 1936 050 #endif
1937 051
1924 \end{alltt} 1938 \end{alltt}
1925 \end{small} 1939 \end{small}
1926 1940
1927 The two if statements (lines 24 and 28) compare the number of digits in the two inputs. These two are 1941 The two if statements (lines 24 and 28) compare the number of digits in the two inputs. These two are
1928 performed before all of the digits are compared since it is a very cheap test to perform and can potentially save 1942 performed before all of the digits are compared since it is a very cheap test to perform and can potentially save
1985 034 \} else \{ 1999 034 \} else \{
1986 035 return mp_cmp_mag(a, b); 2000 035 return mp_cmp_mag(a, b);
1987 036 \} 2001 036 \}
1988 037 \} 2002 037 \}
1989 038 #endif 2003 038 #endif
2004 039
1990 \end{alltt} 2005 \end{alltt}
1991 \end{small} 2006 \end{small}
1992 2007
1993 The two if statements (lines 22 and 23) perform the initial sign comparison. If the signs are not the equal then which ever 2008 The two if statements (lines 22 and 23) perform the initial sign comparison. If the signs are not the equal then which ever
1994 has the positive sign is larger. The inputs are compared (line 31) based on magnitudes. If the signs were both 2009 has the positive sign is larger. The inputs are compared (line 31) based on magnitudes. If the signs were both
2203 100 2218 100
2204 101 mp_clamp (c); 2219 101 mp_clamp (c);
2205 102 return MP_OKAY; 2220 102 return MP_OKAY;
2206 103 \} 2221 103 \}
2207 104 #endif 2222 104 #endif
2223 105
2208 \end{alltt} 2224 \end{alltt}
2209 \end{small} 2225 \end{small}
2210 2226
2211 We first sort (lines 27 to 35) the inputs based on magnitude and determine the $min$ and $max$ variables. 2227 We first sort (lines 27 to 35) the inputs based on magnitude and determine the $min$ and $max$ variables.
2212 Note that $x$ is a pointer to an mp\_int assigned to the largest input, in effect it is a local alias. Next we 2228 Note that $x$ is a pointer to an mp\_int assigned to the largest input, in effect it is a local alias. Next we
2374 080 mp_clamp (c); 2390 080 mp_clamp (c);
2375 081 return MP_OKAY; 2391 081 return MP_OKAY;
2376 082 \} 2392 082 \}
2377 083 2393 083
2378 084 #endif 2394 084 #endif
2395 085
2379 \end{alltt} 2396 \end{alltt}
2380 \end{small} 2397 \end{small}
2381 2398
2382 Like low level addition we ``sort'' the inputs. Except in this case the sorting is hardcoded 2399 Like low level addition we ``sort'' the inputs. Except in this case the sorting is hardcoded
2383 (lines 24 and 25). In reality the $min$ and $max$ variables are only aliases and are only 2400 (lines 24 and 25). In reality the $min$ and $max$ variables are only aliases and are only
2509 044 \} 2526 044 \}
2510 045 return res; 2527 045 return res;
2511 046 \} 2528 046 \}
2512 047 2529 047
2513 048 #endif 2530 048 #endif
2531 049
2514 \end{alltt} 2532 \end{alltt}
2515 \end{small} 2533 \end{small}
2516 2534
2517 The source code follows the algorithm fairly closely. The most notable new source code addition is the usage of the $res$ integer variable which 2535 The source code follows the algorithm fairly closely. The most notable new source code addition is the usage of the $res$ integer variable which
2518 is used to pass result of the unsigned operations forward. Unlike in the algorithm, the variable $res$ is merely returned as is without 2536 is used to pass result of the unsigned operations forward. Unlike in the algorithm, the variable $res$ is merely returned as is without
2621 050 \} 2639 050 \}
2622 051 return res; 2640 051 return res;
2623 052 \} 2641 052 \}
2624 053 2642 053
2625 054 #endif 2643 054 #endif
2644 055
2626 \end{alltt} 2645 \end{alltt}
2627 \end{small} 2646 \end{small}
2628 2647
2629 Much like the implementation of algorithm mp\_add the variable $res$ is used to catch the return code of the unsigned addition or subtraction operations 2648 Much like the implementation of algorithm mp\_add the variable $res$ is used to catch the return code of the unsigned addition or subtraction operations
2630 and forward it to the end of the function. On line 38 the ``not equal to'' \textbf{MP\_LT} expression is used to emulate a 2649 and forward it to the end of the function. On line 38 the ``not equal to'' \textbf{MP\_LT} expression is used to emulate a
2755 073 \} 2774 073 \}
2756 074 b->sign = a->sign; 2775 074 b->sign = a->sign;
2757 075 return MP_OKAY; 2776 075 return MP_OKAY;
2758 076 \} 2777 076 \}
2759 077 #endif 2778 077 #endif
2779 078
2760 \end{alltt} 2780 \end{alltt}
2761 \end{small} 2781 \end{small}
2762 2782
2763 This implementation is essentially an optimized implementation of s\_mp\_add for the case of doubling an input. The only noteworthy difference 2783 This implementation is essentially an optimized implementation of s\_mp\_add for the case of doubling an input. The only noteworthy difference
2764 is the use of the logical shift operator on line 51 to perform a single precision doubling. 2784 is the use of the logical shift operator on line 51 to perform a single precision doubling.
2855 059 b->sign = a->sign; 2875 059 b->sign = a->sign;
2856 060 mp_clamp (b); 2876 060 mp_clamp (b);
2857 061 return MP_OKAY; 2877 061 return MP_OKAY;
2858 062 \} 2878 062 \}
2859 063 #endif 2879 063 #endif
2880 064
2860 \end{alltt} 2881 \end{alltt}
2861 \end{small} 2882 \end{small}
2862 2883
2863 \section{Polynomial Basis Operations} 2884 \section{Polynomial Basis Operations}
2864 Recall from section 4.3 that any integer can be represented as a polynomial in $x$ as $y = f(\beta)$. Such a representation is also known as 2885 Recall from section 4.3 that any integer can be represented as a polynomial in $x$ as $y = f(\beta)$. Such a representation is also known as
2975 058 \} 2996 058 \}
2976 059 \} 2997 059 \}
2977 060 return MP_OKAY; 2998 060 return MP_OKAY;
2978 061 \} 2999 061 \}
2979 062 #endif 3000 062 #endif
3001 063
2980 \end{alltt} 3002 \end{alltt}
2981 \end{small} 3003 \end{small}
2982 3004
2983 The if statement (line 23) ensures that the $b$ variable is greater than zero since we do not interpret negative 3005 The if statement (line 23) ensures that the $b$ variable is greater than zero since we do not interpret negative
2984 shift counts properly. The \textbf{used} count is incremented by $b$ before the copy loop begins. This elminates 3006 shift counts properly. The \textbf{used} count is incremented by $b$ before the copy loop begins. This elminates
3086 063 3108 063
3087 064 /* remove excess digits */ 3109 064 /* remove excess digits */
3088 065 a->used -= b; 3110 065 a->used -= b;
3089 066 \} 3111 066 \}
3090 067 #endif 3112 067 #endif
3113 068
3091 \end{alltt} 3114 \end{alltt}
3092 \end{small} 3115 \end{small}
3093 3116
3094 The only noteworthy element of this routine is the lack of a return type since it cannot fail. Like mp\_lshd() we 3117 The only noteworthy element of this routine is the lack of a return type since it cannot fail. Like mp\_lshd() we
3095 form a sliding window except we copy in the other direction. After the window (line 59) we then zero 3118 form a sliding window except we copy in the other direction. After the window (line 59) we then zero
3144 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$ 3167 $\beta$. For example, if $b = 37$ and $\beta = 2^{28}$ then this step will multiply by $x$ leaving a multiplication by $2^{37 - 28} = 2^{9}$
3145 left. 3168 left.
3146 3169
3147 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts 3170 After the digits have been shifted appropriately at most $lg(\beta) - 1$ shifts are left to perform. Step 5 calculates the number of remaining shifts
3148 required. If it is non-zero a modified shift loop is used to calculate the remaining product. 3171 required. If it is non-zero a modified shift loop is used to calculate the remaining product.
3149 Essentially the loop is a generic version of algorith mp\_mul2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$ 3172 Essentially the loop is a generic version of algorithm mp\_mul\_2 designed to handle any shift count in the range $1 \le x < lg(\beta)$. The $mask$
3150 variable is used to extract the upper $d$ bits to form the carry for the next iteration. 3173 variable is used to extract the upper $d$ bits to form the carry for the next iteration.
3151 3174
3152 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to 3175 This algorithm is loosely measured as a $O(2n)$ algorithm which means that if the input is $n$-digits that it takes $2n$ ``time'' to
3153 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow. 3176 complete. It is possible to optimize this algorithm down to a $O(n)$ algorithm at a cost of making the algorithm slightly harder to follow.
3154 3177
3219 076 \} 3242 076 \}
3220 077 mp_clamp (c); 3243 077 mp_clamp (c);
3221 078 return MP_OKAY; 3244 078 return MP_OKAY;
3222 079 \} 3245 079 \}
3223 080 #endif 3246 080 #endif
3247 081
3224 \end{alltt} 3248 \end{alltt}
3225 \end{small} 3249 \end{small}
3226 3250
3227 The shifting is performed in--place which means the first step (line 24) is to copy the input to the 3251 The shifting is performed in--place which means the first step (line 24) is to copy the input to the
3228 destination. We avoid calling mp\_copy() by making sure the mp\_ints are different. The destination then 3252 destination. We avoid calling mp\_copy() by making sure the mp\_ints are different. The destination then
3355 088 \} 3379 088 \}
3356 089 mp_clear (&t); 3380 089 mp_clear (&t);
3357 090 return MP_OKAY; 3381 090 return MP_OKAY;
3358 091 \} 3382 091 \}
3359 092 #endif 3383 092 #endif
3384 093
3360 \end{alltt} 3385 \end{alltt}
3361 \end{small} 3386 \end{small}
3362 3387
3363 The implementation of algorithm mp\_div\_2d is slightly different than the algorithm specifies. The remainder $d$ may be optionally 3388 The implementation of algorithm mp\_div\_2d is slightly different than the algorithm specifies. The remainder $d$ may be optionally
3364 ignored by passing \textbf{NULL} as the pointer to the mp\_int variable. The temporary mp\_int variable $t$ is used to hold the 3389 ignored by passing \textbf{NULL} as the pointer to the mp\_int variable. The temporary mp\_int variable $t$ is used to hold the
3446 t) 1)); 3471 t) 1));
3447 047 mp_clamp (c); 3472 047 mp_clamp (c);
3448 048 return MP_OKAY; 3473 048 return MP_OKAY;
3449 049 \} 3474 049 \}
3450 050 #endif 3475 050 #endif
3476 051
3451 \end{alltt} 3477 \end{alltt}
3452 \end{small} 3478 \end{small}
3453 3479
3454 We first avoid cases of $b \le 0$ by simply mp\_zero()'ing the destination in such cases. Next if $2^b$ is larger 3480 We first avoid cases of $b \le 0$ by simply mp\_zero()'ing the destination in such cases. Next if $2^b$ is larger
3455 than the input we just mp\_copy() the input and return right away. After this point we know we must actually 3481 than the input we just mp\_copy() the input and return right away. After this point we know we must actually
3685 081 3711 081
3686 082 mp_clear (&t); 3712 082 mp_clear (&t);
3687 083 return MP_OKAY; 3713 083 return MP_OKAY;
3688 084 \} 3714 084 \}
3689 085 #endif 3715 085 #endif
3716 086
3690 \end{alltt} 3717 \end{alltt}
3691 \end{small} 3718 \end{small}
3692 3719
3693 First we determine (line 30) if the Comba method can be used first since it's faster. The conditions for 3720 First we determine (line 30) if the Comba method can be used first since it's faster. The conditions for
3694 sing the Comba routine are that min$(a.used, b.used) < \delta$ and the number of digits of output is less than 3721 sing the Comba routine are that min$(a.used, b.used) < \delta$ and the number of digits of output is less than
3835 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ 3862 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\
3836 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\ 3863 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\
3837 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\ 3864 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\
3838 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\ 3865 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\
3839 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ 3866 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\
3840 6. $W_{pa} \leftarrow \_ \hat W (\mbox{mod }\beta)$ \\
3841 \\ 3867 \\
3842 7. $oldused \leftarrow c.used$ \\ 3868 6. $oldused \leftarrow c.used$ \\
3843 8. $c.used \leftarrow digs$ \\ 3869 7. $c.used \leftarrow digs$ \\
3844 9. for $ix$ from $0$ to $pa$ do \\ 3870 8. for $ix$ from $0$ to $pa$ do \\
3845 \hspace{3mm}9.1 $c_{ix} \leftarrow W_{ix}$ \\ 3871 \hspace{3mm}8.1 $c_{ix} \leftarrow W_{ix}$ \\
3846 10. for $ix$ from $pa + 1$ to $oldused - 1$ do \\ 3872 9. for $ix$ from $pa + 1$ to $oldused - 1$ do \\
3847 \hspace{3mm}10.1 $c_{ix} \leftarrow 0$ \\ 3873 \hspace{3mm}9.1 $c_{ix} \leftarrow 0$ \\
3848 \\ 3874 \\
3849 11. Clamp $c$. \\ 3875 10. Clamp $c$. \\
3850 12. Return MP\_OKAY. \\ 3876 11. Return MP\_OKAY. \\
3851 \hline 3877 \hline
3852 \end{tabular} 3878 \end{tabular}
3853 \end{center} 3879 \end{center}
3854 \end{small} 3880 \end{small}
3855 \caption{Algorithm fast\_s\_mp\_mul\_digs} 3881 \caption{Algorithm fast\_s\_mp\_mul\_digs}
3940 067 iy = MIN(a->used-tx, ty+1); 3966 067 iy = MIN(a->used-tx, ty+1);
3941 068 3967 068
3942 069 /* execute loop */ 3968 069 /* execute loop */
3943 070 for (iz = 0; iz < iy; ++iz) \{ 3969 070 for (iz = 0; iz < iy; ++iz) \{
3944 071 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--); 3970 071 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--);
3945 072 \} 3971 072
3946 073 3972 073 \}
3947 074 /* store term */ 3973 074
3948 075 W[ix] = ((mp_digit)_W) & MP_MASK; 3974 075 /* store term */
3949 076 3975 076 W[ix] = ((mp_digit)_W) & MP_MASK;
3950 077 /* make next carry */ 3976 077
3951 078 _W = _W >> ((mp_word)DIGIT_BIT); 3977 078 /* make next carry */
3952 079 \} 3978 079 _W = _W >> ((mp_word)DIGIT_BIT);
3953 080 3979 080 \}
3954 081 /* store final carry */ 3980 081
3955 082 W[ix] = (mp_digit)(_W & MP_MASK); 3981 082 /* setup dest */
3956 083 3982 083 olduse = c->used;
3957 084 /* setup dest */ 3983 084 c->used = pa;
3958 085 olduse = c->used; 3984 085
3959 086 c->used = pa; 3985 086 \{
3960 087 3986 087 register mp_digit *tmpc;
3961 088 \{ 3987 088 tmpc = c->dp;
3962 089 register mp_digit *tmpc; 3988 089 for (ix = 0; ix < pa+1; ix++) \{
3963 090 tmpc = c->dp; 3989 090 /* now extract the previous digit [below the carry] */
3964 091 for (ix = 0; ix < pa+1; ix++) \{ 3990 091 *tmpc++ = W[ix];
3965 092 /* now extract the previous digit [below the carry] */ 3991 092 \}
3966 093 *tmpc++ = W[ix]; 3992 093
3967 094 \} 3993 094 /* clear unused digits [that existed in the old copy of c] */
3968 095 3994 095 for (; ix < olduse; ix++) \{
3969 096 /* clear unused digits [that existed in the old copy of c] */ 3995 096 *tmpc++ = 0;
3970 097 for (; ix < olduse; ix++) \{ 3996 097 \}
3971 098 *tmpc++ = 0; 3997 098 \}
3972 099 \} 3998 099 mp_clamp (c);
3973 100 \} 3999 100 return MP_OKAY;
3974 101 mp_clamp (c); 4000 101 \}
3975 102 return MP_OKAY; 4001 102 #endif
3976 103 \} 4002 103
3977 104 #endif
3978 \end{alltt} 4003 \end{alltt}
3979 \end{small} 4004 \end{small}
3980 4005
3981 As per the pseudo--code we first calculate $pa$ (line 47) as the number of digits to output. Next we begin the outer loop 4006 As per the pseudo--code we first calculate $pa$ (line 47) as the number of digits to output. Next we begin the outer loop
3982 to produce the individual columns of the product. We use the two aliases $tmpx$ and $tmpy$ (lines 61, 62) to point 4007 to produce the individual columns of the product. We use the two aliases $tmpx$ and $tmpy$ (lines 61, 62) to point
3983 inside the two multiplicands quickly. 4008 inside the two multiplicands quickly.
3984 4009
3985 The inner loop (lines 70 to 72) of this implementation is where the tradeoff come into play. Originally this comba 4010 The inner loop (lines 70 to 73) of this implementation is where the tradeoff come into play. Originally this comba
3986 implementation was ``row--major'' which means it adds to each of the columns in each pass. After the outer loop it would then fix 4011 implementation was ``row--major'' which means it adds to each of the columns in each pass. After the outer loop it would then fix
3987 the carries. This was very fast except it had an annoying drawback. You had to read a mp\_word and two mp\_digits and write 4012 the carries. This was very fast except it had an annoying drawback. You had to read a mp\_word and two mp\_digits and write
3988 one mp\_word per iteration. On processors such as the Athlon XP and P4 this did not matter much since the cache bandwidth 4013 one mp\_word per iteration. On processors such as the Athlon XP and P4 this did not matter much since the cache bandwidth
3989 is very high and it can keep the ALU fed with data. It did, however, matter on older and embedded cpus where cache is often 4014 is very high and it can keep the ALU fed with data. It did, however, matter on older and embedded cpus where cache is often
3990 slower and also often doesn't exist. This new algorithm only performs two reads per iteration under the assumption that the 4015 slower and also often doesn't exist. This new algorithm only performs two reads per iteration under the assumption that the
3991 compiler has aliased $\_ \hat W$ to a CPU register. 4016 compiler has aliased $\_ \hat W$ to a CPU register.
3992 4017
3993 After the inner loop we store the current accumulator in $W$ and shift $\_ \hat W$ (lines 75, 78) to forward it as 4018 After the inner loop we store the current accumulator in $W$ and shift $\_ \hat W$ (lines 76, 79) to forward it as
3994 a carry for the next pass. After the outer loop we use the final carry (line 82) as the last digit of the product. 4019 a carry for the next pass. After the outer loop we use the final carry (line 76) as the last digit of the product.
3995 4020
3996 \subsection{Polynomial Basis Multiplication} 4021 \subsection{Polynomial Basis Multiplication}
3997 To break the $O(n^2)$ barrier in multiplication requires a completely different look at integer multiplication. In the following algorithms 4022 To break the $O(n^2)$ barrier in multiplication requires a completely different look at integer multiplication. In the following algorithms
3998 the use of polynomial basis representation for two integers $a$ and $b$ as $f(x) = \sum_{i=0}^{n} a_i x^i$ and 4023 the use of polynomial basis representation for two integers $a$ and $b$ as $f(x) = \sum_{i=0}^{n} a_i x^i$ and
3999 $g(x) = \sum_{i=0}^{n} b_i x^i$ respectively, is required. In this system both $f(x)$ and $g(x)$ have $n + 1$ terms and are of the $n$'th degree. 4024 $g(x) = \sum_{i=0}^{n} b_i x^i$ respectively, is required. In this system both $f(x)$ and $g(x)$ have $n + 1$ terms and are of the $n$'th degree.
4093 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for 4118 Karatsuba \cite{KARA} multiplication when originally proposed in 1962 was among the first set of algorithms to break the $O(n^2)$ barrier for
4094 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with 4119 general purpose multiplication. Given two polynomial basis representations $f(x) = ax + b$ and $g(x) = cx + d$, Karatsuba proved with
4095 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent. 4120 light algebra \cite{KARAP} that the following polynomial is equivalent to multiplication of the two integers the polynomials represent.
4096 4121
4097 \begin{equation} 4122 \begin{equation}
4098 f(x) \cdot g(x) = acx^2 + ((a - b)(c - d) - (ac + bd))x + bd 4123 f(x) \cdot g(x) = acx^2 + ((a + b)(c + d) - (ac + bd))x + bd
4099 \end{equation} 4124 \end{equation}
4100 4125
4101 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying 4126 Using the observation that $ac$ and $bd$ could be re-used only three half sized multiplications would be required to produce the product. Applying
4102 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns 4127 this algorithm recursively, the work factor becomes $O(n^{lg(3)})$ which is substantially better than the work factor $O(n^2)$ of the Comba technique. It turns
4103 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points 4128 out what Karatsuba did not know or at least did not publish was that this is simply polynomial basis multiplication with the points
4104 $\zeta_0$, $\zeta_{\infty}$ and $-\zeta_{-1}$. Consider the resultant system of equations. 4129 $\zeta_0$, $\zeta_{\infty}$ and $\zeta_{1}$. Consider the resultant system of equations.
4105 4130
4106 \begin{center} 4131 \begin{center}
4107 \begin{tabular}{rcrcrcrc} 4132 \begin{tabular}{rcrcrcrc}
4108 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\ 4133 $\zeta_{0}$ & $=$ & & & & & $w_0$ \\
4109 $-\zeta_{-1}$ & $=$ & $-w_2$ & $+$ & $w_1$ & $-$ & $w_0$ \\ 4134 $\zeta_{1}$ & $=$ & $w_2$ & $+$ & $w_1$ & $+$ & $w_0$ \\
4110 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\ 4135 $\zeta_{\infty}$ & $=$ & $w_2$ & & & & \\
4111 \end{tabular} 4136 \end{tabular}
4112 \end{center} 4137 \end{center}
4113 4138
4114 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity 4139 By adding the first and last equation to the equation in the middle the term $w_1$ can be isolated and all three coefficients solved for. The simplicity
4115 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.} 4140 of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.}
4116 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman. It is worth noting that the point 4141 making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman.
4117 $\zeta_1$ could be substituted for $-\zeta_{-1}$. In this case the first and third row are subtracted instead of added to the second row.
4118 4142
4119 \newpage\begin{figure}[!here] 4143 \newpage\begin{figure}[!here]
4120 \begin{small} 4144 \begin{small}
4121 \begin{center} 4145 \begin{center}
4122 \begin{tabular}{l} 4146 \begin{tabular}{l}
4135 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\ 4159 7. $y1 \leftarrow \lfloor b / \beta^B \rfloor$ \\
4136 \\ 4160 \\
4137 Calculate the three products. \\ 4161 Calculate the three products. \\
4138 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\ 4162 8. $x0y0 \leftarrow x0 \cdot y0$ (\textit{mp\_mul}) \\
4139 9. $x1y1 \leftarrow x1 \cdot y1$ \\ 4163 9. $x1y1 \leftarrow x1 \cdot y1$ \\
4140 10. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ 4164 10. $t1 \leftarrow x1 + x0$ (\textit{mp\_add}) \\
4141 11. $x0 \leftarrow y1 - y0$ \\ 4165 11. $x0 \leftarrow y1 + y0$ \\
4142 12. $t1 \leftarrow t1 \cdot x0$ \\ 4166 12. $t1 \leftarrow t1 \cdot x0$ \\
4143 \\ 4167 \\
4144 Calculate the middle term. \\ 4168 Calculate the middle term. \\
4145 13. $x0 \leftarrow x0y0 + x1y1$ \\ 4169 13. $x0 \leftarrow x0y0 + x1y1$ \\
4146 14. $t1 \leftarrow x0 - t1$ \\ 4170 14. $t1 \leftarrow t1 - x0$ (\textit{s\_mp\_sub}) \\
4147 \\ 4171 \\
4148 Calculate the final product. \\ 4172 Calculate the final product. \\
4149 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\ 4173 15. $t1 \leftarrow t1 \cdot \beta^B$ (\textit{mp\_lshd}) \\
4150 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\ 4174 16. $x1y1 \leftarrow x1y1 \cdot \beta^{2B}$ \\
4151 17. $t1 \leftarrow x0y0 + t1$ \\ 4175 17. $t1 \leftarrow x0y0 + t1$ \\
4168 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the 4192 be used for both of the inputs meaning that it must be smaller than the smallest input. Step 3 chooses the radix point $B$ as half of the
4169 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5 4193 smallest input \textbf{used} count. After the radix point is chosen the inputs are split into lower and upper halves. Step 4 and 5
4170 compute the lower halves. Step 6 and 7 computer the upper halves. 4194 compute the lower halves. Step 6 and 7 computer the upper halves.
4171 4195
4172 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products 4196 After the halves have been computed the three intermediate half-size products must be computed. Step 8 and 9 compute the trivial products
4173 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 - x0$ has been computed. By using $x0$ instead 4197 $x0 \cdot y0$ and $x1 \cdot y1$. The mp\_int $x0$ is used as a temporary variable after $x1 + x0$ has been computed. By using $x0$ instead
4174 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation. 4198 of an additional temporary variable, the algorithm can avoid an addition memory allocation operation.
4175 4199
4176 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations. 4200 The remaining steps 13 through 18 compute the Karatsuba polynomial through a variety of digit shifting and addition operations.
4177 4201
4178 \vspace{+3mm}\begin{small} 4202 \vspace{+3mm}\begin{small}
4189 023 * 4213 023 *
4190 024 * a = a1 * B**n + a0 4214 024 * a = a1 * B**n + a0
4191 025 * b = b1 * B**n + b0 4215 025 * b = b1 * B**n + b0
4192 026 * 4216 026 *
4193 027 * Then, a * b => 4217 027 * Then, a * b =>
4194 028 a1b1 * B**2n + ((a1 - a0)(b1 - b0) + a0b0 + a1b1) * B + a0b0 4218 028 a1b1 * B**2n + ((a1 + a0)(b1 + b0) - (a0b0 + a1b1)) * B + a0b0
4195 029 * 4219 029 *
4196 030 * Note that a1b1 and a0b0 are used twice and only need to be 4220 030 * Note that a1b1 and a0b0 are used twice and only need to be
4197 031 * computed once. So in total three half size (half # of 4221 031 * computed once. So in total three half size (half # of
4198 032 * digit) multiplications are performed, a0b0, a1b1 and 4222 032 * digit) multiplications are performed, a0b0, a1b1 and
4199 033 * (a1-b1)(a0-b0) 4223 033 * (a1+b1)(a0+b0)
4200 034 * 4224 034 *
4201 035 * Note that a multiplication of half the digits requires 4225 035 * Note that a multiplication of half the digits requires
4202 036 * 1/4th the number of single precision multiplications so in 4226 036 * 1/4th the number of single precision multiplications so in
4203 037 * total after one call 25% of the single precision multiplications 4227 037 * total after one call 25% of the single precision multiplications
4204 038 * are saved. Note also that the call to mp_mul can end up back 4228 038 * are saved. Note also that the call to mp_mul can end up back
4285 119 if (mp_mul (&x0, &y0, &x0y0) != MP_OKAY) 4309 119 if (mp_mul (&x0, &y0, &x0y0) != MP_OKAY)
4286 120 goto X1Y1; /* x0y0 = x0*y0 */ 4310 120 goto X1Y1; /* x0y0 = x0*y0 */
4287 121 if (mp_mul (&x1, &y1, &x1y1) != MP_OKAY) 4311 121 if (mp_mul (&x1, &y1, &x1y1) != MP_OKAY)
4288 122 goto X1Y1; /* x1y1 = x1*y1 */ 4312 122 goto X1Y1; /* x1y1 = x1*y1 */
4289 123 4313 123
4290 124 /* now calc x1-x0 and y1-y0 */ 4314 124 /* now calc x1+x0 and y1+y0 */
4291 125 if (mp_sub (&x1, &x0, &t1) != MP_OKAY) 4315 125 if (s_mp_add (&x1, &x0, &t1) != MP_OKAY)
4292 126 goto X1Y1; /* t1 = x1 - x0 */ 4316 126 goto X1Y1; /* t1 = x1 - x0 */
4293 127 if (mp_sub (&y1, &y0, &x0) != MP_OKAY) 4317 127 if (s_mp_add (&y1, &y0, &x0) != MP_OKAY)
4294 128 goto X1Y1; /* t2 = y1 - y0 */ 4318 128 goto X1Y1; /* t2 = y1 - y0 */
4295 129 if (mp_mul (&t1, &x0, &t1) != MP_OKAY) 4319 129 if (mp_mul (&t1, &x0, &t1) != MP_OKAY)
4296 130 goto X1Y1; /* t1 = (x1 - x0) * (y1 - y0) */ 4320 130 goto X1Y1; /* t1 = (x1 + x0) * (y1 + y0) */
4297 131 4321 131
4298 132 /* add x0y0 */ 4322 132 /* add x0y0 */
4299 133 if (mp_add (&x0y0, &x1y1, &x0) != MP_OKAY) 4323 133 if (mp_add (&x0y0, &x1y1, &x0) != MP_OKAY)
4300 134 goto X1Y1; /* t2 = x0y0 + x1y1 */ 4324 134 goto X1Y1; /* t2 = x0y0 + x1y1 */
4301 135 if (mp_sub (&x0, &t1, &t1) != MP_OKAY) 4325 135 if (s_mp_sub (&t1, &x0, &t1) != MP_OKAY)
4302 136 goto X1Y1; /* t1 = x0y0 + x1y1 - (x1-x0)*(y1-y0) */ 4326 136 goto X1Y1; /* t1 = (x1+x0)*(y1+y0) - (x1y1 + x0y0) */
4303 137 4327 137
4304 138 /* shift by B */ 4328 138 /* shift by B */
4305 139 if (mp_lshd (&t1, B) != MP_OKAY) 4329 139 if (mp_lshd (&t1, B) != MP_OKAY)
4306 140 goto X1Y1; /* t1 = (x0y0 + x1y1 - (x1-x0)*(y1-y0))<<B */ 4330 140 goto X1Y1; /* t1 = (x0y0 + x1y1 - (x1-x0)*(y1-y0))<<B */
4307 141 if (mp_lshd (&x1y1, B * 2) != MP_OKAY) 4331 141 if (mp_lshd (&x1y1, B * 2) != MP_OKAY)
4324 158 X0:mp_clear (&x0); 4348 158 X0:mp_clear (&x0);
4325 159 ERR: 4349 159 ERR:
4326 160 return err; 4350 160 return err;
4327 161 \} 4351 161 \}
4328 162 #endif 4352 162 #endif
4353 163
4329 \end{alltt} 4354 \end{alltt}
4330 \end{small} 4355 \end{small}
4331 4356
4332 The new coding element in this routine, not seen in previous routines, is the usage of goto statements. The conventional 4357 The new coding element in this routine, not seen in previous routines, is the usage of goto statements. The conventional
4333 wisdom is that goto statements should be avoided. This is generally true, however when every single function call can fail, it makes sense 4358 wisdom is that goto statements should be avoided. This is generally true, however when every single function call can fail, it makes sense
4727 275 &b2, &tmp1, &tmp2, NULL); 4752 275 &b2, &tmp1, &tmp2, NULL);
4728 276 return res; 4753 276 return res;
4729 277 \} 4754 277 \}
4730 278 4755 278
4731 279 #endif 4756 279 #endif
4757 280
4732 \end{alltt} 4758 \end{alltt}
4733 \end{small} 4759 \end{small}
4734 4760
4735 The first obvious thing to note is that this algorithm is complicated. The complexity is worth it if you are multiplying very 4761 The first obvious thing to note is that this algorithm is complicated. The complexity is worth it if you are multiplying very
4736 large numbers. For example, a 10,000 digit multiplication takes approximaly 99,282,205 fewer single precision multiplications with 4762 large numbers. For example, a 10,000 digit multiplication takes approximaly 99,282,205 fewer single precision multiplications with
4835 057 \} 4861 057 \}
4836 058 c->sign = (c->used > 0) ? neg : MP_ZPOS; 4862 058 c->sign = (c->used > 0) ? neg : MP_ZPOS;
4837 059 return res; 4863 059 return res;
4838 060 \} 4864 060 \}
4839 061 #endif 4865 061 #endif
4866 062
4840 \end{alltt} 4867 \end{alltt}
4841 \end{small} 4868 \end{small}
4842 4869
4843 The implementation is rather simplistic and is not particularly noteworthy. Line 23 computes the sign of the result using the ``?'' 4870 The implementation is rather simplistic and is not particularly noteworthy. Line 23 computes the sign of the result using the ``?''
4844 operator from the C programming language. Line 47 computes $\delta$ using the fact that $1 << k$ is equal to $2^k$. 4871 operator from the C programming language. Line 47 computes $\delta$ using the fact that $1 << k$ is equal to $2^k$.
5004 075 mp_exch (&t, b); 5031 075 mp_exch (&t, b);
5005 076 mp_clear (&t); 5032 076 mp_clear (&t);
5006 077 return MP_OKAY; 5033 077 return MP_OKAY;
5007 078 \} 5034 078 \}
5008 079 #endif 5035 079 #endif
5036 080
5009 \end{alltt} 5037 \end{alltt}
5010 \end{small} 5038 \end{small}
5011 5039
5012 Inside the outer loop (line 33) the square term is calculated on line 36. The carry (line 43) has been 5040 Inside the outer loop (line 33) the square term is calculated on line 36. The carry (line 43) has been
5013 extracted from the mp\_word accumulator using a right shift. Aliases for $a_{ix}$ and $t_{ix+iy}$ are initialized 5041 extracted from the mp\_word accumulator using a right shift. Aliases for $a_{ix}$ and $t_{ix+iy}$ are initialized
5186 105 \} 5214 105 \}
5187 106 mp_clamp (b); 5215 106 mp_clamp (b);
5188 107 return MP_OKAY; 5216 107 return MP_OKAY;
5189 108 \} 5217 108 \}
5190 109 #endif 5218 109 #endif
5219 110
5191 \end{alltt} 5220 \end{alltt}
5192 \end{small} 5221 \end{small}
5193 5222
5194 This implementation is essentially a copy of Comba multiplication with the appropriate changes added to make it faster for 5223 This implementation is essentially a copy of Comba multiplication with the appropriate changes added to make it faster for
5195 the special case of squaring. 5224 the special case of squaring.
5203 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square. 5232 Let $f(x) = ax + b$ represent the polynomial basis representation of a number to square.
5204 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a 5233 Let $h(x) = \left ( f(x) \right )^2$ represent the square of the polynomial. The Karatsuba equation can be modified to square a
5205 number with the following equation. 5234 number with the following equation.
5206 5235
5207 \begin{equation} 5236 \begin{equation}
5208 h(x) = a^2x^2 + \left (a^2 + b^2 - (a - b)^2 \right )x + b^2 5237 h(x) = a^2x^2 + \left ((a + b)^2 - (a^2 + b^2) \right )x + b^2
5209 \end{equation} 5238 \end{equation}
5210 5239
5211 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a - b)^2$. As in 5240 Upon closer inspection this equation only requires the calculation of three half-sized squares: $a^2$, $b^2$ and $(a + b)^2$. As in
5212 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of 5241 Karatsuba multiplication, this algorithm can be applied recursively on the input and will achieve an asymptotic running time of
5213 $O \left ( n^{lg(3)} \right )$. 5242 $O \left ( n^{lg(3)} \right )$.
5214 5243
5215 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm 5244 If the asymptotic times of Karatsuba squaring and multiplication are the same, why not simply use the multiplication algorithm
5216 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the 5245 instead? The answer to this arises from the cutoff point for squaring. As in multiplication there exists a cutoff point, at which the
5238 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\ 5267 5. $x1 \leftarrow \lfloor a / \beta^B \rfloor$ (\textit{mp\_lshd}) \\
5239 \\ 5268 \\
5240 Calculate the three squares. \\ 5269 Calculate the three squares. \\
5241 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\ 5270 6. $x0x0 \leftarrow x0^2$ (\textit{mp\_sqr}) \\
5242 7. $x1x1 \leftarrow x1^2$ \\ 5271 7. $x1x1 \leftarrow x1^2$ \\
5243 8. $t1 \leftarrow x1 - x0$ (\textit{mp\_sub}) \\ 5272 8. $t1 \leftarrow x1 + x0$ (\textit{s\_mp\_add}) \\
5244 9. $t1 \leftarrow t1^2$ \\ 5273 9. $t1 \leftarrow t1^2$ \\
5245 \\ 5274 \\
5246 Compute the middle term. \\ 5275 Compute the middle term. \\
5247 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\ 5276 10. $t2 \leftarrow x0x0 + x1x1$ (\textit{s\_mp\_add}) \\
5248 11. $t1 \leftarrow t2 - t1$ \\ 5277 11. $t1 \leftarrow t1 - t2$ \\
5249 \\ 5278 \\
5250 Compute final product. \\ 5279 Compute final product. \\
5251 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\ 5280 12. $t1 \leftarrow t1\beta^B$ (\textit{mp\_lshd}) \\
5252 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\ 5281 13. $x1x1 \leftarrow x1x1\beta^{2B}$ \\
5253 14. $t1 \leftarrow t1 + x0x0$ \\ 5282 14. $t1 \leftarrow t1 + x0x0$ \\
5266 5295
5267 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is 5296 The radix point for squaring is simply placed exactly in the middle of the digits when the input has an odd number of digits, otherwise it is
5268 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$ 5297 placed just below the middle. Step 3, 4 and 5 compute the two halves required using $B$
5269 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form. 5298 as the radix point. The first two squares in steps 6 and 7 are rather straightforward while the last square is of a more compact form.
5270 5299
5271 By expanding $\left (x1 - x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $x1^2 + x0^2 - (x1 - x0)^2 = 2 \cdot x0 \cdot x1$. 5300 By expanding $\left (x1 + x0 \right )^2$, the $x1^2$ and $x0^2$ terms in the middle disappear, that is $(x0 - x1)^2 - (x1^2 + x0^2) = 2 \cdot x0 \cdot x1$.
5272 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then 5301 Now if $5n$ single precision additions and a squaring of $n$-digits is faster than multiplying two $n$-digit numbers and doubling then
5273 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality. 5302 this method is faster. Assuming no further recursions occur, the difference can be estimated with the following inequality.
5274 5303
5275 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or 5304 Let $p$ represent the cost of a single precision addition and $q$ the cost of a single precision multiplication both in terms of time\footnote{Or
5276 machine clock cycles.}. 5305 machine clock cycles.}.
5361 077 if (mp_sqr (&x0, &x0x0) != MP_OKAY) 5390 077 if (mp_sqr (&x0, &x0x0) != MP_OKAY)
5362 078 goto X1X1; /* x0x0 = x0*x0 */ 5391 078 goto X1X1; /* x0x0 = x0*x0 */
5363 079 if (mp_sqr (&x1, &x1x1) != MP_OKAY) 5392 079 if (mp_sqr (&x1, &x1x1) != MP_OKAY)
5364 080 goto X1X1; /* x1x1 = x1*x1 */ 5393 080 goto X1X1; /* x1x1 = x1*x1 */
5365 081 5394 081
5366 082 /* now calc (x1-x0)**2 */ 5395 082 /* now calc (x1+x0)**2 */
5367 083 if (mp_sub (&x1, &x0, &t1) != MP_OKAY) 5396 083 if (s_mp_add (&x1, &x0, &t1) != MP_OKAY)
5368 084 goto X1X1; /* t1 = x1 - x0 */ 5397 084 goto X1X1; /* t1 = x1 - x0 */
5369 085 if (mp_sqr (&t1, &t1) != MP_OKAY) 5398 085 if (mp_sqr (&t1, &t1) != MP_OKAY)
5370 086 goto X1X1; /* t1 = (x1 - x0) * (x1 - x0) */ 5399 086 goto X1X1; /* t1 = (x1 - x0) * (x1 - x0) */
5371 087 5400 087
5372 088 /* add x0y0 */ 5401 088 /* add x0y0 */
5373 089 if (s_mp_add (&x0x0, &x1x1, &t2) != MP_OKAY) 5402 089 if (s_mp_add (&x0x0, &x1x1, &t2) != MP_OKAY)
5374 090 goto X1X1; /* t2 = x0x0 + x1x1 */ 5403 090 goto X1X1; /* t2 = x0x0 + x1x1 */
5375 091 if (mp_sub (&t2, &t1, &t1) != MP_OKAY) 5404 091 if (s_mp_sub (&t1, &t2, &t1) != MP_OKAY)
5376 092 goto X1X1; /* t1 = x0x0 + x1x1 - (x1-x0)*(x1-x0) */ 5405 092 goto X1X1; /* t1 = (x1+x0)**2 - (x0x0 + x1x1) */
5377 093 5406 093
5378 094 /* shift by B */ 5407 094 /* shift by B */
5379 095 if (mp_lshd (&t1, B) != MP_OKAY) 5408 095 if (mp_lshd (&t1, B) != MP_OKAY)
5380 096 goto X1X1; /* t1 = (x0x0 + x1x1 - (x1-x0)*(x1-x0))<<B */ 5409 096 goto X1X1; /* t1 = (x0x0 + x1x1 - (x1-x0)*(x1-x0))<<B */
5381 097 if (mp_lshd (&x1x1, B * 2) != MP_OKAY) 5410 097 if (mp_lshd (&x1x1, B * 2) != MP_OKAY)
5396 112 X0:mp_clear (&x0); 5425 112 X0:mp_clear (&x0);
5397 113 ERR: 5426 113 ERR:
5398 114 return err; 5427 114 return err;
5399 115 \} 5428 115 \}
5400 116 #endif 5429 116 #endif
5430 117
5401 \end{alltt} 5431 \end{alltt}
5402 \end{small} 5432 \end{small}
5403 5433
5404 This implementation is largely based on the implementation of algorithm mp\_karatsuba\_mul. It uses the same inline style to copy and 5434 This implementation is largely based on the implementation of algorithm mp\_karatsuba\_mul. It uses the same inline style to copy and
5405 shift the input into the two halves. The loop from line 53 to line 69 has been modified since only one input exists. The \textbf{used} 5435 shift the input into the two halves. The loop from line 53 to line 69 has been modified since only one input exists. The \textbf{used}
5492 049 \} 5522 049 \}
5493 050 b->sign = MP_ZPOS; 5523 050 b->sign = MP_ZPOS;
5494 051 return res; 5524 051 return res;
5495 052 \} 5525 052 \}
5496 053 #endif 5526 053 #endif
5527 054
5497 \end{alltt} 5528 \end{alltt}
5498 \end{small} 5529 \end{small}
5499 5530
5500 \section*{Exercises} 5531 \section*{Exercises}
5501 \begin{tabular}{cl} 5532 \begin{tabular}{cl}
5825 091 mp_clear (&q); 5856 091 mp_clear (&q);
5826 092 5857 092
5827 093 return res; 5858 093 return res;
5828 094 \} 5859 094 \}
5829 095 #endif 5860 095 #endif
5861 096
5830 \end{alltt} 5862 \end{alltt}
5831 \end{small} 5863 \end{small}
5832 5864
5833 The first multiplication that determines the quotient can be performed by only producing the digits from $m - 1$ and up. This essentially halves 5865 The first multiplication that determines the quotient can be performed by only producing the digits from $m - 1$ and up. This essentially halves
5834 the number of single precision multiplications required. However, the optimization is only safe if $\beta$ is much larger than the number of digits 5866 the number of single precision multiplications required. However, the optimization is only safe if $\beta$ is much larger than the number of digits
5877 025 return res; 5909 025 return res;
5878 026 \} 5910 026 \}
5879 027 return mp_div (a, b, a, NULL); 5911 027 return mp_div (a, b, a, NULL);
5880 028 \} 5912 028 \}
5881 029 #endif 5913 029 #endif
5914 030
5882 \end{alltt} 5915 \end{alltt}
5883 \end{small} 5916 \end{small}
5884 5917
5885 This simple routine calculates the reciprocal $\mu$ required by Barrett reduction. Note the extended usage of algorithm mp\_div where the variable 5918 This simple routine calculates the reciprocal $\mu$ required by Barrett reduction. Note the extended usage of algorithm mp\_div where the variable
5886 which would received the remainder is passed as NULL. As will be discussed in~\ref{sec:division} the division routine allows both the quotient and the 5919 which would received the remainder is passed as NULL. As will be discussed in~\ref{sec:division} the division routine allows both the quotient and the
5941 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\ 5974 \hline $4$ & $x + n = 1112$, $x/2 = 556$ \\
5942 \hline $5$ & $x/2 = 278$ \\ 5975 \hline $5$ & $x/2 = 278$ \\
5943 \hline $6$ & $x/2 = 139$ \\ 5976 \hline $6$ & $x/2 = 139$ \\
5944 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\ 5977 \hline $7$ & $x + n = 396$, $x/2 = 198$ \\
5945 \hline $8$ & $x/2 = 99$ \\ 5978 \hline $8$ & $x/2 = 99$ \\
5979 \hline $9$ & $x + n = 356$, $x/2 = 178$ \\
5946 \hline 5980 \hline
5947 \end{tabular} 5981 \end{tabular}
5948 \end{center} 5982 \end{center}
5949 \end{small} 5983 \end{small}
5950 \caption{Example of Montgomery Reduction (I)} 5984 \caption{Example of Montgomery Reduction (I)}
5951 \label{fig:MONT1} 5985 \label{fig:MONT1}
5952 \end{figure} 5986 \end{figure}
5953 5987
5954 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 8$. The result of the algorithm $r = 99$ is 5988 Consider the example in figure~\ref{fig:MONT1} which reduces $x = 5555$ modulo $n = 257$ when $k = 9$ (note $\beta^k = 512$ which is larger than $n$). The result of
5955 congruent to the value of $2^{-8} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^8$ modulo $257$ the correct residue 5989 the algorithm $r = 178$ is congruent to the value of $2^{-9} \cdot 5555 \mbox{ (mod }257\mbox{)}$. When $r$ is multiplied by $2^9$ modulo $257$ the correct residue
5956 $r \equiv 158$ is produced. 5990 $r \equiv 158$ is produced.
5957 5991
5958 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts 5992 Let $k = \lfloor lg(n) \rfloor + 1$ represent the number of bits in $n$. The current algorithm requires $2k^2$ single precision shifts
5959 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful. 5993 and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful.
5960 Fortunately there exists an alternative representation of the algorithm. 5994 Fortunately there exists an alternative representation of the algorithm.
5962 \begin{figure}[!here] 5996 \begin{figure}[!here]
5963 \begin{small} 5997 \begin{small}
5964 \begin{center} 5998 \begin{center}
5965 \begin{tabular}{l} 5999 \begin{tabular}{l}
5966 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\ 6000 \hline Algorithm \textbf{Montgomery Reduction} (modified I). \\
5967 \textbf{Input}. Integer $x$, $n$ and $k$ \\ 6001 \textbf{Input}. Integer $x$, $n$ and $k$ ($2^k > n$) \\
5968 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\ 6002 \textbf{Output}. $2^{-k}x \mbox{ (mod }n\mbox{)}$ \\
5969 \hline \\ 6003 \hline \\
5970 1. for $t$ from $0$ to $k - 1$ do \\ 6004 1. for $t$ from $1$ to $k$ do \\
5971 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\ 6005 \hspace{3mm}1.1 If the $t$'th bit of $x$ is one then \\
5972 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\ 6006 \hspace{6mm}1.1.1 $x \leftarrow x + 2^tn$ \\
5973 2. Return $x/2^k$. \\ 6007 2. Return $x/2^k$. \\
5974 \hline 6008 \hline
5975 \end{tabular} 6009 \end{tabular}
5993 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\ 6027 \hline $4$ & $x + 2^{3}n = 8896$ & $10001011000000$ \\
5994 \hline $5$ & $8896$ & $10001011000000$ \\ 6028 \hline $5$ & $8896$ & $10001011000000$ \\
5995 \hline $6$ & $8896$ & $10001011000000$ \\ 6029 \hline $6$ & $8896$ & $10001011000000$ \\
5996 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\ 6030 \hline $7$ & $x + 2^{6}n = 25344$ & $110001100000000$ \\
5997 \hline $8$ & $25344$ & $110001100000000$ \\ 6031 \hline $8$ & $25344$ & $110001100000000$ \\
5998 \hline -- & $x/2^k = 99$ & \\ 6032 \hline $9$ & $x + 2^{7}n = 91136$ & $10110010000000000$ \\
6033 \hline -- & $x/2^k = 178$ & \\
5999 \hline 6034 \hline
6000 \end{tabular} 6035 \end{tabular}
6001 \end{center} 6036 \end{center}
6002 \end{small} 6037 \end{small}
6003 \caption{Example of Montgomery Reduction (II)} 6038 \caption{Example of Montgomery Reduction (II)}
6004 \label{fig:MONT2} 6039 \label{fig:MONT2}
6005 \end{figure} 6040 \end{figure}
6006 6041
6007 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 8$. 6042 Figure~\ref{fig:MONT2} demonstrates the modified algorithm reducing $x = 5555$ modulo $n = 257$ with $k = 9$.
6008 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the 6043 With this algorithm a single shift right at the end is the only right shift required to reduce the input instead of $k$ right shifts inside the
6009 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is 6044 loop. Note that for the iterations $t = 2, 5, 6$ and $8$ where the result $x$ is not changed. In those iterations the $t$'th bit of $x$ is
6010 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero. 6045 zero and the appropriate multiple of $n$ does not need to be added to force the $t$'th bit of the result to zero.
6011 6046
6012 \subsection{Digit Based Montgomery Reduction} 6047 \subsection{Digit Based Montgomery Reduction}
6016 \begin{figure}[!here] 6051 \begin{figure}[!here]
6017 \begin{small} 6052 \begin{small}
6018 \begin{center} 6053 \begin{center}
6019 \begin{tabular}{l} 6054 \begin{tabular}{l}
6020 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\ 6055 \hline Algorithm \textbf{Montgomery Reduction} (modified II). \\
6021 \textbf{Input}. Integer $x$, $n$ and $k$ \\ 6056 \textbf{Input}. Integer $x$, $n$ and $k$ ($\beta^k > n$) \\
6022 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\ 6057 \textbf{Output}. $\beta^{-k}x \mbox{ (mod }n\mbox{)}$ \\
6023 \hline \\ 6058 \hline \\
6024 1. for $t$ from $0$ to $k - 1$ do \\ 6059 1. for $t$ from $0$ to $k - 1$ do \\
6025 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\ 6060 \hspace{3mm}1.1 $x \leftarrow x + \mu n \beta^t$ \\
6026 2. Return $x/\beta^k$. \\ 6061 2. Return $x/\beta^k$. \\
6232 109 \} 6267 109 \}
6233 110 6268 110
6234 111 return MP_OKAY; 6269 111 return MP_OKAY;
6235 112 \} 6270 112 \}
6236 113 #endif 6271 113 #endif
6272 114
6237 \end{alltt} 6273 \end{alltt}
6238 \end{small} 6274 \end{small}
6239 6275
6240 This is the baseline implementation of the Montgomery reduction algorithm. Lines 30 to 35 determine if the Comba based 6276 This is the baseline implementation of the Montgomery reduction algorithm. Lines 30 to 35 determine if the Comba based
6241 routine can be used instead. Line 48 computes the value of $\mu$ for that particular iteration of the outer loop. 6277 routine can be used instead. Line 48 computes the value of $\mu$ for that particular iteration of the outer loop.
6476 163 return s_mp_sub (x, n, x); 6512 163 return s_mp_sub (x, n, x);
6477 164 \} 6513 164 \}
6478 165 return MP_OKAY; 6514 165 return MP_OKAY;
6479 166 \} 6515 166 \}
6480 167 #endif 6516 167 #endif
6517 168
6481 \end{alltt} 6518 \end{alltt}
6482 \end{small} 6519 \end{small}
6483 6520
6484 The $\hat W$ array is first filled with digits of $x$ on line 50 then the rest of the digits are zeroed on line 54. Both loops share 6521 The $\hat W$ array is first filled with digits of $x$ on line 50 then the rest of the digits are zeroed on line 54. Both loops share
6485 the same alias variables to make the code easier to read. 6522 the same alias variables to make the code easier to read.
6503 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\ 6540 \textbf{Input}. mp\_int $n$ ($n > 1$ and $(n, 2) = 1$) \\
6504 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\ 6541 \textbf{Output}. $\rho \equiv -1/n_0 \mbox{ (mod }\beta\mbox{)}$ \\
6505 \hline \\ 6542 \hline \\
6506 1. $b \leftarrow n_0$ \\ 6543 1. $b \leftarrow n_0$ \\
6507 2. If $b$ is even return(\textit{MP\_VAL}) \\ 6544 2. If $b$ is even return(\textit{MP\_VAL}) \\
6508 3. $x \leftarrow ((b + 2) \mbox{ AND } 4) << 1) + b$ \\ 6545 3. $x \leftarrow (((b + 2) \mbox{ AND } 4) << 1) + b$ \\
6509 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\ 6546 4. for $k$ from 0 to $\lceil lg(lg(\beta)) \rceil - 2$ do \\
6510 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\ 6547 \hspace{3mm}4.1 $x \leftarrow x \cdot (2 - bx)$ \\
6511 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\ 6548 5. $\rho \leftarrow \beta - x \mbox{ (mod }\beta\mbox{)}$ \\
6512 6. Return(\textit{MP\_OKAY}). \\ 6549 6. Return(\textit{MP\_OKAY}). \\
6513 \hline 6550 \hline
6557 045 #ifdef MP_64BIT 6594 045 #ifdef MP_64BIT
6558 046 x *= 2 - b * x; /* here x*a==1 mod 2**64 */ 6595 046 x *= 2 - b * x; /* here x*a==1 mod 2**64 */
6559 047 #endif 6596 047 #endif
6560 048 6597 048
6561 049 /* rho = -1/m mod b */ 6598 049 /* rho = -1/m mod b */
6562 050 *rho = (((mp_word)1 << ((mp_word) DIGIT_BIT)) - x) & MP_MASK; 6599 050 *rho = (unsigned long)(((mp_word)1 << ((mp_word) DIGIT_BIT)) - x) & MP_MAS
6600 K;
6563 051 6601 051
6564 052 return MP_OKAY; 6602 052 return MP_OKAY;
6565 053 \} 6603 053 \}
6566 054 #endif 6604 054 #endif
6605 055
6567 \end{alltt} 6606 \end{alltt}
6568 \end{small} 6607 \end{small}
6569 6608
6570 This source code computes the value of $\rho$ required to perform Montgomery reduction. It has been modified to avoid performing excess 6609 This source code computes the value of $\rho$ required to perform Montgomery reduction. It has been modified to avoid performing excess
6571 multiplications when $\beta$ is not the default 28-bits. 6610 multiplications when $\beta$ is not the default 28-bits.
6828 085 goto top; 6867 085 goto top;
6829 086 \} 6868 086 \}
6830 087 return MP_OKAY; 6869 087 return MP_OKAY;
6831 088 \} 6870 088 \}
6832 089 #endif 6871 089 #endif
6872 090
6833 \end{alltt} 6873 \end{alltt}
6834 \end{small} 6874 \end{small}
6835 6875
6836 The first step is to grow $x$ as required to $2m$ digits since the reduction is performed in place on $x$. The label on line 51 is where 6876 The first step is to grow $x$ as required to $2m$ digits since the reduction is performed in place on $x$. The label on line 51 is where
6837 the algorithm will resume if further reduction passes are required. In theory it could be placed at the top of the function however, the size of 6877 the algorithm will resume if further reduction passes are required. In theory it could be placed at the top of the function however, the size of
6883 023 *d = (mp_digit)((((mp_word)1) << ((mp_word)DIGIT_BIT)) - 6923 023 *d = (mp_digit)((((mp_word)1) << ((mp_word)DIGIT_BIT)) -
6884 024 ((mp_word)a->dp[0])); 6924 024 ((mp_word)a->dp[0]));
6885 025 \} 6925 025 \}
6886 026 6926 026
6887 027 #endif 6927 027 #endif
6928 028
6888 \end{alltt} 6929 \end{alltt}
6889 \end{small} 6930 \end{small}
6890 6931
6891 \subsubsection{Modulus Detection} 6932 \subsubsection{Modulus Detection}
6892 Another algorithm which will be useful is the ability to detect a restricted Diminished Radix modulus. An integer is said to be 6933 Another algorithm which will be useful is the ability to detect a restricted Diminished Radix modulus. An integer is said to be
6941 034 \} 6982 034 \}
6942 035 return 1; 6983 035 return 1;
6943 036 \} 6984 036 \}
6944 037 6985 037
6945 038 #endif 6986 038 #endif
6987 039
6946 \end{alltt} 6988 \end{alltt}
6947 \end{small} 6989 \end{small}
6948 6990
6949 \subsection{Unrestricted Diminished Radix Reduction} 6991 \subsection{Unrestricted Diminished Radix Reduction}
6950 The unrestricted Diminished Radix algorithm allows modular reductions to be performed when the modulus is of the form $2^p - k$. This algorithm 6992 The unrestricted Diminished Radix algorithm allows modular reductions to be performed when the modulus is of the form $2^p - k$. This algorithm
7025 052 mp_clear(&q); 7067 052 mp_clear(&q);
7026 053 return res; 7068 053 return res;
7027 054 \} 7069 054 \}
7028 055 7070 055
7029 056 #endif 7071 056 #endif
7072 057
7030 \end{alltt} 7073 \end{alltt}
7031 \end{small} 7074 \end{small}
7032 7075
7033 The algorithm mp\_count\_bits calculates the number of bits in an mp\_int which is used to find the initial value of $p$. The call to mp\_div\_2d 7076 The algorithm mp\_count\_bits calculates the number of bits in an mp\_int which is used to find the initial value of $p$. The call to mp\_div\_2d
7034 on line 30 calculates both the quotient $q$ and the remainder $a$ required. By doing both in a single function call the code size 7077 on line 30 calculates both the quotient $q$ and the remainder $a$ required. By doing both in a single function call the code size
7094 038 *d = tmp.dp[0]; 7137 038 *d = tmp.dp[0];
7095 039 mp_clear(&tmp); 7138 039 mp_clear(&tmp);
7096 040 return MP_OKAY; 7139 040 return MP_OKAY;
7097 041 \} 7140 041 \}
7098 042 #endif 7141 042 #endif
7142 043
7099 \end{alltt} 7143 \end{alltt}
7100 \end{small} 7144 \end{small}
7101 7145
7102 \subsubsection{Unrestricted Detection} 7146 \subsubsection{Unrestricted Detection}
7103 An integer $n$ is a valid unrestricted Diminished Radix modulus if either of the following are true. 7147 An integer $n$ is a valid unrestricted Diminished Radix modulus if either of the following are true.
7170 043 \} 7214 043 \}
7171 044 return MP_YES; 7215 044 return MP_YES;
7172 045 \} 7216 045 \}
7173 046 7217 046
7174 047 #endif 7218 047 #endif
7219 048
7175 \end{alltt} 7220 \end{alltt}
7176 \end{small} 7221 \end{small}
7177 7222
7178 7223
7179 7224
7379 048 7424 048
7380 049 mp_clear (&g); 7425 049 mp_clear (&g);
7381 050 return MP_OKAY; 7426 050 return MP_OKAY;
7382 051 \} 7427 051 \}
7383 052 #endif 7428 052 #endif
7429 053
7384 \end{alltt} 7430 \end{alltt}
7385 \end{small} 7431 \end{small}
7386 7432
7387 Line 28 sets the initial value of the result to $1$. Next the loop on line 30 steps through each bit of the exponent starting from 7433 Line 28 sets the initial value of the result to $1$. Next the loop on line 30 steps through each bit of the exponent starting from
7388 the most significant down towards the least significant. The invariant squaring operation placed on line 32 is performed first. After 7434 the most significant down towards the least significant. The invariant squaring operation placed on line 32 is performed first. After
7618 063 return MP_VAL; 7664 063 return MP_VAL;
7619 064 #endif 7665 064 #endif
7620 065 \} 7666 065 \}
7621 066 7667 066
7622 067 /* modified diminished radix reduction */ 7668 067 /* modified diminished radix reduction */
7623 068 #if defined(BN_MP_REDUCE_IS_2K_L_C) && defined(BN_MP_REDUCE_2K_L_C) 7669 068 #if defined(BN_MP_REDUCE_IS_2K_L_C) && defined(BN_MP_REDUCE_2K_L_C) && defin
7670 ed(BN_S_MP_EXPTMOD_C)
7624 069 if (mp_reduce_is_2k_l(P) == MP_YES) \{ 7671 069 if (mp_reduce_is_2k_l(P) == MP_YES) \{
7625 070 return s_mp_exptmod(G, X, P, Y, 1); 7672 070 return s_mp_exptmod(G, X, P, Y, 1);
7626 071 \} 7673 071 \}
7627 072 #endif 7674 072 #endif
7628 073 7675 073
7658 103 \} 7705 103 \}
7659 104 #endif 7706 104 #endif
7660 105 \} 7707 105 \}
7661 106 7708 106
7662 107 #endif 7709 107 #endif
7710 108
7663 \end{alltt} 7711 \end{alltt}
7664 \end{small} 7712 \end{small}
7665 7713
7666 In order to keep the algorithms in a known state the first step on line 28 is to reject any negative modulus as input. If the exponent is 7714 In order to keep the algorithms in a known state the first step on line 28 is to reject any negative modulus as input. If the exponent is
7667 negative the algorithm tries to perform a modular exponentiation with the modular inverse of the base $G$. The temporary variable $tmpG$ is assigned 7715 negative the algorithm tries to perform a modular exponentiation with the modular inverse of the base $G$. The temporary variable $tmpG$ is assigned
7837 7885
7838 \vspace{+3mm}\begin{small} 7886 \vspace{+3mm}\begin{small}
7839 \hspace{-5.1mm}{\bf File}: bn\_s\_mp\_exptmod.c 7887 \hspace{-5.1mm}{\bf File}: bn\_s\_mp\_exptmod.c
7840 \vspace{-3mm} 7888 \vspace{-3mm}
7841 \begin{alltt} 7889 \begin{alltt}
7842 016 7890 016 #ifdef MP_LOW_MEM
7843 017 #ifdef MP_LOW_MEM 7891 017 #define TAB_SIZE 32
7844 018 #define TAB_SIZE 32 7892 018 #else
7845 019 #else 7893 019 #define TAB_SIZE 256
7846 020 #define TAB_SIZE 256 7894 020 #endif
7847 021 #endif 7895 021
7848 022 7896 022 int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmod
7849 023 int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmod
7850 e) 7897 e)
7851 024 \{ 7898 023 \{
7852 025 mp_int M[TAB_SIZE], res, mu; 7899 024 mp_int M[TAB_SIZE], res, mu;
7853 026 mp_digit buf; 7900 025 mp_digit buf;
7854 027 int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize; 7901 026 int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize;
7855 028 int (*redux)(mp_int*,mp_int*,mp_int*); 7902 027 int (*redux)(mp_int*,mp_int*,mp_int*);
7856 029 7903 028
7857 030 /* find window size */ 7904 029 /* find window size */
7858 031 x = mp_count_bits (X); 7905 030 x = mp_count_bits (X);
7859 032 if (x <= 7) \{ 7906 031 if (x <= 7) \{
7860 033 winsize = 2; 7907 032 winsize = 2;
7861 034 \} else if (x <= 36) \{ 7908 033 \} else if (x <= 36) \{
7862 035 winsize = 3; 7909 034 winsize = 3;
7863 036 \} else if (x <= 140) \{ 7910 035 \} else if (x <= 140) \{
7864 037 winsize = 4; 7911 036 winsize = 4;
7865 038 \} else if (x <= 450) \{ 7912 037 \} else if (x <= 450) \{
7866 039 winsize = 5; 7913 038 winsize = 5;
7867 040 \} else if (x <= 1303) \{ 7914 039 \} else if (x <= 1303) \{
7868 041 winsize = 6; 7915 040 winsize = 6;
7869 042 \} else if (x <= 3529) \{ 7916 041 \} else if (x <= 3529) \{
7870 043 winsize = 7; 7917 042 winsize = 7;
7871 044 \} else \{ 7918 043 \} else \{
7872 045 winsize = 8; 7919 044 winsize = 8;
7873 046 \} 7920 045 \}
7874 047 7921 046
7875 048 #ifdef MP_LOW_MEM 7922 047 #ifdef MP_LOW_MEM
7876 049 if (winsize > 5) \{ 7923 048 if (winsize > 5) \{
7877 050 winsize = 5; 7924 049 winsize = 5;
7878 051 \} 7925 050 \}
7879 052 #endif 7926 051 #endif
7880 053 7927 052
7881 054 /* init M array */ 7928 053 /* init M array */
7882 055 /* init first cell */ 7929 054 /* init first cell */
7883 056 if ((err = mp_init(&M[1])) != MP_OKAY) \{ 7930 055 if ((err = mp_init(&M[1])) != MP_OKAY) \{
7884 057 return err; 7931 056 return err;
7885 058 \} 7932 057 \}
7886 059 7933 058
7887 060 /* now init the second half of the array */ 7934 059 /* now init the second half of the array */
7888 061 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ 7935 060 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{
7889 062 if ((err = mp_init(&M[x])) != MP_OKAY) \{ 7936 061 if ((err = mp_init(&M[x])) != MP_OKAY) \{
7890 063 for (y = 1<<(winsize-1); y < x; y++) \{ 7937 062 for (y = 1<<(winsize-1); y < x; y++) \{
7891 064 mp_clear (&M[y]); 7938 063 mp_clear (&M[y]);
7892 065 \} 7939 064 \}
7893 066 mp_clear(&M[1]); 7940 065 mp_clear(&M[1]);
7894 067 return err; 7941 066 return err;
7895 068 \} 7942 067 \}
7896 069 \} 7943 068 \}
7897 070 7944 069
7898 071 /* create mu, used for Barrett reduction */ 7945 070 /* create mu, used for Barrett reduction */
7899 072 if ((err = mp_init (&mu)) != MP_OKAY) \{ 7946 071 if ((err = mp_init (&mu)) != MP_OKAY) \{
7900 073 goto LBL_M; 7947 072 goto LBL_M;
7901 074 \} 7948 073 \}
7902 075 7949 074
7903 076 if (redmode == 0) \{ 7950 075 if (redmode == 0) \{
7904 077 if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) \{ 7951 076 if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) \{
7905 078 goto LBL_MU; 7952 077 goto LBL_MU;
7906 079 \} 7953 078 \}
7907 080 redux = mp_reduce; 7954 079 redux = mp_reduce;
7908 081 \} else \{ 7955 080 \} else \{
7909 082 if ((err = mp_reduce_2k_setup_l (P, &mu)) != MP_OKAY) \{ 7956 081 if ((err = mp_reduce_2k_setup_l (P, &mu)) != MP_OKAY) \{
7910 083 goto LBL_MU; 7957 082 goto LBL_MU;
7911 084 \} 7958 083 \}
7912 085 redux = mp_reduce_2k_l; 7959 084 redux = mp_reduce_2k_l;
7913 086 \} 7960 085 \}
7914 087 7961 086
7915 088 /* create M table 7962 087 /* create M table
7916 089 * 7963 088 *
7917 090 * The M table contains powers of the base, 7964 089 * The M table contains powers of the base,
7918 091 * e.g. M[x] = G**x mod P 7965 090 * e.g. M[x] = G**x mod P
7919 092 * 7966 091 *
7920 093 * The first half of the table is not 7967 092 * The first half of the table is not
7921 094 * computed though accept for M[0] and M[1] 7968 093 * computed though accept for M[0] and M[1]
7922 095 */ 7969 094 */
7923 096 if ((err = mp_mod (G, P, &M[1])) != MP_OKAY) \{ 7970 095 if ((err = mp_mod (G, P, &M[1])) != MP_OKAY) \{
7924 097 goto LBL_MU; 7971 096 goto LBL_MU;
7925 098 \} 7972 097 \}
7926 099 7973 098
7927 100 /* compute the value at M[1<<(winsize-1)] by squaring 7974 099 /* compute the value at M[1<<(winsize-1)] by squaring
7928 101 * M[1] (winsize-1) times 7975 100 * M[1] (winsize-1) times
7929 102 */ 7976 101 */
7930 103 if ((err = mp_copy (&M[1], &M[1 << (winsize - 1)])) != MP_OKAY) \{ 7977 102 if ((err = mp_copy (&M[1], &M[1 << (winsize - 1)])) != MP_OKAY) \{
7931 104 goto LBL_MU; 7978 103 goto LBL_MU;
7932 105 \} 7979 104 \}
7933 106 7980 105
7934 107 for (x = 0; x < (winsize - 1); x++) \{ 7981 106 for (x = 0; x < (winsize - 1); x++) \{
7935 108 /* square it */ 7982 107 /* square it */
7936 109 if ((err = mp_sqr (&M[1 << (winsize - 1)], 7983 108 if ((err = mp_sqr (&M[1 << (winsize - 1)],
7937 110 &M[1 << (winsize - 1)])) != MP_OKAY) \{ 7984 109 &M[1 << (winsize - 1)])) != MP_OKAY) \{
7938 111 goto LBL_MU; 7985 110 goto LBL_MU;
7939 112 \} 7986 111 \}
7940 113 7987 112
7941 114 /* reduce modulo P */ 7988 113 /* reduce modulo P */
7942 115 if ((err = redux (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) \{ 7989 114 if ((err = redux (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) \{
7943 116 goto LBL_MU; 7990 115 goto LBL_MU;
7944 117 \} 7991 116 \}
7945 118 \} 7992 117 \}
7946 119 7993 118
7947 120 /* create upper table, that is M[x] = M[x-1] * M[1] (mod P) 7994 119 /* create upper table, that is M[x] = M[x-1] * M[1] (mod P)
7948 121 * for x = (2**(winsize - 1) + 1) to (2**winsize - 1) 7995 120 * for x = (2**(winsize - 1) + 1) to (2**winsize - 1)
7949 122 */ 7996 121 */
7950 123 for (x = (1 << (winsize - 1)) + 1; x < (1 << winsize); x++) \{ 7997 122 for (x = (1 << (winsize - 1)) + 1; x < (1 << winsize); x++) \{
7951 124 if ((err = mp_mul (&M[x - 1], &M[1], &M[x])) != MP_OKAY) \{ 7998 123 if ((err = mp_mul (&M[x - 1], &M[1], &M[x])) != MP_OKAY) \{
7952 125 goto LBL_MU; 7999 124 goto LBL_MU;
7953 126 \} 8000 125 \}
7954 127 if ((err = redux (&M[x], P, &mu)) != MP_OKAY) \{ 8001 126 if ((err = redux (&M[x], P, &mu)) != MP_OKAY) \{
7955 128 goto LBL_MU; 8002 127 goto LBL_MU;
7956 129 \} 8003 128 \}
7957 130 \} 8004 129 \}
7958 131 8005 130
7959 132 /* setup result */ 8006 131 /* setup result */
7960 133 if ((err = mp_init (&res)) != MP_OKAY) \{ 8007 132 if ((err = mp_init (&res)) != MP_OKAY) \{
7961 134 goto LBL_MU; 8008 133 goto LBL_MU;
7962 135 \} 8009 134 \}
7963 136 mp_set (&res, 1); 8010 135 mp_set (&res, 1);
7964 137 8011 136
7965 138 /* set initial mode and bit cnt */ 8012 137 /* set initial mode and bit cnt */
7966 139 mode = 0; 8013 138 mode = 0;
7967 140 bitcnt = 1; 8014 139 bitcnt = 1;
7968 141 buf = 0; 8015 140 buf = 0;
7969 142 digidx = X->used - 1; 8016 141 digidx = X->used - 1;
7970 143 bitcpy = 0; 8017 142 bitcpy = 0;
7971 144 bitbuf = 0; 8018 143 bitbuf = 0;
7972 145 8019 144
7973 146 for (;;) \{ 8020 145 for (;;) \{
7974 147 /* grab next digit as required */ 8021 146 /* grab next digit as required */
7975 148 if (--bitcnt == 0) \{ 8022 147 if (--bitcnt == 0) \{
7976 149 /* if digidx == -1 we are out of digits */ 8023 148 /* if digidx == -1 we are out of digits */
7977 150 if (digidx == -1) \{ 8024 149 if (digidx == -1) \{
7978 151 break; 8025 150 break;
7979 152 \} 8026 151 \}
7980 153 /* read next digit and reset the bitcnt */ 8027 152 /* read next digit and reset the bitcnt */
7981 154 buf = X->dp[digidx--]; 8028 153 buf = X->dp[digidx--];
7982 155 bitcnt = (int) DIGIT_BIT; 8029 154 bitcnt = (int) DIGIT_BIT;
7983 156 \} 8030 155 \}
7984 157 8031 156
7985 158 /* grab the next msb from the exponent */ 8032 157 /* grab the next msb from the exponent */
7986 159 y = (buf >> (mp_digit)(DIGIT_BIT - 1)) & 1; 8033 158 y = (buf >> (mp_digit)(DIGIT_BIT - 1)) & 1;
7987 160 buf <<= (mp_digit)1; 8034 159 buf <<= (mp_digit)1;
7988 161 8035 160
7989 162 /* if the bit is zero and mode == 0 then we ignore it 8036 161 /* if the bit is zero and mode == 0 then we ignore it
7990 163 * These represent the leading zero bits before the first 1 bit 8037 162 * These represent the leading zero bits before the first 1 bit
7991 164 * in the exponent. Technically this opt is not required but it 8038 163 * in the exponent. Technically this opt is not required but it
7992 165 * does lower the # of trivial squaring/reductions used 8039 164 * does lower the # of trivial squaring/reductions used
7993 166 */ 8040 165 */
7994 167 if (mode == 0 && y == 0) \{ 8041 166 if (mode == 0 && y == 0) \{
7995 168 continue; 8042 167 continue;
7996 169 \} 8043 168 \}
7997 170 8044 169
7998 171 /* if the bit is zero and mode == 1 then we square */ 8045 170 /* if the bit is zero and mode == 1 then we square */
7999 172 if (mode == 1 && y == 0) \{ 8046 171 if (mode == 1 && y == 0) \{
8000 173 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ 8047 172 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{
8001 174 goto LBL_RES; 8048 173 goto LBL_RES;
8002 175 \} 8049 174 \}
8003 176 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ 8050 175 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{
8004 177 goto LBL_RES; 8051 176 goto LBL_RES;
8005 178 \} 8052 177 \}
8006 179 continue; 8053 178 continue;
8007 180 \} 8054 179 \}
8008 181 8055 180
8009 182 /* else we add it to the window */ 8056 181 /* else we add it to the window */
8010 183 bitbuf |= (y << (winsize - ++bitcpy)); 8057 182 bitbuf |= (y << (winsize - ++bitcpy));
8011 184 mode = 2; 8058 183 mode = 2;
8012 185 8059 184
8013 186 if (bitcpy == winsize) \{ 8060 185 if (bitcpy == winsize) \{
8014 187 /* ok window is filled so square as required and multiply */ 8061 186 /* ok window is filled so square as required and multiply */
8015 188 /* square first */ 8062 187 /* square first */
8016 189 for (x = 0; x < winsize; x++) \{ 8063 188 for (x = 0; x < winsize; x++) \{
8017 190 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ 8064 189 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{
8018 191 goto LBL_RES; 8065 190 goto LBL_RES;
8019 192 \} 8066 191 \}
8020 193 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ 8067 192 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{
8021 194 goto LBL_RES; 8068 193 goto LBL_RES;
8022 195 \} 8069 194 \}
8023 196 \} 8070 195 \}
8024 197 8071 196
8025 198 /* then multiply */ 8072 197 /* then multiply */
8026 199 if ((err = mp_mul (&res, &M[bitbuf], &res)) != MP_OKAY) \{ 8073 198 if ((err = mp_mul (&res, &M[bitbuf], &res)) != MP_OKAY) \{
8027 200 goto LBL_RES; 8074 199 goto LBL_RES;
8028 201 \} 8075 200 \}
8029 202 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ 8076 201 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{
8030 203 goto LBL_RES; 8077 202 goto LBL_RES;
8031 204 \} 8078 203 \}
8032 205 8079 204
8033 206 /* empty window and reset */ 8080 205 /* empty window and reset */
8034 207 bitcpy = 0; 8081 206 bitcpy = 0;
8035 208 bitbuf = 0; 8082 207 bitbuf = 0;
8036 209 mode = 1; 8083 208 mode = 1;
8037 210 \} 8084 209 \}
8038 211 \} 8085 210 \}
8039 212 8086 211
8040 213 /* if bits remain then square/multiply */ 8087 212 /* if bits remain then square/multiply */
8041 214 if (mode == 2 && bitcpy > 0) \{ 8088 213 if (mode == 2 && bitcpy > 0) \{
8042 215 /* square then multiply if the bit is set */ 8089 214 /* square then multiply if the bit is set */
8043 216 for (x = 0; x < bitcpy; x++) \{ 8090 215 for (x = 0; x < bitcpy; x++) \{
8044 217 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ 8091 216 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{
8045 218 goto LBL_RES; 8092 217 goto LBL_RES;
8046 219 \} 8093 218 \}
8047 220 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ 8094 219 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{
8048 221 goto LBL_RES; 8095 220 goto LBL_RES;
8049 222 \} 8096 221 \}
8050 223 8097 222
8051 224 bitbuf <<= 1; 8098 223 bitbuf <<= 1;
8052 225 if ((bitbuf & (1 << winsize)) != 0) \{ 8099 224 if ((bitbuf & (1 << winsize)) != 0) \{
8053 226 /* then multiply */ 8100 225 /* then multiply */
8054 227 if ((err = mp_mul (&res, &M[1], &res)) != MP_OKAY) \{ 8101 226 if ((err = mp_mul (&res, &M[1], &res)) != MP_OKAY) \{
8055 228 goto LBL_RES; 8102 227 goto LBL_RES;
8056 229 \} 8103 228 \}
8057 230 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ 8104 229 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{
8058 231 goto LBL_RES; 8105 230 goto LBL_RES;
8059 232 \} 8106 231 \}
8060 233 \} 8107 232 \}
8061 234 \} 8108 233 \}
8062 235 \} 8109 234 \}
8063 236 8110 235
8064 237 mp_exch (&res, Y); 8111 236 mp_exch (&res, Y);
8065 238 err = MP_OKAY; 8112 237 err = MP_OKAY;
8066 239 LBL_RES:mp_clear (&res); 8113 238 LBL_RES:mp_clear (&res);
8067 240 LBL_MU:mp_clear (&mu); 8114 239 LBL_MU:mp_clear (&mu);
8068 241 LBL_M: 8115 240 LBL_M:
8069 242 mp_clear(&M[1]); 8116 241 mp_clear(&M[1]);
8070 243 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ 8117 242 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{
8071 244 mp_clear (&M[x]); 8118 243 mp_clear (&M[x]);
8072 245 \} 8119 244 \}
8073 246 return err; 8120 245 return err;
8074 247 \} 8121 246 \}
8075 248 #endif 8122 247 #endif
8123 248
8076 \end{alltt} 8124 \end{alltt}
8077 \end{small} 8125 \end{small}
8078 8126
8079 Lines 21 through 40 determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted 8127 Lines 31 through 45 determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted
8080 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement 8128 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement
8081 on line 32 the value of $x$ is already known to be greater than $140$. 8129 on line 37 the value of $x$ is already known to be greater than $140$.
8082 8130
8083 The conditional piece of code beginning on line 48 allows the window size to be restricted to five bits. This logic is used to ensure 8131 The conditional piece of code beginning on line 47 allows the window size to be restricted to five bits. This logic is used to ensure
8084 the table of precomputed powers of $G$ remains relatively small. 8132 the table of precomputed powers of $G$ remains relatively small.
8085 8133
8086 The for loop on line 61 initializes the $M$ array while lines 62 and 77 compute the value of $\mu$ required for 8134 The for loop on line 60 initializes the $M$ array while lines 71 and 76 through 85 initialize the reduction
8087 Barrett reduction. 8135 function that will be used for this modulus.
8088 8136
8089 -- More later. 8137 -- More later.
8090 8138
8091 \section{Quick Power of Two} 8139 \section{Quick Power of Two}
8092 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is 8140 Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is
8144 039 a->dp[b / DIGIT_BIT] = ((mp_digit)1) << (b % DIGIT_BIT); 8192 039 a->dp[b / DIGIT_BIT] = ((mp_digit)1) << (b % DIGIT_BIT);
8145 040 8193 040
8146 041 return MP_OKAY; 8194 041 return MP_OKAY;
8147 042 \} 8195 042 \}
8148 043 #endif 8196 043 #endif
8197 044
8149 \end{alltt} 8198 \end{alltt}
8150 \end{small} 8199 \end{small}
8151 8200
8152 \chapter{Higher Level Algorithms} 8201 \chapter{Higher Level Algorithms}
8153 8202
8664 283 \} 8713 283 \}
8665 284 8714 284
8666 285 #endif 8715 285 #endif
8667 286 8716 286
8668 287 #endif 8717 287 #endif
8718 288
8669 \end{alltt} 8719 \end{alltt}
8670 \end{small} 8720 \end{small}
8671 8721
8672 The implementation of this algorithm differs slightly from the pseudo code presented previously. In this algorithm either of the quotient $c$ or 8722 The implementation of this algorithm differs slightly from the pseudo code presented previously. In this algorithm either of the quotient $c$ or
8673 remainder $d$ may be passed as a \textbf{NULL} pointer which indicates their value is not desired. For example, the C code to call the division 8723 remainder $d$ may be passed as a \textbf{NULL} pointer which indicates their value is not desired. For example, the C code to call the division
8675 8725
8676 \begin{verbatim} 8726 \begin{verbatim}
8677 mp_div(&a, &b, &c, NULL); /* c = [a/b] */ 8727 mp_div(&a, &b, &c, NULL); /* c = [a/b] */
8678 \end{verbatim} 8728 \end{verbatim}
8679 8729
8680 Lines 37 and 44 handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor 8730 Lines 108 and 113 handle the two trivial cases of inputs which are division by zero and dividend smaller than the divisor
8681 respectively. After the two trivial cases all of the temporary variables are initialized. Line 105 determines the sign of 8731 respectively. After the two trivial cases all of the temporary variables are initialized. Line 147 determines the sign of
8682 the quotient and line 76 ensures that both $x$ and $y$ are positive. 8732 the quotient and line 148 ensures that both $x$ and $y$ are positive.
8683 8733
8684 The number of bits in the leading digit is calculated on line 105. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits 8734 The number of bits in the leading digit is calculated on line 151. Implictly an mp\_int with $r$ digits will require $lg(\beta)(r-1) + k$ bits
8685 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is 8735 of precision which when reduced modulo $lg(\beta)$ produces the value of $k$. In this case $k$ is the number of bits in the leading digit which is
8686 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting 8736 exactly what is required. For the algorithm to operate $k$ must equal $lg(\beta) - 1$ and when it does not the inputs must be normalized by shifting
8687 them to the left by $lg(\beta) - 1 - k$ bits. 8737 them to the left by $lg(\beta) - 1 - k$ bits.
8688 8738
8689 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the 8739 Throughout the variables $n$ and $t$ will represent the highest digit of $x$ and $y$ respectively. These are first used to produce the
8690 leading digit of the quotient. The loop beginning on line 183 will produce the remainder of the quotient digits. 8740 leading digit of the quotient. The loop beginning on line 184 will produce the remainder of the quotient digits.
8691 8741
8692 The conditional ``continue'' on line 114 is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the 8742 The conditional ``continue'' on line 186 is used to prevent the algorithm from reading past the leading edge of $x$ which can occur when the
8693 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits 8743 algorithm eliminates multiple non-zero digits in a single iteration. This ensures that $x_i$ is always non-zero since by definition the digits
8694 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}. 8744 above the $i$'th position $x$ must be zero in order for the quotient to be precise\footnote{Precise as far as integer division is concerned.}.
8695 8745
8696 Lines 130, 130 and 134 through 134 manually construct the high accuracy estimations by setting the digits of the two mp\_int 8746 Lines 214, 216 and 222 through 225 manually construct the high accuracy estimations by setting the digits of the two mp\_int
8697 variables directly. 8747 variables directly.
8698 8748
8699 \section{Single Digit Helpers} 8749 \section{Single Digit Helpers}
8700 8750
8701 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of 8751 This section briefly describes a series of single digit helper algorithms which come in handy when working with small constants. All of
8755 037 res = mp_sub_d(a, b, c); 8805 037 res = mp_sub_d(a, b, c);
8756 038 8806 038
8757 039 /* fix sign */ 8807 039 /* fix sign */
8758 040 a->sign = c->sign = MP_NEG; 8808 040 a->sign = c->sign = MP_NEG;
8759 041 8809 041
8760 042 return res; 8810 042 /* clamp */
8761 043 \} 8811 043 mp_clamp(c);
8762 044 8812 044
8763 045 /* old number of used digits in c */ 8813 045 return res;
8764 046 oldused = c->used; 8814 046 \}
8765 047 8815 047
8766 048 /* sign always positive */ 8816 048 /* old number of used digits in c */
8767 049 c->sign = MP_ZPOS; 8817 049 oldused = c->used;
8768 050 8818 050
8769 051 /* source alias */ 8819 051 /* sign always positive */
8770 052 tmpa = a->dp; 8820 052 c->sign = MP_ZPOS;
8771 053 8821 053
8772 054 /* destination alias */ 8822 054 /* source alias */
8773 055 tmpc = c->dp; 8823 055 tmpa = a->dp;
8774 056 8824 056
8775 057 /* if a is positive */ 8825 057 /* destination alias */
8776 058 if (a->sign == MP_ZPOS) \{ 8826 058 tmpc = c->dp;
8777 059 /* add digit, after this we're propagating 8827 059
8778 060 * the carry. 8828 060 /* if a is positive */
8779 061 */ 8829 061 if (a->sign == MP_ZPOS) \{
8780 062 *tmpc = *tmpa++ + b; 8830 062 /* add digit, after this we're propagating
8781 063 mu = *tmpc >> DIGIT_BIT; 8831 063 * the carry.
8782 064 *tmpc++ &= MP_MASK; 8832 064 */
8783 065 8833 065 *tmpc = *tmpa++ + b;
8784 066 /* now handle rest of the digits */ 8834 066 mu = *tmpc >> DIGIT_BIT;
8785 067 for (ix = 1; ix < a->used; ix++) \{ 8835 067 *tmpc++ &= MP_MASK;
8786 068 *tmpc = *tmpa++ + mu; 8836 068
8787 069 mu = *tmpc >> DIGIT_BIT; 8837 069 /* now handle rest of the digits */
8788 070 *tmpc++ &= MP_MASK; 8838 070 for (ix = 1; ix < a->used; ix++) \{
8789 071 \} 8839 071 *tmpc = *tmpa++ + mu;
8790 072 /* set final carry */ 8840 072 mu = *tmpc >> DIGIT_BIT;
8791 073 ix++; 8841 073 *tmpc++ &= MP_MASK;
8792 074 *tmpc++ = mu; 8842 074 \}
8793 075 8843 075 /* set final carry */
8794 076 /* setup size */ 8844 076 ix++;
8795 077 c->used = a->used + 1; 8845 077 *tmpc++ = mu;
8796 078 \} else \{ 8846 078
8797 079 /* a was negative and |a| < b */ 8847 079 /* setup size */
8798 080 c->used = 1; 8848 080 c->used = a->used + 1;
8799 081 8849 081 \} else \{
8800 082 /* the result is a single digit */ 8850 082 /* a was negative and |a| < b */
8801 083 if (a->used == 1) \{ 8851 083 c->used = 1;
8802 084 *tmpc++ = b - a->dp[0]; 8852 084
8803 085 \} else \{ 8853 085 /* the result is a single digit */
8804 086 *tmpc++ = b; 8854 086 if (a->used == 1) \{
8805 087 \} 8855 087 *tmpc++ = b - a->dp[0];
8806 088 8856 088 \} else \{
8807 089 /* setup count so the clearing of oldused 8857 089 *tmpc++ = b;
8808 090 * can fall through correctly 8858 090 \}
8809 091 */ 8859 091
8810 092 ix = 1; 8860 092 /* setup count so the clearing of oldused
8811 093 \} 8861 093 * can fall through correctly
8812 094 8862 094 */
8813 095 /* now zero to oldused */ 8863 095 ix = 1;
8814 096 while (ix++ < oldused) \{ 8864 096 \}
8815 097 *tmpc++ = 0; 8865 097
8816 098 \} 8866 098 /* now zero to oldused */
8817 099 mp_clamp(c); 8867 099 while (ix++ < oldused) \{
8818 100 8868 100 *tmpc++ = 0;
8819 101 return MP_OKAY; 8869 101 \}
8820 102 \} 8870 102 mp_clamp(c);
8821 103 8871 103
8822 104 #endif 8872 104 return MP_OKAY;
8873 105 \}
8874 106
8875 107 #endif
8876 108
8823 \end{alltt} 8877 \end{alltt}
8824 \end{small} 8878 \end{small}
8825 8879
8826 Clever use of the letter 't'. 8880 Clever use of the letter 't'.
8827 8881
8927 070 mp_clamp(c); 8981 070 mp_clamp(c);
8928 071 8982 071
8929 072 return MP_OKAY; 8983 072 return MP_OKAY;
8930 073 \} 8984 073 \}
8931 074 #endif 8985 074 #endif
8986 075
8932 \end{alltt} 8987 \end{alltt}
8933 \end{small} 8988 \end{small}
8934 8989
8935 In this implementation the destination $c$ may point to the same mp\_int as the source $a$ since the result is written after the digit is 8990 In this implementation the destination $c$ may point to the same mp\_int as the source $a$ since the result is written after the digit is
8936 read from the source. This function uses pointer aliases $tmpa$ and $tmpc$ for the digits of $a$ and $c$ respectively. 8991 read from the source. This function uses pointer aliases $tmpa$ and $tmpc$ for the digits of $a$ and $c$ respectively.
9072 101 9127 101
9073 102 return res; 9128 102 return res;
9074 103 \} 9129 103 \}
9075 104 9130 104
9076 105 #endif 9131 105 #endif
9132 106
9077 \end{alltt} 9133 \end{alltt}
9078 \end{small} 9134 \end{small}
9079 9135
9080 Like the implementation of algorithm mp\_div this algorithm allows either of the quotient or remainder to be passed as a \textbf{NULL} pointer to 9136 Like the implementation of algorithm mp\_div this algorithm allows either of the quotient or remainder to be passed as a \textbf{NULL} pointer to
9081 indicate the respective value is not required. This allows a trivial single digit modular reduction algorithm, mp\_mod\_d to be created. 9137 indicate the respective value is not required. This allows a trivial single digit modular reduction algorithm, mp\_mod\_d to be created.
9258 123 LBL_T2:mp_clear (&t2); 9314 123 LBL_T2:mp_clear (&t2);
9259 124 LBL_T1:mp_clear (&t1); 9315 124 LBL_T1:mp_clear (&t1);
9260 125 return res; 9316 125 return res;
9261 126 \} 9317 126 \}
9262 127 #endif 9318 127 #endif
9319 128
9263 \end{alltt} 9320 \end{alltt}
9264 \end{small} 9321 \end{small}
9265 9322
9266 \section{Random Number Generation} 9323 \section{Random Number Generation}
9267 9324
9334 046 \} 9391 046 \}
9335 047 9392 047
9336 048 return MP_OKAY; 9393 048 return MP_OKAY;
9337 049 \} 9394 049 \}
9338 050 #endif 9395 050 #endif
9396 051
9339 \end{alltt} 9397 \end{alltt}
9340 \end{small} 9398 \end{small}
9341 9399
9342 \section{Formatted Representations} 9400 \section{Formatted Representations}
9343 The ability to emit a radix-$n$ textual representation of an integer is useful for interacting with human parties. For example, the ability to 9401 The ability to emit a radix-$n$ textual representation of an integer is useful for interacting with human parties. For example, the ability to
9423 018 int mp_read_radix (mp_int * a, const char *str, int radix) 9481 018 int mp_read_radix (mp_int * a, const char *str, int radix)
9424 019 \{ 9482 019 \{
9425 020 int y, res, neg; 9483 020 int y, res, neg;
9426 021 char ch; 9484 021 char ch;
9427 022 9485 022
9428 023 /* make sure the radix is ok */ 9486 023 /* zero the digit bignum */
9429 024 if (radix < 2 || radix > 64) \{ 9487 024 mp_zero(a);
9430 025 return MP_VAL; 9488 025
9431 026 \} 9489 026 /* make sure the radix is ok */
9432 027 9490 027 if (radix < 2 || radix > 64) \{
9433 028 /* if the leading digit is a 9491 028 return MP_VAL;
9434 029 * minus set the sign to negative. 9492 029 \}
9435 030 */ 9493 030
9436 031 if (*str == '-') \{ 9494 031 /* if the leading digit is a
9437 032 ++str; 9495 032 * minus set the sign to negative.
9438 033 neg = MP_NEG; 9496 033 */
9439 034 \} else \{ 9497 034 if (*str == '-') \{
9440 035 neg = MP_ZPOS; 9498 035 ++str;
9441 036 \} 9499 036 neg = MP_NEG;
9442 037 9500 037 \} else \{
9443 038 /* set the integer to the default of zero */ 9501 038 neg = MP_ZPOS;
9444 039 mp_zero (a); 9502 039 \}
9445 040 9503 040
9446 041 /* process each digit of the string */ 9504 041 /* set the integer to the default of zero */
9447 042 while (*str) \{ 9505 042 mp_zero (a);
9448 043 /* if the radix < 36 the conversion is case insensitive 9506 043
9449 044 * this allows numbers like 1AB and 1ab to represent the same value 9507 044 /* process each digit of the string */
9450 045 * [e.g. in hex] 9508 045 while (*str) \{
9451 046 */ 9509 046 /* if the radix < 36 the conversion is case insensitive
9452 047 ch = (char) ((radix < 36) ? toupper (*str) : *str); 9510 047 * this allows numbers like 1AB and 1ab to represent the same value
9453 048 for (y = 0; y < 64; y++) \{ 9511 048 * [e.g. in hex]
9454 049 if (ch == mp_s_rmap[y]) \{ 9512 049 */
9455 050 break; 9513 050 ch = (char) ((radix < 36) ? toupper (*str) : *str);
9456 051 \} 9514 051 for (y = 0; y < 64; y++) \{
9457 052 \} 9515 052 if (ch == mp_s_rmap[y]) \{
9458 053 9516 053 break;
9459 054 /* if the char was found in the map 9517 054 \}
9460 055 * and is less than the given radix add it 9518 055 \}
9461 056 * to the number, otherwise exit the loop. 9519 056
9462 057 */ 9520 057 /* if the char was found in the map
9463 058 if (y < radix) \{ 9521 058 * and is less than the given radix add it
9464 059 if ((res = mp_mul_d (a, (mp_digit) radix, a)) != MP_OKAY) \{ 9522 059 * to the number, otherwise exit the loop.
9465 060 return res; 9523 060 */
9466 061 \} 9524 061 if (y < radix) \{
9467 062 if ((res = mp_add_d (a, (mp_digit) y, a)) != MP_OKAY) \{ 9525 062 if ((res = mp_mul_d (a, (mp_digit) radix, a)) != MP_OKAY) \{
9468 063 return res; 9526 063 return res;
9469 064 \} 9527 064 \}
9470 065 \} else \{ 9528 065 if ((res = mp_add_d (a, (mp_digit) y, a)) != MP_OKAY) \{
9471 066 break; 9529 066 return res;
9472 067 \} 9530 067 \}
9473 068 ++str; 9531 068 \} else \{
9474 069 \} 9532 069 break;
9475 070 9533 070 \}
9476 071 /* set the sign only if a != 0 */ 9534 071 ++str;
9477 072 if (mp_iszero(a) != 1) \{ 9535 072 \}
9478 073 a->sign = neg; 9536 073
9479 074 \} 9537 074 /* set the sign only if a != 0 */
9480 075 return MP_OKAY; 9538 075 if (mp_iszero(a) != 1) \{
9481 076 \} 9539 076 a->sign = neg;
9482 077 #endif 9540 077 \}
9541 078 return MP_OKAY;
9542 079 \}
9543 080 #endif
9544 081
9483 \end{alltt} 9545 \end{alltt}
9484 \end{small} 9546 \end{small}
9485 9547
9486 \subsection{Generating Radix-$n$ Output} 9548 \subsection{Generating Radix-$n$ Output}
9487 Generating radix-$n$ output is fairly trivial with a division and remainder algorithm. 9549 Generating radix-$n$ output is fairly trivial with a division and remainder algorithm.
9597 066 mp_clear (&t); 9659 066 mp_clear (&t);
9598 067 return MP_OKAY; 9660 067 return MP_OKAY;
9599 068 \} 9661 068 \}
9600 069 9662 069
9601 070 #endif 9663 070 #endif
9664 071
9602 \end{alltt} 9665 \end{alltt}
9603 \end{small} 9666 \end{small}
9604 9667
9605 \chapter{Number Theoretic Algorithms} 9668 \chapter{Number Theoretic Algorithms}
9606 This chapter discusses several fundamental number theoretic algorithms such as the greatest common divisor, least common multiple and Jacobi 9669 This chapter discusses several fundamental number theoretic algorithms such as the greatest common divisor, least common multiple and Jacobi
9726 \begin{tabular}{l} 9789 \begin{tabular}{l}
9727 \hline Algorithm \textbf{mp\_gcd}. \\ 9790 \hline Algorithm \textbf{mp\_gcd}. \\
9728 \textbf{Input}. mp\_int $a$ and $b$ \\ 9791 \textbf{Input}. mp\_int $a$ and $b$ \\
9729 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\ 9792 \textbf{Output}. The greatest common divisor $c = (a, b)$. \\
9730 \hline \\ 9793 \hline \\
9731 1. If $a = 0$ and $b \ne 0$ then \\ 9794 1. If $a = 0$ then \\
9732 \hspace{3mm}1.1 $c \leftarrow b$ \\ 9795 \hspace{3mm}1.1 $c \leftarrow \vert b \vert $ \\
9733 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\ 9796 \hspace{3mm}1.2 Return(\textit{MP\_OKAY}). \\
9734 2. If $a \ne 0$ and $b = 0$ then \\ 9797 2. If $b = 0$ then \\
9735 \hspace{3mm}2.1 $c \leftarrow a$ \\ 9798 \hspace{3mm}2.1 $c \leftarrow \vert a \vert $ \\
9736 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\ 9799 \hspace{3mm}2.2 Return(\textit{MP\_OKAY}). \\
9737 3. If $a = b = 0$ then \\ 9800 3. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\
9738 \hspace{3mm}3.1 $c \leftarrow 1$ \\ 9801 4. $k \leftarrow 0$ \\
9739 \hspace{3mm}3.2 Return(\textit{MP\_OKAY}). \\ 9802 5. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
9740 4. $u \leftarrow \vert a \vert, v \leftarrow \vert b \vert$ \\ 9803 \hspace{3mm}5.1 $k \leftarrow k + 1$ \\
9741 5. $k \leftarrow 0$ \\ 9804 \hspace{3mm}5.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\
9742 6. While $u.used > 0$ and $v.used > 0$ and $u_0 \equiv v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 9805 \hspace{3mm}5.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\
9743 \hspace{3mm}6.1 $k \leftarrow k + 1$ \\ 9806 6. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
9744 \hspace{3mm}6.2 $u \leftarrow \lfloor u / 2 \rfloor$ \\ 9807 \hspace{3mm}6.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\
9745 \hspace{3mm}6.3 $v \leftarrow \lfloor v / 2 \rfloor$ \\ 9808 7. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
9746 7. While $u.used > 0$ and $u_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 9809 \hspace{3mm}7.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
9747 \hspace{3mm}7.1 $u \leftarrow \lfloor u / 2 \rfloor$ \\ 9810 8. While $v.used > 0$ \\
9748 8. While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 9811 \hspace{3mm}8.1 If $\vert u \vert > \vert v \vert$ then \\
9749 \hspace{3mm}8.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\ 9812 \hspace{6mm}8.1.1 Swap $u$ and $v$. \\
9750 9. While $v.used > 0$ \\ 9813 \hspace{3mm}8.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\
9751 \hspace{3mm}9.1 If $\vert u \vert > \vert v \vert$ then \\ 9814 \hspace{3mm}8.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\
9752 \hspace{6mm}9.1.1 Swap $u$ and $v$. \\ 9815 \hspace{6mm}8.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
9753 \hspace{3mm}9.2 $v \leftarrow \vert v \vert - \vert u \vert$ \\ 9816 9. $c \leftarrow u \cdot 2^k$ \\
9754 \hspace{3mm}9.3 While $v.used > 0$ and $v_0 \equiv 0 \mbox{ (mod }2\mbox{)}$ \\ 9817 10. Return(\textit{MP\_OKAY}). \\
9755 \hspace{6mm}9.3.1 $v \leftarrow \lfloor v / 2 \rfloor$ \\
9756 10. $c \leftarrow u \cdot 2^k$ \\
9757 11. Return(\textit{MP\_OKAY}). \\
9758 \hline 9818 \hline
9759 \end{tabular} 9819 \end{tabular}
9760 \end{center} 9820 \end{center}
9761 \end{small} 9821 \end{small}
9762 \caption{Algorithm mp\_gcd} 9822 \caption{Algorithm mp\_gcd}
9764 \textbf{Algorithm mp\_gcd.} 9824 \textbf{Algorithm mp\_gcd.}
9765 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of 9825 This algorithm will produce the greatest common divisor of two mp\_ints $a$ and $b$. The algorithm was originally based on Algorithm B of
9766 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as 9826 Knuth \cite[pp. 338]{TAOCPV2} but has been modified to be simpler to explain. In theory it achieves the same asymptotic working time as
9767 Algorithm B and in practice this appears to be true. 9827 Algorithm B and in practice this appears to be true.
9768 9828
9769 The first three steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the 9829 The first two steps handle the cases where either one of or both inputs are zero. If either input is zero the greatest common divisor is the
9770 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of 9830 largest input or zero if they are both zero. If the inputs are not trivial than $u$ and $v$ are assigned the absolute values of
9771 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair. 9831 $a$ and $b$ respectively and the algorithm will proceed to reduce the pair.
9772 9832
9773 Step six will divide out any common factors of two and keep track of the count in the variable $k$. After this step two is no longer a 9833 Step five will divide out any common factors of two and keep track of the count in the variable $k$. After this step, two is no longer a
9774 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step 9834 factor of the remaining greatest common divisor between $u$ and $v$ and can be safely evenly divided out of either whenever they are even. Step
9775 seven and eight ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while loops will iterate since 9835 six and seven ensure that the $u$ and $v$ respectively have no more factors of two. At most only one of the while--loops will iterate since
9776 they cannot both be even. 9836 they cannot both be even.
9777 9837
9778 By step nine both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to 9838 By step eight both of $u$ and $v$ are odd which is required for the inner logic. First the pair are swapped such that $v$ is equal to
9779 or greater than $u$. This ensures that the subtraction on step 9.2 will always produce a positive and even result. Step 9.3 removes any 9839 or greater than $u$. This ensures that the subtraction on step 8.2 will always produce a positive and even result. Step 8.3 removes any
9780 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd. 9840 factors of two from the difference $u$ to ensure that in the next iteration of the loop both are once again odd.
9781 9841
9782 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result 9842 After $v = 0$ occurs the variable $u$ has the greatest common divisor of the pair $\left < u, v \right >$ just after step six. The result
9783 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier. 9843 must be adjusted by multiplying by the common factors of two ($2^k$) removed earlier.
9784 9844
9792 019 \{ 9852 019 \{
9793 020 mp_int u, v; 9853 020 mp_int u, v;
9794 021 int k, u_lsb, v_lsb, res; 9854 021 int k, u_lsb, v_lsb, res;
9795 022 9855 022
9796 023 /* either zero than gcd is the largest */ 9856 023 /* either zero than gcd is the largest */
9797 024 if (mp_iszero (a) == 1 && mp_iszero (b) == 0) \{ 9857 024 if (mp_iszero (a) == MP_YES) \{
9798 025 return mp_abs (b, c); 9858 025 return mp_abs (b, c);
9799 026 \} 9859 026 \}
9800 027 if (mp_iszero (a) == 0 && mp_iszero (b) == 1) \{ 9860 027 if (mp_iszero (b) == MP_YES) \{
9801 028 return mp_abs (a, c); 9861 028 return mp_abs (a, c);
9802 029 \} 9862 029 \}
9803 030 9863 030
9804 031 /* optimized. At this point if a == 0 then 9864 031 /* get copies of a and b we can modify */
9805 032 * b must equal zero too 9865 032 if ((res = mp_init_copy (&u, a)) != MP_OKAY) \{
9806 033 */ 9866 033 return res;
9807 034 if (mp_iszero (a) == 1) \{ 9867 034 \}
9808 035 mp_zero(c); 9868 035
9809 036 return MP_OKAY; 9869 036 if ((res = mp_init_copy (&v, b)) != MP_OKAY) \{
9810 037 \} 9870 037 goto LBL_U;
9811 038 9871 038 \}
9812 039 /* get copies of a and b we can modify */ 9872 039
9813 040 if ((res = mp_init_copy (&u, a)) != MP_OKAY) \{ 9873 040 /* must be positive for the remainder of the algorithm */
9814 041 return res; 9874 041 u.sign = v.sign = MP_ZPOS;
9815 042 \} 9875 042
9816 043 9876 043 /* B1. Find the common power of two for u and v */
9817 044 if ((res = mp_init_copy (&v, b)) != MP_OKAY) \{ 9877 044 u_lsb = mp_cnt_lsb(&u);
9818 045 goto LBL_U; 9878 045 v_lsb = mp_cnt_lsb(&v);
9819 046 \} 9879 046 k = MIN(u_lsb, v_lsb);
9820 047 9880 047
9821 048 /* must be positive for the remainder of the algorithm */ 9881 048 if (k > 0) \{
9822 049 u.sign = v.sign = MP_ZPOS; 9882 049 /* divide the power of two out */
9823 050 9883 050 if ((res = mp_div_2d(&u, k, &u, NULL)) != MP_OKAY) \{
9824 051 /* B1. Find the common power of two for u and v */ 9884 051 goto LBL_V;
9825 052 u_lsb = mp_cnt_lsb(&u); 9885 052 \}
9826 053 v_lsb = mp_cnt_lsb(&v); 9886 053
9827 054 k = MIN(u_lsb, v_lsb); 9887 054 if ((res = mp_div_2d(&v, k, &v, NULL)) != MP_OKAY) \{
9828 055 9888 055 goto LBL_V;
9829 056 if (k > 0) \{ 9889 056 \}
9830 057 /* divide the power of two out */ 9890 057 \}
9831 058 if ((res = mp_div_2d(&u, k, &u, NULL)) != MP_OKAY) \{ 9891 058
9832 059 goto LBL_V; 9892 059 /* divide any remaining factors of two out */
9833 060 \} 9893 060 if (u_lsb != k) \{
9834 061 9894 061 if ((res = mp_div_2d(&u, u_lsb - k, &u, NULL)) != MP_OKAY) \{
9835 062 if ((res = mp_div_2d(&v, k, &v, NULL)) != MP_OKAY) \{ 9895 062 goto LBL_V;
9836 063 goto LBL_V; 9896 063 \}
9837 064 \} 9897 064 \}
9838 065 \} 9898 065
9839 066 9899 066 if (v_lsb != k) \{
9840 067 /* divide any remaining factors of two out */ 9900 067 if ((res = mp_div_2d(&v, v_lsb - k, &v, NULL)) != MP_OKAY) \{
9841 068 if (u_lsb != k) \{ 9901 068 goto LBL_V;
9842 069 if ((res = mp_div_2d(&u, u_lsb - k, &u, NULL)) != MP_OKAY) \{ 9902 069 \}
9843 070 goto LBL_V; 9903 070 \}
9844 071 \} 9904 071
9845 072 \} 9905 072 while (mp_iszero(&v) == 0) \{
9846 073 9906 073 /* make sure v is the largest */
9847 074 if (v_lsb != k) \{ 9907 074 if (mp_cmp_mag(&u, &v) == MP_GT) \{
9848 075 if ((res = mp_div_2d(&v, v_lsb - k, &v, NULL)) != MP_OKAY) \{ 9908 075 /* swap u and v to make sure v is >= u */
9849 076 goto LBL_V; 9909 076 mp_exch(&u, &v);
9850 077 \} 9910 077 \}
9851 078 \} 9911 078
9852 079 9912 079 /* subtract smallest from largest */
9853 080 while (mp_iszero(&v) == 0) \{ 9913 080 if ((res = s_mp_sub(&v, &u, &v)) != MP_OKAY) \{
9854 081 /* make sure v is the largest */ 9914 081 goto LBL_V;
9855 082 if (mp_cmp_mag(&u, &v) == MP_GT) \{ 9915 082 \}
9856 083 /* swap u and v to make sure v is >= u */ 9916 083
9857 084 mp_exch(&u, &v); 9917 084 /* Divide out all factors of two */
9858 085 \} 9918 085 if ((res = mp_div_2d(&v, mp_cnt_lsb(&v), &v, NULL)) != MP_OKAY) \{
9859 086 9919 086 goto LBL_V;
9860 087 /* subtract smallest from largest */ 9920 087 \}
9861 088 if ((res = s_mp_sub(&v, &u, &v)) != MP_OKAY) \{ 9921 088 \}
9862 089 goto LBL_V; 9922 089
9863 090 \} 9923 090 /* multiply by 2**k which we divided out at the beginning */
9864 091 9924 091 if ((res = mp_mul_2d (&u, k, c)) != MP_OKAY) \{
9865 092 /* Divide out all factors of two */ 9925 092 goto LBL_V;
9866 093 if ((res = mp_div_2d(&v, mp_cnt_lsb(&v), &v, NULL)) != MP_OKAY) \{ 9926 093 \}
9867 094 goto LBL_V; 9927 094 c->sign = MP_ZPOS;
9868 095 \} 9928 095 res = MP_OKAY;
9869 096 \} 9929 096 LBL_V:mp_clear (&u);
9870 097 9930 097 LBL_U:mp_clear (&v);
9871 098 /* multiply by 2**k which we divided out at the beginning */ 9931 098 return res;
9872 099 if ((res = mp_mul_2d (&u, k, c)) != MP_OKAY) \{ 9932 099 \}
9873 100 goto LBL_V; 9933 100 #endif
9874 101 \} 9934 101
9875 102 c->sign = MP_ZPOS;
9876 103 res = MP_OKAY;
9877 104 LBL_V:mp_clear (&u);
9878 105 LBL_U:mp_clear (&v);
9879 106 return res;
9880 107 \}
9881 108 #endif
9882 \end{alltt} 9935 \end{alltt}
9883 \end{small} 9936 \end{small}
9884 9937
9885 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the 9938 This function makes use of the macros mp\_iszero and mp\_iseven. The former evaluates to $1$ if the input mp\_int is equivalent to the
9886 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise 9939 integer zero otherwise it evaluates to $0$. The latter evaluates to $1$ if the input mp\_int represents a non-zero even integer otherwise
9887 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three 9940 it evaluates to $0$. Note that just because mp\_iseven may evaluate to $0$ does not mean the input is odd, it could also be zero. The three
9888 trivial cases of inputs are handled on lines 24 through 37. After those lines the inputs are assumed to be non-zero. 9941 trivial cases of inputs are handled on lines 23 through 29. After those lines the inputs are assumed to be non-zero.
9889 9942
9890 Lines 34 and 40 make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two 9943 Lines 32 and 36 make local copies $u$ and $v$ of the inputs $a$ and $b$ respectively. At this point the common factors of two
9891 must be divided out of the two inputs. The while loop on line 80 iterates so long as both are even. The local integer $k$ is used to 9944 must be divided out of the two inputs. The block starting at line 43 removes common factors of two by first counting the number of trailing
9892 keep track of how many factors of $2$ are pulled out of both values. It is assumed that the number of factors will not exceed the maximum 9945 zero bits in both. The local integer $k$ is used to keep track of how many factors of $2$ are pulled out of both values. It is assumed that
9893 value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than entries than are accessible by an ``int'' so this is not 9946 the number of factors will not exceed the maximum value of a C ``int'' data type\footnote{Strictly speaking no array in C may have more than
9894 a limitation.}. 9947 entries than are accessible by an ``int'' so this is not a limitation.}.
9895 9948
9896 At this point there are no more common factors of two in the two values. The while loops on lines 80 and 80 remove any independent 9949 At this point there are no more common factors of two in the two values. The divisions by a power of two on lines 61 and 67 remove
9897 factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop 9950 any independent factors of two such that both $u$ and $v$ are guaranteed to be an odd integer before hitting the main body of the algorithm. The while loop
9898 on line 80 performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in 9951 on line 72 performs the reduction of the pair until $v$ is equal to zero. The unsigned comparison and subtraction algorithms are used in
9899 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative. 9952 place of the full signed routines since both values are guaranteed to be positive and the result of the subtraction is guaranteed to be non-negative.
9900 9953
9901 \section{Least Common Multiple} 9954 \section{Least Common Multiple}
9902 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the 9955 The least common multiple of a pair of integers is their product divided by their greatest common divisor. For two integers $a$ and $b$ the
9903 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$ 9956 least common multiple is normally denoted as $[ a, b ]$ and numerically equivalent to ${ab} \over {(a, b)}$. For example, if $a = 2 \cdot 2 \cdot 3 = 12$
9972 051 LBL_T: 10025 051 LBL_T:
9973 052 mp_clear_multi (&t1, &t2, NULL); 10026 052 mp_clear_multi (&t1, &t2, NULL);
9974 053 return res; 10027 053 return res;
9975 054 \} 10028 054 \}
9976 055 #endif 10029 055 #endif
10030 056
9977 \end{alltt} 10031 \end{alltt}
9978 \end{small} 10032 \end{small}
9979 10033
9980 \section{Jacobi Symbol Computation} 10034 \section{Jacobi Symbol Computation}
9981 To explain the Jacobi Symbol we shall first discuss the Legendre function\footnote{Arrg. What is the name of this?} off which the Jacobi symbol is 10035 To explain the Jacobi Symbol we shall first discuss the Legendre function\footnote{Arrg. What is the name of this?} off which the Jacobi symbol is
10216 096 LBL_P1:mp_clear (&p1); 10270 096 LBL_P1:mp_clear (&p1);
10217 097 LBL_A1:mp_clear (&a1); 10271 097 LBL_A1:mp_clear (&a1);
10218 098 return res; 10272 098 return res;
10219 099 \} 10273 099 \}
10220 100 #endif 10274 100 #endif
10275 101
10221 \end{alltt} 10276 \end{alltt}
10222 \end{small} 10277 \end{small}
10223 10278
10224 As a matter of practicality the variable $a'$ as per the pseudo-code is reprensented by the variable $a1$ since the $'$ symbol is not valid for a C 10279 As a matter of practicality the variable $a'$ as per the pseudo-code is reprensented by the variable $a1$ since the $'$ symbol is not valid for a C
10225 variable name character. 10280 variable name character.
10364 034 #endif 10419 034 #endif
10365 035 10420 035
10366 036 return MP_VAL; 10421 036 return MP_VAL;
10367 037 \} 10422 037 \}
10368 038 #endif 10423 038 #endif
10424 039
10369 \end{alltt} 10425 \end{alltt}
10370 \end{small} 10426 \end{small}
10371 10427
10372 \subsubsection{Odd Moduli} 10428 \subsubsection{Odd Moduli}
10373 10429
10465 041 \} 10521 041 \}
10466 042 10522 042
10467 043 return MP_OKAY; 10523 043 return MP_OKAY;
10468 044 \} 10524 044 \}
10469 045 #endif 10525 045 #endif
10526 046
10470 \end{alltt} 10527 \end{alltt}
10471 \end{small} 10528 \end{small}
10472 10529
10473 The algorithm defaults to a return of $0$ in case an error occurs. The values in the prime table are all specified to be in the range of a 10530 The algorithm defaults to a return of $0$ in case an error occurs. The values in the prime table are all specified to be in the range of a
10474 mp\_digit. The table \_\_prime\_tab is defined in the following file. 10531 mp\_digit. The table \_\_prime\_tab is defined in the following file.
10516 052 0x05F3, 0x05FB, 0x0607, 0x060D, 0x0611, 0x0617, 0x061F, 0x0623, 10573 052 0x05F3, 0x05FB, 0x0607, 0x060D, 0x0611, 0x0617, 0x061F, 0x0623,
10517 053 0x062B, 0x062F, 0x063D, 0x0641, 0x0647, 0x0649, 0x064D, 0x0653 10574 053 0x062B, 0x062F, 0x063D, 0x0641, 0x0647, 0x0649, 0x064D, 0x0653
10518 054 #endif 10575 054 #endif
10519 055 \}; 10576 055 \};
10520 056 #endif 10577 056 #endif
10578 057
10521 \end{alltt} 10579 \end{alltt}
10522 \end{small} 10580 \end{small}
10523 10581
10524 Note that there are two possible tables. When an mp\_digit is 7-bits long only the primes upto $127$ may be included, otherwise the primes 10582 Note that there are two possible tables. When an mp\_digit is 7-bits long only the primes upto $127$ may be included, otherwise the primes
10525 upto $1619$ are used. Note that the value of \textbf{PRIME\_SIZE} is a constant dependent on the size of a mp\_digit. 10583 upto $1619$ are used. Note that the value of \textbf{PRIME\_SIZE} is a constant dependent on the size of a mp\_digit.
10604 053 err = MP_OKAY; 10662 053 err = MP_OKAY;
10605 054 LBL_T:mp_clear (&t); 10663 054 LBL_T:mp_clear (&t);
10606 055 return err; 10664 055 return err;
10607 056 \} 10665 056 \}
10608 057 #endif 10666 057 #endif
10667 058
10609 \end{alltt} 10668 \end{alltt}
10610 \end{small} 10669 \end{small}
10611 10670
10612 \subsection{The Miller-Rabin Test} 10671 \subsection{The Miller-Rabin Test}
10613 The Miller-Rabin (citation) test is another primality test which has tighter error bounds than the Fermat test specifically with sequentially chosen 10672 The Miller-Rabin (citation) test is another primality test which has tighter error bounds than the Fermat test specifically with sequentially chosen
10739 094 LBL_R:mp_clear (&r); 10798 094 LBL_R:mp_clear (&r);
10740 095 LBL_N1:mp_clear (&n1); 10799 095 LBL_N1:mp_clear (&n1);
10741 096 return err; 10800 096 return err;
10742 097 \} 10801 097 \}
10743 098 #endif 10802 098 #endif
10803 099
10744 \end{alltt} 10804 \end{alltt}
10745 \end{small} 10805 \end{small}
10746 10806
10747 10807
10748 10808