Mercurial > dropbear
comparison tommath.tex @ 190:d8254fc979e9 libtommath-orig LTM_0.35
Initial import of libtommath 0.35
author | Matt Johnston <matt@ucc.asn.au> |
---|---|
date | Fri, 06 May 2005 08:59:30 +0000 |
parents | d29b64170cf0 |
children |
comparison
equal
deleted
inserted
replaced
142:d29b64170cf0 | 190:d8254fc979e9 |
---|---|
47 \def\gap{\vspace{0.5ex}} | 47 \def\gap{\vspace{0.5ex}} |
48 \makeindex | 48 \makeindex |
49 \begin{document} | 49 \begin{document} |
50 \frontmatter | 50 \frontmatter |
51 \pagestyle{empty} | 51 \pagestyle{empty} |
52 \title{Implementing Multiple Precision Arithmetic \\ ~ \\ Draft Edition } | 52 \title{Multi--Precision Math} |
53 \author{\mbox{ | 53 \author{\mbox{ |
54 %\begin{small} | 54 %\begin{small} |
55 \begin{tabular}{c} | 55 \begin{tabular}{c} |
56 Tom St Denis \\ | 56 Tom St Denis \\ |
57 Algonquin College \\ | 57 Algonquin College \\ |
64 \end{tabular} | 64 \end{tabular} |
65 %\end{small} | 65 %\end{small} |
66 } | 66 } |
67 } | 67 } |
68 \maketitle | 68 \maketitle |
69 This text has been placed in the public domain. This text corresponds to the v0.30 release of the | 69 This text has been placed in the public domain. This text corresponds to the v0.35 release of the |
70 LibTomMath project. | 70 LibTomMath project. |
71 | 71 |
72 \begin{alltt} | 72 \begin{alltt} |
73 Tom St Denis | 73 Tom St Denis |
74 111 Banning Rd | 74 111 Banning Rd |
83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} | 83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} |
84 {\em book} macro package and the Perl {\em booker} package. | 84 {\em book} macro package and the Perl {\em booker} package. |
85 | 85 |
86 \tableofcontents | 86 \tableofcontents |
87 \listoffigures | 87 \listoffigures |
88 \chapter*{Prefaces to the Draft Edition} | 88 \chapter*{Prefaces} |
89 I started this text in April 2003 to complement my LibTomMath library. That is, explain how to implement the functions | 89 When I tell people about my LibTom projects and that I release them as public domain they are often puzzled. |
90 contained in LibTomMath. The goal is to have a textbook that any Computer Science student can use when implementing their | 90 They ask why I did it and especially why I continue to work on them for free. The best I can explain it is ``Because I can.'' |
91 own multiple precision arithmetic. The plan I wanted to follow was flesh out all the | 91 Which seems odd and perhaps too terse for adult conversation. I often qualify it with ``I am able, I am willing.'' which |
92 ideas and concepts I had floating around in my head and then work on it afterwards refining a little bit at a time. Chance | 92 perhaps explains it better. I am the first to admit there is not anything that special with what I have done. Perhaps |
93 would have it that I ended up with my summer off from Algonquin College and I was given four months solid to work on the | 93 others can see that too and then we would have a society to be proud of. My LibTom projects are what I am doing to give |
94 text. | 94 back to society in the form of tools and knowledge that can help others in their endeavours. |
95 | 95 |
96 Choosing to not waste any time I dove right into the project even before my spring semester was finished. I wrote a bit | 96 I started writing this book because it was the most logical task to further my goal of open academia. The LibTomMath source |
97 off and on at first. The moment my exams were finished I jumped into long 12 to 16 hour days. The result after only | 97 code itself was written to be easy to follow and learn from. There are times, however, where pure C source code does not |
98 a couple of months was a ten chapter, three hundred page draft that I quickly had distributed to anyone who wanted | 98 explain the algorithms properly. Hence this book. The book literally starts with the foundation of the library and works |
99 to read it. I had Jean-Luc Cooke print copies for me and I brought them to Crypto'03 in Santa Barbara. So far I have | 99 itself outwards to the more complicated algorithms. The use of both pseudo--code and verbatim source code provides a duality |
100 managed to grab a certain level of attention having people from around the world ask me for copies of the text was certain | 100 of ``theory'' and ``practice'' that the computer science students of the world shall appreciate. I never deviate too far |
101 rewarding. | 101 from relatively straightforward algebra and I hope that this book can be a valuable learning asset. |
102 | 102 |
103 Now we are past December 2003. By this time I had pictured that I would have at least finished my second draft of the text. | 103 This book and indeed much of the LibTom projects would not exist in their current form if it was not for a plethora |
104 Currently I am far off from this goal. I've done partial re-writes of chapters one, two and three but they are not even | 104 of kind people donating their time, resources and kind words to help support my work. Writing a text of significant |
105 finished yet. I haven't given up on the project, only had some setbacks. First O'Reilly declined to publish the text then | 105 length (along with the source code) is a tiresome and lengthy process. Currently the LibTom project is four years old, |
106 Addison-Wesley and Greg is tried another which I don't know the name of. However, at this point I want to focus my energy | 106 comprises of literally thousands of users and over 100,000 lines of source code, TeX and other material. People like Mads and Greg |
107 onto finishing the book not securing a contract. | 107 were there at the beginning to encourage me to work well. It is amazing how timely validation from others can boost morale to |
108 | 108 continue the project. Definitely my parents were there for me by providing room and board during the many months of work in 2003. |
109 So why am I writing this text? It seems like a lot of work right? Most certainly it is a lot of work writing a textbook. | 109 |
110 Even the simplest introductory material has to be lined with references and figures. A lot of the text has to be re-written | 110 To my many friends whom I have met through the years I thank you for the good times and the words of encouragement. I hope I |
111 from point form to prose form to ensure an easier read. Why am I doing all this work for free then? Simple. My philosophy | 111 honour your kind gestures with this project. |
112 is quite simply ``Open Source. Open Academia. Open Minds'' which means that to achieve a goal of open minds, that is, | 112 |
113 people willing to accept new ideas and explore the unknown you have to make available material they can access freely | 113 Open Source. Open Academia. Open Minds. |
114 without hinderance. | |
115 | |
116 I've been writing free software since I was about sixteen but only recently have I hit upon software that people have come | |
117 to depend upon. I started LibTomCrypt in December 2001 and now several major companies use it as integral portions of their | |
118 software. Several educational institutions use it as a matter of course and many freelance developers use it as | |
119 part of their projects. To further my contributions I started the LibTomMath project in December 2002 aimed at providing | |
120 multiple precision arithmetic routines that students could learn from. That is write routines that are not only easy | |
121 to understand and follow but provide quite impressive performance considering they are all in standard portable ISO C. | |
122 | |
123 The second leg of my philosophy is ``Open Academia'' which is where this textbook comes in. In the end, when all is | |
124 said and done the text will be useable by educational institutions as a reference on multiple precision arithmetic. | |
125 | |
126 At this time I feel I should share a little information about myself. The most common question I was asked at | |
127 Crypto'03, perhaps just out of professional courtesy, was which school I either taught at or attended. The unfortunate | |
128 truth is that I neither teach at or attend a school of academic reputation. I'm currently at Algonquin College which | |
129 is what I'd like to call ``somewhat academic but mostly vocational'' college. In otherwords, job training. | |
130 | |
131 I'm a 21 year old computer science student mostly self-taught in the areas I am aware of (which includes a half-dozen | |
132 computer science fields, a few fields of mathematics and some English). I look forward to teaching someday but I am | |
133 still far off from that goal. | |
134 | |
135 Now it would be improper for me to not introduce the rest of the texts co-authors. While they are only contributing | |
136 corrections and editorial feedback their support has been tremendously helpful in presenting the concepts laid out | |
137 in the text so far. Greg has always been there for me. He has tracked my LibTom projects since their inception and even | |
138 sent cheques to help pay tuition from time to time. His background has provided a wonderful source to bounce ideas off | |
139 of and improve the quality of my writing. Mads is another fellow who has just ``been there''. I don't even recall what | |
140 his interest in the LibTom projects is but I'm definitely glad he has been around. His ability to catch logical errors | |
141 in my written English have saved me on several occasions to say the least. | |
142 | |
143 What to expect next? Well this is still a rough draft. I've only had the chance to update a few chapters. However, I've | |
144 been getting the feeling that people are starting to use my text and I owe them some updated material. My current tenative | |
145 plan is to edit one chapter every two weeks starting January 4th. It seems insane but my lower course load at college | |
146 should provide ample time. By Crypto'04 I plan to have a 2nd draft of the text polished and ready to hand out to as many | |
147 people who will take it. | |
148 | 114 |
149 \begin{flushright} Tom St Denis \end{flushright} | 115 \begin{flushright} Tom St Denis \end{flushright} |
150 | 116 |
151 \newpage | 117 \newpage |
152 I found the opportunity to work with Tom appealing for several reasons, not only could I broaden my own horizons, but also | 118 I found the opportunity to work with Tom appealing for several reasons, not only could I broaden my own horizons, but also |
1043 051 \} | 1009 051 \} |
1044 052 #endif | 1010 052 #endif |
1045 \end{alltt} | 1011 \end{alltt} |
1046 \end{small} | 1012 \end{small} |
1047 | 1013 |
1048 A quick optimization is to first determine if a memory re-allocation is required at all. The if statement (line 23) checks | 1014 A quick optimization is to first determine if a memory re-allocation is required at all. The if statement (line 24) checks |
1049 if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count. If the count is not larger than \textbf{alloc} | 1015 if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count. If the count is not larger than \textbf{alloc} |
1050 the function skips the re-allocation part thus saving time. | 1016 the function skips the re-allocation part thus saving time. |
1051 | 1017 |
1052 When a re-allocation is performed it is turned into an optimal request to save time in the future. The requested digit count is | 1018 When a re-allocation is performed it is turned into an optimal request to save time in the future. The requested digit count is |
1053 padded upwards to 2nd multiple of \textbf{MP\_PREC} larger than \textbf{alloc} (line 26). The XREALLOC function is used | 1019 padded upwards to 2nd multiple of \textbf{MP\_PREC} larger than \textbf{alloc} (line 26). The XREALLOC function is used |
1588 \hspace{-5.1mm}{\bf File}: bn\_mp\_zero.c | 1554 \hspace{-5.1mm}{\bf File}: bn\_mp\_zero.c |
1589 \vspace{-3mm} | 1555 \vspace{-3mm} |
1590 \begin{alltt} | 1556 \begin{alltt} |
1591 016 | 1557 016 |
1592 017 /* set to zero */ | 1558 017 /* set to zero */ |
1593 018 void | 1559 018 void mp_zero (mp_int * a) |
1594 019 mp_zero (mp_int * a) | 1560 019 \{ |
1595 020 \{ | 1561 020 int n; |
1596 021 a->sign = MP_ZPOS; | 1562 021 mp_digit *tmp; |
1597 022 a->used = 0; | 1563 022 |
1598 023 memset (a->dp, 0, sizeof (mp_digit) * a->alloc); | 1564 023 a->sign = MP_ZPOS; |
1599 024 \} | 1565 024 a->used = 0; |
1600 025 #endif | 1566 025 |
1567 026 tmp = a->dp; | |
1568 027 for (n = 0; n < a->alloc; n++) \{ | |
1569 028 *tmp++ = 0; | |
1570 029 \} | |
1571 030 \} | |
1572 031 #endif | |
1601 \end{alltt} | 1573 \end{alltt} |
1602 \end{small} | 1574 \end{small} |
1603 | 1575 |
1604 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the | 1576 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the |
1605 \textbf{sign} variable is set to \textbf{MP\_ZPOS}. | 1577 \textbf{sign} variable is set to \textbf{MP\_ZPOS}. |
1607 \section{Sign Manipulation} | 1579 \section{Sign Manipulation} |
1608 \subsection{Absolute Value} | 1580 \subsection{Absolute Value} |
1609 With the mp\_int representation of an integer, calculating the absolute value is trivial. The mp\_abs algorithm will compute | 1581 With the mp\_int representation of an integer, calculating the absolute value is trivial. The mp\_abs algorithm will compute |
1610 the absolute value of an mp\_int. | 1582 the absolute value of an mp\_int. |
1611 | 1583 |
1612 \newpage\begin{figure}[here] | 1584 \begin{figure}[here] |
1613 \begin{center} | 1585 \begin{center} |
1614 \begin{tabular}{l} | 1586 \begin{tabular}{l} |
1615 \hline Algorithm \textbf{mp\_abs}. \\ | 1587 \hline Algorithm \textbf{mp\_abs}. \\ |
1616 \textbf{Input}. An mp\_int $a$ \\ | 1588 \textbf{Input}. An mp\_int $a$ \\ |
1617 \textbf{Output}. Computes $b = \vert a \vert$ \\ | 1589 \textbf{Output}. Computes $b = \vert a \vert$ \\ |
1660 037 \} | 1632 037 \} |
1661 038 #endif | 1633 038 #endif |
1662 \end{alltt} | 1634 \end{alltt} |
1663 \end{small} | 1635 \end{small} |
1664 | 1636 |
1637 This fairly trivial algorithm first eliminates non--required duplications (line 27) and then sets the | |
1638 \textbf{sign} flag to \textbf{MP\_ZPOS}. | |
1639 | |
1665 \subsection{Integer Negation} | 1640 \subsection{Integer Negation} |
1666 With the mp\_int representation of an integer, calculating the negation is also trivial. The mp\_neg algorithm will compute | 1641 With the mp\_int representation of an integer, calculating the negation is also trivial. The mp\_neg algorithm will compute |
1667 the negative of an mp\_int input. | 1642 the negative of an mp\_int input. |
1668 | 1643 |
1669 \begin{figure}[here] | 1644 \begin{figure}[here] |
1700 016 | 1675 016 |
1701 017 /* b = -a */ | 1676 017 /* b = -a */ |
1702 018 int mp_neg (mp_int * a, mp_int * b) | 1677 018 int mp_neg (mp_int * a, mp_int * b) |
1703 019 \{ | 1678 019 \{ |
1704 020 int res; | 1679 020 int res; |
1705 021 if ((res = mp_copy (a, b)) != MP_OKAY) \{ | 1680 021 if (a != b) \{ |
1706 022 return res; | 1681 022 if ((res = mp_copy (a, b)) != MP_OKAY) \{ |
1707 023 \} | 1682 023 return res; |
1708 024 if (mp_iszero(b) != MP_YES) \{ | 1683 024 \} |
1709 025 b->sign = (a->sign == MP_ZPOS) ? MP_NEG : MP_ZPOS; | 1684 025 \} |
1710 026 \} | 1685 026 |
1711 027 return MP_OKAY; | 1686 027 if (mp_iszero(b) != MP_YES) \{ |
1712 028 \} | 1687 028 b->sign = (a->sign == MP_ZPOS) ? MP_NEG : MP_ZPOS; |
1713 029 #endif | 1688 029 \} else \{ |
1689 030 b->sign = MP_ZPOS; | |
1690 031 \} | |
1691 032 | |
1692 033 return MP_OKAY; | |
1693 034 \} | |
1694 035 #endif | |
1714 \end{alltt} | 1695 \end{alltt} |
1715 \end{small} | 1696 \end{small} |
1697 | |
1698 Like mp\_abs() this function avoids non--required duplications (line 21) and then sets the sign. We | |
1699 have to make sure that only non--zero values get a \textbf{sign} of \textbf{MP\_NEG}. If the mp\_int is zero | |
1700 than the \textbf{sign} is hard--coded to \textbf{MP\_ZPOS}. | |
1716 | 1701 |
1717 \section{Small Constants} | 1702 \section{Small Constants} |
1718 \subsection{Setting Small Constants} | 1703 \subsection{Setting Small Constants} |
1719 Often a mp\_int must be set to a relatively small value such as $1$ or $2$. For these cases the mp\_set algorithm is useful. | 1704 Often a mp\_int must be set to a relatively small value such as $1$ or $2$. For these cases the mp\_set algorithm is useful. |
1720 | 1705 |
1721 \begin{figure}[here] | 1706 \newpage\begin{figure}[here] |
1722 \begin{center} | 1707 \begin{center} |
1723 \begin{tabular}{l} | 1708 \begin{tabular}{l} |
1724 \hline Algorithm \textbf{mp\_set}. \\ | 1709 \hline Algorithm \textbf{mp\_set}. \\ |
1725 \textbf{Input}. An mp\_int $a$ and a digit $b$ \\ | 1710 \textbf{Input}. An mp\_int $a$ and a digit $b$ \\ |
1726 \textbf{Output}. Make $a$ equivalent to $b$ \\ | 1711 \textbf{Output}. Make $a$ equivalent to $b$ \\ |
1755 023 \} | 1740 023 \} |
1756 024 #endif | 1741 024 #endif |
1757 \end{alltt} | 1742 \end{alltt} |
1758 \end{small} | 1743 \end{small} |
1759 | 1744 |
1760 Line 20 calls mp\_zero() to clear the mp\_int and reset the sign. Line 21 copies the digit | 1745 First we zero (line 20) the mp\_int to make sure that the other members are initialized for a |
1761 into the least significant location. Note the usage of a new constant \textbf{MP\_MASK}. This constant is used to quickly | 1746 small positive constant. mp\_zero() ensures that the \textbf{sign} is positive and the \textbf{used} count |
1762 reduce an integer modulo $\beta$. Since $\beta$ is of the form $2^k$ for any suitable $k$ it suffices to perform a binary AND with | 1747 is zero. Next we set the digit and reduce it modulo $\beta$ (line 21). After this step we have to |
1763 $MP\_MASK = 2^k - 1$ to perform the reduction. Finally line 22 will set the \textbf{used} member with respect to the | 1748 check if the resulting digit is zero or not. If it is not then we set the \textbf{used} count to one, otherwise |
1764 digit actually set. This function will always make the integer positive. | 1749 to zero. |
1750 | |
1751 We can quickly reduce modulo $\beta$ since it is of the form $2^k$ and a quick binary AND operation with | |
1752 $2^k - 1$ will perform the same operation. | |
1765 | 1753 |
1766 One important limitation of this function is that it will only set one digit. The size of a digit is not fixed, meaning source that uses | 1754 One important limitation of this function is that it will only set one digit. The size of a digit is not fixed, meaning source that uses |
1767 this function should take that into account. Only trivially small constants can be set using this function. | 1755 this function should take that into account. Only trivially small constants can be set using this function. |
1768 | 1756 |
1769 \subsection{Setting Large Constants} | 1757 \subsection{Setting Large Constants} |
1934 049 \} | 1922 049 \} |
1935 050 #endif | 1923 050 #endif |
1936 \end{alltt} | 1924 \end{alltt} |
1937 \end{small} | 1925 \end{small} |
1938 | 1926 |
1939 The two if statements on lines 24 and 28 compare the number of digits in the two inputs. These two are performed before all of the digits | 1927 The two if statements (lines 24 and 28) compare the number of digits in the two inputs. These two are |
1940 are compared since it is a very cheap test to perform and can potentially save considerable time. The implementation given is also not valid | 1928 performed before all of the digits are compared since it is a very cheap test to perform and can potentially save |
1941 without those two statements. $b.alloc$ may be smaller than $a.used$, meaning that undefined values will be read from $b$ past the end of the | 1929 considerable time. The implementation given is also not valid without those two statements. $b.alloc$ may be |
1942 array of digits. | 1930 smaller than $a.used$, meaning that undefined values will be read from $b$ past the end of the array of digits. |
1931 | |
1932 | |
1943 | 1933 |
1944 \subsection{Signed Comparisons} | 1934 \subsection{Signed Comparisons} |
1945 Comparing with sign considerations is also fairly critical in several routines (\textit{division for example}). Based on an unsigned magnitude | 1935 Comparing with sign considerations is also fairly critical in several routines (\textit{division for example}). Based on an unsigned magnitude |
1946 comparison a trivial signed comparison algorithm can be written. | 1936 comparison a trivial signed comparison algorithm can be written. |
1947 | 1937 |
1998 037 \} | 1988 037 \} |
1999 038 #endif | 1989 038 #endif |
2000 \end{alltt} | 1990 \end{alltt} |
2001 \end{small} | 1991 \end{small} |
2002 | 1992 |
2003 The two if statements on lines 22 and 23 perform the initial sign comparison. If the signs are not the equal then which ever | 1993 The two if statements (lines 22 and 23) perform the initial sign comparison. If the signs are not the equal then which ever |
2004 has the positive sign is larger. At line 31, the inputs are compared based on magnitudes. If the signs were both negative then | 1994 has the positive sign is larger. The inputs are compared (line 31) based on magnitudes. If the signs were both |
2005 the unsigned comparison is performed in the opposite direction (\textit{line 33}). Otherwise, the signs are assumed to | 1995 negative then the unsigned comparison is performed in the opposite direction (line 33). Otherwise, the signs are assumed to |
2006 be both positive and a forward direction unsigned comparison is performed. | 1996 be both positive and a forward direction unsigned comparison is performed. |
2007 | 1997 |
2008 \section*{Exercises} | 1998 \section*{Exercises} |
2009 \begin{tabular}{cl} | 1999 \begin{tabular}{cl} |
2010 $\left [ 2 \right ]$ & Modify algorithm mp\_set\_int to accept as input a variable length array of bits. \\ | 2000 $\left [ 2 \right ]$ & Modify algorithm mp\_set\_int to accept as input a variable length array of bits. \\ |
2216 103 \} | 2206 103 \} |
2217 104 #endif | 2207 104 #endif |
2218 \end{alltt} | 2208 \end{alltt} |
2219 \end{small} | 2209 \end{small} |
2220 | 2210 |
2221 Lines 27 to 35 perform the initial sorting of the inputs and determine the $min$ and $max$ variables. Note that $x$ is a pointer to a | 2211 We first sort (lines 27 to 35) the inputs based on magnitude and determine the $min$ and $max$ variables. |
2222 mp\_int assigned to the largest input, in effect it is a local alias. Lines 37 to 42 ensure that the destination is grown to | 2212 Note that $x$ is a pointer to an mp\_int assigned to the largest input, in effect it is a local alias. Next we |
2223 accomodate the result of the addition. | 2213 grow the destination (37 to 42) ensure that it can accomodate the result of the addition. |
2224 | 2214 |
2225 Similar to the implementation of mp\_copy this function uses the braced code and local aliases coding style. The three aliases that are on | 2215 Similar to the implementation of mp\_copy this function uses the braced code and local aliases coding style. The three aliases that are on |
2226 lines 55, 58 and 61 represent the two inputs and destination variables respectively. These aliases are used to ensure the | 2216 lines 55, 58 and 61 represent the two inputs and destination variables respectively. These aliases are used to ensure the |
2227 compiler does not have to dereference $a$, $b$ or $c$ (respectively) to access the digits of the respective mp\_int. | 2217 compiler does not have to dereference $a$, $b$ or $c$ (respectively) to access the digits of the respective mp\_int. |
2228 | 2218 |
2229 The initial carry $u$ is cleared on line 64, note that $u$ is of type mp\_digit which ensures type compatibility within the | 2219 The initial carry $u$ will be cleared (line 64), note that $u$ is of type mp\_digit which ensures type |
2230 implementation. The initial addition loop begins on line 65 and ends on line 74. Similarly the conditional addition loop | 2220 compatibility within the implementation. The initial addition (line 65 to 74) adds digits from |
2231 begins on line 80 and ends on line 90. The addition is finished with the final carry being stored in $tmpc$ on line 93. | 2221 both inputs until the smallest input runs out of digits. Similarly the conditional addition loop |
2232 Note the ``++'' operator on the same line. After line 93 $tmpc$ will point to the $c.used$'th digit of the mp\_int $c$. This is useful | 2222 (line 80 to 90) adds the remaining digits from the larger of the two inputs. The addition is finished |
2233 for the next loop on lines 96 to 99 which set any old upper digits to zero. | 2223 with the final carry being stored in $tmpc$ (line 93). Note the ``++'' operator within the same expression. |
2224 After line 93, $tmpc$ will point to the $c.used$'th digit of the mp\_int $c$. This is useful | |
2225 for the next loop (line 96 to 99) which set any old upper digits to zero. | |
2234 | 2226 |
2235 \subsection{Low Level Subtraction} | 2227 \subsection{Low Level Subtraction} |
2236 The low level unsigned subtraction algorithm is very similar to the low level unsigned addition algorithm. The principle difference is that the | 2228 The low level unsigned subtraction algorithm is very similar to the low level unsigned addition algorithm. The principle difference is that the |
2237 unsigned subtraction algorithm requires the result to be positive. That is when computing $a - b$ the condition $\vert a \vert \ge \vert b\vert$ must | 2229 unsigned subtraction algorithm requires the result to be positive. That is when computing $a - b$ the condition $\vert a \vert \ge \vert b\vert$ must |
2238 be met for this algorithm to function properly. Keep in mind this low level algorithm is not meant to be used in higher level algorithms directly. | 2230 be met for this algorithm to function properly. Keep in mind this low level algorithm is not meant to be used in higher level algorithms directly. |
2243 the range $0 \le x < 2\beta$ for the algorithms to work correctly. However, it is allowable that a mp\_digit represent a larger range of values. For | 2235 the range $0 \le x < 2\beta$ for the algorithms to work correctly. However, it is allowable that a mp\_digit represent a larger range of values. For |
2244 this algorithm we will assume that the variable $\gamma$ represents the number of bits available in a | 2236 this algorithm we will assume that the variable $\gamma$ represents the number of bits available in a |
2245 mp\_digit (\textit{this implies $2^{\gamma} > \beta$}). | 2237 mp\_digit (\textit{this implies $2^{\gamma} > \beta$}). |
2246 | 2238 |
2247 For example, the default for LibTomMath is to use a ``unsigned long'' for the mp\_digit ``type'' while $\beta = 2^{28}$. In ISO C an ``unsigned long'' | 2239 For example, the default for LibTomMath is to use a ``unsigned long'' for the mp\_digit ``type'' while $\beta = 2^{28}$. In ISO C an ``unsigned long'' |
2248 data type must be able to represent $0 \le x < 2^{32}$ meaning that in this case $\gamma = 32$. | 2240 data type must be able to represent $0 \le x < 2^{32}$ meaning that in this case $\gamma \ge 32$. |
2249 | 2241 |
2250 \newpage\begin{figure}[!here] | 2242 \newpage\begin{figure}[!here] |
2251 \begin{center} | 2243 \begin{center} |
2252 \begin{small} | 2244 \begin{small} |
2253 \begin{tabular}{l} | 2245 \begin{tabular}{l} |
2385 083 | 2377 083 |
2386 084 #endif | 2378 084 #endif |
2387 \end{alltt} | 2379 \end{alltt} |
2388 \end{small} | 2380 \end{small} |
2389 | 2381 |
2390 Line 24 and 25 perform the initial hardcoded sorting of the inputs. In reality the $min$ and $max$ variables are only aliases and are only | 2382 Like low level addition we ``sort'' the inputs. Except in this case the sorting is hardcoded |
2391 used to make the source code easier to read. Again the pointer alias optimization is used within this algorithm. Lines 41, 42 and 43 initialize the aliases for | 2383 (lines 24 and 25). In reality the $min$ and $max$ variables are only aliases and are only |
2392 $a$, $b$ and $c$ respectively. | 2384 used to make the source code easier to read. Again the pointer alias optimization is used |
2393 | 2385 within this algorithm. The aliases $tmpa$, $tmpb$ and $tmpc$ are initialized |
2394 The first subtraction loop occurs on lines 46 through 60. The theory behind the subtraction loop is exactly the same as that for | 2386 (lines 41, 42 and 43) for $a$, $b$ and $c$ respectively. |
2395 the addition loop. As remarked earlier there is an implementation reason for using the ``awkward'' method of extracting the carry | 2387 |
2396 (\textit{see line 56}). The traditional method for extracting the carry would be to shift by $lg(\beta)$ positions and logically AND | 2388 The first subtraction loop (lines 46 through 60) subtract digits from both inputs until the smaller of |
2397 the least significant bit. The AND operation is required because all of the bits above the $\lg(\beta)$'th bit will be set to one after a carry | 2389 the two inputs has been exhausted. As remarked earlier there is an implementation reason for using the ``awkward'' |
2398 occurs from subtraction. This carry extraction requires two relatively cheap operations to extract the carry. The other method is to simply | 2390 method of extracting the carry (line 56). The traditional method for extracting the carry would be to shift |
2399 shift the most significant bit to the least significant bit thus extracting the carry with a single cheap operation. This optimization only works on | 2391 by $lg(\beta)$ positions and logically AND the least significant bit. The AND operation is required because all of |
2400 twos compliment machines which is a safe assumption to make. | 2392 the bits above the $\lg(\beta)$'th bit will be set to one after a carry occurs from subtraction. This carry |
2401 | 2393 extraction requires two relatively cheap operations to extract the carry. The other method is to simply shift the |
2402 If $a$ has a larger magnitude than $b$ an additional loop (\textit{see lines 63 through 72}) is required to propagate the carry through | 2394 most significant bit to the least significant bit thus extracting the carry with a single cheap operation. This |
2403 $a$ and copy the result to $c$. | 2395 optimization only works on twos compliment machines which is a safe assumption to make. |
2396 | |
2397 If $a$ has a larger magnitude than $b$ an additional loop (lines 63 through 72) is required to propagate | |
2398 the carry through $a$ and copy the result to $c$. | |
2404 | 2399 |
2405 \subsection{High Level Addition} | 2400 \subsection{High Level Addition} |
2406 Now that both lower level addition and subtraction algorithms have been established an effective high level signed addition algorithm can be | 2401 Now that both lower level addition and subtraction algorithms have been established an effective high level signed addition algorithm can be |
2407 established. This high level addition algorithm will be what other algorithms and developers will use to perform addition of mp\_int data | 2402 established. This high level addition algorithm will be what other algorithms and developers will use to perform addition of mp\_int data |
2408 types. | 2403 types. |
2983 061 \} | 2978 061 \} |
2984 062 #endif | 2979 062 #endif |
2985 \end{alltt} | 2980 \end{alltt} |
2986 \end{small} | 2981 \end{small} |
2987 | 2982 |
2988 The if statement on line 23 ensures that the $b$ variable is greater than zero. The \textbf{used} count is incremented by $b$ before | 2983 The if statement (line 23) ensures that the $b$ variable is greater than zero since we do not interpret negative |
2989 the copy loop begins. This elminates the need for an additional variable in the for loop. The variable $top$ on line 41 is an alias | 2984 shift counts properly. The \textbf{used} count is incremented by $b$ before the copy loop begins. This elminates |
2990 for the leading digit while $bottom$ on line 44 is an alias for the trailing edge. The aliases form a window of exactly $b$ digits | 2985 the need for an additional variable in the for loop. The variable $top$ (line 41) is an alias |
2991 over the input. | 2986 for the leading digit while $bottom$ (line 44) is an alias for the trailing edge. The aliases form a |
2987 window of exactly $b$ digits over the input. | |
2992 | 2988 |
2993 \subsection{Division by $x$} | 2989 \subsection{Division by $x$} |
2994 | 2990 |
2995 Division by powers of $x$ is easily achieved by shifting the digits right and removing any that will end up to the right of the zero'th digit. | 2991 Division by powers of $x$ is easily achieved by shifting the digits right and removing any that will end up to the right of the zero'th digit. |
2996 | 2992 |
3093 066 \} | 3089 066 \} |
3094 067 #endif | 3090 067 #endif |
3095 \end{alltt} | 3091 \end{alltt} |
3096 \end{small} | 3092 \end{small} |
3097 | 3093 |
3098 The only noteworthy element of this routine is the lack of a return type. | 3094 The only noteworthy element of this routine is the lack of a return type since it cannot fail. Like mp\_lshd() we |
3099 | 3095 form a sliding window except we copy in the other direction. After the window (line 59) we then zero |
3100 -- Will update later to give it a return type...Tom | 3096 the upper digits of the input to make sure the result is correct. |
3101 | 3097 |
3102 \section{Powers of Two} | 3098 \section{Powers of Two} |
3103 | 3099 |
3104 Now that algorithms for moving single bits as well as whole digits exist algorithms for moving the ``in between'' distances are required. For | 3100 Now that algorithms for moving single bits as well as whole digits exist algorithms for moving the ``in between'' distances are required. For |
3105 example, to quickly multiply by $2^k$ for any $k$ without using a full multiplier algorithm would prove useful. Instead of performing single | 3101 example, to quickly multiply by $2^k$ for any $k$ without using a full multiplier algorithm would prove useful. Instead of performing single |
3226 079 \} | 3222 079 \} |
3227 080 #endif | 3223 080 #endif |
3228 \end{alltt} | 3224 \end{alltt} |
3229 \end{small} | 3225 \end{small} |
3230 | 3226 |
3231 Notes to be revised when code is updated. -- Tom | 3227 The shifting is performed in--place which means the first step (line 24) is to copy the input to the |
3228 destination. We avoid calling mp\_copy() by making sure the mp\_ints are different. The destination then | |
3229 has to be grown (line 31) to accomodate the result. | |
3230 | |
3231 If the shift count $b$ is larger than $lg(\beta)$ then a call to mp\_lshd() is used to handle all of the multiples | |
3232 of $lg(\beta)$. Leaving only a remaining shift of $lg(\beta) - 1$ or fewer bits left. Inside the actual shift | |
3233 loop (lines 45 to 76) we make use of pre--computed values $shift$ and $mask$. These are used to | |
3234 extract the carry bit(s) to pass into the next iteration of the loop. The $r$ and $rr$ variables form a | |
3235 chain between consecutive iterations to propagate the carry. | |
3232 | 3236 |
3233 \subsection{Division by Power of Two} | 3237 \subsection{Division by Power of Two} |
3234 | 3238 |
3235 \newpage\begin{figure}[!here] | 3239 \newpage\begin{figure}[!here] |
3236 \begin{small} | 3240 \begin{small} |
3359 The implementation of algorithm mp\_div\_2d is slightly different than the algorithm specifies. The remainder $d$ may be optionally | 3363 The implementation of algorithm mp\_div\_2d is slightly different than the algorithm specifies. The remainder $d$ may be optionally |
3360 ignored by passing \textbf{NULL} as the pointer to the mp\_int variable. The temporary mp\_int variable $t$ is used to hold the | 3364 ignored by passing \textbf{NULL} as the pointer to the mp\_int variable. The temporary mp\_int variable $t$ is used to hold the |
3361 result of the remainder operation until the end. This allows $d$ and $a$ to represent the same mp\_int without modifying $a$ before | 3365 result of the remainder operation until the end. This allows $d$ and $a$ to represent the same mp\_int without modifying $a$ before |
3362 the quotient is obtained. | 3366 the quotient is obtained. |
3363 | 3367 |
3364 The remainder of the source code is essentially the same as the source code for mp\_mul\_2d. (-- Fix this paragraph up later, Tom). | 3368 The remainder of the source code is essentially the same as the source code for mp\_mul\_2d. The only significant difference is |
3369 the direction of the shifts. | |
3365 | 3370 |
3366 \subsection{Remainder of Division by Power of Two} | 3371 \subsection{Remainder of Division by Power of Two} |
3367 | 3372 |
3368 The last algorithm in the series of polynomial basis power of two algorithms is calculating the remainder of division by $2^b$. This | 3373 The last algorithm in the series of polynomial basis power of two algorithms is calculating the remainder of division by $2^b$. This |
3369 algorithm benefits from the fact that in twos complement arithmetic $a \mbox{ (mod }2^b\mbox{)}$ is the same as $a$ AND $2^b - 1$. | 3374 algorithm benefits from the fact that in twos complement arithmetic $a \mbox{ (mod }2^b\mbox{)}$ is the same as $a$ AND $2^b - 1$. |
3418 025 mp_zero (c); | 3423 025 mp_zero (c); |
3419 026 return MP_OKAY; | 3424 026 return MP_OKAY; |
3420 027 \} | 3425 027 \} |
3421 028 | 3426 028 |
3422 029 /* if the modulus is larger than the value than return */ | 3427 029 /* if the modulus is larger than the value than return */ |
3423 030 if (b > (int) (a->used * DIGIT_BIT)) \{ | 3428 030 if (b >= (int) (a->used * DIGIT_BIT)) \{ |
3424 031 res = mp_copy (a, c); | 3429 031 res = mp_copy (a, c); |
3425 032 return res; | 3430 032 return res; |
3426 033 \} | 3431 033 \} |
3427 034 | 3432 034 |
3428 035 /* copy */ | 3433 035 /* copy */ |
3444 049 \} | 3449 049 \} |
3445 050 #endif | 3450 050 #endif |
3446 \end{alltt} | 3451 \end{alltt} |
3447 \end{small} | 3452 \end{small} |
3448 | 3453 |
3449 -- Add comments later, Tom. | 3454 We first avoid cases of $b \le 0$ by simply mp\_zero()'ing the destination in such cases. Next if $2^b$ is larger |
3455 than the input we just mp\_copy() the input and return right away. After this point we know we must actually | |
3456 perform some work to produce the remainder. | |
3457 | |
3458 Recalling that reducing modulo $2^k$ and a binary ``and'' with $2^k - 1$ are numerically equivalent we can quickly reduce | |
3459 the number. First we zero any digits above the last digit in $2^b$ (line 41). Next we reduce the | |
3460 leading digit of both (line 45) and then mp\_clamp(). | |
3450 | 3461 |
3451 \section*{Exercises} | 3462 \section*{Exercises} |
3452 \begin{tabular}{cl} | 3463 \begin{tabular}{cl} |
3453 $\left [ 3 \right ] $ & Devise an algorithm that performs $a \cdot 2^b$ for generic values of $b$ \\ | 3464 $\left [ 3 \right ] $ & Devise an algorithm that performs $a \cdot 2^b$ for generic values of $b$ \\ |
3454 & in $O(n)$ time. \\ | 3465 & in $O(n)$ time. \\ |
3609 016 | 3620 016 |
3610 017 /* multiplies |a| * |b| and only computes upto digs digits of result | 3621 017 /* multiplies |a| * |b| and only computes upto digs digits of result |
3611 018 * HAC pp. 595, Algorithm 14.12 Modified so you can control how | 3622 018 * HAC pp. 595, Algorithm 14.12 Modified so you can control how |
3612 019 * many digits of output are created. | 3623 019 * many digits of output are created. |
3613 020 */ | 3624 020 */ |
3614 021 int | 3625 021 int s_mp_mul_digs (mp_int * a, mp_int * b, mp_int * c, int digs) |
3615 022 s_mp_mul_digs (mp_int * a, mp_int * b, mp_int * c, int digs) | 3626 022 \{ |
3616 023 \{ | 3627 023 mp_int t; |
3617 024 mp_int t; | 3628 024 int res, pa, pb, ix, iy; |
3618 025 int res, pa, pb, ix, iy; | 3629 025 mp_digit u; |
3619 026 mp_digit u; | 3630 026 mp_word r; |
3620 027 mp_word r; | 3631 027 mp_digit tmpx, *tmpt, *tmpy; |
3621 028 mp_digit tmpx, *tmpt, *tmpy; | 3632 028 |
3622 029 | 3633 029 /* can we use the fast multiplier? */ |
3623 030 /* can we use the fast multiplier? */ | 3634 030 if (((digs) < MP_WARRAY) && |
3624 031 if (((digs) < MP_WARRAY) && | 3635 031 MIN (a->used, b->used) < |
3625 032 MIN (a->used, b->used) < | 3636 032 (1 << ((CHAR_BIT * sizeof (mp_word)) - (2 * DIGIT_BIT)))) \{ |
3626 033 (1 << ((CHAR_BIT * sizeof (mp_word)) - (2 * DIGIT_BIT)))) \{ | 3637 033 return fast_s_mp_mul_digs (a, b, c, digs); |
3627 034 return fast_s_mp_mul_digs (a, b, c, digs); | 3638 034 \} |
3628 035 \} | 3639 035 |
3629 036 | 3640 036 if ((res = mp_init_size (&t, digs)) != MP_OKAY) \{ |
3630 037 if ((res = mp_init_size (&t, digs)) != MP_OKAY) \{ | 3641 037 return res; |
3631 038 return res; | 3642 038 \} |
3632 039 \} | 3643 039 t.used = digs; |
3633 040 t.used = digs; | 3644 040 |
3634 041 | 3645 041 /* compute the digits of the product directly */ |
3635 042 /* compute the digits of the product directly */ | 3646 042 pa = a->used; |
3636 043 pa = a->used; | 3647 043 for (ix = 0; ix < pa; ix++) \{ |
3637 044 for (ix = 0; ix < pa; ix++) \{ | 3648 044 /* set the carry to zero */ |
3638 045 /* set the carry to zero */ | 3649 045 u = 0; |
3639 046 u = 0; | 3650 046 |
3640 047 | 3651 047 /* limit ourselves to making digs digits of output */ |
3641 048 /* limit ourselves to making digs digits of output */ | 3652 048 pb = MIN (b->used, digs - ix); |
3642 049 pb = MIN (b->used, digs - ix); | 3653 049 |
3643 050 | 3654 050 /* setup some aliases */ |
3644 051 /* setup some aliases */ | 3655 051 /* copy of the digit from a used within the nested loop */ |
3645 052 /* copy of the digit from a used within the nested loop */ | 3656 052 tmpx = a->dp[ix]; |
3646 053 tmpx = a->dp[ix]; | 3657 053 |
3647 054 | 3658 054 /* an alias for the destination shifted ix places */ |
3648 055 /* an alias for the destination shifted ix places */ | 3659 055 tmpt = t.dp + ix; |
3649 056 tmpt = t.dp + ix; | 3660 056 |
3650 057 | 3661 057 /* an alias for the digits of b */ |
3651 058 /* an alias for the digits of b */ | 3662 058 tmpy = b->dp; |
3652 059 tmpy = b->dp; | 3663 059 |
3653 060 | 3664 060 /* compute the columns of the output and propagate the carry */ |
3654 061 /* compute the columns of the output and propagate the carry */ | 3665 061 for (iy = 0; iy < pb; iy++) \{ |
3655 062 for (iy = 0; iy < pb; iy++) \{ | 3666 062 /* compute the column as a mp_word */ |
3656 063 /* compute the column as a mp_word */ | 3667 063 r = ((mp_word)*tmpt) + |
3657 064 r = ((mp_word)*tmpt) + | 3668 064 ((mp_word)tmpx) * ((mp_word)*tmpy++) + |
3658 065 ((mp_word)tmpx) * ((mp_word)*tmpy++) + | 3669 065 ((mp_word) u); |
3659 066 ((mp_word) u); | 3670 066 |
3660 067 | 3671 067 /* the new column is the lower part of the result */ |
3661 068 /* the new column is the lower part of the result */ | 3672 068 *tmpt++ = (mp_digit) (r & ((mp_word) MP_MASK)); |
3662 069 *tmpt++ = (mp_digit) (r & ((mp_word) MP_MASK)); | 3673 069 |
3663 070 | 3674 070 /* get the carry word from the result */ |
3664 071 /* get the carry word from the result */ | 3675 071 u = (mp_digit) (r >> ((mp_word) DIGIT_BIT)); |
3665 072 u = (mp_digit) (r >> ((mp_word) DIGIT_BIT)); | 3676 072 \} |
3666 073 \} | 3677 073 /* set carry if it is placed below digs */ |
3667 074 /* set carry if it is placed below digs */ | 3678 074 if (ix + iy < digs) \{ |
3668 075 if (ix + iy < digs) \{ | 3679 075 *tmpt = u; |
3669 076 *tmpt = u; | 3680 076 \} |
3670 077 \} | 3681 077 \} |
3671 078 \} | 3682 078 |
3672 079 | 3683 079 mp_clamp (&t); |
3673 080 mp_clamp (&t); | 3684 080 mp_exch (&t, c); |
3674 081 mp_exch (&t, c); | 3685 081 |
3675 082 | 3686 082 mp_clear (&t); |
3676 083 mp_clear (&t); | 3687 083 return MP_OKAY; |
3677 084 return MP_OKAY; | 3688 084 \} |
3678 085 \} | 3689 085 #endif |
3679 086 #endif | |
3680 \end{alltt} | 3690 \end{alltt} |
3681 \end{small} | 3691 \end{small} |
3682 | 3692 |
3683 Lines 31 to 35 determine if the Comba method can be used first. The conditions for using the Comba routine are that min$(a.used, b.used) < \delta$ and | 3693 First we determine (line 30) if the Comba method can be used first since it's faster. The conditions for |
3684 the number of digits of output is less than \textbf{MP\_WARRAY}. This new constant is used to control | 3694 sing the Comba routine are that min$(a.used, b.used) < \delta$ and the number of digits of output is less than |
3685 the stack usage in the Comba routines. By default it is set to $\delta$ but can be reduced when memory is at a premium. | 3695 \textbf{MP\_WARRAY}. This new constant is used to control the stack usage in the Comba routines. By default it is |
3686 | 3696 set to $\delta$ but can be reduced when memory is at a premium. |
3687 Of particular importance is the calculation of the $ix+iy$'th column on lines 64, 65 and 66. Note how all of the | 3697 |
3688 variables are cast to the type \textbf{mp\_word}, which is also the type of variable $\hat r$. That is to ensure that double precision operations | 3698 If we cannot use the Comba method we proceed to setup the baseline routine. We allocate the the destination mp\_int |
3689 are used instead of single precision. The multiplication on line 65 makes use of a specific GCC optimizer behaviour. On the outset it looks like | 3699 $t$ (line 36) to the exact size of the output to avoid further re--allocations. At this point we now |
3690 the compiler will have to use a double precision multiplication to produce the result required. Such an operation would be horribly slow on most | 3700 begin the $O(n^2)$ loop. |
3691 processors and drag this to a crawl. However, GCC is smart enough to realize that double wide output single precision multipliers can be used. For | 3701 |
3692 example, the instruction ``MUL'' on the x86 processor can multiply two 32-bit values and produce a 64-bit result. | 3702 This implementation of multiplication has the caveat that it can be trimmed to only produce a variable number of |
3703 digits as output. In each iteration of the outer loop the $pb$ variable is set (line 48) to the maximum | |
3704 number of inner loop iterations. | |
3705 | |
3706 Inside the inner loop we calculate $\hat r$ as the mp\_word product of the two mp\_digits and the addition of the | |
3707 carry from the previous iteration. A particularly important observation is that most modern optimizing | |
3708 C compilers (GCC for instance) can recognize that a $N \times N \rightarrow 2N$ multiplication is all that | |
3709 is required for the product. In x86 terms for example, this means using the MUL instruction. | |
3710 | |
3711 Each digit of the product is stored in turn (line 68) and the carry propagated (line 71) to the | |
3712 next iteration. | |
3693 | 3713 |
3694 \subsection{Faster Multiplication by the ``Comba'' Method} | 3714 \subsection{Faster Multiplication by the ``Comba'' Method} |
3695 | 3715 |
3696 One of the huge drawbacks of the ``baseline'' algorithms is that at the $O(n^2)$ level the carry must be computed and propagated upwards. This | 3716 One of the huge drawbacks of the ``baseline'' algorithms is that at the $O(n^2)$ level the carry must be |
3697 makes the nested loop very sequential and hard to unroll and implement in parallel. The ``Comba'' \cite{COMBA} method is named after little known | 3717 computed and propagated upwards. This makes the nested loop very sequential and hard to unroll and implement |
3698 (\textit{in cryptographic venues}) Paul G. Comba who described a method of implementing fast multipliers that do not require nested | 3718 in parallel. The ``Comba'' \cite{COMBA} method is named after little known (\textit{in cryptographic venues}) Paul G. |
3699 carry fixup operations. As an interesting aside it seems that Paul Barrett describes a similar technique in | 3719 Comba who described a method of implementing fast multipliers that do not require nested carry fixup operations. As an |
3700 his 1986 paper \cite{BARRETT} written five years before. | 3720 interesting aside it seems that Paul Barrett describes a similar technique in his 1986 paper \cite{BARRETT} written |
3701 | 3721 five years before. |
3702 At the heart of the Comba technique is once again the long-hand algorithm. Except in this case a slight twist is placed on how | 3722 |
3703 the columns of the result are produced. In the standard long-hand algorithm rows of products are produced then added together to form the | 3723 At the heart of the Comba technique is once again the long-hand algorithm. Except in this case a slight |
3704 final result. In the baseline algorithm the columns are added together after each iteration to get the result instantaneously. | 3724 twist is placed on how the columns of the result are produced. In the standard long-hand algorithm rows of products |
3705 | 3725 are produced then added together to form the final result. In the baseline algorithm the columns are added together |
3706 In the Comba algorithm the columns of the result are produced entirely independently of each other. That is at the $O(n^2)$ level a | 3726 after each iteration to get the result instantaneously. |
3707 simple multiplication and addition step is performed. The carries of the columns are propagated after the nested loop to reduce the amount | 3727 |
3708 of work requiored. Succintly the first step of the algorithm is to compute the product vector $\vec x$ as follows. | 3728 In the Comba algorithm the columns of the result are produced entirely independently of each other. That is at |
3729 the $O(n^2)$ level a simple multiplication and addition step is performed. The carries of the columns are propagated | |
3730 after the nested loop to reduce the amount of work requiored. Succintly the first step of the algorithm is to compute | |
3731 the product vector $\vec x$ as follows. | |
3709 | 3732 |
3710 \begin{equation} | 3733 \begin{equation} |
3711 \vec x_n = \sum_{i+j = n} a_ib_j, \forall n \in \lbrace 0, 1, 2, \ldots, i + j \rbrace | 3734 \vec x_n = \sum_{i+j = n} a_ib_j, \forall n \in \lbrace 0, 1, 2, \ldots, i + j \rbrace |
3712 \end{equation} | 3735 \end{equation} |
3713 | 3736 |
3797 \begin{tabular}{l} | 3820 \begin{tabular}{l} |
3798 \hline Algorithm \textbf{fast\_s\_mp\_mul\_digs}. \\ | 3821 \hline Algorithm \textbf{fast\_s\_mp\_mul\_digs}. \\ |
3799 \textbf{Input}. mp\_int $a$, mp\_int $b$ and an integer $digs$ \\ | 3822 \textbf{Input}. mp\_int $a$, mp\_int $b$ and an integer $digs$ \\ |
3800 \textbf{Output}. $c \leftarrow \vert a \vert \cdot \vert b \vert \mbox{ (mod }\beta^{digs}\mbox{)}$. \\ | 3823 \textbf{Output}. $c \leftarrow \vert a \vert \cdot \vert b \vert \mbox{ (mod }\beta^{digs}\mbox{)}$. \\ |
3801 \hline \\ | 3824 \hline \\ |
3802 Place an array of \textbf{MP\_WARRAY} double precision digits named $\hat W$ on the stack. \\ | 3825 Place an array of \textbf{MP\_WARRAY} single precision digits named $W$ on the stack. \\ |
3803 1. If $c.alloc < digs$ then grow $c$ to $digs$ digits. (\textit{mp\_grow}) \\ | 3826 1. If $c.alloc < digs$ then grow $c$ to $digs$ digits. (\textit{mp\_grow}) \\ |
3804 2. If step 1 failed return(\textit{MP\_MEM}).\\ | 3827 2. If step 1 failed return(\textit{MP\_MEM}).\\ |
3805 \\ | 3828 \\ |
3806 Zero the temporary array $\hat W$. \\ | 3829 3. $pa \leftarrow \mbox{MIN}(digs, a.used + b.used)$ \\ |
3807 3. for $n$ from $0$ to $digs - 1$ do \\ | |
3808 \hspace{3mm}3.1 $\hat W_n \leftarrow 0$ \\ | |
3809 \\ | 3830 \\ |
3810 Compute the columns. \\ | 3831 4. $\_ \hat W \leftarrow 0$ \\ |
3811 4. for $ix$ from $0$ to $a.used - 1$ do \\ | 3832 5. for $ix$ from 0 to $pa - 1$ do \\ |
3812 \hspace{3mm}4.1 $pb \leftarrow \mbox{min}(b.used, digs - ix)$ \\ | 3833 \hspace{3mm}5.1 $ty \leftarrow \mbox{MIN}(b.used - 1, ix)$ \\ |
3813 \hspace{3mm}4.2 If $pb < 1$ then goto step 5. \\ | 3834 \hspace{3mm}5.2 $tx \leftarrow ix - ty$ \\ |
3814 \hspace{3mm}4.3 for $iy$ from $0$ to $pb - 1$ do \\ | 3835 \hspace{3mm}5.3 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ |
3815 \hspace{6mm}4.3.1 $\hat W_{ix+iy} \leftarrow \hat W_{ix+iy} + a_{ix}b_{iy}$ \\ | 3836 \hspace{3mm}5.4 for $iz$ from 0 to $iy - 1$ do \\ |
3837 \hspace{6mm}5.4.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx+iy}b_{ty-iy}$ \\ | |
3838 \hspace{3mm}5.5 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$\\ | |
3839 \hspace{3mm}5.6 $\_ \hat W \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ | |
3840 6. $W_{pa} \leftarrow \_ \hat W (\mbox{mod }\beta)$ \\ | |
3816 \\ | 3841 \\ |
3817 Propagate the carries upwards. \\ | 3842 7. $oldused \leftarrow c.used$ \\ |
3818 5. $oldused \leftarrow c.used$ \\ | 3843 8. $c.used \leftarrow digs$ \\ |
3819 6. $c.used \leftarrow digs$ \\ | 3844 9. for $ix$ from $0$ to $pa$ do \\ |
3820 7. If $digs > 1$ then do \\ | 3845 \hspace{3mm}9.1 $c_{ix} \leftarrow W_{ix}$ \\ |
3821 \hspace{3mm}7.1. for $ix$ from $1$ to $digs - 1$ do \\ | 3846 10. for $ix$ from $pa + 1$ to $oldused - 1$ do \\ |
3822 \hspace{6mm}7.1.1 $\hat W_{ix} \leftarrow \hat W_{ix} + \lfloor \hat W_{ix-1} / \beta \rfloor$ \\ | 3847 \hspace{3mm}10.1 $c_{ix} \leftarrow 0$ \\ |
3823 \hspace{6mm}7.1.2 $c_{ix - 1} \leftarrow \hat W_{ix - 1} \mbox{ (mod }\beta\mbox{)}$ \\ | |
3824 8. else do \\ | |
3825 \hspace{3mm}8.1 $ix \leftarrow 0$ \\ | |
3826 9. $c_{ix} \leftarrow \hat W_{ix} \mbox{ (mod }\beta\mbox{)}$ \\ | |
3827 \\ | 3848 \\ |
3828 Zero excess digits. \\ | 3849 11. Clamp $c$. \\ |
3829 10. If $digs < oldused$ then do \\ | 3850 12. Return MP\_OKAY. \\ |
3830 \hspace{3mm}10.1 for $n$ from $digs$ to $oldused - 1$ do \\ | |
3831 \hspace{6mm}10.1.1 $c_n \leftarrow 0$ \\ | |
3832 11. Clamp excessive digits of $c$. (\textit{mp\_clamp}) \\ | |
3833 12. Return(\textit{MP\_OKAY}). \\ | |
3834 \hline | 3851 \hline |
3835 \end{tabular} | 3852 \end{tabular} |
3836 \end{center} | 3853 \end{center} |
3837 \end{small} | 3854 \end{small} |
3838 \caption{Algorithm fast\_s\_mp\_mul\_digs} | 3855 \caption{Algorithm fast\_s\_mp\_mul\_digs} |
3839 \label{fig:COMBAMULT} | 3856 \label{fig:COMBAMULT} |
3840 \end{figure} | 3857 \end{figure} |
3841 | 3858 |
3842 \textbf{Algorithm fast\_s\_mp\_mul\_digs.} | 3859 \textbf{Algorithm fast\_s\_mp\_mul\_digs.} |
3843 This algorithm performs the unsigned multiplication of $a$ and $b$ using the Comba method limited to $digs$ digits of precision. The algorithm | 3860 This algorithm performs the unsigned multiplication of $a$ and $b$ using the Comba method limited to $digs$ digits of precision. |
3844 essentially peforms the same calculation as algorithm s\_mp\_mul\_digs, just much faster. | 3861 |
3845 | 3862 The outer loop of this algorithm is more complicated than that of the baseline multiplier. This is because on the inside of the |
3846 The array $\hat W$ is meant to be on the stack when the algorithm is used. The size of the array does not change which is ideal. Note also that | 3863 loop we want to produce one column per pass. This allows the accumulator $\_ \hat W$ to be placed in CPU registers and |
3847 unlike algorithm s\_mp\_mul\_digs no temporary mp\_int is required since the result is calculated directly in $\hat W$. | 3864 reduce the memory bandwidth to two \textbf{mp\_digit} reads per iteration. |
3848 | 3865 |
3849 The $O(n^2)$ loop on step four is where the Comba method's advantages begin to show through in comparison to the baseline algorithm. The lack of | 3866 The $ty$ variable is set to the minimum count of $ix$ or the number of digits in $b$. That way if $a$ has more digits than |
3850 a carry variable or propagation in this loop allows the loop to be performed with only single precision multiplication and additions. Now that each | 3867 $b$ this will be limited to $b.used - 1$. The $tx$ variable is set to the to the distance past $b.used$ the variable |
3851 iteration of the inner loop can be performed independent of the others the inner loop can be performed with a high level of parallelism. | 3868 $ix$ is. This is used for the immediately subsequent statement where we find $iy$. |
3869 | |
3870 The variable $iy$ is the minimum digits we can read from either $a$ or $b$ before running out. Computing one column at a time | |
3871 means we have to scan one integer upwards and the other downwards. $a$ starts at $tx$ and $b$ starts at $ty$. In each | |
3872 pass we are producing the $ix$'th output column and we note that $tx + ty = ix$. As we move $tx$ upwards we have to | |
3873 move $ty$ downards so the equality remains valid. The $iy$ variable is the number of iterations until | |
3874 $tx \ge a.used$ or $ty < 0$ occurs. | |
3875 | |
3876 After every inner pass we store the lower half of the accumulator into $W_{ix}$ and then propagate the carry of the accumulator | |
3877 into the next round by dividing $\_ \hat W$ by $\beta$. | |
3852 | 3878 |
3853 To measure the benefits of the Comba method over the baseline method consider the number of operations that are required. If the | 3879 To measure the benefits of the Comba method over the baseline method consider the number of operations that are required. If the |
3854 cost in terms of time of a multiply and addition is $p$ and the cost of a carry propagation is $q$ then a baseline multiplication would require | 3880 cost in terms of time of a multiply and addition is $p$ and the cost of a carry propagation is $q$ then a baseline multiplication would require |
3855 $O \left ((p + q)n^2 \right )$ time to multiply two $n$-digit numbers. The Comba method requires only $O(pn^2 + qn)$ time, however in practice, | 3881 $O \left ((p + q)n^2 \right )$ time to multiply two $n$-digit numbers. The Comba method requires only $O(pn^2 + qn)$ time, however in practice, |
3856 the speed increase is actually much more. With $O(n)$ space the algorithm can be reduced to $O(pn + qn)$ time by implementing the $n$ multiply | 3882 the speed increase is actually much more. With $O(n)$ space the algorithm can be reduced to $O(pn + qn)$ time by implementing the $n$ multiply |
3875 028 * required for fast Barrett reduction). | 3901 028 * required for fast Barrett reduction). |
3876 029 * | 3902 029 * |
3877 030 * Based on Algorithm 14.12 on pp.595 of HAC. | 3903 030 * Based on Algorithm 14.12 on pp.595 of HAC. |
3878 031 * | 3904 031 * |
3879 032 */ | 3905 032 */ |
3880 033 int | 3906 033 int fast_s_mp_mul_digs (mp_int * a, mp_int * b, mp_int * c, int digs) |
3881 034 fast_s_mp_mul_digs (mp_int * a, mp_int * b, mp_int * c, int digs) | 3907 034 \{ |
3882 035 \{ | 3908 035 int olduse, res, pa, ix, iz; |
3883 036 int olduse, res, pa, ix, iz; | 3909 036 mp_digit W[MP_WARRAY]; |
3884 037 mp_digit W[MP_WARRAY]; | 3910 037 register mp_word _W; |
3885 038 register mp_word _W; | 3911 038 |
3886 039 | 3912 039 /* grow the destination as required */ |
3887 040 /* grow the destination as required */ | 3913 040 if (c->alloc < digs) \{ |
3888 041 if (c->alloc < digs) \{ | 3914 041 if ((res = mp_grow (c, digs)) != MP_OKAY) \{ |
3889 042 if ((res = mp_grow (c, digs)) != MP_OKAY) \{ | 3915 042 return res; |
3890 043 return res; | 3916 043 \} |
3891 044 \} | 3917 044 \} |
3892 045 \} | 3918 045 |
3893 046 | 3919 046 /* number of output digits to produce */ |
3894 047 /* number of output digits to produce */ | 3920 047 pa = MIN(digs, a->used + b->used); |
3895 048 pa = MIN(digs, a->used + b->used); | 3921 048 |
3896 049 | 3922 049 /* clear the carry */ |
3897 050 /* clear the carry */ | 3923 050 _W = 0; |
3898 051 _W = 0; | 3924 051 for (ix = 0; ix < pa; ix++) \{ |
3899 052 for (ix = 0; ix <= pa; ix++) \{ | 3925 052 int tx, ty; |
3900 053 int tx, ty; | 3926 053 int iy; |
3901 054 int iy; | 3927 054 mp_digit *tmpx, *tmpy; |
3902 055 mp_digit *tmpx, *tmpy; | 3928 055 |
3903 056 | 3929 056 /* get offsets into the two bignums */ |
3904 057 /* get offsets into the two bignums */ | 3930 057 ty = MIN(b->used-1, ix); |
3905 058 ty = MIN(b->used-1, ix); | 3931 058 tx = ix - ty; |
3906 059 tx = ix - ty; | 3932 059 |
3907 060 | 3933 060 /* setup temp aliases */ |
3908 061 /* setup temp aliases */ | 3934 061 tmpx = a->dp + tx; |
3909 062 tmpx = a->dp + tx; | 3935 062 tmpy = b->dp + ty; |
3910 063 tmpy = b->dp + ty; | 3936 063 |
3911 064 | 3937 064 /* this is the number of times the loop will iterrate, essentially |
3912 065 /* this is the number of times the loop will iterrate, essentially its | 3938 065 while (tx++ < a->used && ty-- >= 0) \{ ... \} |
3913 | 3939 066 */ |
3914 066 while (tx++ < a->used && ty-- >= 0) \{ ... \} | 3940 067 iy = MIN(a->used-tx, ty+1); |
3915 067 */ | 3941 068 |
3916 068 iy = MIN(a->used-tx, ty+1); | 3942 069 /* execute loop */ |
3917 069 | 3943 070 for (iz = 0; iz < iy; ++iz) \{ |
3918 070 /* execute loop */ | 3944 071 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--); |
3919 071 for (iz = 0; iz < iy; ++iz) \{ | 3945 072 \} |
3920 072 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--); | 3946 073 |
3921 073 \} | 3947 074 /* store term */ |
3922 074 | 3948 075 W[ix] = ((mp_digit)_W) & MP_MASK; |
3923 075 /* store term */ | 3949 076 |
3924 076 W[ix] = ((mp_digit)_W) & MP_MASK; | 3950 077 /* make next carry */ |
3925 077 | 3951 078 _W = _W >> ((mp_word)DIGIT_BIT); |
3926 078 /* make next carry */ | 3952 079 \} |
3927 079 _W = _W >> ((mp_word)DIGIT_BIT); | 3953 080 |
3928 080 \} | 3954 081 /* store final carry */ |
3929 081 | 3955 082 W[ix] = (mp_digit)(_W & MP_MASK); |
3930 082 /* setup dest */ | 3956 083 |
3931 083 olduse = c->used; | 3957 084 /* setup dest */ |
3932 084 c->used = digs; | 3958 085 olduse = c->used; |
3933 085 | 3959 086 c->used = pa; |
3934 086 \{ | 3960 087 |
3935 087 register mp_digit *tmpc; | 3961 088 \{ |
3936 088 tmpc = c->dp; | 3962 089 register mp_digit *tmpc; |
3937 089 for (ix = 0; ix < digs; ix++) \{ | 3963 090 tmpc = c->dp; |
3938 090 /* now extract the previous digit [below the carry] */ | 3964 091 for (ix = 0; ix < pa+1; ix++) \{ |
3939 091 *tmpc++ = W[ix]; | 3965 092 /* now extract the previous digit [below the carry] */ |
3940 092 \} | 3966 093 *tmpc++ = W[ix]; |
3941 093 | 3967 094 \} |
3942 094 /* clear unused digits [that existed in the old copy of c] */ | 3968 095 |
3943 095 for (; ix < olduse; ix++) \{ | 3969 096 /* clear unused digits [that existed in the old copy of c] */ |
3944 096 *tmpc++ = 0; | 3970 097 for (; ix < olduse; ix++) \{ |
3945 097 \} | 3971 098 *tmpc++ = 0; |
3946 098 \} | 3972 099 \} |
3947 099 mp_clamp (c); | 3973 100 \} |
3948 100 return MP_OKAY; | 3974 101 mp_clamp (c); |
3949 101 \} | 3975 102 return MP_OKAY; |
3950 102 #endif | 3976 103 \} |
3977 104 #endif | |
3951 \end{alltt} | 3978 \end{alltt} |
3952 \end{small} | 3979 \end{small} |
3953 | 3980 |
3954 The memset on line @47,memset@ clears the initial $\hat W$ array to zero in a single step. Like the slower baseline multiplication | 3981 As per the pseudo--code we first calculate $pa$ (line 47) as the number of digits to output. Next we begin the outer loop |
3955 implementation a series of aliases (\textit{lines 62, 63 and 76}) are used to simplify the inner $O(n^2)$ loop. | 3982 to produce the individual columns of the product. We use the two aliases $tmpx$ and $tmpy$ (lines 61, 62) to point |
3956 In this case a new alias $\_\hat W$ has been added which refers to the double precision columns offset by $ix$ in each pass. | 3983 inside the two multiplicands quickly. |
3957 | 3984 |
3958 The inner loop on lines 89, 79 and 80 is where the algorithm will spend the majority of the time, which is why it has been | 3985 The inner loop (lines 70 to 72) of this implementation is where the tradeoff come into play. Originally this comba |
3959 stripped to the bones of any extra baggage\footnote{Hence the pointer aliases.}. On x86 processors the multiplication and additions amount to at the | 3986 implementation was ``row--major'' which means it adds to each of the columns in each pass. After the outer loop it would then fix |
3960 very least five instructions (\textit{two loads, two additions, one multiply}) while on the ARMv4 processors they amount to only three | 3987 the carries. This was very fast except it had an annoying drawback. You had to read a mp\_word and two mp\_digits and write |
3961 (\textit{one load, one store, one multiply-add}). For both of the x86 and ARMv4 processors the GCC compiler performs a good job at unrolling the loop | 3988 one mp\_word per iteration. On processors such as the Athlon XP and P4 this did not matter much since the cache bandwidth |
3962 and scheduling the instructions so there are very few dependency stalls. | 3989 is very high and it can keep the ALU fed with data. It did, however, matter on older and embedded cpus where cache is often |
3963 | 3990 slower and also often doesn't exist. This new algorithm only performs two reads per iteration under the assumption that the |
3964 In theory the difference between the baseline and comba algorithms is a mere $O(qn)$ time difference. However, in the $O(n^2)$ nested loop of the | 3991 compiler has aliased $\_ \hat W$ to a CPU register. |
3965 baseline method there are dependency stalls as the algorithm must wait for the multiplier to finish before propagating the carry to the next | 3992 |
3966 digit. As a result fewer of the often multiple execution units\footnote{The AMD Athlon has three execution units and the Intel P4 has four.} can | 3993 After the inner loop we store the current accumulator in $W$ and shift $\_ \hat W$ (lines 75, 78) to forward it as |
3967 be simultaneously used. | 3994 a carry for the next pass. After the outer loop we use the final carry (line 82) as the last digit of the product. |
3968 | 3995 |
3969 \subsection{Polynomial Basis Multiplication} | 3996 \subsection{Polynomial Basis Multiplication} |
3970 To break the $O(n^2)$ barrier in multiplication requires a completely different look at integer multiplication. In the following algorithms | 3997 To break the $O(n^2)$ barrier in multiplication requires a completely different look at integer multiplication. In the following algorithms |
3971 the use of polynomial basis representation for two integers $a$ and $b$ as $f(x) = \sum_{i=0}^{n} a_i x^i$ and | 3998 the use of polynomial basis representation for two integers $a$ and $b$ as $f(x) = \sum_{i=0}^{n} a_i x^i$ and |
3972 $g(x) = \sum_{i=0}^{n} b_i x^i$ respectively, is required. In this system both $f(x)$ and $g(x)$ have $n + 1$ terms and are of the $n$'th degree. | 3999 $g(x) = \sum_{i=0}^{n} b_i x^i$ respectively, is required. In this system both $f(x)$ and $g(x)$ have $n + 1$ terms and are of the $n$'th degree. |
4439 \vspace{-3mm} | 4466 \vspace{-3mm} |
4440 \begin{alltt} | 4467 \begin{alltt} |
4441 016 | 4468 016 |
4442 017 /* multiplication using the Toom-Cook 3-way algorithm | 4469 017 /* multiplication using the Toom-Cook 3-way algorithm |
4443 018 * | 4470 018 * |
4444 019 * Much more complicated than Karatsuba but has a lower asymptotic running t | 4471 019 * Much more complicated than Karatsuba but has a lower |
4445 ime of | 4472 020 * asymptotic running time of O(N**1.464). This algorithm is |
4446 020 * O(N**1.464). This algorithm is only particularly useful on VERY large | 4473 021 * only particularly useful on VERY large inputs |
4447 021 * inputs (we're talking 1000s of digits here...). | 4474 022 * (we're talking 1000s of digits here...). |
4448 022 */ | 4475 023 */ |
4449 023 int mp_toom_mul(mp_int *a, mp_int *b, mp_int *c) | 4476 024 int mp_toom_mul(mp_int *a, mp_int *b, mp_int *c) |
4450 024 \{ | 4477 025 \{ |
4451 025 mp_int w0, w1, w2, w3, w4, tmp1, tmp2, a0, a1, a2, b0, b1, b2; | 4478 026 mp_int w0, w1, w2, w3, w4, tmp1, tmp2, a0, a1, a2, b0, b1, b2; |
4452 026 int res, B; | 4479 027 int res, B; |
4453 027 | 4480 028 |
4454 028 /* init temps */ | 4481 029 /* init temps */ |
4455 029 if ((res = mp_init_multi(&w0, &w1, &w2, &w3, &w4, | 4482 030 if ((res = mp_init_multi(&w0, &w1, &w2, &w3, &w4, |
4456 030 &a0, &a1, &a2, &b0, &b1, | 4483 031 &a0, &a1, &a2, &b0, &b1, |
4457 031 &b2, &tmp1, &tmp2, NULL)) != MP_OKAY) \{ | 4484 032 &b2, &tmp1, &tmp2, NULL)) != MP_OKAY) \{ |
4458 032 return res; | 4485 033 return res; |
4459 033 \} | 4486 034 \} |
4460 034 | 4487 035 |
4461 035 /* B */ | 4488 036 /* B */ |
4462 036 B = MIN(a->used, b->used) / 3; | 4489 037 B = MIN(a->used, b->used) / 3; |
4463 037 | 4490 038 |
4464 038 /* a = a2 * B**2 + a1 * B + a0 */ | 4491 039 /* a = a2 * B**2 + a1 * B + a0 */ |
4465 039 if ((res = mp_mod_2d(a, DIGIT_BIT * B, &a0)) != MP_OKAY) \{ | 4492 040 if ((res = mp_mod_2d(a, DIGIT_BIT * B, &a0)) != MP_OKAY) \{ |
4466 040 goto ERR; | 4493 041 goto ERR; |
4467 041 \} | 4494 042 \} |
4468 042 | 4495 043 |
4469 043 if ((res = mp_copy(a, &a1)) != MP_OKAY) \{ | 4496 044 if ((res = mp_copy(a, &a1)) != MP_OKAY) \{ |
4470 044 goto ERR; | 4497 045 goto ERR; |
4471 045 \} | 4498 046 \} |
4472 046 mp_rshd(&a1, B); | 4499 047 mp_rshd(&a1, B); |
4473 047 mp_mod_2d(&a1, DIGIT_BIT * B, &a1); | 4500 048 mp_mod_2d(&a1, DIGIT_BIT * B, &a1); |
4474 048 | 4501 049 |
4475 049 if ((res = mp_copy(a, &a2)) != MP_OKAY) \{ | 4502 050 if ((res = mp_copy(a, &a2)) != MP_OKAY) \{ |
4476 050 goto ERR; | 4503 051 goto ERR; |
4477 051 \} | 4504 052 \} |
4478 052 mp_rshd(&a2, B*2); | 4505 053 mp_rshd(&a2, B*2); |
4479 053 | 4506 054 |
4480 054 /* b = b2 * B**2 + b1 * B + b0 */ | 4507 055 /* b = b2 * B**2 + b1 * B + b0 */ |
4481 055 if ((res = mp_mod_2d(b, DIGIT_BIT * B, &b0)) != MP_OKAY) \{ | 4508 056 if ((res = mp_mod_2d(b, DIGIT_BIT * B, &b0)) != MP_OKAY) \{ |
4482 056 goto ERR; | 4509 057 goto ERR; |
4483 057 \} | 4510 058 \} |
4484 058 | 4511 059 |
4485 059 if ((res = mp_copy(b, &b1)) != MP_OKAY) \{ | 4512 060 if ((res = mp_copy(b, &b1)) != MP_OKAY) \{ |
4486 060 goto ERR; | 4513 061 goto ERR; |
4487 061 \} | 4514 062 \} |
4488 062 mp_rshd(&b1, B); | 4515 063 mp_rshd(&b1, B); |
4489 063 mp_mod_2d(&b1, DIGIT_BIT * B, &b1); | 4516 064 mp_mod_2d(&b1, DIGIT_BIT * B, &b1); |
4490 064 | 4517 065 |
4491 065 if ((res = mp_copy(b, &b2)) != MP_OKAY) \{ | 4518 066 if ((res = mp_copy(b, &b2)) != MP_OKAY) \{ |
4492 066 goto ERR; | 4519 067 goto ERR; |
4493 067 \} | 4520 068 \} |
4494 068 mp_rshd(&b2, B*2); | 4521 069 mp_rshd(&b2, B*2); |
4495 069 | 4522 070 |
4496 070 /* w0 = a0*b0 */ | 4523 071 /* w0 = a0*b0 */ |
4497 071 if ((res = mp_mul(&a0, &b0, &w0)) != MP_OKAY) \{ | 4524 072 if ((res = mp_mul(&a0, &b0, &w0)) != MP_OKAY) \{ |
4498 072 goto ERR; | 4525 073 goto ERR; |
4499 073 \} | 4526 074 \} |
4500 074 | 4527 075 |
4501 075 /* w4 = a2 * b2 */ | 4528 076 /* w4 = a2 * b2 */ |
4502 076 if ((res = mp_mul(&a2, &b2, &w4)) != MP_OKAY) \{ | 4529 077 if ((res = mp_mul(&a2, &b2, &w4)) != MP_OKAY) \{ |
4503 077 goto ERR; | 4530 078 goto ERR; |
4504 078 \} | 4531 079 \} |
4505 079 | 4532 080 |
4506 080 /* w1 = (a2 + 2(a1 + 2a0))(b2 + 2(b1 + 2b0)) */ | 4533 081 /* w1 = (a2 + 2(a1 + 2a0))(b2 + 2(b1 + 2b0)) */ |
4507 081 if ((res = mp_mul_2(&a0, &tmp1)) != MP_OKAY) \{ | 4534 082 if ((res = mp_mul_2(&a0, &tmp1)) != MP_OKAY) \{ |
4508 082 goto ERR; | 4535 083 goto ERR; |
4509 083 \} | 4536 084 \} |
4510 084 if ((res = mp_add(&tmp1, &a1, &tmp1)) != MP_OKAY) \{ | 4537 085 if ((res = mp_add(&tmp1, &a1, &tmp1)) != MP_OKAY) \{ |
4511 085 goto ERR; | 4538 086 goto ERR; |
4512 086 \} | 4539 087 \} |
4513 087 if ((res = mp_mul_2(&tmp1, &tmp1)) != MP_OKAY) \{ | 4540 088 if ((res = mp_mul_2(&tmp1, &tmp1)) != MP_OKAY) \{ |
4514 088 goto ERR; | 4541 089 goto ERR; |
4515 089 \} | 4542 090 \} |
4516 090 if ((res = mp_add(&tmp1, &a2, &tmp1)) != MP_OKAY) \{ | 4543 091 if ((res = mp_add(&tmp1, &a2, &tmp1)) != MP_OKAY) \{ |
4517 091 goto ERR; | 4544 092 goto ERR; |
4518 092 \} | 4545 093 \} |
4519 093 | 4546 094 |
4520 094 if ((res = mp_mul_2(&b0, &tmp2)) != MP_OKAY) \{ | 4547 095 if ((res = mp_mul_2(&b0, &tmp2)) != MP_OKAY) \{ |
4521 095 goto ERR; | 4548 096 goto ERR; |
4522 096 \} | 4549 097 \} |
4523 097 if ((res = mp_add(&tmp2, &b1, &tmp2)) != MP_OKAY) \{ | 4550 098 if ((res = mp_add(&tmp2, &b1, &tmp2)) != MP_OKAY) \{ |
4524 098 goto ERR; | 4551 099 goto ERR; |
4525 099 \} | 4552 100 \} |
4526 100 if ((res = mp_mul_2(&tmp2, &tmp2)) != MP_OKAY) \{ | 4553 101 if ((res = mp_mul_2(&tmp2, &tmp2)) != MP_OKAY) \{ |
4527 101 goto ERR; | 4554 102 goto ERR; |
4528 102 \} | 4555 103 \} |
4529 103 if ((res = mp_add(&tmp2, &b2, &tmp2)) != MP_OKAY) \{ | 4556 104 if ((res = mp_add(&tmp2, &b2, &tmp2)) != MP_OKAY) \{ |
4530 104 goto ERR; | 4557 105 goto ERR; |
4531 105 \} | 4558 106 \} |
4532 106 | 4559 107 |
4533 107 if ((res = mp_mul(&tmp1, &tmp2, &w1)) != MP_OKAY) \{ | 4560 108 if ((res = mp_mul(&tmp1, &tmp2, &w1)) != MP_OKAY) \{ |
4534 108 goto ERR; | 4561 109 goto ERR; |
4535 109 \} | 4562 110 \} |
4536 110 | 4563 111 |
4537 111 /* w3 = (a0 + 2(a1 + 2a2))(b0 + 2(b1 + 2b2)) */ | 4564 112 /* w3 = (a0 + 2(a1 + 2a2))(b0 + 2(b1 + 2b2)) */ |
4538 112 if ((res = mp_mul_2(&a2, &tmp1)) != MP_OKAY) \{ | 4565 113 if ((res = mp_mul_2(&a2, &tmp1)) != MP_OKAY) \{ |
4539 113 goto ERR; | 4566 114 goto ERR; |
4540 114 \} | 4567 115 \} |
4541 115 if ((res = mp_add(&tmp1, &a1, &tmp1)) != MP_OKAY) \{ | 4568 116 if ((res = mp_add(&tmp1, &a1, &tmp1)) != MP_OKAY) \{ |
4542 116 goto ERR; | 4569 117 goto ERR; |
4543 117 \} | 4570 118 \} |
4544 118 if ((res = mp_mul_2(&tmp1, &tmp1)) != MP_OKAY) \{ | 4571 119 if ((res = mp_mul_2(&tmp1, &tmp1)) != MP_OKAY) \{ |
4545 119 goto ERR; | 4572 120 goto ERR; |
4546 120 \} | 4573 121 \} |
4547 121 if ((res = mp_add(&tmp1, &a0, &tmp1)) != MP_OKAY) \{ | 4574 122 if ((res = mp_add(&tmp1, &a0, &tmp1)) != MP_OKAY) \{ |
4548 122 goto ERR; | 4575 123 goto ERR; |
4549 123 \} | 4576 124 \} |
4550 124 | 4577 125 |
4551 125 if ((res = mp_mul_2(&b2, &tmp2)) != MP_OKAY) \{ | 4578 126 if ((res = mp_mul_2(&b2, &tmp2)) != MP_OKAY) \{ |
4552 126 goto ERR; | 4579 127 goto ERR; |
4553 127 \} | 4580 128 \} |
4554 128 if ((res = mp_add(&tmp2, &b1, &tmp2)) != MP_OKAY) \{ | 4581 129 if ((res = mp_add(&tmp2, &b1, &tmp2)) != MP_OKAY) \{ |
4555 129 goto ERR; | 4582 130 goto ERR; |
4556 130 \} | 4583 131 \} |
4557 131 if ((res = mp_mul_2(&tmp2, &tmp2)) != MP_OKAY) \{ | 4584 132 if ((res = mp_mul_2(&tmp2, &tmp2)) != MP_OKAY) \{ |
4558 132 goto ERR; | 4585 133 goto ERR; |
4559 133 \} | 4586 134 \} |
4560 134 if ((res = mp_add(&tmp2, &b0, &tmp2)) != MP_OKAY) \{ | 4587 135 if ((res = mp_add(&tmp2, &b0, &tmp2)) != MP_OKAY) \{ |
4561 135 goto ERR; | 4588 136 goto ERR; |
4562 136 \} | 4589 137 \} |
4563 137 | 4590 138 |
4564 138 if ((res = mp_mul(&tmp1, &tmp2, &w3)) != MP_OKAY) \{ | 4591 139 if ((res = mp_mul(&tmp1, &tmp2, &w3)) != MP_OKAY) \{ |
4565 139 goto ERR; | 4592 140 goto ERR; |
4566 140 \} | 4593 141 \} |
4567 141 | 4594 142 |
4568 142 | 4595 143 |
4569 143 /* w2 = (a2 + a1 + a0)(b2 + b1 + b0) */ | 4596 144 /* w2 = (a2 + a1 + a0)(b2 + b1 + b0) */ |
4570 144 if ((res = mp_add(&a2, &a1, &tmp1)) != MP_OKAY) \{ | 4597 145 if ((res = mp_add(&a2, &a1, &tmp1)) != MP_OKAY) \{ |
4571 145 goto ERR; | 4598 146 goto ERR; |
4572 146 \} | 4599 147 \} |
4573 147 if ((res = mp_add(&tmp1, &a0, &tmp1)) != MP_OKAY) \{ | 4600 148 if ((res = mp_add(&tmp1, &a0, &tmp1)) != MP_OKAY) \{ |
4574 148 goto ERR; | 4601 149 goto ERR; |
4575 149 \} | 4602 150 \} |
4576 150 if ((res = mp_add(&b2, &b1, &tmp2)) != MP_OKAY) \{ | 4603 151 if ((res = mp_add(&b2, &b1, &tmp2)) != MP_OKAY) \{ |
4577 151 goto ERR; | 4604 152 goto ERR; |
4578 152 \} | 4605 153 \} |
4579 153 if ((res = mp_add(&tmp2, &b0, &tmp2)) != MP_OKAY) \{ | 4606 154 if ((res = mp_add(&tmp2, &b0, &tmp2)) != MP_OKAY) \{ |
4580 154 goto ERR; | 4607 155 goto ERR; |
4581 155 \} | 4608 156 \} |
4582 156 if ((res = mp_mul(&tmp1, &tmp2, &w2)) != MP_OKAY) \{ | 4609 157 if ((res = mp_mul(&tmp1, &tmp2, &w2)) != MP_OKAY) \{ |
4583 157 goto ERR; | 4610 158 goto ERR; |
4584 158 \} | 4611 159 \} |
4585 159 | 4612 160 |
4586 160 /* now solve the matrix | 4613 161 /* now solve the matrix |
4587 161 | 4614 162 |
4588 162 0 0 0 0 1 | 4615 163 0 0 0 0 1 |
4589 163 1 2 4 8 16 | 4616 164 1 2 4 8 16 |
4590 164 1 1 1 1 1 | 4617 165 1 1 1 1 1 |
4591 165 16 8 4 2 1 | 4618 166 16 8 4 2 1 |
4592 166 1 0 0 0 0 | 4619 167 1 0 0 0 0 |
4593 167 | 4620 168 |
4594 168 using 12 subtractions, 4 shifts, | 4621 169 using 12 subtractions, 4 shifts, |
4595 169 2 small divisions and 1 small multiplication | 4622 170 2 small divisions and 1 small multiplication |
4596 170 */ | 4623 171 */ |
4597 171 | 4624 172 |
4598 172 /* r1 - r4 */ | 4625 173 /* r1 - r4 */ |
4599 173 if ((res = mp_sub(&w1, &w4, &w1)) != MP_OKAY) \{ | 4626 174 if ((res = mp_sub(&w1, &w4, &w1)) != MP_OKAY) \{ |
4600 174 goto ERR; | 4627 175 goto ERR; |
4601 175 \} | 4628 176 \} |
4602 176 /* r3 - r0 */ | 4629 177 /* r3 - r0 */ |
4603 177 if ((res = mp_sub(&w3, &w0, &w3)) != MP_OKAY) \{ | 4630 178 if ((res = mp_sub(&w3, &w0, &w3)) != MP_OKAY) \{ |
4604 178 goto ERR; | 4631 179 goto ERR; |
4605 179 \} | 4632 180 \} |
4606 180 /* r1/2 */ | 4633 181 /* r1/2 */ |
4607 181 if ((res = mp_div_2(&w1, &w1)) != MP_OKAY) \{ | 4634 182 if ((res = mp_div_2(&w1, &w1)) != MP_OKAY) \{ |
4608 182 goto ERR; | 4635 183 goto ERR; |
4609 183 \} | 4636 184 \} |
4610 184 /* r3/2 */ | 4637 185 /* r3/2 */ |
4611 185 if ((res = mp_div_2(&w3, &w3)) != MP_OKAY) \{ | 4638 186 if ((res = mp_div_2(&w3, &w3)) != MP_OKAY) \{ |
4612 186 goto ERR; | 4639 187 goto ERR; |
4613 187 \} | 4640 188 \} |
4614 188 /* r2 - r0 - r4 */ | 4641 189 /* r2 - r0 - r4 */ |
4615 189 if ((res = mp_sub(&w2, &w0, &w2)) != MP_OKAY) \{ | 4642 190 if ((res = mp_sub(&w2, &w0, &w2)) != MP_OKAY) \{ |
4616 190 goto ERR; | 4643 191 goto ERR; |
4617 191 \} | 4644 192 \} |
4618 192 if ((res = mp_sub(&w2, &w4, &w2)) != MP_OKAY) \{ | 4645 193 if ((res = mp_sub(&w2, &w4, &w2)) != MP_OKAY) \{ |
4619 193 goto ERR; | 4646 194 goto ERR; |
4620 194 \} | 4647 195 \} |
4621 195 /* r1 - r2 */ | 4648 196 /* r1 - r2 */ |
4622 196 if ((res = mp_sub(&w1, &w2, &w1)) != MP_OKAY) \{ | 4649 197 if ((res = mp_sub(&w1, &w2, &w1)) != MP_OKAY) \{ |
4623 197 goto ERR; | 4650 198 goto ERR; |
4624 198 \} | 4651 199 \} |
4625 199 /* r3 - r2 */ | 4652 200 /* r3 - r2 */ |
4626 200 if ((res = mp_sub(&w3, &w2, &w3)) != MP_OKAY) \{ | 4653 201 if ((res = mp_sub(&w3, &w2, &w3)) != MP_OKAY) \{ |
4627 201 goto ERR; | 4654 202 goto ERR; |
4628 202 \} | 4655 203 \} |
4629 203 /* r1 - 8r0 */ | 4656 204 /* r1 - 8r0 */ |
4630 204 if ((res = mp_mul_2d(&w0, 3, &tmp1)) != MP_OKAY) \{ | 4657 205 if ((res = mp_mul_2d(&w0, 3, &tmp1)) != MP_OKAY) \{ |
4631 205 goto ERR; | 4658 206 goto ERR; |
4632 206 \} | 4659 207 \} |
4633 207 if ((res = mp_sub(&w1, &tmp1, &w1)) != MP_OKAY) \{ | 4660 208 if ((res = mp_sub(&w1, &tmp1, &w1)) != MP_OKAY) \{ |
4634 208 goto ERR; | 4661 209 goto ERR; |
4635 209 \} | 4662 210 \} |
4636 210 /* r3 - 8r4 */ | 4663 211 /* r3 - 8r4 */ |
4637 211 if ((res = mp_mul_2d(&w4, 3, &tmp1)) != MP_OKAY) \{ | 4664 212 if ((res = mp_mul_2d(&w4, 3, &tmp1)) != MP_OKAY) \{ |
4638 212 goto ERR; | 4665 213 goto ERR; |
4639 213 \} | 4666 214 \} |
4640 214 if ((res = mp_sub(&w3, &tmp1, &w3)) != MP_OKAY) \{ | 4667 215 if ((res = mp_sub(&w3, &tmp1, &w3)) != MP_OKAY) \{ |
4641 215 goto ERR; | 4668 216 goto ERR; |
4642 216 \} | 4669 217 \} |
4643 217 /* 3r2 - r1 - r3 */ | 4670 218 /* 3r2 - r1 - r3 */ |
4644 218 if ((res = mp_mul_d(&w2, 3, &w2)) != MP_OKAY) \{ | 4671 219 if ((res = mp_mul_d(&w2, 3, &w2)) != MP_OKAY) \{ |
4645 219 goto ERR; | 4672 220 goto ERR; |
4646 220 \} | 4673 221 \} |
4647 221 if ((res = mp_sub(&w2, &w1, &w2)) != MP_OKAY) \{ | 4674 222 if ((res = mp_sub(&w2, &w1, &w2)) != MP_OKAY) \{ |
4648 222 goto ERR; | 4675 223 goto ERR; |
4649 223 \} | 4676 224 \} |
4650 224 if ((res = mp_sub(&w2, &w3, &w2)) != MP_OKAY) \{ | 4677 225 if ((res = mp_sub(&w2, &w3, &w2)) != MP_OKAY) \{ |
4651 225 goto ERR; | 4678 226 goto ERR; |
4652 226 \} | 4679 227 \} |
4653 227 /* r1 - r2 */ | 4680 228 /* r1 - r2 */ |
4654 228 if ((res = mp_sub(&w1, &w2, &w1)) != MP_OKAY) \{ | 4681 229 if ((res = mp_sub(&w1, &w2, &w1)) != MP_OKAY) \{ |
4655 229 goto ERR; | 4682 230 goto ERR; |
4656 230 \} | 4683 231 \} |
4657 231 /* r3 - r2 */ | 4684 232 /* r3 - r2 */ |
4658 232 if ((res = mp_sub(&w3, &w2, &w3)) != MP_OKAY) \{ | 4685 233 if ((res = mp_sub(&w3, &w2, &w3)) != MP_OKAY) \{ |
4659 233 goto ERR; | 4686 234 goto ERR; |
4660 234 \} | 4687 235 \} |
4661 235 /* r1/3 */ | 4688 236 /* r1/3 */ |
4662 236 if ((res = mp_div_3(&w1, &w1, NULL)) != MP_OKAY) \{ | 4689 237 if ((res = mp_div_3(&w1, &w1, NULL)) != MP_OKAY) \{ |
4663 237 goto ERR; | 4690 238 goto ERR; |
4664 238 \} | 4691 239 \} |
4665 239 /* r3/3 */ | 4692 240 /* r3/3 */ |
4666 240 if ((res = mp_div_3(&w3, &w3, NULL)) != MP_OKAY) \{ | 4693 241 if ((res = mp_div_3(&w3, &w3, NULL)) != MP_OKAY) \{ |
4667 241 goto ERR; | 4694 242 goto ERR; |
4668 242 \} | 4695 243 \} |
4669 243 | 4696 244 |
4670 244 /* at this point shift W[n] by B*n */ | 4697 245 /* at this point shift W[n] by B*n */ |
4671 245 if ((res = mp_lshd(&w1, 1*B)) != MP_OKAY) \{ | 4698 246 if ((res = mp_lshd(&w1, 1*B)) != MP_OKAY) \{ |
4672 246 goto ERR; | 4699 247 goto ERR; |
4673 247 \} | 4700 248 \} |
4674 248 if ((res = mp_lshd(&w2, 2*B)) != MP_OKAY) \{ | 4701 249 if ((res = mp_lshd(&w2, 2*B)) != MP_OKAY) \{ |
4675 249 goto ERR; | 4702 250 goto ERR; |
4676 250 \} | 4703 251 \} |
4677 251 if ((res = mp_lshd(&w3, 3*B)) != MP_OKAY) \{ | 4704 252 if ((res = mp_lshd(&w3, 3*B)) != MP_OKAY) \{ |
4678 252 goto ERR; | 4705 253 goto ERR; |
4679 253 \} | 4706 254 \} |
4680 254 if ((res = mp_lshd(&w4, 4*B)) != MP_OKAY) \{ | 4707 255 if ((res = mp_lshd(&w4, 4*B)) != MP_OKAY) \{ |
4681 255 goto ERR; | 4708 256 goto ERR; |
4682 256 \} | 4709 257 \} |
4683 257 | 4710 258 |
4684 258 if ((res = mp_add(&w0, &w1, c)) != MP_OKAY) \{ | 4711 259 if ((res = mp_add(&w0, &w1, c)) != MP_OKAY) \{ |
4685 259 goto ERR; | 4712 260 goto ERR; |
4686 260 \} | 4713 261 \} |
4687 261 if ((res = mp_add(&w2, &w3, &tmp1)) != MP_OKAY) \{ | 4714 262 if ((res = mp_add(&w2, &w3, &tmp1)) != MP_OKAY) \{ |
4688 262 goto ERR; | 4715 263 goto ERR; |
4689 263 \} | 4716 264 \} |
4690 264 if ((res = mp_add(&w4, &tmp1, &tmp1)) != MP_OKAY) \{ | 4717 265 if ((res = mp_add(&w4, &tmp1, &tmp1)) != MP_OKAY) \{ |
4691 265 goto ERR; | 4718 266 goto ERR; |
4692 266 \} | 4719 267 \} |
4693 267 if ((res = mp_add(&tmp1, c, c)) != MP_OKAY) \{ | 4720 268 if ((res = mp_add(&tmp1, c, c)) != MP_OKAY) \{ |
4694 268 goto ERR; | 4721 269 goto ERR; |
4695 269 \} | 4722 270 \} |
4696 270 | 4723 271 |
4697 271 ERR: | 4724 272 ERR: |
4698 272 mp_clear_multi(&w0, &w1, &w2, &w3, &w4, | 4725 273 mp_clear_multi(&w0, &w1, &w2, &w3, &w4, |
4699 273 &a0, &a1, &a2, &b0, &b1, | 4726 274 &a0, &a1, &a2, &b0, &b1, |
4700 274 &b2, &tmp1, &tmp2, NULL); | 4727 275 &b2, &tmp1, &tmp2, NULL); |
4701 275 return res; | 4728 276 return res; |
4702 276 \} | 4729 277 \} |
4703 277 | 4730 278 |
4704 278 #endif | 4731 279 #endif |
4705 \end{alltt} | 4732 \end{alltt} |
4706 \end{small} | 4733 \end{small} |
4707 | 4734 |
4708 -- Comments to be added during editing phase. | 4735 The first obvious thing to note is that this algorithm is complicated. The complexity is worth it if you are multiplying very |
4736 large numbers. For example, a 10,000 digit multiplication takes approximaly 99,282,205 fewer single precision multiplications with | |
4737 Toom--Cook than a Comba or baseline approach (this is a savings of more than 99$\%$). For most ``crypto'' sized numbers this | |
4738 algorithm is not practical as Karatsuba has a much lower cutoff point. | |
4739 | |
4740 First we split $a$ and $b$ into three roughly equal portions. This has been accomplished (lines 40 to 69) with | |
4741 combinations of mp\_rshd() and mp\_mod\_2d() function calls. At this point $a = a2 \cdot \beta^2 + a1 \cdot \beta + a0$ and similiarly | |
4742 for $b$. | |
4743 | |
4744 Next we compute the five points $w0, w1, w2, w3$ and $w4$. Recall that $w0$ and $w4$ can be computed directly from the portions so | |
4745 we get those out of the way first (lines 72 and 77). Next we compute $w1, w2$ and $w3$ using Horners method. | |
4746 | |
4747 After this point we solve for the actual values of $w1, w2$ and $w3$ by reducing the $5 \times 5$ system which is relatively | |
4748 straight forward. | |
4709 | 4749 |
4710 \subsection{Signed Multiplication} | 4750 \subsection{Signed Multiplication} |
4711 Now that algorithms to handle multiplications of every useful dimensions have been developed, a rather simple finishing touch is required. So far all | 4751 Now that algorithms to handle multiplications of every useful dimensions have been developed, a rather simple finishing touch is required. So far all |
4712 of the multiplication algorithms have been unsigned multiplications which leaves only a signed multiplication algorithm to be established. | 4752 of the multiplication algorithms have been unsigned multiplications which leaves only a signed multiplication algorithm to be established. |
4713 | 4753 |
4714 \newpage\begin{figure}[!here] | 4754 \begin{figure}[!here] |
4715 \begin{small} | 4755 \begin{small} |
4716 \begin{center} | 4756 \begin{center} |
4717 \begin{tabular}{l} | 4757 \begin{tabular}{l} |
4718 \hline Algorithm \textbf{mp\_mul}. \\ | 4758 \hline Algorithm \textbf{mp\_mul}. \\ |
4719 \textbf{Input}. mp\_int $a$ and mp\_int $b$ \\ | 4759 \textbf{Input}. mp\_int $a$ and mp\_int $b$ \\ |
4842 | 4882 |
4843 \subsection{The Baseline Squaring Algorithm} | 4883 \subsection{The Baseline Squaring Algorithm} |
4844 The baseline squaring algorithm is meant to be a catch-all squaring algorithm. It will handle any of the input sizes that the faster routines | 4884 The baseline squaring algorithm is meant to be a catch-all squaring algorithm. It will handle any of the input sizes that the faster routines |
4845 will not handle. | 4885 will not handle. |
4846 | 4886 |
4847 \newpage\begin{figure}[!here] | 4887 \begin{figure}[!here] |
4848 \begin{small} | 4888 \begin{small} |
4849 \begin{center} | 4889 \begin{center} |
4850 \begin{tabular}{l} | 4890 \begin{tabular}{l} |
4851 \hline Algorithm \textbf{s\_mp\_sqr}. \\ | 4891 \hline Algorithm \textbf{s\_mp\_sqr}. \\ |
4852 \textbf{Input}. mp\_int $a$ \\ | 4892 \textbf{Input}. mp\_int $a$ \\ |
4902 \hspace{-5.1mm}{\bf File}: bn\_s\_mp\_sqr.c | 4942 \hspace{-5.1mm}{\bf File}: bn\_s\_mp\_sqr.c |
4903 \vspace{-3mm} | 4943 \vspace{-3mm} |
4904 \begin{alltt} | 4944 \begin{alltt} |
4905 016 | 4945 016 |
4906 017 /* low level squaring, b = a*a, HAC pp.596-597, Algorithm 14.16 */ | 4946 017 /* low level squaring, b = a*a, HAC pp.596-597, Algorithm 14.16 */ |
4907 018 int | 4947 018 int s_mp_sqr (mp_int * a, mp_int * b) |
4908 019 s_mp_sqr (mp_int * a, mp_int * b) | 4948 019 \{ |
4909 020 \{ | 4949 020 mp_int t; |
4910 021 mp_int t; | 4950 021 int res, ix, iy, pa; |
4911 022 int res, ix, iy, pa; | 4951 022 mp_word r; |
4912 023 mp_word r; | 4952 023 mp_digit u, tmpx, *tmpt; |
4913 024 mp_digit u, tmpx, *tmpt; | 4953 024 |
4914 025 | 4954 025 pa = a->used; |
4915 026 pa = a->used; | 4955 026 if ((res = mp_init_size (&t, 2*pa + 1)) != MP_OKAY) \{ |
4916 027 if ((res = mp_init_size (&t, 2*pa + 1)) != MP_OKAY) \{ | 4956 027 return res; |
4917 028 return res; | 4957 028 \} |
4918 029 \} | 4958 029 |
4919 030 | 4959 030 /* default used is maximum possible size */ |
4920 031 /* default used is maximum possible size */ | 4960 031 t.used = 2*pa + 1; |
4921 032 t.used = 2*pa + 1; | 4961 032 |
4922 033 | 4962 033 for (ix = 0; ix < pa; ix++) \{ |
4923 034 for (ix = 0; ix < pa; ix++) \{ | 4963 034 /* first calculate the digit at 2*ix */ |
4924 035 /* first calculate the digit at 2*ix */ | 4964 035 /* calculate double precision result */ |
4925 036 /* calculate double precision result */ | 4965 036 r = ((mp_word) t.dp[2*ix]) + |
4926 037 r = ((mp_word) t.dp[2*ix]) + | 4966 037 ((mp_word)a->dp[ix])*((mp_word)a->dp[ix]); |
4927 038 ((mp_word)a->dp[ix])*((mp_word)a->dp[ix]); | 4967 038 |
4928 039 | 4968 039 /* store lower part in result */ |
4929 040 /* store lower part in result */ | 4969 040 t.dp[ix+ix] = (mp_digit) (r & ((mp_word) MP_MASK)); |
4930 041 t.dp[ix+ix] = (mp_digit) (r & ((mp_word) MP_MASK)); | 4970 041 |
4931 042 | 4971 042 /* get the carry */ |
4932 043 /* get the carry */ | 4972 043 u = (mp_digit)(r >> ((mp_word) DIGIT_BIT)); |
4933 044 u = (mp_digit)(r >> ((mp_word) DIGIT_BIT)); | 4973 044 |
4934 045 | 4974 045 /* left hand side of A[ix] * A[iy] */ |
4935 046 /* left hand side of A[ix] * A[iy] */ | 4975 046 tmpx = a->dp[ix]; |
4936 047 tmpx = a->dp[ix]; | 4976 047 |
4937 048 | 4977 048 /* alias for where to store the results */ |
4938 049 /* alias for where to store the results */ | 4978 049 tmpt = t.dp + (2*ix + 1); |
4939 050 tmpt = t.dp + (2*ix + 1); | 4979 050 |
4940 051 | 4980 051 for (iy = ix + 1; iy < pa; iy++) \{ |
4941 052 for (iy = ix + 1; iy < pa; iy++) \{ | 4981 052 /* first calculate the product */ |
4942 053 /* first calculate the product */ | 4982 053 r = ((mp_word)tmpx) * ((mp_word)a->dp[iy]); |
4943 054 r = ((mp_word)tmpx) * ((mp_word)a->dp[iy]); | 4983 054 |
4944 055 | 4984 055 /* now calculate the double precision result, note we use |
4945 056 /* now calculate the double precision result, note we use | 4985 056 * addition instead of *2 since it's easier to optimize |
4946 057 * addition instead of *2 since it's easier to optimize | 4986 057 */ |
4947 058 */ | 4987 058 r = ((mp_word) *tmpt) + r + r + ((mp_word) u); |
4948 059 r = ((mp_word) *tmpt) + r + r + ((mp_word) u); | 4988 059 |
4949 060 | 4989 060 /* store lower part */ |
4950 061 /* store lower part */ | 4990 061 *tmpt++ = (mp_digit) (r & ((mp_word) MP_MASK)); |
4951 062 *tmpt++ = (mp_digit) (r & ((mp_word) MP_MASK)); | 4991 062 |
4952 063 | 4992 063 /* get carry */ |
4953 064 /* get carry */ | 4993 064 u = (mp_digit)(r >> ((mp_word) DIGIT_BIT)); |
4954 065 u = (mp_digit)(r >> ((mp_word) DIGIT_BIT)); | 4994 065 \} |
4955 066 \} | 4995 066 /* propagate upwards */ |
4956 067 /* propagate upwards */ | 4996 067 while (u != ((mp_digit) 0)) \{ |
4957 068 while (u != ((mp_digit) 0)) \{ | 4997 068 r = ((mp_word) *tmpt) + ((mp_word) u); |
4958 069 r = ((mp_word) *tmpt) + ((mp_word) u); | 4998 069 *tmpt++ = (mp_digit) (r & ((mp_word) MP_MASK)); |
4959 070 *tmpt++ = (mp_digit) (r & ((mp_word) MP_MASK)); | 4999 070 u = (mp_digit)(r >> ((mp_word) DIGIT_BIT)); |
4960 071 u = (mp_digit)(r >> ((mp_word) DIGIT_BIT)); | 5000 071 \} |
4961 072 \} | 5001 072 \} |
4962 073 \} | 5002 073 |
4963 074 | 5003 074 mp_clamp (&t); |
4964 075 mp_clamp (&t); | 5004 075 mp_exch (&t, b); |
4965 076 mp_exch (&t, b); | 5005 076 mp_clear (&t); |
4966 077 mp_clear (&t); | 5006 077 return MP_OKAY; |
4967 078 return MP_OKAY; | 5007 078 \} |
4968 079 \} | 5008 079 #endif |
4969 080 #endif | |
4970 \end{alltt} | 5009 \end{alltt} |
4971 \end{small} | 5010 \end{small} |
4972 | 5011 |
4973 Inside the outer loop (\textit{see line 34}) the square term is calculated on line 37. Line 44 extracts the carry from the square | 5012 Inside the outer loop (line 33) the square term is calculated on line 36. The carry (line 43) has been |
4974 term. Aliases for $a_{ix}$ and $t_{ix+iy}$ are initialized on lines 47 and 50 respectively. The doubling is performed using two | 5013 extracted from the mp\_word accumulator using a right shift. Aliases for $a_{ix}$ and $t_{ix+iy}$ are initialized |
4975 additions (\textit{see line 59}) since it is usually faster than shifting,if not at least as fast. | 5014 (lines 46 and 49) to simplify the inner loop. The doubling is performed using two |
5015 additions (line 58) since it is usually faster than shifting, if not at least as fast. | |
5016 | |
5017 The important observation is that the inner loop does not begin at $iy = 0$ like for multiplication. As such the inner loops | |
5018 get progressively shorter as the algorithm proceeds. This is what leads to the savings compared to using a multiplication to | |
5019 square a number. | |
4976 | 5020 |
4977 \subsection{Faster Squaring by the ``Comba'' Method} | 5021 \subsection{Faster Squaring by the ``Comba'' Method} |
4978 A major drawback to the baseline method is the requirement for single precision shifting inside the $O(n^2)$ nested loop. Squaring has an additional | 5022 A major drawback to the baseline method is the requirement for single precision shifting inside the $O(n^2)$ nested loop. Squaring has an additional |
4979 drawback that it must double the product inside the inner loop as well. As for multiplication, the Comba technique can be used to eliminate these | 5023 drawback that it must double the product inside the inner loop as well. As for multiplication, the Comba technique can be used to eliminate these |
4980 performance hazards. | 5024 performance hazards. |
4982 The first obvious solution is to make an array of mp\_words which will hold all of the columns. This will indeed eliminate all of the carry | 5026 The first obvious solution is to make an array of mp\_words which will hold all of the columns. This will indeed eliminate all of the carry |
4983 propagation operations from the inner loop. However, the inner product must still be doubled $O(n^2)$ times. The solution stems from the simple fact | 5027 propagation operations from the inner loop. However, the inner product must still be doubled $O(n^2)$ times. The solution stems from the simple fact |
4984 that $2a + 2b + 2c = 2(a + b + c)$. That is the sum of all of the double products is equal to double the sum of all the products. For example, | 5028 that $2a + 2b + 2c = 2(a + b + c)$. That is the sum of all of the double products is equal to double the sum of all the products. For example, |
4985 $ab + ba + ac + ca = 2ab + 2ac = 2(ab + ac)$. | 5029 $ab + ba + ac + ca = 2ab + 2ac = 2(ab + ac)$. |
4986 | 5030 |
4987 However, we cannot simply double all of the columns, since the squares appear only once per row. The most practical solution is to have two mp\_word | 5031 However, we cannot simply double all of the columns, since the squares appear only once per row. The most practical solution is to have two |
4988 arrays. One array will hold the squares and the other array will hold the double products. With both arrays the doubling and carry propagation can be | 5032 mp\_word arrays. One array will hold the squares and the other array will hold the double products. With both arrays the doubling and |
4989 moved to a $O(n)$ work level outside the $O(n^2)$ level. | 5033 carry propagation can be moved to a $O(n)$ work level outside the $O(n^2)$ level. In this case, we have an even simpler solution in mind. |
4990 | 5034 |
4991 \newpage\begin{figure}[!here] | 5035 \newpage\begin{figure}[!here] |
4992 \begin{small} | 5036 \begin{small} |
4993 \begin{center} | 5037 \begin{center} |
4994 \begin{tabular}{l} | 5038 \begin{tabular}{l} |
4995 \hline Algorithm \textbf{fast\_s\_mp\_sqr}. \\ | 5039 \hline Algorithm \textbf{fast\_s\_mp\_sqr}. \\ |
4996 \textbf{Input}. mp\_int $a$ \\ | 5040 \textbf{Input}. mp\_int $a$ \\ |
4997 \textbf{Output}. $b \leftarrow a^2$ \\ | 5041 \textbf{Output}. $b \leftarrow a^2$ \\ |
4998 \hline \\ | 5042 \hline \\ |
4999 Place two arrays of \textbf{MP\_WARRAY} mp\_words named $\hat W$ and $\hat {X}$ on the stack. \\ | 5043 Place an array of \textbf{MP\_WARRAY} mp\_digits named $W$ on the stack. \\ |
5000 1. If $b.alloc < 2a.used + 1$ then grow $b$ to $2a.used + 1$ digits. (\textit{mp\_grow}). \\ | 5044 1. If $b.alloc < 2a.used + 1$ then grow $b$ to $2a.used + 1$ digits. (\textit{mp\_grow}). \\ |
5001 2. If step 1 failed return(\textit{MP\_MEM}). \\ | 5045 2. If step 1 failed return(\textit{MP\_MEM}). \\ |
5002 3. for $ix$ from $0$ to $2a.used + 1$ do \\ | |
5003 \hspace{3mm}3.1 $\hat W_{ix} \leftarrow 0$ \\ | |
5004 \hspace{3mm}3.2 $\hat {X}_{ix} \leftarrow 0$ \\ | |
5005 4. for $ix$ from $0$ to $a.used - 1$ do \\ | |
5006 \hspace{3mm}Compute the square.\\ | |
5007 \hspace{3mm}4.1 $\hat {X}_{ix+ix} \leftarrow \left ( a_{ix} \right )^2$ \\ | |
5008 \\ | 5046 \\ |
5009 \hspace{3mm}Compute the double products.\\ | 5047 3. $pa \leftarrow 2 \cdot a.used$ \\ |
5010 \hspace{3mm}4.2 for $iy$ from $ix + 1$ to $a.used - 1$ do \\ | 5048 4. $\hat W1 \leftarrow 0$ \\ |
5011 \hspace{6mm}4.2.1 $\hat W_{ix+iy} \leftarrow \hat W_{ix+iy} + a_{ix}a_{iy}$ \\ | 5049 5. for $ix$ from $0$ to $pa - 1$ do \\ |
5012 5. $oldused \leftarrow b.used$ \\ | 5050 \hspace{3mm}5.1 $\_ \hat W \leftarrow 0$ \\ |
5013 6. $b.used \leftarrow 2a.used + 1$ \\ | 5051 \hspace{3mm}5.2 $ty \leftarrow \mbox{MIN}(a.used - 1, ix)$ \\ |
5052 \hspace{3mm}5.3 $tx \leftarrow ix - ty$ \\ | |
5053 \hspace{3mm}5.4 $iy \leftarrow \mbox{MIN}(a.used - tx, ty + 1)$ \\ | |
5054 \hspace{3mm}5.5 $iy \leftarrow \mbox{MIN}(iy, \lfloor \left (ty - tx + 1 \right )/2 \rfloor)$ \\ | |
5055 \hspace{3mm}5.6 for $iz$ from $0$ to $iz - 1$ do \\ | |
5056 \hspace{6mm}5.6.1 $\_ \hat W \leftarrow \_ \hat W + a_{tx + iz}a_{ty - iz}$ \\ | |
5057 \hspace{3mm}5.7 $\_ \hat W \leftarrow 2 \cdot \_ \hat W + \hat W1$ \\ | |
5058 \hspace{3mm}5.8 if $ix$ is even then \\ | |
5059 \hspace{6mm}5.8.1 $\_ \hat W \leftarrow \_ \hat W + \left ( a_{\lfloor ix/2 \rfloor}\right )^2$ \\ | |
5060 \hspace{3mm}5.9 $W_{ix} \leftarrow \_ \hat W (\mbox{mod }\beta)$ \\ | |
5061 \hspace{3mm}5.10 $\hat W1 \leftarrow \lfloor \_ \hat W / \beta \rfloor$ \\ | |
5014 \\ | 5062 \\ |
5015 Double the products and propagate the carries simultaneously. \\ | 5063 6. $oldused \leftarrow b.used$ \\ |
5016 7. $\hat W_0 \leftarrow 2 \hat W_0 + \hat {X}_0$ \\ | 5064 7. $b.used \leftarrow 2 \cdot a.used$ \\ |
5017 8. for $ix$ from $1$ to $2a.used$ do \\ | 5065 8. for $ix$ from $0$ to $pa - 1$ do \\ |
5018 \hspace{3mm}8.1 $\hat W_{ix} \leftarrow 2 \hat W_{ix} + \hat {X}_{ix}$ \\ | 5066 \hspace{3mm}8.1 $b_{ix} \leftarrow W_{ix}$ \\ |
5019 \hspace{3mm}8.2 $\hat W_{ix} \leftarrow \hat W_{ix} + \lfloor \hat W_{ix - 1} / \beta \rfloor$ \\ | 5067 9. for $ix$ from $pa$ to $oldused - 1$ do \\ |
5020 \hspace{3mm}8.3 $b_{ix-1} \leftarrow W_{ix-1} \mbox{ (mod }\beta\mbox{)}$ \\ | 5068 \hspace{3mm}9.1 $b_{ix} \leftarrow 0$ \\ |
5021 9. $b_{2a.used} \leftarrow \hat W_{2a.used} \mbox{ (mod }\beta\mbox{)}$ \\ | 5069 10. Clamp excess digits from $b$. (\textit{mp\_clamp}) \\ |
5022 10. if $2a.used + 1 < oldused$ then do \\ | 5070 11. Return(\textit{MP\_OKAY}). \\ |
5023 \hspace{3mm}10.1 for $ix$ from $2a.used + 1$ to $oldused$ do \\ | |
5024 \hspace{6mm}10.1.1 $b_{ix} \leftarrow 0$ \\ | |
5025 11. Clamp excess digits from $b$. (\textit{mp\_clamp}) \\ | |
5026 12. Return(\textit{MP\_OKAY}). \\ | |
5027 \hline | 5071 \hline |
5028 \end{tabular} | 5072 \end{tabular} |
5029 \end{center} | 5073 \end{center} |
5030 \end{small} | 5074 \end{small} |
5031 \caption{Algorithm fast\_s\_mp\_sqr} | 5075 \caption{Algorithm fast\_s\_mp\_sqr} |
5032 \end{figure} | 5076 \end{figure} |
5033 | 5077 |
5034 \textbf{Algorithm fast\_s\_mp\_sqr.} | 5078 \textbf{Algorithm fast\_s\_mp\_sqr.} |
5035 This algorithm computes the square of an input using the Comba technique. It is designed to be a replacement for algorithm s\_mp\_sqr when | 5079 This algorithm computes the square of an input using the Comba technique. It is designed to be a replacement for algorithm |
5036 the number of input digits is less than \textbf{MP\_WARRAY} and less than $\delta \over 2$. | 5080 s\_mp\_sqr when the number of input digits is less than \textbf{MP\_WARRAY} and less than $\delta \over 2$. |
5037 | 5081 This algorithm is very similar to the Comba multiplier except with a few key differences we shall make note of. |
5038 This routine requires two arrays of mp\_words to be placed on the stack. The first array $\hat W$ will hold the double products and the second | 5082 |
5039 array $\hat X$ will hold the squares. Though only at most $MP\_WARRAY \over 2$ words of $\hat X$ are used, it has proven faster on most | 5083 First, we have an accumulator and carry variables $\_ \hat W$ and $\hat W1$ respectively. This is because the inner loop |
5040 processors to simply make it a full size array. | 5084 products are to be doubled. If we had added the previous carry in we would be doubling too much. Next we perform an |
5041 | 5085 addition MIN condition on $iy$ (step 5.5) to prevent overlapping digits. For example, $a_3 \cdot a_5$ is equal |
5042 The loop on step 3 will zero the two arrays to prepare them for the squaring step. Step 4.1 computes the squares of the product. Note how | 5086 $a_5 \cdot a_3$. Whereas in the multiplication case we would have $5 < a.used$ and $3 \ge 0$ is maintained since we double the sum |
5043 it simply assigns the value into the $\hat X$ array. The nested loop on step 4.2 computes the doubles of the products. This loop | 5087 of the products just outside the inner loop we have to avoid doing this. This is also a good thing since we perform |
5044 computes the sum of the products for each column. They are not doubled until later. | 5088 fewer multiplications and the routine ends up being faster. |
5045 | 5089 |
5046 After the squaring loop, the products stored in $\hat W$ musted be doubled and the carries propagated forwards. It makes sense to do both | 5090 Finally the last difference is the addition of the ``square'' term outside the inner loop (step 5.8). We add in the square |
5047 operations at the same time. The expression $\hat W_{ix} \leftarrow 2 \hat W_{ix} + \hat {X}_{ix}$ computes the sum of the double product and the | 5091 only to even outputs and it is the square of the term at the $\lfloor ix / 2 \rfloor$ position. |
5048 squares in place. | |
5049 | 5092 |
5050 \vspace{+3mm}\begin{small} | 5093 \vspace{+3mm}\begin{small} |
5051 \hspace{-5.1mm}{\bf File}: bn\_fast\_s\_mp\_sqr.c | 5094 \hspace{-5.1mm}{\bf File}: bn\_fast\_s\_mp\_sqr.c |
5052 \vspace{-3mm} | 5095 \vspace{-3mm} |
5053 \begin{alltt} | 5096 \begin{alltt} |
5054 016 | 5097 016 |
5055 017 /* fast squaring | 5098 017 /* the jist of squaring... |
5056 018 * | 5099 018 * you do like mult except the offset of the tmpx [one that |
5057 019 * This is the comba method where the columns of the product | 5100 019 * starts closer to zero] can't equal the offset of tmpy. |
5058 020 * are computed first then the carries are computed. This | 5101 020 * So basically you set up iy like before then you min it with |
5059 021 * has the effect of making a very simple inner loop that | 5102 021 * (ty-tx) so that it never happens. You double all those |
5060 022 * is executed the most | 5103 022 * you add in the inner loop |
5061 023 * | 5104 023 |
5062 024 * W2 represents the outer products and W the inner. | 5105 024 After that loop you do the squares and add them in. |
5063 025 * | 5106 025 */ |
5064 026 * A further optimizations is made because the inner | 5107 026 |
5065 027 * products are of the form "A * B * 2". The *2 part does | 5108 027 int fast_s_mp_sqr (mp_int * a, mp_int * b) |
5066 028 * not need to be computed until the end which is good | 5109 028 \{ |
5067 029 * because 64-bit shifts are slow! | 5110 029 int olduse, res, pa, ix, iz; |
5068 030 * | 5111 030 mp_digit W[MP_WARRAY], *tmpx; |
5069 031 * Based on Algorithm 14.16 on pp.597 of HAC. | 5112 031 mp_word W1; |
5070 032 * | 5113 032 |
5071 033 */ | 5114 033 /* grow the destination as required */ |
5072 034 /* the jist of squaring... | 5115 034 pa = a->used + a->used; |
5073 035 | 5116 035 if (b->alloc < pa) \{ |
5074 036 you do like mult except the offset of the tmpx [one that starts closer to ze | 5117 036 if ((res = mp_grow (b, pa)) != MP_OKAY) \{ |
5075 ro] | 5118 037 return res; |
5076 037 can't equal the offset of tmpy. So basically you set up iy like before then | 5119 038 \} |
5077 you min it with | 5120 039 \} |
5078 038 (ty-tx) so that it never happens. You double all those you add in the inner | 5121 040 |
5079 loop | 5122 041 /* number of output digits to produce */ |
5080 039 | 5123 042 W1 = 0; |
5081 040 After that loop you do the squares and add them in. | 5124 043 for (ix = 0; ix < pa; ix++) \{ |
5082 041 | 5125 044 int tx, ty, iy; |
5083 042 Remove W2 and don't memset W | 5126 045 mp_word _W; |
5084 043 | 5127 046 mp_digit *tmpy; |
5085 044 */ | 5128 047 |
5086 045 | 5129 048 /* clear counter */ |
5087 046 int fast_s_mp_sqr (mp_int * a, mp_int * b) | 5130 049 _W = 0; |
5088 047 \{ | 5131 050 |
5089 048 int olduse, res, pa, ix, iz; | 5132 051 /* get offsets into the two bignums */ |
5090 049 mp_digit W[MP_WARRAY], *tmpx; | 5133 052 ty = MIN(a->used-1, ix); |
5091 050 mp_word W1; | 5134 053 tx = ix - ty; |
5092 051 | 5135 054 |
5093 052 /* grow the destination as required */ | 5136 055 /* setup temp aliases */ |
5094 053 pa = a->used + a->used; | 5137 056 tmpx = a->dp + tx; |
5095 054 if (b->alloc < pa) \{ | 5138 057 tmpy = a->dp + ty; |
5096 055 if ((res = mp_grow (b, pa)) != MP_OKAY) \{ | 5139 058 |
5097 056 return res; | 5140 059 /* this is the number of times the loop will iterrate, essentially |
5098 057 \} | 5141 060 while (tx++ < a->used && ty-- >= 0) \{ ... \} |
5099 058 \} | 5142 061 */ |
5100 059 | 5143 062 iy = MIN(a->used-tx, ty+1); |
5101 060 /* number of output digits to produce */ | 5144 063 |
5102 061 W1 = 0; | 5145 064 /* now for squaring tx can never equal ty |
5103 062 for (ix = 0; ix <= pa; ix++) \{ | 5146 065 * we halve the distance since they approach at a rate of 2x |
5104 063 int tx, ty, iy; | 5147 066 * and we have to round because odd cases need to be executed |
5105 064 mp_word _W; | 5148 067 */ |
5106 065 mp_digit *tmpy; | 5149 068 iy = MIN(iy, (ty-tx+1)>>1); |
5107 066 | |
5108 067 /* clear counter */ | |
5109 068 _W = 0; | |
5110 069 | 5150 069 |
5111 070 /* get offsets into the two bignums */ | 5151 070 /* execute loop */ |
5112 071 ty = MIN(a->used-1, ix); | 5152 071 for (iz = 0; iz < iy; iz++) \{ |
5113 072 tx = ix - ty; | 5153 072 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--); |
5114 073 | 5154 073 \} |
5115 074 /* setup temp aliases */ | 5155 074 |
5116 075 tmpx = a->dp + tx; | 5156 075 /* double the inner product and add carry */ |
5117 076 tmpy = a->dp + ty; | 5157 076 _W = _W + _W + W1; |
5118 077 | 5158 077 |
5119 078 /* this is the number of times the loop will iterrate, essentially its | 5159 078 /* even columns have the square term in them */ |
5120 | 5160 079 if ((ix&1) == 0) \{ |
5121 079 while (tx++ < a->used && ty-- >= 0) \{ ... \} | 5161 080 _W += ((mp_word)a->dp[ix>>1])*((mp_word)a->dp[ix>>1]); |
5122 080 */ | 5162 081 \} |
5123 081 iy = MIN(a->used-tx, ty+1); | |
5124 082 | 5163 082 |
5125 083 /* now for squaring tx can never equal ty | 5164 083 /* store it */ |
5126 084 * we halve the distance since they approach at a rate of 2x | 5165 084 W[ix] = (mp_digit)(_W & MP_MASK); |
5127 085 * and we have to round because odd cases need to be executed | 5166 085 |
5128 086 */ | 5167 086 /* make next carry */ |
5129 087 iy = MIN(iy, (ty-tx+1)>>1); | 5168 087 W1 = _W >> ((mp_word)DIGIT_BIT); |
5130 088 | 5169 088 \} |
5131 089 /* execute loop */ | 5170 089 |
5132 090 for (iz = 0; iz < iy; iz++) \{ | 5171 090 /* setup dest */ |
5133 091 _W += ((mp_word)*tmpx++)*((mp_word)*tmpy--); | 5172 091 olduse = b->used; |
5134 092 \} | 5173 092 b->used = a->used+a->used; |
5135 093 | 5174 093 |
5136 094 /* double the inner product and add carry */ | 5175 094 \{ |
5137 095 _W = _W + _W + W1; | 5176 095 mp_digit *tmpb; |
5138 096 | 5177 096 tmpb = b->dp; |
5139 097 /* even columns have the square term in them */ | 5178 097 for (ix = 0; ix < pa; ix++) \{ |
5140 098 if ((ix&1) == 0) \{ | 5179 098 *tmpb++ = W[ix] & MP_MASK; |
5141 099 _W += ((mp_word)a->dp[ix>>1])*((mp_word)a->dp[ix>>1]); | 5180 099 \} |
5142 100 \} | 5181 100 |
5143 101 | 5182 101 /* clear unused digits [that existed in the old copy of c] */ |
5144 102 /* store it */ | 5183 102 for (; ix < olduse; ix++) \{ |
5145 103 W[ix] = _W; | 5184 103 *tmpb++ = 0; |
5146 104 | 5185 104 \} |
5147 105 /* make next carry */ | 5186 105 \} |
5148 106 W1 = _W >> ((mp_word)DIGIT_BIT); | 5187 106 mp_clamp (b); |
5149 107 \} | 5188 107 return MP_OKAY; |
5150 108 | 5189 108 \} |
5151 109 /* setup dest */ | 5190 109 #endif |
5152 110 olduse = b->used; | |
5153 111 b->used = a->used+a->used; | |
5154 112 | |
5155 113 \{ | |
5156 114 mp_digit *tmpb; | |
5157 115 tmpb = b->dp; | |
5158 116 for (ix = 0; ix < pa; ix++) \{ | |
5159 117 *tmpb++ = W[ix] & MP_MASK; | |
5160 118 \} | |
5161 119 | |
5162 120 /* clear unused digits [that existed in the old copy of c] */ | |
5163 121 for (; ix < olduse; ix++) \{ | |
5164 122 *tmpb++ = 0; | |
5165 123 \} | |
5166 124 \} | |
5167 125 mp_clamp (b); | |
5168 126 return MP_OKAY; | |
5169 127 \} | |
5170 128 #endif | |
5171 \end{alltt} | 5191 \end{alltt} |
5172 \end{small} | 5192 \end{small} |
5173 | 5193 |
5174 -- Write something deep and insightful later, Tom. | 5194 This implementation is essentially a copy of Comba multiplication with the appropriate changes added to make it faster for |
5195 the special case of squaring. | |
5175 | 5196 |
5176 \subsection{Polynomial Basis Squaring} | 5197 \subsection{Polynomial Basis Squaring} |
5177 The same algorithm that performs optimal polynomial basis multiplication can be used to perform polynomial basis squaring. The minor exception | 5198 The same algorithm that performs optimal polynomial basis multiplication can be used to perform polynomial basis squaring. The minor exception |
5178 is that $\zeta_y = f(y)g(y)$ is actually equivalent to $\zeta_y = f(y)^2$ since $f(y) = g(y)$. Instead of performing $2n + 1$ | 5199 is that $\zeta_y = f(y)g(y)$ is actually equivalent to $\zeta_y = f(y)^2$ since $f(y) = g(y)$. Instead of performing $2n + 1$ |
5179 multiplications to find the $\zeta$ relations, squaring operations are performed instead. | 5200 multiplications to find the $\zeta$ relations, squaring operations are performed instead. |
5387 | 5408 |
5388 By inlining the copy and shift operations the cutoff point for Karatsuba multiplication can be lowered. On the Athlon the cutoff point | 5409 By inlining the copy and shift operations the cutoff point for Karatsuba multiplication can be lowered. On the Athlon the cutoff point |
5389 is exactly at the point where Comba squaring can no longer be used (\textit{128 digits}). On slower processors such as the Intel P4 | 5410 is exactly at the point where Comba squaring can no longer be used (\textit{128 digits}). On slower processors such as the Intel P4 |
5390 it is actually below the Comba limit (\textit{at 110 digits}). | 5411 it is actually below the Comba limit (\textit{at 110 digits}). |
5391 | 5412 |
5392 This routine uses the same error trap coding style as mp\_karatsuba\_sqr. As the temporary variables are initialized errors are redirected to | 5413 This routine uses the same error trap coding style as mp\_karatsuba\_sqr. As the temporary variables are initialized errors are |
5393 the error trap higher up. If the algorithm completes without error the error code is set to \textbf{MP\_OKAY} and mp\_clears are executed normally. | 5414 redirected to the error trap higher up. If the algorithm completes without error the error code is set to \textbf{MP\_OKAY} and |
5394 | 5415 mp\_clears are executed normally. |
5395 \textit{Last paragraph sucks. re-write! -- Tom} | |
5396 | 5416 |
5397 \subsection{Toom-Cook Squaring} | 5417 \subsection{Toom-Cook Squaring} |
5398 The Toom-Cook squaring algorithm mp\_toom\_sqr is heavily based on the algorithm mp\_toom\_mul with the exception that squarings are used | 5418 The Toom-Cook squaring algorithm mp\_toom\_sqr is heavily based on the algorithm mp\_toom\_mul with the exception that squarings are used |
5399 instead of multiplication to find the five relations.. The reader is encouraged to read the description of the latter algorithm and try to | 5419 instead of multiplication to find the five relations. The reader is encouraged to read the description of the latter algorithm and try to |
5400 derive their own Toom-Cook squaring algorithm. | 5420 derive their own Toom-Cook squaring algorithm. |
5401 | 5421 |
5402 \subsection{High Level Squaring} | 5422 \subsection{High Level Squaring} |
5403 \newpage\begin{figure}[!here] | 5423 \newpage\begin{figure}[!here] |
5404 \begin{small} | 5424 \begin{small} |
5480 \section*{Exercises} | 5500 \section*{Exercises} |
5481 \begin{tabular}{cl} | 5501 \begin{tabular}{cl} |
5482 $\left [ 3 \right ] $ & Devise an efficient algorithm for selection of the radix point to handle inputs \\ | 5502 $\left [ 3 \right ] $ & Devise an efficient algorithm for selection of the radix point to handle inputs \\ |
5483 & that have different number of digits in Karatsuba multiplication. \\ | 5503 & that have different number of digits in Karatsuba multiplication. \\ |
5484 & \\ | 5504 & \\ |
5485 $\left [ 3 \right ] $ & In section 5.3 the fact that every column of a squaring is made up \\ | 5505 $\left [ 2 \right ] $ & In section 5.3 the fact that every column of a squaring is made up \\ |
5486 & of double products and at most one square is stated. Prove this statement. \\ | 5506 & of double products and at most one square is stated. Prove this statement. \\ |
5487 & \\ | 5507 & \\ |
5488 $\left [ 2 \right ] $ & In the Comba squaring algorithm half of the $\hat X$ variables are not used. \\ | |
5489 & Revise algorithm fast\_s\_mp\_sqr to shrink the $\hat X$ array. \\ | |
5490 & \\ | |
5491 $\left [ 3 \right ] $ & Prove the equation for Karatsuba squaring. \\ | 5508 $\left [ 3 \right ] $ & Prove the equation for Karatsuba squaring. \\ |
5492 & \\ | 5509 & \\ |
5493 $\left [ 1 \right ] $ & Prove that Karatsuba squaring requires $O \left (n^{lg(3)} \right )$ time. \\ | 5510 $\left [ 1 \right ] $ & Prove that Karatsuba squaring requires $O \left (n^{lg(3)} \right )$ time. \\ |
5494 & \\ | 5511 & \\ |
5495 $\left [ 2 \right ] $ & Determine the minimal ratio between addition and multiplication clock cycles \\ | 5512 $\left [ 2 \right ] $ & Determine the minimal ratio between addition and multiplication clock cycles \\ |
5496 & required for equation $6.7$ to be true. \\ | 5513 & required for equation $6.7$ to be true. \\ |
5497 & \\ | 5514 & \\ |
5515 $\left [ 3 \right ] $ & Implement a threaded version of Comba multiplication (and squaring) where you \\ | |
5516 & compute subsets of the columns in each thread. Determine a cutoff point where \\ | |
5517 & it is effective and add the logic to mp\_mul() and mp\_sqr(). \\ | |
5518 &\\ | |
5519 $\left [ 4 \right ] $ & Same as the previous but also modify the Karatsuba and Toom-Cook. You must \\ | |
5520 & increase the throughput of mp\_exptmod() for random odd moduli in the range \\ | |
5521 & $512 \ldots 4096$ bits significantly ($> 2x$) to complete this challenge. \\ | |
5522 & \\ | |
5498 \end{tabular} | 5523 \end{tabular} |
5499 | 5524 |
5500 \chapter{Modular Reduction} | 5525 \chapter{Modular Reduction} |
5501 \section{Basics of Modular Reduction} | 5526 \section{Basics of Modular Reduction} |
5502 \index{modular residue} | 5527 \index{modular residue} |
5511 other forms of residues. | 5536 other forms of residues. |
5512 | 5537 |
5513 Modular reductions are normally used to create either finite groups, rings or fields. The most common usage for performance driven modular reductions | 5538 Modular reductions are normally used to create either finite groups, rings or fields. The most common usage for performance driven modular reductions |
5514 is in modular exponentiation algorithms. That is to compute $d = a^b \mbox{ (mod }c\mbox{)}$ as fast as possible. This operation is used in the | 5539 is in modular exponentiation algorithms. That is to compute $d = a^b \mbox{ (mod }c\mbox{)}$ as fast as possible. This operation is used in the |
5515 RSA and Diffie-Hellman public key algorithms, for example. Modular multiplication and squaring also appears as a fundamental operation in | 5540 RSA and Diffie-Hellman public key algorithms, for example. Modular multiplication and squaring also appears as a fundamental operation in |
5516 Elliptic Curve cryptographic algorithms. As will be discussed in the subsequent chapter there exist fast algorithms for computing modular | 5541 elliptic curve cryptographic algorithms. As will be discussed in the subsequent chapter there exist fast algorithms for computing modular |
5517 exponentiations without having to perform (\textit{in this example}) $b - 1$ multiplications. These algorithms will produce partial results in the | 5542 exponentiations without having to perform (\textit{in this example}) $b - 1$ multiplications. These algorithms will produce partial results in the |
5518 range $0 \le x < c^2$ which can be taken advantage of to create several efficient algorithms. They have also been used to create redundancy check | 5543 range $0 \le x < c^2$ which can be taken advantage of to create several efficient algorithms. They have also been used to create redundancy check |
5519 algorithms known as CRCs, error correction codes such as Reed-Solomon and solve a variety of number theoeretic problems. | 5544 algorithms known as CRCs, error correction codes such as Reed-Solomon and solve a variety of number theoeretic problems. |
5520 | 5545 |
5521 \section{The Barrett Reduction} | 5546 \section{The Barrett Reduction} |
5725 016 | 5750 016 |
5726 017 /* reduces x mod m, assumes 0 < x < m**2, mu is | 5751 017 /* reduces x mod m, assumes 0 < x < m**2, mu is |
5727 018 * precomputed via mp_reduce_setup. | 5752 018 * precomputed via mp_reduce_setup. |
5728 019 * From HAC pp.604 Algorithm 14.42 | 5753 019 * From HAC pp.604 Algorithm 14.42 |
5729 020 */ | 5754 020 */ |
5730 021 int | 5755 021 int mp_reduce (mp_int * x, mp_int * m, mp_int * mu) |
5731 022 mp_reduce (mp_int * x, mp_int * m, mp_int * mu) | 5756 022 \{ |
5732 023 \{ | 5757 023 mp_int q; |
5733 024 mp_int q; | 5758 024 int res, um = m->used; |
5734 025 int res, um = m->used; | 5759 025 |
5735 026 | 5760 026 /* q = x */ |
5736 027 /* q = x */ | 5761 027 if ((res = mp_init_copy (&q, x)) != MP_OKAY) \{ |
5737 028 if ((res = mp_init_copy (&q, x)) != MP_OKAY) \{ | 5762 028 return res; |
5738 029 return res; | 5763 029 \} |
5739 030 \} | 5764 030 |
5740 031 | 5765 031 /* q1 = x / b**(k-1) */ |
5741 032 /* q1 = x / b**(k-1) */ | 5766 032 mp_rshd (&q, um - 1); |
5742 033 mp_rshd (&q, um - 1); | 5767 033 |
5743 034 | 5768 034 /* according to HAC this optimization is ok */ |
5744 035 /* according to HAC this optimization is ok */ | 5769 035 if (((unsigned long) um) > (((mp_digit)1) << (DIGIT_BIT - 1))) \{ |
5745 036 if (((unsigned long) um) > (((mp_digit)1) << (DIGIT_BIT - 1))) \{ | 5770 036 if ((res = mp_mul (&q, mu, &q)) != MP_OKAY) \{ |
5746 037 if ((res = mp_mul (&q, mu, &q)) != MP_OKAY) \{ | 5771 037 goto CLEANUP; |
5747 038 goto CLEANUP; | 5772 038 \} |
5748 039 \} | 5773 039 \} else \{ |
5749 040 \} else \{ | 5774 040 #ifdef BN_S_MP_MUL_HIGH_DIGS_C |
5750 041 #ifdef BN_S_MP_MUL_HIGH_DIGS_C | 5775 041 if ((res = s_mp_mul_high_digs (&q, mu, &q, um)) != MP_OKAY) \{ |
5751 042 if ((res = s_mp_mul_high_digs (&q, mu, &q, um - 1)) != MP_OKAY) \{ | 5776 042 goto CLEANUP; |
5752 043 goto CLEANUP; | 5777 043 \} |
5753 044 \} | 5778 044 #elif defined(BN_FAST_S_MP_MUL_HIGH_DIGS_C) |
5754 045 #elif defined(BN_FAST_S_MP_MUL_HIGH_DIGS_C) | 5779 045 if ((res = fast_s_mp_mul_high_digs (&q, mu, &q, um)) != MP_OKAY) \{ |
5755 046 if ((res = fast_s_mp_mul_high_digs (&q, mu, &q, um - 1)) != MP_OKAY) \{ | 5780 046 goto CLEANUP; |
5756 047 goto CLEANUP; | 5781 047 \} |
5757 048 \} | 5782 048 #else |
5758 049 #else | 5783 049 \{ |
5759 050 \{ | 5784 050 res = MP_VAL; |
5760 051 res = MP_VAL; | 5785 051 goto CLEANUP; |
5761 052 goto CLEANUP; | 5786 052 \} |
5762 053 \} | 5787 053 #endif |
5763 054 #endif | 5788 054 \} |
5764 055 \} | 5789 055 |
5765 056 | 5790 056 /* q3 = q2 / b**(k+1) */ |
5766 057 /* q3 = q2 / b**(k+1) */ | 5791 057 mp_rshd (&q, um + 1); |
5767 058 mp_rshd (&q, um + 1); | 5792 058 |
5768 059 | 5793 059 /* x = x mod b**(k+1), quick (no division) */ |
5769 060 /* x = x mod b**(k+1), quick (no division) */ | 5794 060 if ((res = mp_mod_2d (x, DIGIT_BIT * (um + 1), x)) != MP_OKAY) \{ |
5770 061 if ((res = mp_mod_2d (x, DIGIT_BIT * (um + 1), x)) != MP_OKAY) \{ | 5795 061 goto CLEANUP; |
5771 062 goto CLEANUP; | 5796 062 \} |
5772 063 \} | 5797 063 |
5773 064 | 5798 064 /* q = q * m mod b**(k+1), quick (no division) */ |
5774 065 /* q = q * m mod b**(k+1), quick (no division) */ | 5799 065 if ((res = s_mp_mul_digs (&q, m, &q, um + 1)) != MP_OKAY) \{ |
5775 066 if ((res = s_mp_mul_digs (&q, m, &q, um + 1)) != MP_OKAY) \{ | 5800 066 goto CLEANUP; |
5776 067 goto CLEANUP; | 5801 067 \} |
5777 068 \} | 5802 068 |
5778 069 | 5803 069 /* x = x - q */ |
5779 070 /* x = x - q */ | 5804 070 if ((res = mp_sub (x, &q, x)) != MP_OKAY) \{ |
5780 071 if ((res = mp_sub (x, &q, x)) != MP_OKAY) \{ | 5805 071 goto CLEANUP; |
5781 072 goto CLEANUP; | 5806 072 \} |
5782 073 \} | 5807 073 |
5783 074 | 5808 074 /* If x < 0, add b**(k+1) to it */ |
5784 075 /* If x < 0, add b**(k+1) to it */ | 5809 075 if (mp_cmp_d (x, 0) == MP_LT) \{ |
5785 076 if (mp_cmp_d (x, 0) == MP_LT) \{ | 5810 076 mp_set (&q, 1); |
5786 077 mp_set (&q, 1); | 5811 077 if ((res = mp_lshd (&q, um + 1)) != MP_OKAY) |
5787 078 if ((res = mp_lshd (&q, um + 1)) != MP_OKAY) | 5812 078 goto CLEANUP; |
5788 079 goto CLEANUP; | 5813 079 if ((res = mp_add (x, &q, x)) != MP_OKAY) |
5789 080 if ((res = mp_add (x, &q, x)) != MP_OKAY) | 5814 080 goto CLEANUP; |
5790 081 goto CLEANUP; | 5815 081 \} |
5791 082 \} | 5816 082 |
5792 083 | 5817 083 /* Back off if it's too big */ |
5793 084 /* Back off if it's too big */ | 5818 084 while (mp_cmp (x, m) != MP_LT) \{ |
5794 085 while (mp_cmp (x, m) != MP_LT) \{ | 5819 085 if ((res = s_mp_sub (x, m, x)) != MP_OKAY) \{ |
5795 086 if ((res = s_mp_sub (x, m, x)) != MP_OKAY) \{ | 5820 086 goto CLEANUP; |
5796 087 goto CLEANUP; | 5821 087 \} |
5797 088 \} | 5822 088 \} |
5798 089 \} | 5823 089 |
5799 090 | 5824 090 CLEANUP: |
5800 091 CLEANUP: | 5825 091 mp_clear (&q); |
5801 092 mp_clear (&q); | 5826 092 |
5802 093 | 5827 093 return res; |
5803 094 return res; | 5828 094 \} |
5804 095 \} | 5829 095 #endif |
5805 096 #endif | |
5806 \end{alltt} | 5830 \end{alltt} |
5807 \end{small} | 5831 \end{small} |
5808 | 5832 |
5809 The first multiplication that determines the quotient can be performed by only producing the digits from $m - 1$ and up. This essentially halves | 5833 The first multiplication that determines the quotient can be performed by only producing the digits from $m - 1$ and up. This essentially halves |
5810 the number of single precision multiplications required. However, the optimization is only safe if $\beta$ is much larger than the number of digits | 5834 the number of single precision multiplications required. However, the optimization is only safe if $\beta$ is much larger than the number of digits |
5811 in the modulus. In the source code this is evaluated on lines 36 to 44 where algorithm s\_mp\_mul\_high\_digs is used when it is | 5835 in the modulus. In the source code this is evaluated on lines 36 to 43 where algorithm s\_mp\_mul\_high\_digs is used when it is |
5812 safe to do so. | 5836 safe to do so. |
5813 | 5837 |
5814 \subsection{The Barrett Setup Algorithm} | 5838 \subsection{The Barrett Setup Algorithm} |
5815 In order to use algorithm mp\_reduce the value of $\mu$ must be calculated in advance. Ideally this value should be computed once and stored for | 5839 In order to use algorithm mp\_reduce the value of $\mu$ must be calculated in advance. Ideally this value should be computed once and stored for |
5816 future use so that the Barrett algorithm can be used without delay. | 5840 future use so that the Barrett algorithm can be used without delay. |
5817 | 5841 |
5818 \begin{figure}[!here] | 5842 \newpage\begin{figure}[!here] |
5819 \begin{small} | 5843 \begin{small} |
5820 \begin{center} | 5844 \begin{center} |
5821 \begin{tabular}{l} | 5845 \begin{tabular}{l} |
5822 \hline Algorithm \textbf{mp\_reduce\_setup}. \\ | 5846 \hline Algorithm \textbf{mp\_reduce\_setup}. \\ |
5823 \textbf{Input}. mp\_int $a$ ($a > 1$) \\ | 5847 \textbf{Input}. mp\_int $a$ ($a > 1$) \\ |
6309 020 * which uses the comba method to quickly calculate the columns of the | 6333 020 * which uses the comba method to quickly calculate the columns of the |
6310 021 * reduction. | 6334 021 * reduction. |
6311 022 * | 6335 022 * |
6312 023 * Based on Algorithm 14.32 on pp.601 of HAC. | 6336 023 * Based on Algorithm 14.32 on pp.601 of HAC. |
6313 024 */ | 6337 024 */ |
6314 025 int | 6338 025 int fast_mp_montgomery_reduce (mp_int * x, mp_int * n, mp_digit rho) |
6315 026 fast_mp_montgomery_reduce (mp_int * x, mp_int * n, mp_digit rho) | 6339 026 \{ |
6316 027 \{ | 6340 027 int ix, res, olduse; |
6317 028 int ix, res, olduse; | 6341 028 mp_word W[MP_WARRAY]; |
6318 029 mp_word W[MP_WARRAY]; | 6342 029 |
6319 030 | 6343 030 /* get old used count */ |
6320 031 /* get old used count */ | 6344 031 olduse = x->used; |
6321 032 olduse = x->used; | 6345 032 |
6322 033 | 6346 033 /* grow a as required */ |
6323 034 /* grow a as required */ | 6347 034 if (x->alloc < n->used + 1) \{ |
6324 035 if (x->alloc < n->used + 1) \{ | 6348 035 if ((res = mp_grow (x, n->used + 1)) != MP_OKAY) \{ |
6325 036 if ((res = mp_grow (x, n->used + 1)) != MP_OKAY) \{ | 6349 036 return res; |
6326 037 return res; | 6350 037 \} |
6327 038 \} | 6351 038 \} |
6328 039 \} | 6352 039 |
6329 040 | 6353 040 /* first we have to get the digits of the input into |
6330 041 /* first we have to get the digits of the input into | 6354 041 * an array of double precision words W[...] |
6331 042 * an array of double precision words W[...] | 6355 042 */ |
6332 043 */ | 6356 043 \{ |
6333 044 \{ | 6357 044 register mp_word *_W; |
6334 045 register mp_word *_W; | 6358 045 register mp_digit *tmpx; |
6335 046 register mp_digit *tmpx; | 6359 046 |
6336 047 | 6360 047 /* alias for the W[] array */ |
6337 048 /* alias for the W[] array */ | 6361 048 _W = W; |
6338 049 _W = W; | 6362 049 |
6339 050 | 6363 050 /* alias for the digits of x*/ |
6340 051 /* alias for the digits of x*/ | 6364 051 tmpx = x->dp; |
6341 052 tmpx = x->dp; | 6365 052 |
6342 053 | 6366 053 /* copy the digits of a into W[0..a->used-1] */ |
6343 054 /* copy the digits of a into W[0..a->used-1] */ | 6367 054 for (ix = 0; ix < x->used; ix++) \{ |
6344 055 for (ix = 0; ix < x->used; ix++) \{ | 6368 055 *_W++ = *tmpx++; |
6345 056 *_W++ = *tmpx++; | 6369 056 \} |
6346 057 \} | 6370 057 |
6347 058 | 6371 058 /* zero the high words of W[a->used..m->used*2] */ |
6348 059 /* zero the high words of W[a->used..m->used*2] */ | 6372 059 for (; ix < n->used * 2 + 1; ix++) \{ |
6349 060 for (; ix < n->used * 2 + 1; ix++) \{ | 6373 060 *_W++ = 0; |
6350 061 *_W++ = 0; | 6374 061 \} |
6351 062 \} | 6375 062 \} |
6352 063 \} | 6376 063 |
6353 064 | 6377 064 /* now we proceed to zero successive digits |
6354 065 /* now we proceed to zero successive digits | 6378 065 * from the least significant upwards |
6355 066 * from the least significant upwards | 6379 066 */ |
6356 067 */ | 6380 067 for (ix = 0; ix < n->used; ix++) \{ |
6357 068 for (ix = 0; ix < n->used; ix++) \{ | 6381 068 /* mu = ai * m' mod b |
6358 069 /* mu = ai * m' mod b | 6382 069 * |
6359 070 * | 6383 070 * We avoid a double precision multiplication (which isn't required) |
6360 071 * We avoid a double precision multiplication (which isn't required) | 6384 071 * by casting the value down to a mp_digit. Note this requires |
6361 072 * by casting the value down to a mp_digit. Note this requires | 6385 072 * that W[ix-1] have the carry cleared (see after the inner loop) |
6362 073 * that W[ix-1] have the carry cleared (see after the inner loop) | 6386 073 */ |
6363 074 */ | 6387 074 register mp_digit mu; |
6364 075 register mp_digit mu; | 6388 075 mu = (mp_digit) (((W[ix] & MP_MASK) * rho) & MP_MASK); |
6365 076 mu = (mp_digit) (((W[ix] & MP_MASK) * rho) & MP_MASK); | 6389 076 |
6366 077 | 6390 077 /* a = a + mu * m * b**i |
6367 078 /* a = a + mu * m * b**i | 6391 078 * |
6368 079 * | 6392 079 * This is computed in place and on the fly. The multiplication |
6369 080 * This is computed in place and on the fly. The multiplication | 6393 080 * by b**i is handled by offseting which columns the results |
6370 081 * by b**i is handled by offseting which columns the results | 6394 081 * are added to. |
6371 082 * are added to. | 6395 082 * |
6372 083 * | 6396 083 * Note the comba method normally doesn't handle carries in the |
6373 084 * Note the comba method normally doesn't handle carries in the | 6397 084 * inner loop In this case we fix the carry from the previous |
6374 085 * inner loop In this case we fix the carry from the previous | 6398 085 * column since the Montgomery reduction requires digits of the |
6375 086 * column since the Montgomery reduction requires digits of the | 6399 086 * result (so far) [see above] to work. This is |
6376 087 * result (so far) [see above] to work. This is | 6400 087 * handled by fixing up one carry after the inner loop. The |
6377 088 * handled by fixing up one carry after the inner loop. The | 6401 088 * carry fixups are done in order so after these loops the |
6378 089 * carry fixups are done in order so after these loops the | 6402 089 * first m->used words of W[] have the carries fixed |
6379 090 * first m->used words of W[] have the carries fixed | 6403 090 */ |
6380 091 */ | 6404 091 \{ |
6381 092 \{ | 6405 092 register int iy; |
6382 093 register int iy; | 6406 093 register mp_digit *tmpn; |
6383 094 register mp_digit *tmpn; | 6407 094 register mp_word *_W; |
6384 095 register mp_word *_W; | 6408 095 |
6385 096 | 6409 096 /* alias for the digits of the modulus */ |
6386 097 /* alias for the digits of the modulus */ | 6410 097 tmpn = n->dp; |
6387 098 tmpn = n->dp; | 6411 098 |
6388 099 | 6412 099 /* Alias for the columns set by an offset of ix */ |
6389 100 /* Alias for the columns set by an offset of ix */ | 6413 100 _W = W + ix; |
6390 101 _W = W + ix; | 6414 101 |
6391 102 | 6415 102 /* inner loop */ |
6392 103 /* inner loop */ | 6416 103 for (iy = 0; iy < n->used; iy++) \{ |
6393 104 for (iy = 0; iy < n->used; iy++) \{ | 6417 104 *_W++ += ((mp_word)mu) * ((mp_word)*tmpn++); |
6394 105 *_W++ += ((mp_word)mu) * ((mp_word)*tmpn++); | 6418 105 \} |
6395 106 \} | 6419 106 \} |
6396 107 \} | 6420 107 |
6397 108 | 6421 108 /* now fix carry for next digit, W[ix+1] */ |
6398 109 /* now fix carry for next digit, W[ix+1] */ | 6422 109 W[ix + 1] += W[ix] >> ((mp_word) DIGIT_BIT); |
6399 110 W[ix + 1] += W[ix] >> ((mp_word) DIGIT_BIT); | 6423 110 \} |
6400 111 \} | 6424 111 |
6401 112 | 6425 112 /* now we have to propagate the carries and |
6402 113 /* now we have to propagate the carries and | 6426 113 * shift the words downward [all those least |
6403 114 * shift the words downward [all those least | 6427 114 * significant digits we zeroed]. |
6404 115 * significant digits we zeroed]. | 6428 115 */ |
6405 116 */ | 6429 116 \{ |
6406 117 \{ | 6430 117 register mp_digit *tmpx; |
6407 118 register mp_digit *tmpx; | 6431 118 register mp_word *_W, *_W1; |
6408 119 register mp_word *_W, *_W1; | 6432 119 |
6409 120 | 6433 120 /* nox fix rest of carries */ |
6410 121 /* nox fix rest of carries */ | 6434 121 |
6411 122 | 6435 122 /* alias for current word */ |
6412 123 /* alias for current word */ | 6436 123 _W1 = W + ix; |
6413 124 _W1 = W + ix; | 6437 124 |
6414 125 | 6438 125 /* alias for next word, where the carry goes */ |
6415 126 /* alias for next word, where the carry goes */ | 6439 126 _W = W + ++ix; |
6416 127 _W = W + ++ix; | 6440 127 |
6417 128 | 6441 128 for (; ix <= n->used * 2 + 1; ix++) \{ |
6418 129 for (; ix <= n->used * 2 + 1; ix++) \{ | 6442 129 *_W++ += *_W1++ >> ((mp_word) DIGIT_BIT); |
6419 130 *_W++ += *_W1++ >> ((mp_word) DIGIT_BIT); | 6443 130 \} |
6420 131 \} | 6444 131 |
6421 132 | 6445 132 /* copy out, A = A/b**n |
6422 133 /* copy out, A = A/b**n | 6446 133 * |
6423 134 * | 6447 134 * The result is A/b**n but instead of converting from an |
6424 135 * The result is A/b**n but instead of converting from an | 6448 135 * array of mp_word to mp_digit than calling mp_rshd |
6425 136 * array of mp_word to mp_digit than calling mp_rshd | 6449 136 * we just copy them in the right order |
6426 137 * we just copy them in the right order | 6450 137 */ |
6427 138 */ | 6451 138 |
6428 139 | 6452 139 /* alias for destination word */ |
6429 140 /* alias for destination word */ | 6453 140 tmpx = x->dp; |
6430 141 tmpx = x->dp; | 6454 141 |
6431 142 | 6455 142 /* alias for shifted double precision result */ |
6432 143 /* alias for shifted double precision result */ | 6456 143 _W = W + n->used; |
6433 144 _W = W + n->used; | 6457 144 |
6434 145 | 6458 145 for (ix = 0; ix < n->used + 1; ix++) \{ |
6435 146 for (ix = 0; ix < n->used + 1; ix++) \{ | 6459 146 *tmpx++ = (mp_digit)(*_W++ & ((mp_word) MP_MASK)); |
6436 147 *tmpx++ = (mp_digit)(*_W++ & ((mp_word) MP_MASK)); | 6460 147 \} |
6437 148 \} | 6461 148 |
6438 149 | 6462 149 /* zero oldused digits, if the input a was larger than |
6439 150 /* zero oldused digits, if the input a was larger than | 6463 150 * m->used+1 we'll have to clear the digits |
6440 151 * m->used+1 we'll have to clear the digits | 6464 151 */ |
6441 152 */ | 6465 152 for (; ix < olduse; ix++) \{ |
6442 153 for (; ix < olduse; ix++) \{ | 6466 153 *tmpx++ = 0; |
6443 154 *tmpx++ = 0; | 6467 154 \} |
6444 155 \} | 6468 155 \} |
6445 156 \} | 6469 156 |
6446 157 | 6470 157 /* set the max used and clamp */ |
6447 158 /* set the max used and clamp */ | 6471 158 x->used = n->used + 1; |
6448 159 x->used = n->used + 1; | 6472 159 mp_clamp (x); |
6449 160 mp_clamp (x); | 6473 160 |
6450 161 | 6474 161 /* if A >= m then A = A - m */ |
6451 162 /* if A >= m then A = A - m */ | 6475 162 if (mp_cmp_mag (x, n) != MP_LT) \{ |
6452 163 if (mp_cmp_mag (x, n) != MP_LT) \{ | 6476 163 return s_mp_sub (x, n, x); |
6453 164 return s_mp_sub (x, n, x); | 6477 164 \} |
6454 165 \} | 6478 165 return MP_OKAY; |
6455 166 return MP_OKAY; | 6479 166 \} |
6456 167 \} | 6480 167 #endif |
6457 168 #endif | |
6458 \end{alltt} | 6481 \end{alltt} |
6459 \end{small} | 6482 \end{small} |
6460 | 6483 |
6461 The $\hat W$ array is first filled with digits of $x$ on line 48 then the rest of the digits are zeroed on line 55. Both loops share | 6484 The $\hat W$ array is first filled with digits of $x$ on line 50 then the rest of the digits are zeroed on line 54. Both loops share |
6462 the same alias variables to make the code easier to read. | 6485 the same alias variables to make the code easier to read. |
6463 | 6486 |
6464 The value of $\mu$ is calculated in an interesting fashion. First the value $\hat W_{ix}$ is reduced modulo $\beta$ and cast to a mp\_digit. This | 6487 The value of $\mu$ is calculated in an interesting fashion. First the value $\hat W_{ix}$ is reduced modulo $\beta$ and cast to a mp\_digit. This |
6465 forces the compiler to use a single precision multiplication and prevents any concerns about loss of precision. Line 110 fixes the carry | 6488 forces the compiler to use a single precision multiplication and prevents any concerns about loss of precision. Line 109 fixes the carry |
6466 for the next iteration of the loop by propagating the carry from $\hat W_{ix}$ to $\hat W_{ix+1}$. | 6489 for the next iteration of the loop by propagating the carry from $\hat W_{ix}$ to $\hat W_{ix+1}$. |
6467 | 6490 |
6468 The for loop on line 109 propagates the rest of the carries upwards through the columns. The for loop on line 126 reduces the columns | 6491 The for loop on line 108 propagates the rest of the carries upwards through the columns. The for loop on line 125 reduces the columns |
6469 modulo $\beta$ and shifts them $k$ places at the same time. The alias $\_ \hat W$ actually refers to the array $\hat W$ starting at the $n.used$'th | 6492 modulo $\beta$ and shifts them $k$ places at the same time. The alias $\_ \hat W$ actually refers to the array $\hat W$ starting at the $n.used$'th |
6470 digit, that is $\_ \hat W_{t} = \hat W_{n.used + t}$. | 6493 digit, that is $\_ \hat W_{t} = \hat W_{n.used + t}$. |
6471 | 6494 |
6472 \subsection{Montgomery Setup} | 6495 \subsection{Montgomery Setup} |
6473 To calculate the variable $\rho$ a relatively simple algorithm will be required. | 6496 To calculate the variable $\rho$ a relatively simple algorithm will be required. |
6737 017 /* reduce "x" in place modulo "n" using the Diminished Radix algorithm. | 6760 017 /* reduce "x" in place modulo "n" using the Diminished Radix algorithm. |
6738 018 * | 6761 018 * |
6739 019 * Based on algorithm from the paper | 6762 019 * Based on algorithm from the paper |
6740 020 * | 6763 020 * |
6741 021 * "Generating Efficient Primes for Discrete Log Cryptosystems" | 6764 021 * "Generating Efficient Primes for Discrete Log Cryptosystems" |
6742 022 * Chae Hoon Lim, Pil Loong Lee, | 6765 022 * Chae Hoon Lim, Pil Joong Lee, |
6743 023 * POSTECH Information Research Laboratories | 6766 023 * POSTECH Information Research Laboratories |
6744 024 * | 6767 024 * |
6745 025 * The modulus must be of a special format [see manual] | 6768 025 * The modulus must be of a special format [see manual] |
6746 026 * | 6769 026 * |
6747 027 * Has been modified to use algorithm 7.10 from the LTM book instead | 6770 027 * Has been modified to use algorithm 7.10 from the LTM book instead |
6963 \hspace{-5.1mm}{\bf File}: bn\_mp\_reduce\_2k.c | 6986 \hspace{-5.1mm}{\bf File}: bn\_mp\_reduce\_2k.c |
6964 \vspace{-3mm} | 6987 \vspace{-3mm} |
6965 \begin{alltt} | 6988 \begin{alltt} |
6966 016 | 6989 016 |
6967 017 /* reduces a modulo n where n is of the form 2**p - d */ | 6990 017 /* reduces a modulo n where n is of the form 2**p - d */ |
6968 018 int | 6991 018 int mp_reduce_2k(mp_int *a, mp_int *n, mp_digit d) |
6969 019 mp_reduce_2k(mp_int *a, mp_int *n, mp_digit d) | 6992 019 \{ |
6970 020 \{ | 6993 020 mp_int q; |
6971 021 mp_int q; | 6994 021 int p, res; |
6972 022 int p, res; | 6995 022 |
6973 023 | 6996 023 if ((res = mp_init(&q)) != MP_OKAY) \{ |
6974 024 if ((res = mp_init(&q)) != MP_OKAY) \{ | 6997 024 return res; |
6975 025 return res; | 6998 025 \} |
6976 026 \} | 6999 026 |
6977 027 | 7000 027 p = mp_count_bits(n); |
6978 028 p = mp_count_bits(n); | 7001 028 top: |
6979 029 top: | 7002 029 /* q = a/2**p, a = a mod 2**p */ |
6980 030 /* q = a/2**p, a = a mod 2**p */ | 7003 030 if ((res = mp_div_2d(a, p, &q, a)) != MP_OKAY) \{ |
6981 031 if ((res = mp_div_2d(a, p, &q, a)) != MP_OKAY) \{ | 7004 031 goto ERR; |
6982 032 goto ERR; | 7005 032 \} |
6983 033 \} | 7006 033 |
6984 034 | 7007 034 if (d != 1) \{ |
6985 035 if (d != 1) \{ | 7008 035 /* q = q * d */ |
6986 036 /* q = q * d */ | 7009 036 if ((res = mp_mul_d(&q, d, &q)) != MP_OKAY) \{ |
6987 037 if ((res = mp_mul_d(&q, d, &q)) != MP_OKAY) \{ | 7010 037 goto ERR; |
6988 038 goto ERR; | 7011 038 \} |
6989 039 \} | 7012 039 \} |
6990 040 \} | 7013 040 |
6991 041 | 7014 041 /* a = a + q */ |
6992 042 /* a = a + q */ | 7015 042 if ((res = s_mp_add(a, &q, a)) != MP_OKAY) \{ |
6993 043 if ((res = s_mp_add(a, &q, a)) != MP_OKAY) \{ | 7016 043 goto ERR; |
6994 044 goto ERR; | 7017 044 \} |
6995 045 \} | 7018 045 |
6996 046 | 7019 046 if (mp_cmp_mag(a, n) != MP_LT) \{ |
6997 047 if (mp_cmp_mag(a, n) != MP_LT) \{ | 7020 047 s_mp_sub(a, n, a); |
6998 048 s_mp_sub(a, n, a); | 7021 048 goto top; |
6999 049 goto top; | 7022 049 \} |
7000 050 \} | 7023 050 |
7001 051 | 7024 051 ERR: |
7002 052 ERR: | 7025 052 mp_clear(&q); |
7003 053 mp_clear(&q); | 7026 053 return res; |
7004 054 return res; | 7027 054 \} |
7005 055 \} | 7028 055 |
7006 056 | 7029 056 #endif |
7007 057 #endif | |
7008 \end{alltt} | 7030 \end{alltt} |
7009 \end{small} | 7031 \end{small} |
7010 | 7032 |
7011 The algorithm mp\_count\_bits calculates the number of bits in an mp\_int which is used to find the initial value of $p$. The call to mp\_div\_2d | 7033 The algorithm mp\_count\_bits calculates the number of bits in an mp\_int which is used to find the initial value of $p$. The call to mp\_div\_2d |
7012 on line 31 calculates both the quotient $q$ and the remainder $a$ required. By doing both in a single function call the code size | 7034 on line 30 calculates both the quotient $q$ and the remainder $a$ required. By doing both in a single function call the code size |
7013 is kept fairly small. The multiplication by $k$ is only performed if $k > 1$. This allows reductions modulo $2^p - 1$ to be performed without | 7035 is kept fairly small. The multiplication by $k$ is only performed if $k > 1$. This allows reductions modulo $2^p - 1$ to be performed without |
7014 any multiplications. | 7036 any multiplications. |
7015 | 7037 |
7016 The unsigned s\_mp\_add, mp\_cmp\_mag and s\_mp\_sub are used in place of their full sign counterparts since the inputs are only valid if they are | 7038 The unsigned s\_mp\_add, mp\_cmp\_mag and s\_mp\_sub are used in place of their full sign counterparts since the inputs are only valid if they are |
7017 positive. By using the unsigned versions the overhead is kept to a minimum. | 7039 positive. By using the unsigned versions the overhead is kept to a minimum. |
7047 \hspace{-5.1mm}{\bf File}: bn\_mp\_reduce\_2k\_setup.c | 7069 \hspace{-5.1mm}{\bf File}: bn\_mp\_reduce\_2k\_setup.c |
7048 \vspace{-3mm} | 7070 \vspace{-3mm} |
7049 \begin{alltt} | 7071 \begin{alltt} |
7050 016 | 7072 016 |
7051 017 /* determines the setup value */ | 7073 017 /* determines the setup value */ |
7052 018 int | 7074 018 int mp_reduce_2k_setup(mp_int *a, mp_digit *d) |
7053 019 mp_reduce_2k_setup(mp_int *a, mp_digit *d) | 7075 019 \{ |
7054 020 \{ | 7076 020 int res, p; |
7055 021 int res, p; | 7077 021 mp_int tmp; |
7056 022 mp_int tmp; | 7078 022 |
7057 023 | 7079 023 if ((res = mp_init(&tmp)) != MP_OKAY) \{ |
7058 024 if ((res = mp_init(&tmp)) != MP_OKAY) \{ | 7080 024 return res; |
7059 025 return res; | 7081 025 \} |
7060 026 \} | 7082 026 |
7061 027 | 7083 027 p = mp_count_bits(a); |
7062 028 p = mp_count_bits(a); | 7084 028 if ((res = mp_2expt(&tmp, p)) != MP_OKAY) \{ |
7063 029 if ((res = mp_2expt(&tmp, p)) != MP_OKAY) \{ | 7085 029 mp_clear(&tmp); |
7064 030 mp_clear(&tmp); | 7086 030 return res; |
7065 031 return res; | 7087 031 \} |
7066 032 \} | 7088 032 |
7067 033 | 7089 033 if ((res = s_mp_sub(&tmp, a, &tmp)) != MP_OKAY) \{ |
7068 034 if ((res = s_mp_sub(&tmp, a, &tmp)) != MP_OKAY) \{ | 7090 034 mp_clear(&tmp); |
7069 035 mp_clear(&tmp); | 7091 035 return res; |
7070 036 return res; | 7092 036 \} |
7071 037 \} | 7093 037 |
7072 038 | 7094 038 *d = tmp.dp[0]; |
7073 039 *d = tmp.dp[0]; | 7095 039 mp_clear(&tmp); |
7074 040 mp_clear(&tmp); | 7096 040 return MP_OKAY; |
7075 041 return MP_OKAY; | 7097 041 \} |
7076 042 \} | 7098 042 #endif |
7077 043 #endif | |
7078 \end{alltt} | 7099 \end{alltt} |
7079 \end{small} | 7100 \end{small} |
7080 | 7101 |
7081 \subsubsection{Unrestricted Detection} | 7102 \subsubsection{Unrestricted Detection} |
7082 An integer $n$ is a valid unrestricted Diminished Radix modulus if either of the following are true. | 7103 An integer $n$ is a valid unrestricted Diminished Radix modulus if either of the following are true. |
7125 019 \{ | 7146 019 \{ |
7126 020 int ix, iy, iw; | 7147 020 int ix, iy, iw; |
7127 021 mp_digit iz; | 7148 021 mp_digit iz; |
7128 022 | 7149 022 |
7129 023 if (a->used == 0) \{ | 7150 023 if (a->used == 0) \{ |
7130 024 return 0; | 7151 024 return MP_NO; |
7131 025 \} else if (a->used == 1) \{ | 7152 025 \} else if (a->used == 1) \{ |
7132 026 return 1; | 7153 026 return MP_YES; |
7133 027 \} else if (a->used > 1) \{ | 7154 027 \} else if (a->used > 1) \{ |
7134 028 iy = mp_count_bits(a); | 7155 028 iy = mp_count_bits(a); |
7135 029 iz = 1; | 7156 029 iz = 1; |
7136 030 iw = 1; | 7157 030 iw = 1; |
7137 031 | 7158 031 |
7138 032 /* Test every bit from the second digit up, must be 1 */ | 7159 032 /* Test every bit from the second digit up, must be 1 */ |
7139 033 for (ix = DIGIT_BIT; ix < iy; ix++) \{ | 7160 033 for (ix = DIGIT_BIT; ix < iy; ix++) \{ |
7140 034 if ((a->dp[iw] & iz) == 0) \{ | 7161 034 if ((a->dp[iw] & iz) == 0) \{ |
7141 035 return 0; | 7162 035 return MP_NO; |
7142 036 \} | 7163 036 \} |
7143 037 iz <<= 1; | 7164 037 iz <<= 1; |
7144 038 if (iz > (mp_digit)MP_MASK) \{ | 7165 038 if (iz > (mp_digit)MP_MASK) \{ |
7145 039 ++iw; | 7166 039 ++iw; |
7146 040 iz = 1; | 7167 040 iz = 1; |
7147 041 \} | 7168 041 \} |
7148 042 \} | 7169 042 \} |
7149 043 \} | 7170 043 \} |
7150 044 return 1; | 7171 044 return MP_YES; |
7151 045 \} | 7172 045 \} |
7152 046 | 7173 046 |
7153 047 #endif | 7174 047 #endif |
7154 \end{alltt} | 7175 \end{alltt} |
7155 \end{small} | 7176 \end{small} |
7592 058 err = mp_exptmod(&tmpG, &tmpX, P, Y); | 7613 058 err = mp_exptmod(&tmpG, &tmpX, P, Y); |
7593 059 mp_clear_multi(&tmpG, &tmpX, NULL); | 7614 059 mp_clear_multi(&tmpG, &tmpX, NULL); |
7594 060 return err; | 7615 060 return err; |
7595 061 #else | 7616 061 #else |
7596 062 /* no invmod */ | 7617 062 /* no invmod */ |
7597 063 return MP_VAL | 7618 063 return MP_VAL; |
7598 064 #endif | 7619 064 #endif |
7599 065 \} | 7620 065 \} |
7600 066 | 7621 066 |
7601 067 #ifdef BN_MP_DR_IS_MODULUS_C | 7622 067 /* modified diminished radix reduction */ |
7602 068 /* is it a DR modulus? */ | 7623 068 #if defined(BN_MP_REDUCE_IS_2K_L_C) && defined(BN_MP_REDUCE_2K_L_C) |
7603 069 dr = mp_dr_is_modulus(P); | 7624 069 if (mp_reduce_is_2k_l(P) == MP_YES) \{ |
7604 070 #else | 7625 070 return s_mp_exptmod(G, X, P, Y, 1); |
7605 071 dr = 0; | 7626 071 \} |
7606 072 #endif | 7627 072 #endif |
7607 073 | 7628 073 |
7608 074 #ifdef BN_MP_REDUCE_IS_2K_C | 7629 074 #ifdef BN_MP_DR_IS_MODULUS_C |
7609 075 /* if not, is it a uDR modulus? */ | 7630 075 /* is it a DR modulus? */ |
7610 076 if (dr == 0) \{ | 7631 076 dr = mp_dr_is_modulus(P); |
7611 077 dr = mp_reduce_is_2k(P) << 1; | 7632 077 #else |
7612 078 \} | 7633 078 /* default to no */ |
7613 079 #endif | 7634 079 dr = 0; |
7614 080 | 7635 080 #endif |
7615 081 /* if the modulus is odd or dr != 0 use the fast method */ | 7636 081 |
7616 082 #ifdef BN_MP_EXPTMOD_FAST_C | 7637 082 #ifdef BN_MP_REDUCE_IS_2K_C |
7617 083 if (mp_isodd (P) == 1 || dr != 0) \{ | 7638 083 /* if not, is it a unrestricted DR modulus? */ |
7618 084 return mp_exptmod_fast (G, X, P, Y, dr); | 7639 084 if (dr == 0) \{ |
7619 085 \} else \{ | 7640 085 dr = mp_reduce_is_2k(P) << 1; |
7620 086 #endif | 7641 086 \} |
7621 087 #ifdef BN_S_MP_EXPTMOD_C | 7642 087 #endif |
7622 088 /* otherwise use the generic Barrett reduction technique */ | 7643 088 |
7623 089 return s_mp_exptmod (G, X, P, Y); | 7644 089 /* if the modulus is odd or dr != 0 use the montgomery method */ |
7624 090 #else | 7645 090 #ifdef BN_MP_EXPTMOD_FAST_C |
7625 091 /* no exptmod for evens */ | 7646 091 if (mp_isodd (P) == 1 || dr != 0) \{ |
7626 092 return MP_VAL; | 7647 092 return mp_exptmod_fast (G, X, P, Y, dr); |
7627 093 #endif | 7648 093 \} else \{ |
7628 094 #ifdef BN_MP_EXPTMOD_FAST_C | 7649 094 #endif |
7629 095 \} | 7650 095 #ifdef BN_S_MP_EXPTMOD_C |
7630 096 #endif | 7651 096 /* otherwise use the generic Barrett reduction technique */ |
7631 097 \} | 7652 097 return s_mp_exptmod (G, X, P, Y, 0); |
7632 098 | 7653 098 #else |
7633 099 #endif | 7654 099 /* no exptmod for evens */ |
7655 100 return MP_VAL; | |
7656 101 #endif | |
7657 102 #ifdef BN_MP_EXPTMOD_FAST_C | |
7658 103 \} | |
7659 104 #endif | |
7660 105 \} | |
7661 106 | |
7662 107 #endif | |
7634 \end{alltt} | 7663 \end{alltt} |
7635 \end{small} | 7664 \end{small} |
7636 | 7665 |
7637 In order to keep the algorithms in a known state the first step on line 28 is to reject any negative modulus as input. If the exponent is | 7666 In order to keep the algorithms in a known state the first step on line 28 is to reject any negative modulus as input. If the exponent is |
7638 negative the algorithm tries to perform a modular exponentiation with the modular inverse of the base $G$. The temporary variable $tmpG$ is assigned | 7667 negative the algorithm tries to perform a modular exponentiation with the modular inverse of the base $G$. The temporary variable $tmpG$ is assigned |
7639 the modular inverse of $G$ and $tmpX$ is assigned the absolute value of $X$. The algorithm will recuse with these new values with a positive | 7668 the modular inverse of $G$ and $tmpX$ is assigned the absolute value of $X$. The algorithm will recuse with these new values with a positive |
7640 exponent. | 7669 exponent. |
7641 | 7670 |
7642 If the exponent is positive the algorithm resumes the exponentiation. Line 69 determines if the modulus is of the restricted Diminished Radix | 7671 If the exponent is positive the algorithm resumes the exponentiation. Line 76 determines if the modulus is of the restricted Diminished Radix |
7643 form. If it is not line 77 attempts to determine if it is of a unrestricted Diminished Radix form. The integer $dr$ will take on one | 7672 form. If it is not line 69 attempts to determine if it is of a unrestricted Diminished Radix form. The integer $dr$ will take on one |
7644 of three values. | 7673 of three values. |
7645 | 7674 |
7646 \begin{enumerate} | 7675 \begin{enumerate} |
7647 \item $dr = 0$ means that the modulus is not of either restricted or unrestricted Diminished Radix form. | 7676 \item $dr = 0$ means that the modulus is not of either restricted or unrestricted Diminished Radix form. |
7648 \item $dr = 1$ means that the modulus is of restricted Diminished Radix form. | 7677 \item $dr = 1$ means that the modulus is of restricted Diminished Radix form. |
7649 \item $dr = 2$ means that the modulus is of unrestricted Diminished Radix form. | 7678 \item $dr = 2$ means that the modulus is of unrestricted Diminished Radix form. |
7650 \end{enumerate} | 7679 \end{enumerate} |
7651 | 7680 |
7652 Line 67 determines if the fast modular exponentiation algorithm can be used. It is allowed if $dr \ne 0$ or if the modulus is odd. Otherwise, | 7681 Line 69 determines if the fast modular exponentiation algorithm can be used. It is allowed if $dr \ne 0$ or if the modulus is odd. Otherwise, |
7653 the slower s\_mp\_exptmod algorithm is used which uses Barrett reduction. | 7682 the slower s\_mp\_exptmod algorithm is used which uses Barrett reduction. |
7654 | 7683 |
7655 \subsection{Barrett Modular Exponentiation} | 7684 \subsection{Barrett Modular Exponentiation} |
7656 | 7685 |
7657 \newpage\begin{figure}[!here] | 7686 \newpage\begin{figure}[!here] |
7815 018 #define TAB_SIZE 32 | 7844 018 #define TAB_SIZE 32 |
7816 019 #else | 7845 019 #else |
7817 020 #define TAB_SIZE 256 | 7846 020 #define TAB_SIZE 256 |
7818 021 #endif | 7847 021 #endif |
7819 022 | 7848 022 |
7820 023 int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y) | 7849 023 int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmod |
7850 e) | |
7821 024 \{ | 7851 024 \{ |
7822 025 mp_int M[TAB_SIZE], res, mu; | 7852 025 mp_int M[TAB_SIZE], res, mu; |
7823 026 mp_digit buf; | 7853 026 mp_digit buf; |
7824 027 int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize; | 7854 027 int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize; |
7825 028 | 7855 028 int (*redux)(mp_int*,mp_int*,mp_int*); |
7826 029 /* find window size */ | 7856 029 |
7827 030 x = mp_count_bits (X); | 7857 030 /* find window size */ |
7828 031 if (x <= 7) \{ | 7858 031 x = mp_count_bits (X); |
7829 032 winsize = 2; | 7859 032 if (x <= 7) \{ |
7830 033 \} else if (x <= 36) \{ | 7860 033 winsize = 2; |
7831 034 winsize = 3; | 7861 034 \} else if (x <= 36) \{ |
7832 035 \} else if (x <= 140) \{ | 7862 035 winsize = 3; |
7833 036 winsize = 4; | 7863 036 \} else if (x <= 140) \{ |
7834 037 \} else if (x <= 450) \{ | 7864 037 winsize = 4; |
7835 038 winsize = 5; | 7865 038 \} else if (x <= 450) \{ |
7836 039 \} else if (x <= 1303) \{ | 7866 039 winsize = 5; |
7837 040 winsize = 6; | 7867 040 \} else if (x <= 1303) \{ |
7838 041 \} else if (x <= 3529) \{ | 7868 041 winsize = 6; |
7839 042 winsize = 7; | 7869 042 \} else if (x <= 3529) \{ |
7840 043 \} else \{ | 7870 043 winsize = 7; |
7841 044 winsize = 8; | 7871 044 \} else \{ |
7842 045 \} | 7872 045 winsize = 8; |
7843 046 | 7873 046 \} |
7844 047 #ifdef MP_LOW_MEM | 7874 047 |
7845 048 if (winsize > 5) \{ | 7875 048 #ifdef MP_LOW_MEM |
7846 049 winsize = 5; | 7876 049 if (winsize > 5) \{ |
7847 050 \} | 7877 050 winsize = 5; |
7848 051 #endif | 7878 051 \} |
7849 052 | 7879 052 #endif |
7850 053 /* init M array */ | 7880 053 |
7851 054 /* init first cell */ | 7881 054 /* init M array */ |
7852 055 if ((err = mp_init(&M[1])) != MP_OKAY) \{ | 7882 055 /* init first cell */ |
7853 056 return err; | 7883 056 if ((err = mp_init(&M[1])) != MP_OKAY) \{ |
7854 057 \} | 7884 057 return err; |
7855 058 | 7885 058 \} |
7856 059 /* now init the second half of the array */ | 7886 059 |
7857 060 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ | 7887 060 /* now init the second half of the array */ |
7858 061 if ((err = mp_init(&M[x])) != MP_OKAY) \{ | 7888 061 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ |
7859 062 for (y = 1<<(winsize-1); y < x; y++) \{ | 7889 062 if ((err = mp_init(&M[x])) != MP_OKAY) \{ |
7860 063 mp_clear (&M[y]); | 7890 063 for (y = 1<<(winsize-1); y < x; y++) \{ |
7861 064 \} | 7891 064 mp_clear (&M[y]); |
7862 065 mp_clear(&M[1]); | 7892 065 \} |
7863 066 return err; | 7893 066 mp_clear(&M[1]); |
7864 067 \} | 7894 067 return err; |
7865 068 \} | 7895 068 \} |
7866 069 | 7896 069 \} |
7867 070 /* create mu, used for Barrett reduction */ | 7897 070 |
7868 071 if ((err = mp_init (&mu)) != MP_OKAY) \{ | 7898 071 /* create mu, used for Barrett reduction */ |
7869 072 goto __M; | 7899 072 if ((err = mp_init (&mu)) != MP_OKAY) \{ |
7870 073 \} | 7900 073 goto LBL_M; |
7871 074 if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) \{ | 7901 074 \} |
7872 075 goto __MU; | 7902 075 |
7873 076 \} | 7903 076 if (redmode == 0) \{ |
7874 077 | 7904 077 if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) \{ |
7875 078 /* create M table | 7905 078 goto LBL_MU; |
7876 079 * | 7906 079 \} |
7877 080 * The M table contains powers of the base, | 7907 080 redux = mp_reduce; |
7878 081 * e.g. M[x] = G**x mod P | 7908 081 \} else \{ |
7879 082 * | 7909 082 if ((err = mp_reduce_2k_setup_l (P, &mu)) != MP_OKAY) \{ |
7880 083 * The first half of the table is not | 7910 083 goto LBL_MU; |
7881 084 * computed though accept for M[0] and M[1] | 7911 084 \} |
7882 085 */ | 7912 085 redux = mp_reduce_2k_l; |
7883 086 if ((err = mp_mod (G, P, &M[1])) != MP_OKAY) \{ | 7913 086 \} |
7884 087 goto __MU; | 7914 087 |
7885 088 \} | 7915 088 /* create M table |
7886 089 | 7916 089 * |
7887 090 /* compute the value at M[1<<(winsize-1)] by squaring | 7917 090 * The M table contains powers of the base, |
7888 091 * M[1] (winsize-1) times | 7918 091 * e.g. M[x] = G**x mod P |
7889 092 */ | 7919 092 * |
7890 093 if ((err = mp_copy (&M[1], &M[1 << (winsize - 1)])) != MP_OKAY) \{ | 7920 093 * The first half of the table is not |
7891 094 goto __MU; | 7921 094 * computed though accept for M[0] and M[1] |
7892 095 \} | 7922 095 */ |
7893 096 | 7923 096 if ((err = mp_mod (G, P, &M[1])) != MP_OKAY) \{ |
7894 097 for (x = 0; x < (winsize - 1); x++) \{ | 7924 097 goto LBL_MU; |
7895 098 if ((err = mp_sqr (&M[1 << (winsize - 1)], | 7925 098 \} |
7896 099 &M[1 << (winsize - 1)])) != MP_OKAY) \{ | 7926 099 |
7897 100 goto __MU; | 7927 100 /* compute the value at M[1<<(winsize-1)] by squaring |
7898 101 \} | 7928 101 * M[1] (winsize-1) times |
7899 102 if ((err = mp_reduce (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) \{ | 7929 102 */ |
7900 103 goto __MU; | 7930 103 if ((err = mp_copy (&M[1], &M[1 << (winsize - 1)])) != MP_OKAY) \{ |
7901 104 \} | 7931 104 goto LBL_MU; |
7902 105 \} | 7932 105 \} |
7903 106 | 7933 106 |
7904 107 /* create upper table, that is M[x] = M[x-1] * M[1] (mod P) | 7934 107 for (x = 0; x < (winsize - 1); x++) \{ |
7905 108 * for x = (2**(winsize - 1) + 1) to (2**winsize - 1) | 7935 108 /* square it */ |
7906 109 */ | 7936 109 if ((err = mp_sqr (&M[1 << (winsize - 1)], |
7907 110 for (x = (1 << (winsize - 1)) + 1; x < (1 << winsize); x++) \{ | 7937 110 &M[1 << (winsize - 1)])) != MP_OKAY) \{ |
7908 111 if ((err = mp_mul (&M[x - 1], &M[1], &M[x])) != MP_OKAY) \{ | 7938 111 goto LBL_MU; |
7909 112 goto __MU; | 7939 112 \} |
7910 113 \} | 7940 113 |
7911 114 if ((err = mp_reduce (&M[x], P, &mu)) != MP_OKAY) \{ | 7941 114 /* reduce modulo P */ |
7912 115 goto __MU; | 7942 115 if ((err = redux (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) \{ |
7913 116 \} | 7943 116 goto LBL_MU; |
7914 117 \} | 7944 117 \} |
7915 118 | 7945 118 \} |
7916 119 /* setup result */ | 7946 119 |
7917 120 if ((err = mp_init (&res)) != MP_OKAY) \{ | 7947 120 /* create upper table, that is M[x] = M[x-1] * M[1] (mod P) |
7918 121 goto __MU; | 7948 121 * for x = (2**(winsize - 1) + 1) to (2**winsize - 1) |
7919 122 \} | 7949 122 */ |
7920 123 mp_set (&res, 1); | 7950 123 for (x = (1 << (winsize - 1)) + 1; x < (1 << winsize); x++) \{ |
7921 124 | 7951 124 if ((err = mp_mul (&M[x - 1], &M[1], &M[x])) != MP_OKAY) \{ |
7922 125 /* set initial mode and bit cnt */ | 7952 125 goto LBL_MU; |
7923 126 mode = 0; | 7953 126 \} |
7924 127 bitcnt = 1; | 7954 127 if ((err = redux (&M[x], P, &mu)) != MP_OKAY) \{ |
7925 128 buf = 0; | 7955 128 goto LBL_MU; |
7926 129 digidx = X->used - 1; | 7956 129 \} |
7927 130 bitcpy = 0; | 7957 130 \} |
7928 131 bitbuf = 0; | 7958 131 |
7929 132 | 7959 132 /* setup result */ |
7930 133 for (;;) \{ | 7960 133 if ((err = mp_init (&res)) != MP_OKAY) \{ |
7931 134 /* grab next digit as required */ | 7961 134 goto LBL_MU; |
7932 135 if (--bitcnt == 0) \{ | 7962 135 \} |
7933 136 /* if digidx == -1 we are out of digits */ | 7963 136 mp_set (&res, 1); |
7934 137 if (digidx == -1) \{ | 7964 137 |
7935 138 break; | 7965 138 /* set initial mode and bit cnt */ |
7936 139 \} | 7966 139 mode = 0; |
7937 140 /* read next digit and reset the bitcnt */ | 7967 140 bitcnt = 1; |
7938 141 buf = X->dp[digidx--]; | 7968 141 buf = 0; |
7939 142 bitcnt = (int) DIGIT_BIT; | 7969 142 digidx = X->used - 1; |
7940 143 \} | 7970 143 bitcpy = 0; |
7941 144 | 7971 144 bitbuf = 0; |
7942 145 /* grab the next msb from the exponent */ | 7972 145 |
7943 146 y = (buf >> (mp_digit)(DIGIT_BIT - 1)) & 1; | 7973 146 for (;;) \{ |
7944 147 buf <<= (mp_digit)1; | 7974 147 /* grab next digit as required */ |
7945 148 | 7975 148 if (--bitcnt == 0) \{ |
7946 149 /* if the bit is zero and mode == 0 then we ignore it | 7976 149 /* if digidx == -1 we are out of digits */ |
7947 150 * These represent the leading zero bits before the first 1 bit | 7977 150 if (digidx == -1) \{ |
7948 151 * in the exponent. Technically this opt is not required but it | 7978 151 break; |
7949 152 * does lower the # of trivial squaring/reductions used | 7979 152 \} |
7950 153 */ | 7980 153 /* read next digit and reset the bitcnt */ |
7951 154 if (mode == 0 && y == 0) \{ | 7981 154 buf = X->dp[digidx--]; |
7952 155 continue; | 7982 155 bitcnt = (int) DIGIT_BIT; |
7953 156 \} | 7983 156 \} |
7954 157 | 7984 157 |
7955 158 /* if the bit is zero and mode == 1 then we square */ | 7985 158 /* grab the next msb from the exponent */ |
7956 159 if (mode == 1 && y == 0) \{ | 7986 159 y = (buf >> (mp_digit)(DIGIT_BIT - 1)) & 1; |
7957 160 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ | 7987 160 buf <<= (mp_digit)1; |
7958 161 goto __RES; | 7988 161 |
7959 162 \} | 7989 162 /* if the bit is zero and mode == 0 then we ignore it |
7960 163 if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) \{ | 7990 163 * These represent the leading zero bits before the first 1 bit |
7961 164 goto __RES; | 7991 164 * in the exponent. Technically this opt is not required but it |
7962 165 \} | 7992 165 * does lower the # of trivial squaring/reductions used |
7963 166 continue; | 7993 166 */ |
7964 167 \} | 7994 167 if (mode == 0 && y == 0) \{ |
7965 168 | 7995 168 continue; |
7966 169 /* else we add it to the window */ | 7996 169 \} |
7967 170 bitbuf |= (y << (winsize - ++bitcpy)); | 7997 170 |
7968 171 mode = 2; | 7998 171 /* if the bit is zero and mode == 1 then we square */ |
7969 172 | 7999 172 if (mode == 1 && y == 0) \{ |
7970 173 if (bitcpy == winsize) \{ | 8000 173 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ |
7971 174 /* ok window is filled so square as required and multiply */ | 8001 174 goto LBL_RES; |
7972 175 /* square first */ | 8002 175 \} |
7973 176 for (x = 0; x < winsize; x++) \{ | 8003 176 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
7974 177 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ | 8004 177 goto LBL_RES; |
7975 178 goto __RES; | 8005 178 \} |
7976 179 \} | 8006 179 continue; |
7977 180 if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) \{ | 8007 180 \} |
7978 181 goto __RES; | 8008 181 |
7979 182 \} | 8009 182 /* else we add it to the window */ |
7980 183 \} | 8010 183 bitbuf |= (y << (winsize - ++bitcpy)); |
7981 184 | 8011 184 mode = 2; |
7982 185 /* then multiply */ | 8012 185 |
7983 186 if ((err = mp_mul (&res, &M[bitbuf], &res)) != MP_OKAY) \{ | 8013 186 if (bitcpy == winsize) \{ |
7984 187 goto __RES; | 8014 187 /* ok window is filled so square as required and multiply */ |
7985 188 \} | 8015 188 /* square first */ |
7986 189 if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) \{ | 8016 189 for (x = 0; x < winsize; x++) \{ |
7987 190 goto __RES; | 8017 190 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ |
7988 191 \} | 8018 191 goto LBL_RES; |
7989 192 | 8019 192 \} |
7990 193 /* empty window and reset */ | 8020 193 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
7991 194 bitcpy = 0; | 8021 194 goto LBL_RES; |
7992 195 bitbuf = 0; | 8022 195 \} |
7993 196 mode = 1; | 8023 196 \} |
7994 197 \} | 8024 197 |
7995 198 \} | 8025 198 /* then multiply */ |
7996 199 | 8026 199 if ((err = mp_mul (&res, &M[bitbuf], &res)) != MP_OKAY) \{ |
7997 200 /* if bits remain then square/multiply */ | 8027 200 goto LBL_RES; |
7998 201 if (mode == 2 && bitcpy > 0) \{ | 8028 201 \} |
7999 202 /* square then multiply if the bit is set */ | 8029 202 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8000 203 for (x = 0; x < bitcpy; x++) \{ | 8030 203 goto LBL_RES; |
8001 204 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ | 8031 204 \} |
8002 205 goto __RES; | 8032 205 |
8003 206 \} | 8033 206 /* empty window and reset */ |
8004 207 if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) \{ | 8034 207 bitcpy = 0; |
8005 208 goto __RES; | 8035 208 bitbuf = 0; |
8006 209 \} | 8036 209 mode = 1; |
8007 210 | 8037 210 \} |
8008 211 bitbuf <<= 1; | 8038 211 \} |
8009 212 if ((bitbuf & (1 << winsize)) != 0) \{ | 8039 212 |
8010 213 /* then multiply */ | 8040 213 /* if bits remain then square/multiply */ |
8011 214 if ((err = mp_mul (&res, &M[1], &res)) != MP_OKAY) \{ | 8041 214 if (mode == 2 && bitcpy > 0) \{ |
8012 215 goto __RES; | 8042 215 /* square then multiply if the bit is set */ |
8013 216 \} | 8043 216 for (x = 0; x < bitcpy; x++) \{ |
8014 217 if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) \{ | 8044 217 if ((err = mp_sqr (&res, &res)) != MP_OKAY) \{ |
8015 218 goto __RES; | 8045 218 goto LBL_RES; |
8016 219 \} | 8046 219 \} |
8017 220 \} | 8047 220 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8018 221 \} | 8048 221 goto LBL_RES; |
8019 222 \} | 8049 222 \} |
8020 223 | 8050 223 |
8021 224 mp_exch (&res, Y); | 8051 224 bitbuf <<= 1; |
8022 225 err = MP_OKAY; | 8052 225 if ((bitbuf & (1 << winsize)) != 0) \{ |
8023 226 __RES:mp_clear (&res); | 8053 226 /* then multiply */ |
8024 227 __MU:mp_clear (&mu); | 8054 227 if ((err = mp_mul (&res, &M[1], &res)) != MP_OKAY) \{ |
8025 228 __M: | 8055 228 goto LBL_RES; |
8026 229 mp_clear(&M[1]); | 8056 229 \} |
8027 230 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ | 8057 230 if ((err = redux (&res, P, &mu)) != MP_OKAY) \{ |
8028 231 mp_clear (&M[x]); | 8058 231 goto LBL_RES; |
8029 232 \} | 8059 232 \} |
8030 233 return err; | 8060 233 \} |
8031 234 \} | 8061 234 \} |
8032 235 #endif | 8062 235 \} |
8063 236 | |
8064 237 mp_exch (&res, Y); | |
8065 238 err = MP_OKAY; | |
8066 239 LBL_RES:mp_clear (&res); | |
8067 240 LBL_MU:mp_clear (&mu); | |
8068 241 LBL_M: | |
8069 242 mp_clear(&M[1]); | |
8070 243 for (x = 1<<(winsize-1); x < (1 << winsize); x++) \{ | |
8071 244 mp_clear (&M[x]); | |
8072 245 \} | |
8073 246 return err; | |
8074 247 \} | |
8075 248 #endif | |
8033 \end{alltt} | 8076 \end{alltt} |
8034 \end{small} | 8077 \end{small} |
8035 | 8078 |
8036 Lines 31 through 41 determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted | 8079 Lines 21 through 40 determine the optimal window size based on the length of the exponent in bits. The window divisions are sorted |
8037 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement | 8080 from smallest to greatest so that in each \textbf{if} statement only one condition must be tested. For example, by the \textbf{if} statement |
8038 on line 33 the value of $x$ is already known to be greater than $140$. | 8081 on line 32 the value of $x$ is already known to be greater than $140$. |
8039 | 8082 |
8040 The conditional piece of code beginning on line 47 allows the window size to be restricted to five bits. This logic is used to ensure | 8083 The conditional piece of code beginning on line 48 allows the window size to be restricted to five bits. This logic is used to ensure |
8041 the table of precomputed powers of $G$ remains relatively small. | 8084 the table of precomputed powers of $G$ remains relatively small. |
8042 | 8085 |
8043 The for loop on line 60 initializes the $M$ array while lines 61 and 74 compute the value of $\mu$ required for | 8086 The for loop on line 61 initializes the $M$ array while lines 62 and 77 compute the value of $\mu$ required for |
8044 Barrett reduction. | 8087 Barrett reduction. |
8045 | 8088 |
8046 -- More later. | 8089 -- More later. |
8047 | 8090 |
8048 \section{Quick Power of Two} | 8091 \section{Quick Power of Two} |
8384 046 \} | 8427 046 \} |
8385 047 | 8428 047 |
8386 048 | 8429 048 |
8387 049 mp_set(&tq, 1); | 8430 049 mp_set(&tq, 1); |
8388 050 n = mp_count_bits(a) - mp_count_bits(b); | 8431 050 n = mp_count_bits(a) - mp_count_bits(b); |
8389 051 if (((res = mp_copy(a, &ta)) != MP_OKAY) || | 8432 051 if (((res = mp_abs(a, &ta)) != MP_OKAY) || |
8390 052 ((res = mp_copy(b, &tb)) != MP_OKAY) || | 8433 052 ((res = mp_abs(b, &tb)) != MP_OKAY) || |
8391 053 ((res = mp_mul_2d(&tb, n, &tb)) != MP_OKAY) || | 8434 053 ((res = mp_mul_2d(&tb, n, &tb)) != MP_OKAY) || |
8392 054 ((res = mp_mul_2d(&tq, n, &tq)) != MP_OKAY)) \{ | 8435 054 ((res = mp_mul_2d(&tq, n, &tq)) != MP_OKAY)) \{ |
8393 055 goto __ERR; | 8436 055 goto LBL_ERR; |
8394 056 \} | 8437 056 \} |
8395 057 | 8438 057 |
8396 058 while (n-- >= 0) \{ | 8439 058 while (n-- >= 0) \{ |
8397 059 if (mp_cmp(&tb, &ta) != MP_GT) \{ | 8440 059 if (mp_cmp(&tb, &ta) != MP_GT) \{ |
8398 060 if (((res = mp_sub(&ta, &tb, &ta)) != MP_OKAY) || | 8441 060 if (((res = mp_sub(&ta, &tb, &ta)) != MP_OKAY) || |
8399 061 ((res = mp_add(&q, &tq, &q)) != MP_OKAY)) \{ | 8442 061 ((res = mp_add(&q, &tq, &q)) != MP_OKAY)) \{ |
8400 062 goto __ERR; | 8443 062 goto LBL_ERR; |
8401 063 \} | 8444 063 \} |
8402 064 \} | 8445 064 \} |
8403 065 if (((res = mp_div_2d(&tb, 1, &tb, NULL)) != MP_OKAY) || | 8446 065 if (((res = mp_div_2d(&tb, 1, &tb, NULL)) != MP_OKAY) || |
8404 066 ((res = mp_div_2d(&tq, 1, &tq, NULL)) != MP_OKAY)) \{ | 8447 066 ((res = mp_div_2d(&tq, 1, &tq, NULL)) != MP_OKAY)) \{ |
8405 067 goto __ERR; | 8448 067 goto LBL_ERR; |
8406 068 \} | 8449 068 \} |
8407 069 \} | 8450 069 \} |
8408 070 | 8451 070 |
8409 071 /* now q == quotient and ta == remainder */ | 8452 071 /* now q == quotient and ta == remainder */ |
8410 072 n = a->sign; | 8453 072 n = a->sign; |
8411 073 n2 = (a->sign == b->sign ? MP_ZPOS : MP_NEG); | 8454 073 n2 = (a->sign == b->sign ? MP_ZPOS : MP_NEG); |
8412 074 if (c != NULL) \{ | 8455 074 if (c != NULL) \{ |
8413 075 mp_exch(c, &q); | 8456 075 mp_exch(c, &q); |
8414 076 c->sign = n2; | 8457 076 c->sign = (mp_iszero(c) == MP_YES) ? MP_ZPOS : n2; |
8415 077 \} | 8458 077 \} |
8416 078 if (d != NULL) \{ | 8459 078 if (d != NULL) \{ |
8417 079 mp_exch(d, &ta); | 8460 079 mp_exch(d, &ta); |
8418 080 d->sign = n; | 8461 080 d->sign = (mp_iszero(d) == MP_YES) ? MP_ZPOS : n; |
8419 081 \} | 8462 081 \} |
8420 082 __ERR: | 8463 082 LBL_ERR: |
8421 083 mp_clear_multi(&ta, &tb, &tq, &q, NULL); | 8464 083 mp_clear_multi(&ta, &tb, &tq, &q, NULL); |
8422 084 return res; | 8465 084 return res; |
8423 085 \} | 8466 085 \} |
8424 086 | 8467 086 |
8425 087 #else | 8468 087 #else |
8464 126 return res; | 8507 126 return res; |
8465 127 \} | 8508 127 \} |
8466 128 q.used = a->used + 2; | 8509 128 q.used = a->used + 2; |
8467 129 | 8510 129 |
8468 130 if ((res = mp_init (&t1)) != MP_OKAY) \{ | 8511 130 if ((res = mp_init (&t1)) != MP_OKAY) \{ |
8469 131 goto __Q; | 8512 131 goto LBL_Q; |
8470 132 \} | 8513 132 \} |
8471 133 | 8514 133 |
8472 134 if ((res = mp_init (&t2)) != MP_OKAY) \{ | 8515 134 if ((res = mp_init (&t2)) != MP_OKAY) \{ |
8473 135 goto __T1; | 8516 135 goto LBL_T1; |
8474 136 \} | 8517 136 \} |
8475 137 | 8518 137 |
8476 138 if ((res = mp_init_copy (&x, a)) != MP_OKAY) \{ | 8519 138 if ((res = mp_init_copy (&x, a)) != MP_OKAY) \{ |
8477 139 goto __T2; | 8520 139 goto LBL_T2; |
8478 140 \} | 8521 140 \} |
8479 141 | 8522 141 |
8480 142 if ((res = mp_init_copy (&y, b)) != MP_OKAY) \{ | 8523 142 if ((res = mp_init_copy (&y, b)) != MP_OKAY) \{ |
8481 143 goto __X; | 8524 143 goto LBL_X; |
8482 144 \} | 8525 144 \} |
8483 145 | 8526 145 |
8484 146 /* fix the sign */ | 8527 146 /* fix the sign */ |
8485 147 neg = (a->sign == b->sign) ? MP_ZPOS : MP_NEG; | 8528 147 neg = (a->sign == b->sign) ? MP_ZPOS : MP_NEG; |
8486 148 x.sign = y.sign = MP_ZPOS; | 8529 148 x.sign = y.sign = MP_ZPOS; |
8488 150 /* normalize both x and y, ensure that y >= b/2, [b == 2**DIGIT_BIT] */ | 8531 150 /* normalize both x and y, ensure that y >= b/2, [b == 2**DIGIT_BIT] */ |
8489 151 norm = mp_count_bits(&y) % DIGIT_BIT; | 8532 151 norm = mp_count_bits(&y) % DIGIT_BIT; |
8490 152 if (norm < (int)(DIGIT_BIT-1)) \{ | 8533 152 if (norm < (int)(DIGIT_BIT-1)) \{ |
8491 153 norm = (DIGIT_BIT-1) - norm; | 8534 153 norm = (DIGIT_BIT-1) - norm; |
8492 154 if ((res = mp_mul_2d (&x, norm, &x)) != MP_OKAY) \{ | 8535 154 if ((res = mp_mul_2d (&x, norm, &x)) != MP_OKAY) \{ |
8493 155 goto __Y; | 8536 155 goto LBL_Y; |
8494 156 \} | 8537 156 \} |
8495 157 if ((res = mp_mul_2d (&y, norm, &y)) != MP_OKAY) \{ | 8538 157 if ((res = mp_mul_2d (&y, norm, &y)) != MP_OKAY) \{ |
8496 158 goto __Y; | 8539 158 goto LBL_Y; |
8497 159 \} | 8540 159 \} |
8498 160 \} else \{ | 8541 160 \} else \{ |
8499 161 norm = 0; | 8542 161 norm = 0; |
8500 162 \} | 8543 162 \} |
8501 163 | 8544 163 |
8503 165 n = x.used - 1; | 8546 165 n = x.used - 1; |
8504 166 t = y.used - 1; | 8547 166 t = y.used - 1; |
8505 167 | 8548 167 |
8506 168 /* while (x >= y*b**n-t) do \{ q[n-t] += 1; x -= y*b**\{n-t\} \} */ | 8549 168 /* while (x >= y*b**n-t) do \{ q[n-t] += 1; x -= y*b**\{n-t\} \} */ |
8507 169 if ((res = mp_lshd (&y, n - t)) != MP_OKAY) \{ /* y = y*b**\{n-t\} */ | 8550 169 if ((res = mp_lshd (&y, n - t)) != MP_OKAY) \{ /* y = y*b**\{n-t\} */ |
8508 170 goto __Y; | 8551 170 goto LBL_Y; |
8509 171 \} | 8552 171 \} |
8510 172 | 8553 172 |
8511 173 while (mp_cmp (&x, &y) != MP_LT) \{ | 8554 173 while (mp_cmp (&x, &y) != MP_LT) \{ |
8512 174 ++(q.dp[n - t]); | 8555 174 ++(q.dp[n - t]); |
8513 175 if ((res = mp_sub (&x, &y, &x)) != MP_OKAY) \{ | 8556 175 if ((res = mp_sub (&x, &y, &x)) != MP_OKAY) \{ |
8514 176 goto __Y; | 8557 176 goto LBL_Y; |
8515 177 \} | 8558 177 \} |
8516 178 \} | 8559 178 \} |
8517 179 | 8560 179 |
8518 180 /* reset y by shifting it back down */ | 8561 180 /* reset y by shifting it back down */ |
8519 181 mp_rshd (&y, n - t); | 8562 181 mp_rshd (&y, n - t); |
8551 213 mp_zero (&t1); | 8594 213 mp_zero (&t1); |
8552 214 t1.dp[0] = (t - 1 < 0) ? 0 : y.dp[t - 1]; | 8595 214 t1.dp[0] = (t - 1 < 0) ? 0 : y.dp[t - 1]; |
8553 215 t1.dp[1] = y.dp[t]; | 8596 215 t1.dp[1] = y.dp[t]; |
8554 216 t1.used = 2; | 8597 216 t1.used = 2; |
8555 217 if ((res = mp_mul_d (&t1, q.dp[i - t - 1], &t1)) != MP_OKAY) \{ | 8598 217 if ((res = mp_mul_d (&t1, q.dp[i - t - 1], &t1)) != MP_OKAY) \{ |
8556 218 goto __Y; | 8599 218 goto LBL_Y; |
8557 219 \} | 8600 219 \} |
8558 220 | 8601 220 |
8559 221 /* find right hand */ | 8602 221 /* find right hand */ |
8560 222 t2.dp[0] = (i - 2 < 0) ? 0 : x.dp[i - 2]; | 8603 222 t2.dp[0] = (i - 2 < 0) ? 0 : x.dp[i - 2]; |
8561 223 t2.dp[1] = (i - 1 < 0) ? 0 : x.dp[i - 1]; | 8604 223 t2.dp[1] = (i - 1 < 0) ? 0 : x.dp[i - 1]; |
8563 225 t2.used = 3; | 8606 225 t2.used = 3; |
8564 226 \} while (mp_cmp_mag(&t1, &t2) == MP_GT); | 8607 226 \} while (mp_cmp_mag(&t1, &t2) == MP_GT); |
8565 227 | 8608 227 |
8566 228 /* step 3.3 x = x - q\{i-t-1\} * y * b**\{i-t-1\} */ | 8609 228 /* step 3.3 x = x - q\{i-t-1\} * y * b**\{i-t-1\} */ |
8567 229 if ((res = mp_mul_d (&y, q.dp[i - t - 1], &t1)) != MP_OKAY) \{ | 8610 229 if ((res = mp_mul_d (&y, q.dp[i - t - 1], &t1)) != MP_OKAY) \{ |
8568 230 goto __Y; | 8611 230 goto LBL_Y; |
8569 231 \} | 8612 231 \} |
8570 232 | 8613 232 |
8571 233 if ((res = mp_lshd (&t1, i - t - 1)) != MP_OKAY) \{ | 8614 233 if ((res = mp_lshd (&t1, i - t - 1)) != MP_OKAY) \{ |
8572 234 goto __Y; | 8615 234 goto LBL_Y; |
8573 235 \} | 8616 235 \} |
8574 236 | 8617 236 |
8575 237 if ((res = mp_sub (&x, &t1, &x)) != MP_OKAY) \{ | 8618 237 if ((res = mp_sub (&x, &t1, &x)) != MP_OKAY) \{ |
8576 238 goto __Y; | 8619 238 goto LBL_Y; |
8577 239 \} | 8620 239 \} |
8578 240 | 8621 240 |
8579 241 /* if x < 0 then \{ x = x + y*b**\{i-t-1\}; q\{i-t-1\} -= 1; \} */ | 8622 241 /* if x < 0 then \{ x = x + y*b**\{i-t-1\}; q\{i-t-1\} -= 1; \} */ |
8580 242 if (x.sign == MP_NEG) \{ | 8623 242 if (x.sign == MP_NEG) \{ |
8581 243 if ((res = mp_copy (&y, &t1)) != MP_OKAY) \{ | 8624 243 if ((res = mp_copy (&y, &t1)) != MP_OKAY) \{ |
8582 244 goto __Y; | 8625 244 goto LBL_Y; |
8583 245 \} | 8626 245 \} |
8584 246 if ((res = mp_lshd (&t1, i - t - 1)) != MP_OKAY) \{ | 8627 246 if ((res = mp_lshd (&t1, i - t - 1)) != MP_OKAY) \{ |
8585 247 goto __Y; | 8628 247 goto LBL_Y; |
8586 248 \} | 8629 248 \} |
8587 249 if ((res = mp_add (&x, &t1, &x)) != MP_OKAY) \{ | 8630 249 if ((res = mp_add (&x, &t1, &x)) != MP_OKAY) \{ |
8588 250 goto __Y; | 8631 250 goto LBL_Y; |
8589 251 \} | 8632 251 \} |
8590 252 | 8633 252 |
8591 253 q.dp[i - t - 1] = (q.dp[i - t - 1] - 1UL) & MP_MASK; | 8634 253 q.dp[i - t - 1] = (q.dp[i - t - 1] - 1UL) & MP_MASK; |
8592 254 \} | 8635 254 \} |
8593 255 \} | 8636 255 \} |
8610 272 mp_exch (&x, d); | 8653 272 mp_exch (&x, d); |
8611 273 \} | 8654 273 \} |
8612 274 | 8655 274 |
8613 275 res = MP_OKAY; | 8656 275 res = MP_OKAY; |
8614 276 | 8657 276 |
8615 277 __Y:mp_clear (&y); | 8658 277 LBL_Y:mp_clear (&y); |
8616 278 __X:mp_clear (&x); | 8659 278 LBL_X:mp_clear (&x); |
8617 279 __T2:mp_clear (&t2); | 8660 279 LBL_T2:mp_clear (&t2); |
8618 280 __T1:mp_clear (&t1); | 8661 280 LBL_T1:mp_clear (&t1); |
8619 281 __Q:mp_clear (&q); | 8662 281 LBL_Q:mp_clear (&q); |
8620 282 return res; | 8663 282 return res; |
8621 283 \} | 8664 283 \} |
8622 284 | 8665 284 |
8623 285 #endif | 8666 285 #endif |
8624 286 | 8667 286 |
8868 054 | 8911 054 |
8869 055 /* send carry into next iteration */ | 8912 055 /* send carry into next iteration */ |
8870 056 u = (mp_digit) (r >> ((mp_word) DIGIT_BIT)); | 8913 056 u = (mp_digit) (r >> ((mp_word) DIGIT_BIT)); |
8871 057 \} | 8914 057 \} |
8872 058 | 8915 058 |
8873 059 /* store final carry [if any] */ | 8916 059 /* store final carry [if any] and increment ix offset */ |
8874 060 *tmpc++ = u; | 8917 060 *tmpc++ = u; |
8875 061 | 8918 061 ++ix; |
8876 062 /* now zero digits above the top */ | 8919 062 |
8877 063 while (ix++ < olduse) \{ | 8920 063 /* now zero digits above the top */ |
8878 064 *tmpc++ = 0; | 8921 064 while (ix++ < olduse) \{ |
8879 065 \} | 8922 065 *tmpc++ = 0; |
8880 066 | 8923 066 \} |
8881 067 /* set used count */ | 8924 067 |
8882 068 c->used = a->used + 1; | 8925 068 /* set used count */ |
8883 069 mp_clamp(c); | 8926 069 c->used = a->used + 1; |
8884 070 | 8927 070 mp_clamp(c); |
8885 071 return MP_OKAY; | 8928 071 |
8886 072 \} | 8929 072 return MP_OKAY; |
8887 073 #endif | 8930 073 \} |
8931 074 #endif | |
8888 \end{alltt} | 8932 \end{alltt} |
8889 \end{small} | 8933 \end{small} |
8890 | 8934 |
8891 In this implementation the destination $c$ may point to the same mp\_int as the source $a$ since the result is written after the digit is | 8935 In this implementation the destination $c$ may point to the same mp\_int as the source $a$ since the result is written after the digit is |
8892 read from the source. This function uses pointer aliases $tmpa$ and $tmpc$ for the digits of $a$ and $c$ respectively. | 8936 read from the source. This function uses pointer aliases $tmpa$ and $tmpc$ for the digits of $a$ and $c$ respectively. |
9128 037 if ((res = mp_init (&t1)) != MP_OKAY) \{ | 9172 037 if ((res = mp_init (&t1)) != MP_OKAY) \{ |
9129 038 return res; | 9173 038 return res; |
9130 039 \} | 9174 039 \} |
9131 040 | 9175 040 |
9132 041 if ((res = mp_init (&t2)) != MP_OKAY) \{ | 9176 041 if ((res = mp_init (&t2)) != MP_OKAY) \{ |
9133 042 goto __T1; | 9177 042 goto LBL_T1; |
9134 043 \} | 9178 043 \} |
9135 044 | 9179 044 |
9136 045 if ((res = mp_init (&t3)) != MP_OKAY) \{ | 9180 045 if ((res = mp_init (&t3)) != MP_OKAY) \{ |
9137 046 goto __T2; | 9181 046 goto LBL_T2; |
9138 047 \} | 9182 047 \} |
9139 048 | 9183 048 |
9140 049 /* if a is negative fudge the sign but keep track */ | 9184 049 /* if a is negative fudge the sign but keep track */ |
9141 050 neg = a->sign; | 9185 050 neg = a->sign; |
9142 051 a->sign = MP_ZPOS; | 9186 051 a->sign = MP_ZPOS; |
9145 054 mp_set (&t2, 2); | 9189 054 mp_set (&t2, 2); |
9146 055 | 9190 055 |
9147 056 do \{ | 9191 056 do \{ |
9148 057 /* t1 = t2 */ | 9192 057 /* t1 = t2 */ |
9149 058 if ((res = mp_copy (&t2, &t1)) != MP_OKAY) \{ | 9193 058 if ((res = mp_copy (&t2, &t1)) != MP_OKAY) \{ |
9150 059 goto __T3; | 9194 059 goto LBL_T3; |
9151 060 \} | 9195 060 \} |
9152 061 | 9196 061 |
9153 062 /* t2 = t1 - ((t1**b - a) / (b * t1**(b-1))) */ | 9197 062 /* t2 = t1 - ((t1**b - a) / (b * t1**(b-1))) */ |
9154 063 | 9198 063 |
9155 064 /* t3 = t1**(b-1) */ | 9199 064 /* t3 = t1**(b-1) */ |
9156 065 if ((res = mp_expt_d (&t1, b - 1, &t3)) != MP_OKAY) \{ | 9200 065 if ((res = mp_expt_d (&t1, b - 1, &t3)) != MP_OKAY) \{ |
9157 066 goto __T3; | 9201 066 goto LBL_T3; |
9158 067 \} | 9202 067 \} |
9159 068 | 9203 068 |
9160 069 /* numerator */ | 9204 069 /* numerator */ |
9161 070 /* t2 = t1**b */ | 9205 070 /* t2 = t1**b */ |
9162 071 if ((res = mp_mul (&t3, &t1, &t2)) != MP_OKAY) \{ | 9206 071 if ((res = mp_mul (&t3, &t1, &t2)) != MP_OKAY) \{ |
9163 072 goto __T3; | 9207 072 goto LBL_T3; |
9164 073 \} | 9208 073 \} |
9165 074 | 9209 074 |
9166 075 /* t2 = t1**b - a */ | 9210 075 /* t2 = t1**b - a */ |
9167 076 if ((res = mp_sub (&t2, a, &t2)) != MP_OKAY) \{ | 9211 076 if ((res = mp_sub (&t2, a, &t2)) != MP_OKAY) \{ |
9168 077 goto __T3; | 9212 077 goto LBL_T3; |
9169 078 \} | 9213 078 \} |
9170 079 | 9214 079 |
9171 080 /* denominator */ | 9215 080 /* denominator */ |
9172 081 /* t3 = t1**(b-1) * b */ | 9216 081 /* t3 = t1**(b-1) * b */ |
9173 082 if ((res = mp_mul_d (&t3, b, &t3)) != MP_OKAY) \{ | 9217 082 if ((res = mp_mul_d (&t3, b, &t3)) != MP_OKAY) \{ |
9174 083 goto __T3; | 9218 083 goto LBL_T3; |
9175 084 \} | 9219 084 \} |
9176 085 | 9220 085 |
9177 086 /* t3 = (t1**b - a)/(b * t1**(b-1)) */ | 9221 086 /* t3 = (t1**b - a)/(b * t1**(b-1)) */ |
9178 087 if ((res = mp_div (&t2, &t3, &t3, NULL)) != MP_OKAY) \{ | 9222 087 if ((res = mp_div (&t2, &t3, &t3, NULL)) != MP_OKAY) \{ |
9179 088 goto __T3; | 9223 088 goto LBL_T3; |
9180 089 \} | 9224 089 \} |
9181 090 | 9225 090 |
9182 091 if ((res = mp_sub (&t1, &t3, &t2)) != MP_OKAY) \{ | 9226 091 if ((res = mp_sub (&t1, &t3, &t2)) != MP_OKAY) \{ |
9183 092 goto __T3; | 9227 092 goto LBL_T3; |
9184 093 \} | 9228 093 \} |
9185 094 \} while (mp_cmp (&t1, &t2) != MP_EQ); | 9229 094 \} while (mp_cmp (&t1, &t2) != MP_EQ); |
9186 095 | 9230 095 |
9187 096 /* result can be off by a few so check */ | 9231 096 /* result can be off by a few so check */ |
9188 097 for (;;) \{ | 9232 097 for (;;) \{ |
9189 098 if ((res = mp_expt_d (&t1, b, &t2)) != MP_OKAY) \{ | 9233 098 if ((res = mp_expt_d (&t1, b, &t2)) != MP_OKAY) \{ |
9190 099 goto __T3; | 9234 099 goto LBL_T3; |
9191 100 \} | 9235 100 \} |
9192 101 | 9236 101 |
9193 102 if (mp_cmp (&t2, a) == MP_GT) \{ | 9237 102 if (mp_cmp (&t2, a) == MP_GT) \{ |
9194 103 if ((res = mp_sub_d (&t1, 1, &t1)) != MP_OKAY) \{ | 9238 103 if ((res = mp_sub_d (&t1, 1, &t1)) != MP_OKAY) \{ |
9195 104 goto __T3; | 9239 104 goto LBL_T3; |
9196 105 \} | 9240 105 \} |
9197 106 \} else \{ | 9241 106 \} else \{ |
9198 107 break; | 9242 107 break; |
9199 108 \} | 9243 108 \} |
9200 109 \} | 9244 109 \} |
9208 117 /* set the sign of the result */ | 9252 117 /* set the sign of the result */ |
9209 118 c->sign = neg; | 9253 118 c->sign = neg; |
9210 119 | 9254 119 |
9211 120 res = MP_OKAY; | 9255 120 res = MP_OKAY; |
9212 121 | 9256 121 |
9213 122 __T3:mp_clear (&t3); | 9257 122 LBL_T3:mp_clear (&t3); |
9214 123 __T2:mp_clear (&t2); | 9258 123 LBL_T2:mp_clear (&t2); |
9215 124 __T1:mp_clear (&t1); | 9259 124 LBL_T1:mp_clear (&t1); |
9216 125 return res; | 9260 125 return res; |
9217 126 \} | 9261 126 \} |
9218 127 #endif | 9262 127 #endif |
9219 \end{alltt} | 9263 \end{alltt} |
9220 \end{small} | 9264 \end{small} |
9270 026 return MP_OKAY; | 9314 026 return MP_OKAY; |
9271 027 \} | 9315 027 \} |
9272 028 | 9316 028 |
9273 029 /* first place a random non-zero digit */ | 9317 029 /* first place a random non-zero digit */ |
9274 030 do \{ | 9318 030 do \{ |
9275 031 d = ((mp_digit) abs (rand ())); | 9319 031 d = ((mp_digit) abs (rand ())) & MP_MASK; |
9276 032 \} while (d == 0); | 9320 032 \} while (d == 0); |
9277 033 | 9321 033 |
9278 034 if ((res = mp_add_d (a, d, a)) != MP_OKAY) \{ | 9322 034 if ((res = mp_add_d (a, d, a)) != MP_OKAY) \{ |
9279 035 return res; | 9323 035 return res; |
9280 036 \} | 9324 036 \} |
9281 037 | 9325 037 |
9282 038 while (digits-- > 0) \{ | 9326 038 while (--digits > 0) \{ |
9283 039 if ((res = mp_lshd (a, 1)) != MP_OKAY) \{ | 9327 039 if ((res = mp_lshd (a, 1)) != MP_OKAY) \{ |
9284 040 return res; | 9328 040 return res; |
9285 041 \} | 9329 041 \} |
9286 042 | 9330 042 |
9287 043 if ((res = mp_add_d (a, ((mp_digit) abs (rand ())), a)) != MP_OKAY) \{ | 9331 043 if ((res = mp_add_d (a, ((mp_digit) abs (rand ())), a)) != MP_OKAY) \{ |
9374 \hspace{-5.1mm}{\bf File}: bn\_mp\_read\_radix.c | 9418 \hspace{-5.1mm}{\bf File}: bn\_mp\_read\_radix.c |
9375 \vspace{-3mm} | 9419 \vspace{-3mm} |
9376 \begin{alltt} | 9420 \begin{alltt} |
9377 016 | 9421 016 |
9378 017 /* read a string [ASCII] in a given radix */ | 9422 017 /* read a string [ASCII] in a given radix */ |
9379 018 int mp_read_radix (mp_int * a, char *str, int radix) | 9423 018 int mp_read_radix (mp_int * a, const char *str, int radix) |
9380 019 \{ | 9424 019 \{ |
9381 020 int y, res, neg; | 9425 020 int y, res, neg; |
9382 021 char ch; | 9426 021 char ch; |
9383 022 | 9427 022 |
9384 023 /* make sure the radix is ok */ | 9428 023 /* make sure the radix is ok */ |
9769 040 if ((res = mp_init_copy (&u, a)) != MP_OKAY) \{ | 9813 040 if ((res = mp_init_copy (&u, a)) != MP_OKAY) \{ |
9770 041 return res; | 9814 041 return res; |
9771 042 \} | 9815 042 \} |
9772 043 | 9816 043 |
9773 044 if ((res = mp_init_copy (&v, b)) != MP_OKAY) \{ | 9817 044 if ((res = mp_init_copy (&v, b)) != MP_OKAY) \{ |
9774 045 goto __U; | 9818 045 goto LBL_U; |
9775 046 \} | 9819 046 \} |
9776 047 | 9820 047 |
9777 048 /* must be positive for the remainder of the algorithm */ | 9821 048 /* must be positive for the remainder of the algorithm */ |
9778 049 u.sign = v.sign = MP_ZPOS; | 9822 049 u.sign = v.sign = MP_ZPOS; |
9779 050 | 9823 050 |
9783 054 k = MIN(u_lsb, v_lsb); | 9827 054 k = MIN(u_lsb, v_lsb); |
9784 055 | 9828 055 |
9785 056 if (k > 0) \{ | 9829 056 if (k > 0) \{ |
9786 057 /* divide the power of two out */ | 9830 057 /* divide the power of two out */ |
9787 058 if ((res = mp_div_2d(&u, k, &u, NULL)) != MP_OKAY) \{ | 9831 058 if ((res = mp_div_2d(&u, k, &u, NULL)) != MP_OKAY) \{ |
9788 059 goto __V; | 9832 059 goto LBL_V; |
9789 060 \} | 9833 060 \} |
9790 061 | 9834 061 |
9791 062 if ((res = mp_div_2d(&v, k, &v, NULL)) != MP_OKAY) \{ | 9835 062 if ((res = mp_div_2d(&v, k, &v, NULL)) != MP_OKAY) \{ |
9792 063 goto __V; | 9836 063 goto LBL_V; |
9793 064 \} | 9837 064 \} |
9794 065 \} | 9838 065 \} |
9795 066 | 9839 066 |
9796 067 /* divide any remaining factors of two out */ | 9840 067 /* divide any remaining factors of two out */ |
9797 068 if (u_lsb != k) \{ | 9841 068 if (u_lsb != k) \{ |
9798 069 if ((res = mp_div_2d(&u, u_lsb - k, &u, NULL)) != MP_OKAY) \{ | 9842 069 if ((res = mp_div_2d(&u, u_lsb - k, &u, NULL)) != MP_OKAY) \{ |
9799 070 goto __V; | 9843 070 goto LBL_V; |
9800 071 \} | 9844 071 \} |
9801 072 \} | 9845 072 \} |
9802 073 | 9846 073 |
9803 074 if (v_lsb != k) \{ | 9847 074 if (v_lsb != k) \{ |
9804 075 if ((res = mp_div_2d(&v, v_lsb - k, &v, NULL)) != MP_OKAY) \{ | 9848 075 if ((res = mp_div_2d(&v, v_lsb - k, &v, NULL)) != MP_OKAY) \{ |
9805 076 goto __V; | 9849 076 goto LBL_V; |
9806 077 \} | 9850 077 \} |
9807 078 \} | 9851 078 \} |
9808 079 | 9852 079 |
9809 080 while (mp_iszero(&v) == 0) \{ | 9853 080 while (mp_iszero(&v) == 0) \{ |
9810 081 /* make sure v is the largest */ | 9854 081 /* make sure v is the largest */ |
9813 084 mp_exch(&u, &v); | 9857 084 mp_exch(&u, &v); |
9814 085 \} | 9858 085 \} |
9815 086 | 9859 086 |
9816 087 /* subtract smallest from largest */ | 9860 087 /* subtract smallest from largest */ |
9817 088 if ((res = s_mp_sub(&v, &u, &v)) != MP_OKAY) \{ | 9861 088 if ((res = s_mp_sub(&v, &u, &v)) != MP_OKAY) \{ |
9818 089 goto __V; | 9862 089 goto LBL_V; |
9819 090 \} | 9863 090 \} |
9820 091 | 9864 091 |
9821 092 /* Divide out all factors of two */ | 9865 092 /* Divide out all factors of two */ |
9822 093 if ((res = mp_div_2d(&v, mp_cnt_lsb(&v), &v, NULL)) != MP_OKAY) \{ | 9866 093 if ((res = mp_div_2d(&v, mp_cnt_lsb(&v), &v, NULL)) != MP_OKAY) \{ |
9823 094 goto __V; | 9867 094 goto LBL_V; |
9824 095 \} | 9868 095 \} |
9825 096 \} | 9869 096 \} |
9826 097 | 9870 097 |
9827 098 /* multiply by 2**k which we divided out at the beginning */ | 9871 098 /* multiply by 2**k which we divided out at the beginning */ |
9828 099 if ((res = mp_mul_2d (&u, k, c)) != MP_OKAY) \{ | 9872 099 if ((res = mp_mul_2d (&u, k, c)) != MP_OKAY) \{ |
9829 100 goto __V; | 9873 100 goto LBL_V; |
9830 101 \} | 9874 101 \} |
9831 102 c->sign = MP_ZPOS; | 9875 102 c->sign = MP_ZPOS; |
9832 103 res = MP_OKAY; | 9876 103 res = MP_OKAY; |
9833 104 __V:mp_clear (&u); | 9877 104 LBL_V:mp_clear (&u); |
9834 105 __U:mp_clear (&v); | 9878 105 LBL_U:mp_clear (&v); |
9835 106 return res; | 9879 106 return res; |
9836 107 \} | 9880 107 \} |
9837 108 #endif | 9881 108 #endif |
9838 \end{alltt} | 9882 \end{alltt} |
9839 \end{small} | 9883 \end{small} |
9902 025 return res; | 9946 025 return res; |
9903 026 \} | 9947 026 \} |
9904 027 | 9948 027 |
9905 028 /* t1 = get the GCD of the two inputs */ | 9949 028 /* t1 = get the GCD of the two inputs */ |
9906 029 if ((res = mp_gcd (a, b, &t1)) != MP_OKAY) \{ | 9950 029 if ((res = mp_gcd (a, b, &t1)) != MP_OKAY) \{ |
9907 030 goto __T; | 9951 030 goto LBL_T; |
9908 031 \} | 9952 031 \} |
9909 032 | 9953 032 |
9910 033 /* divide the smallest by the GCD */ | 9954 033 /* divide the smallest by the GCD */ |
9911 034 if (mp_cmp_mag(a, b) == MP_LT) \{ | 9955 034 if (mp_cmp_mag(a, b) == MP_LT) \{ |
9912 035 /* store quotient in t2 such that t2 * b is the LCM */ | 9956 035 /* store quotient in t2 such that t2 * b is the LCM */ |
9913 036 if ((res = mp_div(a, &t1, &t2, NULL)) != MP_OKAY) \{ | 9957 036 if ((res = mp_div(a, &t1, &t2, NULL)) != MP_OKAY) \{ |
9914 037 goto __T; | 9958 037 goto LBL_T; |
9915 038 \} | 9959 038 \} |
9916 039 res = mp_mul(b, &t2, c); | 9960 039 res = mp_mul(b, &t2, c); |
9917 040 \} else \{ | 9961 040 \} else \{ |
9918 041 /* store quotient in t2 such that t2 * a is the LCM */ | 9962 041 /* store quotient in t2 such that t2 * a is the LCM */ |
9919 042 if ((res = mp_div(b, &t1, &t2, NULL)) != MP_OKAY) \{ | 9963 042 if ((res = mp_div(b, &t1, &t2, NULL)) != MP_OKAY) \{ |
9920 043 goto __T; | 9964 043 goto LBL_T; |
9921 044 \} | 9965 044 \} |
9922 045 res = mp_mul(a, &t2, c); | 9966 045 res = mp_mul(a, &t2, c); |
9923 046 \} | 9967 046 \} |
9924 047 | 9968 047 |
9925 048 /* fix the sign to positive */ | 9969 048 /* fix the sign to positive */ |
9926 049 c->sign = MP_ZPOS; | 9970 049 c->sign = MP_ZPOS; |
9927 050 | 9971 050 |
9928 051 __T: | 9972 051 LBL_T: |
9929 052 mp_clear_multi (&t1, &t2, NULL); | 9973 052 mp_clear_multi (&t1, &t2, NULL); |
9930 053 return res; | 9974 053 return res; |
9931 054 \} | 9975 054 \} |
9932 055 #endif | 9976 055 #endif |
9933 \end{alltt} | 9977 \end{alltt} |
9935 | 9979 |
9936 \section{Jacobi Symbol Computation} | 9980 \section{Jacobi Symbol Computation} |
9937 To explain the Jacobi Symbol we shall first discuss the Legendre function\footnote{Arrg. What is the name of this?} off which the Jacobi symbol is | 9981 To explain the Jacobi Symbol we shall first discuss the Legendre function\footnote{Arrg. What is the name of this?} off which the Jacobi symbol is |
9938 defined. The Legendre function computes whether or not an integer $a$ is a quadratic residue modulo an odd prime $p$. Numerically it is | 9982 defined. The Legendre function computes whether or not an integer $a$ is a quadratic residue modulo an odd prime $p$. Numerically it is |
9939 equivalent to equation \ref{eqn:legendre}. | 9983 equivalent to equation \ref{eqn:legendre}. |
9984 | |
9985 \textit{-- Tom, don't be an ass, cite your source here...!} | |
9940 | 9986 |
9941 \begin{equation} | 9987 \begin{equation} |
9942 a^{(p-1)/2} \equiv \begin{array}{rl} | 9988 a^{(p-1)/2} \equiv \begin{array}{rl} |
9943 -1 & \mbox{if }a\mbox{ is a quadratic non-residue.} \\ | 9989 -1 & \mbox{if }a\mbox{ is a quadratic non-residue.} \\ |
9944 0 & \mbox{if }a\mbox{ divides }p\mbox{.} \\ | 9990 0 & \mbox{if }a\mbox{ divides }p\mbox{.} \\ |
10121 047 if ((res = mp_init_copy (&a1, a)) != MP_OKAY) \{ | 10167 047 if ((res = mp_init_copy (&a1, a)) != MP_OKAY) \{ |
10122 048 return res; | 10168 048 return res; |
10123 049 \} | 10169 049 \} |
10124 050 | 10170 050 |
10125 051 if ((res = mp_init (&p1)) != MP_OKAY) \{ | 10171 051 if ((res = mp_init (&p1)) != MP_OKAY) \{ |
10126 052 goto __A1; | 10172 052 goto LBL_A1; |
10127 053 \} | 10173 053 \} |
10128 054 | 10174 054 |
10129 055 /* divide out larger power of two */ | 10175 055 /* divide out larger power of two */ |
10130 056 k = mp_cnt_lsb(&a1); | 10176 056 k = mp_cnt_lsb(&a1); |
10131 057 if ((res = mp_div_2d(&a1, k, &a1, NULL)) != MP_OKAY) \{ | 10177 057 if ((res = mp_div_2d(&a1, k, &a1, NULL)) != MP_OKAY) \{ |
10132 058 goto __P1; | 10178 058 goto LBL_P1; |
10133 059 \} | 10179 059 \} |
10134 060 | 10180 060 |
10135 061 /* step 4. if e is even set s=1 */ | 10181 061 /* step 4. if e is even set s=1 */ |
10136 062 if ((k & 1) == 0) \{ | 10182 062 if ((k & 1) == 0) \{ |
10137 063 s = 1; | 10183 063 s = 1; |
10155 081 if (mp_cmp_d (&a1, 1) == MP_EQ) \{ | 10201 081 if (mp_cmp_d (&a1, 1) == MP_EQ) \{ |
10156 082 *c = s; | 10202 082 *c = s; |
10157 083 \} else \{ | 10203 083 \} else \{ |
10158 084 /* n1 = n mod a1 */ | 10204 084 /* n1 = n mod a1 */ |
10159 085 if ((res = mp_mod (p, &a1, &p1)) != MP_OKAY) \{ | 10205 085 if ((res = mp_mod (p, &a1, &p1)) != MP_OKAY) \{ |
10160 086 goto __P1; | 10206 086 goto LBL_P1; |
10161 087 \} | 10207 087 \} |
10162 088 if ((res = mp_jacobi (&p1, &a1, &r)) != MP_OKAY) \{ | 10208 088 if ((res = mp_jacobi (&p1, &a1, &r)) != MP_OKAY) \{ |
10163 089 goto __P1; | 10209 089 goto LBL_P1; |
10164 090 \} | 10210 090 \} |
10165 091 *c = s * r; | 10211 091 *c = s * r; |
10166 092 \} | 10212 092 \} |
10167 093 | 10213 093 |
10168 094 /* done */ | 10214 094 /* done */ |
10169 095 res = MP_OKAY; | 10215 095 res = MP_OKAY; |
10170 096 __P1:mp_clear (&p1); | 10216 096 LBL_P1:mp_clear (&p1); |
10171 097 __A1:mp_clear (&a1); | 10217 097 LBL_A1:mp_clear (&a1); |
10172 098 return res; | 10218 098 return res; |
10173 099 \} | 10219 099 \} |
10174 100 #endif | 10220 100 #endif |
10175 \end{alltt} | 10221 \end{alltt} |
10176 \end{small} | 10222 \end{small} |
10404 026 | 10450 026 |
10405 027 /* default to not */ | 10451 027 /* default to not */ |
10406 028 *result = MP_NO; | 10452 028 *result = MP_NO; |
10407 029 | 10453 029 |
10408 030 for (ix = 0; ix < PRIME_SIZE; ix++) \{ | 10454 030 for (ix = 0; ix < PRIME_SIZE; ix++) \{ |
10409 031 /* what is a mod __prime_tab[ix] */ | 10455 031 /* what is a mod LBL_prime_tab[ix] */ |
10410 032 if ((err = mp_mod_d (a, __prime_tab[ix], &res)) != MP_OKAY) \{ | 10456 032 if ((err = mp_mod_d (a, ltm_prime_tab[ix], &res)) != MP_OKAY) \{ |
10411 033 return err; | 10457 033 return err; |
10412 034 \} | 10458 034 \} |
10413 035 | 10459 035 |
10414 036 /* is the residue zero? */ | 10460 036 /* is the residue zero? */ |
10415 037 if (res == 0) \{ | 10461 037 if (res == 0) \{ |
10429 | 10475 |
10430 \vspace{+3mm}\begin{small} | 10476 \vspace{+3mm}\begin{small} |
10431 \hspace{-5.1mm}{\bf File}: bn\_prime\_tab.c | 10477 \hspace{-5.1mm}{\bf File}: bn\_prime\_tab.c |
10432 \vspace{-3mm} | 10478 \vspace{-3mm} |
10433 \begin{alltt} | 10479 \begin{alltt} |
10434 016 const mp_digit __prime_tab[] = \{ | 10480 016 const mp_digit ltm_prime_tab[] = \{ |
10435 017 0x0002, 0x0003, 0x0005, 0x0007, 0x000B, 0x000D, 0x0011, 0x0013, | 10481 017 0x0002, 0x0003, 0x0005, 0x0007, 0x000B, 0x000D, 0x0011, 0x0013, |
10436 018 0x0017, 0x001D, 0x001F, 0x0025, 0x0029, 0x002B, 0x002F, 0x0035, | 10482 018 0x0017, 0x001D, 0x001F, 0x0025, 0x0029, 0x002B, 0x002F, 0x0035, |
10437 019 0x003B, 0x003D, 0x0043, 0x0047, 0x0049, 0x004F, 0x0053, 0x0059, | 10483 019 0x003B, 0x003D, 0x0043, 0x0047, 0x0049, 0x004F, 0x0053, 0x0059, |
10438 020 0x0061, 0x0065, 0x0067, 0x006B, 0x006D, 0x0071, 0x007F, | 10484 020 0x0061, 0x0065, 0x0067, 0x006B, 0x006D, 0x0071, 0x007F, |
10439 021 #ifndef MP_8BIT | 10485 021 #ifndef MP_8BIT |
10545 040 return err; | 10591 040 return err; |
10546 041 \} | 10592 041 \} |
10547 042 | 10593 042 |
10548 043 /* compute t = b**a mod a */ | 10594 043 /* compute t = b**a mod a */ |
10549 044 if ((err = mp_exptmod (b, a, a, &t)) != MP_OKAY) \{ | 10595 044 if ((err = mp_exptmod (b, a, a, &t)) != MP_OKAY) \{ |
10550 045 goto __T; | 10596 045 goto LBL_T; |
10551 046 \} | 10597 046 \} |
10552 047 | 10598 047 |
10553 048 /* is it equal to b? */ | 10599 048 /* is it equal to b? */ |
10554 049 if (mp_cmp (&t, b) == MP_EQ) \{ | 10600 049 if (mp_cmp (&t, b) == MP_EQ) \{ |
10555 050 *result = MP_YES; | 10601 050 *result = MP_YES; |
10556 051 \} | 10602 051 \} |
10557 052 | 10603 052 |
10558 053 err = MP_OKAY; | 10604 053 err = MP_OKAY; |
10559 054 __T:mp_clear (&t); | 10605 054 LBL_T:mp_clear (&t); |
10560 055 return err; | 10606 055 return err; |
10561 056 \} | 10607 056 \} |
10562 057 #endif | 10608 057 #endif |
10563 \end{alltt} | 10609 \end{alltt} |
10564 \end{small} | 10610 \end{small} |
10636 037 /* get n1 = a - 1 */ | 10682 037 /* get n1 = a - 1 */ |
10637 038 if ((err = mp_init_copy (&n1, a)) != MP_OKAY) \{ | 10683 038 if ((err = mp_init_copy (&n1, a)) != MP_OKAY) \{ |
10638 039 return err; | 10684 039 return err; |
10639 040 \} | 10685 040 \} |
10640 041 if ((err = mp_sub_d (&n1, 1, &n1)) != MP_OKAY) \{ | 10686 041 if ((err = mp_sub_d (&n1, 1, &n1)) != MP_OKAY) \{ |
10641 042 goto __N1; | 10687 042 goto LBL_N1; |
10642 043 \} | 10688 043 \} |
10643 044 | 10689 044 |
10644 045 /* set 2**s * r = n1 */ | 10690 045 /* set 2**s * r = n1 */ |
10645 046 if ((err = mp_init_copy (&r, &n1)) != MP_OKAY) \{ | 10691 046 if ((err = mp_init_copy (&r, &n1)) != MP_OKAY) \{ |
10646 047 goto __N1; | 10692 047 goto LBL_N1; |
10647 048 \} | 10693 048 \} |
10648 049 | 10694 049 |
10649 050 /* count the number of least significant bits | 10695 050 /* count the number of least significant bits |
10650 051 * which are zero | 10696 051 * which are zero |
10651 052 */ | 10697 052 */ |
10652 053 s = mp_cnt_lsb(&r); | 10698 053 s = mp_cnt_lsb(&r); |
10653 054 | 10699 054 |
10654 055 /* now divide n - 1 by 2**s */ | 10700 055 /* now divide n - 1 by 2**s */ |
10655 056 if ((err = mp_div_2d (&r, s, &r, NULL)) != MP_OKAY) \{ | 10701 056 if ((err = mp_div_2d (&r, s, &r, NULL)) != MP_OKAY) \{ |
10656 057 goto __R; | 10702 057 goto LBL_R; |
10657 058 \} | 10703 058 \} |
10658 059 | 10704 059 |
10659 060 /* compute y = b**r mod a */ | 10705 060 /* compute y = b**r mod a */ |
10660 061 if ((err = mp_init (&y)) != MP_OKAY) \{ | 10706 061 if ((err = mp_init (&y)) != MP_OKAY) \{ |
10661 062 goto __R; | 10707 062 goto LBL_R; |
10662 063 \} | 10708 063 \} |
10663 064 if ((err = mp_exptmod (b, &r, a, &y)) != MP_OKAY) \{ | 10709 064 if ((err = mp_exptmod (b, &r, a, &y)) != MP_OKAY) \{ |
10664 065 goto __Y; | 10710 065 goto LBL_Y; |
10665 066 \} | 10711 066 \} |
10666 067 | 10712 067 |
10667 068 /* if y != 1 and y != n1 do */ | 10713 068 /* if y != 1 and y != n1 do */ |
10668 069 if (mp_cmp_d (&y, 1) != MP_EQ && mp_cmp (&y, &n1) != MP_EQ) \{ | 10714 069 if (mp_cmp_d (&y, 1) != MP_EQ && mp_cmp (&y, &n1) != MP_EQ) \{ |
10669 070 j = 1; | 10715 070 j = 1; |
10670 071 /* while j <= s-1 and y != n1 */ | 10716 071 /* while j <= s-1 and y != n1 */ |
10671 072 while ((j <= (s - 1)) && mp_cmp (&y, &n1) != MP_EQ) \{ | 10717 072 while ((j <= (s - 1)) && mp_cmp (&y, &n1) != MP_EQ) \{ |
10672 073 if ((err = mp_sqrmod (&y, a, &y)) != MP_OKAY) \{ | 10718 073 if ((err = mp_sqrmod (&y, a, &y)) != MP_OKAY) \{ |
10673 074 goto __Y; | 10719 074 goto LBL_Y; |
10674 075 \} | 10720 075 \} |
10675 076 | 10721 076 |
10676 077 /* if y == 1 then composite */ | 10722 077 /* if y == 1 then composite */ |
10677 078 if (mp_cmp_d (&y, 1) == MP_EQ) \{ | 10723 078 if (mp_cmp_d (&y, 1) == MP_EQ) \{ |
10678 079 goto __Y; | 10724 079 goto LBL_Y; |
10679 080 \} | 10725 080 \} |
10680 081 | 10726 081 |
10681 082 ++j; | 10727 082 ++j; |
10682 083 \} | 10728 083 \} |
10683 084 | 10729 084 |
10684 085 /* if y != n1 then composite */ | 10730 085 /* if y != n1 then composite */ |
10685 086 if (mp_cmp (&y, &n1) != MP_EQ) \{ | 10731 086 if (mp_cmp (&y, &n1) != MP_EQ) \{ |
10686 087 goto __Y; | 10732 087 goto LBL_Y; |
10687 088 \} | 10733 088 \} |
10688 089 \} | 10734 089 \} |
10689 090 | 10735 090 |
10690 091 /* probably prime now */ | 10736 091 /* probably prime now */ |
10691 092 *result = MP_YES; | 10737 092 *result = MP_YES; |
10692 093 __Y:mp_clear (&y); | 10738 093 LBL_Y:mp_clear (&y); |
10693 094 __R:mp_clear (&r); | 10739 094 LBL_R:mp_clear (&r); |
10694 095 __N1:mp_clear (&n1); | 10740 095 LBL_N1:mp_clear (&n1); |
10695 096 return err; | 10741 096 return err; |
10696 097 \} | 10742 097 \} |
10697 098 #endif | 10743 098 #endif |
10698 \end{alltt} | 10744 \end{alltt} |
10699 \end{small} | 10745 \end{small} |