## Mercurial > dropbear

### annotate tommath.src @ 19:e1037a1e12e7 libtommath-orig

Find changesets by keywords (author, files, the commit message), revision
number or hash, or revset expression.

0.30 release of LibTomMath

author | Matt Johnston <matt@ucc.asn.au> |
---|---|

date | Tue, 15 Jun 2004 14:42:57 +0000 |

parents | |

children | d29b64170cf0 |

rev | line source |
---|---|

19 | 1 \documentclass[b5paper]{book} |

2 \usepackage{hyperref} | |

3 \usepackage{makeidx} | |

4 \usepackage{amssymb} | |

5 \usepackage{color} | |

6 \usepackage{alltt} | |

7 \usepackage{graphicx} | |

8 \usepackage{layout} | |

9 \def\union{\cup} | |

10 \def\intersect{\cap} | |

11 \def\getsrandom{\stackrel{\rm R}{\gets}} | |

12 \def\cross{\times} | |

13 \def\cat{\hspace{0.5em} \| \hspace{0.5em}} | |

14 \def\catn{$\|$} | |

15 \def\divides{\hspace{0.3em} | \hspace{0.3em}} | |

16 \def\nequiv{\not\equiv} | |

17 \def\approx{\raisebox{0.2ex}{\mbox{\small $\sim$}}} | |

18 \def\lcm{{\rm lcm}} | |

19 \def\gcd{{\rm gcd}} | |

20 \def\log{{\rm log}} | |

21 \def\ord{{\rm ord}} | |

22 \def\abs{{\mathit abs}} | |

23 \def\rep{{\mathit rep}} | |

24 \def\mod{{\mathit\ mod\ }} | |

25 \renewcommand{\pmod}[1]{\ ({\rm mod\ }{#1})} | |

26 \newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor} | |

27 \newcommand{\ceil}[1]{\left\lceil{#1}\right\rceil} | |

28 \def\Or{{\rm\ or\ }} | |

29 \def\And{{\rm\ and\ }} | |

30 \def\iff{\hspace{1em}\Longleftrightarrow\hspace{1em}} | |

31 \def\implies{\Rightarrow} | |

32 \def\undefined{{\rm ``undefined"}} | |

33 \def\Proof{\vspace{1ex}\noindent {\bf Proof:}\hspace{1em}} | |

34 \let\oldphi\phi | |

35 \def\phi{\varphi} | |

36 \def\Pr{{\rm Pr}} | |

37 \newcommand{\str}[1]{{\mathbf{#1}}} | |

38 \def\F{{\mathbb F}} | |

39 \def\N{{\mathbb N}} | |

40 \def\Z{{\mathbb Z}} | |

41 \def\R{{\mathbb R}} | |

42 \def\C{{\mathbb C}} | |

43 \def\Q{{\mathbb Q}} | |

44 \definecolor{DGray}{gray}{0.5} | |

45 \newcommand{\emailaddr}[1]{\mbox{$<${#1}$>$}} | |

46 \def\twiddle{\raisebox{0.3ex}{\mbox{\tiny $\sim$}}} | |

47 \def\gap{\vspace{0.5ex}} | |

48 \makeindex | |

49 \begin{document} | |

50 \frontmatter | |

51 \pagestyle{empty} | |

52 \title{Implementing Multiple Precision Arithmetic \\ ~ \\ Draft Edition } | |

53 \author{\mbox{ | |

54 %\begin{small} | |

55 \begin{tabular}{c} | |

56 Tom St Denis \\ | |

57 Algonquin College \\ | |

58 \\ | |

59 Mads Rasmussen \\ | |

60 Open Communications Security \\ | |

61 \\ | |

62 Greg Rose \\ | |

63 QUALCOMM Australia \\ | |

64 \end{tabular} | |

65 %\end{small} | |

66 } | |

67 } | |

68 \maketitle | |

69 This text has been placed in the public domain. This text corresponds to the v0.30 release of the | |

70 LibTomMath project. | |

71 | |

72 \begin{alltt} | |

73 Tom St Denis | |

74 111 Banning Rd | |

75 Ottawa, Ontario | |

76 K2L 1C3 | |

77 Canada | |

78 | |

79 Phone: 1-613-836-3160 | |

80 Email: [email protected] | |

81 \end{alltt} | |

82 | |

83 This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{} | |

84 {\em book} macro package and the Perl {\em booker} package. | |

85 | |

86 \tableofcontents | |

87 \listoffigures | |

88 \chapter*{Prefaces to the Draft Edition} | |

89 I started this text in April 2003 to complement my LibTomMath library. That is, explain how to implement the functions | |

90 contained in LibTomMath. The goal is to have a textbook that any Computer Science student can use when implementing their | |

91 own multiple precision arithmetic. The plan I wanted to follow was flesh out all the | |

92 ideas and concepts I had floating around in my head and then work on it afterwards refining a little bit at a time. Chance | |

93 would have it that I ended up with my summer off from Algonquin College and I was given four months solid to work on the | |

94 text. | |

95 | |

96 Choosing to not waste any time I dove right into the project even before my spring semester was finished. I wrote a bit | |

97 off and on at first. The moment my exams were finished I jumped into long 12 to 16 hour days. The result after only | |

98 a couple of months was a ten chapter, three hundred page draft that I quickly had distributed to anyone who wanted | |

99 to read it. I had Jean-Luc Cooke print copies for me and I brought them to Crypto'03 in Santa Barbara. So far I have | |

100 managed to grab a certain level of attention having people from around the world ask me for copies of the text was certain | |

101 rewarding. | |

102 | |

103 Now we are past December 2003. By this time I had pictured that I would have at least finished my second draft of the text. | |

104 Currently I am far off from this goal. I've done partial re-writes of chapters one, two and three but they are not even | |

105 finished yet. I haven't given up on the project, only had some setbacks. First O'Reilly declined to publish the text then | |

106 Addison-Wesley and Greg is tried another which I don't know the name of. However, at this point I want to focus my energy | |

107 onto finishing the book not securing a contract. | |

108 | |

109 So why am I writing this text? It seems like a lot of work right? Most certainly it is a lot of work writing a textbook. | |

110 Even the simplest introductory material has to be lined with references and figures. A lot of the text has to be re-written | |

111 from point form to prose form to ensure an easier read. Why am I doing all this work for free then? Simple. My philosophy | |

112 is quite simply ``Open Source. Open Academia. Open Minds'' which means that to achieve a goal of open minds, that is, | |

113 people willing to accept new ideas and explore the unknown you have to make available material they can access freely | |

114 without hinderance. | |

115 | |

116 I've been writing free software since I was about sixteen but only recently have I hit upon software that people have come | |

117 to depend upon. I started LibTomCrypt in December 2001 and now several major companies use it as integral portions of their | |

118 software. Several educational institutions use it as a matter of course and many freelance developers use it as | |

119 part of their projects. To further my contributions I started the LibTomMath project in December 2002 aimed at providing | |

120 multiple precision arithmetic routines that students could learn from. That is write routines that are not only easy | |

121 to understand and follow but provide quite impressive performance considering they are all in standard portable ISO C. | |

122 | |

123 The second leg of my philosophy is ``Open Academia'' which is where this textbook comes in. In the end, when all is | |

124 said and done the text will be useable by educational institutions as a reference on multiple precision arithmetic. | |

125 | |

126 At this time I feel I should share a little information about myself. The most common question I was asked at | |

127 Crypto'03, perhaps just out of professional courtesy, was which school I either taught at or attended. The unfortunate | |

128 truth is that I neither teach at or attend a school of academic reputation. I'm currently at Algonquin College which | |

129 is what I'd like to call ``somewhat academic but mostly vocational'' college. In otherwords, job training. | |

130 | |

131 I'm a 21 year old computer science student mostly self-taught in the areas I am aware of (which includes a half-dozen | |

132 computer science fields, a few fields of mathematics and some English). I look forward to teaching someday but I am | |

133 still far off from that goal. | |

134 | |

135 Now it would be improper for me to not introduce the rest of the texts co-authors. While they are only contributing | |

136 corrections and editorial feedback their support has been tremendously helpful in presenting the concepts laid out | |

137 in the text so far. Greg has always been there for me. He has tracked my LibTom projects since their inception and even | |

138 sent cheques to help pay tuition from time to time. His background has provided a wonderful source to bounce ideas off | |

139 of and improve the quality of my writing. Mads is another fellow who has just ``been there''. I don't even recall what | |

140 his interest in the LibTom projects is but I'm definitely glad he has been around. His ability to catch logical errors | |

141 in my written English have saved me on several occasions to say the least. | |

142 | |

143 What to expect next? Well this is still a rough draft. I've only had the chance to update a few chapters. However, I've | |

144 been getting the feeling that people are starting to use my text and I owe them some updated material. My current tenative | |

145 plan is to edit one chapter every two weeks starting January 4th. It seems insane but my lower course load at college | |

146 should provide ample time. By Crypto'04 I plan to have a 2nd draft of the text polished and ready to hand out to as many | |

147 people who will take it. | |

148 | |

149 \begin{flushright} Tom St Denis \end{flushright} | |

150 | |

151 \newpage | |

152 I found the opportunity to work with Tom appealing for several reasons, not only could I broaden my own horizons, but also | |

153 contribute to educate others facing the problem of having to handle big number mathematical calculations. | |

154 | |

155 This book is Tom's child and he has been caring and fostering the project ever since the beginning with a clear mind of | |

156 how he wanted the project to turn out. I have helped by proofreading the text and we have had several discussions about | |

157 the layout and language used. | |

158 | |

159 I hold a masters degree in cryptography from the University of Southern Denmark and have always been interested in the | |

160 practical aspects of cryptography. | |

161 | |

162 Having worked in the security consultancy business for several years in S\~{a}o Paulo, Brazil, I have been in touch with a | |

163 great deal of work in which multiple precision mathematics was needed. Understanding the possibilities for speeding up | |

164 multiple precision calculations is often very important since we deal with outdated machine architecture where modular | |

165 reductions, for example, become painfully slow. | |

166 | |

167 This text is for people who stop and wonder when first examining algorithms such as RSA for the first time and asks | |

168 themselves, ``You tell me this is only secure for large numbers, fine; but how do you implement these numbers?'' | |

169 | |

170 \begin{flushright} | |

171 Mads Rasmussen | |

172 | |

173 S\~{a}o Paulo - SP | |

174 | |

175 Brazil | |

176 \end{flushright} | |

177 | |

178 \newpage | |

179 It's all because I broke my leg. That just happened to be at about the same time that Tom asked for someone to review the section of the book about | |

180 Karatsuba multiplication. I was laid up, alone and immobile, and thought ``Why not?'' I vaguely knew what Karatsuba multiplication was, but not | |

181 really, so I thought I could help, learn, and stop myself from watching daytime cable TV, all at once. | |

182 | |

183 At the time of writing this, I've still not met Tom or Mads in meatspace. I've been following Tom's progress since his first splash on the | |

184 sci.crypt Usenet news group. I watched him go from a clueless newbie, to the cryptographic equivalent of a reformed smoker, to a real | |

185 contributor to the field, over a period of about two years. I've been impressed with his obvious intelligence, and astounded by his productivity. | |

186 Of course, he's young enough to be my own child, so he doesn't have my problems with staying awake. | |

187 | |

188 When I reviewed that single section of the book, in its very earliest form, I was very pleasantly surprised. So I decided to collaborate more fully, | |

189 and at least review all of it, and perhaps write some bits too. There's still a long way to go with it, and I have watched a number of close | |

190 friends go through the mill of publication, so I think that the way to go is longer than Tom thinks it is. Nevertheless, it's a good effort, | |

191 and I'm pleased to be involved with it. | |

192 | |

193 \begin{flushright} | |

194 Greg Rose, Sydney, Australia, June 2003. | |

195 \end{flushright} | |

196 | |

197 \mainmatter | |

198 \pagestyle{headings} | |

199 \chapter{Introduction} | |

200 \section{Multiple Precision Arithmetic} | |

201 | |

202 \subsection{What is Multiple Precision Arithmetic?} | |

203 When we think of long-hand arithmetic such as addition or multiplication we rarely consider the fact that we instinctively | |

204 raise or lower the precision of the numbers we are dealing with. For example, in decimal we almost immediate can | |

205 reason that $7$ times $6$ is $42$. However, $42$ has two digits of precision as opposed to one digit we started with. | |

206 Further multiplications of say $3$ result in a larger precision result $126$. In these few examples we have multiple | |

207 precisions for the numbers we are working with. Despite the various levels of precision a single subset\footnote{With the occasional optimization.} | |

208 of algorithms can be designed to accomodate them. | |

209 | |

210 By way of comparison a fixed or single precision operation would lose precision on various operations. For example, in | |

211 the decimal system with fixed precision $6 \cdot 7 = 2$. | |

212 | |

213 Essentially at the heart of computer based multiple precision arithmetic are the same long-hand algorithms taught in | |

214 schools to manually add, subtract, multiply and divide. | |

215 | |

216 \subsection{The Need for Multiple Precision Arithmetic} | |

217 The most prevalent need for multiple precision arithmetic, often referred to as ``bignum'' math, is within the implementation | |

218 of public-key cryptography algorithms. Algorithms such as RSA \cite{RSAREF} and Diffie-Hellman \cite{DHREF} require | |

219 integers of significant magnitude to resist known cryptanalytic attacks. For example, at the time of this writing a | |

220 typical RSA modulus would be at least greater than $10^{309}$. However, modern programming languages such as ISO C \cite{ISOC} and | |

221 Java \cite{JAVA} only provide instrinsic support for integers which are relatively small and single precision. | |

222 | |

223 \begin{figure}[!here] | |

224 \begin{center} | |

225 \begin{tabular}{|r|c|} | |

226 \hline \textbf{Data Type} & \textbf{Range} \\ | |

227 \hline char & $-128 \ldots 127$ \\ | |

228 \hline short & $-32768 \ldots 32767$ \\ | |

229 \hline long & $-2147483648 \ldots 2147483647$ \\ | |

230 \hline long long & $-9223372036854775808 \ldots 9223372036854775807$ \\ | |

231 \hline | |

232 \end{tabular} | |

233 \end{center} | |

234 \caption{Typical Data Types for the C Programming Language} | |

235 \label{fig:ISOC} | |

236 \end{figure} | |

237 | |

238 The largest data type guaranteed to be provided by the ISO C programming | |

239 language\footnote{As per the ISO C standard. However, each compiler vendor is allowed to augment the precision as they | |

240 see fit.} can only represent values up to $10^{19}$ as shown in figure \ref{fig:ISOC}. On its own the C language is | |

241 insufficient to accomodate the magnitude required for the problem at hand. An RSA modulus of magnitude $10^{19}$ could be | |

242 trivially factored\footnote{A Pollard-Rho factoring would take only $2^{16}$ time.} on the average desktop computer, | |

243 rendering any protocol based on the algorithm insecure. Multiple precision algorithms solve this very problem by | |

244 extending the range of representable integers while using single precision data types. | |

245 | |

246 Most advancements in fast multiple precision arithmetic stem from the need for faster and more efficient cryptographic | |

247 primitives. Faster modular reduction and exponentiation algorithms such as Barrett's algorithm, which have appeared in | |

248 various cryptographic journals, can render algorithms such as RSA and Diffie-Hellman more efficient. In fact, several | |

249 major companies such as RSA Security, Certicom and Entrust have built entire product lines on the implementation and | |

250 deployment of efficient algorithms. | |

251 | |

252 However, cryptography is not the only field of study that can benefit from fast multiple precision integer routines. | |

253 Another auxiliary use of multiple precision integers is high precision floating point data types. | |

254 The basic IEEE \cite{IEEE} standard floating point type is made up of an integer mantissa $q$, an exponent $e$ and a sign bit $s$. | |

255 Numbers are given in the form $n = q \cdot b^e \cdot -1^s$ where $b = 2$ is the most common base for IEEE. Since IEEE | |

256 floating point is meant to be implemented in hardware the precision of the mantissa is often fairly small | |

257 (\textit{23, 48 and 64 bits}). The mantissa is merely an integer and a multiple precision integer could be used to create | |

258 a mantissa of much larger precision than hardware alone can efficiently support. This approach could be useful where | |

259 scientific applications must minimize the total output error over long calculations. | |

260 | |

261 Another use for large integers is within arithmetic on polynomials of large characteristic (i.e. $GF(p)[x]$ for large $p$). | |

262 In fact the library discussed within this text has already been used to form a polynomial basis library\footnote{See \url{http://poly.libtomcrypt.org} for more details.}. | |

263 | |

264 \subsection{Benefits of Multiple Precision Arithmetic} | |

265 \index{precision} | |

266 The benefit of multiple precision representations over single or fixed precision representations is that | |

267 no precision is lost while representing the result of an operation which requires excess precision. For example, | |

268 the product of two $n$-bit integers requires at least $2n$ bits of precision to be represented faithfully. A multiple | |

269 precision algorithm would augment the precision of the destination to accomodate the result while a single precision system | |

270 would truncate excess bits to maintain a fixed level of precision. | |

271 | |

272 It is possible to implement algorithms which require large integers with fixed precision algorithms. For example, elliptic | |

273 curve cryptography (\textit{ECC}) is often implemented on smartcards by fixing the precision of the integers to the maximum | |

274 size the system will ever need. Such an approach can lead to vastly simpler algorithms which can accomodate the | |

275 integers required even if the host platform cannot natively accomodate them\footnote{For example, the average smartcard | |

276 processor has an 8 bit accumulator.}. However, as efficient as such an approach may be, the resulting source code is not | |

277 normally very flexible. It cannot, at runtime, accomodate inputs of higher magnitude than the designer anticipated. | |

278 | |

279 Multiple precision algorithms have the most overhead of any style of arithmetic. For the the most part the | |

280 overhead can be kept to a minimum with careful planning, but overall, it is not well suited for most memory starved | |

281 platforms. However, multiple precision algorithms do offer the most flexibility in terms of the magnitude of the | |

282 inputs. That is, the same algorithms based on multiple precision integers can accomodate any reasonable size input | |

283 without the designer's explicit forethought. This leads to lower cost of ownership for the code as it only has to | |

284 be written and tested once. | |

285 | |

286 \section{Purpose of This Text} | |

287 The purpose of this text is to instruct the reader regarding how to implement efficient multiple precision algorithms. | |

288 That is to not only explain a limited subset of the core theory behind the algorithms but also the various ``house keeping'' | |

289 elements that are neglected by authors of other texts on the subject. Several well reknowned texts \cite{TAOCPV2,HAC} | |

290 give considerably detailed explanations of the theoretical aspects of algorithms and often very little information | |

291 regarding the practical implementation aspects. | |

292 | |

293 In most cases how an algorithm is explained and how it is actually implemented are two very different concepts. For | |

294 example, the Handbook of Applied Cryptography (\textit{HAC}), algorithm 14.7 on page 594, gives a relatively simple | |

295 algorithm for performing multiple precision integer addition. However, the description lacks any discussion concerning | |

296 the fact that the two integer inputs may be of differing magnitudes. As a result the implementation is not as simple | |

297 as the text would lead people to believe. Similarly the division routine (\textit{algorithm 14.20, pp. 598}) does not | |

298 discuss how to handle sign or handle the dividend's decreasing magnitude in the main loop (\textit{step \#3}). | |

299 | |

300 Both texts also do not discuss several key optimal algorithms required such as ``Comba'' and Karatsuba multipliers | |

301 and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve | |

302 any form of useful performance in non-trivial applications. | |

303 | |

304 To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer | |

305 package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.org}} package is used | |

306 to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field | |

307 tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text | |

308 discusses a very large portion of the inner workings of the library. | |

309 | |

310 The algorithms that are presented will always include at least one ``pseudo-code'' description followed | |

311 by the actual C source code that implements the algorithm. The pseudo-code can be used to implement the same | |

312 algorithm in other programming languages as the reader sees fit. | |

313 | |

314 This text shall also serve as a walkthrough of the creation of multiple precision algorithms from scratch. Showing | |

315 the reader how the algorithms fit together as well as where to start on various taskings. | |

316 | |

317 \section{Discussion and Notation} | |

318 \subsection{Notation} | |

319 A multiple precision integer of $n$-digits shall be denoted as $x = (x_{n-1} ... x_1 x_0)_{ \beta }$ and represent | |

320 the integer $x \equiv \sum_{i=0}^{n-1} x_i\beta^i$. The elements of the array $x$ are said to be the radix $\beta$ digits | |

321 of the integer. For example, $x = (1,2,3)_{10}$ would represent the integer | |

322 $1\cdot 10^2 + 2\cdot10^1 + 3\cdot10^0 = 123$. | |

323 | |

324 \index{mp\_int} | |

325 The term ``mp\_int'' shall refer to a composite structure which contains the digits of the integer it represents, as well | |

326 as auxilary data required to manipulate the data. These additional members are discussed further in section | |

327 \ref{sec:MPINT}. For the purposes of this text a ``multiple precision integer'' and an ``mp\_int'' are assumed to be | |

328 synonymous. When an algorithm is specified to accept an mp\_int variable it is assumed the various auxliary data members | |

329 are present as well. An expression of the type \textit{variablename.item} implies that it should evaluate to the | |

330 member named ``item'' of the variable. For example, a string of characters may have a member ``length'' which would | |

331 evaluate to the number of characters in the string. If the string $a$ equals ``hello'' then it follows that | |

332 $a.length = 5$. | |

333 | |

334 For certain discussions more generic algorithms are presented to help the reader understand the final algorithm used | |

335 to solve a given problem. When an algorithm is described as accepting an integer input it is assumed the input is | |

336 a plain integer with no additional multiple-precision members. That is, algorithms that use integers as opposed to | |

337 mp\_ints as inputs do not concern themselves with the housekeeping operations required such as memory management. These | |

338 algorithms will be used to establish the relevant theory which will subsequently be used to describe a multiple | |

339 precision algorithm to solve the same problem. | |

340 | |

341 \subsection{Precision Notation} | |

342 For the purposes of this text a single precision variable must be able to represent integers in the range | |

343 $0 \le x < q \beta$ while a double precision variable must be able to represent integers in the range | |

344 $0 \le x < q \beta^2$. The variable $\beta$ represents the radix of a single digit of a multiple precision integer and | |

345 must be of the form $q^p$ for $q, p \in \Z^+$. The extra radix-$q$ factor allows additions and subtractions to proceed | |

346 without truncation of the carry. Since all modern computers are binary, it is assumed that $q$ is two, for all intents | |

347 and purposes. | |

348 | |

349 \index{mp\_digit} \index{mp\_word} | |

350 Within the source code that will be presented for each algorithm, the data type \textbf{mp\_digit} will represent | |

351 a single precision integer type, while, the data type \textbf{mp\_word} will represent a double precision integer type. In | |

352 several algorithms (notably the Comba routines) temporary results will be stored in arrays of double precision mp\_words. | |

353 For the purposes of this text $x_j$ will refer to the $j$'th digit of a single precision array and $\hat x_j$ will refer to | |

354 the $j$'th digit of a double precision array. Whenever an expression is to be assigned to a double precision | |

355 variable it is assumed that all single precision variables are promoted to double precision during the evaluation. | |

356 Expressions that are assigned to a single precision variable are truncated to fit within the precision of a single | |

357 precision data type. | |

358 | |

359 For example, if $\beta = 10^2$ a single precision data type may represent a value in the | |

360 range $0 \le x < 10^3$, while a double precision data type may represent a value in the range $0 \le x < 10^5$. Let | |

361 $a = 23$ and $b = 49$ represent two single precision variables. The single precision product shall be written | |

362 as $c \leftarrow a \cdot b$ while the double precision product shall be written as $\hat c \leftarrow a \cdot b$. | |

363 In this particular case, $\hat c = 1127$ and $c = 127$. The most significant digit of the product would not fit | |

364 in a single precision data type and as a result $c \ne \hat c$. | |

365 | |

366 \subsection{Algorithm Inputs and Outputs} | |

367 Within the algorithm descriptions all variables are assumed to be scalars of either single or double precision | |

368 as indicated. The only exception to this rule is when variables have been indicated to be of type mp\_int. This | |

369 distinction is important as scalars are often used as array indicies and various other counters. | |

370 | |

371 \subsection{Mathematical Expressions} | |

372 The $\lfloor \mbox{ } \rfloor$ brackets imply an expression truncated to an integer not greater than the expression | |

373 itself. For example, $\lfloor 5.7 \rfloor = 5$. Similarly the $\lceil \mbox{ } \rceil$ brackets imply an expression | |

374 rounded to an integer not less than the expression itself. For example, $\lceil 5.1 \rceil = 6$. Typically when | |

375 the $/$ division symbol is used the intention is to perform an integer division with truncation. For example, | |

376 $5/2 = 2$ which will often be written as $\lfloor 5/2 \rfloor = 2$ for clarity. When an expression is written as a | |

377 fraction a real value division is implied, for example ${5 \over 2} = 2.5$. | |

378 | |

379 The norm of a multiple precision integer, for example, $\vert \vert x \vert \vert$ will be used to represent the number of digits in the representation | |

380 of the integer. For example, $\vert \vert 123 \vert \vert = 3$ and $\vert \vert 79452 \vert \vert = 5$. | |

381 | |

382 \subsection{Work Effort} | |

383 \index{big-Oh} | |

384 To measure the efficiency of the specified algorithms, a modified big-Oh notation is used. In this system all | |

385 single precision operations are considered to have the same cost\footnote{Except where explicitly noted.}. | |

386 That is a single precision addition, multiplication and division are assumed to take the same time to | |

387 complete. While this is generally not true in practice, it will simplify the discussions considerably. | |

388 | |

389 Some algorithms have slight advantages over others which is why some constants will not be removed in | |

390 the notation. For example, a normal baseline multiplication (section \ref{sec:basemult}) requires $O(n^2)$ work while a | |

391 baseline squaring (section \ref{sec:basesquare}) requires $O({{n^2 + n}\over 2})$ work. In standard big-Oh notation these | |

392 would both be said to be equivalent to $O(n^2)$. However, | |

393 in the context of the this text this is not the case as the magnitude of the inputs will typically be rather small. As a | |

394 result small constant factors in the work effort will make an observable difference in algorithm efficiency. | |

395 | |

396 All of the algorithms presented in this text have a polynomial time work level. That is, of the form | |

397 $O(n^k)$ for $n, k \in \Z^{+}$. This will help make useful comparisons in terms of the speed of the algorithms and how | |

398 various optimizations will help pay off in the long run. | |

399 | |

400 \section{Exercises} | |

401 Within the more advanced chapters a section will be set aside to give the reader some challenging exercises related to | |

402 the discussion at hand. These exercises are not designed to be prize winning problems, but instead to be thought | |

403 provoking. Wherever possible the problems are forward minded, stating problems that will be answered in subsequent | |

404 chapters. The reader is encouraged to finish the exercises as they appear to get a better understanding of the | |

405 subject material. | |

406 | |

407 That being said, the problems are designed to affirm knowledge of a particular subject matter. Students in particular | |

408 are encouraged to verify they can answer the problems correctly before moving on. | |

409 | |

410 Similar to the exercises of \cite[pp. ix]{TAOCPV2} these exercises are given a scoring system based on the difficulty of | |

411 the problem. However, unlike \cite{TAOCPV2} the problems do not get nearly as hard. The scoring of these | |

412 exercises ranges from one (the easiest) to five (the hardest). The following table sumarizes the | |

413 scoring system used. | |

414 | |

415 \begin{figure}[here] | |

416 \begin{center} | |

417 \begin{small} | |

418 \begin{tabular}{|c|l|} | |

419 \hline $\left [ 1 \right ]$ & An easy problem that should only take the reader a manner of \\ | |

420 & minutes to solve. Usually does not involve much computer time \\ | |

421 & to solve. \\ | |

422 \hline $\left [ 2 \right ]$ & An easy problem that involves a marginal amount of computer \\ | |

423 & time usage. Usually requires a program to be written to \\ | |

424 & solve the problem. \\ | |

425 \hline $\left [ 3 \right ]$ & A moderately hard problem that requires a non-trivial amount \\ | |

426 & of work. Usually involves trivial research and development of \\ | |

427 & new theory from the perspective of a student. \\ | |

428 \hline $\left [ 4 \right ]$ & A moderately hard problem that involves a non-trivial amount \\ | |

429 & of work and research, the solution to which will demonstrate \\ | |

430 & a higher mastery of the subject matter. \\ | |

431 \hline $\left [ 5 \right ]$ & A hard problem that involves concepts that are difficult for a \\ | |

432 & novice to solve. Solutions to these problems will demonstrate a \\ | |

433 & complete mastery of the given subject. \\ | |

434 \hline | |

435 \end{tabular} | |

436 \end{small} | |

437 \end{center} | |

438 \caption{Exercise Scoring System} | |

439 \end{figure} | |

440 | |

441 Problems at the first level are meant to be simple questions that the reader can answer quickly without programming a solution or | |

442 devising new theory. These problems are quick tests to see if the material is understood. Problems at the second level | |

443 are also designed to be easy but will require a program or algorithm to be implemented to arrive at the answer. These | |

444 two levels are essentially entry level questions. | |

445 | |

446 Problems at the third level are meant to be a bit more difficult than the first two levels. The answer is often | |

447 fairly obvious but arriving at an exacting solution requires some thought and skill. These problems will almost always | |

448 involve devising a new algorithm or implementing a variation of another algorithm previously presented. Readers who can | |

449 answer these questions will feel comfortable with the concepts behind the topic at hand. | |

450 | |

451 Problems at the fourth level are meant to be similar to those of the level three questions except they will require | |

452 additional research to be completed. The reader will most likely not know the answer right away, nor will the text provide | |

453 the exact details of the answer until a subsequent chapter. | |

454 | |

455 Problems at the fifth level are meant to be the hardest | |

456 problems relative to all the other problems in the chapter. People who can correctly answer fifth level problems have a | |

457 mastery of the subject matter at hand. | |

458 | |

459 Often problems will be tied together. The purpose of this is to start a chain of thought that will be discussed in future chapters. The reader | |

460 is encouraged to answer the follow-up problems and try to draw the relevance of problems. | |

461 | |

462 \section{Introduction to LibTomMath} | |

463 | |

464 \subsection{What is LibTomMath?} | |

465 LibTomMath is a free and open source multiple precision integer library written entirely in portable ISO C. By portable it | |

466 is meant that the library does not contain any code that is computer platform dependent or otherwise problematic to use on | |

467 any given platform. | |

468 | |

469 The library has been successfully tested under numerous operating systems including Unix\footnote{All of these | |

470 trademarks belong to their respective rightful owners.}, MacOS, Windows, Linux, PalmOS and on standalone hardware such | |

471 as the Gameboy Advance. The library is designed to contain enough functionality to be able to develop applications such | |

472 as public key cryptosystems and still maintain a relatively small footprint. | |

473 | |

474 \subsection{Goals of LibTomMath} | |

475 | |

476 Libraries which obtain the most efficiency are rarely written in a high level programming language such as C. However, | |

477 even though this library is written entirely in ISO C, considerable care has been taken to optimize the algorithm implementations within the | |

478 library. Specifically the code has been written to work well with the GNU C Compiler (\textit{GCC}) on both x86 and ARM | |

479 processors. Wherever possible, highly efficient algorithms, such as Karatsuba multiplication, sliding window | |

480 exponentiation and Montgomery reduction have been provided to make the library more efficient. | |

481 | |

482 Even with the nearly optimal and specialized algorithms that have been included the Application Programing Interface | |

483 (\textit{API}) has been kept as simple as possible. Often generic place holder routines will make use of specialized | |

484 algorithms automatically without the developer's specific attention. One such example is the generic multiplication | |

485 algorithm \textbf{mp\_mul()} which will automatically use Toom--Cook, Karatsuba, Comba or baseline multiplication | |

486 based on the magnitude of the inputs and the configuration of the library. | |

487 | |

488 Making LibTomMath as efficient as possible is not the only goal of the LibTomMath project. Ideally the library should | |

489 be source compatible with another popular library which makes it more attractive for developers to use. In this case the | |

490 MPI library was used as a API template for all the basic functions. MPI was chosen because it is another library that fits | |

491 in the same niche as LibTomMath. Even though LibTomMath uses MPI as the template for the function names and argument | |

492 passing conventions, it has been written from scratch by Tom St Denis. | |

493 | |

494 The project is also meant to act as a learning tool for students, the logic being that no easy-to-follow ``bignum'' | |

495 library exists which can be used to teach computer science students how to perform fast and reliable multiple precision | |

496 integer arithmetic. To this end the source code has been given quite a few comments and algorithm discussion points. | |

497 | |

498 \section{Choice of LibTomMath} | |

499 LibTomMath was chosen as the case study of this text not only because the author of both projects is one and the same but | |

500 for more worthy reasons. Other libraries such as GMP \cite{GMP}, MPI \cite{MPI}, LIP \cite{LIP} and OpenSSL | |

501 \cite{OPENSSL} have multiple precision integer arithmetic routines but would not be ideal for this text for | |

502 reasons that will be explained in the following sub-sections. | |

503 | |

504 \subsection{Code Base} | |

505 The LibTomMath code base is all portable ISO C source code. This means that there are no platform dependent conditional | |

506 segments of code littered throughout the source. This clean and uncluttered approach to the library means that a | |

507 developer can more readily discern the true intent of a given section of source code without trying to keep track of | |

508 what conditional code will be used. | |

509 | |

510 The code base of LibTomMath is well organized. Each function is in its own separate source code file | |

511 which allows the reader to find a given function very quickly. On average there are $76$ lines of code per source | |

512 file which makes the source very easily to follow. By comparison MPI and LIP are single file projects making code tracing | |

513 very hard. GMP has many conditional code segments which also hinder tracing. | |

514 | |

515 When compiled with GCC for the x86 processor and optimized for speed the entire library is approximately $100$KiB\footnote{The notation ``KiB'' means $2^{10}$ octets, similarly ``MiB'' means $2^{20}$ octets.} | |

516 which is fairly small compared to GMP (over $250$KiB). LibTomMath is slightly larger than MPI (which compiles to about | |

517 $50$KiB) but LibTomMath is also much faster and more complete than MPI. | |

518 | |

519 \subsection{API Simplicity} | |

520 LibTomMath is designed after the MPI library and shares the API design. Quite often programs that use MPI will build | |

521 with LibTomMath without change. The function names correlate directly to the action they perform. Almost all of the | |

522 functions share the same parameter passing convention. The learning curve is fairly shallow with the API provided | |

523 which is an extremely valuable benefit for the student and developer alike. | |

524 | |

525 The LIP library is an example of a library with an API that is awkward to work with. LIP uses function names that are often ``compressed'' to | |

526 illegible short hand. LibTomMath does not share this characteristic. | |

527 | |

528 The GMP library also does not return error codes. Instead it uses a POSIX.1 \cite{POSIX1} signal system where errors | |

529 are signaled to the host application. This happens to be the fastest approach but definitely not the most versatile. In | |

530 effect a math error (i.e. invalid input, heap error, etc) can cause a program to stop functioning which is definitely | |

531 undersireable in many situations. | |

532 | |

533 \subsection{Optimizations} | |

534 While LibTomMath is certainly not the fastest library (GMP often beats LibTomMath by a factor of two) it does | |

535 feature a set of optimal algorithms for tasks such as modular reduction, exponentiation, multiplication and squaring. GMP | |

536 and LIP also feature such optimizations while MPI only uses baseline algorithms with no optimizations. GMP lacks a few | |

537 of the additional modular reduction optimizations that LibTomMath features\footnote{At the time of this writing GMP | |

538 only had Barrett and Montgomery modular reduction algorithms.}. | |

539 | |

540 LibTomMath is almost always an order of magnitude faster than the MPI library at computationally expensive tasks such as modular | |

541 exponentiation. In the grand scheme of ``bignum'' libraries LibTomMath is faster than the average library and usually | |

542 slower than the best libraries such as GMP and OpenSSL by only a small factor. | |

543 | |

544 \subsection{Portability and Stability} | |

545 LibTomMath will build ``out of the box'' on any platform equipped with a modern version of the GNU C Compiler | |

546 (\textit{GCC}). This means that without changes the library will build without configuration or setting up any | |

547 variables. LIP and MPI will build ``out of the box'' as well but have numerous known bugs. Most notably the author of | |

548 MPI has recently stopped working on his library and LIP has long since been discontinued. | |

549 | |

550 GMP requires a configuration script to run and will not build out of the box. GMP and LibTomMath are still in active | |

551 development and are very stable across a variety of platforms. | |

552 | |

553 \subsection{Choice} | |

554 LibTomMath is a relatively compact, well documented, highly optimized and portable library which seems only natural for | |

555 the case study of this text. Various source files from the LibTomMath project will be included within the text. However, | |

556 the reader is encouraged to download their own copy of the library to actually be able to work with the library. | |

557 | |

558 \chapter{Getting Started} | |

559 \section{Library Basics} | |

560 The trick to writing any useful library of source code is to build a solid foundation and work outwards from it. First, | |

561 a problem along with allowable solution parameters should be identified and analyzed. In this particular case the | |

562 inability to accomodate multiple precision integers is the problem. Futhermore, the solution must be written | |

563 as portable source code that is reasonably efficient across several different computer platforms. | |

564 | |

565 After a foundation is formed the remainder of the library can be designed and implemented in a hierarchical fashion. | |

566 That is, to implement the lowest level dependencies first and work towards the most abstract functions last. For example, | |

567 before implementing a modular exponentiation algorithm one would implement a modular reduction algorithm. | |

568 By building outwards from a base foundation instead of using a parallel design methodology the resulting project is | |

569 highly modular. Being highly modular is a desirable property of any project as it often means the resulting product | |

570 has a small footprint and updates are easy to perform. | |

571 | |

572 Usually when I start a project I will begin with the header file. I define the data types I think I will need and | |

573 prototype the initial functions that are not dependent on other functions (within the library). After I | |

574 implement these base functions I prototype more dependent functions and implement them. The process repeats until | |

575 I implement all of the functions I require. For example, in the case of LibTomMath I implemented functions such as | |

576 mp\_init() well before I implemented mp\_mul() and even further before I implemented mp\_exptmod(). As an example as to | |

577 why this design works note that the Karatsuba and Toom-Cook multipliers were written \textit{after} the | |

578 dependent function mp\_exptmod() was written. Adding the new multiplication algorithms did not require changes to the | |

579 mp\_exptmod() function itself and lowered the total cost of ownership (\textit{so to speak}) and of development | |

580 for new algorithms. This methodology allows new algorithms to be tested in a complete framework with relative ease. | |

581 | |

582 FIGU,design_process,Design Flow of the First Few Original LibTomMath Functions. | |

583 | |

584 Only after the majority of the functions were in place did I pursue a less hierarchical approach to auditing and optimizing | |

585 the source code. For example, one day I may audit the multipliers and the next day the polynomial basis functions. | |

586 | |

587 It only makes sense to begin the text with the preliminary data types and support algorithms required as well. | |

588 This chapter discusses the core algorithms of the library which are the dependents for every other algorithm. | |

589 | |

590 \section{What is a Multiple Precision Integer?} | |

591 Recall that most programming languages, in particular ISO C \cite{ISOC}, only have fixed precision data types that on their own cannot | |

592 be used to represent values larger than their precision will allow. The purpose of multiple precision algorithms is | |

593 to use fixed precision data types to create and manipulate multiple precision integers which may represent values | |

594 that are very large. | |

595 | |

596 As a well known analogy, school children are taught how to form numbers larger than nine by prepending more radix ten digits. In the decimal system | |

597 the largest single digit value is $9$. However, by concatenating digits together larger numbers may be represented. Newly prepended digits | |

598 (\textit{to the left}) are said to be in a different power of ten column. That is, the number $123$ can be described as having a $1$ in the hundreds | |

599 column, $2$ in the tens column and $3$ in the ones column. Or more formally $123 = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0$. Computer based | |

600 multiple precision arithmetic is essentially the same concept. Larger integers are represented by adjoining fixed | |

601 precision computer words with the exception that a different radix is used. | |

602 | |

603 What most people probably do not think about explicitly are the various other attributes that describe a multiple precision | |

604 integer. For example, the integer $154_{10}$ has two immediately obvious properties. First, the integer is positive, | |

605 that is the sign of this particular integer is positive as opposed to negative. Second, the integer has three digits in | |

606 its representation. There is an additional property that the integer posesses that does not concern pencil-and-paper | |

607 arithmetic. The third property is how many digits placeholders are available to hold the integer. | |

608 | |

609 The human analogy of this third property is ensuring there is enough space on the paper to write the integer. For example, | |

610 if one starts writing a large number too far to the right on a piece of paper they will have to erase it and move left. | |

611 Similarly, computer algorithms must maintain strict control over memory usage to ensure that the digits of an integer | |

612 will not exceed the allowed boundaries. These three properties make up what is known as a multiple precision | |

613 integer or mp\_int for short. | |

614 | |

615 \subsection{The mp\_int Structure} | |

616 \label{sec:MPINT} | |

617 The mp\_int structure is the ISO C based manifestation of what represents a multiple precision integer. The ISO C standard does not provide for | |

618 any such data type but it does provide for making composite data types known as structures. The following is the structure definition | |

619 used within LibTomMath. | |

620 | |

621 \index{mp\_int} | |

622 \begin{verbatim} | |

623 typedef struct { | |

624 int used, alloc, sign; | |

625 mp_digit *dp; | |

626 } mp_int; | |

627 \end{verbatim} | |

628 | |

629 The mp\_int structure can be broken down as follows. | |

630 | |

631 \begin{enumerate} | |

632 \item The \textbf{used} parameter denotes how many digits of the array \textbf{dp} contain the digits used to represent | |

633 a given integer. The \textbf{used} count must be positive (or zero) and may not exceed the \textbf{alloc} count. | |

634 | |

635 \item The \textbf{alloc} parameter denotes how | |

636 many digits are available in the array to use by functions before it has to increase in size. When the \textbf{used} count | |

637 of a result would exceed the \textbf{alloc} count all of the algorithms will automatically increase the size of the | |

638 array to accommodate the precision of the result. | |

639 | |

640 \item The pointer \textbf{dp} points to a dynamically allocated array of digits that represent the given multiple | |

641 precision integer. It is padded with $(\textbf{alloc} - \textbf{used})$ zero digits. The array is maintained in a least | |

642 significant digit order. As a pencil and paper analogy the array is organized such that the right most digits are stored | |

643 first starting at the location indexed by zero\footnote{In C all arrays begin at zero.} in the array. For example, | |

644 if \textbf{dp} contains $\lbrace a, b, c, \ldots \rbrace$ where \textbf{dp}$_0 = a$, \textbf{dp}$_1 = b$, \textbf{dp}$_2 = c$, $\ldots$ then | |

645 it would represent the integer $a + b\beta + c\beta^2 + \ldots$ | |

646 | |

647 \index{MP\_ZPOS} \index{MP\_NEG} | |

648 \item The \textbf{sign} parameter denotes the sign as either zero/positive (\textbf{MP\_ZPOS}) or negative (\textbf{MP\_NEG}). | |

649 \end{enumerate} | |

650 | |

651 \subsubsection{Valid mp\_int Structures} | |

652 Several rules are placed on the state of an mp\_int structure and are assumed to be followed for reasons of efficiency. | |

653 The only exceptions are when the structure is passed to initialization functions such as mp\_init() and mp\_init\_copy(). | |

654 | |

655 \begin{enumerate} | |

656 \item The value of \textbf{alloc} may not be less than one. That is \textbf{dp} always points to a previously allocated | |

657 array of digits. | |

658 \item The value of \textbf{used} may not exceed \textbf{alloc} and must be greater than or equal to zero. | |

659 \item The value of \textbf{used} implies the digit at index $(used - 1)$ of the \textbf{dp} array is non-zero. That is, | |

660 leading zero digits in the most significant positions must be trimmed. | |

661 \begin{enumerate} | |

662 \item Digits in the \textbf{dp} array at and above the \textbf{used} location must be zero. | |

663 \end{enumerate} | |

664 \item The value of \textbf{sign} must be \textbf{MP\_ZPOS} if \textbf{used} is zero; | |

665 this represents the mp\_int value of zero. | |

666 \end{enumerate} | |

667 | |

668 \section{Argument Passing} | |

669 A convention of argument passing must be adopted early on in the development of any library. Making the function | |

670 prototypes consistent will help eliminate many headaches in the future as the library grows to significant complexity. | |

671 In LibTomMath the multiple precision integer functions accept parameters from left to right as pointers to mp\_int | |

672 structures. That means that the source (input) operands are placed on the left and the destination (output) on the right. | |

673 Consider the following examples. | |

674 | |

675 \begin{verbatim} | |

676 mp_mul(&a, &b, &c); /* c = a * b */ | |

677 mp_add(&a, &b, &a); /* a = a + b */ | |

678 mp_sqr(&a, &b); /* b = a * a */ | |

679 \end{verbatim} | |

680 | |

681 The left to right order is a fairly natural way to implement the functions since it lets the developer read aloud the | |

682 functions and make sense of them. For example, the first function would read ``multiply a and b and store in c''. | |

683 | |

684 Certain libraries (\textit{LIP by Lenstra for instance}) accept parameters the other way around, to mimic the order | |

685 of assignment expressions. That is, the destination (output) is on the left and arguments (inputs) are on the right. In | |

686 truth, it is entirely a matter of preference. In the case of LibTomMath the convention from the MPI library has been | |

687 adopted. | |

688 | |

689 Another very useful design consideration, provided for in LibTomMath, is whether to allow argument sources to also be a | |

690 destination. For example, the second example (\textit{mp\_add}) adds $a$ to $b$ and stores in $a$. This is an important | |

691 feature to implement since it allows the calling functions to cut down on the number of variables it must maintain. | |

692 However, to implement this feature specific care has to be given to ensure the destination is not modified before the | |

693 source is fully read. | |

694 | |

695 \section{Return Values} | |

696 A well implemented application, no matter what its purpose, should trap as many runtime errors as possible and return them | |

697 to the caller. By catching runtime errors a library can be guaranteed to prevent undefined behaviour. However, the end | |

698 developer can still manage to cause a library to crash. For example, by passing an invalid pointer an application may | |

699 fault by dereferencing memory not owned by the application. | |

700 | |

701 In the case of LibTomMath the only errors that are checked for are related to inappropriate inputs (division by zero for | |

702 instance) and memory allocation errors. It will not check that the mp\_int passed to any function is valid nor | |

703 will it check pointers for validity. Any function that can cause a runtime error will return an error code as an | |

704 \textbf{int} data type with one of the following values. | |

705 | |

706 \index{MP\_OKAY} \index{MP\_VAL} \index{MP\_MEM} | |

707 \begin{center} | |

708 \begin{tabular}{|l|l|} | |

709 \hline \textbf{Value} & \textbf{Meaning} \\ | |

710 \hline \textbf{MP\_OKAY} & The function was successful \\ | |

711 \hline \textbf{MP\_VAL} & One of the input value(s) was invalid \\ | |

712 \hline \textbf{MP\_MEM} & The function ran out of heap memory \\ | |

713 \hline | |

714 \end{tabular} | |

715 \end{center} | |

716 | |

717 When an error is detected within a function it should free any memory it allocated, often during the initialization of | |

718 temporary mp\_ints, and return as soon as possible. The goal is to leave the system in the same state it was when the | |

719 function was called. Error checking with this style of API is fairly simple. | |

720 | |

721 \begin{verbatim} | |

722 int err; | |

723 if ((err = mp_add(&a, &b, &c)) != MP_OKAY) { | |

724 printf("Error: %s\n", mp_error_to_string(err)); | |

725 exit(EXIT_FAILURE); | |

726 } | |

727 \end{verbatim} | |

728 | |

729 The GMP \cite{GMP} library uses C style \textit{signals} to flag errors which is of questionable use. Not all errors are fatal | |

730 and it was not deemed ideal by the author of LibTomMath to force developers to have signal handlers for such cases. | |

731 | |

732 \section{Initialization and Clearing} | |

733 The logical starting point when actually writing multiple precision integer functions is the initialization and | |

734 clearing of the mp\_int structures. These two algorithms will be used by the majority of the higher level algorithms. | |

735 | |

736 Given the basic mp\_int structure an initialization routine must first allocate memory to hold the digits of | |

737 the integer. Often it is optimal to allocate a sufficiently large pre-set number of digits even though | |

738 the initial integer will represent zero. If only a single digit were allocated quite a few subsequent re-allocations | |

739 would occur when operations are performed on the integers. There is a tradeoff between how many default digits to allocate | |

740 and how many re-allocations are tolerable. Obviously allocating an excessive amount of digits initially will waste | |

741 memory and become unmanageable. | |

742 | |

743 If the memory for the digits has been successfully allocated then the rest of the members of the structure must | |

744 be initialized. Since the initial state of an mp\_int is to represent the zero integer, the allocated digits must be set | |

745 to zero. The \textbf{used} count set to zero and \textbf{sign} set to \textbf{MP\_ZPOS}. | |

746 | |

747 \subsection{Initializing an mp\_int} | |

748 An mp\_int is said to be initialized if it is set to a valid, preferably default, state such that all of the members of the | |

749 structure are set to valid values. The mp\_init algorithm will perform such an action. | |

750 | |

751 \begin{figure}[here] | |

752 \begin{center} | |

753 \begin{tabular}{l} | |

754 \hline Algorithm \textbf{mp\_init}. \\ | |

755 \textbf{Input}. An mp\_int $a$ \\ | |

756 \textbf{Output}. Allocate memory and initialize $a$ to a known valid mp\_int state. \\ | |

757 \hline \\ | |

758 1. Allocate memory for \textbf{MP\_PREC} digits. \\ | |

759 2. If the allocation failed return(\textit{MP\_MEM}) \\ | |

760 3. for $n$ from $0$ to $MP\_PREC - 1$ do \\ | |

761 \hspace{3mm}3.1 $a_n \leftarrow 0$\\ | |

762 4. $a.sign \leftarrow MP\_ZPOS$\\ | |

763 5. $a.used \leftarrow 0$\\ | |

764 6. $a.alloc \leftarrow MP\_PREC$\\ | |

765 7. Return(\textit{MP\_OKAY})\\ | |

766 \hline | |

767 \end{tabular} | |

768 \end{center} | |

769 \caption{Algorithm mp\_init} | |

770 \end{figure} | |

771 | |

772 \textbf{Algorithm mp\_init.} | |

773 The \textbf{MP\_PREC} name represents a constant\footnote{Defined in the ``tommath.h'' header file within LibTomMath.} | |

774 used to dictate the minimum precision of allocated mp\_int integers. Ideally, it is at least equal to $32$ since for most | |

775 purposes that will be more than enough. | |

776 | |

777 Memory for the default number of digits is allocated first. If the allocation fails the algorithm returns immediately | |

778 with the \textbf{MP\_MEM} error code. If the allocation succeeds the remaining members of the mp\_int structure | |

779 must be initialized to reflect the default initial state. | |

780 | |

781 The allocated digits are all set to zero (step three) to ensure they are in a known state. The \textbf{sign}, \textbf{used} | |

782 and \textbf{alloc} are subsequently initialized to represent the zero integer. By step seven the algorithm returns a success | |

783 code and the mp\_int $a$ has been successfully initialized to a valid state representing the integer zero. | |

784 | |

785 \textbf{Remark.} | |

786 This function introduces the idiosyncrasy that all iterative loops, commonly initiated with the ``for'' keyword, iterate incrementally | |

787 when the ``to'' keyword is placed between two expressions. For example, ``for $a$ from $b$ to $c$ do'' means that | |

788 a subsequent expression (or body of expressions) are to be evaluated upto $c - b$ times so long as $b \le c$. In each | |

789 iteration the variable $a$ is substituted for a new integer that lies inclusively between $b$ and $c$. If $b > c$ occured | |

790 the loop would not iterate. By contrast if the ``downto'' keyword were used in place of ``to'' the loop would iterate | |

791 decrementally. | |

792 | |

793 EXAM,bn_mp_init.c | |

794 | |

795 One immediate observation of this initializtion function is that it does not return a pointer to a mp\_int structure. It | |

796 is assumed that the caller has already allocated memory for the mp\_int structure, typically on the application stack. The | |

797 call to mp\_init() is used only to initialize the members of the structure to a known default state. | |

798 | |

799 Before any of the other members of the structure are initialized memory from the application heap is allocated with | |

800 the calloc() function (line @22,[email protected]). The size of the allocated memory is large enough to hold \textbf{MP\_PREC} | |

801 mp\_digit variables. The calloc() function is used instead\footnote{calloc() will allocate memory in the same | |

802 manner as malloc() except that it also sets the contents to zero upon successfully allocating the memory.} of malloc() | |

803 since digits have to be set to zero for the function to finish correctly. The \textbf{OPT\_CAST} token is a macro | |

804 definition which will turn into a cast from void * to mp\_digit * for C++ compilers. It is not required for C compilers. | |

805 | |

806 After the memory has been successfully allocated the remainder of the members are initialized | |

807 (lines @29,[email protected] through @31,[email protected]) to their respective default states. At this point the algorithm has succeeded and | |

808 a success code is returned to the calling function. | |

809 | |

810 If this function returns \textbf{MP\_OKAY} it is safe to assume the mp\_int structure has been properly initialized and | |

811 is safe to use with other functions within the library. | |

812 | |

813 \subsection{Clearing an mp\_int} | |

814 When an mp\_int is no longer required by the application, the memory that has been allocated for its digits must be | |

815 returned to the application's memory pool with the mp\_clear algorithm. | |

816 | |

817 \begin{figure}[here] | |

818 \begin{center} | |

819 \begin{tabular}{l} | |

820 \hline Algorithm \textbf{mp\_clear}. \\ | |

821 \textbf{Input}. An mp\_int $a$ \\ | |

822 \textbf{Output}. The memory for $a$ is freed for reuse. \\ | |

823 \hline \\ | |

824 1. If $a$ has been previously freed then return(\textit{MP\_OKAY}). \\ | |

825 2. for $n$ from 0 to $a.used - 1$ do \\ | |

826 \hspace{3mm}2.1 $a_n \leftarrow 0$ \\ | |

827 3. Free the memory allocated for the digits of $a$. \\ | |

828 4. $a.used \leftarrow 0$ \\ | |

829 5. $a.alloc \leftarrow 0$ \\ | |

830 6. $a.sign \leftarrow MP\_ZPOS$ \\ | |

831 7. Return(\textit{MP\_OKAY}). \\ | |

832 \hline | |

833 \end{tabular} | |

834 \end{center} | |

835 \caption{Algorithm mp\_clear} | |

836 \end{figure} | |

837 | |

838 \textbf{Algorithm mp\_clear.} | |

839 This algorithm releases the memory allocated for an mp\_int back into the memory pool for reuse. It is designed | |

840 such that a given mp\_int structure can be cleared multiple times between initializations without attempting to | |

841 free the memory twice\footnote{In ISO C for example, calling free() twice on the same memory block causes undefinied | |

842 behaviour.}. | |

843 | |

844 The first step determines if the mp\_int structure has been marked as free already. If it has, the algorithm returns | |

845 success immediately as no further actions are required. Otherwise, the algorithm will proceed to put the structure | |

846 in a known empty and otherwise invalid state. First the digits of the mp\_int are set to zero. The memory that has been allocated for the | |

847 digits is then freed. The \textbf{used} and \textbf{alloc} counts are both set to zero and the \textbf{sign} set to | |

848 \textbf{MP\_ZPOS}. This known fixed state for cleared mp\_int structures will make debuging easier for the end | |

849 developer. That is, if they spot (via their debugger) an mp\_int they are using that is in this state it will be | |

850 obvious that they erroneously and prematurely cleared the mp\_int structure. | |

851 | |

852 Note that once an mp\_int has been cleared the mp\_int structure is no longer in a valid state for any other algorithm | |

853 with the exception of algorithms mp\_init, mp\_init\_copy, mp\_init\_size and mp\_clear. | |

854 | |

855 EXAM,bn_mp_clear.c | |

856 | |

857 The ``if'' statement (line @21,a->dp != [email protected]) prevents the heap from being corrupted if a user double-frees an | |

858 mp\_int. This is because once the memory is freed the pointer is set to \textbf{NULL} (line @30,[email protected]). | |

859 | |

860 Without the check, code that accidentally calls mp\_clear twice for a given mp\_int structure would try to free the memory | |

861 allocated for the digits twice. This may cause some C libraries to signal a fault. By setting the pointer to | |

862 \textbf{NULL} it helps debug code that may inadvertently free the mp\_int before it is truly not needed, because attempts | |

863 to reference digits should fail immediately. The allocated digits are set to zero before being freed (line @24,[email protected]). | |

864 This is ideal for cryptographic situations where the integer that the mp\_int represents might need to be kept a secret. | |

865 | |

866 \section{Maintenance Algorithms} | |

867 | |

868 The previous sections describes how to initialize and clear an mp\_int structure. To further support operations | |

869 that are to be performed on mp\_int structures (such as addition and multiplication) the dependent algorithms must be | |

870 able to augment the precision of an mp\_int and | |

871 initialize mp\_ints with differing initial conditions. | |

872 | |

873 These algorithms complete the set of low level algorithms required to work with mp\_int structures in the higher level | |

874 algorithms such as addition, multiplication and modular exponentiation. | |

875 | |

876 \subsection{Augmenting an mp\_int's Precision} | |

877 When storing a value in an mp\_int structure, a sufficient number of digits must be available to accomodate the entire | |

878 result of an operation without loss of precision. Quite often the size of the array given by the \textbf{alloc} member | |

879 is large enough to simply increase the \textbf{used} digit count. However, when the size of the array is too small it | |

880 must be re-sized appropriately to accomodate the result. The mp\_grow algorithm will provide this functionality. | |

881 | |

882 \newpage\begin{figure}[here] | |

883 \begin{center} | |

884 \begin{tabular}{l} | |

885 \hline Algorithm \textbf{mp\_grow}. \\ | |

886 \textbf{Input}. An mp\_int $a$ and an integer $b$. \\ | |

887 \textbf{Output}. $a$ is expanded to accomodate $b$ digits. \\ | |

888 \hline \\ | |

889 1. if $a.alloc \ge b$ then return(\textit{MP\_OKAY}) \\ | |

890 2. $u \leftarrow b\mbox{ (mod }MP\_PREC\mbox{)}$ \\ | |

891 3. $v \leftarrow b + 2 \cdot MP\_PREC - u$ \\ | |

892 4. Re-Allocate the array of digits $a$ to size $v$ \\ | |

893 5. If the allocation failed then return(\textit{MP\_MEM}). \\ | |

894 6. for n from a.alloc to $v - 1$ do \\ | |

895 \hspace{+3mm}6.1 $a_n \leftarrow 0$ \\ | |

896 7. $a.alloc \leftarrow v$ \\ | |

897 8. Return(\textit{MP\_OKAY}) \\ | |

898 \hline | |

899 \end{tabular} | |

900 \end{center} | |

901 \caption{Algorithm mp\_grow} | |

902 \end{figure} | |

903 | |

904 \textbf{Algorithm mp\_grow.} | |

905 It is ideal to prevent re-allocations from being performed if they are not required (step one). This is useful to | |

906 prevent mp\_ints from growing excessively in code that erroneously calls mp\_grow. | |

907 | |

908 The requested digit count is padded up to next multiple of \textbf{MP\_PREC} plus an additional \textbf{MP\_PREC} (steps two and three). | |

909 This helps prevent many trivial reallocations that would grow an mp\_int by trivially small values. | |

910 | |

911 It is assumed that the reallocation (step four) leaves the lower $a.alloc$ digits of the mp\_int intact. This is much | |

912 akin to how the \textit{realloc} function from the standard C library works. Since the newly allocated digits are | |

913 assumed to contain undefined values they are initially set to zero. | |

914 | |

915 EXAM,bn_mp_grow.c | |

916 | |

917 The first step is to see if we actually need to perform a re-allocation at all (line @24,a->alloc < [email protected]). If a reallocation | |

918 must occur the digit count is padded upwards to help prevent many trivial reallocations (line @28,[email protected]). Next the reallocation is performed | |

919 and the return of realloc() is stored in a temporary pointer named $tmp$ (line @36,[email protected]). The return is stored in a temporary | |

920 instead of $a.dp$ to prevent the code from losing the original pointer in case the reallocation fails. Had the return been stored | |

921 in $a.dp$ instead there would be no way to reclaim the heap originally used. | |

922 | |

923 If the reallocation fails the function will return \textbf{MP\_MEM} (line @39,[email protected]), otherwise, the value of $tmp$ is assigned | |

924 to the pointer $a.dp$ and the function continues. A simple for loop from line @48,a->[email protected] to line @50,}@ will zero all digits | |

925 that were above the old \textbf{alloc} limit to make sure the integer is in a known state. | |

926 | |

927 \subsection{Initializing Variable Precision mp\_ints} | |

928 Occasionally the number of digits required will be known in advance of an initialization, based on, for example, the size | |

929 of input mp\_ints to a given algorithm. The purpose of algorithm mp\_init\_size is similar to mp\_init except that it | |

930 will allocate \textit{at least} a specified number of digits. | |

931 | |

932 \begin{figure}[here] | |

933 \begin{small} | |

934 \begin{center} | |

935 \begin{tabular}{l} | |

936 \hline Algorithm \textbf{mp\_init\_size}. \\ | |

937 \textbf{Input}. An mp\_int $a$ and the requested number of digits $b$. \\ | |

938 \textbf{Output}. $a$ is initialized to hold at least $b$ digits. \\ | |

939 \hline \\ | |

940 1. $u \leftarrow b \mbox{ (mod }MP\_PREC\mbox{)}$ \\ | |

941 2. $v \leftarrow b + 2 \cdot MP\_PREC - u$ \\ | |

942 3. Allocate $v$ digits. \\ | |

943 4. for $n$ from $0$ to $v - 1$ do \\ | |

944 \hspace{3mm}4.1 $a_n \leftarrow 0$ \\ | |

945 5. $a.sign \leftarrow MP\_ZPOS$\\ | |

946 6. $a.used \leftarrow 0$\\ | |

947 7. $a.alloc \leftarrow v$\\ | |

948 8. Return(\textit{MP\_OKAY})\\ | |

949 \hline | |

950 \end{tabular} | |

951 \end{center} | |

952 \end{small} | |

953 \caption{Algorithm mp\_init\_size} | |

954 \end{figure} | |

955 | |

956 \textbf{Algorithm mp\_init\_size.} | |

957 This algorithm will initialize an mp\_int structure $a$ like algorithm mp\_init with the exception that the number of | |

958 digits allocated can be controlled by the second input argument $b$. The input size is padded upwards so it is a | |

959 multiple of \textbf{MP\_PREC} plus an additional \textbf{MP\_PREC} digits. This padding is used to prevent trivial | |

960 allocations from becoming a bottleneck in the rest of the algorithms. | |

961 | |

962 Like algorithm mp\_init, the mp\_int structure is initialized to a default state representing the integer zero. This | |

963 particular algorithm is useful if it is known ahead of time the approximate size of the input. If the approximation is | |

964 correct no further memory re-allocations are required to work with the mp\_int. | |

965 | |

966 EXAM,bn_mp_init_size.c | |

967 | |

968 The number of digits $b$ requested is padded (line @22,MP_[email protected]) by first augmenting it to the next multiple of | |

969 \textbf{MP\_PREC} and then adding \textbf{MP\_PREC} to the result. If the memory can be successfully allocated the | |

970 mp\_int is placed in a default state representing the integer zero. Otherwise, the error code \textbf{MP\_MEM} will be | |

971 returned (line @27,[email protected]). | |

972 | |

973 The digits are allocated and set to zero at the same time with the calloc() function (line @25,[email protected]). The | |

974 \textbf{used} count is set to zero, the \textbf{alloc} count set to the padded digit count and the \textbf{sign} flag set | |

975 to \textbf{MP\_ZPOS} to achieve a default valid mp\_int state (lines @29,[email protected], @30,[email protected] and @31,[email protected]). If the function | |

976 returns succesfully then it is correct to assume that the mp\_int structure is in a valid state for the remainder of the | |

977 functions to work with. | |

978 | |

979 \subsection{Multiple Integer Initializations and Clearings} | |

980 Occasionally a function will require a series of mp\_int data types to be made available simultaneously. | |

981 The purpose of algorithm mp\_init\_multi is to initialize a variable length array of mp\_int structures in a single | |

982 statement. It is essentially a shortcut to multiple initializations. | |

983 | |

984 \newpage\begin{figure}[here] | |

985 \begin{center} | |

986 \begin{tabular}{l} | |

987 \hline Algorithm \textbf{mp\_init\_multi}. \\ | |

988 \textbf{Input}. Variable length array $V_k$ of mp\_int variables of length $k$. \\ | |

989 \textbf{Output}. The array is initialized such that each mp\_int of $V_k$ is ready to use. \\ | |

990 \hline \\ | |

991 1. for $n$ from 0 to $k - 1$ do \\ | |

992 \hspace{+3mm}1.1. Initialize the mp\_int $V_n$ (\textit{mp\_init}) \\ | |

993 \hspace{+3mm}1.2. If initialization failed then do \\ | |

994 \hspace{+6mm}1.2.1. for $j$ from $0$ to $n$ do \\ | |

995 \hspace{+9mm}1.2.1.1. Free the mp\_int $V_j$ (\textit{mp\_clear}) \\ | |

996 \hspace{+6mm}1.2.2. Return(\textit{MP\_MEM}) \\ | |

997 2. Return(\textit{MP\_OKAY}) \\ | |

998 \hline | |

999 \end{tabular} | |

1000 \end{center} | |

1001 \caption{Algorithm mp\_init\_multi} | |

1002 \end{figure} | |

1003 | |

1004 \textbf{Algorithm mp\_init\_multi.} | |

1005 The algorithm will initialize the array of mp\_int variables one at a time. If a runtime error has been detected | |

1006 (\textit{step 1.2}) all of the previously initialized variables are cleared. The goal is an ``all or nothing'' | |

1007 initialization which allows for quick recovery from runtime errors. | |

1008 | |

1009 EXAM,bn_mp_init_multi.c | |

1010 | |

1011 This function intializes a variable length list of mp\_int structure pointers. However, instead of having the mp\_int | |

1012 structures in an actual C array they are simply passed as arguments to the function. This function makes use of the | |

1013 ``...'' argument syntax of the C programming language. The list is terminated with a final \textbf{NULL} argument | |

1014 appended on the right. | |

1015 | |

1016 The function uses the ``stdarg.h'' \textit{va} functions to step portably through the arguments to the function. A count | |

1017 $n$ of succesfully initialized mp\_int structures is maintained (line @47,[email protected]) such that if a failure does occur, | |

1018 the algorithm can backtrack and free the previously initialized structures (lines @27,[email protected] to @46,}@). | |

1019 | |

1020 | |

1021 \subsection{Clamping Excess Digits} | |

1022 When a function anticipates a result will be $n$ digits it is simpler to assume this is true within the body of | |

1023 the function instead of checking during the computation. For example, a multiplication of a $i$ digit number by a | |

1024 $j$ digit produces a result of at most $i + j$ digits. It is entirely possible that the result is $i + j - 1$ | |

1025 though, with no final carry into the last position. However, suppose the destination had to be first expanded | |

1026 (\textit{via mp\_grow}) to accomodate $i + j - 1$ digits than further expanded to accomodate the final carry. | |

1027 That would be a considerable waste of time since heap operations are relatively slow. | |

1028 | |

1029 The ideal solution is to always assume the result is $i + j$ and fix up the \textbf{used} count after the function | |

1030 terminates. This way a single heap operation (\textit{at most}) is required. However, if the result was not checked | |

1031 there would be an excess high order zero digit. | |

1032 | |

1033 For example, suppose the product of two integers was $x_n = (0x_{n-1}x_{n-2}...x_0)_{\beta}$. The leading zero digit | |

1034 will not contribute to the precision of the result. In fact, through subsequent operations more leading zero digits would | |

1035 accumulate to the point the size of the integer would be prohibitive. As a result even though the precision is very | |

1036 low the representation is excessively large. | |

1037 | |

1038 The mp\_clamp algorithm is designed to solve this very problem. It will trim high-order zeros by decrementing the | |

1039 \textbf{used} count until a non-zero most significant digit is found. Also in this system, zero is considered to be a | |

1040 positive number which means that if the \textbf{used} count is decremented to zero, the sign must be set to | |

1041 \textbf{MP\_ZPOS}. | |

1042 | |

1043 \begin{figure}[here] | |

1044 \begin{center} | |

1045 \begin{tabular}{l} | |

1046 \hline Algorithm \textbf{mp\_clamp}. \\ | |

1047 \textbf{Input}. An mp\_int $a$ \\ | |

1048 \textbf{Output}. Any excess leading zero digits of $a$ are removed \\ | |

1049 \hline \\ | |

1050 1. while $a.used > 0$ and $a_{a.used - 1} = 0$ do \\ | |

1051 \hspace{+3mm}1.1 $a.used \leftarrow a.used - 1$ \\ | |

1052 2. if $a.used = 0$ then do \\ | |

1053 \hspace{+3mm}2.1 $a.sign \leftarrow MP\_ZPOS$ \\ | |

1054 \hline \\ | |

1055 \end{tabular} | |

1056 \end{center} | |

1057 \caption{Algorithm mp\_clamp} | |

1058 \end{figure} | |

1059 | |

1060 \textbf{Algorithm mp\_clamp.} | |

1061 As can be expected this algorithm is very simple. The loop on step one is expected to iterate only once or twice at | |

1062 the most. For example, this will happen in cases where there is not a carry to fill the last position. Step two fixes the sign for | |

1063 when all of the digits are zero to ensure that the mp\_int is valid at all times. | |

1064 | |

1065 EXAM,bn_mp_clamp.c | |

1066 | |

1067 Note on line @27,[email protected] how to test for the \textbf{used} count is made on the left of the \&\& operator. In the C programming | |

1068 language the terms to \&\& are evaluated left to right with a boolean short-circuit if any condition fails. This is | |

1069 important since if the \textbf{used} is zero the test on the right would fetch below the array. That is obviously | |

1070 undesirable. The parenthesis on line @28,a->[email protected] is used to make sure the \textbf{used} count is decremented and not | |

1071 the pointer ``a''. | |

1072 | |

1073 \section*{Exercises} | |

1074 \begin{tabular}{cl} | |

1075 $\left [ 1 \right ]$ & Discuss the relevance of the \textbf{used} member of the mp\_int structure. \\ | |

1076 & \\ | |

1077 $\left [ 1 \right ]$ & Discuss the consequences of not using padding when performing allocations. \\ | |

1078 & \\ | |

1079 $\left [ 2 \right ]$ & Estimate an ideal value for \textbf{MP\_PREC} when performing 1024-bit RSA \\ | |

1080 & encryption when $\beta = 2^{28}$. \\ | |

1081 & \\ | |

1082 $\left [ 1 \right ]$ & Discuss the relevance of the algorithm mp\_clamp. What does it prevent? \\ | |

1083 & \\ | |

1084 $\left [ 1 \right ]$ & Give an example of when the algorithm mp\_init\_copy might be useful. \\ | |

1085 & \\ | |

1086 \end{tabular} | |

1087 | |

1088 | |

1089 %%% | |

1090 % CHAPTER FOUR | |

1091 %%% | |

1092 | |

1093 \chapter{Basic Operations} | |

1094 | |

1095 \section{Introduction} | |

1096 In the previous chapter a series of low level algorithms were established that dealt with initializing and maintaining | |

1097 mp\_int structures. This chapter will discuss another set of seemingly non-algebraic algorithms which will form the low | |

1098 level basis of the entire library. While these algorithm are relatively trivial it is important to understand how they | |

1099 work before proceeding since these algorithms will be used almost intrinsically in the following chapters. | |

1100 | |

1101 The algorithms in this chapter deal primarily with more ``programmer'' related tasks such as creating copies of | |

1102 mp\_int structures, assigning small values to mp\_int structures and comparisons of the values mp\_int structures | |

1103 represent. | |

1104 | |

1105 \section{Assigning Values to mp\_int Structures} | |

1106 \subsection{Copying an mp\_int} | |

1107 Assigning the value that a given mp\_int structure represents to another mp\_int structure shall be known as making | |

1108 a copy for the purposes of this text. The copy of the mp\_int will be a separate entity that represents the same | |

1109 value as the mp\_int it was copied from. The mp\_copy algorithm provides this functionality. | |

1110 | |

1111 \newpage\begin{figure}[here] | |

1112 \begin{center} | |

1113 \begin{tabular}{l} | |

1114 \hline Algorithm \textbf{mp\_copy}. \\ | |

1115 \textbf{Input}. An mp\_int $a$ and $b$. \\ | |

1116 \textbf{Output}. Store a copy of $a$ in $b$. \\ | |

1117 \hline \\ | |

1118 1. If $b.alloc < a.used$ then grow $b$ to $a.used$ digits. (\textit{mp\_grow}) \\ | |

1119 2. for $n$ from 0 to $a.used - 1$ do \\ | |

1120 \hspace{3mm}2.1 $b_{n} \leftarrow a_{n}$ \\ | |

1121 3. for $n$ from $a.used$ to $b.used - 1$ do \\ | |

1122 \hspace{3mm}3.1 $b_{n} \leftarrow 0$ \\ | |

1123 4. $b.used \leftarrow a.used$ \\ | |

1124 5. $b.sign \leftarrow a.sign$ \\ | |

1125 6. return(\textit{MP\_OKAY}) \\ | |

1126 \hline | |

1127 \end{tabular} | |

1128 \end{center} | |

1129 \caption{Algorithm mp\_copy} | |

1130 \end{figure} | |

1131 | |

1132 \textbf{Algorithm mp\_copy.} | |

1133 This algorithm copies the mp\_int $a$ such that upon succesful termination of the algorithm the mp\_int $b$ will | |

1134 represent the same integer as the mp\_int $a$. The mp\_int $b$ shall be a complete and distinct copy of the | |

1135 mp\_int $a$ meaing that the mp\_int $a$ can be modified and it shall not affect the value of the mp\_int $b$. | |

1136 | |

1137 If $b$ does not have enough room for the digits of $a$ it must first have its precision augmented via the mp\_grow | |

1138 algorithm. The digits of $a$ are copied over the digits of $b$ and any excess digits of $b$ are set to zero (step two | |

1139 and three). The \textbf{used} and \textbf{sign} members of $a$ are finally copied over the respective members of | |

1140 $b$. | |

1141 | |

1142 \textbf{Remark.} This algorithm also introduces a new idiosyncrasy that will be used throughout the rest of the | |

1143 text. The error return codes of other algorithms are not explicitly checked in the pseudo-code presented. For example, in | |

1144 step one of the mp\_copy algorithm the return of mp\_grow is not explicitly checked to ensure it succeeded. Text space is | |

1145 limited so it is assumed that if a algorithm fails it will clear all temporarily allocated mp\_ints and return | |

1146 the error code itself. However, the C code presented will demonstrate all of the error handling logic required to | |

1147 implement the pseudo-code. | |

1148 | |

1149 EXAM,bn_mp_copy.c | |

1150 | |

1151 Occasionally a dependent algorithm may copy an mp\_int effectively into itself such as when the input and output | |

1152 mp\_int structures passed to a function are one and the same. For this case it is optimal to return immediately without | |

1153 copying digits (line @24,a == [email protected]). | |

1154 | |

1155 The mp\_int $b$ must have enough digits to accomodate the used digits of the mp\_int $a$. If $b.alloc$ is less than | |

1156 $a.used$ the algorithm mp\_grow is used to augment the precision of $b$ (lines @29,[email protected] to @33,}@). In order to | |

1157 simplify the inner loop that copies the digits from $a$ to $b$, two aliases $tmpa$ and $tmpb$ point directly at the digits | |

1158 of the mp\_ints $a$ and $b$ respectively. These aliases (lines @42,[email protected] and @45,[email protected]) allow the compiler to access the digits without first dereferencing the | |

1159 mp\_int pointers and then subsequently the pointer to the digits. | |

1160 | |

1161 After the aliases are established the digits from $a$ are copied into $b$ (lines @48,[email protected] to @50,}@) and then the excess | |

1162 digits of $b$ are set to zero (lines @53,[email protected] to @55,}@). Both ``for'' loops make use of the pointer aliases and in | |

1163 fact the alias for $b$ is carried through into the second ``for'' loop to clear the excess digits. This optimization | |

1164 allows the alias to stay in a machine register fairly easy between the two loops. | |

1165 | |

1166 \textbf{Remarks.} The use of pointer aliases is an implementation methodology first introduced in this function that will | |

1167 be used considerably in other functions. Technically, a pointer alias is simply a short hand alias used to lower the | |

1168 number of pointer dereferencing operations required to access data. For example, a for loop may resemble | |

1169 | |

1170 \begin{alltt} | |

1171 for (x = 0; x < 100; x++) \{ | |

1172 a->num[4]->dp[x] = 0; | |

1173 \} | |

1174 \end{alltt} | |

1175 | |

1176 This could be re-written using aliases as | |

1177 | |

1178 \begin{alltt} | |

1179 mp_digit *tmpa; | |

1180 a = a->num[4]->dp; | |

1181 for (x = 0; x < 100; x++) \{ | |

1182 *a++ = 0; | |

1183 \} | |

1184 \end{alltt} | |

1185 | |

1186 In this case an alias is used to access the | |

1187 array of digits within an mp\_int structure directly. It may seem that a pointer alias is strictly not required | |

1188 as a compiler may optimize out the redundant pointer operations. However, there are two dominant reasons to use aliases. | |

1189 | |

1190 The first reason is that most compilers will not effectively optimize pointer arithmetic. For example, some optimizations | |

1191 may work for the Microsoft Visual C++ compiler (MSVC) and not for the GNU C Compiler (GCC). Also some optimizations may | |

1192 work for GCC and not MSVC. As such it is ideal to find a common ground for as many compilers as possible. Pointer | |

1193 aliases optimize the code considerably before the compiler even reads the source code which means the end compiled code | |

1194 stands a better chance of being faster. | |

1195 | |

1196 The second reason is that pointer aliases often can make an algorithm simpler to read. Consider the first ``for'' | |

1197 loop of the function mp\_copy() re-written to not use pointer aliases. | |

1198 | |

1199 \begin{alltt} | |

1200 /* copy all the digits */ | |

1201 for (n = 0; n < a->used; n++) \{ | |

1202 b->dp[n] = a->dp[n]; | |

1203 \} | |

1204 \end{alltt} | |

1205 | |

1206 Whether this code is harder to read depends strongly on the individual. However, it is quantifiably slightly more | |

1207 complicated as there are four variables within the statement instead of just two. | |

1208 | |

1209 \subsubsection{Nested Statements} | |

1210 Another commonly used technique in the source routines is that certain sections of code are nested. This is used in | |

1211 particular with the pointer aliases to highlight code phases. For example, a Comba multiplier (discussed in chapter six) | |

1212 will typically have three different phases. First the temporaries are initialized, then the columns calculated and | |

1213 finally the carries are propagated. In this example the middle column production phase will typically be nested as it | |

1214 uses temporary variables and aliases the most. | |

1215 | |

1216 The nesting also simplies the source code as variables that are nested are only valid for their scope. As a result | |

1217 the various temporary variables required do not propagate into other sections of code. | |

1218 | |

1219 | |

1220 \subsection{Creating a Clone} | |

1221 Another common operation is to make a local temporary copy of an mp\_int argument. To initialize an mp\_int | |

1222 and then copy another existing mp\_int into the newly intialized mp\_int will be known as creating a clone. This is | |

1223 useful within functions that need to modify an argument but do not wish to actually modify the original copy. The | |

1224 mp\_init\_copy algorithm has been designed to help perform this task. | |

1225 | |

1226 \begin{figure}[here] | |

1227 \begin{center} | |

1228 \begin{tabular}{l} | |

1229 \hline Algorithm \textbf{mp\_init\_copy}. \\ | |

1230 \textbf{Input}. An mp\_int $a$ and $b$\\ | |

1231 \textbf{Output}. $a$ is initialized to be a copy of $b$. \\ | |

1232 \hline \\ | |

1233 1. Init $a$. (\textit{mp\_init}) \\ | |

1234 2. Copy $b$ to $a$. (\textit{mp\_copy}) \\ | |

1235 3. Return the status of the copy operation. \\ | |

1236 \hline | |

1237 \end{tabular} | |

1238 \end{center} | |

1239 \caption{Algorithm mp\_init\_copy} | |

1240 \end{figure} | |

1241 | |

1242 \textbf{Algorithm mp\_init\_copy.} | |

1243 This algorithm will initialize an mp\_int variable and copy another previously initialized mp\_int variable into it. As | |

1244 such this algorithm will perform two operations in one step. | |

1245 | |

1246 EXAM,bn_mp_init_copy.c | |

1247 | |

1248 This will initialize \textbf{a} and make it a verbatim copy of the contents of \textbf{b}. Note that | |

1249 \textbf{a} will have its own memory allocated which means that \textbf{b} may be cleared after the call | |

1250 and \textbf{a} will be left intact. | |

1251 | |

1252 \section{Zeroing an Integer} | |

1253 Reseting an mp\_int to the default state is a common step in many algorithms. The mp\_zero algorithm will be the algorithm used to | |

1254 perform this task. | |

1255 | |

1256 \begin{figure}[here] | |

1257 \begin{center} | |

1258 \begin{tabular}{l} | |

1259 \hline Algorithm \textbf{mp\_zero}. \\ | |

1260 \textbf{Input}. An mp\_int $a$ \\ | |

1261 \textbf{Output}. Zero the contents of $a$ \\ | |

1262 \hline \\ | |

1263 1. $a.used \leftarrow 0$ \\ | |

1264 2. $a.sign \leftarrow$ MP\_ZPOS \\ | |

1265 3. for $n$ from 0 to $a.alloc - 1$ do \\ | |

1266 \hspace{3mm}3.1 $a_n \leftarrow 0$ \\ | |

1267 \hline | |

1268 \end{tabular} | |

1269 \end{center} | |

1270 \caption{Algorithm mp\_zero} | |

1271 \end{figure} | |

1272 | |

1273 \textbf{Algorithm mp\_zero.} | |

1274 This algorithm simply resets a mp\_int to the default state. | |

1275 | |

1276 EXAM,bn_mp_zero.c | |

1277 | |

1278 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the | |

1279 \textbf{sign} variable is set to \textbf{MP\_ZPOS}. | |

1280 | |

1281 \section{Sign Manipulation} | |

1282 \subsection{Absolute Value} | |

1283 With the mp\_int representation of an integer, calculating the absolute value is trivial. The mp\_abs algorithm will compute | |

1284 the absolute value of an mp\_int. | |

1285 | |

1286 \newpage\begin{figure}[here] | |

1287 \begin{center} | |

1288 \begin{tabular}{l} | |

1289 \hline Algorithm \textbf{mp\_abs}. \\ | |

1290 \textbf{Input}. An mp\_int $a$ \\ | |

1291 \textbf{Output}. Computes $b = \vert a \vert$ \\ | |

1292 \hline \\ | |

1293 1. Copy $a$ to $b$. (\textit{mp\_copy}) \\ | |

1294 2. If the copy failed return(\textit{MP\_MEM}). \\ | |

1295 3. $b.sign \leftarrow MP\_ZPOS$ \\ | |

1296 4. Return(\textit{MP\_OKAY}) \\ | |

1297 \hline | |

1298 \end{tabular} | |

1299 \end{center} | |

1300 \caption{Algorithm mp\_abs} | |

1301 \end{figure} | |

1302 | |

1303 \textbf{Algorithm mp\_abs.} | |

1304 This algorithm computes the absolute of an mp\_int input. First it copies $a$ over $b$. This is an example of an | |

1305 algorithm where the check in mp\_copy that determines if the source and destination are equal proves useful. This allows, | |

1306 for instance, the developer to pass the same mp\_int as the source and destination to this function without addition | |

1307 logic to handle it. | |

1308 | |

1309 EXAM,bn_mp_abs.c | |

1310 | |

1311 \subsection{Integer Negation} | |

1312 With the mp\_int representation of an integer, calculating the negation is also trivial. The mp\_neg algorithm will compute | |

1313 the negative of an mp\_int input. | |

1314 | |

1315 \begin{figure}[here] | |

1316 \begin{center} | |

1317 \begin{tabular}{l} | |

1318 \hline Algorithm \textbf{mp\_neg}. \\ | |

1319 \textbf{Input}. An mp\_int $a$ \\ | |

1320 \textbf{Output}. Computes $b = -a$ \\ | |

1321 \hline \\ | |

1322 1. Copy $a$ to $b$. (\textit{mp\_copy}) \\ | |

1323 2. If the copy failed return(\textit{MP\_MEM}). \\ | |

1324 3. If $a.used = 0$ then return(\textit{MP\_OKAY}). \\ | |

1325 4. If $a.sign = MP\_ZPOS$ then do \\ | |

1326 \hspace{3mm}4.1 $b.sign = MP\_NEG$. \\ | |

1327 5. else do \\ | |

1328 \hspace{3mm}5.1 $b.sign = MP\_ZPOS$. \\ | |

1329 6. Return(\textit{MP\_OKAY}) \\ | |

1330 \hline | |

1331 \end{tabular} | |

1332 \end{center} | |

1333 \caption{Algorithm mp\_neg} | |

1334 \end{figure} | |

1335 | |

1336 \textbf{Algorithm mp\_neg.} | |

1337 This algorithm computes the negation of an input. First it copies $a$ over $b$. If $a$ has no used digits then | |

1338 the algorithm returns immediately. Otherwise it flips the sign flag and stores the result in $b$. Note that if | |

1339 $a$ had no digits then it must be positive by definition. Had step three been omitted then the algorithm would return | |

1340 zero as negative. | |

1341 | |

1342 EXAM,bn_mp_neg.c | |

1343 | |

1344 \section{Small Constants} | |

1345 \subsection{Setting Small Constants} | |

1346 Often a mp\_int must be set to a relatively small value such as $1$ or $2$. For these cases the mp\_set algorithm is useful. | |

1347 | |

1348 \begin{figure}[here] | |

1349 \begin{center} | |

1350 \begin{tabular}{l} | |

1351 \hline Algorithm \textbf{mp\_set}. \\ | |

1352 \textbf{Input}. An mp\_int $a$ and a digit $b$ \\ | |

1353 \textbf{Output}. Make $a$ equivalent to $b$ \\ | |

1354 \hline \\ | |

1355 1. Zero $a$ (\textit{mp\_zero}). \\ | |

1356 2. $a_0 \leftarrow b \mbox{ (mod }\beta\mbox{)}$ \\ | |

1357 3. $a.used \leftarrow \left \lbrace \begin{array}{ll} | |

1358 1 & \mbox{if }a_0 > 0 \\ | |

1359 0 & \mbox{if }a_0 = 0 | |

1360 \end{array} \right .$ \\ | |

1361 \hline | |

1362 \end{tabular} | |

1363 \end{center} | |

1364 \caption{Algorithm mp\_set} | |

1365 \end{figure} | |

1366 | |

1367 \textbf{Algorithm mp\_set.} | |

1368 This algorithm sets a mp\_int to a small single digit value. Step number 1 ensures that the integer is reset to the default state. The | |

1369 single digit is set (\textit{modulo $\beta$}) and the \textbf{used} count is adjusted accordingly. | |

1370 | |

1371 EXAM,bn_mp_set.c | |

1372 | |

1373 Line @21,mp_[email protected] calls mp\_zero() to clear the mp\_int and reset the sign. Line @22,MP_[email protected] copies the digit | |

1374 into the least significant location. Note the usage of a new constant \textbf{MP\_MASK}. This constant is used to quickly | |

1375 reduce an integer modulo $\beta$. Since $\beta$ is of the form $2^k$ for any suitable $k$ it suffices to perform a binary AND with | |

1376 $MP\_MASK = 2^k - 1$ to perform the reduction. Finally line @23,a->[email protected] will set the \textbf{used} member with respect to the | |

1377 digit actually set. This function will always make the integer positive. | |

1378 | |

1379 One important limitation of this function is that it will only set one digit. The size of a digit is not fixed, meaning source that uses | |

1380 this function should take that into account. Only trivially small constants can be set using this function. | |

1381 | |

1382 \subsection{Setting Large Constants} | |

1383 To overcome the limitations of the mp\_set algorithm the mp\_set\_int algorithm is ideal. It accepts a ``long'' | |

1384 data type as input and will always treat it as a 32-bit integer. | |

1385 | |

1386 \begin{figure}[here] | |

1387 \begin{center} |