Mercurial > dropbear
changeset 643:a362b62d38b2 dropbear-tfm
Add tomsfastmath from git rev bfa4582842bc3bab42e4be4aed5703437049502a
with Makefile.in renamed
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/LICENSE Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,36 @@ +TomsFastMath is licensed under DUAL licensing terms. + +Choose and use the license of your needs. + +[LICENSE #1] + +TomsFastMath is public domain. As should all quality software be. + +Tom St Denis + +[/LICENSE #1] + +[LICENSE #2] + + DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE + Version 2, December 2004 + + Copyright (C) 2004 Sam Hocevar <[email protected]> + + Everyone is permitted to copy and distribute verbatim or modified + copies of this license document, and changing it is allowed as long + as the name is changed. + + DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. You just DO WHAT THE FUCK YOU WANT TO. + +[/LICENSE #2] + +-- Mark Karpel�s & Steffen Jaeckel + +Note some ideas were borrowed from LibTomMath and OpenSSL. All of the code is original or ported +from LibTomMath [no code was ported from OpenSSL]. + +-- Tom St Denis
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/SPONSORS Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,5 @@ +Development of TomsFastMath was sponsored by three groups. Two companies that use LTC and LTM commercially +and one individual who decided he wanted to help out by being generous. + +Thanks goes to them [though they wished to remain anonymous] and people like them. +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/changes.txt Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,95 @@ +March 14th, 2007 +0.12 -- Christophe Devine contributed MIPS asm w00t + ++ quick release to get the MIPS code out there + +March 10th, 2007 +0.11 -- re-org'ed the source tree, it's cooler now + -- cleaned up mul/sqr so there is only one file per unit, should help building with older compilers and/or on slower boxes + -- [CRI] optimized fp_read_unsigned_bin + -- fixed -0 ... again ... I now have less hair on my head. + -- [CRI] renamed bn_reverse() -> fp_reverse() + +November 1st, 2006 +0.10 -- Unrolled Montgomery for 1..16 digits with TFM_SMALL_MONT_SET between 10% and 25% speedup depending on size. + -- fixed fp_sqr_comba.c so it builds in ISO C mode [Andreas Lange] + -- [email protected] pointed out fp_radix_size() had a few typos that affected correctness. Fixed. + -- Added support for ECC performance, e.g. define "-DTFM_ALREADY_SET -DTFM_ECC192" and it will disable + all of the unrolled code EXCEPT what is required for ECC P-192. It autodetects 32/64-bit platforms too. It's super neato. + Support for 192, 224, 256, 384 and 521 bit curves through the defines [see tfm.h] + -- AVR32 support added, define TFM_AVR32 to enable + +April 4th, 2006 +0.09 -- Bruce Guenter suggested I use --tag=CC for libtool builds where the compiler may think it's C++. + -- Added support for k=1 in exptmod for RSA exponents. Makes it more competitive with other libraries + -- added cutoffs to the comba32 sqr/mul code to better handle 640, 786 and 896 bit operands (e.g. for RSA-1280, RSA-1536 and RSA-1792), really no effect + for 64-bit boxes as these represent 1280, 1536 and 1792 bit operands (not likely to be invoked). + -- Removed karatsuba from the mul/sqr since they're not useful and slow. + -- added 20, 24 and 28 digit multipliers for oddsized RSA support. You can easily disable them by uncommenting the TFM_SQRXX and TFM_MULXX lines in + tfm.h to save space. Now GMP and TFM are roughly the same speed on the Pentium-M for RSA with LTC. + -- unrolled SSE2 code and optimize for platforms with load/store pipes (e.g. can store and load in a cycle). Got 4% or so boost on my Dothan laptop (marginal improvement on a P4 Prescott) + +Jan 26th, 2006 +0.08 -- Fixed a bug in the generic mult/sqr where we overflowed by one digit + +November 18th, 2005 +0.07 -- Fixes to fp_mul and fp_sqr to clean up the handling of the defines, fix to tfm.h to also clear up the prototypes. + -- Updates to build and run on a IBM PPC 405 [using GCC 3.4.4] + -- Made the "make" command renamable in the build system + +October 31st, 2005 +0.06 -- fixed fp_mul() and fp_sqr() to trim digits when overflows would occur. Produces numerically inprecise results + (e.g. the lower FP_SIZE digits) but shouldn't segfault at least ;-) + -- Updated the combas so you can turn on and off specific unrolled loops at build time + -- Michael Heyman reported a bug in s_fp_sub() that was pretty substantial and a bug in fp_montgomery_calc_normalization(). Fixed. + +August 1st, 2005 +0.05 -- Quick fix to the fp_invmod.c code to let it handle even moduli [required for LTC] + -- Added makefile.shared to make shared objects [required for LTC] + -- Improved makefiles to make them way more configurable + -- Added timing resistant fp_exptmod() enabled with TFM_TIMING_RESISTANT + +July 23rd, 2005 +0.04 -- Fixed bugs in the SSE2 squaring code + -- Rewrote the multipliers to be optimized for small inputs + -- Nelson Bolyard of the NSS crew submitted [among other things] new faster Montgomery reduction + code. It brings the performance for small numbers on the AMD64 and all numbers on the P4 + to a new level. Thanks! + -- Added missing ARM support for fp_montgomery_reduce.c that the NSS folk left off, Officially + the ARM code is for v4 and above WITH the "M" multiplier support (e.g. umlal instruction) + -- Added PPC32 support, define TFM_PPC32 to enable it, I used the "PowerPC 6xx" instruction + databook for reference. Does not require altivec. Should be fairly portable to the other + 32-bit PPCs provided they have mullw and mulhwu instructions. + [Note: porting the macros to PPC64 should be trivial, anyone with a shell to lend... email me!] + -- Rewrote the config a bit in tfm.h so you can better choose which set of "oh my god that's huge" code to + enable for your task. "generic" functions are ALWAYS included which are smaller but will cover the + gaps in the coverage for ya. + -- The PPC32 code has been verified to function on a Darwin box running GCC 2.95.2 + [Thanks to the folk at PeerSec for lending me a shell to use] + -- Fixed a bug in fp_exptmod() where if the exponent was negative AND the destination the output + would have the sign set to FP_NEG. + +March 1st, 2005 +0.03 -- Optimized squaring + -- Applied new license header to all files (still PD) + +September 18th, 2004 +0.02 -- Added TFM_LARGE to turn on/off 16x combas to save even more space. + This also helps prevent killing the cache on smaller cpus. + -- Cast memset to void in fp_init() to catch people who misuse the function (e.g. expect return) + Thanks to Johan Lindh + -- Cleaned up x86-64 support [faster montgomery reductions] + -- Autodetects x86-32 and x86-64 and enables it's asm now + -- Made test demo build cleaner in multilib platforms [e.g. mixed 32/64 bits] + -- Fix to fp_mod to ensure that remainder is of the same sign as the modulus. + -- Fixed bug in fp_montgomery_calc_normalization for single digit moduli + -- cleaned up ISO C macros in comba/mont to avoid branches [works best with GCC 3.4.x branch] + -- Added more testing to tfm.h to help detect misconfigured builds + -- Added TFM_NO_ASM which forces ASM off [even if it was autodetected]. + -- Added fp_radix_size() to API + -- Cleaned up demo/test.c to build with far fewer warnings (mostly %d => %lu fixes) + -- fp_exptmod() now supports negative exponent and base>modulus cases + -- Added fp_ident() which gives a string showing how TFM was configured. Useful for debuging... + -- fix gen.pl script so it includes the whole source tree now + +August 25th, 2004 +0.01 -- Initial Release
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/demo/rsa.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,83 @@ +#include "tfm.h" +#include <time.h> + +int main(void) +{ + fp_int d, e, n, c, m, e_m; + clock_t t1; + int x; + + /* read in the parameters */ + fp_read_radix(&n, "ce032e860a9809a5ec31e4b0fd4b546f8c40043e3d2ec3d8f49d8f2f3dd19e887094ee1af75caa1c2e6cd9ec78bf1dfd6280002ac8c30ecd72da2e4c59a28a9248048aaae2a8fa627f71bece979cebf9f8eee2bd594d4a4f2e791647573c7ec1fcbd320d3825be3fa8a17c97086fdae56f7086ce512b81cc2fe44161270ec5e9", 16); + fp_read_radix(&e, "10001", 16); + fp_read_radix(&m, "39f5a911250f45b99390e2df322b33c729099ab52b5879d06b00818cce57c649a66ed7eb6d8ae214d11caf9c81e83a7368cf0edb2b71dad791f13fecf546123b40377851e67835ade1d6be57f4de18a62db4cdb1880f4ab2e6a29acfd85ca22a13dc1f6fee2621ef0fc8689cd738e6f065c033ec7c148d8d348688af83d6f6bd", 16); + fp_read_radix(&c, "9ff70ea6968a04530e6b06bf01aa937209cc8450e76ac19477743de996ba3fb445923c947f8d0add8c57efa51d15485309918459da6c1e5a97f215193b797dce98db51bdb4639c2ecfa90ebb051e3a2daeffd27a7d6e62043703a7b15e0ada5170427b63099cd01ef52cd92d8723e5774bea32716aaa7f5adbae817fb12a5b50", 16); + + /* test it */ + fp_exptmod(&m, &e, &n, &e_m); + if (fp_cmp(&e_m, &c)) { + char buf[1024]; + printf("Encrypted text not equal\n"); + fp_toradix(&e_m, buf, 16); + printf("e_m == %s\n", buf); + return 0; + } + + printf("CLOCKS_PER_SEC = %llu\n", (unsigned long long)CLOCKS_PER_SEC); + t1 = clock(); + for (x = 0; x < 1000; x++) { + fp_exptmod(&m, &e, &n, &e_m); + } + t1 = clock() - t1; + printf("1000 RSA operations took %10.5g seconds\n", (double)t1 / (double)CLOCKS_PER_SEC); + printf("RSA encrypt/sec %10.5g\n", (double)CLOCKS_PER_SEC / ((double)t1 / 1000.0) ); + + /* read in the parameters */ + fp_read_radix(&n, "a7f30e2e04d31acc6936916af1e404a4007adfb9e97864de28d1c7ba3034633bee2cd9d5da3ea3cdcdc9a6f3daf5702ef750f4c3aadb0e27410ac04532176795995148cdb4691bd09a8a846e3e24e073ce2f89b34dfeb2ee89b646923ca60ee3f73c4d5397478380425e7260f75dfdc54826e160395b0889b1162cf115a9773f", 16); + fp_read_radix(&d, "16d166f3c9a404d810d3611e6e8ed43293fe1db75c8906eb4810785a4b82529929dade1db7f11ac0335d5a59773e3167b022479eedefa514a0399db5c900750a56323cf9f5b0f21e7d60a46d75f3fcaabf30a63cbe34048b741a57ac36a13914afda798709dea5771f8d456cf72ec5f3afc1d88d023de40311143a36e7028739", 16); + fp_read_radix(&c, "7d216641c32543f5b8428bdd0b11d819cfbdb16f1df285247f677aa4d44de62ab064f4a0d060ec99cb94aa398113a4317f2c550d0371140b0fd2c88886cac771812e72faad4b7adf495b9b850b142ccd7f45c0a27f164c8c7731731c0015f69d0241812e769d961054618aeb9e8e8989dba95714a2cf56c9e525c5e34b5812dd", 16); + fp_read_radix(&m, "5f323bf0b394b98ffd78727dc9883bb4f42287def6b60fa2a964b2510bc55d61357bf5a6883d2982b268810f8fef116d3ae68ebb41fd10d65a0af4bec0530eb369f37c14b55c3be60223b582372fb6589b648d5a0c7252d1ae2dae5809785d993e9e5d0c4d9b0bcba0cde0d6671734747fba5483c735e1dab7df7b10ec6f62d8", 16); + + /* test it */ + fp_exptmod(&c, &d, &n, &e_m); + if (fp_cmp(&e_m, &m)) { + char buf[1024]; + printf("Decrypted text not equal\n"); + fp_toradix(&e_m, buf, 16); + printf("e_m == %s\n", buf); + return 0; + } + + t1 = clock(); + for (x = 0; x < 100; x++) { + fp_exptmod(&c, &d, &n, &e_m); + } + t1 = clock() - t1; + printf("100 RSA operations took %10.5g seconds\n", (double)t1 / (double)CLOCKS_PER_SEC); + printf("RSA decrypt/sec %10.5g\n", (double)CLOCKS_PER_SEC / ((double)t1 / 100.0) ); + + + /* test half size */ + fp_rshd(&n, n.used >> 1); + fp_rshd(&d, d.used >> 1); + fp_rshd(&c, c.used >> 1); + printf("n.used == %4d bits\n", n.used * DIGIT_BIT); + + /* ensure n is odd */ + n.dp[0] |= 1; + t1 = clock(); + for (x = 0; x < 100; x++) { + fp_exptmod(&c, &d, &n, &e_m); + } + t1 = clock() - t1; + printf("100 RSA-half operations took %10.5g seconds\n", (double)t1 / (double)CLOCKS_PER_SEC); + printf("RSA decrypt/sec %10.5g (estimate of RSA-1024-CRT) \n", (double)CLOCKS_PER_SEC / ((double)t1 / 50.0) ); + + + + return 0; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/demo/stest.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,148 @@ +/* A simple static test program. */ +#include <tfm.h> + +#ifdef GBA_MODE +#include <gba.h> + #define DISPLAY(x) modetxt_puts(vfb, x, 1) +#endif + +#ifndef DISPLAY + #define DISPLAY(x) printf(x) +#endif + + +#ifdef GBA_MODE +int c_main(void) +#else +int main(void) +#endif +{ + fp_int a,b,c,d,e,f; + fp_digit dp; + + fp_init(&a); + fp_init(&b); + fp_init(&c); + fp_init(&d); + fp_init(&e); + fp_init(&f); + +#ifdef GBA_MODE + install_common(); + modetxt_init(); + modetxt_gotoxy(0,0); +#endif + + /* test multiplication */ + fp_read_radix(&a, "3453534534535345345341230891273", 10); + fp_read_radix(&b, "2394873294871238934718923" , 10); + fp_read_radix(&c, "8270777629674273015508507050766235312931312159028658979", 10); + fp_mul(&a, &b, &d); + if (fp_cmp(&c, &d)) { + DISPLAY("mul failed\n"); + return 0; + } else { + DISPLAY("mul passed\n"); + } + + /* test multiplication */ + fp_read_radix(&a, "30481290320498235987349712308523652378643912563478232907782361237864278207235782364578264891274789264278634289739", 10); + fp_read_radix(&b, "48761478126387263782638276327836287632836278362837627838736278362923698724823749238732" , 10); + fp_read_radix(&c, "1486312771227034563307950634490737985563993459700941115664257275795366623795590136120579100118233580357115074068815507257715906295105536107921754177810976863679300283932188006885811950341132768970948", 10); + fp_mul(&a, &b, &d); + if (fp_cmp(&c, &d)) { + DISPLAY("mul failed\n"); + return 0; + } else { + DISPLAY("mul passed\n"); + } + + /* test multiplication */ + fp_read_radix(&a, "115792089237316195423570985008687907853269984665640564039457584007913129639935", 10); + fp_read_radix(&b, "174224571863520493293247799005065324265471" , 10); + fp_read_radix(&c, "20173827172553973356686868531273530268200710714389071377794102651988800859098544338487575161443744102709980552583184385", 10); + fp_mul(&a, &b, &d); + if (fp_cmp(&c, &d)) { + DISPLAY("mul failed\n"); + return 0; + } else { + DISPLAY("mul passed\n"); + } + + /* test squaring */ + fp_read_radix(&a, "298723982748923478923473927489237289347238947238947238947238972893", 10); + fp_read_radix(&b, "89236017869379132235512787068367546521309689412262624434964313994127411682542855190667724226920696163962644836740110835385588789449" , 10); + fp_sqr(&a, &c); + if (fp_cmp(&c, &b)) { + DISPLAY("sqr failed\n"); + return 0; + } else { + DISPLAY("sqr passed\n"); + } + + fp_read_radix(&a, "397823894238973128942895123894327123941724927848927349274897238978927593487012378490184789429812734982738972389", 10); + fp_read_radix(&b, "158263850827461677491961439999264901067636282938352531932899298293270945997930087353471903166601507321298827087008336951419604640736464667188494668962822678461626245753696845719301945679092882499787869509090904187704367321" , 10); + fp_sqr(&a, &c); + if (fp_cmp(&c, &b)) { + DISPLAY("sqr failed\n"); + return 0; + } else { + DISPLAY("sqr passed\n"); + } + + fp_read_radix(&a, "13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084095", 10); + fp_read_radix(&b, "179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474097562152033539671286128252223189553839160721441767298250321715263238814402734379959506792230903356495130620869925267845538430714092411695463462326211969025" , 10); + fp_sqr(&a, &c); + if (fp_cmp(&c, &b)) { + DISPLAY("sqr failed\n"); + return 0; + } else { + DISPLAY("sqr passed\n"); + } + + + /* montgomery reductions */ + fp_read_radix(&a, "234892374892374893489123428937892781237863278637826327367637836278362783627836783678363", 10); + fp_read_radix(&b, "4447823492749823749234123489273987393983289319382762756425425425642727352327452374521", 10); + fp_read_radix(&c, "2396271882990732698083317035605836523697277786556053771759862552557086442129695099100", 10); + fp_montgomery_setup(&b, &dp); + fp_montgomery_reduce(&a, &b, dp); + if (fp_cmp(&a, &c)) { + DISPLAY("mont failed\n"); + return 0; + } else { + DISPLAY("mont passed\n"); + } + + fp_read_radix(&a, "2348923748923748934891234456645654645645684576353428937892781237863278637826327367637836278362783627836783678363", 10); + fp_read_radix(&b, "444782349274982374923412348927398739398328931938276275642542542564272735232745237452123424324324444121111119", 10); + fp_read_radix(&c, "45642613844554582908652603086180267403823312390990082328515008314514368668691233331246183943400359349283420", 10); + fp_montgomery_setup(&b, &dp); + fp_montgomery_reduce(&a, &b, dp); + if (fp_cmp(&a, &c)) { + DISPLAY("mont failed\n"); + return 0; + } else { + DISPLAY("mont passed\n"); + } + + fp_read_radix(&a, "234823424242342923748923748934891234456645654645645684576353424972378234762378623891236834132352375235378462378489378927812378632786378263273676378362783627555555555539568389052478124618461834763837685723645827529034853490580134568947341278498542893481762349723907847892983627836783678363", 10); + fp_read_radix(&b, "44478234927456563455982374923412348927398739398328931938276275642485623481638279025465891276312903262837562349056234783648712314678120389173890128905425242424239784256427", 10); + fp_read_radix(&c, "33160865265453361650564031464519042126185632333462754084489985719613480783282357410514898819797738034600484519472656152351777186694609218202276509271061460265488348645081", 10); + fp_montgomery_setup(&b, &dp); + fp_montgomery_reduce(&a, &b, dp); + if (fp_cmp(&a, &c)) { + DISPLAY("mont failed\n"); + return 0; + } else { + DISPLAY("mont passed\n"); + } + + + return 0; +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/demo/test.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,878 @@ +/* TFM demo program */ +#include <tfm.h> + +void draw(fp_int *a) +{ + int x; + printf("%d, %d, ", a->used, a->sign); + for (x = a->used - 1; x >= 0; x--) { + printf("%08lx ", a->dp[x]); + } + printf("\n"); +} + +int myrng(unsigned char *dst, int len, void *dat) +{ + int x; + for (x = 0; x < len; x++) dst[x] = rand() & 0xFF; + return len; +} + +/* RDTSC from Scott Duplichan */ +static ulong64 TIMFUNC (void) + { + #if defined __GNUC__ + #if defined(INTEL_CC) + ulong64 a; + asm ("rdtsc":"=A"(a)); + return a; + #elif defined(__i386__) || defined(__x86_64__) + ulong64 a; + __asm__ __volatile__ ("rdtsc\nmovl %%eax,%0\nmovl %%edx,4+%0\n"::"m"(a):"%eax","%edx"); + return a; + #elif defined(TFM_PPC32) + unsigned long a, b; + __asm__ __volatile__ ("mftbu %1 \nmftb %0\n":"=r"(a), "=r"(b)); + return (((ulong64)b) << 32ULL) | ((ulong64)a); + #elif defined(TFM_AVR32) + FILE *in; + char buf[20]; + in = fopen("/sys/devices/system/cpu/cpu0/pccycles", "r"); + fgets(buf, 20, in); + fclose(in); + return strtoul(buf, NULL, 10); + #else /* gcc-IA64 version */ + unsigned long result; + __asm__ __volatile__("mov %0=ar.itc" : "=r"(result) :: "memory"); + while (__builtin_expect ((int) result == -1, 0)) + __asm__ __volatile__("mov %0=ar.itc" : "=r"(result) :: "memory"); + return result; + #endif + + // Microsoft and Intel Windows compilers + #elif defined _M_IX86 + __asm rdtsc + #elif defined _M_AMD64 + return __rdtsc (); + #elif defined _M_IA64 + #if defined __INTEL_COMPILER + #include <ia64intrin.h> + #endif + return __getReg (3116); + #else + #error need rdtsc function for this build + #endif + } + + char cmd[4096], buf[4096]; + +int main(void) +{ + fp_int a,b,c,d,e,f; + fp_digit fp; + int n, err; + unsigned long expt_n, add_n, sub_n, mul_n, div_n, sqr_n, mul2d_n, div2d_n, gcd_n, lcm_n, inv_n, + div2_n, mul2_n, add_d_n, sub_d_n, mul_d_n, t, cnt, rr, ix; + ulong64 t1, t2; + + srand(time(NULL)); + printf("TFM Ident string:\n%s\n\n", fp_ident()); + fp_zero(&b); fp_zero(&c); fp_zero(&d); fp_zero(&e); fp_zero(&f); + fp_zero(&a); draw(&a); + + /* test set and simple shifts */ + printf("Testing mul/div 2\n"); + fp_set(&a, 1); draw(&a); + for (n = 0; n <= DIGIT_BIT; n++) { + fp_mul_2(&a, &a); printf("(%d) ", fp_count_bits(&a)); + draw(&a); + + } + for (n = 0; n <= (DIGIT_BIT + 1); n++) { + fp_div_2(&a, &a); + draw(&a); + } + fp_set(&a, 1); + + /* test lshd/rshd */ + printf("testing lshd/rshd\n"); + fp_lshd(&a, 3); draw(&a); + fp_rshd(&a, 3); draw(&a); + + /* test more complicated shifts */ + printf("Testing mul/div 2d\n"); + fp_mul_2d(&a, DIGIT_BIT/2, &a); draw(&a); + fp_div_2d(&a, DIGIT_BIT/2, &a, NULL); draw(&a); + + fp_mul_2d(&a, DIGIT_BIT + DIGIT_BIT/2, &a); draw(&a); + fp_div_2d(&a, DIGIT_BIT + DIGIT_BIT/2, &a, NULL); draw(&a); + + /* test neg/abs */ + printf("testing neg/abs\n"); + fp_neg(&a, &a); draw(&a); + fp_neg(&a, &a); draw(&a); + fp_neg(&a, &a); draw(&a); + fp_abs(&a, &a); draw(&a); + + /* test comparisons */ + fp_set(&b, 3); + fp_set(&c, 4); fp_neg(&c, &c); + fp_set(&d, 1); + printf("Testing compares\n%d, %d, %d, %d\n", fp_cmp(&a, &b), fp_cmp(&a, &c), fp_cmp(&a, &d), fp_cmp(&b, &c)); + + /* test add/sub */ + printf("Testing add/sub \n"); + fp_set(&a, ((fp_digit)1)<<(DIGIT_BIT-1)); draw(&a); + fp_set(&b, ((fp_digit)1)<<(DIGIT_BIT-2)); + fp_add(&a, &b, &a); draw(&a); + fp_add(&a, &b, &a); draw(&a); + fp_add(&a, &b, &a); draw(&a); + printf("sub...\n"); + printf("cmp returns: %d, ", fp_cmp(&a, &b)); fp_sub(&a, &b, &a); draw(&a); + printf("cmp returns: %d, ", fp_cmp(&a, &b)); fp_sub(&a, &b, &a); draw(&a); + printf("cmp returns: %d, ", fp_cmp(&a, &b)); fp_sub(&a, &b, &a); draw(&a); + printf("cmp returns: %d, ", fp_cmp(&a, &b)); fp_sub(&a, &b, &a); draw(&a); + printf("cmp returns: %d, ", fp_cmp(&a, &b)); fp_sub(&a, &b, &a); draw(&a); + printf("cmp returns: %d, ", fp_cmp(&a, &b)); fp_sub(&a, &b, &a); draw(&a); + fp_read_radix(&a, "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF000000000000000000000001", 16); draw(&a); + fp_sub_d(&a, 3, &b); draw(&b); + fp_read_radix(&a, "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFE", 16); + printf("cmp returns: %d, ", fp_cmp(&a, &b)); fp_sub(&a, &b, &a); draw(&a); + + /* test mul_d */ + printf("Testing mul_d and div_d\n"); + fp_set(&a, 1); + fp_mul_d(&a, ((fp_digit)1)<<(DIGIT_BIT/2), &a); draw(&a); + fp_mul_d(&a, ((fp_digit)1)<<(DIGIT_BIT/2), &a); draw(&a); + fp_mul_d(&a, ((fp_digit)1)<<(DIGIT_BIT/2), &a); draw(&a); + printf("div_d\n"); + fp_div_d(&a, ((fp_digit)1)<<(DIGIT_BIT/2), &a, NULL); draw(&a); + fp_div_d(&a, ((fp_digit)1)<<(DIGIT_BIT/2), &a, NULL); draw(&a); + fp_div_d(&a, ((fp_digit)1)<<(DIGIT_BIT/2), &a, NULL); draw(&a); + + /* testing read radix */ + printf("Testing read_radix\n"); + fp_read_radix(&a, "123456789012345678901234567890", 16); draw(&a); + +#if 0 + /* test mont */ + printf("Montgomery test #1\n"); + fp_set(&a, 0x1234567ULL); + fp_montgomery_setup(&a, &fp); + fp_montgomery_calc_normalization(&b, &a); + + fp_read_radix(&d, "123456789123", 16); + for (n = 0; n < 1000000; n++) { + fp_add_d(&d, 1, &d); fp_sqrmod(&d, &a, &d); + fp_mul(&d, &b, &c); + fp_montgomery_reduce(&c, &a, fp); + if (fp_cmp(&c, &d) != FP_EQ) { + printf("Failed mont %d\n", n); + draw(&a); + draw(&d); + draw(&c); + return EXIT_FAILURE; + } + } + printf("Passed.\n"); + + printf("Montgomery test #2\n"); + fp_set(&a, 0x1234567ULL); + fp_lshd(&a, 4); + fp_add_d(&a, 1, &a); + fp_montgomery_setup(&a, &fp); + fp_montgomery_calc_normalization(&b, &a); + + fp_read_radix(&d, "123456789123", 16); + for (n = 0; n < 1000000; n++) { + fp_add_d(&d, 1, &d); fp_sqrmod(&d, &a, &d); + fp_mul(&d, &b, &c); + fp_montgomery_reduce(&c, &a, fp); + if (fp_cmp(&c, &d) != FP_EQ) { + printf("Failed mont %d\n", n); + draw(&a); + draw(&d); + draw(&c); + return EXIT_FAILURE; + } + } + printf("Passed.\n"); + + /* test for size */ + for (ix = 8*DIGIT_BIT; ix < 10*DIGIT_BIT; ix++) { + printf("Testing (not safe-prime): %9lu bits \r", ix); fflush(stdout); + err = fp_prime_random_ex(&a, 8, ix, (rand()&1)?TFM_PRIME_2MSB_OFF:TFM_PRIME_2MSB_ON, myrng, NULL); + if (err != FP_OKAY) { + printf("failed with err code %d\n", err); + return EXIT_FAILURE; + } + if ((unsigned long)fp_count_bits(&a) != ix) { + printf("Prime is %d not %lu bits!!!\n", fp_count_bits(&a), ix); + return EXIT_FAILURE; + } + } + printf("\n\n"); +#endif + +#ifdef TESTING +goto testing; +#endif + +#if 1 + +t1 = TIMFUNC(); +sleep(1); +printf("Ticks per second: %llu\n", TIMFUNC() - t1); + +goto multtime; + /* do some timings... */ + printf("Addition:\n"); + for (t = 2; t <= FP_SIZE/2; t += 2) { + fp_zero(&a); + fp_zero(&b); + fp_zero(&c); + for (ix = 0; ix < t; ix++) { + a.dp[ix] = ix; + b.dp[ix] = ix; + } + a.used = t; + b.used = t; + t2 = -1; + for (ix = 0; ix < 25000; ++ix) { + t1 = TIMFUNC(); + fp_add(&a, &b, &c); fp_add(&a, &b, &c); + fp_add(&a, &b, &c); fp_add(&a, &b, &c); + fp_add(&a, &b, &c); fp_add(&a, &b, &c); + fp_add(&a, &b, &c); fp_add(&a, &b, &c); + t2 = (TIMFUNC() - t1)>>3; + if (t1<t2) { --ix; t2 = t1; } + } + printf("%5lu-bit: %9llu\n", t * DIGIT_BIT, t2); + } +multtime: + printf("Multiplication:\n"); + for (t = 2; t < FP_SIZE/2; t += 2) { + fp_zero(&a); + fp_zero(&b); + fp_zero(&c); + for (ix = 0; ix < t; ix++) { + a.dp[ix] = ix; + b.dp[ix] = ix; + } + a.used = t; + b.used = t; + t2 = -1; + for (ix = 0; ix < 100; ++ix) { + t1 = TIMFUNC(); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + fp_mul(&a, &b, &c); fp_mul(&a, &b, &c); + t2 = (TIMFUNC() - t1)>>7; + if (t1<t2) { --ix; t2 = t1; } + } + printf("%5lu-bit: %9llu\n", t * DIGIT_BIT, t2); + } +//#else +sqrtime: + printf("Squaring:\n"); + for (t = 2; t < FP_SIZE/2; t += 2) { + fp_zero(&a); + fp_zero(&b); + for (ix = 0; ix < t; ix++) { + a.dp[ix] = ix; + } + a.used = t; + t2 = -1; + for (ix = 0; ix < 100; ++ix) { + t1 = TIMFUNC(); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + fp_sqr(&a, &b); fp_sqr(&a, &b); + t2 = (TIMFUNC() - t1)>>7; + if (t1<t2) { --ix; t2 = t1; } + } + printf("%5lu-bit: %9llu\n", t * DIGIT_BIT, t2); + } +invmodtime: + printf("Invmod:\n"); + for (t = 2; t < FP_SIZE/2; t += 2) { + fp_zero(&a); + for (ix = 0; ix < t; ix++) { + a.dp[ix] = ix | 1; + } + a.used = t; + fp_zero(&b); + for (ix = 0; ix < t; ix++) { + b.dp[ix] = rand(); + } + b.used = t; + fp_clamp(&b); + fp_zero(&c); + t2 = -1; + for (ix = 0; ix < 100; ++ix) { + t1 = TIMFUNC(); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + fp_invmod(&b, &a, &c); + t2 = (TIMFUNC() - t1)>>6; + if (t1<t2) { --ix; t2 = t1; } + } + printf("%5lu-bit: %9llu\n", t * DIGIT_BIT, t2); + } +//#else +monttime: + printf("Montgomery:\n"); + for (t = 2; t <= (FP_SIZE/2)-4; t += 2) { +// printf("%5lu-bit: %9llu\n", t * DIGIT_BIT, t2); + fp_zero(&a); + for (ix = 0; ix < t; ix++) { + a.dp[ix] = ix | 1; + } + a.used = t; + + fp_montgomery_setup(&a, &fp); + fp_sub_d(&a, 3, &b); + fp_sqr(&b, &b); + fp_copy(&b, &c); + fp_copy(&b, &d); + + t2 = -1; + for (ix = 0; ix < 100; ++ix) { + t1 = TIMFUNC(); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + fp_montgomery_reduce(&c, &a, &fp); + fp_montgomery_reduce(&d, &a, &fp); + t2 = (TIMFUNC() - t1)>>6; + fp_copy(&b, &c); + fp_copy(&b, &d); + if (t1<t2) { --ix; t2 = t1; } + } + printf("%5lu-bit: %9llu\n", t * DIGIT_BIT, t2); + } +//#else +expttime: + printf("Exptmod:\n"); + + for (t = 512/DIGIT_BIT; t <= (FP_SIZE/2)-2; t += 256/DIGIT_BIT) { + fp_zero(&a); + fp_zero(&b); + fp_zero(&c); + for (ix = 0; ix < t; ix++) { + a.dp[ix] = ix+1; + b.dp[ix] = (fp_digit)rand() * (fp_digit)rand(); + c.dp[ix] = ix; + } + a.used = t; + b.used = t; + c.used = t; + + t2 = -1; + for (ix = 0; ix < 500; ++ix) { + t1 = TIMFUNC(); + fp_exptmod(&c, &b, &a, &d); + fp_exptmod(&c, &b, &a, &d); + t2 = (TIMFUNC() - t1)>>1; + fp_copy(&b, &c); + fp_copy(&b, &d); + if (t1<t2) { t2 = t1; --ix; } + } + printf("%5lu-bit: %9llu\n", t * DIGIT_BIT, t2); + } + return 0; +#endif + +return 0; +testing: + + fp_zero(&b); fp_zero(&c); fp_zero(&d); fp_zero(&e); fp_zero(&f); fp_zero(&a); + + + div2_n = mul2_n = inv_n = expt_n = lcm_n = gcd_n = add_n = + sub_n = mul_n = div_n = sqr_n = mul2d_n = div2d_n = cnt = add_d_n = sub_d_n= mul_d_n = 0; + + for (;;) { + printf("%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu/%4lu ", add_n, sub_n, mul_n, div_n, sqr_n, mul2d_n, div2d_n, gcd_n, lcm_n, expt_n, inv_n, div2_n, mul2_n, add_d_n, sub_d_n, mul_d_n); + fgets(cmd, 4095, stdin); + cmd[strlen(cmd)-1] = 0; + printf("%s ]\r",cmd); fflush(stdout); + if (!strcmp(cmd, "mul2d")) { ++mul2d_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); sscanf(buf, "%lu", &rr); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + + fp_mul_2d(&a, rr, &a); + a.sign = b.sign; + if (fp_cmp(&a, &b) != FP_EQ) { + printf("mul2d failed, rr == %lu\n",rr); + draw(&a); + draw(&b); + return 0; + } + } else if (!strcmp(cmd, "div2d")) { ++div2d_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); sscanf(buf, "%lu", &rr); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + + fp_div_2d(&a, rr, &a, &e); + a.sign = b.sign; + if (a.used == b.used && a.used == 0) { a.sign = b.sign = FP_ZPOS; } + if (fp_cmp(&a, &b) != FP_EQ) { + printf("div2d failed, rr == %lu\n",rr); + draw(&a); + draw(&b); + return 0; + } + } else if (!strcmp(cmd, "add")) { ++add_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); + fp_copy(&a, &d); + fp_add(&d, &b, &d); + if (fp_cmp(&c, &d) != FP_EQ) { + printf("add %lu failure!\n", add_n); +draw(&a);draw(&b);draw(&c);draw(&d); + return 0; + } + + /* test the sign/unsigned storage functions */ + + rr = fp_signed_bin_size(&c); + fp_to_signed_bin(&c, (unsigned char *)cmd); + memset(cmd+rr, rand()&255, sizeof(cmd)-rr); + fp_read_signed_bin(&d, (unsigned char *)cmd, rr); + if (fp_cmp(&c, &d) != FP_EQ) { + printf("fp_signed_bin failure!\n"); + draw(&c); + draw(&d); + return 0; + } + + rr = fp_unsigned_bin_size(&c); + fp_to_unsigned_bin(&c, (unsigned char *)cmd); + memset(cmd+rr, rand()&255, sizeof(cmd)-rr); + fp_read_unsigned_bin(&d, (unsigned char *)cmd, rr); + if (fp_cmp_mag(&c, &d) != FP_EQ) { + printf("fp_unsigned_bin failure!\n"); + draw(&c); + draw(&d); + return 0; + } + + } else if (!strcmp(cmd, "sub")) { ++sub_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); + fp_copy(&a, &d); + fp_sub(&d, &b, &d); + if (fp_cmp(&c, &d) != FP_EQ) { + printf("sub %lu failure!\n", sub_n); +draw(&a);draw(&b);draw(&c);draw(&d); + return 0; + } + } else if (!strcmp(cmd, "mul")) { + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); +//continue; + fp_copy(&a, &d); + fp_mul(&d, &b, &d); ++mul_n; + if (fp_cmp(&c, &d) != FP_EQ) { + printf("mul %lu failure!\n", mul_n); +draw(&a);draw(&b);draw(&c);draw(&d); + return 0; + } + } else if (!strcmp(cmd, "div")) { + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&d, buf, 64); +// continue; + fp_div(&a, &b, &e, &f); ++div_n; + if (fp_cmp(&c, &e) != FP_EQ || fp_cmp(&d, &f) != FP_EQ) { + printf("div %lu failure!\n", div_n); +draw(&a);draw(&b);draw(&c);draw(&d); draw(&e); draw(&f); + return 0; + } + + } else if (!strcmp(cmd, "sqr")) { + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); +// continue; + fp_copy(&a, &c); + fp_sqr(&c, &c); ++sqr_n; + if (fp_cmp(&b, &c) != FP_EQ) { + printf("sqr %lu failure!\n", sqr_n); +draw(&a);draw(&b);draw(&c); + return 0; + } + } else if (!strcmp(cmd, "gcd")) { + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); +// continue; + fp_copy(&a, &d); + fp_gcd(&d, &b, &d); ++gcd_n; + d.sign = c.sign; + if (fp_cmp(&c, &d) != FP_EQ) { + printf("gcd %lu failure!\n", gcd_n); +draw(&a);draw(&b);draw(&c);draw(&d); + return 0; + } + } else if (!strcmp(cmd, "lcm")) { + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); +//continue; + fp_copy(&a, &d); + fp_lcm(&d, &b, &d); ++lcm_n; + d.sign = c.sign; + if (fp_cmp(&c, &d) != FP_EQ) { + printf("lcm %lu failure!\n", lcm_n); + draw(&a);draw(&b);draw(&c);draw(&d); + return 0; + } + } else if (!strcmp(cmd, "expt")) { + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&d, buf, 64); +// continue; + fp_copy(&a, &e); + fp_exptmod(&e, &b, &c, &e); ++expt_n; + if (fp_cmp(&d, &e) != FP_EQ) { + printf("expt %lu failure!\n", expt_n); + draw(&a);draw(&b);draw(&c);draw(&d); draw(&e); + return 0; + } + } else if (!strcmp(cmd, "invmod")) { + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&c, buf, 64); +//continue; + fp_invmod(&a, &b, &d); +#if 1 + fp_mulmod(&d,&a,&b,&e); ++inv_n; + if (fp_cmp_d(&e, 1) != FP_EQ) { +#else + if (fp_cmp(&d, &c) != FP_EQ) { +#endif + printf("inv [wrong value from MPI?!] failure\n"); + draw(&a);draw(&b);draw(&c);draw(&d); + return 0; + } + + } else if (!strcmp(cmd, "div2")) { ++div2_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fp_div_2(&a, &c); + if (fp_cmp(&c, &b) != FP_EQ) { + printf("div_2 %lu failure\n", div2_n); + draw(&a); + draw(&b); + draw(&c); + return 0; + } + } else if (!strcmp(cmd, "mul2")) { ++mul2_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fp_mul_2(&a, &c); + if (fp_cmp(&c, &b) != FP_EQ) { + printf("mul_2 %lu failure\n", mul2_n); + draw(&a); + draw(&b); + draw(&c); + return 0; + } + } else if (!strcmp(cmd, "add_d")) { ++add_d_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); sscanf(buf, "%lu", &ix); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fp_add_d(&a, ix, &c); + if (fp_cmp(&b, &c) != FP_EQ) { + printf("add_d %lu failure\n", add_d_n); + draw(&a); + draw(&b); + draw(&c); + printf("d == %lu\n", ix); + return 0; + } + } else if (!strcmp(cmd, "sub_d")) { ++sub_d_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); sscanf(buf, "%lu", &ix); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fp_sub_d(&a, ix, &c); + if (fp_cmp(&b, &c) != FP_EQ) { + printf("sub_d %lu failure\n", sub_d_n); + draw(&a); + draw(&b); + draw(&c); + printf("d == %lu\n", ix); + return 0; + } + } else if (!strcmp(cmd, "mul_d")) { ++mul_d_n; + fgets(buf, 4095, stdin); fp_read_radix(&a, buf, 64); + fgets(buf, 4095, stdin); sscanf(buf, "%lu", &ix); + fgets(buf, 4095, stdin); fp_read_radix(&b, buf, 64); + fp_mul_d(&a, ix, &c); + if (fp_cmp(&b, &c) != FP_EQ) { + printf("mul_d %lu failure\n", sub_d_n); + draw(&a); + draw(&b); + draw(&c); + printf("d == %lu\n", ix); + return 0; + } + } + + } +} + + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/filter.pl Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,30 @@ +#!/usr/bin/perl + +# we want to filter every between START_INS and END_INS out and then insert crap from another file (this is fun) + +$dst = shift; +$ins = shift; + +open(SRC,"<$dst"); +open(INS,"<$ins"); +open(TMP,">tmp.delme"); + +$l = 0; +while (<SRC>) { + if ($_ =~ /START_INS/) { + print TMP $_; + $l = 1; + while (<INS>) { + print TMP $_; + } + close INS; + } elsif ($_ =~ /END_INS/) { + print TMP $_; + $l = 0; + } elsif ($l == 0) { + print TMP $_; + } +} + +close TMP; +close SRC;
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/gen.pl Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,18 @@ +#!/usr/bin/perl -w +# +# Generates a "single file" you can use to quickly +# add the whole source without any makefile troubles +# +use strict; + +open( OUT, ">mpi.c" ) or die "Couldn't open mpi.c for writing: $!"; +foreach my $filename (glob "*fp_*.c") { + next if ($filename eq "fp_sqr_comba_generic.c"); + open( SRC, "<$filename" ) or die "Couldn't open $filename for reading: $!"; + print OUT "/* Start: $filename */\n"; + print OUT while <SRC>; + print OUT "\n/* End: $filename */\n\n"; + close SRC or die "Error closing $filename after reading: $!"; +} +print OUT "\n/* EOF */\n"; +close OUT or die "Error closing mpi.c after writing: $!";
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/genlist.sh Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,9 @@ +#!/bin/bash +export a=`find . -type f | sort | grep "[.]/src" | grep "[.]c" | grep -v generators | sed -e 'sE\./EE' | sed -e 's/\.c/\.o/' | xargs` +perl ./parsenames.pl OBJECTS "$a" +export a=`find . -type f | grep [.]/src | grep [.]h | sed -e 'se\./ee' | xargs` +perl ./parsenames.pl HEADERS "$a" + +# $Source: /cvs/libtom/tomsfastmath/genlist.sh,v $ +# $Revision: 1.1 $ +# $Date: 2006/12/31 21:31:40 $
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/makefile.shared Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,109 @@ +#makefile for TomsFastMath +# +# +VERSION=0:12 + +CC=libtool --mode=compile --tag=CC gcc + +CFLAGS += -Wall -W -Wshadow -Isrc/headers + +ifndef IGNORE_SPEED + +CFLAGS += -O3 -funroll-all-loops + +#profiling +#PROF=-pg -g +#CFLAGS += $(PROF) + +#speed +CFLAGS += -fomit-frame-pointer + +endif + +#START_INS +OBJECTS=src/addsub/fp_add.o src/addsub/fp_add_d.o src/addsub/fp_addmod.o src/addsub/fp_cmp.o \ +src/addsub/fp_cmp_d.o src/addsub/fp_cmp_mag.o src/addsub/fp_sub.o src/addsub/fp_sub_d.o \ +src/addsub/fp_submod.o src/addsub/s_fp_add.o src/addsub/s_fp_sub.o src/bin/fp_radix_size.o \ +src/bin/fp_read_radix.o src/bin/fp_read_signed_bin.o src/bin/fp_read_unsigned_bin.o \ +src/bin/fp_reverse.o src/bin/fp_s_rmap.o src/bin/fp_signed_bin_size.o src/bin/fp_to_signed_bin.o \ +src/bin/fp_to_unsigned_bin.o src/bin/fp_toradix.o src/bin/fp_unsigned_bin_size.o src/bit/fp_cnt_lsb.o \ +src/bit/fp_count_bits.o src/bit/fp_div_2.o src/bit/fp_div_2d.o src/bit/fp_lshd.o src/bit/fp_mod_2d.o \ +src/bit/fp_rshd.o src/divide/fp_div.o src/divide/fp_div_d.o src/divide/fp_mod.o src/divide/fp_mod_d.o \ +src/exptmod/fp_2expt.o src/exptmod/fp_exptmod.o src/misc/fp_ident.o src/misc/fp_set.o \ +src/mont/fp_montgomery_calc_normalization.o src/mont/fp_montgomery_reduce.o \ +src/mont/fp_montgomery_setup.o src/mul/fp_mul.o src/mul/fp_mul_2.o src/mul/fp_mul_2d.o \ +src/mul/fp_mul_comba.o src/mul/fp_mul_comba_12.o src/mul/fp_mul_comba_17.o src/mul/fp_mul_comba_20.o \ +src/mul/fp_mul_comba_24.o src/mul/fp_mul_comba_28.o src/mul/fp_mul_comba_3.o src/mul/fp_mul_comba_32.o \ +src/mul/fp_mul_comba_4.o src/mul/fp_mul_comba_48.o src/mul/fp_mul_comba_6.o src/mul/fp_mul_comba_64.o \ +src/mul/fp_mul_comba_7.o src/mul/fp_mul_comba_8.o src/mul/fp_mul_comba_9.o \ +src/mul/fp_mul_comba_small_set.o src/mul/fp_mul_d.o src/mul/fp_mulmod.o src/numtheory/fp_gcd.o \ +src/numtheory/fp_invmod.o src/numtheory/fp_isprime.o src/numtheory/fp_lcm.o \ +src/numtheory/fp_prime_miller_rabin.o src/numtheory/fp_prime_random_ex.o src/sqr/fp_sqr.o \ +src/sqr/fp_sqr_comba.o src/sqr/fp_sqr_comba_12.o src/sqr/fp_sqr_comba_17.o src/sqr/fp_sqr_comba_20.o \ +src/sqr/fp_sqr_comba_24.o src/sqr/fp_sqr_comba_28.o src/sqr/fp_sqr_comba_3.o src/sqr/fp_sqr_comba_32.o \ +src/sqr/fp_sqr_comba_4.o src/sqr/fp_sqr_comba_48.o src/sqr/fp_sqr_comba_6.o src/sqr/fp_sqr_comba_64.o \ +src/sqr/fp_sqr_comba_7.o src/sqr/fp_sqr_comba_8.o src/sqr/fp_sqr_comba_9.o \ +src/sqr/fp_sqr_comba_generic.o src/sqr/fp_sqr_comba_small_set.o src/sqr/fp_sqrmod.o + +HEADERS=src/headers/tfm.h + +#END_INS + + +ifndef LIBPATH + LIBPATH=/usr/lib +endif + +ifndef INCPATH + INCPATH=/usr/include +endif + +ifndef INSTALL_GROUP + GROUP=wheel +else + GROUP=$(INSTALL_GROUP) +endif + +ifndef INSTALL_USER + USER=root +else + USER=$(INSTALL_USER) +endif + +ifndef LIBNAME + LIBNAME=libtfm.la +endif + +ifndef LIBNAME_S + LIBNAME_S=libtfm.a +endif + +default: $(LIBNAME) + +objs: $(OBJECTS) + +$(LIBNAME): $(OBJECTS) + libtool --silent --mode=link gcc $(CFLAGS) `find . -type f | grep "[.]lo" | xargs` -o $(LIBNAME) -rpath $(LIBPATH) -version-info $(VERSION) + +install: $(LIBNAME) + install -d -g $(GROUP) -o $(USER) $(DESTDIR)$(LIBPATH) + libtool --silent --mode=install install -c $(LIBNAME) $(DESTDIR)$(LIBPATH)/$(LIBNAME) + install -d -g $(GROUP) -o $(USER) $(DESTDIR)$(INCPATH) + install -g $(GROUP) -o $(USER) $(HEADERS) $(DESTDIR)$(INCPATH) + +mtest/mtest: mtest/mtest.c + cd mtest ; make mtest + +test: $(LIBNAME) demo/test.o mtest/mtest + $(CC) $(CFLAGS) demo/test.o $(LIBNAME_S) $(PROF) -o test + +timing: $(LIBNAME) demo/test.o + $(CC) $(CFLAGS) demo/test.o $(LIBNAME_S) $(PROF) -o test + +stest: $(LIBNAME) demo/stest.o + $(CC) $(CFLAGS) demo/stest.o $(LIBNAME_S) -o stest + +# $Source: /cvs/libtom/tomsfastmath/makefile.shared,v $ +# $Revision: 1.19 $ +# $Date: 2007/03/13 01:23:03 $ +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/mess.sh Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,4 @@ +#!/bin/bash +if cvs log $1 >/dev/null 2>/dev/null; then exit 0; else echo "$1 shouldn't be here" ; exit 1; fi + +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/mtest/makefile Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,9 @@ +CFLAGS += -Wall -W -O3 + +default: mtest + +mtest: mtest.o + $(CC) $(CFLAGS) mtest.o -ltommath -o mtest + +clean: + rm -f *.o mtest *~
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/mtest/mtest.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,324 @@ +/* makes a bignum test harness with NUM tests per operation + * + * the output is made in the following format [one parameter per line] + +operation +operand1 +operand2 +[... operandN] +result1 +result2 +[... resultN] + +So for example "a * b mod n" would be + +mulmod +a +b +n +a*b mod n + +e.g. if a=3, b=4 n=11 then + +mulmod +3 +4 +11 +1 + + */ + +#ifdef MP_8BIT +#define THE_MASK 127 +#else +#define THE_MASK 32767 +#endif + +#include <stdio.h> +#include <stdlib.h> +#include <time.h> +#include <tommath.h> +#define CRYPT +#include "../src/headers/tfm.h" + +FILE *rng; + +/* 1-2048 bit numbers */ +void rand_num(mp_int *a) +{ + int n, size; + unsigned char buf[2048]; + + size = 1 + ((fgetc(rng)<<8) + fgetc(rng)) % (FP_MAX_SIZE/16 - DIGIT_BIT/2); + buf[0] = (fgetc(rng)&1)?1:0; + fread(buf+1, 1, size, rng); + while (buf[1] == 0) buf[1] = fgetc(rng); + mp_read_raw(a, buf, 1+size); +} + +/* 1-256 bit numbers (to test things like exptmod) */ +void rand_num2(mp_int *a) +{ + int n, size; + unsigned char buf[2048]; + + size = 1 + ((fgetc(rng)<<8) + fgetc(rng)) % (FP_MAX_SIZE/16 - DIGIT_BIT/2); + buf[0] = (fgetc(rng)&1)?1:0; + fread(buf+1, 1, size, rng); + while (buf[1] == 0) buf[1] = fgetc(rng); + mp_read_raw(a, buf, 1+size); +} + +#define mp_to64(a, b) mp_toradix(a, b, 64) + +int main(void) +{ + int n, tmp; + mp_int a, b, c, d, e; + clock_t t1; + char buf[4096]; + + mp_init(&a); + mp_init(&b); + mp_init(&c); + mp_init(&d); + mp_init(&e); + + + /* initial (2^n - 1)^2 testing, makes sure the comba multiplier works [it has the new carry code] */ +/* + mp_set(&a, 1); + for (n = 1; n < 8192; n++) { + mp_mul(&a, &a, &c); + printf("mul\n"); + mp_to64(&a, buf); + printf("%s\n%s\n", buf, buf); + mp_to64(&c, buf); + printf("%s\n", buf); + + mp_add_d(&a, 1, &a); + mp_mul_2(&a, &a); + mp_sub_d(&a, 1, &a); + } +*/ + + rng = fopen("/dev/urandom", "rb"); + if (rng == NULL) { + rng = fopen("/dev/random", "rb"); + if (rng == NULL) { + fprintf(stderr, "\nWarning: stdin used as random source\n\n"); + rng = stdin; + } + } + + t1 = clock(); + for (;;) { +#if 0 + if (clock() - t1 > CLOCKS_PER_SEC) { + sleep(2); + t1 = clock(); + } +#endif + n = fgetc(rng) % 16; + if (n == 0) { + /* add tests */ + rand_num(&a); + rand_num(&b); + mp_add(&a, &b, &c); + printf("add\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + } else if (n == 1) { + /* sub tests */ + rand_num(&a); + rand_num(&b); + mp_sub(&a, &b, &c); + printf("sub\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + } else if (n == 2) { + /* mul tests */ + rand_num(&a); + rand_num(&b); + mp_mul(&a, &b, &c); + printf("mul\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + } else if (n == 3) { + /* div tests */ + rand_num(&a); + rand_num(&b); + mp_div(&a, &b, &c, &d); + printf("div\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + mp_to64(&d, buf); + printf("%s\n", buf); + } else if (n == 4) { + /* sqr tests */ + rand_num(&a); + mp_sqr(&a, &b); + printf("sqr\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + } else if (n == 5) { + /* mul_2d test */ + rand_num(&a); + mp_copy(&a, &b); + n = fgetc(rng) & 63; + mp_mul_2d(&b, n, &b); + mp_to64(&a, buf); + printf("mul2d\n"); + printf("%s\n", buf); + printf("%d\n", n); + mp_to64(&b, buf); + printf("%s\n", buf); + } else if (n == 6) { + /* div_2d test */ + rand_num(&a); + mp_copy(&a, &b); + n = fgetc(rng) & 63; + mp_div_2d(&b, n, &b, NULL); + mp_to64(&a, buf); + printf("div2d\n"); + printf("%s\n", buf); + printf("%d\n", n); + mp_to64(&b, buf); + printf("%s\n", buf); + } else if (n == 7) { + /* gcd test */ + rand_num(&a); + rand_num(&b); + a.sign = MP_ZPOS; + b.sign = MP_ZPOS; + mp_gcd(&a, &b, &c); + printf("gcd\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + } else if (n == 8) { + /* lcm test */ + rand_num(&a); + rand_num(&b); + a.sign = MP_ZPOS; + b.sign = MP_ZPOS; + mp_lcm(&a, &b, &c); + printf("lcm\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + } else if (n == 9) { + /* exptmod test */ + rand_num2(&a); + rand_num2(&b); + rand_num2(&c); + a.sign = b.sign = c.sign = 0; + c.dp[0] |= 1; +// if (c.used <= 4) continue; +// if (mp_cmp(&a, &c) != MP_LT) continue; +// if (mp_cmp(&b, &c) != MP_LT) continue; + mp_exptmod(&a, &b, &c, &d); + printf("expt\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + mp_to64(&d, buf); + printf("%s\n", buf); + } else if (n == 10) { + /* invmod test */ + rand_num2(&a); + rand_num2(&b); + b.dp[0] |= 1; + b.sign = MP_ZPOS; + a.sign = MP_ZPOS; + mp_gcd(&a, &b, &c); + if (mp_cmp_d(&c, 1) != 0) continue; + if (mp_cmp_d(&b, 1) == 0) continue; + mp_invmod(&a, &b, &c); + printf("invmod\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + mp_to64(&c, buf); + printf("%s\n", buf); + } else if (n == 11) { + rand_num(&a); + mp_mul_2(&a, &a); + mp_div_2(&a, &b); + printf("div2\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + } else if (n == 12) { + rand_num(&a); + mp_mul_2(&a, &b); + printf("mul2\n"); + mp_to64(&a, buf); + printf("%s\n", buf); + mp_to64(&b, buf); + printf("%s\n", buf); + } else if (n == 13) { + rand_num(&a); + tmp = abs(rand()) & THE_MASK; + mp_add_d(&a, tmp, &b); + printf("add_d\n"); + mp_to64(&a, buf); + printf("%s\n%d\n", buf, tmp); + mp_to64(&b, buf); + printf("%s\n", buf); + } else if (n == 14) { + rand_num(&a); + tmp = abs(rand()) & THE_MASK; + mp_sub_d(&a, tmp, &b); + printf("sub_d\n"); + mp_to64(&a, buf); + printf("%s\n%d\n", buf, tmp); + mp_to64(&b, buf); + printf("%s\n", buf); + } else if (n == 15) { + rand_num(&a); + tmp = abs(rand()) & THE_MASK; + mp_mul_d(&a, tmp, &b); + printf("mul_d\n"); + mp_to64(&a, buf); + printf("%s\n%d\n", buf, tmp); + mp_to64(&b, buf); + printf("%s\n", buf); + } + } + fclose(rng); + return 0; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/parsenames.pl Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,26 @@ +#!/usr/bin/perl +# +# Splits the list of files and outputs for makefile type files +# wrapped at 80 chars +# +# Tom St Denis +@a = split(" ", $ARGV[1]); +$b = "$ARGV[0]="; +$len = length($b); +print $b; +foreach my $obj (@a) { + $len = $len + length($obj); + $obj =~ s/\*/\$/; + if ($len > 100) { + printf "\\\n"; + $len = length($obj); + } + print "$obj "; +} +#if ($ARGV[0] eq "HEADERS") { print "testprof/tomcrypt_test.h"; } + +print "\n\n"; + +# $Source: /cvs/libtom/tomsfastmath/parsenames.pl,v $ +# $Revision: 1.1 $ +# $Date: 2006/12/31 21:31:40 $
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/pre_gen/mpi.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,2 @@ + +/* EOF */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/random_txt_files/amd64.txt Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,43 @@ +AMD64 timings + +using ISO C +mult + 512-bit: 496 + 1024-bit: 1717 + 2048-bit: 7200 +sqr + 512-bit: 448 + 1024-bit: 1760 + 2048-bit: 7099 +mont + 512-bit: 1416 + 1024-bit: 5156 + 2048-bit: 20820 +expt + 512-bit: 1520207 + 1024-bit: 10603520 + 2048-bit: 84893649 + +using amd64 +mult + 512-bit: 292 + 1024-bit: 945 + 2048-bit: 3620 +sqr + 512-bit: 238 + 1024-bit: 801 + 2048-bit: 2853 +mont + 512-bit: 731 + 1024-bit: 1730 + 2048-bit: 5462 +Exptmod: + 512-bit: 641743 + 1024-bit: 3167406 + 2048-bit: 20158609 + +LTM exptmods + +Exponentiating 513-bit => 825/sec, 2183028 cycles +Exponentiating 1025-bit => 151/sec, 11900720 cycles +Exponentiating 2049-bit => 24/sec, 72376416 cycles
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/random_txt_files/exptmod_timings.txt Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,45 @@ +LTM timings: + +Athlon Barton +Exponentiating 513-bit => 561/sec, 3909824 cycles +Exponentiating 1025-bit => 103/sec, 21175496 cycles +Exponentiating 2049-bit => 16/sec, 129845554 cycles + +P4 Northwood +Exponentiating 513-bit => 284/sec, 9884722 cycles +Exponentiating 1025-bit => 47/sec, 59090432 cycles +Exponentiating 2049-bit => 6/sec, 427456070 cycles + +TFM timings: + +Athlon Barton + 512-bit: 2289257 + 1024-bit: 12871373 + 2048-bit: 97211357 + +P4 Northwood [x86-32] + 512-bit: 8015598 + 1024-bit: 55559304 + 2048-bit: 409861746 + +P4 Northwood [SSE2] + 512-bit: 5895000 + 1024-bit: 39648730 + 2048-bit: 304110670 + +<center> +<table border=1 width=100%> +<tr><td>Processor</td><td>Size in bits</td><td>x86-32</td> <td>x86-64</td><td>SSE2</td><td>LTM</td></tr> +<tr><td>P4 </td><td>512 </td><td>8015598</td><td></td> <td>5895000</td><td>9884722</td></tr> +<tr><td> </td><td>1024 </td><td>55559304</td><td></td> <td>39648730</td><td>59090432</td></tr> +<tr><td> </td><td>2048 </td><td>409861746</td><td></td> <td>304110670</td><td>427456070</td></tr> +<tr><td>Athlon Barton</td><td>512 </td><td>2289257</td><td></td><td></td><td>3909824</td></tr> +<tr><td> </td><td>1024 </td><td>12871373</td><td></td><td></td><td>21175496</td></tr> +<tr><td> </td><td>2048 </td><td>97211357</td><td></td><td></td><td>129845554</td></tr> +<tr><td>Athlon64 </td><td>512 </td><td></td><td>641743</td><td></td><td>2183028</td></tr> +<tr><td> </td><td>1042 </td><td></td><td>3167406</td><td></td><td>11900720</td></tr> +<tr><td> </td><td>2048 </td><td></td><td>20158609</td><td></td><td>72376416</td></tr> +</table> +<b>Cycles per operation</b> +</center> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/random_txt_files/ltm_times.txt Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,37 @@ +LTM Timings... + +Multiplying 140-bit => 2950763/sec, 952 cycles +Multiplying 196-bit => 2150939/sec, 1306 cycles +Multiplying 252-bit => 1357066/sec, 2070 cycles +Multiplying 308-bit => 1055269/sec, 2662 cycles +Multiplying 364-bit => 817557/sec, 3436 cycles +Multiplying 420-bit => 636413/sec, 4414 cycles +Multiplying 475-bit => 536912/sec, 5232 cycles +Multiplying 531-bit => 433641/sec, 6478 cycles +Multiplying 588-bit => 372069/sec, 7550 cycles +Multiplying 644-bit => 322813/sec, 8702 cycles +Multiplying 698-bit => 275566/sec, 10194 cycles +Multiplying 753-bit => 242082/sec, 11604 cycles +Multiplying 809-bit => 214797/sec, 13078 cycles +Multiplying 867-bit => 189626/sec, 14814 cycles +Multiplying 921-bit => 168858/sec, 16636 cycles +Multiplying 978-bit => 151598/sec, 18530 cycles +Multiplying 1036-bit => 137580/sec, 20418 cycles +Multiplying 1091-bit => 124661/sec, 22534 cycles +Multiplying 1148-bit => 111677/sec, 25154 cycles +Multiplying 1199-bit => 102762/sec, 27336 cycles +Multiplying 1258-bit => 94519/sec, 29720 cycles +Multiplying 1316-bit => 86975/sec, 32298 cycles +Multiplying 1371-bit => 79754/sec, 35222 cycles +Multiplying 1427-bit => 74473/sec, 37720 cycles +Multiplying 1483-bit => 68827/sec, 40814 cycles +Multiplying 1537-bit => 63644/sec, 44138 cycles +Multiplying 1595-bit => 59646/sec, 47096 cycles +Multiplying 1651-bit => 56469/sec, 49746 cycles +Multiplying 1708-bit => 52640/sec, 53364 cycles +Multiplying 1764-bit => 49823/sec, 56382 cycles +Multiplying 1819-bit => 46856/sec, 59952 cycles +Multiplying 1875-bit => 44264/sec, 63462 cycles +Multiplying 1929-bit => 41641/sec, 67460 cycles +Multiplying 1985-bit => 39539/sec, 71046 cycles +Multiplying 2044-bit => 37591/sec, 74728 cycles \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/random_txt_files/newsqr.txt Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,36 @@ +New code added in TFM v0.03 + +OLD 64-bit...[athlon64] + +Squaring: + 256-bit: 89 + 512-bit: 234 + 1024-bit: 815 + 2048-bit: 2851 + +NEW 64-bit ... + +Squaring: + 256-bit: 89 + 512-bit: 228 + 1024-bit: 691 + 2048-bit: 2228 + + +OLD 32-bit [athlonxp] + +Squaring: + + 256-bit: 327 + 512-bit: 1044 + 1024-bit: 3646 + 2048-bit: 17055 + +NEW 32-bit + +Squaring: + + 256-bit: 332 + 512-bit: 894 + 1024-bit: 2983 + 2048-bit: 10385
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/random_txt_files/old_sqr_times.txt Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,14 @@ +I started with: + 512-bit: 16338 + 1024-bit: 51020 + 2048-bit: 142718 + +My x86-32 + 512-bit: 2864 + 1024-bit: 10615 + 2048-bit: 41807 + +My SSE2 + 512-bit: 2168 + 1024-bit: 7727 + 2048-bit: 33163 \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_add.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,43 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_add(fp_int *a, fp_int *b, fp_int *c) +{ + int sa, sb; + + /* get sign of both inputs */ + sa = a->sign; + sb = b->sign; + + /* handle two cases, not four */ + if (sa == sb) { + /* both positive or both negative */ + /* add their magnitudes, copy the sign */ + c->sign = sa; + s_fp_add (a, b, c); + } else { + /* one positive, the other negative */ + /* subtract the one with the greater magnitude from */ + /* the one of the lesser magnitude. The result gets */ + /* the sign of the one with the greater magnitude. */ + if (fp_cmp_mag (a, b) == FP_LT) { + c->sign = sb; + s_fp_sub (b, a, c); + } else { + c->sign = sa; + s_fp_sub (a, b, c); + } + } +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_add_d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,22 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a + b */ +void fp_add_d(fp_int *a, fp_digit b, fp_int *c) +{ + fp_int tmp; + fp_set(&tmp, b); + fp_add(a,&tmp,c); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_addmod.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,23 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* d = a + b (mod c) */ +int fp_addmod(fp_int *a, fp_int *b, fp_int *c, fp_int *d) +{ + fp_int tmp; + fp_zero(&tmp); + fp_add(a, b, &tmp); + return fp_mod(&tmp, c, d); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_cmp.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,31 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_cmp(fp_int *a, fp_int *b) +{ + if (a->sign == FP_NEG && b->sign == FP_ZPOS) { + return FP_LT; + } else if (a->sign == FP_ZPOS && b->sign == FP_NEG) { + return FP_GT; + } else { + /* compare digits */ + if (a->sign == FP_NEG) { + /* if negative compare opposite direction */ + return fp_cmp_mag(b, a); + } else { + return fp_cmp_mag(a, b); + } + } +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_cmp_d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,38 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* compare against a single digit */ +int fp_cmp_d(fp_int *a, fp_digit b) +{ + /* compare based on sign */ + if ((b && a->used == 0) || a->sign == FP_NEG) { + return FP_LT; + } + + /* compare based on magnitude */ + if (a->used > 1) { + return FP_GT; + } + + /* compare the only digit of a to b */ + if (a->dp[0] > b) { + return FP_GT; + } else if (a->dp[0] < b) { + return FP_LT; + } else { + return FP_EQ; + } + +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_cmp_mag.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,35 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_cmp_mag(fp_int *a, fp_int *b) +{ + int x; + + if (a->used > b->used) { + return FP_GT; + } else if (a->used < b->used) { + return FP_LT; + } else { + for (x = a->used - 1; x >= 0; x--) { + if (a->dp[x] > b->dp[x]) { + return FP_GT; + } else if (a->dp[x] < b->dp[x]) { + return FP_LT; + } + } + } + return FP_EQ; +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_sub.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,50 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a - b */ +void fp_sub(fp_int *a, fp_int *b, fp_int *c) +{ + int sa, sb; + + sa = a->sign; + sb = b->sign; + + if (sa != sb) { + /* subtract a negative from a positive, OR */ + /* subtract a positive from a negative. */ + /* In either case, ADD their magnitudes, */ + /* and use the sign of the first number. */ + c->sign = sa; + s_fp_add (a, b, c); + } else { + /* subtract a positive from a positive, OR */ + /* subtract a negative from a negative. */ + /* First, take the difference between their */ + /* magnitudes, then... */ + if (fp_cmp_mag (a, b) != FP_LT) { + /* Copy the sign from the first */ + c->sign = sa; + /* The first has a larger or equal magnitude */ + s_fp_sub (a, b, c); + } else { + /* The result has the *opposite* sign from */ + /* the first number. */ + c->sign = (sa == FP_ZPOS) ? FP_NEG : FP_ZPOS; + /* The second has a larger magnitude */ + s_fp_sub (b, a, c); + } + } +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_sub_d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,22 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a - b */ +void fp_sub_d(fp_int *a, fp_digit b, fp_int *c) +{ + fp_int tmp; + fp_set(&tmp, b); + fp_sub(a, &tmp, c); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/fp_submod.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,24 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* d = a - b (mod c) */ +int fp_submod(fp_int *a, fp_int *b, fp_int *c, fp_int *d) +{ + fp_int tmp; + fp_zero(&tmp); + fp_sub(a, b, &tmp); + return fp_mod(&tmp, c, d); +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/s_fp_add.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,42 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* unsigned addition */ +void s_fp_add(fp_int *a, fp_int *b, fp_int *c) +{ + int x, y, oldused; + register fp_word t; + + y = MAX(a->used, b->used); + oldused = c->used; + c->used = y; + + t = 0; + for (x = 0; x < y; x++) { + t += ((fp_word)a->dp[x]) + ((fp_word)b->dp[x]); + c->dp[x] = (fp_digit)t; + t >>= DIGIT_BIT; + } + if (t != 0 && x < FP_SIZE) { + c->dp[c->used++] = (fp_digit)t; + ++x; + } + + c->used = x; + for (; x < oldused; x++) { + c->dp[x] = 0; + } + fp_clamp(c); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/addsub/s_fp_sub.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,40 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* unsigned subtraction ||a|| >= ||b|| ALWAYS! */ +void s_fp_sub(fp_int *a, fp_int *b, fp_int *c) +{ + int x, oldbused, oldused; + fp_word t; + + oldused = c->used; + oldbused = b->used; + c->used = a->used; + t = 0; + for (x = 0; x < oldbused; x++) { + t = ((fp_word)a->dp[x]) - (((fp_word)b->dp[x]) + t); + c->dp[x] = (fp_digit)t; + t = (t >> DIGIT_BIT)&1; + } + for (; x < a->used; x++) { + t = ((fp_word)a->dp[x]) - t; + c->dp[x] = (fp_digit)t; + t = (t >> DIGIT_BIT)&1; + } + for (; x < oldused; x++) { + c->dp[x] = 0; + } + fp_clamp(c); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_radix_size.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,53 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_radix_size(fp_int *a, int radix, int *size) +{ + int digs; + fp_int t; + fp_digit d; + + *size = 0; + + /* check range of the radix */ + if (radix < 2 || radix > 64) { + return FP_VAL; + } + + /* quick out if its zero */ + if (fp_iszero(a) == 1) { + *size = 2; + return FP_OKAY; + } + + fp_init_copy(&t, a); + + /* if it is negative output a - */ + if (t.sign == FP_NEG) { + (*size)++; + t.sign = FP_ZPOS; + } + + digs = 0; + while (fp_iszero (&t) == FP_NO) { + fp_div_d (&t, (fp_digit) radix, &t, &d); + (*size)++; + } + + /* append a NULL so the string is properly terminated */ + (*size)++; + return FP_OKAY; + +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_read_radix.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,70 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_read_radix(fp_int *a, char *str, int radix) +{ + int y, neg; + char ch; + + /* make sure the radix is ok */ + if (radix < 2 || radix > 64) { + return FP_VAL; + } + + /* if the leading digit is a + * minus set the sign to negative. + */ + if (*str == '-') { + ++str; + neg = FP_NEG; + } else { + neg = FP_ZPOS; + } + + /* set the integer to the default of zero */ + fp_zero (a); + + /* process each digit of the string */ + while (*str) { + /* if the radix < 36 the conversion is case insensitive + * this allows numbers like 1AB and 1ab to represent the same value + * [e.g. in hex] + */ + ch = (char) ((radix < 36) ? toupper ((int)*str) : *str); + for (y = 0; y < 64; y++) { + if (ch == fp_s_rmap[y]) { + break; + } + } + + /* if the char was found in the map + * and is less than the given radix add it + * to the number, otherwise exit the loop. + */ + if (y < radix) { + fp_mul_d (a, (fp_digit) radix, a); + fp_add_d (a, (fp_digit) y, a); + } else { + break; + } + ++str; + } + + /* set the sign only if a != 0 */ + if (fp_iszero(a) != FP_YES) { + a->sign = neg; + } + return FP_OKAY; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_read_signed_bin.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,27 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_read_signed_bin(fp_int *a, unsigned char *b, int c) +{ + /* read magnitude */ + fp_read_unsigned_bin (a, b + 1, c - 1); + + /* first byte is 0 for positive, non-zero for negative */ + if (b[0] == 0) { + a->sign = FP_ZPOS; + } else { + a->sign = FP_NEG; + } +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_read_unsigned_bin.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,66 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_read_unsigned_bin(fp_int *a, unsigned char *b, int c) +{ + /* zero the int */ + fp_zero (a); + + /* If we know the endianness of this architecture, and we're using + 32-bit fp_digits, we can optimize this */ +#if (defined(ENDIAN_LITTLE) || defined(ENDIAN_BIG)) && !defined(FP_64BIT) + /* But not for both simultaneously */ +#if defined(ENDIAN_LITTLE) && defined(ENDIAN_BIG) +#error Both ENDIAN_LITTLE and ENDIAN_BIG defined. +#endif + { + unsigned char *pd = (unsigned char *)a->dp; + + if ((unsigned)c > (FP_SIZE * sizeof(fp_digit))) { + int excess = c - (FP_SIZE * sizeof(fp_digit)); + c -= excess; + b += excess; + } + a->used = (c + sizeof(fp_digit) - 1)/sizeof(fp_digit); + /* read the bytes in */ +#ifdef ENDIAN_BIG + { + /* Use Duff's device to unroll the loop. */ + int idx = (c - 1) & ~3; + switch (c % 4) { + case 0: do { pd[idx+0] = *b++; + case 3: pd[idx+1] = *b++; + case 2: pd[idx+2] = *b++; + case 1: pd[idx+3] = *b++; + idx -= 4; + } while ((c -= 4) > 0); + } + } +#else + for (c -= 1; c >= 0; c -= 1) { + pd[c] = *b++; + } +#endif + } +#else + /* read the bytes in */ + for (; c > 0; c--) { + fp_mul_2d (a, 8, a); + a->dp[0] |= *b++; + a->used += 1; + } +#endif + fp_clamp (a); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_reverse.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,31 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* reverse an array, used for radix code */ +void fp_reverse (unsigned char *s, int len) +{ + int ix, iy; + unsigned char t; + + ix = 0; + iy = len - 1; + while (ix < iy) { + t = s[ix]; + s[ix] = s[iy]; + s[iy] = t; + ++ix; + --iy; + } +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_s_rmap.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,17 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* chars used in radix conversions */ +const char *fp_s_rmap = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"; + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_signed_bin_size.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,19 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_signed_bin_size(fp_int *a) +{ + return 1 + fp_unsigned_bin_size (a); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_to_signed_bin.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,20 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_to_signed_bin(fp_int *a, unsigned char *b) +{ + fp_to_unsigned_bin (a, b + 1); + b[0] = (unsigned char) ((a->sign == FP_ZPOS) ? 0 : 1); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_to_unsigned_bin.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,29 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_to_unsigned_bin(fp_int *a, unsigned char *b) +{ + int x; + fp_int t; + + fp_init_copy(&t, a); + + x = 0; + while (fp_iszero (&t) == FP_NO) { + b[x++] = (unsigned char) (t.dp[0] & 255); + fp_div_2d (&t, 8, &t, NULL); + } + fp_reverse (b, x); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_toradix.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,59 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_toradix(fp_int *a, char *str, int radix) +{ + int digs; + fp_int t; + fp_digit d; + char *_s = str; + + /* check range of the radix */ + if (radix < 2 || radix > 64) { + return FP_VAL; + } + + /* quick out if its zero */ + if (fp_iszero(a) == 1) { + *str++ = '0'; + *str = '\0'; + return FP_OKAY; + } + + fp_init_copy(&t, a); + + /* if it is negative output a - */ + if (t.sign == FP_NEG) { + ++_s; + *str++ = '-'; + t.sign = FP_ZPOS; + } + + digs = 0; + while (fp_iszero (&t) == FP_NO) { + fp_div_d (&t, (fp_digit) radix, &t, &d); + *str++ = fp_s_rmap[d]; + ++digs; + } + + /* reverse the digits of the string. In this case _s points + * to the first digit [exluding the sign] of the number] + */ + fp_reverse ((unsigned char *)_s, digs); + + /* append a NULL so the string is properly terminated */ + *str = '\0'; + return FP_OKAY; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bin/fp_unsigned_bin_size.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,20 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_unsigned_bin_size(fp_int *a) +{ + int size = fp_count_bits (a); + return (size / 8 + ((size & 7) != 0 ? 1 : 0)); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bit/fp_cnt_lsb.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,46 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +static const int lnz[16] = { + 4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0 +}; + +/* Counts the number of lsbs which are zero before the first zero bit */ +int fp_cnt_lsb(fp_int *a) +{ + int x; + fp_digit q, qq; + + /* easy out */ + if (fp_iszero(a) == 1) { + return 0; + } + + /* scan lower digits until non-zero */ + for (x = 0; x < a->used && a->dp[x] == 0; x++); + q = a->dp[x]; + x *= DIGIT_BIT; + + /* now scan this digit until a 1 is found */ + if ((q & 1) == 0) { + do { + qq = q & 15; + x += lnz[qq]; + q >>= 4; + } while (qq == 0); + } + return x; +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bit/fp_count_bits.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,36 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +int fp_count_bits (fp_int * a) +{ + int r; + fp_digit q; + + /* shortcut */ + if (a->used == 0) { + return 0; + } + + /* get number of digits and add that */ + r = (a->used - 1) * DIGIT_BIT; + + /* take the last digit and count the bits in it */ + q = a->dp[a->used - 1]; + while (q > ((fp_digit) 0)) { + ++r; + q >>= ((fp_digit) 1); + } + return r; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bit/fp_div_2.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,53 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* b = a/2 */ +void fp_div_2(fp_int * a, fp_int * b) +{ + int x, oldused; + + oldused = b->used; + b->used = a->used; + { + register fp_digit r, rr, *tmpa, *tmpb; + + /* source alias */ + tmpa = a->dp + b->used - 1; + + /* dest alias */ + tmpb = b->dp + b->used - 1; + + /* carry */ + r = 0; + for (x = b->used - 1; x >= 0; x--) { + /* get the carry for the next iteration */ + rr = *tmpa & 1; + + /* shift the current digit, add in carry and store */ + *tmpb-- = (*tmpa-- >> 1) | (r << (DIGIT_BIT - 1)); + + /* forward carry to next iteration */ + r = rr; + } + + /* zero excess digits */ + tmpb = b->dp + b->used; + for (x = b->used; x < oldused; x++) { + *tmpb++ = 0; + } + } + b->sign = a->sign; + fp_clamp (b); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bit/fp_div_2d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,79 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a / 2**b */ +void fp_div_2d(fp_int *a, int b, fp_int *c, fp_int *d) +{ + fp_digit D, r, rr; + int x; + fp_int t; + + /* if the shift count is <= 0 then we do no work */ + if (b <= 0) { + fp_copy (a, c); + if (d != NULL) { + fp_zero (d); + } + return; + } + + fp_init(&t); + + /* get the remainder */ + if (d != NULL) { + fp_mod_2d (a, b, &t); + } + + /* copy */ + fp_copy(a, c); + + /* shift by as many digits in the bit count */ + if (b >= (int)DIGIT_BIT) { + fp_rshd (c, b / DIGIT_BIT); + } + + /* shift any bit count < DIGIT_BIT */ + D = (fp_digit) (b % DIGIT_BIT); + if (D != 0) { + register fp_digit *tmpc, mask, shift; + + /* mask */ + mask = (((fp_digit)1) << D) - 1; + + /* shift for lsb */ + shift = DIGIT_BIT - D; + + /* alias */ + tmpc = c->dp + (c->used - 1); + + /* carry */ + r = 0; + for (x = c->used - 1; x >= 0; x--) { + /* get the lower bits of this word in a temp */ + rr = *tmpc & mask; + + /* shift the current word and mix in the carry bits from the previous word */ + *tmpc = (*tmpc >> D) | (r << shift); + --tmpc; + + /* set the carry to the carry bits of the current word found above */ + r = rr; + } + } + fp_clamp (c); + if (d != NULL) { + fp_copy (&t, d); + } +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bit/fp_lshd.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,38 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_lshd(fp_int *a, int x) +{ + int y; + + /* move up and truncate as required */ + y = MIN(a->used + x - 1, (int)(FP_SIZE-1)); + + /* store new size */ + a->used = y + 1; + + /* move digits */ + for (; y >= x; y--) { + a->dp[y] = a->dp[y-x]; + } + + /* zero lower digits */ + for (; y >= 0; y--) { + a->dp[y] = 0; + } + + /* clamp digits */ + fp_clamp(a); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bit/fp_mod_2d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,42 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a mod 2**d */ +void fp_mod_2d(fp_int *a, int b, fp_int *c) +{ + int x; + + /* zero if count less than or equal to zero */ + if (b <= 0) { + fp_zero(c); + return; + } + + /* get copy of input */ + fp_copy(a, c); + + /* if 2**d is larger than we just return */ + if (b >= (DIGIT_BIT * a->used)) { + return; + } + + /* zero digits above the last digit of the modulus */ + for (x = (b / DIGIT_BIT) + ((b % DIGIT_BIT) == 0 ? 0 : 1); x < c->used; x++) { + c->dp[x] = 0; + } + /* clear the digit that is not completely outside/inside the modulus */ + c->dp[b / DIGIT_BIT] &= ~((fp_digit)0) >> (DIGIT_BIT - b); + fp_clamp (c); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/bit/fp_rshd.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,40 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_rshd(fp_int *a, int x) +{ + int y; + + /* too many digits just zero and return */ + if (x >= a->used) { + fp_zero(a); + return; + } + + /* shift */ + for (y = 0; y < a->used - x; y++) { + a->dp[y] = a->dp[y+x]; + } + + /* zero rest */ + for (; y < a->used; y++) { + a->dp[y] = 0; + } + + /* decrement count */ + a->used -= x; + fp_clamp(a); +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/divide/fp_div.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,157 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* a/b => cb + d == a */ +int fp_div(fp_int *a, fp_int *b, fp_int *c, fp_int *d) +{ + fp_int q, x, y, t1, t2; + int n, t, i, norm, neg; + + /* is divisor zero ? */ + if (fp_iszero (b) == 1) { + return FP_VAL; + } + + /* if a < b then q=0, r = a */ + if (fp_cmp_mag (a, b) == FP_LT) { + if (d != NULL) { + fp_copy (a, d); + } + if (c != NULL) { + fp_zero (c); + } + return FP_OKAY; + } + + fp_init(&q); + q.used = a->used + 2; + + fp_init(&t1); + fp_init(&t2); + fp_init_copy(&x, a); + fp_init_copy(&y, b); + + /* fix the sign */ + neg = (a->sign == b->sign) ? FP_ZPOS : FP_NEG; + x.sign = y.sign = FP_ZPOS; + + /* normalize both x and y, ensure that y >= b/2, [b == 2**DIGIT_BIT] */ + norm = fp_count_bits(&y) % DIGIT_BIT; + if (norm < (int)(DIGIT_BIT-1)) { + norm = (DIGIT_BIT-1) - norm; + fp_mul_2d (&x, norm, &x); + fp_mul_2d (&y, norm, &y); + } else { + norm = 0; + } + + /* note hac does 0 based, so if used==5 then its 0,1,2,3,4, e.g. use 4 */ + n = x.used - 1; + t = y.used - 1; + + /* while (x >= y*b**n-t) do { q[n-t] += 1; x -= y*b**{n-t} } */ + fp_lshd (&y, n - t); /* y = y*b**{n-t} */ + + while (fp_cmp (&x, &y) != FP_LT) { + ++(q.dp[n - t]); + fp_sub (&x, &y, &x); + } + + /* reset y by shifting it back down */ + fp_rshd (&y, n - t); + + /* step 3. for i from n down to (t + 1) */ + for (i = n; i >= (t + 1); i--) { + if (i > x.used) { + continue; + } + + /* step 3.1 if xi == yt then set q{i-t-1} to b-1, + * otherwise set q{i-t-1} to (xi*b + x{i-1})/yt */ + if (x.dp[i] == y.dp[t]) { + q.dp[i - t - 1] = ((((fp_word)1) << DIGIT_BIT) - 1); + } else { + fp_word tmp; + tmp = ((fp_word) x.dp[i]) << ((fp_word) DIGIT_BIT); + tmp |= ((fp_word) x.dp[i - 1]); + tmp /= ((fp_word) y.dp[t]); + q.dp[i - t - 1] = (fp_digit) (tmp); + } + + /* while (q{i-t-1} * (yt * b + y{t-1})) > + xi * b**2 + xi-1 * b + xi-2 + + do q{i-t-1} -= 1; + */ + q.dp[i - t - 1] = (q.dp[i - t - 1] + 1); + do { + q.dp[i - t - 1] = (q.dp[i - t - 1] - 1); + + /* find left hand */ + fp_zero (&t1); + t1.dp[0] = (t - 1 < 0) ? 0 : y.dp[t - 1]; + t1.dp[1] = y.dp[t]; + t1.used = 2; + fp_mul_d (&t1, q.dp[i - t - 1], &t1); + + /* find right hand */ + t2.dp[0] = (i - 2 < 0) ? 0 : x.dp[i - 2]; + t2.dp[1] = (i - 1 < 0) ? 0 : x.dp[i - 1]; + t2.dp[2] = x.dp[i]; + t2.used = 3; + } while (fp_cmp_mag(&t1, &t2) == FP_GT); + + /* step 3.3 x = x - q{i-t-1} * y * b**{i-t-1} */ + fp_mul_d (&y, q.dp[i - t - 1], &t1); + fp_lshd (&t1, i - t - 1); + fp_sub (&x, &t1, &x); + + /* if x < 0 then { x = x + y*b**{i-t-1}; q{i-t-1} -= 1; } */ + if (x.sign == FP_NEG) { + fp_copy (&y, &t1); + fp_lshd (&t1, i - t - 1); + fp_add (&x, &t1, &x); + q.dp[i - t - 1] = q.dp[i - t - 1] - 1; + } + } + + /* now q is the quotient and x is the remainder + * [which we have to normalize] + */ + + /* get sign before writing to c */ + x.sign = x.used == 0 ? FP_ZPOS : a->sign; + + if (c != NULL) { + fp_clamp (&q); + fp_copy (&q, c); + c->sign = neg; + } + + if (d != NULL) { + fp_div_2d (&x, norm, &x, NULL); + +/* the following is a kludge, essentially we were seeing the right remainder but + with excess digits that should have been zero + */ + for (i = b->used; i < x.used; i++) { + x.dp[i] = 0; + } + fp_clamp(&x); + fp_copy (&x, d); + } + + return FP_OKAY; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/divide/fp_div_d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,98 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +static int s_is_power_of_two(fp_digit b, int *p) +{ + int x; + + /* fast return if no power of two */ + if ((b==0) || (b & (b-1))) { + return 0; + } + + for (x = 0; x < DIGIT_BIT; x++) { + if (b == (((fp_digit)1)<<x)) { + *p = x; + return 1; + } + } + return 0; +} + +/* a/b => cb + d == a */ +int fp_div_d(fp_int *a, fp_digit b, fp_int *c, fp_digit *d) +{ + fp_int q; + fp_word w; + fp_digit t; + int ix; + + /* cannot divide by zero */ + if (b == 0) { + return FP_VAL; + } + + /* quick outs */ + if (b == 1 || fp_iszero(a) == 1) { + if (d != NULL) { + *d = 0; + } + if (c != NULL) { + fp_copy(a, c); + } + return FP_OKAY; + } + + /* power of two ? */ + if (s_is_power_of_two(b, &ix) == 1) { + if (d != NULL) { + *d = a->dp[0] & ((((fp_digit)1)<<ix) - 1); + } + if (c != NULL) { + fp_div_2d(a, ix, c, NULL); + } + return FP_OKAY; + } + + /* no easy answer [c'est la vie]. Just division */ + fp_init(&q); + + q.used = a->used; + q.sign = a->sign; + w = 0; + for (ix = a->used - 1; ix >= 0; ix--) { + w = (w << ((fp_word)DIGIT_BIT)) | ((fp_word)a->dp[ix]); + + if (w >= b) { + t = (fp_digit)(w / b); + w -= ((fp_word)t) * ((fp_word)b); + } else { + t = 0; + } + q.dp[ix] = (fp_digit)t; + } + + if (d != NULL) { + *d = (fp_digit)w; + } + + if (c != NULL) { + fp_clamp(&q); + fp_copy(&q, c); + } + + return FP_OKAY; +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/divide/fp_mod.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,34 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a mod b, 0 <= c < b */ +int fp_mod(fp_int *a, fp_int *b, fp_int *c) +{ + fp_int t; + int err; + + fp_zero(&t); + if ((err = fp_div(a, b, NULL, &t)) != FP_OKAY) { + return err; + } + if (t.sign != b->sign) { + fp_add(&t, b, c); + } else { + fp_copy(&t, c); + } + return FP_OKAY; +} + + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/divide/fp_mod_d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,20 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a mod b, 0 <= c < b */ +int fp_mod_d(fp_int *a, fp_digit b, fp_digit *c) +{ + return fp_div_d(a, b, NULL, c); +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/exptmod/fp_2expt.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,39 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* computes a = 2**b */ +void fp_2expt(fp_int *a, int b) +{ + int z; + + /* zero a as per default */ + fp_zero (a); + + if (b < 0) { + return; + } + + z = b / DIGIT_BIT; + if (z >= FP_SIZE) { + return; + } + + /* set the used count of where the bit will go */ + a->used = z + 1; + + /* put the single bit in its place */ + a->dp[z] = ((fp_digit)1) << (b % DIGIT_BIT); +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/exptmod/fp_exptmod.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,276 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +#ifdef TFM_TIMING_RESISTANT + +/* timing resistant montgomery ladder based exptmod + + Based on work by Marc Joye, Sung-Ming Yen, "The Montgomery Powering Ladder", Cryptographic Hardware and Embedded Systems, CHES 2002 +*/ +static int _fp_exptmod(fp_int * G, fp_int * X, fp_int * P, fp_int * Y) +{ + fp_int R[2]; + fp_digit buf, mp; + int err, bitcnt, digidx, y; + + /* now setup montgomery */ + if ((err = fp_montgomery_setup (P, &mp)) != FP_OKAY) { + return err; + } + + fp_init(&R[0]); + fp_init(&R[1]); + + /* now we need R mod m */ + fp_montgomery_calc_normalization (&R[0], P); + + /* now set R[0][1] to G * R mod m */ + if (fp_cmp_mag(P, G) != FP_GT) { + /* G > P so we reduce it first */ + fp_mod(G, P, &R[1]); + } else { + fp_copy(G, &R[1]); + } + fp_mulmod (&R[1], &R[0], P, &R[1]); + + /* for j = t-1 downto 0 do + r_!k = R0*R1; r_k = r_k^2 + */ + + /* set initial mode and bit cnt */ + bitcnt = 1; + buf = 0; + digidx = X->used - 1; + + for (;;) { + /* grab next digit as required */ + if (--bitcnt == 0) { + /* if digidx == -1 we are out of digits so break */ + if (digidx == -1) { + break; + } + /* read next digit and reset bitcnt */ + buf = X->dp[digidx--]; + bitcnt = (int)DIGIT_BIT; + } + + /* grab the next msb from the exponent */ + y = (fp_digit)(buf >> (DIGIT_BIT - 1)) & 1; + buf <<= (fp_digit)1; + + /* do ops */ + fp_mul(&R[0], &R[1], &R[y^1]); fp_montgomery_reduce(&R[y^1], P, mp); + fp_sqr(&R[y], &R[y]); fp_montgomery_reduce(&R[y], P, mp); + } + + fp_montgomery_reduce(&R[0], P, mp); + fp_copy(&R[0], Y); + return FP_OKAY; +} + +#else + +/* y = g**x (mod b) + * Some restrictions... x must be positive and < b + */ +static int _fp_exptmod(fp_int * G, fp_int * X, fp_int * P, fp_int * Y) +{ + fp_int M[64], res; + fp_digit buf, mp; + int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize; + + /* find window size */ + x = fp_count_bits (X); + if (x <= 21) { + winsize = 1; + } else if (x <= 36) { + winsize = 3; + } else if (x <= 140) { + winsize = 4; + } else if (x <= 450) { + winsize = 5; + } else { + winsize = 6; + } + + /* init M array */ + memset(M, 0, sizeof(M)); + + /* now setup montgomery */ + if ((err = fp_montgomery_setup (P, &mp)) != FP_OKAY) { + return err; + } + + /* setup result */ + fp_init(&res); + + /* create M table + * + * The M table contains powers of the input base, e.g. M[x] = G^x mod P + * + * The first half of the table is not computed though accept for M[0] and M[1] + */ + + /* now we need R mod m */ + fp_montgomery_calc_normalization (&res, P); + + /* now set M[1] to G * R mod m */ + if (fp_cmp_mag(P, G) != FP_GT) { + /* G > P so we reduce it first */ + fp_mod(G, P, &M[1]); + } else { + fp_copy(G, &M[1]); + } + fp_mulmod (&M[1], &res, P, &M[1]); + + /* compute the value at M[1<<(winsize-1)] by squaring M[1] (winsize-1) times */ + fp_copy (&M[1], &M[1 << (winsize - 1)]); + for (x = 0; x < (winsize - 1); x++) { + fp_sqr (&M[1 << (winsize - 1)], &M[1 << (winsize - 1)]); + fp_montgomery_reduce (&M[1 << (winsize - 1)], P, mp); + } + + /* create upper table */ + for (x = (1 << (winsize - 1)) + 1; x < (1 << winsize); x++) { + fp_mul(&M[x - 1], &M[1], &M[x]); + fp_montgomery_reduce(&M[x], P, mp); + } + + /* set initial mode and bit cnt */ + mode = 0; + bitcnt = 1; + buf = 0; + digidx = X->used - 1; + bitcpy = 0; + bitbuf = 0; + + for (;;) { + /* grab next digit as required */ + if (--bitcnt == 0) { + /* if digidx == -1 we are out of digits so break */ + if (digidx == -1) { + break; + } + /* read next digit and reset bitcnt */ + buf = X->dp[digidx--]; + bitcnt = (int)DIGIT_BIT; + } + + /* grab the next msb from the exponent */ + y = (fp_digit)(buf >> (DIGIT_BIT - 1)) & 1; + buf <<= (fp_digit)1; + + /* if the bit is zero and mode == 0 then we ignore it + * These represent the leading zero bits before the first 1 bit + * in the exponent. Technically this opt is not required but it + * does lower the # of trivial squaring/reductions used + */ + if (mode == 0 && y == 0) { + continue; + } + + /* if the bit is zero and mode == 1 then we square */ + if (mode == 1 && y == 0) { + fp_sqr(&res, &res); + fp_montgomery_reduce(&res, P, mp); + continue; + } + + /* else we add it to the window */ + bitbuf |= (y << (winsize - ++bitcpy)); + mode = 2; + + if (bitcpy == winsize) { + /* ok window is filled so square as required and multiply */ + /* square first */ + for (x = 0; x < winsize; x++) { + fp_sqr(&res, &res); + fp_montgomery_reduce(&res, P, mp); + } + + /* then multiply */ + fp_mul(&res, &M[bitbuf], &res); + fp_montgomery_reduce(&res, P, mp); + + /* empty window and reset */ + bitcpy = 0; + bitbuf = 0; + mode = 1; + } + } + + /* if bits remain then square/multiply */ + if (mode == 2 && bitcpy > 0) { + /* square then multiply if the bit is set */ + for (x = 0; x < bitcpy; x++) { + fp_sqr(&res, &res); + fp_montgomery_reduce(&res, P, mp); + + /* get next bit of the window */ + bitbuf <<= 1; + if ((bitbuf & (1 << winsize)) != 0) { + /* then multiply */ + fp_mul(&res, &M[1], &res); + fp_montgomery_reduce(&res, P, mp); + } + } + } + + /* fixup result if Montgomery reduction is used + * recall that any value in a Montgomery system is + * actually multiplied by R mod n. So we have + * to reduce one more time to cancel out the factor + * of R. + */ + fp_montgomery_reduce(&res, P, mp); + + /* swap res with Y */ + fp_copy (&res, Y); + return FP_OKAY; +} + +#endif + + +int fp_exptmod(fp_int * G, fp_int * X, fp_int * P, fp_int * Y) +{ + fp_int tmp; + int err; + +#ifdef TFM_CHECK + /* prevent overflows */ + if (P->used > (FP_SIZE/2)) { + return FP_VAL; + } +#endif + + /* is X negative? */ + if (X->sign == FP_NEG) { + /* yes, copy G and invmod it */ + fp_copy(G, &tmp); + if ((err = fp_invmod(&tmp, P, &tmp)) != FP_OKAY) { + return err; + } + X->sign = FP_ZPOS; + err = _fp_exptmod(&tmp, X, P, Y); + if (X != Y) { + X->sign = FP_NEG; + } + return err; + } else { + /* Positive exponent so just exptmod */ + return _fp_exptmod(G, X, P, Y); + } +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/generators/.gitignore Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,2 @@ +comba_sqr_gen +comba_sqr_smallgen
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/generators/comba_mont_gen.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,132 @@ +#include <stdio.h> + +int main(void) +{ + int x, y, z; + +printf( +#if 1 +"#ifdef TFM_SMALL_SET\n" +"/* computes x/R == x (mod N) via Montgomery Reduction */\n" +"void fp_montgomery_reduce_small(fp_int *a, fp_int *m, fp_digit mp)\n" +"{\n" +" fp_digit c[FP_SIZE], *_c, *tmpm, mu, cy;\n" +" int oldused, x, y, pa;\n" +"\n" +"#if defined(USE_MEMSET)\n" +" /* now zero the buff */\n" +" memset(c, 0, sizeof c);\n" +"#endif\n" +" pa = m->used;\n" +"\n" +" /* copy the input */\n" +" oldused = a->used;\n" +" for (x = 0; x < oldused; x++) {\n" +" c[x] = a->dp[x];\n" +" }\n" +"#if !defined(USE_MEMSET)\n" +" for (; x < 2*pa+3; x++) {\n" +" c[x] = 0;\n" +" }\n" +"#endif\n" +" MONT_START;\n" +#endif +"\n" +" switch (pa) {\n"); + +for (x = 1; x <= 16; x++) { +if (x > 16 && (x != 32 && x != 48 && x != 64)) continue; +if (x > 16) printf("#ifdef TFM_HUGE\n"); + + + +printf(" case %d:\n", x); + +for (y = 0; y < x; y++) { + +printf(" x = %d; cy = 0;\n" + " LOOP_START;\n" + " _c = c + %d;\n" + " tmpm = m->dp;\n", y, y); + +printf("#ifdef INNERMUL8\n"); +for (z = 0; z+8 <= x; z += 8) { +printf(" INNERMUL8; _c += 8; tmpm += 8;\n"); +} +for (; z < x; z++) { +printf(" INNERMUL; ++_c;\n"); +} +printf("#else\n"); +for (z = 0; z < x; z++) { +printf(" INNERMUL; ++_c;\n"); +} +printf("#endif\n"); +printf(" LOOP_END;\n" + " while (cy) {\n" + " PROPCARRY;\n" + " ++_c;\n" + " }\n"); +} +//printf(" }\n"); +printf(" break;\n"); + + + +#define LOOP_MACRO(stride) \ + for (x = 0; x < stride; x++) { \ + fp_digit cy = 0; \ + /* get Mu for this round */ \ + LOOP_START; \ + _c = c + x; \ + tmpm = m->dp; \ + for (y = 0; y < stride; y++) { \ + INNERMUL; \ + ++_c; \ + } \ + LOOP_END; \ + while (cy) { \ + PROPCARRY; \ + ++_c; \ + } \ + } + + + + + +if (x > 16) printf("#endif /* TFM_HUGE */\n"); + + +} + +#if 1 + +printf( +" }\n" +" /* now copy out */\n" +" _c = c + pa;\n" +" tmpm = a->dp;\n" +" for (x = 0; x < pa+1; x++) {\n" +" *tmpm++ = *_c++;\n" +" }\n" +"\n" +" for (; x < oldused; x++) {\n" +" *tmpm++ = 0;\n" +" }\n" +"\n" +" MONT_FINI;\n" +"\n" +" a->used = pa+1;\n" +" fp_clamp(a);\n" +"\n" +" /* if A >= m then A = A - m */\n" +" if (fp_cmp_mag (a, m) != FP_LT) {\n" +" s_fp_sub (a, m, a);\n" +" }\n" +"}\n\n#endif\n"); + +#endif + + +return 0; +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/generators/comba_mult_gen.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,63 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ + +/* program emits a NxN comba multiplier */ +#include <stdio.h> + +int main(int argc, char **argv) +{ + int N, x, y, z; + N = atoi(argv[1]); + + /* print out preamble */ +printf( +"void fp_mul_comba%d(fp_int *A, fp_int *B, fp_int *C)\n" +"{\n" +" fp_digit c0, c1, c2, at[%d];\n" +"\n" +" memcpy(at, A->dp, %d * sizeof(fp_digit));\n" +" memcpy(at+%d, B->dp, %d * sizeof(fp_digit));\n" +" COMBA_START;\n" +"\n" +" COMBA_CLEAR;\n", N, N+N, N, N, N); + + /* now do the rows */ + for (x = 0; x < (N+N-1); x++) { +printf( +" /* %d */\n", x); +if (x > 0) { +printf( +" COMBA_FORWARD;\n"); +} + for (y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if ((y+z)==x) { + printf(" MULADD(at[%d], at[%d]); ", y, z+N); + } + } + } +printf( +"\n" +" COMBA_STORE(C->dp[%d]);\n", x); + } +printf( +" COMBA_STORE2(C->dp[%d]);\n" +" C->used = %d;\n" +" C->sign = A->sign ^ B->sign;\n" +" fp_clamp(C);\n" +" COMBA_FINI;\n" +"}\n\n\n", N+N-1, N+N); + + return 0; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/generators/comba_mult_smallgen.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,61 @@ +/* program emits a NxN comba multiplier for 1x1 to 16x16 */ +#include <stdio.h> + +int main(int argc, char **argv) +{ + int N, x, y, z; + + /* print out preamble */ +printf( +"void fp_mul_comba_small(fp_int *A, fp_int *B, fp_int *C)\n" +"{\n" +" fp_digit c0, c1, c2, at[32];\n" +" switch (MAX(A->used, B->used)) { \n" +); + +for (N = 1; N <= 16; N++) { + +printf( +"\n" +" case %d:\n" +" memcpy(at, A->dp, %d * sizeof(fp_digit));\n" +" memcpy(at+%d, B->dp, %d * sizeof(fp_digit));\n" +" COMBA_START;\n" +"\n" +" COMBA_CLEAR;\n", N, N, N, N); + + /* now do the rows */ + for (x = 0; x < (N+N-1); x++) { +printf( +" /* %d */\n", x); +if (x > 0) { +printf( +" COMBA_FORWARD;\n"); +} + for (y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if ((y+z)==x) { + printf(" MULADD(at[%d], at[%d]); ", y, z+N); + } + } + } +printf( +"\n" +" COMBA_STORE(C->dp[%d]);\n", x); + } +printf( +" COMBA_STORE2(C->dp[%d]);\n" +" C->used = %d;\n" +" C->sign = A->sign ^ B->sign;\n" +" fp_clamp(C);\n" +" COMBA_FINI;\n" +" break;\n", N+N-1, N+N); +} +printf(" }\n}\n\n"); + + return 0; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/generators/comba_sqr_gen.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,108 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ + +#include <stdio.h> + +int main(int argc, char **argv) +{ + int x, y, z, N, f; + N = atoi(argv[1]); + +printf( +"#define TFM_DEFINES\n" +"#include \"fp_sqr_comba.c\"\n" +"\n" +"#ifdef TFM_SQR%d\n" +"void fp_sqr_comba%d(fp_int *A, fp_int *B)\n" +"{\n" +" fp_digit *a, b[%d], c0, c1, c2, sc0, sc1, sc2;\n" +"#ifdef TFM_ISO\n" +" fp_word tt;\n" +"#endif\n" +"\n" +" a = A->dp;\n" +" COMBA_START; \n" +"\n" +" /* clear carries */\n" +" CLEAR_CARRY;\n" +"\n" +" /* output 0 */\n" +" SQRADD(a[0],a[0]);\n" +" COMBA_STORE(b[0]);\n", N, N, N+N); + + for (x = 1; x < N+N-1; x++) { +printf( +"\n /* output %d */\n" +" CARRY_FORWARD;\n ", x); + + for (f = y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if (z != y && z + y == x && y <= z) { + ++f; + } + } + } + + if (f <= 2) { + for (y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if (y<=z && (y+z)==x) { + if (y == z) { + printf("SQRADD(a[%d], a[%d]); ", y, y); + } else { + printf("SQRADD2(a[%d], a[%d]); ", y, z); + } + } + } + } + } else { + // new method + /* do evens first */ + f = 0; + for (y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if (z != y && z + y == x && y <= z) { + if (f == 0) { + // first double + printf("SQRADDSC(a[%d], a[%d]); ", y, z); + f = 1; + } else { + printf("SQRADDAC(a[%d], a[%d]); ", y, z); + } + } + } + } + // forward the carry + printf("SQRADDDB; "); + if ((x&1) == 0) { + // add the square + printf("SQRADD(a[%d], a[%d]); ", x/2, x/2); + } + } +printf("\n COMBA_STORE(b[%d]);\n", x); + } +printf(" COMBA_STORE2(b[%d]);\n", N+N-1); + +printf( +" COMBA_FINI;\n" +"\n" +" B->used = %d;\n" +" B->sign = FP_ZPOS;\n" +" memcpy(B->dp, b, %d * sizeof(fp_digit));\n" +" memset(B->dp + %d, 0, (FP_SIZE - %d) * sizeof(fp_digit));\n" +" fp_clamp(B);\n" +"}\n#endif\n\n\n", N+N, N+N, N+N, N+N); + + return 0; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/generators/comba_sqr_smallgen.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,117 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ + +/* Generates squaring comba code... it learns it knows our secrets! */ +#include <stdio.h> + +int main(int argc, char **argv) +{ + int x, y, z, N, f; + +printf( +"#define TFM_DEFINES\n" +"#include \"fp_sqr_comba.c\"\n" +"\n" +"#if defined(TFM_SMALL_SET)\n" +"void fp_sqr_comba_small(fp_int *A, fp_int *B)\n" +"{\n" +" fp_digit *a, b[32], c0, c1, c2, sc0, sc1, sc2;\n" +"#ifdef TFM_ISO\n" +" fp_word tt;\n" +"#endif\n" +); + +printf(" switch (A->used) { \n"); + +for (N = 1; N <= 16; N++) { +printf( +" case %d:\n" +" a = A->dp;\n" +" COMBA_START; \n" +"\n" +" /* clear carries */\n" +" CLEAR_CARRY;\n" +"\n" +" /* output 0 */\n" +" SQRADD(a[0],a[0]);\n" +" COMBA_STORE(b[0]);\n", N); + + for (x = 1; x < N+N-1; x++) { +printf( +"\n /* output %d */\n" +" CARRY_FORWARD;\n ", x); + + for (f = y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if (z != y && z + y == x && y <= z) { + ++f; + } + } + } + + if (f <= 2) { + for (y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if (y<=z && (y+z)==x) { + if (y == z) { + printf(" SQRADD(a[%d], a[%d]); ", y, y); + } else { + printf(" SQRADD2(a[%d], a[%d]); ", y, z); + } + } + } + } + } else { + // new method + /* do evens first */ + f = 0; + for (y = 0; y < N; y++) { + for (z = 0; z < N; z++) { + if (z != y && z + y == x && y <= z) { + if (f == 0) { + // first double + printf("SQRADDSC(a[%d], a[%d]); ", y, z); + f = 1; + } else { + printf("SQRADDAC(a[%d], a[%d]); ", y, z); + } + } + } + } + // forward the carry + printf("SQRADDDB; "); + if ((x&1) == 0) { + // add the square + printf("SQRADD(a[%d], a[%d]); ", x/2, x/2); + } + } +printf("\n COMBA_STORE(b[%d]);\n", x); + } +printf(" COMBA_STORE2(b[%d]);\n", N+N-1); + +printf( +" COMBA_FINI;\n" +"\n" +" B->used = %d;\n" +" B->sign = FP_ZPOS;\n" +" memcpy(B->dp, b, %d * sizeof(fp_digit));\n" +" memset(B->dp + %d, 0, (FP_SIZE - %d) * sizeof(fp_digit));\n" +" fp_clamp(B);\n" +" break;\n\n", N+N, N+N, N+N, N+N); +} + +printf("}\n}\n\n#endif /* TFM_SMALL_SET */\n"); + + return 0; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/generators/makefile Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,17 @@ +all: comba_sqr_gen comba_sqr_smallgen + +clean: + rm -f comba_sqr_gen + rm -f comba_sqr_smallgen + +comba_sqr_gen: comba_sqr_gen.c + gcc -o comba_sqr_gen comba_sqr_gen.c +comba_sqr_smallgen: comba_sqr_smallgen.c + gcc -o comba_sqr_smallgen comba_sqr_smallgen.c + +regen: comba_sqr_gen comba_sqr_smallgen + for i in 3 4 6 7 8 9 12 17 20 24 28 32 48 64; do \ + ./comba_sqr_gen $$i > ../sqr/fp_sqr_comba_$$i.c; \ + done + ./comba_sqr_smallgen > ../sqr/fp_sqr_comba_small_set.c +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/headers/tfm.h Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,568 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#ifndef TFM_H_ +#define TFM_H_ + +#include <stdio.h> +#include <string.h> +#include <stdlib.h> +#include <ctype.h> +#include <limits.h> + +#ifndef MIN + #define MIN(x,y) ((x)<(y)?(x):(y)) +#endif + +#ifndef MAX + #define MAX(x,y) ((x)>(y)?(x):(y)) +#endif + +/* externally define this symbol to ignore the default settings, useful for changing the build from the make process */ +#ifndef TFM_ALREADY_SET + +/* do we want the large set of small multiplications ? + Enable these if you are going to be doing a lot of small (<= 16 digit) multiplications say in ECC + Or if you're on a 64-bit machine doing RSA as a 1024-bit integer == 16 digits ;-) + */ +#define TFM_SMALL_SET + +/* do we want huge code + Enable these if you are doing 20, 24, 28, 32, 48, 64 digit multiplications (useful for RSA) + Less important on 64-bit machines as 32 digits == 2048 bits + */ +#if 0 +#define TFM_MUL3 +#define TFM_MUL4 +#define TFM_MUL6 +#define TFM_MUL7 +#define TFM_MUL8 +#define TFM_MUL9 +#define TFM_MUL12 +#define TFM_MUL17 +#endif +#define TFM_MUL20 +#define TFM_MUL24 +#define TFM_MUL28 +#define TFM_MUL32 +#define TFM_MUL48 +#define TFM_MUL64 + +#if 0 +#define TFM_SQR3 +#define TFM_SQR4 +#define TFM_SQR6 +#define TFM_SQR7 +#define TFM_SQR8 +#define TFM_SQR9 +#define TFM_SQR12 +#define TFM_SQR17 +#endif +#define TFM_SQR20 +#define TFM_SQR24 +#define TFM_SQR28 +#define TFM_SQR32 +#define TFM_SQR48 +#define TFM_SQR64 + +/* do we want some overflow checks + Not required if you make sure your numbers are within range (e.g. by default a modulus for fp_exptmod() can only be upto 2048 bits long) + */ +/* #define TFM_CHECK */ + +/* Is the target a P4 Prescott + */ +/* #define TFM_PRESCOTT */ + +/* Do we want timing resistant fp_exptmod() ? + * This makes it slower but also timing invariant with respect to the exponent + */ +/* #define TFM_TIMING_RESISTANT */ + +#endif + +/* Max size of any number in bits. Basically the largest size you will be multiplying + * should be half [or smaller] of FP_MAX_SIZE-four_digit + * + * You can externally define this or it defaults to 4096-bits [allowing multiplications upto 2048x2048 bits ] + */ +#ifndef FP_MAX_SIZE + #define FP_MAX_SIZE (4096+(8*DIGIT_BIT)) +#endif + +/* will this lib work? */ +#if (CHAR_BIT & 7) + #error CHAR_BIT must be a multiple of eight. +#endif +#if FP_MAX_SIZE % CHAR_BIT + #error FP_MAX_SIZE must be a multiple of CHAR_BIT +#endif + +/* autodetect x86-64 and make sure we are using 64-bit digits with x86-64 asm */ +#if defined(__x86_64__) + #if defined(TFM_X86) || defined(TFM_SSE2) || defined(TFM_ARM) + #error x86-64 detected, x86-32/SSE2/ARM optimizations are not valid! + #endif + #if !defined(TFM_X86_64) && !defined(TFM_NO_ASM) + #define TFM_X86_64 + #endif +#endif +#if defined(TFM_X86_64) + #if !defined(FP_64BIT) + #define FP_64BIT + #endif +#endif + +/* try to detect x86-32 */ +#if defined(__i386__) && !defined(TFM_SSE2) + #if defined(TFM_X86_64) || defined(TFM_ARM) + #error x86-32 detected, x86-64/ARM optimizations are not valid! + #endif + #if !defined(TFM_X86) && !defined(TFM_NO_ASM) + #define TFM_X86 + #endif +#endif + +/* make sure we're 32-bit for x86-32/sse/arm/ppc32 */ +#if (defined(TFM_X86) || defined(TFM_SSE2) || defined(TFM_ARM) || defined(TFM_PPC32)) && defined(FP_64BIT) + #warning x86-32, SSE2 and ARM, PPC32 optimizations require 32-bit digits (undefining) + #undef FP_64BIT +#endif + +/* multi asms? */ +#ifdef TFM_X86 + #define TFM_ASM +#endif +#ifdef TFM_X86_64 + #ifdef TFM_ASM + #error TFM_ASM already defined! + #endif + #define TFM_ASM +#endif +#ifdef TFM_SSE2 + #ifdef TFM_ASM + #error TFM_ASM already defined! + #endif + #define TFM_ASM +#endif +#ifdef TFM_ARM + #ifdef TFM_ASM + #error TFM_ASM already defined! + #endif + #define TFM_ASM +#endif +#ifdef TFM_PPC32 + #ifdef TFM_ASM + #error TFM_ASM already defined! + #endif + #define TFM_ASM +#endif +#ifdef TFM_PPC64 + #ifdef TFM_ASM + #error TFM_ASM already defined! + #endif + #define TFM_ASM +#endif +#ifdef TFM_AVR32 + #ifdef TFM_ASM + #error TFM_ASM already defined! + #endif + #define TFM_ASM +#endif + +/* we want no asm? */ +#ifdef TFM_NO_ASM + #undef TFM_X86 + #undef TFM_X86_64 + #undef TFM_SSE2 + #undef TFM_ARM + #undef TFM_PPC32 + #undef TFM_PPC64 + #undef TFM_AVR32 + #undef TFM_ASM +#endif + +/* ECC helpers */ +#ifdef TFM_ECC192 + #ifdef FP_64BIT + #define TFM_MUL3 + #define TFM_SQR3 + #else + #define TFM_MUL6 + #define TFM_SQR6 + #endif +#endif + +#ifdef TFM_ECC224 + #ifdef FP_64BIT + #define TFM_MUL4 + #define TFM_SQR4 + #else + #define TFM_MUL7 + #define TFM_SQR7 + #endif +#endif + +#ifdef TFM_ECC256 + #ifdef FP_64BIT + #define TFM_MUL4 + #define TFM_SQR4 + #else + #define TFM_MUL8 + #define TFM_SQR8 + #endif +#endif + +#ifdef TFM_ECC384 + #ifdef FP_64BIT + #define TFM_MUL6 + #define TFM_SQR6 + #else + #define TFM_MUL12 + #define TFM_SQR12 + #endif +#endif + +#ifdef TFM_ECC521 + #ifdef FP_64BIT + #define TFM_MUL9 + #define TFM_SQR9 + #else + #define TFM_MUL17 + #define TFM_SQR17 + #endif +#endif + + +/* some default configurations. + */ +#if defined(FP_64BIT) + /* for GCC only on supported platforms */ +#ifndef CRYPT + typedef unsigned long ulong64; +#endif + typedef ulong64 fp_digit; + typedef unsigned long fp_word __attribute__ ((mode(TI))); +#else + /* this is to make porting into LibTomCrypt easier :-) */ +#ifndef CRYPT + #if defined(_MSC_VER) || defined(__BORLANDC__) + typedef unsigned __int64 ulong64; + typedef signed __int64 long64; + #else + typedef unsigned long long ulong64; + typedef signed long long long64; + #endif +#endif + typedef unsigned long fp_digit; + typedef ulong64 fp_word; +#endif + +/* # of digits this is */ +#define DIGIT_BIT (int)((CHAR_BIT) * sizeof(fp_digit)) +#define FP_MASK (fp_digit)(-1) +#define FP_SIZE (FP_MAX_SIZE/DIGIT_BIT) + +/* signs */ +#define FP_ZPOS 0 +#define FP_NEG 1 + +/* return codes */ +#define FP_OKAY 0 +#define FP_VAL 1 +#define FP_MEM 2 + +/* equalities */ +#define FP_LT -1 /* less than */ +#define FP_EQ 0 /* equal to */ +#define FP_GT 1 /* greater than */ + +/* replies */ +#define FP_YES 1 /* yes response */ +#define FP_NO 0 /* no response */ + +/* a FP type */ +typedef struct { + fp_digit dp[FP_SIZE]; + int used, + sign; +} fp_int; + +/* functions */ + +/* returns a TFM ident string useful for debugging... */ +const char *fp_ident(void); + +/* initialize [or zero] an fp int */ +#define fp_init(a) (void)memset((a), 0, sizeof(fp_int)) +#define fp_zero(a) fp_init(a) + +/* zero/even/odd ? */ +#define fp_iszero(a) (((a)->used == 0) ? FP_YES : FP_NO) +#define fp_iseven(a) (((a)->used >= 0 && (((a)->dp[0] & 1) == 0)) ? FP_YES : FP_NO) +#define fp_isodd(a) (((a)->used > 0 && (((a)->dp[0] & 1) == 1)) ? FP_YES : FP_NO) + +/* set to a small digit */ +void fp_set(fp_int *a, fp_digit b); + +/* copy from a to b */ +#define fp_copy(a, b) (void)(((a) != (b)) && memcpy((b), (a), sizeof(fp_int))) +#define fp_init_copy(a, b) fp_copy(b, a) + +/* clamp digits */ +#define fp_clamp(a) { while ((a)->used && (a)->dp[(a)->used-1] == 0) --((a)->used); (a)->sign = (a)->used ? (a)->sign : FP_ZPOS; } + +/* negate and absolute */ +#define fp_neg(a, b) { fp_copy(a, b); (b)->sign ^= 1; fp_clamp(b); } +#define fp_abs(a, b) { fp_copy(a, b); (b)->sign = 0; } + +/* right shift x digits */ +void fp_rshd(fp_int *a, int x); + +/* left shift x digits */ +void fp_lshd(fp_int *a, int x); + +/* signed comparison */ +int fp_cmp(fp_int *a, fp_int *b); + +/* unsigned comparison */ +int fp_cmp_mag(fp_int *a, fp_int *b); + +/* power of 2 operations */ +void fp_div_2d(fp_int *a, int b, fp_int *c, fp_int *d); +void fp_mod_2d(fp_int *a, int b, fp_int *c); +void fp_mul_2d(fp_int *a, int b, fp_int *c); +void fp_2expt (fp_int *a, int b); +void fp_mul_2(fp_int *a, fp_int *c); +void fp_div_2(fp_int *a, fp_int *c); + +/* Counts the number of lsbs which are zero before the first zero bit */ +int fp_cnt_lsb(fp_int *a); + +/* c = a + b */ +void fp_add(fp_int *a, fp_int *b, fp_int *c); + +/* c = a - b */ +void fp_sub(fp_int *a, fp_int *b, fp_int *c); + +/* c = a * b */ +void fp_mul(fp_int *a, fp_int *b, fp_int *c); + +/* b = a*a */ +void fp_sqr(fp_int *a, fp_int *b); + +/* a/b => cb + d == a */ +int fp_div(fp_int *a, fp_int *b, fp_int *c, fp_int *d); + +/* c = a mod b, 0 <= c < b */ +int fp_mod(fp_int *a, fp_int *b, fp_int *c); + +/* compare against a single digit */ +int fp_cmp_d(fp_int *a, fp_digit b); + +/* c = a + b */ +void fp_add_d(fp_int *a, fp_digit b, fp_int *c); + +/* c = a - b */ +void fp_sub_d(fp_int *a, fp_digit b, fp_int *c); + +/* c = a * b */ +void fp_mul_d(fp_int *a, fp_digit b, fp_int *c); + +/* a/b => cb + d == a */ +int fp_div_d(fp_int *a, fp_digit b, fp_int *c, fp_digit *d); + +/* c = a mod b, 0 <= c < b */ +int fp_mod_d(fp_int *a, fp_digit b, fp_digit *c); + +/* ---> number theory <--- */ +/* d = a + b (mod c) */ +int fp_addmod(fp_int *a, fp_int *b, fp_int *c, fp_int *d); + +/* d = a - b (mod c) */ +int fp_submod(fp_int *a, fp_int *b, fp_int *c, fp_int *d); + +/* d = a * b (mod c) */ +int fp_mulmod(fp_int *a, fp_int *b, fp_int *c, fp_int *d); + +/* c = a * a (mod b) */ +int fp_sqrmod(fp_int *a, fp_int *b, fp_int *c); + +/* c = 1/a (mod b) */ +int fp_invmod(fp_int *a, fp_int *b, fp_int *c); + +/* c = (a, b) */ +void fp_gcd(fp_int *a, fp_int *b, fp_int *c); + +/* c = [a, b] */ +void fp_lcm(fp_int *a, fp_int *b, fp_int *c); + +/* setups the montgomery reduction */ +int fp_montgomery_setup(fp_int *a, fp_digit *mp); + +/* computes a = B**n mod b without division or multiplication useful for + * normalizing numbers in a Montgomery system. + */ +void fp_montgomery_calc_normalization(fp_int *a, fp_int *b); + +/* computes x/R == x (mod N) via Montgomery Reduction */ +void fp_montgomery_reduce(fp_int *a, fp_int *m, fp_digit mp); + +/* d = a**b (mod c) */ +int fp_exptmod(fp_int *a, fp_int *b, fp_int *c, fp_int *d); + +/* primality stuff */ + +/* perform a Miller-Rabin test of a to the base b and store result in "result" */ +void fp_prime_miller_rabin (fp_int * a, fp_int * b, int *result); + +/* 256 trial divisions + 8 Miller-Rabins, returns FP_YES if probable prime */ +int fp_isprime(fp_int *a); + +/* Primality generation flags */ +#define TFM_PRIME_BBS 0x0001 /* BBS style prime */ +#define TFM_PRIME_SAFE 0x0002 /* Safe prime (p-1)/2 == prime */ +#define TFM_PRIME_2MSB_OFF 0x0004 /* force 2nd MSB to 0 */ +#define TFM_PRIME_2MSB_ON 0x0008 /* force 2nd MSB to 1 */ + +/* callback for fp_prime_random, should fill dst with random bytes and return how many read [upto len] */ +typedef int tfm_prime_callback(unsigned char *dst, int len, void *dat); + +#define fp_prime_random(a, t, size, bbs, cb, dat) fp_prime_random_ex(a, t, ((size) * 8) + 1, (bbs==1)?TFM_PRIME_BBS:0, cb, dat) + +int fp_prime_random_ex(fp_int *a, int t, int size, int flags, tfm_prime_callback cb, void *dat); + +/* radix conersions */ +int fp_count_bits(fp_int *a); + +int fp_unsigned_bin_size(fp_int *a); +void fp_read_unsigned_bin(fp_int *a, unsigned char *b, int c); +void fp_to_unsigned_bin(fp_int *a, unsigned char *b); + +int fp_signed_bin_size(fp_int *a); +void fp_read_signed_bin(fp_int *a, unsigned char *b, int c); +void fp_to_signed_bin(fp_int *a, unsigned char *b); + +int fp_read_radix(fp_int *a, char *str, int radix); +int fp_toradix(fp_int *a, char *str, int radix); +int fp_toradix_n(fp_int * a, char *str, int radix, int maxlen); + + +/* VARIOUS LOW LEVEL STUFFS */ +void s_fp_add(fp_int *a, fp_int *b, fp_int *c); +void s_fp_sub(fp_int *a, fp_int *b, fp_int *c); +void fp_reverse(unsigned char *s, int len); + +void fp_mul_comba(fp_int *A, fp_int *B, fp_int *C); + +#ifdef TFM_SMALL_SET +void fp_mul_comba_small(fp_int *A, fp_int *B, fp_int *C); +#endif + +#ifdef TFM_MUL3 +void fp_mul_comba3(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL4 +void fp_mul_comba4(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL6 +void fp_mul_comba6(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL7 +void fp_mul_comba7(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL8 +void fp_mul_comba8(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL9 +void fp_mul_comba9(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL12 +void fp_mul_comba12(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL17 +void fp_mul_comba17(fp_int *A, fp_int *B, fp_int *C); +#endif + +#ifdef TFM_MUL20 +void fp_mul_comba20(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL24 +void fp_mul_comba24(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL28 +void fp_mul_comba28(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL32 +void fp_mul_comba32(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL48 +void fp_mul_comba48(fp_int *A, fp_int *B, fp_int *C); +#endif +#ifdef TFM_MUL64 +void fp_mul_comba64(fp_int *A, fp_int *B, fp_int *C); +#endif + +void fp_sqr_comba(fp_int *A, fp_int *B); + +#ifdef TFM_SMALL_SET +void fp_sqr_comba_small(fp_int *A, fp_int *B); +#endif + +#ifdef TFM_SQR3 +void fp_sqr_comba3(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR4 +void fp_sqr_comba4(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR6 +void fp_sqr_comba6(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR7 +void fp_sqr_comba7(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR8 +void fp_sqr_comba8(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR9 +void fp_sqr_comba9(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR12 +void fp_sqr_comba12(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR17 +void fp_sqr_comba17(fp_int *A, fp_int *B); +#endif + +#ifdef TFM_SQR20 +void fp_sqr_comba20(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR24 +void fp_sqr_comba24(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR28 +void fp_sqr_comba28(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR32 +void fp_sqr_comba32(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR48 +void fp_sqr_comba48(fp_int *A, fp_int *B); +#endif +#ifdef TFM_SQR64 +void fp_sqr_comba64(fp_int *A, fp_int *B); +#endif +extern const char *fp_s_rmap; + +#endif + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/misc/fp_ident.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,95 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include "tfm.h" + +const char *fp_ident(void) +{ + static char buf[1024]; + + memset(buf, 0, sizeof(buf)); + snprintf(buf, sizeof(buf)-1, +"TomsFastMath (%s)\n" +"\n" +"Sizeofs\n" +"\tfp_digit = %u\n" +"\tfp_word = %u\n" +"\n" +"FP_MAX_SIZE = %u\n" +"\n" +"Defines: \n" +#ifdef __i386__ +" __i386__ " +#endif +#ifdef __x86_64__ +" __x86_64__ " +#endif +#ifdef TFM_X86 +" TFM_X86 " +#endif +#ifdef TFM_X86_64 +" TFM_X86_64 " +#endif +#ifdef TFM_SSE2 +" TFM_SSE2 " +#endif +#ifdef TFM_ARM +" TFM_ARM " +#endif +#ifdef TFM_PPC32 +" TFM_PPC32 " +#endif +#ifdef TFM_AVR32 +" TFM_AVR32 " +#endif +#ifdef TFM_ECC192 +" TFM_ECC192 " +#endif +#ifdef TFM_ECC224 +" TFM_ECC224 " +#endif +#ifdef TFM_ECC384 +" TFM_ECC384 " +#endif +#ifdef TFM_ECC521 +" TFM_ECC521 " +#endif + +#ifdef TFM_NO_ASM +" TFM_NO_ASM " +#endif +#ifdef FP_64BIT +" FP_64BIT " +#endif +#ifdef TFM_HUGE +" TFM_HUGE " +#endif +"\n", __DATE__, sizeof(fp_digit), sizeof(fp_word), FP_MAX_SIZE); + + if (sizeof(fp_digit) == sizeof(fp_word)) { + strncat(buf, "WARNING: sizeof(fp_digit) == sizeof(fp_word), this build is likely to not work properly.\n", + sizeof(buf)-1); + } + return buf; +} + +#ifdef STANDALONE + +int main(void) +{ + printf("%s\n", fp_ident()); + return 0; +} + +#endif + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/misc/fp_set.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,21 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_set(fp_int *a, fp_digit b) +{ + fp_zero(a); + a->dp[0] = b; + a->used = a->dp[0] ? 1 : 0; +} + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mont/fp_mont_small.i Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,3838 @@ +#ifdef TFM_SMALL_MONT_SET +/* computes x/R == x (mod N) via Montgomery Reduction */ +void fp_montgomery_reduce_small(fp_int *a, fp_int *m, fp_digit mp) +{ + fp_digit c[FP_SIZE], *_c, *tmpm, mu, cy; + int oldused, x, y, pa; + +#if defined(USE_MEMSET) + /* now zero the buff */ + memset(c, 0, sizeof c); +#endif + pa = m->used; + + /* copy the input */ + oldused = a->used; + for (x = 0; x < oldused; x++) { + c[x] = a->dp[x]; + } +#if !defined(USE_MEMSET) + for (; x < 2*pa+3; x++) { + c[x] = 0; + } +#endif + MONT_START; + + switch (pa) { + case 1: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; +#else + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 2: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 3: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 4: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 5: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 6: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 7: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 8: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 9: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 10: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 9; cy = 0; + LOOP_START; + _c = c + 9; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 11: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 9; cy = 0; + LOOP_START; + _c = c + 9; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 10; cy = 0; + LOOP_START; + _c = c + 10; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 12: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 9; cy = 0; + LOOP_START; + _c = c + 9; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 10; cy = 0; + LOOP_START; + _c = c + 10; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 11; cy = 0; + LOOP_START; + _c = c + 11; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 13: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 9; cy = 0; + LOOP_START; + _c = c + 9; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 10; cy = 0; + LOOP_START; + _c = c + 10; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 11; cy = 0; + LOOP_START; + _c = c + 11; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 12; cy = 0; + LOOP_START; + _c = c + 12; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 14: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 9; cy = 0; + LOOP_START; + _c = c + 9; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 10; cy = 0; + LOOP_START; + _c = c + 10; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 11; cy = 0; + LOOP_START; + _c = c + 11; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 12; cy = 0; + LOOP_START; + _c = c + 12; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 13; cy = 0; + LOOP_START; + _c = c + 13; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 15: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 9; cy = 0; + LOOP_START; + _c = c + 9; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 10; cy = 0; + LOOP_START; + _c = c + 10; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 11; cy = 0; + LOOP_START; + _c = c + 11; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 12; cy = 0; + LOOP_START; + _c = c + 12; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 13; cy = 0; + LOOP_START; + _c = c + 13; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 14; cy = 0; + LOOP_START; + _c = c + 14; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + case 16: + x = 0; cy = 0; + LOOP_START; + _c = c + 0; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 1; cy = 0; + LOOP_START; + _c = c + 1; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 2; cy = 0; + LOOP_START; + _c = c + 2; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 3; cy = 0; + LOOP_START; + _c = c + 3; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 4; cy = 0; + LOOP_START; + _c = c + 4; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 5; cy = 0; + LOOP_START; + _c = c + 5; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 6; cy = 0; + LOOP_START; + _c = c + 6; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 7; cy = 0; + LOOP_START; + _c = c + 7; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 8; cy = 0; + LOOP_START; + _c = c + 8; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 9; cy = 0; + LOOP_START; + _c = c + 9; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 10; cy = 0; + LOOP_START; + _c = c + 10; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 11; cy = 0; + LOOP_START; + _c = c + 11; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 12; cy = 0; + LOOP_START; + _c = c + 12; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 13; cy = 0; + LOOP_START; + _c = c + 13; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 14; cy = 0; + LOOP_START; + _c = c + 14; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + x = 15; cy = 0; + LOOP_START; + _c = c + 15; + tmpm = m->dp; +#ifdef INNERMUL8 + INNERMUL8; _c += 8; tmpm += 8; + INNERMUL8; _c += 8; tmpm += 8; +#else + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; + INNERMUL; ++_c; +#endif + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + break; + } + /* now copy out */ + _c = c + pa; + tmpm = a->dp; + for (x = 0; x < pa+1; x++) { + *tmpm++ = *_c++; + } + + for (; x < oldused; x++) { + *tmpm++ = 0; + } + + MONT_FINI; + + a->used = pa+1; + fp_clamp(a); + + /* if A >= m then A = A - m */ + if (fp_cmp_mag (a, m) != FP_LT) { + s_fp_sub (a, m, a); + } +} + +#endif
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mont/fp_montgomery_calc_normalization.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,43 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* computes a = B**n mod b without division or multiplication useful for + * normalizing numbers in a Montgomery system. + */ +void fp_montgomery_calc_normalization(fp_int *a, fp_int *b) +{ + int x, bits; + + /* how many bits of last digit does b use */ + bits = fp_count_bits (b) % DIGIT_BIT; + if (!bits) bits = DIGIT_BIT; + + /* compute A = B^(n-1) * 2^(bits-1) */ + if (b->used > 1) { + fp_2expt (a, (b->used - 1) * DIGIT_BIT + bits - 1); + } else { + fp_set(a, 1); + bits = 1; + } + + /* now compute C = A * B mod b */ + for (x = bits - 1; x < (int)DIGIT_BIT; x++) { + fp_mul_2 (a, a); + if (fp_cmp_mag (a, b) != FP_LT) { + s_fp_sub (a, b, a); + } + } +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mont/fp_montgomery_reduce.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,556 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/******************************************************************/ +#if defined(TFM_X86) && !defined(TFM_SSE2) +/* x86-32 code */ + +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ +asm( \ + "movl %5,%%eax \n\t" \ + "mull %4 \n\t" \ + "addl %1,%%eax \n\t" \ + "adcl $0,%%edx \n\t" \ + "addl %%eax,%0 \n\t" \ + "adcl $0,%%edx \n\t" \ + "movl %%edx,%1 \n\t" \ +:"=g"(_c[LO]), "=r"(cy) \ +:"0"(_c[LO]), "1"(cy), "g"(mu), "g"(*tmpm++) \ +: "%eax", "%edx", "%cc") + +#define PROPCARRY \ +asm( \ + "addl %1,%0 \n\t" \ + "setb %%al \n\t" \ + "movzbl %%al,%1 \n\t" \ +:"=g"(_c[LO]), "=r"(cy) \ +:"0"(_c[LO]), "1"(cy) \ +: "%eax", "%cc") + +/******************************************************************/ +#elif defined(TFM_X86_64) +/* x86-64 code */ + +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ +asm( \ + "movq %5,%%rax \n\t" \ + "mulq %4 \n\t" \ + "addq %1,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "addq %%rax,%0 \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rdx,%1 \n\t" \ +:"=g"(_c[LO]), "=r"(cy) \ +:"0"(_c[LO]), "1"(cy), "r"(mu), "r"(*tmpm++) \ +: "%rax", "%rdx", "%cc") + +#define INNERMUL8 \ + asm( \ + "movq 0(%5),%%rax \n\t" \ + "movq 0(%2),%%r10 \n\t" \ + "movq 0x8(%5),%%r11 \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq 0x8(%2),%%r10 \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ + "movq %%r11,%%rax \n\t" \ + "movq 0x10(%5),%%r11 \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq 0x10(%2),%%r10 \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0x8(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ + "movq %%r11,%%rax \n\t" \ + "movq 0x18(%5),%%r11 \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq 0x18(%2),%%r10 \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0x10(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ + "movq %%r11,%%rax \n\t" \ + "movq 0x20(%5),%%r11 \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq 0x20(%2),%%r10 \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0x18(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ + "movq %%r11,%%rax \n\t" \ + "movq 0x28(%5),%%r11 \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq 0x28(%2),%%r10 \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0x20(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ + "movq %%r11,%%rax \n\t" \ + "movq 0x30(%5),%%r11 \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq 0x30(%2),%%r10 \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0x28(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ + "movq %%r11,%%rax \n\t" \ + "movq 0x38(%5),%%r11 \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq 0x38(%2),%%r10 \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0x30(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ + "movq %%r11,%%rax \n\t" \ + "mulq %4 \n\t" \ + "addq %%r10,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "addq %3,%%rax \n\t" \ + "adcq $0,%%rdx \n\t" \ + "movq %%rax,0x38(%0) \n\t" \ + "movq %%rdx,%1 \n\t" \ + \ +:"=r"(_c), "=r"(cy) \ +: "0"(_c), "1"(cy), "g"(mu), "r"(tmpm)\ +: "%rax", "%rdx", "%r10", "%r11", "%cc") + + +#define PROPCARRY \ +asm( \ + "addq %1,%0 \n\t" \ + "setb %%al \n\t" \ + "movzbq %%al,%1 \n\t" \ +:"=g"(_c[LO]), "=r"(cy) \ +:"0"(_c[LO]), "1"(cy) \ +: "%rax", "%cc") + +/******************************************************************/ +#elif defined(TFM_SSE2) +/* SSE2 code (assumes 32-bit fp_digits) */ +/* XMM register assignments: + * xmm0 *tmpm++, then Mu * (*tmpm++) + * xmm1 c[x], then Mu + * xmm2 mp + * xmm3 cy + * xmm4 _c[LO] + */ + +#define MONT_START \ + asm("movd %0,%%mm2"::"g"(mp)) + +#define MONT_FINI \ + asm("emms") + +#define LOOP_START \ +asm( \ +"movd %0,%%mm1 \n\t" \ +"pxor %%mm3,%%mm3 \n\t" \ +"pmuludq %%mm2,%%mm1 \n\t" \ +:: "g"(c[x])) + +/* pmuludq on mmx registers does a 32x32->64 multiply. */ +#define INNERMUL \ +asm( \ + "movd %1,%%mm4 \n\t" \ + "movd %2,%%mm0 \n\t" \ + "paddq %%mm4,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm0 \n\t" \ + "paddq %%mm0,%%mm3 \n\t" \ + "movd %%mm3,%0 \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +:"=g"(_c[LO]) : "0"(_c[LO]), "g"(*tmpm++) ); + +#define INNERMUL8 \ +asm( \ + "movd 0(%1),%%mm4 \n\t" \ + "movd 0(%2),%%mm0 \n\t" \ + "paddq %%mm4,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm0 \n\t" \ + "movd 4(%2),%%mm5 \n\t" \ + "paddq %%mm0,%%mm3 \n\t" \ + "movd 4(%1),%%mm6 \n\t" \ + "movd %%mm3,0(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +\ + "paddq %%mm6,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm5 \n\t" \ + "movd 8(%2),%%mm6 \n\t" \ + "paddq %%mm5,%%mm3 \n\t" \ + "movd 8(%1),%%mm7 \n\t" \ + "movd %%mm3,4(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +\ + "paddq %%mm7,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm6 \n\t" \ + "movd 12(%2),%%mm7 \n\t" \ + "paddq %%mm6,%%mm3 \n\t" \ + "movd 12(%1),%%mm5 \n\t" \ + "movd %%mm3,8(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +\ + "paddq %%mm5,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm7 \n\t" \ + "movd 16(%2),%%mm5 \n\t" \ + "paddq %%mm7,%%mm3 \n\t" \ + "movd 16(%1),%%mm6 \n\t" \ + "movd %%mm3,12(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +\ + "paddq %%mm6,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm5 \n\t" \ + "movd 20(%2),%%mm6 \n\t" \ + "paddq %%mm5,%%mm3 \n\t" \ + "movd 20(%1),%%mm7 \n\t" \ + "movd %%mm3,16(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +\ + "paddq %%mm7,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm6 \n\t" \ + "movd 24(%2),%%mm7 \n\t" \ + "paddq %%mm6,%%mm3 \n\t" \ + "movd 24(%1),%%mm5 \n\t" \ + "movd %%mm3,20(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +\ + "paddq %%mm5,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm7 \n\t" \ + "movd 28(%2),%%mm5 \n\t" \ + "paddq %%mm7,%%mm3 \n\t" \ + "movd 28(%1),%%mm6 \n\t" \ + "movd %%mm3,24(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +\ + "paddq %%mm6,%%mm3 \n\t" \ + "pmuludq %%mm1,%%mm5 \n\t" \ + "paddq %%mm5,%%mm3 \n\t" \ + "movd %%mm3,28(%0) \n\t" \ + "psrlq $32, %%mm3 \n\t" \ +:"=r"(_c) : "0"(_c), "g"(tmpm) ); + +#define LOOP_END \ +asm( "movd %%mm3,%0 \n" :"=r"(cy)) + +#define PROPCARRY \ +asm( \ + "addl %1,%0 \n\t" \ + "setb %%al \n\t" \ + "movzbl %%al,%1 \n\t" \ +:"=g"(_c[LO]), "=r"(cy) \ +:"0"(_c[LO]), "1"(cy) \ +: "%eax", "%cc") + +/******************************************************************/ +#elif defined(TFM_ARM) + /* ARMv4 code */ + +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ +asm( \ + " LDR r0,%1 \n\t" \ + " ADDS r0,r0,%0 \n\t" \ + " MOVCS %0,#1 \n\t" \ + " MOVCC %0,#0 \n\t" \ + " UMLAL r0,%0,%3,%4 \n\t" \ + " STR r0,%1 \n\t" \ +:"=r"(cy),"=m"(_c[0]):"0"(cy),"r"(mu),"r"(*tmpm++),"1"(_c[0]):"r0","%cc"); + +#define PROPCARRY \ +asm( \ + " LDR r0,%1 \n\t" \ + " ADDS r0,r0,%0 \n\t" \ + " STR r0,%1 \n\t" \ + " MOVCS %0,#1 \n\t" \ + " MOVCC %0,#0 \n\t" \ +:"=r"(cy),"=m"(_c[0]):"0"(cy),"1"(_c[0]):"r0","%cc"); + +/******************************************************************/ +#elif defined(TFM_PPC32) + +/* PPC32 */ +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ +asm( \ + " mullw 16,%3,%4 \n\t" \ + " mulhwu 17,%3,%4 \n\t" \ + " addc 16,16,%0 \n\t" \ + " addze 17,17 \n\t" \ + " lwz 18,%1 \n\t" \ + " addc 16,16,18 \n\t" \ + " addze %0,17 \n\t" \ + " stw 16,%1 \n\t" \ +:"=r"(cy),"=g"(_c[0]):"0"(cy),"r"(mu),"r"(tmpm[0]),"1"(_c[0]):"16", "17", "18","%cc"); ++tmpm; + +#define PROPCARRY \ +asm( \ + " lwz 16,%1 \n\t" \ + " addc 16,16,%0 \n\t" \ + " stw 16,%1 \n\t" \ + " xor %0,%0,%0 \n\t" \ + " addze %0,%0 \n\t" \ +:"=r"(cy),"=g"(_c[0]):"0"(cy),"1"(_c[0]):"16","%cc"); + +/******************************************************************/ +#elif defined(TFM_PPC64) + +/* PPC64 */ +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ +asm( \ + " mulld r16,%3,%4 \n\t" \ + " mulhdu r17,%3,%4 \n\t" \ + " addc r16,16,%0 \n\t" \ + " addze r17,r17 \n\t" \ + " ldx r18,0,%1 \n\t" \ + " addc r16,r16,r18 \n\t" \ + " addze %0,r17 \n\t" \ + " sdx r16,0,%1 \n\t" \ +:"=r"(cy),"=m"(_c[0]):"0"(cy),"r"(mu),"r"(tmpm[0]),"1"(_c[0]):"r16", "r17", "r18","%cc"); ++tmpm; + +#define PROPCARRY \ +asm( \ + " ldx r16,0,%1 \n\t" \ + " addc r16,r16,%0 \n\t" \ + " sdx r16,0,%1 \n\t" \ + " xor %0,%0,%0 \n\t" \ + " addze %0,%0 \n\t" \ +:"=r"(cy),"=m"(_c[0]):"0"(cy),"1"(_c[0]):"r16","%cc"); + +/******************************************************************/ +#elif defined(TFM_AVR32) + +/* AVR32 */ +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ +asm( \ + " ld.w r2,%1 \n\t" \ + " add r2,%0 \n\t" \ + " eor r3,r3 \n\t" \ + " acr r3 \n\t" \ + " macu.d r2,%3,%4 \n\t" \ + " st.w %1,r2 \n\t" \ + " mov %0,r3 \n\t" \ +:"=r"(cy),"=r"(_c):"0"(cy),"r"(mu),"r"(*tmpm++),"1"(_c):"r2","r3"); + +#define PROPCARRY \ +asm( \ + " ld.w r2,%1 \n\t" \ + " add r2,%0 \n\t" \ + " st.w %1,r2 \n\t" \ + " eor %0,%0 \n\t" \ + " acr %0 \n\t" \ +:"=r"(cy),"=r"(&_c[0]):"0"(cy),"1"(&_c[0]):"r2","%cc"); + +/******************************************************************/ +#elif defined(TFM_MIPS) + +/* MIPS */ +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ +asm( \ + " multu %3,%4 \n\t" \ + " mflo $12 \n\t" \ + " mfhi $13 \n\t" \ + " addu $12,$12,%0 \n\t" \ + " sltu $10,$12,%0 \n\t" \ + " addu $13,$13,$10 \n\t" \ + " lw $10,%1 \n\t" \ + " addu $12,$12,$10 \n\t" \ + " sltu $10,$12,$10 \n\t" \ + " addu %0,$13,$10 \n\t" \ + " sw $12,%1 \n\t" \ +:"=r"(cy),"=m"(_c[0]):"0"(cy),"r"(mu),"r"(tmpm[0]),"1"(_c[0]):"$10","$12","$13"); ++tmpm; + +#define PROPCARRY \ +asm( \ + " lw $10,%1 \n\t" \ + " addu $10,$10,%0 \n\t" \ + " sw $10,%1 \n\t" \ + " sltu %0,$10,%0 \n\t" \ +:"=r"(cy),"=m"(_c[0]):"0"(cy),"1"(_c[0]):"$10"); + +/******************************************************************/ +#else + +/* ISO C code */ +#define MONT_START +#define MONT_FINI +#define LOOP_END +#define LOOP_START \ + mu = c[x] * mp + +#define INNERMUL \ + do { fp_word t; \ + _c[0] = t = ((fp_word)_c[0] + (fp_word)cy) + \ + (((fp_word)mu) * ((fp_word)*tmpm++)); \ + cy = (t >> DIGIT_BIT); \ + } while (0) + +#define PROPCARRY \ + do { fp_digit t = _c[0] += cy; cy = (t < cy); } while (0) + +#endif +/******************************************************************/ + + +#define LO 0 + +#ifdef TFM_SMALL_MONT_SET +#include "fp_mont_small.i" +#endif + +/* computes x/R == x (mod N) via Montgomery Reduction */ +void fp_montgomery_reduce(fp_int *a, fp_int *m, fp_digit mp) +{ + fp_digit c[FP_SIZE], *_c, *tmpm, mu; + int oldused, x, y, pa; + + /* bail if too large */ + if (m->used > (FP_SIZE/2)) { + return; + } + +#ifdef TFM_SMALL_MONT_SET + if (m->used <= 16) { + fp_montgomery_reduce_small(a, m, mp); + return; + } +#endif + +#if defined(USE_MEMSET) + /* now zero the buff */ + memset(c, 0, sizeof c); +#endif + pa = m->used; + + /* copy the input */ + oldused = a->used; + for (x = 0; x < oldused; x++) { + c[x] = a->dp[x]; + } +#if !defined(USE_MEMSET) + for (; x < 2*pa+1; x++) { + c[x] = 0; + } +#endif + MONT_START; + + for (x = 0; x < pa; x++) { + fp_digit cy = 0; + /* get Mu for this round */ + LOOP_START; + _c = c + x; + tmpm = m->dp; + y = 0; + #if (defined(TFM_SSE2) || defined(TFM_X86_64)) + for (; y < (pa & ~7); y += 8) { + INNERMUL8; + _c += 8; + tmpm += 8; + } + #endif + + for (; y < pa; y++) { + INNERMUL; + ++_c; + } + LOOP_END; + while (cy) { + PROPCARRY; + ++_c; + } + } + + /* now copy out */ + _c = c + pa; + tmpm = a->dp; + for (x = 0; x < pa+1; x++) { + *tmpm++ = *_c++; + } + + for (; x < oldused; x++) { + *tmpm++ = 0; + } + + MONT_FINI; + + a->used = pa+1; + fp_clamp(a); + + /* if A >= m then A = A - m */ + if (fp_cmp_mag (a, m) != FP_LT) { + s_fp_sub (a, m, a); + } +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mont/fp_montgomery_setup.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,48 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* setups the montgomery reduction */ +int fp_montgomery_setup(fp_int *a, fp_digit *rho) +{ + fp_digit x, b; + +/* fast inversion mod 2**k + * + * Based on the fact that + * + * XA = 1 (mod 2**n) => (X(2-XA)) A = 1 (mod 2**2n) + * => 2*X*A - X*X*A*A = 1 + * => 2*(1) - (1) = 1 + */ + b = a->dp[0]; + + if ((b & 1) == 0) { + return FP_VAL; + } + + x = (((b + 2) & 4) << 1) + b; /* here x*a==1 mod 2**4 */ + x *= 2 - b * x; /* here x*a==1 mod 2**8 */ + x *= 2 - b * x; /* here x*a==1 mod 2**16 */ + x *= 2 - b * x; /* here x*a==1 mod 2**32 */ +#ifdef FP_64BIT + x *= 2 - b * x; /* here x*a==1 mod 2**64 */ +#endif + + /* rho = -1/m mod b */ + *rho = (((fp_word) 1 << ((fp_word) DIGIT_BIT)) - ((fp_word)x)); + + return FP_OKAY; +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mul/fp_mul.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,128 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a * b */ +void fp_mul(fp_int *A, fp_int *B, fp_int *C) +{ + int y, yy; + + /* call generic if we're out of range */ + if (A->used + B->used > FP_SIZE) { + fp_mul_comba(A, B, C); + return ; + } + + y = MAX(A->used, B->used); + yy = MIN(A->used, B->used); + /* pick a comba (unrolled 4/8/16/32 x or rolled) based on the size + of the largest input. We also want to avoid doing excess mults if the + inputs are not close to the next power of two. That is, for example, + if say y=17 then we would do (32-17)^2 = 225 unneeded multiplications + */ + +#ifdef TFM_MUL3 + if (y <= 3) { + fp_mul_comba3(A,B,C); + return; + } +#endif +#ifdef TFM_MUL4 + if (y == 4) { + fp_mul_comba4(A,B,C); + return; + } +#endif +#ifdef TFM_MUL6 + if (y <= 6) { + fp_mul_comba6(A,B,C); + return; + } +#endif +#ifdef TFM_MUL7 + if (y == 7) { + fp_mul_comba7(A,B,C); + return; + } +#endif +#ifdef TFM_MUL8 + if (y == 8) { + fp_mul_comba8(A,B,C); + return; + } +#endif +#ifdef TFM_MUL9 + if (y == 9) { + fp_mul_comba9(A,B,C); + return; + } +#endif +#ifdef TFM_MUL12 + if (y <= 12) { + fp_mul_comba12(A,B,C); + return; + } +#endif +#ifdef TFM_MUL17 + if (y <= 17) { + fp_mul_comba17(A,B,C); + return; + } +#endif + +#ifdef TFM_SMALL_SET + if (y <= 16) { + fp_mul_comba_small(A,B,C); + return; + } +#endif +#if defined(TFM_MUL20) + if (y <= 20) { + fp_mul_comba20(A,B,C); + return; + } +#endif +#if defined(TFM_MUL24) + if (yy >= 16 && y <= 24) { + fp_mul_comba24(A,B,C); + return; + } +#endif +#if defined(TFM_MUL28) + if (yy >= 20 && y <= 28) { + fp_mul_comba28(A,B,C); + return; + } +#endif +#if defined(TFM_MUL32) + if (yy >= 24 && y <= 32) { + fp_mul_comba32(A,B,C); + return; + } +#endif +#if defined(TFM_MUL48) + if (yy >= 40 && y <= 48) { + fp_mul_comba48(A,B,C); + return; + } +#endif +#if defined(TFM_MUL64) + if (yy >= 56 && y <= 64) { + fp_mul_comba64(A,B,C); + return; + } +#endif + fp_mul_comba(A,B,C); +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mul/fp_mul_2.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,67 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +void fp_mul_2(fp_int * a, fp_int * b) +{ + int x, oldused; + + oldused = b->used; + b->used = a->used; + + { + register fp_digit r, rr, *tmpa, *tmpb; + + /* alias for source */ + tmpa = a->dp; + + /* alias for dest */ + tmpb = b->dp; + + /* carry */ + r = 0; + for (x = 0; x < a->used; x++) { + + /* get what will be the *next* carry bit from the + * MSB of the current digit + */ + rr = *tmpa >> ((fp_digit)(DIGIT_BIT - 1)); + + /* now shift up this digit, add in the carry [from the previous] */ + *tmpb++ = ((*tmpa++ << ((fp_digit)1)) | r); + + /* copy the carry that would be from the source + * digit into the next iteration + */ + r = rr; + } + + /* new leading digit? */ + if (r != 0 && b->used != (FP_SIZE-1)) { + /* add a MSB which is always 1 at this point */ + *tmpb = 1; + ++(b->used); + } + + /* now zero any excess digits on the destination + * that we didn't write to + */ + tmpb = b->dp + b->used; + for (x = b->used; x < oldused; x++) { + *tmpb++ = 0; + } + } + b->sign = a->sign; +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mul/fp_mul_2d.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,47 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ +#include <tfm.h> + +/* c = a * 2**d */ +void fp_mul_2d(fp_int *a, int b, fp_int *c) +{ + fp_digit carry, carrytmp, shift; + int x; + + /* copy it */ + fp_copy(a, c); + + /* handle whole digits */ + if (b >= DIGIT_BIT) { + fp_lshd(c, b/DIGIT_BIT); + } + b %= DIGIT_BIT; + + /* shift the digits */ + if (b != 0) { + carry = 0; + shift = DIGIT_BIT - b; + for (x = 0; x < c->used; x++) { + carrytmp = c->dp[x] >> shift; + c->dp[x] = (c->dp[x] << b) + carry; + carry = carrytmp; + } + /* store last carry if room */ + if (carry && x < FP_SIZE) { + c->dp[c->used++] = carry; + } + } + fp_clamp(c); +} + + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mul/fp_mul_comba.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,368 @@ +/* TomsFastMath, a fast ISO C bignum library. + * + * This project is meant to fill in where LibTomMath + * falls short. That is speed ;-) + * + * This project is public domain and free for all purposes. + * + * Tom St Denis, [email protected] + */ + +/* About this file... + +*/ + +#include <tfm.h> + +#if defined(TFM_PRESCOTT) && defined(TFM_SSE2) + #undef TFM_SSE2 + #define TFM_X86 +#endif + +/* these are the combas. Worship them. */ +#if defined(TFM_X86) +/* Generic x86 optimized code */ + +/* anything you need at the start */ +#define COMBA_START + +/* clear the chaining variables */ +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +/* forward the carry to the next digit */ +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +/* store the first sum */ +#define COMBA_STORE(x) \ + x = c0; + +/* store the second sum [carry] */ +#define COMBA_STORE2(x) \ + x = c1; + +/* anything you need at the end */ +#define COMBA_FINI + +/* this should multiply i and j */ +#define MULADD(i, j) \ +asm( \ + "movl %6,%%eax \n\t" \ + "mull %7 \n\t" \ + "addl %%eax,%0 \n\t" \ + "adcl %%edx,%1 \n\t" \ + "adcl $0,%2 \n\t" \ + :"=r"(c0), "=r"(c1), "=r"(c2): "0"(c0), "1"(c1), "2"(c2), "m"(i), "m"(j) :"%eax","%edx","%cc"); + +#elif defined(TFM_X86_64) +/* x86-64 optimized */ + +/* anything you need at the start */ +#define COMBA_START + +/* clear the chaining variables */ +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +/* forward the carry to the next digit */ +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +/* store the first sum */ +#define COMBA_STORE(x) \ + x = c0; + +/* store the second sum [carry] */ +#define COMBA_STORE2(x) \ + x = c1; + +/* anything you need at the end */ +#define COMBA_FINI + +/* this should multiply i and j */ +#define MULADD(i, j) \ +asm ( \ + "movq %6,%%rax \n\t" \ + "mulq %7 \n\t" \ + "addq %%rax,%0 \n\t" \ + "adcq %%rdx,%1 \n\t" \ + "adcq $0,%2 \n\t" \ + :"=r"(c0), "=r"(c1), "=r"(c2): "0"(c0), "1"(c1), "2"(c2), "g"(i), "g"(j) :"%rax","%rdx","%cc"); + +#elif defined(TFM_SSE2) +/* use SSE2 optimizations */ + +/* anything you need at the start */ +#define COMBA_START + +/* clear the chaining variables */ +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +/* forward the carry to the next digit */ +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +/* store the first sum */ +#define COMBA_STORE(x) \ + x = c0; + +/* store the second sum [carry] */ +#define COMBA_STORE2(x) \ + x = c1; + +/* anything you need at the end */ +#define COMBA_FINI \ + asm("emms"); + +/* this should multiply i and j */ +#define MULADD(i, j) \ +asm( \ + "movd %6,%%mm0 \n\t" \ + "movd %7,%%mm1 \n\t" \ + "pmuludq %%mm1,%%mm0\n\t" \ + "movd %%mm0,%%eax \n\t" \ + "psrlq $32,%%mm0 \n\t" \ + "addl %%eax,%0 \n\t" \ + "movd %%mm0,%%eax \n\t" \ + "adcl %%eax,%1 \n\t" \ + "adcl $0,%2 \n\t" \ + :"=r"(c0), "=r"(c1), "=r"(c2): "0"(c0), "1"(c1), "2"(c2), "m"(i), "m"(j) :"%eax","%cc"); + +#elif defined(TFM_ARM) +/* ARM code */ + +#define COMBA_START + +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +#define COMBA_STORE(x) \ + x = c0; + +#define COMBA_STORE2(x) \ + x = c1; + +#define COMBA_FINI + +#define MULADD(i, j) \ +asm( \ +" UMULL r0,r1,%6,%7 \n\t" \ +" ADDS %0,%0,r0 \n\t" \ +" ADCS %1,%1,r1 \n\t" \ +" ADC %2,%2,#0 \n\t" \ +:"=r"(c0), "=r"(c1), "=r"(c2) : "0"(c0), "1"(c1), "2"(c2), "r"(i), "r"(j) : "r0", "r1", "%cc"); + +#elif defined(TFM_PPC32) +/* For 32-bit PPC */ + +#define COMBA_START + +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +#define COMBA_STORE(x) \ + x = c0; + +#define COMBA_STORE2(x) \ + x = c1; + +#define COMBA_FINI + +/* untested: will mulhwu change the flags? Docs say no */ +#define MULADD(i, j) \ +asm( \ + " mullw 16,%6,%7 \n\t" \ + " addc %0,%0,16 \n\t" \ + " mulhwu 16,%6,%7 \n\t" \ + " adde %1,%1,16 \n\t" \ + " addze %2,%2 \n\t" \ +:"=r"(c0), "=r"(c1), "=r"(c2):"0"(c0), "1"(c1), "2"(c2), "r"(i), "r"(j):"16"); + +#elif defined(TFM_PPC64) +/* For 64-bit PPC */ + +#define COMBA_START + +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +#define COMBA_STORE(x) \ + x = c0; + +#define COMBA_STORE2(x) \ + x = c1; + +#define COMBA_FINI + +/* untested: will mulhdu change the flags? Docs say no */ +#define MULADD(i, j) \ +asm( \ + " mulld r16,%6,%7 \n\t" \ + " addc %0,%0,16 \n\t" \ + " mulhdu r16,%6,%7 \n\t" \ + " adde %1,%1,16 \n\t" \ + " addze %2,%2 \n\t" \ +:"=r"(c0), "=r"(c1), "=r"(c2):"0"(c0), "1"(c1), "2"(c2), "r"(i), "r"(j):"r16"); + +#elif defined(TFM_AVR32) + +/* ISO C code */ + +#define COMBA_START + +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +#define COMBA_STORE(x) \ + x = c0; + +#define COMBA_STORE2(x) \ + x = c1; + +#define COMBA_FINI + +#define MULADD(i, j) \ +asm( \ + " mulu.d r2,%6,%7 \n\t"\ + " add %0,r2 \n\t"\ + " adc %1,%1,r3 \n\t"\ + " acr %2 \n\t"\ +:"=r"(c0), "=r"(c1), "=r"(c2):"0"(c0), "1"(c1), "2"(c2), "r"(i), "r"(j):"r2","r3"); + +#elif defined(TFM_MIPS) + +#define COMBA_START + +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +#define COMBA_STORE(x) \ + x = c0; + +#define COMBA_STORE2(x) \ + x = c1; + +#define COMBA_FINI + +#define MULADD(i, j) \ +asm( \ + " multu %6,%7 \n\t" \ + " mflo $12 \n\t" \ + " mfhi $13 \n\t" \ + " addu %0,%0,$12 \n\t" \ + " sltu $12,%0,$12 \n\t" \ + " addu %1,%1,$13 \n\t" \ + " sltu $13,%1,$13 \n\t" \ + " addu %1,%1,$12 \n\t" \ + " sltu $12,%1,$12 \n\t" \ + " addu %2,%2,$13 \n\t" \ + " addu %2,%2,$12 \n\t" \ +:"=r"(c0), "=r"(c1), "=r"(c2):"0"(c0), "1"(c1), "2"(c2), "r"(i), "r"(j):"$12","$13"); + +#else +/* ISO C code */ + +#define COMBA_START + +#define COMBA_CLEAR \ + c0 = c1 = c2 = 0; + +#define COMBA_FORWARD \ + do { c0 = c1; c1 = c2; c2 = 0; } while (0); + +#define COMBA_STORE(x) \ + x = c0; + +#define COMBA_STORE2(x) \ + x = c1; + +#define COMBA_FINI + +#define MULADD(i, j) \ + do { fp_word t; \ + t = (fp_word)c0 + ((fp_word)i) * ((fp_word)j); c0 = t; \ + t = (fp_word)c1 + (t >> DIGIT_BIT); c1 = t; c2 += t >> DIGIT_BIT; \ + } while (0); + +#endif + +#ifndef TFM_DEFINES + +/* generic PxQ multiplier */ +void fp_mul_comba(fp_int *A, fp_int *B, fp_int *C) +{ + int ix, iy, iz, tx, ty, pa; + fp_digit c0, c1, c2, *tmpx, *tmpy; + fp_int tmp, *dst; + + COMBA_START; + COMBA_CLEAR; + + /* get size of output and trim */ + pa = A->used + B->used; + if (pa >= FP_SIZE) { + pa = FP_SIZE-1; + } + + if (A == C || B == C) { + fp_zero(&tmp); + dst = &tmp; + } else { + fp_zero(C); + dst = C; + } + + for (ix = 0; ix < pa; ix++) { + /* get offsets into the two bignums */ + ty = MIN(ix, B->used-1); + tx = ix - ty; + + /* setup temp aliases */ + tmpx = A->dp + tx; + tmpy = B->dp + ty; + + /* this is the number of times the loop will iterrate, essentially its + while (tx++ < a->used && ty-- >= 0) { ... } + */ + iy = MIN(A->used-tx, ty+1); + + /* execute loop */ + COMBA_FORWARD; + for (iz = 0; iz < iy; ++iz) { + MULADD(*tmpx++, *tmpy--); + } + + /* store term */ + COMBA_STORE(dst->dp[ix]); + } + COMBA_FINI; + + dst->used = pa; + dst->sign = A->sign ^ B->sign; + fp_clamp(dst); + fp_copy(dst, C); +} + +#endif + +/* $Source$ */ +/* $Revision$ */ +/* $Date$ */ +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tomsfastmath/src/mul/fp_mul_comba_12.c Wed Nov 23 18:10:20 2011 +0700 @@ -0,0 +1,111 @@ +#define TFM_DEFINES +#include "fp_mul_comba.c" + +#ifdef TFM_MUL12 +void fp_mul_comba12(fp_int *A, fp_int *B, fp_int *C) +{ + fp_digit c0, c1, c2, at[24]; + + memcpy(at, A->dp, 12 * sizeof(fp_digit)); + memcpy(at+12, B->dp, 12 * sizeof(fp_digit)); + COMBA_START; + + COMBA_CLEAR; + /* 0 */ + MULADD(at[0], at[12]); + COMBA_STORE(C->dp[0]); + /* 1 */ + COMBA_FORWARD; + MULADD(at[0], at[13]); MULADD(at[1], at[12]); + COMBA_STORE(C->dp[1]); + /* 2 */ + COMBA_FORWARD; + MULADD(at[0], at[14]); MULADD(at[1], at[13]); MULADD(at[2], at[12]); + COMBA_STORE(C->dp[2]); + /* 3 */ + COMBA_FORWARD; + MULADD(at[0], at[15]); MULADD(at[1], at[14]); MULADD(at[2], at[13]); MULADD(at[3], at[12]); + COMBA_STORE(C->dp[3]); + /* 4 */ + COMBA_FORWARD; + MULADD(at[0], at[16]); MULADD(at[1], at[15]); MULADD(at[2], at[14]); MULADD(at[3], at[13]); MULADD(at[4], at[12]); + COMBA_STORE(C->dp[4]); + /* 5 */ + COMBA_FORWARD; + MULADD(at[0], at[17]); MULADD(at[1], at[16]); MULADD(at[2], at[15]); MULADD(at[3], at[14]); MULADD(at[4], at[13]); MULADD(at[5], at[12]); + COMBA_STORE(C->dp[5]); + /* 6 */ + COMBA_FORWARD; + MULADD(at[0], at[18]); MULADD(at[1], at[17]); MULADD(at[2], at[16]); MULADD(at[3], at[15]); MULADD(at[4], at[14]); MULADD(at[5], at[13]); MULADD(at[6], at[12]); + COMBA_STORE(C->dp[6]); + /* 7 */ + COMBA_FORWARD; + MULADD(at[0], at[19]); MULADD(at[1], at[18]); MULADD(at[2], at[17]); MULADD(at[3], at[16]); MULADD(at[4], at[15]); MULADD(at[5], at[14]); MULADD(at[6], at[13]); MULADD(at[7], at[12]); + COMBA_STORE(C->dp[7]); + /* 8 */ + COMBA_FORWARD; + MULADD(at[0], at[20]); MULADD(at[1], at[19]); MULADD(at[2], at[18]); MULADD(at[3], at[17]); MULADD(at[4], at[16]); MULADD(at[5], at[15]); MULADD(at[6], at[14]); MULADD(at[7], at[13]); MULADD(at[8], at[12]); + COMBA_STORE(C->dp[8]); + /* 9 */ + COMBA_FORWARD; + MULADD(at[0], at[21]); MULADD(at[1], at[20]); MULADD(at[2], at[19]); MULADD(at[3], at[18]); MULADD(at[4], at[17]); MULADD(at[5], at[16]); MULADD(at[6], at[15]); MULADD(at[7], at[14]); MULADD(at[8], at[13]); MULADD(at[9], at[12]); + COMBA_STORE(C->dp[9]); + /* 10 */ + COMBA_FORWARD; + MULADD(at[0], at[22]); MULADD(at[1], at[21]); MULADD(at[2], at[20]); MULADD(at[3], at[19]); MULADD(at[4], at[18]); MULADD(at[5], at[17]); MULADD(at[6], at[16]); MULADD(at[7], at[15]); MULADD(at[8], at[14]); MULADD(at[9], at[13]); MULADD(at[10], at[12]); + COMBA_STORE(C->dp[10]); + /* 11 */ + COMBA_FORWARD; + MULADD(at[0], at[23]); MULADD(at[1], at[22]); MULADD(at[2], at[21]); MULADD(at[3], at[20]); MULADD(at[4], at[19]); MULADD(at[5], at[18]); MULADD(at[6], at[17]); MULADD(at[7], at[16]); MULADD(at[8], at[15]); MULADD(at[9], at[14]); MULADD(at[10], at[13]); MULADD(at[11], at[12]); + COMBA_STORE(C->dp[11]); + /* 12 */ + COMBA_FORWARD; + MULADD(at[1], at[23]); MULADD(at[2], at[22]); MULADD(at[3], at[21]); MULADD(at[4], at[20]); MULADD(at[5], at[19]); MULADD(at[6], at[18]); MULADD(at[7], at[17]); MULADD(at[8], at[16]); MULADD(at[9], at[15]); MULADD(at[10], at[14]); MULADD(at[11], at[13]); + COMBA_STORE(C->dp[12]); + /* 13 */ + COMBA_FORWARD; + MULADD(at[2], at[23]); MULADD(at[3], at[22]); MULADD(at[4], at[21]); MULADD(at[5], at[20]); MULADD(at[6], at[19]); MULADD(at[7], at[18]); MULADD(at[8], at[17]); MULADD(at[9], at[16]); MULADD(at[10], at[15]); MULADD(at[11], at[14]); + COMBA_STORE(C->dp[13]); + /* 14 */ + COMBA_FORWARD; + MULADD(at[3], at[23]); MULADD(at[4], at[22]); MULADD(at[5], at[21]); MULADD(at[6], at[20]); MULADD(at[7], at[19]); MULADD(at[8], at[18]); MULADD(at[9], at[17]); MULADD(at[10], at[16]); MULADD(at[11], at[15]); + COMBA_STORE(C->dp[14]); + /* 15 */ + COMBA_FORWARD; + MULADD(at[4], at[23]); MULADD(at[5], at[22]); MULADD(at[6], at[21]); MULADD(at[7], at[20]); MULADD(at[8], at[19]); MULADD(at[9], at[18]); MULADD(at[10], at[17]); MULADD(at[11], at[16]); + COMBA_STORE(C->dp[15]); + /* 16 */ + COMBA_FORWARD; + MULADD(at[5], at[23]); MULADD(at[6], at[22]); MULADD(at[7], at[21]); MULADD(at[8], at[20]); MULADD(at[9], at[19]); MULADD(at[10], at[18]); MULADD(at[11], at[17]); + COMBA_STORE(C->dp[16]); + /* 17 */ + COMBA_FORWARD; + MULADD(at[6], at[23]); MULADD(at[7], at[22]); MULADD(at[8], at[21]); MULADD(at[9], at[20]); MULADD(at[10], at[19]); MULADD(at[11], at[18]); + COMBA_STORE(C->dp[17]); + /* 18 */ + COMBA_FORWARD; + MULADD(at[7], at[23]); MULADD(at[8], at[22]); MULADD(at[9], at[21]); MULADD(at[10], at[20]); MULADD(at[11], at[19]); + COMBA_STORE(C->dp[18]); + /* 19 */<