Ossa Sepia

August 8, 2021

Begin non-pgp message

Filed under: Coding,EuCrypt — Diana Coman @ 1:33 pm

$ ./keys encrypt plain.txt data/keys/*****************.pub

-----BEGIN NON-PGP MESSAGE-----
aae0eecc27c78efb65ddb0cb977330ca588796353d3634eaeaca0482db312a6405aa044dd38ec3b7
22d22d523f30911d660898404b164d3e0a7dbb09bf977183823818dce811a7d8a62125f3b6abd278
7148f03de27860f77e33c09e5916dd06943a11562ca6dc38dd7406133878c08b6727c9883208a173
665faaf51c58d00c0c3fdb25defe48394f918b075bb1a48d684b6c4ccb0c2d0e94ef96c1057bbfcd
b05001ce01d50f6af8c49122ab765f148af228fbc89d2c7f5accaff3499447d3a10e8c034ef44e9a
956e7732c453cb040b019f6d239a0d992523cf11d5aff816d59ac7f066d1f0a315340a9eff2bd182
a8a7ade55b937b216133f0e2de80cae63b5aa1776e7788437cf83af2706afad42d567f0a61ecfc58
73453e5a72f1a311d2914f1aabe0ec855c270cd12c65a2b086eb5a4457c026f8d44a4a72ca102826
2aba52d483d6ccd3a3a5dcfb0ad1c1f3fdad6769ed1b0df529c3936125c1192a7117a26b20af3a64
1b712db591e4495360531571ec999f183f639a03edebb0a0d2427f0158077822e5b955a5179cf36d
5853b16c60ebf6060cb4347488f3b41a820be0cb91cb90c108106c6c196f2ebf2c1dcbfed870572f
939973dad085af441e1fd15989331694910e1d8ed731b895df91c09ff80ef86392ad00e6f2564122
67712a5b64fb440631cc
-----END NON-PGP MESSAGE-----

Need I say more?

July 8, 2020

EuCrypt addition: Keccak File Hashing

Filed under: Coding,EuCrypt — Diana Coman @ 1:37 pm

Since the client data model includes Keccak hashes for files received from Eulora’s server and the files themselves may be of any size whatsoever, it follows that both client and server have to be able to use EuCrypt’s Keccak sponge sequentially too – basically feeding it data in chunks as opposed to all in one go, as a single input (be it bitstream or bytestream). [1] So I got my Ada reference book out on the desk again and it turns out that it’s not even all that difficult really – even though this addition has to be yet another package on top of the existing ones in EuCrypt, mainly because such type of use breaks by definition the stateless guarantee provided by the “single input” use and in turn, this would then propagate to all code that uses the Keccak package (and that’s the encryption scheme, mainly). Therefore, rather than forcing now stateful code everywhere just because there’s a need to calculate hashes for files on the disk, I simply provide this file-hashing as a separate package using the same underlying sponge. Quite as it should even be, I would say: there is only one sponge implementation but there are now two options to using it for hashing, namely a stateless one for data that is held entirely in memory (the one that existed already) and a stateful one for data that is fed sequentially (my new code). As this is now implemented, tested and integrated into the respective parts on both client and server, I’d rather take the time and write it down as well, to unload it and have it all in one place.

The approach here is very straightforward: keep a sponge’s state locally, read from the input file blocks as big as the sponge can handle in one go (and add if needed padding to the last block), pass each block as soon as read on to the sponge, scramble the state and repeat until the whole file has been processed; then squeeze out of the sponge a block and return its first 8 octets given that the hash is meant to be that size. Ada’s Sequential_IO package simply needs to be provided with the type of element to read and then it works like any other file input/output, without any trouble. In principle, the most effective implementation would be to read a whole block (ie as many octets as the sponge can absorb in one go) each time but this means that one has to handle at file reading time the special case of the last block that may be incomplete. For now at least I preferred to sidestep this and I went instead for the cheap and angry solution that seems however perfectly adequate for current needs: simply read a file octet by octet, so that there is no special case at all. Here’s the code that does it all:

  -- for reading files one Octet at a time
  package Octet_IO is new Ada.Sequential_IO(Element_Type =>
                                              Interfaces.Unsigned_8);

  function Hash_File(Filename: in String; Hash: out Raw_Types.Octets_8)
      return Boolean is
    F: Octet_IO.File_Type;
    S: Keccak.State := (others => (others => 0));
    Block_Len: Keccak.Keccak_Rate := Keccak.Default_Byterate;
    Block: Keccak.Bytestream(1..Block_Len);
    Pos: Keccak.Keccak_Rate;
  begin
    Octet_IO.Open(F, Octet_IO.In_File, Filename);
    -- check that this is not an empty file as hashing of that is nonsense
    if Octet_IO.End_Of_File(F) then
      Octet_IO.Close(F); -- close it before returning!
      return False;
    end if;

    -- read from file and absorb into the sponge
    while not Octet_IO.End_Of_File(F) loop
      -- read block by block
      Pos := 1;
      while Pos <= Block_Len and (not Octet_IO.End_Of_File(F)) loop
        Octet_IO.Read(F, Block(Pos));
        Pos := Pos + 1;
      end loop; -- single block loop
      -- if it's an incomplete block, it needs padding
      if Pos <= Block_Len then
        -- pad it with 10*1
        Block(Pos..Block'Last) := (others => 0);
        Block(Pos) := 1;
        Block(Block'Last) := Block(Block'Last) + 16#80#;
      end if;
      -- here the block is complete, padded if needed.
      -- absorb it into the state
      Keccak.AbsorbBlock( Block, S);
      -- scramble state
      S := Keccak.Keccak_Function( S );
    end loop; -- full file loop
    Octet_IO.Close(F);

    -- now squeeze a block and get the 8 octets required
    Keccak.SqueezeBlock( Block, S);
    Hash := Block(1..8);

    -- if it got here, all is well, return true.
    return True;
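  exception
    when others =>
      -- Open itself may have failed, so close the file only if it is open
      if Octet_IO.Is_Open(F) then
        Octet_IO.Close(F);
      end if;
      return False;
  end Hash_File;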
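For comparison only, the whole-block reading that the Ada code sidesteps can be sketched in C as below. This is a minimal sketch and not part of EuCrypt: the name read_padded_block and its parameters are hypothetical, the rate is passed in by the caller, and the padding simply mirrors what the Ada code does (0x01 right after the data, zeroes, then 0x80 added to the last octet). Like the Ada version, it adds no extra padding block when the file length is an exact multiple of the rate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of EuCrypt.
 * Reads up to `rate` octets from f into block in one go.
 * An incomplete (last) block is padded with 10*1 at octet level:
 * 0x01 right after the data, zeroes, then 0x80 added to the last octet.
 * Returns the number of data octets read; 0 means there is nothing left.
 * NB: a short read is treated as end of file; a caller that cares about
 * read errors should also check ferror(f). */
size_t read_padded_block(FILE *f, unsigned char *block, size_t rate) {
  size_t got;

  assert(f != NULL && block != NULL && rate > 0);

  got = fread(block, 1, rate, f);
  if (got > 0 && got < rate) {
    memset(block + got, 0, rate - got);  /* clear the unused tail      */
    block[got] = 0x01;                   /* first padding octet        */
    block[rate - 1] += 0x80;             /* last padding bit, in place */
  }
  return got;
}
```

A caller would loop on read_padded_block until it returns 0, absorbing each block into the sponge and scrambling the state after each one, exactly as the octet-by-octet loop above does.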

Note that Hash_File above returns the *raw* Keccak hash, meaning the direct output of the sponge, as a set of octets. This is normally fine and well but if one specifies the hash as “unsigned 64”, it means that the above set of octets has to be interpreted as a number – and in turn, this means that byte/bit order matters. Since this can and does create confusion quite easily, I’ll state it here plainly again: the raw output of the Keccak sponge is MSB/b, meaning that on a little endian machine, you’ll need to flip both bytes and bits if you want to get the exact same number as you would on a big endian machine! Since this is however something that I already sorted out before, I added to the above convenient wrappers to do this properly so that the whole code should work seamlessly on both big and little endian computers anyway:

  function Hash_File(Filename: in String; Hash: out Interfaces.Unsigned_64)
      return Boolean is
    Raw: Raw_Types.Octets_8;
  begin
    -- calculate the raw hash and then convert it
    if Hash_File(Filename, Raw) then
      Hash := Hash2Val(Raw);
      return True;
    else
      return False;
    end if;
  end Hash_File;

  function Hash2Val( Raw: in Raw_Types.Octets_8 )
      return Interfaces.Unsigned_64 is
    B8: Raw_Types.Octets_8;
    U64: Interfaces.Unsigned_64;
  begin
    -- convert to U64 (NB: no need to squeeze etc, as block_len has to be > 8)
    -- for this to remain consistent on both little and big endian machines:
    -- on little endian, octets and bits need to be flipped before conversion
    if Default_Bit_Order = Low_Order_First then
      for BI in 1..8 loop
        B8(BI) := Keccak.Reverse_Table(Natural(Raw(Raw'First+8-BI)));
      end loop;
    else --simply copy as it is
      B8 := Raw(1..8);
    end if;
    -- convert and return
    U64 := Raw_Types.Cast(B8);
    return U64;
  end Hash2Val;

Using the above, one can now test Keccak hashes of files both raw and in the more usual numerical format (e.g. hex as given for instance by the keksum implementation). More importantly for me, Eulora’s client can now check the hash of a received file when it gets the last chunk of it and therefore it can decide whether the whole thing is any good or not! I’m quite happy to inform as well that initial tests are running fine – the data acquisition part of the client successfully requests files and receives them apparently unmolested (even when there are quite a few chunks, so far I tested with several thousands meaning up to 10MB files), writing them neatly to disk exactly as and where intended. Hooray!

The above out of the way, the uglier next step is to get that hideous gui-code to actually *use* those files properly, too!

  1. This is not something entirely new, since vtools for instance has previously adapted my Keccak implementation to a similar use, in order to calculate the hashes for vpatches. However, the approach taken there (by phf) apparently aimed to wrap the Ada implementation for C/CPP use and I don’t want this at all. First, I’d much rather move C/CPP code to Ada if/when possible than the other way around. Second, there is absolutely no good reason in this case to force any C/CPP code in the mix since Ada actually provides all that is needed to interact with files on the disk without any trouble whatsoever.

March 15, 2019

EuCrypt Chapter 16: Bytestream Input/Output Keccak

Filed under: EuCrypt — Diana Coman @ 4:30 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Keccak [1] suffers from a bit/byte issue: while internally the actual transformations work at byte-level, the input is taken bit by bit and expected to be in LSB order [2] while the output is offered again bit by bit but coming out in MSB [3] order if taken byte by byte. Moreover, the padding applied to any input to bring it to a convenient length is again defined – and even applied – at bit level rather than byte level. While originally I discussed this issue in more detail and systematised the options available for Keccak to get some clarity and make a decision on either bit-level or byte-level, the actual implementation followed as closely as possible the original specification retaining bitstream (i.e. bit by bit) input and output while doing internally the transformations at byte level, as confusing as that was. As soon as this implementation was put to actual use though, it became clear that the bitstream part really has to go because it causes huge waste (8x stack-allocated space for any input, quite correctly described as exploding) and trouble in the form of overflowing the stack even for relatively small inputs. So this is the promised .vpatch that updates Keccak to work on bytestream input and produce bytestream output, getting rid of the x8 waste and effectively choosing options 1.1, 2.2 and 3.2.

The obvious change is to convert all the bit* to byte*. This includes constants such as the Keccak rate that is now expressed in number of octets, types such as bitstream that becomes bytestream and functions such as BitsToWord that becomes BytesToWordLE. Note that the internals of the Keccak sponge (i.e. the transformations) are unchanged since they weren’t working at bit-level anyway. The less obvious change is the addition of bit-reversing (via lookup table since it’s fastest) – this is needed to ensure that Keccak receives and produces at octet-level the same values as it did at bit-level. Specifically, this means that input values on Big Endian iron will be bit-reversed for input (since input is expected LSB) and obtained output values on Little Endian iron will be bit-reversed (since output is extracted MSB). It’s ugly but so far it’s the only option that doesn’t change the Keccak hash essentially.

The .vpatch contains those main changes:

  • Bit reversing: there is a lookup table with the corresponding bit-reversed values for any byte (i.e. 0-255 values mapped to corresponding values with the other bit-order convention). So the bit-reversing of a byte is simply a matter of reading the value from the table at that index.
  • Input: bytestream instead of bitstream so effectively an array of octets. Because of Keccak’s LSB expectation re input bits, the BitsToWord function became BytesToWordLE meaning that it will reverse the bits on Big Endian Iron. The Padding is also expressed at byte level in LSB format (so the last byte is 0x80 rather than 0x01).
  • Output: bytestream instead of bitstream. Because of Keccak spitting out MSB when output is extracted at byte level, the WordToBits function became WordToBytesBE meaning that it will flip the bits on little endian so that the iron sees the same value as a big endian iron would.
  • Tests: there is an additional test that checks the values in the bit-reversing table for correctness, effectively calculating each of them and comparing to the constant; other than this, the same tests are simply updated to use bytestream/bytes as input and output; as in the previous version, there are also tests for calculating a hash or using Keccak on a String, quite unchanged.
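The idea of the bit-reversing table and of the test that recalculates it can be sketched in C as below. This is only an illustration: the actual table in the .vpatch is an Ada constant, while here it is filled at run-time, and all names (reverse_table, reverse_slow, fill_reverse_table, reverse_octet) are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Lookup table: reverse_table[v] holds v with its 8 bits in reverse order. */
static uint8_t reverse_table[256];

/* Computes the bit-reversed value of one octet the slow way, bit by bit. */
static uint8_t reverse_slow(uint8_t b) {
  uint8_t r = 0;
  int i;

  for (i = 0; i < 8; i++) {
    r = (uint8_t)((r << 1) | (b & 1));  /* lowest bit of b goes in first */
    b >>= 1;
  }
  return r;
}

/* Fills the table once; afterwards reversing an octet is a single read. */
void fill_reverse_table(void) {
  int v;

  for (v = 0; v < 256; v++)
    reverse_table[v] = reverse_slow((uint8_t)v);
  assert(reverse_table[0x01] == 0x80);  /* basic sanity check */
}

/* Bit-reverses an octet via the table (call fill_reverse_table first). */
uint8_t reverse_octet(uint8_t b) {
  return reverse_table[b];
}
```

A check in the spirit of the new test compares every table entry against the value calculated bit by bit; reversing twice must also give back the original octet.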

The .vpatch for the above can be found on my Reference Code Shelf and is linked here too, for your convenience.

As I don’t have any Big Endian iron around, I couldn’t test the above on anything other than Little Endian so if you are looking for something easy to help with, you’ve found it: kindly test and report here in the comments or on your blog (a pingback will be enough or otherwise comment here with a link).

  1. As originally specified by Bertoni et al.
  2. Least significant bit.
  3. Most significant bit.

October 10, 2018

EuCrypt Chapter 14: CRC32 Implementation with Lookup Table

Filed under: Coding,EuCrypt — Diana Coman @ 1:51 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

The communication protocol for Eulora uses CRC32 as checksum for its packages and so I found myself looking in disbelief at the fact that GNAT imposes the use of Streams for something as simple as calculating the checksum of anything at all, no matter how small no matter what use, no matter what you might need or want or indeed find useful, no matter what. No matter! As usual, the forum quickly pointed me in the right direction – thank you ave1! – namely looking under the hood of course, in this case GNAT’s own hood, the System.CRC32 package. Still, this package makes a whole dance of eating one single character at a time since it is written precisely to support the stream monstrosity on top rather than to support the user with what they might indeed need. Happily though, CRC32 is a very simple thing and absolutely easy to lift and package into 52 lines of code in the .adb file + 130 in the .ads file so 182 all in total [1], comments and two types of input (string or raw array of octets) included. And since a CRC32 implementation is anyway likely to be useful outside of Eulora’s communication protocol too, I’m adding it as a .vpatch on the EuCrypt tree where it seems to fit best at the moment. It’s a lib on its own as “CRC32” when compiled via its own crc32.gpr or otherwise part of the aggregate EuCrypt project if you’d rather compile all of EuCrypt via its eucrypt.gpr.

My CRC32 lib uses the 0x04C11DB7 generator polynomial and a lookup table with pre-calculated values for faster execution. As Stanislav points out, implementing the actual division and living with the slow-down incurred is not a huge problem but at least for now I did not bother with it. The CRC32 lib provides as output 32 bits of checksum for either a string or an array of octets. At the moment at least I really don’t see the need for anything more complicated than this – even the string-input method is more for the convenience of other/later uses than anything else. For my own current work on Eulora’s protocol I expect to use CRC32 on arrays of octets directly.
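As an illustration of the lookup-table approach, here is a minimal C sketch of a table-driven CRC32. It assumes the common reflected variant of the 0x04C11DB7 polynomial (written bit-reversed as 0xEDB88320, with an initial value and a final XOR of 0xFFFFFFFF); whether this matches the Ada lib bit for bit is not confirmed here. The table is calculated at run-time rather than stored as constants, and the names crc32_init_table and crc32_octets are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Pre-calculates, for each possible octet, the result of dividing it
 * bit by bit by the (reflected) generator polynomial. */
static void crc32_init_table(void) {
  uint32_t c;
  int n, k;

  for (n = 0; n < 256; n++) {
    c = (uint32_t)n;
    for (k = 0; k < 8; k++)
      c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
    crc_table[n] = c;
  }
  crc_table_ready = 1;
}

/* Returns the 32-bit checksum of len octets starting at data. */
uint32_t crc32_octets(const unsigned char *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;  /* initial value (assumed variant) */
  size_t i;

  assert(data != NULL || len == 0);
  if (!crc_table_ready)
    crc32_init_table();

  /* one table lookup per input octet instead of 8 shift/xor steps */
  for (i = 0; i < len; i++)
    crc = crc_table[(crc ^ data[i]) & 0xFF] ^ (crc >> 8);

  return crc ^ 0xFFFFFFFFu;    /* final XOR (assumed variant) */
}
```

With these parameters the usual check input, the nine ASCII characters "123456789", gives 0xCBF43926.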

The .vpatch adding CRC32 to EuCrypt and my signature for it are as usual on my Reference Code Shelf with additional links here for your convenience:

  1. Specifically: 52 lines is the count of crc32.adb that does the work. The .ads file brings in another 130 lines that are mostly the lookup table with pre-calculated values. The .gpr file has another 61 lines and the restrict.adc another 80 lines.

June 15, 2018

EuCrypt Manifest File

Filed under: EuCrypt — Diana Coman @ 2:09 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

This little patch simply adds a manifest file to EuCrypt so that the intended order of patches is made explicit, easily accessible and easily maintained throughout the life of the project even as new patches are added by others (hopefully). I’ve created this file following the manifest format specification proposed by Michael Trinque. For each patch, I used the block count [1] as reported by mimisbrunnr on the day when I published the patch.

Since I publish this as a .vpatch on top of existing EuCrypt itself, there is of course a line in the manifest file for this vpatch too. To keep it nicely flowing from the previous last leaf of EuCrypt, there is an additional change to the README file of the project as well (since it’s this file I have used so far, before the manifest solution was adopted, as an implicit way of forcing some meaningful order on the vpatches).

As usual, the .vpatch and its signatures can be found on my Reference Code Shelf and are also linked directly from here for your convenience:

  1. Starting with last round number on that day so as to have some space for the case when there were several patches on same day.

May 3, 2018

EuCrypt Chapter 13: SMG RNG

Filed under: EuCrypt — Diana Coman @ 3:01 pm

Motto: It’s just the int interpretation of a float’s binary representation.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

This unexpected chapter equips EuCrypt with more convenient ways of using the excellent Fuckgoats (FG) for generating random numbers in various formats. The backstory is a heartwarming (if short) tale of idiocy stabbing itself in its own useless foot, so let’s indulge for a moment.

As you might be aware, cryptography algorithms such as RSA and games such as Eulora rely to a significant extent on randomly generated numbers. As you might not fully appreciate though, the usual “random number generator” available on your computer is at best pseudo-random but the pseudo – here starts the idiocy – is not mentioned much, preferably not at all. So one gets /dev/random, /dev/urandom but never, ever, ever /dev/pseudorandom. One might say “what’s in a name” [1] and attempt to sort-of still work with it – let’s just feed /dev/random with truly random bits from the FG and keep it around the house, as it might still be useful, why not? But of course the whole thing is made so that there isn’t in fact a clear and straightforward way to even feed it better inputs – in other words, through its own working, /dev/random managed to push itself into being discarded even sooner rather than later. Isn’t that… nice?

Once the silly idea of still using /dev/random in some way was thus abandoned as it should be, the only remaining thing was to provide some reliable ways of directly converting the raw random bits that FGs produce into some actual numbers of the sort that a user (a human, a game or anything else) needs in practice. Obviously, there can never be a full list of what sort of formats and ranges of numbers one might want, so this is by no means a comprehensive set or the only set of formats, ranges and even ways of converting FG bits into random numbers. It’s simply a starter set of methods that are useful to S.MG at this time and may be perhaps useful to you at some time. As always, before using them, first read and understand their working and their limitations so that you can actually decide whether they meet your needs or not. And if they don’t, either change them into what you need them to be or – even better yet – add to them your own methods that do exactly what you need them to do.

A random number generator based on FG would logically be a library on its own so that it can be easily used by any code – even without importing the whole EuCrypt if it doesn’t need the rest. However, at the moment, the new code is simply an addition to truerandom.c that is part of smg_rsa, as it effectively builds on the other methods in there (those for accessing FGs specifically). I refrained from extracting the RNG into a separate module at this time because of a lack of tools: current vdiff does not know about simply moving code without changing it. As soon as the new vtools are ready and fully functional, it should be an easy task to simply move the relevant code to a different module with a short and easy to read vpatch (as opposed to the large delete & add vpatch that would result now from such a simple move). Until then at least, smg_rng will effectively be part of smg_rsa.

There are 4 new methods for obtaining random numbers using bits from a source of entropy [2], providing the following types and ranges of numbers:

  • an unsigned integer between 0 and 2^32 – 1 (i.e. on precisely 32 bits): this method reads 32 bits from the specified entropy source and then copies them directly to the memory address of an uint32_t (unsigned integer on 32 bits) variable that is the result. Although working at bit-level, the method doesn’t really care about endianness, since the bits are random anyway: sure, same bits will be interpreted as a different value by your Big Endian iron compared to your Little Endian iron but that doesn’t make the series obtained with successive calls to this method from *the same machine* any less random.
  • an unsigned integer between 0 and 2^64 – 1 (i.e. on precisely 64 bits): this method is just like the above, but for larger numbers (64 bits instead of 32); specifically, this method reads 64 bits from the specified entropy source and then copies them directly to the memory address of an uint64_t (unsigned integer on 64 bits) variable that is the result. Although working at bit-level, the method doesn’t really care about endianness, since the bits are random anyway: sure, same bits will be interpreted as a different value by your Big Endian iron compared to your Little Endian iron but that doesn’t make the series obtained with successive calls to this method from *the same machine* any less random.
  • a “dirty” float between 0 and 1: this method obtains first a 32-bit random integer value with the relevant method above and then divides this (as a float) by the maximum possible value on 32 bits (2^32 – 1). The resulting float is deemed “dirty” from a randomness perspective because of the way in which floating points are stored: basically there will be some unpredictable rounding of the least significant bits so use this *only* if this degradation is insignificant to you.
  • a float between 1 and 2, assuming the IEEE 754/1985 internal representation: unlike the “dirty” float method, this one directly writes the random bits obtained from FG to the mantissa part of a float number in memory (it also sets the sign bit to 0 and the exponent to 127 effectively ensuring that the result is read as a positive float with value 1.mantissa). For clarity reasons, the code of this method actually writes first 32 random bits at the address of the desired float and then simply goes in and sets the sign to 0 and the exponent to 127, leaving the rest (i.e. the mantissa) with the random bits. Because of this direct bit-diddling, this method has to take into account the endianness: on Big Endian, the sign and exponent are the first two octets at the address of the float, while on Little Endian the exponent and sign are the last 2 octets. To handle this, the method first calls a little bit of new code that tests the endianness and then it sets accordingly the relevant offsets for the 2 octets that need diddling.

The signatures of the above methods as well as the new snippet of code for testing endianness at run-time are added to the relevant header file, eucrypt/smg_rsa/include/smg_rsa.h, together with the usual comments to help the reader understand the code.

The run-time test of endianness:

/* A way to determine endianness at runtime.
 * Required for diddling a float's mantissa for instance.
 */
static const int onect = 1;
#define is_bigendian() ( (*(char*)&onect) == 0 )
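
A quick way to convince oneself that the macro does what it says is to cross-check it against the in-memory layout of a known multi-octet value. The sketch below repeats the macro so that it stands on its own; the function first_octet_is_msb is hypothetical and only handles plain big or little endian layouts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same run-time test as above, repeated here to keep the sketch complete. */
static const int onect = 1;
#define is_bigendian() ( (*(char*)&onect) == 0 )

/* Hypothetical cross-check: looks at the first octet in memory of a known
 * 32-bit value. On big endian iron this is the most significant octet. */
int first_octet_is_msb(void) {
  uint32_t probe = 0x01020304u;
  unsigned char first;

  memcpy(&first, &probe, 1);
  assert(first == 0x01 || first == 0x04);  /* mixed layouts not handled */
  return first == 0x01;
}
```

On any given machine the two should agree: both report big endian or both report little endian.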

The signatures of rng methods:

/* Returns (in parameter *n) a *potentially biased* random float between 0 and 1
 * Uses bits from ENTROPY_SOURCE but it rounds when converting to float
 * NB: This function rounds unpredictably.
 *     Use it ONLY if LSB normalization is insignificant to you!
 * This function uses rng_uint32 below.
 *
 * @param n - a float value (LSB rounded) between 0 and 1, obtained using
 *            a 32-bit random integer (32 bits from ENTROPY_SOURCE)
 * @return  - a positive value on success and a negative value in case of error
 *            main possible cause for error: failure to open ENTROPY_SOURCE.
 *            NB: a non-responsive/malconfigured source can result in blocking
 */
int rng_dirty_float(float *n);


/* Returns (in parameter *n)  a randomly generated float between 1 and 2 using:
 *    - the IEEE 754/1985 format for single float representation
 *    - ENTROPY_SOURCE to obtain bits that are *directly* used as mantissa
 * NB: this value is between 1 and 2 due to the normalised format that includes
 *     an implicit 1 ( i.e. value is (-1)^sign * 2^(e-127) * (1.mantissa) )
 *
 * From IEEE 754/1985, a description of the single float format:
 *   msb means most significant bit
 *   lsb means least significant bit
 *  1    8               23            ... widths
 * +-+-------+-----------------------+
 * |s|   e   |            f          |
 * +-+-------+-----------------------+
 *    msb lsb msb                 lsb  ... order

 * A 32-bit single format number X is divided as shown in the figure above. The
 * value v of X is inferred from its constituent fields thus:
 * 1. If e = 255 and f != 0 , then v is NaN regardless of s
 * 2. If e = 255 and f = 0 , then v = (-1)^s INFINITY
 * 3. If 0 < e < 255 , then v = (-1)^s * 2^(e-127) * ( 1.f )
 * 4. If e = 0 and f != 0 , then v = (-1)^s * 2^(-126) * ( 0.f ) (denormalized
 *    numbers)
 * 5. If e = 0 and f = 0 , then v = ( -1 )^s * 0 (zero)
 *
 * @param n - the address of an IEEE 754/1985 float: its mantissa will be set to
 *              random bits obtained from ENTROPY_SOURCE; its sign will be set
 *              to 0; its exponent will be set to 127 (the bias value so
 *              that the actual exponent is 0).
 * @return  - a positive value on success and a negative value in case of error
 *              main possible cause for error: failure to open ENTROPY_SOURCE.
 *              NB: a non-responsive/malconfigured source can result in blocking
 */
int rng_float_754_1985(float *n);

/* Returns (in parameter *n) a random unsigned integer value on 32 bits.
 * Uses random bits from ENTROPY_SOURCE that are directly interpreted as int
 *
 * @param n - it will contain the random integer obtained by interpreting 32
 *            bits from ENTROPY_SOURCE as an unsigned int value on 32 bits.
 * @return  - a positive value on success and a negative value in case of error
 */
int rng_uint32( uint32_t *n );

/* Returns (in parameter *n) a random unsigned integer value on 64 bits.
 * Uses random bits from ENTROPY_SOURCE that are directly interpreted as int
 *
 * @param n - it will contain the random integer obtained by interpreting 64
 *            bits from ENTROPY_SOURCE as an unsigned int value on 64 bits.
 * @return  - a positive value on success and a negative value in case of error
 */
int rng_uint64( uint64_t *n );

The actual implementation of the above methods is in eucrypt/smg_rsa/truerandom.c:

int rng_dirty_float(float *n) {
  int status;   /* for aborting in case of error */
  uint32_t r;   /* a random value on 32 bits */
  uint32_t maxval = 0xffffffff; /* maximum value on 32 bits */

  /* obtain a random number on 32 bits using ENTROPY_SOURCE */
  status = rng_uint32( &r );
  if ( status < 0 )
    return status;

  /* calculate and assign the floating-point random value as (r*1.0)/max val */
  /* multiplication by 1.0 IS NEEDED to do float division rather than int div*/
  *n = ( r * 1.0 ) / maxval;

  return 1;
}

int rng_float_754_1985(float *n) {
  /* Single float ieee 754/1985 has 23 bits that can be set for the mantissa
   * (and one implicit bit=1).
   * Full single float ieee 754/1985 representation takes 4 octets in total.
   */
  int noctets = 4; /* number of octets to read from ENTROPY_SOURCE */
  int nread;       /* number of octets *read* from ENTROPY_SOURCE  */
  unsigned char bits[ noctets ]; /* the random bits from ENTROPY_SOURCE */
  int oSignExp, oExpM;/* offsets for sign+exponent octet, exponent+mantissa*/

  /* obtain random bits */
  nread = get_random_octets( noctets, bits );

  if (nread != noctets )
    return -1;  /* something wrong at reading from ENTROPY_SOURCE, abort */

  /* set offsets for bit diddling depending on endianness of iron */
  if (is_bigendian()) {
    oSignExp = 0;
    oExpM = 1;
  }
  else {
    oSignExp = 3;
    oExpM = 2;
  }

  /* set sign=0; exponent=127; explicit mantissa = random bits (23 bits) */
  /* NB: use the endianness-dependent offset here, not a fixed octet     */
  *(bits+oExpM) = *(bits+oExpM) | 0x80;  /* one bit of exponent set */
  *(bits+oSignExp) = 0x3f;           /* sign=0; exponent bits for 127 */

  /* now copy the bits to the result var (i.e. as a float's representation) */
  memcpy( n, bits, noctets );
  return 1;
}

int rng_uint32( uint32_t *n ) {
  int noctets = 4;  /*  32 bits aka 4 octets to read from ENTROPY_SOURCE     */
  int nread;        /*  the number of octets read from ENTROPY_SOURCE        */
  unsigned char bits[ noctets ]; /* for storing the bits from ENTROPY_SOURCE */

  /* read random 32 bits from ENTROPY_SOURCE */
  nread = get_random_octets( noctets, bits );
  if ( nread != noctets )
    return -1;

  /* copy the random bits to n, to be interpreted as uint32 */
  /* endianness is irrelevant here - the bits are random anyway */
  memcpy( n, bits, noctets );

  return 1;
}

int rng_uint64( uint64_t *n ) {
  int noctets = 8;  /*  64 bits aka 8 octets to read from ENTROPY_SOURCE     */
  int nread;        /*  the number of octets read from ENTROPY_SOURCE        */
  unsigned char bits[ noctets ]; /* for storing the bits from ENTROPY_SOURCE */

  /* read random 64 bits from ENTROPY_SOURCE */
  nread = get_random_octets( noctets, bits );
  if ( nread != noctets )
    return -1;

  /* copy the random bits to n, to be interpreted as uint64 */
  /* endianness is irrelevant here - the bits are random anyway */
  memcpy( n, bits, noctets );

  return 1;
}
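
For comparison, the 1.mantissa construction can also be sketched without any octet offsets, by masking a 32-bit integer and copying it over a float. This is not the EuCrypt code but a minimal sketch under the assumption that float is an IEEE 754 single and that floats and 32-bit integers share the same byte order on the machine (true on common iron but not guaranteed). The name float_1_2_from_bits is hypothetical and the random bits are passed in by the caller, for instance as obtained from rng_uint32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keeps the low 23 bits of r as the mantissa,
 * sets sign = 0 and exponent = 127, so that the value reads as 1.mantissa,
 * i.e. a float that is at least 1 and strictly less than 2. */
float float_1_2_from_bits(uint32_t r) {
  uint32_t repr;
  float f;

  assert(sizeof(float) == sizeof(uint32_t));

  repr = (r & 0x007FFFFFu)   /* 23 mantissa bits taken from r      */
       | 0x3F800000u;        /* sign 0, exponent 127 (0111 1111)   */
  memcpy(&f, &repr, sizeof f);
  return f;
}
```

All mantissa bits at 0 give exactly 1.0, only the top mantissa bit set gives 1.5, and all mantissa bits at 1 give the largest float below 2.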

As usual, for every new piece of code, there are also new tests. In this case, the tests are very basic, simply calling each method and reporting both the resulting number and the status code (i.e. whether the method reported any errors). Note that assessing the outputs of a random number generator is way beyond the scope of those basic tests - you should devise and use your own tools for evaluating (to a degree that is satisfying for yourself) the resulting sequence of numbers obtained through repeated calls to any or all of these methods. The new tests in eucrypt/smg_rsa/tests/tests.c:

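void test_dirty_float_rng( int nruns ) {
  int i, status;
  float dirty;

  printf("Running test for smg rng dirty float with %d runs\n", nruns);
  for (i=0; i<nruns; i++) {
    status = rng_dirty_float( &dirty );
    printf("Run %d: %f status %s\n", i+1, dirty, status>0 ? "OK" : "FAIL");
  }
}

void test_ieee_float_rng( int nruns ) {
  int i, status;
  float ieee;

  printf("Running test for smg rng ieee 754/1985 float with %d runs\n", nruns);
  for (i=0; i<nruns; i++) {
    status = rng_float_754_1985( &ieee );
    printf("Run %d: %f status %s\n", i+1, ieee, status>0 ? "OK" : "FAIL");
  }
}

void test_uint32_rng( int nruns ) {
  int i, status;
  uint32_t n;

  printf("Running test for smg rng unsigned int32 with %d runs\n", nruns);
  for (i=0; i<nruns; i++) {
    status = rng_uint32( &n );
    printf("Run %d: %lu status %s\n", i+1, (unsigned long) n,
           status>0 ? "OK" : "FAIL");
  }
}

void test_uint64_rng( int nruns ) {
  int i, status;
  uint64_t n;

  printf("Running test for smg rng unsigned int64 with %d runs\n", nruns);
  for (i=0; i<nruns; i++) {
    status = rng_uint64( &n );
    printf("Run %d: %llu status %s\n", i+1, (unsigned long long) n,
           status>0 ? "OK" : "FAIL");
  }
}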

The .vpatch for the above smg rng implementation can be found on my Reference Code Shelf as well as below:

  1. A full backstory, at the very least! But let’s not digress even more from the code…
  2. The source of entropy is assumed to be a FG but – unlike /dev/urandom – you can change this quite easily.

March 15, 2018

EuCrypt: Additional Check and a Note on Cosmetic Changes

Filed under: EuCrypt — Diana Coman @ 6:46 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

A very useful comment by Arjen (ave1) at my previous post on EuCrypt prompted me to have another look at all calls for random bits from the entropy source, throughout EuCrypt. As Arjen mentioned, it turns out that there is one lonely call for which the potential error flag (the number of octets read) is not directly checked by the caller. Since this is a sensitive issue (the call is in primegen.c precisely when attempting to find randomly a large prime number), I am adding this missing check there. As a result, EuCrypt’s behaviour is now entirely uniform in this respect: any time EuCrypt requests a number of random bits from the entropy source, it does so in a loop, discarding any output that is less than/different from what it requested and insisting with the requests until satisfied.

The .vpatch contains the small change to the code, basically a loop around the request for random bits from the entropy source. To ensure that the whole V tree of EuCrypt remains neatly with a single leaf, namely this patch, I added also a short comment to the README file of EuCrypt stating this design principle of trying for as long as needed: “NB: EuCrypt aims to *keep trying* to accomplish a task. In particular, when entropy is needed, it will keep asking for as many random bits as it needs from the configured entropy source and it will not proceed unless those are provided.” The change to the code is clear enough from the .vpatch itself:

+  int nread;
 	do {
-		get_random_octets_from( noctets, p, entropy_source );
+    do {
+      nread = get_random_octets_from( noctets, p, entropy_source );
+    } while ( nread != noctets );

Note that one can legitimately raise the concern (as Arjen did) that an inaccessible/in error state entropy source can therefore result in an infinite loop. This is true and it is intended behaviour: the entropy source is part of the environment in which EuCrypt runs and looking after it is outside of the scope of EuCrypt itself. Moreover, a working (and reliable) entropy source is simply a crucial pre-requisite for the tasks of EuCrypt and therefore I do not want it to proceed with anything at all when this pre-requisite is not met. An alternative solution would be to simply abort whatever EuCrypt was trying to do, as soon as the entropy source fails to provide the requested bits. However, this effectively means that EuCrypt will need to be restarted even if the environment is set up so that access to the entropy source is re-established. Basically this would add the restarting of EuCrypt and potential recovery of previous work as an additional task for the environment manager (whatever that might be) and I don’t see at the moment the case for this. I’d much rather have EuCrypt keep trying, simply and stubbornly until the conditions are met and it can proceed with what it had to do. Should the caller prefer otherwise, they can of course have their own mechanisms in place and make their own decisions as to recovery options if the call to EuCrypt takes longer than they are willing to wait for. What EuCrypt promises is simple: it will NOT proceed with fewer random bits than it needs and it will KEEP trying to get them for as long as it takes or until it gets killed by the caller/outside forces.
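The keep-trying behaviour described above can be sketched in isolation. This is only an illustration, not EuCrypt's code: `octet_source`, `read_until_full` and `flaky_source` are hypothetical names, and the mock source stands in for a real entropy source by failing twice before delivering a fixed (non-random) pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entropy source: returns the number of octets
 * actually written to out, which may be fewer than requested. */
typedef int (*octet_source)( int noctets, unsigned char *out );

/* Keep asking until the source delivers exactly noctets in a single call;
 * anything less is discarded. Returns the number of attempts made.
 * NB: this loops forever if the source never delivers, by design. */
int read_until_full( octet_source src, int noctets, unsigned char *out ) {
  int nread;
  int attempts = 0;

  assert( src != NULL && out != NULL && noctets > 0 );
  do {
    nread = src( noctets, out );
    attempts++;
  } while ( nread != noctets );

  return attempts;
}

/* Mock source, for illustration only: two short reads, then a full one. */
int flaky_calls = 0;

int flaky_source( int noctets, unsigned char *out ) {
  flaky_calls++;
  if ( flaky_calls <= 2 )
    return noctets / 2;                    /* short read, nothing written */
  memset( out, 0xA5, (size_t) noctets );   /* fixed pattern, NOT random   */
  return noctets;
}
```

With this mock, `read_until_full( flaky_source, 8, buf )` gives up on the first two short reads and returns after the third call. A caller that is not willing to wait indefinitely has to impose its own limit from the outside, for instance by running the call in a process it can kill.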

It might be worth mentioning at this point that the design decision to re-open the entropy source for requests that are not made by the same caller / block of code provides also a bit of support for this approach of keep-trying-until-it-works-no-matter-what: as long as the string identifier for the entropy source is the same, the physical source can change seamlessly between uses by EuCrypt.
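A rough sketch of what re-opening the source on each request looks like, assuming for simplicity that the source can be read as a plain file through stdio (the real source may need different handling). The name `read_octets_reopening` is hypothetical and not part of EuCrypt.

```c
#include <assert.h>
#include <stdio.h>

/* Open the source named by path, read up to noctets octets, close it again.
 * Returns the number of octets read, or -1 if the source cannot be opened.
 * Since nothing is kept open between calls, whatever sits behind path can
 * be swapped between two requests without the caller noticing. */
int read_octets_reopening( const char *path, int noctets, unsigned char *out ) {
  FILE *src;
  size_t nread;

  assert( path != NULL && out != NULL && noctets > 0 );
  src = fopen( path, "rb" );
  if ( src == NULL )
    return -1;
  nread = fread( out, 1, (size_t) noctets, src );
  fclose( src );
  return (int) nread;
}
```

A caller following the keep-trying principle would simply wrap this in a loop and repeat the request until the return value equals `noctets`.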

On a different, lighter note, having looked at a potential cosmetic patch to the whole library to replace all tabs with spaces and to ensure *everywhere* strict adherence to the coding style [1] introduced midway through development, I can say that such a patch will weigh in at more than 2k lines for non-mpi components only, going above 9k lines when mpi is included as well. As such, I’m really in two minds whether it would be more useful than annoying. Perhaps have your say on this matter if you feel strongly about the tabs-to-spaces and 80 columns rule. Before you do, note however that all code written after those rules were adopted is already following them, of course. This being said, a review of the code is good at any time and the need for cosmetic changes is as good a time as any if not even better than many.

The eucrypt_check_nread.vpatch and my signature for it are as usual on my Reference Code Shelf and linked here as well, for your convenience:

  1. maximum 80 columns per line and various alignments

March 8, 2018

EuCrypt: Compilation Sheet

Filed under: EuCrypt — Diana Coman @ 2:02 pm

First of all, you’ll need V, the republican versioning system, there is no way around this and I won’t provide any way around it either. Grab yourself a copy of V and learn to use it or implement your own copy, as you prefer. If you need some help with that, go first through the gentle introduction to V by Ben Vulpes and if you are still stuck after that come and ask intelligent questions in #trilema on irc.

Once you have V in working order, head to my Reference Code Shelf and download the .vpatches for EuCrypt and their signatures. Alternatively, you should be able to get the same from btcbase’s mirror of EuCrypt, possibly [1] with some other signatures as well if that helps. Press to the .vpatch you want, but make sure you do include the fixes at least for the components you are interested in. Just saying. Then you’re ready for building the lib itself.

EuCrypt can be built in 2 main ways:

  • as a single, aggregate library, including therefore all the components: mpi, smg_keccak, smg_bit_keccak, smg_rsa, smg_serpent; you’ll get one static library with everything, ready to be used; current size of the resulting file is 215K when built with AdaCore’s GNAT 2016 and gcc 4.9.4.
  • component by component, picking and choosing only what you need; current sizes when built with AdaCore’s GNAT 2016 and gcc 4.9.4: mpi 109K; smg_bit_keccak 17K; smg_keccak 42K; smg_rsa 19K; smg_serpent 20K – 31K (depending on level of optimisation chosen). NOTE: smg_rsa uses smg_keccak and mpi!

EuCrypt is written in Ada (Serpent and Keccak components) and C (mpi and rsa components). Therefore you’ll need a tool chain that supports multi-language libraries. My personal recommendation is to use AdaCore’s GNAT – it is currently the only tool I know to actually work out of the box for everything that EuCrypt needs [2] and it includes directly the rather powerful GPRBuild tool [3] for automatic builds of multi-language projects of all sorts. Given this lack of alternatives that I could recommend, I’ll mirror here the precise version of AdaCore’s GNAT that I am currently using and that I recommend you use too for building EuCrypt:

To compile, simply go to the eucrypt folder and run gprbuild. This will build EuCrypt as an aggregate library. To build any separate component, go to its own folder and run gprbuild there.

EuCrypt has been built successfully so far on the following systems:

  • CentOS 6.8, AdaCore’s GNAT GPL 2016 (gcc 4.9.4)
  • Ubuntu 14.04, AdaCore’s GNAT GPL 2016 (gcc 4.9.4)

I’ll gladly add to the above list any other systems/configurations that I become aware of – just tell me in the comments below what you compiled it on or even better – write it up on your blog and drop me a link in the comments below (a trackback is good too – just make sure it works!).

Note that the building of EuCrypt as a multi-language, C and Ada project should be quite pain-free with gprbuild. However, there is quite a lot of pain at writing code time when you need to interface between the two languages and especially when you need to pass strings and/or pointers. You can see such interfacing in action in Chapter 12 of EuCrypt (the wrapper for using Keccak OAEP + RSA directly from C code) but I fully recommend in any case that you read as well the very clear account of sending array of octets between C and Ada, by ave1. At the moment EuCrypt uses a more basic method to accomplish the same task, namely copying octet by octet an array of characters from Ada to C or from C to Ada, as required. Feel free to change that (as anything else), of course and let me know how it goes – there is no better way of understanding some code than trying to make a meaningful change to it!

  1. hopefully, at some point not that distant into the future
  2. You CAN supposedly use any other version of GNAT, most notably whatever comes with your gcc and/or specific OS distribution but it seems to lag behind for one thing and to be rather prone to trouble due for instance (among other troubles) to various version mismatches between all the different moving parts; so if you DO use a different GNAT and get it to work correctly, please document your work, write it up on your blog and drop me a link – I’ll be happy to read it and add it as known working alternative!
  3. This is NEEDED if you want to build eucrypt as a standalone aggregate library! While you CAN build the components separately with gnatmake for instance, you won’t be able to build the aggregate library with it as gnatmake simply doesn’t support this at the moment as far as I’m aware.

March 3, 2018

EuCrypt: Correcting Bounds Error in Keccak (Word-Level)

Filed under: EuCrypt — Diana Coman @ 3:55 pm

Thanks to prompt use of Keccak as hashing method in the new vtools by phf, an error has been discovered and neatly reported [1] about 8 hours ago. Thanks to the very helpful way in which the error was reported and the very helpful way in which Ada reports the very line on which some check failed, it took all of 10 minutes to identify the problem: I had flipped 2 variables, attempting to copy from “ToPos” to “FromPos” and somehow I failed to see this glaring error even at re-readings… Other than the obvious fact that I should take a break, get more sleep and all that, there really is nothing else I can say on how this happened. So I’ll get straight on to the fix for the error itself, meaning changing one line in eucrypt/smg_keccak/smg_keccak.adb:

-      BWord(Bitword'First .. Bitword'First + SBB - 1) := Block(ToPos..FromPos);
+      BWord(Bitword'First .. Bitword'First + SBB - 1) := Block(FromPos..ToPos);

Note that the bit-level Keccak will not have this sort of problem as it doesn’t have to dance around the “is it multiple of 8 or not?” issue that required the above code in the first place in the word-level version. So there is nothing to change in the bit-level Keccak.

As usual, an error found means at least one test added to first show the error and then showcase the result of the fix. In this case the test has been basically already provided by the helpful phf. I have simply adapted it to make it use the full valid range of the bitrate for Keccak (based on the Keccak_Rate type defined in smg_keccak.ads) and to actually fail if there is a problem anywhere, since the failure itself will be pointing precisely at where the trouble is. The new test is added to eucrypt/smg_keccak/tests/smg_keccak-test.adb:

+  procedure test_all_bitrates is
+    Input : constant String := "hello, world";
+    Bin   : Bitstream( 0 .. Input'Length * 8 - 1 ) := ( others => 0 );
+    Bout  : Bitstream( 0 .. 100 ) := ( others => 0 );
+  begin
+    ToBitstream( Input, Bin );
+    Put_Line("Testing all bitrates:");
+      for Bitrate in Keccak_Rate'Range loop
+        Sponge(Bin, Bout, Bitrate);
+        Put_Line("PASSED: keccak with bitrate " & Integer'Image(Bitrate));
+      end loop;
+  end test_all_bitrates;
+

Running all the tests shows now a very satisfying PASSED for ALL valid bitrates of Keccak. Note that bitrate has to be between 1 and width of the Keccak state, which is currently 1600. Anything outside this range will cause CORRECTLY an abortion of execution – this will happen however as soon as the Sponge procedure is called since the value will simply not match the restricted Keccak_Rate type.

The .vpatch for the above and its signature will be on my Reference Code Shelf as well as linked here directly for your convenience:

  1. Reported with code to reproduce the problem and a paste of the output and everything one needs to find the trouble in about 10 minutes flat.

March 1, 2018

EuCrypt Chapter 12: Wrapper C-Ada for RSA + OAEP

Filed under: EuCrypt — Diana Coman @ 9:45 pm

Find yourself a comfortable position, as this might take a while to go through. At 1034 lines, the .vpatch for this chapter deals with the gnarly mess that is the passing of char* to Ada and back, while also gathering everything from previous chapters together into the actual EuCrypt library as a standalone, static lib. As usual and as always in my books, adding something does not mean that I’m taking away previous, useful options, quite on the contrary. Specifically, this chapter provides a way to compile everything in a single lib *in addition* to the previous options of compiling each and any of the EuCrypt components as standalone libraries if preferred.

The main thorny issue with compiling EuCrypt as a whole is that one needs to put together code in both Ada and C. I have to say that this past week increased my disgust for C at an incredible speed – I suspect this outcome is simply unavoidable when getting back to C after a spell of Ada and moreover when this getting back to C involves a significant part of in-your-face comparison of the 2 languages as one has to write a “wrapper for C” with 2 parameters for each 1 single parameter that Ada requires, while also being anyway at the mercy of the caller for all that verbosity to actually make any sense as intended. Still, as long as there isn’t any replacement for the despicable mpi that effectively forces C into EuCrypt, such is the life to live – stinky, verbose and treacherous, so watch your step and read 10 times before running code even once, not to mention even thinking of writing any of it.

The brighter side of all this comes of course from the Ada-part: GNAT’s Project Manager (GPR) is an excellent tool for building and managing precisely this sort of mixed-language projects. Moreover, GPR [1] is the only tool that I found to actually work [2] for aggregate libraries such as EuCrypt. And since this is the only thing that I know to actually work as intended for the task at hand, the first part of this .vpatch nukes all previous Makefiles and provides instead sweet and short .gpr files for each and every component as well as for eucrypt as a whole:

  • EuCrypt as a whole (top level, including ALL the different components), eucrypt/eucrypt.gpr:

     -- S.MG, 2018
    
    aggregate library project EuCrypt is
      for Project_Files use (
                              "mpi/mpi.gpr",
                              "smg_bit_keccak/smg_bit_keccak.gpr",
                              "smg_keccak/smg_keccak.gpr",
                              "smg_rsa/smg_rsa.gpr",
                              "smg_serpent/smg_serpent.gpr");
    
      for Library_Name use "EuCrypt";
      for Library_Kind use "static";
    
      for Library_Dir use "lib";
    end EuCrypt;
    
    
  • MPI component, eucrypt/mpi/mpi.gpr:

    -- S.MG, 2018
    
    project MPI is
      for Languages use ("C");
      for Library_Name use "MPI";
      for Library_Kind use "static";
    
      for Source_Dirs use (".", "include");
      for Object_Dir use "obj";
      for Library_Dir use "bin";
    
    end MPI;
    
    
  • MPI tests, eucrypt/mpi/tests/test_mpi.gpr:

    -- S.MG, 2018
    
    with "../mpi.gpr";
    
    project test_MPI is
      for Languages use ("C");
    
      for Source_Dirs use (".");
      for Object_Dir use "obj";
      for Exec_Dir use ".";
    
      for Main use ("test_mpi.c");
    end test_MPI;
    
  • Bit-level keccak component, eucrypt/smg_bit_keccak/smg_bit_keccak.gpr:

     -- S.MG, 2018
    project SMG_Bit_Keccak is
      for Languages use ("Ada");
      for Library_Name use "SMG_Bit_Keccak";
      for Library_Kind use "static";
    
      for Source_Dirs use (".");
      for Object_Dir use "obj";
      for Library_Dir use "lib";
    
    end SMG_Bit_Keccak;
    
    
  • Bit-level Keccak tests, eucrypt/smg_bit_keccak/tests/smg_bit_keccak_test.gpr:

     -- Tests for SMG_Bit_Keccak (part of EuCrypt)
     -- S.MG, 2018
    
    with "../smg_bit_keccak.gpr";
    
    project SMG_Bit_Keccak_Test is
      for Source_Dirs use (".");
      for Object_Dir use "obj";
      for Exec_Dir use ".";
    
      for Main use ("smg_bit_keccak-test.adb");
    end SMG_Bit_Keccak_Test;
    
  • Keccak (word-level) component, eucrypt/smg_keccak/smg_keccak.gpr:

     -- S.MG, 2018
    project SMG_Keccak is
      for Languages use ("Ada");
      for Library_Name use "SMG_Keccak";
      for Library_Kind use "static";
    
      for Source_Dirs use (".");
      for Object_Dir use "obj";
      for Library_Dir use "lib";
    
    end SMG_Keccak;
    
  • Keccak (word-level) tests, eucrypt/smg_keccak/tests/smg_keccak-test.gpr:

     -- Tests for SMG_Keccak (part of EuCrypt)
     -- S.MG, 2018
    
    with "../smg_keccak.gpr";
    
    project SMG_Keccak_Test is
      for Source_Dirs use (".");
      for Object_Dir use "obj";
      for Exec_Dir use ".";
    
      for Main use ("smg_keccak-test.adb");
    end SMG_Keccak_Test;
    
  • RSA (including true random number generator and OAEP padding using Keccak as hash function) component, eucrypt/smg_rsa/smg_rsa.gpr:

     -- S.MG, 2018
    
    with "../mpi/mpi.gpr";
    with "../smg_keccak/smg_keccak.gpr";
    
    project SMG_RSA is
      for Languages use ("C");
      for Library_Name use "SMG_RSA";
      for Library_Kind use "static";
    
      for Source_Dirs use (".", "include");
      for Object_Dir use "obj";
      for Library_Dir use "bin";
    
    end SMG_RSA;
    
  • RSA tests, eucrypt/smg_rsa/tests/smg_rsa_tests.gpr:

     -- Tests for SMG_RSA (part of EuCrypt)
     -- S.MG, 2018
    
    with "../smg_rsa.gpr";
    
    project SMG_RSA_Tests is
      for Languages use("C");
      for Source_Dirs use (".");
      for Object_Dir use "obj";
      for Exec_Dir use ".";
    
      for Main use ("tests.c");
    end SMG_RSA_Tests;
    
  • Serpent component, eucrypt/smg_serpent/smg_serpent.gpr:

    -- S.MG, 2018
    
    project SMG_Serpent is
      for Languages use ("Ada");
      for Library_Name use "SMG_Serpent";
      for Library_Kind use "static";
    
      for Source_Dirs use ("src");
      for Object_Dir use "obj";
      for Library_Dir use "lib";
    
    end SMG_Serpent;
    
    
  • Serpent tests, eucrypt/smg_serpent/tests/smg_serpent_tests.gpr:

     -- Tests for SMG_Serpent (part of EuCrypt)
     -- S.MG, 2018
    
    with "../smg_serpent.gpr";
    
    project SMG_Serpent_Tests is
      for Source_Dirs use (".");
      for Object_Dir use "obj";
      for Exec_Dir use ".";
    
      for Main use ("testall.adb");
    end SMG_Serpent_Tests;
    
    

To compile anything, simply go to where the .gpr file of interest is and then run “gprbuild”. As there is only one .gpr file per folder, you don’t even need to specify *what* to build, as gprbuild is clever enough to figure out you mean that .gpr right there. Any components that are required by the one you are trying to build will be built automatically as part of the process. To clean up the result of any compilation, all you need to do is “gprclean -r”. The -r flag means recursive, as otherwise gprclean will not follow through to clean any dependencies that were built.

With the “easy” part of building EuCrypt and its components out of the way, let’s see the gnarly part of making a working wrapper for rsa and oaep, effectively mixing Ada code (the oaep part in smg_keccak) and C code (the rsa part in smg_rsa). Since I’m misfortunate enough to have Eulora’s server code + underlying graphics engine in C, it follows that the rsa and oaep will need to be used from C anyway, so the approach here is to simply add to the smg_rsa component 2 new methods that do the OAEP padding + RSA encryption (and then in reverse for decrypting). The signatures for those 2 new methods can be found in eucrypt/smg_rsa/include/smg_rsa.h:


/*********
 * @param output - an MPI with KEY_LENGTH_OCTETS octets allocated space;
                   it will hold the result: (rsa(oaep(input), pk))
   @param input  - the plain-text message to be encrypted; maximum length is
                   245 octets (1960 bits)
   @param pk     - public key with which to encrypt
   NB: this method does NOT allocate memory for output!
   preconditions:
     - output IS different from input!
     - output has at least KEY_LENGTH_OCTETS octets allocated space
     - input is AT MOST max_len_msg octets long (ct defined in smg_oaep.ads)
 */
void rsa_oaep_encrypt( MPI output, MPI input, RSA_public_key *pk);

/*
 * Opposite operation to rsa_oaep_encrypt.
 * Attempts oaep_decrypt(rsa_decrypt(input))
 * @param output - an MPI to hold the result; allocated >= max_len_msg octets
 * @param input  - an MPI previously obtained with rsa_oaep_encrypt
 * @param sk     - the secret key with which to decrypt
 * @param success - this will be set to -1 if there is an error
 *
 * preconditions:
 *   - output IS different from input!
 *   - output has at least KEY_LENGTH_OCTETS octets allocated space
 *   - input is precisely KEY_LENGTH_OCTETS
 */
void rsa_oaep_decrypt( MPI output, MPI input, RSA_secret_key *sk, int *success);

The comments in the code snippet above should be clear enough: rsa_oaep_encrypt takes an MPI (the “message”) and a public key as input, calls first oaep on the message and then rsa on the result, returning (via “output”) the final result; rsa_oaep_decrypt does the reverse operations using the corresponding secret key and performing therefore rsa first, followed by oaep decrypt. Note that rsa_oaep_decrypt has an additional return value through the “success” flag – this is needed in order to signal to the caller potential clear failures at decryption (for instance if the input given is corrupted or otherwise a non-valid input). Obviously, the method works with a relatively limited definition of “success” since it can’t really tell whether you got the original plain-text, nor is it meant to care about that: when the success flag is set to a negative value [3], the caller can be absolutely sure that something went wrong; on the other hand, when the success flag is set to a positive value, the caller can only be assured that *something* was obtained through the decryption process from the original MPI. No assurance is given (nor can be given) regarding the value, integrity or meaning of that something.

Since the OAEP part is implemented in Ada, the same .h file as above also contains a few more declarations:

/*
 * This is the maximum length of a plain-text message (in octets) that can be
 * oaep+rsa encrypted in a single block. Its value is defined in smg_oaep.ads
 */
extern int max_len_msg;

/*
 * ada-exported oaep encrypt
 */
extern void oaep_encrypt_c( char* msg, int msglen,
                            char* entropy, int entlen,
                            char* encr, int encrlen,
                            int* success);

/*
 * ada-exported oaep decrypt
 */
extern void oaep_decrypt_c( char* encr, int encrlen,
                            char* decr, int* decrlen,
                            int* success);

Those oaep_encrypt_c and oaep_decrypt_c are Ada wrappers made specifically for use from C. The reason why such wrappers are needed (as opposed to using directly the Ada OAEP procedure for instance) is precisely the need to translate back and forth between C’s char * unkempt “strings” and Ada’s very tidy, trim and proper fixed-length Strings. The length of a char * from C can be anything and C doesn’t bother apparently to pass it explicitly to Ada when calling an Ada function. At the same time, Ada is very careful to avoid writing out of bounds so its “Strings” are fixed-size arrays of char, with very well-known and unchangeable length [4]. To help with such interfacing problems, Ada conveniently offers the packages Interfaces.C and Interfaces.C.Strings. Using the latter, one could work with chars_ptr that are in principle specifically made to handle the char * of C. However, since my purpose is not at all to import char * into Ada but rather to somehow bridge the chasm and recover the information from char * into a sane, fixed-length array of char, it follows that I’m using Interfaces.C.char_array. And to use those, C will have to pass explicitly the length of each char * parameter that it ever passes to Ada in any form. On its side, Ada will have to (unfortunately…) basically trust that value as correct and use therefore a fixed-length String of precisely that length, reading/writing at most that many octets at the address stored in the char * parameter and not more.
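To make the “explicit length” convention concrete, here is a small C-only sketch of the same idea: every buffer travels together with its length and the copy never relies on a terminating zero. The function `copy_octets` is hypothetical and only mirrors the octet-by-octet approach described here; it is not part of EuCrypt.

```c
#include <assert.h>
#include <stddef.h>

/* Copy octets from src to dst, octet by octet, trusting only the lengths
 * supplied by the caller: at most srclen octets are read and at most dstlen
 * octets are written. Returns the number of octets copied.
 * No terminating '\0' is assumed, read or written. */
int copy_octets( const char *src, int srclen, char *dst, int dstlen ) {
  int n, i;

  assert( src != NULL && dst != NULL );
  assert( srclen >= 0 && dstlen >= 0 );

  n = srclen < dstlen ? srclen : dstlen;
  for ( i = 0; i < n; i++ )
    dst[i] = src[i];

  return n;
}
```

Unlike strcpy and friends, this keeps working when the data contains zero octets, which is to be expected for padded or encrypted blocks. The weak point is the same as on the Ada side: if the caller passes a wrong length, nothing in here can detect it.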

On the C side, the implementations of rsa_oaep_encrypt and rsa_oaep_decrypt are quite straightforward, making use of the previously discussed oaep_encrypt_c and oaep_decrypt_c external methods conveniently provided by Ada. From eucrypt/smg_rsa/rsa.c:

void rsa_oaep_encrypt( MPI output, MPI input, RSA_public_key *pk) {
  /* precondition: output is different from input */
  assert( output != input );

  /* precondition: output has enough memory allocated */
	unsigned int nlimbs_n = mpi_nlimb_hint_from_nbytes( KEY_LENGTH_OCTETS);
	assert( mpi_get_alloced( output ) >= nlimbs_n);

  /* precondition: input is at most max_len_msg octets long */
  unsigned int nlimbs_msg = mpi_nlimb_hint_from_nbytes( max_len_msg );
  assert( mpi_get_nlimbs( input ) <= nlimbs_msg);

  /* Step 1: oaep padding */
  /* get message char array and length */
  int msglen = 0;
  int sign;
  unsigned char * msg = mpi_get_buffer( input, &msglen, &sign);
  /* allocate memory for result */
  int encrlen = KEY_LENGTH_OCTETS;
  unsigned char * encr = xmalloc( encrlen );
  int entlen = KEY_LENGTH_OCTETS;
  unsigned char * entropy = xmalloc( entlen );
  int success = -10;
  /* call oaep until result is strictly < N of the rsa key to use */
  MPI oaep = mpi_alloc( nlimbs_n ); /* result of oaep encrypt/pad */

  int nread;
  do {
    /* get random bits */
		do {
      nread = get_random_octets( entlen, entropy );
		} while (nread != entlen);

    oaep_encrypt_c( msg, msglen, entropy, entlen, encr, encrlen, &success);
    if (success > 0) {
      /* set the obtained oaep to output mpi and compare to N of the rsa key */
      /* NB: 0-led encr WILL GET TRUNCATED!! */
      mpi_set_buffer( oaep, encr, encrlen, 0);
    }
    printf(".");
  }
  while ( success <=0 || mpi_cmp( oaep, pk->n ) >= 0 );

  printf("\n");
  /* Step2 : call rsa for final result */
  public_rsa( output, oaep, pk );

  /* clear up */
  xfree( msg );
  xfree( encr );
  xfree( entropy );
  mpi_free( oaep );
}

void rsa_oaep_decrypt( MPI output, MPI input, RSA_secret_key *sk, int *success)
{
  *success = -1;
	unsigned int nlimbs_n = mpi_nlimb_hint_from_nbytes( KEY_LENGTH_OCTETS);
  unsigned int nlimbs_msg = mpi_nlimb_hint_from_nbytes( max_len_msg );

  /* preconditions */
  assert( output != input );
	assert( mpi_get_alloced( output ) >= nlimbs_msg);
  assert( mpi_get_nlimbs( input )  == nlimbs_n);

  /* rsa */
  MPI rsa_decr = mpi_alloc( nlimbs_n );
  secret_rsa( rsa_decr, input, sk );

  /* oaep */
  unsigned encr_len, decr_len;
  int sign, flag;
  char *oaep_encr = mpi_get_buffer( rsa_decr, &encr_len, &sign );
  char *oaep_decr = xmalloc( encr_len );
  decr_len = encr_len;
  oaep_decrypt_c( oaep_encr, encr_len, oaep_decr, &decr_len, &flag );

  /* check status */
  if ( flag > 0 ) {
    *success = 1;
    mpi_set_buffer( output, oaep_decr, decr_len, 0 );
  }
  else
    *success = -1;

  /* cleanup */
  mpi_free( rsa_decr );
  xfree( oaep_encr );
  xfree( oaep_decr );
}

The actual steps of both encryption and decryption above should be fairly obvious and further explained by the comments in the code. I’ll just point to you one important decision taken in there: the oaep encryption is called in a loop until the result obtained is strictly smaller than the modulus (n) of the RSA key that will be used for encryption. This condition is a requirement of RSA and since oaep returns a block of the same length as the TMSR RSA modulus, it follows that in some cases the oaep block will actually be a number bigger than the modulus; there is no way around this. Happily though, the TMSR RSA modulus has the highest bit set, meaning that an oaep block with highest bit 0 will surely be smaller than n. And since the highest bit of the resulting oaep block is quite random (TMSR OAEP uses *by design* a significant number of *true* random bits), it follows that it’s really enough to simply check the result provided by oaep encryption, discard it if it’s bigger than the modulus n and try again with another set of random bits – in most cases there should be at most 2 tries in order to get something that can be then safely fed to RSA encrypt.
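The retry-until-smaller-than-n logic can be illustrated without MPI at all, on plain big-endian octet buffers of equal length. This is a sketch under that assumption only: `block_below_modulus`, `pad_until_below`, `block_maker` and `demo_maker` are hypothetical names, and the demo maker is deterministic rather than random.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the big-endian number stored in block is strictly smaller
 * than the big-endian number stored in modulus. Both buffers are exactly
 * len octets long, so an octet-wise comparison is a numeric comparison. */
int block_below_modulus( const unsigned char *block,
                         const unsigned char *modulus, size_t len ) {
  return memcmp( block, modulus, len ) < 0;
}

/* Hypothetical producer of candidate blocks (stands in for oaep padding). */
typedef void (*block_maker)( unsigned char *block, size_t len );

/* Ask for new candidate blocks until one is strictly below the modulus.
 * Returns the number of tries that were needed. */
int pad_until_below( block_maker make, unsigned char *block,
                     const unsigned char *modulus, size_t len ) {
  int tries = 0;

  do {
    make( block, len );
    tries++;
  } while ( !block_below_modulus( block, modulus, len ) );

  return tries;
}

/* Demo maker, for illustration only: the first candidate has the highest
 * bit set and all other bits too, later candidates have it cleared. */
int demo_calls = 0;

void demo_maker( unsigned char *block, size_t len ) {
  demo_calls++;
  memset( block, 0xFF, len );
  if ( demo_calls > 1 )
    block[0] = 0x7F;   /* highest bit cleared */
}
```

With a modulus whose highest bit is set, for instance the octets 0x80 0x00 0x00 0x01, the first demo candidate is rejected and the second one, with its highest bit cleared, is accepted, so the loop ends after 2 tries.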

To support the calling of Ada stuff from C as shown above, a lot of ugly code has to be added in Ada – the wrappers for the otherwise-perfectly-fine Ada oaep methods. Those are all at least contained in a single package, namely smg_oaep in eucrypt/smg_keccak/smg_oaep.ads and eucrypt/smg_keccak/smg_oaep.adb. First, methods to copy octet by octet from /to those char *:

  -- copy from Ada String to C char array and back, octet by octet

  -- This copies first Len characters from A to the first Len positions in S
  -- NB: this does NOT allocate /check memory!
  -- Caller has to ensure that:
  --    S has space for at least Len characters
  --    A has at least Len characters
  procedure Char_Array_To_String( A   : in Interfaces.C.char_array;
                                  Len : in Natural;
                                  S   : out String);

  -- This copies first Len characters from S to the first Len positions in A
  -- NB: there are NO checks or memory allocations here!
  -- Caller has to make sure that:
  --   S'Length >= Len
  --   A has allocated space for at least Len characters
  procedure String_To_Char_Array( S   : in String;
                                  Len : in Natural;
                                  A   : out Interfaces.C.char_array);

Then, the wrappers themselves and the pragma export for them:

  -- wrapper of oaep_encrypt for direct use from C
  -- NB: caller HAS TO provide the length of the Message (parameter LenMsg)
  -- NB: caller HAS TO provide the length of the Entropy (parameter LenEnt)
  -- NB: caller HAS TO provide the allocated space for result (LenEncr)
  -- NB: LenEncr HAS TO be at least OAEP_LENGTH_OCTETS!
  -- NB: LenEnt HAS TO be at least OAEP_LENGTH_OCTETS or this will FAIL!
  procedure OAEP_Encrypt_C( Msg       : in Interfaces.C.char_array;
                            MsgLen    : in Interfaces.C.size_t;
                            Entropy   : in Interfaces.C.char_array;
                            EntLen    : in Interfaces.C.size_t;
                            Encr      : out Interfaces.C.char_array;
                            EncrLen   : in Interfaces.C.size_t;
                            Success   : out Interfaces.C.Int);
  pragma Export( C, OAEP_Encrypt_C, "oaep_encrypt_c" );

  -- wrapper for use from C
  procedure oaep_decrypt_c( Encr    : in Interfaces.C.Char_Array;
                            EncrLen : in Interfaces.C.Int;
                            Decr    : out Interfaces.C.Char_Array;
                            DecrLen : in out Interfaces.C.Int;
                            Success : out Interfaces.C.Int);
  pragma Export( C, oaep_decrypt_c, "oaep_decrypt_c");

The implementations for all the above:

  -- This copies first Len characters from A to the first Len positions in S
  -- NB: this does NOT allocate /check memory!
  -- Caller has to ensure that:
  --    S has space for at least Len characters
  --    A has at least Len characters
  procedure Char_Array_To_String( A   : in Interfaces.C.char_array;
                                  Len : in Natural;
                                  S   : out String) is
  begin
    for Index in 0 .. Len - 1 loop
      S( S'First + Index ) := Character( A( Interfaces.C.size_t( Index )));
    end loop;
  end Char_Array_To_String;

  -- This copies first Len characters from S to the first Len positions in A
  -- NB: there are NO checks or memory allocations here!
  -- Caller has to make sure that:
  --   S'Length >= Len
  --   A has allocated space for at least Len characters
  procedure String_To_Char_Array( S   : in String;
                                  Len : in Natural;
                                  A   : out Interfaces.C.char_array) is
    C : Character;
  begin
    for Index in 0 .. Len - 1 loop
      C := S( S'First + Index );
      A( Interfaces.C.size_t( Index )) := Interfaces.C.Char( C );
    end loop;
  end String_To_Char_Array;

  procedure OAEP_Encrypt_C( Msg       : in Interfaces.C.char_array;
                            MsgLen    : in Interfaces.C.size_t;
                            Entropy   : in Interfaces.C.char_array;
                            EntLen    : in Interfaces.C.size_t;
                            Encr      : out Interfaces.C.char_array;
                            EncrLen   : in Interfaces.C.size_t;
                            Success   : out Interfaces.C.Int) is
    AdaMsgLen  : Natural := Natural( MsgLen );
    AdaEntLen  : Natural := Natural( EntLen );
    AdaEncrLen : Natural := Natural( EncrLen );
    AdaMsg     : String( 1 .. AdaMsgLen );
    AdaEntBlock: OAEP_Block;
    AdaResult  : OAEP_Block := ( others => '0' );
  begin
    Success := 0;
    -- check there is enough entropy and enough output space, fail otherwise
    if AdaEntLen /= AdaEntBlock'Length or AdaEncrLen < AdaResult'Length then
      return;
    end if;
    -- translate to Ada
      --Interfaces.C.To_Ada( Msg, AdaMsg, AdaMsgLen );
    Char_Array_To_String( Msg, AdaMsgLen, AdaMsg );
      --Interfaces.C.To_Ada( Entropy, AdaEntropy, AdaEntLen );
    Char_Array_To_String( Entropy, AdaEntLen, AdaEntBlock );

    -- call the actual oaep encrypt
    OAEP_Encrypt( AdaMsg, AdaEntBlock, AdaResult );

    -- translate back to C, set success flag and return
       --Interfaces.C.To_C( AdaResult, CEncr, CEncrLen, False );
    -- EncrLen has already been tested to be at least AdaResult'Length,
    -- so copy exactly AdaResult'Length octets (S'Length >= Len must hold)
    String_To_Char_Array( AdaResult, AdaResult'Length, Encr );
    Success := 1;

  end OAEP_Encrypt_C;

  procedure oaep_decrypt_c( Encr    : in Interfaces.C.Char_Array;
                            EncrLen : in Interfaces.C.Int;
                            Decr    : out Interfaces.C.Char_Array;
                            DecrLen : in out Interfaces.C.Int;
                            Success : out Interfaces.C.Int) is
    AdaDecr    : OAEP_HALF := ( others => '0' );
    AdaEncr    : OAEP_Block:= ( others => '0' );
    AdaEncrLen : Natural := Natural( EncrLen );
    AdaDecrLen : Natural := 0;
    AdaFlag    : Boolean;
  begin
    -- check and set success flag/exit if needed
    Success := 0;
    if EncrLen /= OAEP_Block'Length then
      return;
    end if;

    -- translate to Ada: copy octet by octet as C.To_Ada is problematic
      -- Interfaces.C.To_Ada( Encr, AdaEncr, AdaEncrLen, False );
    Char_Array_To_String( Encr, AdaEncrLen, AdaEncr );

    -- actual decrypt
    OAEP_Decrypt( AdaEncr, AdaDecrLen, AdaDecr, AdaFlag );

    -- translate back to C
    AdaDecrLen := AdaDecrLen / 8;  -- from bits to octets
    if AdaFlag and
       Natural( DecrLen ) >= AdaDecrLen and
       AdaDecr'Length >= AdaDecrLen then
      Success := 1;
      DecrLen := Interfaces.C.Int( AdaDecrLen );
        -- Interfaces.C.To_C( AdaDecr, Decr, AdaDecrLen );
      String_To_Char_Array( AdaDecr, AdaDecrLen, Decr );
    end if;
  end oaep_decrypt_c;

In case you wonder why I am using this octet by octet copy thing instead of the procedures in Interfaces.C (To_Ada and To_C): I tried to use them and in some cases they still fail miserably, quite possibly because I don’t yet fully understand them and therefore I’m not using them properly. So if you have experience with them for the sort of task you see here, please chime in. In any case, for as long as I can’t trust them, I can’t use them here, so octet by octet copying it is, at least for now.
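
For illustration, here is a minimal sketch of how the C side might see the exported encrypt wrapper. The prototype follows the usual GNAT mapping (an "in char_array" arrives as a pointer to char, an "out Int" as a pointer to int), but it is a reading of the Ada declaration above rather than something copied from EuCrypt's C sources. Everything prefixed demo_ or DEMO_ is hypothetical: the 512 octets block size is an assumption and the stand-in merely mimics the length checks and the success flag (its XOR is NOT OAEP), so that the sketch compiles without the Ada library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed block size in octets, for illustration only. */
#define DEMO_BLOCK_OCTETS 512

/* How the exported Ada wrapper would plausibly be declared on the C side
 * (not called in this sketch, so no Ada library is needed to link). */
extern void oaep_encrypt_c(const char *msg, size_t msglen,
                           const char *entropy, size_t entlen,
                           char *encr, size_t encrlen,
                           int *success);

/* Hypothetical stand-in with the same shape: it performs the same length
 * checks as the wrapper but only XORs the message into the entropy,
 * which is NOT OAEP. */
void demo_oaep_encrypt_c(const char *msg, size_t msglen,
                         const char *entropy, size_t entlen,
                         char *encr, size_t encrlen,
                         int *success)
{
  size_t i;

  assert(msg != NULL && entropy != NULL && encr != NULL && success != NULL);
  *success = 0;
  /* fail unless there is exactly one block of entropy and enough output */
  if (entlen != DEMO_BLOCK_OCTETS || encrlen < DEMO_BLOCK_OCTETS)
    return;
  /* guard of the stand-in itself: the message has to fit in one block */
  if (msglen > DEMO_BLOCK_OCTETS)
    return;

  memcpy(encr, entropy, DEMO_BLOCK_OCTETS);
  for (i = 0; i < msglen; i++)
    encr[i] ^= msg[i];
  *success = 1;
}

/* Caller side: allocate everything in C, pass explicit lengths, then read
 * the int flag back as a truth value. */
int demo_encrypt(const char *msg, size_t msglen, char *out, size_t outlen)
{
  char entropy[DEMO_BLOCK_OCTETS];
  int success = 0;

  memset(entropy, 0x5A, sizeof entropy);  /* placeholder, NOT real entropy */
  demo_oaep_encrypt_c(msg, msglen, entropy, sizeof entropy,
                      out, outlen, &success);
  return success;
}
```

The point of the shape is that all memory is owned by the C caller and every length travels explicitly, so the Ada side never has to guess where a buffer ends; a too-small output buffer simply leaves the flag at 0.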

As seen before with the .gpr files, the compilation itself is quite straightforward. The call of Ada methods from C is also just a matter of a “pragma Export” naming “foo” on the Ada side and an “extern foo” declaration on the C side. The handy .gpr file otherwise takes care of the dependency introduced between the two EuCrypt components (since smg_rsa now uses smg_keccak) and we are all set. Time therefore to run the tests! More specifically, the *new* tests, in eucrypt/smg_rsa/tests/tests.c:

void test_oaep_encr_decr( int nruns ) {
  /* a set of RSA keys previously generated with eucrypt */
	RSA_public_key pk;
	pk.n = mpi_alloc(0);
	pk.e = mpi_alloc(0);

  RSA_secret_key sk;
  sk.n = mpi_alloc(0);
  sk.e = mpi_alloc(0);
  sk.d = mpi_alloc(0);
  sk.p = mpi_alloc(0);
  sk.q = mpi_alloc(0);
  sk.u = mpi_alloc(0);

  mpi_fromstr(sk.n, "0x
CD2C025323BEA46FFF2FA8D7A9D39817EA713421F4AE03FA8120641193892A70BFECF5
83101635A432110D3DDE6339E3CC7ECC0AD91C026FCACE832DD3888A6FCA7BCE56C390
5A5AC8C7BC921DA675E4B62489B254EB34659D547D71165BC998983A81937BD251AEE1
2D985EC387D5376F5DCC5EF7EC530FBD6FD2AA7285EE1AF3335EA73163F0954F30402E
D7B374EE84A97B1849B0674B0DA0A2050BD79B71ABB1559F3A9CFDB8557DED7BC90CF2
09E8A847E9C226140845B7D03842162E7DA5DD16326CB1F71A248D841FE9076A09911F
2F4F5E3EA44EA8DE40332BF00406990BCCF61C322A03C456EF3A98B341E0BDBC1088CE
683E78510E76B72C2BCC1EE9AEDD80FFF18ABFC5923B2F36B581C25114AB2DF9F6C2B1
9481703FD19E313DCD7ACE15FA11B27D25BCE5388C180A7E21167FB87750599E1ED7C7
50F4A844E1DC2270C62D19671CF8F4C25B81E366B09FC850AE642136D204A9160AEECE
575B57378AA439E9DD46DC990288CD54BAA35EEE1C02456CD39458A6F1CBF012DCEDF4
27CCF3F3F53645658FC49C9C9D7F2856DB571D92B967AB5845514E0054DDB49099F5DD
04A6F6F5C5CE642276834B932881AEB648D1F25E9223971F56E249EF40CF7D80F22621
CDD0260E9E7D23746960ADB52CF2987584FB1DE95A69A39E5CB12B76E0F5C1A0529C0C
065D2E35720810F7C7983180B9A9EA0E00C11B79DC3D");

  mpi_fromstr(sk.e, "0x
DD4856B4EE3D099A8604AE392D8EFEC094CDF01546A28BE87CB484F999E8E75CDFCD01
D04D455A6A9254C60BD28C0B03611FC3E751CC27EF768C0B401C4FD2B27C092834A6F2
49A145C4EDC47A3B3D363EC352462C945334D160AF9AA72202862912493AC6190AA3A6
149D4D8B9996BA7927D3D0D2AD00D30FD630CF464E6CAF9CF49355B9A70E05DB7AE915
F9F602772F8D11E5FCDFC7709210F248052615967090CC1F43D410C83724AA5912B2F0
52E6B39449A89A97C79C92DC8CB8DEEFCF248C1E1D2FC5BFE85165ECA31839CAA9CEB3
3A92EBDC0EB3BAC0F810938BB173C7DA21DCBB2220D44CBA0FD40A2C868FC93AC5243E
C137C27B0A76D65634EBB3");

  mpi_fromstr(sk.d, "0x
7C8A6FA1199D99DCA45E9BDF567CA49D02B237340D7E999150BC4883AE29DEC5158521
B338F35DC883792356BDDBB3C8B3030A6DD4C6522599A3254E751F9BA1CB1061C5633C
81BBFACF6FCD64502614102DFED3F3FA284066C342D5E00953B415915331E30812E5FB
CD6680ADCCDEE40B8376A3A225F2E160EA59C7566804526D73BB660A648A3EF9802313
B2F841E8458B2AAACE7AACF31083E8F3F630298138393BC88BBD7D4AA4334949651D25
365B10DBF4A4A08E20A6CC74BFDD37C1C38E2ADC2A283DF06590DF06B46F67F6ACA67F
AC464C795261659A2F9558802D0BBAA05FD1E1AF2CDC70654723DF7EFAEA148B8CDBEB
C89EA2320AB9BBB1BC4311475DF3D91446F02EF192368DFEBAC598CCFD4407DEC58FDC
1A94CCDD6E5FBA9C52164ACEA8AEE633E557BCCEACB7A1AF656C379482D784A120A725
32F9B2B35173D505F21D5AD4CB9511BC836DC923730B70291B70290A216CA3B21CFF79
E895C35F4F7AF80E1BD9ED2773BD26919A76E4298D169160593E0335BE2A2A2D2E8516
948F657E1B1260E18808A9D463C108535FB60B3B28F711C81E5DE24F40214134A53CE5
9A952C8970A1D771EBEFFA2F4359DCF157995B3F1950DE3C6EC41B7FF837148F55F323
372AF3F20CE8B8038E750C23D8F5041FA951327859B0E47483F0A47103EF808C72C251
006FA526245291C8C84C12D2EF63FB2301EA3EEDA42B");

  mpi_fromstr(sk.p, "0x
E236732452039C14EC1D3B8095BDDCFB7625CE27B1EA5394CF4ED09D3CEECAA4FC0BF6
2F7CE975E0C8929CE84B0259D773EA038396479BF15DA065BA70E549B248D77B4B23ED
A267308510DBEE2FD44E35D880EE7CFB81E0646AA8630165BD8988C3A8776D9E704C20
AA25CA0A3C32F27F592D5FD363B04DD57D8C61FFDCDFCCC59E2913DE0EE47769180340
E1EA5A803AA2301A010FF553A380F002601F0853FCACDB82D76FE2FACBCD6E5F294439
0799EA5AE9D7880D4E1D4AE146DC1D4E8495B9DD30E57E883923C5FC26682B7142D35C
D8A0FC561FE725A6CF419B15341F40FE0C31132CBD81DD8E50697BD1EBFFA16B522E16
F5B49A03B707218C7DA60B");

  mpi_fromstr(sk.q, "0x
E830482A3C4F5C3A7E59C10FF8BA760DB1C6D55880B796FFDA4A82E0B60E974E81D04B
2A4AD417823EBFB4E8EFB13782943562B19B6C4A680E3BA0C8E37B5023470F4F1AC1F8
A0B10672EF75CD58BCD45E6B14503B8A6A70AFE79F6201AF56E7364A1C742BE1453FD2
24FDC9D66522EAF4466A084BCB9E46D455A2946E94CBF028770F38D0B741C2CC59308F
71D8C2B4B9C928E0AE8D68DEB48A3E9EFD84A10301EBD55F8221CA32FC567B306B2A8E
116350AFB995859FDF4378C5CFD06901494E8CFA5D8FAC564D6531FA8A2E4761F5EFBA
F78750B6F4662BE9EA4C2FAD67AF73EEB36B41FC15CB678810C19A51DF23555695C4C1
546F3FACA39CAA7BB8DBD7");

  mpi_fromstr(sk.u, "0x
846232322775C1CD7D5569DC59E2F3E61A885AE2E9C4A4F8CB3ACBE8C3A5441E5FE348
A2A8AC9C2998FBF282222BF508AA1ECF66A76AEDD2D9C97028BFD3F6CA0542E38A5312
603C70B95650CE73F80FDD729988FBDB5595A5BF8A007EA34E54994A697906CE56354C
E00DF10EB711DEC274A62494E3D350D88736CF67A477FB600AC9F1D6580727585092BF
5EBC092CC4D6CF75769051033A1197103BE269942F372168A53771746FBA18ED6972D5
0B935A9B1D6B5B3DD50CD89A27FE93C10924E9103FACF7B4C5724A046C3D3B50CC1C78
5F5C8E00DBE1D6561F120F5294C170914BC10F978ED4356EED67A9F3A60D70AFE540FC
5373CBAE3D0A7FD1C87273");

  /* copy the public key components */
  pk.n = mpi_copy( sk.n );
  pk.e = mpi_copy( sk.e );

  /* some plain text message */
	MPI msg = mpi_alloc(0);
	mpi_fromstr(msg, "0x
5B6A8A0ACF4F4DB3F82EAC2D20255E4DF3E4B7C799603210766F26EF87C8980E737579
EC08E6505A51D19654C26D806BAF1B62F9C032E0B13D02AF99F7313BFCFD68DA46836E
CA529D7360948550F982C6476C054A97FD01635AB44BFBDBE2A90BE06F7984AC8534C3
28097EF92F6E78CAE0CB97");

  /* actual testing */
	printf("TEST verify oaep_encr_decr on message: \n");
	mpi_print( stdout, msg, 1);
	printf("\n");

  int nlimbs_n = mpi_nlimb_hint_from_nbytes( KEY_LENGTH_OCTETS);
	MPI encr = mpi_alloc( nlimbs_n );
	MPI decr = mpi_alloc( nlimbs_n );
  int success;

  adainit();
  rsa_oaep_encrypt( encr, msg, &pk );
  rsa_oaep_decrypt( decr, encr, &sk, &success );

  if (success <= 0 ||
      mpi_cmp(encr, msg) == 0 ||
      mpi_cmp(msg, decr) != 0)
    printf("FAILED: success flag is %d\n", success);
  else
    printf("PASSED\n");

  /* attempt to decrypt corrupted block */
  mpi_clear( decr );
  rsa_oaep_decrypt( decr, pk.n, &sk, &success);
  if (success > 0)
    printf("FAILED: attempt to decrypt non-/corrupted oaep block\n");
  else
    printf("PASSED: attempt to decrypt non-/corrupted oaep block\n");
  adafinal();

  /* clean up */
  mpi_free( sk.n );
  mpi_free( sk.e );
  mpi_free( sk.d );
  mpi_free( sk.p );
  mpi_free( sk.q );
  mpi_free( sk.u );

	mpi_free( pk.n );
	mpi_free( pk.e );

	mpi_free( msg );
	mpi_free( encr );
	mpi_free( decr );
}

void test_mpi_buffer() {
  unsigned int noctets = 10;
  int nlimbs = mpi_nlimb_hint_from_nbytes( noctets );
  MPI m = mpi_alloc( nlimbs );
  unsigned char *setbuffer = xmalloc( noctets );
  unsigned char *getbuffer;
  unsigned int i, sign, mpilen, nerrors;

  for (i=0; i< noctets; i++)
    setbuffer[i] = i;

  mpi_set_buffer( m, setbuffer, noctets, 0);

  getbuffer = mpi_get_buffer( m, &mpilen, &sign );

  if (mpilen == noctets -1 ) {
    nerrors = 0;
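    /* the leading 0 octet was trimmed, so compare with an offset of 1 */
    for (i=0;i<noctets-1;i++)
      if (setbuffer[i+1] != getbuffer[i])
        nerrors = nerrors + 1;
    if (nerrors>0)
      printf("FAIL: got %d different values!\n", nerrors);
    else printf("PASSED: mpi_get/set_buffer\n");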
  }

  mpi_free(m);
  xfree(setbuffer);
  xfree(getbuffer);
}

The test_oaep_encr_decr method uses a pair of TMSR RSA keys (previously generated by smg_rsa) to attempt oaep+rsa on a message, using rsa_oaep_encrypt and rsa_oaep_decrypt. There is also an attempt at decrypting a “corrupted” oaep block and this correctly fails with the success flag set accordingly.

The second test method in there is …bonus. It follows a further discovery of the unexpected in the mpi “code”: the mpi_set_buffer and mpi_get_buffer methods for an mpi are not exactly symmetrical. In other words, calling mpi_get_buffer for the same mpi on which you previously (immediately before!) called mpi_set_buffer with a given buffer b will NOT always return EXACTLY the same b! That’s because mpi_set_buffer will helpfully trim leading-0 octets from the buffer you pass, so if you pass number 009, it will store 9 only and therefore it will return…9. To reflect this, the test gives a warning as result – basically I’m flagging this for the future, not changing anything at the moment. This MPI implementation has already eaten an incredible amount of time with very little to show for it in return and I foresee that it will still eat even more time with similarly poor returns on it. Moreover, it has already gotten to the stage where I think it would probably have been better *not* to have an mpi implementation at all than to have this one. I can only add that I would certainly throw it away and implement a useful Ada library, if not for the fact that there really, really are many more pressing things to do right now. Sigh.
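
The asymmetry described above can be pictured with a small stand-alone C sketch. It does not call the mpi code at all: demo_trim_leading_zeros and demo_count_mismatches are hypothetical helpers that only model the described behaviour (leading 0 octets dropped on set, so fewer octets come back on get), under the assumption that nothing else about the buffer changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the trimming: returns how many octets remain once
 * leading 0 octets are dropped and points *start at the first kept octet. */
size_t demo_trim_leading_zeros(const unsigned char *buf, size_t len,
                               const unsigned char **start)
{
  size_t skip = 0;

  assert(buf != NULL && start != NULL);
  while (skip < len && buf[skip] == 0)
    skip++;
  *start = buf + skip;
  return len - skip;
}

/* Counts the octets that differ between what was set and what came back,
 * after aligning the two buffers on their last octet. Returns setlen + 1
 * if more octets came back than were set, which trimming cannot explain. */
size_t demo_count_mismatches(const unsigned char *set, size_t setlen,
                             const unsigned char *got, size_t gotlen)
{
  size_t offset, i, nerrors = 0;

  assert(set != NULL && got != NULL);
  if (gotlen > setlen)
    return setlen + 1;
  offset = setlen - gotlen;          /* number of octets that were trimmed */
  for (i = 0; i < gotlen; i++)
    if (set[offset + i] != got[i])
      nerrors++;
  return nerrors;
}
```

With the buffer 0, 1, ..., 9 used by the test, the model keeps 9 octets starting at the value 1, which is why a comparison has to shift by the number of trimmed octets rather than compare position by position.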

Getting back to happier facts, this chapter quite completes EuCrypt as the library contains now everything that it is currently meant to contain. Since the original Introduction, there has been a change in that the smg_comm component was taken out of EuCrypt: the reason for this removal is that smg_comm is very specific to Eulora and as such it is naturally a user of EuCrypt rather than part of it; by contrast, EuCrypt offers generic crypto routines (despite being made because Eulora needs it). The .vpatch and its signature can be found on my Reference Code Shelf and are also linked directly here for your convenience:

Any further chapters -if and when they might be- will deal with cosmetic changes or fixing of errors if any are found. At the moment there aren’t any further components /parts planned as part of EuCrypt itself. Give it a spin!

  1. I’m currently using Adacore’s 2016 version with gcc 4.9.4 as previously mentioned in the logs.
  2. Note that here as everywhere I really recommend the Adacore version as opposed to whatever your favourite linux distribution can find: the reason for this recommendation is pure and painful experience – while I think that one *can* get a working setup with non-Adacore GNAT and/or GPR and random-flavour gcc, it’s certainly not a straightforward task and there are so many things that can (and do!) go wrong (mismatching versions of various tools is the one I stumbled on repeatedly) that it’s not worth it, simply. So do yourself a favour and get the only ada-tron that actually… works – Adacore’s.
  3. Grrr, why can’t this be a Boolean as it should be!!!
  4. Yes, I know you can use unbounded strings instead but I won’t be using those unless I really, really have no choice. I suggest you do the same rather than writing “in Ada” while importing as much C-uncertainty as possible.

February 22, 2018

EuCrypt Chapter 11: Serpent

Filed under: EuCrypt — Diana Coman @ 1:22 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

This chapter adds to EuCrypt’s generous endowment in the bit and byte diddling 1 department: a byte-ready Serpent joins the bit-level Keccak and byte-level Keccak from previous chapters!

As I previously reviewed the functioning and testing of this implementation of Serpent, I won’t repeat that part here. I’ll go instead straight to the code, which is found in eucrypt/smg_serpent. There is first the smg_serpent.gpr file, for compiling (using gprbuild) 2 the smg_serpent as a static library by itself:

-- S.MG, 2018

project SMG_Serpent is
  for Languages use ("Ada");
  for Library_Name use "SMG_Serpent";
  for Library_Kind use "static";

  for Source_Dirs use ("src");
  for Object_Dir use "obj";
  for Library_Dir use "lib";

end SMG_Serpent;

As you might guess from the .gpr file above, the actual sources of Serpent itself are found in eucrypt/smg_serpent/src. The smg_serpent.ads is minimal, simply defining a few types that Serpent uses (Bytes as array of Unsigned_8 aka array of octets aka array of groups of 8 bits; Words as array of Unsigned_32; Block as an array of 16 octets; Key as an array of 32 octets and Key_Schedule as an array of 140 32-bit words indexed for Serpent’s purposes from -8 to 131), the three main procedures for Serpent use (making the key schedule out of a given Serpent key; encrypting a given plaintext with a given key schedule; decrypting a given encrypted text with a given key schedule) and one additional procedure that provides one single self-test of Serpent’s functioning:

-------------------------------------------------------------------------------
-- S.MG, 2018; with added automated tests
--
-- Serpent Blockcipher
--
-- Copyright (c) 1998 Markus G. Kuhn . All rights reserved.
--
-- $Id: serpent.ads,v 1.2 1998-06-10 14:22:16+00 mgk25 Exp $
--
-------------------------------------------------------------------------------
--
-- This is the Ada95 reference implementation of the Serpent cipher
-- submitted by Ross Anderson, Eli Biham and Lars Knudson in June 1998 to
-- the NIST Advanced Encryption Standard (AES) contest. Please note that
-- this is a revised algorithm that is not identical to the old version
-- presented at the 1998 Fast Software Encryption Workshop.
-- 
--
-- Compiled with GNAT 3.10p under Linux, this implementation encrypts and
-- decrypts with 20.8 Mbit/s on a 300 MHz Pentium II.
--
-------------------------------------------------------------------------------

with Interfaces; use Interfaces;

package SMG_Serpent is

  pragma Pure(SMG_Serpent);

  type Bytes is array (Integer range <>) of Unsigned_8;
  type Words is array (Integer range <>) of Unsigned_32;
  subtype Block is Bytes (0 .. 15);
  subtype Key   is Bytes (0 .. 31);
  subtype Key_Schedule is Words (-8 .. 131);

  procedure Prepare_Key (K : in Key; W : out Key_Schedule);

  procedure Encrypt (W : in Key_Schedule; Plaintext  :  in Block;
					   Ciphertext : out Block);

  procedure Decrypt (W : in Key_Schedule; Ciphertext :  in Block;
					   Plaintext  : out Block);

  procedure Selftest;

  Implementation_Error : exception;  -- raised if Selftest failed

end SMG_Serpent;

The actual implementation of the above is quite straightforward but relatively verbose due to loops being unrolled for faster execution. There is also handling of big endian / little endian iron through octet flipping, as required:

 -------------------------------------------------------------------------------
 --
 -- Serpent Blockcipher
 --
 -- Copyright (c) 1998 Markus G. Kuhn . All rights reserved.
 --
 -- Modified by S.MG, 2018
 --
 -------------------------------------------------------------------------------
 --
 -- This implementation is optimized for best execution time by use of
 -- function inlining and loop unrolling. It is not intended to be used in
 -- applications (such as smartcards) where machine code size matters. Best
 -- compiled with highest optimization level activated and all run-time
 -- checks suppressed.
 --
 -------------------------------------------------------------------------------

with System, Ada.Unchecked_Conversion;
use System;

package body SMG_Serpent is

  pragma Optimize( Time );

  -- Auxiliary functions for byte array to word array conversion with
  -- Bigendian/Littleendian handling.
  --
  -- The convention followed here is that the input byte array
  --
  --   00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
  --
  -- is converted into the register values
  --
  --   X0 = 03020100,  X1 = 07060504,  X2 = 0b0a0908,  X3 = 0f0e0d0c

  subtype Bytes_4 is Bytes (0 .. 3);
  function Cast is new Ada.Unchecked_Conversion (Bytes_4, Unsigned_32);
  function Cast is new Ada.Unchecked_Conversion (Unsigned_32, Bytes_4);

  function Bytes_To_Word (X : Bytes_4) return Unsigned_32 is
  begin
    if Default_Bit_Order = Low_Order_First then
      -- we have a Littleendian processor
      return Cast(X);
    else
      -- word sex change
      return Cast((X(3), X(2), X(1), X(0)));
    end if;
  end Bytes_To_Word;

  function Word_To_Bytes (X : Unsigned_32) return Bytes_4 is
  begin
    if Default_Bit_Order = Low_Order_First then
      -- we have a Littleendian processor
      return Cast(X);
    else
      -- word sex change
      return (Cast(X)(3), Cast(X)(2), Cast(X)(1), Cast(X)(0));
    end if;
  end Word_To_Bytes;

  pragma Inline(Bytes_To_Word, Word_To_Bytes);
  -- inline functions for the Encryption and Decryption procedures

  -- Sbox function
  procedure S (R : Integer; X0, X1, X2, X3 : in out Unsigned_32) is
      T01, T02, T03, T04, T05, T06, T07, T08, T09,
      T10, T11, T12, T13, T14, T15, T16, T17, T18 : Unsigned_32;
      W, X, Y, Z : Unsigned_32;
  begin
    if R = 0 then
      -- S0:   3  8 15  1 10  6  5 11 14 13  4  2  7  0  9 12
      -- depth = 5,7,4,2, Total gates=18
      T01 := X1  xor X2;
      T02 := X0  or X3;
      T03 := X0  xor X1;
      Z   := T02 xor T01;
      T05 := X2  or z;
      T06 := X0  xor X3;
      T07 := X1  or X2;
      T08 := X3  and T05;
      T09 := T03 and T07;
      Y   := T09 xor T08;
      T11 := T09 and y;
      T12 := X2  xor X3;
      T13 := T07 xor T11;
      T14 := X1  and T06;
      T15 := T06 xor T13;
      W   :=     not T15;
      T17 := W   xor T14;
      X   := T12 xor T17;
    elsif R = 1 then
      -- S1:  15 12  2  7  9  0  5 10  1 11 14  8  6 13  3  4
      -- depth = 10,7,3,5, Total gates=18
      T01 := X0  or X3;
      T02 := X2  xor X3;
      T03 :=     not X1;
      T04 := X0  xor X2;
      T05 := X0  or T03;
      T06 := X3  and T04;
      T07 := T01 and T02;
      T08 := X1  or T06;
      Y   := T02 xor T05;
      T10 := T07 xor T08;
      T11 := T01 xor T10;
      T12 := Y   xor T11;
      T13 := X1  and X3;
      Z   :=     not T10;
      X   := T13 xor T12;
      T16 := T10 or x;
      T17 := T05 and T16;
      W   := X2  xor T17;
    elsif R = 2 then
      -- S2:   8  6  7  9  3 12 10 15 13  1 14  4  0 11  5  2
      -- depth = 3,8,11,7, Total gates=16
      T01 := X0  or X2;
      T02 := X0  xor X1;
      T03 := X3  xor T01;
      W   := T02 xor T03;
      T05 := X2  xor w;
      T06 := X1  xor T05;
      T07 := X1  or T05;
      T08 := T01 and T06;
      T09 := T03 xor T07;
      T10 := T02 or T09;
      X   := T10 xor T08;
      T12 := X0  or X3;
      T13 := T09 xor x;
      T14 := X1  xor T13;
      Z   :=     not T09;
      Y   := T12 xor T14;
    elsif R = 3 then
      -- S3:   0 15 11  8 12  9  6  3 13  1  2  4 10  7  5 14
      -- depth = 8,3,5,5, Total gates=18
      T01 := X0  xor X2;
      T02 := X0  or X3;
      T03 := X0  and X3;
      T04 := T01 and T02;
      T05 := X1  or T03;
      T06 := X0  and X1;
      T07 := X3  xor T04;
      T08 := X2  or T06;
      T09 := X1  xor T07;
      T10 := X3  and T05;
      T11 := T02 xor T10;
      Z   := T08 xor T09;
      T13 := X3  or z;
      T14 := X0  or T07;
      T15 := X1  and T13;
      Y   := T08 xor T11;
      W   := T14 xor T15;
      X   := T05 xor T04;
    elsif R = 4 then
      -- S4:   1 15  8  3 12  0 11  6  2  5  4 10  9 14  7 13
      -- depth = 6,7,5,3, Total gates=19
      T01 := X0  or X1;
      T02 := X1  or X2;
      T03 := X0  xor T02;
      T04 := X1  xor X3;
      T05 := X3  or T03;
      T06 := X3  and T01;
      Z   := T03 xor T06;
      T08 := Z   and T04;
      T09 := T04 and T05;
      T10 := X2  xor T06;
      T11 := X1  and X2;
      T12 := T04 xor T08;
      T13 := T11 or T03;
      T14 := T10 xor T09;
      T15 := X0  and T05;
      T16 := T11 or T12;
      Y   := T13 xor T08;
      X   := T15 xor T16;
      W   :=     not T14;
    elsif R = 5 then
      -- S5:  15  5  2 11  4 10  9 12  0  3 14  8 13  6  7  1
      -- depth = 4,6,8,6, Total gates=17
      T01 := X1  xor X3;
      T02 := X1  or X3;
      T03 := X0  and T01;
      T04 := X2  xor T02;
      T05 := T03 xor T04;
      W   :=     not T05;
      T07 := X0  xor T01;
      T08 := X3  or w;
      T09 := X1  or T05;
      T10 := X3  xor T08;
      T11 := X1  or T07;
      T12 := T03 or w;
      T13 := T07 or T10;
      T14 := T01 xor T11;
      Y   := T09 xor T13;
      X   := T07 xor T08;
      Z   := T12 xor T14;
    elsif R = 6 then
      -- S6:   7  2 12  5  8  4  6 11 14  9  1 15 13  3 10  0
      -- depth = 8,3,6,3, Total gates=19
      T01 := X0  and X3;
      T02 := X1  xor X2;
      T03 := X0  xor X3;
      T04 := T01 xor T02;
      T05 := X1  or X2;
      X   :=     not T04;
      T07 := T03 and T05;
      T08 := X1  and x;
      T09 := X0  or X2;
      T10 := T07 xor T08;
      T11 := X1  or X3;
      T12 := X2  xor T11;
      T13 := T09 xor T10;
      Y   :=     not T13;
      T15 := X   and T03;
      Z   := T12 xor T07;
      T17 := X0  xor X1;
      T18 := Y   xor T15;
      W   := T17 xor T18;
    elsif R = 7 then
      -- S7:   1 13 15  0 14  8  2 11  7  4 12 10  9  3  5  6
      -- depth = 10,7,10,4, Total gates=19
      T01 := X0  and X2;
      T02 :=     not X3;
      T03 := X0  and T02;
      T04 := X1  or T01;
      T05 := X0  and X1;
      T06 := X2  xor T04;
      Z   := T03 xor T06;
      T08 := X2  or z;
      T09 := X3  or T05;
      T10 := X0  xor T08;
      T11 := T04 and z;
      X   := T09 xor T10;
      T13 := X1  xor x;
      T14 := T01 xor x;
      T15 := X2  xor T05;
      T16 := T11 or T13;
      T17 := T02 or T14;
      W   := T15 xor T17;
      Y   := X0  xor T16;
    end if;
    X0 := W;
    X1 := X;
    X2 := Y;
    X3 := Z;
  end S;


  -- Inverse Sbox function

  procedure SI (R : Integer; X0, X1, X2, X3 : in out Unsigned_32) is
      T01, T02, T03, T04, T05, T06, T07, T08, T09,
      T10, T11, T12, T13, T14, T15, T16, T17, T18 : Unsigned_32;
      W, X, Y, Z : Unsigned_32;
  begin
    if R = 0 then
      -- InvS0:  13  3 11  0 10  6  5 12  1 14  4  7 15  9  8  2
      -- depth = 8,4,3,6, Total gates=19
      T01 := X2  xor X3;
      T02 := X0  or X1;
      T03 := X1  or X2;
      T04 := X2  and T01;
      T05 := T02 xor T01;
      T06 := X0  or T04;
      Y   :=     not T05;
      T08 := X1  xor X3;
      T09 := T03 and T08;
      T10 := X3  or y;
      X   := T09 xor T06;
      T12 := X0  or T05;
      T13 := X   xor T12;
      T14 := T03 xor T10;
      T15 := X0  xor X2;
      Z   := T14 xor T13;
      T17 := T05 and T13;
      T18 := T14 or T17;
      W   := T15 xor T18;
    elsif R = 1 then
      -- InvS1:   5  8  2 14 15  6 12  3 11  4  7  9  1 13 10  0
      -- depth = 7,4,5,3, Total gates=18
      T01 := X0  xor X1;
      T02 := X1  or X3;
      T03 := X0  and X2;
      T04 := X2  xor T02;
      T05 := X0  or T04;
      T06 := T01 and T05;
      T07 := X3  or T03;
      T08 := X1  xor T06;
      T09 := T07 xor T06;
      T10 := T04 or T03;
      T11 := X3  and T08;
      Y   :=     not T09;
      X   := T10 xor T11;
      T14 := X0  or y;
      T15 := T06 xor x;
      Z   := T01 xor T04;
      T17 := X2  xor T15;
      W   := T14 xor T17;
    elsif R = 2 then
      -- InvS2:  12  9 15  4 11 14  1  2  0  3  6 13  5  8 10  7
      -- depth = 3,6,8,3, Total gates=18
      T01 := X0  xor X3;
      T02 := X2  xor X3;
      T03 := X0  and X2;
      T04 := X1  or T02;
      W   := T01 xor T04;
      T06 := X0  or X2;
      T07 := X3  or w;
      T08 :=     not X3;
      T09 := X1  and T06;
      T10 := T08 or T03;
      T11 := X1  and T07;
      T12 := T06 and T02;
      Z   := T09 xor T10;
      X   := T12 xor T11;
      T15 := X2  and z;
      T16 := W   xor x;
      T17 := T10 xor T15;
      Y   := T16 xor T17;
    elsif R = 3 then
      -- InvS3:   0  9 10  7 11 14  6 13  3  5 12  2  4  8 15  1
      -- depth = 3,6,4,4, Total gates=17
      T01 := X2  or X3;
      T02 := X0  or X3;
      T03 := X2  xor T02;
      T04 := X1  xor T02;
      T05 := X0  xor X3;
      T06 := T04 and T03;
      T07 := X1  and T01;
      Y   := T05 xor T06;
      T09 := X0  xor T03;
      W   := T07 xor T03;
      T11 := W   or T05;
      T12 := T09 and T11;
      T13 := X0  and y;
      T14 := T01 xor T05;
      X   := X1  xor T12;
      T16 := X1  or T13;
      Z   := T14 xor T16;
    elsif R = 4 then
      -- InvS4:   5  0  8  3 10  9  7 14  2 12 11  6  4 15 13  1
      -- depth = 6,4,7,3, Total gates=17
      T01 := X1  or X3;
      T02 := X2  or X3;
      T03 := X0  and T01;
      T04 := X1  xor T02;
      T05 := X2  xor X3;
      T06 :=     not T03;
      T07 := X0  and T04;
      X   := T05 xor T07;
      T09 := X   or T06;
      T10 := X0  xor T07;
      T11 := T01 xor T09;
      T12 := X3  xor T04;
      T13 := X2  or T10;
      Z   := T03 xor T12;
      T15 := X0  xor T04;
      Y   := T11 xor T13;
      W   := T15 xor T09;
    elsif R = 5 then
      -- InvS5:   8 15  2  9  4  1 13 14 11  6  5  3  7 12 10  0
      -- depth = 4,6,9,7, Total gates=17
      T01 := X0  and X3;
      T02 := X2  xor T01;
      T03 := X0  xor X3;
      T04 := X1  and T02;
      T05 := X0  and X2;
      W   := T03 xor T04;
      T07 := X0  and w;
      T08 := T01 xor w;
      T09 := X1  or T05;
      T10 :=     not X1;
      X   := T08 xor T09;
      T12 := T10 or T07;
      T13 := W   or x;
      Z   := T02 xor T12;
      T15 := T02 xor T13;
      T16 := X1  xor X3;
      Y   := T16 xor T15;
    elsif R = 6 then
      -- InvS6:  15 10  1 13  5  3  6  0  4  9 14  7  2 12  8 11
      -- depth = 5,3,8,6, Total gates=19
      T01 := X0  xor X2;
      T02 :=     not X2;
      T03 := X1  and T01;
      T04 := X1  or T02;
      T05 := X3  or T03;
      T06 := X1  xor X3;
      T07 := X0  and T04;
      T08 := X0  or T02;
      T09 := T07 xor T05;
      X   := T06 xor T08;
      W   :=     not T09;
      T12 := X1  and w;
      T13 := T01 and T05;
      T14 := T01 xor T12;
      T15 := T07 xor T13;
      T16 := X3  or T02;
      T17 := X0  xor x;
      Z   := T17 xor T15;
      Y   := T16 xor T14;
    elsif R = 7 then
      -- InvS7:   3  0  6 13  9 14 15  8  5 12 11  7 10  1  4  2
      -- depth = 9,7,3,3, Total gates=18
      T01 := X0  and X1;
      T02 := X0  or X1;
      T03 := X2  or T01;
      T04 := X3  and T02;
      Z   := T03 xor T04;
      T06 := X1  xor T04;
      T07 := X3  xor z;
      T08 :=     not T07;
      T09 := T06 or T08;
      T10 := X1  xor X3;
      T11 := X0  or X3;
      X   := X0  xor T09;
      T13 := X2  xor T06;
      T14 := X2  and T11;
      T15 := X3  or x;
      T16 := T01 or T10;
      W   := T13 xor T15;
      Y   := T14 xor T16;
    end if;
    X0 := W;
    X1 := X;
    X2 := Y;
    X3 := Z;
  end SI;


  -- Linear Transform

  procedure Tr (X0, X1, X2, X3 : in out Unsigned_32) is
  begin
    X0 := Rotate_Left(X0, 13);
    X2 := Rotate_Left(X2, 3);
    X1 := X1 xor X0 xor X2;
    X3 := X3 xor X2 xor Shift_Left(X0, 3);
    X1 := Rotate_Left(X1, 1);
    X3 := Rotate_Left(X3, 7);
    X0 := X0 xor X1 xor X3;
    X2 := X2 xor X3 xor Shift_Left(X1, 7);
    X0 := Rotate_Left(X0, 5);
    X2 := Rotate_Left(X2, 22);
  end Tr;


  -- Inverse Linear Transform

  procedure TrI (X0, X1, X2, X3 : in out Unsigned_32) is
  begin
    X2 := Rotate_Right(X2, 22);
    X0 := Rotate_Right(X0, 5);
    X2 := X2 xor X3 xor Shift_Left(X1, 7);
    X0 := X0 xor X1 xor X3;
    X3 := Rotate_Right(X3, 7);
    X1 := Rotate_Right(X1, 1);
    X3 := X3 xor X2 xor Shift_Left(X0, 3);
    X1 := X1 xor X0 xor X2;
    X2 := Rotate_Right(X2, 3);
    X0 := Rotate_Right(X0, 13);
  end TrI;


  procedure Keying (W : Key_Schedule;
                    R : Integer;
       X0, X1, X2, X3 : in out Unsigned_32) is
  begin
    X0 := X0 xor W(4*R);
    X1 := X1 xor W(4*R+1);
    X2 := X2 xor W(4*R+2);
    X3 := X3 xor W(4*R+3);
  end Keying;


  pragma Inline(S, SI, Tr, TrI, Keying);


  procedure Prepare_Key (K : in Key; W : out Key_Schedule) is
  begin
    for I in 0..7 loop
      W(-8+I) := Bytes_To_Word(K(4*I .. 4*I+3));
    end loop;
    for I in 0..131 loop
      W(I) := Rotate_Left(W(I-8) xor W(I-5) xor W(I-3) xor W(I-1) xor
              16#9e3779b9# xor Unsigned_32(I), 11);
    end loop;
    S(3, W(  0), W(  1), W(  2), W(  3));
    S(2, W(  4), W(  5), W(  6), W(  7));
    S(1, W(  8), W(  9), W( 10), W( 11));
    S(0, W( 12), W( 13), W( 14), W( 15));
    S(7, W( 16), W( 17), W( 18), W( 19));
    S(6, W( 20), W( 21), W( 22), W( 23));
    S(5, W( 24), W( 25), W( 26), W( 27));
    S(4, W( 28), W( 29), W( 30), W( 31));
    S(3, W( 32), W( 33), W( 34), W( 35));
    S(2, W( 36), W( 37), W( 38), W( 39));
    S(1, W( 40), W( 41), W( 42), W( 43));
    S(0, W( 44), W( 45), W( 46), W( 47));
    S(7, W( 48), W( 49), W( 50), W( 51));
    S(6, W( 52), W( 53), W( 54), W( 55));
    S(5, W( 56), W( 57), W( 58), W( 59));
    S(4, W( 60), W( 61), W( 62), W( 63));
    S(3, W( 64), W( 65), W( 66), W( 67));
    S(2, W( 68), W( 69), W( 70), W( 71));
    S(1, W( 72), W( 73), W( 74), W( 75));
    S(0, W( 76), W( 77), W( 78), W( 79));
    S(7, W( 80), W( 81), W( 82), W( 83));
    S(6, W( 84), W( 85), W( 86), W( 87));
    S(5, W( 88), W( 89), W( 90), W( 91));
    S(4, W( 92), W( 93), W( 94), W( 95));
    S(3, W( 96), W( 97), W( 98), W( 99));
    S(2, W(100), W(101), W(102), W(103));
    S(1, W(104), W(105), W(106), W(107));
    S(0, W(108), W(109), W(110), W(111));
    S(7, W(112), W(113), W(114), W(115));
    S(6, W(116), W(117), W(118), W(119));
    S(5, W(120), W(121), W(122), W(123));
    S(4, W(124), W(125), W(126), W(127));
    S(3, W(128), W(129), W(130), W(131));
  end Prepare_Key;


  procedure Encrypt (W : in Key_Schedule; Plaintext  :  in Block;
            Ciphertext : out Block) is
    X0, X1, X2, X3 : Unsigned_32;
  begin
    X0 := Bytes_To_Word(Plaintext( 0 ..  3));
    X1 := Bytes_To_Word(Plaintext( 4 ..  7));
    X2 := Bytes_To_Word(Plaintext( 8 .. 11));
    X3 := Bytes_To_Word(Plaintext(12 .. 15));

    Keying(W,  0, X0, X1, X2, X3); S(0, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  1, X0, X1, X2, X3); S(1, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  2, X0, X1, X2, X3); S(2, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  3, X0, X1, X2, X3); S(3, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  4, X0, X1, X2, X3); S(4, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  5, X0, X1, X2, X3); S(5, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  6, X0, X1, X2, X3); S(6, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  7, X0, X1, X2, X3); S(7, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  8, X0, X1, X2, X3); S(0, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W,  9, X0, X1, X2, X3); S(1, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 10, X0, X1, X2, X3); S(2, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 11, X0, X1, X2, X3); S(3, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 12, X0, X1, X2, X3); S(4, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 13, X0, X1, X2, X3); S(5, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 14, X0, X1, X2, X3); S(6, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 15, X0, X1, X2, X3); S(7, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 16, X0, X1, X2, X3); S(0, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 17, X0, X1, X2, X3); S(1, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 18, X0, X1, X2, X3); S(2, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 19, X0, X1, X2, X3); S(3, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 20, X0, X1, X2, X3); S(4, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 21, X0, X1, X2, X3); S(5, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 22, X0, X1, X2, X3); S(6, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 23, X0, X1, X2, X3); S(7, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 24, X0, X1, X2, X3); S(0, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 25, X0, X1, X2, X3); S(1, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 26, X0, X1, X2, X3); S(2, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 27, X0, X1, X2, X3); S(3, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 28, X0, X1, X2, X3); S(4, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 29, X0, X1, X2, X3); S(5, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 30, X0, X1, X2, X3); S(6, X0, X1, X2, X3); Tr(X0, X1, X2, X3);
    Keying(W, 31, X0, X1, X2, X3);
    S(7, X0, X1, X2, X3);
    Keying(W, 32, X0, X1, X2, X3);

    Ciphertext( 0 ..  3) := Word_To_Bytes(X0);
    Ciphertext( 4 ..  7) := Word_To_Bytes(X1);
    Ciphertext( 8 .. 11) := Word_To_Bytes(X2);
    Ciphertext(12 .. 15) := Word_To_Bytes(X3);
  end Encrypt;


  procedure Decrypt (W : in Key_Schedule; Ciphertext :  in Block;
            Plaintext  : out Block) is
    X0, X1, X2, X3 : Unsigned_32;
  begin
    X0 := Bytes_To_Word(Ciphertext( 0 ..  3));
    X1 := Bytes_To_Word(Ciphertext( 4 ..  7));
    X2 := Bytes_To_Word(Ciphertext( 8 .. 11));
    X3 := Bytes_To_Word(Ciphertext(12 .. 15));

    Keying(W, 32, X0, X1, X2, X3);
    SI(7, X0, X1, X2, X3);
    Keying(W, 31, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(6, X0, X1, X2, X3); Keying(W,30, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(5, X0, X1, X2, X3); Keying(W,29, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(4, X0, X1, X2, X3); Keying(W,28, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(3, X0, X1, X2, X3); Keying(W,27, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(2, X0, X1, X2, X3); Keying(W,26, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(1, X0, X1, X2, X3); Keying(W,25, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(0, X0, X1, X2, X3); Keying(W,24, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(7, X0, X1, X2, X3); Keying(W,23, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(6, X0, X1, X2, X3); Keying(W,22, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(5, X0, X1, X2, X3); Keying(W,21, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(4, X0, X1, X2, X3); Keying(W,20, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(3, X0, X1, X2, X3); Keying(W,19, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(2, X0, X1, X2, X3); Keying(W,18, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(1, X0, X1, X2, X3); Keying(W,17, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(0, X0, X1, X2, X3); Keying(W,16, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(7, X0, X1, X2, X3); Keying(W,15, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(6, X0, X1, X2, X3); Keying(W,14, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(5, X0, X1, X2, X3); Keying(W,13, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(4, X0, X1, X2, X3); Keying(W,12, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(3, X0, X1, X2, X3); Keying(W,11, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(2, X0, X1, X2, X3); Keying(W,10, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(1, X0, X1, X2, X3); Keying(W, 9, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(0, X0, X1, X2, X3); Keying(W, 8, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(7, X0, X1, X2, X3); Keying(W, 7, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(6, X0, X1, X2, X3); Keying(W, 6, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(5, X0, X1, X2, X3); Keying(W, 5, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(4, X0, X1, X2, X3); Keying(W, 4, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(3, X0, X1, X2, X3); Keying(W, 3, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(2, X0, X1, X2, X3); Keying(W, 2, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(1, X0, X1, X2, X3); Keying(W, 1, X0, X1, X2, X3);
    TrI(X0, X1, X2, X3); SI(0, X0, X1, X2, X3); Keying(W, 0, X0, X1, X2, X3);

    Plaintext( 0 ..  3) := Word_To_Bytes(X0);
    Plaintext( 4 ..  7) := Word_To_Bytes(X1);
    Plaintext( 8 .. 11) := Word_To_Bytes(X2);
    Plaintext(12 .. 15) := Word_To_Bytes(X3);
  end Decrypt;


  procedure Selftest is
    K     : Key    := (others => 0);
    P     : Block  := (others => 0);
    P2, C : Block;
    W     : Key_Schedule;
  begin
    for I in 1 .. 128 loop
      Prepare_Key(K, W);
      Encrypt(W, P, C);
      Decrypt(W, C, P2);
      if (P2 /= P) then
        raise Implementation_Error;
      end if;
      P           := K( 0 .. 15);
      K( 0 .. 15) := K(16 .. 31);
      K(16 .. 31) := C;
    end loop;
    if C /= (16#A2#, 16#46#, 16#AB#, 16#69#, 16#0A#, 16#E6#, 16#8D#, 16#FB#,
             16#02#, 16#04#, 16#CB#, 16#E2#, 16#8E#, 16#D8#, 16#EB#, 16#7A#)
      then
      raise Implementation_Error;
    end if;
  end Selftest;

end SMG_Serpent;

The Selftest procedure above runs one single test for Serpent encrypt/decrypt, failing with an “Implementation_Error” if the result is not as expected. I added a bigger set of tests on all the test vectors and cases publicly available as part of the NESSIE project. The eucrypt/smg_serpent/tests/smg_serpent_tests.gpr can be used with gprbuild to compile the code for the automated tests:

 -- Tests for SMG_Serpent (part of EuCrypt)
 -- S.MG, 2018

project SMG_Serpent_Tests is
  for Source_Dirs use (".", "../src");
  for Object_Dir use "obj";
  for Exec_Dir use ".";

  for Main use ("testall.adb");
end SMG_Serpent_Tests;

The file eucrypt/smg_serpent/tests/test_serpent.ads defines a package Test_Serpent that includes a procedure for running a single hard-coded test (similar to Selftest in Serpent’s own code) and another procedure for running all tests on all test vectors and cases from a given file:

-- S.MG, 2018

with Ada.Strings.Fixed;
use Ada.Strings.Fixed;

package Test_Serpent is
	procedure test_from_file(filename: String);
	procedure test_one;
end Test_Serpent;

The implementation of those two testing procedures is in eucrypt/smg_serpent/tests/test_serpent.adb, where a lot of space is taken mainly by reading and interpreting the plain-text NESSIE format of the test vectors:

 -- S.MG, 2018
 -- Testing of Serpent implementation using Nessie-format test vectors

with SMG_Serpent; use SMG_Serpent;

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Command_Line; use Ada.Command_Line; -- set exit status on fail
with Interfaces; use Interfaces; -- unsigned_8

package body Test_Serpent is
  Test_Fail : exception;  -- raised if a test fails

  procedure test_from_file (filename: String) is
    file     : FILE_TYPE;
    keylen   : constant := 256;
    blocklen : constant := 128;
    octets   : constant := 16;
    K        : Key;
    P, P2    : Block;	--plain text
    C, C2    : Block;	--cipher (encrypted) text
    times100 : Block;	--value after 100 iterations
    times1k  : Block;	--value after 1000 iterations
    W        : Key_Schedule;
    Test_No  : Positive := 1;
  begin
    begin
      open(file, In_File, filename);
      exception
      when others =>
        Put_Line(Standard_Error, "Can not open the file '" & filename &
                                 "'. Does it exist?");
        Set_Exit_Status(Failure);
        return;
    end;

    loop
      declare
        Line1      : String := Get_Line(file);
        Line2      : String := Line1;
        key1, key2 : String( 1..octets*2 );
        len        : Natural := 0;
      begin
        --check if this is test data of any known kind
        if index( Line1, "key=", 1 ) > 0 then
          Line2 := Get_Line( file );
          key1  := Tail( Line1, octets*2  );
          key2  := Tail( Line2, octets*2 );
          for jj in 1..octets loop
            K(jj-1) := Unsigned_8'Value("16#" &
                                        key1( ( jj - 1 ) * 2 + 1 .. jj * 2 ) &
                                        "#");
            K( jj + octets - 1 ) :=
                       Unsigned_8'Value("16#" &
                                        key2( ( jj - 1 ) * 2 + 1 .. jj * 2 ) &
                                        "#");
          end loop;

        elsif index( Line1, "plain=", 1 ) > 0 then
          key1 := Tail( Line1, octets * 2 );
          for jj in 1..octets loop
            P(jj-1) := Unsigned_8'Value("16#" &
                                        key1( ( jj - 1 ) * 2 + 1 .. jj * 2 ) &
                                        "#");
          end loop;
        elsif index( Line1, "cipher=", 1 ) > 0 then
          key1 := Tail( Line1, octets * 2 );
          for jj in 1..octets loop
            C(jj-1) := Unsigned_8'Value("16#" &
                                        key1( ( jj - 1 ) * 2 + 1 .. jj * 2) &
                                        "#");
          end loop;
        elsif index( Line1, "100 times=", 1 ) > 0 then
          key1 := Tail( Line1, octets * 2 );
          for jj in 1..octets loop
            times100(jj-1) :=
                       Unsigned_8'Value("16#" &
                                        key1( ( jj - 1 ) * 2 + 1 .. jj * 2 ) &
                                        "#");
          end loop;
        elsif index( Line1, "1000 times=", 1 ) > 0 then
          key1 := Tail( Line1, octets * 2 );
          for jj in 1..octets loop
            times1k(jj-1) :=
                       Unsigned_8'value("16#" &
                                        key1( ( jj - 1 ) * 2 + 1 .. jj * 2 ) &
                                        "#");
          end loop;
          --at this stage we should have ALL needed, so run test
          Put("-----Test " & Positive'Image(Test_No) & ": encryption...");
          Prepare_Key(K, W);
          Encrypt(W, P, C2);
          if C2 /= C then
            raise Test_Fail;
          else
            Put_Line("Passed-----");
          end if;
          Put("-----Test " & Positive'Image(Test_No) & ": decryption...");
          Decrypt(W, C2, P2);
          if P /= P2 then
            raise Test_Fail;
          else
            Put_Line("Passed-----");
          end if;

          Put("-----Test " & Positive'Image(Test_No) & ": 100 iterations...");
          for jj in 1 .. 100 loop
            Encrypt(W, P, C2);
            Decrypt(W, C2, P2);
            if (P2 /= P) then
              raise Test_Fail;
            end if;
            P := C2;
          end loop;
          Put_Line("Passed-----");

          Put("-----Test " & Positive'Image(Test_No) & ": 1000 iterations...");

          -- 900 further rounds on top of the 100 above give 1000 in total
          for jj in 1 .. 900 loop
            Encrypt(W, P, C2);
            Decrypt(W, C2, P2);
            if (P2 /= P) then
              raise Test_Fail;
            end if;
            P := C2;
          end loop;
          Put_Line("Passed-----");
          Test_No := Test_No + 1;
        end if;
        exit when End_Of_File(file);
      end;
    end loop;
    Close(file);
  end test_from_file;

  procedure test_one is
    K: Key;
    P, P2: Block;
    C: Block;
    W: Key_Schedule;
  begin
    K := (16#80#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#,
          16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#,
          16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#,
          16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#);

    P := (16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#,
          16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#);

    SMG_Serpent.Prepare_Key(K, W);
    Encrypt(W, P, C);
    if C /= (16#A2#, 16#23#, 16#AA#, 16#12#, 16#88#, 16#46#, 16#3C#, 16#0E#,
             16#2B#, 16#E3#, 16#8E#, 16#BD#, 16#82#, 16#56#, 16#16#, 16#C0#)
      then
      raise Test_Fail;
    end if;

    for I in 1 .. 100 loop
      Encrypt(W, P, C);
      Decrypt(W, C, P2);
      if (P2 /= P) then
        raise Test_Fail;
      end if;
      P := C;
    end loop;

    if C /= (16#73#, 16#9E#, 16#01#, 16#48#, 16#97#, 16#1F#, 16#D9#, 16#75#,
             16#B5#, 16#85#, 16#EA#, 16#FD#, 16#BD#, 16#65#, 16#9E#, 16#2C#)
      then
      raise Test_Fail;
    end if;

    for I in 1 .. 900 loop
      Encrypt(W, P, C);
      Decrypt(W, C, P2);
      if (P2 /= P) then
        raise Test_Fail;
      end if;
      P := C;
    end loop;

    if C /= (16#BE#, 16#FD#, 16#00#, 16#E0#, 16#D6#, 16#E2#, 16#7E#, 16#56#,
             16#95#, 16#1D#, 16#C6#, 16#61#, 16#44#, 16#40#, 16#D2#, 16#86#)
      then
      raise Test_Fail;
    else
      Put_Line("PASSED: test single case.");
    end if;

  end test_one;

end Test_Serpent;
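
The hex parsing that takes up most of test_from_file can be tried in isolation. What follows is only a minimal sketch: the procedure name Hex_Parse_Sketch and the sample string (the first four octets of the expected ciphertext in test_one) are chosen for illustration and are not part of EuCrypt.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

-- Illustrative only: converts a string of hex digits into octets using
-- the same Unsigned_8'Value("16#..#") approach as test_from_file.
procedure Hex_Parse_Sketch is
  Hex    : constant String := "A223AA12";  -- sample input, 2 digits per octet
  Octets : array (0 .. Hex'Length / 2 - 1) of Unsigned_8;
begin
  for J in 1 .. Hex'Length / 2 loop
    -- wrap each pair of digits as an Ada based literal, e.g. "16#A2#"
    Octets (J - 1) :=
      Unsigned_8'Value ("16#" & Hex ((J - 1) * 2 + 1 .. J * 2) & "#");
  end loop;

  -- print the decimal value of each octet: 162, 35, 170 and 18
  for J in Octets'Range loop
    Put (Unsigned_8'Image (Octets (J)));
  end loop;
  New_Line;
end Hex_Parse_Sketch;
```

Since 'Value raises Constraint_Error on anything that is not a valid literal, a malformed line in the vectors file stops the run rather than silently producing a wrong octet.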

To put everything together, the calls to the two testing procedures are made from the main program named in the project file above (testall.adb), which relies on the Test_Serpent specification already shown (eucrypt/smg_serpent/tests/test_serpent.ads).

The test vectors from the NESSIE project are also provided in eucrypt/smg_serpent/tests/nessie_vectors.txt but I won’t paste them here directly seeing how the file has 10309 lines.

For the actual .vpatch this time I even had a choice of vdiff tools to use, since phf conveniently just published the first part of his work on vtools. I can therefore happily report that his patches press fine and his resulting vdiff worked on this chapter’s files all right as far as I can see. A comparison of the .vpatch obtained with the old vdiff vs the .vpatch obtained with phf’s vdiff reveals (as phf has already noted) that there are differences only with respect to the order in which the files are considered. For the curious reader, here is the diff of the two vpatches, obtained with “diff eucrypt_ch11_serpent.vpatch test.vpatch” where test.vpatch is the result of running phf’s vdiff while eucrypt_ch11_serpent.vpatch is the result of running the old vdiff: diff_order.txt. In the interest of consistency, I’ll publish this .vpatch as obtained with the same vdiff as all other EuCrypt vpatches, but as EuCrypt will soon end, I’ll gladly move on to using phf’s vdiff for future projects.

As usual, the .vpatch for this chapter, together with my signature for it can be found on my Reference Code Shelf as well as by following the direct links copied here for your convenience:

In the next chapter I’ll finally bring everything together through a .vpatch that should (among other things) provide a way to compile the whole EuCrypt as one single aggregate library.

  1. You might call this “hashing” or “encryption/decryption” or even voodoo. After all this work on EuCrypt, I am increasingly drawn towards calling it bit byte diddling – fits better both the actual happenings and the level of proof as to what exactly is achieved.
  2. I strongly recommend using Adacore’s gprbuild as opposed to the gnu/gcc gprbuild.
  3. X(3), X(2), X(1), X(0)

February 20, 2018

EuCrypt: Correcting an Error in OAEP Check

Filed under: EuCrypt — Diana Coman @ 3:56 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

As I was (re)reading today through my own OAEP with Keccak code from the previous chapter, I found an error 1. And while I take all the blame for it being there in the first place and for it remaining there for all of 5 days until today, I have to take also all the credit for finding it and fixing it. There is power in all the numbers, including one, apparently. Without further talk, let’s see the error itself, in eucrypt/smg_keccak/smg_oaep.adb:

MaxLen       : constant Natural := OAEP_LENGTH_OCTETS - 11;

That MaxLen is the maximum length of plain text that can fit in a single OAEP block (hence, the maximum number of characters that can be encrypted with a single call to OAEP_Encrypt and, at the same time, the maximum number of characters that OAEP_Decrypt can ever return as decrypted message). It is used later on in the OAEP_Decrypt procedure to do a basic validity check on the “decrypted” message: if the length as read from the decrypted OAEP block is higher than this MaxLen, then there is clearly something wrong (most likely a corrupt or invalid OAEP block) and the procedure simply sets the success flag to “False” without any further operations, thus avoiding, among other things, attempts to read outside the bounds of the arrays involved. The error in the above line is the use of OAEP_LENGTH_OCTETS, which stands for the length in octets of the whole encrypted OAEP block (512 octets), while the message has to fit in only half of it. The correct calculation has to be made either with OAEP_LENGTH_OCTETS/2 or, more directly, with the actual constant defined precisely for this purpose and otherwise used correctly at OAEP_Encrypt: OAEP_HALF_OCTETS. So the correct line is:

MaxLen       : constant Natural := OAEP_HALF_OCTETS - 11;
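
To see concretely what the wrong constant lets through, here is a minimal sketch under stated assumptions: the constants mirror those in smg_oaep.ads, while Length_Is_Valid and the procedure name are hypothetical stand-ins for the check done inside OAEP_Decrypt, not the actual code.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure MaxLen_Check is
  OAEP_LENGTH_OCTETS : constant := 512;
  OAEP_HALF_OCTETS   : constant := OAEP_LENGTH_OCTETS / 2;

  Wrong_MaxLen : constant Natural := OAEP_LENGTH_OCTETS - 11;  -- 501 octets
  Right_MaxLen : constant Natural := OAEP_HALF_OCTETS - 11;    -- 245 octets

  -- Hypothetical helper: is a length in bits, as read from a decrypted
  -- block, within the given maximum number of octets?
  function Length_Is_Valid (Len_Bits, MaxLen : Natural) return Boolean is
  begin
    return Len_Bits <= MaxLen * 8;
  end Length_Is_Valid;

  -- 300 octets can never fit in a 256-octet half block
  Bogus_Len : constant Natural := 300 * 8;
begin
  Put_Line ("wrong bound accepts 300 octets: " &
            Boolean'Image (Length_Is_Valid (Bogus_Len, Wrong_MaxLen)));
  Put_Line ("right bound accepts 300 octets: " &
            Boolean'Image (Length_Is_Valid (Bogus_Len, Right_MaxLen)));
end MaxLen_Check;
```

With the wrong bound the bogus length passes the check (TRUE) and any later slicing would go past the 256 octets actually available; with the corrected bound it is rejected (FALSE).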

Before actually making that change in code however, the very first thing to do in such a case is a… test that exposes this type of issue. As current tests really are absolutely minimal, there can’t be a better time than right now to add to them anyway so here are 2 additional tests that use OAEP_Decrypt on invalid input and, respectively, both encrypt and decrypt on an initial message longer than the maximum length (eucrypt/smg_keccak/tests/smg_keccak-test.adb):

    -- test decrypt on invalid (non-OAEP) string
    Flag := True;
    C := Encr( Encr'First );
    Encr( Encr'First ) := Character'Val( Character'Pos( C ) / 2 );
    Decr := ( others => ' ' );
    OAEP_Decrypt( Encr, Len, Decr, Flag );

    if Flag = True then
      Put_Line("FAILED: oaep test with invalid package");
    else
      Put_Line("PASSED: oaep test with invalid package");
    end if;

    -- test encrypt on message longer than maximum payload (1960 bits)
    Flag := False;
    Len := 0;
    LongMsg( 1..Msg'Length ) := Msg;
    Encr := ( others => '.' );
    OAEP_Encrypt( LongMsg, Entropy, Encr);
    OAEP_Decrypt( Encr, Len, Decr, Flag);

    if Flag = False or
       Len /= MaxLen * 8 or
       Decr( Decr'First .. Decr'First + Len / 8 - 1 ) /=
             LongMsg( LongMsg'First..LongMsg'First + MaxLen - 1 )
       then
      Put_Line("FAILED: oaep test with too long message");
      Put_Line("Msg is: "  & LongMsg);
      Put_Line("Decr is: " & Decr);
      Put_Line("Flag is: " & Boolean'Image( Flag ) );
      Put_Line("Len is: "  & Natural'Image( Len ) );
    else
      Put_Line("PASSED: oaep test with too long message");
    end if;

The first test basically corrupts one character of a valid OAEP-encrypted string and then passes the result on to the OAEP_Decrypt procedure. The test fails if the success flag is set to true (since that means OAEP_Decrypt reports success on an invalid packet). The second test provides an initial plain-text message longer than the maximum length and checks that OAEP indeed uses only the first MaxLen octets (MaxLen * 8 bits), ignoring the rest, as stated in the procedure’s own description of behaviour (see the relevant comments in the code).

Running the above tests with the original code results, as expected, in trouble. However, having used the very reliable Ada language means that the code fails very clearly and obviously: as the length is greater than the available space, the boundary checks fail and the execution is aborted. Once the error is corrected, the code recompiled and the tests run again, the execution proceeds correctly and all the tests (new and old) pass as expected.

Digging a bit deeper into this reveals that there is scope for further improving the code itself to limit the opportunity for such a mistake in the future: the MaxLen value is in fact a constant shared by all the OAEP procedures. Consequently, MaxLen should be MAX_LEN_MSG, defined right under OAEP_LENGTH_OCTETS and the others, together with its supporting “TMSR” string, like this:

  TMSR               : constant String := "TMSR-RSA";
  MAX_LEN_MSG        : constant := OAEP_HALF_OCTETS - TMSR'Length - 3;

And since I’m essentially doing some refactoring, I further take this chance to also remove a few prints (Put_Line in Ada’s terms) from the oaep tests: they are not of much help when the tests pass (and if the tests fail you’ll need to dig deeper than looking at those prints anyway) and moreover, they can mess up your console since they effectively print as characters the gobbledygook resulting from OAEP encrypt. When all is finished, it’s time to run all tests again and make sure they all pass.

The .vpatch for all the above changes and refactoring (including updating comments as needed to reflect the removal of MaxLen and the new MAX_LEN_MSG), together with my signature for it will live like all the other EuCrypt .vpatches on my Reference Code Shelf. For your convenience, I link here as well the full .vpatch and my signature for it:

As a fitting ending to this unexpected but necessary post in the EuCrypt series, I’ll just link here for comparison the story of the last time I had to correct an error as part of this EuCrypt series: the MPI error that survived for years and was finally uncovered hidden under the carpet. I’ll let my readers compare the respective lives of the two errors and the three involved corrections. Meanwhile, I’ll just get back to work on the next EuCrypt chapter that will be published as usual, on Thursday.

  1. No, not a “bug”, not a “silly mistake” nor anything else but exactly what it is: an error. My error, too.

February 15, 2018

EuCrypt Chapter 10: OAEP with Keccak a la TMSR

Filed under: EuCrypt — Diana Coman @ 9:38 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

As I finally have both Keccak at bit-level (reference version) and Keccak at word-level (workhorse version because reality bytes), the next step is to implement the TMSR OAEP (optimal asymmetric encryption padding) of a message. The OAEP in there stands indeed for the original approach by Bellare and Rogaway 1, while the TMSR in there stands – as usual – for an honest discussion of issues encountered and decisions made as well as a better approach to very pointy and very real problems (not a perfect approach, sure; just a few orders of magnitude better than what one finds for instance in GnuPG).

Before proceeding to the actual code, note that the “padding” term in OAEP is rather misleading with respect to the goal of the code in this chapter: while some actual padding is involved indeed, the whole process is best thought of as a type of encryption really. So EuCrypt Chapter 10 provides the implementation of an OAEP-based encryption / decryption scheme, a la TMSR, using Keccak as the underlying hash function. Specifically:

  • OAEP Encryption: result will *always* be a block of 4096 bits (512 octets). Each such resulting block can hold at most 1960 bits (245 octets) of the original message. Longer messages will simply have to be split into blocks of 1960 bits and then passed to the OAEP encryption procedure.

    • Step 1: [random octet][size1][size2][‘T’][‘M’][‘S’][‘R’][‘-‘][‘R’][‘S’][‘A’][random octet padding]*M

      The original message M is encapsulated in a block of 256 octets that has the following format: [random octet][size1][size2][‘T’][‘M’][‘S’][‘R’][‘-‘][‘R’][‘S’][‘A’][random octet padding]*M . Essentially the block starts with a random octet followed by 2 octets that hold the actual length (in bits) of M, followed by 8 reserved octets, followed by as many (or as few, potentially none) random octets as needed to pad M to the maximum size of 245 octets, followed by M itself. Whenever M has already the full length of 245 octets, there will be no padding at all. Whenever M is shorter than 245 octets, the remaining octets up to 245 will contain random bits and will be placed before M. 2. The reserved octets are shown as “TMSR-RSA” currently – the values stored there however are not part of the standard so any implementation can store anything else, random bits included.

      This approach to encapsulating the original message M in a clear format that however does *not* use fixed values has significant benefits over the current approach used by GnuPG: first, the format is more flexible and easy to expand than any inband approach; second, the use of random (true random as well, NOT pseudo-random) rather than fixed-value padding fits the purpose: the whole point is to introduce entropy, not to take it away by having fixed values that even end up in predictable, fixed output (just go ahead and use GnuPG to encrypt various messages with various keys, compare the results and you’ll see what I mean). And note the improvement here: there are still some fixed bits indeed 3 but they are very, very few compared to quite anything else in common use at the moment.

    • Step 2: A block R of 2048 bits is filled with random bits.
    • Step 3: X = M00 xor hash(R)

      A block X of 2048 bits is calculated as X = M00 xor hash(R), where M00 is the block of 2048 bits from step 1, R is the block of 256 octets from step 2 and hash is the Keccak f-1600 permutation (aka the Keccak sponge with an internal state of 1600 bits as implemented in EuCrypt) used as a hash with the default bitrate (currently 1344 bits) and the output length set at 2048 bits.
    • Step 4: Y = R xor hash(X)

      A block Y of 2048 bits is calculated as Y = R xor hash(X), where R, X and hash are as described in previous steps above.
    • Step 5: Result = X || Y

      The OAEP encrypted block of 4096 bits is obtained by concatenating the previous 2 parts: X || Y.
  • OAEP Decryption: the result should be the original message previously encrypted with OAEP. This is effectively the reverse operation:

    • Step 1: Obtain X and Y as the two halves of the input block of 4096 bits (NB: input block HAS TO be precisely 4096 bits long as otherwise it is not a valid OAEP-encrypted block and therefore the decryption can’t succeed.)
    • Step 2: R = Y xor hash(X)
    • Step 3: M00 = X xor hash(R)
    • Step 4: the length of M is extracted as the value stored in the two size octets of M00 (bits 9 to 24), while M itself is extracted as the corresponding last bits at the end of M00.
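
The encryption and decryption steps above can be sketched end to end in a few lines. This is an illustration only, under loud assumptions: Toy_Hash is a made-up, non-cryptographic stand-in for Keccak, fixed filler values replace the true-random octets, the order of the two size octets (high octet first) is assumed, and none of the names below come from smg_oaep.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure OAEP_Sketch is
  Half : constant := 256;               -- octets in M00, R, X and Y (2048 bits)
  type Octets is array (Positive range <>) of Unsigned_8;
  subtype Half_Block is Octets (1 .. Half);

  Tag : constant String := "TMSR-RSA";  -- the 8 reserved octets
  Msg : constant String := "hello";     -- sample message M (40 bits)

  -- Stand-in for the Keccak hash: NOT cryptographic, illustration only.
  function Toy_Hash (Input : Half_Block) return Half_Block is
    Acc : Unsigned_8 := 16#5A#;
    Res : Half_Block;
  begin
    for I in Input'Range loop
      Acc := Rotate_Left (Acc xor Input (I), 3) + 17;
    end loop;
    for I in Res'Range loop
      Acc     := Rotate_Left (Acc, 1) + Unsigned_8 (I mod 251);
      Res (I) := Acc;
    end loop;
    return Res;
  end Toy_Hash;

  function Xor_Blocks (A, B : Half_Block) return Half_Block is
    Res : Half_Block;
  begin
    for I in Res'Range loop
      Res (I) := A (I) xor B (I);
    end loop;
    return Res;
  end Xor_Blocks;

  Len_Bits : constant Natural := Msg'Length * 8;
  M00, R, X, Y, R2, M2 : Half_Block;
  Got_Len : Natural;
  Got_Msg : String (1 .. Msg'Length);
begin
  -- Step 1: build M00; fixed filler octets stand in for true-random ones.
  M00     := (others => 16#AA#);
  M00 (2) := Unsigned_8 (Len_Bits / 256);    -- size1 (assumed: high octet)
  M00 (3) := Unsigned_8 (Len_Bits mod 256);  -- size2 (assumed: low octet)
  for I in Tag'Range loop
    M00 (3 + I) := Character'Pos (Tag (I));
  end loop;
  for I in Msg'Range loop
    M00 (Half - Msg'Length + I) := Character'Pos (Msg (I));
  end loop;

  -- Step 2: R would be 2048 random bits; a fixed pattern is used here.
  for I in R'Range loop
    R (I) := Unsigned_8 ((7 * I + 3) mod 256);
  end loop;

  -- Steps 3 to 5: X and Y together form the 512-octet result X || Y.
  X := Xor_Blocks (M00, Toy_Hash (R));
  Y := Xor_Blocks (R, Toy_Hash (X));

  -- Decryption: undo the two xor steps in reverse order.
  R2 := Xor_Blocks (Y, Toy_Hash (X));
  M2 := Xor_Blocks (X, Toy_Hash (R2));

  -- Read the length from octets 2 and 3, then take M from the end of M00.
  Got_Len := Natural (M2 (2)) * 256 + Natural (M2 (3));
  for I in Got_Msg'Range loop
    Got_Msg (I) := Character'Val (M2 (Half - Got_Len / 8 + I));
  end loop;

  Put_Line ("length in bits:" & Natural'Image (Got_Len));
  Put_Line ("message: " & Got_Msg);
end OAEP_Sketch;
```

The round trip works for any deterministic hash, because each xor is undone by repeating it with the same hash output; the quality of the hash matters for security, not for invertibility.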

The implementation for the above is done in a separate package that lives at the moment in eucrypt/smg_keccak/ (smg_oaep.ads and smg_oaep.adb). This place for smg_oaep reflects the fact that the word-level implementation of Keccak is the everyday workhorse for Eulora, but note that the TMSR OAEP implementation itself is not dependent in any way on a specific Keccak implementation. The decision to implement this in Ada rather than C fits of course the long-term preference for Ada as main programming language for S.MG but it turned out it also fits very, very well my own personal preference (despite the fact that I really have very little experience with Ada and way more with C) 4: while I *did* implement the thing in C as well, it took me twice as long and it was 10 times more painful (and it’s still less clear to read in any case). Despite those huge advantages of Ada, legacy code in C (such as mpi) means that the smg_keccak library will have to communicate with C code, inevitably. The first steps towards that are included in this chapter as I add a hash function with C-style char* parameters so that C code can directly call Keccak for any hashing needs. This is what it looks like (in eucrypt/smg_keccak/smg_oaep.ads):

  -- wrapper for calling from C
  -- @param Input the input string, as array of characters (C style)
  -- @param LenIn the length of the input string (as number of BITS)
  -- @param LenOut the desired number of bits to be returned as output
  -- @param Block_Len the bitrate used by the Keccak sponge (number of BITS)
  -- @return an array of characters with first LenOut bits set to Keccak output

  -- NB: caller HAS TO provide the length of the Input (parameter LenIn)
  -- NB: caller HAS TO provide the length of the Output (parameter LenOut)
  function Hash( Input     : Interfaces.C.Char_Array;
                 LenIn     : Interfaces.C.size_t;
                 LenOut    : Interfaces.C.size_t;
                 Block_Len : Interfaces.C.int := Default_Bitrate)
                 return Interfaces.C.Char_Array;
  pragma Export( C, Hash, "hash" );

One observation regarding this C-style Hash function: it will generate warnings at compilation due to the fact that Ada uses sane strings while C uses the “null-terminated string” approach 5. For once (it’s very rare indeed) however I’ll live with those warnings: the LenIn and LenOut parameters are there precisely to specify the length and therefore avoid going outside the bounds of the allocated memory for that string.
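
The difference between the two string conventions, and why explicit lengths are the safer choice, can be seen in a small standalone sketch that uses only the standard Interfaces.C conversions To_C and To_Ada. The procedure name and the sample string are made up for illustration and are not part of EuCrypt.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure C_String_Sketch is
  Msg : constant String := "abc";

  -- default conversion appends a terminating nul, as C code expects
  With_Nul : constant char_array := To_C (Msg);

  -- without the nul the array holds exactly the characters of Msg,
  -- so its length has to be passed along explicitly
  No_Nul   : constant char_array := To_C (Msg, Append_Nul => False);
begin
  Put_Line ("with nul   :" & Integer'Image (With_Nul'Length) & " chars");
  Put_Line ("without nul:" & Integer'Image (No_Nul'Length) & " chars");
  Put_Line ("back in Ada: " & To_Ada (No_Nul, Trim_Nul => False));
end C_String_Sketch;
```

The two arrays differ by exactly one element (4 versus 3 here), which is the kind of mismatch that the LenIn and LenOut parameters of Hash make explicit instead of leaving it to a terminator.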

The smg_oaep.ads file also defines the relevant constants and subtypes for OAEP encoding, the OAEP_Encrypt and OAEP_Decrypt procedures, a helper procedure XOR_Strings, conversion methods from string to bitstream as well as from bitstream to string and a wrapper for the Keccak sponge function so that the hash can be called directly with string input and output (the wrapper converts to bitstream and back as relevant). While those conversions have a cost of course, it is unclear at this stage whether this cost is indeed problematic for Eulora’s needs. Consequently, any potential optimisations here (and in the whole of EuCrypt for that matter) are an issue left for a later time. The full content of eucrypt/smg_keccak/smg_oaep.ads:

-- Implementation of TMSR's OAEP with Keccak as hash function
--
-- S.MG, 2018

with SMG_Keccak; use SMG_Keccak; -- Keccak is used as hash function
with Interfaces; use Interfaces; -- for Unsigned_8 type and bit-level ops
with Interfaces.C; use Interfaces.C; -- for interop with C

package SMG_OAEP is
  pragma Pure( SMG_OAEP ); -- stateless, no side effects -> can cache calls

  -- fixed length of OAEP block in bits and in octets
  OAEP_LENGTH_BITS   : constant := 4096;
  OAEP_LENGTH_OCTETS : constant := 512;
  OAEP_HALF_OCTETS   : constant := OAEP_LENGTH_OCTETS / 2;

  -- subtypes used by the OAEP encrypt/decrypt
  subtype OAEP_Block is String( 1 .. OAEP_LENGTH_OCTETS );
  subtype OAEP_HALF is String( 1 .. OAEP_HALF_OCTETS );

  -- padding & formatting of maximum 1960 bits of the given String
  -- uses TMSR's OAEP schema:
  -- 1.format M00 as: [random octet][sz1][sz2]"TMSR-RSA"[random]*Message
  --    where sz1 and sz2 store the length of the message in bits
  --    the random octets before message are padding to make OAEP_LENGTH_OCTETS
  -- 2. R = OAEP_HALF_OCTETS random octets
  -- 3. X = M00 xor hash(R)
  -- 4. Y = R xor hash(X)
  -- 5. Result is X || Y
  -- NB: the Entropy parameter should be random octets from which this method
  -- will use as many as required for the OAEP encryption of given Msg
  -- NB: at MOST OAEP_HALF_OCTETS - 11 octets of Msg! (Msg at most 1960 bits)
  procedure OAEP_Encrypt( Msg     : in String;
                          Entropy : in OAEP_Block;
                          Output  : out OAEP_Block);

  -- This is the opposite of OAEP_Encrypt above.
  -- @param Encr - an OAEP block previously obtained from OAEP_Encrypt
  -- @param Len - this will hold the length of the obtained message (in bits!)
  -- @param Output - the first Len / 8 octets of this are the recovered message
  -- @param Success - set to TRUE if message was recovered, false otherwise
  -- NB: when Success is FALSE, both Len and Output have undefined values
  procedure OAEP_Decrypt( Encr    : in OAEP_Block;
                          Len     : out Natural;
                          Output  : out OAEP_HALF;
                          Success : out Boolean);

  -- helper method, xor on strings
  -- NB: only Output'Length octets will be considered from S1 and S2
  -- NB: caller is responsible for S1 and S2 being long enough!
  procedure XOR_Strings( S1: in String; S2: in String; Output: out String );

  -- gnat-specific methods for bit-level operations
  function Shift_Right( Value  : Unsigned_8;
                        Amount : Natural )
                        return Unsigned_8;
  pragma Import(Intrinsic, Shift_Right);

  function Shift_Left( Value  : Unsigned_8;
                       Amount : Natural )
                       return Unsigned_8;
  pragma Import(Intrinsic, Shift_Left);

  -- conversions between bitstream and string
  -- NB: caller has to ensure correct size of output parameter! no checks here.
  procedure ToString( B: in Bitstream; S: out String );
  procedure ToBitstream( S: in String; B: out Bitstream );

  -- public wrapper for Sponge to use String for input/output
  procedure HashKeccak( Input     : in String;
                        Output    : out String;
                        Block_Len : in Keccak_Rate := Default_Bitrate);

  -- wrapper for calling from C
  -- @param Input the input string, as array of characters (C style)
  -- @param LenIn the length of the input string (as number of characters)
  -- @param LenOut the desired number of characters to be returned as output
  -- @param Block_Len the bitrate used by the Keccak sponge (number of BITS)
  -- @return an array of LenOut characters holding the Keccak output

  -- NB: caller HAS TO provide the length of the Input (parameter LenIn)
  -- NB: caller HAS TO provide the length of the Output (parameter LenOut)
  function Hash( Input     : Interfaces.C.Char_Array;
                 LenIn     : Interfaces.C.size_t;
                 LenOut    : Interfaces.C.size_t;
                 Block_Len : Interfaces.C.int := Default_Bitrate)
                 return Interfaces.C.Char_Array;
  pragma Export( C, Hash, "hash" );

end SMG_OAEP;

The corresponding implementation of those methods in eucrypt/smg_keccak/smg_oaep.adb follows closely the steps discussed above (and clearly stated in comments throughout the code as well):

-- S.MG, 2018

package body SMG_OAEP is

  procedure HashKeccak( Input     : in String;
                        Output    : out String;
                        Block_Len : in Keccak_Rate := Default_Bitrate) is
    BIn  : Bitstream( 0 .. Input'Length * 8 - 1 );
    BOut : Bitstream( 0 .. Output'Length * 8 - 1 );
  begin
    ToBitstream( Input, BIn);
    Sponge( BIn, BOut, Block_Len);
    ToString( BOut, Output );
  end HashKeccak;

  function Hash( Input     : Interfaces.C.Char_Array;
                 LenIn     : Interfaces.C.size_t;
                 LenOut    : Interfaces.C.size_t;
                 Block_Len : Interfaces.C.int := Default_Bitrate)
                 return Interfaces.C.Char_Array is
    AdaLenIn  : Natural := Natural(LenIn);
    AdaLenOut : Natural := Natural(LenOut);
    -- String is indexed by Positive, so the bounds have to start at 1
    InStr     : String( 1 .. AdaLenIn )  := (others => '0');
    OutStr    : String( 1 .. AdaLenOut ) := (others => '0');
    COut      : Interfaces.C.Char_Array( 0 .. LenOut-1 );
    Count     : Natural := AdaLenOut;
    CCount    : Interfaces.C.size_t := LenOut;
  begin
    Interfaces.C.To_Ada( Input, InStr, AdaLenIn );
    HashKeccak( InStr, OutStr, Keccak_Rate(Block_Len) );
    Interfaces.C.To_C( OutStr, COut, CCount );
    return COut;
  end Hash;

  -- conversion between types
  procedure ToString(B: in Bitstream; S: out String ) is
    N   : Natural;
    Pos : Natural;
  begin
    Pos := B'First;
    for I in S'Range loop
      N := Natural( B( Pos     ) ) +
           Natural( B( Pos + 1 ) ) * 2 +
           Natural( B( Pos + 2 ) ) * 4 +
           Natural( B( Pos + 3 ) ) * 8 +
           Natural( B( Pos + 4 ) ) * 16 +
           Natural( B( Pos + 5 ) ) * 32 +
           Natural( B( Pos + 6 ) ) * 64 +
           Natural( B( Pos + 7 ) ) * 128;
      Pos := Pos + 8;
      S( I ) := Character'Val( N );
    end loop;
  end ToString;

  procedure ToBitstream(S: in String; B: out Bitstream ) is
    V   : Unsigned_8;
    Pos : Natural;
  begin
    Pos := B'First;
    for C of S loop
      V := Character'Pos( C );
      B( Pos     ) := Bit( V and 1 );
      B( Pos + 1 ) := Bit( Shift_Right( V, 1 ) and 1 );
      B( Pos + 2 ) := Bit( Shift_Right( V, 2 ) and 1 );
      B( Pos + 3 ) := Bit( Shift_Right( V, 3 ) and 1 );
      B( Pos + 4 ) := Bit( Shift_Right( V, 4 ) and 1 );
      B( Pos + 5 ) := Bit( Shift_Right( V, 5 ) and 1 );
      B( Pos + 6 ) := Bit( Shift_Right( V, 6 ) and 1 );
      B( Pos + 7 ) := Bit( Shift_Right( V, 7 ) and 1 );

      Pos := Pos + 8;
    end loop;
  end ToBitstream;

  -- padding & formatting of maximum 1960 bits of the given String
  -- uses TMSR's OAEP schema:
  -- 1.format M00 as: [random octet][sz1][sz2]"TMSR-RSA"[random]*Message
  --    where sz1 and sz2 store the length of the message in bits
  --    the random octets before message are padding to make OAEP_LENGTH_OCTETS
  -- 2. R = OAEP_HALF_OCTETS random octets
  -- 3. X = M00 xor hash(R)
  -- 4. Y = R xor hash(X)
  -- 5. Result is X || Y
  -- NB: the Entropy parameter should be random octets from which this method
  -- will use as many as required for the OAEP encryption of given Msg
  -- NB: at MOST OAEP_HALF_OCTETS - 11 octets of Msg! (Msg at most 1960 bits)
  procedure OAEP_Encrypt( Msg     : in String;
                          Entropy : in OAEP_Block;
                          Output  : out OAEP_Block) is
    M00    : OAEP_HALF;
    R      : OAEP_HALF;
    HashR  : OAEP_HALF;
    X      : OAEP_HALF;
    HashX  : OAEP_HALF;
    Y      : OAEP_HALF;
    MsgLen : Natural;
    MaxLen : Natural;
    PadLen : Natural;
    TMSR   : constant String := "TMSR-RSA";
  begin
    -- calculate maximum length of msg and needed amount of padding
    -- make sure also that only MaxLen octets at most are used from Msg
    MaxLen := OAEP_HALF_OCTETS - TMSR'Length - 3;  -- maximum msg that fits
    MsgLen := Msg'Length;                          -- real msg length
    if MsgLen > MaxLen then
      MsgLen := MaxLen;  --only first MaxLen octets will be considered
      PadLen := 0;       --no padding needed
    else
      PadLen := MaxLen - MsgLen; -- msg is potentially too short, add padding
    end if;

    -- step 1: header and format to obtain M00
      -- first octet is random bits
    M00( M00'First ) := Entropy( Entropy'First );

      -- next 2 octets hold the used length of Msg (number of bits)
      -- the two octets are base-256 digits, most significant one first
    M00( M00'First + 2) := Character'Val( ( MsgLen * 8 ) mod 256 );
    M00( M00'First + 1) := Character'Val( ( (MsgLen * 8 ) / 256 ) mod 256 );

      -- next 8 octets are reserved for later use, currently "TMSR-RSA"
    M00( M00'First + 3 .. M00'First + 10 ) := TMSR;

      -- random bits for padding, if Msg is less than 245 octets
    for I in 1 .. PadLen loop
      M00( M00'First + 10 + I ) := Entropy( Entropy'First + I );
    end loop;

      -- the message itself
    M00( M00'Last - MsgLen + 1 .. M00'Last ) :=
                               Msg( Msg'First .. Msg'First + MsgLen - 1 );

    -- step 2: R = OAEP_HALF_OCTETS random octets
    -- can take LAST octets from given entropy as they are NOT used before
    -- (even if original message was empty, padding uses at most half - 10
    --   while entropy has full block length)
    R := Entropy( Entropy'Last - OAEP_HALF_OCTETS + 1 .. Entropy'Last );

    -- step 3: X = M00 xor hash(R)
    HashKeccak( R, HashR );
    XOR_Strings( M00, HashR, X );

    -- step 4: Y = R xor hash(X)
    HashKeccak( X, HashX );
    XOR_Strings( R, HashX, Y );

    -- step 5: Output is X || Y
    Output( Output'First .. Output'First + X'Length - 1 ) := X;
    Output( Output'Last - Y'Length + 1 .. Output'Last )   := Y;

  end OAEP_Encrypt;

  procedure OAEP_Decrypt( Encr    : in OAEP_Block;
                          Len     : out Natural;
                          Output  : out OAEP_HALF;
                          Success : out Boolean ) is
    X, Y, M, R   : OAEP_HALF;
    HashX, HashR : OAEP_HALF;
    -- the message lives in M (half a block) after the 11 octets of header
    MaxLen       : constant Natural := OAEP_HALF_OCTETS - 11;
    LenOctets    : Natural;
  begin
    -- step 1: separate X and Y
    X := Encr( Encr'First .. Encr'First + X'Length - 1 );
    Y := Encr( Encr'Last - Y'Length + 1 .. Encr'Last );

    -- step 2: R = Y xor hash(X)
    HashKeccak( X, HashX );
    XOR_Strings( Y, HashX, R );

    -- step 3: M = X xor hash(R)
    HashKeccak( R, HashR );
    XOR_Strings( X, HashR, M );

    -- step 4: extract length and message
    -- the length octets are base-256 digits, matching OAEP_Encrypt
    Len := Character'Pos( M( M'First + 1 ) ) * 256 +
           Character'Pos( M( M'First + 2 ) );
    LenOctets := Len / 8;

    if LenOctets > MaxLen or LenOctets < 0 then
      Success := False;  -- error, failed to retrieve message
    else
      Success := True;
      Output( Output'First .. Output'First + LenOctets - 1 ) :=
        M( M'Last - LenOctets + 1 .. M'Last );
    end if;

  end OAEP_Decrypt;

  -- helper method, xor on strings
  -- NB: only Output'Length octets will be considered from S1 and S2
  -- NB: caller is responsible for S1 and S2 being long enough!
  procedure XOR_Strings( S1: in String; S2: in String; Output: out String ) is
    V1, V2: Unsigned_8;
  begin
    for I in Output'Range loop
      V1 := Character'Pos( S1( I ) );
      V2 := Character'Pos( S2( I ) );
      Output( I ) := Character'Val( V1 xor V2 );
    end loop;
  end XOR_Strings;
end SMG_OAEP;

You might have noticed in the above that the OAEP_Encrypt method does not directly access a source of random bits. Instead, it simply relies on the caller to provide it with whatever random bits they want used (through the parameter Entropy). The main reason for this is the fact that access to a source of entropy is not in itself an OAEP concern and there is no reason for mixing it in here. Since the format used may require at the very most 4080 random bits 6, the size of the Entropy parameter (4096) covers all situations and allows the OAEP method to simply use the random bits as they are given and to use each bit only once (if it is used). Essentially the responsibility for good entropy rests with the caller of OAEP, as it should be: it's your tool, you can use it poorly to poor results, certainly.
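The entropy budget can be checked with a bit of arithmetic. The sketch below is not part of EuCrypt: it only mirrors the constants from smg_oaep.ads and counts the entropy octets that OAEP_Encrypt, as listed above, reads (the leading octet, the padding and the random half R). The names Entropy_Budget and Used_Octets are made up for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Entropy_Budget is
  -- constants mirroring smg_oaep.ads
  OAEP_LENGTH_OCTETS : constant := 512;
  OAEP_HALF_OCTETS   : constant := OAEP_LENGTH_OCTETS / 2;
  -- header is 1 random octet + 2 length octets + 8 reserved octets
  MAX_MSG_OCTETS     : constant := OAEP_HALF_OCTETS - 11;

  -- hypothetical helper: entropy octets read for a message of Msg_Len octets
  function Used_Octets( Msg_Len : Natural ) return Natural is
    Pad_Len : Natural := 0;
  begin
    if Msg_Len < MAX_MSG_OCTETS then
      Pad_Len := MAX_MSG_OCTETS - Msg_Len;
    end if;
    -- 1 leading random octet + padding + the random half R
    return 1 + Pad_Len + OAEP_HALF_OCTETS;
  end Used_Octets;
begin
  Put_Line( "empty message:" &
            Natural'Image( Used_Octets( 0 ) * 8 ) & " bits" );
  Put_Line( "full message: " &
            Natural'Image( Used_Octets( MAX_MSG_OCTETS ) * 8 ) & " bits" );
end Entropy_Budget;
```

For an empty message this comes to 502 octets (4016 bits). Filling the 8 reserved octets with random bits as well adds another 64 bits, which gives the 4080 figure mentioned above. A message of maximum length needs only 257 octets (2056 bits).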

The helper method XOR_Strings sticks out a bit to me like a sore thumb. On one hand, Ada correctly refuses to treat characters (hence, strings) directly as numbers. On the other hand however the task at hand does exactly that, no matter how much sugar-coating one may put on it: the whole encrypting/hashing/decrypting/padding goes from characters to their numerical representation and back. Consequently, at this stage at least, I made myself the tool I needed and that is that. Once it exists, it can be changed or abandoned later, such is life, but more importantly it can actually be... used first.
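What makes the xor helper sufficient for both directions is that xor is its own inverse: decryption simply replays the two xor steps in the opposite order. The following is a minimal sketch of that round trip on toy 8-octet halves. Toy_Hash is a HYPOTHETICAL stand-in for HashKeccak (it is not a hash function in any meaningful sense), and all names here are assumptions made for the sake of a self-contained example.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure OAEP_Roundtrip_Sketch is
  subtype Half is String( 1 .. 8 );   -- toy size, NOT OAEP_HALF_OCTETS

  -- octet-wise xor, same idea as XOR_Strings
  function Xor_Half( A, B : Half ) return Half is
    V1, V2 : Unsigned_8;
    Res    : Half;
  begin
    for I in Half'Range loop
      V1 := Character'Pos( A( I ) );
      V2 := Character'Pos( B( I ) );
      Res( I ) := Character'Val( V1 xor V2 );
    end loop;
    return Res;
  end Xor_Half;

  -- HYPOTHETICAL stand-in for HashKeccak: a toy mixer, NOT a real hash
  function Toy_Hash( S : Half ) return Half is
    Acc : Unsigned_8 := 16#5A#;
    V   : Unsigned_8;
    Res : Half;
  begin
    for I in Half'Range loop
      V   := Character'Pos( S( I ) );
      Acc := Acc * 31 + V;            -- modular arithmetic, wraps at 256
      Res( I ) := Character'Val( Acc );
    end loop;
    return Res;
  end Toy_Hash;

  M00            : constant Half := "message!";
  R              : constant Half := "entropy.";
  X, Y           : Half;
  R_Back, M_Back : Half;
begin
  -- encrypt: X = M00 xor hash(R), Y = R xor hash(X)
  X := Xor_Half( M00, Toy_Hash( R ) );
  Y := Xor_Half( R, Toy_Hash( X ) );

  -- decrypt: R = Y xor hash(X), M00 = X xor hash(R)
  R_Back := Xor_Half( Y, Toy_Hash( X ) );
  M_Back := Xor_Half( X, Toy_Hash( R_Back ) );

  if M_Back = M00 and R_Back = R then
    Put_Line( "PASSED: round trip recovers " & M_Back );
  else
    Put_Line( "FAILED: round trip" );
  end if;
end OAEP_Roundtrip_Sketch;
```

The round trip holds for any deterministic Toy_Hash, which is also why the OAEP package does not care which Keccak implementation sits underneath.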

You might have noticed in the above the call to the Keccak sponge being slightly different (parameter order) than in the previous chapter. This is indeed the case and the reason for it is the fact that I've introduced a default bitrate for the sponge so that the caller can rely on this if they don't specify a bitrate. The code with this small change in eucrypt/smg_keccak/smg_keccak.ads:

  Default_Bitrate: constant := 1344; --max bits the sponge can eat/spit without
                                     --needing to scramble the state

...and the new signature for the Sponge:

  -- public function, the sponge itself
  -- Keccak sponge structure using Keccak_Function, Pad and a given bitrate;
  -- Input - the stream of bits to hash (the message)
  -- Output - a bitstream of desired size for holding output
  -- Block_Len - the bitrate to use; this is effectively the block length
  --             for splitting Input AND squeezing output between scrambles
  procedure Sponge(Input      : in Bitstream;
                   Output     : out Bitstream;
                   Block_Len  : in Keccak_Rate := Default_Bitrate );

As usual, there are also tests for all the new methods (in eucrypt/smg_keccak/tests/smg_keccak-test.adb):

  procedure test_bitstream_conversion is
    S: String := "Aa*/";
    E: Bitstream( 0 .. 31 ) := (1, 0, 0, 0, 0, 0, 1, 0,
                                1, 0, 0, 0, 0, 1, 1, 0,
                                0, 1, 0, 1, 0, 1, 0, 0,
                                1, 1, 1, 1, 0, 1, 0, 0);
    B: Bitstream( 0 .. 31 );
    SS: String := "  t ";
  begin
    Put_Line("---Testing string to bitstream conversion---");
    ToBitstream( S, B );
    if E /= B then
      Put_Line("FAILED: string to bitstream conversion.");
    else
      Put_Line("PASSED: string to bitstream conversion.");
    end if;

    Put_Line("---Testing bitstream to string conversion---");
    ToString( B, SS );
    if SS /= S then
      Put_Line("FAILED: bitstream to string conversion");
      Put_Line("EXPECTED: " & S);
      Put_Line("OUTPUT: " & SS);
    else
      Put_Line("PASSED: bitstream to string conversion");
    end if;
  end test_bitstream_conversion;

  procedure test_hash_keccak is
    S: String := "X";
    O: String := "abc";
    B: Bitstream( 0 .. 23 );
    BB: Bitstream( 1.. 8):= (0, 0, 0, 1, 1, 0, 1, 0);
    Exp: Bitstream( 0 .. 23 ) := (1, 1, 1, 0, 0, 0, 0, 1,
                                  0, 1, 1, 0, 0, 0, 1, 0,
                                  1, 1, 1, 0, 0, 0, 1, 1);
  begin
    Put_Line("----Testing hash keccak on string " & S & "----");
    HashKeccak(S, O);
    Put_Line("OUTPUT: " & O);
    ToBitstream( O, B );
    if B /= Exp then
      Put_Line("FAILED: testing hash keccak on string");
      Put_Line("Output:");
      for I of B loop
        Put( Bit'Image( I ) );
      end loop;
      new_line(1);
      Put_Line("Expected:");
      for I of Exp loop
        Put( Bit'Image( I ) );
      end loop;
    else
      Put_Line("PASSED: testing hash keccak on string");
    end if;
    new_line(1);
  end test_hash_keccak;

  procedure test_xor_strings is
    S1     : String := "ABC";
    S2     : String := "CBA";
    Exp    : String := "...";
    Result : String := "...";
  begin
    Exp( Exp'First     ) := Character'Val( 2 );
    Exp( Exp'First + 1 ) := Character'Val( 0 );
    Exp( Exp'First + 2 ) := Character'Val( 2 );

    Put_Line("----Testing xor on strings---");
    XOR_Strings( S1, S2, Result);
    Put_Line("S1 is " & S1);
    Put_Line("S2 is " & S2);
    Put_Line("S1 xor S2 is " & Result);
    Put_Line("Result is: ");
    for C of Result loop
      Put( Natural'Image( Character'Pos( C ) ) );
    end loop;
    new_line(1);

    if Result /= Exp then
      Put_Line("FAILED: xor on strings");
    else
      Put_Line("PASSED: xor on strings");
    end if;
  end test_xor_strings;

  procedure test_oaep is
    Msg     : String := "abcdefghij jihgfedcba123456789";
    Encr    : OAEP_Block := ( others => ' ' );
    Decr    : OAEP_HALF  := ( others => ' ' );
    Entropy : OAEP_Block := ( others => 'e' );
    Len     : Natural;
    Flag    : Boolean;
  begin
    Put_Line("----Testing OAEP Encrypt----");
    OAEP_Encrypt( Msg, Entropy, Encr );

    Put_Line("----Testing OAEP Decrypt----");
    OAEP_Decrypt( Encr, Len, Decr, Flag );

    Put_Line("Msg is: "  & Msg);
    Put_Line("Encr is: " & Encr);
    Put_Line("Decr is: " & Decr);
    Put_Line("Flag is: " & Boolean'Image( Flag ) );
    Put_Line("Len is: "  & Natural'Image( Len ) );

    if Flag = False or
       Len /= Msg'Length * 8 or
       Decr( Decr'First .. Decr'First + Msg'Length - 1 ) /= Msg
       then
      Put_Line("FAILED: oaep test");
    else
      Put_Line("PASSED: oaep test");
    end if;

  end test_oaep;

Corresponding calls from the main function in eucrypt/smg_keccak/tests/smg_keccak-test.adb:


  -- test bitstream conversion
  test_bitstream_conversion;

  -- test hash keccak (strings version)
  test_hash_keccak;

  -- test oaep encrypt + decrypt
  test_oaep;

  -- test xor on strings
  test_xor_strings;

The .vpatch for this chapter can be found on my Reference Code Shelf and is linked here too, for your convenience. UPDATE 1 October 2018: added a patch to fix the error of using 255 instead of 256 for the length octets in the OAEP part 7.
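To see why the 255 versus 256 matters, recall that the two length octets are simply the base-256 digits of the message length in bits. The sketch below uses hypothetical helpers (Encode and Decode, not EuCrypt code) to write the maximum length of 1960 bits with one base and read it back with another.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Length_Octets_Sketch is
  -- split a length into two "digits" in the given base
  procedure Encode( Len    : in Natural;
                    Base   : in Positive;
                    Hi, Lo : out Natural ) is
  begin
    Hi := ( Len / Base ) mod Base;
    Lo := Len mod Base;
  end Encode;

  -- put the two "digits" back together in the given base
  function Decode( Hi, Lo : Natural; Base : Positive ) return Natural is
  begin
    return Hi * Base + Lo;
  end Decode;

  Hi, Lo : Natural;
begin
  Encode( 1960, 255, Hi, Lo );
  Put_Line( "written with 255, read with 255:" &
            Natural'Image( Decode( Hi, Lo, 255 ) ) );   -- 1960
  Put_Line( "written with 255, read with 256:" &
            Natural'Image( Decode( Hi, Lo, 256 ) ) );   -- 1967
  Encode( 1960, 256, Hi, Lo );
  Put_Line( "written with 256, read with 256:" &
            Natural'Image( Decode( Hi, Lo, 256 ) ) );   -- 1960
end Length_Octets_Sketch;
```

With 255 on both sides the mistake cancels out, so the round trip still works. The length is misread only when the other side (correctly) treats the octets as base-256 digits, and the largest value that fits in the two octets also drops from 65535 to 65024.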

  1. Bellare, M. and Rogaway, P., 1994. “Optimal Asymmetric Encryption – How to Encrypt with RSA”, in Advances in Cryptology – Eurocrypt 94 Proceedings, Lecture Notes in Computer Science Vol. 950, A. De Santis ed., Springer-Verlag.
  2. The reason for this, directly from S.MG boardroom discussions (translated from Romanian): “mircea_popescu: the reason has to do with interpolation models and with the fact that it is in principle easier to guess the msb than the lsb.”
  3. The size of the message is stored on 16 bits but given that the maximum message size is actually 1960 bits, it follows that the first 5 bits will be for now always 0 (1960 is 111 1010 1000 in binary, so it needs only 11 bits out of the 16 bits available). It can also be argued that the “TMSR-RSA” octets are further fixed values but note that the hard standard is simply on reserving those octets, not on the values that are stored in them per se. In other words, you are free to store random bits in there as well and all will be fine. Overall there are basically 5 bits with fixed values in this schema, compared to 24 bits with fixed values + a lot more pseudo-random ones in other existing schemas such as 0x00||0x02||pseudo-random-non-zero-octets||0x00||M
  4. I guess in this code for OAEP my inexperience with Ada also shows more than usual, as everything gets more complex. For that matter I’m quite sure that it *could* be done better, more elegantly and so on, but for all this “could”, note that it hasn’t been and so I’m stuck doing it now. So if you see something horrible, then go and implement your full version of EuCrypt, sign it, explain it and only then come and say anything to me about how ugly mine is.
  5. I could have avoided this situation by using Strings of variable length in Ada for instance but I don’t see the real benefits of that, while I can certainly see a few downsides that I don’t like at all.
  6. This extreme case of needing 4080 random bits occurs only when the message is empty and the 64 reserved bits are filled with random bits rather than "TMSR-RSA"; in that case there are only 16 non-random bits, namely the ones storing the length of the message.
  7. Thanks PeterL for spotting the error! Note that the encrypt/decrypt still worked but 1. only for as long as one used the EuCrypt code for both (i.e. symmetrical mistake) and 2. it does reduce the length that can be stored (although this is not in itself a problem since anyway the maximum length as per TMSR OAEP would still fit).

February 8, 2018

EuCrypt Chapter 9: Byte Order and Bit Disorder in Keccak

Filed under: EuCrypt — Diana Coman @ 3:00 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

The potential bit disorder trouble with Keccak highlighted at the end of the previous chapter calls for some decision to be made since a hash function won’t be of much help if bits come out in different orders from different implementations. So now and here is your time to have your say on this matter – I’ll first describe the changes made for this chapter, since they are directly linked to the issue anyway and then I’ll lay out the current options as I see them. You are warmly invited to tell me what’s wrong with them, come up with better ones, compare them or shred them to bits or… shut up and then live with them.

Strictly speaking, this chapter simply brings in a few modifications and additions to the previous “word-level” version of the Keccak sponge, introduced in Chapter 7. The goal is to make this relatively fast 1 Keccak (compared to the bit-level version from Chapter 8) run with the same results on Big Endian as well as Little Endian machines. As I don’t actually have a Big Endian machine at the moment and the virtualisation option with Qemu proved a HUGE headache, I’ll be grateful for any feedback from someone who can run the code and tests provided (or any other tests they choose) on a Big Endian platform.

Before digging into the code itself, it’s quite useful to review a bit the Endianness issue. In itself, big/little endian refers to the order in which octets (bytes) are stored in memory: either most significant octet first (big endian) or least significant octet first (little endian). So in principle as long as one doesn’t address different octets directly, there is no problem. However, the compounded trouble with Keccak is that the darned specification of the Keccak state and its transformations in fact has unstated assumptions about both octet ordering and bit ordering! So on top of octet (byte) order, one needs to make sure that bit order assumptions also hold at all times and to say that this gives me a headache is an understatement. Essentially, Keccak’s specification relies (implicitly, not explicitly…) on Big Endian octet (byte) ordering when describing the transformations but it uses Little Endian octet ordering and LSB (least significant *bit*) ordering for the Keccak state, so that 0x80 for instance is represented as 0000 0001. Oh, the joys of consistent, crystal-clear and totally-not-confusing specifications!
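As a small illustration of the LSB bit order, the sketch below writes out the bits of an octet with the least significant bit first. It is not taken from EuCrypt: the names Bit_Order_Sketch, Octet_Bits and To_LSB_First are made up for this example, and only the shift function from the standard Interfaces package is assumed.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Bit_Order_Sketch is
  type Bit is mod 2;
  type Octet_Bits is array( 0 .. 7 ) of Bit;

  -- LSB-first: position 0 holds the least significant bit
  function To_LSB_First( V : Unsigned_8 ) return Octet_Bits is
    B : Octet_Bits;
  begin
    for I in Octet_Bits'Range loop
      B( I ) := Bit( Shift_Right( V, I ) and 1 );
    end loop;
    return B;
  end To_LSB_First;

  procedure Show( V : Unsigned_8 ) is
    Bits : constant Octet_Bits := To_LSB_First( V );
  begin
    Put( "octet" & Unsigned_8'Image( V ) & " ->" );
    for Item of Bits loop
      Put( Bit'Image( Item ) );
    end loop;
    New_Line;
  end Show;
begin
  Show( 16#80# );  -- bits come out as 0 0 0 0 0 0 0 1
  Show( 16#01# );  -- bits come out as 1 0 0 0 0 0 0 0
end Bit_Order_Sketch;
```

Read left to right, the first line is exactly the 0000 0001 that stands for 0x80 in the Keccak state.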

Just about the only ray of light in this whole mess is that most of Keccak’s transformations themselves are actually immune to the specific byte/bit order in use (as long as it’s consistently used, of course). Nevertheless, there is potentially big trouble whenever the Z dimension of the Keccak state is directly involved and especially for the likes of the Iota transformation that uses 64-bit pre-defined values (the round constants). To address those issues, the current approach is as follows:

Reflecting the above, the .vpatch for this chapter adds a method for flipping octets of the input stream and checks for local bit order whenever changing from bits to words or the other way around. Moreover, the conversion methods from bits to words and the other way around are updated to match the LSB bit-order that Keccak supposes. The relevant parts are in eucrypt/smg_keccak/smg_keccak.adb:

  -- convert from a bitstream of ZWord size to an actual ZWord number
  function BitsToWord( BWord: in Bitword ) return ZWord is
    W    : ZWord;
    Bits: Bitword;
  begin
    -- just copy octets if machine is little endian
    -- flip octets if machine is big endian
    if Default_Bit_Order = Low_Order_First then
      Bits := BWord;
    else
      Bits := FlipOctets( BWord );
    end if;
    -- actual bits to word conversion
    W := 0;
    -- LSB bit order (inside octet) as per Keccak spec
    for I in reverse Bitword'Range loop
      W := Shift_Left( W, 1 ) + ZWord( Bits( I ) );
    end loop;

    return W;
  end BitsToWord;

  -- convert from a ZWord (lane of state) to a bitstream of ZWord size
  function WordToBits( Word: in ZWord ) return Bitword is
    Bits: Bitword := (others => 0);
    W: ZWord;
  begin
    W := Word;
    for I in Bitword'Range loop
      Bits( I ) := Bit( W mod 2 );
      W := Shift_Right( W, 1 );
    end loop;

    -- flip octets if machine is big endian
    if Default_Bit_Order = High_Order_First then
      Bits := FlipOctets( Bits );
    end if;

    return Bits;
  end WordToBits;

  -- flip given octets (i.e. groups of 8 bits)
  function FlipOctets( BWord : in Bitword ) return Bitword is
    Bits : Bitword;
  begin
    -- copy groups of 8 bits (octets) changing their order in the array
    -- i.e. 1st octet in BWord becomes last octet in Bits and so on
    for I in 0 .. ( Bitword'Length / 8 - 1 ) loop
      Bits ( Bits'First  + I * 8     .. Bits'First + I * 8 + 7 ) :=
      BWord( BWord'Last  - I * 8 - 7 .. BWord'Last - I * 8);
    end loop;
    return Bits;
  end FlipOctets;

As you might notice in the above, there are a few other small changes to the conversion functions, mainly replacing direct divisions/multiplications by powers of 2 with corresponding shifts – since this implementation uses anyway rotation of bits, there really is no reason not to take advantage of bit shifting too, where possible.
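For unsigned (modular) words the replacement is exact, as the following sketch checks on a few sample values. It assumes only the standard Interfaces package with its Unsigned_64 type and shift functions; the procedure names are made up.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Shift_Sketch is
  procedure Check( W : Unsigned_64 ) is
  begin
    -- right shift halves, left shift doubles (modulo 2**64),
    -- and the lowest bit is the remainder of division by 2
    if Shift_Right( W, 1 ) = W / 2
       and then Shift_Left( W, 1 ) = W * 2
       and then ( W and 1 ) = W mod 2
    then
      Put_Line( "shifts agree for" & Unsigned_64'Image( W ) );
    else
      Put_Line( "MISMATCH for" & Unsigned_64'Image( W ) );
    end if;
  end Check;
begin
  Check( 0 );
  Check( 16#B5# );
  Check( 16#8000_0000_0000_00B5# );  -- top bit set: doubling wraps around
end Shift_Sketch;
```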

In addition to the code itself, there are of course new tests as well, mainly for the octet flipping:

  procedure test_flip is
    B: constant Bitword := (1, 0, 1, 1, 1, 1, 0, 0,
                            1, 1, 1, 0, 0, 0, 0, 1,
                            0, 1, 1, 0, 0, 0, 1, 0,
                            1, 1, 1, 1, 1, 1, 1, 1,
                            1, 1, 0, 1, 1, 0, 0, 1,
                            0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 1, 1, 0, 0, 0, 1,
                            0, 0, 0, 1, 1, 0, 0, 0);
    Expected: Bitword :=   (0, 0, 0, 1, 1, 0, 0, 0,
                            0, 0, 1, 1, 0, 0, 0, 1,
                            0, 0, 0, 0, 0, 0, 0, 0,
                            1, 1, 0, 1, 1, 0, 0, 1,
                            1, 1, 1, 1, 1, 1, 1, 1,
                            0, 1, 1, 0, 0, 0, 1, 0,
                            1, 1, 1, 0, 0, 0, 0, 1,
                            1, 0, 1, 1, 1, 1, 0, 0);
    Output : Bitword;
  begin
    Output := FlipOctets( B );
    if Output /= Expected then
      Put_Line( "FAILED: flip octets" );
      Put_Line( "Expected: " );
      for I of Expected loop
        Put(Bit'Image(I));
      end loop;
      new_line(1);
      Put_Line( "Output: " );
      for I of Output loop
        Put(Bit'Image(I));
      end loop;
      new_line(1);
    else
      Put_Line( "PASSED: flip octets" );
    end if;
  end test_flip;

In addition to the new test for flipping, the older tests for the sponge are updated to be directly comparable with those from the bit-level version of Keccak so that one can easily check that the output of the two implementations is indeed the same. So the current test_sponge function is the following:

  procedure test_sponge is
    Bitrate   : constant Keccak_Rate := 1344;
    Input     : Bitstream(1..5) := (1, 1, 0, 0, 1);
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    C         : Natural;
    HexPos    : Natural;
    Error     : Natural;
    Pos       : Natural;
    ExpHex    : constant String :=
              "CB7FFB7CE7572A06C537858A0090FC2888C3C6BA9A3ADAB4"&
              "FE7C9AB4EFE7A1E619B834C843A5A79E23F3F7E314AA597D"&
              "9DAD376E8413A005984D00CF954F62F59EF30B050C99EA64"&
              "E958335DAE684195D439B6E6DFD0E402518B5E7A227C48CF"&
              "239CEA1C391241D7605733A9F4B8F3FFBE74EE45A40730ED"&
              "1E2FDEFCCA941F518708CBB5B6D5A69C30263267B97D7B29"&
              "AC87043880AE43033B1017EFB75C33248E2962892CE69DA8"&
              "BAF1DF4C0902B16C64A1ADD42FF458C94C4D3B0B32711BBA"&
              "22104989982543D1EF1661AFAF2573687D588C81113ED7FA"&
              "F7DDF912021FC03D0E98ACC0200A9F7A0E9629DBA33BA0A3"&
              "C03CCA5A7D3560A6DB589422AC64882EF14A62AD9807B353"&
              "8DEE1548194DBD456F92B568CE76827F41E0FB3C7F25F3A4"&
              "C707AD825B289730FEBDFD22A3E742C6FB7125DE0E38B130"&
              "F3059450CA6185156A7EEE2AB7C8E4709956DC6D5E9F99D5"&
              "0A19473EA7D737AC934815D68C0710235483DB8551FD8756"&
              "45692B4E5E16BB9B1142AE300F5F69F43F0091D534F372E1"&
              "FFC2E522E71003E4D27EF6ACCD36B2756FB5FF02DBF0C96B"&
              "CAE68E7D6427810582F87051590F6FB65D7B948A9C9D6C93"&
              "AF4562367A0AD79109D6F3087C775FE6D60D66B74F8D29FB"&
              "4BA80D0168693A748812EA0CD3CA23854CC84D4E716F4C1A"&
              "A3B340B1DED2F304DFDBACC1D792C8AC9A1426913E3F67DB"&
              "790FD5CFB77DAA29";
    Output    : Bitstream( 1 .. ExpHex'Length * 4 );
    HexString : String( 1 .. ExpHex'Length );
  begin
    Put_Line("---sponge test---");
    Sponge(Input, Bitrate, Output);
    Put_Line("Input is:");
    for I of Input loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output'Length/4 loop
      Pos := Output'First + (I-1)*4;
      C := Natural( Output( Pos ) ) +
           Natural( Output( Pos + 1 ) ) * 2 +
           Natural( Output( Pos + 2 ) ) * 4 +
           Natural( Output( Pos + 3 ) ) * 8;
      HexPos := I + 2 * ( I mod 2 ) - 1;
      Hexstring(HexPos) := Hex( C );
      if Hexstring(HexPos) /= ExpHex(HexPos) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex);
    Put_Line("Obtained: ");
    Put_Line(Hexstring);
    Put_Line("Errors found: " & Natural'Image(Error));

  end test_sponge;
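The HexPos expression in the test above may look odd at first sight. As far as I can tell, it is there because the output bits are read four at a time in LSB order, so the first group read from each octet is its low nibble, while hex notation writes the high nibble first. The expression simply swaps each pair of positions, as this small sketch (not part of the test suite) prints out:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Hex_Pos_Sketch is
begin
  -- same expression as in test_sponge: odd I goes one position up,
  -- even I goes one position down
  for I in 1 .. 6 loop
    Put_Line( "nibble" & Natural'Image( I ) & " -> hex position" &
              Natural'Image( I + 2 * ( I mod 2 ) - 1 ) );
  end loop;
end Hex_Pos_Sketch;
```

The positions come out as 2, 1, 4, 3, 6, 5.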

The .vpatch for all the above and its signature can be found as usual on my Reference Code Shelf as well as directly from the links below:

To make it perfectly clear before I even get into any options regarding the bit & byte mess: the whole trouble arises essentially because Keccak’s specification can’t stick to one thing and one thing only i.e. either “this is a bit-level futzing” or “this is a byte level fizzing.” So instead it ends up as a bit-byte-bit level-hopping mad tzizzing: theoretically it works at bit-level since the state and transformations really are defined at bit level; in practice and *at the same time* the whole thing is however specified and discussed at byte (octet) level with the Z dimension of the state pretty much ignored (since that is the bit-level basically) and algorithms duly given at… word level even (speed! implementation concern! Yes, yes, but why, WHY can’t you keep the two well separated and labeled as such? WHY?). Moreover, note that the padding rule for instance is specified at bit-level: “10*1” so 2 bits of 1 at the ends with as many 0s as needed in between them so that the whole input is brought to a length that is a multiple of the selected block (Keccak rate) length. In test vectors however this ends up as 0x80 because of the unstated assumptions of bit order in the Keccak state – basically, why not, we have 2 bit orders so why not use them, right? MOREOVER, the whole thing is anyway meant to work as a hashing of messages hence of characters hence of… octets (bytes)! 2 So on one hand there are test vectors with all sorts of numbers of bits as input, on the other hand you’ll likely want it to handle *consistently* inputs that are inescapably in multiples of 8 bits and nothing else.
“Oh, but this is solved already” you’ll say – sure, it is of sorts, basically by NIST having on one hand their own whole damned Appendix for SHA-3 just to sort-of, ass-backwards explain the absorb into a state because it’s so unclear due to bit order and on the other hand by Keccak having one padding rule while SHA-3 has yet another padding rule (hey, we put in a “delimiter” which is 01 for some reason). What, isn’t that clear? NO? But how surprising!

To bring back some sanity, let’s see a few options:

1. Input/Output size & padding:
1.1 Fix both input and output at octet (byte) level, reflecting the intended reality of 8 bits/ character anyway. In this case:

  • valid input for Keccak is a multiple of 8 bits;
  • valid output from Keccak is a multiple of 8 bits;
  • Keccak rate (block size i.e. maximum number of bits absorbed/squeezed between two scramblings of the Keccak state) is a multiple of 8 bits;
  • input is padded with 10*1 up to the closest multiple of block length (Keccak rate). Consequently, the length of padding will be a minimum of 8 bits and a maximum of block length bits.

1.2 Keep input and output at bit level, reflecting the bit-level nature of the Keccak permutation. In this case:

  • valid input for Keccak is any number of bits > 0;
  • valid output from Keccak is any number of bits > 0;
  • Keccak rate (block size) is any number of bits < length of Keccak state in bits;
  • input is padded with 10*1 up to the closest multiple of block length (Keccak rate). Consequently, the length of padding will be a minimum of 2 bits and a maximum of block length + 1 bits (when only 1 bit of room is left in the last block, the two 1s of the padding cannot both fit, so the padding spills over into one more full block). In this case however the conversion from characters to bit streams needs to be specified clearly too: does one still consider the whole octet anyway or does one discard any trailing 0s for instance?
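
The padding lengths in options 1.1 and 1.2 can be checked with a few lines of arithmetic. This is only a sketch: Pad_Bits is a hypothetical helper (not part of EuCrypt), and the rates of 1344 and 8 bits are picked purely for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Pad_Length_Demo is

  -- Hypothetical helper, not part of EuCrypt: number of bits that the 10*1
  -- rule appends to an input of Input_Len bits for blocks of Rate bits.
  function Pad_Bits( Input_Len : in Natural;
                     Rate      : in Positive ) return Positive is
    -- the two 1 bits must always fit, so round Input_Len + 2 up to blocks
    Blocks : constant Positive := 1 + ( Input_Len + 1 ) / Rate;
  begin
    return Blocks * Rate - Input_Len;
  end Pad_Bits;

begin
  -- octet level (option 1.1), with an assumed rate of 1344 bits
  Put_Line( "1336 bits in:" & Positive'Image( Pad_Bits( 1336, 1344 ) ) );
  Put_Line( "1344 bits in:" & Positive'Image( Pad_Bits( 1344, 1344 ) ) );
  -- bit level (option 1.2), with a toy rate of 8 bits
  Put_Line( "   6 bits in:" & Positive'Image( Pad_Bits( 6, 8 ) ) );
  Put_Line( "   7 bits in:" & Positive'Image( Pad_Bits( 7, 8 ) ) );
end Pad_Length_Demo;
```

Under these assumptions the four lines report 8, 1344, 2 and 9 bits of padding. The last case is the one where a single free bit is left in the block, so the padding runs one bit longer than the block itself.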

2. Input bit and byte (octet) order:
2.1 Input is first padded with the rule 10*1 and *the result* considered to have Little Endian byte order (hence octets will be flipped on Big Endian iron) and LSB bit order (hence absorbed bit by bit in order into the Keccak state).
2.2 The padding itself is flipped first to LSB (so the end octet if it’s one octet full of padding will be 0x80 rather than 0x01) and then attached to the input that is considered as a result to have Little Endian byte order (hence octets will be flipped on Big Endian iron) and LSB bit order (hence absorbed bit by bit in order into the Keccak state).

3. Output bit and byte (octet) order:
3.1 Output is extracted bit by bit from the state, meaning that the output will have LSB bit order. For example, Keccak will output 1101 0011 that is LSB for the corresponding 1100 1011 MSB (or 0xCB). Moreover, if requested output is NOT a multiple of 8 bits, the “stray” bits at the end will be the least significant from the state’s relevant octet. This is what the current SMG implementations do. Essentially this makes the absorb/squeeze quite straightforward and it leaves the interpretation of bits up to the user of Keccak, so outside the permutations themselves.
3.2 Output is extracted octet by octet from the state, meaning that the output will have MSB bit order. For example, Keccak will output 1100 1011 in the case above at 3.1 (0xCB). Note that in this case an output that is NOT a multiple of 8 bits will end in the most significant bits of the relevant octet in the state. As a concrete example: 4 bits of output in the same 0xCB case used before will be “1100” with this option but “1101” with the previous option from 3.1.
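
The 0xCB example from 3.1 and 3.2 can be reproduced with a small stand-alone sketch. LSB_First and MSB_First are hypothetical names introduced only here; they read bits out of a single octet and have nothing to do with the actual squeeze code.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Bit_Order_Demo is
  type Octet is mod 2 ** 8;
  subtype Bit_Count is Positive range 1 .. 8;

  -- option 3.1: first N bits of O, least significant bit first
  function LSB_First( O : in Octet; N : in Bit_Count ) return String is
    S : String( 1 .. N );
  begin
    for I in S'Range loop
      S( I ) := ( if ( O / 2 ** ( I - 1 ) ) mod 2 = 1 then '1' else '0' );
    end loop;
    return S;
  end LSB_First;

  -- option 3.2: first N bits of O, most significant bit first
  function MSB_First( O : in Octet; N : in Bit_Count ) return String is
    S : String( 1 .. N );
  begin
    for I in S'Range loop
      S( I ) := ( if ( O / 2 ** ( 8 - I ) ) mod 2 = 1 then '1' else '0' );
    end loop;
    return S;
  end MSB_First;

  CB : constant Octet := 16#CB#;
begin
  Put_Line( "3.1, 8 bits: " & LSB_First( CB, 8 ) );  -- 11010011
  Put_Line( "3.1, 4 bits: " & LSB_First( CB, 4 ) );  -- 1101
  Put_Line( "3.2, 8 bits: " & MSB_First( CB, 8 ) );  -- 11001011
  Put_Line( "3.2, 4 bits: " & MSB_First( CB, 4 ) );  -- 1100
end Bit_Order_Demo;
```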

There are of course further options that go deeper into the internals of Keccak. For instance, it is certainly possible to use MSB bit ordering in the Keccak state and change the relevant transformations accordingly. However, such approaches are more time-consuming at this stage and I don’t see a clear benefit that would justify that. Nevertheless, if you see such benefit, kindly argue your case in the comments section below. Once again, now is the time to provide any input on this because once it’s fixed that’s how it stays and we’ll have to live with it as it is.

  1. This is not even by far the fastest implementation possible. At the very least, a faster implementation would unroll the loops and try to parallelize tasks whenever possible.
  2. And NO, I shall not even entertain the additional madness of Unicode and other Unicorns, thank you.

February 1, 2018

EuCrypt Chapter 8: Bit-Level Keccak Sponge

Filed under: EuCrypt — Diana Coman @ 9:45 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Implementing the Keccak Sponge at bit-level turns out to be a more enjoyable experience than the previous contortions for a “word”-level (64 bits to be precise) version of the sponge. The implementation itself is more straightforward and the resulting code really is way clearer and easier to follow, especially as it takes advantage of Ada’s very convenient modular types. And as a bonus side effect, working at bit level means also that there really is no need anymore for importing those gnat-specific methods that I never wanted in the first place. As to the “but it’s going to be slooooooow!” worries, I’ll leave those for later: at this stage the goal is to have a reference implementation that is first of all clear and easy to understand. Once this is available, I can look into the speed issue and evaluate if needed the degree to which it really is an issue for Eulora’s needs at any rate. Not to mention the fact that a bit-level Keccak does not in any way mean that a word-level Keccak cannot be made as well.

Without further delay, let’s see what this patch adds: first of all a new component for EuCrypt, namely smg_bit_keccak. I decided to keep this bit-level implementation separate from the word-level one because I see this implementation as an alternative rather than necessarily a replacement. As a result, this chapter’s vpatch will fork directly from EuCrypt’s genesis yet again and to make this clear, I edited the eucrypt/README file first, adding the full list of current components of EuCrypt:

Components:
1. mpi
  Arbitrary length integers and operations.
  Implemented in C.

2. smg_bit_keccak
  Bit-level implementation of the Keccak sponge according to The Keccak Reference v 3.0.
  Implemented in Ada.

3. smg_keccak
  Word (64 bits) level implementation of the Keccak sponge according to The Keccak Reference v 3.0.
  Implemented in Ada.

4. smg_serpent
  Serpent hash method.
  Implemented in Ada.

5. smg_rsa
  RSA implementation using TMSR specification.
  Implemented in C.

6. smg_comm
  Communications for Eulora (server <-> client). Relies on all the other components.

The description of the SMG_Bit_Keccak package is quite similar to that of the previous SMG_Keccak. The main difference is that the lanes of a Keccak state (i.e. the Z dimension) are now arrays of bits rather than values modulo 2^Z_Length. Reflecting this, the Bitword type is an array of Bit with index of ZCoord type specifically. The Sponge procedure itself has the very same signature since it still receives a Bitstream and a Keccak_Rate as input, while spitting a different Bitstream as output. There is no “Plane” type anymore because it is not actually needed when working at bit-level. The resulting public part of the SMG_Bit_Keccak package definition in eucrypt/smg_bit_keccak/smg_bit_keccak.ads:

 -- S.MG bit-level implementation of Keccak-f permutations
 -- (Based on The Keccak Reference, Version 3.0, January 14, 2011, by
 --   Guido Bertoni, Joan Daemen, Michael Peeters and Gilles Van Assche)

 -- S.MG, 2018

package SMG_Bit_Keccak is
  pragma Pure(SMG_Bit_Keccak);  --stateless, no side effects -> can cache calls

  --knobs (can change as per keccak design but fixed here for S.MG purposes)--
  Keccak_L: constant := 6;  --gives keccak z dimension of 2^6=64 bits and
                            --therefore keccak function 1600 with current
                            --constants (5*5*2^6)

  --constants: dimensions of keccak state and number of rounds
  XY_Length: constant := 5;
  Z_Length: constant := 2 ** Keccak_L;
  Width: constant := XY_Length * XY_Length * Z_Length;
  N_Rounds: constant := 12 + 2 * Keccak_L;

  --types
  type XYCoord is mod XY_Length;
  type ZCoord is mod Z_Length;
  type Round_Index is mod N_Rounds;

  type Bit is mod 2;
  type Bitstream is array( Natural range <> ) of Bit; -- any length; message
  type Bitword is array( ZCoord ) of Bit; -- a keccak "word" of bits

  type State is array( XYCoord, XYCoord ) of Bitword; -- the full keccak state

  type Round_Constants is array(Round_Index) of Bitword; --magic keccak values

  -- rate can be chosen by caller at each call, between 1 and width of state
  -- higher rate means sponge "eats" more bits at a time but has fewer bits in
  --   the "secret" part of the state (i.e. lower capacity)
  subtype Keccak_Rate is Positive range 1..Width;  -- capacity = width - rate

  -- public function, the sponge itself
  -- Keccak sponge structure using Keccak_Function, Pad and a given bitrate;
  -- Input - the stream of bits to hash (the message)
  -- Block_Len - the bitrate to use; this is effectively the block length
  --             for splitting Input AND squeezing output between scrambles
  -- Output - a bitstream of desired size for holding output
  procedure Sponge(Input      : in Bitstream;
                   Block_Len  : in Keccak_Rate;
                   Output     : out Bitstream);

In the private part of SMG_Bit_Keccak, there are 3 new methods, namely Next_Pos, First_Pos and BWRotate_Left. As you might guess, BWRotate_Left rotates a given Bitword to the left by the specified number of bits. This effectively replaces the previous gnat-specific Rotate_Left method and is used by one of the Keccak transformations of state. The First_Pos effectively sets the X, Y, Z coordinates to point to the first bit of the Keccak state. It’s implemented as a method on its own because it is used in several places and moreover because the “first” position in the cuboid is at the end of the day a matter of convention. Similarly, Next_Pos receives a set of 3 values (X, Y, Z) and changes those to point to the *next* bit in the Keccak state. Once again, what constitutes “next” is a matter of convention – basically it depends on the direction in which one moves along the Z, Y and X dimensions.

private
  -- these are internals of the keccak implementation, not meant to be directly
  --  accessed/used
  -- moving one bit forwards in Keccak state
  procedure Next_Pos( X : in out XYCoord;
                      Y : in out XYCoord;
                      Z : in out ZCoord
                    );
  -- set coordinates to first bit of Keccak state
  procedure First_Pos( X : out XYCoord;
                       Y : out XYCoord;
                       Z : out ZCoord
                     );

  -- operations with Bitwords
  function BWRotate_Left( Input: in Bitword;
                          Count: in Natural)
                          return Bitword;

The rest of the SMG_Bit_Keccak package contains the SqueezeBlock and AbsorbBlock helper methods, the 5 Keccak transformations of state (Theta, Rho, Pi, Chi and Iota) and the Keccak function that does a full scramble of state by calling all the transformations together in the correct order and with the corresponding constants for each round. The only difference with respect to the word-level implementation is the way in which the round constants are given: here they are directly given as Bitword so arrays of bits. Moreover, the order of the bits in the array corresponds to the convention adopted for the Z dimension in this implementation:

  -- this will squeeze Block'Length bits out of state S
  -- NO scramble of state in here!
  -- NB: make SURE that Block'Length is the correct bitrate for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure SqueezeBlock( Block: out Bitstream; S: in State);

  -- This absorbs into sponge the given block, modifying the state accordingly
  -- NO scramble of state in here so make sure the whole Block fits in state!
  -- NB: make SURE that Block'Length is *the correct bitrate* for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure AbsorbBlock( Block: in Bitstream; S: in out State );

  -- Keccak magic bitwords
  RC : constant Round_Constants :=
    (
--   16#0000_0000_0000_0001#, round 0
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1),
--   16#0000_0000_0000_8082#, round 1
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,1,0),
--   16#8000_0000_0000_808A#, round 2
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,0),

--   16#8000_0000_8000_8000#, round 3
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0),

--   16#0000_0000_0000_808B#, round 4
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,1),

--   16#0000_0000_8000_0001#, round 5
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1),

--   16#8000_0000_8000_8081#, round 6
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,1),

--   16#8000_0000_0000_8009#, round 7
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,1),

--   16#0000_0000_0000_008A#, round 8
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,0),

--   16#0000_0000_0000_0088#, round 9
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,0,0),

--   16#0000_0000_8000_8009#, round 10
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,1),

--   16#0000_0000_8000_000A#, round 11
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,1,0),

--   16#0000_0000_8000_808B#, round 12
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,1),

--   16#8000_0000_0000_008B#, round 13
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,1),

--   16#8000_0000_0000_8089#, round 14
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,0,1),

--   16#8000_0000_0000_8003#, round 15
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,1,1),

--   16#8000_0000_0000_8002#, round 16
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,1,0),

--   16#8000_0000_0000_0080#, round 17
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,0),

--   16#0000_0000_0000_800A#, round 18
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,1,0),

--   16#8000_0000_8000_000A#, round 19
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,1,0),

--   16#8000_0000_8000_8081#, round 20
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,1),

--   16#8000_0000_0000_8080#, round 21
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,0),

--   16#0000_0000_8000_0001#, round 22
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1),

--   16#8000_0000_8000_8008#, round 23
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,0)
    );

  -- Keccak transformations of the internal state
  function Theta ( Input       : in State ) return State;
  function Rho   ( Input       : in State ) return State;
  function Pi    ( Input       : in State ) return State;
  function Chi   ( Input       : in State ) return State;
  function Iota  ( Round_Const : in Bitword; Input : in State ) return State;

  -- Keccak function with block width currently 1600 (Width constant above)
  -- It simply applies *all* keccak transformations in the correct order, using
  -- the keccak magic numbers (round constants) as per keccak reference
  function Keccak_Function(Input: in State) return State;

end SMG_Bit_Keccak;
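
To see how the hex value in each comment of RC relates to the bits listed under it, here is a minimal sketch that spells out one constant with the most significant bit as the first element. To_Bitword and Word64 are hypothetical names used only for this illustration, and the types are re-declared locally so that the example stands alone.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure RC_Bits_Demo is
  type Bit is mod 2;
  type ZCoord is mod 64;
  type Bitword is array( ZCoord ) of Bit;
  type Word64 is mod 2 ** 64;

  -- Hypothetical helper, not part of SMG_Bit_Keccak: spells out a 64-bit
  -- value as a Bitword whose first element is the most significant bit.
  function To_Bitword( V : in Word64 ) return Bitword is
    BW : Bitword;
  begin
    for Z in ZCoord loop
      BW( Z ) := Bit( ( V / 2 ** ( 63 - Natural( Z ) ) ) mod 2 );
    end loop;
    return BW;
  end To_Bitword;

  -- the round 1 constant from the listing above
  RC1 : constant Bitword := To_Bitword( 16#0000_0000_0000_8082# );
begin
  -- print as 4 rows of 16 bits, the same layout as the RC aggregate
  for Z in ZCoord loop
    if RC1( Z ) = 1 then
      Put( '1' );
    else
      Put( '0' );
    end if;
    if Z mod 16 = 15 then
      New_Line;
    end if;
  end loop;
end RC_Bits_Demo;
```

The output is three rows of zeros followed by 1000000010000010, which matches the round 1 entry of RC once the commas are removed.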

The actual implementation of the bit-level Keccak is in eucrypt/smg_bit_keccak/smg_bit_keccak.adb:

 -- S.MG, 2018

package body SMG_Bit_Keccak is

  -- public function, sponge
  procedure Sponge( Input      : in Bitstream;
                    Block_Len  : in Keccak_Rate;
                    Output     : out Bitstream) is
    Internal  : State := (others => (others => (others => 0)));
  begin
    --absorb input into sponge in a loop on available blocks, including padding
    declare
      -- number of input blocks after padding (pad: 2 to block_len + 1 bits)
      Padded_Blocks : constant Positive := 1 + (Input'Length + 1) / Block_Len;
      Padded        : Bitstream ( 1 .. Padded_Blocks * Block_Len );
      Block         : Bitstream ( 1 .. Block_Len );
    begin
      -- initialise Padded with 0 everywhere
      Padded := ( others => 0 );
      -- copy and pad input with rule 10*1
      Padded( Padded'First .. Padded'First + Input'Length - 1 ) := Input;
      Padded( Padded'First + Input'Length )                     := 1;
      Padded( Padded'Last )                                     := 1;

      -- loop through padded input and absorb block by block into sponge
      -- padded input IS a multiple of blocks, so no stray bits left
      for B in 0 .. Padded_Blocks - 1 loop
        -- first get the current block to absorb
        Block   := Padded( Padded'First + B * Block_Len ..
                           Padded'First + (B+1) * Block_Len - 1 );
        AbsorbBlock( Block, Internal );
        -- scramble state with Keccak function
        Internal := Keccak_Function( Internal );

      end loop; -- end absorb loop for blocks
    end; -- end absorb stage

    --squeeze required bits from sponge in a loop as needed
    declare
      -- full blocks per output
      BPO     : constant Natural := Output'Length / Block_Len;
      -- stray bits per output
      SPO     : constant Natural := Output'Length mod Block_Len;
      Block   : Bitstream( 1 .. Block_Len );
    begin
      -- squeeze block by block (if at least one full block is needed)
      for I in 0 .. BPO - 1 loop
        SqueezeBlock( Block, Internal );
        Output( Output'First + I * Block_Len ..
                Output'First + (I + 1) * Block_Len -1) := Block;

        -- scramble state
        Internal := Keccak_Function( Internal );
      end loop;  -- end squeezing full blocks

      -- squeeze any partial block needed (stray bits)
      if SPO > 0 then
        SqueezeBlock( Block, Internal );
        Output( Output'Last - SPO + 1 .. Output'Last ) :=
                Block( Block'First .. Block'First + SPO - 1 );
      end if; -- end squeezing partial last block (stray bits)

    end; -- end squeeze stage

  end Sponge;

  -- helper procedures for sponge absorb/squeeze

  -- NO scramble here, this will absorb ALL given block, make sure it fits!
  procedure AbsorbBlock( Block: in Bitstream; S: in out State ) is
    X, Y                  : XYCoord;
    Z                     : ZCoord;
  begin
    -- xor current block, bit by bit, into first Block'Length bits of state
    First_Pos( X, Y, Z);
    for B of Block loop
      -- xor this bit into the state
      S( X, Y )( Z ) := S( X, Y )( Z ) + B;
      -- move to next bit of the state
      Next_Pos( X, Y, Z );
    end loop;
  end AbsorbBlock;

  -- NO scramble here, this will squeeze Block'Length bits out of *same* state S
  procedure SqueezeBlock( Block: out Bitstream; S: in State) is
    X, Y    : XYCoord;
    Z       : ZCoord;
  begin
    -- start with first position of the state
    First_Pos( X, Y, Z );
    -- squeeze bit by bit, as many bits as needed to fill Block
    for I in Block'Range loop
      -- squeeze current bit from state
      Block( I ) := S( X, Y )( Z );
      -- advance to next bit of state
      Next_Pos( X, Y, Z);
    end loop;
  end SqueezeBlock;

  -- moving one bit forwards in Keccak state
  procedure Next_Pos( X : in out XYCoord;
                      Y : in out XYCoord;
                      Z : in out ZCoord
                    ) is
  begin
    Z := Z - 1;
    if Z = ZCoord'Last then
      X := X + 1;
      if X = XYCoord'First then
        Y := Y + 1;
      end if;
    end if;
  end Next_Pos;

  -- position of first bit in Keccak state
  procedure First_Pos( X : out XYCoord;
                       Y : out XYCoord;
                       Z : out ZCoord
                     ) is
  begin
    X := XYCoord'First;
    Y := XYCoord'First;
    Z := ZCoord'Last;
  end First_Pos;

  -- operations with Bitwords
  function BWRotate_Left( Input: in Bitword;
                          Count: in Natural)
                          return Bitword is
    Output  : Bitword;
    Advance : constant ZCoord := ZCoord( Count mod Z_Length );
  begin
    for I in ZCoord loop
      Output( I ) := Input( I + Advance );
    end loop;
    return Output;
  end BWRotate_Left;

  -- Keccak transformations of the internal state
  function Theta ( Input       : in State) return State is
    Output : State;
    S1, S2 : Bit;
  begin
    for X in XYCoord loop
      for Y in XYCoord loop
        for Z in ZCoord loop
          S1 := 0;
          S2 := 0;
          for Y1 in XYCoord loop
            S1 := S1 + Input( X - 1, Y1 )( Z );
            -- Z direction is opposite to the one assumed in the ref so Z + 1
            S2 := S2 + Input( X + 1, Y1 )( Z + 1 );
          end loop;
          Output( X, Y )(Z) := Input( X, Y )( Z ) + S1 + S2;
        end loop;
      end loop;
    end loop;

    return Output;
  end Theta;

  function Rho   ( Input       : in State) return State is
    Output      : State;
    X, Y, Old_Y : XYCoord;
  begin
    Output( 0, 0) := Input( 0, 0);
    X := 1;
    Y := 0;

    for T in 0 .. 23 loop
      Output(X, Y) := BWRotate_Left(Input(X,Y), (T+1)*(T+2)/2);
      Old_Y := Y;
      Y := 2 * X + 3 * Y;
      X := Old_Y;
    end loop;
    return Output;
  end Rho;

  function Pi    ( Input       : in State) return State is
    Output : State;
  begin
    for X in XYCoord loop
      for Y in XYCoord loop
        Output( Y, 2 * X + 3 * Y ) := Input( X, Y );
      end loop;
    end loop;

    return Output;
  end Pi;

  function Chi   ( Input       : in State) return State is
    Output : State;
  begin
    for Y in XYCoord loop
      for X in XYCoord loop
        for Z in ZCoord loop
          Output(X, Y)(Z) :=   Input( X, Y )( Z ) +
                             ( Input( X + 1, Y )( Z ) + 1 ) *
                             ( Input( X + 2, Y )( Z )     );
        end loop;
      end loop;
    end loop;

    return Output;
  end Chi;

  function Iota  ( Round_Const : in Bitword; Input : in State) return State is
    Output : State;
  begin
    Output := Input;
    for Z in ZCoord loop
      Output( 0, 0 )(Z) := Input( 0, 0 )( Z ) + Round_Const( Z );
    end loop;
    return Output;
  end Iota;

  function Keccak_Function(Input: in State) return State is
    Output: State;
  begin
    Output := Input;
    for I in Round_Index loop
      Output := Iota(RC(I), Chi(Pi(Rho(Theta(Output)))));
    end loop;

    return Output;
  end Keccak_Function;

end SMG_Bit_Keccak;

Note in the above that the Sponge, AbsorbBlock and SqueezeBlock methods are rather simpler than they used to be. Absorbing/squeezing one block is now simply a matter of walking through the Keccak state bit by bit, from the starting position and up to the block length, xoring the input bits into the state when absorbing and copying the state bits out when squeezing. While one could arguably work Bitword by Bitword rather than bit by bit, I kept it bit by bit for clarity for now. Moreover, when absorbing a block, the xor would still need to be performed bit by bit essentially since the Z dimension of the state is defined as an array of bits rather than a value.
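
Since Bit is a modular type with modulus 2, the "+" used in AbsorbBlock is an xor, and the "( a + 1 ) * b" used in Chi is "(not a) and b". A small stand-alone sketch to check this, with the Bit type re-declared locally:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Mod2_Demo is
  type Bit is mod 2;
begin
  for A in Bit loop
    for B in Bit loop
      -- cross-check the arithmetic forms against the logical operators
      if A + B /= ( A xor B ) or ( A + 1 ) * B /= ( ( not A ) and B ) then
        raise Program_Error;
      end if;
      Put_Line( "a=" & Bit'Image( A ) & "  b=" & Bit'Image( B ) &
                "  a+b=" & Bit'Image( A + B ) &
                "  (a+1)*b=" & Bit'Image( ( A + 1 ) * B ) );
    end loop;
  end loop;
end Mod2_Demo;
```

The program prints the four rows of the truth table and would raise Program_Error if the arithmetic and logical forms ever disagreed.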

The Next_Pos procedure above takes advantage of the fact that all coordinates in a Keccak sponge are modular types so they will automatically wrap-around as needed. Consequently, to advance one position further, it’s enough to decrease the Z coordinate (by convention movement is in the negative direction of the Z axis here) and then, if needed, to increase X in order to move on to a different Bitword and/or possibly Y as well in order to move on to a different plane of the cuboid too.

The First_Pos procedure simply sets X, Y and Z to match the convention that movement in a Keccak state happens in the positive direction of the X and Y axes but in the negative direction of the Z axis.
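
The traversal order that follows from First_Pos and Next_Pos can be made visible with a short walk through the coordinates. This is only an illustrative sketch: the types and Next_Pos are re-declared locally with the same logic as in the package body, and the steps that get printed are chosen to sit right around the wrap-around points.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Walk_Demo is
  type XYCoord is mod 5;
  type ZCoord is mod 64;

  -- same logic as Next_Pos in the package body
  procedure Next_Pos( X : in out XYCoord;
                      Y : in out XYCoord;
                      Z : in out ZCoord ) is
  begin
    Z := Z - 1;
    if Z = ZCoord'Last then
      X := X + 1;
      if X = XYCoord'First then
        Y := Y + 1;
      end if;
    end if;
  end Next_Pos;

  -- same starting point as First_Pos
  X : XYCoord := XYCoord'First;
  Y : XYCoord := XYCoord'First;
  Z : ZCoord  := ZCoord'Last;
begin
  for Step in 0 .. 320 loop
    if Step in 0 | 63 | 64 | 319 | 320 then
      Put_Line( "step" & Integer'Image( Step ) &
                ": X=" & XYCoord'Image( X ) &
                " Y=" & XYCoord'Image( Y ) &
                " Z=" & ZCoord'Image( Z ) );
    end if;
    Next_Pos( X, Y, Z );
  end loop;
end Walk_Demo;
```

The walk starts at (0, 0, 63), reaches (0, 0, 0) after 63 steps, moves to (1, 0, 63) on step 64, ends the first row of lanes at (4, 0, 0) on step 319 and then continues at (0, 1, 63).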

The Theta transformation of the state is a direct implementation of Theta’s definition, working directly bit by bit. By contrast, the implementation from the previous chapter used the algorithm given in the reference paper, which took advantage of the fact that lanes were represented as values. However, this bit-level implementation does not gain anything from using that algorithm (on the contrary, it would end up with even more operations) so the direct implementation of Theta’s definition is preferred. Note that the “Z-1” from Theta’s definition effectively means “the bit before Z”, which translates to “Z+1” with the current convention for movement on the Z axis. The rest of the Keccak transformations remain quite similar to the previous implementation.
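
The Z convention is perhaps easiest to see on a shortened lane. The sketch below assumes a toy lane of 8 bits instead of 64 (Toy_Length is a made-up name) and re-declares BWRotate_Left locally with the same body as above. With the most significant bit stored first, taking Input( I + Advance ) for each position is a rotation to the left, and the bit "before" a given one, in the reference's sense of Z-1, sits at the next higher index.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Rotate_Demo is
  Toy_Length : constant := 8;  -- shortened lane, for readability only
  type Bit is mod 2;
  type ZCoord is mod Toy_Length;
  type Bitword is array( ZCoord ) of Bit;

  -- same body as BWRotate_Left in the package, on the shortened lane
  function BWRotate_Left( Input : in Bitword;
                          Count : in Natural ) return Bitword is
    Output  : Bitword;
    Advance : constant ZCoord := ZCoord( Count mod Toy_Length );
  begin
    for I in ZCoord loop
      Output( I ) := Input( I + Advance );
    end loop;
    return Output;
  end BWRotate_Left;

  procedure Print( Label : in String; W : in Bitword ) is
    S : String( 1 .. Toy_Length );
  begin
    for I in ZCoord loop
      S( 1 + Natural( I ) ) := ( if W( I ) = 1 then '1' else '0' );
    end loop;
    Put_Line( Label & S );
  end Print;

  W : constant Bitword := ( 1,1,0,0, 1,0,1,1 );  -- 0xCB, MSB first
begin
  Print( "input    : ", W );                      -- 11001011
  Print( "rotl by 1: ", BWRotate_Left( W, 1 ) );  -- 10010111
  Print( "rotl by 9: ", BWRotate_Left( W, 9 ) );  -- 10010111 (9 mod 8 = 1)
end Rotate_Demo;
```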

As usual, there are some automated tests using existing test vectors for the transformations as well as for the sponge itself. The long of it is in eucrypt/smg_bit_keccak/tests/smg_bit_keccak-test.adb:

with SMG_Bit_Keccak; use SMG_Bit_Keccak;
with Ada.Exceptions; use Ada.Exceptions;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Interfaces; use Interfaces;

procedure SMG_Bit_Keccak.Test is
  --types
  type Keccak_Perms is ( None, Theta, Rho, Pi, Chi, Iota );
  type Test_Vector  is array( Keccak_Perms ) of State;
  type Test_Round   is array( Round_Index ) of Test_Vector;
  subtype Hexstring is String( 1 .. Z_Length / 4 ); --word as hex string
  subtype Bitstring is String( 1 .. Z_Length ); -- word as binary string
  type Bithex       is array( 0 .. 3 ) of Bit;

  -- helper methods
  procedure HexCharToBit( H : in Character; B: out Bithex) is
  begin
    case H is
      when '0' => B := (0, 0, 0, 0);
      when '1' => B := (0, 0, 0, 1);
      when '2' => B := (0, 0, 1, 0);
      when '3' => B := (0, 0, 1, 1);
      when '4' => B := (0, 1, 0, 0);
      when '5' => B := (0, 1, 0, 1);
      when '6' => B := (0, 1, 1, 0);
      when '7' => B := (0, 1, 1, 1);
      when '8' => B := (1, 0, 0, 0);
      when '9' => B := (1, 0, 0, 1);
      when 'A' => B := (1, 0, 1, 0);
      when 'B' => B := (1, 0, 1, 1);
      when 'C' => B := (1, 1, 0, 0);
      when 'D' => B := (1, 1, 0, 1);
      when 'E' => B := (1, 1, 1, 0);
      when 'F' => B := (1, 1, 1, 1);
      when others => null;
    end case;
  end HexCharToBit;

  function HexToBitword( H: in Hexstring ) return Bitword is
    BW         : Bitword;
    B1, B2     : Bithex;
    PosH, PosB : Natural;
  begin
    -- read the hexstring octet by octet
    for I in 1 .. Z_Length / 8 loop
      PosH := Integer(H'First) + (I - 1) * 2;
      HexCharToBit( H(PosH), B1 );
      HexCharToBit( H(PosH + 1), B2 );

      PosB := Integer(BW'First) + (I - 1) * 8;
      for J in 0 .. 3 loop
        BW ( ZCoord(PosB + J) ) := B1(J);
        BW ( ZCoord(PosB + 4 + J) ) := B2(J);
      end loop;
    end loop;
    return BW;
  end HexToBitword;

  -- prints one bitword as an array of bits
  procedure print_bitword( B: in Bitword ) is
    bstr: Bitstring;
  begin
    for I in ZCoord loop
      if B( I ) > 0 then
        bstr( Bitstring'First + Integer(I) ) := '1';
      else
        bstr( Bitstring'First + Integer(I) ) := '0';
      end if;
    end loop;
    Put(bstr);
  end print_bitword;

  -- prints a keccak state, bitword by bitword
  procedure print_state( S: in State; Title: in String) is
  begin
    Put_Line("---------" & Title & "---------");
    for Y in XYCoord loop
      for X in XYCoord loop
        Put( "S(" & XYCoord'Image(X) & ", " & XYCoord'Image(Y) & ")= ");
        print_bitword( S( X, Y ) );
        new_line(1);
      end loop;
    end loop;
  end print_state;

  function read_state(File: in FILE_TYPE; Oct: Positive :=8) return State is
    S: State;
    Line1: String := "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000";
    StartPos, EndPos: Positive;
    Len: Positive := Oct*2;
    HStr: Hexstring;
  begin
    for Y in XYCoord loop
      Line1 := Get_Line(File);
      StartPos := Line1'First;
      EndPos := StartPos + Len-1;

      for X in XYCoord loop
        HStr := Line1( StartPos .. EndPos );
        S( X, Y ) := HexToBitword(HStr);
        StartPos := EndPos + 2;  --one space to skip
        EndPos := StartPos + Len - 1;
      end loop;
    end loop;
    return S;
  end read_state;

  --reads a full test round from specified file (pre-defined format)
  function read_from_file (filename : in String;
                           T        : out Test_Round)
                           return Boolean is
    file: FILE_TYPE;
    InputMarker: String := "lanes as 64-bit words:";
    octets: Positive := 8;
    RoundNo: Round_Index;
  begin
    -- try to open the input file
    begin
      open(file, In_File, filename);
    exception
      when others =>
        Put_Line(Standard_Error,
                 "Can not open the file '" & filename & "'. Does it exist?");
        return False;
    end;

    -- find & read input state first
    RoundNo := -1;
    loop
      declare
        Line: String := Get_Line(file);
      begin
        --check if this is test data of any known kind
        if index(Line, InputMarker, 1) > 0 then
          T(0)(None) := read_state(file, octets);
          print_state(T(0)(None), "Read Input State");
        elsif index(Line, "Round ", 1) > 0 then
          RoundNo := RoundNo +1;
        elsif index(Line, "theta", 1) > 0 then
          T(RoundNo)(Theta) := read_state(file, octets);
          if (RoundNo > 0) then
            T(RoundNo)(None) := T(RoundNo-1)(Iota);  -- previous state as input
          end if;
        elsif index(Line, "rho", 1) > 0 then
          T(RoundNo)(Rho) := read_state(file, octets);
        elsif index(Line, "pi", 1) > 0 then
          T(RoundNo)(Pi) := read_state(file, octets);
        elsif index(Line, "chi", 1) > 0 then
          T(RoundNo)(Chi) := read_state(file, octets);
        elsif index(Line, "iota", 1) > 0 then
          T(RoundNo)(Iota) := read_state(file, octets);
        end if;
        exit when End_Of_File(file);
      end;
    end loop;
    Close(file);
    return True;
  end read_from_file;


  -- performs one single round of Keccak, step by step
  -- each permutation is tested separately
  -- test fails with exception raised at first output not matching expected
  procedure test_one_round(T: Test_Vector; Round: Round_Index) is
    Input: State;
    Expected: State;
    Output: State;
    Test_One_Round_Fail: Exception;
  begin
    Input := T(None);
    for I in Keccak_Perms range Theta..Iota loop
      Expected := T(I);
      case I is
        when Theta  => Output := SMG_Bit_Keccak.Theta(Input);
        when Rho    => Output := SMG_Bit_Keccak.Rho(Input);
        when Pi     => Output := SMG_Bit_Keccak.Pi(Input);
        when Chi    => Output := SMG_Bit_Keccak.Chi(Input);
        when Iota   => Output := SMG_Bit_Keccak.Iota(RC(Round), Input);
        when others => null;
      end case;

      if (Output /= Expected) then
        print_state(Output, "----------real output-------");
        print_state(Expected, "----------expected output--------");
        raise Test_One_Round_Fail;
      else
        Put_Line("PASSED: " & Keccak_Perms'Image(I));
      end if;
      -- get ready for next permutation
      Input := Expected;
    end loop;
  end test_one_round;

  procedure test_bwrotate_left( Input    : in Bitword;
                                N        : Positive;
                                Expected : in Bitword) is
    Output: Bitword;
  begin
    Output := BWRotate_Left( Input, N );
    if Output /= Expected then
      Put_Line("FAIL: test bitword rotate left");
      Put_Line("Output:");
      print_bitword( Output );
      Put_Line("Expected:");
      print_bitword( Expected );
    else
      Put_Line("PASS: test bitword rotate left");
    end if;
  end test_bwrotate_left;

  procedure test_keccak_function(T: in Test_Round) is
    S: State;
  begin
    Put_Line("---Full Keccak Function test---");
    S := Keccak_Function(T(Round_Index'First)(None));
    if S /= T(Round_Index'Last)(Iota) then
      Put_Line("FAILED: full keccak function test");
    else
      Put_Line("PASSED: full keccak function test");
    end if;
  end test_keccak_function;

  procedure test_sponge is
    Bitrate   : constant Keccak_Rate := 1344;
    Input1    : Bitstream( 1 .. 5 ) := (1, 1, 0, 0, 1);
    Input2    : Bitstream( 1 .. 30) := (1, 1, 0, 0,
                                        1, 0, 1, 0,
                                        0, 0, 0, 1,
                                        1, 0, 1, 0,
                                        1, 1, 0, 1,
                                        1, 1, 1, 0,
                                        1, 0, 0, 1,
                                        1, 0);
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    C         : Natural;
    ExpHex1   : constant String :=
              "CB7FFB7CE7572A06C537858A0090FC2888C3C6BA9A3ADAB4"&
              "FE7C9AB4EFE7A1E619B834C843A5A79E23F3F7E314AA597D"&
              "9DAD376E8413A005984D00CF954F62F59EF30B050C99EA64"&
              "E958335DAE684195D439B6E6DFD0E402518B5E7A227C48CF"&
              "239CEA1C391241D7605733A9F4B8F3FFBE74EE45A40730ED"&
              "1E2FDEFCCA941F518708CBB5B6D5A69C30263267B97D7B29"&
              "AC87043880AE43033B1017EFB75C33248E2962892CE69DA8"&
              "BAF1DF4C0902B16C64A1ADD42FF458C94C4D3B0B32711BBA"&
              "22104989982543D1EF1661AFAF2573687D588C81113ED7FA"&
              "F7DDF912021FC03D0E98ACC0200A9F7A0E9629DBA33BA0A3"&
              "C03CCA5A7D3560A6DB589422AC64882EF14A62AD9807B353"&
              "8DEE1548194DBD456F92B568CE76827F41E0FB3C7F25F3A4"&
              "C707AD825B289730FEBDFD22A3E742C6FB7125DE0E38B130"&
              "F3059450CA6185156A7EEE2AB7C8E4709956DC6D5E9F99D5"&
              "0A19473EA7D737AC934815D68C0710235483DB8551FD8756"&
              "45692B4E5E16BB9B1142AE300F5F69F43F0091D534F372E1"&
              "FFC2E522E71003E4D27EF6ACCD36B2756FB5FF02DBF0C96B"&
              "CAE68E7D6427810582F87051590F6FB65D7B948A9C9D6C93"&
              "AF4562367A0AD79109D6F3087C775FE6D60D66B74F8D29FB"&
              "4BA80D0168693A748812EA0CD3CA23854CC84D4E716F4C1A"&
              "A3B340B1DED2F304DFDBACC1D792C8AC9A1426913E3F67DB"&
              "790FD5CFB77DAA29";
    ExpHex2   : constant String :=
              "35F4FBA9D29E833B1DB17CA2077C11B3348C8AF2A29344AE"&
              "6AAA1F63FC4536CE795C54F0359953B97CEA27491691E93E"&
              "E4829EAB388211E6E8BD3EDA74366D0947DFA3D65D127593"&
              "0AFC42884B7324717DCB003D7B3B5C2E92B84F478CC8DBB5"&
              "174EB4BAC6207BD22E56FCC6E5FB11BC598FDBE6208913CE"&
              "34BC03837FDBFCDFF9407D948531B5FC7FFE7029F30E7EDC"&
              "F9282F0A630FA99839776F5EEA485449F62E421552AF9571";
    HexStr1   : String( 1 .. ExpHex1'Length );
    Output1   : Bitstream( 1 .. ExpHex1'Length * 4 );
    HexStr2   : String( 1 .. ExpHex2'Length );
    Output2   : Bitstream( 1 .. ExpHex2'Length * 4 );
    Error     : Natural;
    Pos       : Natural;
    HexPos    : Natural;
  begin

  -- test 1
    Put_Line("---sponge test 1---");
    Sponge(Input1, Bitrate, Output1);
    Put_Line("Input is:");
    for I of Input1 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output1 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output1'Length/4 loop
      Pos := Output1'First + (I-1)*4;
      C := Natural( Output1( Pos ) ) +
           Natural( Output1( Pos + 1 ) ) * 2 +
           Natural( Output1( Pos + 2 ) ) * 4 +
           Natural( Output1( Pos + 3 ) ) * 8;
      HexPos := I + 2 * ( I mod 2 ) - 1;
      Hexstr1( HexPos ) := Hex(C);
      if Hexstr1( HexPos ) /= ExpHex1( HexPos ) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex1);
    Put_Line("Obtained: ");
    Put_Line(Hexstr1);
    Put_Line("Errors found: " & Natural'Image(Error));

  -- test 2
    Put_Line("---sponge test 2---");
    Sponge(Input2, Bitrate, Output2);
    Put_Line("Input is:");
    for I of Input2 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output2 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output2'Length/4 loop
      Pos := Output2'First + (I-1)*4;
      C := Natural( Output2( Pos ) ) +
           Natural( Output2( Pos + 1 ) ) * 2 +
           Natural( Output2( Pos + 2 ) ) * 4 +
           Natural( Output2( Pos + 3 ) ) * 8;
      HexPos := I + 2 * ( I mod 2 ) - 1;
      Hexstr2( HexPos ) := Hex(C);
      if Hexstr2( HexPos ) /= ExpHex2( HexPos ) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex2);
    Put_Line("Obtained: ");
    Put_Line(Hexstr2);
    Put_Line("Errors found: " & Natural'Image(Error));

  end test_sponge;

  -- end of helper methods

  -- variables
  T     : Test_Round;
  BW, E : Bitword;
begin
  BW:=(0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
       0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,1,0);
  E:=(0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
       0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,0);
  test_bwrotate_left(BW, 2, E);

  Put_Line("-----Testing with zero state as input------");
  if (not read_from_file("testvectorszero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

  -- test also Keccak_Function as a whole --
  test_keccak_function(T);

  Put_Line("-----Testing with non-zero state as input------");
  if (not read_from_file("testvectorsnonzero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

  -- test also Keccak_Function as a whole --
  test_keccak_function(T);

  -- test Sponge construction
  test_sponge;

end SMG_Bit_Keccak.Test;

To test both Keccak transformations and the sponge construction itself, I used this time only data from the existing test vectors for Keccak. While the testing of the Keccak transformations themselves did not provide any surprises, the testing of the sponge did provide a bit of a headache due to the fact that existing test data is really meant at octet (byte) level rather than bit level – both as input and as output of the sponge. While the “input bits” are given explicitly, the output is specified only in hexadecimal and it turns out that the squeezing from Keccak is *also* meant to be one octet at a time rather than 1 bit at a time. This wouldn’t be a problem, of course, if it weren’t for the different order in which the bits end up in the output stream depending on how you squeeze them from the state. To give an example: the “octet” 0011 0101 (or 35 in hex) is squeezed as 10101100, basically back to front.
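
To make the bit order concrete, here is a minimal standalone sketch (not part of the .vpatch) of how the test code above reads hex digits out of squeezed bits: within each group of 4 bits the first bit counts as the least significant one, and the two groups of an octet swap places. The names Octet_Order_Sketch and Nibble are made up for this illustration, and Bit and Bitstream are redeclared locally only so that the sketch compiles on its own.

```ada
with Ada.Text_IO; use Ada.Text_IO;

-- hypothetical illustration, not EuCrypt code
procedure Octet_Order_Sketch is
  type Bit is mod 2;
  type Bitstream is array (Natural range <>) of Bit;

  Hex      : constant String := "0123456789ABCDEF";
  -- the octet 16#35# (0011 0101) as it comes out of the sponge: back to front
  Squeezed : constant Bitstream (1 .. 8) := (1, 0, 1, 0, 1, 1, 0, 0);

  -- value of 4 consecutive bits, the bit at From being the least significant
  function Nibble (B : Bitstream; From : Natural) return Natural is
  begin
    return Natural (B (From))
           + Natural (B (From + 1)) * 2
           + Natural (B (From + 2)) * 4
           + Natural (B (From + 3)) * 8;
  end Nibble;
begin
  -- bits 5 .. 8 give the high hex digit, bits 1 .. 4 the low one
  Put_Line (Hex (Hex'First + Nibble (Squeezed, 5)) &
            Hex (Hex'First + Nibble (Squeezed, 1)));
end Octet_Order_Sketch;
```

For the squeezed bits 10101100 this prints 35, which is the octet as the test vectors write it.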

At the moment this slight issue of bit order is “handled” outside of Keccak itself based on the reasoning that octet-level interpretation is outside of Keccak itself and therefore entirely up to the user of Keccak – they can use any rule they want regarding the value represented by 8 (or any other number) of squeezed bits. As long as the output bits from Keccak are correct and in consistent order, the implementation is correct from my point of view. However, there are of course a few other potential approaches to this and I’m still considering them. For now though, here is the .vpatch with the full bit-level Keccak transformations+sponge as described above, together with the corresponding signature:

January 25, 2018

EuCrypt Chapter 7: Keccak Sponge

Filed under: EuCrypt — Diana Coman @ 4:40 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Using the Keccak transformations from Chapter 6, I can now finally implement the actual Keccak sponge [1] that is useful for EuCrypt mainly as a hashing function. To start with, I should say that “sponge” really doesn’t strike me as a very useful name or sort of metaphor to use – the whole thing is still essentially a Rubik’s cuboid of bits that can be loaded with input bits (through a xor operation), scrambled (by means of the set of transformations discussed in the previous chapter) and read as needed. I get it that Rubik’s is trademarked and one wants a “new” name/metaphor and whatnot. I also totally agree that an actual sponge-organism absorbing/sucking and spitting bits would make at least a livelier sort of hash function but Keccak is what it is and that’s a very cold mechanism of scrambling bits (in a 3D sort of way at least, granted) repeatedly. Anyway, to preserve history and make it easy to follow the match between implementation and reference document, I’ll use the authors’ terminology to denote the thing although I will keep using the Rubik’s cube analogy as needed, to explain how it actually works.

The Keccak sponge takes as input a stream of bits of arbitrary length and spits as output another stream of bits of any specific length that is requested. For this reason, the ideal Keccak sponge implementation does not care at all about endianness: bits are absorbed as they come, whether most significant or least significant first. This approach has the significant advantage of clarity: one can easily follow the stream of bits as they make it into the sponge’s structure and then as they are squeezed out. Moreover, there is no need to cater explicitly for big endian or little endian machines as such. The choice of Ada as programming language for this implementation makes it also particularly straightforward to model precisely what is described, i.e. a transformation at bit level rather than octet level, hence unconcerned with endianness. However, it turns out that my current implementation is not yet there because at the moment it relies on the round constants (for the iota transformation in the previous chapter) as numbers (hence with bit-level representation dependent on endianness) rather than bit streams. Consequently, I’ll have to go back over this in the next chapter and change everything to bit-level operation. For now though, let’s see what the current sponge looks like anyway.

The Keccak sponge itself comes with an important parameter, namely bitrate: this represents the number of bits that the sponge can absorb or squeeze in one single iteration, meaning without needing to scramble the state. The way the sponge works is that it first pads the input stream with 10*1 (so a minimum of 2 bits set to 1 with as many or as few 0 in between as needed) so that the length of the input stream is a multiple of the requested bitrate. Then it splits the padded input stream into chunks of bitrate length and it proceeds to “absorb” those chunks one by one (i.e. xor in the first part of its state), scrambling the state after each chunk. When all the input stream has been absorbed, the sponge moves onto the “squeezing” stage where it simply reads bits from the first part of its state, scrambling the state again after each bitrate bits read. Essentially, the bitrate parameter stands for the number of bits of internal state that are used directly for absorbing and squeezing between two consecutive scramblings of the Keccak state.
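
The padding step described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: Pad_Sketch and Pad are hypothetical names, Bit and Bitstream are redeclared locally, and the rate of 8 bits is a toy value chosen so that the result is easy to read, not a rate anyone would use for hashing.

```ada
with Ada.Text_IO; use Ada.Text_IO;

-- hypothetical illustration, not EuCrypt code
procedure Pad_Sketch is
  type Bit is mod 2;
  type Bitstream is array (Natural range <>) of Bit;

  -- pad Input with the 10*1 rule up to a multiple of Rate bits
  function Pad (Input : Bitstream; Rate : Positive) return Bitstream is
    -- at least 2 bits of padding are always added
    Blocks : constant Positive := 1 + (Input'Length + 1) / Rate;
    Padded : Bitstream (1 .. Blocks * Rate) := (others => 0);
  begin
    Padded (1 .. Input'Length) := Input;
    Padded (Input'Length + 1)  := 1;   -- first 1 of the padding
    Padded (Padded'Last)       := 1;   -- last 1 of the padding
    return Padded;
  end Pad;

  Message : constant Bitstream (1 .. 5) := (1, 1, 0, 0, 1);
  Result  : constant Bitstream := Pad (Message, 8);   -- toy rate of 8 bits
begin
  for B of Result loop
    Put (Bit'Image (B));
  end loop;
  New_Line;
end Pad_Sketch;
```

With the 5 input bits 11001 and a toy rate of 8, the padded stream is the single block 11001101: the message, then a 1, a single 0 and the closing 1.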

The bitrate parameter indirectly decides a sponge’s “capacity”. This is the remaining number of bits of the state and it can be calculated as width - bitrate. The capacity of a sponge is the number of bits that are never directly read at a squeeze nor directly xored with input bits when absorbing input. The capacity bits contribute otherwise to the scrambling of state at each iteration of the sponge and they represent therefore the “secret” part of a sponge’s internal state. Consequently, there is a tradeoff between a higher bitrate that would make the Keccak hashing faster in principle and a higher capacity that increases the “secret” part of the sponge. Note that a sponge’s bitrate is a matter of choice at each and every use of the sponge – there is no reason really for this to be fixed necessarily by the Keccak implementation itself. Consequently, my current approach is to have it as a parameter of the Sponge procedure itself, to be decided on by the calling code, at each call. Because this bitrate is effectively used to split the input stream into blocks of equal length, I’m calling the parameter Block_Len, specifying however that it has to be a value of the Keccak_Rate type that models the constraints of a valid bitrate (0 < bitrate < width).
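
As a quick worked example of the rate/capacity tradeoff, the sketch below computes the capacity for two rates. It assumes a state width of 1600 bits (5 x 5 lanes of 64 bits each); Capacity_Sketch and Capacity are hypothetical names used only here, and the second rate of 1088 is just an arbitrary value for comparison with the 1344 that the tests use.

```ada
with Ada.Text_IO; use Ada.Text_IO;

-- hypothetical illustration, not EuCrypt code
procedure Capacity_Sketch is
  Width : constant := 1600;  -- assumed: 5 * 5 * 64 bits of state
  subtype Keccak_Rate is Positive range 1 .. Width;

  -- bits of state that are never directly absorbed into or squeezed from
  function Capacity (Rate : Keccak_Rate) return Natural is
  begin
    return Width - Rate;
  end Capacity;
begin
  Put_Line ("rate 1344 -> capacity" & Natural'Image (Capacity (1344)));
  Put_Line ("rate 1088 -> capacity" & Natural'Image (Capacity (1088)));
end Capacity_Sketch;
```

With these numbers a rate of 1344 leaves 256 bits of capacity, while lowering the rate to 1088 doubles the capacity to 512 bits at the cost of more scrambles per message.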

To model the Keccak sponge with its specific input/output requirements, I first define a few new types and subtypes, in eucrypt/smg_keccak/smg_keccak.ads:

  -- rate can be chosen by caller at each call, between 1 and width of state
  -- higher rate means sponge "eats" more bits at a time but has fewer bits in
  --   the "secret" part of the state (i.e. lower capacity)
  subtype Keccak_Rate is Positive range 1..Width;  -- capacity = width - rate

  type Bit is mod 2;
  type Bitstream is array( Natural range <> ) of Bit; -- any length; message
  subtype Bitword is Bitstream( 0..Z_Length - 1 ); -- bits of one state "word"

Note in the above the Bitword subtype of Bitstream: this is a stream of bits of precisely the same length as the Z dimension of the Keccak cuboid (EuCrypt uses 64 bits as the value of the Z_Length constant in smg_keccak code). In other words, a Bitword is the bitstream equivalent of the numerical “ZWord” value stored at any (X,Y) position of the Keccak sponge. Consequently, there is a need for some ways to convert between the two (Bitword to ZWord and back):

  -- type conversions
  function BitsToWord( Bits: in Bitword ) return ZWord;
  function WordToBits( Word: in ZWord ) return Bitword;

The type conversions are provided as above in the public part of the SMG_Keccak package for now, since they might conceivably be useful to users of Keccak too, at some point. However, as I’ll change the Keccak implementation to get rid of multi-octet numbers and work simply at bit level *everywhere*, it’s quite likely that those methods will simply be discarded as they are not needed anymore. For the time being though, once those conversions are in place, there is only the sponge function itself to define immediately afterwards:

  -- public function, the sponge itself
  -- Keccak sponge structure using Keccak_Function, Pad and a given bitrate;
  -- Input - the stream of bits to hash (the message)
  -- Block_Len - the bitrate to use; this is effectively the block length
  --             for splitting Input AND squeezing output between scrambles
  -- Output - a bitstream of desired size for holding output
  procedure Sponge(Input      : in Bitstream;
                   Block_Len  : in Keccak_Rate;
                   Output     : out Bitstream);

Finally, in the same eucrypt/smg_keccak/smg_keccak.ads, I added to the private part of the SMG_Keccak package two new procedures, SqueezeBlock and AbsorbBlock, that are used by the Sponge procedure. Note that these two procedures are not really meant for external use, mainly because they represent intermediate steps in the sponge operation and therefore are not of much use by themselves.

  -- this will squeeze Block'Length bits out of state S
  -- NO scramble of state in here!
  -- NB: make SURE that Block'Length is the correct bitrate for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure SqueezeBlock( Block: out Bitstream; S: in State);

  -- This absorbs into sponge the given block, modifying the state accordingly
  -- NO scramble of state in here so make sure the whole Block fits in state!
  -- NB: make SURE that Block'Length is *the correct bitrate* for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure AbsorbBlock( Block: in Bitstream; S: in out State );

The detailed implementations of all the above new procedures and functions are in eucrypt/smg_keccak/smg_keccak.adb:

-- public function, sponge
  procedure Sponge( Input      : in Bitstream;
                    Block_Len  : in Keccak_Rate;
                    Output     : out Bitstream) is
    Internal  : State := (others => (others => 0));
  begin
    --absorb input into sponge in a loop on available blocks, including padding
    declare
      -- number of input blocks after padding (pad is 2 to Block_Len + 1 bits)
      Padded_Blocks : constant Positive := 1 + (Input'Length + 1) / Block_Len;
      Padded        : Bitstream ( 1 .. Padded_Blocks * Block_Len );
      Block         : Bitstream ( 1 .. Block_Len );
    begin
      -- initialise Padded with 0 everywhere
      Padded := ( others => 0 );
      -- copy and pad input with rule 10*1
      Padded( Padded'First .. Padded'First + Input'Length - 1 ) := Input;
      Padded( Padded'First + Input'Length )                     := 1;
      Padded( Padded'Last )                                     := 1;

      -- loop through padded input and absorb block by block into sponge
      -- padded input IS a multiple of blocks, so no stray bits left
      for B in 0 .. Padded_Blocks - 1 loop
        -- first get the current block to absorb
        Block   := Padded( Padded'First + B * Block_Len ..
                           Padded'First + (B+1) * Block_Len - 1 );
        AbsorbBlock( Block, Internal );
        -- scramble state with Keccak function
        Internal := Keccak_Function( Internal );

      end loop; -- end absorb loop for blocks
    end; -- end absorb stage

    --squeeze required bits from sponge in a loop as needed
    declare
      -- full blocks per output
      BPO     : constant Natural := Output'Length / Block_Len;
      -- stray bits per output
      SPO     : constant Natural := Output'Length mod Block_Len;
      Block   : Bitstream( 1 .. Block_Len );
    begin
      -- squeeze block by block (if at least one full block is needed)
      for I in 0 .. BPO - 1 loop
        SqueezeBlock( Block, Internal );
        Output( Output'First + I * Block_Len ..
                Output'First + (I + 1) * Block_Len -1) := Block;

        -- scramble state
        Internal := Keccak_Function( Internal );
      end loop;  -- end squeezing full blocks

      -- squeeze any partial block needed (stray bits)
      if SPO > 0 then
        SqueezeBlock( Block, Internal );
        Output( Output'Last - SPO + 1 .. Output'Last ) :=
                Block( Block'First .. Block'First + SPO - 1 );
      end if; -- end squeezing partial last block (stray bits)

    end; -- end squeeze stage
  end Sponge;

  -- convert from a bitstream of ZWord size to an actual ZWord number
  -- first bit of bitstream will be most significant bit of ZWord
  function BitsToWord( Bits: in Bitword ) return ZWord is
    W: ZWord;
    P: Natural;
  begin
    W := 0;
    P := 0;
    for I in reverse Bitword'Range loop
      W := W + ZWord( Bits(I) ) * ( 2**P );
      P := P + 1;
    end loop;
    return W;
  end BitsToWord;

  -- convert from a ZWord (lane of state) to a bitstream of ZWord size
  -- most significant bit of ZWord will be left most bit of bitstream
  function WordToBits( Word: in ZWord ) return Bitword is
    Bits: Bitword := (others => 0);
    W: ZWord;
  begin
    W := Word;
    for I in reverse Bitword'Range loop
      Bits( I ) := Bit( W mod 2 );
      W := W / 2;
    end loop;
    return Bits;
  end WordToBits;

-- helper procedures for sponge absorb/squeeze
  -- NO scramble here, this will absorb ALL given block, make sure it fits!
  procedure AbsorbBlock( Block: in Bitstream; S: in out State ) is
    WPB: constant Natural := Block'Length / Z_Length;   -- words per block
    SBB: constant Natural := Block'Length mod Z_Length; -- stray bits
    FromPos, ToPos        : Natural;
    X, Y                  : XYCoord;
    Word                  : ZWord;
    BWord                 : Bitword;
  begin
    -- xor current block into first Block'Length bits of state
    -- a block can consist in more than one word
    X := 0;
    Y := 0;
    for I in 0..WPB-1 loop
      FromPos := Block'First + I * Z_Length;
      ToPos   := FromPos + Z_Length - 1;
      Word := BitsToWord( Block( FromPos .. ToPos ) );
      S( X, Y ) := S( X, Y ) xor Word;
      -- move on to next word in state
      X := X + 1;
      if X = 0 then
        Y := Y + 1;
      end if;
    end loop;
    -- absorb also any remaining bits from block
    if SBB > 0 then
      ToPos := Block'Last;
      FromPos := ToPos - SBB + 1;
      BWord := (others => 0);
      BWord(Bitword'First .. Bitword'First + SBB - 1) := Block(FromPos..ToPos);
      Word := BitsToWord( BWord );
      S( X, Y ) := S( X, Y ) xor Word;
    end if;
  end AbsorbBlock;

  -- NO scramble here, this will squeeze Block'Length bits out of *same* state S
  procedure SqueezeBlock( Block: out Bitstream; S: in State) is
    X, Y    : XYCoord;
    BWord   : Bitword;
    FromPos : Natural;
    Len     : Natural;
  begin
    X := 0;
    Y := 0;
    FromPos := Block'First;

    while FromPos <= Block'Last loop
      BWord := WordToBits( S(X, Y) );

      X := X + 1;
      if X = 0 then
        Y := Y + 1;
      end if;

      -- copy full word if it fits or
      --   only as many bits as are still needed to fill the block
      Len := Block'Last - FromPos + 1;
      if Len > Z_Length then
        Len := Z_Length;
      end if;

      Block(FromPos..FromPos+Len-1) := BWord(BWord'First..BWord'First+Len-1);
      FromPos := FromPos + Len;
    end loop;
  end SqueezeBlock;

As usual, there are a few tests added to eucrypt/smg_keccak/tests/smg_keccak-test.adb. First, there is a simple test of the Keccak function itself (the one that puts together the transformations for a full scramble of the state). Second, there are tests of the two conversion functions from Bitword to ZWord and the other way around. Finally, there are tests for the sponge itself. The test data for the sponge was obtained by running the transformations separately according to the reference paper. The additional methods in the test file are the following:

  procedure print_bitstream(B: in Bitstream; Title: in String) is
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    HexString : String(1..B'Length/4);
    C         : Natural;
    Pos       : Natural;
  begin
    for I in 1..B'Length/4 loop
      Pos := (I-1)*4 + B'First;
      C := Natural( B(Pos) ) * 8 +
           Natural( B(Pos + 1) ) * 4 +
           Natural( B(Pos + 2) ) * 2 +
           Natural( B(Pos + 3) );
      HexString(I) := Hex(C);
    end loop;
    Put_Line("---" & Title & "---");
    Put_Line(HexString);
  end print_bitstream;

  procedure test_bits_to_word_conversion is
    bits: Bitword := (others => 0);
    obtained_bits: Bitword := (others => 0);
    expected: ZWord;
    obtained: ZWord;
  begin
    expected := 16#E7DDE140798F25F1#;
    bits := (1,1,1,0, 0,1,1,1, 1,1,0,1, 1,1,0,1, 1,1,1,0, 0,0,0,1, 0,1,0,0,
             0,0,0,0, 0,1,1,1, 1,0,0,1, 1,0,0,0, 1,1,1,1, 0,0,1,0, 0,1,0,1,
             1,1,1,1, 0,0,0,1);
    obtained := BitsToWord(bits);
    obtained_bits := WordToBits(expected);

    if obtained /= expected then
      Put_Line("FAIL: bits to word");
      Put_Line("Expected: " & ZWord'Image(expected));
      Put_Line("Obtained: " & ZWord'Image(obtained));
    else
      Put_Line("PASSED: bits to word");
    end if;

    if obtained_bits /= bits then
      Put_Line("FAIL: word to bits");
      Put("Expected: ");
      for I in Bitword'Range loop
        Put(Bit'Image(bits(I)));
      end loop;
      Put_Line("");
      Put_Line("Obtained: ");
      for I in Bitword'Range loop
        Put(Bit'Image(obtained_bits(I)));
      end loop;
      Put_Line("");
    else
      Put_Line("PASSED: word to bits");
    end if;
  end test_bits_to_word_conversion;

  procedure test_sponge is
    Bitrate   : constant Keccak_Rate := 1344;
    Input     : Bitstream(1..5) := (1, 1, 0, 0, 1);
    Output    : Bitstream(1..Bitrate*2);
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    HexString : String(1..Bitrate/2);
    C         : Natural;
    ExpHex    : String(1..Bitrate/2);
    Error     : Natural;
    Pos       : Natural;
  begin
    ExpHex := "B57B7DAED6330F79BA5783C5D45EABFFA1461FAC6CEA09BD"&
              "AAC114F17E23E5B349EECBC907E07FA36ECF8374079811E8"&
              "5E49243D04182C389E68C733BE698468423DB9891D3A7B10"&
              "320E0356AB4AB916F68C0EA20365A1D4DBA48218CA89CBB8"&
              "6D08A34E04544D4100FFE9CB138EADC2D3FC0E8CC2BC15A7"&
              "5B950776970BFC310F33BF609630D73CAD918CF54657589E"&
              "42CF7CBF20DE677D2AB7E49389F6F6C3B3FE2992905325CE"&
              "60931C1515043595ADC1619CB7E034EF52BDC485D03B7FDD"&
              "7345E849FFB4C4426195C8D88C1E7BF9ADA41B92E006C3DA"&
              "F1ED0FD63ADD9408A3FC815F727457692727637687C1F79D"&
              "837DE20798E64C878181C02DF56A533F684459E8A03C8EF6"&
              "234854531110E6CD9BDEFEA85E35C802B1ACDDF29C9332E2"&
              "53C0FA72F3ED1ABA274838CFE6EF8BD572E89E1C2135F6A7"&
              "5BC5D6EA4F85C9A757E68E561A56AC0FC19F1F086C43272F";

    Put_Line("---sponge test---");
    Sponge(Input, Bitrate, Output);
    Put_Line("Input is:");
    for I of Input loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output'Length/4 loop
      Pos := Output'First + (I-1)*4;
      C := Natural( Output( Pos ) ) * 8 +
           Natural( Output( Pos + 1 ) ) * 4 +
           Natural( Output( Pos + 2 ) ) * 2 +
           Natural( Output( Pos + 3 ) );
      Hexstring(I) := Hex(C);
      if Hexstring(I) /= ExpHex(I) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex);
    Put_Line("Obtained: ");
    Put_Line(Hexstring);
    Put_Line("Errors found: " & Natural'Image(Error));

  end test_sponge;

  procedure test_keccak_function(T: in Test_Round) is
    S: State;
  begin
    Put_Line("---Full Keccak Function test---");
    S := Keccak_Function(T(Round_Index'First)(None));
    if S /= T(Round_Index'Last)(Iota) then
      Put_Line("FAILED: full keccak function test");
    else
      Put_Line("PASSED: full keccak function test");
    end if;
  end test_keccak_function;

The .vpatch for the above, together with its signature, is as usual on my Reference Code Shelf and linked for your convenience below:

In the next chapter I’ll get rid of the endianness trouble by changing everything in the Keccak implementation (constants’ representation included) so that all of it works at bit level, as it should! Feel free to play around with this current version anyway but be aware that it’s not the one that will be used in EuCrypt. For the proper, bit-level implementation of Keccak, see Chapter 8 of this series (to be published next week).

  1. As described in The Keccak Reference v. 3.0, Bertoni, G., Daemen, J., Peeters, M. and Van Assche, G., 2011.

January 18, 2018

EuCrypt Chapter 6: Keccak Transformations

Filed under: EuCrypt — Diana Coman @ 8:46 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

EuCrypt will use Keccak for all its hashing and RSA padding needs, as per the TMSR-RSA specification. In this age of ever-mutating labels on top of labels though, I have to clearly state that EuCrypt will actually use Keccak itself and not SHA-3 or whatever other thing the original Keccak currently morphed into. More specifically and true to its name, EuCrypt’s Keccak is a direct implementation of The Keccak Reference, Version 3.0 [1]. This means of course that I have to do the implementation from scratch since history is apparently re-written on the webs rather than preserved: the current Keccak website morphed at some point [2] into the “new keccak” aka SHA-3, simply wiping the previous code from sight. Why would one assume their own past, why would one even need or want or try to actually follow the evolution of anything at all, right? There is apparently only “now” on the wide webs outside TMSR and everything is just re-written and replaced with no traceability to speak of. [3]

On the bright side, this Keccak implementation does not need to rely on the rather unreliable MPI or any other existing parts for that matter. Moreover, it is meant to work perfectly fine as part of EuCrypt itself but also as a standalone component. Consequently, I’m in the happy position of being able to gladly discard C/C++ as programming language and proceed in a much saner and altogether more pleasant to use language: Ada. The discussion of this choice and of Ada itself is outside the scope of this series but the interested reader can find quite a lot on this topic in the logs. This choice of a new programming language comes with its own challenge of course, as Ada is quite new to me, but this is the sort of challenge that you should actively search for and run after rather than the sort to run away from and turn down when it appears. So I’ll write it in Ada – nothing better than a pointed need to actually learn something anyway.

Still on this pleasantly bright side, the task of implementing Keccak in Ada is relatively straightforward to start with, mainly due to the clear description of the Keccak permutations in the reference paper. It also helped to be able to play around a bit in the beginning with a previous attempt at Keccak in Ada, by Peter Lambert. Although I discovered a few problems with that initial attempt, it was nevertheless quite useful as a stepping stone to better understand what Keccak is about and what sort of troubles one might have when implementing it in Ada. Once that initial needed exploration was done I proceeded however to implement Keccak separately, from scratch and this new version of mine is the one I will focus on here. In other words, if there are any errors in this code they are entirely mine.

According to the reference paper, Keccak “is a family of sponge functions that use as a building block a permutation from a set of 7 permutations” 4. If that doesn’t clarify the issue, it’s mainly because hash functions are essentially voodoo or in other words, *what* they really do is not all that clearly (as in mathematically) proven anywhere. That aside, *how* this unknown actual effect is achieved is quite clearly defined. In a nutshell, the working is this: a stream of input bits (the original text) is loaded into a 3D structure (essentially a cuboid made of bits, think of a larger, non-standard Rubik’s cube where instead of colours you have bits in each cell and the range of permitted movements is not limited by physical considerations); this Keccak cuboid (called “state”) is then scrambled by means of 5 transformations applied in a pre-defined order; the same sequence of 5 transformations is then repeated several times with different constants affecting each repetition (each full set of transformations is called a round); the resulting scrambled bits can then be extracted back into a stream that would represent the hash of the original text. Note that the “7 permutations” mentioned in the definition are not the 5 transformations of the state. Instead, the idea is that Keccak itself, as a whole, acts as a permutation over a number of bits b, where b can take 7 distinct values (b = 25 * 2^l for l from 0 to 6, i.e. 25, 50, 100, 200, 400, 800 or 1600; hence the 7 permutations). Essentially there is only one mechanism but it can work on 7 different sizes of a bitstream.

The first step (the part that this chapter covers) is to implement therefore the Keccak permutation, meaning in more detail precisely this “state” structure and the 5 transformations that can work with it. I’ll start by defining the needed knobs, constants and types, all of them part of the SMG_Keccak package in a new file eucrypt/smg_keccak/smg_keccak.ads:

 -- S.MG implementation of Keccak-f permutations

 -- (Based on The Keccak Reference, Version 3.0, January 14, 2011, by
 --   Guido Bertoni, Joan Daemen, Michael Peeters and Gilles Van Assche)

 -- S.MG, 2018

package SMG_Keccak is
  pragma Pure(SMG_Keccak);  --stateless, no side effects -> can cache calls

  --knobs (can change as per keccak design but fixed here for S.MG purposes)--
  Keccak_L: constant := 6;  --gives keccak z (word) dimension of 2^6=64 and
                            --therefore keccak function 1600 with current
                            --constants (5*5*2^6)

  --constants: dimensions of keccak state and number of rounds
  XY_Length: constant := 5;
  Z_Length: constant := 2**Keccak_L;
  Width: constant := XY_Length * XY_Length * Z_Length;
  N_Rounds: constant := 12 + 2*Keccak_L;

  --types
  type XYCoord is mod XY_Length;
  type ZCoord is mod Z_Length;
  type Round_Index is mod N_Rounds;

  type ZWord is mod 2**Z_Length;	--"lane" in keccak ref
  type Plane is array(XYCoord) of ZWord; --a "horizontal slice" of keccak state
  type State is array(XYCoord, XYCoord) of ZWord; --the full keccak state

  type Round_Constants is array(Round_Index) of ZWord;  --magic keccak constants

The “pragma Pure” line at the beginning of the SMG_Keccak package indicates the fact that this implementation of Keccak is made on purpose to be *stateless*. This means that none of the procedures and functions in the package affect any global variables or state, indeed that there are no such global variables or state(s) in the first place. If this strikes you as odd given that Keccak itself has effectively states (through the distinct rounds for instance and further deep down the different transformations that have to be applied in a pre-defined order) note that those are at most *internal* states of Keccak rather than external and there is no reason whatsoever for those “states” to be visible from outside or indeed to be actually stored as such. Each procedure and function in SMG_Keccak really operates on a *given* (as opposed to stored) state (and round constant for the iota function) producing another state, without any need to rely on anything else. In other words, there are no side effects of calling SMG_Keccak functions/procedures. As I am indeed very happy with *not* having any side effects if I can help it at all, that pragma stays exactly where it is.

As you can see in the code above, right after the pragma, there is a single knob for the user to play with, namely Keccak_L or length. The value of this knob effectively chooses one of the “7 permutations”: in practice it determines the number of Keccak rounds (i.e. how many times the full set of 5 transformations is applied), the Z dimension of the Keccak cuboid (Z_Length) and consequently the total width (i.e. how many bits can fit at any one given time). By adjusting this knob, the user can obtain a wider or narrower Keccak cuboid, trading to some extent width for speed (since there are fewer bits and also fewer rounds for a smaller width). However, the other 2 dimensions (X and Y, named for convenience) are fixed at 5, as per Keccak reference documentation. Similarly, the number of rounds takes the length knob into account (N_Rounds = 12 + 2*Keccak_L) but it is nevertheless at least 12, as an absolute minimum (reached when Keccak_L is 0).
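To make the effect of the knob concrete, here is a minimal sketch in C (not part of EuCrypt; the struct and function names are made up for illustration) that derives from l the same three values as the constants Z_Length, Width and N_Rounds in the Ada code.

```c
#include <assert.h>

/* Hypothetical helper type: everything that follows from the single knob l. */
typedef struct {
    unsigned z_length;  /* lane size in bits:       2^l          */
    unsigned width;     /* state size in bits:      5 * 5 * 2^l  */
    unsigned n_rounds;  /* rounds per permutation:  12 + 2*l     */
} keccak_params;

/* Mirrors Z_Length, Width and N_Rounds from smg_keccak.ads. */
keccak_params keccak_params_for(unsigned l) {
    assert(l <= 6);  /* the Keccak reference defines l = 0..6 only */
    keccak_params p;
    p.z_length = 1u << l;
    p.width    = 5u * 5u * p.z_length;
    p.n_rounds = 12u + 2u * l;
    return p;
}
```

With l = 6 this gives the 64-bit lanes, the 1600-bit state and the 24 rounds used in this chapter; l = 0 gives the smallest permutation, with a 25-bit state and 12 rounds.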

The types defined in the code above take advantage of one of Ada’s very useful approaches: each type really spans the set of values that are valid for the intended use and nothing else. For instance, XYCoord is defined as a modular type, based on XY_Length. This means that valid values of the XYCoord type are only 0 to XY_Length - 1 and moreover, any calculations with XYCoord values are performed modulo XY_Length. No need for further code to check on this, no headaches having to check again and again explicitly at all times 5 that only this subset of values is valid for X, Y coordinates: it’s enough to define the type properly here and then simply use it throughout as intended!
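For contrast, here is a small sketch of what the same wrap-around costs in a language without modular types, C in this case. The helper name xy_wrap is hypothetical; the only point is that the reduction has to be spelled out (and remembered) at every use, whereas Ada's modular type does it by definition.

```c
#include <assert.h>

#define XY_LENGTH 5

/* Reduce any int into 0..XY_LENGTH-1. The extra branch is needed because
   C's % operator yields a negative remainder for negative operands. */
unsigned xy_wrap(int v) {
    int r = v % XY_LENGTH;
    if (r < 0) {
        r += XY_LENGTH;
    }
    assert(r >= 0 && r < XY_LENGTH);  /* the range Ada guarantees by type */
    return (unsigned)r;
}
```

So an expression such as X-1 with X = 0 has to be written as xy_wrap(0 - 1) to obtain 4, and 2*X + 3*Y with X = 3, Y = 4 as xy_wrap(2*3 + 3*4) to obtain 3.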

Similarly to XYCoord, there is ZCoord as a modular type, with the only difference that it is modulo Z_Length, since the Z dimension is not fixed and potentially different from the X/Y dimensions. Using those, the Keccak cuboid is defined as “State”: a matrix of ZWords, where each ZWord is a modular type with modulus 2^Z_Length (i.e. it holds exactly Z_Length bits). The additional type Plane represents a horizontal “slice” of the cuboid and is defined for convenience since it comes in very handy for some of the permutations later on. Note that the reference documentation defines vertical slices as well but I did not find (at least not yet) any actual need for them, so I did not include them as separate types.
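The layout of the cuboid can be pictured with a short C sketch, assuming the 1600-bit width. The type and function names are illustrative only and do not exist in EuCrypt: a state is a 5 x 5 matrix of 64-bit lanes, and a single bit of the cuboid is addressed by its (x, y, z) coordinates.

```c
#include <assert.h>
#include <stdint.h>

#define XY_LENGTH 5
#define Z_LENGTH  64

typedef uint64_t zword;  /* one lane: Z_LENGTH bits along the z axis */

/* The full state, indexed lane[x][y] like State(X, Y) in the Ada code. */
typedef struct {
    zword lane[XY_LENGTH][XY_LENGTH];
} keccak_state;

/* Read the single bit at coordinates (x, y, z) of the cuboid. */
unsigned state_bit(const keccak_state *s, unsigned x, unsigned y, unsigned z) {
    assert(x < XY_LENGTH && y < XY_LENGTH && z < Z_LENGTH);
    return (unsigned)((s->lane[x][y] >> z) & 1u);
}

/* Set the single bit at coordinates (x, y, z) to 1. */
void state_set_bit(keccak_state *s, unsigned x, unsigned y, unsigned z) {
    assert(x < XY_LENGTH && y < XY_LENGTH && z < Z_LENGTH);
    s->lane[x][y] |= (zword)1 << z;
}
```

The bounds that the Ada types enforce by construction show up here as explicit assertions.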

The next part of the same file contains the definition of the internal constants and methods of Keccak:

private
  -- these are internals of the keccak implementation, not meant to be directly
  --  accessed/used

  --Keccak magic numbers
  RC : constant Round_Constants :=
    (
     16#0000_0000_0000_0001#,
     16#0000_0000_0000_8082#,
     16#8000_0000_0000_808A#,
     16#8000_0000_8000_8000#,
     16#0000_0000_0000_808B#,
     16#0000_0000_8000_0001#,
     16#8000_0000_8000_8081#,
     16#8000_0000_0000_8009#,
     16#0000_0000_0000_008A#,
     16#0000_0000_0000_0088#,
     16#0000_0000_8000_8009#,
     16#0000_0000_8000_000A#,
     16#0000_0000_8000_808B#,
     16#8000_0000_0000_008B#,
     16#8000_0000_0000_8089#,
     16#8000_0000_0000_8003#,
     16#8000_0000_0000_8002#,
     16#8000_0000_0000_0080#,
     16#0000_0000_0000_800A#,
     16#8000_0000_8000_000A#,
     16#8000_0000_8000_8081#,
     16#8000_0000_0000_8080#,
     16#0000_0000_8000_0001#,
     16#8000_0000_8000_8008#
    );

  --gnat-specific methods to have bit-ops for modular types
  function Rotate_Left( Value  : ZWord;
                        Amount : Natural)
                        return ZWord;
  pragma Import(Intrinsic, Rotate_Left);

  function Shift_Right( Value  : ZWord;
                        Amount : Natural)
                        return ZWord;
  pragma Import(Intrinsic, Shift_Right);

  --Keccak permutations
  function Theta ( Input       : in State) return State;
  function Rho   ( Input       : in State) return State;
  function Pi    ( Input       : in State) return State;
  function Chi   ( Input       : in State) return State;
  function Iota  ( Round_Const : in ZWord; Input : in State) return State;

  --Keccak full function with block width currently 1600 (Width constant above)
  --this simply applies *all* keccak permutations in the correct order and using
  -- the keccak magic numbers (round constants) as per keccak reference
  function Keccak_Function(Input: in State) return State;

end SMG_Keccak;

In the internals (private part) of the SMG_Keccak package, there are first the actual values of the constants that essentially differentiate each round from the others. Those are for all intents and purposes magic numbers, no way around it, so they get called in the code precisely that: Keccak magic numbers. After those, there are two gnat-specific methods imported for bit rotation and bit shifting of modular types. While I still don’t like these imports, I don’t have a good alternative for now, so there they are. Finally, the actual Keccak transformations follow, each of them taking as input a Keccak State (and in the case of the last transformation, iota, also a round constant) and providing as output another Keccak State. The Keccak_Function that follows applies the 5 transformations (theta, rho, pi, chi, iota) in the correct order and moreover iteratively and with the correct constants as required by the pre-established (for this particular Keccak permutation) number of rounds.
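For readers unfamiliar with the two imported bit operations, here is a sketch of their intended effect on a 64-bit word, written in C. It is an illustration of the operations only, not of how gnat implements its intrinsics; the zero result for over-long shifts is my understanding of the gnat behaviour and should be treated as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit word left: bits pushed out at the top re-enter at the bottom. */
uint64_t rotate_left64(uint64_t value, unsigned amount) {
    amount %= 64;
    if (amount == 0) {
        return value;  /* also avoids the undefined C shift by 64 below */
    }
    return (value << amount) | (value >> (64 - amount));
}

/* Logical shift right: zeros enter at the top, bits pushed out are lost.
   Assumption: shifting by 64 or more yields 0. */
uint64_t shift_right64(uint64_t value, unsigned amount) {
    if (amount >= 64) {
        return 0;
    }
    return value >> amount;
}
```

The rotation is the one that matters for theta and rho, where no bit of a lane may ever be lost.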

The implementation of all the above Keccak transformations and function can be found in eucrypt/smg_keccak/smg_keccak.adb. The code should be relatively easy to follow as it adheres quite closely to the pseudo-code given in the Keccak reference:

 -- S.MG, 2018

package body SMG_Keccak is

  function Theta(Input : in State) return State is
    Output : State;
    C      : Plane;
    W      : ZWord;
  begin
    for X in XYCoord loop
      C(X) := Input(X, 0);
      for Y in 1..XYCoord'Last loop
        C(X) := C(X) xor Input(X, Y);
      end loop;
    end loop;

    for X in XYCoord loop
      W := C(X-1) xor Rotate_Left(C(X+1), 1);
      for Y in XYCoord loop
        Output(X,Y) := Input(X,Y) xor W;
      end loop;
    end loop;

    return Output;
  end Theta;

  function Rho(Input : in State) return State is
    Output      : State;
    X, Y, Old_Y : XYCoord;
  begin
    Output(0,0) := Input(0,0);
    X           := 1;
    Y           := 0;

    for T in 0..23 loop
      Output(X, Y) := Rotate_Left(Input(X,Y), ( (T+1)*(T+2)/2) mod Z_Length);
      Old_Y := Y;
      Y := 2*X + 3*Y;
      X := Old_Y;
    end loop;
    return Output;
  end rho;

  function Pi(Input : in State) return State is
    Output: State;
  begin
    for X in XYCoord loop
      for Y in XYCoord loop
        Output(Y, 2*X + 3*Y) := Input(X, Y);
      end loop;
    end loop;
    return Output;
  end pi;

  function Chi(Input : in State) return State is
    Output: State;
  begin
    for Y in XYCoord loop
      for X in XYCoord loop
        Output(X, Y) := Input(X, Y) xor
                        ( (not Input(X + 1, Y)) and Input(X + 2, Y) );
      end loop;
    end loop;
    return Output;
  end chi;

  function Iota(Round_Const : in ZWord; Input : in State) return State is
    Output: State;
  begin
    Output := Input;
    Output(0,0) := Input(0,0) xor Round_Const;
    return Output;
  end iota;

  function Keccak_Function(Input: in State) return State is
    Output: State;
  begin
    Output := Input;
    for I in Round_Index loop
      Output := Iota(RC(I), Chi(Pi(Rho(Theta(Output)))));
    end loop;

    return Output;
  end Keccak_Function;

end SMG_Keccak;
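For anyone who wants to cross-check the Ada code against something that runs without gnat, here is a compact C transcription of the same five transformations and of the full function, for the 1600-bit width only. It is a sketch written for this purpose (the names kstate and keccak_f1600 are illustrative, not EuCrypt's) and it follows the Ada above step by step, with the modular arithmetic on coordinates made explicit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t a[5][5]; } kstate;  /* indexed a[x][y] */

static const uint64_t RC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808AULL,
    0x8000000080008000ULL, 0x000000000000808BULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008AULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000AULL,
    0x000000008000808BULL, 0x800000000000008BULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800AULL, 0x800000008000000AULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};

static uint64_t rotl(uint64_t v, unsigned n) {
    n %= 64;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

static kstate theta(kstate in) {
    kstate out;
    uint64_t c[5];
    for (int x = 0; x < 5; x++)  /* column parities */
        c[x] = in.a[x][0] ^ in.a[x][1] ^ in.a[x][2] ^ in.a[x][3] ^ in.a[x][4];
    for (int x = 0; x < 5; x++) {
        uint64_t w = c[(x + 4) % 5] ^ rotl(c[(x + 1) % 5], 1);
        for (int y = 0; y < 5; y++)
            out.a[x][y] = in.a[x][y] ^ w;
    }
    return out;
}

static kstate rho(kstate in) {
    kstate out;
    int x = 1, y = 0;
    out.a[0][0] = in.a[0][0];
    for (int t = 0; t < 24; t++) {  /* visits the other 24 lanes once each */
        out.a[x][y] = rotl(in.a[x][y], (unsigned)((t + 1) * (t + 2) / 2) % 64);
        int old_y = y;
        y = (2 * x + 3 * y) % 5;
        x = old_y;
    }
    return out;
}

static kstate pi(kstate in) {
    kstate out;
    for (int x = 0; x < 5; x++)
        for (int y = 0; y < 5; y++)
            out.a[y][(2 * x + 3 * y) % 5] = in.a[x][y];
    return out;
}

static kstate chi(kstate in) {
    kstate out;
    for (int y = 0; y < 5; y++)
        for (int x = 0; x < 5; x++)
            out.a[x][y] = in.a[x][y]
                        ^ (~in.a[(x + 1) % 5][y] & in.a[(x + 2) % 5][y]);
    return out;
}

static kstate iota(uint64_t rc, kstate in) {
    in.a[0][0] ^= rc;  /* in is a by-value copy, so the caller's state is untouched */
    return in;
}

/* All 24 rounds, each applying theta, rho, pi, chi, iota in this order. */
kstate keccak_f1600(kstate s) {
    for (int i = 0; i < 24; i++)
        s = iota(RC[i], chi(pi(rho(theta(s)))));
    return s;
}
```

Applied to the all-zero state, this should produce F1258F7940E1DDE7 as lane (0,0) and 84D5CCF933C0478A as lane (1,0), which are the first two words of the final state in the published Keccak-f[1600] intermediate-value vectors for zero input.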

To round this off, all we need is a way to compile everything. If you are using gnatmake, it’s quite straightforward as it’s only a file for now. However, for the future and in the interest of making choices explicit, I’ve written a .gpr file as well (eucrypt/smg_keccak/smg_keccak.gpr), for use with gprbuild:

 -- S.MG, 2018
project SMG_Keccak is
  for Languages use ("Ada");
  for Library_Name use "SMG_Keccak";
  for Library_Kind use "static";

  for Source_Dirs use (".");
  for Object_Dir use "obj";
  for Library_Dir use "lib";
end SMG_Keccak;

As usual, an implementation cannot really be published without any tests at all, so there is a tests folder too, containing two text files with one distinct test case each (taken from keccak archives that I managed to find) and the corresponding Ada testing code that reads those text files, runs the Keccak implementation and reports at each step whether the expected and actual outputs of each transformation match or not. As usual again, this is of course more code than in the implementation that it tests; apparently I can’t escape this. Moreover, a lot of it is the faffing about with parsing the input since the original “format” of the test vectors does not strike me at all as particularly friendly for automated tests. The whole testing code is quite strict on having only one single test case per file as well as some specific markers in the text itself to be able to identify correctly each round and state. Nevertheless, strict and long as it is, you can find it all in eucrypt/smg_keccak/tests/smg_keccak-test.adb:

with SMG_Keccak; use SMG_Keccak;
with Ada.Exceptions; use Ada.Exceptions;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Interfaces; use Interfaces;

procedure SMG_Keccak.Test is
  --types
  type Keccak_Perms is (None, Theta, Rho, Pi, Chi, Iota);
  type Test_Vector is array(Keccak_Perms) of State;
  type Test_Round is array(Round_Index) of Test_Vector;

  --helper methods

  procedure print_state(S: in State; Title: in String) is
    Hex: array(0..15) of Character := ("0123456789ABCDEF");
    Len: constant Natural := Z_Length / 4;
    HexString: String(1..Len);
    W: ZWord;
  begin
    Put_Line("---------" & Title & "---------");
    for Y in XYCoord loop
      for X in XYCoord loop
        W := S(X,Y);
        for Z in 0..Len-1 loop
          HexString(Natural(Len-Z)) := Hex(Natural(W mod 16));
          W := W / 16;
        end loop;
        Put(HexString & " ");
      end loop;
      Put_Line("");
    end loop;
  end;

  function read_state(File: in FILE_TYPE; Oct: Positive :=8) return State is
    S: State;
    Line1: String := "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000";
    StartPos, EndPos: Positive;
    Len: Positive := Oct*2;
  begin
    for Y in XYCoord loop
      Line1 := Get_Line(File);
      StartPos := Line1'First;
      EndPos := StartPos + Len-1;

      for X in XYCoord loop
        S(X,Y) := ZWord'value("16#" & Line1(StartPos..EndPos) & "#");
        StartPos := EndPos + 2;	--one space to skip
        EndPos := StartPos + Len - 1;
      end loop;
    end loop;
    return S;
  end read_state;

  --reads a full test round from specified file (pre-defined format)
  function read_from_file (filename : in String;
                           T        : out Test_Round)
                           return Boolean is
    file: FILE_TYPE;
    InputMarker: String := "lanes as 64-bit words:";
    octets: Positive := 8;
    RoundNo: Round_Index;
  begin
    -- try to open the input file
    begin
      open(file, In_File, filename);
    exception
      when others =>
        Put_Line(Standard_Error,
                 "Can not open the file '" & filename & "'. Does it exist?");
        return False;
    end;

  -- find & read input state first
    RoundNo := -1;
    loop
      declare
        Line: String := Get_Line(file);
      begin
        --check if this is test data of any known kind
        if index(Line, InputMarker, 1) > 0 then
          T(0)(None) := read_state(file, octets);
          print_state(T(0)(None), "Read Input State");
        elsif index(Line, "Round ", 1) > 0 then
          RoundNo := RoundNo +1;
        elsif index(Line, "theta", 1) > 0 then
          T(RoundNo)(Theta) := read_state(file, octets);
          if (RoundNo > 0) then
            T(RoundNo)(None) := T(RoundNo-1)(Iota);  -- previous state as input
          end if;
        elsif index(Line, "rho", 1) > 0 then
          T(RoundNo)(Rho) := read_state(file, octets);
        elsif index(Line, "pi", 1) > 0 then
          T(RoundNo)(Pi) := read_state(file, octets);
        elsif index(Line, "chi", 1) > 0 then
          T(RoundNo)(Chi) := read_state(file, octets);
        elsif index(Line, "iota", 1) > 0 then
          T(RoundNo)(Iota) := read_state(file, octets);
        end if;
        exit when End_Of_File(file);
      end;
    end loop;
    Close(file);
    return True;
  end read_from_file;

  -- performs one single round of Keccak, step by step
  -- each permutation is tested separately
  -- test fails with exception raised at first output not matching expected
  procedure test_one_round(T: Test_Vector; Round: Round_Index) is
    Input: State;
    Expected: State;
    Output: State;
    Test_One_Round_Fail: Exception;
  begin
    Input := T(None);
    for I in Keccak_Perms range Theta..Iota loop
      Expected := T(I);
      case I is
        when Theta => Output := SMG_Keccak.Theta(Input);
        when Rho   => Output := SMG_Keccak.Rho(Input);
        when Pi    => Output := SMG_Keccak.Pi(Input);
        when Chi => Output := SMG_Keccak.Chi(Input);
        when Iota => Output := SMG_Keccak.Iota(RC(Round), Input);
        when others => null;
      end case;

      if (Output /= Expected) then
        print_state(Output, "----------real output-------");
        print_state(Expected, "----------expected output--------");
        raise Test_One_Round_Fail;
      else
        Put_Line("PASSED: " & Keccak_Perms'Image(I));
      end if;
      -- get ready for next permutation
      Input := Expected;
    end loop;
  end test_one_round;
  -- end of helper methods

  --variables
  T: Test_Round;
begin
  Put_Line("-----Testing with zero state as input------");
  if (not read_from_file("testvectorszero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

  Put_Line("-----Testing with non-zero state as input------");
  if (not read_from_file("testvectorsnonzero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

end SMG_Keccak.Test;
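The core of read_state above is the parsing of one text row into five lanes. As an illustration of just that step, here is a hedged C sketch (function names are hypothetical) that expects the same layout: five words of exactly 16 hex digits each, separated by single spaces. Unlike the Ada version, it reports malformed input through its return value instead of raising an exception.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse one row of a state: five 16-digit hex words separated by one space.
   Returns 1 on success and 0 on malformed input. */
int parse_state_row(const char *line, uint64_t row[5]) {
    assert(line != NULL && row != NULL);
    const char *p = line;
    for (int x = 0; x < 5; x++) {
        uint64_t w = 0;
        for (int i = 0; i < 16; i++) {
            int d = hex_val(p[i]);
            if (d < 0) return 0;  /* also catches a line that ends early */
            w = (w << 4) | (uint64_t)d;
        }
        row[x] = w;
        p += 16;
        if (x < 4) {
            if (*p != ' ') return 0;  /* exactly one space to skip */
            p++;
        }
    }
    return 1;
}
```

Five such rows, read for Y from 0 to 4, fill one full state in the order the test vectors list it.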

The .gpr file below builds the test suite, which can then simply be executed:

 -- Tests for SMG_Keccak (part of EuCrypt)
 -- S.MG, 2018


project SMG_Keccak_Test is
  for Source_Dirs use (".", "../");
  for Object_Dir use "obj";
  for Exec_Dir use ".";

  for Main use ("smg_keccak-test.adb");
end SMG_Keccak_Test;

The .vpatch and its signature for this chapter are made quite on purpose to have only the genesis of EuCrypt as ascendant. This reflects the fact that smg_keccak itself does not depend on mpi or smg_rsa. It also allows any users of smg_keccak to potentially take just the smg_keccak tree if that’s all they need out of EuCrypt. So the next chapters will build on this one and essentially further develop the smg_keccak branch of the big EuCrypt tree. When the whole smg_keccak is ready, a unifying .vpatch will bring everything back together into a common trunk, as everything gets used together as intended. Until then, here’s this first keccak .vpatch and its signature:

In the next chapter I’ll further expand this Keccak implementation so stay tuned!

  1. Bertoni, G., Daemen, J., Peeters, M. and Van Assche, G., 2011. The Keccak Reference, Version 3.0.
  2. As the man says: holly shit the original keccak www is gone.
  3. Why exactly is this so? Try and tell yourself why. From where I am, I can only really see a whole army of underlings (not even employees for they do this dirty work unpaid even, for the saddest part of it) at the Ministry of Truth busily at work, what is there more to say about it.
  4. Bertoni, G., Daemen, J., Peeters, M. and Van Assche, G., 2011. The Keccak Reference, Version 3.0, p. 7.
  5. Of course one still keeps that in mind, but it doesn’t have to be at the forefront at all times since there’s no need to re-implement the check each time coordinates are used.

January 11, 2018

EuCrypt Chapter 5: C = M ^ e mod n

Filed under: EuCrypt — Diana Coman @ 4:21 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Having a true random number generator (trng) and on top of it a true random prime number generator (trpng) from previous chapters, I can now finally touch on RSA 1 itself: this chapter adds a way to generate RSA keys and to actually use them directly to encrypt/decrypt a chunk of octets as they are given 2. Future chapters will build further on this by adding for instance slicing of messages into adequate blocks, padding and hashing. For now though, I’ll focus strictly on RSA itself, meaning on obtaining the components of a working RSA key pair and then using those for the two main RSA operations of encryption and decryption. Specifically:

  • RSA public key: n, e, where n is called the modulus and e is called the public exponent. Most importantly, n is obtained as a product of 2 secret primes (usually called p and q) that are randomly chosen.
  • RSA private key: n, d, where n is the same one as above in the public key and d is essentially an inverse of e that depends not only on e itself but also on the hidden p and q.
  • encryption: C = M ^ e mod n, where C is the resulting encrypted message (cipher), M is the message in plain-text, e is the public exponent of an RSA key pair and n is the public modulus of the same key pair.
  • decryption: M = C ^ d mod n, where M, C and n are the very same as in the encryption formula above and d is the secret exponent of the same key pair as above (the inverse of e, essentially).
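The two formulas above can be tried out with a toy key, small enough to check by hand. The sketch below is plain C with 64-bit integers, nothing like the MPI-based code of EuCrypt, and the key (p = 61, q = 53, so n = 3233, with e = 17 and d = 2753) is the usual textbook example, far too small for any real use.

```c
#include <assert.h>
#include <stdint.h>

/* Computes base^exp mod n by square-and-multiply.
   n must fit in 32 bits so that every product fits in 64 bits. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n) {
    assert(n > 0 && n <= UINT32_MAX);
    uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * base) % n;  /* multiply step for a set bit */
        }
        base = (base * base) % n;          /* square step for every bit */
        exp >>= 1;
    }
    return result;
}
```

With this toy key, encrypting M = 65 is mod_pow(65, 17, 3233), which gives C = 2790, and decrypting is mod_pow(2790, 2753, 3233), which gives back 65.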

Note that mathematically encryption and decryption are really the very same operation: an exponentiation modulo n (even same n). However, in practice the two operations need to be treated quite separately because they involve knowledge of different parts and that is the whole point from a cryptographic perspective: encryption means using someone’s public key, hence the assumed knowledge is of e and n, nothing else; decryption however means using one’s own key, hence there is full knowledge of not only e and n but also the hidden parts, p and q. This difference of knowledge translates at implementation time in a different route taken to perform those same exponentiations modulo n: while encryption has to proceed directly as it is, decryption can take a faster route using the Chinese Remainder Theorem (CRT).
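To show why knowledge of p and q helps, here is a sketch of CRT decryption on the same kind of toy numbers (p = 53, q = 61, d = 2753, and u = 38 as the inverse of p mod q). It uses the textbook recombination with two half-size exponentiations; EuCrypt's own decryption code is not shown in this part of the chapter, so take this as an illustration of the idea and not as its implementation. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Computes base^exp mod n; n must fit in 32 bits so products fit in 64 bits. */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n) {
    uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Decrypts c with the secret parts d, p, q and u, where u * p = 1 (mod q)
   and p < q. Two exponentiations modulo the smaller p and q replace the
   single one modulo n = p * q. */
uint64_t crt_decrypt(uint64_t c, uint64_t d, uint64_t p, uint64_t q, uint64_t u) {
    assert(p > 1 && p < q && q <= UINT32_MAX / 2);
    uint64_t m1 = mod_pow(c, d % (p - 1), p);  /* the message mod p */
    uint64_t m2 = mod_pow(c, d % (q - 1), q);  /* the message mod q */
    uint64_t diff = (m2 + q - m1) % q;         /* (m2 - m1) mod q; m1 < p < q */
    uint64_t h = (u * diff) % q;
    return m1 + h * p;                         /* the message mod p * q */
}
```

For the ciphertext 2790 this returns 65, the same result as the direct exponentiation 2790 ^ 2753 mod 3233.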

It’s worth mentioning that the use of CRT at this stage is considered acceptable for Eulora’s needs. However, you need to make your own decision on this. Note that a truly non-leaking RSA requires effectively constant-time operations at all levels, so you’ll need to throw away more than CRT itself if that’s what you are aiming for – you’ll probably want to have a look at FFA in such case.

Generating an RSA key pair in EuCrypt consists of the following steps:

  • Use the true random prime number generator from Chapter 4 to obtain 2 random primes, p and q, of 2048 bits (256 octets) each. This value is precisely half of the intended key length, as per current TMSR RSA specification. Note that both primes will have top 2 bits set to 1 precisely to ensure that their product is indeed 4096 bits in length in all cases. See discussion in Chapter 4 for more details on the working of the generator itself.
  • Compare p and q and swap them if needed so that p is always less than q. Note that this is NOT a requirement of RSA itself but it is needed basically as helper for the use of the Chinese Remainder Theorem (CRT) to speed-up decryption, see next step and then the decryption part itself.
  • Calculate u so that u * p = 1 (mod q). This is effectively the inverse of p, mod q. Due to the previous step, p here is always less than q. As with the previous step, this is not required by RSA itself – it’s simply a value that is used at decryption to speed up the calculations by means of using CRT.
  • Calculate the modulus n of the key pair, as n = p * q.
  • Calculate the Euler totient 3 of n, phi = (p - 1) * (q - 1)
  • Choose another random prime between 3 and phi. This is e (3 < e < phi), the public exponent of the newly generated key pair. A few comments here:

    • The public exponent is not a fixed value and there really is no defensible reason why it should be fixed at the level of an RSA implementation meant to really serve the user 4 rather than anyone else. So EuCrypt does not fix the public exponent to any value. By contrast, GnuPG fixes the public exponent to 65537 as it really knows better than you what you need and even what you could possibly want, what want and what choice, why should you even think of any such things? You as user of EuCrypt can of course fix anything you want, public exponent included, IF you want to.
    • Mathematically, RSA requires e to be merely co-prime with (p-1)*(q-1), NOT necessarily prime in itself. However, EuCrypt chooses here the stronger constraint of e strictly prime, as per previous discussion of the TMSR spec.
    • The chosen size of e is the same as that of p and q. This means that the public exponent will likely be large but at the same time the corresponding private exponent will not become tiny either. For more details see the same previous discussion of the TMSR spec, as linked above.
    • Because e is obtained the same way as any other prime in EuCrypt, it will also always have top 2 bits and bottom 1 bit (so 3 bits in total) all set to 1. Combined with the fact that minimum length accepted by the prime generator is 1 octet (8 bits), it follows that, as long as the current generator of primes is used, the e in EuCrypt will be in fact at all times at least 193 (1+64+128) even if the length of e is lowered (with current length of 256 octets, that minimum is significantly higher). Nevertheless, this higher limit is imposed by the current generator, not by the RSA algorithm itself and for this reason the check here is more lenient as it aims to reflect requirements of RSA rather than other characteristics of its current surroundings – basically the generator can be changed at a later time without having to touch the RSA code itself.
    • The search for a suitable e is iterative: if a prime happens to fail the boundary checks (so it’s either too small or too large) then it is discarded and another prime is generated.
  • Calculate the private exponent, d, such that e * d = 1 mod phi (the inverse of e, mod phi).
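The arithmetic of these steps can be followed on a toy scale with the C sketch below. It makes several loud simplifications: the primes and the public exponent are passed in by the caller instead of coming from a true random prime generator, everything fits in 64-bit integers, and the names toy_key, toy_keypair and mod_inverse are hypothetical. Only the order of the steps and the formulas mirror the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Modular inverse via the extended Euclidean algorithm:
   returns x in 1..m-1 with (a * x) mod m == 1, or 0 if no inverse exists. */
int64_t mod_inverse(int64_t a, int64_t m) {
    int64_t old_r = a % m, r = m;
    int64_t old_s = 1, s = 0;
    while (r != 0) {
        int64_t quot = old_r / r;
        int64_t tmp = old_r - quot * r; old_r = r; r = tmp;
        tmp = old_s - quot * s;         old_s = s; s = tmp;
    }
    if (old_r != 1) {
        return 0;  /* a and m are not co-prime */
    }
    return ((old_s % m) + m) % m;
}

typedef struct { int64_t n, e, d, p, q, u; } toy_key;

/* Follows the listed steps for given primes p != q and a prime e. */
toy_key toy_keypair(int64_t p, int64_t q, int64_t e) {
    toy_key k;
    assert(p != q);
    if (p > q) {                       /* swap so that p < q */
        int64_t t = p; p = q; q = t;
    }
    k.p = p;
    k.q = q;
    k.u = mod_inverse(p, q);           /* u * p = 1 (mod q), used by CRT */
    k.n = p * q;                       /* modulus */
    int64_t phi = (p - 1) * (q - 1);   /* Euler totient of n */
    assert(e > 3 && e < phi);          /* boundary check on e */
    k.e = e;
    k.d = mod_inverse(e, phi);         /* e * d = 1 (mod phi) */
    return k;
}
```

Calling toy_keypair(61, 53, 17) swaps the primes to p = 53, q = 61 and yields n = 3233, u = 38 and d = 2753.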

The implementation of the above relies on two new structures that hold the various parts of a public and private RSA key, respectively. Those two new structures are added by the .vpatch for this chapter to eucrypt/smg_rsa/include/smg_rsa.h:

typedef struct {
    MPI n;      /* modulus */
    MPI e;      /* public exponent */
} RSA_public_key;

typedef struct {
    MPI n;      /* public modulus */
    MPI e;      /* public exponent */
    MPI d;      /* private exponent: e*d=1 mod phi */
    MPI p;      /* prime  p */
    MPI q;      /* prime  q */
    MPI u;      /* inverse of p mod q */
} RSA_secret_key;

In the same file, eucrypt/smg_rsa/include/smg_rsa.h, there is a further addition of the signature of the method that generates an RSA key pair:

/*********rsa.c*********/
/*
 * Generates a pair of public+private RSA keys using directly the entropy source
 * specified in eucrypt/smg_rsa/include/knobs.h
 *
 * ALL RSA keys are 4096 bits out of 2 2048 bits primes, as per TMSR spec.
 *
 * @param sk a fully-allocated structure to hold the generated keypair (secret
 * key structure holds all the elements anyway, public key is a subset of this)
 *
 * NB: this procedure does NOT allocate memory for components in sk!
 *     caller should ALLOCATE enough memory for all the MPIs in sk
 * Precondition:
 * MPIs in sk have known allocated memory for the nlimbs fitting their TMSR size
 */
void gen_keypair( RSA_secret_key *sk );

It’s worth noting what the comments shout at you there as loud as they can: as in previous code, I avoid separating allocation of memory from de-allocation and for this reason, it’s the caller’s responsibility to allocate memory for the things they want to use (in this case the MPIs that make up the key pair).
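The division of labour described here (caller allocates and frees, callee only checks and writes) can be sketched in plain C without the mpi lib. The toy_num type and the three functions below are hypothetical stand-ins invented for this illustration; they are not the MPI API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MPI: a buffer of limbs plus its known size. */
typedef struct {
    size_t    alloced;  /* number of limbs allocated */
    uint32_t *limbs;
} toy_num;

/* Caller side: allocation ... */
toy_num toy_alloc(size_t nlimbs) {
    toy_num t;
    t.limbs   = calloc(nlimbs, sizeof *t.limbs);
    t.alloced = (t.limbs != NULL) ? nlimbs : 0;
    return t;
}

/* ... and de-allocation live together, in the same place. */
void toy_free(toy_num *t) {
    free(t->limbs);
    t->limbs   = NULL;
    t->alloced = 0;
}

/* Callee side: never allocates. It checks its precondition on the memory
   it was handed and aborts if the caller did not provide enough. */
void toy_fill(toy_num *out, size_t nlimbs, uint32_t value) {
    assert(out != NULL);
    assert(out->alloced >= nlimbs);
    for (size_t i = 0; i < nlimbs; i++) {
        out->limbs[i] = value;
    }
}
```

A caller would run toy_alloc, then toy_fill, then toy_free, so that whoever reads the calling code sees both ends of the memory's life in one place.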

If you wonder why the gen_keypair method takes only one argument rather than two (i.e. only a secret key structure rather than a secret key AND a public key structure) scroll back and look again at the two structures: the public key is really but a subset of the secret key, so the secret key structure effectively holds the whole “pair” of keys by itself.

The actual body of the gen_keypair method lives in a new file, eucrypt/smg_rsa/rsa.c:

/*
 * An implementation of TMSR RSA
 * S.MG, 2018
 */

#include "smg_rsa.h"
#include <assert.h>

void gen_keypair( RSA_secret_key *sk ) {
  /* precondition: sk is not null */
  assert(sk != NULL);

  /* precondition: enough memory allocated, corresponding to key size */
  int noctets_pq = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs_pq = mpi_nlimb_hint_from_nbytes( noctets_pq);
  unsigned int nlimbs_n = mpi_nlimb_hint_from_nbytes( KEY_LENGTH_OCTETS);
  assert( mpi_get_alloced( sk->n) >= nlimbs_n);
  assert( mpi_get_alloced( sk->p) >= nlimbs_pq);
  assert( mpi_get_alloced( sk->q) >= nlimbs_pq);

  /* helper variables for calculating Euler's totient phi=(p-1)*(q-1) */
  MPI p_minus1 = mpi_alloc(nlimbs_pq);
  MPI q_minus1 = mpi_alloc(nlimbs_pq);
  MPI phi = mpi_alloc(nlimbs_n);

  /* generate 2 random primes, p and q*/
  /* gen_random_prime sets top 2 bits to 1 so p*q will have KEY_LENGTH bits */
  /* in the extremely unlikely case that p = q, discard and generate again */
  do {
    gen_random_prime( noctets_pq, sk->p);
    gen_random_prime( noctets_pq, sk->q);
  } while ( mpi_cmp( sk->p, sk->q) == 0);

  /* swap if needed, to ensure p < q for calculating u */
  if ( mpi_cmp( sk->p, sk->q) > 0)
    mpi_swap( sk->p, sk->q);

  /* calculate helper for Chinese Remainder Theorem:
      u = p ^ -1 ( mod q )
     this is used to speed-up decryption.
  */
  mpi_invm( sk->u, sk->p, sk->q);

  /* calculate modulus n = p * q */
  mpi_mul( sk->n, sk->p, sk->q);

  /* calculate Euler totient: phi = (p-1)*(q-1) */
  mpi_sub_ui( p_minus1, sk->p, 1);
  mpi_sub_ui( q_minus1, sk->q, 1);
  mpi_mul( phi, p_minus1, q_minus1);

  /* choose random prime e, public exponent, with 3 < e < phi */
  /* because e is prime, gcd(e, phi) is always 1 so no need to check it */
  do {
    gen_random_prime( noctets_pq, sk->e);
  } while ( (mpi_cmp_ui(sk->e, 3) <= 0) || (mpi_cmp(sk->e, phi) >= 0));

  /* calculate private exponent d, 1 < d < phi, where e * d = 1 mod phi */
  mpi_invm( sk->d, sk->e, phi);

  /*  tidy up: free locally allocated memory for helper variables */
  mpi_free(phi);
  mpi_free(p_minus1);
  mpi_free(q_minus1);
}

As usual, the method checks as well as it can its own preconditions that reduce in this case to some checks on the known size of allocated memory for some of the MPIs. However, these checks don’t change the fact that it still is the caller’s responsibility to allocate memory for *all* MPIs in the structure and to allocate *enough* such memory for each of them, too. The unfortunate souls who have some knowledge of the mpi lib might interject at this point with the observation that mpi methods tend to re-allocate memory whenever they deem it necessary without any concern whatsoever about whether it is their role to do so. This is true unfortunately and it’s not something I’m going to sink time into fixing at this moment. Still, it’s not in itself a valid excuse or otherwise a “reason” to fail to allocate the correct amount of memory from the start. Don’t add to existing garbage lest you’ll get eaten by rats later and don’t get lazy just because there’s nobody looking right now.

Encryption with a previously generated public RSA key is a simple exponentiation modulo n. However, as usual, digging through the parts of the mpi lib that are needed for this reveals that the mpi_powm method that does this exponentiation is not only rather gnarly but also unable to handle properly the corner case when the MPIs for storing the input (the message to encrypt so the MPI raised to power e) and the output (the result of the encryption, hence the result of the exponentiation) are the same. The “solution” of GnuPG on this is -again, as usual- to paper it over and force the RSA layer to take care to avoid this case by allocating memory and using basically a temporary copy of the original input MPI. As you can probably tell by now, I don’t like this approach at all because it neither solves the problem nor flags it as such, clearly 5. Instead, it avoids the problem but it does so in the wrong place (why should the encryption operation allocate new memory?) and with the unfortunate effect of hiding it from anything that builds on top of the RSA layer.

Since fixing this wobble of the mpi method promises to be more time-consuming than it’s currently worth, EuCrypt takes the second-best option: the encryption method clearly states as its precondition that input and output have to be two different MPIs; the reason for this is given too, so that the problem is documented clearly, not hidden; the method then checks this precondition and aborts execution if the check fails; whenever the precondition is met, the method simply does precisely what it promised (i.e. the exponentiation) and nothing more. The signature of this method is in eucrypt/smg_rsa/include/smg_rsa.h:

/****************
 * Public key operation. Encrypt input with pk and store result into output.
 *
 *  output = input^e mod n , where e,n are elements of pkey.
 * NB: caller should allocate *sufficient* memory for output to hold the result.
 * NB: NO checks are made on input!
 *
 * @param output MPI with enough allocated memory to hold result of encryption
 * @param input MPI containing content to encrypt; it *has to be* different from
 *  output!
 * @param pk the public key that will be used to encrypt input
 *
 * Precondition:
 *  output != input
 * Output and input have to be two distinct MPIs because of the sorry state of
 *  the underlying mpi lib that can't handle properly the case when those are the
 *  same.
 */
void public_rsa( MPI output, MPI input, RSA_public_key *pk );

The implementation of public_rsa is in eucrypt/smg_rsa/rsa.c and it has barely 3 lines in total, comment line included:

void public_rsa( MPI output, MPI input, RSA_public_key *pk ) {

  /* mpi_powm can't handle output and input being same */
  assert (output != input);

  mpi_powm( output, input, pk->e, pk->n );
}

Decrypting with a previously generated RSA private key is mathematically the same operation as encrypting. However, given the additional information available about the modulus (namely its factors, p and q), the implementation can take advantage of CRT to perform the same calculation faster (~4 times faster). This is implemented in EuCrypt in the method secret_rsa with signature in eucrypt/smg_rsa/include/smg_rsa.h:

/****************
 * Secret key operation. Decrypt input with sk and store result in output.
 *
 *  output = input^d mod n , where d, n are elements of skey.
 *
 * This implementation uses the Chinese Remainder Theorem (CRT):
 *
 *      out1   = input ^ (d mod (p-1)) mod p
 *      out2   = input ^ (d mod (q-1)) mod q
 *      h      = u * (out2 - out1) mod q
 *      output = out1 + h * p
 *
 * where out1, out2 and h are intermediate values, d,n,p,q,u are elements of
 *  skey. By using CRT, decryption is *faster*. Decide for yourself if this fits
 *  your needs though!
 * NB: it is the caller's responsibility to allocate memory for output!
 * NB: NO checks are made on input!
 *
 * @param output MPI with enough allocated memory to hold result of decryption
 * @param input MPI containing content to decrypt
 * @param sk the secret key that will be used to decrypt input
 */
void secret_rsa( MPI output, MPI input, RSA_secret_key *sk );

And the implementation of secret_rsa, in eucrypt/smg_rsa/rsa.c, with comments literally at every step:

void secret_rsa( MPI output, MPI input, RSA_secret_key *skey ) {
  /* at its simplest, this would be input ^ d (mod n), hence:
   *    mpi_powm( output, input, skey->d, skey->n );
   * for faster decryption though, we'll use CRT and Garner's algorithm, hence:
   *        u = p ^ (-1) (mod q) , already calculated and stored in skey
   *       dp = d mod (p-1)
   *       dq = d mod (q-1)
   *       m1 = input ^ dp (mod p)
   *       m2 = input ^ dq (mod q)
   *        h = u * (m2 - m1) mod q
   *   output = m1 + h * p
   * Note that same CRT speed up isn't available for encryption because at
   *  encryption time not enough information is available (only e and n are known).
   */
  /* allocate memory for all local, helper MPIs */
  MPI p_minus1 = mpi_alloc( mpi_get_nlimbs( skey->p) );
  MPI q_minus1 = mpi_alloc( mpi_get_nlimbs( skey->q) );
  int nlimbs   = mpi_get_nlimbs( skey->n ) + 1;
  MPI dp       = mpi_alloc( nlimbs );
  MPI dq       = mpi_alloc( nlimbs );
  MPI m1       = mpi_alloc( nlimbs );
  MPI m2       = mpi_alloc( nlimbs );
  MPI h        = mpi_alloc( nlimbs );

  /* p_minus1 = p - 1 */
  mpi_sub_ui( p_minus1, skey->p, 1 );

  /* dp = d mod (p - 1) aka remainder of d / (p - 1) */
  mpi_fdiv_r( dp, skey->d, p_minus1 );

  /* m1 = input ^ dp (mod p) */
  mpi_powm( m1, input, dp, skey->p );

  /* q_minus1 = q - 1 */
  mpi_sub_ui( q_minus1, skey->q, 1 );

  /* dq = d mod (q - 1) aka remainder of d / (q - 1) */
  mpi_fdiv_r( dq, skey->d, q_minus1 );

  /* m2 = input ^ dq (mod q) */
  mpi_powm( m2, input, dq, skey->q );

  /* h = u * ( m2 - m1 ) mod q */
  mpi_sub( h, m2, m1 );
  if ( mpi_is_neg( h ) )
    mpi_add ( h, h, skey->q );
  mpi_mulm( h, skey->u, h, skey->q );

  /* output = m1 + h * p */
  mpi_mul ( h, h, skey->p );
  mpi_add ( output, m1, h );

  /* tidy up */
  mpi_free ( p_minus1 );
  mpi_free ( q_minus1 );
  mpi_free ( dp );
  mpi_free ( dq );
  mpi_free ( m1 );
  mpi_free ( m2 );
  mpi_free ( h );

}
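The same CRT steps can be followed on numbers small enough to check by hand. The sketch below is an illustration only, on plain machine integers: toy_powm and toy_secret_rsa are hypothetical names, not EuCrypt functions, and they are only safe for moduli small enough that products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for mpi_powm: (base ^ exp) mod m by square-and-multiply.
 * Only valid while (m-1)*(m-1) fits in 64 bits. */
static uint64_t toy_powm(uint64_t base, uint64_t exp, uint64_t m) {
  uint64_t result = 1 % m;
  base %= m;
  while (exp > 0) {
    if (exp & 1)
      result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1;
  }
  return result;
}

/* Same steps as secret_rsa above, on machine integers.
 * Preconditions: p < q, both prime, u = p^(-1) mod q, input < p*q. */
static uint64_t toy_secret_rsa(uint64_t input, uint64_t d,
                               uint64_t p, uint64_t q, uint64_t u) {
  uint64_t dp, dq, m1, m2, h;

  assert(p < q);
  dp = d % (p - 1);                 /* dp = d mod (p-1) */
  dq = d % (q - 1);                 /* dq = d mod (q-1) */
  m1 = toy_powm(input, dp, p);      /* m1 = input ^ dp (mod p) */
  m2 = toy_powm(input, dq, q);      /* m2 = input ^ dq (mod q) */

  /* h = u * (m2 - m1) mod q; since m1 < p < q, adding q once before the
   * subtraction is enough to keep the difference non-negative */
  h = (u * ((m2 + q - m1) % q)) % q;

  return m1 + h * p;                /* output = m1 + h * p */
}
```

With the toy key n = 3233, e = 17, d = 2753, p = 53, q = 61, u = 38, decrypting toy_powm(m, 17, 3233) with toy_secret_rsa returns m for every m below n, same as the direct toy_powm(c, 2753, 3233). Note how p < q (the swap done at key generation) is what makes a single addition of q enough to bring m2 - m1 back to a non-negative value.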

And now that we have everything in place to actually generate RSA key pairs and proceed to encrypt and decrypt, a few new tests and timing methods are needed in eucrypt/smg_rsa/tests/tests.c:

/* Test encryption+decryption on noctets of random data, using sk
 * Output is written to file.
 */
void test_rsa_keys( RSA_secret_key *sk, unsigned int noctets, FILE *file ) {
  RSA_public_key pk;
  MPI test = mpi_alloc ( mpi_nlimb_hint_from_nbytes (noctets) );
  MPI out1 = mpi_alloc ( mpi_nlimb_hint_from_nbytes (noctets) );
  MPI out2 = mpi_alloc ( mpi_nlimb_hint_from_nbytes (noctets) );

  pk.n = mpi_copy(sk->n);
  pk.e = mpi_copy(sk->e);
  unsigned char *p;
  p = xmalloc(noctets);

  fprintf(file, "TEST encrypt/decrypt on %d octets of random data\n", noctets);
  fflush(file);
  if (get_random_octets( noctets, p) == noctets) {
    mpi_set_buffer( test, p, noctets, 0 );

    fprintf(file, "TEST data:\n");
    mpi_print(file, test, 1);
    fprintf(file, "\n");
    fflush(file);

    public_rsa( out1, test, &pk );
    secret_rsa( out2, out1, sk );

    fprintf(file, "ENCRYPTED with PUBLIC key data:\n");
    mpi_print(file, out1, 1);
    fprintf(file, "\n");
    fflush(file);

    fprintf(file, "DECRYPTED with SECRET key:\n");
    mpi_print(file, out2, 1);
    fprintf(file, "\n");
    fflush(file);

    if( mpi_cmp( test, out2 ) )
      fprintf(file, "FAILED: RSA operation: public(secret) failed\n");
    else
      fprintf(file, "PASSED: RSA operation: public(secret) passed\n");
    fflush(file);

    secret_rsa( out1, test, sk );
    public_rsa( out2, out1, &pk );
    if( mpi_cmp( test, out2 ) )
      fprintf(file, "FAILED: RSA operation: secret(public) failed\n");
    else
      fprintf(file, "PASSED: RSA operation: secret(public) passed\n");
  }
  else
    fprintf(file, "FAILED: not enough bits returned from entropy source\n");

  fflush(file);
  xfree(p);
  mpi_free( pk.n);
  mpi_free( pk.e);

  mpi_free( test );
  mpi_free( out1 );
  mpi_free( out2 );
}

void test_rsa( int nruns, FILE *fkeys, FILE *fout) {
  RSA_secret_key sk;
  int noctets = KEY_LENGTH_OCTETS;
  int noctets_pq = noctets / 2;
  int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  int nlimbs_pq = mpi_nlimb_hint_from_nbytes(noctets_pq);
  int i;

  sk.n = mpi_alloc(nlimbs);
  sk.e = mpi_alloc(nlimbs);
  sk.d = mpi_alloc(nlimbs);
  sk.p = mpi_alloc(nlimbs_pq);
  sk.q = mpi_alloc(nlimbs_pq);
  sk.u = mpi_alloc(nlimbs_pq);

  printf("TEST RSA key generation and use with %d runs\n", nruns);
  fflush(stdout);

  for (i = 0;i < nruns; i++) {
    gen_keypair(&sk);
    printf(".");
    fflush(stdout);

    mpi_print(fkeys, sk.n, 1);
    fwrite("\n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.e, 1);
    fwrite("\n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.d, 1);
    fwrite("\n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.p, 1);
    fwrite("\n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.q, 1);
    fwrite("\n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.u, 1);
    fwrite("\n", sizeof(char), 1, fkeys);

    test_rsa_keys(&sk, noctets_pq, fout);
    printf("*");
    fflush(stdout);
  }

  mpi_free(sk.n);
  mpi_free(sk.e);
  mpi_free(sk.d);
  mpi_free(sk.p);
  mpi_free(sk.q);
  mpi_free(sk.u);

}

void test_rsa_exp() {
  MPI msg = mpi_alloc(0);
  MPI expected = mpi_alloc(0);
  MPI result;

  RSA_public_key pk;
  pk.n = mpi_alloc(0);
  pk.e = mpi_alloc(0);

  printf("TEST verify of rsa exponentiation on input data: \n");

  mpi_fromstr(msg, "0x\
5B6A8A0ACF4F4DB3F82EAC2D20255E4DF3E4B7C799603210766F26EF87C8980E737579\
EC08E6505A51D19654C26D806BAF1B62F9C032E0B13D02AF99F7313BFCFD68DA46836E\
CA529D7360948550F982C6476C054A97FD01635AB44BFBDBE2A90BE06F7984AC8534C3\
8613747F340C18176E6D5F0C10246A2FCE3A668EACB6165C2052497CA2EE483F4FD8D0\
6A9911BD97E9B6720521D872BD08FF8DA11A1B8DB147F252E4E69AE6201D3B374B171D\
F445EF2BF509D468FD57CEB5840349B14C6E2AAA194D9531D238B85B8F0DD352D1E596\
71539B429849E5D965E438BF9EFFC338DF9AADF304C4130D5A05E006ED855F37A06242\
28097EF92F6E78CAE0CB97");

  mpi_fromstr(expected, "0x\
1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051300\
D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E18FC\
3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6AFF\
693DE58E18FF84395BE");
  result = mpi_alloc( mpi_get_nlimbs(expected) );

  mpi_fromstr(pk.n, "0x\
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694\
73C2212D5FD5EED17AA067FEC001D8E76EC901EDEDF960304F891BD3CAD7F9A335D1A2\
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293\
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062\
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7\
B99D9FB90CAB968813F8AD4F9A069C9639A74D70A659C69C29692567CE863B88E191CC\
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6\
07BB56D6A281C1A04CEFE1");

  mpi_fromstr( pk.e, "0x10001");

  mpi_print( stdout, msg, 1);
  printf("\n");

  public_rsa( result, msg, &pk);
  if ( mpi_cmp( result, expected) != 0 )
    printf( "FAILED\n");
  else
    printf( "PASSED\n");

  printf("Expected:\n");
  mpi_print( stdout, expected, 1);
  printf("\n");

  printf("Obtained:\n");
  mpi_print( stdout, result, 1);
  printf("\n");

  mpi_free( pk.n );
  mpi_free( pk.e );
  mpi_free( msg );
  mpi_free( expected );
  mpi_free( result );
}

void time_rsa_gen( int nruns ) {
  struct timespec tstart, tend;
  long int diff;
  int i;

  RSA_secret_key sk;
  int noctets = KEY_LENGTH_OCTETS;
  int noctets_pq = noctets / 2;
  int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  int nlimbs_pq = mpi_nlimb_hint_from_nbytes(noctets_pq);
  sk.n = mpi_alloc(nlimbs);
  sk.e = mpi_alloc(nlimbs);
  sk.d = mpi_alloc(nlimbs);
  sk.p = mpi_alloc(nlimbs_pq);
  sk.q = mpi_alloc(nlimbs_pq);
  sk.u = mpi_alloc(nlimbs_pq);

  clock_gettime(CLOCK_MONOTONIC, &tstart);
  for (i = 0;i < nruns; i++) {
    gen_keypair(&sk);
  }
  clock_gettime(CLOCK_MONOTONIC, &tend);

  diff = tend.tv_sec-tstart.tv_sec;
  printf("TOTAL: %ld seconds for generating %d key pairs\n", diff, nruns);
  printf("Average (%d runs): %f seconds per TMSR RSA key pair.\n",
        nruns, diff / (1.0*nruns));
  mpi_free(sk.n);
  mpi_free(sk.e);
  mpi_free(sk.d);
  mpi_free(sk.p);
  mpi_free(sk.q);
  mpi_free(sk.u);
}

The testing suite grows as usual much more than the code itself. To put those new tests to use, there are a few more cases in the main function in tests.c:

    case 6:
      fk = fopen("keys.asc", "a");
      if ( fk == NULL )
        err("Failed to open file keys.asc!");
      fout = fopen("check_keys.asc", "a");
      if ( fout == NULL ) {
        fclose(fk);
        err("Failed to open file check_keys.asc!");
      }
      test_rsa(nruns, fk, fout);
      fclose(fk);
      fclose(fout);
      break;
    case 7:
      test_rsa_exp();
      break;
    case 8:
      time_rsa_gen(nruns);
      break;
    default:
      printf("Current test ids:\n");
      printf("0 for timing entropy source\n");
      printf("1 for entropy output test\n");
      printf("2 for is_composite (Miller-Rabin) test\n");
      printf("3 for timing Miller-Rabin\n");
      printf("4 for random prime number generator test\n");
      printf("5 for timing random prime number generator\n");
      printf("6 for testing rsa key pair generation and use; \
writes to keys.asc and check_keys.asc\n");
      printf("7 for testing rsa exponentiation (fixed data)\n");
      printf("8 for timing rsa key pair generator\n");
  }

  return 0;
}

You are of course invited to write your own tests and run them to your satisfaction. The tests provided are meant as *minimal* tests, not by any means a full testing suite of any sort. Note that option 7 (test_rsa_exp) effectively performs the verification of asciilifeform's signature for his Chapter 6 of FFA (which you are warmly invited to read for that matter). Take it as a little exercise to change there the data so that you check instead my signature of the .vpatch for EuCrypt's chapters or any other RSA signatures you might have.

Some preliminary timings for this implementation of RSA key generation suggest that a key pair can take on average almost one hour to generate (54 minutes averaged time per key pair on a set of 30 generated pairs). This relatively long wait for a key pair is the unavoidable cost of using truly random primes as opposed to pseudo-random ones (since a non-suitable candidate is discarded rather than massaged into primality for instance) and of increasing the number of iterations that Miller-Rabin performs when checking whether a candidate number is indeed prime. Note that the timings here are not directly comparable in any case with the timings previously reported for a different version of the rsa code, due to differences in both the code itself (most significant: fewer Miller-Rabin iterations, fixed exponent) and in the choice of measurement method (most significant: the time reported here is overall execution time that is ~always more than the actual CPU time in any case).
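The difference between overall execution time and CPU time can be measured directly. Below is a small sketch, not part of EuCrypt's tests, that times the same piece of work on both CLOCK_MONOTONIC (wall clock) and CLOCK_PROCESS_CPUTIME_ID (CPU time of the process) and keeps the nanoseconds too; time_both, timespec_diff and spin are hypothetical names chosen for this illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std=c99 */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Difference end - start in seconds, nanoseconds included. */
static double timespec_diff(struct timespec start, struct timespec end) {
  return (double)(end.tv_sec - start.tv_sec)
       + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Runs work(arg) once and stores elapsed wall-clock and CPU seconds. */
static void time_both(void (*work)(void *), void *arg,
                      double *wall_seconds, double *cpu_seconds) {
  struct timespec w0, w1, c0, c1;

  assert(work != NULL && wall_seconds != NULL && cpu_seconds != NULL);

  clock_gettime(CLOCK_MONOTONIC, &w0);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
  work(arg);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
  clock_gettime(CLOCK_MONOTONIC, &w1);

  *wall_seconds = timespec_diff(w0, w1);
  *cpu_seconds  = timespec_diff(c0, c1);
}

/* Sample workload for trying it out: a busy loop of *arg additions. */
static void spin(void *arg) {
  volatile unsigned long sink = 0;
  unsigned long i, n = *(unsigned long *)arg;
  for (i = 0; i < n; i++)
    sink += i;
  (void)sink;
}
```

For work that mostly waits on the entropy source, one would expect the wall-clock figure to come out well above the CPU one, while for a pure busy loop such as spin the two should be close.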

The .vpatch for all of the above and its signature live from now on on my Reference Code Shelf with links here too, for your convenience:

In the next chapter I'll let the RSA folder alone for once (I'll get back to it later) and move on to the chosen hashing function, namely keccak. So for the next chapter, dust your keccak references and have your Ada manuals at the ready, as keccak implementation will be in Ada, no more C for a while. Hooray!

  1. Rivest, Shamir and Adleman, A Method for Obtaining Digital Signatures and Public-Key Cryptosystems
  2. It took only 4 chapters or otherwise a whole month not to mention the work done prior to that, just to get to actually generate some RSA keys but hey, it could have been worse! By worse I mean I could have taken the “easy” route, naively using GnuPG itself, trusting it because everybody uses it and it is open source and it’s been around for so long and it says it does RSA and Miller-Rabin and random numbers and saying it is surely now just as good as doing it only a whole lot easier, isn’t it?
  3. This gives the number of co-primes between 0 and the given number. As it is a multiplicative function, the totient phi(p*q) = phi(p)*phi(q) and since p and q are chosen to be prime numbers, they will each have all smaller positive numbers as their co-primes. Consequently, phi(n) = phi(p*q) = (p-1)*(q-1).
  4. Note that the user is the one who reads and understands what they are running, as a bare minimum requirement. Most pointedly, a monkey pressing some/any/all keys is not user of anything, it’s still a monkey and nothing else.
  5. These 2 options are the only acceptable ways of dealing with a problem really: ideally you solve it, meaning you address its root cause and from there up; if the ideal option is not chosen because of any less-than-ideal but very real world constraints of any kind, the honest approach is to flag the issue, making it, if at all possible, even more visible than it was, certainly not less visible!

January 4, 2018

EuCrypt Chapter 4: Random Prime Number Generator

Filed under: EuCrypt — Diana Coman @ 10:43 pm

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

2018 starts well for EuCrypt as I finally get to put to some use all the building blocks of years past and then simply expand the library further. The aim of this chapter is to provide a random prime number generator (rpng) that actually fits the name. As usual, the code is commented and choices made are discussed in more detail here. Following the sane TMSR design principle of “fits in head“, the algorithm is as simple as possible: keep getting random odd numbers of the required length until a prime one is chanced upon. The signature of the procedure doing this can be found in eucrypt/smg_rsa/include/smg_rsa.h:

/**
 * Generates a random number that has passed the Miller-Rabin test for primality (see function is_composite above).
 * NB: top 2 bits and bottom bit are ALWAYS 1! (i.e. a mask 110....01 is applied to the random bits)
 *    a prime of 8*noctets long will have only (8*noctets-3) bits that are randomly chosen!
 * NB: this method does NOT allocate space for the requested MPI; it is the caller's responsibility to allocate it!
 * The source of randomness is ENTROPY_SOURCE in eucrypt/smg_rsa/include/knobs.h
 * The number of witnesses checked by Miller-Rabin is M_R_ITERATIONS in eucrypt/smg_rsa/include/knobs.h
 * Preconditions:
 *      noctets > 0 (at least one octet!)
 *      output has known allocated memory for at least nlimbs(noctets)
 *      successful access to the entropy source
 * @param noctets the length of the desired prime number, in octets
 * @param output the result: an MPI with sufficient memory allocated for a number that is noctets long
 */
void gen_random_prime( unsigned int noctets, MPI output);

Before even going to the implementation, note here two design choices:

  1. The length of the desired random prime number is expected by gen_random_prime in *octets* (bytes if you must but the point is that this is 8 bits and nothing else) and not in bits! The conversion between bits and octets is of course straightforward but the choice is made here to use octets because this reflects more clearly the underlying reality: while individual bits can certainly be altered or otherwise worked with, the data type in use at all further levels down (unsigned char) is still an octet long rather than a bit long. Obviously, nothing stops you, the user, from making a different choice and changing this to use number of bits if that fits your needs better but do make sure you follow through all the code that is used and make any other changes that might be required to get exactly what you want.
  2. The result is returned via the second parameter to gen_random_prime, an MPI called output. Moreover, gen_random_prime does *not* allocate memory for this MPI! It is instead the caller’s duty to make sure that the MPI they provide as argument when calling gen_random_prime has indeed enough memory allocated for the desired length. This makes memory allocation and de-allocation neatly happen in the same place (i.e. the caller) rather than splitting them needlessly and messily across function calls. While there is no way around the fact that a pointer has to be passed on here from a piece of code to another, there really is no need to split the responsibility of memory allocation and de-allocation: the one who allocates memory should de-allocate it as well and that is the caller here. Note that gen_random_prime will do however as much as it can to ensure that it doesn’t proceed working with unallocated memory. Concretely, gen_random_prime checks as a precondition the known allocated memory for the output MPI and if the check fails, the execution is aborted, *not* papered over with some “fix” where it doesn’t belong. Failure as the best sort of feedback that can’t be ignored is really the most appropriate response to a mess-up so that’s what gen_random_prime aims for. Don’t rely on it to do more than it says. Just do your own share of work and call it properly with correctly allocated MPI and it will then gladly do its job, simple as that.
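The second choice can be illustrated with a toy that has nothing to do with MPIs. This is only a sketch under assumptions: toy_num, toy_alloc, toy_free and toy_fill are hypothetical names, and the callee here just fills octets with a fixed value where gen_random_prime would look for a prime:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an MPI: a buffer that knows its allocated size. */
typedef struct {
  size_t alloced;          /* octets allocated by the caller */
  size_t used;             /* octets currently holding a value */
  unsigned char *octets;
} toy_num;

/* Caller side: allocation ... */
static toy_num *toy_alloc(size_t noctets) {
  toy_num *t = malloc(sizeof *t);
  assert(t != NULL);
  t->octets = calloc(noctets ? noctets : 1, 1);
  assert(t->octets != NULL);
  t->alloced = noctets;
  t->used = 0;
  return t;
}

/* ... and de-allocation live in the same place. */
static void toy_free(toy_num *t) {
  free(t->octets);
  free(t);
}

/* Callee: never allocates; checks its preconditions and aborts if they fail. */
static void toy_fill(size_t noctets, unsigned char value, toy_num *output) {
  assert(noctets > 0);                                   /* at least one octet */
  assert(output != NULL && output->alloced >= noctets);  /* enough memory */
  memset(output->octets, value, noctets);
  output->used = noctets;
}
```

The caller then does toy_alloc(256), toy_fill(256, 0xAB, t) and toy_free(t), all in one place; calling toy_fill with a buffer that is too small aborts on the assert rather than quietly growing it.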

While gen_random_prime is by design flexible to accommodate any number of octets you might wish to give as length of your desired prime MPI, TMSR RSA specification is rather strict on this: RSA keys are meant to be exactly 4096 bits (512 octets) long and made of 2 distinct 2048 bits (256 octets) long primes. Consequently, the RSA key length is defined in EuCrypt as a *constant* and *not* as a knob. You’ll find it therefore in the same file as the above, eucrypt/smg_rsa/include/smg_rsa.h:

/*
 * These are constants as per TMSR RSA specification, NOT knobs!
 * TMSR key length is 4096 bits (512 octets); this means 2 primes of 2048 bits (256 octets) each.
 * NB: if you choose here an odd key length in octets you might end up with a smaller actual key, read the code.
 */
static const int KEY_LENGTH_OCTETS = 512;

The actual implementation of gen_random_prime is in eucrypt/smg_rsa/primegen.c and it is quite straightforward:

void gen_random_prime( unsigned int noctets, MPI output )
{
  /* precondition: at least one octet long */
  assert(noctets > 0);

  /* precondition: enough memory allocated for the limbs corresponding to noctets */
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  assert(mpi_get_alloced(output) >= nlimbs);

  /* precondition: access to the entropy source */
  int entropy_source = open_entropy_source(ENTROPY_SOURCE); /* source of random bits */
  assert(entropy_source >= 0);

  unsigned int nbits = 8*noctets;                           /* length of MPI in bits */

  /*
   * loop until a prime is found: get noctets of random bits, trim and apply 110...01 mask, check if prime
   */
  unsigned char *p = xmalloc( noctets );
  do {
    get_random_octets_from( noctets, p, entropy_source );
    mpi_set_buffer( output, p, noctets, 0); /* convert to MPI representation */
    mpi_set_highbit( output, nbits - 1 );   /* trim at required size and set top bit */
    mpi_set_bit( output, nbits - 2);          /* set second top bit */
    mpi_set_bit( output, 0 );               /* set bottom bit to ensure odd number */
  } while (is_composite(output, M_R_ITERATIONS, entropy_source));

  /* tidy up, a prime was found */
  xfree(p);
  close(entropy_source);
}

As expected, the first thing gen_random_prime does is to check its clearly stated preconditions: a minimal length of 1 octet for the requested number (there is no such thing as a number represented on 0 octets, that’s nonsense!); enough allocated memory at least as far as one can tell from within this code; accessible entropy source. Note that the mpi_get_alloced function is a new addition to mpi, since there was as far as I could tell no other function to expose this simple information: how much memory is allocated for this here MPI? Sure, there *was* direct access to this information via an MPI’s internal structure, meaning calling something of the sort mpi->alloced. However, using this directly is messy because it forces the calling code to *know* the internal structure of an MPI, which is a bad idea and if you can’t give at least 2 reasons why then you have no business writing any code yet, go and read, re-read and especially understand some basic books first 1. The new mpi_get_alloced function has its signature in eucrypt/mpi/include/mpi.h, which is the header included by any user of the mpi library:

int mpi_get_alloced (MPI a);  /* returns the allocated memory space for this MPI, in number of limbs */

The implementation of mpi_get_alloced is a tiny bit of code in eucrypt/mpi/mpiutil.c where it belongs together with the other utility functions for MPIs:

/*
 * Returns the allocated space for the given MPI, as number of limbs.
 */
int
mpi_get_alloced (MPI a)
{
  return a->alloced;
}

After all three preconditions of gen_random_prime have passed the checks, the next step is simply to loop through random numbers of the required size until a prime one is found. Note that each random number will have only 8*noctets-3 actually random bits. This is because the top 2 bits and the bottom bit are all fixed at 1. Since the goal is to find a large prime number, the bottom bit has to be 1 for basic mathematical reasons (as otherwise the number is even, hence divisible by 2). The top 2 bits are set at 1 however for length-related reasons: both those bits need to be 1 to ensure that in all cases the resulting RSA key is indeed of the desired 4096 bits length (although of those 4096 bits the top and bottom ones will always be 1).
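The length argument is easy to check: a k-bit number with its top 2 bits set is at least 3 * 2^(k-2), so the product of two such numbers is at least 9 * 2^(2k-4), which is more than 2^(2k-1) and therefore always takes the full 2k bits. With only the top bit set, the product can fall one bit short. The sketch below checks this exhaustively for one-octet values; apply_mask, bit_length and masked_products_full_length are hypothetical helpers for illustration, and the buffer is assumed to be big-endian (most significant octet first):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Apply the 110...01 mask to a big-endian buffer of noctets octets:
 * top two bits and bottom bit forced to 1, all other bits untouched. */
static void apply_mask(unsigned char *buf, size_t noctets) {
  assert(buf != NULL && noctets > 0);
  buf[0] |= 0xC0;              /* top two bits */
  buf[noctets - 1] |= 0x01;    /* bottom bit, so the number is odd */
}

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned bit_length(uint32_t v) {
  unsigned bits = 0;
  while (v != 0) {
    bits++;
    v >>= 1;
  }
  return bits;
}

/* Returns 1 if every product of two masked one-octet values is 16 bits long. */
static int masked_products_full_length(void) {
  unsigned a, b;
  for (a = 0; a < 256; a++) {
    for (b = 0; b < 256; b++) {
      unsigned char x = (unsigned char)a;
      unsigned char y = (unsigned char)b;
      apply_mask(&x, 1);
      apply_mask(&y, 1);
      if (bit_length((uint32_t)x * y) != 16)
        return 0;
    }
  }
  return 1;
}
```

By contrast 0x81 * 0x81 = 0x4101 needs only 15 bits, which is the short-key case that the second top bit rules out.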

Note that EuCrypt’s random prime number generator relies on two main things: a truly random source of bits (the excellent Fuckgoats from S.NSA) and the Miller-Rabin algorithm previously implemented and discussed in Chapter 3 of this series. This is in rather stark contrast with the GnuPG implementation that takes up a lot more space to do essentially the following: use a pseudo-random source of bits, try first all small primes between 3 and 4999 2, then do a Fermat test, then do the Miller-Rabin test with a fixed base 2 and only then do the Miller-Rabin test with 4 more pseudo-randomly chosen bases. Does that seem “more secure” to you or merely confusing? Because no, piling up more checks without concrete and clear reason is not a way to achieve security, quite on the contrary. And EuCrypt’s approach that seems “simple” by contrast to GnuPG’s is not an accident nor the result of laziness, quite on the contrary: it takes a lot of work behind the scenes to end up with only what is truly useful for the task at hand. Shocking, right? Read on.
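A tiny illustration of why fixed-base checks are no guarantee, as a sketch on machine integers only (toy_powm and mr_is_witness are hypothetical helpers written for this example, not EuCrypt's is_composite): 341 = 11 * 31 passes the Fermat test to base 2, and 2047 = 23 * 89 passes even a Miller-Rabin round with base 2, although base 3 exposes it at once.

```c
#include <assert.h>
#include <stdint.h>

/* (base ^ exp) mod m by square-and-multiply; valid while (m-1)^2 fits in 64 bits. */
static uint64_t toy_powm(uint64_t base, uint64_t exp, uint64_t m) {
  uint64_t result = 1 % m;
  base %= m;
  while (exp > 0) {
    if (exp & 1)
      result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1;
  }
  return result;
}

/* One Miller-Rabin round with base a on odd n > 2.
 * Returns 1 if a proves n composite (a is a witness), 0 otherwise.
 * Toy only: n has to be below 2^32 so that products fit in 64 bits. */
static int mr_is_witness(uint64_t a, uint64_t n) {
  uint64_t d, x;
  unsigned s = 0, i;

  assert(n > 2 && (n & 1) == 1 && n < (1ULL << 32));

  /* write n - 1 as d * 2^s with d odd */
  d = n - 1;
  while ((d & 1) == 0) {
    d >>= 1;
    s++;
  }

  x = toy_powm(a, d, n);
  if (x == 1 || x == n - 1)
    return 0;                  /* no evidence of compositeness */

  for (i = 1; i < s; i++) {
    x = (x * x) % n;
    if (x == n - 1)
      return 0;                /* no evidence of compositeness */
  }
  return 1;                    /* n is certainly composite */
}
```

Here toy_powm(2, 340, 341) gives 1 (Fermat fooled), mr_is_witness(2, 2047) gives 0 (Miller-Rabin with fixed base 2 fooled) and mr_is_witness(3, 2047) gives 1.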

Since GnuPG’s code certainly does not discuss this mess of “choices” in any way, the only thing left to do is to guess at some reasons: supposedly all the initial dancing about with small primes and Fermat would be a fast(-er, -ish?) way of discarding composites before reaching the more expensive Miller-Rabin test. However, there’s hardly any clear support for such reasoning 3, especially for a prime number generator that is specifically aiming to find *large* and *randomly generated* prime numbers: sure, 2 is a witness for many odd composite numbers (and presumably this is the reason for the fixed 2 witness attempt in GnuPG) but it is known also that there are *infinitely* many pseudoprimes to base 2! Moreover, there are also *infinitely* many n for which the smallest witness is greater than ln(n)^(1/(3*ln(ln(ln(n))))). And finally, Miller-Rabin, as discussed previously, has very good chances of finding a witness if it really looks for them randomly, seeing how at least three quarters of the numbers between 1 and n-1 are reliable witnesses for any odd composite n anyway. So if your candidate primes are really both large and randomly chosen then there is little point in introducing some pre-Miller-Rabin overhead that is anyway NOT going to work in all cases. Better increase the number of Miller-Rabin iterations and gain something useful 4 instead of wasting time to try first less effective approaches just for the sake of the fact that in *some* cases they might be enough.

To see the new rpng in action, I wrote a few tests as well. And in this process I ended up simplifying the configuration of the entropy source because the previous code turned out to be unnecessarily long and rather not as fit-in-head as it could be. So the new set_usb_attribs function in eucrypt/smg_rsa/truerandom.c is this:

int set_usb_attribs(int fd, int speed) {
  struct termios tty;
  if (tcgetattr(fd, &tty) < 0) {
    return -1;
  }

  /* input and output speeds */
  cfsetospeed(&tty, (speed_t)speed);
  cfsetispeed(&tty, (speed_t)speed);

  /* raw */
  tty.c_lflag &= ~(ECHO | ECHOE | ECHOK);
  tty.c_oflag &= ~OPOST;

  /* read at least one octet at a time; BLOCK until at least VMIN octets read */
  tty.c_cc[VMIN] = 1;
  tty.c_cc[VTIME] = 0;

  if (tcsetattr(fd, TCSAFLUSH, &tty) != 0)
    return -1;

  return 0;
}

As with all changes, this got at least a bit of testing too. Testing here meant using the new version of the code to get 390MB of random data on which I ran ent and dieharder. Ent reported:

Entropy = 7.999999 bits per byte.

Optimum compression would reduce the size
of this 390672000 byte file by 0 percent.

Chi square distribution for 390672000 samples is 277.91, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.5026 (127.5 = random).
Monte Carlo value for Pi is 3.141530471 (error 0.00 percent).
Serial correlation coefficient is -0.000012 (totally uncorrelated = 0.0).

Dieharder reported 3 “FAILED” tests, 5 “WEAK” tests and 106 “PASSED” tests. You can download and look at the full dieharder report and ent report for this data. Note however that you should run at least a similar test (preferably on more data, preferably several times, preferably periodically and on all your Fuckgoats) before even thinking of relying on your setup as a source of random bits.

And if the above was not enough, I also ended up writing quite a bit in… eucrypt/smg_rsa/tests/tests.c. Fancy that, I wrote more lines of code for the tests than for the actual code! Anyway, here are the new tests and the new structure of the main function in eucrypt/smg_rsa/tests/tests.c:

void time_mr(int nruns) {
  struct timespec tstart, tend;
  long int diff;
  int i;
  MPI prime;
  unsigned int noctets = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);

  int entropy_source = open_entropy_source(ENTROPY_SOURCE);
  if (entropy_source <= 0)
    err("can't open entropy source!");

  /* first generate a prime of half key length, to make sure M-R will run max number of iterations */
  printf("Generating a prime number of %d octets length for M-R timing test\n", noctets);
  prime = mpi_alloc(nlimbs);
  gen_random_prime(noctets, prime);

  printf("Running timing test for Miller-Rabin with %d repetitions and %d witnesses on prime number ", nruns, M_R_ITERATIONS);
  mpi_print(stdout, prime, 1);
  printf("\n");
  /* now do the actual runs and time it all */
  clock_gettime(CLOCK_MONOTONIC, &tstart);
  for (i=0; i < nruns; i++) {
    if (is_composite(prime, M_R_ITERATIONS, entropy_source))
      printf("FAIL");
    else printf(".");
    fflush(stdout);
  }
  clock_gettime(CLOCK_MONOTONIC, &tend);

  diff = tend.tv_sec-tstart.tv_sec;
  printf("\nTimings on prime number %d octets long, %d runs of MR with %d iterations (witnesses checked) each\n",
    noctets, nruns, M_R_ITERATIONS);
  printf("Total time: %ld seconds\nTime per MR run: %f seconds\nTime per MR iteration: %f seconds\n",
       diff, diff / (1.0*nruns), diff / (1.0*nruns * M_R_ITERATIONS));

  mpi_free(prime);
  close(entropy_source);
}
void test_rpng(int nruns) {
  unsigned int noctets = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  int entropy_source = open_entropy_source(ENTROPY_SOURCE);
  if (entropy_source <= 0)
    err("can't open entropy source!");

  MPI prime = mpi_alloc(nlimbs);
  int i;

  printf("TEST: random prime number generator with %d runs\n", nruns);
  for (i = 0;i < nruns; i++) {
    gen_random_prime(noctets, prime);
    printf("Run %d: ", i+1);
    mpi_print(stdout, prime, 1);
    if (is_composite(prime, M_R_ITERATIONS, entropy_source))
      printf("  **FAIL**\n");
    else
      printf("  **PASS**\n");
  }

  mpi_free(prime);
  close(entropy_source);
}

void time_rpng(int nruns) {
  struct timespec tstart, tend;
  long int diff;

  unsigned int noctets = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);

  int entropy_source = open_entropy_source(ENTROPY_SOURCE);
  if (entropy_source <= 0)
    err("can't open entropy source!");
  MPI prime = mpi_alloc(nlimbs);
  int i;

  printf("TIMING: random prime number generator with %d runs\n", nruns);
  clock_gettime(CLOCK_MONOTONIC, &tstart);
  for (i = 0;i < nruns; i++) {
    gen_random_prime(noctets, prime);
  }
  clock_gettime(CLOCK_MONOTONIC, &tend);

  diff = tend.tv_sec-tstart.tv_sec;
  printf("TOTAL: %ld seconds\n", diff);
  printf("Average: %f seconds to generate one random prime of %d octets length\n", diff / (1.0*nruns), noctets);
  mpi_free(prime);
  close(entropy_source);
}

int main(int ac, char **av)
{
  int nruns;
  int id;
  if (ac<2) {
    printf("Usage: %s number_of_runs/octets [testID]\n", av[0]);
    return -1;
  }
  nruns = atoi(av[1]);

  if (ac < 3)
    id = -1;
  else
    id = atoi(av[2]);

  switch ( id ) {
    case 0:
      printf("Timing entropy source...\n");
      time_entropy_source(nruns, 4096);
      break;
    case 1:
      test_entropy_output(nruns, "entropy_source_output.txt");
      break;
    case 2:
      /* tests on miller-rabin */
      /* a few primes (decimal): 65537, 116447, 411949103, 20943302231 */
      test_is_composite(nruns, "0x10001", 0);
      test_is_composite(nruns, "0x1C6DF", 0);
      test_is_composite(nruns, "0x188DD82F", 0);
      test_is_composite(nruns, "0x4E0516E57", 0);
      /* a few mersenne primes (decimal): 2^13 - 1 = 8191, 2^17 - 1 = 131071, 2^31 - 1 = 2147483647 */
      test_is_composite(nruns, "0x1FFF", 0);
      test_is_composite(nruns, "0x1FFFF", 0);
      test_is_composite(nruns, "0x7FFFFFFF", 0);
      /* a few carmichael numbers, in decimal: 561, 60977817398996785 */
      test_is_composite(nruns, "0x231", 1);
      test_is_composite(nruns, "0xD8A300793EEF31", 1);
      /* an even number */
      test_is_composite(nruns, "0x15A9E672864B1E", 1);
      /* a phuctor-found non-prime public exponent: 170141183460469231731687303715884105731 */
      test_is_composite(nruns, "0x80000000000000000000000000000003", 1);
      break;
    case 3:
      time_mr(nruns);
      break;
    case 4:
      test_rpng(nruns);
      break;
    case 5:
      time_rpng(nruns);
      break;
    default:
      printf("Current test ids:\n");
      printf("0 for timing entropy source\n");
      printf("1 for entropy output test\n");
      printf("2 for is_composite (Miller-Rabin) test\n");
      printf("3 for timing Miller-Rabin\n");
      printf("4 for random prime number generator test\n");
      printf("5 for timing random prime number generator\n");
  }

  return 0;
}

There are 3 new tests in there: one for timing Miller-Rabin (id 3), one for the functioning of the rpng itself (id 4) and one for timing the rpng (id 5). The timing test for Miller-Rabin first generates (with the rpng described in this post) a prime number of 256 octets since this is the length that is most relevant to EuCrypt. Then a timer is started and the Miller-Rabin test is run repeatedly on this known-prime for the requested number of times. At the end, the total time as well as the average time per M-R run and per M-R iteration are reported. As the number is known to be prime, M-R will always run all its 16 iterations and therefore the test can actually calculate the average time per iteration. On 1000 runs, this test reported an average of 9.78 seconds per M-R run, which corresponds to 0.61 seconds per iteration of M-R.
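The timing tests above subtract only the tv_sec fields, so they report whole seconds. That is plenty for runs that last minutes or hours, but for shorter runs the nanosecond part matters too. What follows is a minimal sketch of how the two could be combined; the helper names elapsed_seconds and seconds_per_iteration are hypothetical and are not part of EuCrypt.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper, not part of EuCrypt:
 * elapsed time from start to end in seconds, including the nanosecond part. */
double elapsed_seconds(struct timespec start, struct timespec end) {
  double secs  = (double)(end.tv_sec - start.tv_sec);
  double nsecs = (double)(end.tv_nsec - start.tv_nsec); /* may be negative */
  return secs + nsecs / 1e9;
}

/* Hypothetical helper, not part of EuCrypt:
 * average time per M-R iteration, given the total time of all runs. */
double seconds_per_iteration(double total, int nruns, int iterations) {
  assert(nruns > 0 && iterations > 0);
  return total / ((double)nruns * iterations);
}
```

With the figures reported above, 1000 runs at 9.78 seconds each give a total of 9780 seconds, and seconds_per_iteration(9780.0, 1000, 16) works out to about 0.61 seconds.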

The test of rpng requests gen_random_prime to generate a random prime number of 256 octets and then it runs another time the Miller-Rabin test on it reporting fail or pass depending on the result. This is obviously a very basic test of gen_random_prime – it’s meant at this stage more as an example of use rather than anything else. Run it though and see for yourself or modify it/add to it as you see fit, as usual.

The timing test for the rpng starts a timer and then calls gen_random_prime as many times as requested, reporting at the end the total time as well as the resulting average time per prime. A relatively short test run obtained 40 random primes of 2048 bits each in 13274 seconds in total (3.7 hours) meaning on average 331.85 seconds per prime (~6 minutes).

Finally, the .vpatch itself and its corresponding signature (you’ll need all the previous patches from the EuCrypt series to be able to press to this one):

On a slightly different note for the end: this “simple” loop rpng of a chapter ended up close to 3000 words in total here on the blog and I didn’t even include all the references considered; chapter 3 just before this one was 3500 words; correcting MPI before that was a mere 2000 words; chapter 2 was 1500 words; the introduction to EuCrypt was close-to-but-not-quite 1000 words, as befits an introduction. At this rate I’m clearly writing here some brick of a book and most of it is certainly NOT code, but all the discussion around it. Shock and horror and all that. See you at the next chapter!

  1. The Art of Computer Programming by Donald Knuth is an absolute must.
  2. WHY 4999? Magic number!
  3. To be fair, there probably was not much reasoning really, just successive additions of code on top of existing code, always adding, rarely even considering how to prune the resulting mess. Those writing the code had no time to write short and “simple” code, so they wrote it long and messy. But they worked hard and had all the good intentions (those can be as long as you like really, they don’t cost anything), you know?
  4. as more M-R iterations lower at least the maximum probability of error, since p(error) < (1/4)^iterations for M-R.

December 28, 2017

EuCrypt Chapter 3: Miller-Rabin Implementation

Filed under: EuCrypt — Diana Coman @ 9:26 pm

~ This is part 4 of the EuCrypt series. Start with Introducing EuCrypt. ~

Primality testing 1 is a key part of any implementation of RSA 2 and therefore a key part of EuCrypt as well. At first glance, there is a wide choice of primality tests that one can use, from naive direct divisions in search of factors to prime number generators and statistical primality tests. However, many of those are not currently very practical for EuCrypt as they simply take too long to run on the sort of very large numbers that RSA has to use to avoid the simplest brute-force attacks. As a result, the choice narrows down considerably to include mainly probabilistic primality tests: Fermat, Solovay-Strassen and Miller-Rabin. Of those, Miller-Rabin is simply best currently: it has the lowest error probability upper bound among the three and it is at most as expensive computationally as the others. Arguably a deterministic, polynomial-time algorithm such as AKS 3 would be even better than Miller-Rabin, but unfortunately AKS is currently still too slow for EuCrypt’s needs. Consequently, the chosen algorithm for primality testing in EuCrypt is Miller-Rabin mainly because of a lack of a working better alternative.

If you think that “everyone uses Miller-Rabin anyway”, think again. While GnuPG and mostly everyone else using RSA is indeed likely to claim that they are also using Miller-Rabin for primality testing, you’d be well advised to check such claims very, very closely because claims are just cheap labels that stick to anything just the same. If you actually take the time to check those claims and then necessarily find yourself peeling down label after label trying to get to the actual thing that is there as opposed to what the labels claim it is, you might find that every new label further dilutes the original meaning. That’s how Koch in his GnuPG sticks the “random” label on pseudo-random at best, that’s how 4096 bit randomly-generated keys contain actually at most 4090 pseudo-randomly generated bits and so on until you might even find as I did last time that bits and parts of the implementation do nothing in fact. Don’t take my word for it either: go and check for yourself, it’s a very healthy habit that might save your very skin some day.

Despite being called a “primality test”, Miller-Rabin (like all the other probabilistic primality tests) is more of a compositeness test: the algorithm can prove that a number is composite, but it can not actually prove that a number is indeed prime. Essentially, the given number is suspected of the crime of being composite (as opposed to the desired prime) and witnesses for its compositeness are sought. If one single witness for compositeness is found, then the given number is indeed composite. However, if no witness is found, Miller-Rabin can only reach a relatively weaker conclusion, namely that the given number is likely to be prime. How likely? That depends to a significant degree on the choice of candidate witnesses: how many candidate witnesses the algorithm was asked to investigate and how it actually chose them.

In its search for witnesses, Miller-Rabin relies on the important fact that most numbers between 1 and n-1 are reliable witnesses for n, if n is indeed a composite number. More precisely, at most 1/4 of those numbers are strong liars for n, meaning that at most 1/4 of them will fail to reveal the compositeness of n, when investigated by Miller-Rabin. As a result, the more witnesses investigated, the lower the chances of a composite number to pass for a prime one. Assuming that witnesses are indeed chosen randomly 4, the algorithm’s error probability is at most (1/4)^t, where t is the number of witnesses investigated. Obviously, each additional witness adds to the cost of running the algorithm and for this reason EuCrypt exposes this as a knob 5 for you, the user, to set depending on your own needs. Use it and use it wisely!
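To see the "at most 1/4" claim at work on a number small enough to check by brute force, here is a sketch that counts the strong liars of a small odd composite. It is an illustration under stated assumptions, not EuCrypt code: it works on 32-bit integers instead of MPI so that all products fit in 64 bits, and the names powmod, is_witness and count_strong_liars are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* a^e mod n; n fits in 32 bits, so every product fits in 64 bits */
uint32_t powmod(uint32_t a, uint32_t e, uint32_t n) {
  uint64_t result = 1, base = a % n;
  while (e > 0) {
    if (e & 1)
      result = (result * base) % n;
    base = (base * base) % n;
    e >>= 1;
  }
  return (uint32_t)result;
}

/* Returns 1 if a is a witness to the compositeness of odd n > 3, 0 otherwise. */
int is_witness(uint32_t a, uint32_t n) {
  uint32_t r = n - 1;
  int s = 0, j;
  uint64_t y;

  assert(n > 3 && (n & 1));
  assert(a > 1 && a < n - 1);

  /* find r odd and s so that n - 1 = 2^s * r */
  while ((r & 1) == 0) { r >>= 1; s++; }

  y = powmod(a, r, n);
  if (y == 1 || y == n - 1)
    return 0;                   /* no evidence from this candidate */
  for (j = 1; j < s; j++) {
    y = (y * y) % n;
    if (y == n - 1) return 0;   /* no evidence from this candidate */
    if (y == 1) return 1;       /* non-trivial square root of 1 */
  }
  return 1;                     /* never reached n-1: n is composite */
}

/* Counts the candidates in [2, n-2] that fail to reveal n as composite. */
uint32_t count_strong_liars(uint32_t n) {
  uint32_t a, liars = 0;
  for (a = 2; a <= n - 2; a++)
    if (!is_witness(a, n))
      liars++;
  return liars;
}
```

For 561, the smallest Carmichael number, count_strong_liars(561) finds 8 liars among the 558 candidates in [2, 559], so under 2 percent rather than the worst-case 25 percent. The 1/4 figure is an upper bound, not a typical value.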

The updated eucrypt/smg_rsa/include/knobs.h:

#ifndef SMG_RSA_KNOBS_H
#define SMG_RSA_KNOBS_H

#define ENTROPY_SOURCE "/dev/ttyUSB0"

/*
 * This is the number of witnesses checked by the Miller-Rabin (MR) algorithm for each candidate prime number.
 * The value of M_R_ITERATIONS directly affects the outer bound of MR which is calculated as 4^(-M_R_ITERATIONS)
 * S.MG's choice of 16 here means an outer bound of 4^(-16) = 0.0000000002,
    which is currently considered sufficient for Eulora's needs.
    If your needs are different, change this knob accordingly.
 * NB: if you use this to make keys for some serious use, an outer bound of 1e-10 is really not nearly good enough
    and therefore you'll probably want to *increase* the value of this knob.
 */
#define M_R_ITERATIONS 16


#endif /*SMG_RSA_KNOBS_H*/

The default EuCrypt value for M_R_ITERATIONS is 16 and that means 16 randomly chosen candidate witnesses that are checked. By contrast, GnuPG 1.4.10 at first glance appears to check 5 candidate witnesses (as per cipher/primegen.c call is_prime(ptest, 5, &count2)) and at a deeper investigation it turns out that it checks 1 fixed witness (magic number 2, because why not) and 4 pseudo-randomly chosen ones at best. The label that was 5 but acted more like 4 and the parameter that didn’t quite stand for what you’d expect, isn’t that precisely the sort of thing you want in your cryptographic tool? No? Then stop using code from the swamps, start using signed code and in any case always read the darned code before you use it because otherwise that’s exactly what you will get, each and every time: something other than what it seems, something continuously and rather invisibly to you drifting further away from what you need.
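To put numbers on what the knob buys, the bound (1/4)^nwitnesses can be computed directly. The following is a small sketch with a hypothetical function name (mr_error_bound is not part of EuCrypt); it assumes the witnesses are chosen at random, since otherwise the bound does not apply as stated.

```c
#include <assert.h>

/* Hypothetical helper, not part of EuCrypt:
 * upper bound (1/4)^nwitnesses on the probability that a composite number
 * passes Miller-Rabin, assuming randomly chosen witnesses. */
double mr_error_bound(int nwitnesses) {
  double bound = 1.0;
  int i;
  assert(nwitnesses >= 0);
  for (i = 0; i < nwitnesses; i++)
    bound *= 0.25;   /* each additional witness divides the bound by 4 */
  return bound;
}
```

For the default of 16 witnesses this gives 2^(-32), roughly 2.3e-10, while 5 witnesses give only 2^(-10), roughly 9.8e-4.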

Leaving aside GnuPG for now, let’s dive straight in and implement Miller-Rabin using the MPI functions as if they actually worked well 6. The algorithm is quite straightforward and the code aims to be as short and clear as possible, with comments to help you follow along. The function is called is_composite, to reflect the fact that Miller-Rabin really checks for compositeness, regardless of the fact that we might prefer it to be otherwise. The n parameter is the actual large number (hence, stored as an MPI) that is suspected of being composite. The nwitnesses parameter is the number of randomly chosen witnesses to check (this is called “security parameter” in some reference books, most notably in Handbook of Applied Cryptography by Menezes, van Oorschot and Vanstone, 1997). You can also think of this nwitnesses as “number of iterations” because each iteration is effectively the check of one candidate witness. Finally, the third parameter, entropy_source, is the handle of an already opened and properly configured source of true random bits (see Chapter 2 for how this is set up in EuCrypt). First, the added function signature in eucrypt/smg_rsa/include/smg_rsa.h:

/*********primegen.c*********/

/*
 * This is an implementation of the Miller-Rabin probabilistic primality test:
 *   checking the specified number of randomly-chosen candidate witnesses
 *    (i.e. with an outer bound of (1/4)^nwitnesses).
 * NB: a 1 result from this test means that the given n is indeed composite (non-prime)
    but a 0 result does not fully guarantee that n is prime!
    If this doesn't make sense to you, read more on probabilistic primality tests.
 * @param n the candidate prime number;
    the function will investigate whether this number is composite or *likely* to be prime.
    How likely? It depends on the number of witnesses checked, see next parameter.
 * @param nwitnesses this is the number of randomly chosen candidate witnesses to the compositeness of n
      that will be checked; the outer bound of the algorithm depends on this.
 * @param entropy_source the source of entropy (ready to read from) that will be used
    to choose candidate witnesses to the compositeness of n.
* @return 1 if at least one witness to the compositeness of n has been found
      (i.e. n is certainly composite);
      0 if no witness to the compositeness of n was found (i.e. it is likely that n is prime)
 * NB: the probability that n is *not* prime although this function returned 0 is
      less than (1/4)^nwitnesses, but it is NOT zero.
 */
int is_composite( MPI n, int nwitnesses, int entropy_source);

And the corresponding implementation, in a new file eucrypt/smg_rsa/primegen.c :

/* primegen.c - prime number generation and checks
 *
 * S.MG, 2017
 *
 */

#include <stdlib.h>
#include <unistd.h>
#include <assert.h>

#include "smg_rsa.h"

/****************
 * is_composite
 * Returns 1 if it finds evidence that n is composite and 0 otherwise.
 * NB: this is a probabilistic test and its strength is directly linked to:
 *  - the number of witnesses AND
 *  - the quality of the entropy source given as arguments!
 */

int is_composite( MPI n, int nwitnesses, int entropy_source) {
  int evidence = 0;
  int nlimbs = mpi_get_nlimbs(n);       /* see MPI representation   */
  unsigned int nbits = mpi_get_nbits(n);        /* used bits        */
  unsigned int noctets = (nbits + 7) / 8; /* source works on octets */
  MPI nminus1 = mpi_alloc(nlimbs);      /* n-1 value is used a LOT  */
  MPI mpi2 = mpi_alloc_set_ui(2);         /* 2 as MPI               */
  MPI a = mpi_alloc(nlimbs);      /* candidate witness              */
  MPI y = mpi_alloc(nlimbs);      /* intermediate values            */
  MPI r = mpi_alloc(nlimbs);      /* n = 1 + 2^s * r                */
  int s;                          /* n = 1 + 2^s * r                */
  int j;                          /* counter for loops              */
  int nread;              /* number of random octets actually read  */

  /* precondition: n > 3 */
  assert( nbits > 2 );

  /* nminus1 = n - 1 as MPI                                         */
  mpi_sub_ui( nminus1, n, 1);

  /*
   * find r odd and s so that n = 1 + 2^s * r
   * n-1 = 2^s * r
   * s is given by the number of trailing zeros of binary n-1
   * r is then obtained as (n-1) / (2^s)
   */
  s = mpi_trailing_zeros( nminus1 );
  mpi_tdiv_q_2exp(r, nminus1, s);

  /*
   * Investigate randomly chosen candidate witnesses.
   * Stop when either:
      * the specified number of witnesses (nwitnesses) have
        been investigated OR
      * a witness for compositeness of n was found
   */
  while (nwitnesses > 0 && evidence == 0) {
    unsigned char *p = xmalloc(noctets);
    do {
      nread = get_random_octets_from(noctets, p, entropy_source);
    } while (nread != noctets);

    mpi_set_buffer(a, p, noctets, 0);
    /* ensure that a < n-1 by making a maximum nbits-1 long:
        * clear all bits above nbits-2 in a
        * keep value of bit nbits-2 in a as it was
    */
    if (mpi_test_bit(a, nbits - 2))
      mpi_set_highbit(a, nbits - 2);
    else
      mpi_clear_highbit(a, nbits - 2);

    /* ensure that 1 < a < n-1; if not, try another random number
     * NB: true random means a CAN end up as 0 or 1 here.
     */

    if (mpi_cmp(a, nminus1) < 0 && mpi_cmp_ui(a, 1) > 0) {
      /* calculate y = a^r mod n */
      mpi_powm(y, a, r, n);
      if (mpi_cmp_ui(y, 1) && mpi_cmp(y, nminus1)) {
        j = 1;
        while ( (j < s) && mpi_cmp(y, nminus1) && (evidence == 0) ) {
          /* calculate y = y^2 mod n */
          mpi_powm(y, y, mpi2, n);
          if (mpi_cmp_ui(y, 1) == 0)
            evidence = 1;
          j = j + 1;
        } /* end while */
        if (mpi_cmp(y, nminus1))
          evidence = 1;
      } /* end if y != 1 and y != n-1 */
      nwitnesses = nwitnesses - 1;
    } /* end if 1 < a < n-1 */
    xfree(p);
  } /* end while for investigating candidate witnesses */

  mpi_free( nminus1 );
  mpi_free( mpi2 );
  mpi_free( a );
  mpi_free( y );
  mpi_free( r );

  return evidence;
}

The variable evidence is initially 0 as Miller-Rabin does not yet have any evidence about n being composite. When and if evidence of compositeness is found for n, this variable will get updated to 1. If the whole algorithm finishes without updating this variable, it means that n is probably prime, with a maximum error probability of (1/4)^nwitnesses, as previously discussed. In any case, this variable holds the result of the Miller-Rabin test at any moment and its value is the one returned when the test finishes.

The nlimbs and nbits are basically measures of how long the MPI n actually is and they are initialised with values returned by the corresponding MPI functions. The nbits value is then converted to number of octets (in noctets) for the very simple reason that the source of randomness in EuCrypt anyways reads at the moment full octets rather than individual bits. This is of course a matter of choice as you could change the setting of the source to read bit by bit, but I can’t quite see at the moment any significant advantage to that.

Having obtained this basic length-information on n, the function then goes on to declare and allocate memory for a set of MPI variables that it will need for the Miller-Rabin algorithm itself. Note that there is *no* use of the so-called “secure” memory thing from MPI for the unfortunate reason that the existing implementation of secure is very much an empty label: all it does is to set a flag so that theoretically the memory is not swapped, but there is no guarantee of either that or of the more useful fact that nobody else can read that memory. So given that there is in fact no secure memory implementation no matter how much it would be useful if there was one, EuCrypt takes instead the honest and practical approach of making it clear that it uses plain memory and nothing else. No label if no actual matching object to stick it on, as simple as that.

Once the needed variables are declared and initialised when appropriate, a precondition is checked: assert( nbits > 2); What’s this? It’s a sanity check essentially because Miller-Rabin is meant for checking large integers, not tiny ones. Moreover, due to the insanity of the underlying MPI which considers in its infinite stupidity that 0 for instance is represented on 0 bits, tiny values of less than 4 (hence, represented on fewer than 3 bits) will… block the whole thing. Let me point out for now just the very simple fact that the algorithm uses nbits-1 and nbits-2 meaning that nbits had better be at least 3 or otherwise it ends up trying to work with numbers represented on 0 bits and other such nonsense. So instead of risking working with nonsense, EuCrypt uses this assert call to abort the whole thing rather than propagate the nonsense even one instruction further.

Oh, if you wonder by any chance whether GnuPG bothers to even consider such corner cases, that’s good for you. I’m sure it’s all right if they don’t because such cases “should never happen” and “nobody calls Miller-Rabin on small numbers” and all those wonderful castles of trust built on nothing but air. Come to think about it, I even enjoy blowing up such air-supported castles and what not, they make a most-satisfying poufffff! Do you enjoy living in them? POUFFFFF!

The rest of the function closely follows the Miller-Rabin algorithm and the comments in the code hopefully make it easier to understand what each line does even when using the MPI calls. Note that candidate witnesses are chosen indeed randomly by using the specified source and making sure that the call truly returned the requested number of random octets. However, the actual number of random bits in any candidate random number will be by necessity nbits-1 because of the constraint that the random candidate should be less than n-1. This constraint is enforced by simply clearing any bits above nbits-2 (bit numbering starts at 0 so the last bit of n is on position nbits-1 rather than nbits) but keeping at the same time the value of the bit on position nbits-2. If that was 1 then mpi_set_highbit(a, nbits-2) is called. If that was 0 then mpi_clear_highbit(a, nbits-2) is called instead.

Note that those 2 mpi functions (mpi_set_highbit and mpi_clear_highbit) are supposedly similar in that they clear all bits above the position indicated and otherwise set to 1 or, respectively, to 0 the bit on the given position. However, the actual code reveals that they are not entirely similar: mpi_set_highbit allocates more memory if the position given is above the current size of the mpi; mpi_clear_highbit doesn’t allocate memory in this case. This means effectively that mpi_set_highbit returns an mpi of specified length, while mpi_clear_highbit returns always an mpi of length smaller than the specified bit position. At first glance this might seem to make some sense but the reality is worse than that: mpi_clear_highbit sometimes trims the leading 0 bits of a number and sometimes… doesn’t! Possibly for this reason of rather dubious behaviour, GnuPG’s Miller-Rabin avoids using mpi_clear_highbit entirely and dances around with double calls to mpi_set_highbit instead, on both branches of the if. Since I’m doing coding here rather than voodooing or naked-dances-in-the-code, I fixed instead mpi_clear_highbit to at least reliably trim leading 0s at all times and I’m using it where it is needed. The memory allocation issue is not relevant for my code here anyway because there is already enough memory allocated for the MPI at the beginning of this function.
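The intended effect of the two calls on a candidate witness can be modelled on plain 64-bit integers. This is only a sketch of the behaviour described above, with none of the memory allocation or trimming quirks of the real MPI functions; the names bit_length, set_highbit64, clear_highbit64, clamp_candidate and candidate_in_range are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent n; as in MPI, 0 ends up with 0 bits. */
unsigned bit_length(uint64_t n) {
  unsigned nbits = 0;
  while (n != 0) { nbits++; n >>= 1; }
  return nbits;
}

/* Model of the intended mpi_set_highbit: clear all bits above pos, set bit pos. */
uint64_t set_highbit64(uint64_t a, unsigned pos) {
  assert(pos < 63);
  return (a & ((UINT64_C(1) << pos) - 1)) | (UINT64_C(1) << pos);
}

/* Model of the intended mpi_clear_highbit: clear bit pos and all bits above it. */
uint64_t clear_highbit64(uint64_t a, unsigned pos) {
  assert(pos < 63);
  return a & ((UINT64_C(1) << pos) - 1);
}

/* Keep bits 0..nbits-2 of raw as they were and clear everything above,
 * so that the result is at most 2^(nbits-1) - 1. */
uint64_t clamp_candidate(uint64_t raw, uint64_t n) {
  unsigned nbits = bit_length(n);
  assert(nbits > 2);                    /* same precondition as is_composite */
  if ((raw >> (nbits - 2)) & 1)
    return set_highbit64(raw, nbits - 2);
  return clear_highbit64(raw, nbits - 2);
}

/* The clamped value can still be 0 or 1, so the range check remains. */
int candidate_in_range(uint64_t a, uint64_t n) {
  return a > 1 && a < n - 1;
}
```

For n = 561 (10 bits), a raw value of 0xFFFFFFFF is clamped to 0x1FF = 511, which is a usable candidate, while a raw value of 0x400 is clamped to 0 and has to be discarded in favour of a fresh random value.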

A core aspect of any Miller-Rabin implementation is the way in which candidate witnesses are chosen. EuCrypt chooses them entirely at random in the interval [2, n-2]. Different options were considered, most notably that of choosing a smaller, potentially more relevant interval, for instance by increasing the lower bound of this interval to at least 256 (2^8). Moreover, according to Eric Bach’s paper 7, an appropriate upper bound for witnesses would be 2*log(n)*log(n). However, Bach’s result relies on the Extended Riemann Hypothesis (ERH) which hasn’t been proved so far. So although those bounds are very appealing, EuCrypt sticks for now with using randomly chosen witnesses over the whole interval.
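To get a feel for how small Bach's bound is compared to the full interval, here is a rough sketch that evaluates 2*log(n)*log(n) from the bit length of n alone. It assumes the logarithm is the natural one and approximates ln(n) by nbits*ln(2), which slightly overestimates it; the name bach_bound is hypothetical.

```c
#include <assert.h>

/* ln(2), written out so that the sketch needs no math library */
const double LN2 = 0.6931471805599453;

/* Hypothetical helper, not part of EuCrypt:
 * approximates 2*ln(n)^2 for an n that is nbits long,
 * using ln(n) ~ nbits * ln(2) (a slight overestimate). */
double bach_bound(unsigned nbits) {
  double ln_n;
  assert(nbits > 0);
  ln_n = nbits * LN2;
  return 2.0 * ln_n * ln_n;
}
```

For a 2048-bit n this comes to roughly 4 million, so under ERH it would be enough to look for witnesses among the first few million integers instead of sampling an interval of about 2^2048 values.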

To sum it all up, the main changes brought by today’s vpatch are the following:

  1. Addition of primegen.c as part of smg_rsa. This includes the actual implementation of the Miller-Rabin algorithm with the choices discussed above. It uses the MPI implementation introduced in Chapter 1 and corrected along the way.
  2. Changes to MPI, namely further identification and, to the extent needed for current needs of EuCrypt, the killing of additional cockroaches that were identified in the MPI code as a result of investigating the functions that are needed for Miller-Rabin. Most notably: fixing mpi_clear_highbit so that it always trims leading 0 if any, as opposed to current functionality where it sometimes trims them and sometimes not; identifying the fact that MPI currently considers that 0 is represented on 0 bits. Note that this last issue is flagged up and made obvious through updated tests but it is not changed mainly because following all its potential implications through MPI at this stage would eat up so much time as to make it cheaper to just implement from scratch something solid at least. So for the time being at least, the decision made is to honestly admit and clearly highlight this existing fault of MPI.
  3. New tests for MPI highlighting the new issues uncovered.
  4. New tests for smg_rsa focusing on Miller-Rabin: the testing program allows the user to specify the number of runs for each data point and reports a test as failed if at least one run returned a result different from the one expected; test data for Miller-Rabin is chosen to include a few mersenne primes, carmichael composites and a phuctor-found non-prime public exponent of someone’s RSA key. Feel free to add to them anything you consider relevant and then run those tests!
  5. A small change to the function that fetches random octets from the FG: the errno is now set to 0 prior to every call that reads from the USB port and its value is then checked in addition to the return value of the read function. This change is needed in order to avoid the unfortunate case when no bits are read but there is apparently no underlying error. Previous version of my function would still abort in such case but current version would instead keep trying as this is a more useful approach for EuCrypt.

The updated get_random_octets_from function in eucrypt/smg_rsa/truerandom.c:

int get_random_octets_from(int noctets, unsigned char *out, int from) {

  int nread;
  int total = 0;

  while (total < noctets) {
    errno = 0;
    nread = read(from, out+total, noctets-total);
    //on interrupt received just try again
    if (nread == -1 && errno == EINTR)
      continue;
    //on error condition abort
    if (errno != 0 && (nread == -1 || nread == 0)) {
      printf("Error reading from entropy source %s after %d read: %s\n", ENTROPY_SOURCE, total, strerror(errno));
      return total; //total read so far
    }

    if (nread > 0)
      total = total + nread;
  }
  return total; //return number of octets read
}

The new test_is_composite function in eucrypt/smg_rsa/tests/tests.c:

void test_is_composite(int nruns, char *hex_number, int expected) {
  int i;
  int output;
  int count_ok = 0;
  int source = open_entropy_source(ENTROPY_SOURCE);
  MPI p = mpi_alloc(0);

  mpi_fromstr(p, hex_number);
  printf("TEST is_composite on MPI(hex) ");
  mpi_print(stdout, p, 1);
  for (i=0; i < nruns; i++) {
    printf(".");
    fflush(stdout);
    output = is_composite(p, M_R_ITERATIONS, source);
    if (output == expected)
      count_ok = count_ok + 1;
  }
  printf("done, with %d out of %d correct runs for expected=%d: %s\n", count_ok, nruns, expected, count_ok==nruns? "PASS" : "FAIL");
  mpi_free(p);
  close(source);
}

The updated main in eucrypt/smg_rsa/tests/tests.c:

int main(int ac, char **av)
{
  int nruns;
  int id;

  if (ac<2) {
    printf("Usage: %s number_of_runs [testID]\n", av[0]);
    return -1;
  }
  nruns = atoi(av[1]);

  if (ac < 3)
    id = 0;
  else
    id = atoi(av[2]);

  if (id == 0 || id == 1) {
    printf("Timing entropy source...\n");
    time_entropy_source(nruns,4096);
  }

  if (id == 0 || id == 2) {
    /* a few primes (decimal): 65537, 116447, 411949103, 20943302231 */
    test_is_composite(nruns, "0x10001", 0);
    test_is_composite(nruns, "0x1C6DF", 0);
    test_is_composite(nruns, "0x188DD82F", 0);
    test_is_composite(nruns, "0x4E0516E57", 0);
    /* a few mersenne primes (decimal): 2^13 - 1 = 8191, 2^17 - 1 = 131071, 2^31 - 1 = 2147483647 */
    test_is_composite(nruns, "0x1FFF", 0);
    test_is_composite(nruns, "0x1FFFF", 0);
    test_is_composite(nruns, "0x7FFFFFFF", 0);
    /* a few carmichael numbers, in decimal: 561, 60977817398996785 */
    test_is_composite(nruns, "0x231", 1);
    test_is_composite(nruns, "0xD8A300793EEF31", 1);
    /* an even number */
    test_is_composite(nruns, "0x15A9E672864B1E", 1);
    /* a phuctor-found non-prime public exponent: 170141183460469231731687303715884105731 */
    test_is_composite(nruns, "0x80000000000000000000000000000003", 1);
  }

  if (id > 2)
    printf("Current test ids: 0 for all, 1 for entropy source test only, 2 for is_composite test only.");

  return 0;
}

Finally, the .vpatch itself and my signature (you’ll need all previous .vpatches of EuCrypt to press this one):

In the next chapter we take one step further towards having RSA, so stay tuned, drink only the good stuff and make sure you won’t go… POUFFF!

  1. Primality testing means answering the question: is this number prime or not?
  2. If this is not clear to you, it’s best to just review RSA itself. In a nutshell: the whole working of RSA as cryptographic tool relies on properties of prime numbers; both public and private RSA keys are essentially made out of prime numbers. Don’t eat only this nutshell though – better read the original paper by Rivest, Shamir and Adleman: A Method for Obtaining Digital Signatures and Public-Key Cryptosystems.
  3. Agrawal, Kayal and Saxena, 2004. PRIMES is in P. Annals of Mathematics, 160(2), pp. 781-793.
  4. Which is *not* the case in GnuPG where first witness is actually…fixed and the rest are anyway chosen pseudo-randomly, go and read that code. By contrast, EuCrypt actually chooses them randomly, using a true random number generator, the FG.
  5. Like all other knobs, this can be found in include/knobs.h
  6. No, they don’t, surprise, surprise. So I’ll fix what I use and otherwise at least highlight what other problems I find, as I find them, such is this wonderful world we live in.
  7. Bach, Eric, 1990. Explicit Bounds for Primality Testing and Related Problems. Mathematics of Computation, 55, pp. 355-380.

December 21, 2017

EuCrypt: Correcting MPI Implementation

Filed under: EuCrypt — Diana Coman @ 9:53 pm

~ An unexpected part of the EuCrypt library series. Start with Introducing EuCrypt. ~

This is a sad but necessary interruption in the EuCrypt series itself: although coming immediately after chapter 2, this is not chapter 3 at all, I’m sorry to say. Instead of adding another useful part of smg-rsa as the actual chapter 3 does, I’m forced by the uncovered reality in the field to publish this correction: a correction of a coding error that has lived for years in the very much used GnuPG 1.4.10 open sore code. As a wonderful and truly epic example of all the great qualities of the open-source approach that thrives on thousands of eyes that surely quickly and accurately find and correct any bug, this error has survived perfectly fine until now, undetected and even papered over and buried deeper when it tried to manifest! It is quite natural after all, not to mention according to the “law”: given enough eyeballs, any bug will be quickly plastered over and buried as deep as it can be. There, Mr. Raymond, I fixed that Bazaar description for you. Let it be said again and again and forever that in anything that matters, it’s never quantity that matters, but always quality.

The error itself is a tiny thing with a huge impact: a single wrong letter in the wrong place, transforming a piece of code meant to copy over a whole MPI into a piece of code that does precisely… nothing 1. Obviously, typing the wrong letter can and does happen from time to time to anyone who types. However, it takes a very sloppy sort of monkey pushing the keys to fail to notice its effect, namely that the resulting code does… nothing. You’d think that anyone would at least read their code once more at a later time to catch such silly mistakes, pretty much like one reads any other text for mere proofreading if nothing more. You’d think that anyone at anytime would at least run their code once to see it doing the thing they wanted it to do, wouldn’t you? Well, it clearly shows this is not the way of the great Bazaar, no. So let me show you the resulting code that achieves nothing but still actually eats up some resources whenever called, here you are, straight from mpi/include/mpi-internal.h:

#define MPN_COPY_INCR( d, s, n)   \
    do {        \
  mpi_size_t _i;      \
  for( _i = 0; _i < (n); _i++ ) \
      (d)[_i] = (d)[_i];    \
    } while (0)

Speaking of eyeballs, take your time a bit and spot the error. Then call your child or your grandpa over, ask them to spot the error too and let me know how this addition of another pair of eyeballs clearly helped.

Once the error is spotted, the reader might ask the obvious question, of course: “why does the caller appear to work?” The answer to this is… layered let’s say. The caller does not actually work in fact, unsurprisingly. The caller is a method called mpi_tdiv_q_2exp(MPI w, MPI u, unsigned count). This method supposedly shifts u by count bits to the right and stores the result in w. Except that it doesn’t actually *always* do this: in some cases, all it does is to trim from w count bits and nothing more because it relies on the above good-for-nothing code to copy the relevant bits from u to w. Here, let’s look at its code that is found in mpi/mpi-div.c:

void
mpi_tdiv_q_2exp( MPI w, MPI u, unsigned count )
{
    mpi_size_t usize, wsize;
    mpi_size_t limb_cnt;

    usize = u->nlimbs;
    limb_cnt = count / BITS_PER_MPI_LIMB;
    wsize = usize - limb_cnt;
    if( limb_cnt >= usize )
  w->nlimbs = 0;
    else {
  mpi_ptr_t wp;
  mpi_ptr_t up;

  RESIZE_IF_NEEDED( w, wsize );
  wp = w->d;
  up = u->d;

  count %= BITS_PER_MPI_LIMB;
  if( count ) {
      mpihelp_rshift( wp, up + limb_cnt, wsize, count );
      wsize -= !wp[wsize - 1];
  }
  else {
      MPN_COPY_INCR( wp, up + limb_cnt, wsize);
  }

  w->nlimbs = wsize;
    }
}

Let’s actually read that a bit and see how it works: usize stores the number of “limbs” of an MPI, while limb_cnt stores the number of full limbs that need to be discarded as a result of a shift right by count bits. Those limbs are essentially machine words, groups of bits that the machine can work with directly. Their length in bits is that BITS_PER_MPI_LIMB, hence the integer division of count by it to find out just how many full limbs a shift by count bits implies. Once that is known, the code sets the remaining size (as number of limbs) of the result w and proceeds to copy over from u the bits that remain after the shift (the wp and up are simply pointers to the internal array of limbs of w and u respectively). However, at this point, the path splits by means of that if (count): the shift can further affect another limb by discarding (shifting right) only some of its bits and in this case there is in fact a shift-and-copy operation by means of calling mpihelp_rshift; if this is not the case and the original shift was made by a multiple of BITS_PER_MPI_LIMB, there is only a straightforward copy of limbs to make and that is supposedly achieved by our do-nothing-by-error code, called MPN_COPY_INCR.
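
To make the two branches concrete, here is a minimal, self-contained sketch of the same shift done on plain arrays of 64-bit limbs. It is not the sane-mpi code: the names shift_right_limbs and LIMB_BITS are made up for illustration, limbs are assumed to be 64 bits wide and stored least significant first, and the destination is assumed to have room for n limbs.

```c
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 64  /* assumption: stands in for BITS_PER_MPI_LIMB */

/* Shifts the n-limb number u right by count bits and stores the result in w.
 * Returns the number of limbs of the result (0 if everything is shifted out). */
size_t shift_right_limbs(uint64_t *w, const uint64_t *u, size_t n, unsigned count)
{
    size_t limb_cnt = count / LIMB_BITS;  /* full limbs that are discarded */
    unsigned bits = count % LIMB_BITS;    /* remaining shift inside a limb */
    size_t wsize, i;

    if (limb_cnt >= n)
        return 0;
    wsize = n - limb_cnt;

    if (bits) {
        /* shift-and-copy: each result limb takes bits from two source limbs */
        for (i = 0; i < wsize; i++) {
            uint64_t lo = u[limb_cnt + i] >> bits;
            uint64_t hi = 0;
            if (i + 1 < wsize)
                hi = u[limb_cnt + i + 1] << (LIMB_BITS - bits);
            w[i] = lo | hi;
        }
        if (w[wsize - 1] == 0)
            wsize--;  /* the top limb may have become empty */
    } else {
        /* plain copy: read from the source, write to the destination */
        for (i = 0; i < wsize; i++)
            w[i] = u[limb_cnt + i];
    }
    return wsize;
}
```

With u holding the limbs {2, 1, 3}, a shift by 64 takes the plain-copy branch and leaves {1, 3} in w, while a shift by 1 takes the shift-and-copy branch and keeps all three limbs.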

Considering the above, it’s hopefully no surprise to you when I say that mpi_tdiv_q_2exp fails of course, as soon as it makes use of the buggy code of MPN_COPY_INCR. And you’d think the error would at least be caught at this point if it wasn’t caught earlier, wouldn’t you? Basic testing of the most obvious kind, meaning both branches of an if, something everyone always does in one way or another, right? Bless your heart and sweet youth and all that but wake up already, as this is open sore we are talking about: no, it’s not caught here either, of-course-not, it’s just-not-enough-eyeballs-yet. Honestly, I suspect they meant “given enough eyeballs LOST as a result, any bug becomes shallow.” We might even live to see this, you know?

Still, there is more! This defective mpi_tdiv_q_2exp is actually used by another method in the original GnuPG so surely the error is exposed there! It has to be, it’s by now the third layer and presumably it’s really not all that many layers that can be built on code-that-does-nothing + code-that-works-only-sometimes. Oh, but the Bazaar is all mighty, the Bazaar can 2, unlike those pesky cathedrals that actually had any foundation. Let me illustrate this to you, with code taken straight from GnuPG 1.4.10, from the primegen.c file, from method is_prime that supposedly does probabilistic primality testing and such little things on which the security of your RSA key depends:

    q = mpi_copy( nminus1 );
    k = mpi_trailing_zeros( q );
    mpi_tdiv_q_2exp(q, q, k);

What’s that, you ask? Why, nothing terrible, just a bit of code meant to calculate q as nminus1 / (2^k). Since a division by 2^k is simply a right shift by k bits, that’s really the perfect job for mpi_tdiv_q_2exp, isn’t it? So then why does this code do first a copy of nminus1 into q and only then a shift right by k bits of resulting q into… same variable q. Why exactly doesn’t it do simply a shift right by k bits of nminus1 into q? Like this:

  k = mpi_trailing_zeros( nminus1 );
  mpi_tdiv_q_2exp( q, nminus1, k );

Two method calls instead of three, two lines instead of three, overall much clearer and easier to follow code anyway, since we are dividing indeed nminus1 by 2^k, not some “q” that is meant to be the result, really. So why do it in a complicated way when you can do it in a simple way? Except, you can’t, you see, because you don’t actually read the code you use and the code you use is broken but nobody actually bothered to check it. And when the error that is by now 3 layers deep manifests itself through unexpected results of the simple and straightforward code that should have been there, you don’t spend like me the hours needed to track down the error and actually, finally, mercifully put it out of its misery correcting the code. Oh no, that would be wasted time, wouldn’t it? Instead you are being productive and you find a workaround, simply papering the error over and dancing around it with an idiotic extra-copy that makes *this* code more difficult to follow and further *hides* the previous error, pushing it one layer deeper. Oh, now I plainly see the main advantage of open source: since you are not responsible in any way for the code you write, you can get away with such despicable behaviour, can’t you?
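
For readers who want to see the arithmetic itself without any MPI machinery, the decomposition nminus1 = q * 2^k discussed above can be sketched on an ordinary 64-bit integer. This is only an illustration under that assumption: the helper names count_trailing_zeros and split_power_of_two are hypothetical and are not part of sane-mpi or GnuPG.

```c
#include <stdint.h>

/* Counts how many times x can be halved exactly (0 for x == 0). */
unsigned count_trailing_zeros(uint64_t x)
{
    unsigned k = 0;
    while (x != 0 && (x & 1) == 0) {
        x >>= 1;
        k++;
    }
    return k;
}

/* Writes k through the pointer and returns q, such that nminus1 == q << k
 * and q is odd (for nminus1 != 0). No intermediate copy is needed. */
uint64_t split_power_of_two(uint64_t nminus1, unsigned *k)
{
    *k = count_trailing_zeros(nminus1);
    return nminus1 >> *k;  /* division by 2^k is a right shift by k bits */
}
```

For nminus1 = 96 this yields k = 5 and q = 3, since 96 = 3 * 32.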

Well, here is not open source. Here code gets actually read and errors get dissected, documented and corrected when found, not swept under the carpet. So, to correct this error, first thing written is a basic test that runs all the branches of that if in mpi_tdiv_q_2exp. Here it is, together with its helper print function, both to be found from now on in mpi/tests/test_mpi.c:

void print_results(MPI in, MPI out, char * title)
{
  fprintf(stdout, "******** %s ********", title);
  terpri(stdout);

  fprintf(stdout, "input : ");
  mpi_print(stdout, in, 1);
  terpri(stdout);

  fprintf(stdout, "output: ");
  mpi_print(stdout, out, 1);
  terpri(stdout);

  terpri(stdout);
  fflush(stdout);
}

/*
 * Test that will fail on original code and will pass after EuCrypt fix is applied.
 */
void test_rshift()
{
  MPI out, in, copy_in;
  out = mpi_alloc(0);
  in = mpi_alloc(0);
  copy_in = mpi_alloc(0);

  mpi_fromstr(out, "0x20E92FE28E1929");   /* some value */
  mpi_fromstr(in, "0x2000000010000001000000002");
  mpi_fromstr(copy_in, "0x2000000010000001000000002"); /* to make sure the actual input is printed, since the call can modify in */

  /* print value of BITS_PER_MPI_LIMB */
  fprintf(stdout, "BITS_PER_MPI_LIMB is %d\n", BITS_PER_MPI_LIMB);

  /* shift by 0 */
  mpi_tdiv_q_2exp(out, in, 0);
  print_results(copy_in, out, "TEST: right shift by 0");

  /* shift by multiple of BITS_PER_MPI_LIMB */
  mpi_fromstr(in, "0x2000000010000001000000002");

  mpi_tdiv_q_2exp(out, in, BITS_PER_MPI_LIMB);
  print_results(copy_in, out, "TEST: right shift by BITS_PER_MPI_LIMB");

  /* shift by non-multiple of BITS_PER_MPI_LIMB */
  mpi_fromstr(in, "0x2000000010000001000000002");
  mpi_tdiv_q_2exp(out, in, BITS_PER_MPI_LIMB - 3);
  print_results(copy_in, out, "TEST: right shift by BITS_PER_MPI_LIMB - 3");

  mpi_free(copy_in);
  mpi_free(out);
  mpi_free(in);
}

Running this with the old code from mpi (so the code from chapter 2, earlier) will show just how mpi_tdiv_q_2exp “works” so that there is no doubt remaining about that. After this and before doing any change, the next step is to search in the code for *any other* users of either mpi_tdiv_q_2exp or the MPN_COPY_INCR itself, since those users for all I know might even rely by now on the wrong behaviour. Mercifully, this search returned empty and so I can proceed finally to the fix itself, which is of course very easy to do *after all this work* i.e. once it was found and tracked down and isolated to ensure that a change does not break down something else. And after this, the last but mandatory step is of course to run the previously written test again and check its output. The corrected code in mpi/include/mpi-internal.h:

#define MPN_COPY_INCR( d, s, n)   \
    do {        \
  mpi_size_t _i;      \
  for( _i = 0; _i < (n); _i++ ) \
      (d)[_i] = (s)[_i];    \
    } while (0)
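
The effect of that single letter can be checked in isolation. The sketch below places the two variants side by side on plain arrays; the macro and function names (COPY_INCR_BUGGY, COPY_INCR_FIXED, copy_buggy, copy_fixed) are made up for this illustration and size_t stands in for mpi_size_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant with the wrong letter: every destination limb is copied onto itself. */
#define COPY_INCR_BUGGY(d, s, n)          \
    do {                                  \
        size_t _i;                        \
        for (_i = 0; _i < (n); _i++)      \
            (d)[_i] = (d)[_i];            \
    } while (0)

/* Corrected variant: limbs are copied from the source to the destination. */
#define COPY_INCR_FIXED(d, s, n)          \
    do {                                  \
        size_t _i;                        \
        for (_i = 0; _i < (n); _i++)      \
            (d)[_i] = (s)[_i];            \
    } while (0)

void copy_buggy(uint64_t *dst, const uint64_t *src, size_t n)
{
    (void)src;  /* never read, which is exactly the problem */
    COPY_INCR_BUGGY(dst, src, n);
}

void copy_fixed(uint64_t *dst, const uint64_t *src, size_t n)
{
    COPY_INCR_FIXED(dst, src, n);
}
```

Calling copy_buggy leaves the destination exactly as it was, whatever the source holds, while copy_fixed makes the destination equal to the source.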

After all this work that was caused not by the original tiny error itself, not by the wrong letter at the wrong place, but by the total and repeated failure to correct it afterwards, even 3 layers up, tell me again about all those eyeballs and about how productive it is to write a quick workaround instead of searching for the error and eliminating it at source. Then go and run a grep -r "workaround" . in any source directory of any big open source project, just for… fun let’s say, what else.

To wrap this up, here is the vpatch with all the above changes, and my signature for it:

Hopefully no other similar trouble will surface until next week, so that I can finally move on to Chapter 3 that is all about the Miller-Rabin algorithm. Stay tuned!

  1. You know, by this point one might even be relieved to find that it does at least nothing. As opposed to doing something worse.
  2. “Yes, we can”, right?

December 14, 2017

EuCrypt Chapter 2: A Source of Randomness

Filed under: EuCrypt — Diana Coman @ 11:53 pm

EuCrypt uses as source of randomness the Fuckgoats auditable TRNG (True Random Number Generator) from S.NSA (No Such lAbs). The choice here was made very easy by a basic combination of facts: on one hand, EuCrypt needs an actual, auditable source of randomness 1 as opposed to anything else, pseudo-random generators included; on the other hand, the Fuckgoats (FG) device is the only currently available auditable TRNG. So problem solved and even rather narrowly solved at that. I’m quite grateful that for once there IS actually something matching the requirements and therefore I don’t have to cobble it together myself from bits and pieces.

The above being said, you as the user of EuCrypt are *expected* to make your own decisions. Consequently, there really is nothing stopping you from using whatever you want as your own “source of randomness”, be it actually random or pseudo-random or straight non-random or anything in between, why would EuCrypt care? Just change existing knobs in EuCrypt (see below) or directly replace the relevant methods (mainly get_random_octets, discussed below) with your own code and that’s what will get used. For my own needs however, I’ll use FG and moreover, one that I actually bought (even with some additional cost) and tested myself.

To keep this as clear as possible, let’s start with the tiniest part, namely a single lonely knob for EuCrypt, giving the name of the entropy source to use and defined in knobs.h, as part of the smg_rsa component of EuCrypt (eucrypt/smg_rsa/include/knobs.h):

#ifndef SMG_RSA_KNOBS_H
#define SMG_RSA_KNOBS_H

#define ENTROPY_SOURCE "/dev/ttyUSB0"

#endif /*SMG_RSA_KNOBS_H*/

As is quite clear from the above, the current version of EuCrypt assumes that an FG is connected simply to a USB port that can be accessed via /dev/ttyUSB0. If the device path on your machine is different, change the value of this knob accordingly. If your FG is connected to something other than a USB port (such as a serial port for instance), you’ll need to change more than this knob here.

Now for the signatures of the functions that will provide access to the specified source of randomness, have a look at smg_rsa.h:

/* smg_rsa.h
 * S.MG, 2017
 */

#ifndef SMG_RSA_H
#define SMG_RSA_H

#include "mpi.h"
#include "knobs.h"

/*********truerandom.c*********/

/*
 * Opens and configures (as per FG requirements) the specified entropy source (e.g. "/dev/ttyUSB0")
 * @param source_name the name of the file to open (e.g. "/dev/ttyUSB0")
 * @return the descriptor of the open file when successful; negative value otherwise
 */
int open_entropy_source(char* source_name);


/*
 * Returns noctets random octets (i.e. 8*noctets bits in total) as obtained from EuCrypt's preferred source.
 * Preferred source is defined in knobs.h as ENTROPY_SOURCE and should be a TRNG (e.g. Fuckgoats).
 * @param noctets the length of desired random sequence, in octets
 * @param out pointer to allocated memory space for the requested random noctets;
 * NB: this method does NOT allocate space!
 * @return the actual number of octets that were obtained from the currently configured entropy source
 * (this is equal to noctets on successful read of required noctets)
 */
int get_random_octets(int noctets, unsigned char *out);

/* Returns noctets random octets as obtained from the specified "from" source;
 * NB: the "from" source is considered to be the handle of an already opened stream;
 * This method will simply attempt to read from the source as needed!
 *
 * @param noctets the length of desired random sequence, in octets
 * @param out pointer to allocated memory space for the requested random octets;
 * NB: this method does NOT allocate space!
 * @param from handle of an already opened entropy source - this method will just READ from it as needed
 * @return the actual number of octets that were obtained
 */
int get_random_octets_from(int noctets, unsigned char *out, int from);


#endif /*SMG_RSA_H*/

The difference between the two functions that retrieve a specified number of octets is that the first one opens and closes the source itself (hence, every time it is called) while the second one simply reads the specified number of octets from an already opened source that is given as argument. Note that the function opening and closing the source itself uses the other one for the actual reading. The reason for providing both functions is simply the fact that opening/closing the source can easily be a significant overhead when reading only a few octets at a time.
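
As a rough sketch of the "open once, read many times" pattern, the code below reads into memory owned by the caller from a descriptor that stays open across calls. It is deliberately independent of EuCrypt: read_octets and fill_chunks are hypothetical names, the source is opened with a plain open() and none of the FG-specific serial configuration, and any readable path can be passed in for trying it out.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads up to n octets from an already opened descriptor into out.
 * The caller owns out; nothing is allocated here.
 * Returns the number of octets actually read. */
int read_octets(int fd, unsigned char *out, int n)
{
    int total = 0;
    while (total < n) {
        ssize_t got = read(fd, out + total, (size_t)(n - total));
        if (got == -1 && errno == EINTR)
            continue;           /* interrupted: just try again */
        if (got <= 0)
            break;              /* error or end of input: give up */
        total += (int)got;
    }
    return total;
}

/* Opens the source once and fills nchunks consecutive chunks of out,
 * paying the open/close cost a single time instead of once per chunk.
 * Returns the total number of octets read. */
int fill_chunks(const char *path, unsigned char *out, int chunk, int nchunks)
{
    int total = 0, i;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    for (i = 0; i < nchunks; i++) {
        int got = read_octets(fd, out + total, chunk);
        total += got;
        if (got != chunk)
            break;
    }
    close(fd);
    return total;
}
```

The caller sizes the buffer (chunk * nchunks octets here) and compares the returned count with what was requested, the same contract as described above for the two EuCrypt functions.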

To configure and open the source, the function used is open_entropy_source. This function provides a way to obtain a handle of the opened source that can then be passed repeatedly to the function that retrieves octets, as needed.

One important aspect to note above is that the two functions that retrieve random octets do NOT allocate memory for the output. They assume that the caller has allocated enough memory and provided a valid pointer. The reason for this is two-fold: first, as a design principle I prefer to keep allocation and de-allocation of memory in the same function as much as possible without passing responsibility around; second, for RSA purpose, the memory allocation is often done with specific MPI methods and using those (or indeed being aware of them) is outside the scope of this bit of code as it has nothing at all to do with getting random bits.

The actual implementations of the above functions are found in truerandom.c. UPDATED on 4 January 2018: this code has been changed, see Chapter 4 and corresponding .vpatch!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <fcntl.h>
#include <unistd.h>
#include <termios.h>
#include <errno.h>

#include "smg_rsa.h"


int set_usb_attribs(int fd, int speed) {
	struct termios tty;
	if (tcgetattr(fd, &tty) < 0) {
		return -1;
	}

	//input and output speeds
	cfsetospeed(&tty, (speed_t)speed);
	cfsetispeed(&tty, (speed_t)speed);

	tty.c_cflag |= (CLOCAL | CREAD);	//ignore modem controls
	tty.c_cflag &= ~CSIZE;
	tty.c_cflag |= CS8;			//8 bit characters
	tty.c_cflag &= ~PARENB;	//no parity bit
	tty.c_cflag &= ~CSTOPB;	//only need 1 stop bit
	tty.c_cflag &= ~CRTSCTS;	//no hardware flow control

	//non-canonical mode
	tty.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);	//input flags
	tty.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);	//local flags
	tty.c_oflag &= ~OPOST;	//output flags

	//read at least one octet at a time; timeout 1 tenth of second between octets read
	tty.c_cc[VMIN] = 1;
	tty.c_cc[VTIME] = 1;

	if (tcsetattr(fd, TCSANOW, &tty) != 0)
		return -1;

	return 0;
}

int open_entropy_source(char* source_name) {
	int in, err;

	in = open(source_name, O_RDONLY | O_NOCTTY | O_NDELAY);
	if (in == -1) {
		printf("ERROR: failure to open entropy source %s: %s\n", source_name, strerror(errno));
		return in;	//failed to access entropy source
	}

	fcntl(in, F_SETFL, 0);

	err = set_usb_attribs(in, B115200);
	if (err==-1) {
		printf("Error setting attributes on %s: %s\n", source_name, strerror(errno));
		return err;
	}

	return in;	//source opened, return its descriptor
}

int get_random_octets_from(int noctets, unsigned char *out, int from) {

	int nread;
	int total = 0;

	while (total < noctets) {
           nread = read(from, out+total, noctets-total);
           //on interrupt received just try again
           if (nread == -1 && errno == EINTR)
             continue;
           //on error condition abort
           if (nread == -1 || nread == 0) {
             printf("Error reading from entropy source %s: %s\n", ENTROPY_SOURCE, strerror(errno));
             return total; //total read so far
           }
           if (nread > 0)
		total = total + nread;
	}
	return total;	//return number of octets read
}

int get_random_octets(int noctets, unsigned char *out) {
	int in;
	int nread = 0;

	in = open_entropy_source(ENTROPY_SOURCE);
	if (in >= 0) {	//any non-negative descriptor is a valid one
		nread = get_random_octets_from(noctets, out, in);
		close(in);
	}
	return nread;
}


As it can be seen above, a significant part of the code is simply for configuring the device. Most importantly, the configuration aims to turn OFF all flow control and to set the baud rate as required by FG. While this should work under most versions of Linux, be aware of the known pl2303 vs pl2303x issue with some connectors on older systems.

Note that an incorrectly configured device will simply block and since the functions above are written to always wait for the full number of bits required, they will *also* block in this case.

Finally, a basic test in tests/test.c:

#include "smg_rsa.h"

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void err(char *msg)
{
  fprintf(stderr, "%s\n", msg);
  exit(1);
}

void time_entropy_source(int nruns, int noctets) {
	unsigned char buffer[noctets];
	int read, i;
	struct timespec tstart, tend;
	long int diff;

	clock_gettime(CLOCK_MONOTONIC, &tstart);
	for (i=0; i < nruns; i++) {
		read = get_random_octets(noctets,buffer);
		if (read != noctets)
			err("Failed reading from entropy source!");
	}
	clock_gettime(CLOCK_MONOTONIC, &tend);

	diff = tend.tv_sec-tstart.tv_sec;
	double kbps = (nruns*noctets) / (diff*1000.0);
	printf("ENTROPY source timing: %d kB in %ld seconds, at an average speed of %f kB/s over %d runs of %d octets each\n", nruns*noctets, diff, kbps, nruns, noctets);
}


int main(int ac, char **av)
{
	int nruns;

	if (ac<2) {
		printf("Usage: %s number_of_runs\n", av[0]);
		return -1;
	}
	nruns = atoi(av[1]);

	printf("Timing entropy source...\n");
	time_entropy_source(nruns,4096);

  return 0;
}

For testing, simply plug your (previously audited, hopefully) FG into a USB port, compile everything and then run the test with as many runs as you want. When it’s done (so after a while, depending on how many runs you asked for), it should print on screen the speed at which it obtained the random bits from FG.

Following the sad realisation that I can’t currently safely alter folder structure under V, I created a genesis patch for EuCrypt containing the intended structure (more like: writing in stone the intended structure) and it’s on top of this that each chapter of EuCrypt adds new content by means of vpatches. Here you have everything you need so far:

Note that the patch in Chapter 1 is NOT needed anymore directly for EuCrypt (it still is valid though in itself and as a further snip on the standalone mpi so I will keep it where it is). The changes that patch makes are already included in the version of mpi that ch1_mpi.vpatch simply brings into EuCrypt.

In the next chapter, since we have already an MPI implementation as well as a way to access true randomness, we can get ever so closer to the actual RSA itself, so stay tuned!

  1. I’ll likely expand on this as soon as I get to the actual implementation of RSA so in the next few chapters.

December 7, 2017

Introducing EuCrypt

Filed under: EuCrypt — Diana Coman @ 6:09 pm

EuCrypt is a self-contained library that Eulora server will use for its communication needs. EuCrypt has the following 5 main components:

  1. smg-comm – the implementation of the basic client-server communication protocol. This makes use of all the other components, namely:
  2. smg-serpent – the symmetric cipher that is used by smg-comm for everyday message exchanges between Eulora clients and server.
  3. smg-keccak – the implementation of the keccak function and sponge construction, used by smg-comm mainly as part of the data padding scheme.
  4. smg-rsa – the implementation of the RSA encryption algorithm using a source of true randomness and the sane-mpi implementation of big number arithmetics:
  5. sane-mpi – the implementation of big number arithmetics, as extracted from GnuPG 1.4.10 by Stanislav Datskovskiy.

The above structure is only meant to give you a high-level, top-down idea of what EuCrypt is made of. However, to actually understand EuCrypt at all 1, you’ll need to read the code and discussion for each of the above components and so I will start from bottom-up with the code + discussion of sane-mpi and then proceed to add parts and pieces until we get the whole library. Note that some of the components above are bigger than others and therefore I will split the discussion of those in several installments, so expect overall more than 5 posts on this. Comments and questions are welcome at any time and on any of the parts, so don’t be shy.

Chapter 1: sane-mpi

This component is implemented in C and offers support for storing and working with arbitrary large integers (multi precision integers or MPIs as we’ll call them from now on). The code is rather messy but at the moment there isn’t really anything better available and as it stands, this sane-mpi is at least readable now – mainly through the efforts of Stanislav Datskovskiy who extracted it from the big ball of mess that is GnuPG 1.4.10. I’ve made only a small further snip to his version, discarding a set of methods for accessing specific parts of an MPI. While such methods could conceivably be useful at some point, the point is that EuCrypt at the moment does *not* need them and moreover the existing implementation was so ugly as to need a re-write in any case. In short those parts failed to be either useful or at the very least sane, so they are no more.

There are lots of very useful comments all through the code of sane-mpi so the best option is really to actually read each file, at least once. There’s no point in repeating those comments in here. I’ll focus instead on a few aspects that I think are most relevant to EuCrypt and its purpose.

First issue is related to how and where MPIs are stored, because EuCrypt uses MPIs for storing private RSA keys. In principle, sane-mpi offers methods for allocating secure memory for any given MPI (see secmem.c and memory.c). However, a correct description of the service is that sane-mpi will attempt to allocate secure memory, with results depending on the machine and operating system you are running it on.

Note also that it pays to be mindful when performing operations with MPIs that are a mix of secure and insecure since undesired leaks may happen at intermediate steps. Sane-mpi attempts to avoid this by allocating additional secure memory space for intermediate results when one of the operands is in secure memory (see for instance mpi_mul method in mpi-mul.c). However, this (among others) means also that a simple arithmetical operation involving 2 MPIs will actually differ in execution depending not only on the values of the 2 numbers but also on where they are each stored. Moreover, this sort of side-effect memory allocation happens in fact in more than one place in sane-mpi.

Second issue refers to the way in which MPI operations are implemented in sane-mpi. Essentially following the implementation of one single arithmetical operation is not always very straightforward as it can easily lead one through several files (even when leaving aside the side-effect memory allocation issues and focusing instead only on the actual operations performed). From the point of view of fully and comfortably holding sane-mpi in head, this is rather unfortunate but it is what it is.

Since EuCrypt uses by necessity not only the multiplication and exponentiation but also direct bit setting of MPIs, I’ll highlight mpi-bit.c as another file that should be studied in more detail. In particular, the method mpi_set_highbit can be confusing as it does a bit more than the name suggests: it sets the indicated bit but it ALSO clears all bits above the one indicated. Similarly, mpi_clear_highbit clears the indicated bit AND all bits above it. By contrast, mpi_set_bit and mpi_clear_bit are the more straightforward versions, simply setting or clearing, respectively, the indicated bit and nothing more.
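
The difference between the two flavours is easier to see on a single machine word than on a full MPI. The following is only a sketch under that simplification: it works on one 64-bit value, bit positions are assumed to be in the range 0 to 63, and the word_* names are invented here rather than taken from mpi-bit.c.

```c
#include <stdint.h>

/* Plain versions: touch the indicated bit and nothing else. */
uint64_t word_set_bit(uint64_t x, unsigned n)
{
    return x | (UINT64_C(1) << n);
}

uint64_t word_clear_bit(uint64_t x, unsigned n)
{
    return x & ~(UINT64_C(1) << n);
}

/* "highbit" versions: the indicated bit becomes the top of the number. */
uint64_t word_set_highbit(uint64_t x, unsigned n)
{
    uint64_t below = (UINT64_C(1) << n) - 1;   /* mask of bits 0..n-1 */
    return (x & below) | (UINT64_C(1) << n);   /* set bit n, drop everything above */
}

uint64_t word_clear_highbit(uint64_t x, unsigned n)
{
    return x & ((UINT64_C(1) << n) - 1);       /* drop bit n and everything above */
}
```

Starting from 0xFF and bit 2, word_set_bit leaves 0xFF unchanged, whereas word_set_highbit gives 0x07; word_clear_bit gives 0xFB, whereas word_clear_highbit gives 0x03.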

To download, build and test sane-mpi as will be used in EuCrypt you will need:

If you need help with V, there is a gentle introduction courtesy of Ben Vulpes.

In the next chapter I’ll proceed to introduce parts of EuCrypt’s own RSA that puts all this sane-mpi to some actual use. Stay tuned!

  1. Note that you will be effectively trusting this library with your data and money whenever you play Eulora so better understand it before playing the game.

September 30, 2015

Foxy’s Little Bots

Filed under: EuCrypt — Diana Coman @ 12:07 pm

(This is Day 21 of Foxy’s Diary).

Update April 2021: latest versions of Foxybot working with Eulora 0.1.2b.

Like anywhere else, work in Eulora is a most wonderful thing …to watch. So rather than work on working, I worked on avoiding working – and here are my two bots ready to take over Euloran world:

BOTS ARCHIVE – 30 September 2015 sha512sum: Updated (29 October 2015): full client archive is not needed at all, use the minimal archive:
63e8dcbdd46b6103e3825909a7085cace6ea164086a33c2197dae81bd984ce2ae1e1d02af784db140f971a77ca41bb6c2431b7dee26105dee572ab2d62908ef8

And a minimal BOTS ARCHIVE – 30 September 2015 sha512sum:

3aa209ae2dc9566756a9a47a89f7d7fb45471e5f5ab739137c5cc0a81e56903568328e427d3514a88526f0098a086773ed9079b6966f8deb06eae1691cb601d5

Update (29 October 2015): for those who can’t review the code but otherwise trust its existing reviews by others, the slightly older version of the minimal archive (it contains additional files that are actually not needed as they aren’t changed with respect to standard client version) is still available to download, with sha512sum:

2432d6dacdd33f74a0e230ced27c838ff820d94e68f78abf85f24b7b0bf2f848d4a45bbf61dbe79f17560416e6a6eb732e37af99b563318e80c717ef23c6c8d9

Installation
Download the file (BOTS ARCHIVE), check its checksum and otherwise just compile it according to the instructions for your OS. If you download the minimal version, you’ll have to copy its contents to an existing client folder, as it contains only the src and data folders. Alternatively, you can run a diff and make a patch to apply to your own current version of the client if you already have some changes that you don’t want to lose. The bots are quite contained and in a folder of their own (src/client/foxybot) – other than that there are only minimal changes to the main client code source, most notably in cmdusers.cpp (for the /pilot command) and psengine (to register the bots’ window/widget). There are also help entries added for the added /pilot and /bot commands.

Usage

The archive above contains three main additions to the standard Eulora V0.1.1 client:
– /pilot command – this is very useful if you need to get to a specific point (such as a claim that you already have and you want to build now). Just give it the pos coordinates of the point you want to get to and it will rotate your character to face towards that point. After that just keep going forward until you reach your desired destination.
– /bot command – this is the control centre for Foxy’s bots. Currently, the /bot command supports crafting and exploring as activities. If you type just /bot craft or /bot explore, there will be some help info shown in the bots’ window. Basically the structure is /bot craft|explore numberOfAttempts [parameters], where craft and explore have their own, specific parameters. You can also stop or reset the activities at any time with /bot stop and /bot reset respectively.

Features

The exploring bot will move in a straight line of length equal to the size parameter (measured in steps so if you want the claims more spaced out, set your mode to running, otherwise set your mode to walking to cover less ground). If size is smaller than the total number of attempts, the bot will go forward size times and then backward size times and repeat. Provided you have all required ingredients, the exploring bot will build all tiny and small claims. If there are no ingredients, it will just continue ignoring those claims (it will NOT lock small and tiny claims). Any claim bigger than small will be locked. Note that the exploring bot does NOT train – it will ignore any ranking message and just keep going until it does the specified number of explore attempts. If you start with a tool in hand, the bot will change it when it is worn-out. Please note that in this situation the bot will just take the first tool of the same kind it finds in your inventory, without checking its quality or anything else. Hence, if all you have are worn-out tools, the bot will happily keep changing them – it won’t fail, but you won’t get a lot of such exploring either.

The crafting bot WILL train if Heina is in range. Prior to a training attempt, it will move the ingredients for your current craft to the container in order to avoid being overweight (since you can’t train when overweight). However, if you have other things in your inventory and you are still overweight after moving the ingredients, it will just fail to train and keep crafting. The crafting bot requires you to equip the recipe/blueprint for what you want to craft – that’s how it will know what it is you want to craft. It can handle imprecise recipes using either minimum number of ingredients (use parameter m) or maximum number of ingredients (M). By default, it will use maximum number of ingredients. There is NO use of storage, hence you need to have in your inventory all ingredients for your whole crafting run. If it runs out of ingredients, the bot will simply stop. Similarly, the bot will also stop if your tool and/or container are worn-out (it will NOT change tool/container).

Both crafting and exploring will also write out a log with some relevant information. These logs are found in the same place as your other logs (such as the chat log).

Design

Note that the bots are almost entirely re-written since the previous version. The new overall architecture consists of a control centre (foxybot.cpp) which receives and handles all commands passed as parameters to /bot. Foxybot holds a list of activities (botactivity.cpp) that it can start, stop and reset. Each different “bot” is implemented as a new activity inheriting from the generic, abstract BotActivity. Hence, exploring for instance is ExploreActivity and crafting is CraftActivity. All activities interact with the “world” (main client code) through the WorldHandler class.

As previously, the crafting and exploring activities are modeled as sequential state machines. Once started, each activity will perform its first step and then schedule itself (through an internal timer) to be called again after a specified interval. Each time the activity is called again, it will attempt to perform one additional step and then move on to the next state. On top of these activity-specific states, there are also higher-level states defined for a generic activity, such as Ongoing, Stopped, Error and Finished. Specific states include init, readRecipe, use_combine, train and take_result for crafting. The internal logic for changing states is implemented in method NextAction. Method Perform is called each time the timer expires and it will call the method corresponding to the current internal state (DoInit for the init state, DoExplore for the explore state etc.).
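The Perform/NextAction split can be sketched roughly as below. This is a simplified illustration, not the actual foxybot code: the class names SketchActivity and SketchCraft, the Trace method and the exact order of the steps are assumptions, the train step is left out, and the timer is replaced by whoever calls Perform repeatedly.

```cpp
#include <string>
#include <vector>

// Higher-level states shared by all activities (names from the text above).
enum class Status { Ongoing, Stopped, Error, Finished };

// Hypothetical stand-in for the generic activity.
class SketchActivity {
public:
    virtual ~SketchActivity() = default;
    // In the real bot a timer triggers this; here the caller does.
    void Perform() {
        if (status_ != Status::Ongoing) return;  // nothing to do once stopped/finished
        DoCurrentStep();                         // one step per call
        NextAction();                            // then decide the next state
    }
    Status GetStatus() const { return status_; }
protected:
    virtual void DoCurrentStep() = 0;
    virtual void NextAction() = 0;
    Status status_ = Status::Ongoing;
};

// Hypothetical crafting-like activity; assumes runs >= 1 and omits training.
class SketchCraft : public SketchActivity {
public:
    explicit SketchCraft(int runs) : remaining_(runs) {}
    const std::vector<std::string>& Trace() const { return trace_; }
private:
    enum class Step { Init, ReadRecipe, UseCombine, TakeResult };
    void DoCurrentStep() override {
        // A real activity would act on the world here; the sketch only records the step.
        switch (step_) {
            case Step::Init:       trace_.push_back("init"); break;
            case Step::ReadRecipe: trace_.push_back("readRecipe"); break;
            case Step::UseCombine: trace_.push_back("use_combine"); break;
            case Step::TakeResult: trace_.push_back("take_result"); break;
        }
    }
    void NextAction() override {
        switch (step_) {
            case Step::Init:       step_ = Step::ReadRecipe; break;
            case Step::ReadRecipe: step_ = Step::UseCombine; break;
            case Step::UseCombine: step_ = Step::TakeResult; break;
            case Step::TakeResult:
                if (--remaining_ > 0) step_ = Step::UseCombine;  // craft again
                else status_ = Status::Finished;                 // all runs done
                break;
        }
    }
    Step step_ = Step::Init;
    int remaining_;
    std::vector<std::string> trace_;
};
```

Calling Perform on a SketchCraft constructed with 2 runs records init and readRecipe once, then use_combine and take_result twice, after which the activity reports Finished and further calls do nothing.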

If you want to implement a new activity, create a class inheriting from BotActivity, implement its specific behaviour and then add an object of it to the list of activities in foxybot.
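As a rough picture of what that involves, the sketch below registers a made-up activity with a made-up controller. Everything in it (Activity, FishActivity, Controller, Register, Handle) is hypothetical and much simpler than the real BotActivity and foxybot classes; it only shows the shape of "inherit, implement, add to the list".

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, stripped-down stand-in for the generic activity.
class Activity {
public:
    virtual ~Activity() = default;
    virtual std::string Name() const = 0;  // keyword used to select the activity
    virtual void Start() { running_ = true; }
    virtual void Stop() { running_ = false; }
    bool IsRunning() const { return running_; }
private:
    bool running_ = false;
};

// The new "bot": inherit and fill in the specific behaviour.
class FishActivity : public Activity {
public:
    std::string Name() const override { return "fish"; }
};

// Hypothetical stand-in for the control centre holding the activities.
class Controller {
public:
    void Register(std::unique_ptr<Activity> activity) {
        std::string name = activity->Name();  // read the name before moving the pointer
        activities_[name] = std::move(activity);
    }
    // Returns false for an unknown activity or an unknown command.
    bool Handle(const std::string& name, const std::string& command) {
        auto it = activities_.find(name);
        if (it == activities_.end()) return false;
        if (command == "start") it->second->Start();
        else if (command == "stop") it->second->Stop();
        else return false;
        return true;
    }
    bool IsRunning(const std::string& name) const {
        auto it = activities_.find(name);
        return it != activities_.end() && it->second->IsRunning();
    }
private:
    std::map<std::string, std::unique_ptr<Activity>> activities_;
};
```

The point of the design is that the controller never needs to know what a specific activity does: it only looks it up and starts or stops it, so a new bot is a new class plus one registration call.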

Testing and Issues

As a known issue, the exploring bot will tend to slowly creep backwards if your overall number of attempts is larger than size (hence, length of line). This is because Eulorans move faster backwards than forwards… I’m not entirely sure if that truly says anything about Eulora as a world :D

Some parts of the bots have seen very little testing, mainly because it was really quite difficult to get more testing without a test environment. Most notably, the m/M part was minimally tested since at the moment I don’t have the ingredients/interest for most of the recipes that are imprecise. Also note that you probably do NOT want to use this feature for the shredding recipe, unless you either want to use m or you really, truly have a ton of ALL the recipes that are being shredded (the bot knows no middle ground and it doesn’t make exceptions if you have a ton of cft recipes but only 10 congressional pulp recipes).

Enjoy the bots, have fun in Eulora, let me know how it goes and don’t get your hopes too high on me fixing/changing/altering those bots any time soon.

Work on what matters, so you matter too.