
BBC-Quality Concurrent Hash Functions

(Last Mod: 27 November 2010 21:37:21)

NOTE: WORK VERY MUCH IN PROGRESS



The View from 30,000 feet

The BBC algorithms, which are used to construct BBC codewords from messages and, conversely, extract messages from BBC codewords, have at their heart a publicly-known hash function. This hash function is used to map a given message prefix to one or more mark locations within the codeword. As with most things, the characteristics that a hash function must possess in order to be "good" depend heavily on the application being considered. More to the point, the characteristics that make a hash function good for one application may be entirely irrelevant for another. Much work has been done to define the necessary and/or desirable characteristics of hash functions for a variety of applications, such as information storage and retrieval or cryptography, and to construct hash functions that possess those characteristics. Unfortunately, the relatively new Theory of Concurrent Codes has never been the focus of this kind of work, and so the very question of what constitutes a good concurrent hash function has never been considered in any depth, particularly with any degree of mathematical rigor.
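
To make the hash function's role concrete, here is a minimal sketch of the idea, using an ordinary cryptographic hash (SHA-256) purely as a stand-in; the actual BBC hash function and its parameters are not specified here, and this sketch maps each prefix to just one mark location.

    # Minimal sketch, NOT the actual BBC hash: map a message prefix (a string
    # of bits) to a single mark location in a codeword of a given length.
    import hashlib

    def mark_location(prefix_bits, codeword_length):
        digest = hashlib.sha256(prefix_bits.encode("ascii")).digest()
        return int.from_bytes(digest, "big") % codeword_length

    # Example: the mark placed for the prefix "1011" in a 2^20-bit codeword.
    print(mark_location("1011", 1 << 20))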

 

At the present time, only a working definition of what constitutes a bad concurrent hash function exists: a bad concurrent hash function allows an attacker to construct effective attack packets. An effective attack packet is simply a packet that forces the decoder to expend a significantly disproportionate amount of decoding effort while remaining within the threat model. To date, the threat model limits attack packets to no more than a 33% mark density. To get a sense of what constitutes a "disproportionate amount of decoding effort," we must first understand what the decoding effort is. In most decoders, the two functions that are most likely to be the performance limiters are the ability to check for marks in the codeword and the ability to authenticate legitimately encoded messages. The system design will ultimately dictate which one proves to be the actual performance limiter, so ideally we must consider both; however, since the number of marks is far greater than the number of messages and, more importantly, decoding a packet is intrinsically tied to real-time performance while message authentication can tolerate significantly greater latencies, we will focus primarily on the number of marks that the decoder must examine.
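
As a trivial illustration of the threat model, the following sketch (my own, with a packet represented simply as a set of marked positions) checks whether an attack packet stays within the assumed 33% mark-density limit.

    # Sketch: does a packet respect the 33% mark-density threat model?
    def within_threat_model(marked_positions, codeword_length, max_density=1/3):
        return len(marked_positions) / codeword_length <= max_density

    print(within_threat_model({3, 17, 29}, 30))     # density 0.1 -> True
    print(within_threat_model(set(range(15)), 30))  # density 0.5 -> False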

 

To provide a baseline for comparison, we will define the "nominal" decoder workload to be the number of marks that would be examined if only the marks necessary to proceed down the correct branches of the tree are found -- in other words, no false hits during decoding. At each message bit the decoder must examine both possible children, while each checksum bit requires only a single check, so a message with m message bits and k checksum bits represents a nominal workload of 2m+k.
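
A quick worked example of this formula (assuming, as above, two examined locations per message bit and one per checksum bit):

    # Nominal decoder workload: 2 mark checks per message bit (both children)
    # plus 1 check per checksum bit.
    def nominal_workload(m, k):
        return 2 * m + k

    print(nominal_workload(8, 4))    # a toy 8-bit message with 4 checksum bits -> 20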

 

Now that we have some sense of what determines the decoding effort, it is useful to consider the amount of effort that an attacker can force upon the legitimate receiver even if the hash function has no exploitable weaknesses. If the attacker merely transmits a packet consisting of random marks, then at a density of 33% the decoder is expected to expend twice the nominal amount of effort (assuming that the basic BBC algorithm is in use). If there is no legitimate message in the packet, then the number of marks that the decoder will have to examine is only about four, since the nominal effort would only require examining two marks (the case of a leading 0-bit and a leading 1-bit). If there are friendly messages in the packet, then the decoder will explore slightly less than twice as many marks as it nominally would; the reason that the expected effort is not exactly twice is the presence of the checksum bits, which can't spawn new false messages.
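
The claim that a random-mark packet roughly doubles the decoder's effort can be checked with a small Monte Carlo simulation. The sketch below is my own model of the process described above: it follows the true path of an m-bit message, assumes every other location is marked independently with probability 1/3, counts every location the decoder examines, and ignores the checksum bits.

    # Monte Carlo sketch (illustrative model, not the BBC decoder itself):
    # estimate how many mark locations are examined when decoding one
    # legitimate m-bit message buried in random marks at density p.
    import random

    def false_subtree_examinations(p, depth):
        # Locations examined below one false node that turned out to be marked.
        count = 0
        frontier = 1                   # surviving false prefixes at this level
        for _ in range(depth):
            if frontier == 0:
                break
            count += 2 * frontier      # examine both children of each survivor
            frontier = sum(1 for _ in range(2 * frontier) if random.random() < p)
        return count

    def examinations_along_true_path(m, p):
        total = 0
        for depth_left in range(m, 0, -1):
            total += 2                 # the true child plus its false sibling
            if random.random() < p:    # the false sibling happens to be marked
                total += false_subtree_examinations(p, depth_left - 1)
        return total

    m, p, trials = 1024, 1/3, 200
    avg = sum(examinations_along_true_path(m, p) for _ in range(trials)) / trials
    print(round(avg / (2 * m), 2))     # ratio to the nominal 2m; comes out near 2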

 

However, an attacker can use a BBC encoder to place the marks in their attack packet, so there is nothing to prevent them from constructing a packet that consists of nothing but random legitimate messages, overlaying enough of them to produce the desired overall mark density. Consider a BBC encoder that is configured to encode 1024-bit messages with 32 terminal checksum bits and a codeword expansion factor of 1024. Note that while each message places 1056 marks into the packet, the decoder only has a nominal workload of 2064 marks. For this example, the codewords are thus 1,048,576 bits long and, at the bounds of the threat model, an attack packet would contain 349,525 marks. If all messages produced distinct marks, this would represent 331 messages. Taking into account that, at this density, many of the marks will collide, it is expected that the attacker will be able to place about 425,000 marks in the packet, representing approximately 403 messages.

In the decoding process, the receiver would not only need to examine all 425,000 of these mark locations but, for each one (except the relatively small number of checksum bits), examine both possible children in the decoding tree. Since one of these children is legitimate and, therefore, already included in the 425,000 marks already counted, the total count of marks to be examined doubles to 850,000. This is the baseline decoder effort that would be required even if no false nodes in the decoding tree were associated with a mark in the packet. However, in the vast majority of cases the second child begins a false path and, at 33% density, has a 33% chance of surviving. Beyond that point each false path has two potential children that could keep it alive or even spawn new false messages. The net effect is that, on average, another 850,000 marks will be examined in order to decode the packet. Thus, for every mark in the packet, the decoder will have to examine, on average, four marks during the decoding process. Taking into account the effect of the checksum bits and assuming that a packet consisting of 1 friendly message and 403 random attack messages had to be decoded, the decoder workload would be 802 times the nominal workload for a single message. In addition, the attacker would force the decoder to authenticate 404 times as many messages.
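
The figures quoted above can be reproduced, at least approximately, with some back-of-the-envelope arithmetic. The collision estimate below is my own (it treats mark positions as uniformly random, so that placing N marks fills about L*(1 - exp(-N/L)) distinct positions in an L-bit codeword); the remaining numbers follow directly from the stated parameters.

    # Back-of-the-envelope check of the random-message attack figures:
    # 1024-bit messages, 32 terminal checksum bits, expansion factor 1024,
    # attacked at 1/3 mark density.
    import math

    m, k, expansion = 1024, 32, 1024
    codeword_bits = m * expansion                 # 1,048,576-bit codewords
    density = 1 / 3
    distinct_marks = codeword_bits * density      # ~349,525 marked positions
    marks_per_message = m + k                     # 1,056 marks per message

    # If no two messages ever shared a mark position:
    print(round(distinct_marks / marks_per_message))    # ~331 messages

    # Allowing for collisions, N placed marks fill about L*(1 - exp(-N/L))
    # distinct positions, so the attacker can place N = L*ln(1/(1 - density))
    # marks before reaching the 1/3 density bound.
    placed_marks = codeword_bits * math.log(1 / (1 - density))
    print(round(placed_marks))                          # ~425,000 placed marks
    print(round(placed_marks / marks_per_message))      # ~403 messages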

 

Another obvious attack strategy would be the "bushy-tailed" attack, in which all of the attack messages share a single prefix so as to increase the number of messages in the packet for the same number of marks. Using this strategy against the example system above, the attacker could force the decoder to authenticate approximately 12,500 attack messages. However, because the majority of the marks in these messages correspond to checksum bits, the decoder would only need to examine 500,000 marks to process the attack packet. If a friendly message were also contained in the packet, the decoder would therefore need to perform 243 times the nominal workload to recover the message but would have to authenticate 12,500 times as many messages.
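
Where the 12,500 figure comes from can be seen with one more rough estimate (my own reading of the bushy-tailed structure): once the long prefix is shared, each additional message costs only about two new marks for its branch of the tail plus its 32 checksum marks, so the per-message price drops from 1,056 marks to roughly 34.

    # Rough estimate of the number of bushy-tailed attack messages that fit in
    # the same 1/3-density packet. Assumes each extra message costs about
    # k + 2 marks (2 tail marks plus k checksum marks); the one-time cost of
    # the shared prefix is negligible.
    placed_marks = 425_000          # from the previous estimate
    k = 32
    cost_per_message = k + 2
    print(round(placed_marks / cost_per_message))   # ~12,500 messages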

 

Thus we get a feel for the tradeoff that is taking place: while the decoding workload was reduced by a factor of 3.3, the authentication workload was increased by a factor of 31, a tradeoff of roughly 9.4:1. However, this may or may not be a beneficial trade for the attacker to make, since that determination depends completely on the decoding and authentication performance and limitations of the receiving system. What is valuable to note, however, is that the designer of the receiver has control over this tradeoff by specifying how many checksum bits are required. For instance, let's consider the case where the prior example is changed from 32 checksum bits to 64 checksum bits. The random-message attack would be virtually unchanged, imposing 780 times the decoding workload and 390 times the authentication workload. The bushy-tailed attack, on the other hand, would impose 223 times the decoding workload but only 6,440 times the authentication workload. Thus the tradeoff is only about 4.7:1. In general, doubling the number of checksum bits will reduce the tradeoff by a factor of two.
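
For reference, the tradeoff figures quoted in this section can be recomputed directly from the workload multipliers given above:

    # Recompute the decode/authenticate tradeoff from the quoted multipliers.
    def tradeoff(decode_random, decode_bushy, auth_random, auth_bushy):
        decode_saving = decode_random / decode_bushy    # decoding workload reduction
        auth_cost = auth_bushy / auth_random            # authentication workload increase
        return round(decode_saving, 1), round(auth_cost, 1), round(auth_cost / decode_saving, 1)

    print(tradeoff(802, 243, 404, 12_500))   # (3.3, 30.9, 9.4) with 32 checksum bits
    print(tradeoff(780, 223, 390, 6_440))    # (3.5, 16.5, 4.7) with 64 checksum bits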