BINARY DIVISION

*(Last Modified:
04 November 2010 06:08:17 PM
)*

A very simple and reasonably efficient algorithm for dividing one binary integer by another can be developed by directly applying the rules for performing long division that are (or at least used to be) a required part of an elementary school education, typically in grades 4 or 5.

Consider the case of, in base-10, dividing 42 into 38963

The traditional long division steps would look something like this:

**
927**

**
-------**

** 42 )38963**

**
-378**

**
-----**

**
116**

**
-84**

**
----**

**
323**

**
-294**

**
-----**

**
29**

Hence the result is 927 with a remainder of 29. Since, when performing integer division, the fractional part is truncated (i.e., discarded), we are looking to implement an algorithm that would yield 927 as the answer.

Since we are implementing this algorithm on a computing device of some kind, let's formulate the steps taken a bit more rigorously. As an aid to understanding, the above problem has been augmented to better match the explanation.

**
927 (Quotient)**

**
-------**

** (Divisor) 42 )38963
= 0th Remainder = (Dividend)**

**
-37800 = 9*42*100 = 1st Product**

**
-------**

**
1163 = 1st Remainder**

**
-840 = 2*42*10 = 2nd Product**

**
----**

**
323 = 2nd Remainder**

**
-294 = 7*42*1 = 3rd Product**

**
-----**

**
29 = 3rd Remainder = (Remainder)**

The idea is to find the values for Quotient and Remainder that satisfy the following two requirements:

- Dividend = Quotient * Divisor + Remainder
- 0 <= Remainder < Divisor

We can satisfy the first requirement by initializing Quotient to 0 and Remainder to the Dividend. Our algorithm progressively increases Quotient and decreases Remainder, while maintaining the equality required by the first constraint, until the second condition is also satisfied.

Our algorithm is nothing more than a simple cycle that is repeated until the second condition is met. For each cycle, we

- Multiply the Divisor by the largest integer power of the number base that keeps the product from exceeding the Remainder from the previous cycle.
- Multiply the product from the first step by the largest digit value that still keeps the new product from exceeding the Remainder from the previous cycle.
- Add the product to the Quotient from the previous cycle to get the quotient from the current cycle.
- Subtract the product from the Remainder from the previous cycle to get the Remainder from the current cycle.

The following table shows the effect of each step for each of the three cycles through the above algorithm. Notice that "Cycle 0" is simply the initialization step.

Cycle | Step 1 | Step 2 | Step 3 | Step 4 |

0 | N/A | N/A | Quotient = 0 | Remainder = Dividend = 38963 |

1 | Product = 42*100 = 4200 | Product = 42*900 = 37800 | Quotient = 0 + 900 = 900 | Remainder = 38963 - 37800 = 1163 |

2 | Product = 42 * 10 = 420 | Product = 42*200 = 840 | Quotient = 900 + 20 = 920 | Remainder = 1163 - 840 = 323 |

3 | Product = 42 * 1 = 42 | Product = 42*7 = 294 | Quotient = 920 + 7 = 927 | Remainder = 323 - 294 = 29 |

Before we apply this to binary division, let's make note of a couple of points. The first is that after we have multiplied the divisor by the number base a sufficient number of times, we can obtain the subsequent values needed at each step by dividing it by the number base once. Second, in binary the only two values for each digit are 0 and 1, hence determining which is needed is trivial and can be performed with a single test. Therefore Step 1 and Step 2 of the algorithm can be effectively combined.

The following is pseudo-C code for a function that performs binary division.

BinaryDivision(Dividend, Divisor)

{

`Quotient = 0;`

Remainder = Dividend;

Product = Divisor;

`Term = 1;`

MAXTERM = 2^(BITS-1) // BITS: width of Divisor,Dividend

while (Term < MAXTERM) AND (Product < Remainder)

{

Product = 2 * Product;

Term = 2 * Term;

}

while (Term >= 1)

{

if (Product <= Remainder)

{

Quotient = Quotient + Term;

Remainder = Remainder - Product;

}

Product = Product / 2;

Term = Term / 2;

}

return Quotient;

}

The final value of the Remainder is also known as the modulus. Some applications may find this useful.

Notice in the above algorithm that once Dividend and Divisor are used to initialize Remainder and Product that they are never used again. It is therefore possible to save time and memory by eliminating these last two variables and simply using Dividend and Divisor directly. However, doing so does sacrifice a bit of readability and, hence, maintainability -- appropriate comments in the code can take care of this, though.

One question that should always be asked when implementing any algorithm is how it behaves for special cases that are problematic. For division, this is obviously when the Divisor is 0. How does this algorithm behave? Is it reasonable? Is it acceptable?

Depending on how the algorithm is implemented, there are some performance tricks that can be used. For instance, multiplying and dividing a value by 2 is respectively equivalent to left-shifting and right-shifting the value by one bit. Since dividing 1 by 2 yields a value of 0 (in integer division), the loop test for the second loop can be done by simply checking if Term is non-zero.

Notice that we really want to stop the first loop when we have found the largest value of Product that is not greater than Remainder. The loop in this algorithm will normally find the first value that is greater. However, this is not a functional problem as it only results in an extra pass through the second loop in which the test for the if () statement will fail and, hence, no harm done other than the wasted time. We could avoid this by automatically dividing Product and Term by 2 before we start the second loop, however note the (Term < MAXTERM) condition in the loop test for the first loop. This is there because, unlike when we use pencil and paper, the numbers in our computing device have a fixed number of bits and we can't multiply Product by 2 so many times that it ends up exceeding this limit. Thus we could exit the first loop as a result of the first condition ending with a Product that is not greater than Remainder. Since this is the very question asked within the second loop, there is nothing to be gained by attempting to avoid the extra pass.

We can also scale back some of the tests by noting that Term will always have exactly one bit set in it and that MAXTERM is simply the value corresponding to when the most-significant-bit of Term is set, hence we can just test for this condition. Also, we know that the last pass through the second loop will occur when Term is exactly equal to 1 and that this is the only time when the least significant bit of Term will be set. Hence we can just look for that condition and stop once we detect it recognizing that we will still need to perform one more pass through the loop. However, that pass only has to perform the if() statement since the adjustments to Dividend and Term are now extraneous. Hence we have:

BinaryDivision(Dividend, Divisor)

{

`Quotient = 0;`

Remainder = Dividend;

`Term = 1;`

Product = Divisor;

while (~Term<MSB> AND (Product < Remainder) // While MSB of Term is 0

{

Product = 2 * Product;

Term = 2 * Term;

}

while (~Term<LSB>) // While LSB of Term is 0

{

if (Product <= Remainder)

{

Remainder = Remainder - Product;

Quotient = Quotient + Term;

}

Product = Product / 2;

Term = Term / 2;

}

if (Product <= Remainder)

{

Quotient = Quotient + Term;

Remainder = Remainder - Product;

}

return Quotient;

}

The adjustment of Remainder in the trailing if() statement could also be omitted if the modulus is not needed.

In programs running on processors, performing tests at the bit-level are usually not efficient since processors are optimized for byte- and higher-level operations. However, programmable logic such as ASICs and FPGAs lend themselves well to bit-level operations and generally perform better if higher-level operations are minimized.

Although not at all obvious, the second loop can be implemented in programmable logic such that it only requires a single clock cycle per pass. Also, register widths are for more flexible in programmable logic, thus it is possible to completely forego the first loop and start with the Divisor pre-multiplied by the maximum number of bits. Effectively, this makes the algorithm:

BinaryDivision(Dividend, Divisor)

{

`Quotient = 0;`

Remainder = Dividend;

`Term = 2^(BITS);`

Product = Divisor*Term;

while (~Term<LSB>)

{

Product = Product / 2;

Term = Term / 2;

if (Product <= Remainder)

{

Quotient = Quotient + Term;

Remainder = Remainder - Product;

}

}

return Quotient;

}

By starting with Product equal to Divisor multiplied by 2^{BITS} we guarantee that
divisor will be greater than Remainder the first time through the loop. We can
therefore automatically perform the adjustments on Product and Term before the
first invocation of the if() statement. By doing so we can also exit the loop as soon as we detect the lsb of
Term being set without needing to invoke the if() statement a final time.

This algorithm will always require exactly BITS passes through the loop. The first several of these passes will almost never be needed or productive. Unfortunately, the most obvious steps needed to restrict the number of passes to the minimum necessary require about as much time as the passes they trim. On the other hand, especially in embedded applications, this very consistency in the number of passes required is frequently an advantage worth the associated performance penalty since it lends itself to reliable timing and pipelining.

The following Verilog code implements the algorithm detailed in the previous
section. Taking advantage of the flexible register sized in an HDL, the initial
value of working product is set equal to divisor*2^{31} by simply
storing it in the appropriate place within the product field. Note that it is
multiplied by 2^{31} and not 2^{31} simply because it is also
implementing the initial division of both Product and Term by 2 as well.

Note that all assignments for all possible cases are explicitly coded. The is to ensure that no inferred latches are created during synthesis. This is not strictly necessary, but is commonly believed to be good practice. Also note that some registers are larger than they absolutely have to be. This is to keep the operands in arithmetic operations the same size. The synthesis tool may not care about this and will almost certainly trim the unneeded registers. However, it will produce some warnings in the process. If the synthesis tool can handled mismatched operand sizes, these warnings can be eliminated by removing the unneeded bits.

//////////////////////////////////////////////////////////////////////////

`// William L. Bahn`

reg state, next_state;

//

// FILE: division32.v

//

// DESCRIPTION: This module divides one integer by another producing

// an integer quotient and integer remainder.

//

// It works with 32-bit unsigned integers.

//

//////////////////////////////////////////////////////////////////////////

//========================================================================

module division32(

input wire [31:0] dividend,

input wire [31:0] divisor,

output reg [31:0] quotient,

output reg [31:0] modulus,

input wire go,

output reg done,

input wire clk, rst);

parameter S_LO = 1'd0, S_HI = 1'd1;

reg [31:0] q, next_q; //
Working Quotient

reg [31:0] t, next_t; //
Working Term

reg [62:0] r, next_r; //
Working Remainder

reg [62:0] p, next_p; //
Working Product

reg [31:0] next_quotient,
next_modulus;

reg next_done;

next_t = 32'h80000000; // t = 2^31

parameter

S_IDLE = 1'd0,

S_RUN = 1'd1;

//--------------------------------------------------------

// FSM Registers

//--------------------------------------------------------

always @(posedge clk or posedge rst)

begin

if (rst)

begin

q <= 32'd0;

r <= 63'd0;

p <= 63'd0;

t <= 32'd0;

quotient <= 32'd0;

modulus <= 32'd0;

done <= HI;

state <= S_IDLE;

end

else

begin

q <= next_q;

r <= next_r;

p <= next_p;

t <= next_t;

quotient <= next_quotient;

modulus <= next_modulus;

done <= next_done;

state <= next_state;

end

end

//--------------------------------------------------------

// FSM Next State Logic

//--------------------------------------------------------

always @*

begin

case(state)

S_IDLE:

begin

next_p = {divisor,31'd0};
// p = divisor*2^31

next_q = 32'd0;
// q = 0

next_r = {31'd0, dividend}; // r =
dividend

next_quotient = quotient;

next_modulus = modulus;

`if (go)`

next_done = LO;

begin

next_state = S_RUN;

`end`

next_done = done;

else

begin

next_state = state;

`end`

next_t = {LO,t[31:1]}; // t = t/2

end

S_RUN:

begin

next_p = {LO,p[62:1]};
// p = p/2

next_q = q + t;

if (p <= r)

begin

```
```

next_r = r - p;

`end`

next_q = q;

else

begin

```
```

next_r = r;

`end`

next_quotient = q + ((p <= r)? t:0);

if (t[0]) // Term == 1

begin

next_modulus = r - ((p <= r)? p:0);

next_done = HI;

next_state = S_IDLE;

`end`

next_quotient = quotient;

else

begin

next_modulus = modulus;

next_done = done;

next_state = state;

`end`

end

endcase

end

`endmodule`

//========================================================================