ECE-1021

Miscellaneous Operators

(Last Mod: 27 November 2010 21:38:41 )

Objectives
Overview
Grouping Parentheses
Type Cast
Size In Bytes
The Conditional Operator
The Assignment Operator
The Comma Operator

Objectives

Understand the purpose and use of the six operators that are not grouped into any of the other categories.

Overview

The operators in this section are, as the name implies, the operators that do not fit well into any of the other categories. That is really all they have in common, so they will be treated as completely unrelated entities here.

Operation	Operator	Level	Assoc	Syntax	Evaluates to
Grouping	`()`	`1`	`L`	`( expression )`	`value of expression`
Type Cast	`( type )`	`2`	`R`	`( type ) expr`	`expr represented as type 'type'`
Size in Bytes	`sizeof`	`2`	`R`	`sizeof expr` or `sizeof ( type )`	`size needed to store object`
Conditional	`? :`	`13`	`R`	`test ? expr1 : expr2`	`expr1 if test is true (nonzero), expr2 otherwise`
Assignment	`=`	`14`	`R`	`lvalue = expr`	`The value assigned`
Comma	`,`	`15`	`L`	`expr1 , expr2`	`expr2`

As can be seen, these operators exist at the two extremes of the Precedence List.

Grouping Parentheses

Grouping parentheses share the highest precedence level with the array index operator and the two structure access operators. Grouping parentheses serve the same purpose in an expression as they have throughout your mathematics education - they override the "standard" order of operations and force evaluation in the order indicated by the parentheses.

A loosely related use of parentheses is the function call (function expression) operator - in fact, many texts make no distinction between the two. But there is a difference. When parentheses follow a function name, the argument list inside them does not produce a value of its own. (With multiple arguments the list resembles a comma expression, but the commas there are separators, not comma operators.) Instead, each argument is evaluated and its value is passed to the corresponding parameter of the function being called. Furthermore, the function is then evaluated (called) and its return value is used in further evaluation of the expression. Grouping parentheses, on the other hand, force evaluation of the expression within them and replace the expression and its surrounding parentheses with the resulting value.

Type Cast

Syntax:

( type ) expression

The type cast operator evaluates to a value equal to the value of its operand - subject to the constraints of any necessary data type conversions - but represented in the indicated data type. Conceptually, it effectively creates a temporary read-only variable of the type indicated and copies the value of the expression being cast to that variable performing any necessary conversions. That temporary variable is then used in the expression instead of the original value and the variable is destroyed immediately after its use. While this may sound confusing, it is no different than what happens with any other operator. For instance, consider the expression:

y = m + x + z;

The leftmost addition operator is executed and the value it produces is stored somewhere in memory using some type of representation. If m and x are both of type int, then the result is represented as an int. But if either of them is of type double, then the result is represented as a double. This intermediate result then becomes one of the operands of the rightmost addition operator, so it has to be stored somewhere in the meantime. The compiler sets aside memory for that storage and, once it is done with the value, frees that memory for other uses. Since we have no control over where that storage is located or how long it exists, we can't access it or alter it. As a result, the following statements are illegal:

y = (m + k)++; /* illegal */

y = (m = k) += 3; /* illegal */

The expressions (m + k) and (m = k) both evaluate to values, not objects (in C, neither is an lvalue), so neither can be incremented or used as the left operand of +=.

The type cast operator does NOT change the type of a variable, nor does it alter in any way the contents stored in a variable. This can be remembered by noting that the operand is an expression and not an object - therefore the operator works with values and not with the storage elements that hold those values. For instance, consider the expression:

m = (int) (k + y);

The cast operator is not working on the variable k or y, it is working with the value resulting from adding the value stored in k to the value stored in y. This does not change if there is a single variable or even no variable at all:

m = (int) y;

m = (int) 3.2;

Both are perfectly valid statements.

It is important to note that the cast operator only casts the value of its operand expression. The expression is fully evaluated and only the result is cast. For instance, consider the following:

double x, y, z;

int j, k;

j = 12;

k = 8;

x = j / k;

y = (double) j / k;

z = (double) (j / k);

Since j and k are both integers, integer division will be used any time the expression j/k is evaluated resulting in a value of 1 instead of 1.5. Therefore a value of 1 will be stored in x.

But the next statement does not evaluate the expression j/k. Because the cast operator has higher precedence than the division operator, it is executed first, so the division operator's left operand is not j but, instead, the temporary variable of type double whose value is the same as the value stored in j. Since one of its operands is of type double, floating point division is used and the value 1.5 is stored in y.

In the last statement, however, we are performing integer division because the cast operator is working on the result of the expression (j/k) and therefore the value stored in z is 1.

The cast operator is easily abused because a cast usually suppresses conversion warnings that the compiler might otherwise issue - basically, the programmer is telling the compiler that they are confident they know what they are doing and that it should ignore any potential problem it might otherwise flag. As a result, most programmers contend that the cast operator should be used only for a very specific and necessary reason.

Having said that, one place where casting is good practice (and there are certainly those who would disagree) is to cast any numerator or denominator that is of an integer data type in a division operation when floating point division is intended. Strictly speaking, it is sufficient to ensure that only one of them is of a floating point data type, but inadvertent integer division is such a common and such a subtle mistake that this is one situation where a bit of overkill is justifiable. Therefore, the best way to write the assignment operation from the previous example is as follows:

y = (double) j / (double) k;

Or even:

y = ((double) j) / ((double) k);

The extra parentheses in this last form guard against another common programming error - namely not ensuring that the numerator and denominator are what was actually intended. Therefore it is not a bad practice to surround the numerator and the denominator of the division and modulo operators with parentheses.

Size In Bytes

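The operand of the sizeof operator is either a data type name in parentheses or an expression whose type, and therefore size, is known to the compiler. The operator evaluates to the number of bytes needed to store an object of that type, as a value of the unsigned integer type size_t. An expression operand is used only for its type; it is not evaluated (the one exception being C99 variable-length arrays).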

When the operand is an expression, the sizeof operator does not require parentheses - it is an operator, not a function - although when the operand is a type name, as in sizeof (int), the parentheses are mandatory. Even so, it is good practice to use them every time. The operator has a sufficiently high precedence that strange errors can occur otherwise; for instance, sizeof x + 1 means (sizeof x) + 1, not sizeof (x + 1). By using parentheses you are explicitly dictating which object or data type you want to know the storage requirements for. A common symptom of not doing this is for the pointer indicator to not be included correctly as part of the intended operand, with the compiler complaining about suspicious pointer conversions as a result - a warning that is seldom safe to ignore.

The most common usage, at least for novice programmers, is to determine the size of a certain data type. While this is perfectly reasonable, a better practice - when feasible - is to use the name of the relevant object. For instance, consider the following code:

int x;

int bytes;

/* ... a few pages of code */

bytes = sizeof (int); /* Number of bytes required to store the variable x */

This works just fine and will behave as intended even if the code is later compiled with a compiler that uses a different size for the int data type. That is, until the programmer decides that x needs to be a long and changes the code to:

long x;

int bytes;

/* ... a few pages of code */

bytes = sizeof (int); /* Number of bytes required to store the variable x */

Now bytes no longer reflects the size of x, which can lead to problems that are extremely difficult to diagnose. As we have done with several other issues, the goal is to write the code so that a single change updates the entire program - in other words, to key changes to a single location whenever possible. Using #define macros is one very common way to do this, and using the object name instead of the data type as the sizeof operand is another.

Since the compiler knows that x is of type int, it knows that the number of bytes required to store x is the same as the number of bytes needed to store a value of type int. Hence we can write the original code as follows:

int x;

int bytes;

/* ... a few pages of code */

bytes = sizeof (x); /* Number of bytes required to store the variable x */

But now if we change x so that it is of type long (or any other type) the value stored in bytes will automatically compensate.

Since the sizeof operator evaluates to the number of bytes required to store a value of a particular type, the following expressions are legal:

bytes = sizeof (3.2);

bytes = sizeof (x + y);

In each case, the value of the expression has a particular data type and the compiler knows what that type is, so the sizeof operator evaluates to the number of bytes required to store that value as that type. The expression itself is never evaluated.

If the operand is the identifier of a statically defined array - an array whose size is known at compile time - then the sizeof operator evaluates to the total number of bytes required to store the entire array. For example:

double x[100];

int total, each, count;

/* ... a few pages of code */

total = sizeof (x); /* Number of bytes for entire array */

each = sizeof (x[0]); /* Number of bytes for one element */

count = total / each; /* Number of elements in array */

While the above trick might look quite useful, it is actually of very limited utility. In order to work, the array must be statically defined, which means the number of elements in the array is already known at compile time. Instead of playing the above game to determine the number of elements in the array, it is far better to key the number of elements to a symbolic constant:

#define COUNT (100)

double x[COUNT];

int total, each, count;

/* ... a few pages of code */

total = sizeof (x); /* Number of bytes for entire array */

each = sizeof (x[0]); /* Number of bytes for one element */

count = COUNT; /* Number of elements in array */

It is still conceivable that the programmer might want to verify that COUNT is exactly equal to total/each before entering some critical region of their code so that if they made some kind of a logic error someplace - perhaps they have many arrays and all of them should be keyed to COUNT but one of them is not - then they can detect the error and exit gracefully.

The Conditional Operator

Syntax:

(test_expression) ? (true_expression) : (false_expression)

Also known as the ternary operator because it is the only operator that takes three operands, this operator is one of the most useful constructs in the C language. It is also one of the most intimidating to new C programmers.

Like any other operator, when used in an expression the operator and its operands are evaluated to yield a value and the operator and its operands are then replaced by that value within the expression.

In this case, the first operand - the test_expression - is evaluated first and, if it is anything other than zero (i.e., if it is True), then the value of the entire expression is the value of the second operand - the true_expression. If the first operand evaluates to exactly zero (i.e., if it is False), then the value of the entire expression is the value of the third operand - the false_expression.

As with the logical && and || operators, this operator comes with a guarantee - namely that if an operand is not needed to determine the value of the expression, it is guaranteed not to be evaluated at all. As a result, the first operand is always evaluated, but, based on its value, only one of the remaining two operands is evaluated.

Here are several ways to replace the value stored in a variable with its absolute value:

if ( x < 0.0 )

x = -x;

...

x = ( x < 0.0 )? -x : x;

...

x *= ( x < 0.0 )? -1.0 : 1.0;

...

x -= ( x < 0.0 )? 2.0*x : 0.0;

...

To select the larger of two values:

max = (x > y)? x : y;

The operator can be used within more complicated expressions. For instance, the last example can be expanded to store the maximum of four different values:

max = ( ((a>b)? a:b) > ((c>d)? c:d) )? ((a>b)? a:b) : ((c>d)? c:d);

While correct, the above code is not very readable and is presented only to provide an example.

Because the conditional operator is nearly at the bottom of the precedence list, it is very easy to encounter problems with improper evaluation order - meaning an order other than the one intended - unless parentheses are used pretty liberally. For instance, consider the following:

max = ( (a>b)? a:b > (c>d)? c:d )? (a>b)? a:b : (c>d)? c:d;

This may look like the same expression - all that was done was to remove the parentheses surrounding each of the individual inner conditional expressions. But after grouping the operators according to their precedence and noting that the relational operators have much higher precedence than the conditional operator, we have the following:

max = ( (a>b)? a:[b > (c>d)]? c:d )? (a>b)? a:b : (c>d)? c:d;

Here the square brackets are used to show the implied grouping due to the relational operators (which are left associative). Clearly the actual order is fundamentally different from the intended order.

Another example might be:

maxTimes2 = (a>b)? a:b * 2.0;

Since the multiplication operator has higher precedence than the conditional operator, this evaluates as:

maxTimes2 = (a>b)? a:(b * 2.0);

The moral of the story is to be liberal with parentheses whenever the conditional operator is used. In fact, it is not unreasonable to adopt an approach similar to that used for function-like macros - namely to surround each of the three operands with parentheses as well as the conditional expression as a whole. This would make the above expression look like:

maxTimes2 = ( (a>b)? (a):(b) ) * 2.0;

The Assignment Operator

Syntax:

lvalue = expression

The right-hand operand, expression, is evaluated and the result is then stored in the object referenced by lvalue. Hence lvalue must refer to a modifiable object (recall that an object is defined as being a region of data storage whose contents can represent values).

The discussion in this section applies equally to the ten abbreviated assignment operators, even though those operators have been (somewhat arbitrarily) categorized according to the underlying operation that is performed prior to the assignment. All ten of these operators have the same precedence as the basic assignment operator.

It is important to note that this discussion does not apply, at least not completely, to the increment and decrement operators. Not only do these two operators have much higher precedence (2 instead of 14) but the value that an expression containing them evaluates to is not always equal to the value that gets assigned to the object.

Like nearly all operators, the expression consisting of the operator and its operands evaluates to a value, and that value can then be used in a larger expression. In the case of the assignment operator, the value of the expression as a whole is the value that was assigned to the object. In fact, assigning the value to the object is a side effect of the operator. Although most of the time the assignment operator will be used solely for its side effect (namely changing the value stored in an object) and the result of the expression will be discarded, there are a couple of common uses for the expression result as well. One is when we wish to initialize several variables to the same value:

x = y = z = 1; /* simple and clear way to initialize several variables */

The intent of the above expression is clear and it saves a couple of lines of code. It also emphasizes that all three variables are being set to the same value. In some circumstances, that might be worth emphasizing or even enforcing: writing the statement this way makes it difficult to accidentally set them to different values.

The statement above is equivalent to:

x = (y = (z = 1)); /* = is right associative */

Let's say that, after initializing one variable to a particular value, we wanted to initialize another variable to twice that value and a third variable to three times whatever the second variable was initialized to. We might try writing something like:

x = 3 * y = 2 * z = 1; /* illegal since * outranks = */

But this would not compile because it would parse as follows:

x = ((3 * y) = ((2 * z) = 1)); /* illegal */

And neither (3 * y) nor (2 * z) refer to objects that can be written to.

If we were really intent on patching up the above expression so that it works, we could write:

x = 3 * (y = 2 * (z = 1)); /* legal, but not recommended */

This is arguably not too offensive to the senses, but isn't the following significantly clearer?

z = 1;

y = 2*z;

x = 3*y;

Another common use of the value of an assignment expression is when we want to both check the return value of a function and save that value for later use. An example is the following:

if ( NULL == (fp = fopen(filename, mode)) )

exit(EXIT_FAILURE);

The fopen() function returns a value after attempting to open a file, and if that value is equal to the symbolic constant NULL, then we want to exit the program because the file failed to open. However, if the value is not equal to NULL, then we need to store it because we will need to pass it to the other file-oriented functions that we call. Using the value of the assignment expression gives us a short, compact way to do both.

The alternative, which is perfectly acceptable and even preferred by many, would be:

fp = fopen(filename, mode);

if ( NULL == fp )

exit(EXIT_FAILURE);

The Comma Operator

Syntax:

expression1 , expression2

The comma operator is somewhat similar to the curly braces used to create compound statements in that it can be used to join what would otherwise be multiple expression statements into a single statement. The primary difference is that a compound statement formed by curly braces is a statement, not an expression, and therefore produces no value. The comma operator, on the other hand, produces a value that is equal in type and value to the right-hand expression; the value of the left-hand expression is discarded.

The most common place where the comma operator is used is in the for() loop, since the control portion of this construct consists of exactly three expressions. Since the primary utility of the for() loop is to gather the initialization and the increment logic together and bind them to the looping construct itself, it would be very useful to be able to do so in situations where either of these consists of multiple expressions. Consider the following:

x = 1.0;

y = 2.0;

while (y > x)

{

    /* Do something */

    y = 10.0 + x;

    x *= 2.0;

}

Without the comma operator, if we wanted to rewrite this as a for() loop, we would have to pick one of the initialization statements (and one of the update statements) to put in the loop header and leave the other outside of it. With the comma operator, we can write this as:

for (x = 1.0, y = 2.0; y > x; y = 10.0 + x, x *= 2.0)

{

/* Do something */

}

Notice that, in the increment expression, the variable x is read by one operand of the comma operator and modified by the other. If two such subexpressions were combined with most other operators, this would invoke undefined behavior, but the C Language Standard dictates that the comma operator establishes what is known as a "sequence point". In practice, this means that the left-hand expression is completely evaluated, then all of its side effects are applied, and only then is the right-hand expression evaluated.

Do not confuse comma-separated lists, such as those found in function and function-like macro invocations (and in declarations such as int j, k;), with the comma operator. These are two different things.