Frances E. Allen
John Cocke
IBM Thomas J. Watson Research Center
Yorktown Heights
A result of the recent work in optimization
has been to systematize the potpourri of optimizing
transformations that a compiler can make to a pro-
gram. This paper catalogues many of these trans-
formations.
INTRODUCTION
A catalogue of optimizing transformations which
a compiler can make on a program is presented in
this paper. The catalogue does not pretend to in-
clude all possible transformations; what is presen-
ted is a categorization of most of the transforma-
tions which are currently reasonably well understood.
The basic purpose of the transformations pre-
sented is to improve the execution time of compiled
programs. Some attention is given to execution
space, but no attention is given to optimizing com-
pile time or to the more general questions of total
job or system optimization.
The term optimization is a misnomer in that it
is not generally clear that a particular, so-called,
optimizing transformation even results in an
improvement to the program. A more correct term
would be "amelioration." Aho, Sethi and Ullman [1]
have been able to prove the optimality of a set of
transformations on straight-line code. The criter-
ion they used for measuring optimality was the
length of machine code generated. We do not have,
however, any general means of establishing optimal
bounds, much less any means of determining that a
general set of transformations produces code satis-
fying given bounds.
Another aspect of optimization which needs
further study is the question of the correctness of
transformations. We would like to be able to prove
that applying a set of transformations to a program
produces an equivalent program. Some investigations
into this question are reported by Ullman [15]. In
developing the transformations presented here, care
has been taken not only to assure that the obvious
data values are correct and available when required
but that "side effects" are maintained and language
rules honored. These assurances are built into the
algorithms implementing these transformations and
are not reported here. They are generally a rather
ad hoc collection of considerations.
The survey of optimizing transformations which
follows is organized so that the interprocedural
optimizations are presented first, transformations
best performed on a program form close to the source
language are presented next, followed by the so-
called machine independent and machine dependent
optimizations. The classification of transformations
into machine independent optimizations and machine
dependent optimizations is used to distinguish the
transformations which are applicable to most register-
oriented computers from those which require knowledge
of the specific characteristics of a given computer.
For example, eliminating redundant instructions is
a machine independent transformation, whereas regis-
ter allocation is machine dependent. Very little is
said about the language level best suited for the
transformations. Clearly, however, the machine
dependent optimizations create and/or transform
instructions close to the machine level. Most of
the machine independent optimizations are best per-
formed on subexpressions: the language level of
these subexpressions can be high, that is, close to
source language level, or low, close to the machine
language level.
Each transformation included in the catalogue
is identified and described briefly (but is not
formally defined). An example is given, followed
by some discussion of the transformation. The first
transformation effects an interprocedural optimiza-
tion.
PROCEDURE INTEGRATION
In order to smooth subprogram linkages and to
provide more global program units for optimization,
subprograms and/or their prologues may be merged
into their calling subprograms. If subprogram C
calls subprogram S (C → S), then one of the
following linkages may be established.
1. CLOSED. This is a standard linkage and no
intersubprogram optimization is performed. S, the
called routine, is known to the system loader. No
information about S is available during the com-
pilation of C. This kind of linkage is inefficient
because
a. registers must be saved and restored on
entry and exit.
b. specific registers must be allocated to
specific functions. This constrains register
assignment, and may cause "register busy interlock"
delays at execution time on close sequential calls
in pipelined CPU's.
c. the parameters and global variables (e.g.,
variables in COMMON) set and/or used in S are not
known during the compilation of C. The compilation
of C must assume that all such variables can be set
and used in S. At the time of the CALL memory must
contain the current values of such variables; first
uses after the CALL must access storage.
d. in most cases, all information must be
passed in storage.
e. the called program cannot take advantage
of particular argument values, recurrences or
relationships. For example, the called routine
cannot take advantage of a constant-valued argument.
2. OPEN. An open linkage is not a linkage at all.
The called routine, S, replaces the CALL statement
in the caller, C, after the arguments on the CALL
have replaced their corresponding parameters in S.
The program S is completely integrated in C and is
not known to the system.
There are many obvious advantages to this kind
of linkage; two are:
a. There is no linkage overhead whatsoever.
b. C and S are optimized together. Advantage
can be taken of particular argument values and
relationships: constant arguments can be folded
into the code, invariant instructions in S can be
moved to infrequently executed areas of C, etc.
The major disadvantages and constraints on
opening up subprograms are:
a. program sizes may become too large.
b. irreducible subprograms require special
handling. (An irreducible subprogram is one which
maintains a history, performs I/O operations, can
return different function values for identical
argument values, or does not return through a
standard return point.)
c. a high-level version (not necessarily
source) of the program being called must be avail-
able to the compiler at the time the caller is
compiled.
d. a change in the called routine obsoletes
object modules into which it has been merged.
3. SEMI-OPEN. A semi-open linkage is a non-stand-
ard linkage in which the called routine, S, is com-
piled with its caller, C. S and C become part of
the same object module; S is not known to the system.
The parameters become real locations which are used
to hold argument values or locations. During
optimization it may be recognized that actual para-
meter locations are not needed, and S and C become
indistinguishable. All CALL's to S in C become
branches to a location internal to the module.
On a computer with a paged memory or base-
offset addressing, both the open and the semi-open
forms increase the possibility of both routines
being in the same page or in the same addressing
area. The semi-open form of linkage has some of
the same disadvantages and constraints as the open
form. The primary advantages arise from the effects
of optimizing the two routines as a unit. The
object-time association is, of course, more direct
and faster than a standard linkage.
4. SEMI-CLOSED. The semi-closed linkage is a non-
standard linkage in which the two routines are
compiled as separate modules and are both known to
the system. The called routine, S, is compiled
first. The compilation of S determines the linkage
registers and the parameter passing conventions to
be used when S is called. This information, along
with other information about S, is used during the
compilation of C to establish the calling linkage.
The other information about S which is available to
C is
a. which global variables and parameters are
set and/or used by S.
b. which registers are used by S.
The advantages of this type of linkage are:
a. the compilation of S is not constrained by
fixed register assignment at entry and exit points.
b. the compilation of C can take advantage
of knowing how global variables are used in S to
avoid memory references across CALL points and to
carry information in registers.
c. registers unused in S need not be saved
and restored at linkage time. The unused registers
can be active in C across the linkage.
The major disadvantage is that a recompilation
of S may obsolete all programs which have a semi-
closed linkage to S. A sophisticated loader may
avoid this difficulty.
Although most of the above description has
been given in terms of subprogram linkages, two of
the linkages can be generated in a single COBOL
program. A COBOL PERFORM block which is always
entered at its beginning (i.e., there are no branches
into the middle of it) can be treated as a called
subprogram to which internal linkages are generated
or which is opened up. In the latter case, a PERFORM
of the block is replaced by the block of code. If
the PERFORM has a VARYING clause, this becomes
analogous to the FORTRAN DO.
Many compilers automatically open certain
functions. Beyond this the optimization described
here does not exist in any commercially available
compiler. In order to perform a procedure integra-
tion, the compiler has to have access to both pro-
cedures at essentially the same time. Therefore,
a change has to be made to the standard compiler
organization in which each procedure or subroutine
is compiled to binary code independently of all
others submitted with it. In addition to a differ-
ent compiler design, some criteria for determining
which type of linkage to make are needed. This may
be best established by the programmer based on what
he knows about the future usages of procedures, but
it can be made by the compiler itself in certain
cases. For example, it might be fairly easy to
determine that a routine could be opened up if it
appears just once in the programs that are associ-
ated with a job or if it is very small. The
criteria for automatically integrating routines
need further investigation and probably cannot be
established without knowledge of the program
environment.
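To suggest what an open linkage makes possible, here is a minimal sketch in C; the routine names and the folding of the multiply into a shift are assumptions of the illustration, not prescriptions.

#include <stdio.h>

/* the called routine S: scales its argument */
static int s(int x, int k) { return x * k; }

/* the caller C with a closed (standard) linkage to S */
static int c_closed(int a) { return s(a, 4); }

/* the caller after S has been opened up: the constant argument 4 has
   replaced the parameter k, and the multiply folds to a shift */
static int c_open(int a) { return a << 2; }

int main(void) {
    printf("%d %d\n", c_closed(5), c_open(5));   /* both print 20 */
    return 0;
}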
LOOP TRANSFORMATIONS
Several transformations can be made on program
loops to improve the execution time of the program.
Although it is not necessary that these transfor-
mations be made on a language level which is close
to the source language, they seem to be more easily
identified and performed at that level. In the
examples which follow, DOs are used to identify the
loops being transformed; however, other encodings of
loops could trigger the transformations. Three
types of loop transformations are considered in
this paper.
1. Loop Unrolling. A loop can be unrolled
completely so that the successive computations
implied by the loop appear sequentially or it can
be partially unrolled as in the following example:
DO I = 1 TO 100 BY 1;
A(I) = A(I) + B(I);
END;
becomes when unrolled by 2:
DO I = 1 TO 100 BY 2;
A(I) = A(I) + B(I);
A(I+1) = A(I+1) + B(I+1);
END;
The advantages of loop unrolling are that
a. the number of instructions executed is
reduced. In the preceding example the number of
increments and tests for loop control is halved.
b. more instructions are exposed for parallel
execution. The two statements in the unrolled form
of the preceding example can be executed at the
same time since they are independent.
The major disadvantage of loop unrolling is the
increased instruction space required. For this
reason the criteria for unrolling loops should
include the size of the loop and the relative fre-
quency of executing the loop. Other factors are
the severity of the object space constraints and
the form of the loop itself. A loop with variable
control parameters (e.g., DO I = J TO K BY L;) can
be unrolled but requires code to test for end con-
ditions. Nests of loops can be unrolled; an
example of this will be given in the next section.
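As a sketch of the end-condition code such a loop needs, the following C rendering (names illustrative) unrolls a variable-bound loop by 2 and handles an odd trip count afterwards:

/* unroll by 2, with a trailing test for the leftover element */
void add_arrays(double *a, const double *b, int n) {
    int i;
    for (i = 0; i + 1 < n; i += 2) {   /* unrolled body: two elements per test */
        a[i] += b[i];
        a[i + 1] += b[i + 1];
    }
    if (i < n)                          /* end-condition code */
        a[i] += b[i];
}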
2. Jamming or Loop Fusion. The second loop-
related transformation is called jamming or loop
fusion. In this transformation two loops are put
together and expressed by one loop as in the follow-
ing example:
DO I = 1 TO 100;
A(I) = 0.;
END;
DO I = 1 TO 100;
B(I) = X(I) + Y;
END;
becomes
DO I = 1 TO 100;
A(I) = 0.;
B(I) = X(I) + Y;
END;
The advantages of this transformation are that
a. loop overhead is reduced;
b. code space is reduced;
c. more instructions are exposed for parallel
execution and for local optimization.
The cases which can be found and transformed
can be fairly elaborate or relatively simple. The
simplest case involves two loops which together
satisfy the following criteria:
a. If one loop is executed the other one is
also; that is, they have the same execution condi-
tionality.
b. The computations in either loop do not
depend upon the computations in the other (a sketch
of a violating pair follows this list). This
criterion can easily be relaxed to more particular
situations.
c. The loops are executed the same number of
times. By generating code for the end conditions,
this criterion can be deleted.
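As the sketch below suggests (in C, with illustrative names), criterion b is the one that most often fails: here the second loop reads values the first loop is still producing, so jamming the two would change the result.

void not_fusible(double *a, double *c, const double *b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i];            /* first loop writes all of a[0..n-1] */
    for (int i = 0; i < n - 1; i++)
        c[i] = a[i + 1];        /* reads an element the fused form would not
                                   yet have written on iteration i */
}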
The need for this transformation may occur
during the compilation of languages which have
array or vector operations. If statements involving
array or vector operations are translated to the
more basic element-by-element operations, fusible
loops and nests of loops may appear quite frequently
in the code. This optimization has been done by
Ershov in his ALPHA compiler [9] and was described
in a thesis by Wagner [16].
An example follows in which a matrix multiply
program is unrolled by 2 and jammed to eventually
yield a program which multiplies 2 x 2 matrices.
The steps are:
a. original program:
DO I = 1 TO 100;
DO J = 1 TO 100;
A(I,J) = 0.;
DO K = 1 TO 100;
A(I,J) = B(I,K) * C(K,J) + A(I,J);
END; END; END;
b. the outer loop is unrolled by 2;
c. the two inner sets of loops are jammed:
DO I = 1 TO 100 BY 2;
DO J = 1 TO 100;
A(I,J) = 0.;
A(I+1,J) = 0.;
DO K = 1 TO 100;
A(I,J) = B(I,K) * C(K,J) + A(I,J);
A(I+1,J) = B(I+1,K) * C(K,J) + A(I+1,J);
END; END; END;
d. the next inner loop, the J loop, is
unrolled by 2 and the two resulting K loops are
jammed; then the innermost loop, the K loop, is un-
rolled by 2. The result of these transformations
is:
DO I = 1 TO 100 BY 2;
DO J = 1 TO 100 BY 2;
A(I,J) = 0.;
A(I,J+1) = 0.;
A(I+1,J) = 0.;
A(I+1,J+1) = 0.;
DO K = 1 TO 100 BY 2;
A(I,J) = B(I,K) * C(K,J) + A(I,J);
A(I,J) = B(I,K+1) * C(K+1,J) + A(I,J);
A(I,J+1) = B(I,K) * C(K,J+1) + A(I,J+1);
A(I,J+1) = B(I,K+1) * C(K+1,J+1) + A(I,J+1);
A(I+1,J) = B(I+1,K) * C(K,J) + A(I+1,J);
A(I+1,J) = B(I+1,K+1) * C(K+1,J) + A(I+1,J);
A(I+1,J+1) = B(I+1,K) * C(K,J+1) + A(I+1,J+1);
A(I+1,J+1) = B(I+1,K+1) * C(K+1,J+1) + A(I+1,J+1);
END; END; END;
e. the eight expressions in the inner loop can
be simplified to 4 by a transformation alluded to
later in this paper:

DO I = 1 TO 100 BY 2;
DO J = 1 TO 100 BY 2;
A(I,J) = 0.;
A(I,J+1) = 0.;
A(I+1,J) = 0.;
A(I+1,J+1) = 0.;
DO K = 1 TO 100 BY 2;
A(I,J) = B(I,K+1) * C(K+1,J) + B(I,K) * C(K,J) + A(I,J);
A(I,J+1) = B(I,K+1) * C(K+1,J+1) + B(I,K) * C(K,J+1) + A(I,J+1);
A(I+1,J) = B(I+1,K+1) * C(K+1,J) + B(I+1,K) * C(K,J) + A(I+1,J);
A(I+1,J+1) = B(I+1,K+1) * C(K+1,J+1) + B(I+1,K) * C(K,J+1) + A(I+1,J+1);
END; END; END;
3. Unswitching. The third transformation
related to loops is the opposite of jamming. It is
called unswitching and involves breaking a loop
containing a loop-independent test into two loops
with the test selecting which of the two loops to
execute. The following is the quintessential
example of this transformation:
DO I = 1 TO 100;
IF (T) THEN
X(I) = A(I) + B(I);
ELSE
X(I) = A(I) - B(I);
END;

becomes

IF (T) THEN
DO I = 1 TO 100;
X(I) = A(I) + B(I);
END;
ELSE
DO I = 1 TO 100;
X(I) = A(I) - B(I);
END;
This optimization has the advantage of reducing
the number of instructions executed. More instruc-
tion space is required, however.
It is not clear how frequently unswitchable
loops appear in programs and, therefore, how impor-
tant this transformation is.
The machine independent optimizations which
transform subexpressions and subexpression rela-
tionships are now considered. In these transfor-
mations a distinction is made between those involving
subexpressions in the same "basic block" and those
involving subexpressions in different basic blocks.
A basic block is a linear sequence of instructions
with one entry point, the first instruction in the
block, and one exit point, the last instruction in
the block. Since the internal data dependency
relationships within a basic block can be found by
relatively simple analysis techniques [2,8] , the
optimizations performed on basic blocks can, in
many cases, be more elaborate than those performed
on the program as a whole. For example, an algorithm
exists [12] for finding the best assignment of n
registers to the given sequence of instructions in
the block. Such an algorithm does not exist in the
presence of flow. An arithmetic statement by itself
constitutes a basic block -- usually a non-maximal
basic block -- and the literature [3,5,6,10,11]
describes many optimizing transformations for such
statements.
The primary emphasis in the collection of
transformations presented here is on global optimi-
zation; that is, it is on transforming the program
based upon the dependency relationships which are
found to exist between expressions in different
basic blocks as well as in the same block.
In the examples that follow, a directed graph
is used to express the control flow relationships
between the basic blocks in a program. The nodes
of the graph represent basic blocks and the edges
control flow paths.
REDUNDANT SUBEXPRESSION ELIMINATION
This optimization, which is also called common
subexpression elimination, involves finding and
eliminating those computations which calculate
values already available. Consider the following
examples in which the redundant subexpression is
identified by being enclosed in a box.
The subexpression A*B has, of course, a sub-
sequent use: its value is used in another expres-
sion, is assigned to a variable or used in a test.
In both examples, correct though not necessarily
identical values are available on every path to the
subexpression and hence to its use. The subexpres-
sion is therefore redundant.
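In linear terms, such a case might look like the following C sketch (all names illustrative):

int f(int a, int b, int flag) {
    int x;
    if (flag)
        x = (a * b) + 1;    /* one path computes a*b */
    else
        x = (a * b) - 1;    /* the other path computes it too */
    return x + (a * b);     /* redundant: a value for a*b is available
                               on every path reaching this use */
}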
In the two examples the redundant subexpression
was formally identical with the subexpressions pro-
ducing usable values. One form [7] of global
analysis for this optimization depends upon the
existence of formal identities. Another form of
analysis, based upon the value number algorithm [8],
does not depend upon explicit formal identities for
the identification of a redundant calculation.
Consider the following example of straight-line
code:
... = A * B
C = A
... = C * B
C*B is not formally identical with A*B but computes
the same value so is a redundant calculation. The
two methods for detecting redundant subexpressions
find different cases as well as some of the same
cases. The value number method as it is currently
formulated would not find either of the cases shown
in the first two examples in this section; the
method requiring formal identities would not find
C*B in the last example.
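A compact sketch of the value number idea applied to the straight-line case above follows, in C; the table layout, the fixed sizes and the result names D and E are assumptions of this illustration, not the formulation of [8].

#include <stdio.h>
#include <string.h>

#define N 32

/* current value number of each variable */
static struct { char name[8]; int vn; } env[N];
static int nenv;

/* the expression, if any, that defines each value number */
static struct { char op; int a, b; } defs[N];
static int nvn;

static int lookup(const char *v) {
    for (int i = 0; i < nenv; i++)
        if (strcmp(env[i].name, v) == 0) return env[i].vn;
    defs[nvn].op = 0;                    /* a leaf: no defining expression */
    strcpy(env[nenv].name, v);
    env[nenv].vn = nvn;
    nenv++;
    return nvn++;
}

static void bind(const char *v, int vn) {
    for (int i = 0; i < nenv; i++)
        if (strcmp(env[i].name, v) == 0) { env[i].vn = vn; return; }
    strcpy(env[nenv].name, v);
    env[nenv++].vn = vn;
}

/* process "dst = s1 op s2", reporting the computation if its
   value number is already available */
static void stmt(const char *dst, const char *s1, char op, const char *s2) {
    int a = lookup(s1), b = lookup(s2);
    for (int v = 0; v < nvn; v++)
        if (defs[v].op == op && defs[v].a == a && defs[v].b == b) {
            printf("%s = %s %c %s is redundant\n", dst, s1, op, s2);
            bind(dst, v);
            return;
        }
    defs[nvn].op = op; defs[nvn].a = a; defs[nvn].b = b;
    bind(dst, nvn++);
}

int main(void) {
    stmt("D", "A", '*', "B");   /* D = A * B */
    bind("C", lookup("A"));     /* C = A: C receives A's value number */
    stmt("E", "C", '*', "B");   /* reported redundant, though C * B is not
                                   formally identical with A * B */
    return 0;
}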
The major advantages of this optimization are that
a. fewer instructions are executed, and
b. instruction space is saved.
The disadvantage is that register usage is
extended.
In the next section it will be seen that code
motion and redundant subexpression elimination are
intimately related. By inserting instructions at
propitious places in the code, more subexpressions
can be eliminated than could be in the original
program form.
CODE MOTION
A subexpression can be moved if the value
available to its uses is not changed by this move
and if the move is "safe." The safety criterion is
somewhat vague, but it essentially means that a
moved subexpression will not cause side effects to
occur which would not have occurred if the sub-
expression had remained in its original position.
Two examples of movable subexpressions are:
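A linear C rendering of the loop-invariant case (all names illustrative):

void scale(double *x, double a, double b, int n) {
    for (int i = 0; i < n; i++)
        x[i] = x[i] * (a * b);   /* a*b is invariant in the loop */
}

void scale_moved(double *x, double a, double b, int n) {
    double t = a * b;            /* moved out of the loop; note that when
                                    n == 0 it is now computed where it never
                                    was before, which is the safety question
                                    taken up below */
    for (int i = 0; i < n; i++)
        x[i] = x[i] * t;
}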
In each case it was assumed that neither A
nor B was changed by the other instructions in
nodes 2, 3, 4 and 5; that is, A*B is invariant with
respect to the loop. Even if it is assumed that
A*B might cause an overflow condition, it is clear
that the occurrence of this side effect is not being
altered by moving the subexpression into node 1.
It will be noted that the number of times the over-
flow may occur is probably being altered. The true
correctness of moving any subexpression which can
cause side effects is probably questionable and
certainly dependent on language rules. If a standard
system action is taken such as stopping the program
or flagging the result, then maintaining the number
of occurrences of the side effect is not important,
but maintaining the conditionality for its occurrence
is important. Consider the following two examples:
Assuming A/B is considered an unsafe subexpres-
sion, i.e., one which might cause a side effect,
then, clearly, A/B in node 3 in the first example
cannot be moved to node 1. However, the second
example has an A/B subexpression in node 4 which
may, at first glance, appear to be movable to node
1. It is not, however, because a divide check error
may result which would not occur in the program as
given.
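The guarded-divide case can be rendered in C as follows (illustrative):

double safe_ratio(double a, double b) {
    if (b != 0.0)
        return a / b;   /* moving a/b ahead of the guard could raise a
                           divide check the original program never raises */
    return 0.0;
}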
The basic intent of code motion is to move
instructions from frequently executed areas of the
program to less frequently executed areas. Since
it is not always apparent what the relative execu-
tion frequencies of various areas are, an improve-
ment may not always result.
Consider the following example:
Assuming that I+J is movable and is a safe
subexpression, i.e., it cannot cause any side
effects, then it may still not be acceptable to
move I+J to node 1: node 3 may never be executed.
In the absence of any information to the contrary,
it is generally assumed that the branch back to 2
from 4 is more frequently taken than the exit from
4, and that the branch from 2 to 3 is taken 50%
of the time.
The primary advantage of the code motion trans-
formation, however, is to reduce the number of
instructions executed. It has the same disadvantage
as redundant subexpression elimination: register
usage is extended.
A variant on code motion which does not reduce
the number of instructions executed but does save
instruction space is called hoisting. This trans-
formation is exemplified by the following:
[Figure: A*B computed in each of two branch successors is hoisted to
their common predecessor, so it appears only once.]
So far in this section, pure code motion has
been discussed. Techniques currently exist which
essentially combine code motion with redundant sub-
expression elimination, and indeed subexpressions
are not moved at all but eliminated against inserted
subexpressions. The following example shows how
this can happen.
The A*B in node 4 cannot be safely moved and the
A*B in node 5 cannot be profitably moved. However, if
an A*B were placed in node 1, then both of them
could be eliminated. An analysis procedure exists
which accomplishes this.
CONSTANT FOLDING
Constant folding, also called subsumption and
constant propagation, replaces uses of variables
that have been set to a constant by the constant
itself. Consider the following example:
(1) A = 1
    B = 5
(2) T = A + B
    IF (T = 6) GO TO (3)
    ELSE (4)

becomes

(1) A = 1
    B = 5
(2) T = 6
    IF (6 = 6) GO TO (3)
    ELSE (4)
The constant assignments to A and B in node 1
replace their uses in node 2. This finds T = 6
which is then propagated to the use of T in the IF
statement. As can be seen by this example, constant
folding not only involves replacing uses of vari-
ables by constants but also involves performing
operations whose operands are constants and propa-
gating the result.
A variant on constant propagation involves
collecting constants into addresses. In the
following example, which is expressed in a very
mixed language level, the resulting A+5 represents
an address rather than an execution time add.
K = 5
S1 = I*10
S = (I*10) + K
Load A(S)

becomes

K = 5
S1 = I*10
Load A+5(S1)
Constant propagation has many obvious advantages
and no disadvantages. It is a particularly important
optimization on code in which subprograms have been
opened. Arguments to subprograms are frequently
constants and, when constants replace the parameters
of the subprogram, many transformations may be made
to the subprogram.
DEAD CODE ELIMINATION
Primarily as a result of constant folding,
instructions become "dead." Instructions are con-
sidered dead when they cannot be executed because
they are in an area of the program which cannot be
reached, or when their results are never used.
After constant folding, the first example in the
previous section had an unreachable block, and,
assuming there were no other uses for the definitions
of A and B in block 1 and of T in block 2, those
instructions are dead.
(1) A = 1
    B = 5
(2) T = 6
    IF (6 = 6) GO TO (3)
    ELSE (4)
STRENGTH REDUCTION
The strength reduction optimization replaces
certain computations using recursively defined
variables by recursively defined computations. This
transformation is a generalization of a relatively
common optimization: the replacement of subscript
calculations involving the DO loop induction variable
by index register increments. Two examples follow:

(1) I = 1
(2) ... = I*5
    I = I + 1
    (branch back to 2)

becomes

(1) I = 1
    t = I*5
(2) ... = t
    I = I + 1
    t = t + 5
    (branch back to 2)

The first example is the traditional case if
I*5 is part of a subscript calculation. It should
be noted that I*5 in block 1 is a candidate for
constant folding. The second example is a more
general form of strength reduction in which the
variables I and J are recursively defined in terms
of each other.
The advantage of strength reduction is that
faster computations are used. Although it cannot
of course be guaranteed that the new recursively
defined variables (the t's in the above examples)
will end up in index registers and be updated by
register increments, they will be prime contenders
for such register instructions. The strength
reduction optimization is not at all limited to
instructions transformable to register increments.
More complete descriptions of this optimization are
contained in [2] and [8].
LINEAR FUNCTION TEST REPLACEMENT
After performing the strength-reduction
optimization and introducing new recursively defined
variables, it is frequently the case that the only
use for the original recursively defined variable is
in a test. The test can often be replaced by a test
on an introduced variable, thereby making the
instructions associated with initializing and
incrementing the program variable dead. Suppose
that the first example in the preceding section were
derived from a DO loop in which the DO statement
was DO I = 1 TO 10; then after strength reduction
and folding this becomes
(1) I = 1
    t = 5
(2) I = I + 1
    t = t + 5
    IF (I ≤ 10) THEN GO TO 2
The test can be replaced by a test on t:

(1) I = 1
    t = 5
(2) I = I + 1
    t = t + 5
    IF (t ≤ 50) THEN GO TO 2

I = 1 and I = I + 1 are now dead and can be
eliminated.
CARRY'S
The carry optimization is a somewhat specialized
optimization which recognizes when a subscript cal-.
culation for referencing sequential elements of an
array does not need to be re-initialized when the
reference changes to the next row or column. The
following example shows the result of applying the
carry, linear function test replacement, strength
reduction, folding and dead expression elimination
optimizations.
DCL A(10,10);

DO I = 1 TO 10;
DO J = 1 TO 10;
A(I,J) = 0.;
END; END;

becomes

t = 11;
L1: A-11(t) = 0.;
    t = t + 1;
    IF (t < 111) THEN GO TO L1;

(A-11 is an address)
We now consider the machine dependent transformations.
INSTRUCTION SCHEDULING
In this optimization, sequences of instructions
are ordered to minimize the execution time of the
sequence. This optimization is used when the target
computer has pipelined units.
Suppose a computer has the following character-
istics:
a. an instruction can be started every cycle
provided there are no register or data interlocks;
b. the instructions are taken in order;
c. the divider takes 12 cycles;
d. the three-cycle adder is pipelined and
can start a new add on every cycle;
e. stores take 1 cycle.
T1 = A + B
T2 = T1 + C
T3 = T2 + D
T4 = X/Y
E = T3
Z = T4

may become

T4 = X/Y
T1 = A + B
T2 = T1 + C
T3 = T2 + D
E = T3
Z = T4
The original sequence, the one on the left,
takes 23 cycles; the reordered sequence takes 13
cycles. In effect the adds and the store into E
are free.
The advantage is clear: execution time is
decreased. The disadvantage is that register usage
is extended.
PARSING METHODS
Several variants on the usual left-to-right
associative parse can produce better code. An
interesting one is the minimum depth parse.
Since pipelined units are capable of having
several independent instructions in execution at
the same time, the operations in a source expression
can be associated to produce an instruction sequence
which minimizes hardware delays due to data depen-
dencies and to unit-busy interlocks. For example,
the expression A+B+C+D can be parsed as if it had
been written as (A+B) + (C+D) rather than by the
usual left-to-right association (((A+B) + C) + D).
Depicted in tree form, this is:

        +                      +
       / \                    / \
      +   D                  +   +
     / \                    / \ / \
    +   C                  A  B C  D
   / \
  A   B

  (left-to-right)          (minimum depth)
Such a parse is called a minimum depth parse
because it minimizes the depth of the parse tree
and hence the depth of the data dependency tree.
For the computer described in the previous section,
the instruction sequence for the minimum depth
parse of the expression A+B+C+D takes 7 cycles
instead of 9.
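Under the timing assumptions of the previous section (one instruction issued per cycle, a three-cycle pipelined adder), the counts work out as follows: in the minimum depth parse, A+B starts on cycle 1 and completes on cycle 3, C+D starts on cycle 2 and completes on cycle 4, and the final add occupies cycles 5 through 7. In the left-to-right chain each add must wait for its predecessor, occupying cycles 1-3, 4-6 and 7-9.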
A secondary but very important effect of the
minimum depth parse is to expose subexpression
independence. Many optimizations involve transfor-
mations to and permutations of subexpressions.
These transformations and permutations are generally
severely constrained by data dependency considera-
tions. In the above example, the subexpression
C+D is exposed as independent of A and B by the
minimum depth parse and can be moved and transformed
independently.
Thus the minimum depth parse optimizes in that
it exposes more independent subexpressions for
optimization and for execution. The disadvantage
is that register usage is extended.
REGISTER ALLOCATION
Allocating registers is quite possibly the
hardest optimization to perform.
Several subproblems can be isolated:
a. Allocation can be separated from assign-
ment. Allocation involves determining how many
registers are required and allocating registers
symbolically. Assignment involves determining which
of the actual hardware registers will be used for
each allocated register. Allocation is cognizant
of hardware constraints. One of the subproblems in
assignment is boundary matching: the same entity
in program areas which may be executed sequentially
should be assigned to the same register if possible.
b. Load-store motion moves loads and stores
out of loops in order to be able to retain inter-
mediate results in registers and to avoid unneces-
sary storage references. Other instructions can
sometimes also be moved to obtain a better alloca-
tion.
c. The scope of the allocation and assign-
ment can be local or global. The straight-line
code of basic blocks can be allocated by simpler
techniques than an allocation which must be aware
of the control flow. An optimizing allocation
normally consists of both local and global alloca-
tion. The problem and solutions are described in
more detail in [4] and [13].
STORAGE MAPPING
There are two aspects of storage mapping which
are optimizable:
a. Space. The total amount of active space
required for variables and for temporaries can be
decreased by reuse of dead storage. Ershov [9]
treated storage mapping as a generalization of the
register allocation and allocated memory cells by
some of the same techniques used in register
allocation.
b. Adjacency. For computers with paging,
base-offset addressing or for which there is any
sort of storage staging system, it is important that
information which is used at the same time should
be mapped together.
Other than the work already cited, very little has
been done in this area.
We now consider some miscellaneous optimiza-
tions.
SHADOW VARIABLES
In this optimization variables are not only
kept in the form specified by the program but also
in a form more appropriate for their use. For
example, it is frequently the case in COBOL pro-
grams that all numbers are kept in the decimal
radix. It may, however, be determined that certain
numbers are frequently used in arithmetic opera-
tions and, assuming binary arithmetic is faster
than decimal arithmetic on the target computer, can
be kept in binary.
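A C sketch of the idea; the decimal field, the binary shadow and the write-back point are all assumptions of the illustration.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char decimal[6] = "00123";                /* the program-specified decimal form */
    long shadow = strtol(decimal, NULL, 10);  /* binary shadow kept alongside it */
    shadow += 7;                              /* arithmetic uses the fast binary form */
    snprintf(decimal, sizeof decimal, "%05ld", shadow);  /* decimal form refreshed
                                                            when it is needed again */
    printf("%s\n", decimal);                  /* prints 00130 */
    return 0;
}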
ANCHOR POINTING
Anchor point analysis minimizes the number of
logical tests performed in a Boolean expression
before a branch or fallthrough occurs. For example,
IF (A.AND.B.AND.C) GOTO 10; can branch to the
next statement if A is false without testing
B and C.
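In C terms, where && short-circuits by definition, the generated tests behave like this sketch (illustrative):

int anchored(int a, int b, int c) {
    if (a && b && c)   /* if a is false, b and c are never tested */
        return 10;     /* the GOTO 10 branch */
    return 0;          /* the fallthrough */
}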
SPECIAL CASE CODE GENERATION
This very important optimization involves
detecting special situations in order to generate
better code. For example, A**2 may be implemented
by A*A rather than using a general exponential
routine. There are dozens of examples and there
are many compilers which have detected special cases.
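In C terms (function names illustrative):

#include <math.h>

double square_general(double a) { return pow(a, 2.0); }  /* general routine */
double square_special(double a) { return a * a; }        /* the special case */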
PEEPHOLE (or WINDOW) OPTIMIZATION
The final code from a compiler can frequently
be improved simply by a local scan of the sequence
of instructions [14]. A window of, for example, 10
or 12 instructions can be examined for possible
transformations.
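A minimal sketch of such a scan in C; the instruction representation and the single store/load rule are assumptions of this illustration.

#include <stdio.h>

typedef enum { LOAD, STORE } Op;
typedef struct { Op op; int reg; int addr; } Insn;

/* drop a load that immediately follows a store of the same register
   to the same address; returns the new instruction count */
int peephole(Insn *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 && code[i].op == LOAD &&
            code[out - 1].op == STORE &&
            code[out - 1].reg == code[i].reg &&
            code[out - 1].addr == code[i].addr)
            continue;              /* redundant load: remove it */
        code[out++] = code[i];
    }
    return out;
}

int main(void) {
    Insn prog[] = { {STORE, 1, 100}, {LOAD, 1, 100}, {LOAD, 2, 104} };
    printf("%d instructions remain\n", peephole(prog, 3));  /* prints 2 */
    return 0;
}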
This completes our catalogue of optimizations.
It is not encyclopedic in any sense, nor is it at all
definitive. It does, however, report on that part
of the recent work in program optimization which has
attempted to systematize and extend the collection
of optimizing transformations that have long existed.
REFERENCES
[1] Aho, Alfred V., Sethi, Ravi, and Ullman, J.D., "A Formal Approach
to Code Optimization," Proc. of a Symposium on Compiler Optimization,
SIGPLAN Notices, July 1970.

[2] Allen, F.E., "Program Optimization," Annual Review in Automatic
Programming, Vol. 5, Pergamon, New York, 1969.

[3] Bagwell, John T., Jr., "Local Optimizations," Proc. of a Symposium
on Compiler Optimization, SIGPLAN Notices, July 1970.

[4] Beatty, J.C., "A Global Register Assignment Algorithm," this volume.

[5] Breuer, Melvin A., "Generation of Optimal Code for Expressions via
Factorization," CACM, June 1969.

[6] Busam, V.A., and Englund, D.E., "Optimization of Expressions in
FORTRAN," CACM, December 1969.

[7] Cocke, John, "Global Common Subexpression Elimination," Proc. of a
Symposium on Compiler Optimization, SIGPLAN Notices, July 1970.

[8] Cocke, J., and Schwartz, J.T., "Programming Languages and Their
Compilers," Preliminary Notes, Courant Institute of Mathematical
Sciences, New York University, N.Y., April 1970.

[9] Ershov, A.P., "ALPHA - An Automatic Programming System of High
Efficiency," JACM, January 1966.

[10] Floyd, R.W., "An Algorithm for Coding Efficient Arithmetic
Operations," CACM, January 1961.

[11] Frailey, Dennis J., "Expression Optimization Using Unary Complement
Operators," Proc. of a Symposium on Compiler Optimization, SIGPLAN
Notices, July 1970.

[12] Horwitz, L.P., Karp, R.M., Miller, R.E., and Winograd, S., "Index
Register Allocation," JACM, January 1966.

[13] Kennedy, Ken, "Index Register Allocation in Straight-Line Code and
Simple Loops," this volume.

[14] McKeeman, W.M., "Peephole Optimization," CACM, July 1965.

[15] Ullman, Jeffrey, and Aho, A.V., "Code Optimization and Finite
Church-Rosser Systems," this volume.

[16] Wagner, R., "Some Techniques for Algorithm Optimization with
Application to Matrix Arithmetic Expressions," Ph.D. Thesis, Department
of Computer Science, Carnegie-Mellon University, June 1969.