### Vector Displays
The Sketchpad, developed at MIT’s Lincoln Lab,...

Below you can see a picture of a a IBM 709 along with the included ...

The resulting line on the raster screen will be something like this...

The **Newton-Raphson** method is an iterative root-finding algorith...

This iterative method of finding square roots has been known since ...

Finding the square root of a number $V$ is equivalent to using Newt...

I think $G_n+1$ is a misprint for $G_{n+1}$.

Here is a simple implementation of the Bresenham’s algorithm in Jav...

### Floating Point Numbers
You can think of floating point numbe...

Given that $x$ is the fraction and $a$ is the characteristic:
$$\s...

In _Figure 3_ it would be better to say that the circle is circumsc...

There is a long history of computing more and more precise approxim...

Using a rapidly converging series:
$\pi = \sqrt{12}\left(1-\frac{1...

Professor John von Neumann was one of the people that expressed int...

### Modern computation of π
In the early 20th century the Indian m...

Of course the second identity immediately implies the first by sett...

The following three algorithms don’t have

much in common except for two things: They

were implemented between 1949 and 1965—

during the so-called ﬁrst generation of comput-

ing hardware—and they are good examples of

rigorous thinking. First, I explain Bresenham’s

algorithm and apply it to a fundamental process

of computer graphics—drawing a line on a

raster screen. Second, although there are sever-

al ways to calculate a square root, the square

root algorithm is interesting because it takes

advantage of hardware design. And, finally,

while the calculation of π goes back more than

2,200 years, Machin’s algorithm from 1706 is of

importance to the history of computing.

Bresenham’s algorithm

Cathode-ray tube (CRT) displays for com-

puters were developed principally along two

lines, vector and raster, until the 1970s, when

memory became inexpensive. Initially, the

advantage of a vector display was that it could

draw a line by passing its end points to analog

circuitry, which moved an electron beam across

the CRT. To draw a line on a raster display, on

the other hand, required calculating the loca-

tions of every point that the line passed

through and illuminating each one.

Raster technology won out when it became

economically feasible to draw a display in a

dedicated memory and refresh the CRT from it.

Even though raster computation is more com-

plicated, vector display time varied with the

complexity of the diagram, making realistic

animation, for example, impossible.

In the 1950s, the IBM 704 and 709 comput-

ers included a CRT display with an addressable

raster of 1,024 × 1,024 dots. Once illuminated,

a dot would persist on the screen for about two

seconds. The CPU—excessively slow by today’s

standards—controlled the display, which had

no refresh buffer. The 704, for example, had a

12-microsecond cycle, meaning that the fastest

instruction executed in 24 microseconds.

Floating-point calculations were much slower;

a ﬂoating divide instruction took 18 cycles, or

216 microseconds.

Each dot was activated by the program’s

building a 36-bit word containing the X- and Y-

address of the dot, with the X-address in bits

8–17 and the Y-address in bits 26–35 and “copy-

ing” the word to the CRT (see Figure 1).

Obviously, the fastest way to refresh the screen

was to create a word for each illuminated dot

and to set up a “copy” loop. However, the maxi-

mum memory size for the 704 was 32,000 words,

including the program, so prestoring all the dots

wasn’t always feasible.

In actual use, the visible CRT was a “slave” to

another unit in a black box with a shutterless

35-mm camera. The general purpose of this

arrangement was to enable the camera to take a

picture of the master screen by illuminating dots

only once. Nevertheless, the process of calculat-

ing the positions of dots was so slow that only a

few dots could be displayed before the ﬁrst one

began to fade. There was a demonstration pro-

gram—a tic-tac-toe game—for the display, and

even this simple picture ﬂickered badly.

It was in this context, around 1960, that

computer scientist Jack Bresenham

1

consid-

ered how to minimize the calculation time for

the dots that make up a line. For a line with a

slope between 0 and 1, the Y-difference

between consecutive points on a straight line

is less than or equal to 1. For the line between

X1,Y1 and X2,Y2, such that 0 ≤∆Y ≤∆X and

X1 < X2, the X-addresses of the dots are X1,

X1 + 1, X1 + 2, … , X2. If ∆X = X2 – X1 and

∆Y = Y2 – Y1, then for each step in X, the val-

ues of Y are Y1, Y1 +∆Y/∆X, Y1 + 2∆Y/∆X, … ,Y2;

but raster point addresses are integers, so we

10 IEEE Annals of the History of Computing 1058-6180/02/$17.00 © 2002 IEEE

Three Early Algorithms

Keith S. Reid-Green

Three well-known algorithms, implemented separately within a span

of 20 years, demonstrate the use of mathematics developed

hundreds of years ago.

Figure 1. This 36-bit word activates a dot of a raster display.

S 1 2 3 4 5 6 7 8 9 1011121314151617181920212223242526272829303132333435

X-address Y-address

use Y1, int(Y1 +∆Y/∆X), int(Y1 + 2∆Y/∆X), and

so forth, without rounding.

In the following implementation of

Bresenham’s algorithm, Ratio and Delta are real

numbers; all other variables are integers. Note

that only addition and subtraction are used in

the For … Next loop.

DeltaX = X2 – X1

DeltaY = Y2 – Y1

Ratio = DeltaY / DeltaX

Delta = Ratio – 0.5

j = Y1

For i = X1 to X2

Call Dot (i, j)

If Delta > 0 Then

Delta = Delta – 1.0

j = j + 1

End If

Delta = Delta + Ratio

Next i

The For … Next loop is executed once per

dot, and so this part of the code should be

minimized. This loop would have taken about

60 cycles, or 720 microseconds. Because the

raster size of the master CRT was 3.366 × 3.366

inches, there were about 300 dots per inch.

Thus it took about 0.25 seconds to plot a line

one inch long. Note that the algorithm can be

improved by avoiding floating-point opera-

tions entirely:

DeltaX = X2 – X1´ All integer

variables

DeltaY = Y2 – Y1

Dy2 = 2 x DeltaY

Ratio = Dy2 – 2 x DeltaX

Delta = Dy2 – DeltaX

j = Y1

For i = X1 to X2

Call Dot (i, j)

If Delta > 0 Then

Delta = Delta + Ratio

j = j + 1

Else

Delta = Delta + Dy2

End If

Next i

This algorithm improves the loop’s speed by

about 15 percent.

Generalizing the algorithm to plot any line

requires determining which X-value is the

smaller, determining if the slope is ≤ 1 or > 1,

and requires using X or Y, respectively, as the

indexing value. A special case is if X1 = X2, but

this is easy to deal with.

A square root algorithm

From the very beginning of computing,

mathematical subroutines such as sine, cosine,

logarithm, A

X

, square root, and so on were avail-

able to programmers. The importance of opti-

mizing their speed is obvious because most

mathematical programs used them, and a single

program could call them thousands of times.

In the early days of minicomputers, hardware

costs were kept as low as possible so most mini-

computers did not have ﬂoating-point hard-

ware. Instead, ﬂoating-point operations were

accomplished in subroutines. At the same time,

functions like SQRT were done in software, so a

square root function with ﬂoating-point soft-

ware could be slow. The usual way to ﬁnd a

square root was to use a Newton’s iteration:

where V is the value for which the square root

is to be found, and G1 is the first approxima-

tion (or ﬁrst guess) of the square root. The cal-

culation is repeated until the difference

between G

n

+ 1 and G

n

is small enough, that is,

until G

2

n+1

is approximately equal to V.

Given that V can range over the set of posi-

tive numbers, a good ﬁrst approximation of G

1

is not easy to ﬁnd, and of course the better the

first guess, the faster the iteration converges.

However, for values of V less than 1, conver-

gence is fast.

Here’s where knowledge of the structure of

a floating-point number comes in handy. A

ﬂoating-point number mimics scientiﬁc nota-

tion—for example, we write 12.35 as 0.1235 ×

10

2

. The power of 10, called the characteristic, is

stored as a signed integer along with the frac-

tion and its sign. This structure suggests that we

can calculate the square root of the number by

dividing the characteristic by 2 and ﬁnding the

square root of the fraction.

Floating-point numbers are normalized, that

is, the first digit to the right of the point is

never zero (unless the value of the number is

zero). In a binary floating-point number, this

means that the first digit to the right of the

binary point is always a 1, thus the value of the

fraction is at least 0.5. So the square root of a

binary floating-point number 0.11000 … × 2

3

(in decimal, 6.0) can be found by halving the

characteristic and finding the square root of

0.11000 …, except that the characteristic is an

GG

GV

G

nn

n

n

+

=−

−

1

2

2

October–December 2002 11

odd number. This is easily remedied by adding

1 to the characteristic if it is odd and shifting

the fraction one place to the right, thereby

dividing it by two, yielding 0.011000 … × 2

4

, or

0.375 × 16—still 6.0. The square root of the

characteristic is 2

2

, and we can ﬁnd the square

root of the fraction in four Newton’s iterations

to an error of not more than 0.000002, using

G

1

= 0.7. (It’s necessary to set the characteristic

to 2

0

so that the fraction V is between 0.25 and

1.) Furthermore, with a little effort, we can

determine a single calculation that develops

four iterations without a loop:

and

thus

and so on.

Finally, the resulting square root G

4

must be

multiplied by 10 to the power of the original

characteristic divided by 2, to yield the square

root of the original number V.

Machin’s algorithm; computation of π

The pursuit of more and more digits of π

may start with a reference in the Old

Testament,

2

to zero decimal places; that is, π=

3. The Babylonians got by nicely with π=3.125

(well, strictly speaking, 3 1/8, because decimal

fractions were unknown to the Babylonians).

The ancient Egyptians used π=4(8/9)

2

, which

is about 3.1605, not really any better than the

Babylonian estimate.

It was Archimedes (287–212 BCE) who start-

ed the contest of ﬁnding more and more digits

of π by inscribing and circumscribing a regular

polygon about a circle. This was a big step,

because it provided both an upper and a lower

bound for π, and the more sides to the polygon,

the tighter the bounds became. The area of a cir-

cle is πr

2

, so the area of a circle of unit radius is π.

For a hexagon, we see in Figure 2 that the

inscribed polygon consists of six equilateral

triangles with sides of length 1. Each has an

area of the square root of 3 divided by 4, so the

area of the inscribed hexagon is a bit more

than 2.598.

For a circumscribed hexagon (see Figure 3)

the height of each triangle is 1, so the area is

slightly greater than 0.57735, giving the area of

the circumscribed hexagon as about 3.4641.

Archimedes used these numbers to bound π.

He wrote the equivalent of

2.598 < π < 3.464

which suggests that π is about 3.03. This isn’t

very good, except that Archimedes went on to

use the same approach with 96-sided polygons

and reﬁned his bounds to

3.14084 < π < 3.142858

although he had to use proper fractions:

3 10/71 < π < 3 1/7

This is more like it—the average of the

bounds is 3.14185; accurate to three decimals.

GG

GV

G

G

GV

G

V

G

GV

G

31

1

2

1

1

1

2

1

2

1

1

2

1

2

2

2

2

2

=−

−

−

−

−

−

−

GG

GV

G

32

2

2

2

2

=−

−

GG

GV

G

21

1

2

1

2

=−

−

12 IEEE Annals of the History of Computing

Three Early Algorithms

R=1

Figure 2. An inscribed hexagon decomposed into

six equilateral triangles.

R=1

Figure 3. A circumscribed hexagon.

For almost 2,000 years, reﬁnements of π as cal-

culated in the Western world were done by

increasing the number of sides of the polygons.

Finally, with the advent of the calculus in

the 17th century came new ways to find π.

Notable was John Machin, a London professor

of astronomy, who in 1706 used the difference

between two arctangents. It was this method

that scientists used on ENIAC

3

in 1949 at the

University of Pennsylvania, setting a new

record of 2,037 digits:

π / 4 = 4 tan

– 1

(1/5) – tan

– 1

(1/239)

We know that π / 4 radians is 45 degrees, and

that the angle whose tangent is 1 is 45 degrees.

Unfortunately, we can’t calculate tan

– 1

(1) using

a series because the arctangent series only con-

verges for values less than 1—hence Machin’s

equation.

It is easy to test the equation:

Debug.Print 4# * (4# * Atn(1# / 5#) –

Atn(1# / 239#))

gives the result 3.14159265358979, exactly

right as far as it goes, but not a proof.

To show that

tan

– 1

(1) = 4 tan

– 1

(1/5) – tan

– 1

(1/239)

we need two trigonometric identities:

tan(2a) = (2 tan(a)) / (1 – tan

2

(a)) (1)

and

tan(a + b) = (tan(a) + tan(b)) / (1 – tan(a)

tan(b)) (2)

If we let x = atn(1/5), then using Equation 1 we

get

tan(2x) = (2 tan x) / (1 – tan

2

(x)) = 5/12

and

tan(4x) = (2 tan(2x)) / (1 – tan

2

(2x))

= 120/119

but we want the result to be 1. We can ﬁnd the

difference from Equation 2:

tan(4x – π / 4) = (tan(4x) – 1) / (1 + tan(4x))

= 1/239

hence tan(4 atn(1/5) – atn(1/239)) resolves to 1.

Machin found

π / 4 to 100 places using two arc-

tangent series:

π / 4 = 4 (1/5 – 1 / (3 * 5

3

) + 1 / (5 * 5

5

) – …)

– (1/239 – 1 / (3 * 239

3

) + 1 / (5 * 239

5

) – …)

and this is how it was done on ENIAC to 2,037

places. Finding a very high precision result is a

nontrivial programming exercise, but that’s

another matter.

Acknowledgments

I would like to acknowledge the assistance of

Michael Greenwald, Dick Painter, and Dennis

Quardt in the preparation of this article.

References and notes

1. W.M. Newman and R.F. Sproull, Principles of Inter-

active Computer Graphics, 2nd ed., McGraw-Hill,

New York, 1979. Jack Bresenham is currently a

professor of computer science at Winthrop Uni-

versity. He retired from IBM as a senior technical

staff member in 1987 after 27 years. He is best

known for fast line and circle algorithms.

2. 1 Kings 23, Authorized (King James) Version.

The verse reads: “And he made a molten sea,

ten cubits from the one brim to the other: it

was round all about, and his height was ﬁve

cubits: and a line of thirty cubits did compass

it round about.” The “molten sea” is a brass

vat. Its diameter is ten cubits, thus its radius is

ﬁve cubits. The given circumference—the line

of 30 cubits that “did compass it round

about”—implies that π = 3, because the

circumference is computed as 2πr, from 2π * 5

= 30.

3. P. Beckmann, A History of Pi, Dorset Press, New

York, 1989.

Keith S. Reid-Green retired in

1999 after 42 years as a com-

puter scientist. He began as an

IBM 704 programmer in 1957

and is known for pioneering

work in data reduction, com-

puter-aided manufacturing,

and image processing. He is

widely published in the history of computing and

was an adjunct professor at the University of Utah

and at Rutgers, where he taught computer graphics

to graduate students. He holds bachelor’s degrees in

mathematics and physics and a master’s degree in

computer science.

Readers may contact Keith Reid-Green at 23 Point

Court, Lawrenceville, NJ 08648, email KReid-

Green@msn.com.

October–December 2002 13

There is a long history of computing more and more precise approximations of π. Broadly you can divide it into 2 eras: before and after electronic digital computers.
Throughout the centuries many mathematicians worked on developing new methods of computing π. The 16th century German-Dutch mathematician Ludolph van Ceulen computed the first 35 decimal places of π with a 262 sided polygon. He was so proud of his achievement that he had the digits inscribed on his tombstone.
![Ludolph van Ceulen tombstone](https://i.imgur.com/37uFixl.png)
*Ludolph van Ceulen's tombstone*
The longest expansion of π before the advent of digital computing was achieved by the English amateur mathematician William Shanks in the late 19th century. He spent over 20 years calculating π to 527 places. To put it into perspective, in order to calculate the circumference of the the observable universe, 93 billion light-years in diameter, to a precision of less than one Planck length you need an approximation of π to 62 decimal places.
Thanks Jadrian, I went ahead and corrected it.
Finding the square root of a number $V$ is equivalent to using Newton’s method to find the root of $f(x) = x^2 - V$. Newton’s method finds a root of $f(x)$ from a guess $x_n$ by approximating $f(x)$ as its tangent line at $x_n$ : $f(x_n) + f’(x_n)(x - x_n)$. To get $x_{n+1}$ you find the root of the tangent:
$$ f(x_n) + f’(x_n)(x - x_n) = 0$$
$$ f’(x_n)x = f’(x_n)x_n - f(x_n)$$
$$ x = x_n - \frac{f(x_n)}{f’(x_n)}$$
For $f(x) = x^2 - V$ you get:
$$x_{n+1} = x_n - \frac{x_n^2 - V}{2x_n}$$
This iterative method of finding square roots has been known since Babylonian times. This method is also known as Heron's method, after the first-century Greek mathematician Hero of Alexandria who gave the first explicit description of the method in his AD 60 work: Metrica.
### Modern computation of π
In the early 20th century the Indian mathematician Srinivasa Ramanujan found several rapidly converging infinite series of π, including:
$${\frac {1}{\pi }}={\frac {2{\sqrt {2}}}{9801}}\sum _{k=0}^{\infty }{\frac {(4k)!(1103+26390k)}{(k!)^{4}396^{4k}}}$$
This series computes a further eight decimal places of π with each term. Ramanujan series are the basis of a lot of the fastest algorithms used to compute Pi.
The current record was set in 2020 by Timothy Mullican, who computed 50 trillion digits of π. He wrote about it [here](https://blog.timothymullican.com/calculating-pi-my-attempt-breaking-pi-record)
Professor John von Neumann was one of the people that expressed interest in having the ENIAC compute the value of π to many decimal places. Von Neumann was in fact interested in obtaining a statistical measure of the randomness of the distribution of the digits.
### Vector Displays
The Sketchpad, developed at MIT’s Lincoln Lab, is an example of an early computer that made use of a vector displays.
![sketchpad](https://i.imgflip.com/3r3v47.gif)
### Raster Displays
Here is an example of a raster display - a slow motion recording of a CRT displaying Super Mario Bros. on a NES:
![raster](https://i.imgur.com/kknDRng.gif)
Below you can see a picture of a a IBM 709 along with the included CRT display. The IBM 709 was an early vacuum tube powered computer introduced by IBM in 1958. The IBM 709 had 32,768 words of 36-bit memory, could execute 42,000 add instructions per second and could multiply two 36-bit integers at a rate of 5000 per second.
![IBM 709](https://i.imgur.com/80AKAlk.jpg)
Here is a simple implementation of the Bresenham’s algorithm in JavaScript.
```
let draw_line = (x0, y0, x1, y1) => {
// Calculate "deltas" of the line (difference between two ending points)
let dx = x1 - x0;
let dy = y1 - y0;
// Calculate the line equation based on deltas
let D = (2 * dy) - dx;
let y = y0;
// Draw the line based on arguments provided
for (let x = x0; x < x1; x++)
{
// Place pixel on the raster display
pixel(x, y);
if (D >= 0)
{
y = y + 1;
D = D - 2 * dx;
}
D = D + 2 * dy;
}
};
```
You can find a full tutorial along with an interactive demo [here](http://www.javascriptteacher.com/bresenham-line-drawing-algorithm.html)
Given that $x$ is the fraction and $a$ is the characteristic:
$$\sqrt{x \times 10^a} \Leftrightarrow (x \times 10^a)^\frac{1}{2} \Leftrightarrow \sqrt{x} \times 10^\frac{a}{2}$$
Using a rapidly converging series:
$\pi = \sqrt{12}\left(1-\frac{1}{3.3}+\frac{1}{3.5^{2}}-\frac{1}{3.7^{2}}+\ldots \right)$
Madhava of Sangamagrama (Modern day Kerala, India) computed the the correct value of $\pi$ to 11 decimal digits as early as the period (c. 1340 – c. 1425) . The value of $\pi$ correct to 13 decimals, $\pi=3.1415926535898$, the most accurate since the 5th century, is sometimes attributed to Madhava.
https://en.wikipedia.org/wiki/Madhava_of_Sangamagrama
I think $G_n+1$ is a misprint for $G_{n+1}$.
Of course the second identity immediately implies the first by setting b = a.
In _Figure 3_ it would be better to say that the circle is circumscribed by the hexagon. “A circumscribed hexagon” implies, rather, _Figure 2_ where the hexagon is circumscribed by a circle.
It's Newton–*Raphson*; there's no *L* in Raphson's name.
### Floating Point Numbers
You can think of floating point numbers as being written in *scientific notation*. A floating point number has 3 parts:
- **sign**: indicates whether the number is positive or negative
- **significant or fraction**: The significant digits of the number
- **exponent or characteristic**: indicates how large (or small) the number is
In the example below I’m using a biased exponent - which was popularized by the IBM 704. This was made a standard by the IEEE Standard for Floating-Point Arithmetic (IEEE 754). In this paper the author uses signed exponents.
Example
Let’s say we wanted to store `2019` as a floating point number:
- Decimal Representation: `2019`
- Binary Representation: `11111100011`
- Scientific notation: $ 2.019 \times 10^{3} $
- Binary Scientific notation: $1.1111100011 \times 2^{10}$
- Double Precision Raw Binary:
![](https://i.imgur.com/sa4lgWK.png)
As you can see in the raw binary, the exponent is stored as `10000001001`, which is `1033`: `10` (the actual binary exponent) + `1023` (the bias, which allows you to store the exponent as an unsigned integer). This means that the exponent range is `−1022` to `+1023` (exponents of `−1023` (all 0s) and `+1024` (all 1s) are reserved for special numbers.)
The resulting line on the raster screen will be something like this:
![](https://i.imgur.com/YWZDeYZ.png)
The basic intuition is that on each loop you have to decide whether you are increasing the $y$ coordinate by 1 or not.
The **Newton-Raphson** method is an iterative root-finding algorithm which produces successively better approximations to the zeroes of a real-valued function.
Here is a representation of a few iterations of the method
![](https://media.giphy.com/media/j2GcSDWz93j3rn3QIY/giphy.gif)
### Convergence
The **Newton-Raphson** method doesn’t always converge. Sometimes it can fail:
- Derivative may be zero at the root
- Function may fail to be continuously differentiable
- Choosing a bad starting point, i.e. one that lies outside the range of guaranteed convergence