Structure of Programming Languages: Setyembre 2014

A. Perspective of programming languages

Almost thirty years ago,

a noted computer scientist (1) remarked that it was

unfortunate that real computers had to be us

ed in teaching computer science. Although

many in the audience may have viewed this as

a rather radical position at the time, it has

proven to be an insightful commentary on many

of our efforts to design and deliver courses

in the discipline. In fact, the premise probably

should be broadened to include software as

well as hardware. Actual computing systems,

hardware as well as software, often swamp

the learner in a sea of minutia in which basic concepts are at least obscured if not

completely lost. While there are

difficulties in using real system

s in courses at all levels, it

appears that some of the greatest problems ma

y be found at the introductory level. In

particular, achieving consensus in the choice of a programming language (or none at all!) for

CS1 has proven to be elusive. With

Curriculum 2001

now in the works, it is particularly

timely that experience with

this course be reviewed.

FROM FORTRAN TO JAVA AND BEYOND

In the early seventies, introductory

courses using FORTRAN were quite common.

After all, if only half in jest, it was observed that "real programmers" would use nothing

else. BASIC was also popular at that time

and was making inroads where time sharing was

available, and the presentations were directed

to more diverse audiences - perhaps not real

programmers. By the end of the decade (70's)

, however, Pascal had captured the market.

Here was a language simple, comp

act, and consistent which had

been designed especially for

education. It should be an optimum choice. Textbooks and materials using Pascal

flourished. Many instructors were drawn to Pascal at all levels, but it was particularly

popular in the introductory course. The Educational Testing Service (ETS) adopted Pascal

for its Advanced Placement Examination. In fairness, it should be noted that there were

and continue to be a number of other voices. Despite its heritage and acknowledged

simplicity, Pascal can be viewed as overly comp

lex in terms of syntax when compared with

languages such as LISP (or Scheme).

It can be argued that unlike Pascal,

Scheme has little if any syntax, and its

semantics are based on sound mathematical principles (2). It also has become apparent that

Pascal is not the

lingua francaof professional programmers.

Why should we teach something

which may be irrelevant? When

objects became important, efforts were made to update

and enhance Pascal. Unfortunately, these efforts have resulted

in more arcane systems and

have been met with a rather tepid response.

Moving the language of the introductory course to C or more likely to C++ seemed

imminently logical in the nineties. Almost immediately, a large number of introductory texts

on C and C++ appeared, and the agendas of national meetings were filled with reports on the

more or less successful implementation of C

and/or C++ in CS1. It seemed that no

respectable computer science department could ignore the tida

l wave. ETS announced that

the Advanced Placement Exam would eventually

use C++. There is, however, a disturbing

fact that became painfully apparent to instructors as they actually tried to present C++ and

objects in their courses. Despite its highly

publicized advantages

, C++ is not a simple

language. Also, it is now a generally accepted

as an undesirable practice to simply translate

code segments in an introductory

text, perhaps originally written in Pascal, into C++.

This lesson may not have been fully learned in

that several CS1 texts

have attempted to do

exactly this and have appeared in several flavors (3).

The pitfalls of C for beginning students

have been acknowledged for some time.

The programming style of C encourages short,

pithy code, replete with multiple levels of

indirection. Novices also find

the standard I/O facilities of

C largely impenetrable. When

using C++ at the introductory level, these problems remain, although the I/O difficulties

are somewhat alleviated, even a cursory treatment of object principles has proven to be a

formidable task. Most introductory courses using C++ are content to introduce the concept

of objects and illustrate their use, while

reserving object creation for a later course.

For schools determined to present a "pure"

view of modularization, object-oriented

programming, and good software

engineering methodology in general, the emergence of ADA

in the 1980's was very welcome. However, remains a favorite of only a relatively select

small group of schools.

The C++ craze, however, may have been short-lived. The emergence of the Internet

must be acknowledged and reflected in our courses. Recent conferences have produced an

impressive number of papers on the use of Java at the introductory level. C++ is often

represented as a better C. In that sense, Java

may be viewed as a (slightly) simpler version

of C++. While its overall utility for Internet

programming may be somewhat specious, Java

does project the aura of modern practice

and with the implication that jobs may be

available for the cognoscenti. Possibly for these reasons, courses offered using Java tend

to be very popular with students, even with those who are

not computer science majors.

The size of this audience is hard to ignore.

If presenting several introductory courses isn't

feasible, can the goals of CS 1 be met using

Java or maybe Visual Basic? Or, for that

matter, should the selection of the programming language be a prime consideration?

At Brooklyn College and other schools it is

now common to include the teaching of

some HTML and Java Scripts in the breadth first course. This illustrates that language

choices have become somewhat market and demand driven -- rather than purely based on

academic issues. Furthermore HTML is a relatively simple language to learn and use.

From a historical perspective it appears that very satisfactory courses have been

developed with a wide variety of programming

languages. The determining factor for which

way the pendulum swings in terms of a language's adoption for CS1 may be image and

marketing factors, which certainly cannot be ignored. If an introductory course is carefully

designed to meet its desired outcomes, it

may not matter how we elect to express our

algorithms. Although this may

not be easy to accept, it may

to be perfectly defensible to

use whatever language the market dictates to

ply our trade. Academic integrity need not

be compromised by pursuing this rationale.

B. Language processors

language processor, we mean a program that processes programs written in a programming language (source language). All or part of a language processor is a language translator, which translates the program from the source language into machine code, assembly language, or some other language. The machine code can be for an actual computer or for a virtual (hypothetical) computer. If it is for a virtual computer, then a simulator for the virtual computer is needed in order to execute the translated program.

If a language processor is a translator that produces machine or assembly code as output (in object code or executable code) then it is called a compiler. If the language processor executes the translated program (output from the translator) then it is called an interpreter.

C. Data-level Structure

In computer science a level set data structure is designed to represent discretely sampled dynamic level sets functions.
A common use of this form of data structure is in efficient image rendering. The underlying method constructs a signed distance field that extends from the boundary, and can be used to solve the motion of the boundary in this field.

Narrow band

The narrow band level set method, introduced in 1995 by Adalsteinsson and Sethian,^[2] restricted most computations to a thin band of active voxels immediately surrounding the interface, thus reducing the time complexity in three dimensions to

O(n^2)

for most operations. Periodic updates of the narrowband structure, to rebuild the list of active voxels, were required which entailed an

O(n^3)

operation in which voxels over the entire volume were accessed. The storage complexity for this narrowband scheme was still

O(n^3).

Differential constructions over the narrow band domain edge require careful interpolation and domain alteration schemes to stabilise the solution.^[3]

Sparse field

This

O(n^3)

time complexity was eliminated in the approximate "sparse field" level set method introduced by Whitaker in 1998.^[4] The sparse field level set method employs a set of linked lists to track the active voxels around the interface. This allows incremental extension of the active region as needed without incurring any significant overhead. While consistently

O(n^2)

efficient in time,

O(n^3)

storage space is still required by the sparse field level set method. See ^[5] for implementation details.

Sparse block grid

The sparse block grid method, introduced by Bridson in 2003,^[6] divides the entire bounding volume of size

n^3

into small cubic blocks of

m^3

voxels each. A coarse grid of size

(n/m )^3

then stores pointers only to those blocks that intersect the narrow band of the level set. Block allocation and deallocation occur as the surface propagates to accommodate to the deformations. This method has a suboptimal storage complexity of

O\left((nm)3 + m^3n^2\right)

, but retains the constant time access inherent to dense grids.

Octree

The octree level set method, introduced by Strain in 1999 ^[7] and refined by Losasso, Gibou and Fedkiw,^[8] and more recently by Min and Gibou^[9] uses a tree of nested cubes of which the leaf nodes contain signed distance values. Octree level sets currently require uniform refinement along the interface (i.e. the narrow band) in order to obtain sufficient precision. This representation is efficient in terms of storage,

O(n^2),

and relatively efficient in terms of access queries,

O(\log\, n).

An advantage of the level method on octree data structures is that one can solve the partial differential equations associated with typical free boundary problems that use the level set method. The CASL research group^[10] has developed this line of work in computational materials, computational fluid dynamics, electrokinetics, image guided surgery and controls.

Run-length encoded

The run-length encoding (RLE) level set method, introduced in 2004,^[11] applies the RLE scheme to compress regions away from the narrow band to just their sign representation while storing with full precision the narrow band. The sequential traversal of the narrow band is optimal and storage efficiency is further improved over the octree level set. The addition of an acceleration lookup table allows for fast

O(\log r)

random access, where r is the number of runs per cross section. Additional efficiency is gained by applying the RLE scheme in a dimensional recursive fashion, a technique introduced by Nielsen & Museth's similar DT-Grid.^[12]

Hash Table Local Level Set

The Hash Table Local Level Set method, introduced in 2012 by Brun, Guittet and Gibou,^[13] only computes the level set data in a band around the interface, as in the Narrow Band Level-Set Method, but also only stores the data in that same band. A hash table data structure is used, which provides an

O(1)

access to the data. However, the authors conclude that their method, while being easier to implement, performs worse than a quadtree implementation. They find that

as it is, [...] a quadtree data structure seems more adapted than the hash table data structure for level-set algorithms.

Three main reasons for worse efficiency are listed:

to obtain accurate results, a rather large band is required close to the interface, which counterbalances the absence of grid nodes far from the interface;
the performances are deteriorated by extrapolation procedures on the outer edges of the local grid and
the width of the band restricts the time step and slows down the method.

D. Program-level Structure

The best way to learn a programming language is by writing programs. Typically, the first program beginners write is a program called "Hello World", which simply prints "Hello World" to your computer screen. Although it is very simple, it contains all the fundamental components C++ programs have:

// my first program in C++
#include <iostream>

int main()
{
  std::cout << "Hello World!";
}

Hello World!

Edit & Run

Let's examine this program line by line:

Line 1: // my first program in C++: Two slash signs indicate that the rest of the line is a comment inserted by the programmer but which has no effect on the behavior of the program. Programmers use them to include short explanations or observations concerning the code or program. In this case, it is a brief introductory description of the program.
Line 2: #include <iostream>: Lines beginning with a hash sign (#) are directives read and interpreted by what is known as the preprocessor. They are special lines interpreted before the compilation of the program itself begins. In this case, the directive #include <iostream>, instructs the preprocessor to include a section of standard C++ code, known as header iostream, that allows to perform standard input and output operations, such as writing the output of this program (Hello World) to the screen.
Line 3: A blank line.: Blank lines have no effect on a program. They simply improve readability of the code.
Line 4: int main (): This line initiates the declaration of a function. Essentially, a function is a group of code statements which are given a name: in this case, this gives the name "main" to the group of code statements that follow. Functions will be discussed in detail in a later chapter, but essentially, their definition is introduced with a succession of a type (int), a name (main) and a pair of parentheses (()), optionally including parameters.

The function named main is a special function in all C++ programs; it is the function called when the program is run. The execution of all C++ programs begins with the main function, regardless of where the function is actually located within the code.
Lines 5 and 7: { and }: The open brace ({) at line 5 indicates the beginning of main's function definition, and the closing brace (}) at line 7, indicates its end. Everything between these braces is the function's body that defines what happens when main is called. All functions use braces to indicate the beginning and end of their definitions.
Line 6: std::cout << "Hello World!";: This line is a C++ statement. A statement is an expression that can actually produce some effect. It is the meat of a program, specifying its actual behavior. Statements are executed in the same order that they appear within a function's body.

This statement has three parts: First, std::cout, which identifies the standard character output device (usually, this is the computer screen). Second, the insertion operator (<<), which indicates that what follows is inserted into std::cout. Finally, a sentence within quotes ("Hello world!"), is the content inserted into the standard output.

Notice that the statement ends with a semicolon (;). This character marks the end of the statement, just as the period ends a sentence in English. All C++ statements must end with a semicolon character. One of the most common syntax errors in C++ is forgetting to end a statement with a semicolon.

E. Control-level Structure

Control Structure

• A

control structure

is a control statement

and the statements whose execution it

controls

• Design question

– Should a control structure have multiple entries?

8-6

Selection Statements

• A

selection statement

provides the means

of choosing between two or more execution

paths in a program

• Two general categories:

– Two-way selectors

– Multiple-way selectors

≤

100?

S = S +I

True

False

...

8-7

Two-Way Selection Statements

• General form:

if

control_expression

then 

clause

else 

clause

• Design Issues:

– What is the form and type of the control

expression?

– How are the

then

and

else

clauses

specified?

– How should the meaning of nested selectors be

specified?

8-8

The Control Expression

• If the

then

reserved word is not used

– control expressions are specified in parentheses

• If the

then

reserved word is used

– Less need for parentheses

• In C89, arithmetic expressions were used

as control expressions

• In Ada, Java, Ruby, and C#, only

Boolean

expressions

can be used

8-9

Nesting Selectors

• Java example

if (sum == 0)

if (count == 0)

result = 0;

else result = 1;

• Which

if

gets the

else

• Java's static semantics rule:

else

matches

with the nearest

if

In Java

8-10

Nesting Selectors (continued)

•To force an alternative semantics,

compound statements may be used:

if (sum == 0) {

if (count == 0)

result = 0;

}

else result = 1;

• The above solution is used in C, C++, and

8-11

Nesting Selectors in Perl

• Perl requires that all

then

and

else clauses

to be compound, so it does not have the

problem

if (sum == 0) {

if (count == 0) {

result = 0;

}

} else { 

result = 1;

}

if (sum == 0) {

if (count == 0) {

result = 0;

}

else { 

result = 1;

}

}

8-12

Using a Special Word Also Resolves the

Problem...

• In Ruby

if

sum == 0 

then

if

count == 0 

then

result = 0

else

result = 1

end

end

if

sum == 0 

then

if

count = 0 

then

result = 0

end

else

result = 1

end

if

和

end

就像左右括弧般配對

8-13

Multiple-Way Selection Statements

• Allow the selection of one of any number

of statements or statement groups

• Design Issues:

1. What is the form and type of the control expression?

2. How are the selectable segments specified?

3. Is execution flow through the structure restricted

to include just a single selectable segment?

4. How are the case values specified?

5. What is done about unrepresented expression

values?

8-14

Multiple-Way Selection: Examples

•C’s

switch

statement

switch

(expression) {

case

const_expr_1: stmt_1;

...

case

const_expr_n: stmt_n;

[

default

: stmt_n+1]

}

8-15

Design Choices for C’s

switch

1. Control expression and constant expressions

are some integer type

2. Selectable segments can be statement

sequences, blocks, or compound statements

3. Any number of segments can be executed in

one execution of the construct (there is no

implicit branch at the end of selectable

segments)

4. default

clause is for unrepresented values (if

there is no

default

, the whole statement does

nothing)

8-16

In C, Control May Flow More Than One

Selectable Codes...

switch

(index) {

case

1:

case

3:

odd += 1;

sumodd += index;

case

2:

case

4:

even += 1;

sumeven += index;

default

: printf(“Error in switch”);

}

8-17

The Ada

case

Statement

case

expression 

is

when

choice list => stmt_sequence;

...

when

choice list => stmt_sequence;

[

when others

=> stmt_sequence;]

end case

;

• once a stmt_sequence execution is

completed, control is passed to the first

statement after the

case

statement

• More reliable than C’s

switch

8-18

More On The Ada

case

Statement

• choice list

can include:

– Subranges e.g.,

10..15

– Boolean OR operators. e.g.,

1..5 

|

7

|

15..20

• The choice lists must be

exhaustive

– Often accomplished with

when others

clause

– This makes it more reliable

8-19

Multiple-Way Selection Using

else-if

• Multiple Selectors can appear as direct

extensions to two-way selectors, using

else-if

clauses, for example in Ada:

if

...

then

...

elsif

...

then

...

elsif

...

then

...

else

...

end if

8-20

Iterative Statements

• The repeated execution of a statement or

compound statement is accomplished

either by iteration

or recursion

• General design issues for iteration control

statements:

1. How is iteration controlled?

2. Where is the control mechanism in the loop?

8-21

Counter-Controlled Loops

• A counting iterative statement has a

loop variable

, and a means of

specifying the

initial

and

terminal

and

stepsize

values

• the

initial

and

terminal

, and

stepsize

specifications of a loop are called the

loop parameters

8-22

Design Issues of Counter-Controlled

Loops

1.What are the type and scope of the loop

variable?

2.What is the value of the loop variable at

loop termination?

3.Should it be legal for the loop variable or

loop parameters to be changed in the loop

body, and if so, does the change affect loop

control?

4.Should the loop parameters be evaluated

only once, or once for every iteration?

8-23

Iterative Statements: FORTRAN 95

DO

label var = start, finish [, stepsize]

• Stepsize can be any value but zero

• Parameters can be expressions

• Design choices:

1. Loop variable must be

INTEGER

2. Loop variable always has its

last value

3. Loop parameters are evaluated only once

4. The loop variable cannot be changed in the

loop, but the parameters can; because they are

evaluated only once, it does not affect loop

control

8-24

FORTRAN 95’s

Do

Example

a = 1

DO

10 Index = a, 10

...

a = a + 1

10  Continue

The value of

Index

after normal termination of

the loop is 11

Loop parameter can be changed in the loop,

but it does not affect loop control

value can not

be changed in

the loop

8-25

Iterative Statements: FORTRAN 95

• FORTRAN 95 : a second form:

[name:] 

DO

variable = initial, terminal [,stepsize]

...

END DO

[name]

DO

Count = 1, 10

...

End Do

8-26

Iterative Statements: Ada

for

var

in

[

reverse

] discrete_range

loop

...

end loop

• discrete_range

is a sub-range of an

integer or enumeration type

• Scope of the loop variable is

the range of

the loop

• Loop variable is implicitly undeclared after

loop termination

8-27

Ada’s

for

Example

Count: Float := 1.35;

for

Count 

in

1..10 

loop

Sum := Sum + Count;

end loop

Count

is a floating

point number with

Initial value 1.35

Upon loop termination,

Count

is still

Float

type with the value of 1.35

Loop variable

Count

cannot be

assigned a value in the loop body

It is not accessible

in the loop body

8-28

Iterative Statements: C

for

(expr_1; expr_2; expr_3)

loop body

• The expressions can be whole statements, or even

statement sequences, with the statements

separated by commas

– The value of a multiple-statement expression is the value

of the last statement in the expression

• There is no explicit loop variable

• Everything can be changed in the loop

• The first expression is evaluated once, but the

other two are evaluated with each iteration

8-29

C’s

for

Example

for

(count1 = 0, count2 = 1.0;

count1 <= 10 && count2 <= 100.0;

sum = ++ count1 + count2, count2 *= 2.5);

Need not have a loop body

for

(a = 100, b = 1; a < b; a++, b *= 2);

No explicit loop variable

8-30

Iterative Statements: Examples

• C++ differs from C in two ways:

1. The control expression

can also be

Boolean

2. The initial expression can include variable

definitions (scope is from the definition to the

end of the loop body)

• Java and C#

– Differs from C++ in that the control

expression

must be

Boolean

for

(

int

count = 0; count < len; count++) {...}

8-31

Iterative Statements: Logically

Controlled Loops

• Repetition control is based on a Boolean

• Design issues:

– Pre-test or post-test?

– Should the logically controlled loop be a

special form of a counting loop or a separate

statement?

• General forms:

Pre-test

while

(ctrl_expr)

loop body

Post-test

do

loop body

while

(ctrl_expr)

8-32

Logically-Controlled Loops: Examples

• C and C++ have both pre-test and post-

test logical loop statements, but the

control expression for the post-test

version is treated just like in the pre-test

case

(

while-do

and

do-while

)

• Java is like C, except the control

expression must be Boolean (and the body

can only be entered at the beginning --

Java has no

goto

)

8-33

Logically-Controlled Loops: Examples

• Ada has a pretest version, but no post-test

• FORTRAN 77 and 90 have neither

• Perl has two pre-test logical loops,

while

and

until

, but no post-test logical loop

8-34

User-Located Loop Control Mechanisms

• Sometimes it is convenient for the

programmers to decide a location for loop

control (other than top or bottom of the

loop)

• Simple design for single loops (e.g.,

break

)

• Design issues for nested loops

1. Should the conditional be a part of the exit?

2. Should only one loop body be exited, or can

enclosing loops also be exited?

8-35

User-Located Loop Control Mechanisms

break

and

continue

• C , C++, Ruby, and C# have unconditional

unlabeled exits (

break

)

• Unconditional; for any loop or

switch

; one

level only

• Java and Perl have unconditional labeled

break

statement: control transfers to the

label

• An alternative:

continue

statement; it

skips the remainder of this iteration, but

does not exit the loop

8-36

break

and

continue

for

(row = 0; row < numRows; row++)

for

(col = 0; col < numCols; col++) {

sum += mat[row][col];

if

(sum > 1000.0) {

break

;

}

for

(x = 1; x <= 10; x++) {

if

(x == 5) {

continue

;

}

printf("%d ", x); 

}

In Java

8-37

A Break to the Outer Loop in Java

OuterLoop:

for

(row = 0; row < numRows; row++)

for

(col = 0; col < numCols; col++) {

sum += mat[row][col];

if

(sum > 1000.0)

break

outerLoop;

}

8-38

Iterative Statements: Iteration Based on

Data Structures

• Number of elements of in a data structure control

loop iteration

• Control mechanism is a call to an

iterator

function

that returns the next element in some chosen

order, if there is one; else loop is terminated

• C's

for

can be used to build a user-defined

iterator:

for

(p=root; p!=NULL; traverse(p)){ 

}

8-39

Iterative Statements: Iteration Based on

Data Structures (continued)

• C#’s

foreach

statement iterates on the

elements of arrays and other collections:

Strings[] = strList = {“Bob”, “Carol”, “Ted”};

foreach

(Strings name in strList)

Console.WriteLine (“Name: {0}”, name);

• The notation {0} indicates the position in

the string to be displayed

8-40

Unconditional Branching

• Transfers execution control to a specified place in

the program

• Represented one of the most heated debates in

1960’s and 1970’s

• Well-known mechanism:

goto

statement

• Major concern: Readability

• Some languages do not support

goto

statement (e.g.,

Module-2 and Java)

• C# offers

goto

statement (can be used in

switch

statements)

• Loop exit statements are restricted and somewhat

camouflaged (

偽裝的

)

goto

’s

Selection Guarded Command:

Illustrated

8-44

Loop Guarded Command

•

Form

do

[]

...

[]

od

• Semantics: for each iteration

– Evaluate all Boolean expressions

– If more than one are true, choose one non-

deterministically; then start loop again

– If none are true, exit loop

8-45

Loop Guarded Command: Illustrated

8-46

Guarded Commands: Rationale

• Connection between control statements

and program verification is intimate (

親近的

)

• Verification is impossible with

goto

statements

• Verification is possible with only selection

and logical pretest loops

• Verification is relatively simple with only

guarded commands

8-47

Conclusion

• Variety of statement-level structures

• Choice of control statements beyond

selection and logical pretest loops is a

trade-off between language size and

writability

• Functional and logic programming

languages are quite different control

structures

Structure of Programming Languages

Lunes, Setyembre 29, 2014

Elements of Programming Language

A. Perspective of programming languages

B. Language processors

C. Data-level Structure

Narrow band

Sparse field

Sparse block grid

Octree

Run-length encoded

Hash Table Local Level Set

D. Program-level Structure

E. Control-level Structure

Archive sa Blog

Tungkol sa Akin