Abelson&Sussman: Structure and Interpretation of Computer Programs

We need an appropriate language for describing processes, and we will use for this purpose the programming language Lisp. Just as our everyday thoughts are usually expressed in our natural language (such as English, French, or Japanese), and descriptions of quantitative phenomena are expressed with mathematical notations, our procedural thoughts will be expressed in Lisp. Lisp was invented in the late 1950s as a formalism for reasoning about the use of certain kinds of logical expressions, called recursion equations, as a model for computation. The language was conceived by John McCarthy and is based on his paper “Recursive Functions of Symbolic Expressions and Their Computation by Machine” (McCarthy 1960).

Contents

Foreword

Preface to the Second Edition

Preface to the First Edition

Acknowledgments

1 Building Abstractions with Procedures
1.1 The Elements of Programming
1.1.1 Expressions
1.1.2 Naming and the Environment
1.1.3 Evaluating Combinations
1.1.4 Compound Procedures
1.1.5 The Substitution Model for Procedure Application
1.1.6 Conditional Expressions and Predicates
1.1.7 Example: Square Roots by Newton’s Method
1.1.8 Procedures as Black-Box Abstractions
1.2 Procedures and the Processes They Generate
1.2.1 Linear Recursion and Iteration
1.2.2 Tree Recursion
1.2.3 Orders of Growth
1.2.4 Exponentiation
1.2.5 Greatest Common Divisors
1.2.6 Example: Testing for Primality
1.3 Formulating Abstractions with Higher-Order Procedures
1.3.1 Procedures as Arguments
1.3.2 Constructing Procedures Using Lambda
1.3.3 Procedures as General Methods
1.3.4 Procedures as Returned Values

2 Building Abstractions with Data
2.1 Introduction to Data Abstraction
2.1.1 Example: Arithmetic Operations for Rational Numbers
2.1.2 Abstraction Barriers
2.1.3 What Is Meant by Data?
2.1.4 Extended Exercise: Interval Arithmetic
2.2 Hierarchical Data and the Closure Property
2.2.1 Representing Sequences
2.2.2 Hierarchical Structures
2.2.3 Sequences as Conventional Interfaces
2.2.4 Example: A Picture Language
2.3 Symbolic Data
2.3.1 Quotation
2.3.2 Example: Symbolic Differentiation
2.3.3 Example: Representing Sets
2.3.4 Example: Huffman Encoding Trees
2.4 Multiple Representations for Abstract Data
2.4.1 Representations for Complex Numbers
2.4.2 Tagged data
2.4.3 Data-Directed Programming and Additivity
2.5 Systems with Generic Operations
2.5.1 Generic Arithmetic Operations
2.5.2 Combining Data of Different Types
2.5.3 Example: Symbolic Algebra

3 Modularity, Objects, and State
3.1 Assignment and Local State
3.1.1 Local State Variables
3.1.2 The Benefits of Introducing Assignment
3.1.3 The Costs of Introducing Assignment
3.2 The Environment Model of Evaluation
3.2.1 The Rules for Evaluation
3.2.2 Applying Simple Procedures
3.2.3 Frames as the Repository of Local State
3.2.4 Internal Definitions
3.3 Modeling with Mutable Data
3.3.1 Mutable List Structure
3.3.2 Representing Queues
3.3.3 Representing Tables
3.3.4 A Simulator for Digital Circuits
3.3.5 Propagation of Constraints
3.4 Concurrency: Time Is of the Essence
3.4.1 The Nature of Time in Concurrent Systems
3.4.2 Mechanisms for Controlling Concurrency
3.5 Streams
3.5.1 Streams Are Delayed Lists
3.5.2 Infinite Streams
3.5.3 Exploiting the Stream Paradigm
3.5.4 Streams and Delayed Evaluation
3.5.5 Modularity of Functional Programs and Modularity of Objects

4 Metalinguistic Abstraction
4.1 The Metacircular Evaluator
4.1.1 The Core of the Evaluator
4.1.2 Representing Expressions
4.1.3 Evaluator Data Structures
4.1.4 Running the Evaluator as a Program
4.1.5 Data as Programs
4.1.6 Internal Definitions
4.1.7 Separating Syntactic Analysis from Execution
4.2 Variations on a Scheme — Lazy Evaluation
4.2.1 Normal Order and Applicative Order
4.2.2 An Interpreter with Lazy Evaluation
4.2.3 Streams as Lazy Lists
4.3 Variations on a Scheme — Nondeterministic Computing
4.3.1 Amb and Search
4.3.2 Examples of Nondeterministic Programs
4.3.3 Implementing the Amb Evaluator
4.4 Logic Programming
4.4.1 Deductive Information Retrieval
4.4.2 How the Query System Works
4.4.3 Is Logic Programming Mathematical Logic?
4.4.4 Implementing the Query System

5 Computing with Register Machines
5.1 Designing Register Machines
5.1.1 A Language for Describing Register Machines
5.1.2 Abstraction in Machine Design
5.1.3 Subroutines
5.1.4 Using a Stack to Implement Recursion
5.1.5 Instruction Summary
5.2 A Register-Machine Simulator
5.2.1 The Machine Model
5.2.2 The Assembler
5.2.3 Generating Execution Procedures for Instructions
5.2.4 Monitoring Machine Performance
5.3 Storage Allocation and Garbage Collection
5.3.1 Memory as Vectors
5.3.2 Maintaining the Illusion of Infinite Memory
5.4 The Explicit-Control Evaluator
5.4.1 The Core of the Explicit-Control Evaluator
5.4.2 Sequence Evaluation and Tail Recursion
5.4.3 Conditionals, Assignments, and Definitions
5.4.4 Running the Evaluator
5.5 Compilation
5.5.1 Structure of the Compiler
5.5.2 Compiling Expressions
5.5.3 Compiling Combinations
5.5.4 Combining Instruction Sequences
5.5.5 An Example of Compiled Code
5.5.6 Lexical Addressing
5.5.7 Interfacing Compiled Code to the Evaluator

References

List of Exercises

Index

The Perils of JavaSchools – Joel on Software

Without pointers, for example, you’d never be able to work on the Linux kernel. You can’t understand a line of code in Linux, or, indeed, any operating system, without really understanding pointers.

Without understanding functional programming, you can’t invent MapReduce, the algorithm that makes Google so massively scalable. The terms Map and Reduce come from Lisp and functional programming. MapReduce is, in retrospect, obvious to anyone who remembers from their 6.001-equivalent programming class that purely functional programs have no side effects and are thus trivially parallelizable. The very fact that Google invented MapReduce, and Microsoft didn’t, says something about why Microsoft is still playing catch up trying to get basic search features to work, while Google has moved on to the next problem: building Skynet^H^H^H^H^H^H the world’s largest massively parallel supercomputer. I don’t think Microsoft completely understands just how far behind they are on that wave.

Interview with Dr. Thilo Sarrazin, Kurier

Everyone can have the children they want, but they should pay for their upkeep themselves. The state must make its contribution through the public education system. On intelligence, I am simply applying a rule of three. First: the differences in intelligence measured among people are 50 to 80 percent hereditary; that is what science says. Second: educated people have markedly fewer children; that is what the Federal Statistical Office says. Intelligence and education are, as one would expect, positively correlated. From this follows, third: if the trend continues that the less intelligent have more children, then the average genotypic intelligence, that is, the hereditary component of intelligence in the population, declines.

It is also interesting that of the roughly 840 Nobel laureates to date, 25 percent were Jewish scientists. There were 8 laureates from Islamic countries, four of them Nobel Peace Prizes.

Recover tmp flash videos (deleted immediately by the browser …

for h in $(find /proc/*/fd -ilname "/tmp/Flash*" 2>/dev/null); do ln -s "$h" "$(readlink "$h" | cut -d' ' -f1)"; done

Newer versions of the Flash Player browser plugin delete the tmp flash video immediately after opening a filehandle, to prevent the user from “exporting” the video by simply copying the /tmp/FlashXYZ file. This command searches for such deleted flash videos and creates symbolic links to the open filehandles, each with the same name as the deleted file. This allows you to play your flash videos (e.g. from YouTube) with e.g. mplayer, or copy the buffered video if you want to keep it.

Max-Planck-Gesellschaft – Cosmic Collisions Forge Gold

The place where the heaviest chemical elements in the universe, such as lead or gold, are formed now appears to have been found: neutron stars merging in a violent collision are the ideal production sites.

Many heavy chemical elements are formed by nuclear burning in stars. In the interior of our Sun, too, hydrogen constantly fuses into helium, releasing energy in the process. Stars more massive than the Sun then forge heavier elements from helium. But this process works only up to iron. Because no further energy can be gained from fusion reactions, even heavier atomic nuclei cannot be produced this way. They form through the capture of uncharged neutrons onto medium-heavy seed nuclei.

Still trying to get it all out: You’re being lied to

If you’re among the crowd who have migrated an OOP-based application from PHP4 to PHP5, then I’m sure you’ve heard the expression “Objects are copied by reference by default in PHP5”. Whoever told you that was lying.

Now, to be fair, it’s an innocent lie, since objects do behave in a reference-like manner, but references are NOT what they are. Let’s start with a simple illustration proving that they aren’t references:

<?php
$a = new stdClass;
$b = $a;
$a->foo = 'bar';
var_dump($b);
/* Notice at this point, that $a and $b are,
 * indeed sharing the same object instance.
 * This is their reference-like behavior at work.
 */

$a = 'baz';
var_dump($b);

/* Notice now, that $b is still that original object.
 * Had it been an actual reference with $a,
 * it would have changed to a simple string as well.
 */
?>

What’s going on here? The answer is easiest to explain by looking at the underlying structure of objects. In PHP5, a variable containing an object identifies the instance by storing a simple numeric value. When an action is going to be performed on an object, that numeric value is used with a lookup table to retrieve the actual instance. In PHP4, by contrast, a variable containing an object identifies that object by carrying around the actual properties table itself. What this means in practice is that when you assign (not by reference) a PHP5 object to a new variable, that integer handle is copied into the new variable, but it still points at the same instance, because it’s still the same number. Assigning a PHP4 object, however, means copying all the properties, effectively generating a new instance, since changes to one will not affect the other.

To put this another way, PHP4 objects are basically Arrays with functions associated with them; PHP5 objects are basically Resources (a la MySQL result handles, or file pointers), again with functions loosely associated to them. Consider the following code in PHP4 (or any version):

You’d fully expect data to be written to the same file, as though you’d used $fp everywhere rather than interchanging the variables, right? Well, PHP5 objects are the same. The instance itself isn’t duplicated when you assign to a new variable, just the unique identifier.
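
The listing this paragraph refers to is missing from this copy. A minimal sketch of the kind of code it describes might look like the following; the temporary file and the names $path and $otherfp are illustrative assumptions, not taken from the original post:

```php
<?php
// Hypothetical example: $path and $otherfp are illustrative names.
$path = tempnam(sys_get_temp_dir(), 'fp');

$fp = fopen($path, 'w');
$otherfp = $fp;            // copies the resource handle, not the open file

fwrite($fp, "One\n");
fwrite($otherfp, "Two\n"); // goes to the same underlying file
fclose($fp);               // closes the file for both variables

$contents = file_get_contents($path);
unlink($path);
echo $contents;            // both lines ended up in the one file
```

Both writes land in the same file because $fp and $otherfp carry the same handle, which is the behavior the paragraph compares PHP5 objects to.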

I’m lying to you also

“Copying” a variable doesn’t exactly mean copying. Take the following code block:

Now, you know PHP well enough to know that by the end of this code block, the value of $b will still be ‘foo’. What you may not know, is that the original copy of ‘foo’ that was in $a, was never actually duplicated.
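
The code block mentioned above did not survive in this copy. Going by the description, it was probably close to this sketch; the exact statements and the replacement value 'bar' are assumptions:

```php
<?php
// Assumed reconstruction: the original listing is not in this copy.
$a = 'foo';
$b = $a;      // plain assignment: no string data needs to be copied yet
$a = 'bar';   // writing to $a separates it from the value $b still sees

var_dump($b); // string(3) "foo"
```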

To understand what PHP is doing, you need to understand the internal structure of the variable and how it relates to userspace visible variable names (‘a’ and ‘b’ in this case). First off, the actual contents of a variable (known as a zval) consists of four parts: type (e.g. NULL, Boolean, Integer, Float, String, Array, Resource, Object), a specific value (e.g. 123, 3.1415926535, etc…), is_ref – a flag indicating if the value is a reference or not, and refcount which tells how many times this value is being shared.

What you think of as a variable (e.g. $x) is actually just a label; that label (‘x’ in this case) is used as a lookup to find the zval which contains the actual value. These are just like keys in an associative array; in fact, the mechanisms are identical.

With me so far? Good. Now, when you first create a variable (e.g. $x = 123;), PHP allocates a new zval for it, stores the specific value, and associates the label with the value:

'x' => zval ( type       => IS_LONG,
              value.lval => 123,
              is_ref     => 0,
              refcount   => 1 )

So far, refcount is 1 since the zval value is only being referenced by one label. If we now put this value into a full-reference set using $y =& $x;, the same zval is reused. It’s simply associated with a new label and its reference counters are adjusted properly.

'x' => zval ( type       => IS_LONG,
 |            value.lval => 123,
 |            is_ref     => 1,
 |            refcount   => 2 )
'y' /

This way, when you later change the value of $x, $y appears to change as well because it’s looking at the same internal value. But what if we hadn’t done a reference assignment? What if we’d done a normal assignment, $y = $x;? Surprisingly, the result would be almost the same.

'x' => zval ( type       => IS_LONG,
 |            value.lval => 123,
 |            is_ref     => 0,
 |            refcount   => 2 )
'y' /

Again, the original zval associated with $x is reused, the only difference this time is that is_ref is not set to 1. This is known as a copy-on-write reference set (as opposed to the full-reference set described above). This 0 flag tells the engine that if anyone tries to change this value (regardless of which label they use to reach it), any other references to it should be left alone. Here’s what happens if we take that current state and do $x = 456;

'y' => zval ( type       => IS_LONG,
              value.lval => 123,
              is_ref     => 0,
              refcount   => 1 )
'x' => zval ( type       => IS_LONG,
              value.lval => 456,
              is_ref     => 0,
              refcount   => 1 )

$x has been disassociated from the original zval (thus dropping its refcount back to 1), and a new zval has been created for it.
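
The two kinds of reference set can be contrasted from userland with a short sketch. The refcount and is_ref fields are engine internals, so this only shows the observable behavior; the names $p, $q, $refResult and $cowResult are illustrative:

```php
<?php
// Full-reference set: $x and $y are two labels for one value.
$x = 123;
$y =& $x;
$x = 456;
$refResult = $y;   // 456, because $y follows $x

// Copy-on-write set: the value is shared until one side is written.
$p = 123;
$q = $p;
$p = 456;
$cowResult = $q;   // 123, because the write separated $p from $q

var_dump($refResult, $cowResult); // int(456) then int(123)
```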

Why referencing when you don’t have to is a bad idea.

Let’s consider one more situation. Take a look at this code block:

At the first instruction, a single zval is created, associated to a single label:

'a' => zval ( type          => IS_STRING,
              value.str.val => 'foo',
              is_ref        => 0,
              refcount      => 1 )

At the second instruction, that zval is associated to a second label, so far so good:

'a' => zval ( type          => IS_STRING,
 |            value.str.val => 'foo',
 |            is_ref        => 0,
 |            refcount      => 2 )
'b' /

At the third instruction, however, we run into problems. Since this zval is already tied up in a copy-on-write reference set which includes $b, that zval can’t be simply promoted to is_ref==1. Doing so would drag $b into $a and $c’s full-reference set, and that would be wrong. In order to resolve this, the engine is forced to duplicate that zval into two identical copies, from which it can begin to shuffle around reference flags and counts:

'b' => zval ( type          => IS_STRING,
              value.str.val => 'foo',
              is_ref        => 0,
              refcount      => 1 )
'a' => zval ( type          => IS_STRING,
 |            value.str.val => 'foo',
 |            is_ref        => 1,
 |            refcount      => 2 )
'c' /

Now you’ve got two copies of the same literal value, so you’re wasting memory for the storage, and processing time required to actually make the duplication. Since a LOT of events lead to copy-on-write uses (including simply passing an argument to a function), this sort of forced duplication actually happens very commonly when you start involving actual references.
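
The code block for this walkthrough is missing from this copy. Based on the three instructions described above, it was presumably along these lines; the final write through $c is an added assumption, included only to make the separation of $b visible:

```php
<?php
// Assumed reconstruction of the three instructions discussed above.
$a = 'foo';
$b = $a;    // $a and $b share one value (copy-on-write)
$c =& $a;   // $a and $c become a full-reference set; $b must stay separate

$c = 'bar'; // written through the reference (not part of the walkthrough)
var_dump($a, $b, $c); // "bar", "foo", "bar"
```

$b keeps 'foo' while $a and $c change together, which is why the engine cannot simply flip the shared value into a reference.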

The moral of the story

Assigning values by reference when you don’t need to (in order to later modify the original value through a different label) is NOT a case of you outsmarting the silly engine and gaining speed and performance. It’s the opposite: it’s you TRYING to outsmart the engine and failing, because the engine is already doing a better job than you think.

How does this reflect on objects? They’re not special. They’re not different from other variables. They are not pretty snowflakes. In this code block:

The labels are still placed into copy-on-write reference sets. What’s important, is that even when a duplication does occur, (A) only that unique integer is copied (which is cheap), and (B) the duplicated integer still points to the same place. Hence you get reference-like behavior, but not an actual reference by default.
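
The code block referred to here is also missing from this copy. As a rough sketch of the behavior being summarized, assuming a plain stdClass instance (the property name foo and the variable $copy are illustrative):

```php
<?php
$a = new stdClass;
$b = $a;          // copies the handle; both labels reach the same instance
$a->foo = 'bar';  // visible through $b as well

$copy = clone $a; // an explicit clone is what creates a second instance
$copy->foo = 'changed';

$a = 'baz';       // rebinding $a leaves $b alone: not an actual reference
var_dump($b->foo, $copy->foo, $a); // "bar", "changed", "baz"
```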

Hungry for more? Check out my coverage of the zval.