I attended the USENIX 2002 conference. I hope you and the LtU readers might be interested in some of my notes. The usual disclaimer about subjectivity applies.

* X Meets Z: Verifying Correctness in the Presence of POSIX Threads
Bart Massey, Portland State University; and Robert T. Bauer, Rational Software Corp. Freenix track.

The paper showed how formal methods helped solve a really hard multi-threaded problem. The problem arose during the development of XCB, a client library for the X protocol (see the XCB/XCL talk below). The library is supposed to be used _transparently_ in a multi-threaded environment. Making the library thread-safe is hard because the X protocol does not, by its design, prevent deadlocks. Therefore, the client library (XCB in our case) must be careful to avoid these potential deadlocks. Alas, such a careful design is hard. The author tried a brute-force approach and failed: an algorithm would prevent a deadlock in one case and allow it in another; Bart Massey would correct the algorithm, only to find another fatal flaw later. After four attempts, he realized that there must be a better way.

As a professor at Portland State University, Bart Massey teaches Z. The textbook in his class is "The Way of Z: Practical Programming with Formal Methods". Interestingly, this book was written by a radiologist. Bart Massey decided then to follow what he teaches. He wanted to specify POSIX threads in Z, specify the X server behavior in Z, describe the properties of the desired client algorithm in Z, and then formally prove that the whole system is deadlock-free. The latter part -- the formal proof -- is still wanting. Unfortunately, the semantics of POSIX threads are not well specified. Still, the mere act of describing the client behavior in Z gave a crucial insight that led to the correct algorithm. That algorithm is implemented in the current version of XCB.
Although there is no formal proof of its correctness (yet), extensive testing and use have revealed no problems. Thus the lesson of this talk is that formal methods are indeed useful: good ideas emerge merely from writing down a formal specification. Engineers use math and logic extensively in their work; software engineers and open-source developers should do that too.

* XCL: An Xlib Compatibility Layer for XCB
Jamey Sharp and Bart Massey, Portland State University. Freenix track.

XCB is a C binding for the X protocol. Unlike Xlib, XCB deals only with communication, marshaling, and similar issues. XCB offers a far simpler and more consistent interface to the X protocol than Xlib does. XCB is very small (27 KB compiled) and is therefore particularly suitable for embedded systems. XCB can be used in a multi-threaded environment. The code for XCB is mainly _automatically_ generated from the specification of the X protocol!

Xlib contains much more functionality than a mere client side of the X protocol. XCL is an attempt to implement some of that functionality using XCB to communicate with the server. XCL+XCB is meant as a replacement for Xlib (a lighter version of Xlib, to be precise). The goal is largely achieved: XCL+XCB is 55 KB compiled, whereas Xlib is 665 KB (some of which could be reclaimed by replacing pervasive macros with procedure calls). Some of the XCL code is automatically generated; most of the rest is copied from the Xlib source code and adjusted to use XCB.

XCL does not implement all of Xlib. In particular, XCL does not support X color management, i18n, and caches. As the presenter showed, these features, although accounting for more than half of Xlib, are hardly used at all: most toolkits do i18n and color management on their own. The author reported success in relinking several X Window applications (rxvt and gw) to use XCL rather than Xlib.
http://xcb.cs.pdx.edu/

* Cyclone: A Safe Dialect of C
Trevor Jim, AT&T Labs-Research; and Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell. General track.

The talk made a flimsy case for Cyclone. The presenter never justified his assertions that developers really need to know the precise details of data representation and to access the bits and bytes of complex data structures. Incidentally, if these assertions were true, developers should not be using Cyclone instead of C.

Cyclone introduces several kinds of pointers, including a fat pointer. A fat pointer carries the range information needed for run-time bounds checks. A fat pointer is not merely a pointer, yet its details are opaque to developers. A dereference operation via a fat pointer is not atomic (which creates problems for using Cyclone in multi-threaded programs).

The idea of a never-null pointer, and of static reasoning about such pointers, is interesting. Let's consider

    int @f = x;

where '@' marks a never-null pointer, similar to '*' marking a regular pointer. An assignment or initialization of a never-null pointer entails an _implicit_ run-time null check. However, if the value to assign is known to the compiler to be a never-null pointer, no run-time check is inserted. Note that the Cyclone compiler adds implicit run-time checks -- which make the language safer but remove the programmer from "the bare metal." If a programmer doesn't mind an abstraction over the bare metal, the programmer would be more comfortable with a more expressive language such as OCaml or Java.

Performance: from zero overhead to a 3x slowdown in pointer-intensive applications. After the talk, several people noted that a 3x slowdown is unacceptable. Cyclone still supports neither multi-threading nor completely manual memory management (it does support arena management and, it seems, GC).
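To make the two pointer disciplines concrete, here is a minimal Python sketch of the checking behavior described above. The names and structure are my own illustration, not Cyclone code: a fat pointer carries its bounds and checks them at every dereference, while a never-null pointer is checked once, at assignment, after which dereferences need no check.

```python
class FatPtr:
    """Models a Cyclone '?' pointer: base, bounds, and offset travel together."""
    def __init__(self, buf, offset=0):
        self.buf, self.offset = buf, offset

    def deref(self):
        # The run-time bounds check the Cyclone compiler inserts implicitly.
        if not (0 <= self.offset < len(self.buf)):
            raise IndexError("fat-pointer bounds violation")
        return self.buf[self.offset]

    def advance(self, n):
        # Arithmetic is unchecked; the check is deferred to the dereference.
        return FatPtr(self.buf, self.offset + n)


def never_null(p):
    """Models assignment to an '@' pointer: one implicit null check here,
    after which every dereference may be compiled without any check."""
    if p is None:
        raise ValueError("null check failed at assignment")
    return p
```

The sketch also shows why fat-pointer dereference is not atomic: it touches several fields (base, bounds, offset), which is exactly what causes trouble under pre-emptive threads.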
One person asked a funny and witty question: with never-null pointers (marked by '@' rather than the regular '*'), fat pointers (marked by '?'), and contemplated forward-only pointers, won't the designers of Cyclone run out of characters to mark these pointers? Have they considered Unicode?

http://www.cs.cornell.edu/projects/cyclone/

* The AGFL Grammar Work Lab
Cornelis H.A. Koster and Erik Verbruggen, KUN. Freenix track.

The talk described the first _free_ parser generator for natural languages, released under the GNU licenses (GPL/LGPL). The parser generator takes a grammar and a lexicon and yields a parser. The parser reads text and converts it into a Head-Modifier (H-M) form. Generated parsers can be used commercially, royalty-free.

The H-M form is a set of pairs [head, modifiers] where 'head' and 'modifiers' are (possibly empty) strings of words. For example, the phrase "new walking shoes" is translated into

    [N:shoes, A:walking] [N:shoes, A:new]

Both the head and the modifiers are typed: N (noun), P (pronoun), A (adjective), V (verb form). H-M pairs represent the _major_ relations expressed in the text; "unimportant" words are eliminated. The parser can also normalize the tense of verbs.

Here's a more elaborate example: "Every PhD student gets a reduced price for the ETAP conference" is parsed into

    [N:student, N:PhD] [N:student, V:gets] [V:gets, N:price]
    [P:it, V:reduced] [V:reduced, N:price]
    [N:price, for N:conference] [N:conference, N:ETAP]

The conversion from the passive participle phrase "a reduced price" to an active verb construction, [P:it, V:reduced] [V:reduced, N:price], is remarkable. The original phrase did not say who authorized the reduction of the price for the conference. The parser determined that the actor is missing and inserted a dummy actor 'it': [P:it, V:reduced].

H-M pairs produced by a parser can be used for text classification, information retrieval, and question-answering systems. The parser can also be used in educational software.
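The H-M output format is easy to model downstream as typed records. The little Python sketch below is my own illustration of consuming such pairs (the record names are mine, not AGFL's):

```python
from collections import namedtuple

# A typed term: tag is N (noun), P (pronoun), A (adjective), or V (verb form).
Term = namedtuple("Term", "tag word")
# One head-modifier pair, the unit of the parser's output.
HMPair = namedtuple("HMPair", "head modifier")


def parse_hm(text):
    """Turn a 'N:shoes, A:walking' style string into an HMPair record."""
    head, modifier = (Term(*part.strip().split(":")) for part in text.split(","))
    return HMPair(head, modifier)


# "new walking shoes" as given in the talk:
shoes = [parse_hm("N:shoes, A:walking"), parse_hm("N:shoes, A:new")]
```

A classifier or retrieval system would then count such records rather than raw words, which is exactly what the Peking system described below does.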
In fact, the parser has been used by the authors in a text (patent application) classification system called Peking. The AGFL package comes with a sample grammar and lexicon of English. AGFL stands for "affix grammar over a finite lattice", which is a _context-free_ grammar with set-valued attributes. The parser is non-deterministic and therefore not fast. It is, however, robust: it does not crash when presented with unknown words or an incorrect phrase. Again, the package is free.

The AGFL package is available at: http://www.cs.kun.nl/agfl/
The grammar of English is very illuminating: http://www.cs.kun.nl/agfl/npx/grammar.html

Given an ambiguous phrase, the parser currently returns only one parsing alternative (chosen more or less arbitrarily and heuristically). Version 2.0 will incorporate statistics and probabilities. The parser is more scalable and faster than DCG in Prolog. The system is written in the CDL3 language. The CDL system is also a parser generator, which yields fast _deterministic_ parsers.

When I asked the speaker to compare his approach with the "dominant" pure statistical approach, he said that extremes are rarely sufficient. Statistical approaches need a lot of statistics to represent the basic facts of the language (e.g., the Subject-Verb-Object order of English); the latter can be expressed more succinctly and conveniently by rules. It's more productive to use rules for most of the grammar and statistics to resolve ambiguities.

Cornelis H.A. Koster conducted a BoF on natural language processing. The main topic of the BoF was again statistical vs. rule-based classification and retrieval of documents. He mentioned the interesting fact that stemming and singular/plural normalization do not improve classification much. One drawback of the pure-statistical approach to information retrieval is that the relevance score is highly non-linear.
A relevance score of 99% indicates that the document in question is most likely highly relevant to the query. On the other hand, a relevance score of 70% often means that the document has nothing to do with the topic. The precision and recall of purely statistical classification and retrieval reach a plateau; you need to add grammar rules to get past it.

Dr. Koster briefly described his document classification system, Peking. The documents to classify are patent applications. First the system parses a document into H-M pairs, as outlined above. It then collects statistics on the H-M pairs and winnows out stiff and noisy terms. What's left are the significant pairs, which describe "the essence" of a set of documents.

* Plan9 and Inferno

As in the year before, Vita Nuova had a booth at the USENIX Expo. Two exhibitors were constantly busy demonstrating the two remarkable operating systems. Inferno now runs natively on an iPAQ. Inferno supports only one language: Limbo. Because the language is safe, Inferno does not need a kernel/user-space boundary, which makes the system fast. The Limbo virtual machine, Dis, is rather similar to the JVM; it's possible to map (all?) JVM codes to Dis. Alas, Dis does not support tail recursion.

Roger Peppe, one of the exhibitors, showed a Tk emulation in Limbo -- essentially, he wrote a Limbo/Tk. He also showed the amazingly transparent networking in Inferno/Plan9. A serial port in Plan9 is represented by several files, so you can control the port with the regular 'echo' command. He demonstrated a driver for a digital camera, written in "shell" (in Limbo). The interface to the driver is, again, a file system. In Plan9/Inferno, it's possible to create 'virtual files' that are associated with a pair of handlers rather than with storage. When you load the camera driver, you see a set of files in a directory you designate. Doing "echo zoom 0.5 > kodak/ctl" changes the zoom factor. You can see the pictures by executing "ls -l kodak/img/" at the shell prompt.
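The everything-is-a-file control style is easy to mimic: writes to a 'ctl' file are parsed as textual commands and dispatched to handlers instead of hitting storage. The Python below is my own toy model of the idea (in Plan9 such files are served over the Styx protocol; here a plain object stands in for the file server, and the 'zoom' command is taken from the demo):

```python
class CtlFile:
    """Toy model of a Plan9-style ctl file: each write is a textual
    command dispatched to a handler, not data stored on disk."""
    def __init__(self):
        self.zoom = 1.0

    def write(self, line):
        # Equivalent of the shell's: echo zoom 0.5 > kodak/ctl
        cmd, *args = line.split()
        if cmd == "zoom":
            self.zoom = float(args[0])
        else:
            raise ValueError("unknown ctl command: " + cmd)


kodak = CtlFile()
kodak.write("zoom 0.5")
```

Because the interface is just file reads and writes, any tool that can open a file -- echo, a GUI, or a program on a remote machine that mounted the directory -- can drive the device, which is the point of the demo below.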
Roger Peppe then showed a GUI, again written in shell (Limbo/Tk, to be precise), that talks to the camera by reading and writing the files in the kodak/ directory. You can mount the kodak directory on another computer -- an iPAQ connected to the Plan9 laptop via a wireless link -- and indeed control the camera from the iPAQ. The transparency of networking in Plan9/Inferno is astounding. The underlying protocol, Styx, is well thought out and concise: it takes fewer than 10 pages to describe it completely. OS developers could greatly enhance their systems by implementing Styx.

Distributed programming in Limbo is based on channels, as in communicating sequential processes. Threads are synchronized through rendezvous, as in Ada. From the distributed-systems point of view, Limbo programming looks surprisingly similar to Erlang.

Plan9 and Inferno binaries are freely available, even for commercial use. The source (the core) costs money. The source cannot be freely distributed, but it can be modified and used in a commercial application royalty-free. Changes do not have to be reported back to Vita Nuova. See the FAQs at www.vitanuova.com

[Below are three summaries from USENIX02 that talk about threads -- and programming languages' support for threads, or the lack of it. The common theme is that threads are really a bad idea, just as John Ousterhout said in 1996.]

* Cooperative Tasking Without Manual Stack Management
Atul Adya, Jon Howell, Marvin Theimer, Bill Bolosky, and John Douceur, Microsoft Research. General track.

The talk stirred up a lively debate, which continued after the talk and all through lunch. There are two major concurrency models: multi-threading and event-driven computation. As Ousterhout pointed out, threads are a bad idea for most purposes. The acrimony between the two camps is caused by conflating several concepts. The talk took time to explain the different axes along which a concurrent application can be placed.
** task management

*** serial task management

At one end is serial task management: a task executes on a CPU until it is finished, and no other task is scheduled until the first task completes. This strategy is the easiest to reason about, yet it is inefficient: the CPU idles while the task waits for I/O completion.

*** pre-emptive task management

Execution of tasks can interleave on a uniprocessor or overlap on multiprocessors. This strategy seems efficient -- until we consider communication between tasks. If tasks are completely separate and do not share any data, there is no danger in their being interleaved or overlapped [see the data-partitioning axis below]. However, tasks do communicate. In the threaded model, tasks share the same address space and communicate by updating shared variables. With pre-emptive multi-threading, a thread can generally never be sure that it is working with consistent data structures: a thread can be pre-empted at any moment, and a newly scheduled thread may alter the data structures. To be sure its data are consistent, a thread must lock the data structures and then verify (and, if needed, restore) invariants. However, a thread must not hold locks for a long time. Furthermore, to avoid deadlocks, a thread must release its locks in certain situations (when it fails to acquire all the needed locks, for example, or before doing slow I/O). Once the thread has released the locks and re-acquired them, it must verify the invariants all over again.

*** co-operative task management

This is a compromise between serial and pre-emptive management. Unlike serial task management, a task may surrender the CPU before completion and let another task (thread) run in its address space. Unlike pre-emptive management, a thread yields control of its own accord, at well-defined points in its execution. Cooperative multi-threading does not eliminate the problem of data-structure consistency.
After a thread yields control and then regains it, it has to verify all its invariants. However, a thread yields at points defined by its programmer; between those points, the computation can be assumed serial, and no locks are needed to guarantee the consistency of the thread's data. Therefore, the cooperative multitasking model is significantly less error-prone than the pre-emptive one. Cooperative multitasking often performs better, too, owing to the elimination of locking overhead. The work on the Flash web server and last year's USENIX presentation ["High-performance memory-based web servers: kernel and user-space performance" by Philippe Joubert et al.] showed that multi-threaded web servers are not competitive against a finite state machine with efficient event notification.

** stack management

Stack management is the business of making sure a task can access its local data after the task has lost and then regained control. Automatic stack management is what C thread libraries, for example, perform: the thread library saves the stack and other context of the thread that has lost control, and restores the context of the thread scheduled to run next. Manual stack management is when the programmer must pack a callback and all the data for it into a closure; the programmer thus manually creates a continuation frame, to be executed when an event arrives.

Other axes are: I/O management (synchronous vs. asynchronous), conflict management (optimistic vs. pessimistic; semaphores vs. monitors), and data partitioning. The paper explicitly considers only the task- and stack-management axes.

In a multi-threaded application, pre-emption of a thread is transparent to the programmer: the thread library or the OS takes care of saving and restoring the thread's stack. On the other hand, an event-driven application often must explicitly specify which function to execute when an event arrives and which data to pass to it. These data must be manually packed into a closure.
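The contrast is easy to see in miniature. In this Python sketch (my own illustration, not the paper's code), the first version does manual stack management -- the function is ripped apart at the blocking point, and its live locals are packed into a hand-built frame -- while the second lets the language construct the closure:

```python
# Manual stack management: the programmer packs live locals into a
# continuation record and splits the function at each blocking point.
def start_request(url, events):
    frame = {"url": url, "retries": 0}        # hand-built closure
    events.append(("io_done", step2, frame))  # register the callback

def step2(frame, result):
    # On re-entry, locals must be unpacked from the frame by hand.
    return frame["url"], result


# With real closures, the compiler builds the frame for us:
def start_request_closure(url, events):
    def on_done(result):                      # captures url automatically
        return url, result
    events.append(("io_done", on_done))
```

In C, every `frame` field, every `step2`-style continuation function, and every unpack has to be written and maintained by hand, which is exactly the "ripped apart" style the talk illustrated.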
Manual stack management rips a C function apart and leads to code that is difficult to maintain. The talk gave several good illustrations of the horrors of event-based programming in a language that does not natively support closures and continuations. The paper mentions that the transparency of a thread's pre-emption and resumption is actually a drawback, which makes multi-threaded programs difficult to reason about.

The paper contends that the immensely convoluted style of typical event-passing programs has nothing to do with cooperative task management. The convoluted style is the consequence of the manual _stack_ management. The presenter specifically mentioned that Scheme doesn't suffer from that style problem: in Scheme, event-based programming looks elegant and natural. As the paper says, "some languages have a built-in facility for transparently constructing closures; Scheme's call-with-current-continuation is an obvious example. Such a facility obviates the idea of manual stack management altogether. The paper focuses on the stack management problem in conventional languages without elegant closures."

The rest of the paper and of the talk showed techniques to make cooperative-multitasking programming, specifically in C, more pleasant. The paper demonstrated adapters that allow mixing cooperative-thread and pre-emptive-thread code (written by disparate programmers). Note a USENIX01 talk, "A toolkit for user-level file systems" by David Mazieres, who used C++ templates to automate the building of continuations. The solution advocated in the talk is to use co-routines based on "fibers". I think, though, that the subliminal message of the talk was to use Scheme.

* Improving Wait-Free Algorithms for Interprocess Communication in Embedded Real-Time Systems
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan. General track.

This talk, perhaps unwittingly, confirmed the previous talk's observation that threads are really a bad idea.
The gist of the threading model is (conceptually) parallel execution of computations that communicate via shared data. This model is fault-prone, deadlock-prone, and priority-inversion-prone, and it exhibits poor scalability due to locking. The model is hard to reason about; e.g., it's hard to compute the worst-case execution time of a multi-threaded task. The latter is a particular drawback in real-time (RT) systems.

A better IPC model is wait-free inter-process communication. Concurrently running tasks can communicate through shared state -- but in a way that requires no locking and still guarantees race-free execution. The drawback is bigger space requirements.

The talk described a memory-efficient wait-free algorithm for the single-writer, multiple-reader problem. The algorithm is a combination of two kinds of non-blocking protocols. One is the non-blocking write (NBW) protocol, which is as fast as a simple write to shared memory without any synchronization. The other class of lock-free write/read protocols relies on the presence of certain atomic operations: compare-and-swap, or atomic increment and decrement. Both classes of lock-free protocols trade space for safety.

The interesting result of the paper is that by combining the two classes, it's possible to design a fast lock-free protocol that takes less memory than either of the original algorithms applied separately. The insight is an appropriate partitioning of the reader tasks into slow and fast readers. The fast readers and the writer interact via an NBW protocol; the slow readers interact via one of the protocols of the second class (e.g., a double-buffering protocol). The hybrid protocol takes less space than either NBW or the double-buffering protocol applied to all readers, and it is faster than the double-buffering algorithm alone. The paper and the talk showed the results of several benchmarks.
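The reader-side retry at the heart of an NBW-style protocol is simple enough to sketch. Below is my own single-threaded Python model of the idea (in the real protocol the counter and value live in shared memory and the retry loop guards against a genuinely concurrent writer): the writer bumps a concurrency-control counter to an odd value before writing and back to even afterwards; a reader accepts a snapshot only if the counter was even and unchanged around the read.

```python
class NBWBuffer:
    """Single-writer shared value with a concurrency-control counter:
    odd counter = write in progress; readers retry on any change."""
    def __init__(self, value):
        self.ccf = 0          # concurrency-control field
        self.value = value

    def write(self, value):   # writer side: never blocks, never waits
        self.ccf += 1         # now odd: write in progress
        self.value = value
        self.ccf += 1         # even again: value is consistent

    def read(self):           # reader side: lock-free retry loop
        while True:
            before = self.ccf
            if before % 2 == 0:          # no write in progress
                snapshot = self.value
                if self.ccf == before:   # no write interfered
                    return snapshot
            # otherwise a writer was active; retry
```

Note that readers pay with (potentially unbounded) retries rather than blocking, which is why such protocols trade space -- extra buffers bound the retries -- for safety.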
The graphs demonstrated that the best locking solution to the single-writer, multiple-reader problem has the smallest space requirements but absolutely horrible worst-case performance.

* An Implementation of Scheduler Activations on the NetBSD Operating System
Nathan J. Williams, Wasabi Systems Inc. Freenix track.

The talk started with an overview of three thread implementation models. In one model, threads exist only at the user level, and all threads run within the same kernel process. The implementation is fast; however, when one thread enters the kernel or blocks, all other threads within the same process are blocked too.

At the other extreme are kernel threads. Like processes, kernel threads are scheduled or blocked independently. Unlike processes, different threads may share the same address space, job identification, user credentials, etc. Although thread switching is faster than process switching, it is still relatively expensive. More importantly, each kernel thread consumes kernel resources, which are limited; kernel thread synchronization involves a system call and is therefore expensive.

The third model -- the topic of the talk -- is a two-level implementation: N kernel threads run M (M >= N) user threads. Thread synchronization and switching occur at the user level. When a kernel thread executing a user thread blocks on I/O or another system call, the kernel "sends a signal" to the application, telling it that a user thread has blocked. The signal will be handled by the remaining kernel threads assigned to the application. If there are no other threads to handle the "signal", the kernel may launch a new thread and give it to the application. The "signal" handler (called an upcall handler) will mark the user thread that got itself blocked as non-runnable and will switch to another runnable user thread, if any. This approach implies co-operation between the kernel and the user application.
The kernel scheduler notifies the application about scheduling decisions that affect it: creation and deletion of kernel threads, and a thread becoming blocked, unblocked, or pre-empted. In the latter case, the pre-emption event is delivered to the application _after_ it is resumed (so the application cannot stave off the pre-emption, but it can still learn that it was off the CPU for some time).

The NetBSD implementation described in the talk uses so-called scheduler activations. Scheduling event notifications are called upcalls; they behave pretty much like POSIX signals. An application registers its intent to be notified of scheduling events by registering an upcall handler. The application must also allocate a set of stack frames on which to run the handler.

The implementation described in the talk looks surprisingly similar to call/cc-based threads and co-routines. When the kernel executes an upcall, it runs special trampoline code in user space. The trampoline saves the stack pointer and the context of the previously running code, sets up the stack frame, and jumps to the upcall handler. Upcall handlers are supposed to "switch" to an appropriate user thread; hence, they do not return. To avoid running out of stack frames for upcall handlers, the application must determine which stack frames are no longer "relevant" and manually "garbage-collect" them (i.e., mark them as free). The talk clearly showed kernel hackers re-discovering call/cc.

Scheduler activations will be in the next release of NetBSD. The maximum number of concurrent threads run without encountering resource limitations was 7,000-8,000. After the talk, Clem Cole commented that they have to support more threads and have to spend an enormous amount of time on debugging. Clem Cole said that DEC/Compaq thought they had gotten threads right in their Tru64 OS -- until they tried to run a Netcenter, which launched 10,000-100,000 threads. A lot of bugs showed up then.
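The call/cc flavor of the design comes from the way control always returns to a handler that chooses the next runnable continuation and never itself returns. Here is my own toy Python model of that control structure, with generators standing in for suspended user threads and the scheduler loop playing the role of the upcall handler:

```python
# Toy user-level scheduler in the spirit of scheduler activations:
# when a task "blocks" (yields), control returns to the scheduler,
# which switches to another runnable task.
def run(tasks):
    log = []
    while tasks:
        task = tasks.pop(0)
        try:
            log.append(next(task))  # run the task until it yields ("blocks")
            tasks.append(task)      # still runnable: requeue it
        except StopIteration:
            pass                    # task finished; "garbage-collect" its frame
    return log


def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"         # each yield models blocking on I/O
```

The analogy is loose, of course: the real implementation must also handle frames allocated for upcalls, pre-emption notifications delivered after the fact, and kernel threads appearing and disappearing underneath.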
* Biglook: A Widget Library for the Scheme Programming Language
Erick Gallesio, University of Nice; and Manuel Serrano, INRIA. Freenix track.

Like Tcl, Biglook relies on a programming language for building graphical user interfaces. It uses a CLOS-like object layer [generic functions rather than class methods] to represent widgets, and Scheme closures to handle events. Biglook is a (Bigloo) Scheme library; Biglook code can be compiled by the Bigloo compiler into C or into JVM bytecode. A full-fledged IDE for Scheme, Bee, is implemented in Biglook.

Biglook is a thin wrapper on top of native back-end libraries. Currently, two native widget libraries are supported: GTK+ and Swing. The wrapping overhead is quite low: about 15% in additional allocations.

The paper stresses several points that greatly contribute to Biglook's expressiveness: 'virtual slots' in records, keyword arguments, and closures as callbacks. Virtual slots are the mechanism that relays actions on widget classes to the back-end widget library. Virtual slots also make the API concise and expressive; for example, the three classical functions 'iconify', 'de-iconify', and 'window-iconified' are represented in Biglook by a single virtual slot, 'visible', in the window class.

Variable numbers of arguments and keyword arguments are essential to declarative GUI programming. Without them, the creation of a widget must be split into two steps: the instantiation of the widget and the setting of its attributes. The alternative -- passing a list of attributes to the widget constructor -- involves run-time overhead and disables compile-time checking of attributes and their values. In Biglook, keyword parameters are resolved at macro-expansion (i.e., compile) time. A macro, instantiate::widget (automatically generated from the widget's class declaration), does all the work of resolving keyword attributes. Other automatically generated macros are getters and setters, and a 'with-access::widget' binding form.
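The single-expression style that keyword arguments enable can be illustrated in any language that has them; the Python below is my own analogue, not Biglook code (and note that Python resolves keywords at run time, whereas Biglook's instantiate::widget macro resolves them at compile time):

```python
# With keyword arguments, a widget can be created, configured, packed,
# and wired to its event handler in one expression.
class Button:
    def __init__(self, label="", on_click=None, packed_in=None):
        self.label, self.on_click = label, on_click
        if packed_in is not None:
            packed_in.append(self)   # packing as part of construction


panel = []
clicks = []
Button(label="Quit",
       on_click=lambda: clicks.append("quit"),  # closure as callback
       packed_in=panel)
```

Without keyword arguments, the same code degenerates into the two-step instantiate-then-set-attributes style the paper criticizes.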
During the talk, Manuel showed several GUIs along with their complete code, each ten lines long or shorter. Of special note is a comparison of Biglook, Tcl/Tk, Swing, and GTK+ on the example of a widget with several buttons. In Biglook, the creation of a button, its packing, and the setting of its event handler are all part of the same expression. In Tcl/Tk, the code that deals with a button is spread over three places; in Swing, the same code is split across four places; in GTK+, across five. The paper notes that in the authors' experience, Biglook source code is about half the size of the same interface implemented in Tcl/Tk, and about a quarter the size of the interface implemented in C with GTK+.

http://kaolin.unice.fr/Biglook