Multi-Threading and S

A thread is a stream of control within an application that executes a sequence of instructions. Traditional applications have a single thread. A multi-threaded application consists of multiple threads that execute concurrently. On a single-processor machine, the threads are scheduled by a library or by the kernel to run in different time slices. On a multi-processor machine, two or more threads can execute simultaneously, as scheduled by the kernel. The benefit of threads is clearest on a multi-processor machine: a task can potentially be divided into subtasks, each of which executes on its own processor, reducing the time taken to complete the overall task. Threads also offer significant benefits on a single-processor machine. Performance improves because, when one task must wait for a service (e.g. disk access) to become available, another task can proceed. Also, since threads are scheduled by a third party (the library or the kernel), development of independent tasks is greatly simplified (e.g. multi-source event loops).
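As a rough illustration of dividing a task into subtasks, here is a minimal sketch in C using the POSIX Pthreads interface. It is not taken from the S implementation; the names `parallel_sum`, `sum_slice`, `struct slice` and the `MAX_WORKERS` limit are hypothetical, chosen only for this example. Each thread sums its own slice of an array, and the calling thread combines the partial results after joining the workers.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16   /* arbitrary cap for this sketch */

/* One contiguous slice of the input, plus a slot for its partial sum. */
struct slice {
    const long *values;
    size_t count;
    long sum;
};

/* Thread entry point: sum one slice and store the result in the slice. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long total = 0;
    for (size_t i = 0; i < s->count; i++)
        total += s->values[i];
    s->sum = total;
    return NULL;
}

/* Sum n values using up to `workers` threads (hypothetical helper).
 * If a thread cannot be created, that slice is summed in the caller. */
long parallel_sum(const long *values, size_t n, int workers)
{
    pthread_t threads[MAX_WORKERS];
    struct slice slices[MAX_WORKERS];
    int started[MAX_WORKERS];
    long total = 0;

    if (workers < 1) workers = 1;
    if (workers > MAX_WORKERS) workers = MAX_WORKERS;

    size_t chunk = n / (size_t)workers;
    size_t offset = 0;
    for (int w = 0; w < workers; w++) {
        /* The last worker also takes any leftover elements. */
        size_t len = (w == workers - 1) ? n - offset : chunk;
        slices[w].values = values + offset;
        slices[w].count = len;
        slices[w].sum = 0;
        started[w] = pthread_create(&threads[w], NULL, sum_slice, &slices[w]) == 0;
        if (!started[w])
            sum_slice(&slices[w]);   /* fall back to running inline */
        offset += len;
    }

    /* Wait for every worker, then combine the partial sums. */
    for (int w = 0; w < workers; w++) {
        if (started[w])
            pthread_join(threads[w], NULL);
        total += slices[w].sum;
    }
    return total;
}
```

Calling `parallel_sum` on an array holding 1 through 1000 with four workers returns 500500. Because each thread writes only to its own `struct slice`, no locking is needed here; `pthread_join` is the only synchronization point.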

There is no "free lunch": the advantages of threads are offset by the need for synchronization between them. Unlike a traditional application with a single thread, a multi-threaded application can perform multiple instructions simultaneously, so certain sub-tasks must be serialized, or at least prevented from running at the same time. A thread implementation must therefore provide synchronization mechanisms to ensure data validity.
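To make the need for serialization concrete, the following sketch uses a Pthreads mutex to protect a shared counter. It is an illustration only, not code from the S internals; `struct counter`, `struct job`, `add_many` and `count_with_threads` are hypothetical names. Without the lock, two threads could read the same old value and one increment would be lost.

```c
#include <pthread.h>

#define MAX_THREADS 16   /* arbitrary cap for this sketch */

/* Shared data together with the mutex that guards it. */
struct counter {
    pthread_mutex_t lock;
    long value;
};

/* Work description handed to each thread. */
struct job {
    struct counter *c;
    long increments;
};

/* Thread entry point: increment the shared counter under the lock. */
static void *add_many(void *arg)
{
    struct job *j = arg;
    for (long i = 0; i < j->increments; i++) {
        pthread_mutex_lock(&j->c->lock);    /* enter the critical section */
        j->c->value++;                      /* read-modify-write, now serialized */
        pthread_mutex_unlock(&j->c->lock);  /* leave the critical section */
    }
    return NULL;
}

/* Run `nthreads` threads that each add `increments` to one counter
 * (hypothetical helper). Returns the final counter value. */
long count_with_threads(int nthreads, long increments)
{
    pthread_t threads[MAX_THREADS];
    int started[MAX_THREADS];
    struct counter c;
    struct job j;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    pthread_mutex_init(&c.lock, NULL);
    c.value = 0;
    j.c = &c;
    j.increments = increments;

    for (int t = 0; t < nthreads; t++) {
        started[t] = pthread_create(&threads[t], NULL, add_many, &j) == 0;
        if (!started[t])
            add_many(&j);   /* fall back to doing the work inline */
    }
    for (int t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(threads[t], NULL);
    }

    pthread_mutex_destroy(&c.lock);
    return c.value;
}
```

With the mutex in place, `count_with_threads(4, 100000)` returns 400000 on every run. Removing the lock and unlock calls would make the result unpredictable, which is the kind of data-validity problem synchronization mechanisms exist to prevent.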

We have added thread support to the S language, a persistent, functional, object-oriented language used primarily for data analysis. The work consisted of adding an S-level interface to the Pthreads routines and modifying the internal C-level code of S to make it thread-safe and re-entrant. The design of the S task/thread mechanism constituted a significant part of the work and is being used to develop a distributed version of S.

The work was completed on a single-processor Intel-based machine running Linux (2.0.14). We used Chris Provenzano's user-level Pthreads library to provide the thread support. We are now testing this on multi-processor machines: two dual Pentium II machines and a four-processor Pentium Pro machine, all running a Linux SMP kernel (2.0.30), and a dual-processor Sun 450 running Solaris. In the near future, we will also employ an SGI Origin 2000. The Solaris box gives us access to both the Pthreads and the UNIX threads interfaces.

This work, including examples, S-level documentation and the technical description of the underlying implementation, is described in my Ph.D. dissertation. HTML versions of the documentation are available here. This API is not guaranteed to be supported exactly as-is in the future, although it is unlikely to undergo significant change. Sed caveat emptor.

Parallel and Distributed Computing

There are obvious similarities between parallel computing using multiple processors (and a shared memory system) and distributed computing across multiple machines (with separate memory). We are now turning our attention to distributed computing in its general form and are currently using CORBA. It turns out that the architecture for user-level threads in S is similar to the CORBA architecture. Similarly, the user-level API for threads will be mirrored for distributed computing, allowing users to develop code for either parallel or distributed systems without caring which is used (except when tuning performance).
Duncan Temple Lang
Last modified: Fri May 22 08:58:58 EDT 1998