An R/S-Database Interface

Background

For too long S and R have lacked good interfaces to relational database systems (RDBMS). For the last twenty years or so these RDBMS have evolved into highly optimized client-server systems for data storage and manipulation, and currently they serve as repositories for most of the business, industrial, and research ``raw'' data that analysts work with. Other analysis packages, such as SAS and SPSS, have traditionally provided good data connectivity, but S and GNU R have mostly relied on intermediate text files as means of importing data. Although this simple approach works well for relatively modest amounts of mostly static data, it does not scale up to larger amounts of data distributed over machines and locations, nor does it scale up to data that is highly dynamic -- situations that are becoming increasingly common.

We want to propose a common interface between R/S and RDBMS that would allow users to access data stored on database servers in a uniform and predictable manner irrespective of the database engine. The interface defines a small set of classes and methods similar in spirit to Python DB-API, Java's JDBC, Perl's DBI, and Microsoft's ODBC, but it conforms to the ``whole-object'' philosophy so natural in S and R.

Example

   mgr <- dbManager("Oracle")  
   con <- dbConnect(mgr, user = "user", passwd = "passwd")
   rs <- dbExecStatement(con, "select fld1, fld2, fld3 from MY_TABLE")
   tbls <- fetch(rs, n = 100)
   hasCompleted(tbls)
   [1] T
   close(rs)
   rs <- dbExecStatement(con, "select id_name, q25, q50 from liv2")
   res <- fetch(rs)
   getRowCount(rs)
   [1] 73
   close(con)
Similar scripts should work with other RDBMS, (say, MySQL) by replacing the first line with
   mgr <- dbManager("MySQL")

As a test case, we have an implementation of this interfaces for the open source MySQL database system. An Oracle implementation is in progress.

A Proposal

You'll find our current proposal here. Please send us your comments and suggestions preferably to both the omega development and the R development mailing lists. See the Omega Project for Statistical Computing for the most up-to-date information, and, http://www.omegahat.org/download/contrib/RS-DBI for the latest software. Alternatively, the source is available via anonymous CVS access in the module RS-DBI from

       :pserver:anoncvs@www.omegahat.org:/home/anoncvs/src
using the password anoncvs.

Other Database Connections from R

The following R implementations pre-date our proposal for a common interface.