
Implementing support for references in R

October 15, 2002
(Updated: December 6, 2004)

Henrik Bengtsson
Mathematical Statistics, Centre for Mathematical Sciences
Lund University, Sweden


1. Introduction

The core functionality of R does not provide references. However, references can be emulated using so-called environments [1], but such usage fills the source code with get() and assign() calls, which makes the code really hard to read. There are also other ways references could be emulated, but for the developer and the end user it is actually not of interest how they are implemented, as long as they are intuitive and transparent. For this reason we will give some suggestions on how such technical details can be hidden from the programmer and also from the end user. Our suggestions are based solely on the S3/UseMethod class model, which R has supported since its early days.
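To make the starting point concrete, the following base-R sketch (variable names are illustrative only) contrasts the copy semantics of an ordinary list with the shared semantics of an environment, and shows the get()/assign() calls that the environment approach requires.

```r
# Lists have value semantics: 'b' behaves as an independent copy of 'a'
# (R copies on modification), so updating 'b' does not touch 'a'.
a <- list(name = "Dalai Lama", age = 66)
b <- a
b$age <- 67
print(a$age)   # still 66

# Environments are never copied on assignment: 'e1' and 'e2' are two
# names for the same environment, i.e. they behave like references.
e1 <- new.env()
assign("age", 66, envir = e1)
e2 <- e1
assign("age", 67, envir = e2)
print(get("age", envir = e1))   # 67
```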

In all UML charts [2] and in all the source code we will use the coding conventions suggested in [3]. For example, a field name starting with a . (period) denotes a private field. However, we will not provide an implementation that restricts access to private fields, but such functionality is not too hard to add.

2. Working example

Throughout this document we will use a model of persons, who are fully described by their names and their ages. We will work with references to objects of class Person. A Person object has a name and an age attribute, which can be accessed and modified using get-() and set-() methods. Information about a person can also be displayed through the print() method. See the figure below for a UML representation of this class.


Person
.name: string
.age: double
getAge(): double
setAge(double)
getName(): string
setName(string)
print()


Figure: UML representation¹ of the class Person, which is used as the working example throughout this document. For this simple example a person has a name and an age, and these properties can be accessed and modified using get-() and set-() methods. Information about a person can also be displayed through the print() method.

Given complete support for references and a full implementation of the Person class, we would like to create an object of class Person and then use two different references, which refer to this object, to display and update the information about the person. Here is an example using references:

# Create an object of class Person and return a reference to it.
p <- Person("Dalai Lama", 66);

# Make another reference to this object.
p2 <- p;

print(p);         # [1] "Person: Dalai Lama is 66 years old."

setAge(p2, 67);

print(p);         # [1] "Person: Dalai Lama is 67 years old."

p3 <- clone(p);

setAge(p3, 55);

print(p);         # [1] "Person: Dalai Lama is 67 years old."
print(p2);        # [1] "Person: Dalai Lama is 67 years old."
print(p3);        # [1] "Person: Dalai Lama is 55 years old."


Download: PersonEx.R

The problem is of course that we currently cannot work with references, and neither do we have an implementation of the class Person. However, the latter problem is easy to solve as soon as we have a way of using references. The next few sections discuss how this could be done in a safe, encapsulated and reusable way.

3. Design

Instead of implementing reference functionality from scratch each time it is needed, we can do it once and for all using object-oriented design. We simply define a root class that encapsulates all the code necessary for references and require all other classes to extend this class or one of its subclasses. Inspired by Java [4], we call this root class Object. It might be a little confusing to call a class Object, but as you get used to it you will actually find it convenient, since you can refer to all objects as "Objects" without having to say "object of class so-and-so".

All objects, not only in R, belong to a class, and therefore it is reasonable to have a way to query the class of an Object (note the usage of the name of the root class). We let the method getClass() return the class of the object. Note that it returns only a single string, not also the names of all superclasses as class() would. As with all other standard objects in R, an Object should be printable using print(), and it should be possible to convert it into a character string using as.character(). Furthermore, we will simply let the former method take the output of the latter and print it on the standard output. This behavior can of course be overridden by subclasses. Since we will deal with references, we cannot just do p2 <- p to create a copy of the object referred to by p (because then the whole beauty of references would be lost). Instead we need a method for copying the actual object, not just the reference: each Object should have a method clone() that returns a copy of itself.
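With bare environments, the difference between copying a reference (p2 <- p) and cloning the object can be sketched as follows. This is an illustration using base R's as.list() and list2env(), not the clone() method of the Object class; the variable names are only illustrative.

```r
p <- new.env()
assign("age", 67, envir = p)

# Copying the reference: both names point to the same environment.
p2 <- p

# Cloning: copy every variable (including names starting with a period)
# into a fresh environment.
p3 <- list2env(as.list(p, all.names = TRUE), envir = new.env())

assign("age", 55, envir = p3)
get("age", envir = p)    # 67, the original is unaffected
get("age", envir = p2)   # 67
get("age", envir = p3)   # 55
```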

In addition to this, it should be possible to get and set any field defined by the class of the object using the list operator $. Note that, due to language restrictions, it is currently not possible to use the S4 slot operator @ for the same purpose. The root class Object does not define any fields, but its subclasses certainly will, and therefore the implementation of the $ and $<- operators will be encapsulated by the root class.
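One detail of the $<- operator is worth spelling out: R evaluates x$name <- value as x <- `$<-`(x, "name", value), so a $<- method must return the object, otherwise x is overwritten by whatever the method returns. The sketch below uses a hypothetical class Box that keeps its fields in an environment stored as an attribute, in the spirit of the Object class; it is an illustration, not part of the original design.

```r
# Hypothetical minimal class: fields live in an environment stored as
# the attribute ".env" of a classed NA.
Box <- function() structure(NA, .env = new.env(), class = "Box")

# Read a field from the environment (and only from that environment).
"$.Box" <- function(x, name) get(name, envir = attr(x, ".env"), inherits = FALSE)

# Write a field into the environment and return the object itself,
# because R assigns the result of this call back to the variable.
"$<-.Box" <- function(x, name, value) {
  assign(name, value, envir = attr(x, ".env"))
  x
}

b <- Box()
b$size <- 3          # evaluated as: b <- `$<-`(b, "size", 3)
alias <- b           # a second reference to the same environment
alias$size <- 4      # updates the shared environment
b$size               # 4
```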


Object

as.character(): string
clone(): Object
getClass(): string
print()
$(name): ANY
$<-(name, value): Object


Figure: UML class diagram¹ of the root class Object, which will implement and hide the basic functionality for reference support.

4. Implementation

As mentioned earlier, there are several different ways of emulating references. One such way is the use of environments, which provide a protected and encapsulated region in memory with its own namespace.

4.1 Using an environment directly

For example:

p <- new.env();
assign("name", "Dalai Lama", envir=p);
assign("age", 66, envir=p);

p2 <- p;

assign("age", 67, envir=p2);

print(get("age", envir=p));   # 67

The above example shows how environments can be used as references, except that no class has yet been assigned to the object. We could try the standard class(p) <- "Person", which applies to almost all R objects. Unfortunately, environments are among the exceptions. At first it looks as if it works, but if we quit R and save the workspace, i.e. quit(save="yes"), and then restart R, we will find that, for instance, unclass(p) no longer works. Neither does S3/UseMethod method dispatching work correctly. See [5] for a discussion of this problem and why the above approach is incorrect.
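In current versions of R there is a further reason not to put a class attribute directly on an environment: since environments are never copied, an attribute set through one reference is visible through all of them. The base-R sketch below illustrates this and how wrapping the environment in an attribute of an ordinary object keeps per-object attributes separate; the attribute name "tag" and the class names "Wrapped" and "Other" are arbitrary.

```r
e1 <- new.env()
e2 <- e1                              # a second reference to the same environment
attr(e2, "tag") <- "changed via e2"
attr(e1, "tag")                       # "changed via e2": the attribute is shared

# Wrapping: the class belongs to the (copyable) NA wrapper, while the
# environment stored in the ".env" attribute is still shared.
w1 <- structure(NA, .env = e1, class = "Wrapped")
w2 <- w1
class(w2) <- "Other"
class(w1)                             # still "Wrapped"
identical(attr(w1, ".env"), attr(w2, ".env"))   # TRUE: same environment
```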

However, there is a simple workaround that works correctly: by wrapping the environment up in a list or in an attribute we get the same functionality.

4.2 Encapsulating an environment in a list or in an attribute

Wrapping the environment up in a list or in an attribute gives exactly the same functionality, but a short timing test shows that it is faster to retrieve the environment if it is stored as an attribute, i.e. attr(ref, ".env"), than if it is stored as a list element, i.e. ref$.env or ref[[".env"]]. By storing the environment as an attribute we also do not restrict ourselves to the list data type; we can also let a function definition act as a reference. This will indeed be used for static classes and for access to static fields and static methods.
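A timing comparison of this kind can be sketched in base R as below. The loop count is arbitrary, and the measured times depend on the R version and the machine, so no particular result is claimed here; the sketch only shows how the two lookups could be compared.

```r
# Two ways of carrying the same kind of environment around.
ref.attr <- NA
attr(ref.attr, ".env") <- new.env()
ref.list <- list(.env = new.env())

n <- 100000  # number of lookups; arbitrary
t.attr <- system.time(for (i in seq_len(n)) env <- attr(ref.attr, ".env"))
t.list <- system.time(for (i in seq_len(n)) env <- ref.list[[".env"]])

# Elapsed seconds for each approach; results vary between systems.
cat("attribute:", t.attr[["elapsed"]], "s\n")
cat("list:     ", t.list[["elapsed"]], "s\n")
```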




4.3 Source code for the Object class

Here is a fully functional code listing of an Object class that provides the reference functionality described above.

# Defines the class Object and its constructor.
Object <- function() {
  # Create a new environment and wrap it up as an attribute of the
  # smallest R object available, that is, NA.
  this <- NA;
  attr(this, ".env") <- new.env();
  class(this) <- "Object";
  this;
}

# Returns a string representation of the Object. By default it
# is "{class name}: 0x{memory address}".
as.character.Object <- function(this, ...) {
  # getAddress() is a private helper that extracts the memory address
  # of the environment, so that one Object can be told apart from
  # another. It is not really necessary, though.
  getAddress <- function(this) {
    # Printing an environment gives e.g. "<environment: 0x55d5c8a3b2a8>";
    # keep only the hexadecimal digits after the optional "0x".
    pointer <- capture.output(print.default(attr(this, ".env")));
    sub("^<environment: (0x)?([^>]*)>.*$", "\\2", pointer[1]);
  }

  paste(getClass(this), ": 0x", getAddress(this), sep="");
}


# Clone the Object by copying all of its content.
clone.Object <- function(this, ...) {
  # Copy the reference (this also copies the class attribute).
  clone <- this;

  # Create a new environment, i.e. a new Object.
  clone.env <- new.env();
  attr(clone, ".env") <- clone.env;

  this.env <- attr(this, ".env");

  # Copy all variables in the environment, including those whose
  # names start with a period.
  for (name in ls(envir=this.env, all.names=TRUE)) {
    value <- get(name, envir=this.env, inherits=FALSE);
    assign(name, value, envir=clone.env);
  }

  clone;
}

clone <- function(...) UseMethod("clone");      # New generic function.

# Get the class name of the Object (not its superclasses).
getClass.Object <- function(this, ...) {
  class(this)[1];
}

# New generic function. Note that, when defined at the top level,
# it masks getClass() of the methods package.
getClass <- function(...) UseMethod("getClass");

# Print information about the Object to standard output.
print.Object <- function(this, ...) {
  print(as.character(this));
  invisible(this);
}

# Map ${name} to return the value of the field {name}, which lives
# inside the private environment. inherits=FALSE prevents variables
# outside the environment from being returned as fields.
"$.Object" <- function(this, name) {
  get(name, envir=attr(this, ".env"), inherits=FALSE);
}

# Map ${name} <- {value} to assign {value} to field {name}, which
# lives inside the private environment. The method must return the
# Object, since R assigns the result back to the variable.
"$<-.Object" <- function(this, name, value) {
  assign(name, value, envir=attr(this, ".env"));
  this;
}

Download: Object.R

Note: For safer methods of creating generic functions, see [6].
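clone.Object() calls ls() with all.names=TRUE and get() with inherits=FALSE. The base-R sketch below (field names are illustrative) shows why both arguments matter: without all.names=TRUE, fields whose names start with a period are skipped, and with the default inherits=TRUE, lookups also search the enclosing environments.

```r
env <- new.env()
assign(".name", "Dalai Lama", envir = env)   # period prefix: hidden by default
assign("age", 67, envir = env)

ls(envir = env)                     # "age" only
ls(envir = env, all.names = TRUE)   # both ".name" and "age"

# By default new.env() uses the calling environment as its enclosure,
# so with inherits = TRUE a name not defined in 'env' is still found
# further up, here the base function paste().
exists("paste", envir = env)                      # TRUE
exists("paste", envir = env, inherits = FALSE)    # FALSE
```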

4.4 Providing inheritance

We are now almost done, but it would be nice if there were an easy way to make one class inherit from another. We do this by adding another method to class Object, but it should not be thought of as a method like the others, and therefore we do not list it in the UML diagrams.

# Create instance of class {classname}, by taking another Object,
# add {classname} to the class list and add all the named values
# in ... as fields to the new Object.
extend.Object <- function(this, classname, ...) {
  fields <- list(...);
  names <- names(fields);
  for (name in names)
    assign(name, fields[[name]], envir=attr(this, ".env"));
  class(this) <- c(classname, class(this));
  this;
}

extend <- function(...) UseMethod("extend");

Download: Object.R
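extend.Object() implements inheritance by prepending the new class name to the class vector, so S3 dispatch tries the subclass first and falls back to the superclass. The base-R sketch below illustrates this with a hypothetical generic describe(), which is not part of the Object class.

```r
describe <- function(...) UseMethod("describe")   # hypothetical generic
describe.Object <- function(this, ...) "an Object"

x <- structure(NA, class = "Object")
class(x) <- c("Person", class(x))   # what extend() does to the class vector
class(x)                            # "Person" "Object"

describe(x)   # no describe.Person yet, so describe.Object is used
describe.Person <- function(this, ...) "a Person"
describe(x)   # now the subclass method is found first
```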

5. Working example continued

Given the implementation of the Object class, it is now straightforward to create a new class Person, which has the same core functionality as Object and in addition has new fields and methods.

# Class Person extends Object.
Person <- function(name, age=NA) {
  extend(Object(), "Person", 
    .name=name,
    .age=age
  );
}

as.character.Person <- function(this) {
  sprintf("%s: %s is %.0f years old.", 
          getClass(this), getName(this), getAge(this));
}

getName.Person <- function(this) {
  this$.name;
}

getName <- function(...) UseMethod("getName");

setName.Person <- function(this, newName) {
  this$.name <- newName;
}

setName <- function(...) UseMethod("setName");

getAge.Person <- function(this) {
  this$.age;
}

getAge <- function(...) UseMethod("getAge");

setAge.Person <- function(this, newAge) {
  this$.age <- newAge;
}

setAge <- function(...) UseMethod("setAge");
Download: Person.R

6. All the source code

The code in this document has not been optimized or thoroughly tested. For a well-tested and evaluated version of the Object class, see the package R.oo [7]. It also contains a fully functional version of the Person example; see library(R.oo); example(Object).

References

[1] R Development Core Team, R Language Definition, ISBN 3-901167-56-0.
http://cran.r-project.org/manuals.html
[2] UML Resource Page, Object Management Group, 2002.
http://www.omg.org/uml/
[3] Henrik Bengtsson, R Coding Conventions (RCC) (draft), Mathematical Statistics, Centre for Mathematical Sciences, Lund University, Sweden.
http://www.maths.lth.se/help/R/RCC/
[4] B. Joy et al, The Java Language Specification, Second Edition, Addison-Wesley Pub Co, ISBN: 0-201310-08-2, 2000.
http://java.sun.com/docs/books/jls/
[5] Henrik Bengtsson et al, The class attribute on an environment seems buggy (PR#2159), r-devel mailing list.
http://www.r-project.org/nocvs/mail/r-devel/2002/1478.html
[6] Henrik Bengtsson, Safely creating S3 generic functions using setGenericS3(), Division for Mathematical Statistics, Centre for Mathematical Sciences, Lund University, Sweden, 2002.
http://www.maths.lth.se/help/R/
[7] Henrik Bengtsson, The R.oo package - Object-Oriented Programming with References Using Standard R Code. In Kurt Hornik, Friedrich Leisch and Achim Zeileis, editors, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20-22, Vienna, Austria.
http://www.ci.tuwien.ac.at/Conferences/DSC-2003/



¹ Since the HTML format does not provide an easy way of connecting components with lines and arrows, UML connections are replaced by textual connectors. The inheritance relation is specified by prepending the class name with the name of the superclass. The "consists of" connector is specified by a field specification (as is actually done in the code).