Implementing a modern object system using C++
=============================================


Recently I've been working on programming language design and
implementation.  My inspiration was drawn from an unpublished
experimental version of XEmacs based on C++ instead of C.
Naturally, I discovered that I didn't understand either object systems
or C++ deeply enough, so I digressed to do object system research.

One valuable way to approach language and library design is to try to
map the programming paradigm of one programming language into the
actual facilities available in another.  C++ is particularly
interesting because it has extremely powerful mechanisms intended to
enable a plethora of programming styles, yet these mechanisms are hard
to use and poorly understood (even by their designers).

An example of the approach I'm thinking of is FC++
(http://www.cc.gatech.edu/~yannis/fc++/), an attempt to enable
efficient Haskell-style programming in C++.

One focus of my own research has been to make programming styles and
idioms from other programming languages, in particular Lisp, Java, and
C#, possible in C++.  I'm trying to combine the advantages of
parametric polymorphism, subtype polymorphism, dynamic typing, and
value types.

My basic idea is very simple: provide C++ types with reference
semantics instead of copy semantics.  Free the client from having to
use pointers to kludgily implement reference semantics by hand.  Don't
make internal representations available to the user.  The problem with
"smart pointers" is that they still act like pointers.  Instead we
want to use "smart references".

Here is an example of Lisp-style programming that is possible:

static void lispy ()
{
  assert ("(1 2)"   == toString (cons (1, cons (2, nil))));
  assert ("(1 . 2)" == toString (cons (1, 2)));

  Object x = list (1, 3.4, "foo", true);
  assert (toString (x) == "(1 3.4 foo 1)");
  assert (car (cdr (x)) == 3.4);
}

To the convenience of Lisp-style programming we want to add Java-style
static typing, subtype polymorphism, and the ability to easily add new
types to the framework.  Static typing gives us better compile-time
error checking and better performance.

(I don't believe the claims of some of the XP crowd that type errors
should all be caught using extensive testing.)


static void java_y ()
{
  Array<ISequence> seqs (4, String ("?"));
  seqs [0] = String ("foo"); // OK - a String is a Sequence
  seqs [1] = Array<int> (5); // OK - an Array is a Sequence
  // seqs [2] = Int (3);     // COMPILE ERROR - an Int is not a Sequence
  // seqs [2] = Object (3);  // COMPILE ERROR - downcasts must be explicit
  //seqs [2] = ISequence (Object (3)); // RUNTIME EXCEPTION - cast failed

  // Runtime polymorphism
  assert (seqs.length () == 4);
  assert (seqs[0].length () == 3);
  assert (seqs[1].length () == 5);
}

The smart references that are the foundation of the object system are
implemented using a novel C++ implementation technique.

What's wrong with Smart Pointers
================================

They don't hide implementations.

Implementing Smart References in C++
====================================

The C++ community is all excited about smart pointers, but the idea of
smart references seems to have died 10 years ago.

Smart references are an old idea.  Kennedy's

Object-Oriented Abstract Type Hierarchy (OATH) (1990)

http://www.desy.de/user/projects/C++/products/oath.html

advocated smart references in C++.  The toolkit was not developed
further and did not inspire other developers.  Except for me.

Smart references are similar to smart pointers in that copying a smart
reference is implemented as a pointer copy, but the pointer nature of
the variable being manipulated is hidden from the user.  A user never
has to dereference the smart reference to use it, as is necessary with
a smart pointer.  This is exactly the semantics of Java or Lisp object
references.

STL is different - it defines copying semantics.  A copy of an STL
aggregate object like vector does not alias the original object.  We
want the Java aliasing semantics, at least for objects with identity.
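
The difference between the two semantics can be sketched as follows.
This is only an illustration, not code from the framework: IntList is
a hypothetical smart reference class, and std::shared_ptr is just one
possible hidden representation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical smart reference to a growable list of ints.
// Copying an IntList copies the hidden pointer, so both copies alias
// the same underlying list; the user never writes `*' or `->'.
class IntList
{
public:
  IntList () : rep_ (std::make_shared<std::vector<int> > ()) {}
  void push (int i) { rep_->push_back (i); }
  std::size_t length () const { return rep_->size (); }
private:
  std::shared_ptr<std::vector<int> > rep_;  // representation is hidden
};

inline void demo_aliasing ()
{
  // STL value semantics: the copy is independent of the original.
  std::vector<int> v1;
  std::vector<int> v2 = v1;
  v2.push_back (1);
  assert (v1.size () == 0 && v2.size () == 1);

  // Smart reference semantics: the copy aliases the original.
  IntList a;
  IntList b = a;
  b.push (1);
  assert (a.length () == 1 && b.length () == 1);
}
```

Note that the client code for IntList looks exactly like the client
code for a value type; only the meaning of copying differs.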

A fundamental problem with smart references in C++ is the static
overloading problem, for which we develop a solution.

Say we have an inheritance hierarchy

Object <= Number <= Int
Object <= Number <= Double
Object <= String
...

which we want to implement using smart references.  Say we have a
static overloaded function f

void f (Object) { ... };
void f (Number) { ... };

f (Int (3));

We would like the call to f to resolve statically to f (Number),
because Number is more derived than Object.

The obvious way to implement these semantics in C++ is to have the
smart reference classes Object <= Number <= Int actually inherit from
each other in the C++ language sense.

But this is not typesafe.  Unsafe code as follows can be written:

void f (Object & o) { o = Int (1); }
void g () { String s ("?"); f (s); /* s is now no longer a String */ }
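
To make the hole concrete, here is a minimal compilable sketch.  The
Body classes and the typeName function are assumptions made up for
this illustration; the only point is that the handle classes inherit
from each other, so the compiler accepts the bad assignment.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical heap-allocated bodies; the handles below only hold a pointer.
struct Body
{
  virtual ~Body () {}
  virtual std::string typeName () const = 0;
};
struct IntBody : Body
{
  std::string typeName () const { return "Int"; }
};
struct StringBody : Body
{
  std::string typeName () const { return "String"; }
};

// Handle classes that inherit from each other in the C++ sense.
class Object
{
public:
  std::string typeName () const { return body_->typeName (); }
protected:
  explicit Object (std::shared_ptr<Body> b) : body_ (b) {}
private:
  std::shared_ptr<Body> body_;
};

class Int : public Object
{
public:
  explicit Int (int) : Object (std::make_shared<IntBody> ()) {}
};

class String : public Object
{
public:
  explicit String (const std::string&)
    : Object (std::make_shared<StringBody> ()) {}
};

// Compiles without complaint: a String& converts implicitly to Object&,
// and Object's assignment operator then rebinds the hidden pointer.
inline void f (Object& o) { o = Int (1); }

inline void demo_type_hole ()
{
  String s ("?");
  assert (s.typeName () == "String");
  f (s);
  assert (s.typeName () == "Int");  // static type String, dynamic type Int
}
```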


Kennedy "proved" that you cannot implement the semantics you want
while maintaining type safety.

Although it is true that you cannot implement the desired semantics
using the desired syntax above, it is possible to implement them with
a somewhat uglier syntax, which is developed below:

One of the fundamental ideas of strongly typed languages is to encode
as many concepts as possible using the type system.

Since we exclude the possibility of smart references classes directly
inheriting from each other, we can try to have the base classes of the
smart reference classes inheriting from each other.  We can try to
encode type hierarchy information using an inheritance hierarchy of
empty classes that exist purely to carry type information.

Let's try:

template <class T> class IsA;
template <> class IsA<Object> {};
template <> class IsA<String> : public IsA<Object> {};
template <> class IsA<Number> : public IsA<Object> {};
template <> class IsA<Int>    : public IsA<Number> {};

Now our smart reference classes can inherit from the corresponding
IsA objects:

class Object : public IsA<Object> { ... };
class String : public IsA<String> { ... };
class Number : public IsA<Number> { ... };
class Int    : public IsA<Int>    { ... };

This uses the "Curiously Recursive Template Idiom" (more commonly
known as the "Curiously Recurring Template Pattern") popularized by
Coplien.

The IsA hierarchy is a "spine" of empty classes that can be
manipulated using C++ templates.

With the above definitions we can now write

void f (const IsA<Object>& x) { ... };
void f (const IsA<Number>& x) { ... };
f (Int (3));

Now the desired function f (const IsA<Number>& x) is called, since
  IsA<Object> <= IsA<Number> <= IsA<Int> <= Int
even though there is no direct inheritance relationship between Object,
Number, and Int.
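
The following self-contained version of the spine can be compiled to
check the overload resolution.  The bodies of the handle classes are
left empty here, and f returns a string only so that the chosen
overload can be observed; both are simplifications for this sketch.

```cpp
#include <cassert>
#include <string>

class Object; class String; class Number; class Int;

template <class T> class IsA;
template <> class IsA<Object> {};
template <> class IsA<String> : public IsA<Object> {};
template <> class IsA<Number> : public IsA<Object> {};
template <> class IsA<Int>    : public IsA<Number> {};

// The handle classes are siblings: none inherits from another.
class Object : public IsA<Object> {};
class String : public IsA<String> {};
class Number : public IsA<Number> {};
class Int    : public IsA<Int>    { public: explicit Int (int) {} };

inline std::string f (const IsA<Object>&) { return "f (Object)"; }
inline std::string f (const IsA<Number>&) { return "f (Number)"; }

inline void demo_spine ()
{
  // IsA<Number> is a closer base of Int than IsA<Object> is.
  assert (f (Int (3)) == "f (Number)");
  // Only IsA<Object> is a base of String.
  assert (f (String ()) == "f (Object)");
}
```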

This looks nice, but we quickly discover that this is not quite what
we want, since the body of the function

f (const IsA<Number>& x);

cannot access the "real" object, of type Int, that has been passed in.
If all object types are represented the same way, and if free-standing
IsA<TYPE> objects are outlawed, then the body of f can be written as
follows:

void f (const IsA<Number>& x)
{
  Number n (reinterpret_cast<const Number&> (x));
  /* Use n */
};

But we would like to allow different representations for different
classes inheriting from IsA<Number>, and we would like to have the
original type of the argument available.  Pondering on this for a
while leads naturally to the:

"Doubly Curiously Recursive Template Idiom"
===========================================

template <class T, class U = T> class IsA;
template <class T> class IsA<T,Object> {};
template <class T> class IsA<T,String> : public IsA<T,Object> {};
template <class T> class IsA<T,Number> : public IsA<T,Object> {};
template <class T> class IsA<T,Int>    : public IsA<T,Number> {};

class Object : public IsA<Object,Object> { ... };
class String : public IsA<String,String> { ... };
class Number : public IsA<Number,Number> { ... };
class Int    : public IsA<Int,Int>       { ... };

template <class T> void f (const IsA<T,Object>& x) { ... };
template <class T> void f (const IsA<T,Number>& x) { ... };
f (Int (3));

As before, the desired f overload is called.  But now the body of f
can access the true original object, using the invariant first
template parameter, as follows:

template <class T>
void f (const IsA<T,Number>& x)
{
  T t (reinterpret_cast<const T&> (x));
  /* Use t */
};

(Our inheritance hierarchy guarantees that a const IsA<T,Number>& is
really a reference to an object of actual type T.)

Of course, we hide the ugly reinterpret_cast by providing constructors
T::T (const IsA<T,U>& x)
for all appropriate types T,U.

template <class T>
void f (const IsA<T,Number>& x)
{
  Number n (x);  // or: T t (x);
  /* Use n */
};
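
Here is a compilable sketch of the two-parameter spine with two
differently represented Number types.  The value, isNumber and twice
names are made up for this illustration.  I use static_cast rather
than reinterpret_cast in the sketch; it is a legal downcast because T
really does derive from IsA<T,Number>, and it does not depend on how
the empty bases are laid out.

```cpp
#include <cassert>

class Object; class String; class Number; class Int; class Double;

template <class T, class U = T> class IsA;
template <class T> class IsA<T,Object> {};
template <class T> class IsA<T,String> : public IsA<T,Object> {};
template <class T> class IsA<T,Number> : public IsA<T,Object> {};
template <class T> class IsA<T,Int>    : public IsA<T,Number> {};
template <class T> class IsA<T,Double> : public IsA<T,Number> {};

class String : public IsA<String,String> {};

class Int : public IsA<Int,Int>
{
public:
  explicit Int (int i) : i_ (i) {}
  double value () const { return i_; }
private:
  int i_;       // one representation ...
};

class Double : public IsA<Double,Double>
{
public:
  explicit Double (double d) : d_ (d) {}
  double value () const { return d_; }
private:
  double d_;    // ... and a different one
};

// Static overloading on the second (hierarchy) parameter.
template <class T> bool isNumber (const IsA<T,Object>&) { return false; }
template <class T> bool isNumber (const IsA<T,Number>&) { return true; }

// The first (invariant) parameter gives access to the original type.
template <class T>
double twice (const IsA<T,Number>& x)
{
  const T& t = static_cast<const T&> (x);
  return t.value () + t.value ();
}

inline void demo_double_spine ()
{
  assert (isNumber (Int (3)));
  assert (!isNumber (String ()));
  assert (twice (Int (3)) == 6.0);
  assert (twice (Double (1.5)) == 3.0);
}
```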

If the body of function f is large, we can avoid template expansion
code bloat (although a really good C++ implementation would make this
unnecessary) by using forwarding functions as follows:

// Body of function f_Number in .cc file
void f_Number (Number x) { /* lots of code here ... */ }

// Forwarding template in .hpp file
template <class T> inline void f (const IsA<T,Number>& x) { f_Number (x); };

All the library code is written in this style.  Almost all functions
are templates taking arguments of the form const IsA<T,Number>&,
returning "real" objects of a type like Number.

Users of the framework who don't create statically overloaded
functions can usually get away with writing their functions in a more
natural style.  Lazy users can simply program in a Lisp-like style by
declaring most variables and parameters of type "Object" and relying
on run-time type checking.

But it's probably better to simply write all functions consistently in
the templated style.  One gets used to it quickly enough.  The
semantics are a superset of Java's static overloading, because the
templated functions keep access to the original type of the parameter,
allowing calls to other templated functions that may statically
distinguish amongst those types.

It would be nice if we could write a saner syntax like

void f (T extends Number) { ... }

We read "f (const IsA<T,Number>& x)" as "f takes any type T which is a
Number".  In other words, this is a direct translation of Java's
admittedly saner "f (Number x)".

It would be nice to sweeten this a little to "f (IsA<T,Number> x)",
but I think the only way to do something like this is via a macro
 #define ISA(T,U) const IsA<T,U>&
which is a little distasteful.

At first, this style of programming is a little odd, but it grows on
one after a while.  What we really have here is a simulation of a
language feature missing in C++, template argument constraints.

Eventually, writing almost every function as a template becomes
natural, because the semantics are identical to Java semantics.  You
just "think Java" while writing the function.


What we have so far is nice, but we can do better.  Currently writing
a class that is part of the framework requires a lot more work than
writing a simple C++ class.  In particular, the implementation of most
classes involves the representation you would expect: a pointer to the
"real" object allocated on the heap.  Think "handle/body" or "pimpl
idiom", depending on which community you belong to.

For value objects, it would be nice if we could get automatic
incorporation into the framework using "boxing".  In fact, this is
possible.  A given class which wants to be integrated into the
framework only needs to declare its "parent" class (default Object)
and define the methods needed for its role.

To be eligible for Object-hood, a type only needs to have an
operator<<.  Note that this can be done without modifying the class
definition.

struct TrivialString
{
  std::string s_;
  TrivialString (std::string s) : s_ (s) {}
};

// Perhaps defined in a separate module
inline std::ostream&
operator << (std::ostream& o, TrivialString myString)
{
  return o << myString.s_ << "!";
}

With this, we can now use TrivialStrings as Objects, and recover them
using Unbox<TrivialString>.

    Object pair = cons (3, TrivialString ("?"));
    assert (toString (pair) == "(3 . ?!)");
    assert (Unbox<TrivialString>(cdr (pair)).s_ == "?");
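
How such boxing might work can be sketched with ordinary type
erasure.  This is not the framework's implementation: Obj and its
unbox member are hypothetical stand-ins for Object and Unbox, reduced
to the bare minimum, and std::shared_ptr is an assumed representation.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for the framework's Object, reduced to boxing.
class Obj
{
  struct ImplBase
  {
    virtual ~ImplBase () {}
    virtual void print (std::ostream& o) const = 0;
  };
  template <class T> struct Boxed : ImplBase
  {
    explicit Boxed (const T& v) : value (v) {}
    // operator<< is the only requirement placed on T.
    void print (std::ostream& o) const { o << value; }
    T value;
  };
public:
  // Any printable value can be boxed implicitly.
  template <class T>
  Obj (const T& v) : impl_ (std::make_shared<Boxed<T> > (v)) {}

  std::string toString () const
  {
    std::ostringstream o;
    impl_->print (o);
    return o.str ();
  }

  // Run-time checked recovery; throws std::bad_cast on a type mismatch.
  template <class T> const T& unbox () const
  {
    return dynamic_cast<const Boxed<T>&> (*impl_).value;
  }
private:
  std::shared_ptr<ImplBase> impl_;
};

struct TrivialString
{
  std::string s_;
  TrivialString (std::string s) : s_ (s) {}
};

inline std::ostream&
operator << (std::ostream& o, TrivialString myString)
{
  return o << myString.s_ << "!";
}

inline void demo_boxing ()
{
  Obj o = TrivialString ("?");
  assert (o.toString () == "?!");
  assert (o.unbox<TrivialString> ().s_ == "?");
}
```

TrivialString itself is untouched; everything the box needs is found
through the free operator<<.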

More control over the integration into the framework is possible, if
you're willing to do more work.  We use the C++ template traits
technique to define properties of the type.

// TrivialSequence registers its "boxing traits"
namespace MObS
{
  // Give TrivialSequence a class name (optional)
  template <>
  struct ClassName<TrivialSequence>
  {
    static std::string Name () { return "TrivialSequence"; }
  };

  // Declare that a Boxed<TrivialSequence> is an ISequence, not just an Object
  template <> struct BoxingTraits<TrivialSequence>
    : public DefaultBoxingTraits<TrivialSequence>
  {
    typedef ISequence ObBase; // default is `Object'
  };

  // Need to define the virtual function `length' required by ISequence
  namespace ObI  // "Object Implementation"
  {
    template <>
    class Impl<Boxed<TrivialSequence> >
      : public DefaultImpl<Boxed<TrivialSequence> >
    {
    public:
      // Constructors are never inherited
      inline Impl (TrivialSequence v)
        : DefaultImpl<Boxed<TrivialSequence> > (v) {}
      // Typically implement virtual functions by forwarding
      virtual size_t length () { return value().length(); }
    };
  }
}

================================================================
Why C++ semantics are broken
============================

Copies happen everywhere in C++, often implicitly, and copying loses
information, namely IDENTITY.  That's OK for value types.  But because
the identity of an object is an important part of it (sometimes its
only part), you cannot use the copy of an object in the same way as
the original object.

================================================================
Implementing C#'s `ref' and `out'
=================================

C++ has a reference feature that looks like this:
void f (int& x) { x = 5; }
void g () { int y = 4; f (y); /* now y == 5 */ }

This is confusing for the user of f, since whether f uses call by
value or call by reference is only apparent if you hunt down the
declaration of f.  This reduces the transparency of the language: the
semantics of g() cannot be determined by reading the source code of g
alone.

Although pass by reference may be a bad idea in general, it seems at
least that C# has improved the situation by requiring the call by
reference semantics to be declared by both caller and callee.

void f (ref int x) { x = 5; }
void g () { int y = 4; f (ref y); /* now y == 5 */ }

Now the maintainer of g() can immediately see from the definition of
g() that y may be modified by the call to f - indeed, call by
reference is a strong hint that the argument WILL be modified.

The argument for call by reference is that using a function with
multiple return values is more convenient.

Instead of

pair<int,string> f () { return pair<int,string>(1,"foo"); }

void g ()
{
  pair<int,string> p = f();
  int x = p.first;
  string s = p.second;
  ...
}

we can write

void f (ref int x, ref string s) { x = 1; s = "foo"; }

void g()
{
  int x;
  string s;
  f (ref x, ref s);
}

Call by reference is just syntactic sugar.  I won't decide whether
it's too sweet or not.  However, I do claim that C#'s approach is
better than C++'s.

Fortunately, we can implement `ref' and `out' for C++.  We just do the
obvious:


template <typename T>
class Ref
{
public:
  explicit Ref (T& y) : x (y) {}
  T* operator & () { return &x; }
  template <typename U> Ref& operator= (const U& y) { x = y; return *this; }
  Ref& operator++ () { ++x; return *this; }
  T operator++ (int) { T t = x; ++x; return t; }
  ... /* more operators here */
  operator T () { return x; }
private:
  T& x;
};

template <typename T>
class Out
{
public:
  explicit Out (T& y) : x (y) {}
  template <typename U> void operator= (const U& y) { x = y; }
private:
  T& x;
};

template <typename T>
Ref<T> ref (T& x) { return Ref<T> (x); }

template <typename T>
Out<T> out (T& x) { return Out<T> (x); }


With this infrastructure, we now have better compile-time checking:

void bar (Out<int> x)
{
  // cout << x << endl; // COMPILE ERROR
  x = 7;                // OK
}

void foo (Ref<int> x);

  int x = 42;
  foo (ref (x)); // OK
  // foo (x);    // COMPILE ERROR
  bar (out (x)); // OK
  // bar (x);    // COMPILE ERROR
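
For reference, here is a trimmed, self-contained version of the above
that compiles as it stands.  Only the operators needed by the example
are kept, and the bodies of foo and bar are invented for the demo.

```cpp
#include <cassert>

template <typename T>
class Ref
{
public:
  explicit Ref (T& y) : x (y) {}
  template <typename U> Ref& operator= (const U& y) { x = y; return *this; }
  operator T () const { return x; }   // a Ref may be read ...
private:
  T& x;
};

template <typename T>
class Out
{
public:
  explicit Out (T& y) : x (y) {}
  template <typename U> void operator= (const U& y) { x = y; }
  // ... but an Out has no conversion to T, so it is write-only.
private:
  T& x;
};

template <typename T> Ref<T> ref (T& x) { return Ref<T> (x); }
template <typename T> Out<T> out (T& x) { return Out<T> (x); }

inline void foo (Ref<int> x) { x = x + 1; }  // reads, then writes
inline void bar (Out<int> x) { x = 7; }      // writes only

inline void demo_ref_out ()
{
  int x = 42;
  foo (ref (x));
  assert (x == 43);
  bar (out (x));
  assert (x == 7);
  // foo (x);  // COMPILE ERROR - the Ref constructor is explicit
  // bar (x);  // COMPILE ERROR - the Out constructor is explicit
}
```

One caveat: if <functional> has been included, an unqualified call
such as ref (s) with s of a std type like std::string can become
ambiguous with std::ref, so in real code these helpers probably belong
in their own namespace and should be called with qualification.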


================================================================

How to implement constrained static polymorphism in C++
=======================================================

Stroustrup has an article describing how to implement one kind of
constraint - one that generates a compile-time error if the template
argument doesn't satisfy the constraint.

E.g.

template <class T> void foo (T t) { Check_Constraints (t); ... }

will give a compile-time error if T doesn't satisfy the constraints.
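
As a rough sketch of what such a check can look like, here is a
"derived from" constraint in the style I remember from Stroustrup's
writing on the subject.  Treat the details as my reconstruction; Base,
Derived, Unrelated and id are made up for the example.

```cpp
#include <cassert>

// Instantiating Derived_from<T,B> forces the compiler to check that a
// T* converts implicitly to a B*.  If it does not, compilation fails.
template <class T, class B>
struct Derived_from
{
  static void constraints (T* p) { B* pb = p; (void) pb; }
  Derived_from () { void (*p) (T*) = constraints; (void) p; }
};

struct Base { int id () const { return 1; } };
struct Derived : Base {};
struct Unrelated {};

template <class T>
int foo (T t)
{
  Derived_from<T,Base> ();  // the constraint check
  return t.id ();
}

inline void demo_constraint ()
{
  assert (foo (Derived ()) == 1);
  // foo (Unrelated ());  // COMPILE ERROR - Unrelated* is not a Base*
}
```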

But we would like to have a way of avoiding ambiguity.

E.g.

template <class T> void foo (T t) { ... }
template <class T> void foo (T* t) { ... }

But suppose you want something more sophisticated, e.g. suppose you
want to have a definition of foo just for classes derived from Base,
while keeping the other overloads without ambiguity.

have different
definitions of foo for di

Technique: use an unused function argument with a default value.

template <class T>
void foo (T t, typename Constraint<T>::Dummy = false) { ... }

Then we can have a Constraint template and only provide partial
specializations for the types we want.

template <typename T, bool condition=IsDerived<T,Base>::value>
struct Constraint;

template <typename T> struct Constraint<T,true> {typedef bool Dummy;};

We can do this whenever we can "take over" an unused function
argument.  This works for constructors, for example.

Foo::Foo (RealArg realarg, typename Constraint<T>::Dummy = false) {... }
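
A compilable sketch of the dummy-argument technique follows.  Several
things here are my assumptions: std::is_base_of from <type_traits>
stands in for IsDerived, the primary templates are defined as empty
structs instead of being left undefined, and the complementary
Otherwise template is a hypothetical addition so that the second
overload applies only when the first does not.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Base {};
struct Derived : Base {};

// Constraint<T>::Dummy exists only when T is derived from Base.
template <typename T, bool condition = std::is_base_of<Base,T>::value>
struct Constraint {};
template <typename T> struct Constraint<T,true> { typedef bool Dummy; };

// Hypothetical complement: Otherwise<T>::Dummy exists only when it is not.
template <typename T, bool condition = std::is_base_of<Base,T>::value>
struct Otherwise {};
template <typename T> struct Otherwise<T,false> { typedef bool Dummy; };

// For any given T exactly one of these survives substitution, so a
// call is never ambiguous.  The second argument is never passed.
template <class T>
std::string foo (T, typename Constraint<T>::Dummy = false)
{
  return "derived from Base";
}

template <class T>
std::string foo (T, typename Otherwise<T>::Dummy = false)
{
  return "anything else";
}

inline void demo_dummy_argument ()
{
  assert (foo (Derived ()) == "derived from Base");
  assert (foo (Base ()) == "derived from Base");
  assert (foo (42) == "anything else");
}
```

The complement seems to be needed because an unconstrained template
<class T> foo (T) would be ambiguous with the constrained overload for
classes derived from Base.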

With operators we don't have unused arguments we can apply defaults to.

The obvious attempt doesn't work:

template <typename T> struct is_fundamental;
template <typename T> struct is_fundamental_base { typedef T identity; };
template <> struct is_fundamental<int>  : is_fundamental_base<int> {};
template <> struct is_fundamental<char> : is_fundamental_base<char> {};

struct S
{
  template <typename T> S (typename is_fundamental<T>::identity t) {}
};


because T appears only inside the nested type name
is_fundamental<T>::identity, and template argument deduction is too
stupid to deduce T from a context like that.
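
The deduction problem can be reproduced in isolation with ordinary
function templates.  Wrap, size_direct and size_via_identity are names
invented for this sketch.

```cpp
#include <cassert>

template <typename T> struct Wrap { typedef T identity; };

// T appears directly in the parameter type, so it can be deduced.
template <typename T>
int size_direct (T) { return static_cast<int> (sizeof (T)); }

// T appears only to the left of `::', so it cannot be deduced.
template <typename T>
int size_via_identity (typename Wrap<T>::identity)
{
  return static_cast<int> (sizeof (T));
}

inline void demo_deduction ()
{
  assert (size_direct ('a') == 1);
  assert (size_via_identity<char> ('a') == 1);  // T must be spelled out
  // size_via_identity ('a');  // COMPILE ERROR - cannot deduce T
}
```

For an ordinary function the caller can work around this by naming T
explicitly, but there is no syntax for passing explicit template
arguments to a constructor template, which is why the constructor of S
above can never be called.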

void foo (typename Constraint<T,U>::cdr)

Just doesn't work for conversion operators.


Does this work:????




================================================================


