Robert Allan Schwartz: Publications

Robert Allan Schwartz

Read my professional biography.
                     Contact me.
            Go to my home page.
             Read my publications.
Read my professional resume.
    See some student reviews of my teaching.

To Virtual, Or Not To Virtual?

[published in C++ Report, September 1998, page 46.]

Most students of object-oriented analysis and design are taught that inheritance proceeds top-down, i.e. first you identify the superclass, then you specialize/derive subclasses from the superclass. Example: first you identify the superclass "furniture", then you specialize the subclasses "chair", "sofa", "table", etc.

Identifying superclasses first means thinking abstractly, since most (if not all) superclasses will be abstract classes. It can be difficult to identify the operations (i.e. member and non-member functions) of an abstract class - how can you know, in advance, what all of its subclasses will be able to do, before you know any of its subclasses?

Here's an example of this from when I designed a retargetable compiler code generator. I wanted to capture the concept of "memory address" in a class. What are the operations of a "memory address"? I knew what a Motorola 68000 memory address could do; I knew what an Apollo DN10000 memory address could do; but I had trouble deciding what an abstract memory address could do.

I solved my problem by realizing that a superclass can do whatever all of its subclasses can do. I had to know what the subclasses could do, first. Instead of proceeding top-down, I went bottom-up. I began by identifying the concrete subclasses and their operations, then I generalized to create the abstract superclass and its operations. This bottom-up approach also makes it easier to recognize which functions should be non-virtual, which should be virtual, and which should be pure virtual.

Suppose we have the classes male and female, with attributes and operations as follows:

class male
{
public:
	const string & get_name(void) const { return name; }
	void set_name(const string & new_name) { name = new_name; }

	const male_data & get_male_data(void) const { return data; }
	void set_male_data(const male_data & new_male_data) { data = new_data; }

	void output(void) const
	{
		cout << get_name() << endl;
		cout << get_male_data() << endl;
	}

	void dance(void) { cout << "I'm leading." << endl; }

private:
	string name;
	male_data data;
};

class female
{
public:
	const string & get_name(void) const { return name; }
	void set_name(const string & new_name) { name = new_name; }

	const female_data & get_female_data(void) const { return data; }
	void set_female_data(const female_data & new_male_data) { data = new_data; }

	void output(void) const
	{
		cout << get_name() << endl;
		cout << get_female_data() << endl;
	}

	void dance(void) { cout << "I'm following." << endl; }

private:
	string name;
	female_data data;
};

The name attribute, and the get_name() and set_name() operations, are identical in both classes, and can therefore propagate upward into a new superclass as follows:

class human
{
public:
	const string & get_name(void) const { return name; }
	void set_name(const string & new_name) { name = new_name; }

private:
	string name;
};

The declarations and definitions for the name attribute, and the get_name() and set_name() operations, move into the human header and source files. Since male and female inherit these class members from human, they no longer need declarations or definitions for them:

This upward migration of common class members is called "factoring" during initial design, and "refactoring" when applied to an existing design.

class male : public human
{
public:
	const male_data & get_male_data(void) const { return data; }
	void set_male_data(const male_data & new_male_data) { data = new_data; }

	void output(void) const
	{
		cout << get_name() << endl;
		cout << get_male_data() << endl;
	}

	void dance(void) { cout << "I'm leading." << endl; }

private:
	male_data data;
};

class female : public human
{
public:
	const female_data & get_female_data(void) const { return data; }
	void set_female_data(const female_data & new_male_data) { data = new_data; }

	void output(void) const
	{
		cout << get_name() << endl;
		cout << get_female_data() << endl;
	}

	void dance(void) { cout << "I'm following." << endl; }

private:
	female_data data;
};

The next observation is that the male and female classes both have an output() operation. The interface (i.e. the declaration) is the same, but the implementations (i.e. the definitions) have some similarities and some differences. The interface can propagate upward into the human class, but what do we do with the implementations?

The answer is to extract the common part of the implementation, and propagate that upward into the human class as follows:

class human
{
public:
	const string & get_name(void) const { return name; }
	void set_name(const string & new_name) { name = new_name; }

	virtual void output(void) const
	{
		cout << get_name() << endl;
	}

private:
	string name;
};

class male : public human
{
public:
	const male_data & get_male_data(void) const { return data; }
	void set_male_data(const male_data & new_male_data) { data = new_data; }

	virtual void output(void) const
	{
		human::output();
		cout << get_male_data() << endl;
	}

	void dance(void) { cout << "I'm leading." << endl; }

private:
	male_data data;
};

class female : public human
{
public:
	const female_data & get_female_data(void) const { return data; }
	void set_female_data(const female_data & new_male_data) { data = new_data; }

	virtual void output(void) const
	{
		human::output();
		cout << get_female_data() << endl;
	}

	void dance(void) { cout << "I'm following." << endl; }

private:
	female_data data;
};

The male and female classes's implementations of output() invoke (i.e. reuse) the human class's implementation of output(). Note the use of the scope resolution operator (i.e. human::output); without that, we'd have an infinite loop, as male::output() would call itself!

When a superclass and a subclass both provide an interface to the same operation (e.g. output()), we say the subclass overrides that operation. When this happens, the superclass should declare its interface as virtual. The subclass's declaration of that interface becomes virtual automatically, although I recommend that the virtual keyword be present in both the superclass and the subclass.

Note that human is still a concrete class.

The final observation is that the male and female classes both have a dance() operation. The interface (i.e. the declaration) is the same, but the implementations (i.e. the definitions) are completely different. The interface can propagate upward into the human class, but what do we do with the implementations?

The answer is: the superclass has no implementation for that interface. I don't mean the definition has an empty body, as in:

	void human::dance(void) { }

I mean we don't define it at all. Nothing. Nada. Zip. Zero.

In fact, zero is such a good idea that we include it in the interface in the superclass:

class human
{
public:
	const string & get_name(void) const { return name; }
	void set_name(const string & new_name) { name = new_name; }

	virtual void output(void) const
	{
		cout << get_name() << endl;
	}

	virtual void dance(void) = 0; // do not implement!

private:
	string name;
};

This is a pure virtual interface. Pure virtual means "this interface has no implementation in this class; any class who wants to be a concrete subclass of this class is obligated to provide an implementation". Since both the male and female classes provide an implementation for dance(), they meet their obligations, and are concrete classes.

What about human? It doesn't supply an implementation. It is not a concrete class. It is an abstract class. The rule is: as soon as a class acquires its first pure virtual function, then it becomes an abstract class.

You can't instantiate an abstract class. If you could, then what would happen in the following case?

	human h; // line 1.
	h.dance(); // line 2.

Line 2 is illegal, since instances of the human class don't have a definition for dance(). To make doubly sure that you can't make a mistake, line 1 is also illegal.

Here's an algorithm for deciding whether to make a function non-virtual, virtual, or pure virtual:

if (the interface is different among different subclasses)
{
	// example: male has get_male_data(), female has get_female_data()
	then interface and implementation stay in the subclass
	and such operations are non-virtual.
}
else // (the interface is the same among different subclasses)
{
	// examples: get/set_name(), output() and dance().
	if (the subclasses' implementations are identical)
	{
		// example: get/set_name()
		then
		{
			superclass gets the interface and the implementation,
			subclasses don't keep anything,
			and such operations are non-virtual.
		}
	}
	else if (the subclasses' implementations have some similarities)
	{
		// example: output()
		then
		{
			superclass and subclasses all get this interface,
			superclass gets the common code,
			subclasses keep the different code,
			subclass implementations override and invoke (i.e. reuse) the
				superclass's implementation
			and such operations are virtual.
		}
	}
	else // (the subclasses' implementations are completely different)
	{
		// example: dance()
		{
			superclass and subclasses all get this interface,
			superclass declares this interface as "= 0",
			superclass does not implement this interface,
			and such operations are pure virtual.
		}
	}
}

Notice that one abstract class (e.g. human) can have operations of all 3 varieties (e.g. get_name() is non-virtual, output() is virtual, and dance() is pure virtual).

Remember to identify inheritance bottom-up, and you won't have any trouble deciding "to virtual or not to virtual".