Returning dynamically allocated memory
It is quite common to encapsulate the allocation and deallocation of heap
memory in our classes. A good example of such a class would be an Array
class, which encapsulates an array whose size can be set at run-time.
For the rest of the article we will assume that the class uses only one
block of memory.
It is a good habit, and sometimes a necessity, to provide a
copy constructor for such a class. There are two possibilities: copy only
the pointer to the allocated memory and have both objects share it, or
allocate new memory and copy the contents. The behavior of the copy
constructor describes what the object is: in the first case, the object is
a handle to the real representation of the object (which would be, in this
case, the data stored in the memory); in the second case we are dealing
with the object itself.
Of course, there can be mixed cases, where some of the memory is copied and
some is not, but I would classify these as the second case: an object
itself that shares another object with its buddies.
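A minimal sketch of the two copy strategies may make the difference concrete. The class names HandleArray and ValueArray, and the explicit destroy() call, are hypothetical and only serve the illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical handle class: copying shares the block instead of duplicating it.
class HandleArray {
public:
    explicit HandleArray(std::size_t n) : size_(n), data_(new int[n]()) {}
    // Copies only the pointer, so both handles see the same data.
    HandleArray(const HandleArray& other) : size_(other.size_), data_(other.data_) {}
    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }
    // Somebody has to decide when the shared block dies; here it is explicit.
    void destroy() { delete[] data_; data_ = 0; }
private:
    std::size_t size_;
    int* data_;
};

// Hypothetical value class: copying allocates new memory and copies the contents.
class ValueArray {
public:
    explicit ValueArray(std::size_t n) : size_(n), data_(new int[n]()) {}
    ValueArray(const ValueArray& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }
    // Copy-and-swap assignment: the by-value parameter is already a deep copy.
    ValueArray& operator=(ValueArray other) {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }
    ~ValueArray() { delete[] data_; }
    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }
private:
    std::size_t size_;
    int* data_;
};

void copyStrategiesExample() {
    HandleArray h1(3);
    HandleArray h2(h1);
    h2[0] = 7;
    assert(h1[0] == 7);   // the handles share one block
    h1.destroy();         // from here on h2 must not be used

    ValueArray v1(3);
    ValueArray v2(v1);
    v2[0] = 7;
    assert(v1[0] == 0);   // the original is untouched
}
```

The handle version shows why the question of who destroys the shared block cannot be avoided, while the value version pays for an allocation on every copy.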
The decision about which strategy to use is not always easy. Using handles
raises other problematic issues, like deciding when to destroy
the shared object, reference counting, and so on. On the other hand,
allocating memory each time you want to copy the object may be too
expensive in terms of running time.
I personally think the approach of copying the memory should be preferred.
The most important point is that you know exactly when the object is
created and destroyed. With reference counting or other
techniques, like garbage collectors, it is much harder to tell when the
memory an object consumes will be freed. To some programmers (especially
those coming from Java and C#) this may seem like a
disadvantage: you have to remember to destroy the object, and you copy it
all the time, both when passing it as an argument to a function and when
returning it. But I personally like to know what's happening with my
objects and to manage their destruction manually. Furthermore, I think
the removal of an object is such an important thing in the object model that it
should be done explicitly, instead of relying on the GC, which also incurs
a run-time overhead.
I bet all Java programmers are now asking, "How many times did you forget
to delete an object?". The answer is "a couple of times :)".
There is a way to prevent most such errors: an auto pointer
class. It's a simple template class which stores a pointer to some type.
In the destructor it calls delete on that pointer. It also has operators
overloaded so that it behaves like a normal pointer. And that's it. If you use
such pointers for every object allocated on the heap, there is only a
slight chance that some piece of memory will escape.
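Such an auto pointer could look roughly like the sketch below. The names AutoPtr and Tracked are hypothetical; copying is simply disabled here to keep the example short, which is one possible design and not necessarily the one used elsewhere in this article:

```cpp
#include <cassert>

// Hypothetical minimal auto pointer: owns one heap object and deletes it
// when the pointer itself goes out of scope.
template <class T>
class AutoPtr {
public:
    explicit AutoPtr(T* p = 0) : ptr_(p) {}
    ~AutoPtr() { delete ptr_; }
    // Overloaded operators make it behave like a normal pointer.
    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }
private:
    AutoPtr(const AutoPtr&);             // copying disabled in this sketch
    AutoPtr& operator=(const AutoPtr&);
    T* ptr_;
};

// Helper type that counts live instances, used only to observe the deletion.
struct Tracked {
    static int alive;
    int value;
    Tracked() : value(0) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

void autoPtrExample() {
    {
        AutoPtr<Tracked> p(new Tracked);
        p->value = 5;
        assert((*p).value == 5);
        assert(Tracked::alive == 1);
    }                                    // p leaves scope, the object is deleted
    assert(Tracked::alive == 0);
}
```

Since C++11 the standard library's std::unique_ptr fills the same role.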
And now Java programmers ask, "How many times did you reference an
object that did not exist?". Well, yes, this is a problem, but
there are two things I want to say. Firstly, there are tools to detect
such errors, like CodeGuard in C++Builder 5 or the debug build in VC++,
which fills released memory with known values, so that most uses of
deleted objects generate errors and help the programmer to correct the
problem. And secondly, if you use a deleted object, it means that the
structure of objects you created is wrong. This may indicate that the
object model you implemented is not right. So it uncovers internal
problems in the structure of the application and enforces more conscious
programming.
And what about the other advantages of handles? When you want to pass an
object to another object, you can always use a pointer or a reference. It's
a bit harder to determine the moment when the object is no longer needed, but
if there is no more specific solution arising from the nature of the
connections between your objects, you can always use reference counting or
automatic GC. This doesn't mean that I'm against aggressive memory
management, like the stop-and-copy collection used in many JVMs. I'm just
saying that the programmer should take care of deleting his objects,
because it leads to better and faster code.
And finally, the topic of this article: what about the superfluous
copying of memory when an object is passed as an argument or returned?
When you pass such an object to a function, it is copied, and so is its
memory. There is a simple solution to that: pass the object as a
constant reference, and there will be no copying.
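The difference can be observed with a small sketch. The Buffer class and its static copies counter are hypothetical and exist only to count how often the copy constructor runs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical class owning one heap block; it counts its own deep copies.
class Buffer {
public:
    static int copies;
    explicit Buffer(std::size_t n) : size_(n), data_(new char[n]()) {}
    Buffer(const Buffer& o) : size_(o.size_), data_(new char[o.size_]) {
        std::copy(o.data_, o.data_ + o.size_, data_);
        ++copies;
    }
    ~Buffer() { delete[] data_; }
    std::size_t size() const { return size_; }
private:
    Buffer& operator=(const Buffer&);   // not needed in this sketch
    std::size_t size_;
    char* data_;
};
int Buffer::copies = 0;

// Pass by value: the argument and its memory are copied.
std::size_t sizeByValue(Buffer b) { return b.size(); }

// Pass by constant reference: nothing is copied.
std::size_t sizeByConstRef(const Buffer& b) { return b.size(); }

void passingExample() {
    Buffer b(16);
    Buffer::copies = 0;
    assert(sizeByValue(b) == 16);
    assert(Buffer::copies == 1);        // one deep copy for the call
    assert(sizeByConstRef(b) == 16);
    assert(Buffer::copies == 1);        // still one: the reference copied nothing
}
```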
But what about returning values? Sometimes you have to return a new
object, not just a reference to an existing one. When you return an object by
value, the instance from the local function scope is implicitly copied to a
new one in the caller's scope. References cannot solve this problem, because
you cannot return a reference to a local object. One way is to use out
parameters: take a reference to the output object, set its properties in
the body of the function, and exit the function. But it's not pretty, and it
has further consequences; for example, you cannot simply nest function calls.
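As a rough illustration of the out-parameter style and its nesting problem, here is a sketch that uses std::vector as a stand-in for a class owning heap memory. The functions makeRange and countElements are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out-parameter style: the caller supplies the object to be filled.
void makeRange(std::vector<int>& out, int first, int last) {
    out.clear();
    for (int i = first; i < last; ++i) {
        out.push_back(i);
    }
}

std::size_t countElements(const std::vector<int>& v) { return v.size(); }

void outParameterExample() {
    // The calls cannot be nested as countElements(makeRange(2, 5));
    // a named object has to be declared first.
    std::vector<int> range;
    makeRange(range, 2, 5);
    assert(countElements(range) == 3);
    assert(range[0] == 2);
}
```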
There is also another way, which is our subject. The main idea is that
there is no need to copy the dynamic memory, since the original would be
deleted anyway just before exiting the function, during the destruction of
the returned object. It is enough to take this memory from the returned
object and give it to the new object in the caller's scope. You have to tell
the returned object that it should not free the memory (by setting its
pointers to null), and that's it! We will call this strong copying of an
object, as the copy takes over the memory used by the original object.
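Strong copying on its own can be sketched as follows. The Block class and the StrongCopy tag type are hypothetical; the tag is just one simple way to select the pointer-stealing constructor explicitly:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct StrongCopy {};   // hypothetical tag that selects the strong-copy constructor

class Block {
public:
    explicit Block(std::size_t n) : size_(n), data_(new int[n]()) {}
    // Normal copy: allocate new memory and copy the contents.
    Block(const Block& o) : size_(o.size_), data_(new int[o.size_]) {
        std::copy(o.data_, o.data_ + o.size_, data_);
    }
    // Strong copy: take the memory and tell the source not to free it.
    Block(Block& source, StrongCopy) : size_(source.size_), data_(source.data_) {
        source.data_ = 0;
        source.size_ = 0;
    }
    ~Block() { delete[] data_; }        // deleting a null pointer is harmless
    std::size_t size() const { return size_; }
    const int* data() const { return data_; }
private:
    Block& operator=(const Block&);     // not needed in this sketch
    std::size_t size_;
    int* data_;
};

void strongCopyExample() {
    Block a(4);
    const int* original = a.data();
    Block b(a, StrongCopy());
    assert(b.data() == original);       // same memory, no allocation
    assert(b.size() == 4);
    assert(a.data() == 0);              // the source gave its memory away
}
```

In C++11 and later, a move constructor expresses the same kind of transfer.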
One more thing, which applies only in some cases; I will
explain it using a string class as an example. It concerns passing objects as
constant references. The problem is that if you want to pass a standard C
string (that is, a char *), its char array has to be copied in order to
create a string object, which is then passed to the function. That is one
unnecessary copy. To avoid it, we can use the given pointer as the string's
storage without copying it: we simply set the internal pointer of the string
class to the given one. And at the end, we must remember not to delete it. We
will call this weak copying: the copy takes the memory only temporarily,
cannot modify it, and cannot delete it. The programmer has to watch out for
unintentional deletion of a weakly copied object, but in function calls this
should not be a problem as long as the object is declared const and, of
course, the original is not changed during the execution of the function.
Either way, the programmer has to be aware of the danger.
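A weakly copying string parameter might look like the sketch below. WeakString and countSpaces are hypothetical names, and the class is deliberately reduced to what a read-only parameter needs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical read-only string that borrows a C string instead of copying it.
class WeakString {
public:
    // Implicit on purpose, so that a char * can be passed where a string is expected.
    WeakString(const char* s) : data_(s), length_(std::strlen(s)) {}
    std::size_t length() const { return length_; }
    const char* c_str() const { return data_; }
    // No destructor: the memory belongs to the caller and is never deleted here.
private:
    const char* data_;
    std::size_t length_;
};

std::size_t countSpaces(const WeakString& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.length(); ++i) {
        if (s.c_str()[i] == ' ') {
            ++n;
        }
    }
    return n;
}

void weakCopyExample() {
    const char* text = "a b c";
    WeakString w(text);
    assert(w.c_str() == text);            // same pointer: nothing was copied
    assert(countSpaces("one two") == 1);  // temporary weak copy made for the call
}
```

The weak copy is only valid while the original array exists, which is exactly the danger described above.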
I discovered the things described above more than a year ago, but
was not able to implement them until now. It was hard for two reasons:
firstly, I wanted the solution to be universal, supporting inheritance and
aggregation, and secondly, there were many problems with compilers. The
mechanism I used produced lots of internal compiler errors, and I had a lot
of work dealing with them. But finally, I managed to compile it under
Borland C++ Compiler 5.1 and Visual C++ .NET 7.0.
The main idea is to use different descendant classes for passing
parameters, returning values, and normal use (as a member of an object or as
an automatic/static object). We will call those classes the in, out, and std
versions of the base class, respectively. In their constructors and
destructors we will implement the functionality mentioned earlier: weak,
strong, and normal copying.
The in class has a copy constructor, which simply copies the
pointer (weak copying). That class has no destructor, because it never owns
the memory.
The std class has a normal copy constructor for other std objects,
which copies the allocated memory, and another constructor for out objects,
which copies strongly: it copies the pointer and sets the original one to
null. The class also has a normal destructor, which frees the memory.
The out class has two copy constructors: one for std objects and a
second for out objects. The first one allocates new memory and
copies the contents. The second one uses strong copying: it copies the
pointer and sets the original one to null. There is also a standard
destructor that frees the memory.
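The three versions described above can be sketched roughly as follows. This is not the original implementation: the names VecBase, VecIn, VecStd and VecOut are hypothetical, the element type is fixed to int, and the pointer is made mutable so that a strong copy can empty its source through a const reference, which is an assumption about how temporaries are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Shared base: holds the pointer and the three copying primitives.
class VecBase {
public:
    std::size_t size() const { return size_; }
    const int* data() const { return data_; }
    const int& operator[](std::size_t i) const { return data_[i]; }
protected:
    VecBase() : data_(0), size_(0) {}
    void allocate(std::size_t n) { data_ = new int[n](); size_ = n; }
    void destroy() { delete[] data_; data_ = 0; size_ = 0; }
    void weakCopy(const VecBase& o) { data_ = o.data_; size_ = o.size_; }
    void normalCopy(const VecBase& o) {
        allocate(o.size_);
        std::copy(o.data_, o.data_ + o.size_, data_);
    }
    void strongCopy(const VecBase& o) {
        weakCopy(o);
        o.data_ = 0;   // the source must not free the memory any more
        o.size_ = 0;
    }
    mutable int* data_;
    mutable std::size_t size_;
};

class VecOut;

// "in" version: borrows the memory and has no destructor.
class VecIn : public VecBase {
public:
    VecIn(const VecBase& o) : VecBase() { weakCopy(o); }
};

// "std" version: normal copy from std, strong copy from out.
class VecStd : public VecBase {
public:
    explicit VecStd(std::size_t n) : VecBase() { allocate(n); }
    VecStd(const VecStd& o) : VecBase() { normalCopy(o); }
    VecStd(const VecOut& o);
    ~VecStd() { destroy(); }
    using VecBase::operator[];
    int& operator[](std::size_t i) { return data_[i]; }
private:
    VecStd& operator=(const VecStd&);   // assignment left out of this sketch
};

// "out" version: normal copy from std, strong copy from out.
class VecOut : public VecBase {
public:
    explicit VecOut(std::size_t n) : VecBase() { allocate(n); }
    VecOut(const VecStd& o) : VecBase() { normalCopy(o); }
    VecOut(const VecOut& o) : VecBase() { strongCopy(o); }
    ~VecOut() { destroy(); }
    using VecBase::operator[];
    int& operator[](std::size_t i) { return data_[i]; }
private:
    VecOut& operator=(const VecOut&);   // assignment left out of this sketch
};

inline VecStd::VecStd(const VecOut& o) : VecBase() { strongCopy(o); }

// Returns the out version, takes the in version as a parameter.
VecOut makeSquares(std::size_t n) {
    VecOut result(n);
    for (std::size_t i = 0; i < n; ++i) {
        result[i] = static_cast<int>(i * i);
    }
    return result;
}

int sum(const VecIn& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

void versionsExample() {
    VecStd v(makeSquares(4));       // strong copy out of the returned object
    assert(v.size() == 4);
    assert(sum(v) == 14);           // 0 + 1 + 4 + 9, read through a weak copy
    VecStd w(v);                    // normal copy
    assert(w.data() != v.data());
}
```

Note that copying an out object empties it even when the source is a named variable, so in this sketch the out version should only ever be used as a return type.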
The example of this technique implements a simple vector of values of
unspecified size. The vctribase class is the base class for the in, out and
std types. You can examine the constructors and destructors to see that they
are built in the described way.
If you want to be able to assign existing vectors to such a class, you
have to implement the assignment operators in the same way as the
constructors.
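Assignment operators following the same rules could be sketched like this. StdBuf and OutBuf are hypothetical, stripped-down stand-ins for the std and out versions, with public members to keep the example short:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical "out" version: gives its memory away when copied from.
struct OutBuf {
    mutable int* data;
    mutable std::size_t size;
    explicit OutBuf(std::size_t n) : data(new int[n]()), size(n) {}
    OutBuf(const OutBuf& o) : data(o.data), size(o.size) { o.data = 0; o.size = 0; }
    ~OutBuf() { delete[] data; }
private:
    OutBuf& operator=(const OutBuf&);   // not needed in this sketch
};

// Hypothetical "std" version with both kinds of assignment.
struct StdBuf {
    int* data;
    std::size_t size;
    explicit StdBuf(std::size_t n) : data(new int[n]()), size(n) {}
    StdBuf(const StdBuf& o) : data(new int[o.size]), size(o.size) {
        std::copy(o.data, o.data + o.size, data);
    }
    ~StdBuf() { delete[] data; }

    // Normal assignment: allocate first, then release the old block.
    StdBuf& operator=(const StdBuf& o) {
        if (this != &o) {
            int* fresh = new int[o.size];
            std::copy(o.data, o.data + o.size, fresh);
            delete[] data;
            data = fresh;
            size = o.size;
        }
        return *this;
    }

    // Strong assignment: release the old block and take over the new one.
    StdBuf& operator=(const OutBuf& o) {
        delete[] data;
        data = o.data;
        size = o.size;
        o.data = 0;
        o.size = 0;
        return *this;
    }
};

void assignmentExample() {
    StdBuf a(2);
    a.data[0] = 5;
    StdBuf b(3);
    b = a;                               // normal: contents copied
    assert(b.size == 2);
    assert(b.data != a.data);
    assert(b.data[0] == 5);

    OutBuf o(4);
    int* block = o.data;
    b = o;                               // strong: memory taken over
    assert(b.data == block);
    assert(b.size == 4);
    assert(o.data == 0);
}
```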
Each version class has three member types defined. They
are used to generate, more or less seamlessly, version classes for
descendants of the vector class or for classes that have members of the
vector type. The idea is that you pass the version of the vector as a
template parameter to the class that uses it, no matter whether it is a
descendant or some other object. Then you simply put similar typedefs in
that new class, for example:
    template<class C>
    class counted : public C {
    private:
        int count;
    public:
        // in, std and out versions of this class, built from the versions of C
        typedef counted<typename C::inT>  inT;
        typedef counted<typename C::stdT> stdT;
        typedef counted<typename C::outT> outT;
    };
Smart, isn't it? But, unfortunately, that's not all. We have to create
constructors and assignment operators, in which we will forward objects to
the ancestor.
    template<class C>
    class counted : public C {
    private:
        int count;
    public:
        typedef counted<typename C::inT>  inT;
        typedef counted<typename C::stdT> stdT;
        typedef counted<typename C::outT> outT;
        // Each constructor forwards its argument, version intact, to the ancestor.
        counted(const inT &co) : C(co) {}
        counted(const outT &co) : C(co) {}
        counted(const stdT &co) : C(co) {}
    };
Note that the version of the original object is preserved and
reaches the vector classes, which know how to handle it.
In the case of member vectors, we would similarly initialize the new
members with the old ones, and the version would also be preserved.
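To see the version being preserved, one can run the counted template against stand-in version classes that hold no memory and only record which constructor ran. TraceIn, TraceStd, TraceOut and Versions are hypothetical, and the default constructor, the count initializers and getCount() are additions made for this demonstration only:

```cpp
#include <cassert>
#include <cstring>

struct TraceIn;
struct TraceStd;
struct TraceOut;

// The three member types every version class carries.
struct Versions {
    typedef TraceIn  inT;
    typedef TraceStd stdT;
    typedef TraceOut outT;
};

struct TraceIn  : Versions {};
struct TraceOut : Versions {};
struct TraceStd : Versions {
    const char* how;   // which kind of copy created this object
    TraceStd() : how("fresh") {}
    TraceStd(const TraceStd&) : how("normal copy") {}
    TraceStd(const TraceOut&) : how("strong copy") {}
};

template <class C>
class counted : public C {
private:
    int count;
public:
    typedef counted<typename C::inT>  inT;
    typedef counted<typename C::stdT> stdT;
    typedef counted<typename C::outT> outT;
    counted() : C(), count(0) {}                    // added for the demo
    counted(const inT& co)  : C(co), count(0) {}
    counted(const outT& co) : C(co), count(0) {}
    counted(const stdT& co) : C(co), count(0) {}
    int getCount() const { return count; }
};

void countedExample() {
    counted<TraceOut> returned;
    counted<TraceStd> fromOut(returned);            // forwards an out version
    assert(std::strcmp(fromOut.how, "strong copy") == 0);

    counted<TraceStd> fromStd(fromOut);             // forwards a std version
    assert(std::strcmp(fromStd.how, "normal copy") == 0);
}
```

Only the constructors that are actually used get instantiated, so a version class does not need to accept every other version.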
There are more ideas around all this. For example, one could add
typedefs to *Timpl to make it possible to recognize in descendant classes
which version of the class they are based on. Maybe I will describe it in
the future.
If you are interested in this technique, you can view the string class
that I wrote using it. It's available here. It
needs my headers to compile.
Should you have any problems compiling it, email me: it uses a new
version of the nnlib library, which is in a state of constant transition, so
there may be problems.
I hope you've found it useful.