Tuesday, February 24, 2009

Forward Declarations in C++ and runtime Bug/Error

Forward declaration is a pretty simple concept to understand. It is a way of declaring a class name ahead of the code that uses it,
without including the header file that defines the class, so that compilation succeeds and the compiler does not complain about a
missing class name. Now why would someone avoid including a header file and use a forward declaration instead? This is
especially helpful when multiple header files carry conflicting definitions (not of the class but, let us say, of a macro like
MAX_PATH) and we do not want to include both of them in a given cpp file. In this case, if the cpp file still needs to refer to
objects of the class whose header file is omitted, it needs at least a declaration of the name so that the
COMPILATION succeeds. Keep in mind that a forward-declared class is an incomplete type: the compiler can handle pointers and references
to it, but it knows neither the size of the class nor its base classes, and the linker does not fill in that information later.
Now there is a very grave pitfall that can go unseen and result in a hard-to-find bug, one that can cost you a lot of valuable
time staring at memory and guessing. It took me about an hour to figure this out when I happened to work with someone who had written such
a piece of code.
First and foremost, the scenario is as follows. Let us look at the declarations of our problem candidates.
=====
//In A.h
class A
{
public:// some func
int array1[5];
};

//In Ace.h
#pragma once
class Ace
{
public:
int var;
};

//In newder.h
#pragma once
class Ace;
class Der;
class newDer
{
public:
void * membervar; //points to an instance of Der as supposed by the programmer
void setMembervar(void*);
void * getMembervar();
Ace * getDerVar();
};
//In newder.cpp
#include "A.h"
#include "newder.h"
void newDer::setMembervar(void * setvar)
{
this->membervar = setvar;
}

void * newDer::getMembervar()
{
return this->membervar;
}
Ace * newDer::getDerVar()
{
//Uncomment this to understand what could be happening inside the pointer conversions
//Ace * someptr = (Ace*) this->membervar;
return (Ace*) (Der*)this->membervar;
}

//In Der.h
#pragma once
class Der: public A, public Ace
{
public:
int * pointer;
void execute();
};
//In der.cpp
#include "A.h"
#include "Ace.h"
#include "Der.h"
#include "newder.h"
#include "stdio.h"
void Der::execute()
{
newDer n;
n.setMembervar((void *)this);
printf("Before: Der* 0x%08x\n",(Der*)this);
Der* d = (Der*) n.getDerVar();
printf("After: 0x%08x\n",d);
printf("Pointer value diff %d\n",(unsigned int)this-(unsigned int)d);
}
int main()
{
printf("size of int = %d\n", sizeof(int));
printf("size of long = %d\n", sizeof(long));
void * someptr;
printf("size of pointer = %d\n", sizeof(someptr));
printf("size of class A = %d\n", sizeof(A));
printf("size of class Ace = %d\n", sizeof(Ace));
printf("size of class Der = %d\n", sizeof(Der));
Der d;
d.execute();
}

====

This innocent-looking code has a very subtle bug that can make the program crash, and the bug is very hard to track down when it
sits in production-like code (surrounded by a lot of other code). For the sake of simplicity the code above revolves only around the problem
and contains no business logic. If you run this code (here as a 32-bit build) you will see the following output:
size of int = 4
size of long = 4
size of pointer = 4
size of class A = 20
size of class Ace = 4
size of class Der = 28
Before: Der* 0x001bfae8
After: 0x001bfad4
Pointer value diff 20

Notice the Before and After pointer values. Interesting, isn't it? I would expect the pointer value to be the same, but it has moved.
Imagine using d inside the execute function to access the pointer member variable of the Der class, like d->pointer, and then using that
pointer to call some function, etc.

How could the address of the Der object change between the two print statements? You might have figured it out by looking at the
extra print statements with the sizes of the classes. Yes, the bug is in the function "Ace * newDer::getDerVar()". When we assigned the Der*
to the void * membervar in setMembervar, everything was fine: membervar still points to location 0x001bfae8. The problem surfaces when
getDerVar casts it to Ace* and returns it. Look carefully at the header file newder.h. The classes Ace and Der are only FORWARD DECLARED.
Rings a bell? When newder.cpp was compiled, Ace and Der were incomplete types: the compiler knew neither their sizes nor the fact that Der
derives from Ace. So the C-style cast from Der* to Ace* was compiled as a plain reinterpretation of the address, and no offsetting
instructions were put into the obj. Normally the compiler adjusts a pointer when it is cast up or down a multiple-inheritance hierarchy
(here the Ace part sits 20 bytes, that is sizeof(A), into Der). That adjustment is missing in newder.obj, so when the code runs, the
address membervar points to (0x001bfae8) is passed out unchanged as an Ace*.

The bug continues in the function "void Der::execute()". When the statement "Der* d = (Der*) n.getDerVar();" is compiled, the compiler
knows that getDerVar() returns Ace*, and we are telling it that this is actually a pointer to a Der, which contains A and then Ace laid out
one after the other. So it emits offsetting code that moves the pointer back by 20 bytes to reach the start of the Der object, and there
we have our horrible bug: d ends up at 0x001bfad4, 20 bytes before the real object.

This type of code is not healthy for large code bases. I usually avoid forward declarations as much as possible. If newder.cpp includes Ace.h and Der.h instead of relying on the forward declarations, we will not see this problem.
