Monday, December 29, 2008

Thread safe singletons

With this rewrite of Ainur, I will loose several Ogre3D stuff.
Before thinking about shaders, post-process effects and all the graphics stuff, I need to implement "core" stuff like singletons.

In the previous version I wrote all the managers as singletons because the game engine only need one instance that is accessible from every place.

Since the new Ainur engine will be multi-threaded, I felt the need for a thread-safe implementation.

Single-threaded implementation

If you are sure that the singleton will be first accessed in a single-threaded context, for example during the initialisation fase where we do not have threads yet, a simple test for singleton existence is sufficient.

In this case, the Singleton has a private static pointer to the singleton instance, and public static functions such as getSingletonPtr() and getSingleton() to get the pointer or reference to the instance.

In getSingletonPtr() we create the pointer if it doesn't exist yet and return it to the caller.
The constructor and destructor are private, so that it's impossible to directly instantiate.

class Singleton

{

public:

static Singleton* getSingletonPtr()

{

if (m_singleton == NULL)

m_singleton = new Singleton();

return m_singleton;

}



static Singleton getSingleton()

{

return (*m_singleton);

}



private:

Singleton() {}

~Singleton() {}



static Singleton* m_singleton = NULL;

}; 



Multi-threaded implementation


What happens in a multi-threaded environment?
What, if multiple threads try to access the singleton at the same time?

We need to find a way to avoid creation of more than one instance.
In order to do this, I think we should use mutex locks to protect the singleton creation code from being executed by more than one thread at a time.



#include  m_singleton="new" m_mutex="CreateMutex(NULL,">
Attempting to Avoid Locking Overhead

That's pretty simple, and it works. However, you might notice that we've incurred the overhead of locking every time someone accesses the instance of the singleton.

There's a well-known pattern to eliminate this overhead, called the double-checked lock. It takes advantage of the fact that we can (apparently) check for null as a quick test before we even bother with locking. Here's how this would look:

T* T::instance()

{

if (smInstance == NULL)

{

VMutexLocker lock(&smMutex);



if (smInstance == NULL) // double-check

smInstance = new T();

}

return smInstance;

}

The idea is if we can see that the instance pointer is not null, then there's no need to even enter the synchronized lock and consider creating it.

Once we determine that the instance may need to be created, we enter the locked section. However, we need to re-check for null once we acquire the lock, because "we" may be the thread that lost a race to the lock. That is, another thread may have also checked for null at the same time (we both saw a null pointer) and then it acquired the lock first, created the instance, and released the lock before we were given the lock.

Unfortunately, while this will work reliably on many if not almost all platforms, there is no requirement that the combination of the compiler and the processor's memory model will work as this code intends. It is simply not guaranteed to work.

The Potential Problem

The problem comes from the following statement and the lack of guarantee of the order in which it performs its multiple sub-operations. (The same problem can exist in other languages such as Java and .NET; this is not a C++-specific issue.)

smInstance = new T();

Remember that in our attempt to optimize away unnecessary locking, another thread may be executing the first null check statement (prior to lock acquisition) at the exact same time as our thread is creating and assigning the instance:

if (smInstance == NULL)

Here is the core problem: We have no guarantee that T will be fully constructed before the pointer smInstance is assigned. For example, the compiler is within its rights to generate processor instructions that implement the statement in the following order:

Allocate a memory block sized for T.

Assign the address of the memory block to smInstance

Call T's constructor.

(Multi-processor memory architectures can provide a similarly "hostile" environment to our code.)

Consider then what happens if the other thread performs its unsynchronized null pointer test when we are between steps 2 and 3 of the instantiation statement: It will see a non-null value, and will proceed to return the pointer to the raw memory of the not-yet-constructed object.

The Outcome

Fortunately, we can fall back to the original "thread-safe" implementation if we want the instance() function to work with certainty from multiple threads on any platform and compiler.

This leaves you with some choices. If you are sure that the first call to instance() will be made in a single-threaded mode, such as during static initialization, then you don't need the lock. Similarly, if you know that (or can structure your code such that) the instance is created from the main thread before you create other threads, then you don't need the lock. Alternatively, the performance overhead of the lock may well be a non-issue unless you are calling instance() frequently--a classic case of "don't optimize prematurely", where you shouldn't work to eliminate the lock if it isn't a problem to begin with. And of course, if you reference the singleton a lot in one place, you can avoid the cost of checking the lock repeatedly by calling instance() once and then using the object directly, rather than calling instance() repeatedly.

In my Code Vault library, I've taken this two steps further, first by implementing the singleton pattern as a template that is parameterized with the locking requirements, and also allowing the singleton to be registered for deletion during program termination, with rules for whether a deleted singleton can "resurrected" if it is accessed after being deleted. 

No comments: