In defense of Core Data (Part I) 07 Sep 2013


Core Data has had a lot of bad press lately. To name the dissidents:

I do like Core Data, but I realize that many people don't. In this blog post I would like to write about concurrency and performance, clear some things up and lay the foundation for the following posts in this series. Of course you should take my enthusiasm about Core Data with a grain of salt, since I am the developer of Core Data Editor. That being said, I think that Core Data needs more than just a defense of its potential flaws: a lot of explaining has to be done. Let's get dirty!

Concurrency and Core Data

Core Data was first made available with the release of OS X 10.4 (Tiger). This was two years before the iPhone was introduced. We can assume that Core Data was in development long before its introduction in 2005 and was not developed with a mobile platform in mind. This can also be inferred from the fact that it took Apple three major releases to introduce Core Data on iOS. Macs in 2005 were already pretty fast. Almost everything you wanted to do with Core Data could easily be done on the main thread of your application without blocking the UI in a noticeable manner. If you wanted to use Core Data in a multithreaded scenario back then, you really had to know what you were doing, and hardly anybody did it. The introduction of Core Data on the iPhone increased the need for concurrency because of several factors:

  • Less powerful hardware
  • iOS has a much more "fluid" UI. You scroll all the time and dropping a couple of frames is a very bad experience especially for touch devices.
  • So: Anything less than 30 frames per second is unacceptable
  • There is no beach ball on iOS (for good reasons).

This increased need for concurrency was seemingly addressed by Apple with the introduction of nested contexts and concurrency types (iOS 5 / OS X 10.7). Why "seemingly", you might ask? Many developers who make use of nested contexts and concurrency types report problems:

  • random crashes ('Core Data could not fulfill a fault' exception)
  • data loss
  • merge conflicts
  • ...

Now one could say that these two features (nested contexts and concurrency types) are buggy and thus Core Data is at fault here. In fact, shortly after those features were introduced many obvious bugs were reported to Apple. Apple took its time to fix those issues, and the features now work much better. But people are still reporting the same problems listed above. Just a couple of days ago I tried to help a fellow developer with his crashing Core Data app. What I saw was a pretty common (mis-)use of Core Data: applying nested contexts and concurrency types naively to one's own application. "Naively" is probably the wrong word, because it is not obvious from the documentation alone how one is supposed to use nested contexts and concurrency types. I needed quite a few weeks to wrap my head around it. This is not because those two features are bad by design but mainly because of the very limited documentation we have from Apple. Another problem is that many developers seem to think that merely using nested contexts and concurrency types makes all concurrency-related problems go away magically. This assumption is wrong.

So what are the common patterns that we see in concurrent Core Data based applications?

In Detail:

  • We have a persistent store at the top.
  • Then there is a managed object context below that. The concurrency type of this context is set to NSPrivateQueueConcurrencyType. This context is typically called the root context.
  • The root context has a NSMainQueueConcurrencyType child context. This context is typically used by everything related to the user interface (bound to NSArrayController, used by NSFetchedResultsController, …).
  • The main queue context typically has one or more NSPrivateQueueConcurrencyType child contexts. Those contexts are often used to import results from a web API.
  • Dynamic child contexts can for example be used to discard changes (by simply throwing a context away without saving).
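The setup listed above can be sketched in a few lines. This is only a minimal sketch, assuming that a persistent store coordinator with a store already attached exists; the function names MakeMainContext and MakeChildContext are made up for illustration and are not part of Core Data.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: builds the root context and its main queue child.
// `coordinator` is assumed to be fully configured (model and store added).
static NSManagedObjectContext *MakeMainContext(NSPersistentStoreCoordinator *coordinator) {
    // Root context: private queue, the only context that talks to the store.
    NSManagedObjectContext *rootContext = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    rootContext.persistentStoreCoordinator = coordinator;

    // Main queue context: child of the root context, used by the UI.
    NSManagedObjectContext *mainContext = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSMainQueueConcurrencyType];
    mainContext.parentContext = rootContext;
    return mainContext;
}

// Hypothetical helper: creates a private queue child of the main queue
// context, for example for an import or for changes that may be discarded.
static NSManagedObjectContext *MakeChildContext(NSManagedObjectContext *mainContext) {
    NSManagedObjectContext *childContext = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    childContext.parentContext = mainContext;
    return childContext;
}
```

A child context holds a strong reference to its parent, so keeping the main queue context alive is enough to keep the root context alive as well.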

As you can see, this setup is quite complex. Before going any further one has to understand what it actually means when a context has a certain concurrency type. Let's assume we implement NSManagedObjectContext ourselves. What would our implementation of -initWithConcurrencyType: look like? Something like this:

// Some required parts were omitted for better readability  

@interface NSManagedObjectContext ()
@property (nonatomic, strong) dispatch_queue_t queue;
@end

@implementation NSManagedObjectContext
- (instancetype)initWithConcurrencyType:(NSManagedObjectContextConcurrencyType)ct {
  self = [super init];
  if (self) {
    if (ct == NSMainQueueConcurrencyType) {
      // use the existing main queue
      self.queue = dispatch_get_main_queue();
    } else if (ct == NSPrivateQueueConcurrencyType) {
      // create a private serial queue
      self.queue = dispatch_queue_create("queue", DISPATCH_QUEUE_SERIAL);
    }
    // NSConfinementConcurrencyType is ignored for the sake
    // of simplicity.
  }
  return self;
}
... 
@end

I assume that this is roughly how Apple implements this initializer.

I would like to mention that I know for a fact that the implementation is more complex. This is true for almost every pseudo-implementation given in the blog post. I have simplified the implementation to make it easier to understand.

So the only thing a concurrency typed context does is to set up a dispatch queue: the main queue for a main queue context, a new serial queue for a private queue context. Nothing more, nothing less. In addition to concurrency types there are two different methods which can be used to work with a context. I will use the above snippet to explain how those methods work. Both methods basically work the same way: you pass in a block and the method then executes your block on the correct queue.

-performBlockAndWait: (NSManagedObjectContext)

If you use this method, the receiving managed object context uses its queue and executes the passed block synchronously. The implementation of this method could look something like this:

...
- (void)performBlockAndWait:(void (^)(void))block {
  // run the block on the context's queue and wait until it is done
  if (block) {
    dispatch_sync(self.queue, block);
  }
}
...

Again: The real implementation is more complex because of certain things that Apple guarantees about this method, but the implementation given here is reasonable and approximates the real implementation very well, as far as I know.

So the only thing this method does is to use the queue of self and execute block on it in a synchronous fashion. Nothing more. Nothing less. There is almost no magic involved at all. -performBlock: (NSManagedObjectContext) looks almost the same.

-performBlock: (NSManagedObjectContext)

The following snippet shows a possible implementation of the method -performBlock: (NSManagedObjectContext):

...
- (void)performBlock:(void (^)(void))block {
  // enqueue the block on the context's queue and return immediately
  if (block) {
    dispatch_async(self.queue, block);
  }
}
...

The only difference: Instead of using dispatch_sync this method is using dispatch_async which executes the passed block asynchronously.
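From the caller's point of view the difference looks like this. The following is a minimal sketch, assuming a private queue context and an entity name that exists in your model; the function name CountObjects is made up for illustration.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper. `context` is assumed to be a private queue context,
// `entityName` is assumed to name an entity in the managed object model.
static void CountObjects(NSManagedObjectContext *context, NSString *entityName) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:entityName];

    // Asynchronous: -performBlock: returns immediately and the block runs
    // later on the queue of the context.
    [context performBlock:^{
        NSError *error = nil;
        NSUInteger count = [context countForFetchRequest:request error:&error];
        if (count == NSNotFound) {
            NSLog(@"Counting failed: %@", error);
            return;
        }
        NSLog(@"Asynchronous count: %lu", (unsigned long)count);
    }];

    // Synchronous: the caller is blocked until the block has finished,
    // so the result can be used right after the call.
    __block NSUInteger count = NSNotFound;
    [context performBlockAndWait:^{
        count = [context countForFetchRequest:request error:NULL];
    }];
    if (count != NSNotFound) {
        NSLog(@"Synchronous count: %lu", (unsigned long)count);
    }
}
```

In both cases the context is only touched from inside the block, and that is the whole contract: everything you do with a queue-based context and its objects has to happen inside one of these two methods.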

Now you know roughly how the concurrency type and nested context magic is implemented, and I hope that you share my opinion:

There is nothing really special about it. I would even go so far as to say that it is not really helpful. Why do I say that? Let's imagine a typical task that one could solve by using nested contexts and concurrency types: Your app is using a web service to retrieve some data. This is done in the background with some kind of HTTP client library. So you could create a child context and perform a couple of fetches to determine which HTTP calls you have to make. Then you execute those HTTP calls and when they are done you try to incorporate the result back into your child context. The result could be the deletion of certain managed objects. It could also be the creation of managed objects, or the result could make you modify a set of objects. Of course a combination of those things is also possible. You perform your changes on the managed object context and then you call -save:. Let's assume that you have deleted an object in your child context. How do you know that it still exists? You don't. At this point Core Data could throw its hands up in the air, make some coffee, throw an exception or do something else. The fact is: By merely using nested contexts and concurrency types you have gained nothing except random crashes.

How to do concurrency with Core Data?

I have shown that a private queue concurrency type context creates its own queue. Creating a new context of this kind creates a new queue. Now try to apply this to your own application design: You have an app that is not performing so well. You decide to perform work in the background. How do you do that? Are you doing that by creating queues and using dispatch_async more or less randomly? You could do it that way, but it increases the complexity of your app. When I have a look at a random Core Data based application I simply count the number of -performBlock: and -performBlockAndWait: calls, and it gives me a good idea of the overall complexity of the app. Here is what I do when I want to do concurrency with Core Data:

  1. I create a single main queue managed object context.
  2. I am using the context for everything: Executing big fetches, executing big saves, …
  3. I realize: Most of the time the performance is quite okay! I am happy!
  4. I try to not execute fetch requests explicitly. On OS X I try to use NSArrayController and on iOS I am using NSFetchedResultsController whenever possible. Executing fetches manually is possible in Core Data but every time you do it you should have a very very good reason.
  5. I avoid frameworks like MagicalRecord (sorry Saul). Not because these frameworks suck (they don't) but because they make fetching easy for me. I do not want that. I want to think very hard about every fetch I make.
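On iOS, steps 1 and 4 of this list could look roughly like the following. This is a minimal sketch under stated assumptions: the entity name "Person" and its attribute "name" are placeholders for whatever your model contains, and the function name MakeResultsController is made up for illustration.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper. `mainContext` is assumed to be the single main queue
// context of the app. "Person" and "name" are placeholder model names.
static NSFetchedResultsController *MakeResultsController(NSManagedObjectContext *mainContext) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Person"];
    // NSFetchedResultsController requires at least one sort descriptor.
    request.sortDescriptors = @[ [NSSortDescriptor sortDescriptorWithKey:@"name"
                                                               ascending:YES] ];
    // Load the objects in batches instead of all at once.
    request.fetchBatchSize = 20;

    NSFetchedResultsController *controller =
        [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                            managedObjectContext:mainContext
                                              sectionNameKeyPath:nil
                                                       cacheName:nil];
    NSError *error = nil;
    if (![controller performFetch:&error]) {
        NSLog(@"Fetch failed: %@", error);
    }
    return controller;
}
```

After this one fetch the controller keeps its results up to date by observing the context, so the user interface does not have to fetch again after every change.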

So basically I try to avoid multiple contexts for as long as possible. In most cases (especially on OS X) you will be fine. This keeps the complexity of your logic low. At some point however, you may have to use multiple contexts. In most cases the reason for this is a very specific use case which needs some concurrency to really shine. Now it is your obligation to think of a policy for this use case. Forget everything about Core Data when first drafting this policy. Thinking about a policy has nothing to do with Core Data. This is a general side effect of introducing concurrency. Let's make it more concrete:

You have to use a couple of objects and compute something highly complex. The result of the computation can be an arbitrary change to the object graph: Inserting objects, deleting objects and/or updating objects. This computation should happen in the background and at some point the result of the computation has to be applied to the object graph. A problematic workflow could be like this:

  1. I use objects A, B and C as input for my complex computation.
  2. The computation is started in the background.
  3. While the computation is going on the user deletes object A and saves the root context.
  4. At some point the complex computation is finished and the computation determines that object A has to be updated.
  5. A is updated but the app crashes because A no longer exists.

A typical Core Data related exception that you see in this case is a 'could not fulfill a fault' exception or a merge error while saving. This is what you get when you apply concurrency naively without using a policy. A basic policy (which may not cover every possible case) for this example could be that while the complex computation is running, A, B and C can be neither changed nor deleted by the user. There are many ways to implement a policy like this:

  • Showing a modal dialog that disallows changes to everything while the computation is running (not really concurrent, is it?).
  • Only "lock" objects A, B and C: If the user tries to change or delete them while the computation is running, tell the user to either cancel the computation or wait for it to finish.
  • Let the user make any change to the object graph but before applying the changes make sure they can be applied. For example make sure that objects which will be modified do exist.
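The last option in this list can be sketched with -existingObjectWithID:error:, which returns nil if the object cannot be found anymore. This is only a minimal sketch of one possible check, not a complete policy; the function name ApplyChangesIfObjectExists and the `changes` block are made up for illustration.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: applies `changes` to the object with the given ID,
// but only if that object still exists. Returns YES if the changes were
// applied and saved, NO otherwise.
static BOOL ApplyChangesIfObjectExists(NSManagedObjectContext *context,
                                       NSManagedObjectID *objectID,
                                       void (^changes)(NSManagedObject *object)) {
    __block BOOL applied = NO;
    [context performBlockAndWait:^{
        NSError *error = nil;
        // Unlike -objectWithID:, this returns nil when the object is gone
        // instead of handing out a fault that can never be fulfilled.
        NSManagedObject *object = [context existingObjectWithID:objectID
                                                           error:&error];
        if (object == nil || [object isDeleted]) {
            // Deleted in the meantime: skip the update.
            return;
        }
        changes(object);
        applied = [context save:&error];
        if (!applied) {
            NSLog(@"Saving failed: %@", error);
        }
    }];
    return applied;
}
```

The background computation would hand over the object IDs of A, B and C (object IDs can be passed between contexts, managed objects cannot) and call a helper like this once it is done. What should happen when the helper returns NO is exactly the part your policy has to define.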

I know, this sounds like a lot of work and it cannot be automated or solved in a generic fashion. Coming up with a policy is a creative process that requires some thinking.

Summary

To sum it up:

  1. Don't use nested contexts unless you really need them.
  2. Count the number of times you use -performBlock: and/or -performBlockAndWait:. If you use them a lot rethink your architecture. You may need a convention.
  3. If you need concurrency think of a policy and implement it.
  4. Before throwing concurrency at your problem try to improve the performance.
  5. Think about every fetch you do: NSManagedObjectContext can be seen as an in-memory cache of your store. Fetching all the time makes the context less useful.

I hope that you have learned something. If so, try to apply my suggestions to your current Core Data app and you will see many of the problems you may have had go away.