Day 6: Caching Results


I was hoping to avoid this step. However, as soon as I started uploading images to my App Engine-base application, I found issues when trying to display all of my pictures. Why? Data contention! The reason why is kind of interesting. The application follows this flow:

  1. User logs in.
  2. Application retrieves list of images for user and generates a list of ImageFile that is processed by Django.
  3. Each ImageFile produces a URL that causes the application to lookup the current user, lookup the requested image. If the image is public, the JPEG for the image is returned. If the image is private, bytes are returned to the owner only.
  4. Images are requested.

For six images, the current owner would be retrieved from the data store no less than seven times. Sometimes, this would happen so fast that the current owner information could not be retrieved from the database due to Google throttling access to the item. In the application logs, I saw this information:

too much contention on these datastore entitites. please try again.

What is going on here? App Engine quickly identified that the application was hitting some piece of data fairly frequently within a short period of time. This type of access pattern across a large datastore can cause performance issues if everyone uses the same, bad practice. Even the same application can cause its own set of headaches. So, App Engine identifies this through behavior and throttles access. What is a developer to do? They should cache information, which means they should use google.appengine.api.memcache. memcache supports adding and removing single items as well as collections. If you are an ASP.NET developer, you can think of this as an externally hosted application cache. Things go into the cache and have a set duration for how long you expect them to stay before they become invalid. For our usage, we want to make sure that any authenticated user is also in our datastore. This is accomplished through a simple function:

def getCurrentUserOwner():
    currentUser = users.get_current_user()
    owner = None
    if not currentUser is None:
        # We have a valid user, which we store by email
        email = currentUser.email()
        owner = memcache.get(email)
        if (owner is None):
            owner = Owner(key_name=email)
            owner.name = email
            owner.put()
            memcache.add(email, owner, 10)
    return owner

If the memcache has this owner, we pass it back to the caller. Otherwise, we lookup the user and make sure that they exist in the datastore, then add the item to the cache so that the same work will not happen again for 10 seconds.

This is enough to know in order to start building applications on Google App Engine. You can download the source for the application from the finished app, at http://sseely-gae.appspot.com or from this site: http://www.scottseely.com/Downloads/sseely-gae.zip.

%d bloggers like this: