Archive for February, 2009

Speaking at Chicago Area Cloud Computing User Group (Feb 25, 2009)

I’ll be speaking Wednesday night at the Cloud Computing User Group in Downers Grove, IL. I have a short presentation on the main cloud computing platforms. If you are a regular reader, you know that I’ve been spending some time going beneath the surface on the major platforms. If folks like the high-level overview, I’ll do some more in-depth talks in the future.

Here are the details and the announcement that Bryce Calhoun sent out:


Join us for the third local meeting of the Cloud Computing User Group – this month in Downers Grove. At this meeting, we will be learning about how Live ID integration works in the Azure cloud computing platform. We’ll demo and dig into the code of an application built in the cloud that integrates directly with the Live ID service and stores information specific to the individual associated with that ID.

Also, Scott Seely, Architect at MySpace, will kick off the meeting with a 20-30 minute overview of the top three cloud computing offerings available today: Google App Engine, Amazon EC2 and Azure Services. His discussion will be primarily focused on a compare/contrast of the functionality and features inherent to each platform.
Please take the time to register here so we can plan appropriately for food: http://www.clicktoattend.com/?id=135321


Come on out. This is a great group and the discussions are getting to be a lot better. We are peeking under the hood now, but should be going deeper over the coming year.

Day 6: Caching Results

I was hoping to avoid this step. However, as soon as I started uploading images to my App Engine-based application, I found issues when trying to display all of my pictures. Why? Data contention! The reason why is kind of interesting. The application follows this flow:

  1. User logs in.
  2. Application retrieves list of images for user and generates a list of ImageFile that is processed by Django.
  3. Each ImageFile produces a URL that causes the application to look up the current user and then look up the requested image. If the image is public, the JPEG for the image is returned. If the image is private, bytes are returned to the owner only.
  4. Images are requested.

For six images, the current owner would be retrieved from the data store no less than seven times. Sometimes, this would happen so fast that the current owner information could not be retrieved from the database due to Google throttling access to the item. In the application logs, I saw this information:

too much contention on these datastore entities. please try again.

What is going on here? App Engine quickly identified that the application was hitting some piece of data fairly frequently within a short period of time. This type of access pattern across a large datastore can cause performance issues if everyone uses the same, bad practice. Even the same application can cause its own set of headaches. So, App Engine identifies this through behavior and throttles access. What is a developer to do? They should cache information, which means they should use google.appengine.api.memcache. memcache supports adding and removing single items as well as collections. If you are an ASP.NET developer, you can think of this as an externally hosted application cache. Things go into the cache and have a set duration for how long you expect them to stay before they become invalid. For our usage, we want to make sure that any authenticated user is also in our datastore. This is accomplished through a simple function:

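from google.appengine.api import memcache, users

# Owner is the db.Model class defined in the Day 5 post.
def getCurrentUserOwner():
    currentUser = users.get_current_user()
    owner = None
    if currentUser is not None:
        # We have a valid user, which we store by email
        email = currentUser.email()
        # Check the cache first so repeated requests skip the datastore
        owner = memcache.get(email)
        if owner is None:
            # Cache miss: write the Owner keyed by email
            owner = Owner(key_name=email)
            owner.name = email
            owner.put()
            # Keep the cached copy for 10 seconds
            memcache.add(email, owner, 10)
    return owner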

If the memcache has this owner, we pass it back to the caller. Otherwise, we make sure that the user exists in the datastore by writing an Owner keyed on their email, then add the item to the cache so that the same work will not happen again for 10 seconds.
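To see the caching pattern on its own, outside of App Engine, here is a minimal sketch. TinyCache and cached_lookup are hypothetical names made up for this illustration. They are not part of the App Engine API, and the in-memory dictionary only imitates the get/add behavior of a memcache-style client with an expiration time.

```python
import time


class TinyCache(object):
    """In-memory stand-in for a memcache-style client (hypothetical, not the App Engine API)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The entry outlived its duration; treat it as a miss.
            del self._items[key]
            return None
        return value

    def add(self, key, value, seconds):
        # Only store the value if the key is not already cached.
        if self.get(key) is not None:
            return False
        self._items[key] = (value, self._clock() + seconds)
        return True


def cached_lookup(cache, key, load, seconds=10):
    """Return the cached value for key, calling load(key) only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.add(key, value, seconds)
    return value
```

With this shape, seven lookups for the same key inside the 10 second window call load only once. That is the effect we want for the page that shows six images.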

This is enough to know in order to start building applications on Google App Engine. You can download the source for the application from the finished app, at http://sseely-gae.appspot.com or from this site: http://www.scottseely.com/Downloads/sseely-gae.zip.

Deprecating a Web Service

Once upon a time, I worked for a company on a project to build a Web service stack. Prior to that, I wrote a number of articles for that company, including one that wound up being used as the basis for versioning efforts across several industry leaders (the latter based on conversations and personal e-mails). After all these years, people still come to me and ask how to inform consumers of the interface that the version they use is about to go away. Today, Amazon SimpleDB provided a great example of how to inform customers of a new interface. The announcement was well done, short, and was communicated to all users via e-mail as well as posted publicly on the Internet. Amazon provided this statement, which I think is a great way to communicate that things are about to change as well as why they are changing:

Dear Amazon SimpleDB Developers,

In response to direct customer feedback, we announced today the release of count and long-running query support for our Select API. We have consistently heard that query syntax more like SQL simplifies the transition to SimpleDB, as well as lowering concerns surrounding lock-in. Thus, after much deliberation, we have decided to focus this and future development efforts on the Select API and begin the process of deprecating the existing WSDL.

In the coming weeks, we will be publishing a new WSDL version which excludes the Query and QueryWithAttributes APIs. In addition, a migration guide will be released to help SimpleDB developers make the transition. Upon release of this new WSDL and migration guide, we will begin a 15-month deprecation process for the 2007-11-07 WSDL. During this 15-month interval, we will continue to support, but add no new functionality to the 2007-11-07 WSDL. At the end of 15 months, it will no longer be available. Today’s announcement is intended to give as much advance notice as possible to our developer community of our intentions. If you have any questions or concerns, please do not hesitate to contact us via the forums.

Sincerely,

The Amazon Web Services Team

Day 5: Saving Information in App Engine

When you think about data storage with respect to App Engine, it is better to think of this as a mechanism to persist object data than to think about rows and columns that one worries about in a traditional relational database management system (RDBMS). Instead of defining tables, the developer defines Python objects. A persistable object inherits from one of the following types from the google.appengine.ext.db module:

  • Model: A data model whose properties are all defined within the class definition.
  • Expando: A data model whose properties are determined dynamically.
  • PolyModel: Allows for models that use inheritance.

Each value in the data class must be an instance of a property type. There are property types for strings, numbers, dates, booleans, blobs, lists, postal addresses, geographic points, and other values. You can see the complete list in the App Engine docs. All model objects have a unique key, available through key(). The key isn’t just unique across the type; it is unique across all objects. In the photo application, we have a fairly simple set of data: people who own pictures and the pictures plus metadata. Owners are fairly simple: we only need to be able to look them up by name or key. Recalling that ALL of our objects automatically have a key, the owner model is exceptionally simple:

class Owner(db.Model):
    name=db.StringProperty()

This says that we have an Owner that inherits from db.Model. Owner has one property, name, that is a String.

class ImageFile(db.Model):
    owner = db.ReferenceProperty(Owner)
    caption = db.StringProperty()
    description = db.TextProperty()
    image = db.BlobProperty()
    date = db.DateTimeProperty(auto_now_add=True)
    public = db.BooleanProperty()

Every ImageFile has an Owner. The owner property is a ReferenceProperty. The value of a ReferenceProperty is the same as the key value for the referenced item. In this case, we state that the ReferenceProperty will always reference an Owner. We allow for caption to be a string and description to be a TextProperty. A TextProperty is like a StringProperty in that both can hold Unicode characters. They differ in two ways:

  1. StringProperty values can be used for sorting. TextProperty values cannot.
  2. StringProperty supports only 500 characters. TextProperty can be larger. TextProperty is a BlobProperty with encoding.

And yes, a BlobProperty is what you think it is: a block of bytes. For the date on the ImageFile, you will see that the item has auto_now_add set to True. This forces the date to be set to the current time when the Model is added to the data store.

App Engine has standard Create|Retrieve|Update|Delete (CRUD) operations. Objects that inherit from Model|Expando|PolyModel merge Create|Update into one operation: put(). Delete is db.delete(). Retrieval works either by key, through db.get(), or by query. For example, to get all the ImageFile objects for a known user, one would write this query:

images = db.GqlQuery("SELECT * FROM ImageFile WHERE owner=:1", owner)

The preceding code assumes a valid Owner object. In the query, the owner.key() value will appear in place of :1. If you know the key for the image you want to extract, then you can get the item directly:

image = db.get(self.request.get("id"))

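The preceding gets the ID from the query string of a request. Finally, with an object in hand, one can also delete the object. Because the check below calls key() on currentUser, it assumes that currentUser holds the Owner entity for the logged-in user: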

requestedImage = db.get(self.request.get("id"))
if requestedImage.owner.key() == currentUser.key():
    db.delete(requestedImage)

And this is all one needs to know in order to handle manipulating data in App Engine. Please note that this is a fairly simple application with simple needs. This sample application has been proving itself as a good way to learn the basics. It is also bringing up things to investigate further:

  • Are cascading deletes handled?
  • Is referential integrity handled?
  • Can I do batch updates?
  • Can I delete the results of a query?

But, those are questions to be answered on another day.

Day 4: Implementing Authentication

When writing an application, it is always handy when your platform of choice offers some sort of plug and play authentication mechanism. For example, ASP.NET has authentication providers that allow users to authenticate against Windows accounts or against username/password combinations in a data store. For Internet-based applications, there has been a large push to separate authentication from authorization. The reason for this is simple: users want to use a large number of applications and need to be authenticated for each application. Each of these applications has a choice to make: use a custom username/password that only it knows about or trust a third party to provide an identity. Many web sites choose to implement their own username/password list. Over time, this simply means that people tend to use the same username/password combination on all sites that they access. The downside here is that if one site is compromised, the attacker has probably gained access to many identities on several more popular sites. A site like eBay or Amazon probably won’t be compromised. But a less popular site with only a few dozen or a few hundred users, written by a hobbyist, may be easily compromised. If just one or two of those users have an active account at a site like Amazon, those users might find themselves holding a lot of debt.

To work around this issue, Microsoft released a product originally called Passport, today known as Windows Live ID. Its original goal was to give Microsoft a way to vouch for someone’s identity independent of the rights of that identity. Believe it or not, Microsoft was ahead of its time when creating this authentication mechanism. After learning a number of lessons about how to set up identity providers and the parties that rely on those providers, the industry came together and created OpenID. If you are really interested in how OpenID works, I encourage you to spend some quality time on the OpenID web site. So, what do we need to know? OpenID put together specifications that state how the authenticators, known as OpenID Providers (OPs), work with Relying Parties (RPs). RPs typically have a fixed set of OPs that they will use for authentication. The RP has to trust the identities provided to it by the OP. An OP that simply verifies every claimed identity, or refutes every one, is easy to write and of no value in verifying someone’s identity.

The long story about OpenID is important– when building a larger web site, you want to use OpenID in your finished site. It allows you to broaden your appeal by letting someone use an established identity. That said, you need to start with something. Having to do minimal work to get delegated authentication up and running sounds good. You can start out by choosing to trust Google identities. App Engine makes this all incredibly easy.

You start out by importing the users module:

from google.appengine.api import users

From here, it is a simple one-liner to get the current user:

currentUser = users.get_current_user()

If currentUser is set to None, we know that no one has logged in using Google authentication. If the return value is not None, we can get information about the user, such as their nickname and e-mail address. We can use this information to personalize the page a bit for each user. The get handler for the main page (which inherits from webapp.RequestHandler) does this with the following code:

    def get(self):
        loggedIn = False
        username = "test"
        currentUser = users.get_current_user()
        if currentUser:
            # Signed in: offer a logout link and pick a display name
            url = users.create_logout_url(self.request.uri)
            url_linktext = 'Logout'
            loggedIn = True
            username = currentUser.nickname().strip()
            if len(username) < 1:
                # Fall back to the e-mail address when there is no nickname
                username = currentUser.email().strip()
        else:
            # Not signed in: offer a login link
            url = users.create_login_url(self.request.uri)
            url_linktext = 'Login'

        template_values = {
            'url': url,
            'url_linktext': url_linktext,
            'loggedIn': loggedIn,
            'username': username,
            }

        path = os.path.join(os.path.dirname(__file__), 'Pages/index.html')
        self.response.out.write(template.render(path, template_values))

The users module contains two helper functions to create login and logout URLs. The rest of this information is passed to our Django template for display on the page:

<div id="Columns">
    <div id="LeftColumn">
        <a href="{{url}}">{{url_linktext}}</a>
    </div>
    <div id="RightColumn">
        {% if loggedIn %}
        Hello, {{username}}!
        {% else %}
        Please log in.
        {% endif %}
    </div>
</div>

Day 3: Responding to Requests in App Engine

When a request comes into a web server, that request has to be either processed or rejected. The request may be for static or dynamic content. So long as the web server can find something to respond to the request, the server will try. Otherwise, it will inform the caller that the resource cannot be found. Google App Engine supports both static and dynamic content. Our app.yaml file tells the server how to respond to different request types.

Handling Static Content in App Engine

When creating a web site, the site typically contains a fair number of static resources. These resources include cascading style sheets (CSS), JavaScript files (JS), images, static HTML, and so on. Most developers will structure their project folders something like this:

/Project
    /pages
    /scripts
    /theme
    /images

where pages contains static (or templatized) HTML, scripts contains JavaScript, theme contains elements for displaying the site (CSS and images), and images contains any stock content that is not specific to the theme. Additionally, a site will frequently want to customize the icon shown in the browser address bar by setting favicon.ico. To inform App Engine how to find this content, we write the following in our app.yaml:

handlers:
- url: /theme
  static_dir: theme
- url: /scripts
  static_dir: scripts
- url: /pages
  static_dir: pages
- url: /favicon.ico
  static_files: favicon.ico
  upload: favicon.ico
  mime_type: application/octet-stream

Each of the first three entries contains a static_dir directive. This directive says that when someone requests data from the theme, scripts, or pages directory, go look in that directory and return any matching file. When we upload the application, the App Engine SDK will push up all the contents of these directories.

The work for favicon.ico is a bit different. This icon file sits in the root of the application and is requested by the browser. Because it is a single file, we map the url to the actual file, tell the toolkit which file to upload, and finally set the MIME type to use when sending the file as a response. octet-stream works for favicon.ico on all major browsers (Internet Explorer, Firefox, Opera, Chrome, and Safari).
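To make the mapping concrete, here is a rough sketch of how a request path could be turned into a file path from entries like the ones above. This is only an illustration of the idea, not how App Engine implements it. HANDLERS and resolve_static are hypothetical names, and the sketch ignores details such as MIME types and guarding against unsafe paths.

```python
import re

# Hypothetical, simplified mirror of the app.yaml entries above.
HANDLERS = [
    {"url": "/theme", "static_dir": "theme"},
    {"url": "/scripts", "static_dir": "scripts"},
    {"url": "/pages", "static_dir": "pages"},
    {"url": "/favicon.ico", "static_files": "favicon.ico"},
]


def resolve_static(path, handlers=HANDLERS):
    """Return the file path (relative to the app root) for a request path, or None."""
    for handler in handlers:
        url = handler["url"]
        if "static_dir" in handler:
            # Directory handlers match the URL prefix; the rest of the path names the file.
            if path.startswith(url + "/"):
                return handler["static_dir"] + path[len(url):]
        elif "static_files" in handler:
            # Single-file handlers must match the whole request path.
            if re.match(url + "$", path):
                return handler["static_files"]
    return None
```

For example, resolve_static("/theme/site.css") walks the list from the top, matches the /theme entry, and yields theme/site.css. A path that matches no entry yields None, which corresponds to the "resource cannot be found" case.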

Handling Dynamic Content in App Engine

As with static content, we need to tell the web server what to do when a request for dynamic content comes in. Static content was handled by returning a file. Dynamic content is handled by passing the request off to a Python file. We tell App Engine how to do this through the following settings in app.yaml:

- url: /
  script: HelloWorld.py

With this, the request gets passed to HelloWorld.py. Python supports something called the Web Server Gateway Interface, aka WSGI. Through WSGI, one can map an incoming request to a specific class within the file.

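from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

# Map the path / to the MainPage request handler class
application = webapp.WSGIApplication(
                                     [('/', MainPage)],
                                     debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()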

The preceding code instantiates a variable, application, that maps the path / to a class named MainPage. This also turns on debugging for all paths so that any exceptions or other errors generated during request processing will be displayed in the response. To make it easier to respond to messages, the MainPage class inherits from google.appengine.ext.webapp.RequestHandler. This is a handy base class. As a developer, you decide which HTTP methods you need to handle and RequestHandler does the rest. To handle requests for data, one simply implements a method named get and the rest works out. To handle the standard HTTP methods, you would implement some combination of the following:

  • get
  • delete
  • put
  • post
  • head

This model allows one to respond directly to requests by writing all of the response information in code:

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.out.write("<html><body>Hello, World!</body></html>")
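The idea of implementing only the HTTP methods you care about can be sketched in plain Python. SketchHandler and HelloPage are hypothetical classes written for this illustration. They are not the real webapp.RequestHandler, and answering 405 for a method that was not implemented is an assumption of the sketch.

```python
class SketchHandler(object):
    """Hypothetical base class; not the real webapp.RequestHandler."""

    SUPPORTED = ("get", "post", "put", "delete", "head")

    def handle(self, http_method):
        # Look for a method on the subclass named after the HTTP verb.
        name = http_method.lower()
        method = getattr(self, name, None) if name in self.SUPPORTED else None
        if method is None:
            # No handler method defined for this verb.
            return 405, ""
        return 200, method()


class HelloPage(SketchHandler):
    def get(self):
        return "Hello, World!"
```

Here HelloPage only defines get, so a GET request reaches that method and a POST request is rejected by the base class without any extra code in HelloPage.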

You can generate whatever content you like and put everything out using code. A more pleasant alternative is to use a web templating framework where you fill in values and then code the HTML creation based on those values. App Engine supports several web frameworks, including the popular Django. I did a minor dive into Django’s template language. To return a template, one simply loads up a set of values and then asks Django to process a template using those values. I was able to design the home page in an HTML editor. To set a few values and then load the template from a file relative to the current Python script, one writes:

template_values = {
    'username': username,
    }

path = os.path.join(os.path.dirname(__file__), 'pages/index.html')
self.response.out.write(template.render(path, template_values))

The values are consumed in the html via markup like this:

Hello, {{ username }}!
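To get a feel for what the substitution does, here is a tiny stand-in written with a regular expression. It is not Django and render_sketch is a hypothetical name. It handles only {{ name }} placeholders, with no tags, filters, or escaping.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_sketch(template_text, values):
    """Replace each {{ name }} with values[name]; unknown names render as empty text."""
    def substitute(match):
        return str(values.get(match.group(1), ""))
    return _PLACEHOLDER.sub(substitute, template_text)
```

Calling render_sketch with the markup above and a dictionary that has a username entry fills in the greeting. The real template engine does much more, including the if/else blocks used on the home page.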

Utility to Replace \r\n with \n, in F#

In my quest to learn F# better so that I might figure out when this tool makes sense, I have been trying to use the language whenever possible/feasible. Just such an opportunity happened while delving into Google App Engine. I found I could get rid of a little warning in the GAE development environment by converting all files from Windows style CRLF (\r\n) to a more Unix like LF (\n). Out came F# to solve this little problem! This application was moderately frustrating to write because of the puzzling syntax errors. I’m still working on getting the feel for F# and am trying to use it as my preferred hammer (realizing full well that “when all you have is a hammer, everything looks like a nail”). I have to admit, it was a fun little application to write!

 

#light
open System
open System.IO

// Read the command line.
let args = Environment.GetCommandLineArgs()

// Figure out the directory to use. With 1 argument, we only
// have the app. 2 or more arguments means a directory was passed in.
let directory =
    match args.GetLength(0) with
    | 1 -> Environment.CurrentDirectory
    | n -> (string) (args.GetValue(1))

// Figure out if the directory is real.
let directoryExists = Directory.Exists(directory)

// Recursive function to process the files in a directory
let rec processDirectory (dirInfo: DirectoryInfo ) =
    for file in dirInfo.GetFiles() do
        printfn "Writing file %s" file.FullName
        let fileContents = File.ReadAllText(file.FullName)
        let modifiedContents = fileContents.Replace("\r\n", "\n")
        File.WriteAllText(file.FullName, modifiedContents)
    for dir in dirInfo.GetDirectories() do
        processDirectory(dir)

// Kick off the work
let processFiles =
    match directoryExists with
    | false ->  printf "Could not find %s" directory
    | true ->   processDirectory(new DirectoryInfo(directory))

 

For those of you who are used to C# development, debugging these things can be tricky. If you build and run the code as an EXE, instead of using the interactive environment, here are a couple pointers/reminders. First, code like that shown above lives in a static class named after the file containing the code. In my case, the class is named Program. Second, all the class values are static properties on that class. To watch those values, either add the Program class to the Watch window or request values off of Program from the Immediate Window.
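Since the files being converted belong to a Python project, the same job can also be sketched in Python. This is a rough equivalent rather than a port of the F# code: convert_crlf_to_lf is a hypothetical name, it works on bytes instead of text, and it only rewrites files that actually change. Like the F# version, it makes no attempt to tell text files from binary ones, so point it at source folders only.

```python
import os


def convert_crlf_to_lf(directory):
    """Rewrite every file under directory with CRLF replaced by LF; return the files changed."""
    changed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                original = handle.read()
            converted = original.replace(b"\r\n", b"\n")
            if converted != original:
                # Only touch files that contained at least one CRLF.
                with open(path, "wb") as handle:
                    handle.write(converted)
                changed.append(path)
    return changed
```

The function walks subdirectories through os.walk, which plays the role of the recursive processDirectory function above.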
