Archive for February, 2009

Speaking at Chicago Area Cloud Computing User Group (Feb 25, 2009)

I’ll be speaking Wednesday night at the Cloud Computing User Group in Downers Grove, IL. I have a short presentation on the main cloud computing platforms. If you are a regular reader, you know that I’ve been spending some time going beneath the surface on the major platforms. If folks like the high-level overview, I’ll do some more in-depth talks in the future.

Here are the details and the announcement that Bryce Calhoun sent out:


Join us for the third local meeting of the Cloud Computing User Group – this month in Downers Grove. At this meeting, we will be learning about how Live ID integration works in the Azure cloud computing platform. We’ll demo and dig into the code of an application built in the cloud that integrates directly with the Live ID service and stores information specific to the individual associated with that ID.

Also, Scott Seely, Architect at MySpace, will kick off the meeting with a 20-30 minute overview of the top three cloud computing offerings available today: Google App Engine, Amazon EC2 and Azure Services. His discussion will be primarily focused on a compare/contrast of the functionality and features inherent to each platform.
Please take the time to register here so we can plan appropriately for food: http://www.clicktoattend.com/?id=135321


Come on out. This is a great group and the discussions are getting to be a lot better. We are peeking under the hood now, but should be going deeper over the coming year.


Day 6: Caching Results

I was hoping to avoid this step. However, as soon as I started uploading images to my App Engine-based application, I ran into problems displaying all of my pictures. Why? Data contention! The reason is kind of interesting. The application follows this flow:

  1. User logs in.
  2. Application retrieves the list of images for the user and generates a list of ImageFile objects that Django renders into the page.
  3. Each ImageFile produces a URL that causes the application to look up the current user and then look up the requested image. If the image is public, the JPEG for the image is returned. If the image is private, the bytes are returned to the owner only.
  4. Images are requested.

For six images, the current owner would be retrieved from the datastore no less than seven times (once for the page, then once per image). Sometimes this would happen so fast that the current owner information could not be retrieved from the datastore because Google was throttling access to the item. In the application logs, I saw this information:

too much contention on these datastore entitites. please try again.

What is going on here? App Engine quickly identified that the application was hitting the same piece of data frequently within a short period of time. This type of access pattern can cause performance issues across a large datastore if everyone follows the same bad practice, and even a single application can cause its own set of headaches. So App Engine identifies this through behavior and throttles access. What is a developer to do? Cache information, which on App Engine means using google.appengine.api.memcache. memcache supports adding and removing single items as well as collections. If you are an ASP.NET developer, you can think of this as an externally hosted application cache. Things go into the cache with a set duration for how long you expect them to stay before they become invalid. For our usage, we want to make sure that any authenticated user is also in our datastore. This is accomplished through a simple function:

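from google.appengine.api import memcache
from google.appengine.api import users

# Owner is the db.Model class from the Day 5 post.
def getCurrentUserOwner():
    """Return the Owner for the signed-in user, or None if nobody is signed in."""
    currentUser = users.get_current_user()
    owner = None
    if currentUser is not None:
        # We have a valid user, which we store by email.
        email = currentUser.email()
        owner = memcache.get(email)
        if owner is None:
            # Cache miss: make sure the Owner exists in the datastore.
            # put() with a key_name creates or overwrites that entity.
            owner = Owner(key_name=email)
            owner.name = email
            owner.put()
            # Keep the Owner in memcache for 10 seconds.
            memcache.add(email, owner, 10)
    return owner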

If the memcache has this owner, we pass it back to the caller. Otherwise, we write the owner to the datastore to make sure that they exist there, then add the item to the cache so that the same work will not happen again for 10 seconds.
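To see why a short-lived cache removes the contention, here is a minimal, stdlib-only sketch. ExpiringCache, get_owner, and load_owner are hypothetical names: ExpiringCache is an in-process stand-in that mimics the add/get behavior described above, not the google.appengine.api.memcache API, and the fake clock exists only so expiry can be shown deterministically.

```python
import time


class ExpiringCache:
    """In-process stand-in for a memcache-style store (not the App Engine API).

    Each entry expires a fixed number of seconds after it is added.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def add(self, key, value, seconds):
        # Like memcache.add(): store only if the key is not already cached.
        if self.get(key) is not None:
            return False
        self._items[key] = (value, self._clock() + seconds)
        return True


def get_owner(email, cache, load_owner):
    """Cache-aside lookup mirroring getCurrentUserOwner (hypothetical helper)."""
    owner = cache.get(email)
    if owner is None:
        owner = load_owner(email)  # the expensive datastore work
        cache.add(email, owner, 10)
    return owner


# Simulate one page request plus six image requests within 10 seconds.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
calls = []

def load_owner(email):
    calls.append(email)
    return {"name": email}

for _ in range(7):
    get_owner("someone@example.com", cache, load_owner)
print(len(calls))  # 1: only the first request touched the "datastore"

now[0] = 11.0  # move the fake clock past the 10-second lifetime
get_owner("someone@example.com", cache, load_owner)
print(len(calls))  # 2: the entry expired, so the owner was loaded again
```

Seven requests collapse into a single load, which is exactly the pattern that keeps the datastore from seeing the same entity hammered in a burst.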

That is all you need to know to start building applications on Google App Engine. You can download the source for the application from the finished app at http://sseely-gae.appspot.com or from this site: http://www.scottseely.com/Downloads/sseely-gae.zip.


Deprecating a Web Service

Once upon a time, I worked for a company on a project to build a Web service stack. Before that, I wrote a number of articles for the same company, including one that wound up being used as the basis for versioning efforts across several industry leaders (I know this from conversations and personal e-mails). After all these years, people still ask me how to tell consumers of an interface that the version they use is about to go away. Today, Amazon SimpleDB provided a great example of how to retire an old interface in favor of a new one. The announcement was short and well done, and it was communicated to all users via e-mail as well as posted publicly on the Internet. Amazon provided this statement, which I think is a great way to communicate that things are about to change as well as why they are changing:

Dear Amazon SimpleDB Developers,

In response to direct customer feedback, we announced today the release of count and long-running query support for our Select API. We have consistently heard that query syntax more like SQL simplifies the transition to SimpleDB, as well as lowering concerns surrounding lock-in. Thus, after much deliberation, we have decided to focus this and future development efforts on the Select API and begin the process of deprecating the existing WSDL.

In the coming weeks, we will be publishing a new WSDL version which excludes the Query and QueryWithAttributes APIs. In addition, a migration guide will be released to help SimpleDB developers make the transition. Upon release of this new WSDL and migration guide, we will begin a 15-month deprecation process for the 2007-11-07 WSDL. During this 15-month interval, we will continue to support, but add no new functionality to the 2007-11-07 WSDL. At the end of 15 months, it will no longer be available. Today’s announcement is intended to give as much advance notice as possible to our developer community of our intentions. If you have any questions or concerns, please do not hesitate to contact us via the forums.

Sincerely,

The Amazon Web Services Team


Day 5: Saving Information in App Engine

When you think about data storage with respect to App Engine, it is better to think of it as a mechanism to persist object data than in terms of the rows and columns one worries about in a traditional relational database management system (RDBMS). Instead of defining tables, the developer defines Python classes. A persistable class inherits from one of the following types:

  • Model (google.appengine.ext.db): A data model whose properties are all defined within the class definition.
  • Expando (google.appengine.ext.db): A data model whose properties are determined dynamically.
  • PolyModel (google.appengine.ext.db.polymodel): Allows for models that use inheritance.

Each value in the data class must be an instance of a property type. There are string, numeric, date, boolean, blob, list, postal address, geographic point, and other property types. You can see the complete list in the App Engine docs. Every object stored through these models has a unique key, returned by its key() method. The key isn’t just unique within the type; it is unique across all objects in the application. In the photo application, we have a fairly simple set of data: people who own pictures, and the pictures plus metadata. Owners are simple: we only need to be able to look them up by name or key. Recalling that ALL of our objects automatically have a key, the owner model is exceptionally simple:

from google.appengine.ext import db

class Owner(db.Model):
    name = db.StringProperty()

This says that we have an Owner that inherits from db.Model. Owner has one property, name, that is a String.

class ImageFile(db.Model):
    owner = db.ReferenceProperty(Owner)
    caption = db.StringProperty()
    description = db.TextProperty()
    image = db.BlobProperty()
    date = db.DateTimeProperty(auto_now_add=True)
    public = db.BooleanProperty()

Every ImageFile has an Owner. The owner property is a ReferenceProperty, whose stored value is the key of the referenced item. In this case, we state that the ReferenceProperty will always reference an Owner. We allow caption to be a StringProperty and description to be a TextProperty. A TextProperty is like a StringProperty in that both can hold Unicode characters. They differ in two ways:

  1. StringProperty values can be used for sorting and filtering. TextProperty values cannot.
  2. StringProperty supports only 500 characters. TextProperty can be larger; it is essentially a BlobProperty with an encoding.

And yes, a BlobProperty is what you think it is: a block of bytes. For the date on the ImageFile, you will see that the property has auto_now_add set to True. This sets the date to the current time when the entity is first added to the datastore.
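For readers who think in plain Python classes, the two models map roughly onto the dataclasses below. This is an analogy, not the App Engine db API: the key fields, STRING_PROPERTY_LIMIT, and the length check are illustrative assumptions that mirror the rules described above.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The post quotes a 500-character limit for StringProperty.
STRING_PROPERTY_LIMIT = 500


@dataclass
class Owner:
    key: str          # stands in for the key the datastore assigns
    name: str = ""


@dataclass
class ImageFile:
    key: str
    owner_key: str    # like ReferenceProperty(Owner): holds the Owner's key
    caption: str = ""             # StringProperty: length-limited
    description: str = ""         # TextProperty: no length check here
    image: bytes = b""            # BlobProperty: raw bytes
    public: bool = False
    # Approximates auto_now_add=True by stamping the time at construction;
    # the real property sets it when the entity is first stored.
    date: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        if len(self.caption) > STRING_PROPERTY_LIMIT:
            raise ValueError("caption exceeds the StringProperty limit")


owner = Owner(key="owner-1", name="someone@example.com")
photo = ImageFile(key="image-1", owner_key=owner.key, caption="At the beach")
```

The reference is just the other object's key, which is why the query in the next paragraph can compare owner against an Owner's key.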

App Engine has standard Create|Retrieve|Update|Delete (CRUD) operations. Objects that inherit from Model|Expando|PolyModel merge Create and Update into one operation: put(). Delete is db.delete(), and retrieval by key is db.get(). To retrieve objects by criteria instead, you run a query. For example, to get all the ImageFile objects for a known owner, one would write this query:

images = db.GqlQuery("SELECT * FROM ImageFile WHERE owner=:1", owner)

The preceding code assumes a valid Owner object. In the query, the owner.key() value will appear in place of :1. If you know the key for the image you want to extract, then you can get the item directly:

image = db.get(self.request.get("id"))

The preceding gets the key from the id parameter in the request’s query string. Finally, with an object in hand, one can also delete it:

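# currentUser must be an Owner entity here (a users.User has no key()).
requestedImage = db.get(self.request.get("id"))
# Only the owner may delete an image; skip keys that no longer exist.
if requestedImage is not None and requestedImage.owner.key() == currentUser.key():
    db.delete(requestedImage)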
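To make the put/get/delete/query semantics concrete without the SDK, here is a toy in-memory model. InMemoryStore and delete_if_owner are hypothetical names, entities are plain dicts, and keys are generated strings; it illustrates the behavior described above rather than how the datastore works internally.

```python
import itertools


class InMemoryStore:
    """Hypothetical stand-in for the datastore; illustrates semantics only."""

    def __init__(self):
        self._entities = {}
        self._ids = itertools.count(1)

    def put(self, entity):
        # put() covers both create and update: a new entity gets a key on
        # first save, and later saves overwrite the stored copy.
        if entity.get("key") is None:
            entity["key"] = "k%d" % next(self._ids)
        self._entities[entity["key"]] = dict(entity)
        return entity["key"]

    def get(self, key):
        # Like db.get(): returns None when no entity has that key.
        return self._entities.get(key)

    def delete(self, key):
        self._entities.pop(key, None)

    def query(self, **filters):
        # Rough analogue of "SELECT * FROM ImageFile WHERE owner=:1".
        return [e for e in self._entities.values()
                if all(e.get(name) == value for name, value in filters.items())]


def delete_if_owner(store, image_key, current_owner_key):
    """Mirror the ownership check that guards db.delete() above."""
    image = store.get(image_key)
    if image is not None and image["owner"] == current_owner_key:
        store.delete(image_key)
        return True
    return False


store = InMemoryStore()
alice = store.put({"kind": "Owner", "name": "alice@example.com"})
bob = store.put({"kind": "Owner", "name": "bob@example.com"})
photo = store.put({"kind": "ImageFile", "owner": alice, "caption": "Beach"})
```

With this setup, a delete request from bob for alice’s photo is refused, while the same request from alice removes it.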

And that is all one needs to know to handle data in App Engine for this application. Please note that this is a fairly simple application with simple needs. It has been proving itself a good way to learn the basics, and it is also raising things to investigate further:

  • Are cascading deletes handled?
  • Is referential integrity handled?
  • Can I do batch updates?
  • Can I delete the results of a query?

But, those are questions to be answered on another day.


Day 4: Implementing Authentication

When writing an application, it is always handy when your platform of choice offers some sort of plug-and-play authentication mechanism. For example, ASP.NET has authentication providers that allow users to authenticate against Windows accounts or against username/password combinations in a data store. For Internet-based applications, there has been a large push to separate authentication from authorization. The reason is simple: users want to use a large number of applications and need to be authenticated for each one. Each application has a choice to make: maintain its own username/password list or trust a third party to provide an identity. Many web sites choose to implement their own username/password list. Over time, this means that people tend to use the same username/password combination on every site they access. The downside is that if one site is compromised, the attacker has probably gained access to many identities on several more popular sites. A site like eBay or Amazon probably won’t be compromised. But a less popular site with only a few dozen or few hundred users, written by a hobbyist, may be easily compromised. If even one or two of those users reuse the same credentials at a site like Amazon, they might find themselves holding a lot of debt.

To work around this issue, Microsoft released a product originally called Passport, today known as Windows Live ID. Its original goal was to let Microsoft vouch for someone’s identity independently of the rights of that identity. Believe it or not, Microsoft was ahead of its time with this authentication mechanism. After learning a number of lessons about how to set up identity providers and the parties that rely on them, the industry came together and created OpenID. If you are really interested in how OpenID works, I encourage you to spend some quality time on the OpenID web site. So, what do we need to know? OpenID put together specifications that state how the authenticators, known as OpenID Providers (OPs), work with Relying Parties (RPs). RPs typically have a fixed set of OPs that they will use for authentication. The RP has to be able to trust the identities provided to it by the OP: an OP that blindly verifies (or blindly refutes) every claimed identity is easy to write and of no value in verifying someone’s identity.

The long story about OpenID is important: when building a larger web site, you want to support OpenID in your finished site. It broadens your appeal by letting people use an identity they already have. That said, you need to start somewhere, and getting delegated authentication up and running with minimal work sounds good. You can start out by choosing to trust Google identities. App Engine makes this all incredibly easy.

You start out by importing the users module:

from google.appengine.api import users

From here, getting the current user is a one-liner:

currentUser = users.get_current_user()

If currentUser is None, we know that no one has logged in using Google authentication. If the return value is not None, we can get information about the user, such as their nickname and e-mail address. We can use this information to personalize the page a bit for each user. The get handler of the main page class (which inherits from webapp.RequestHandler) does this with the following code:

import os

from google.appengine.api import users
from google.appengine.ext import webapp
from google.appengine.ext.webapp import template

class MainPage(webapp.RequestHandler):
    def get(self):
        loggedIn = False
        username = "test"
        currentUser = users.get_current_user()
        if currentUser:
            # Signed in: offer a logout link that returns to this page.
            url = users.create_logout_url(self.request.uri)
            url_linktext = 'Logout'
            loggedIn = True
            # Prefer the nickname; fall back to the e-mail address.
            username = currentUser.nickname().strip()
            if len(username) < 1:
                username = currentUser.email().strip()
        else:
            # Anonymous: offer a login link that returns to this page.
            url = users.create_login_url(self.request.uri)
            url_linktext = 'Login'

        template_values = {
            'url': url,
            'url_linktext': url_linktext,
            'loggedIn': loggedIn,
            'username': username,
            }

        path = os.path.join(os.path.dirname(__file__), 'Pages/index.html')
        self.response.out.write(template.render(path, template_values))

The users module provides two helper functions, create_login_url and create_logout_url, for building login and logout URLs. The rest of this information is passed to our Django template for display on the page:

<div id="Columns">
    <div id="LeftColumn">
        <a href="{{url}}">{{url_linktext}}</a>
    </div>
    <div id="RightColumn">
        {% if loggedIn %}
        Hello, {{username}}!
        {% else %}
        Please log in.
        {% endif %}
    </div>
</div>
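Because the handler mixes App Engine calls with simple decisions, one way to unit-test those decisions is to pull them into a plain function. This is a sketch: FakeUser and template_values_for are hypothetical names, and FakeUser only imitates the nickname() and email() methods the handler calls.

```python
class FakeUser:
    """Hypothetical stand-in for the object users.get_current_user() returns."""

    def __init__(self, nickname, email):
        self._nickname = nickname
        self._email = email

    def nickname(self):
        return self._nickname

    def email(self):
        return self._email


def template_values_for(current_user, login_url, logout_url):
    """Make the same choices as the get() handler: link, flag, display name."""
    if current_user is None:
        return {"url": login_url, "url_linktext": "Login",
                "loggedIn": False, "username": "test"}
    # Prefer the nickname; fall back to the e-mail address when it is blank.
    username = current_user.nickname().strip() or current_user.email().strip()
    return {"url": logout_url, "url_linktext": "Logout",
            "loggedIn": True, "username": username}
```

The handler would then only call users.create_login_url/create_logout_url and pass the results in, keeping the SDK-specific code thin.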


Day 3: Responding to Requests in App Engine

When a request comes into a web server, that request has to be either processed or rejected. The request may be for static or dynamic content. As long as the web server can find something that matches the request, it will try to respond. Otherwise, it will inform the caller that the resource cannot be found (HTTP 404). Google App Engine supports both static and dynamic content. Our app.yaml file tells the server how to respond to different request types.

Handling Static Content in App Engine

When creating a web site, the site typically contains a fair number of static resources. These resources include cascading style sheets (CSS), JavaScript files (JS), images, static HTML, and so on. Most developers structure their project folders something like this

/Project
    /pages
    /scripts
    /theme
    /images

where pages contains static (or templatized) HTML, scripts contains JavaScript, theme contains elements for displaying the site (CSS and images), and images contains any stock content that is not specific to the theme. Additionally, a site will frequently want to customize the icon shown in the browser address bar by setting favicon.ico. To inform App Engine how to find this content, we write the following in our app.yaml:

handlers:
- url: /theme
  static_dir: theme
- url: /scripts
  static_dir: scripts
- url: /pages
  static_dir: pages
- url: /favicon.ico
  static_files: favicon.ico
  upload: favicon.ico
  mime_type: application/octet-stream

The first three entries use a static_dir directive. This directive says that when someone requests data under the theme, scripts, or pages URL path, App Engine should look in the matching directory and return the requested file. When we upload the application, the App Engine SDK will push up all the contents of these directories.

The work for favicon.ico is a bit different. This icon file sits in the root of the application and is requested by the browser. Because it is a single file, we map the URL to the actual file, tell the toolkit which file to upload, and finally set the MIME type to use when sending the file as a response. octet-stream works for favicon.ico on all major browsers (Internet Explorer, Firefox, Opera, Chrome, and Safari).
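App Engine does this matching itself; to make the mapping concrete, here is a simplified, hypothetical model of the handlers above in plain Python. resolve_static, STATIC_DIRS, and STATIC_FILES are illustrative names, and the real server’s matching rules are richer than this prefix check.

```python
import posixpath

# Hypothetical Python model of the static handlers in app.yaml above.
STATIC_DIRS = {"/theme": "theme", "/scripts": "scripts", "/pages": "pages"}
STATIC_FILES = {"/favicon.ico": "favicon.ico"}


def resolve_static(url_path):
    """Return the app-relative file for url_path, or None if nothing matches."""
    if url_path in STATIC_FILES:
        return STATIC_FILES[url_path]
    for prefix, directory in STATIC_DIRS.items():
        if url_path.startswith(prefix + "/"):
            rest = posixpath.normpath(url_path[len(prefix) + 1:])
            # Refuse paths that would escape the static directory.
            if rest.startswith("..") or posixpath.isabs(rest):
                return None
            return posixpath.join(directory, rest)
    return None  # no static handler matched
```

For example, /theme/site.css resolves to theme/site.css, while /scripts/../app.yaml is refused rather than exposing a file outside the static folder.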

Handling Dynamic Content in App Engine

As with static content, we need to tell the web server what to do when a request for dynamic content comes in. Static content was handled by returning a file. Dynamic content is handled by passing the request off to a Python file. We tell App Engine how to do this through the following settings in app.yaml:

- url: /
  script: HelloWorld.py

With this, the request gets passed to HelloWorld.py. Python supports something called the Web Server Gateway Interface, aka WSGI. Through WSGI, and App Engine’s webapp framework built on top of it, one can map an incoming request to a specific class within the file.

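from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

# MainPage is the RequestHandler subclass defined earlier in HelloWorld.py.
application = webapp.WSGIApplication(
                                     [('/', MainPage)],
                                     debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()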

The preceding code instantiates a variable, application, that maps the path / to a class named MainPage. This also turns on debugging for all paths so that any exceptions or other errors generated during request processing will be displayed in the response. To make it easier to respond to requests, the MainPage class inherits from google.appengine.ext.webapp.RequestHandler. This is a handy base class: as a developer, you decide which HTTP methods you need to handle, and RequestHandler does the rest. To handle GET requests, one simply implements a method named get. To handle the standard HTTP methods, you would implement some combination of the following:

  • get
  • delete
  • put
  • post
  • head
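webapp hides the plumbing, but the idea behind WSGIApplication plus RequestHandler can be sketched with nothing but the standard library. Router, ALLOWED_METHODS, and call are hypothetical names, and this is not how webapp is implemented internally; it is only a minimal illustration of mapping a path to a class and then dispatching on the HTTP method name.

```python
from wsgiref.util import setup_testing_defaults

# Method names a handler may implement, per the list above.
ALLOWED_METHODS = {"get", "post", "put", "delete", "head"}


class MainPage:
    def get(self):
        return "<html><body>Hello, World!</body></html>"


class Router:
    """Hypothetical WSGI application mapping exact paths to handler classes."""

    def __init__(self, routes):
        self.routes = dict(routes)

    def __call__(self, environ, start_response):
        handler_class = self.routes.get(environ.get("PATH_INFO", "/"))
        if handler_class is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        # Dispatch on the HTTP method: GET -> get(), POST -> post(), and so on.
        name = environ.get("REQUEST_METHOD", "GET").lower()
        method = getattr(handler_class(), name, None) if name in ALLOWED_METHODS else None
        if method is None:
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"Method Not Allowed"]
        body = method().encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [body]


application = Router([("/", MainPage)])


def call(path, method="GET"):
    """Invoke the app directly with a minimal test environ."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    statuses = []
    body = application(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(body)
```

Here MainPage only implements get, so a POST to / is answered with 405 while an unknown path gets 404.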

This model allows one to respond directly to requests by writing all of the response information in code:

from google.appengine.ext import webapp

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.out.write("<html><body>Hello, World!</body></html>")

You can generate whatever content you like and write everything out using code. A more pleasant alternative is to use a web templating framework: you write the HTML once, with placeholders, and fill in the values for each request. App Engine supports several web frameworks, including the popular Django. I took a quick dive into Django’s template language. To return a template, one simply loads up a set of values and then asks Django to process a template using those values. I was able to design the home page in an HTML editor. To set a few values and then load the template from a file relative to the current Python script, one writes:

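import os

from google.appengine.ext.webapp import template

# Inside a RequestHandler method; username was computed earlier.
template_values = {
    'username': username,
    }

path = os.path.join(os.path.dirname(__file__), 'pages/index.html')
self.response.out.write(template.render(path, template_values))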

The values are consumed in the HTML via markup like this:

Hello, {{ username }}!
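To show what variable substitution amounts to, here is a deliberately tiny stand-in for the {{ name }} part of Django’s template language. render is a hypothetical helper: it handles only simple variables (no tags such as {% if %}), renders unknown names as empty strings, and HTML-escapes values as a design choice of this sketch, not as a claim about the Django version bundled with App Engine.

```python
import html
import re

# Matches simple variables such as {{ username }} or {{username}}.
_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template_text, values):
    """Substitute {{ name }} placeholders; unknown names render as ''."""
    def substitute(match):
        return html.escape(str(values.get(match.group(1), "")))
    return _VARIABLE.sub(substitute, template_text)


print(render("Hello, {{ username }}!", {"username": "Scott"}))  # Hello, Scott!
```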


Utility to Replace \r\n with \n, in F#

In my quest to learn F# better so that I might figure out when this tool makes sense, I have been trying to use the language whenever possible/feasible. Just such an opportunity came up while delving into Google App Engine. I found I could get rid of a little warning in the GAE development environment by converting all files from Windows-style CRLF (\r\n) to Unix-style LF (\n). Out came F# to solve this little problem! This application was moderately frustrating to write because of puzzling syntax errors. I’m still getting a feel for F# and am trying to use it as my preferred hammer (realizing full well that “when all you have is a hammer, everything looks like a nail”). I have to admit, it was a fun little application to write!

 

#light
open System
open System.IO

// Read the command line.
let args = Environment.GetCommandLineArgs()

// Figure out the directory to use. With 1 argument, we only
// have the app. 2 or more arguments means a directory was passed in.
let directory =
    match args.Length with
    | 1 -> Environment.CurrentDirectory
    | _ -> args.[1]

// Figure out if the directory is real.
let directoryExists = Directory.Exists(directory)

// Only rewrite text files: rewriting binaries such as favicon.ico or
// images would corrupt them. This extension list is an assumption.
let textExtensions = set [ ".py"; ".yaml"; ".html"; ".css"; ".js"; ".txt" ]

// Recursive function to process the files in a directory
let rec processDirectory (dirInfo: DirectoryInfo) =
    for file in dirInfo.GetFiles() do
        if textExtensions.Contains(file.Extension.ToLowerInvariant()) then
            printfn "Writing file %s" file.FullName
            let fileContents = File.ReadAllText(file.FullName)
            let modifiedContents = fileContents.Replace("\r\n", "\n")
            File.WriteAllText(file.FullName, modifiedContents)
    for dir in dirInfo.GetDirectories() do
        processDirectory dir

// Kick off the work
let processFiles =
    match directoryExists with
    | false -> printfn "Could not find %s" directory
    | true -> processDirectory (new DirectoryInfo(directory))

 

For those of you who are used to C# development, debugging these things can be tricky. If you build and run the code as an EXE instead of using the interactive environment, here are a couple of pointers/reminders. First, code like that shown above lives in a static class named after the file containing the code. In my case, the class is named Program. Second, all the top-level values are static properties on that class. To watch those values, either add the Program class to the Watch window or request values off of Program from the Immediate Window.


Google App Engine, Day 2. Deploy HelloWorld

At the end of Google App Engine, Day 1, we got the environment set up. We verified the environment with a very simple application, Hello World. Before moving forward and building the photo sharing application, I want to make sure that I understand how to deploy an application. Given the simplicity of the application, this should be easy. The documentation says that I should run

appcfg.py update [application directory]

and things should just work. My code looks exactly like this.

HelloWorld.py (note the capitalization of this filename–it’s important) contains:

print 'Content-Type: text/plain'
print ''
print 'Hello, world!'

app.yaml

application: sseely-gae
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: helloworld.py

After running appcfg.py update, I navigated to http://sseely-gae.appspot.com and saw an HTTP Status Code 500 error. Blech! I did find that GAE has a basic set of diagnostic tools. To find them, go to http://appengine.google.com/ and sign in. Once there, click on the application to bring up its dashboard. You can then look at the logs for traces as well as requests. Here, I got to see lots of HTTP 500 (Server) errors. Some were for favicon.ico; I don’t have a favicon file set up, so that’s expected. The errors for everything else were not good. After some digging around, I found out that I was encountering the difference between a case-sensitive file system and a case-insensitive one. Windows is case-insensitive, so the application ran on my box. When it was deployed to GAE, it failed. I fixed app.yaml to read:

application: sseely-gae
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: HelloWorld.py

I uploaded the application back to GAE, and this time, I saw HelloWorld work.
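Because a Windows box will not catch this mistake, a small pre-deployment check can. This is a hypothetical helper, not part of the SDK: it reads script: lines from app.yaml with a regular expression (rather than a full YAML parser) and only checks scripts that live in the application’s root directory.

```python
import os
import re

# Matches lines such as "  script: HelloWorld.py" in app.yaml.
SCRIPT_LINE = re.compile(r"^\s*script:\s*(\S+)\s*$")


def case_mismatches(app_dir, yaml_text):
    """Return script names from yaml_text with no exact-case match in app_dir."""
    # os.listdir reports names as stored on disk, so this comparison is
    # case-sensitive even on a case-insensitive file system.
    actual_names = set(os.listdir(app_dir))
    problems = []
    for line in yaml_text.splitlines():
        match = SCRIPT_LINE.match(line)
        if match and match.group(1) not in actual_names:
            problems.append(match.group(1))
    return problems
```

Run against the first app.yaml above with HelloWorld.py on disk, it would report helloworld.py; against the fixed version it reports nothing.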

I ran into one other issue that was easily fixed. I saw some errors early on when the Python app started up. After seeing this case sensitivity, I figured that I was running into another Unix-like behavior. Unix files typically use a line feed character (\n) after each line. Windows frequently uses a carriage return plus line feed (\r\n). I wrote a little application to read in a file and convert any \r\n combinations to \n. This removed the remaining, non-serious errors I saw on Windows in the development environment.
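For readers following along in Python rather than F#, a rough equivalent of that conversion is sketched below. It works on raw bytes so no text decoding is involved, and the TEXT_EXTENSIONS set is an assumption: binary files such as favicon.ico or uploaded JPEGs must be skipped, because replacing byte pairs inside them would corrupt them.

```python
import os

# Assumption: only these extensions are treated as text files.
TEXT_EXTENSIONS = {".py", ".yaml", ".html", ".css", ".js", ".txt"}


def convert_tree(root):
    """Rewrite CRLF as LF in text files under root; return how many changed."""
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in TEXT_EXTENSIONS:
                continue  # leave binaries untouched
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            fixed = data.replace(b"\r\n", b"\n")
            if fixed != data:
                with open(path, "wb") as f:
                    f.write(fixed)
                changed += 1
    return changed
```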


Google App Engine, Day 1. Prepare the Development Environment

I’m recording this as I set up my development environment for writing an App Engine application. I’m going to skip the step of acquiring a Google account; I got mine about 3 years ago. If you have a Gmail or AdSense account, you have authentication credentials. If not, they are easy enough to get. Here is a step-by-step transcript of setting things up. Wish me luck!

  1. Go to http://code.google.com/appengine/.
  2. Look for the button that says Try it now.
  3. Click on the button that says Create an Application.
  4. If you’ve never done this before, you’ll see a screen that asks you to Verify Your Account by SMS. Pick your country, your cell phone carrier, and your mobile number. Do NOT forget to fill in your cell phone carrier. Failure to do so means that you won’t get a message– I just made that mistake. Then click on Send.
  5. You should now be on a page that says An Authentication Code Has Been Sent to [your number]. A few seconds after clicking Send, you will have a code. Type that code in and press Send on this new page.
  6. Go to http://appengine.google.com/ and click on Create an Application.
  7. On the Create an Application page, I’ve entered sseely-gae for the Application Identifier. Under Title, I entered Google App Engine for Blog. Click on Save.
  8. According to the AppEngine Python Runtime Environment Page, AppEngine uses Python version 2.5.2. Download and install the 2.5.2 runtime from python.org. I’m assuming you too can install an MSI. Just take the default options and go. 😉
  9. Over at http://code.google.com/appengine/downloads.html, download and install the GAE SDK. Again, this is an MSI.
  10. For good measure, download the GAE Documentation. It’s in a compressed ZIP file.
  11. If you do not have one yet, you will also want a decent Python IDE. Visual Studio does have IronPython 2.0 and some form of Python support, but my not-so-humble opinion is that better Python IDEs exist. You can grab Eclipse, Aptana, or many others. I installed Eclipse: thanks to the setup instructions on the AppEngine web site, it was easier to get rolling than many others, so I went with it. (Make sure you have previously downloaded and installed the JDK from http://java.sun.com/javase/downloads/index.jsp. Eclipse needs this.)
  12. Run through the instructions at http://code.google.com/appengine/docs/python/gettingstarted/helloworld.html. I made sure to update app.yaml to point to my application name: sseely-gae. Everything checks out and http://localhost:8080 just works.

At this point, I know that I have a valid GAE development environment working. I started writing this entry about 80 minutes ago and everything appears to be fine. Some of the downloads were pretty beefy (JDK + JRE was 73 MB), so a slow Internet connection could slow some of you down. Overall, my environment setup experience was pretty decent.


Let's Build on a Cloud: a Proposed Experiment

At this point, I think I have a good idea about what GAE, AWS, and Azure offer in terms of potential. To force myself to learn things faster, I think that it’s time to go out and build an application. For the sake of making apples-to-apples comparisons, I need an application that is fairly simple and allows me to exercise cloud storage as well as application hosting. I also want something that is super simple and that involves something I have already built. For my book, Effective REST Services via .NET, I put together an application that allowed a user to create and manage a photo album. I think it’s appropriate to reuse that concept. For one thing, I already have the application coded, so I will be able to reuse a lot of ideas and code. The other thing that I like about this idea is that it is a well-understood problem. People are familiar with the notion of uploading a picture, then seeing that picture displayed on a web site. It lets me demonstrate handling something that may be large, and it gives someone testing my application visual confirmation that the big file is on the server. Finally, it’s hard to fake an image of the beach, but not too hard to fake "Uploaded Northwind.mdf, size 52MB."

Because I am a bit of a Microsoft fanboy, I want to save that experience for last. I will start with GAE next week and I’ll report on the experience as I go. If this sounds interesting, it might be a good time to subscribe to the blog feed (http://feedproxy.google.com/scottseelysblog).
