Archive for April, 2009
One of the most popular posts on my blog is Hosting Silverlight on Google App Engine from back in March. With my .NET REST book officially hitting shelves last week, InformIT contacted me to write an article (or more!) to help promote the book and me. I suggested expanding the information to show how a Silverlight application could use App Engine as a platform and they said “sure”. Turnaround on the article will be tight: I owe it to them by May 8. I’ll announce when the article goes live. But for those of you waiting for the follow-up, it’s coming! I’m writing the Silverlight side in C# to make sure it’s accessible to everyone. The App Engine part is in Python.
I appreciate all the visitors I’ve been getting and the positive feedback. You are awesome!
I recently talked about using SimpleDB to save or update a record. Today, we look at how to query against records in SimpleDB while using the Amazon SimpleDB C# Library. Each record in SimpleDB has an ItemName (unique, primary key) and a set of attributes (name-value pairs). Upon insert or update, all fields are indexed for easy querying. You can access the data using the Query API or Select API. Since I am already familiar with SQL, I picked the Select API as it closely resembles the standard SQL SELECT statement.
Recall that SimpleDB stores data in domains. Any query goes against the stored domain. For example, I know that the MembershipUsers in the domain for my sample ASP.NET MembershipProvider all have a unique email address in a field named email. I also know that the email only exists on one record collection, so I shouldn’t be getting back any other record types. The actual lookup code is pretty simple:
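let NormalizeUsername (username: string) =
    // The original body was lost. Assumption: double any embedded single
    // quote, which is how a quote is escaped inside a SimpleDB string literal.
    username.Replace("'", "''")

let LookupUser username =
    let normalizedName = NormalizeUsername username
    let simpleDb = PhotoWebInit.SimpleDBClient
    let selectStmt = new Model.SelectRequest()
    // The domain plays the role a table name plays in SQL.
    selectStmt.SelectExpression <-
        "select * from " + PhotoWebInit.domainName +
        " where email='" + normalizedName + "'"
    let result = simpleDb.Select(selectStmt)
    let temp = new AwsMembershipUser()
    // Assumed final step: hand the response to the parsing member shown next.
    temp.LoadFromSelect(result)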
NormalizeUsername takes care of embedded tick marks (though the query may still be open to other attacks). Recall that this is NOT SQL, so a DELETE, UPDATE, or DROP won’t do anything other than fail.
The values come back as attributes and get parsed with the following function:
member this.LoadFromSelect (data: Model.SelectResponse) =
    let hasSelectResult = data.SelectResult.Item.Count > 0
    let hasAttributes = hasSelectResult && data.SelectResult.Item.[0].Attribute.Count > 0
    if hasAttributes then
        // Only the first returned item is read; email is expected to be unique.
        let attributeCollection = data.SelectResult.Item.[0].Attribute
        let providerName = PhotoWebInit.DefaultMembershipProvider
        let name = (this.SelectAttribute attributeCollection "email")
        let providerUserKey = (this.SelectAttribute attributeCollection "email")
        let email = (this.SelectAttribute attributeCollection "email")
        let passwordQuestion = (this.SelectAttribute attributeCollection "passwordQuestion")
        let isApproved = (PhotoWebInit.ParseBool (this.SelectAttribute attributeCollection "isApproved") false)
        let isLockedOut = (PhotoWebInit.ParseBool (this.SelectAttribute attributeCollection "isLockedOut") true)
        let creationDate = (PhotoWebInit.ParseDateTime (this.SelectAttribute attributeCollection "creationDate") DateTime.MaxValue)
        let lastLoginDate = (PhotoWebInit.ParseDateTime (this.SelectAttribute attributeCollection "lastLoginDate") DateTime.MaxValue)
        let lastActivityDate = (PhotoWebInit.ParseDateTime (this.SelectAttribute attributeCollection "lastActivityDate") DateTime.MaxValue)
        let lastPasswordChangedDate = (PhotoWebInit.ParseDateTime (this.SelectAttribute attributeCollection "lastPasswordChangedDate") DateTime.MaxValue)
        let lastLockoutDate = (PhotoWebInit.ParseDateTime (this.SelectAttribute attributeCollection "lastLockoutDate") DateTime.MaxValue)
        let passwordAnswer = (this.SelectAttribute attributeCollection "passwordAnswer")
        let password = (this.SelectAttribute attributeCollection "password")
        new AwsMembershipUser(
            providerName, name, providerUserKey, email,
            passwordQuestion, System.String.Empty, isApproved, isLockedOut, creationDate, lastLoginDate,
            lastActivityDate, lastPasswordChangedDate, lastLockoutDate, passwordAnswer, password)
    else
        // Nothing matched: return an empty user.
        new AwsMembershipUser()
Finally, the helper functions that parse a date or boolean are:
let ParseBool (value: string) (defaultValue: bool) =
    // TryParse yields (success, parsedValue); fall back to the default on failure.
    match System.Boolean.TryParse(value) with
    | true, parsed -> parsed
    | false, _ -> defaultValue

let ParseDateTime (value: string) (defaultValue: DateTime) =
    match DateTime.TryParse(value) with
    | true, parsed -> parsed
    | false, _ -> defaultValue
(Is it obvious yet that I’m still an F# neophyte? Yes, I’m now grabbing the old F# books and reading them so that I develop some sense of style because the above is suboptimal.)
The Select API supports the standard comparison operators: =, !=, <, <=, >, and >=.
not makes an appearance to balance out like and is null (not like, is not null). You can also do range checking via the between operator, value checking against a set via in, and operations against multi-valued attributes using every(). A great set of examples is up on Amazon.
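To make those operators concrete, here is a minimal sketch that only builds select expressions as strings; it does not call SimpleDB. The helpers quoteLiteral and selectWhere, the tag attribute, and the sample values are all hypothetical. It also assumes dates are stored in a sortable text form such as yyyy-MM-dd, because SimpleDB compares values as strings.

```fsharp
// Sketch only: builds SimpleDB select expressions as plain strings.
// quoteLiteral, selectWhere, and the attribute names are illustrative.

/// Wrap a value in single quotes, doubling any embedded single quote.
let quoteLiteral (value: string) =
    "'" + value.Replace("'", "''") + "'"

/// Assemble "select * from <domain> where <predicate>".
let selectWhere (domain: string) (predicate: string) =
    sprintf "select * from %s where %s" domain predicate

let domain = "friseton_com"

// Equality against a value that contains a tick mark.
let byEmail = selectWhere domain ("email = " + quoteLiteral "o'brien@example.com")
// Range check with between (string comparison, so the format must sort correctly).
let byRange =
    selectWhere domain
        (sprintf "creationDate between %s and %s" (quoteLiteral "2009-01-01") (quoteLiteral "2009-04-30"))
// Set membership with in.
let bySet =
    selectWhere domain (sprintf "isApproved in (%s, %s)" (quoteLiteral "True") (quoteLiteral "true"))
// Prefix match with like.
let byPrefix = selectWhere domain ("email like " + quoteLiteral "scott%")
// Presence test with is not null.
let byNotNull = selectWhere domain "passwordQuestion is not null"
// every() requires all values of a multi-valued attribute to match.
let byEvery = selectWhere domain ("every(tag) = " + quoteLiteral "vacation")

[ byEmail; byRange; bySet; byPrefix; byNotNull; byEvery ] |> List.iter (printfn "%s")
```

Keeping the quoting in one helper means every literal goes through the same escaping, which is the part most likely to be forgotten when expressions are concatenated by hand.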
This post is just a dump of a set of thoughts that have been running around in my brain.
All three major cloud platforms, Google App Engine, Amazon Web Services, and Microsoft Azure, offer a way to run worker tasks. Google recently introduced cron jobs. Amazon has Elastic Compute Cloud. Azure has the worker role. Each of these mechanisms works in a similar manner: a process looks somewhere for work to do (in the distributed database, a work queue, or elsewhere) and then performs the task. I don’t have any issues here; this all makes sense. You need a headless process to take input and produce output. This is a staple of most systems I have worked on. My issue is a different one: how much is this going to cost me? While many people are busy climbing the Gartner hype cycle and are close to the “Peak of Inflated Expectations,” I chose to enter at a personal “Trough of Disillusionment.” I believed the hype with SOAP, worked with the leaders on WS-* at Microsoft, and taught WCF for a while. In the process, I grew up. I’m working on living on the “Slope of Enlightenment” as I learn what the platforms do and do not do. (Hopefully, I’m not fooling myself!)
At this point, I’m investigating when it will make sense to process worker information locally vs. in the cloud. At the end of the day, it comes down to costs. Of the big three, only Microsoft remains to announce their pricing, and those numbers will come out by the end of the year (2009). So, what does it cost per hour?
Amazon: $0.03/hour, $21.59/month
AppEngine: $0.10/cpu hour.
Microsoft has not announced costs. However, I have been advised to monitor the metrics Microsoft collects as those will likely be the things Microsoft uses to determine bills. Over the last 24 hours, I have had a WebRole and a WorkerRole running constantly. The metrics chart shows me consuming 2 virtual machine hours/hour. I’m fine with this so long as the baseline cost is competitive with web hosting. I’d probably spend as much as $20/month per VM in use for a given role to use this model. That’s the value to me for being able to hit web scale if and when my site gets to be popular. It’s hard to build in scalability, so I don’t want to face a rewrite/refactoring when moving from a web host environment to a cloud environment. It pays to start right.
If your current processing loads are at ~30% CPU usage ($0.03 / $0.10), Amazon and Google are equal. However, if you have a new site where your processing usually finds 5 or 6 items waiting, processes those in a few seconds, and then waits 5 minutes, AppEngine might be a LOT cheaper. On a moderate transaction web site, you may only do 10 minutes of processing per HOUR, bringing your cron cost down to roughly half the cost of Amazon.
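The comparison is easy to rerun with your own numbers. The sketch below uses only the two prices quoted above; the function name and the 10-minute workload are illustrative, and it ignores storage, bandwidth, and any free quota.

```fsharp
// Prices quoted in this post; everything else is illustrative.
let amazonPerHour = 0.03        // USD per instance hour, billed busy or idle
let appEnginePerCpuHour = 0.10  // USD per CPU hour actually consumed

/// Hourly App Engine cost for a given number of busy minutes per hour.
let appEngineHourly (busyMinutes: float) =
    appEnginePerCpuHour * busyMinutes / 60.0

/// CPU utilization at which both services cost the same per hour.
let breakEvenUtilization = amazonPerHour / appEnginePerCpuHour

printfn "Break-even utilization: %.0f%%" (breakEvenUtilization * 100.0)
printfn "10 busy minutes/hour on App Engine: $%.4f vs. Amazon: $%.2f"
    (appEngineHourly 10.0) amazonPerHour
```

With 10 busy minutes per hour the App Engine figure comes to about $0.0167, a little over half of the flat $0.03 Amazon rate.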
It looks like Microsoft will be following the Amazon payment model. They will need to have a way to bring costs in line with AppEngine. I would prefer to see a model that bills me by CPU/Processor hour instead of VM hour. A VM hour can have very little usage whereas high CPU hours can adversely impact other VMs running on the same machine. Ideally, a cloud box would balance out based on required CPU, not number of machines, so these metrics should be available to those who run Microsoft’s data centers.
Thanks to all of my readers for sending great questions in e-mail (email@example.com) or via comments on the blog.
Last night (4-23), I saw that Amazon is now offering IBM applications by the hour. I thought “Cool!” Then I took a look at the pricing for these things. This pricing doesn’t take effect if you already own IBM licenses for the products and just want to host on EC2. If you own licenses, IBM has a table up to show you how to convert from Processor Value Units (PVUs) to EC2. These prices are for preconfigured Amazon Machine Instances (AMIs) with the IBM software ready to rock; no extra salesmen need to get involved.
All that said, I have no idea how much a PVU costs for an application, but my guess is it costs “a lot”. A project I was on in 2007 required an IBM C Compiler to run on z/OS (it was needed to interpret SQL statements into C programs that could run as stored procedures on DB2; how this even made sense any longer in 2007 is beyond me). IIRC, the cost there was over $18,000/year (note: this number is from memory and is likely low). I would guess that the services IBM is offering are more expensive. Looking at the charts and factoring out the cost for just renting an AMI by the hour, it appears that a PVU is worth about $0.004/hour for most products (discounting the base cost of an AMI and using the simple math of the High CPU Medium Instance that is 100 PVUs). The hourly PVU cost is 5-15 times higher for Content Management Server (~$0.021/PVU) and a WebSphere + Content Management Server combo (~$0.06/PVU).
The numbers above are approximate and were done on a piece of paper so I could get a feel for costs. Before you make any decisions, make sure to do your homework. I am curious if the pricing differences seem about right for IBM products. I have nothing against their pricing model. They do great work for companies that consume software but where having the latest software and tools isn’t seen by management as a competitive advantage.
Finally, a note about Processor Value Units for those who have not worked with IBM packages in the past: a Processor Value Unit (PVU) is IBM’s way to work around per CPU and per user licensing. CPU manufacturers are busy adding cores and speeding up their chips. While this goes on, IBM looks at these new chips and states how much workload the CPU can handle in units called PVUs. When you buy a product such as DB2 and you need to allow 100 users access to the product, IBM can know how many PVUs you need for that many users. Its sales team then makes sure you have the right hardware for this new workload along with your current workload, and sends you a bill. Because IBM’s sales model is high touch, the PVU is one tool among many that enables their sales people to make sure the hardware and software needs are correctly matched. (Feel free to correct me if I’m mistaken, but this is how things appear after reading the literature on IBM’s site.) I could not find a standard price for a PVU (but I attempted to derive one). Again, because IBM is high touch, my guess is that the price of a PVU is negotiable depending on a number of factors including:
- Size of account
- If the account represents a conversion to IBM (competitive pricing)
- Gut feel from the sales team
Understand that a high touch sales model allows both parties to come out ahead. This sales practice involves a lot of unpaid research and preparation by the sales team and support staff in an effort to match customer needs with what the sales organization can provide. This sales practice also minimizes the amount of money left “on the table” because the sales team gains a lot of inside knowledge about the client’s needs and wants.
I’m using NHaml because, frankly, I wanted to try something that gives me easier to write markup. NHaml seemed about perfect. I started my learning experience last night and thought “this is cool”. I returned to the project again today and I forgot some simple basics. Because I’m also using Azure, I may be one of 5 people on the planet who have had this problem so far. When adding a file to an Azure project, you may want the file to be present in the deployment module. This means that after adding the .haml file, you need to set its Build Action property to Content. By default, the value will be None. If you forget to set the Build Action to Content, you will get the following error (in this case, for a page at /home/about):
The view ‘about’ or its master could not be found. The following locations were searched:
~/Views/home/about.aspx
~/Views/home/about.ascx
~/Views/Shared/about.aspx
~/Views/Shared/about.ascx
~/Views/home/about.haml
~/Views/Shared/about.haml
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.InvalidOperationException: The view ‘about’ or its master could not be found.
An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
[InvalidOperationException: The view 'about' or its master could not be found.]
   System.Web.Mvc.ViewResult.FindView(ControllerContext context) +105521
   System.Web.Mvc.ViewResultBase.ExecuteResult(ControllerContext context) +139
   System.Web.Mvc.ControllerActionInvoker.InvokeActionResult(ControllerContext controllerContext, ActionResult actionResult) +10
   System.Web.Mvc.<>c__DisplayClass11.<InvokeActionResultWithFilters>b__e() +20
   System.Web.Mvc.ControllerActionInvoker.InvokeActionResultFilter(IResultFilter filter, ResultExecutingContext preContext, Func`1 continuation) +251
   System.Web.Mvc.<>c__DisplayClass13.<InvokeActionResultWithFilters>b__10() +19
   System.Web.Mvc.ControllerActionInvoker.InvokeActionResultWithFilters(ControllerContext controllerContext, IList`1 filters, ActionResult actionResult) +178
   System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName) +399
   System.Web.Mvc.Controller.ExecuteCore() +126
   System.Web.Mvc.ControllerBase.Execute(RequestContext requestContext) +27
   System.Web.Mvc.ControllerBase.System.Web.Mvc.IController.Execute(RequestContext requestContext) +7
   System.Web.Mvc.MvcHandler.ProcessRequest(HttpContextBase httpContext) +151
   System.Web.Mvc.MvcHandler.ProcessRequest(HttpContext httpContext) +57
   System.Web.Mvc.MvcHandler.System.Web.IHttpHandler.ProcessRequest(HttpContext httpContext) +7
   System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +181
   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +75
With SimpleDB, you need to create a domain to store any data. Domain creation is a unique event in the life of an application, happening at install or some other infrequent event (addition of a new product line, client, etc. depending on how you partition data). 99+% of the time, the code will need to use the domain to Create, Retrieve, Update, or Delete data. These are more commonly known as CRUD operations. AWS stores the data as name-value pairs called attributes. The attributes have an additional value indicating if the value is replaceable/updatable or if the value can only be set at creation. Each item in the database has a name and a set of 1 or more attributes. In this post, I walk through how MembershipUser records are created and updated. This post uses Amazon’s SimpleDB C# library. The example has the following statements at the top of the file:
Every SimpleDB record exists as a set of key value pairs. In our application, most of the attributes on a user can be updated. A user has the following attributes:
- passwordQuestion: The special question to ask when the user forgets their password.
- passwordAnswer: The answer to the special question.
- isApproved: Can this user access the site.
- isLockedOut: Is the user’s account locked due to too many invalid password attempts in a short period of time.
To handle the basics of creating these attributes, I have a pair of functions. One handles the very common scenario of creating a replaceable attribute. The other is slightly more general purpose.
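// General-purpose version: the caller decides whether the value is replaceable.
// Renamed here (the name is my choice) because a module cannot hold two
// let-bound functions that are both called CreateAttribute.
let CreateAttributeWithReplace key value replaceable =
    let retval = new Model.ReplaceableAttribute()
    retval.Name <- key
    retval.Value <- value
    retval.Replace <- replaceable
    retval

// Common case: a replaceable attribute.
let CreateAttribute key value =
    CreateAttributeWithReplace key value true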
In order to save a user (or any SimpleDB record), you need to have the following:
- A valid connection to the database.
- A PutAttributesRequest with the Domain set to an existing domain. The ItemName on the PutAttributesRequest must be unique within the domain.
- A set of attributes.
The SaveUser method below handles both Create and Update operations. If the ItemName doesn’t exist, it is created. If the ItemName does exist, it is updated.
let SaveUser (user: AwsMembershipUser) =
    let simpleDb = PhotoWebInit.SimpleDBClient
    let putAttr = new Model.PutAttributesRequest()
    putAttr.DomainName <- PhotoWebInit.domainName
    // Assumption: the email doubles as the unique ItemName, matching how
    // LoadFromSelect fills name and providerUserKey from the email attribute.
    putAttr.ItemName <- user.Email
    let creationDate = System.DateTime.UtcNow
    let userAttr = [|
        CreateAttribute "password" (user.Password);
        CreateAttribute "email" user.Email;
        CreateAttribute "passwordQuestion" user.PasswordQuestion;
        CreateAttribute "passwordAnswer" user.PasswordAnswer;
        CreateAttribute "isApproved" (user.IsApproved.ToString());
        // Fixed: this previously wrote IsApproved into isLockedOut.
        CreateAttribute "isLockedOut" (user.IsLockedOut.ToString());
        CreateAttribute "creationDate" (user.CreationDate.ToString());
        CreateAttribute "lastLoginDate" (user.LastLoginDate.ToString());
        CreateAttribute "lastActivityDate" (user.LastActivityDate.ToString());
        CreateAttribute "lastPasswordChangedDate" (user.LastPasswordChangedDate.ToString());
        CreateAttribute "lastLockoutDate" (user.LastLockoutDate.ToString())
    |]
    putAttr.Attribute <- new System.Collections.Generic.List<Model.ReplaceableAttribute>(userAttr)
    let response = simpleDb.PutAttributes(putAttr)
    response
Each of the values in the attribute must be non-null, populated strings. An empty or null string will cause SimpleDB to reject the PutAttributes request. Instead of storing a null, you need to write code that behaves properly when the key doesn’t exist.
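One way to honor that rule is to drop empty values before building the request. The sketch below works on plain key/value tuples instead of the library's attribute type, so it runs on its own; nonEmptyPairs and the sample data are hypothetical.

```fsharp
open System

// Hypothetical helper: keep only the pairs SimpleDB would accept,
// i.e. those whose value is neither null nor an empty string.
let nonEmptyPairs (pairs: (string * string) list) =
    pairs |> List.filter (fun (_, value) -> not (String.IsNullOrEmpty value))

// Sample data: one usable value, one empty string, one null.
let sample =
    [ "email", "someone@example.com"
      "passwordQuestion", ""
      "passwordAnswer", null ]

let kept = nonEmptyPairs sample

// Only the email pair survives the filter.
kept |> List.iter (fun (key, value) -> printfn "%s = %s" key value)
```

The reading side then has to treat a missing key as "no value", which is the behavior the paragraph above asks for.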
This has been on my TODO list for over a year now. I’m going to learn PowerShell. To do so, I have to forget that the command line exists and start learning how to do all my old tricks in the new environment. What has prevented me thus far is pure and simple inertia. When I need to kill a process (say Firefox), it was so easy to pull up the old DOS prompt and type:
tasklist | findstr /C:"firefox"
After finding the process ID, I would then enter
taskkill -f -pid [process ID from above]
I use taskkill because it can be more effective than the process explorer. Today, I did something that caused Firefox to hang. Remembering that I need to overcome my inertia, I found the commands and popped open PowerShell. The preceding action wound up being much easier to type in:
get-process firefox | kill
Maybe I’ll overcome my inertia after all. I’ve been using the DOS command language since 1983 and I just know it too well. (I’ll admit to a fair understanding of csh and bash too, as well as some old VAX-VMS that could probably come back to light if given a day, a VAX, and a reason to remember:)).
Before you can access data in SimpleDB, you have to have a domain. Domain creation is fairly expensive in terms of time: up to half a second. Listing domains is really cheap, especially since you will normally have a maximum of 100 domains. When the application starts up, we want to check if the domains we need exist and, if not, create them. Otherwise, mark an all clear so that the check doesn’t happen again. To handle all this work, we have a module named PhotoWebInit. The module instantiates a client capable of communicating with SimpleDB by reading the key and secret from configuration. Using that client, the code then checks to see if the domain we want, friseton_com, exists. OK, this code really only looks to see whether we have zero domains or more. friseton_com is the first domain we need because a user needs to be able to log in before any other domains need to exist for my application. If no domains are found, the friseton_com domain is created.
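module PhotoWebInit =
    let domainName = "friseton_com"

    let SimpleDBClient =
        // The original body was lost. Assumed reconstruction: read the key
        // and secret from appSettings and pass them to the library's client.
        let key = ConfigurationManager.AppSettings.["AWSKey"]
        let secret = ConfigurationManager.AppSettings.["AWSSecret"]
        new AmazonSimpleDBClient(key, secret)

    let InitializeAWS =
        let listDomains = new Model.ListDomainsRequest()
        let domainList = SimpleDBClient.ListDomains(listDomains)
        match domainList.ListDomainsResult.DomainName.Count with
        | 0 ->
            // No domains at all: create the one the login code needs.
            let createParam = new Model.CreateDomainRequest()
            createParam.DomainName <- domainName
            SimpleDBClient.CreateDomain(createParam) |> ignore
        | _ -> Debug.WriteLine("domain exists")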
The corresponding configuration reads as follows:
<add key="AWSKey" value="Your AWS Key goes here"/>
<add key="AWSSecret" value="Your AWS Secret goes here"/>
Hey, I wasn’t going to share MY keys. This thing costs money! Next time, we will look at saving and retrieving data from the domain.
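The "check once at startup, then never again" idea can also be expressed with F#'s lazy values. The following is a minimal sketch, not the PhotoWebInit code: listDomains and createDomain are hypothetical stand-ins that work against an in-memory list so the example runs without AWS, and unlike the module above it looks for the domain by name.

```fsharp
// Hypothetical stand-ins for the SimpleDB calls. They only touch an in-memory
// list and count how often the listing call is made.
let existingDomains = ResizeArray<string>()
let mutable listCalls = 0

let listDomains () =
    listCalls <- listCalls + 1
    List.ofSeq existingDomains

let createDomain (name: string) = existingDomains.Add name

let domainName = "friseton_com"

// Create the domain only when it is missing.
let checkAndCreate () =
    if not (List.contains domainName (listDomains ())) then
        createDomain domainName

// lazy defers the check until the first Force() and caches the result,
// so later calls skip the round trip entirely.
let ensureDomain = lazy (checkAndCreate ())

ensureDomain.Force()
ensureDomain.Force()
printfn "listing calls made: %d" listCalls   // prints "listing calls made: 1"
```

Calling Force() from every entry point is cheap after the first call, which removes the need for a separate "all clear" flag.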
In writing the PhotoWeb application for Amazon Web Services, I took advantage of the fact that an EC2 instance is just a VM running on top of Xen. Because it is just a VM, the entire application can be developed and tested on your local dev box before deploying to EC2. I highly recommend doing as much development and testing as possible on your local machine before deploying to the hosted environment. I decided to start out by handling the authentication piece first. Because hosted SQL Server on Amazon costs extra, I used SimpleDB as the datastore.
Before covering how to use SimpleDB, I need to explain what SimpleDB is. As a database, SimpleDB has a lot more in common with old school ISAM databases than RDBMS systems like Oracle, DB2, and SQL Server. To store data, you first need a domain. A domain is a collection of data. You can add data to a domain, query data within a domain using a SQL-like syntax, and delete data from a domain. The data items within a domain may have heterogeneous structure. Data is added by setting a key for the Item and passing a collection of name/value pairs where the values need to be able to transform to and from a string. Everything in the database is indexed automatically. Item names need to be unique; all other attributes are just indexed to make queries efficient.
You do have some limits with SimpleDB:
- By default, you get 100 Domains. If you need more, contact Amazon and ask for more.
- Each domain can contain a maximum of 1 billion attributes.
- Each domain can be no larger than 10 GB.
- Each item can only have 256 attributes.
- An attribute name and value must be less than 1024 bytes each.
- Queries must run in less than 5 seconds or they are abandoned.
- Each query can only have 10 predicates and 10 comparisons.
- A select expression can only specify 20 unique attributes (use * to get more/all).
- Responses are limited to 1 MB. (Paging tokens are returned to get the complete result set using many responses.)
Doing the math, it means that you can have a database of 1 TB over 100 domains (100 domains × 10 GB each) before you need to ask for more space. SimpleDB supports eventual consistency of data. Each piece of data is stored in multiple copies. After adding, updating, or deleting data and Success is returned, you can know that your data was successfully stored at least once. It does take time for all copies to be updated, so an immediate read might not show the updated value. This means you need to design your applications to remember what was stored instead of hitting the database for the up to date information.
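A minimal sketch of "remember what was stored" follows. Nothing here talks to SimpleDB: the remote store is simulated by two dictionaries, with writes becoming visible to reads only after replicate is called. All names (remotePut, remoteGet, replicate, put, get) are hypothetical.

```fsharp
open System.Collections.Generic

// Simulated remote store: writes land in `pending` and only become visible
// to reads after `replicate` runs, mimicking eventual consistency.
let visible = Dictionary<string, string>()
let pending = Dictionary<string, string>()

let remotePut (key: string) (value: string) = pending.[key] <- value

let remoteGet (key: string) =
    match visible.TryGetValue key with
    | true, v -> Some v
    | false, _ -> None

let replicate () =
    for kv in pending do visible.[kv.Key] <- kv.Value
    pending.Clear()

// Local memory of what this process has written.
let written = Dictionary<string, string>()

/// Write through to the remote store and remember the value locally.
let put (key: string) (value: string) =
    remotePut key value
    written.[key] <- value

/// Prefer the locally remembered value; fall back to the remote store.
let get (key: string) =
    match written.TryGetValue key with
    | true, v -> Some v
    | false, _ -> remoteGet key

put "lastLoginDate" "2009-04-24"
printfn "remote read sees the write: %b" (Option.isSome (remoteGet "lastLoginDate"))
printfn "local read sees the write: %b" (Option.isSome (get "lastLoginDate"))
```

In a real application the local memory would need an expiry policy and would not help other processes or machines, so this only covers the read-your-own-writes case.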
Any request against SimpleDB requires a key and secret to make sure that only an authenticated identity is touching the data store. You can obtain a key and secret by going to http://aws.amazon.com/simpledb/ and clicking on Sign up For Amazon SimpleDB. Then, select Resource Center. Under the Your Account menu, click on Access Identifiers. From there, you should see something that says Access Key ID and Secret Access Key. Once you have this information, you can access SimpleDB, Simple Storage Service, and EC2 (and other services, but we won’t be using them for this application).
Next time, we’ll build something with SimpleDB. In F#. Using a C# library. Oh yeah!
Back in February, I walked through the development of a Photo storage application. The application originally comes from one of the examples in the REST book Kenn Scribner and I wrote, Effective REST Services via .NET. Photo sharing and uploads allow me to present a well understood application without providing a lot of background. For a photo, you upload it somewhere and store metadata about the photo itself. We already covered Google App Engine in February.
For this application, we will use Amazon Web Services, including SimpleDB, Simple Storage Service, and Elastic Compute Cloud. At the end, I’ll tell you what I thought of the experience. I’ll develop the application in F#. When I presented this code to the Midwest Cloud Computing Users Group for the April 2009 meeting, Amanda Laucher offered up that my use of F# used some idioms she hadn’t seen before. That’s a nice way of saying “You appear to be a n00b.” Please keep that in mind whenever you review my F# code.
I’ll also be showing a few helper libraries that are open source (or similar). A nice thing about AWS is that the C# open source community has come out and done a great job putting out useful tools.
For EC2, I used the application at https://console.aws.amazon.com as well as Windows Remote Desktop to set up and manage the EC2 instance. I use SimpleDB as a custom MembershipProvider for the ASP.NET application. Next time, we will look at how that provider was created.