Archive for February, 2009
At the end of Google App Engine, Day 1, we got the environment set up. We verified the environment with a very simple application, Hello World. Before moving forward and building the photo sharing application, I want to make sure that I understand how to deploy an application. Given the simplicity of the application, this should be easy. The documentation says that I should run
appcfg.py update [application directory]
and things should just work. My code looks exactly like this.
HelloWorld.py (note the capitalization of this filename; it’s important) contains:
print 'Content-Type: text/plain'
print ''
print 'Hello, world!'
app.yaml contains:
handlers:
- url: /.*
  script: helloworld.py
After running appcfg.py update, I navigated to http://sseely-gae.appspot.com and saw an HTTP Status Code 500 error. Blech! I did find that GAE has a basic set of diagnostic tools. To find them, go to http://appengine.google.com/ and sign in. Once there, click on the application to bring up its dashboard. You can then look at the logs for traces as well as requests. Here, I got to see lots of HTTP 500 (Server) errors. Some were for favicon.ico; I don’t have a favicon file set up, so that’s expected. The errors for everything else just aren’t good. After some digging around, I found out that I was encountering the difference between a case-sensitive file system and a case-insensitive file system. Windows is case-insensitive, so the application ran on my box even though the script name in app.yaml didn’t match the capitalization of HelloWorld.py. When it was deployed to GAE, it failed. I fixed app.yaml to read:
handlers:
- url: /.*
  script: HelloWorld.py
I uploaded the application back to GAE, and this time, I saw HelloWorld work.
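To catch this kind of mismatch before deploying, one could compare each script name that app.yaml references against the exact filenames on disk. The following is a minimal sketch, not part of the SDK: `find_case_mismatches` is a hypothetical helper, it takes the script names as a list instead of parsing app.yaml, it assumes the scripts sit at the top level of the application directory, and it is written for a current Python 3 interpreter rather than the 2.5 runtime GAE uses.

```python
import os


def find_case_mismatches(app_dir, script_names):
    """Return (referenced, on_disk) pairs for script names that match a file
    in app_dir only when capitalization is ignored.

    Hypothetical helper: on a case-insensitive file system (Windows) such a
    reference still works locally, but on a case-sensitive one it does not.
    """
    # Map each lowercased filename to the name exactly as stored on disk.
    actual = {name.lower(): name for name in os.listdir(app_dir)}
    mismatches = []
    for script in script_names:
        on_disk = actual.get(script.lower())
        # Flag names that exist on disk but with different capitalization.
        if on_disk is not None and on_disk != script:
            mismatches.append((script, on_disk))
    return mismatches
```

Run against a directory containing HelloWorld.py, `find_case_mismatches(".", ["helloworld.py"])` reports `[("helloworld.py", "HelloWorld.py")]`, which is exactly the difference Windows hid from me.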
I ran into one other issue that was easily fixed. I saw some errors early on when the Python app started up. After seeing the case-sensitivity problem, I figured that I was running into another Unix-like behavior. Unix files typically use a line feed character (\n) at the end of each line. Windows frequently uses a carriage return/line feed pair (\r\n). I wrote a little application to read in a file and convert any \r\n combinations to \n. This removed the remaining, non-serious errors I saw on Windows in the development environment.
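The converter itself isn't shown here; a minimal sketch of the same idea in Python 3 might look like the following. It works on raw bytes so the file's text encoding doesn't matter. The function names are hypothetical, and a lone carriage return (old Mac style) is deliberately left alone.

```python
def normalize_line_endings(data):
    """Replace every Windows CRLF pair in a bytes object with a Unix LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at path in place with LF line endings.

    Returns True if the file was changed, False if it already used LF.
    """
    # Binary mode prevents Python from translating newlines on read/write.
    with open(path, "rb") as f:
        original = f.read()
    converted = normalize_line_endings(original)
    if converted == original:
        return False
    with open(path, "wb") as f:
        f.write(converted)
    return True
```

Calling `convert_file` on each .py file in the application directory before starting the development server would have the same effect as my little application.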
I’m recording this as I set up my development environment to write an App Engine application. I’m going to skip the step of acquiring a Google account; I got mine about 3 years ago. If you have a Gmail or AdSense account, you already have authentication credentials. If not, they are easy enough to get. Here is a step-by-step transcript of setting things up. Wish me luck!
- Go to http://code.google.com/appengine/.
- Look for the button that says Try it now, and click it.
- Click on the button that says Create an Application.
- If you’ve never done this before, you’ll see a screen that asks you to Verify Your Account by SMS. Pick your country, your cell phone carrier, and your mobile number. Do NOT forget to fill in your cell phone carrier. If you skip it, you won’t get a message; I just made that mistake. Then click on Send.
- You should now be on a page that says An Authentication Code Has Been Sent to [your number]. A few seconds after clicking Send, you will have a code. Type that code in and press Send on this new page.
- Go to http://appengine.google.com/ and click on Create an Application.
- On the Create an Application page, I’ve entered sseely-gae for the Application Identifier. Under Title, I entered Google App Engine for Blog. Click on Save.
- According to the App Engine Python Runtime Environment page, App Engine uses Python version 2.5.2. Download and install the 2.5.2 runtime from python.org. I’m assuming you too can install an MSI. Just take the default options and go. 😉
- Over at http://code.google.com/appengine/downloads.html, download and install the GAE SDK. Again, this is an MSI.
- For good measure, download the GAE Documentation. It’s in a compressed ZIP file.
- If you do not have one yet, you will also want a decent Python IDE. Visual Studio does have IronPython 2.0 and some form of Python support, but my not-so-humble opinion is that better Python IDEs exist. You can grab Eclipse, Aptana, or many others. I installed Eclipse because the setup instructions on the App Engine web site made it easier to get rolling than the alternatives. (Make sure you have already downloaded and installed the JDK from http://java.sun.com/javase/downloads/index.jsp; Eclipse needs it.)
- Run through the instructions at http://code.google.com/appengine/docs/python/gettingstarted/helloworld.html. I made sure to update app.yaml to point to my application name: sseely-gae. Everything checks out and http://localhost:8080 just works.
At this point, I know that I have a valid GAE development environment working. I started writing this entry about 80 minutes ago and everything appears to be fine. Some of the downloads were pretty beefy (JDK + JRE was 73 MB), so a slow Internet connection could slow some of you down. Overall, my environment setup experience was pretty decent.
At this point, I think I have a good idea of what GAE, AWS, and Azure offer in terms of potential. To force myself to learn things faster, I think it’s time to go out and build an application. For the sake of apples-to-apples comparisons, I need an application that is fairly simple, exercises cloud storage as well as application hosting, and involves something I have already built. For my book, Effective REST Services via .NET, I put together an application that allowed a user to create and manage a photo album. I think it’s appropriate to reuse that concept. For one thing, I already have the application coded, so I will be able to reuse a lot of ideas and code. The other thing I like about this idea is that it is a well-understood problem. People are familiar with the notion of uploading a picture and then seeing that picture displayed on a web site. It allows me to demonstrate something that may be large, and it lets someone testing my application get visual confirmation that the big file is on the server. Finally, it’s hard to fake a photo taken at the beach, but not too hard to fake "Uploaded Northwind.mdf, size 52MB."
Because I am a bit of a Microsoft fanboy, I want to save that experience for last. I will start with GAE next week and I’ll report on the experience as I go. If this sounds interesting, it might be a good time to subscribe to the blog feed (http://feedproxy.google.com/scottseelysblog).
I have found a need to do some research across the various cloud offerings so that I get a good feel for what each has to offer. At this point in my investigations, I am focusing on only three platforms: Amazon Web Services, Microsoft Azure, and Google App Engine. The three share a common set of features: storage through an API, compute resources, and the ability to respond to demand by scaling application instances. The storage APIs encourage scalable patterns over patterns that could cause data contention. Amazon requires that the application handle scaling up and down on its own. Azure and App Engine scale for the user through a combination of configuration and observed demand. These services also offer authentication services as well as the ability to create your own authentication. App Engine integrates with Google logins, Azure works with Windows Live, and Amazon works through security mechanisms in the Amazon Machine Images (AMIs).
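Because Amazon leaves scaling to the application, the application needs some policy for deciding how many instances to run. Here is a minimal sketch of one such policy, assuming you already measure requests per second and have a rough figure for how many requests a single instance can serve. `desired_instances`, its parameters, and its defaults are hypothetical, not an AWS API, and actually launching or terminating EC2 instances is left out.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      min_instances=1, max_instances=20, headroom=0.25):
    """Return how many instances to run for the observed load.

    headroom reserves spare capacity (0.25 means 25%) so that a sudden
    spike does not saturate the fleet before the next scaling decision.
    The result is clamped to [min_instances, max_instances].
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = requests_per_second * (1 + headroom) / capacity_per_instance
    return max(min_instances, min(max_instances, math.ceil(needed)))
```

For example, 100 requests per second at 50 requests per instance with 25% headroom gives `desired_instances(100, 50) == 3`. Clamping keeps a bad measurement from scaling the fleet to zero or running up an unbounded bill; Azure and App Engine make this kind of decision for you.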
At this point, the platforms start differentiating. Google App Engine (GAE) requires you to write all code to be executed on the platform in Python. To help you out, Google provides a number of Python libraries to build applications. All applications receive input over HTTP. No queuing support exists. Google also offers a high-speed, shared, non-persistent cache (memcache) as well as facilities to send e-mail and manipulate images.
Azure requires all applications that run on it to use .NET: work happens in worker roles, and pages are served by ASP.NET. Microsoft also includes a variety of ways to store information based on the scenario: Live Mesh for synchronization across devices; SkyDrive for sharing amongst friends; SQL Data Services for scalable application databases; and Azure Storage Service for blobs, queues, and non-relational tables. Coming soon are locks, a caching layer, and other features. Azure integrates with .NET Services (Service Bus, Access Control Service, and Workflow Services) and Live Services (a set of services to help build social and other application types). It appears that the primary integration point is "All offerings support HTTP/REST models."
Finally, there is Amazon Web Services (AWS). AWS is the most mature of the offerings, having started in 2002. AWS divides its offerings into several groups: Infrastructure, Payments & Billing, On-Demand Workforce, Web Search & Information, and Amazon Fulfillment & Associates. The infrastructure group of services allows one to build scalable applications. It includes services for storage (S3), a scalable, non-relational structured data store (SimpleDB), a compute platform (EC2), a queue service (SQS), and a content delivery network (CloudFront). The remaining services allow a small entity to build a big business, track traffic patterns, or get assistance from people to perform tasks.
Of all the services, only Amazon charges money for all usage, though low usage means very little in overall cost. Google intends not to start charging until a site receives 5 million monthly page views. Azure has yet to announce a pricing model. There seems to be consensus amongst the Microsoft people I’ve spoken to that the service will be free for experimentation. I would expect that expenses might kick in using different metrics than Google uses. I fully expect that Microsoft will gate charges based on some threshold for bandwidth, storage, and compute time. My only reason for this guess is that Amazon has a pricing model that uses these parameters and Microsoft plans on being "competitive."
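To put Google's free threshold in perspective, here is a quick back-of-the-envelope calculation. It assumes a 30-day month and traffic spread evenly over every second; `average_rate` is just an illustrative helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(monthly_page_views, days_in_month=30):
    """Return (page views per day, page views per second), averaged
    evenly over a month of days_in_month days."""
    per_day = monthly_page_views / days_in_month
    return per_day, per_day / SECONDS_PER_DAY
```

`average_rate(5_000_000)` gives roughly 166,667 page views a day, or just under 2 page views per second on average. Real traffic is burstier than that, so peak rates will sit well above the average.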
I’m cheap, I watch a lot of network programming, and all my favorite cable shows are on Hulu or elsewhere on the web. To save a few bucks, I spent the week between Christmas 2008 and New Year’s 2009 upgrading my house to digital TV and canceling cable for everything but my Internet connection. My home’s primary TV is attached to a Windows Vista-based Media Center. Something I’ve hated since switching is the craptacular viewing guide. I had a hard time believing that Microsoft hadn’t put out an update to Media Center to handle the new channel format. While looking for solutions today, I found out about something called TV Pack 2008. (Yeah, I’m a Media Center user, not a fanatic. I’m late to this party…) The more I read, the more aggravated I got that my Windows Vista installation didn’t get this upgrade. You see, the upgrade included an update to the guide that allowed one to find out what was happening on all the local digital channels. Microsoft has put this out for OEMs only. Enthusiasts like me weren’t given access to it. While I’m not a fanatic about Media Center, I build all my own PCs (newegg.com loves me!). Part of my upgrade involved installing a digital, over-the-air tuner. The US digital TV improvements are only a minor note in the articles I’ve read, where most writers focus on clear QAM cable support and the over-the-air enhancements for England and Japan. Still, cheap guys like me care about the US digital TV enhancements.
Anyhow, I eventually found a site that contains links to the files so that I too could use the updates. http://digiex.net/guides-tutorials/699-windows-media-center-tv-pack-2008-download-installation-guide.html.
First, the good news: this works. The guide shows me everything, with no more "missing data" on channels. The bad news: any scheduled shows will be forgotten. That’s OK; just write down your schedule and then add things back in. If you miss something, well, the web will have it or the show will repeat.
It took me about an hour to install the updates and get my shows scheduled back in. My little ones will be happy that the guide now knows when Arthur is scheduled.
I prefer the term utility computing to cloud computing. People outside of software development understand that electricity, water, and phone service are all utilities. Cloud computing is an attempt to deliver compute resources at utility prices. Before I define what utility computing is, I want to define what utility computing isn’t.
Some companies, such as IBM, are trying to do a "me too" with cloud computing. They use the term cloud computing because that amorphous word, cloud, does not have a clear meaning. They are also confusing the computing public. Why? They are conflating their virtualization products with utility computing in order to confuse the market and continue making sales. (Note: most big iron and *nix vendors have EXCELLENT virtualization technology.) Virtualization of compute resources is an old story that mainframe vendors have had working well since the 1960s. More recently, companies like VMware, Citrix, and Microsoft have made virtualization of compute resources common. Virtualization lets a customer remain ignorant of what the underlying hardware is. With virtualization, one still has control over the operating system that is used to store files, run applications, authenticate users, and communicate with other operating systems. Virtualization lets an entity buy a high-powered machine and then load it up with operating system images that get to pretend they are the only operating system on the hardware. Virtualization presents a simplified view of the hardware, including limited views into memory, CPU, and storage, and one still has to worry about having access to enough of each. When a vendor like IBM or Sun says you can own your cloud, they mean that you can write applications that run as if they were in a cloud, but you have to worry about having enough storage, CPU, and memory to get the job done. This space is important for lots of reasons, but it is different from utility computing.
Cloud computing is a mechanism to deliver compute and storage resources where the user does not need to know how those resources are provisioned, who else might be on the same hardware, or what the underlying technology that provides those services happens to be. The closest analogs to utility computing are other utilities: electricity, water, gas, cable TV, Internet, and cell phone carriers, to name a few. Utilities have a common set of characteristics.
The consumer:
- Picks who provides the utility.
- Is responsible for limiting consumption.
- Views available resources as infinite.
The provider:
- Procures resources to deliver the utility.
- Decides how the utility is delivered.
- Makes sure that one user does not adversely affect other users.
- Dictates the mechanisms used to consume the utility.
Virtualization does allow for most of these characteristics. It does not, however, allow a user to treat available resources as infinite, and it requires the consumer to also be a provider. If you rent a Windows Server or *nix server as a virtual instance on a machine you can’t see, you have a utility operating system, not a utility compute resource. Your machine still has fixed storage and CPU.
There are many firms saying that they offer cloud computing. Three are very well known:
- Amazon Web Services
- Google App Engine
- Microsoft Azure
Of the three, Amazon is the odd man out, offering a hybrid of utility computing and virtual instances: you can spin up a Windows Server or Linux OS, but that instance can only provide durable storage via the infinite Simple Storage Service, S3.
Utility computing provides a benefit that virtualization does not: it abstracts away the work of the professionals who handle data redundancy, keep servers running, and add compute power when needed. When your computing needs indicate that you need more or less, you just take what you need. There is no need to negotiate for extra compute capacity or to keep those resources when they are not in use. Virtualization means you have the resources 24/7. Utility computing means you have the resources only when you need them. The rest of the time, you can give back the resources. That’s a strength of utility models. The owner of the resource can be creative about handling the demand flow by turning resources on and off. Consumers only have to worry about what they need now and how much they can afford to consume. The rest is automatic and transparent.
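The difference between having resources 24/7 and having them only when needed is easy to quantify in instance-hours. Here is a minimal sketch, assuming you know how many instances a workload needs in each hour; `instance_hours` and the demand profile below are hypothetical.

```python
def instance_hours(hourly_demand):
    """Compare the instance-hours two provisioning styles consume.

    hourly_demand lists how many instances the workload needs in each hour.
    Always-on (virtualization): provision for the peak, every hour.
    On-demand (utility): use only what each hour actually needs.
    Returns (always_on_hours, on_demand_hours).
    """
    peak = max(hourly_demand)
    always_on = peak * len(hourly_demand)
    on_demand = sum(hourly_demand)
    return always_on, on_demand
```

For a hypothetical day that needs 1 instance for 20 hours and 8 instances for a 4-hour evening peak, `instance_hours([1] * 20 + [8] * 4)` returns `(192, 52)`, so provisioning for the peak around the clock uses almost four times the instance-hours. The sketch counts hours only; it ignores that a utility provider may charge a different hourly rate than owning the hardware costs.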
Utility providers do make decisions about how you can consume: REST to access storage, Python/.NET/something else to write applications, types of databases that work (hint: RDBMS doesn’t scale, and utilities know this), and types of storage. These decisions help you create applications that can take optimal advantage of the utility resources.
This change in computing will require people to learn a different way to develop applications. People will not like the changes until more successes happen. In the end, utility computing is going to succeed and will work hand in hand with virtualization.