
Public Notes: Running Python in Azure Batch on Windows using the Azure Portal

TLDR; This post explains how to set up an Azure Batch node running Windows, using installers. It also explains how to use application packages to preload Azure Batch nodes with utilities so that tasks can just be command lines. The example use case is “run a Python script”, but the approach should apply more broadly to “install tools, distribute stuff, and run command lines.”

PDF Version of this article (which has a bit better formatting)

When I start experimenting with something, I do not start out with writing code to automate everything. Sometimes, I try to use a GUI to bootstrap my process, then automate when setup is correct. Why? A lot of environments, like Azure, will allow for fast cycles from the UI tools. My latest adventure took a bit of time, so I’m documenting what I did.

Here’s the context: I am developing a mid-sized example project for scalability. If everything goes to plan, the demo will show how to solve the same problem with Azure Batch and the Azure Kubernetes Service. The demo is targeting a special kind of data scientist: an actuary. Actuaries frequently write code in one of three languages: APL, Python, and R. I’m focusing on Python.

My goals:

  1. Configure everything through the Azure portal.
  2. Install Python on the batch node at startup.
  3. Use the application packages feature to deliver the Python virtual environment prior to any tasks running.
  4. Run a command line without resources to make sure I can run a script in the Python virtual environment.

What follows is essentially a lab for you to follow and do the same things I did. As time marches forward, this lab’s correctness will degrade. Hit me up on LinkedIn if you catch this and I may go back and update the details.

For all the Azure pieces, try to keep things in the same region. I’ll be using East US. This isn’t necessary, but is helpful for the parts that transfer files. Staying in the same region gives better speed.

1         Create a new project in PyCharm

  1. Open up your editor; I’m using PyCharm. Details for other editors will differ. Set the location to wherever you like. I’m naming the project BatchDemo.
  2. Set up a New environment using Virtualenv. The location will be in the venv directory of your project. For the base interpreter, use the one installed on your machine already.

For me, the dialog looks like this:

batch-1

  1. Click on Create.
  2. Add a new file, addrow.py, to the project in the BatchDemo directory.
  3. Add the library to use table storage.
    1. Select File–>Settings–>Project: BatchDemo–>Project Interpreter.
    2. In the list of packages, you’ll see a plus sign ‘+’. Click on that.
    3. Select azure-cosmosdb-table. Click on Install Package.
    4. Close the Available Packages window once done. My machine looked like this before I closed the window:

batch-2

  1. Click OK on the Settings Window. You should now have a number of packages installed.
  2. Add the following code to the addrow.py file. The code is fairly “Hello, world!”-ish: it adds one row to a Table Storage table named tasktable. (The libraries live in the azure.cosmosdb namespace but talk to Storage, so no, not intuitive.)
from azure.cosmosdb.table.tableservice import TableService
import datetime

def main():
    table_name = 'tasktable'
    table_service = TableService(
        account_name="<your Azure Storage Account Name>",
        account_key="<your Azure Storage Account Key>")

    if not (table_service.exists(table_name)):
        table_service.create_table(table_name)
    task = \
        {
            'PartitionKey': 'tasks',
            'RowKey': str(datetime.datetime.utcnow()),
            'description': 'Do some task'
        }
    table_service.insert_entity(table_name, task)   

if __name__ == "__main__":
    main()

For the account_name and account_key placeholders, take the name of one of your Azure Storage Accounts and a corresponding key, then plug in the proper values. If you need to create a storage account, instructions are here. To get the keys, look in the same doc (or click here) and follow the instructions. If you create a new storage account, use a new resource group and name the resource group BatchPython. We’ll use that group name later too.

One last comment here: for a production app, you really should use Key Vault. The credentials are being handled this way to keep the concept count reasonably low.

Test the code by running it. You should be able to look at a table named tasktable in your storage account and see the new row. The RowKey is the current timestamp, so in our case it should provide for a unique enough key.

Once you have all this working and tested, let’s look at how to run this simple thing in Azure Batch. Again, this is for learning how to do some simple stuff via a Hello, World.

2         Create a batch account

In this step, we are going to create a batch account which I’ll refer to as batch_account in here; your name will be different. Just know to substitute a proper string where needed.

  1. In the Azure portal, click on Create a resource.
  2. Search for Batch Service. Click on Batch Service, published by Microsoft.
  3. Click on Create.
  • Account name: For the account name, enter in batch_account [remember, this is a string you need to make up, then reuse. You’re picking something unique. I used scseelybatch]
  • Resource Group: Select BatchPython. If you didn’t create this earlier, select Create New.
  • Select a storage account to use with batch. You can use the same one you created to test the table insertion.
  • Leave the other defaults as is.
  • Click on Create.

3         Upload the Python Installer

Upload the Python installer which you want to use. I used the Windows x86-64 executable installer from here.

  1. In your storage account, create a container named installers.
    1. In the Azure Portal, navigate to your Storage Account.
    2. Select Blob Service–>Browse Blobs
    3. Click on + Container.
    4. Set the Name to installers.
    5. Click on OK.
  2. Once created, click on the installers container.
  3. Upload the Python installer from your machine.
    1. Click on Upload.
    2. In the Upload blob screen, point to your installer and click on Upload.
    3. Wait until the upload completes.
  4. Get a SAS URL for the installer.
    1. Right click on the uploaded file.
    2. Select Generate SAS.
    3. Set the expiration of the token to some time in the future. I went for 2 years in the future.
    4. Click on Generate blob SAS token and URL

batch-3

Copy the Blob SAS URL. Store that in a scratch area. You’ll need it in a bit.
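The start task needs this URL every time the pool brings up a node, so it can be worth checking when the token runs out. Here is a minimal sketch, assuming the expiry travels in the se query parameter in the form 2020-07-19T21:33:00Z (other date forms would need extra handling). The helper names are mine, not part of any Azure library.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def sas_expiry(sas_url):
    """Return the expiry time (UTC) encoded in a blob SAS URL."""
    query = parse_qs(urlsplit(sas_url).query)
    if "se" not in query or "sig" not in query:
        raise ValueError("URL does not look like a SAS URL (missing se or sig)")
    # Assumption: the expiry looks like 2020-07-19T21:33:00Z.
    expiry = datetime.strptime(query["se"][0], "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc)


def days_remaining(sas_url, now=None):
    """Return the whole days left before the SAS token expires."""
    now = now or datetime.now(timezone.utc)
    return (sas_expiry(sas_url) - now).days
```

Calling days_remaining with the URL from your scratch area tells you how long the pool can keep downloading the installer before you need a fresh token.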

4         Create a Batch Application

  1. Going back to your machine, go to the BatchDemo directory which contains your addrow.py file along with the virtual environment. Zip up BatchDemo and everything else inside into a file called BatchDemo.zip. [Mine is about 12MB in size]
  2. Open up your list of Resource Groups in the portal. Click on BatchPython.
  3. Click on your Batch account.
  4. Select Features–>Applications
  5. Click on Add.
    1. Application id: BatchPython
    2. Version: 1.0
    3. Application package: Select BatchDemo.zip
    4. Click on OK.

The file will upload. When complete, you’ll have 1/20 applications installed.

  1. Click on BatchPython.
  2. Set Default Version to 1.0.
  3. Click on Save.
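The zip step at the top of this section is easy to script. The task command line used later expects a BatchDemo folder at the root of the package, so the archive has to keep that top-level folder. A minimal sketch using only the standard library follows; zip_project is a name I made up, and the output directory should sit outside the project so the archive does not try to include itself.

```python
import shutil
from pathlib import Path


def zip_project(project_dir, output_dir):
    """Zip project_dir so the archive contains a top-level folder of the same name."""
    project_dir = Path(project_dir).resolve()
    base_name = Path(output_dir).resolve() / project_dir.name
    archive = shutil.make_archive(
        str(base_name),                    # ".zip" is appended automatically
        "zip",
        root_dir=str(project_dir.parent),  # archive paths are relative to this
        base_dir=project_dir.name,         # keeps "BatchDemo/" as the top folder
    )
    return Path(archive)
```

For example, zip_project("BatchDemo", "dist") would produce dist/BatchDemo.zip with entries such as BatchDemo/addrow.py and BatchDemo/venv/Scripts/python.exe.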

5         Create a Batch Pool

  1. Open up your list of Resource Groups in the portal. Click on BatchPython.
  2. Click on your Batch account.
  3. Select Features–>Pools
  4. Click on Add.
    • Pool ID: Python
    • Publisher: MicrosoftWindowsServer
    • Offer: WindowsServer
    • Sku: 2016-Datacenter
    • Node pricing tier: Standard D11_v2 [Editorial: When experimenting, I prefer to pick nodes with at least 2 cores. 1 for the OS to do its thing, 1 for my code. I’ll do one core for simple production stuff once I have things working. This is particularly important to allow for effective remote desktop/SSH. The extra core keeps the connection happy.]
    • Target dedicated nodes: 1
    • Start Task/Start task: Enabled
    • Start Task/Command Line: We want this installed for all users, available on the Path environment variable, and we do not want a UI.

python-3.6.6-amd64.exe /quiet InstallAllUsers=1 PrependPath=1 Include_test=0

  • Start Task/User identity: Task autouser, Admin
  • Start Task/Wait for success: True
  • Start Task/Resource files:
    • Blob Source: Same as the URL you saved from the piece labeled “Upload the Python Installer”. The SAS token is necessary.
    • File Path: python-3.6.6-amd64.exe
    • Click on Select
  • Optional Settings/Application packages
    • Click on Application Packages.
    • Application: BatchPython
    • Version: Use default version
    • Click on Select
  • Click on OK.

The pool will be created. In my experience, creating the pool and getting the node ready can take a few minutes. Wait until the node appears as Idle before continuing.

batch-4
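One thing that is easy to get wrong in the pool form: the resource file's File Path and the executable named in the start task command line have to be the same file name. A small sketch that derives both from the installer URL keeps them in sync. The function name and the returned dictionary keys are my own, not Batch API fields.

```python
from urllib.parse import unquote, urlsplit

# Installer options taken from the start task command line in this post.
INSTALL_OPTIONS = "/quiet InstallAllUsers=1 PrependPath=1 Include_test=0"


def start_task_settings(installer_sas_url):
    """Derive the resource file path and command line from the installer URL."""
    blob_path = unquote(urlsplit(installer_sas_url).path)
    file_name = blob_path.rsplit("/", 1)[-1]
    if not file_name.lower().endswith(".exe"):
        raise ValueError(f"expected an .exe installer, got {file_name!r}")
    return {
        "file_path": file_name,
        "command_line": f"{file_name} {INSTALL_OPTIONS}",
    }
```

Paste the two values into the File Path and Command Line fields, and swapping in a newer installer only means changing the URL.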

6         Run a job

  1. Open up your list of Resource Groups in the portal. Click on BatchPython.
  2. Click on your Batch account.
  3. Select Features–>Jobs
  4. Click on Add
    1. Job ID: AddARow
    2. Pool: Python
    3. Job manager, preparation, and release tasks:
      1. Mode: Custom
      2. Job Manager Task:
        1. Task ID: AddARowTask
        2. Command line:

cmd /c %AZ_BATCH_APP_PACKAGE_BATCHPYTHON%\BatchDemo\venv\Scripts\python.exe %AZ_BATCH_APP_PACKAGE_BATCHPYTHON%\BatchDemo\addrow.py

Note on the environment variable: Application packages are zip files. Batch puts the location of the unzipped application package into an environment variable in one of two ways, depending on whether you select the default version or a specific version.

Default: AZ_BATCH_APP_PACKAGE_

Versioned: AZ_BATCH_APP_PACKAGE_

  • Click on Select
  • Click on OK
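To see how the task command line hangs together, here is a sketch that rebuilds it from the environment variable used above. It assumes the default-version variable AZ_BATCH_APP_PACKAGE_BATCHPYTHON and the BatchDemo folder layout from this post; build_task_command is a hypothetical helper, and ntpath is used so the Windows-style paths come out the same on any OS.

```python
import ntpath
import os

# Variable name taken from the task command line in this post (default version).
APP_PACKAGE_VAR = "AZ_BATCH_APP_PACKAGE_BATCHPYTHON"


def build_task_command(env=None):
    """Return the cmd /c line that runs addrow.py with the packaged venv."""
    env = os.environ if env is None else env
    root = env.get(APP_PACKAGE_VAR)
    if root is None:
        raise KeyError(f"{APP_PACKAGE_VAR} is not set; is the package on the pool?")
    python_exe = ntpath.join(root, "BatchDemo", "venv", "Scripts", "python.exe")
    script = ntpath.join(root, "BatchDemo", "addrow.py")
    return f"cmd /c {python_exe} {script}"
```

Running this on a node (for example over remote desktop) is a quick way to confirm the package was unzipped where the task expects it.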

The job should start running immediately. Because it’s a short task, it’ll finish quickly too. Click on Refresh and you’ll probably see that the AddARowTask has completed.

batch-5

You can then verify the output by opening up the table and looking at the rows. A new one should be present. I’ll expect a row that completed near 21:33 on July 19; the time will be recorded as UTC, and I’m in Redmond, WA, USA, which is 7 hours behind UTC time.

batch-6

That view is courtesy of the Azure tools present in Visual Studio 2017.
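If the UTC timestamps in the RowKey are hard to read at a glance, a few lines of Python convert them. This is a sketch under two assumptions: the 21:33 above is the UTC value stored in the RowKey, and the offset is a fixed 7 hours as stated (no daylight saving handling). The year in the usage note is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed 7 hour offset behind UTC, as stated in the post.
REDMOND_SUMMER = timezone(timedelta(hours=-7))


def row_key_to_local(row_key, local_tz=REDMOND_SUMMER):
    """Parse a RowKey written by str(datetime.utcnow()) and convert to local time."""
    # str() omits the fractional part when the microseconds are exactly zero.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in row_key else "%Y-%m-%d %H:%M:%S"
    utc_time = datetime.strptime(row_key, fmt).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(local_tz)
```

With a RowKey such as "2018-07-19 21:33:05.123456", the result is 14:33:05 local time on the same day.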

7         So, what next?

Now that you’ve done all this, what does it mean? For your batch pools, you could preload them with a common toolset. The resource files you pass in to a job can be files to operate on, independent of binaries. Your task start times can be greatly reduced by loading the prerequisites early. Could you do this with custom VMs? Sure, but then you need to keep the VMs patched. This mechanism allows you to use a patched VM and just install your few items.

This is definitely a toy example, meant to show how to do the initial setup in the portal. Here’s what you want to do for automation purposes:

  1. Script all of this.
  2. For the Python piece, add a mechanism to create the zip file after you have a successful build and test.
  3. Script the management of the binaries, creating the Batch Service, and configuring the pools and application package(s).
  4. Add an integration test to validate that the pool is ready to run.
  5. Minimize the number of secrets in the code to 0 secrets. Use Key Vault to manage that stuff.
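For the last item, here is a rough sketch of pulling the storage credentials from Key Vault instead of hard-coding them in addrow.py. It assumes the azure-identity and azure-keyvault-secrets packages, which this lab did not install, and the secret names are hypothetical; use whatever names exist in your vault.

```python
def make_secret_client(vault_url):
    """Create a Key Vault client (needs azure-identity and azure-keyvault-secrets)."""
    # Imported lazily so the rest of the module loads without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def load_storage_credentials(secret_client,
                             name_secret="storage-account-name",
                             key_secret="storage-account-key"):
    """Return (account_name, account_key) read from two Key Vault secrets."""
    # Hypothetical secret names; nothing in this post defines them.
    account_name = secret_client.get_secret(name_secret).value
    account_key = secret_client.get_secret(key_secret).value
    return account_name, account_key
```

The returned pair can be passed straight to TableService(account_name=..., account_key=...). How the Batch node authenticates to the vault is a separate decision that this sketch does not cover.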


Thoughts on Day 3 of Money 2020, Europe

Walking the show floor at Money 2020, one sees lots of payment providers. It’s no shock then that today, I saw a lot of talks on how this works. This is in keeping with the theme of the show: the future of money. The previous two days, I did hear a fair amount of talk about moving away from paper money, what to do about fiat currency vs. other currency, and difficulty in managing payments. Today, we heard about what is possible with payments.

The facts keep stacking up around us about people preferring to pay electronically instead of with plastic or currency. Those electronic payments are happening in apps:

The first three are as phone OS capabilities, the last two are via apps on the phones. Electronic payments make a number of things better for consumers, retailers, and credit institutions (banks, credit card companies, etc.). For consumers, they get convenience. For retailers, they find less friction at checkout. For credit institutions, they get reduced fraud. So far, so good, right? Well, what I learned after this was a lot more interesting: if you are just reducing payment friction, you are leaving a lot of opportunity on the table. Opportunity for:

  • Learning about the customer. You are collecting their buying habits. Imagine what you could do if you did more to help, what else could you learn?
  • Getting retailers to use your payment system. You know a lot about your customers and what they like to buy. You can now refer them to other retailers, services, and so on. Use that to convince retailers that if they use your system, you will drive more interested customers their way.
  • Delighting your customers in novel ways. WeChat has delighted their Chinese users by making it easier to send Hong Bao (a monetary gift in a red envelope) to others. Their users love this feature. Here’s what the growth has looked like year over year (numbers are from a presentation by Ashley Guo of WeChat):

wechat-hongbao

You can also look at Ant Financial/Alipay. Despite their name, they are not a payment company; they are a marketing company. You use them to schedule doctor appointments, figure out how to travel on public transportation or taxi, manage vacations, discover information about products, and so on. And yes, when the service is performed or goods are purchased, they also make sure that the vendor is compensated by you. But, they make it all seamless. The product has been successful in China, turning their tier 1 and tier 2 cities into cashless areas. The services are so popular with their Chinese user base that the apps are used around the world at high end retailers down to businesses like Burger King.

Both WeChat and Alipay emphasized that they use the data to better market to users. The users like the targeting in their lives. When traveling, they discover attractions and restaurants that appeal to them because the app knows them so well. The businesses are happy to participate because they acquire customers who may not have found them otherwise.

What I saw today was a lot of companies thinking about how to make transactions easier by working with banks and credit card companies to remove plastic from your life. This is great. I look forward to the day when my wallet no longer bulges because of all the cards I need to carry.

I also saw something wonderful and scary: a world where things will generally improve for me if I let artificial intelligence and machine learning see all the things I do. By knowing what I eat, where I go, and so on maybe the algorithms can warn me to start doing some things (walk more) and stop doing other things (keep it to two coffees a day). Scary, because I worry what would happen if all that data was combined in some nefarious way. For example, if the algorithm senses that I get out of depression by spending money, maybe the algorithm seizes on this by getting my spending up using knowledge sales people only wish they knew. Or, me being denied a job because the data leaks that I buy [something the employer wants to look out for: alcohol, cigarettes, etc.].

In all, a very interesting day around payments.


Thoughts on Day 2 of Money 2020, Europe

The conference at Money2020 has many tracks. Given the amount of questions I have seen from customers around distributed ledger technology (DLT) (Blockchain, Hyperledger Fabric, Ethereum, Corda, etc.), I attended that track. In the track, there were a fair number of panels staffed by either users of DLT, consortiums looking to make their use cases more ubiquitous, and implementers of DLT. While all groups had different lenses on what is going on, they tended to agree on what problems DLT solves and how to use the technology.

For finance, DLT helps eliminate a lot of verification/validation work which happens in the middle and back office.

This has been framed as the “Do you see what I see?” (DYSWIS, pronounced diss-wiss) problem. DYSWIS solution attempts from the past have involved things like cryptographic signatures where two parties compare signatures of the data. Those solutions fail for the main reason crypto comparison always fails: normalization of data prior to signing. DLT solutions solve this in different ways, but they all have ways to sign facts and achieve consensus about the facts.

Anyhow, back to the main point about saving time confirming that what I see matches what you see. The finance industry has used a strong central authority for smaller, contract-less transactions in the form of Visa, Mastercard, and others. Then, we get to more complex transactions. How complex? Consider this scenario:

A European importer purchases some goods from an East African exporter. The importer prefers to pay when goods arrive. The exporter prefers to be paid when goods are shipped. Neither gets their preferred mode because of risk. The bank for the importer needs to issue a letter of credit. The exporter also insures the goods until delivery. The shipper, sitting between importer and exporter, will orchestrate the movement of the goods through several partners, who in turn may use other partners. Finally, the exporter will insure the goods in case of loss. It is normal for the shipment to pass through around 30 entities and have around 200 transactions [info from a presentation by TradeIX.com].

How does DLT help here? Using DLT, the importer, exporter, bank, shipper, and insurers can all see what is happening in real time (within minutes). Because facts are attested to and sent digitally, human transcription errors disappear. This means that humans may only need to verify that the numbers and such look “right” before allowing their end of a transaction to proceed. This frees up human capital to do more valuable tasks.

So, one question you might ask yourself is “which DLT is right for me?” More than a few of the C-level folks on panels said a variant of “I don’t care. I just want something that works.” For those of you that care about the details and optimal choices, understand this: if you are joining a DLT consortium and it doesn’t use what you consider to be best, you need to just build something that works with the choice. If you complicate things by creating translation layers between something like Corda and Ethereum, expect to be looking for a new job tomorrow (because you’ve been fired).

The great news here is that the businesses now understand how to apply DLT. They have found that their normal transaction volumes of 200 TPS are already handled by most enterprise DLT solutions. They also understand the difference between on-chain and off-chain data, so they don’t put PII and other GDPR prohibited data on the chain.

Over and over again, I heard the C-level folks say “I want DLT for the use cases where I spend a lot of time verifying that data was input correctly because that work costs too much time and slows down the business.” Then, using those facts, they want to drive cost savings elsewhere. The instant verification of the truth reduces financial risk. The reduced financial risk means the business can now make decisions sooner to further improve their ability to move money, settle accounts, and so on.

In 2018 you will hear about a number of implementations of DLT in a number of markets. At this time, it seems prudent to be familiar with the leading contenders in the space. At the moment, these seem to be (in no particular order):

This will be an exciting year for DLT.


Thoughts on Day 1 of Money2020, Europe

I just finished day 1 of Money 20/20 Europe. I stuck mainly to the large sessions and to the show floor. What I saw was a repeated vision of what this group in finance sees as the next set of important things to be tackled. Everything they are doing revolves around the customer and making things better for customers. Where you are in the financial ecosystem determines which pieces you are building and which pieces you are integrating.

From the banking side, we heard from many folks. I took the most notes from the talks by Ralph Hamers (CEO of ING Group) and Andy Maguire (Group COO at HSBC). After these two, the themes repeated which only solidified that they weren’t unique in their visions. Because banks already have the balance sheets and other nuts and bolts of building a banking business, their vision is to provide a banking platform that other businesses can plug into. Any workable platform must be open: competitors need to be able to plug into it just as easily as partners. This will allow the bank to stay good at what it knows while letting other partners fill the gaps with the wide variety of expertise that the bank does not have so that it can participate in new opportunities more easily. For example, many banks are finding success by going into geographies where their customers only interact with them over a digital experience: no human to human interaction over 99% of the time. To do this, they craft their platform and their onboarding experience to be as easy to use as possible. Several banks talked of doing work to reduce the integration times with their platforms from months down to weeks. These efforts are paying off to allow the banks to find ways to interact with more customers in more countries.

From the FinTech side of the house (which for this conference so far is the “everyone else” even though I know this leaves out personal finance folks), I saw a lot of interesting technology. A lot of the technology focused on a few areas, all with interesting takes on how to accomplish the goals. I saw a lot of distributed ledger technology (aka blockchain) with implementations that have already gone live. It wasn’t clear to me how blockchain is being leveraged, but tomorrow promises to have a number of talks around the “what” and “how”. The show also has a number of folks presenting different ways to present your identity. Many of these still focus around the two factors for authenticating and many are avoiding passwords, PIN codes, and the like. The primary mechanism here is:

  1. Some biometric. Two most commonly cited are fingerprint and face.
  2. Smart phone.

So, yes, the argument that goes “What about people from [some part of world that they think doesn’t have Android or Apple phones]?” is not under consideration. In the countries where the banks operate, they know that most of their customers have smart phones.

The final thing I noticed is that AI came up a bunch and it was all nebulous to the speakers. Asking some of the AI firms on the floor, the sales folks know that they have data scientists and those people build and maintain their models. AI/ML is being applied to Know Your Customer/Anti-money Laundering work as well as fraud detection. Given the sales process, my guess here is that the people who need the tech will talk to those who make it and then have their engineers have the nitty gritty discussions of integration. I’m definitely looking forward to learning more there.

I also spent a bit of time on the show floor. Because it’s banking, a lot of the vendors create solutions that run in the client data center OR the cloud. For those folks, I’d like to let you know that you should look at joining the Azure Marketplace. This can give you ease of deployment for your customers who run in Azure and is fairly handy for VM only deployments. Contact me and I can help you get on board.


Copying files from a Docker container onto local machine

This past week, I’ve spent time wiping away my ignorance of containers. To do this, I started in my usual way:

  1. Buy a bunch of books. Probably too many.
  2. Work through books, doing exercises as I go.

The first book I’m running through is Using Docker: Developing and Deploying Software with Containers by Adrian Mouat. I’m posting this bit now to hopefully help others.

When working through the exercise to back up the redis database in Chapter 3, I ran the command to back up the database:

docker run --rm --volumes-from myredis -v $PWD/backup:/backup debian cp /data/dump.rdb /backup/

This then emits the error message:

C:\Program Files\Docker\Docker\Resources\bin\docker.exe: Error response from daemon: Drive has not been shared.
 See 'C:\Program Files\Docker\Docker\Resources\bin\docker.exe run --help'.

This is happening because I never shared the C: drive with Docker. To do this, right click on the Docker icon sitting in your toolbar and select Settings… . Then, select Shared Drives and check the drive(s) on your system which you want to be able to use.

DockerSettingsSharedDrive

Upon clicking Apply, enter your credentials. The command should now work.

One other note: I found that the command did not work right in cmd.exe or some bash shells. It did work just fine from a powershell window. So, that’s another note…
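I did not dig into why the shells differ. One plausible cause, which I have not confirmed, is that $PWD is expanded by the shell and cmd.exe does not expand it. A way to sidestep shell differences is to build the argument list yourself and pass an explicit backup path. The sketch below does that; redis_backup_command is a made-up helper, and the container name and paths match the command above.

```python
from pathlib import Path


def redis_backup_command(backup_dir=None, container="myredis"):
    """Build the docker run argument list for the backup command in this post."""
    # Default to ./backup under the current directory, like $PWD/backup.
    backup_dir = Path(backup_dir) if backup_dir else Path.cwd() / "backup"
    return [
        "docker", "run", "--rm",
        "--volumes-from", container,
        "-v", f"{backup_dir}:/backup",
        "debian", "cp", "/data/dump.rdb", "/backup/",
    ]
```

Passing the list to subprocess.run(redis_backup_command(), check=True) runs the command without any shell expanding variables. The drive still has to be shared with Docker as described above.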

 


.NET Fx version to Azure Cloud Service Mapping

Posting this here mostly for me so I can find this easily again:

https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-guestos-update-matrix.

I’m monitoring this URL, waiting for .NET 4.7 support to appear. I’m hopeful that we’ll see something early in 2017 Q4, but I won’t be holding my breath either 😉


Azure OS Family 5 changes to RDP/Remote Desktop prevent logins on short passwords

TLDR; Azure OS Family 5 requires Remote Desktop passwords >= 10 characters. Anything less will cause your login to fail, repeatedly requesting that you re-enter your password.

I ran into an issue when upgrading an Azure application from OS Family 4 to OS Family 5. We have configured RDP for our development deployments. As part of that deployment, we had configured special passwords for each environment. Those passwords had a strong enough length when we added them a few years ago: 8 and 9 characters. OS Family 5 (Windows Server 2016) requires that the passwords are at least 10 characters long.

As a result, we found that the deployment went fine (no errors reported) but that we simply couldn’t log in post upgrade. Looking on the portal, we noted that one has to have a password of at least 10 characters to add Remote Desktop from the portal. We counted the characters in our passwords, adjusted lengths, and found we could log in again.
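Because the deployment itself reports no error, a check before deploying is the cheapest way to catch this. A minimal sketch, assuming the passwords are available as a dictionary keyed by environment name (that structure is my assumption, not how our deployment stored them):

```python
MIN_RDP_PASSWORD_LENGTH = 10  # minimum reported in this post for OS Family 5


def too_short_passwords(passwords, minimum=MIN_RDP_PASSWORD_LENGTH):
    """Return the environment names whose password is shorter than the minimum."""
    return sorted(env for env, pw in passwords.items() if len(pw) < minimum)
```

An empty list means every environment meets the length requirement. It says nothing about any other complexity rules that may apply.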
