Monday, November 18, 2013

How to play a video on a Raspberry Pi Desktop by double-clicking on a file...

This article describes how to open video, audio, and other media files from the Raspberry Pi desktop (the LXDE file manager) in the GPU-accelerated player, omxplayer.

Does double-clicking on a video file in Raspbian result in slow blocky playback in SMPlayer and VLC on your Raspberry Pi?

The short answer is that those video players will not work well because, at this time (Nov. 2013), they do not make use of the GPU on the Raspberry Pi. You need to use the hardware-accelerated player, omxplayer, that is used in XBMC Live and OpenELEC. The problem is that omxplayer is a command-line player that is designed to be embedded in the XBMC-based distributions. Below, I present a way to make it play videos when you double-click them in the Raspbian desktop. Others have presented this method, but I've added a little bit of abstraction to make management easier. To start, open LXTerminal and then follow the process below.

Step One - Get rid of the CPU-based media players

sudo aptitude remove vlc smplayer

Step Two - Install omxplayer and xterm

sudo aptitude install omxplayer xterm

I'm installing xterm because its command-line syntax is clear. To have keyboard control while omxplayer runs, it must be run from an open terminal. I don't know why this is, but this is what works. Simply setting omxplayer as the application that opens a media file works, but you lose keyboard control. This means, for example, that you can't quit omxplayer in the middle of a video.

Step Three - Make a wrapper script with a simple name to start omxplayer in an xterm

sudo nano /usr/local/bin/vplay

Add the following contents to the file:

#!/bin/sh
exec xterm -fullscreen -fg black -bg black -e omxplayer -o hdmi -r "$1"

The "-o hdmi" option forces omxplayer to send audio through the HDMI cable. If you have your Pi configured to use the headphone jack, leave this option out (or use "-o local").

Save the file and quit, then make it executable:

sudo chmod 755 /usr/local/bin/vplay

Step Four - Make the "vplay" script the default handler for each video file type

We will use "mp4" files as an example.

Find a video file with the "mp4" file extension. Right-click on it and select "Open with...". Click the "Custom Command Line" tab. Type "vplay %f" into the "Command line to execute:" box. Check the box at the bottom of the window labeled "Set selected application as default action for this file type".

Click "OK".

If everything is correct, the file will now play in omxplayer. Press "q" to quit the program.

From this point forward, double-clicking any "mp4" file in the LXDE file manager will automatically play the file in omxplayer. The spacebar pauses. The arrow keys skip forward and back. "2" speeds up playback. To stop the sped-up playback, press the spacebar twice.

Repeat step four for any other file extensions you want to automatically play.

If you make a mistake in the last step, you can clean up your bad attempt by deleting the "user-*" files in ~/.local/share/applications/.
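If you would rather do that cleanup from LXTerminal, a small shell function can list and delete those files. This is a sketch. The function name remove_custom_handlers is made up. It assumes the files live in ~/.local/share/applications/ as described above, or in a directory you pass as the first argument. It deletes every user-* file there, including any handlers you may want to keep.

```shell
# Hypothetical helper: delete the user-* handler files that the
# "Open with..." dialog leaves behind. Defaults to
# ~/.local/share/applications; pass another directory as $1 to override.
remove_custom_handlers () {
  dir="${1:-$HOME/.local/share/applications}"
  for f in "$dir"/user-*
  do
    [ -e "$f" ] || continue   # the glob matched nothing; skip the literal pattern
    echo "removing $f"
    rm -- "$f"
  done
}
```

After running it, repeat step four for each file type you still want to open with vplay.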

Using this wrapper script technique means that you can change the omxplayer options at any time without having to make the change for each file extension in the LXDE file manager. Just edit /usr/local/bin/vplay.

Note that this script works for audio files as well. They play with a black screen. It makes for a lightweight way to play audio files without opening a full application like Clementine.

Hope this helps!


Friday, November 1, 2013

PowerShell One-Liners


PowerShell is Microsoft's shell for its product lines. It's now on version 3.0. If you miss the power of the command line while using Windows on your laptop or servers, PowerShell provides that power.

Important concepts:

  • Almost all aspects of the Microsoft ecosystem are objects within an overarching structure. You query and manipulate this structure and its objects with PowerShell. This includes all aspects of SharePoint, Active Directory, and Exchange. Other companies, like VMware (see below), have also written PowerShell modules.
  • This "object nature" means that PowerShell pipes pass objects and their properties, not just text.
  • Variables store data structures of objects.


Note: Unwrap the code lines before you use them.

Get Help

Get the usage of the command "Select-Object":

Get-Help Select-Object

Built-in examples for the command "Select-Object":

Get-Help Select-Object -examples | more

Get the list of all commands and sort it:

Get-Command | select-object name | sort name | more

Get the list of help topics for other parts of PowerShell:

Get-Help about*

Command History

Search the command history for unique commands that match a pattern:

history | select -Unique | Where-Object { $_ -like "*pattern*" }

Opening Files and Programs

The PowerShell equivalent of Apple's Mac OS X "open" command is "Invoke-Item":

Start firefox.exe:

Invoke-Item "C:\Program Files (x86)\Mozilla Firefox\firefox.exe"

Open the file "Document.pdf" in the current directory:

Invoke-Item Document.pdf

Open a file on a server:

Invoke-Item "\\myserver\c\Files\Document.pdf"

Manage Processes

To pattern match on an object list, use "Where-Object". The current object being processed is referred to by the special variable "$_". Members are accessed with the "." operator:

Get-Process | Where-Object {$_.processname -match "powershell" } | Select-Object processname,CPU,VM

Get all processes, print the process name and virtual memory size, and then sort by virtual memory size:

Get-Process | Select-Object processname,virtualmemorysize | sort virtualmemorysize

Find the busiest Google Chrome process (sorted by CPU time, so the busiest is listed last):

Get-Process chrome* | Select-Object processname,ID,CPU | sort CPU

Store the list of process objects:

$ListOfProcessObjects = Get-Process

Print the process name and virtual memory size from the stored process objects and sort by virtual memory size:

$ListOfProcessObjects | Select-Object processname,VM | sort VM

Print the chrome processes and sort by virtual memory size:

$ListOfProcessObjects | Where-Object { $_.processname -match "chrome" } | select-object processname,VM | sort VM

Find the Google Chrome process with the largest VM size:

Get-Process chrome* | sort VM | Select-Object processname,ID,VM -last 1

Find the Google Chrome process with the smallest VM size:

Get-Process chrome* | sort VM | Select-Object processname,ID,VM -first 1

Stop all Chrome processes:

Stop-Process -processname chrome*

Working on file systems

Find all "exe" files in a tree, list their full paths, and sort by full name:

Get-ChildItem 'C:\Tree\Of\Files\' -recurse -include *.exe | select-object fullname | sort fullname | more

Find all mp3s and sort by ascending size:

Get-ChildItem 'C:\Tree\Of\Files\' -recurse -include *.mp3 | select-object fullname,length | sort length

Find all mkvs and sort by ascending lastaccesstime:

Get-ChildItem 'C:\Tree\Of\Files\' -recurse -include *.mkv | select-object fullname,lastaccesstime | sort lastaccesstime

To get a list of all of an object's properties, use Where-Object on the list of file system objects to get a single object, and then pipe the object to "Select-Object * | more":

Get-ChildItem 'C:\Tree\Of\Files\' -recurse -include *.pdf | Where-Object { $_.fullname -match ".*Q1_Report.pdf" } | Select-Object * | more

Get PDFs that were last accessed by Windows in 2008, get their full name, length, and last access time, and then sort by length in ascending order:

Get-ChildItem 'C:\Tree\Of\Files\' -recurse -include *.pdf | Where-Object { $_.LastAccessTime.Year -eq 2008 } | Select-Object fullname,length,LastAccessTime | sort length

You can export a command's output to a CSV file with "Export-Csv". This command requires a filename as an argument:

Get-ChildItem 'C:\Tree\Of\Files\' -recurse -include *.pdf | Select-Object fullname,lastaccesstime,length | sort length | Export-Csv C:\Files\list.csv

Load the above results into the clipboard as a list:

Get-ChildItem 'C:\Tree\Of\Files\' -recurse -include *.pdf | Select-Object fullname,lastaccesstime,length | sort length | Format-list | clip

New directory:

New-Item c:\Files\Log_Data -type directory

New directory on a server:

New-Item \\myserver\c\Files\Log_Data -type directory

New empty file:

New-Item c:\Files\Log_Data\logoutput.txt -type file

Create a new file on a server:

New-Item \\myserver\c\Files\logoutput.txt -type file

Rename a file:

Rename-Item c:\Files\Log_Data\logoutput.txt -NewName logoutput_old.txt

Rename a file on a server:

Rename-Item \\myserver\c\Files\logoutput.txt -NewName logoutput_old.txt

Delete a file:

Remove-Item C:\Files\Log_Files\logoutput.txt

Delete a file on a server:

Remove-Item \\myserver\c\Log_Files\logoutput.txt

Delete a directory:

Remove-Item C:\Files\Log_Files

Delete a directory on a server:

Remove-Item \\myserver\c\Log_Files

Write text to a file. This replaces the contents of the file, and each string becomes one line:

Set-Content c:\Files\Log_Files\logoutput.txt -Value "Line 1", "Line 2", "Line 3"

Log Processing

Store list of log objects from the event log "System":

$SystemLogs = Get-EventLog System

Get all the log entries of entrytype "Error" from the stored system logs and then sort by "Message":

$SystemLogs | Where-Object {$_.entrytype -match "error" } | select-object message,entrytype | sort message | more

Get all the log entries of entrytype "Error" from the stored system logs and then return a sorted list of unique log messages:

$SystemLogs | Where-Object {$_.entrytype -match "error" } | select-object message | sort message | Get-Unique -asstring | more

Get list of logging providers:

$ListOfProviders = get-winevent -listprovider *

Looking at Hotfixes

Get the IDs of all installed hotfixes and their install times and then sort by install time:

Get-HotFix | select-object hotfixid,installedon | sort installedon | more

Get the hotfix list from a remote machine (replace someservername with the name of a server in your environment). The account from which you run this needs admin rights to that machine:

Get-HotFix -computername someservername| Select-Object hotfixid,installedon | sort installedon | more

Using PowerShell on remote machines

Start an interactive PowerShell session on the remote computer myserver:

Enter-PsSession myserver

Stop an interactive PowerShell session:

Exit-PSSession

Run a command on a list of remote machines:

Invoke-Command -computername myserver1, myserver2, myserver3 {get-Process}

Run a remote script on a list of remote machines:

Invoke-Command -computername myserver1,myserver2,myserver3 -filepath \\scriptserver\c\scripts\script.ps1

Operate interactively on a list of machines by setting up a "session" of open connections:

$InteractiveSession = new-pssession -computername myserver1, myserver2, myserver3

Run a remote command on the new session. This runs it on all the connections in the session:

Invoke-Command -session $InteractiveSession {Get-Process} 

Run the remote command on the session, but report only certain properties:

invoke-command -session $InteractiveSession {Get-Process | select-object name,VM,CPU }

Groups and Users

Get all of the members of "Data-Center-Team":

Get-ADGroupMember -Identity "Data-Center-Team"

Suppose the group "IT-Team" contains the group "Data-Center-Team" and other teams. To list the groups in "IT-Team":

Get-ADGroupMember -Identity "IT-Team"

To list all of the users in "IT-Team", including the members of its nested groups (the nested groups themselves are not listed):

Get-ADGroupMember -Identity "IT-Team" -Recursive

Add user "thomasd" to the "Data-Center-Team" group:

Add-ADGroupMember -Identity "Data-Center-Team" -Members "thomasd"

Remove user "thomasd" from the "Data-Center-Team" group:

Remove-ADGroupMember -Identity "Data-Center-Team" -Members "thomasd"

Add the members of the "London-Office" group to the "IT-Group" group:

Get-ADGroupMember -Identity "London-Office" -Recursive | Get-ADUser | ForEach-Object {Add-ADGroupMember -Identity "IT-Group" -Members $_}

Remove the members of the "London-Office" group from the "IT-Group" group:

Get-ADGroupMember -Identity "London-Office" -Recursive | Get-ADUser | ForEach-Object {Remove-ADGroupMember -Identity "IT-Group" -Members $_}

Get all of the user objects in groups beginning with "Development-":

Get-ADGroup -LDAPFilter "(name=Development-*)" | Get-ADGroupMember | Get-ADUser

Get all of the users in groups beginning with "Development-" that are disabled:

Get-ADGroup -LDAPFilter "(name=Development-*)" | Get-ADGroupMember | Get-ADUser | Where-Object {$_.Enabled -eq $False }

Find all of the users in groups beginning with "Development-" that are disabled and add them to the "Development-Disabled" group:

Get-ADGroup -LDAPFilter "(name=Development-*)" | Get-ADGroupMember | Get-ADUser | Where-Object {$_.Enabled -eq $False} | ForEach-Object { Add-ADGroupMember -Identity "Development-Disabled" -Members $_ -Confirm:$False }

Get all members of the groups beginning with "Development-", with their enabled status, and export them to a CSV file in C:\Files\:

Get-ADGroup -LDAPFilter "(name=Development-*)" | Get-ADGroupMember | Get-ADUser | Select-Object Enabled,SamAccountName | sort Enabled | Export-Csv C:\Files\Development-Group-Users.csv


Reset your network connections:

"release", "renew", "flushdns" | %{ipconfig /$_}

Get a list of the domain controllers in your domain:

[System.DirectoryServices.ActiveDirectory.Domain]::GetCurrentDomain() | Select-Object DomainControllers

VMware PowerCLI


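Connect to a vCenter server as an admin user, and disconnect from it (replace vcenter-server-name with the name of your vCenter server):

Connect-VIServer -Server vcenter-server-name -User adminusername

Disconnect-VIServer -Server vcenter-server-name

There is a default connection when the PowerCLI tool starts, so to work as an admin account, first close that connection and then reconnect:

Disconnect-VIServer -Server vc-20-ah -Force -Confirm:$false

Connect-VIServer -Server vcenter-server-name -User adminusername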

List VMs

By name pattern

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Format-Table -wrap -AutoSize

By CPU count

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object {$_.NumCpu -gt "2"} | Format-Table -wrap -AutoSize

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object {$_.NumCpu -lt "2"} | Format-Table -wrap -AutoSize

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object {$_.NumCpu -eq "2"} | Format-Table -wrap -AutoSize

By Memory

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object {$_.MemoryGB -gt "4"} | Format-Table -wrap -AutoSize

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object {$_.MemoryGB -lt "4"} | Format-Table -wrap -AutoSize

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object {$_.MemoryGB -eq "4"} | Format-Table -wrap -AutoSize

By Power State

get-folder VM-folder-name | Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" }

get-folder VM-folder-name | Get-VM | Where-Object { $_.PowerState -eq "PoweredOff" }

Start, Stop, Restart, Delete VMs

Note!!! ALWAYS operate exclusively on VMs in a folder AND use the VM name filter! Otherwise, update your resume.

Stop VMs

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Stop-VM

At the confirmation prompt, answer:
  • "A" (Yes to All) to stop all matches at once
  • "Y" (Yes) to stop VMs one at a time

Restart VMs

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Restart-VM

Start VMs

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Start-VM

Delete VMs: Stop and Remove VMs from disk (see above note about your resume)

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Stop-VM

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Remove-VM -deletepermanently

Moving a VM

Get VM datastore

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Get-Datastore | Format-Table -wrap -AutoSize

Get datastores on hosts matching a pattern

Get-VMHost | Where-Object { $_.Name -like "*VM-host-name-pattern-*"} |  Get-Datastore | Format-Table -wrap -AutoSize

Get datastores matching pattern

Get-Datastore | Where-Object { $_.Name -like "naming*pattern*" } | Format-Table -wrap -AutoSize

Get hosts for datastores matching a pattern

Get-Datastore | Where-Object { $_.Name -like "naming*pattern*" } | Get-VMHost | Format-Table -wrap -AutoSize

Get hosts for VMs matching a pattern

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Get-VMHost | Format-Table -wrap -AutoSize

Move a VM. To move it between datastores local to one host, use the VM's current host as the destination.

get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"} | Move-VM -Destination target_vmware_hostname -Datastore Target_Datastore_Name

Snapshot VM

Get collection of VMs to snapshot

$collection_of_VMs=get-folder Virtual_Center_VM_folder | Get-VM | Where-Object { $_.Name -like "VM-name-pattern-*"}

Get list of snapshots for VMs in $collection_of_VMs (all snapshots, specific snapshot)

foreach($VM in $collection_of_VMs) {Get-Snapshot -VM $vm | Format-Table -wrap -AutoSize }

foreach($VM in $collection_of_VMs) {Get-Snapshot -VM $vm -Name "Test Snapshot" | Format-Table -wrap -AutoSize }

Create new snapshot for VMs in $collection_of_VMs

foreach($VM in $collection_of_VMs) {New-Snapshot -VM $vm -Name "Test Snapshot" -Memory:$true | Format-Table -wrap -AutoSize }

Remove snapshot for VMs in $collection_of_VMs (w/ confirmation for each VM and w/o confirmation for each VM)

foreach($VM in $collection_of_VMs) {Get-Snapshot -VM $vm -Name "Test Snapshot" | Remove-Snapshot -Confirm:$true | Format-Table -wrap -AutoSize }

foreach($VM in $collection_of_VMs) {Get-Snapshot -VM $vm -Name "Test Snapshot" | Remove-Snapshot -Confirm:$false | Format-Table -wrap -AutoSize }

-Adam (a0f29b982)

Wednesday, June 26, 2013

Programmatically named variables in bash.

Suppose you wanted to do the following in bash:

for label in a b c d e f

This has the intended result of setting a series of variables:

But what if you want to dereference them programmatically?

for label in a b c d e f
do
  echo ${variable_${label}}
done

is not valid bash syntax; bash stops with a "bad substitution" error.

But there is a way...

We can abuse export and env. We set the variables with:

for label in a b c d e f
do
  export variable_${label}=${label}
done

We can then programmatically dereference the variables by searching for them in the output of env and using awk to get their values:

for label in a b c d e f
do
  echo "`env | grep ^variable_${label}= | awk -F= '{print $2}'`"
done

How's that for bash abuse?
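For comparison, bash can do the same thing without a detour through env. This is a minimal sketch, assuming bash 3.1 or later (needed for printf -v). It sets the variables with printf -v and reads them back with indirect expansion, where ${!name} expands the variable whose name is stored in $name.

```shell
# Set variable_a .. variable_f without export: printf -v assigns to the
# variable named by its argument.
for label in a b c d e f
do
  printf -v "variable_${label}" '%s' "${label}"
done

# Read them back with indirect expansion: ${!name} expands the variable
# whose name is the value of $name.
for label in a b c d e f
do
  name="variable_${label}"
  echo "${!name}"
done
```

If the variables were exported as in the post, printenv variable_c also prints c, without the grep and awk step.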

Wednesday, May 22, 2013

A Proposal for Determining If A VM Is Used or Unused (work-in-progress)


As virtualization technology has taken over the information services landscape, the cost - both in terms of money and effort - of deploying a new server has fallen dramatically. Since the commoditization of PC architecture servers in the 2000s, organizations have typically deployed one application per server to isolate each application for ease of maintenance and security. If an application did not use all of its server's resources all of the time, the remaining computing resources went to waste. To recover that waste, many organizations have recently replaced servers, each running one application, with many virtual machines, also each running one application. These virtual machines all run on a few large physical servers. If an application is idle, then other applications can use the remaining resources. With this shift, VMware estimates that the infrastructure cost of one application, and thus one VM, is now down to $1774.00 [VMware 2012]. In addition to the lower infrastructure cost, automation has driven the deployment cost of a new virtual machine to near zero. With this dramatic cost drop, organizations have watched their populations of virtual machines balloon. With these ballooning populations comes an ever greater potential for virtual machines to simply linger, unused, as their applications fall out of use and as the human structure of an organization shifts over time.

For many organizations, having a human constantly review the virtual machine population for unused virtual machines would be cost-prohibitive. Review would be possible, however, if the majority of in-use virtual machines could be filtered out of the population, leaving only the virtual machines with a high chance of being unused for a human to review. This article seeks to develop a statistical classifier for flagging virtual machines as potentially unused.


A systems administrator can use their experience to determine if a virtual machine is still being used by looking at various properties of that virtual machine and then making a judgment call about its level of use. The explicit process looks something like:

  • Property 1 = a
  • Property 2 = b
  • Property 3 = c
  • Property 4 = d
  • in my professional judgment, I believe this virtual machine to be unused.
For an administrator to make this judgment across hundreds of virtual machines would take many hours, so it would be best to automate the process. To automate it, we need to capture it in computer code, and to capture it in computer code, we must express it mathematically. So the question becomes:
Is there an existing mathematical model that captures this process, and thus approximates the systems administrator's expertise?
In fact, there is such a model.

Bayesian Classifiers 

Consider "this email is spam". It's boolean, that is, true or false, and has some chance of being true. We denote that chance:

$$P(\text{spam})$$

Think of a large region, with a total area of 1, where each point is a possible email. Area(spam) is the part of that region containing all of the emails you consider spam. If Area(spam) is zero, then no emails are spam. If Area(spam) equals one, then all emails are spam. Usually Area(spam) is some value between 0 and 1, and that value is P(spam) [Moore n.d.]. In fact, for all email sent in the first quarter of 2013, Area(spam) was about .665 [Gudkova 2013].

For a computer to determine whether or not an email is spam, it must use the properties of the email available to it, namely the words in the email, to determine the overall probability that the email is spam.

The area of all email can be sliced up into partitions, where each partition contains all the emails that have a certain word like "rolex" in them. These partitions overlap the two larger partitions that split the area into spam emails and non-spam emails.

Using the above description, to determine if an email is spam, we ask: if an email has "rolex" among its words, what is the probability that the email is spam? [Graham 2002] That is, out of all the emails containing "rolex", what fraction are in the spam partition?

Conditional probabilities provide a model for this type of question:

$$P(U \mid V)$$

the probability of condition U, given datum V.

The mathematicians Laplace and Bayes give a concise formula, Bayes' Theorem, that relates the conditional probabilities to the overall probabilities of the condition and the datum:

$$P(U \mid V) = \frac{P(V \mid U)\,P(U)}{P(V)}$$

where, for our purposes, V is a measurable property of a virtual machine and U is the classification "unused".

There are many things that we can measure on each virtual machine. It would be good to be able to combine them to find the probability that a particular virtual machine is unused. Thus, we would like to find a way to use the above equation to determine whether a virtual machine is "in use" or "unused", given several measurable properties. [Larsen 2001]

Buckets of Marbles

Consider two buckets, U and I, full of marbles of eight different colors. We want to be able to pull a marble at random from one of those two buckets and then estimate the probability that it came from one bucket or the other. That is, we want to use the color of the marble to estimate its classification, U or I. In terms of the areas we described above, imagine all of the marbles laid out flat with each color grouped together. Each color is a distinct area. Overlapping those distinct areas are two larger areas, U and I. We want to estimate how likely it is that an arbitrary marble picked off the plane came from under the U area, based on its color.

To do this, we start by pulling a sample from each bucket, U and I. We count the number of marbles of each color in the sample from each bucket. We count the total number of marbles of each color across both bucket samples. Finally, we count the total number of marbles in both samples.

Say we want to know the probability that a marble came from the U bucket, given that it's red. We can use the above relationship. Using the count of red marbles in the sample from U, we can estimate the fraction of marbles in U that are red. This is P(red|U). Using the number of marbles in the U sample and the total number of marbles in both samples, we can estimate the fraction of all marbles, across both buckets, that are in U. This is P(U). And by counting the number of red marbles in both samples, we can estimate the fraction of all marbles that are red. This is P(red). The above equation says that we can estimate the probability that a red marble comes from the U bucket using this relationship between those values:

$$P(U \mid \text{red}) = \frac{P(\text{red} \mid U)\,P(U)}{P(\text{red})}$$

But what if the sample from one of the buckets has none of a particular color? The count would be zero, which gives no usable estimate of how common that color is in the original bucket. Laplace gives us a slightly more complex estimator that "smooths" over the zero count [Smith 2009].

Instead of the count of a color divided by the number of marbles in the sample, we can use:

$$\hat{P}(c \mid U) = \frac{n_{c,U} + 1}{n_U + k}$$

where $n_{c,U}$ is the number of marbles of color $c$ in the U sample, $n_U$ is the number of marbles in the U sample, and $k$ is the number of colors (8 here).

This smooths out the zero by assigning an unseen color a likelihood of $1/(n_U + k)$, slightly lower than $1/n_U$, one over the total number of marbles in the sample. This estimator gives us a way to estimate the likelihood of picking a color from each bucket, even if the sample does not contain that color.
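As a worked example of the estimator, the hypothetical shell function below computes (count + 1) / (sample size + number of colors) with awk, in keeping with the bash appendix. The function name and argument order are illustrative only.

```shell
# Hypothetical helper: add-one (Laplace) estimate of how common a color is in
# a bucket. Arguments: count of the color in the sample, sample size, number
# of colors. Prints the estimate to four decimal places.
laplace_estimate () {
  awk -v count="$1" -v size="$2" -v colors="$3" \
    'BEGIN { printf "%.4f\n", (count + 1) / (size + colors) }'
}
```

With a sample of 13 and 8 colors, laplace_estimate 0 13 8 prints 0.0476 (that is, 1/21) for a color the sample never contained. That is a bit below 1/13 (about 0.0769), as described above. A color seen 3 times gets laplace_estimate 3 13 8, or 0.1905.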

VM Classification

For this proof of concept, we need to analyze a sample population of virtual machines and then assign each VM a colored marble, based on our analysis of that virtual machine. I propose doing this in the following way.

For simplicity, we will use eight colors, as we did above. To get those eight colors, we will measure three metrics: percent free memory, disk blocks transferred yesterday, and average daily logins. We will take a sample from the population of all of the VMs in the infrastructure and then divide them into two buckets, U (unused) and I (in use), using our experience. These represent samples from the two larger buckets of virtual machines that together contain all of the virtual machines in the infrastructure. This manual division also represents the "expert knowledge" we want to approximate programmatically. For each metric, we will calculate the mean of that metric in each of the two sample sets. Then we will mark a particular virtual machine as above the mean for that metric (A) or below it (B). Thus each virtual machine in the two sample groups will have a triplet (A/B)(A/B)(A/B) associated with it. This is the "color" of the virtual machine. There are 8 combinations, representing 8 marble colors:

AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB

I chose 130 virtual machines from the overall population. Based on my professional experience, I determined that 13 of them were unused. I then determined the triplet for each virtual machine in the "in use" group and each virtual machine in the "unused" group, and counted the number of virtual machines of each "color" in each group. Here the Laplace estimator applied. The sample of unused virtual machines was so small (only 13) that I needed the Laplace estimator to estimate the likelihood of colors in the larger "unused" bucket that didn't appear in the sample. Indeed, even the larger "in use" sample was missing some colors, so the Laplace estimator applied there as well. I then made overall estimates, across both samples, of the fraction of unused virtual machines and the overall occurrence of each color.
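The triplet assignment described above can be sketched as a single awk comparison. The function vm_color and its argument order are hypothetical. A value equal to the mean counts as "A", which matches the convention in the appendix code.

```shell
# Hypothetical helper: print a VM's color from its three metrics and the
# three sample means, in the order free memory, blocks transferred, logins.
# A value below the mean gives "B"; at or above the mean gives "A".
vm_color () {
  awk -v mem="$1" -v blk="$2" -v lgn="$3" \
      -v mem_mean="$4" -v blk_mean="$5" -v lgn_mean="$6" 'BEGIN {
    printf "%s%s%s\n", (mem < mem_mean ? "B" : "A"),
                       (blk < blk_mean ? "B" : "A"),
                       (lgn < lgn_mean ? "B" : "A")
  }'
}
```

For example, vm_color 40 1000 0 30 5000 2 prints ABB: free memory is above its mean, while blocks transferred and logins are below theirs.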

Thus, for each virtual machine "color" in the two sample sets, I estimated:
  • P(a triplet IF the virtual machine was unused) = P(C|U)
  • P(unused virtual machines) = P(U)
  • P(a triplet) = P(C)
So, by Bayes' Theorem, I was able to approximate the probability that a virtual machine was unused, given that it had a particular color:

$$P(U \mid C) = \frac{P(C \mid U)\,P(U)}{P(C)}$$

I then calculated P(U|C) for each color:

P(U|AAA) =.66666666666666666623
P(U|AAB) =.07407407407407407400
P(U|ABB) =.33333333333333333311
P(U|BBB) =.26666666666666666645
P(U|BBA) =.66666666666666666652
P(U|BAA) =.66666666666666666666
P(U|ABA) =.66666666666666666660
P(U|BAB) =.00666666666666666666
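To make the arithmetic concrete, here is a sketch of computing P(U|C) for one color from raw sample counts. It rests on an assumption, because the post does not say exactly which terms were smoothed. This version applies the Laplace estimator to both P(C|U) and P(C) and uses a plain ratio for P(U). The function and argument names are for illustration only, and the sketch is not guaranteed to reproduce the values listed above.

```shell
# Hypothetical helper: estimate P(U|C) for one color from sample counts.
#   $1 n_cu : VMs of this color in the unused sample
#   $2 n_u  : size of the unused sample
#   $3 n_c  : VMs of this color across both samples
#   $4 n    : size of both samples combined
#   $5 k    : number of colors (8 for the A/B triplets)
posterior_unused_given_color () {
  awk -v n_cu="$1" -v n_u="$2" -v n_c="$3" -v n="$4" -v k="$5" 'BEGIN {
    p_c_given_u = (n_cu + 1) / (n_u + k)   # Laplace-smoothed P(C|U)
    p_u         = n_u / n                  # P(U), plain sample ratio
    p_c         = (n_c + 1) / (n + k)      # Laplace-smoothed P(C)
    printf "%.4f\n", p_c_given_u * p_u / p_c
  }'
}
```

With 13 unused VMs out of 130 and 8 colors, posterior_unused_given_color 0 13 0 130 8 prints 0.6571 for a color that appears in neither sample.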


With these values, I can evaluate any virtual machine in the environment for the probability it's unused.

For an arbitrary virtual machine, I would first measure each of the three metrics. Then I would determine its triplet using the mean for each metric from the unused virtual machine sample. Finally, I would look up the virtual machine's "color" in the above list to get the estimated probability that the virtual machine is unused.
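The lookup step can be sketched as a shell case statement. The function names are hypothetical, and the probabilities are the values listed above, rounded to four decimal places. The review threshold defaults to 0.666 to match the .666 example the post gives.

```shell
# Hypothetical helper: print the estimated P(U|C) for a color
# (values from the list above, rounded to four places).
p_unused_for_color () {
  case "$1" in
    AAA) echo 0.6667 ;;
    AAB) echo 0.0741 ;;
    ABB) echo 0.3333 ;;
    BBB) echo 0.2667 ;;
    BBA) echo 0.6667 ;;
    BAA) echo 0.6667 ;;
    ABA) echo 0.6667 ;;
    BAB) echo 0.0067 ;;
    *)   echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Succeeds (exit status 0) when a color's P(U|C) is at or above the
# threshold in $2 (default 0.666), meaning the VM should go to a human.
needs_review () {
  local p
  p="$(p_unused_for_color "$1")" || return 2
  awk -v p="$p" -v t="${2:-0.666}" 'BEGIN { exit !(p >= t) }'
}
```

For example, needs_review AAA && echo "review this VM" prints the message, while a BAB machine is skipped.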

The above process is not perfect, but it makes a much cheaper first pass at finding unused virtual machines than having a system administrator evaluate each virtual machine by hand. For example, a system administrator need only evaluate machines whose colors have a .666 probability of being unused.


Graham, Paul. "A Plan for Spam." Aug. 2002. Web. 23 May 2013.

Gudkova, Darya. "Spam in Q1 2013." Kaspersky Lab ZAO, 8 May 2013. Web. 22 May 2013.

Larsen, Richard J., and Morris L. Marx. An Introduction to Mathematical Statistics and Its Applications. 3rd ed. Upper Saddle River, NJ: Prentice Hall, 2001. Print.

Moore, Andrew W. "Probabilistic and Bayesian Analytics." Probability for Data Miners. Andrew W. Moore, n.d. Web. 22 May 2013.

Smith, David. "Estimation - Maximum Likelihood and Smoothing." Introduction to Natural Language Processing. University of Massachusetts, Amherst, Sept. 2009. Web. 22 June 2013.

VMware, Inc. "Determine True Total Cost of Ownership." Get Low Total-Cost-of-Ownership (TCO) with Maximum Virtual Machine Density. VMware, Inc., Sept. 2012. Web. 23 May 2013.

Appendix - Code

Since this blog is somewhat about doing things in bash that really should not be done in bash, I did all the math to calculate each P(U|C) in a bash script. I will include it here. It's not optimized by any means and includes some creative abuses of bash. Note that the format of the original data file is virtual_machine_name,free_mem_percentage,blocks_transferred_yesterday,average_daily_logins. As I said above, I manually broke the sample list into used and unused virtual machines, and proceeded from there.


set -u
#set -e
#set -x

status () {
  echo -n " |$*| "

get_unused_training_data () {

  for virtual_machine in `cat ${unused_VMs} `; do grep ${virtual_machine} ${training_data} ; done | sort | uniq 
  unset virtual_machine


get_in_use_training_data () {

  for virtual_machine in `cat ${in_use_VMs} `; do grep ${virtual_machine} ${training_data} ; done | sort | uniq 
  unset virtual_machine


get_unused_training_data_count () {

  get_unused_training_data | wc -l


get_in_use_training_data_count () {

  get_in_use_training_data | wc -l


get_unused_average_for_free_memory () {

  get_unused_training_data | awk -F, '{lines=$lines+1;sum=+$2} END {print sum/lines } '


get_in_use_average_for_free_memory () {

  get_in_use_training_data | awk -F, '{lines=$lines+1;sum=+$2} END {print sum/lines } '


get_unused_average_for_block_transfer () {

  get_unused_training_data | awk -F, '{lines=$lines+1;sum=+$3} END {print sum/lines } '


get_in_use_average_for_block_transfer () {

  get_in_use_training_data | awk -F, '{lines=$lines+1;sum=+$3} END {print sum/lines } '


get_unused_average_for_logins () {

  get_unused_training_data | awk -F, '{lines=$lines+1;sum=+$4} END {print sum/lines } '


get_in_use_average_for_logins () {

  get_in_use_training_data | awk -F, '{lines=$lines+1;sum=+$4} END {print sum/lines } '


return_above_or_below_group_average () {

  # Compare integer parts only, since [ ] cannot compare decimals.
  a_or_b_value="`echo ${1} | awk -F. '{print $1}'`"
  a_or_b_average="`echo ${2} | awk -F. '{print $1}'`"

  # "A" = at or above the group average, "B" = below it.
  if [ ${a_or_b_value} -ge ${a_or_b_average} ]; then
    echo "A"
  else
    echo "B"
  fi

  unset a_or_b_value
  unset a_or_b_average
}

get_in_use_triplets () {

  # Group averages used as the A/B thresholds below (assumed to be computed
  # here; under set -u the loop cannot run without them).
  get_in_use_triplets_free_memory_average="`get_in_use_average_for_free_memory`"
  get_in_use_triplets_block_transfer_average="`get_in_use_average_for_block_transfer`"
  get_in_use_triplets_logins_average="`get_in_use_average_for_logins`"

  for virtual_machine_line in `get_in_use_training_data`; do
    virtual_machine_name="`echo ${virtual_machine_line} | awk -F, '{print $1}' `"
    get_in_use_triplets_local_value="`echo ${virtual_machine_line} | awk -F, '{print $2}' `"
    free_memory_a_or_b="`return_above_or_below_group_average ${get_in_use_triplets_local_value} ${get_in_use_triplets_free_memory_average}`"

    get_in_use_triplets_local_value="`echo ${virtual_machine_line} | awk -F, '{print $3}' `"
    block_transfer_a_or_b="`return_above_or_below_group_average ${get_in_use_triplets_local_value} ${get_in_use_triplets_block_transfer_average}`"

    # Logins are column 4 of the data file.
    get_in_use_triplets_local_value="`echo ${virtual_machine_line} | awk -F, '{print $4}' `"
    logins_a_or_b="`return_above_or_below_group_average ${get_in_use_triplets_local_value} ${get_in_use_triplets_logins_average}`"

    echo "${virtual_machine_name},${free_memory_a_or_b}${block_transfer_a_or_b}${logins_a_or_b}"

    unset virtual_machine_name
    unset free_memory_a_or_b
    unset block_transfer_a_or_b
    unset logins_a_or_b
  done
}

get_unused_triplets () {

  # Group averages used as the A/B thresholds below (assumed to be computed
  # here; under set -u the loop cannot run without them).
  get_unused_triplets_free_memory_average="`get_unused_average_for_free_memory`"
  get_unused_triplets_block_transfer_average="`get_unused_average_for_block_transfer`"
  get_unused_triplets_logins_average="`get_unused_average_for_logins`"

  for virtual_machine_line in `get_unused_training_data`; do
    virtual_machine_name="`echo ${virtual_machine_line} | awk -F, '{print $1}' `"
    get_unused_triplets_local_value="`echo ${virtual_machine_line} | awk -F, '{print $2}' `"
    free_memory_a_or_b="`return_above_or_below_group_average ${get_unused_triplets_local_value} ${get_unused_triplets_free_memory_average}`"

    get_unused_triplets_local_value="`echo ${virtual_machine_line} | awk -F, '{print $3}' `"
    block_transfer_a_or_b="`return_above_or_below_group_average ${get_unused_triplets_local_value} ${get_unused_triplets_block_transfer_average}`"

    # Logins are column 4 of the data file.
    get_unused_triplets_local_value="`echo ${virtual_machine_line} | awk -F, '{print $4}' `"
    logins_a_or_b="`return_above_or_below_group_average ${get_unused_triplets_local_value} ${get_unused_triplets_logins_average}`"

    echo "${virtual_machine_name},${free_memory_a_or_b}${block_transfer_a_or_b}${logins_a_or_b}"

    unset virtual_machine_name
    unset free_memory_a_or_b
    unset block_transfer_a_or_b
    unset logins_a_or_b
    unset get_unused_triplets_local_value
  done
}

get_l_of_triplets_if_unused () {

  for property_triplet in AAA AAB ABB BBB BBA BAA ABA BAB; do

    # Compare the triplet field exactly, so a VM name cannot produce a false match.
    number_of_unused_members_with_triplet="`get_unused_triplets | awk -F, -v t=${property_triplet} '$2 == t' | wc -l `"
    number_of_in_use_members_with_triplet="`get_in_use_triplets | awk -F, -v t=${property_triplet} '$2 == t' | wc -l `"
    number_of_unused_members="`get_unused_training_data | wc -l `"
    number_of_in_use_members="`get_in_use_training_data | wc -l `"

    # Laplace-smoothed likelihood estimator:
    # ((number of group members with a certain triplet)+1)/((number of combinations)+(# of members in bucket))

    export l_of_${property_triplet}_if_unused="`echo \(${number_of_unused_members_with_triplet}+1\)/\(${number_of_combinations}+${number_of_unused_members}\) | bc -l `"
    #    export l_of_${property_triplet}_if_in_use="`echo \(${number_of_in_use_members_with_triplet}+1\)/\(${number_of_combinations}+${number_of_in_use_members}\) | bc -l `"

    #  echo "`env | grep l_of_${property_triplet}_if_unused | awk -F= '{print $2}'`"
    #    echo "`env | grep l_of_${property_triplet}_if_in_use | awk -F= '{print $2}'`"
  done
}

get_overall_l_of_triplets () {

  total_virtual_machines="`cat ${training_data} | wc -l`"

  for property_triplet in AAA AAB ABB BBB BBA BAA ABA BAB; do

    export count_of_${property_triplet}_for_in_use="`get_in_use_triplets | awk -F, -v t=${property_triplet} '$2 == t' | wc -l `"
    export count_of_${property_triplet}_for_unused="`get_unused_triplets | awk -F, -v t=${property_triplet} '$2 == t' | wc -l `"
    count_of_in_use="`env | grep count_of_${property_triplet}_for_in_use | awk -F= '{print $2}'`"
    count_of_unused="`env | grep count_of_${property_triplet}_for_unused | awk -F= '{print $2}'`"
    export overall_l_of_triplet_${property_triplet}="`echo \(${count_of_in_use}+${count_of_unused}+1\)/\(${number_of_combinations}+${total_virtual_machines}\) | bc -l `"
    # echo "`env | grep overall_l_of_triplet_${property_triplet} | awk -F= '{print $2}'`"
    # echo $((${count_of_in_use}+${count_of_unused}))
  done
}

get_p_unused_if_triplet () {

# P(unused|triplet) = ( p(triplet|unused) * p(unused) ) / p(triplet)

  count_of_unused="`get_unused_training_data | wc -l `"
  total_virtual_machines="`cat ${training_data} | wc -l`"

  l_of_unused="`echo \(${count_of_unused}+1\)/\(${number_of_combinations}+${total_virtual_machines}\) | bc -l `"

  for property_triplet in AAA AAB ABB BBB BBA BAA ABA BAB; do
      l_of_triplet_if_unused="`env | grep l_of_${property_triplet}_if_unused | awk -F= '{print $2}'`"
      overall_l_of_triplet="`env | grep overall_l_of_triplet_${property_triplet} | awk -F= '{print $2}'`"

#echo ${l_of_unused}
#echo ${l_of_triplet_if_unused}
#echo ${overall_l_of_triplet}

      export p_unused_if_triplet_${property_triplet}="`echo \(${l_of_triplet_if_unused}\*${l_of_unused}\)/${overall_l_of_triplet} | bc -l `"
      echo "p_unused_if_triplet_${property_triplet}=`env | grep p_unused_if_triplet_${property_triplet} | awk -F= '{print $2}'`"
  done
}

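#Local data files
# Set training_data, unused_VMs and in_use_VMs to the data file paths, and
# number_of_combinations to the number of possible A/B triplets, here. Then call
# get_l_of_triplets_if_unused, get_overall_l_of_triplets and
# get_p_unused_if_triplet, in that order (each one reads the variables the
# previous ones export).

exit 0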
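If you want to check the arithmetic without the training files, here is a small standalone sketch. It averages one column of comma-separated rows with the usual awk `n++; sum += $col` idiom. It also evaluates the Laplace-smoothed P(U|C) for a single triplet, using the same +1 and number_of_combinations terms as the script above. The function names and the counts in the usage lines are hypothetical. It also uses awk for the floating-point math instead of bc.

```shell
#!/bin/bash
# Standalone sketch of the script's arithmetic. Function names and the
# counts used below are hypothetical, not taken from the original data.

# Average column $1 of comma-separated rows read from stdin.
column_average () {
  awk -F, -v col="$1" '{ n++; sum += $col } END { if (n > 0) print sum / n }'
}

# Laplace-smoothed P(U|C) = P(C|U) * P(U) / P(C) for one triplet C.
#   $1 = unused VMs with triplet C    $2 = in-use VMs with triplet C
#   $3 = unused VMs in total          $4 = all VMs in the training data
#   $5 = number_of_combinations (the smoothing constant K)
p_unused_given_triplet () {
  awk -v nuc="$1" -v nic="$2" -v nu="$3" -v n="$4" -v k="$5" 'BEGIN {
    l_c_if_u = (nuc + 1) / (k + nu)        # P(C|U)
    l_u      = (nu + 1) / (k + n)          # P(U)
    l_c      = (nuc + nic + 1) / (k + n)   # P(C)
    printf "%.4f\n", l_c_if_u * l_u / l_c
  }'
}

printf 'vm1,40,100,3\nvm2,60,300,5\n' | column_average 2   # prints 50
p_unused_given_triplet 3 1 10 30 8                         # prints 0.4889
```

With those hypothetical counts, P(C|U) = 4/18, P(U) = 11/38 and P(C) = 5/38, so P(U|C) = 44/90, or about 0.4889.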

Friday, April 12, 2013

Fixing SSH connection problems in EGit in Eclipse

Note: I posted a version of this on Stack Overflow.
Errors can occur when there is an underlying SSH authentication problem, such as the git remote server having the wrong public key for you, or the git remote server having changed its SSH host key.
Often such an SSH error will appear as: "Invalid remote: origin: Invalid remote: origin"

Eclipse uses the .ssh directory you specify in Preferences -> General -> Network Connections -> SSH2 for its SSH configuration. Set it to "{your default user directory}\.ssh\".
To fix things, first determine which SSH client you are using for Git. This is stored in the GIT_SSH environment variable. To see it, right-click on "Computer" (Windows 7), then choose Properties -> Advanced System Settings -> Environment Variables.
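Git Bash inherits Windows environment variables, so you can also check the setting from its prompt. The sketch below is a minimal version of that check. The helper name `ssh_stack_for` is hypothetical, and the classification just follows the rules in this post: plink.exe means PuTTY, Git for Windows's ssh.exe means OpenSSH, and an unset GIT_SSH means EGit's internal client.

```shell
#!/bin/bash
# Hypothetical helper: map a GIT_SSH value to the SSH stack it selects,
# following the rules described in this post.
ssh_stack_for () {
  # Windows paths are case-insensitive, so compare in lower case.
  local value
  value="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$value" in
    "")         echo "internal (EGit's built-in Java client)" ;;
    *plink.exe) echo "PuTTY (plink.exe)" ;;
    *ssh.exe)   echo "OpenSSH (Git for Windows ssh.exe)" ;;
    *)          echo "unknown: $1" ;;
  esac
}

# Show what the current GIT_SSH setting selects.
ssh_stack_for "${GIT_SSH:-}"
```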
If GIT_SSH contains a path to plink.exe, you are using the PuTTY stack.
  • To get your public key, open PuTTYgen.exe and then load your private key file (*.ppk). The listed public key should match the public key on the git remote server.
  • To get the new host key, open a new PuTTY.exe session and connect to git@{git repo host}.
  • When PuTTY shows its security alert about the host key, click Yes to store the new key.
  • Once you get a login prompt, you can close the PuTTY window. The new key has been stored.
  • Restart Eclipse.
If GIT_SSH contains a path to "ssh.exe" in your "Git for Windows" tree, you are using Git for Windows's OpenSSH.
  • Set %HOME% to your default user directory (as listed in Eclipse; see above).
  • Set %HOMEDRIVE% to the drive letter of your default user directory.
  • Set %HOMEPATH% to the path to your default user directory on %HOMEDRIVE%.
  • To get your public key, open your public key file in %HOMEDRIVE%%HOMEPATH%/.ssh/ (for example, id_rsa.pub) with a text editor. The listed public key should match the public key on the git remote server.
  • To get the new host key, open cmd.exe.
  • Run Git Bash.
  • Press Ctrl-C.
  • At the bash prompt, run /c/path/to/git/for/windows/bin/ssh.exe git@{git remote host}.
  • Type yes to accept the new key.
  • Once you have a login prompt, press Ctrl-C.
  • Close the cmd.exe window.
  • Restart Eclipse.
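OpenSSH refuses to connect when the host key it has stored no longer matches the server's key, so you will not see the "yes" prompt until the old entry is removed from known_hosts. Here is a minimal sketch for doing that from Git Bash. It assumes known_hosts lives at ~/.ssh/known_hosts, and the helper name is hypothetical. `ssh-keygen -R` is the standard OpenSSH option for removing a host's keys.

```shell
#!/bin/bash
# Hypothetical helper: forget the stored host key for a server so that the
# next ssh connection asks you to accept the new one.
refresh_host_key () {
  local host="$1"
  local known_hosts="${2:-$HOME/.ssh/known_hosts}"
  # ssh-keygen -R removes every key stored for $host and keeps the previous
  # file as known_hosts.old.
  ssh-keygen -R "$host" -f "$known_hosts"
}

# Usage in Git Bash:
#   refresh_host_key {git remote host}
#   ssh git@{git remote host}      # then type yes to accept the new key
```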
Finally, if you are still having trouble with your external SSH client, delete the GIT_SSH environment variable and set the HOME environment variable to your default user directory on Windows. Without the GIT_SSH variable, EGit uses its internal Java SSH client, which takes its configuration from the .ssh directory you specified above.
Note: If you have Git for Windows, you can use its tools to create an SSH key pair in your .ssh directory:
  • Set %HOME% to your default user directory (as listed in Eclipse).
  • Set %HOMEDRIVE% to the drive letter of your default user directory.
  • Set %HOMEPATH% to the path to your default user directory on %HOMEDRIVE%.
  • Run Git Bash.
  • Press Ctrl-C.
  • Run: ssh-keygen.exe -t rsa -b 2048
  • Accept the default filenames.
  • Choose a passphrase or save without one. If you save with a passphrase, Eclipse will prompt you for it each time you push to or pull from your git remote server.
  • Close Git Bash.
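The key-generation steps above can also be scripted. Here is a minimal sketch, assuming Git Bash's ssh-keygen, which supports the standard OpenSSH options -t, -b, -N, -f and -q. The function name is hypothetical. `-N ""` creates the key without a passphrase, so leave that option out if you want ssh-keygen to prompt you for one.

```shell
#!/bin/bash
# Hypothetical helper: create a 2048-bit RSA key pair without prompts and
# print the public key so it can be added on the git remote server.
make_git_key () {
  local key="${1:-$HOME/.ssh/id_rsa}"   # default filename, as in the steps above
  if [ -e "$key" ]; then
    echo "refusing to overwrite existing key: $key" >&2
    return 1
  fi
  mkdir -p "$(dirname "$key")"
  # -N "" = empty passphrase, so Eclipse will not prompt on push/pull.
  ssh-keygen -q -t rsa -b 2048 -N "" -f "$key" || return 1
  cat "$key.pub"
}

# Usage in Git Bash, with %HOME% set as described above:
#   make_git_key
```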
You can also use the GUI in the SSH2 Preference pane in Eclipse to manage hosts and keys.