Enabling Self Service for your DataCenter — Part II
In Part I of this blog, Partheeban Kandasamy (PK) explained why and how to combine the event-driven capabilities of a deployed function with its command-driven aspect to deliver a true self-service experience for business units. He also delved into the logic behind the functions.
To recap, the functions covered here are the remediation functions written in Go. They are triggered when virtual machines (VMs) go into an alarm state because they have run out of CPU, memory, or storage. When a VM runs out of CPU or memory and goes into red alarm status, a configuration tag is automatically attached to increment whichever resource has run out. When a VM runs out of storage, a function moves it to a predetermined datastore.
The Remediation Functions (written in Go)
These functions fit under two primary use cases explored during Partheeban Kandasamy (PK)’s VMworld 2020 presentation, Arm Yourself with Event-Driven Functions and Reimagine SDDC Capabilities [HCP1404]. They demonstrate the power of event-driven crisis remediation capabilities enabled by vCenter Event Broker Appliance (VEBA).
1. Vertically Scaling a VM: Two functions work together to enable vertical scaling of VMs. The first function is based on VEBA’s Go tagging function example. This modified tagger function responds to CPU and memory alarms and attaches a tag to the VM in alarm. The second function, vm-reconfigure-via-tag (available in the VEBA repo), responds to the VmPoweredOff event and reconfigures a VM based on the configuration tags attached to it.
2. Auto Storage DRS: The last function, vm-datastore-move, responds to datastore alarms. When a VM runs out of storage, the alarm triggers moving the VM to another datastore.
For this article, I’ll cover the highlights of these functions, mostly areas where Govmomi (the Go wrapper for the vSphere APIs) was used to interact with vCenter.
1. Modified Tagger Function
The Modified Tagger function will react to either a CPU or Memory alarm that has gone into red level. Once it gets one of the alarms, it will look at the current configuration of the VM for either the CPU or memory (whichever caused the alarm) and add a tag that will increment the appropriate resource.
i. Create Tags
Before a tag can be attached, it must first be created. There is some information in the VEBA repo in a section titled Create configuration categories and tags. I also created a small Go program that will generate the tags for you. The source code for the tag generator is here.
The maximum value that a configuration can be incremented to has been hardcoded. Hardcoding is bad programming practice because it doesn’t allow the flexibility needed for different deployments of the code. It is better to keep the configuration in an external file, or even an environment variable. For the purpose of this blog, making the code more easily configurable is left as an exercise for the reader.
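One way to approach that exercise is to read the limit from an environment variable and fall back to a default. The following is a minimal sketch, not the code from the repo: the helper name maxFromEnv and the variable name MAX_CPU are hypothetical, and the lookup function is passed in only so the helper is easy to exercise without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// maxFromEnv returns the positive integer stored under name, or fallback when
// the variable is unset or does not hold a positive integer.
// lookup has the same shape as os.LookupEnv.
func maxFromEnv(lookup func(string) (string, bool), name string, fallback int) int {
	raw, ok := lookup(name)
	if !ok {
		return fallback
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

func main() {
	// MAX_CPU is a hypothetical variable name; 4 mirrors the hardcoded limit.
	maxCPU := maxFromEnv(os.LookupEnv, "MAX_CPU", 4)
	fmt.Println("maximum CPU count:", maxCPU)
}
```

In an OpenFaaS deployment the variable could be supplied through the function's environment settings, which keeps the limit out of the compiled code.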
ii. Event Trigger (Alarm Status Changes to Red)
The event data comes into the OpenFaaS processor as a JSON string. An annotation in the stack.yaml configuration file filters out all but the AlarmStatusChangedEvent events. Even then, some events will reach this function that it cannot act on. All the modified tagger function can do is respond to memory and CPU alarms that have gone into red, and other alarms, such as datastore alarms, may hit the function too.
To ensure the function is responding to the correct alarm, we’ll need to parse the incoming event data, which arrives as a JSON-encoded slice of bytes.
The event data, passed in through the req variable, is saved into the event variable, which is of the cloudEvent struct type. The JSON fields are parsed into the cloudEvent fields using the Unmarshal() function from the standard library’s encoding/json package.
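The parsing step can be sketched roughly as follows. The cloudEvent struct here is trimmed to the few fields the alarm check needs, and its layout (subject, data.Alarm.Name, data.To) is an assumption about the AlarmStatusChangedEvent payload rather than a copy of the struct in the repo.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudEvent keeps only the fields this sketch needs. The field layout is an
// assumption about the AlarmStatusChangedEvent payload.
type cloudEvent struct {
	Subject string `json:"subject"`
	Data    struct {
		Alarm struct {
			Name string `json:"Name"`
		} `json:"Alarm"`
		To string `json:"To"` // new alarm status, for example "red"
	} `json:"data"`
}

// parseEvent decodes the raw request body into a cloudEvent.
func parseEvent(req []byte) (cloudEvent, error) {
	var event cloudEvent
	if err := json.Unmarshal(req, &event); err != nil {
		return cloudEvent{}, fmt.Errorf("unmarshal cloud event: %w", err)
	}
	return event, nil
}

func main() {
	// A hand-written sample body, not captured from a real vCenter.
	req := []byte(`{"subject":"AlarmStatusChangedEvent","data":{"Alarm":{"Name":"VM CPU Usage"},"To":"red"}}`)
	event, err := parseEvent(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(event.Subject, event.Data.Alarm.Name, event.Data.To)
}
```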
Once the event data is captured in a variable, the alarm information can be checked to verify it is of the correct type.
If the alarm name is either “VM Memory Usage” or “VM CPU Usage” and the status has gone to “red”, then we know it is worth connecting to vCenter. Here, another improvement can be considered. Right now, every time a matching alarm is detected, we establish a new connection to vCenter. I thought this would be all right for systems that rarely trigger CPU or memory alarms. However, if many alarms are expected, it would be better to keep a persistent connection and log in again only if the session times out. Again, this is left to the reader as an exercise to improve the code.
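The alarm check itself is small enough to sketch in full. The function name shouldRemediate is hypothetical; the alarm names and the red status are the ones quoted above.

```go
package main

import "fmt"

// Alarm names and status quoted in the text.
const (
	memoryAlarm = "VM Memory Usage"
	cpuAlarm    = "VM CPU Usage"
	redStatus   = "red"
)

// shouldRemediate reports whether the alarm is one this function handles:
// a CPU or memory alarm whose new status is red.
func shouldRemediate(alarmName, newStatus string) bool {
	if newStatus != redStatus {
		return false
	}
	switch alarmName {
	case memoryAlarm, cpuAlarm:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldRemediate("VM CPU Usage", "red"))       // true
	fmt.Println(shouldRemediate("VM Memory Usage", "yellow")) // false
	fmt.Println(shouldRemediate("Some Other Alarm", "red"))   // false
}
```

Returning early for anything else means the function never opens a vCenter connection for alarms it cannot act on.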
iii. Connect to vCenter
Connecting to vCenter in this case is done by creating a new client.
Two sessions are established through Govmomi. The first is a connection to the vSphere API using govmomi.NewClient(). This connection is used to see the tags currently attached to the VM. We should try to detach an outdated configuration tag if we want to attach a new and different one. But to detach a tag, we need to see what tags are on it and that’s what the Govmomi client is for.
The second is a connection to the vSphere REST API, made using vsc.rest.Login(). We need access to the REST API in order to create a tag manager using tags.NewManager. The tag manager is needed to list, attach, and detach tags on a VM.
iv. Gather current VM configuration settings
To know what the new CPU or memory configuration needs to be, we need to know what it is currently. For that, we’ll use Govmomi’s property collector. To use property collector, we’ll need the VM’s managed object reference (VM moRef). Property collector will return a managed object VM (not to be confused with VM managed object) that contains the configuration information.
The way you create a VM managed object reference is by taking information from the event data.
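As a rough sketch of that step, the snippet below pulls the reference out of the event body. To stay self-contained it uses a local managedObjectReference struct as a stand-in for Govmomi's types.ManagedObjectReference (which likewise carries a Type and a Value); the JSON path data.Vm.Vm is an assumption about the event payload.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// managedObjectReference is a local stand-in for Govmomi's
// types.ManagedObjectReference.
type managedObjectReference struct {
	Type  string
	Value string
}

// vmEvent holds the part of the event that identifies the VM (assumed layout).
type vmEvent struct {
	Data struct {
		VM struct {
			Name string                 `json:"Name"`
			VM   managedObjectReference `json:"Vm"`
		} `json:"Vm"`
	} `json:"data"`
}

// vmMoRef extracts the VM reference from the raw event body.
func vmMoRef(req []byte) (managedObjectReference, error) {
	var ev vmEvent
	if err := json.Unmarshal(req, &ev); err != nil {
		return managedObjectReference{}, fmt.Errorf("unmarshal event: %w", err)
	}
	ref := ev.Data.VM.VM
	if ref.Type == "" || ref.Value == "" {
		return managedObjectReference{}, errors.New("event carries no VM reference")
	}
	return ref, nil
}

func main() {
	// Hand-written sample body; "vm-42" is a made-up identifier.
	req := []byte(`{"data":{"Vm":{"Name":"demo-vm","Vm":{"Type":"VirtualMachine","Value":"vm-42"}}}}`)
	ref, err := vmMoRef(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ref.Type, ref.Value)
}
```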
Having the VM moRef, we can get the property collector with property.DefaultCollector() and use it to retrieve a managed object VM (moVM). The moVM contains the current configuration settings on the VM.
With a couple of switch statements keyed on the alarm type, we can extract from the moVM the number of CPUs or the size of RAM allotted to the VM. Here is where these values are stored:
v. Determine desired configuration settings
Now that we know what the current VM configuration for CPU or memory is, we can increment those values. CPU is easy to increment as the count of CPUs is increased by one. You can see the maximum number of CPUs to increment to has been hardcoded here to 4.
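A minimal sketch of the CPU step, with a hypothetical function name and the limit of 4 taken from the text:

```go
package main

import "fmt"

// maxCPU mirrors the hardcoded limit mentioned in the text.
const maxCPU int32 = 4

// nextCPU returns the CPU count to request and whether it differs from the
// current count. At or above the limit the count is left unchanged.
func nextCPU(current, limit int32) (int32, bool) {
	if current >= limit {
		return current, false
	}
	return current + 1, true
}

func main() {
	for _, current := range []int32{1, 3, 4} {
		next, changed := nextCPU(current, maxCPU)
		fmt.Printf("current=%d next=%d changed=%t\n", current, next, changed)
	}
}
```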
Incrementing the memory is trickier because we aren’t incrementing a count. Instead, we increment the base-2 exponent, which can be done using the bit-shift operator, <<. The maximum value for memory we can increment to is 2¹³ MB (8 GB) of RAM. The maximum exponent was hardcoded.
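The exponent step might look like the sketch below. It assumes memory is tracked in MB (matching the memoryMB setting used later) and uses bits.Len from the standard library to find the exponent of the next power of two; the original function may compute the exponent differently.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maxExponent mirrors the hardcoded limit of 2^13 mentioned in the text.
const maxExponent = 13

// nextMemory returns the smallest power of two strictly greater than current
// and whether a change is possible. Values already at or beyond
// 1<<maxExponent are returned unchanged.
func nextMemory(current uint) (uint, bool) {
	// For current > 0, bits.Len(current) is floor(log2(current)) + 1, which
	// is the exponent of the next power of two above current.
	exp := bits.Len(current)
	if exp > maxExponent {
		return current, false
	}
	return 1 << exp, true
}

func main() {
	for _, current := range []uint{2048, 3000, 8192} {
		next, changed := nextMemory(current)
		fmt.Printf("current=%d next=%d changed=%t\n", current, next, changed)
	}
}
```

A side effect of working with exponents is that a VM sized at a value that is not a power of two, such as 3000, is rounded up to the next one.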
vi. Attach Tags Indicating Desired Configuration Settings
These incremented values are the values of the tags we’d like to attach. A tag can only be attached if it has already been created, and we’ll need to know its ID. We can loop through the pre-made tags and find the tag in the appropriate category with the desired value.
Use the tag manager’s GetTagsForCategory() to find all the pre-made tags that fall under the category passed in as the catName argument.
Loop through the tagList tags and find the one whose Name field matches the desired value. Once you find the match, you can get the tag and category IDs from the ID and CategoryID fields as shown here:
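The original snippet is not reproduced here, so the following is a sketch of the lookup. The tag struct is a local stand-in that carries only the ID, Name, and CategoryID fields named above, and findTag is a hypothetical helper name.

```go
package main

import "fmt"

// tag is a local stand-in for the tag objects returned by the tag manager,
// reduced to the fields named in the text.
type tag struct {
	ID         string
	Name       string
	CategoryID string
}

// findTag returns the tag and category IDs of the first tag in tagList whose
// Name equals desired, and false when no tag matches.
func findTag(tagList []tag, desired string) (tagID, catID string, found bool) {
	for _, t := range tagList {
		if t.Name == desired {
			return t.ID, t.CategoryID, true
		}
	}
	return "", "", false
}

func main() {
	// Made-up IDs for illustration.
	tagList := []tag{
		{ID: "tag-2", Name: "2", CategoryID: "cat-cpu"},
		{ID: "tag-3", Name: "3", CategoryID: "cat-cpu"},
	}
	tagID, catID, found := findTag(tagList, "3")
	fmt.Println(tagID, catID, found)
}
```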
Given the tag’s category and tag IDs, we can attach tags. However, let’s first see if there are other tags attached to the VM that are under the same category. If their values are different, we’d like to remove them because it would be confusing to have two different desired values for one category.
Using the VM moRef created earlier, get the attached tags from the VM using the tag manager:
Loop through the tags and detach the ones that are in catID but are not tagID.
Finally, we can now attach the tag indicating the desired configuration of the VM:
One area for improvement in this last part is the case where the desired configuration tag is already on the VM. In that case, the function should return without doing anything. As written, the tag isn’t detached, but an attempt is still made to attach the same tag to the VM. It’s another exercise for the reader to work on.
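One possible shape for that improvement is to decide up front which tags to detach and whether an attach call is needed at all. This is a sketch under stated assumptions: the tag struct is a local stand-in, planTagChange is a hypothetical helper, and the actual detach and attach calls against the tag manager are left out.

```go
package main

import "fmt"

// tag is a local stand-in for an attached tag object.
type tag struct {
	ID         string
	CategoryID string
}

// planTagChange inspects the tags attached to a VM and returns the IDs of
// tags in catID that should be detached, plus whether tagID still needs to be
// attached. Tags in other categories are ignored.
func planTagChange(attached []tag, catID, tagID string) (detach []string, attach bool) {
	attach = true
	for _, t := range attached {
		if t.CategoryID != catID {
			continue
		}
		if t.ID == tagID {
			attach = false // desired tag is already on the VM
			continue
		}
		detach = append(detach, t.ID)
	}
	return detach, attach
}

func main() {
	attached := []tag{
		{ID: "tag-2", CategoryID: "cat-cpu"},
		{ID: "tag-owner", CategoryID: "cat-team"},
	}
	detach, attach := planTagChange(attached, "cat-cpu", "tag-3")
	fmt.Println(detach, attach)
}
```

When detach is empty and attach is false, the function can return without calling the tag manager at all.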
2. VM Reconfigure Via Tag Function
The VM Reconfigure Via Tag Function was featured in a VMworld 2020 session called Golang for vCenter Admins: Ten Quick Steps [HCP1263].
When a VM powers off, it triggers this reconfigure function. The function reads the tags attached to the VM and looks for six types of configuration settings, two of which (memory size and number of CPUs) were used by the Modified Tagger in the previous section. If a desired CPU or desired memory tag is attached to the VM when it powers off, the VM will be reconfigured to have the desired memory or CPU.
i. Event Trigger (VM Powers Off)
The event data comes into the OpenFaaS processor as a JSON string. Annotation in the stack.yaml configuration file has filtered out all but the VmPoweredOffEvent event. The JSON string contains information about the VM that was powered off. From this information, we can create a reference to the VM called a managed object reference (moRef).
Govmomi uses the moRef to identify the type of object we wish to manipulate and retrieve data from. In our case, we’ll use the moRef in tag manager methods as well as to get the current configuration settings on the VM. We use the Unmarshal() function from the standard library’s encoding/json package to store the JSON fields into the appropriate fields of the ce variable, which is of the cloudEvent struct type.
ii. Connect to vSphere
The next step is to create a connection to the vSphere APIs if one hasn’t been created already. The function uses Govmomi’s NewClient() function to log into the vSphere API and then uses that connection to log in to vSphere’s REST API through the NewClient() function in Govmomi’s rest package. The REST API is needed to create a tag manager, which will be used to run methods related to VM tags. The tag manager is initialized with the tags package’s NewManager() function.
Now that a VM’s moRef has been created, a connection has been made, and a tag manager has been created, we’ll grab the tags attached to the VM using the tag manager.
iii. Retrieve Tags Attached to VM
Getting tags serves two purposes: retrieving the tags and checking the vSphere REST API connection. If the connection was dropped, one attempt is made to reconnect by creating another connection. To check the vSphere API connection, a managed object virtual machine (moVM) is initialized. Not to be confused with a virtual machine managed object, the managed object VM is used by the property collector to store the VM’s current configuration. As with the REST API connection, if the connection was lost, a single attempt is made to reconnect to the vSphere API.
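The reconnect-once pattern can be sketched independently of Govmomi. In the sketch below, op stands for the call being attempted (such as fetching the attached tags) and reconnect for creating a new connection; both are hypothetical callbacks. Unlike the function described above, this simplified version retries after any error rather than only after a dropped connection.

```go
package main

import (
	"errors"
	"fmt"
)

// errDropped is a sample error used to simulate a lost connection.
var errDropped = errors.New("connection dropped")

// callWithReconnect runs op. If op fails, it calls reconnect once and then
// runs op one more time, returning whatever that second attempt returns.
func callWithReconnect(op, reconnect func() error) error {
	err := op()
	if err == nil {
		return nil
	}
	if rerr := reconnect(); rerr != nil {
		return fmt.Errorf("reconnect after %v: %w", err, rerr)
	}
	return op()
}

func main() {
	attempts := 0
	op := func() error {
		attempts++
		if attempts == 1 {
			return errDropped // first call fails, second succeeds
		}
		return nil
	}
	reconnect := func() error { return nil }

	fmt.Println(callWithReconnect(op, reconnect), attempts)
}
```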
iv. Get Current VM Configuration Settings
The next step is to retrieve the configuration settings for the VM and put them in a list (a slice of vmConfig). There are many configurations, but the ones this function can currently handle are numCPU, memoryMB, numCoresPerSocket, memoryHotAddEnabled, cpuHotAddEnabled, and cpuHotRemoveEnabled.
v. Determine Desired VM Configurations
We now have a list of tag objects on the VM. This list potentially contains both the desired configs and tags that are unrelated to configurations. To get only the desired configs, loop through the attached tags, get their categories, and collect the tags that fall into a configuration category. Because the tag objects on the VM contain only category IDs and not category names, we can’t tell from them alone whether a category is a configuration category. To get the category names, we’ll use the tag manager’s GetCategory() method.
The tag objects do carry the tag names (the configuration values), so the category name (configuration type) and the tag name (configuration value) together can be used to build a list of desired configs.
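A sketch of that loop follows. The categoryName callback stands in for the tag manager’s GetCategory() lookup, the tag and vmConfig structs are local stand-ins, and the assumption that category names equal the setting names (numCPU and so on) is made only for illustration.

```go
package main

import "fmt"

// tag is a local stand-in for an attached tag: its name is the configuration
// value, and its category is known only by ID.
type tag struct {
	Name       string
	CategoryID string
}

// vmConfig pairs a configuration type with a value.
type vmConfig struct {
	Key   string
	Value string
}

// configCategories holds the category names treated as configuration
// settings. Using the bare setting names here is an assumption.
var configCategories = map[string]bool{
	"numCPU":            true,
	"memoryMB":          true,
	"numCoresPerSocket": true,
}

// desiredConfigs resolves each tag's category name through categoryName and
// keeps the tags whose category is a configuration category.
func desiredConfigs(attached []tag, categoryName func(id string) (string, error)) ([]vmConfig, error) {
	var desired []vmConfig
	for _, t := range attached {
		name, err := categoryName(t.CategoryID)
		if err != nil {
			return nil, fmt.Errorf("category %s: %w", t.CategoryID, err)
		}
		if configCategories[name] {
			desired = append(desired, vmConfig{Key: name, Value: t.Name})
		}
	}
	return desired, nil
}

func main() {
	// Made-up category IDs and names.
	names := map[string]string{"cat-1": "numCPU", "cat-2": "owner"}
	lookup := func(id string) (string, error) { return names[id], nil }

	attached := []tag{{Name: "4", CategoryID: "cat-1"}, {Name: "team-a", CategoryID: "cat-2"}}
	desired, err := desiredConfigs(attached, lookup)
	fmt.Println(desired, err)
}
```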
Knowing the desired configs and the current configs, the desired config list can be filtered such that it only contains configurations that are not consistent with the current VM. That way, a configuration will not be redundantly applied. This final list of non-redundant desired configs is called unapplied configs.
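The filtering step can be sketched as a comparison between two lists. The vmConfig struct and the unappliedConfigs name are illustrative; values are compared as strings, which is a simplification.

```go
package main

import "fmt"

// vmConfig pairs a configuration type with a value.
type vmConfig struct {
	Key   string
	Value string
}

// unappliedConfigs returns the desired configs whose value differs from the
// current value for the same key. A key missing from current counts as
// unapplied.
func unappliedConfigs(desired, current []vmConfig) []vmConfig {
	have := make(map[string]string, len(current))
	for _, c := range current {
		have[c.Key] = c.Value
	}
	var unapplied []vmConfig
	for _, d := range desired {
		if v, ok := have[d.Key]; !ok || v != d.Value {
			unapplied = append(unapplied, d)
		}
	}
	return unapplied
}

func main() {
	current := []vmConfig{{"numCPU", "2"}, {"memoryMB", "4096"}}
	desired := []vmConfig{{"numCPU", "3"}, {"memoryMB", "4096"}}
	fmt.Println(unappliedConfigs(desired, current))
}
```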
vi. Reconfigure VM
Given a list of unapplied configs, we can now apply them. First, create a virtual machine managed object using the NewVirtualMachine() function in Govmomi’s object package. Then create a spec that contains the desired configuration settings. Finally, call the Reconfigure() method on the VM managed object.
One thing that’s not obvious to set in the VirtualMachineConfigSpec is the ChangeVersion field. This field ensures that if something else reconfigures the VM before you have a chance to, your call to Reconfigure() will fail. This is a way to minimize overwriting of VM configuration by simultaneous processes.
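Building the spec from the unapplied configs mostly means converting tag values, which are strings, into typed fields. The sketch below uses a local configSpec struct as a stand-in for VirtualMachineConfigSpec and handles only three of the six settings; the buildSpec name and the string keys are assumptions. The change version would be copied from the VM's current configuration.

```go
package main

import (
	"fmt"
	"strconv"
)

// vmConfig pairs a configuration type with a value taken from a tag name.
type vmConfig struct {
	Key   string
	Value string
}

// configSpec is a local stand-in for VirtualMachineConfigSpec, reduced to the
// fields this sketch sets.
type configSpec struct {
	ChangeVersion       string
	NumCPUs             int32
	MemoryMB            int64
	MemoryHotAddEnabled *bool
}

// buildSpec converts unapplied configs into a spec. changeVersion is carried
// over so a concurrent reconfiguration makes the later call fail rather than
// silently overwrite settings.
func buildSpec(unapplied []vmConfig, changeVersion string) (configSpec, error) {
	spec := configSpec{ChangeVersion: changeVersion}
	for _, c := range unapplied {
		switch c.Key {
		case "numCPU":
			n, err := strconv.ParseInt(c.Value, 10, 32)
			if err != nil {
				return configSpec{}, fmt.Errorf("numCPU %q: %w", c.Value, err)
			}
			spec.NumCPUs = int32(n)
		case "memoryMB":
			n, err := strconv.ParseInt(c.Value, 10, 64)
			if err != nil {
				return configSpec{}, fmt.Errorf("memoryMB %q: %w", c.Value, err)
			}
			spec.MemoryMB = n
		case "memoryHotAddEnabled":
			b, err := strconv.ParseBool(c.Value)
			if err != nil {
				return configSpec{}, fmt.Errorf("memoryHotAddEnabled %q: %w", c.Value, err)
			}
			spec.MemoryHotAddEnabled = &b
		default:
			return configSpec{}, fmt.Errorf("unsupported setting %q", c.Key)
		}
	}
	return spec, nil
}

func main() {
	// "sample-version" is a placeholder, not a real change version.
	spec, err := buildSpec([]vmConfig{{"numCPU", "3"}, {"memoryMB", "8192"}}, "sample-version")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(spec.NumCPUs, spec.MemoryMB, spec.ChangeVersion)
}
```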
And that’s how to reconfigure a VM using tags.
It’s one thing to write functions. It’s another to deploy them. Patrick Kremer has written a very helpful blog post covering all the relevant steps I went over in my VMworld presentation and more. Find Patrick’s blog at http://www.patrickkremer.com/vmware-event-broker-appliance-part-xiii-deploying-go-functions.
3. VM Datastore Move Function
The VM Datastore Move function will react to a datastore alarm that has gone into red level. The function will then move the VM to another datastore.
The most difficult part of this function, how to determine a free datastore, has been replaced with hardcoded values. So, it’s purely a demonstration function and can in no way be used for production. For production, it is recommended to use software like VMware’s vSphere vMotion to be able to migrate VMs to different storage hosts.
i. Connect to vSphere
The function starts by connecting to the vSphere API using Govmomi’s NewClient() method.
ii. Event Trigger (Alarm Status Changes to Red)
The function looks at the cloud event data passed to it as a JSON string in the form of a slice of bytes. The Unmarshal() function from the standard library’s encoding/json package is used to parse the JSON field values into a variable I called event, which is of the struct type cloudEvent.
Once the cloud event is stored in a variable, it can be used to check whether the alarm is indeed one we’d like to respond to. The alarm must have the name “VM Storage Usage” and must be in the “red” status.
The cloud event variable also can be used to extract the virtual machine reference information, also called virtual machine managed object reference, or VM moRef. The VM moRef will allow us to move the VM to another storage location.
Getting the moRef from the cloud event can be done thusly:
Having the moRef, a virtual machine object can be created using Govmomi’s object package.
iii. Create the VM Relocation Spec
Generate the VirtualMachineRelocateSpec which will be used to determine where the VM should move. For the demo and this blog, the relocation spec has been entirely hardcoded and won’t be able to be used in anyone else’s system.
Once the relocation spec is created, it can be used in the Relocate() method called on the vm object.
That is how one moves a VM to another datastore using an alarm as a trigger.
Step-by-step instructions on how to get all the functions, starting from controlling VMs with Slack commands, through the remediation functions, and ending with a trigger to PagerDuty, can be found at https://github.com/pksrc/vebafn/tree/master/vm-self-service-app
To end this blog article, let me share with you a video that Partheeban Kandasamy (PK) created demoing the functions covered in this two-part blog article. You can find it here: https://youtu.be/iJ39aMVvMR8