When it comes to managing critical Horizon management servers such as Connection Servers, Unified Access Gateways, AV Managers, etc… It is imperative that you properly monitor their services for availability and that none of them are in a failed state, as well as make sure there are no connectivity issues on internal networks that the Horizon Client traffic traverses. By leveraging the Telegraf agent HTTP Health Check and PING Check functionality with Aria Operations, we can provide additional insight into the Health, performance, and availability of these services.
This video is meant to provide high-level overview of:
Why you should leverage Telegraf with the Management Pack for Horizon
How to install and configure the Telegraf agents.
How to validate your settings after deployment.
Review of the dashboards that leverage Telegraf metrics.
The latest release of the Management Pack for Horizon 2.7 is now available for download on the VMware Marketplace, as well for SaaS hosted customers in the Aria Operations Integration Repository. This post will cover the latest feature enhancements to the Management Pack and how best to leverage them.
Top Feature Enhancements
Added support for Horizon 2309
Updated Horizon Session Data Collection model
New Cost and Pricing Dashboards
Improved Deployment and Scale considerations documentation
As new editions of Horizon are released with updates to the APIs, there is usually a slight lag for supporting the new version until we are able to test and certify compatibility. That is why I highly recommend checking the interop guide to ensure that your specific version of Horizon matches the version(s) of the MP that has been tested and validated.
For Horizon 2309,only MP4H version 2.7 is supported due to the API changes mentioned above. As we move forward and continue our migration from the VLSI APIs to the REST APIs, our hope is that supporting future versions of Horizon will be much simpler and allow for near zero day support.
Updated Horizon Session Data Collection Model
When the MP4H was originally created there were certain Horizon Metrics that were inadvertently created as Properties. Because their values are numerical in nature and were prone to updating frequently, this triggered platform level alarms in terms of performance and scalability, as Aria Operations does not expect Property values to change frequently.
As of MP4H 2.7, Last Session Duration, Session Start Time, and Session Status will now appear as Metrics instead of Properties.
Additionally, we have added the ability to support Kiosk or multiple sessions from the same User ID in the same VDI Pool. Previously, if the same User ID logged into the same pool, only one User Session object was visible and it was not feasible to troubleshoot a single instance of that User ID in the pool. Moving forward, a new session “instance” will be created for each additional session with an incremental appended numerical value at the end. This new model will allow each session to have its own unique performance metrics and session properties, so that each session can be evaluated for performance issues.
New Horizon Costing and PricingDashboards
In MP4H 2.6, we added the ability to calculate Cost and Pricing metrics for Horizon Users, VDI Pools, and RDS Farms. With the new Cost and Pricing Dashboards we are able to provide additional Insight into Cost vs Pricing analysis for the entire Horizon Service.
From the Horizon Cost vs Price dashboard, simply select the Horizon World object for the entire service, or select an individual VDI Pool or RDS Farm, to see the Cost vs Price analysis.
The Horizon Showback dashboard shows the actual Cost for the Horizon Service. As before, simply select the Horizon World, VDI Pool, or RDS Farm to see cost analysis for the selected object.
Finally, the Horizon Chargeback dashboard shows the Price (cost + service charges) for the entire Horizon Service (Horizon World object), or the selected VDI Pool/RDS Farm.
Improved Deployment and Scale considerations documentation
While the topics of deployment and scale considerations have discussed in previous blog posts, it was imperative that we included these topics in the product documentation, so that all customers can easily access them. The documentation can now be accessed in the link below.
With the latest release of the Management Pack for Horizon the priorities were to maintain compatibility with the latest release of Horizon, improve the Horizon VDI session data model to support kiosk use cases and ensure better platform performance, provide better insight into Horizon Cost and Pricing use cases, and provide easier access for all customers into critical documentation topics.
Stay tuned for more feature and content updates on the next release…
In the latest major release of the Management Pack for Horizon we are announcing a number of new features and content to improve end-to-end visibility of the Horizon service, scale, as well as new VDI Pool and RDS farm automations. This blog will focus on providing additional insight into what features were released and how they relate to providing additional value and insight for monitoring a Horizon deployment. To keep the blog from running too long, I’ve decided to break it into multiple parts.
The new Horizon World Operations Overview dashboard is an upgrade/replacement of the previous Horizon World Overview dashboard. As the name suggests, this dashboard is highly focused on providing “operational” insight into the overall Horizon service in terms of performance, risk, availability, and compliance.
Unified Access Gateway Visibility
In the first image, we can see overall UAG and associated session performance, as well the the health of the Horizon service for the UAG. UAG session disparity shows “in-balance” on how Horizon sessions are being load balanced across the available UAG’s in a Horizon Pod. A detailed UAG List view has also been added to make it simpler to sort and list UAGs by their critical KPIs.
Taking advantage of the new UAG “Route Latency” and “Route Packet Loss” monitoring feature, from a single dashboard we can monitor overall performance and availability of designated routes across all UAGs. If a specific UAG experiences high latency or packet loss for a specified route, the provided chart will show the additional performance overhead the network route is having on the UAG’s connected sessions.
Connection Server Visibility
For the Horizon Connection Servers, in a single dashboard, we provide visibility to the overall performance, availability, and risk across to all Connection Servers in every Horizon Pod. In the dashboard image below, we have provided distributions to highlight which Connection Servers are having performance issues, failing their Horizon service “Health Checks”, have a large disparity in the number of Connections being serviced, as well as failing their LDAP replication.
Similar to the UAG, we have added a new list of Connection Servers that can easily be sorted by critical KPI or property. We have also added a “CS Net Performance to Router” widget to quickly evaluate the overall performance and availability of the designated route to the CS’s in the Horizon Pod.
VDI Pool Visibility
For Horizon VDI Pools, we provide insight into pools which have provisioning errors, low on available capacity, have desktops in an “error” state, and which pools have the worst login times.
A detailed list view has been added so that Pools can be sorted or filtered by their critical KPIs and Properties. We have also made the Pool list view interactive, so that selecting an individual pool provides additional insights into the meaningful performance KPIs impacting the Pool, as well as the meaningful KPIs from the from the supporting vSphere Cluster.
Leveraging the vROPs Costing engine, we are now able to provide the estimated cost per VDI Pool and per desktop based on the compute required for the virtual desktops.
RDS Farm Visibility
For RDS Farms, we provide insight into RDS Farms that have RDS Hosts with performance issues, estimated Farm Session capacity remaining based on actual compute, RDS Hosts that have large disparity of Connected Sessions, and RDS Farms with Hosts in an Error state.
We have also added a RDS Host list view to make it easier to sort by Farm critical KPI or property. Selecting a RDS Farm will provide a stacked trend chart of the KPIs and the supporting vSphere Cluster performance KPIs.
RDS Farm cost analysis metrics are also provided based on the compute required for the RDS hosts, as well as the computed CPU and Memory required per session.
Horizon Client and Agent Compliance
For Horizon Client and Agent Compliance, distribution views are provided to show a break down of the various protocol types, Client Type, Client Version, and Horizon Agent versions currently in use. These views can quickly identify users with old Client Versions or Desktop agents that need to be upgraded in the Horizon environment.
Connection Server Compliance and Authentication Configuration Errors
For Connection Server compliance and authentication configuration errors, the distribution views highlight the Connection Servers by build and Certificates no longer valid. If True SSO and SAML authentication is being used, you can quickly identify Pods with SSO Connector Errors or SAML authentication errors.
VDI Pool and RDS Farm Image "mismatch"
To provide greater insight into VDI Pools and RDS Farms with image provisioning issues, we have added the ability for the Management Pack to detect VM’s where the underlying disk Image does not properly match the Snapshot configured for the Pool or Farm. When a mismatch is detected, we will count the number of mismatches and provide it as a running count for the new “Desktop Image Mismatch” and “RDS Host Image Mismatch” metric. Customers can then alert on this metric to quickly identify which Pools and which Farms require remediation.
As you can see, there have been a significant number of improvements to the latest Management Pack for Horizon in terms of providing greater insight into performance, availability, risk, and compliance. In the next blog post, I will touch on the new built-in Horizon automations, updated Horizon Object Summary Pages, and scale features that have been added.
Since the first release of the Management Pack for Horizon and the shift to an API vs. Agent based architecture, questions have manifested as to what are the best practices for deployment and what factors should a customer consider before enabling the MP in their production Horizon environment?
This post is meant to cover those topics in detail and provide insight into potential configuration challenges and how best to avoid them.
**However, with any point-in-time article new considerations can emerge necessitating the need to test and validate your specific environment and configuration.
Before deploying… >>>>>>> Stop <<<<<<< For this brief service announcement!
While it would be amazing if you could simply install some code, click a few buttons, and everything just magically work, there are usually additional considerations that may impact that mythical moment from becoming a realization. All Horizon environments have a degree of uniqueness that necessitates the need to validate and test for any configuration or environmental conflicts that might prevent a successful deployment.
That is why I like to say, there is no “All” approach to deployments, rather there is a methodology that should be followed to help you achieve a high degree of success. That methodology is based on how vROPs and the MP are designed to collect data against the Horizon APIs, as well as various configuration aspects of the Horizon deployment.
Now let’s move forward with the solution requirements and we can dig into why these topics matter….
MP4H Requirements
**If the old vROPs for Horizon adapter (V4H) is installed, MP4H will not collect data!!! Recommendation is to uninstall V4H before installing the MP4H adapter; However, if you have already installed the MP4H adapter, simply removing the V4H adapter should resolve the issue.
Horizon Version Support: ** please check the Interop Matrix for the latest supported version information comparing Horizon Version to the version Management Packs for Horizon that are supported.
Horizon 7.13+ is supported with MP4H. This allows customers still on Horizon 7.13 with the older V4H adapter to move to the new MP without having to first upgrade to the Horizon 8/2xxx platform. Once you decide to move to Horizon 2xxx the same adapter will support the new version of Horizon.
While 7.13+ is supported, not all versions will be supported with the latest version of the Management Pack. Please check the Interop Matrix to confirm supported versions.
The planned support model is N & N-1, to include ESB branches moving forward. As long as there are no SDK changes in older versions, then they will also be “supported not tested”.
**Horizon 7.13 End of Support is April 30th, 2023.
vROPs Version Support: ** please check the Interop Matrix for the latest supported version of Aria Operations and the version of the Management Pack for Horizon you plan to deploy.
The current minimum supported version with MP4H 2.6.1 is 8.10, but 8.12.1 is the recommended version. 8.12 will also work, but there are some known UI bugs that are resolved 8.12.1.
vCenter needs to monitor all Horizon Desktops and Management Components: MP4H still relies on the vCenter adapter to pull VM level and Guest OS metrics from the Management Servers and Desktops within the Horizon Service. If there are objects outside of vROPs monitoring visibility we will not be able to populate VM level performance metrics, or have the ability to calculate the DC% Performance calculation on those objects.
Unified Access Gateways and Connection Servers also need to be monitored by vROPs: If these objects are not monitored by the vCenter adapter in vROPs we will not collect their VM level performance data. Some data will still be collected, but they will be based purely on the available Horizon API data.
Connection Server Certificates require valid hostnames: When you configure the Horizon Adapter to communicate with one of the Connection Servers in the Pod, vROPs will attempt to validate the CS certificate and the hostname associated with certificate. If the hostname in the certificate does not match the FQDN of the Connection Server vROPs will reject the certificate, as it cannot validate the authenticity of the Connection Server host providing it. It is recommended to have the Subject Alternative Name of the valid FQDN for the Connection Server in the certificate (or wildcard), so that vROPs can properly validate the certificate and establish a secure connection to the host.
The following diagram highlights the recommended certificate configuration:
External vs. Internal CA’s are not a requirement, but more of a common practice to obscure the internal CS domain names from the outside connections.
The important takeaway is to have a Subject Alternative Name value on the Connection Server certificates that contains either the FQDN of the CS’s or a generic Wildcard (*.yourdomain) that vROPs can properly validate with the Internal CA.
Fun Fact: If SmartCard authentication is set to “required” on the Horizon Connection Server the Horizon API will be inaccessible to the MP. Change this setting to optional to allow the MP to authenticate against the API. ** This will be addressed in an upcoming Horizon release.
The MP4H adapter requires an account with Horizon Admin privileges (full or read-only)to access the Horizon APIs: When configuring the MP4H adapter to authenticate against a Pod Connection Server, you must provide an account with a Horizon Admin role (full or read-only). Both Horizon Administrator roles (full and read-only) are built-in roles for Horizon, and an Active Directory service account should be created and assigned one of these roles. Which role you choose is Horizon version dependent, but the full Administrator role works with every version.
Horizon 7.13 through Horizon 8.2 requires the Administrators role.
Horizon 8.3/2106+ can use the Administrator (Read only) role.
The MP4H adapter needs to point directly to a Horizon Pod Connection Server and not to a Load Balancer: When configuring the MP4H adapter to point to your Horizon Pod you must provide the FQDN or IP of a Connection Server and not point it to the Load Balancer VIP for the Pod. The Horizon APIs do not support Load Balanced queries, so even though it may initially connect and provide some data, the end result will be missing and inconsistent data coming into the MP4H adapter. Not cool!!!
The following diagram highlights what you can and can’t do to configure the adapter to communicate with Horizon:
To enable Load-Balancing across the Connection Servers in the Pod, simply enable the load-balancing option under the Advanced Settings of the Horizon Adapter. The Adapter will automatically discover every CS in the Pod and will attempt to load-balance API calls across the Connection Servers. This option provides both High-Availability to maintain communication in the case of a loss of CS, as well as scalability in the number of API requests that can be made to the Horizon Pod.
Pausing for another brief service announcement……
Do not Load-Balance your UAG’s across Pod boundaries!!!
While this may seem like a good idea, and technically it will function, doing so has dire consequences:
#2 It removes Fault Domain Isolation between the Pods!!!
#3 MP4H does not support this configuration and it just breaks stuff!!!
While the adapter will still be able to collect data regarding the UAG’s, each POD is reporting its own distinct UAG servers, and the adapter will treat the UAG as a net new item per Pod that it is associated with. It has also been observed that the Horizon Pods intermittently lose their API connection to the UAG’s and they are unable to provide the necessary health and connectivity metrics to the Pod. Not cool!!!
Regardless of the platform or the cloud platform, all API’s have limitations on the ability service a certain number of requests in a given time period. With Horizon, the number of requests that can be serviced in a 5-minute interval is dependent on the type of metric and the number of objects that the metrics has to come from.
Pod level metrics regarding configuration and state of the Pod, Pools, Farms, UAG’s, etc. do not really impact size or scale since the number of objects are relatively low. Session counts on the other hand can impact scale dependent on the number of sessions and the types of metrics being requested. For example, session status and properties are lower impact because the metrics are directly available in the Horizon Pod API and the time to process them is relatively quick. However, other metrics such as Login Times and Protocol metrics must be queried from the desktop by the Connection Server and the time that it takes to get a response per session from each desktop lowers the number of sessions that can be queried in a given time period.
If the “Collection Time” starts to exceed the 5-minute collection window, additional tuning to the adapter or adding additional Connection Servers to the Pod with the load balancing option enabled increases the number of session requests that can be made in the 5-minute time window. You can also create new Pods with an additional MP4h adapter to further increase overall scale.
*Using two different Horizon monitoring solutions that both communicate to the Horizon API can impact scale, as the overall number of requests that the Horizon Pod can support is finite.
**Different APIs are being investigated to see if the current API limits can be improved upon.
The following diagram highlights the Pod scalability:
Let’s wrap this up…
As you can see from the topics covered in this post, there are a number of factors that must be taken into consideration during the deployment of the latest Management Pack for Horizon. While all factors aren’t necessarily exclusive to this MP, the purpose of this article was to help provide a better understanding of how the MP and the Horizon APIs work together to provide rich insight into the health and performance of a Horizon environment.
In the future, we plan to investigate where we can further simplify or minimize some of the considerations outlined in this post. But until that day happens… I hope this article provides you a guidepost to make your deployment successful.