Why you should use Telegraf with the Management Pack for Horizon

When it comes to managing critical Horizon management servers such as Connection Servers, Unified Access Gateways, AV Managers, etc… It is imperative that you properly monitor their services for availability and that none of them are in a failed state, as well as make sure there are no connectivity issues on internal networks that the Horizon Client traffic traverses. By leveraging the Telegraf agent HTTP Health Check and PING Check functionality with Aria Operations, we can provide additional insight into the Health, performance, and availability of these services.

This video is meant to provide high-level overview of:

  • Why you should leverage Telegraf with the Management Pack for Horizon
  • How to install and configure the Telegraf agents.
  • How to validate your settings after deployment.
  • Review of the dashboards that leverage Telegraf metrics.

The Management Pack for Horizon 2.5 is now available!

In the latest major release of the Management Pack for Horizon we are announcing a number of new features and content to improve end-to-end visibility of the Horizon service, scale, as well as new VDI Pool and RDS farm automations. This blog will focus on providing additional insight into what features were released and how they relate to providing additional value and insight for monitoring a Horizon deployment. To keep the blog from running too long, I’ve decided to break it into multiple parts.

The latest Release Notes can be found here: https://docs.vmware.com/en/Management-Packs-for-vRealize-Operations/services/rn/release-notes-for-vrealize-operations-management-packs/index.html

The latest Documentation here: https://docs.vmware.com/en/Management-Packs-for-vRealize-Operations/2.5/Horizon/GUID-69D66CA6-FB78-4688-BCF9-752EEB34599E.html

Download the latest MP here: https://marketplace.cloud.vmware.com/services/details/vrealize-operations-management-pack-for-horizon-2-5-1?slug=true

Improved End-to-End Visibility: 

For our first topic, lets discus what features and content have been added to improve end-to-end visibility of the Horizon Service.

  1. New Dashboard: Horizon World Operations Overview
  2. New UAG and Connection Server Route Performance/Availability metrics
  3. New VDI Pool and RDS Farm Cost Analysis metrics.
  4. New VDI Pool and RDS Farm Image Standardization “mismatch” count metrics
New Dashboard:  Horizon World Operations Overview

The new Horizon World Operations Overview dashboard is an upgrade/replacement of the previous Horizon World Overview dashboard. As the name suggests, this dashboard is highly focused on providing “operational” insight into the overall Horizon service in terms of performance, risk, availability, and compliance.

 Unified Access Gateway Visibility

In the first image, we can see overall UAG and associated session performance, as well the the health of the Horizon service for the UAG. UAG session disparity shows “in-balance” on how Horizon sessions are being load balanced across the available UAG’s in a Horizon Pod. A detailed UAG List view has also been added to make it simpler to sort and list UAGs by their critical KPIs.

Taking advantage of the new UAG “Route Latency” and “Route Packet Loss” monitoring feature, from a single dashboard we can monitor overall performance and availability of designated routes across all UAGs. If a specific UAG experiences high latency or packet loss for a specified route, the provided chart will show the additional performance overhead the network route is having on the UAG’s connected sessions.

Connection Server Visibility

For the Horizon Connection Servers, in a single dashboard, we provide visibility to the overall performance, availability, and risk across to all Connection Servers in every Horizon Pod. In the dashboard image below, we have provided distributions to highlight which Connection Servers are having performance issues, failing their Horizon service “Health Checks”, have a large disparity in the number of Connections being serviced, as well as failing their LDAP replication.

Similar to the UAG, we have added a new list of Connection Servers that can easily be sorted by critical KPI or property. We have also added a “CS Net Performance to Router” widget to quickly evaluate the overall performance and availability of the designated route to the CS’s in the Horizon Pod.

VDI Pool Visibility

For Horizon VDI Pools, we provide insight into pools which have provisioning errors, low on available capacity, have desktops in an “error” state, and which pools have the worst login times.

A detailed list view has been added so that Pools can be sorted or filtered by their critical KPIs and Properties. We have also made the Pool list view interactive, so that selecting an individual pool provides additional insights into the meaningful performance KPIs impacting the Pool, as well as the meaningful KPIs from the from the supporting vSphere Cluster.

Leveraging the vROPs Costing engine, we are now able to provide the estimated cost per VDI Pool and per desktop based on the compute required for the virtual desktops.

RDS Farm Visibility

For RDS Farms, we provide insight into RDS Farms that have RDS Hosts with performance issues, estimated Farm Session capacity remaining based on actual compute, RDS Hosts that have large disparity of Connected Sessions, and RDS Farms with Hosts in an Error state.

We have also added a RDS Host list view to make it easier to sort by Farm critical KPI or property. Selecting a RDS Farm will provide a stacked trend chart of the KPIs and the supporting vSphere Cluster performance KPIs.

RDS Farm cost analysis metrics are also provided based on the compute required for the RDS hosts, as well as the computed CPU and Memory required per session.

Horizon Client and Agent Compliance

For Horizon Client and Agent Compliance, distribution views are provided to show a break down of the various protocol types, Client Type, Client Version, and Horizon Agent versions currently in use. These views can quickly identify users with old Client Versions or Desktop agents that need to be upgraded in the Horizon environment.

Connection Server Compliance and Authentication Configuration Errors

For Connection Server compliance and authentication configuration errors, the distribution views highlight the Connection Servers by build and Certificates no longer valid. If True SSO and SAML authentication is being used, you can quickly identify Pods with SSO Connector Errors or SAML authentication errors.

VDI Pool and RDS Farm Image "mismatch"

To provide greater insight into VDI Pools and RDS Farms with image provisioning issues, we have added the ability for the Management Pack to detect VM’s where the underlying disk Image does not properly match the Snapshot configured for the Pool or Farm. When a mismatch is detected, we will count the number of mismatches and provide it as a running count for the new “Desktop Image Mismatch” and “RDS Host Image Mismatch” metric. Customers can then alert on this metric to quickly identify which Pools and which Farms require remediation.

As you can see, there have been a significant number of improvements to the latest Management Pack for Horizon in terms of providing greater insight into performance, availability, risk, and compliance. In the next blog post, I will touch on the new built-in Horizon automations, updated Horizon Object Summary Pages, and scale features that have been added.

vROPs 8.0 Horizon Adapter 6.7.1 Updated Dashboards

Horizon Overview

Horizon Desktop End-to-End Performance Analysis

Cluster Contention Analysis

Horizon Site Connectivity Analysis

Horizon User History Report

Ready to Download?

** Dashboards can be demoed in the VMware TestDrive environment: https://portal.vmtestdrive.com/

Content Release Notes:
This content release includes a number of Dashboards, Views, and Super Metrics designed to take advantage of the User Interface enhancements in the vROPs 8.x platform, providing richer and more visually impacting insight into the overall health, performance, and consumption of your Horizon View deployment.

Compatibility: All content was created on the latest vROP 8.0 platform, so it has only been tested to work with this specific version. The Super Metrics are most likely to work with older versions such as 7.5, but the Views specifically have configurations options that were not available prior to vROPs 8.0.

With vROPs 8.0, only Horizon Adapter 6.7.x is fully supported, so make sure that your Horizon View environment is compatible with the vROPs for Horizon 6.7.x adapter.

Compatibility Matrix: https://www.vmware.com/resources/compatibility/sim/interop_matrix.php#interop&

Conditions: All content provided is on an as-is basis, and should be validated and tested before use in production environments.

*It is recommended that you import and configure the provided Views and Super Metrics before importing the dashboards. Examples of how to do so are provided in previous posts.

Location analysis using vROPs for Horizon

As the number of user sites or locations increases, having good visibility into the overall quality of connectivity of those sites to your Horizon View data center(s) becomes increasingly important. Having worked with many customers on troubleshooting connectivity between such locations, it has become clear that monitoring only at the physical network layer is not sufficient to properly diagnose user connectivity issues impacting the display protocol. In fact, in most cases, it’s the configuration of the physical layer that causes the issue(s), and the device(s) in question do not have the ability to diagnose or detect their impact to the display protocol.

Alas, all hope is not lost! With vROPs for Horizon, we have the ability to monitor the key metrics related to protocol performance and can alert when those metrics have reached critical thresholds. We can also leverage a handy custom grouping feature to organize the remotely connected sessions into defined sites or locations, based on information available in the user’s session data. We can then leverage Super Metrics to calculate the overall health of the group of connected sessions from that site, and then display and alert when the health has dropped below our SLA thresholds.

Sound like something you want to take on? Then read on!

First things first… 1️⃣

We need a strategy to capture and organize how we determine which connections are from which site. The most common approach is to leverage the known internal subnet IP range for that site, and set that as the primary filter for the custom group. However, additional metrics supplied by the user’s session data can be leveraged as well.

In this example, I will create a new Site labeled “External Users”, under the “Location” group, and filter on all users who have an internal subnet IP containing “192.”.

Navigate to Environment tab -> Location -> +
Notice that both session types Desktop and Application are added separately. While not required it will impact which sessions are captured, so make sure to add the required session types.

Note: It may take some time before the group starts to populate the sessions, and as noted in the image above, population updates are every 20 minutes.

You will also want to create at least one more additional site for comparison purposes. In the example below, I have created an “Internal Users” site that filters on 10.x networks to separate the session traffic.

Second is just after first… as long as we’re counting up! 👆

Once your group has started to populate with sessions, it’s time to create meaningful Super Metrics on the group that will give us insight into how well that site is connected.

By default, vROPs 7.5+ will create population TotalCount and Health Criticality Total Count metrics. These will be useful in leveraging out-of-box KPI based alerts that track how well users are connected to the Horizon Environment and let us know the distribution of good to bad connected sessions from that site.

Out of box Health Criticality counts on the custom group.

Leveraging Super Metrics, it’s time to create additional site tracking metrics that include, but are not limited to, average site Latency, Packet Loss, Jitter, Frame Rate, and Transmitted Bytes.

Super Metric name is “Total Site VDI Avg Packet Loss”.
Use the Avg function on the VDI Session Blast Packet Loss Uplink metric.
Assign the Super Metric to the “Location” object.
Select the active policy that will allow the Super Metric to collect.
Verify that the “Location” Object Type has been assigned to the Super Metric.
Verify that the active policy is associated with the Super Metric.

Thirdly Rinse and Repeat… ♻️

Continue to create the Super Metrics for the additional session metrics that you want to collect. As in our example below, you will need to do this for both the VDI and Application Sessions individually to capture the metrics from both types of sessions.

Completed Super Metrics list.

Now… what to do with all this stuff? 🤔

Now that we have meaningful Super Metrics on our custom group, it’s time to display that data in a way that it makes it easier to diagnose specific site connectivity issues, and compare that site against other sites that are connecting to our Horizon environment.

To do so, I’ve created a custom example Dashboard that allows you to select from the list of defined sites, and then display the relevant Alerts and Super Metrics that we defined on those sites. While you can choose to display the data in any manner you see fit, I’ve chosen to use a combination of Scoreboard, Distribution, and Heatmap widgets.

I’ve also chosen to label the widgets using an ordered numbering system with directions so that other users leveraging the dashboard will know how it’s meant to be consumed.

1) Select the site. 2a) Review the site alerts. 2b) Review the break down of criticality of alerts. 3a) Review Super Metrics that were created to analyze the site connectivity.
3b) and 3c) Notice that clicking on a Distribution View will show you which session(s) fall under the defined performance “buckets”.
4a) Select a problem session. 4b) Review scoreboard of relevant KPIs impacting that session. KPI’s are colored to show level of impact to the user’s session. 4c) Review session related objects looking for other impacted objects with alerts.

Continue your analysis of the Application Sessions if relevant for your site. The workflow will be the same as the above VDI Session analysis.

5a) Review the App Session Super Metrics for anomalies.
5b) and 5c) Clicking the Distributions will show you the session(s) the fall into each performance “bucket”.
6a) Select the session you want to review. 6b) Review the Scoreboard metrics for offending KPIs. 6c) Review the session relationship chart and look for additional objects with Alerts that may be impacting the user session.

Next steps… Download and enjoy! 🏁

Now that we’ve discussed how to create the content and the high-level strategy for performing the site connectivity analysis, it’s time to download and import the content . The zip file provided in this blog contains all of the content that is required to create the custom dashboard, except for the step on creating the custom group locations. That step will still be required for you to define what “Sites” you want to monitor for connectivity performance.

Contents of the zip include a Dashboard_Horizon Site Location Analysis.zip, Views_Horizon Site Location Analysis.zip, and Supermetric_Horizon Site Analysis.json. All of which need to be imported and at least one Location group defined before the Dashboard will function properly.

Enjoy! Please provide thoughts or feedback on how to make the dashboard and content better.