Prometheus Verification

Updated 1 week ago by Michael Cretzman

The following sections describe how Harness integrates Prometheus into Harness Continuous Verification to monitor your live, production services and verify your deployments.

Prometheus and Harness

Prometheus uses a multi-dimensional data model with time series data and key/value pairs, along with a flexible query language to leverage this dimensionality. Prometheus records any numeric time series, such as machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. For microservices, Prometheus support for multi-dimensional data collection and querying is very useful.
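As a rough illustration of this data model, a time series can be thought of as a metric name plus a set of labels, and a query selects the series whose labels match. The sketch below is a toy model only. The metric name, label names, and helper functions are hypothetical and are not taken from Harness or from any real deployment.

```python
# Toy model of Prometheus-style series identity: metric name + label set.
# All metric names, label names, and helpers here are made up for illustration.

def series_key(name, labels):
    """Return a hashable identity for a time series."""
    return (name, frozenset(labels.items()))

def select(series, name, **matchers):
    """Return series whose name matches and whose labels equal every matcher."""
    return [
        (n, labels) for n, labels in series
        if n == name and all(labels.get(k) == v for k, v in matchers.items())
    ]

series = [
    ("http_requests_total", {"method": "GET", "pod": "web-1"}),
    ("http_requests_total", {"method": "POST", "pod": "web-1"}),
    ("http_requests_total", {"method": "POST", "pod": "web-2"}),
]

# Same metric name, different labels: three distinct series.
distinct = {series_key(n, labels) for n, labels in series}

# Roughly what the selector http_requests_total{method="POST"} would match.
post_series = select(series, "http_requests_total", method="POST")
print(len(distinct), len(post_series))
```

Adding a label such as the pod name is what lets one metric be sliced per microservice instance, which is the dimensionality the paragraph above refers to.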

Prometheus integrates with Harness to verify the performance of microservices instantly in every environment.

When you use Prometheus with Harness Service Guard 24/7, or when you deploy a new microservice via Harness, Harness automatically connects to Prometheus and starts analyzing the multi-dimensional data model to understand what exceptions and errors are new or might cause problems for your microservice performance and quality.

Setup Preview

You set up Prometheus and Harness in the following way:

  1. Prometheus - Monitor your application using Prometheus. In this article, we assume that you are using Prometheus to monitor your application already.
  2. Verification Provider Setup - In Harness, you connect Harness to your Prometheus account, adding Prometheus as a Harness Verification Provider.
  3. Harness Application - Create a Harness Application with a Service and an Environment. We do not cover Application setup in this article. See Application Checklist.
  4. 24/7 Service Guard Setup - In the Environment, set up 24/7 Service Guard to monitor your live, production application.
  5. Verify Deployments:
    1. Add a Workflow to your Harness Application and deploy your microservice or application to the service infrastructure in your Environment.
    2. After you have run a successful deployment, you then add verification steps to the Workflow using your Verification Provider.
    3. Harness uses unsupervised machine learning and Prometheus analytics to analyze your future deployments, discovering events that might be causing your deployments to fail. You can then use this information to set rollback criteria and improve your deployments.

Verification Provider Setup

To add Prometheus as a verification provider, do the following:

  1. Click Setup.
  2. Click Connectors, and then click Verification Providers.
  3. Click Add Verification Provider, and select Prometheus. The Add Prometheus Verification Provider dialog appears.

The Add Prometheus Verification Provider dialog has the following fields.

  • URL: Enter the URL of the server. You cannot use a Grafana URL.
  • Display Name: Enter a display name for the provider. If you are going to use multiple providers of the same type, ensure you give each provider a different name.
  • Usage Scope: If you want to restrict the use of a provider to specific applications and environments, do the following:
    1. In Usage Scope, click the drop-down under Applications, and click the name of the application.
    2. In Environments, click the name of the environment.

24/7 Service Guard Setup

Harness 24/7 Service Guard monitors your live applications, catching problems that surface minutes or hours following deployment. For more information, see 24/7 Service Guard.

You can add your Prometheus monitoring to Harness 24/7 Service Guard in your Harness Application Environment. For a setup overview, see Setup Preview.

This section assumes you have a Harness Application set up and containing a Service and Environment. For steps on setting up a Harness Application, see Application Checklist.

To set up 24/7 Service Guard for Prometheus, do the following:

  1. Ensure that you have added Prometheus as a Harness Verification Provider, as described in Verification Provider Setup.
  2. In your Harness Application, ensure that you have added a Service, as described in Services. For 24/7 Service Guard, you do not need to add an Artifact Source to the Service, or configure its settings. You simply need to create a Service and name it. It will represent your application for 24/7 Service Guard.
  3. In your Harness Application, click Environments.
  4. In Environments, ensure that you have added an Environment for the Service you added. For steps on adding an Environment, see Environments.
  5. Click the Environment for your Service. Typically, the Environment Type is Production.
  6. In the Environment page, locate 24/7 Service Guard.
  7. In 24/7 Service Guard, click Add Service Verification, and then click Prometheus. The Prometheus dialog appears.
  8. Fill out the dialog. The Prometheus dialog has the following fields.
For 24/7 Service Guard, the queries you define to collect metrics are specific to the application or service you want monitored. Verification is performed at the application/service level. This is unlike Workflows, where verification is performed at the host/node/pod level.

  • Display Name: Enter the name to identify this Service's Prometheus monitoring on the 24/7 Service Guard dashboard.
  • Service: The Harness Service to monitor with 24/7 Service Guard.
  • Prometheus Server: Select the server you added when setting up the Prometheus verification provider.
  • Metric to Monitor: Every time series is uniquely identified by its metric name and a set of key-value pairs, also known as labels. For more information, see Data Model from Prometheus. A metric requires the following parameters:
    • Transaction Name: The service or request context to which the metric relates. For example, Login.
    • Metric Name: The name of the metric defined in Prometheus.
    • Metric Type: The type of metric (Response Time, Error, Throughput, or Value).
    • URL: The API query required to retrieve the metric value. The URL must include placeholders for the start time and end time ahead of the query parameter, and a placeholder for the hostname inside the query. For example, the following URL includes the placeholders $startTime, $endTime, and $hostName:

      /api/v1/query_range?start=$startTime&end=$endTime&step=60s&query=io_harness_custom_metric_learning_engine_task_queued_time_in_seconds{kubernetes_pod_name="$hostName"}

    See Expression queries from Prometheus for examples of queries, but always use the placeholders demonstrated above.
  • Algorithm Sensitivity: Select the Algorithm Sensitivity.
  • Enable 24/7 Service Guard: Click the checkbox to enable 24/7 Service Guard.
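To make the role of the URL placeholders concrete, the following sketch fills in $startTime, $endTime, and $hostName for the example URL in the Metric to Monitor field. It only illustrates simple text substitution and is not a description of how Harness performs it. The helper name fill_placeholders, the timestamps, and the pod name my-pod-1 are hypothetical, and the use of Unix seconds is an assumption (the Prometheus range query API accepts Unix timestamps, but this article does not say which format Harness supplies).

```python
from string import Template

# URL template copied from the example in this article; $startTime, $endTime,
# and $hostName are the placeholders it describes.
URL_TEMPLATE = (
    "/api/v1/query_range?start=$startTime&end=$endTime&step=60s"
    "&query=io_harness_custom_metric_learning_engine_task_queued_time_in_seconds"
    '{kubernetes_pod_name="$hostName"}'
)

def fill_placeholders(template, start_time, end_time, host_name):
    """Substitute the three placeholders.

    Raises KeyError if the template contains a placeholder with no value here.
    """
    return Template(template).substitute(
        startTime=start_time, endTime=end_time, hostName=host_name
    )

# Hypothetical values: a 10-minute window in Unix seconds and a made-up pod name.
resolved = fill_placeholders(URL_TEMPLATE, 1700000000, 1700000600, "my-pod-1")
print(resolved)
```

The start and end placeholders sit in the URL parameters ahead of the query, while the hostname placeholder sits inside the label matcher of the query itself, which matches the placement the URL field requires.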

When you have filled out the dialog, do the following:

  1. Click TEST. Harness verifies the settings you entered.
  2. Click SUBMIT. The Prometheus 24/7 Service Guard is configured.

To see the running 24/7 Service Guard analysis, click Continuous Verification.

The 24/7 Service Guard dashboard displays the production verification results.

For information on using the dashboard, see Using 24/7 Service Guard.

Verify Deployments

The following procedure adds a Prometheus verification step to a workflow.

Add the verification step to your workflow only after you have run at least one successful deployment. Harness needs that deployment to obtain the names of the hosts, pods, or containers where your service is deployed.

To verify your deployment with Prometheus, do the following:

  1. Ensure that you have added Prometheus as a verification provider, as described above.
  2. In your workflow, under Verify Service, click Add Verification, and then click Prometheus. The Prometheus dialog appears.

The Prometheus workflow dialog has the following fields.

  • Prometheus Server: Select the server you added when setting up the Prometheus verification provider.
  • Metric to Monitor: Every time series is uniquely identified by its metric name and a set of key-value pairs, also known as labels. For more information, see Data Model from Prometheus. A metric requires the following parameters:
    • Transaction Name: The service or request context to which the metric relates. For example, Login.
    • Metric Name: The name of the metric defined in Prometheus.
    • Metric Type: The type of metric (Response Time, Error, Throughput, or Value).
    • URL: The API query required to retrieve the metric value. This query must include placeholders for the start time, end time, and hostname ($startTime, $endTime, and $hostName).
  • Expression for Host/Container Name: The expression entered here should resolve to a host/container name in your deployment environment. By default, the expression is ${host.hostName}. If you begin typing the expression into the field, the field provides expression assistance.
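Because the URL must carry all three placeholders, it can be useful to check a URL locally before pasting it into the dialog. The sketch below is not part of Harness. The helper name missing_placeholders, the metric my_metric, and the label pod are hypothetical, and the check assumes placeholder names are case-sensitive, which this article does not state.

```python
import re

# Placeholder names as they appear in the example URL in this article.
REQUIRED = ("$startTime", "$endTime", "$hostName")

def missing_placeholders(url):
    """Return the required placeholders that do not appear in the URL."""
    found = set(re.findall(r"\$[A-Za-z_][A-Za-z0-9_]*", url))
    return [p for p in REQUIRED if p not in found]

# Hypothetical URLs: the second one misspells the hostname placeholder.
good = ('/api/v1/query_range?start=$startTime&end=$endTime&step=60s'
        '&query=my_metric{pod="$hostName"}')
bad = ('/api/v1/query_range?start=$startTime&end=$endTime&step=60s'
       '&query=my_metric{pod="$hostname"}')

print(missing_placeholders(good))
print(missing_placeholders(bad))
```

An empty list means every placeholder is present. A non-empty list names the ones to add or correct.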

The following settings are common to all verification provider dialogs in workflows.

  • Analysis Time duration: Set the duration for the verification step. If a verification step exceeds the value, the workflow Failure Strategy is triggered. For example, if the Failure Strategy is Ignore, then the verification state is marked Failed but the workflow execution continues.
  • Baseline for Risk Analysis: Select Previous Analysis to have this verification use the previous analysis for a baseline comparison. If your workflow is a Canary workflow type, you can select Canary Analysis to have this verification compare old versions of nodes to new versions of nodes in real time.
  • Execute with previous steps: Check this checkbox to run this verification step in parallel with the previous steps in Verify Service.
  • Failure Criteria: Specify the sensitivity of the failure criteria. When the criteria are met, the workflow Failure Strategy is triggered.
  • Include instances from previous phases: If you are using this verification step in a multi-phase deployment, select this checkbox to include instances used in previous phases when collecting data. Do not apply this setting to the first phase in a multi-phase deployment.
  • Wait interval before execution: Set how long the deployment process should wait before executing the verification step.

Verification Results

When Harness deploys a new application or service to the target environment defined in the workflow, it will immediately connect to the Prometheus Server and build a model of what it is observing.

Next, Harness compares this model with previous deployment models to identify anomalies or regressions. If necessary, Harness rolls back to the previous working version automatically. For more information, see Rollback Steps.
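For context on the data involved, a Prometheus range query returns a JSON document whose result is a matrix: one entry per series, each with its labels and a list of timestamp and value pairs. The sketch below parses a hand-written sample in that shape and averages each series per pod. It shows only what the raw data looks like. It does not represent how Harness builds or compares its models, and the helper name mean_by_label, the metric name, the pod names, and the values are all hypothetical.

```python
import json

# Hand-written sample in the shape of a Prometheus query_range response.
# Metric name, pod names, and values are invented for illustration.
SAMPLE = """
{"status": "success",
 "data": {"resultType": "matrix",
          "result": [
            {"metric": {"__name__": "my_metric", "kubernetes_pod_name": "my-pod-1"},
             "values": [[1700000000, "0.25"], [1700000060, "0.75"]]},
            {"metric": {"__name__": "my_metric", "kubernetes_pod_name": "my-pod-2"},
             "values": [[1700000000, "1.5"], [1700000060, "2.5"]]}
          ]}}
"""

def mean_by_label(response_text, label):
    """Return {label value: mean sample value} for each series in the response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    means = {}
    for series in payload["data"]["result"]:
        # Sample values arrive as strings, so convert before averaging.
        samples = [float(value) for _, value in series["values"]]
        if not samples:
            continue  # skip series with no samples in the window
        means[series["metric"].get(label, "")] = sum(samples) / len(samples)
    return means

means = mean_by_label(SAMPLE, "kubernetes_pod_name")
print(means)
```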

Here is an example of a typical, successful deployment verified using Jenkins and Prometheus.

Under Prometheus, you can see that all Prometheus metrics have been validated by the Harness machine learning algorithms. Green indicates that there are no anomalies or regressions identified and the deployment is operating within its normal range.

To see an overview of the verification UI elements, see Continuous Verification Tools.

