24/7 Service Guard
This article introduces Harness 24/7 Service Guard. Use the following links to get started:
- Set Up 24/7 Service Guard
- Add Workflow Steps
Most enterprises use multiple monitoring and verification tools for each stage of their microservice deployment, and multiple tools for monitoring the live microservice in production. Detecting and investigating regressions and anomalies across these tools consumes a lot of time. For those of you tasked with monitoring microservices, the following image will be familiar.
Harness solves this problem with Harness 24/7 Service Guard.
Harness 24/7 Service Guard:
- Collects all of your monitoring and verification tools into a single dashboard.
- Applies Harness Continuous Verification unsupervised machine learning to detect regressions and anomalies across transactions and events.
- Lets you drill down to the individual issue and open it in the related tool.
Harness 24/7 Service Guard gives DevOps operational visibility across all your monitoring tools in all your production environments.
24/7 Service Guard's automatic anomaly and regression detection lets you see when end users are impacted, without requiring configuration, rules, or thresholds (although you can optionally add thresholds).
24/7 Service Guard is an addition to Harness' basic deployment verification functionality, which is described in Continuous Verification. Harness Workflow verification steps verify Harness deployments and the running microservice for the first 15-30 minutes. 24/7 Service Guard continues monitoring your microservices from then on, catching problems that surface minutes or hours after deployment.
The following image shows how the Continuous Verification dashboard includes both 24/7 Service Guard and Harness Deployments verification.
- 24/7 Service Guard detection.
- Harness Deployments verification.
Machine Learning Overview
24/7 Service Guard sits on top of all your Application Performance Monitoring (APM), verification, and logging tools. 24/7 Service Guard applies:
- Predictive machine learning models for short-term behavior:
  - Applies deep neural nets to short-term history.
  - Detects unusual patterns due to spikes.
  - Adapts to drift over deployments.
- Memory models for long-term behavior:
  - Learns historical/cyclical trends.
- Quantifies app reliability over Web and business transactions, based on the history of anomalous behavior.
- Quantifies the importance of different Web and business transactions, based on app usage over short- and long-term periods.
Here's a 2-minute video that explains Harness 24/7 Service Guard:
Using the Dashboard
To use 24/7 Service Guard, click Harness Manager's Continuous Verification link.
The Services configured with 24/7 Service Guard appear. In this example, we have two applications:
Let's look at the dashboard in detail. The following image describes the 24/7 Service Guard dashboard for the application.
- Monitoring sources: Verification and metrics providers, such as AppDynamics. For a list of the verification providers supported by Harness, see Continuous Verification.
- Heat map: The heat map is generated using the application and the monitoring sources. Each square is a time segment.
- Time resolution: You can go high-level (for example, 30 days) or low-level (for example, 12 hours).
- Performance regressions: Red and yellow are used to highlight regressions and anomalies. The colors indicate the Overall Risk Level for the monitoring segment.
- Transactions analysis: Click a square to see the machine-learning details for the monitoring segment. The analysis details show the transactions for the monitoring segment. High-risk transactions are listed first.
- Drill in to find the cause of the regression or anomaly: When you click the dot for a transaction, you get further details, and you can click a link to open the transaction in the monitoring tool. From there, you can find the root cause of the regression (specific queries, events, etc.).
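To make the heat map and transaction ordering described above more concrete, here is a minimal Python sketch. It is not Harness code: the `Risk` levels, the `COLORS` mapping (including green for low risk), and the rule that a square takes the worst risk among its transactions are all assumptions made for illustration. The article itself says only that red and yellow highlight regressions and anomalies, and that high-risk transactions are listed first.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    # Hypothetical levels; the real Overall Risk Level scale may differ.
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Assumed color mapping. Only red and yellow are named in the article.
COLORS = {Risk.LOW: "green", Risk.MEDIUM: "yellow", Risk.HIGH: "red"}


@dataclass
class Transaction:
    name: str
    risk: Risk


def segment_color(transactions):
    """Color one heat-map square by the worst risk among its transactions."""
    worst = max((t.risk for t in transactions), default=Risk.LOW)
    return COLORS[worst]


def analysis_order(transactions):
    """List high-risk transactions first, breaking ties by name."""
    return sorted(transactions, key=lambda t: (-t.risk, t.name))
```

With this sketch, a time segment that contains one high-risk transaction is colored red, and that transaction appears at the top of the analysis list.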
Set Up Service Guard
The following APM and logging tools support 24/7 Service Guard today. More are coming. To see the list of all the APM and logging tools Harness supports, see Continuous Verification.
You can set up AppDynamics with 24/7 Service Guard in your Harness Environment. Harness then uses AppDynamics, together with Harness machine-learning analysis, to verify the performance and quality of your live production service.
For steps on setting up 24/7 Service Guard for AppDynamics, see AppDynamics 24/7 Service Guard Setup.
You can add your Prometheus monitoring to Harness 24/7 Service Guard in your Harness Application Environment.
For steps on setting up 24/7 Service Guard for Prometheus, see Prometheus 24/7 Service Guard Setup.
You can add your Datadog monitoring to Harness 24/7 Service Guard in your Harness Application Environment.
For steps on setting up 24/7 Service Guard for Datadog, see Datadog 24/7 Service Guard Setup.
You can set up New Relic with 24/7 Service Guard in your Harness Environment. Harness then uses New Relic, together with Harness machine-learning analysis, to verify the performance and quality of your live production service.
For steps on setting up 24/7 Service Guard for New Relic, see New Relic 24/7 Service Guard Setup.
You can set up ELK Elasticsearch with 24/7 Service Guard in your Harness Environment. Harness then uses ELK, together with Harness machine-learning analysis, to verify the performance and quality of your live production service.
For steps on setting up 24/7 Service Guard for ELK, see ELK 24/7 Service Guard Setup.
You can set up Sumo Logic with 24/7 Service Guard in your Harness Environment. Harness then uses Sumo Logic, together with Harness machine-learning analysis, to verify the performance and quality of your live production service.
For steps on setting up 24/7 Service Guard for Sumo Logic, see Sumo Logic 24/7 Service Guard Setup.
You can add your CloudWatch monitoring to Harness 24/7 Service Guard in your Harness Application Environment.
For steps on setting up 24/7 Service Guard for CloudWatch, see CloudWatch 24/7 Service Guard Setup.
You can add your Bugsnag monitoring to Harness 24/7 Service Guard in your Harness Application Environment.
For steps on setting up 24/7 Service Guard for Bugsnag, see Bugsnag 24/7 Service Guard Setup.
You can add your Google Stackdriver monitoring to Harness 24/7 Service Guard in your Harness Application Environment.
For general information on integrating Stackdriver with Harness, see Stackdriver Verification.
Here are the high-level steps for setting up 24/7 Service Guard using one or more APM and logging tools:
- Connect each of your APM and logging tools to Harness as Verification Providers. Verification Providers contain the APM and logging tool account information Harness will use to access the tools via their APIs.
- Create a Harness Application. The Application identifies the application you want to monitor and the production environment where it runs, and lets you use Harness RBAC to control who can set up 24/7 Service Guard.
- Add a Harness Service to your Application. The Service is a logical representation of your production application. You will add a Service for each application you want to monitor with 24/7 Service Guard.
- Add a Harness Environment to your Application. The Environment represents the production environments for one or more applications.
- Add a 24/7 Service Guard configuration for each Service in the Environment using a Verification Provider.
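The relationship between these entities can be sketched as a small Python data model. This is an illustration only, not the Harness API: the class names, fields, and the `add_service_guard` method are hypothetical, and the validation rules (the Service and Environment must already exist in the Application) are assumptions drawn from the order of the steps above.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationProvider:
    name: str         # for example, "AppDynamics"
    account_ref: str  # hypothetical placeholder for the tool's account information


@dataclass
class ServiceGuardConfig:
    service: str
    provider: VerificationProvider


@dataclass
class Application:
    name: str
    services: set = field(default_factory=set)
    environments: dict = field(default_factory=dict)  # environment name -> configs

    def add_service(self, service):
        self.services.add(service)

    def add_environment(self, env_name):
        self.environments.setdefault(env_name, [])

    def add_service_guard(self, env_name, service, provider):
        # Assumed rule: both the Environment and the Service must exist first.
        if env_name not in self.environments:
            raise ValueError(f"unknown Environment: {env_name}")
        if service not in self.services:
            raise ValueError(f"unknown Service: {service}")
        config = ServiceGuardConfig(service, provider)
        self.environments[env_name].append(config)
        return config


# Walk through the steps in order, using illustrative names.
appd = VerificationProvider("AppDynamics", "appd-account")
app = Application("Todolist")
app.add_service("Dev-CV-Todolist")
app.add_environment("Production")
app.add_service_guard("Production", "Dev-CV-Todolist", appd)
```

Keeping configurations under the Environment, keyed to a Service and a Verification Provider, mirrors the last step: one 24/7 Service Guard configuration per Service in the Environment.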
Once 24/7 Service Guard is set up in a Harness Environment, the new configuration is listed according to its Service name (in this example, the Service name Dev-CV-Todolist).
In a few minutes, the Continuous Verification dashboard will display the 24/7 Service Guard configuration.
No deployment is needed to add the 24/7 Service Guard configuration to the dashboard.
For each Verification Provider, you can customize the threshold and timing for alert notifications. To do so:
- Click the pencil icon to the right of the Alert Notification row.
- In the resulting Alert Notification dialog, select the Enable Alert Notification check box.
- Adjust the Alert Threshold slider to set the minimum severity level at which you want Harness to send alert notifications.
The slider's scale represents the Overall Risk Level that Harness evaluates, based on data from your Verification Providers, transaction history, and machine-learning models. Harness' alerts are dynamic: over time, they will escalate or decrease, as we observe anomalies, regressions, and other factors. The scale's range corresponds to risk indicators on the dashboard's heat map as shown below.
By default, the notifications that you configure here will appear under Harness Manager's bell-shaped Alerts indicator, and will also be sent to your Catch-All Notification User Group. However, you can also configure detailed conditions that route alert notifications to other User Groups. This dialog includes a link to Harness Manager's corresponding Notification Settings controls.
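The effect of the Alert Threshold can be pictured as a simple minimum-severity check. The following Python sketch is an assumption-laden illustration, not how Harness computes alerts: `AlertSettings`, `should_notify`, and the 0.0 to 1.0 risk scale are hypothetical stand-ins for the dialog's check box, the slider, and the Overall Risk Level.

```python
from dataclasses import dataclass


@dataclass
class AlertSettings:
    # Mirrors the Enable Alert Notification check box.
    enabled: bool
    # Hypothetical 0.0-1.0 scale standing in for the Alert Threshold slider.
    threshold: float


def should_notify(settings: AlertSettings, overall_risk: float) -> bool:
    """Return True when the risk meets the configured minimum severity."""
    return settings.enabled and overall_risk >= settings.threshold
```

Under these assumptions, lowering the threshold produces more notifications, and clearing the check box suppresses them regardless of risk.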
Suspending (Snoozing) Alerts
Optionally, you can pause alerts, for example, during lightly staffed periods. To do so, use the Alert Notification dialog's Snooze Alert section, as follows:
- Click in the From field to reveal the calendar and clock display for the snooze start time.
- After setting the From date and time, use the To field's similar controls to set the snooze period's ending date and time.
- Once the whole Alert Notification dialog is set to your specifications, click SUBMIT to save them.
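Conceptually, the From and To values define a window during which alerts are held back. Here is a minimal Python sketch of that check. The function name `is_snoozed` is hypothetical, and treating the To time as exclusive is an assumption; the article does not say how Harness handles the boundary.

```python
from datetime import datetime
from typing import Optional


def is_snoozed(now: datetime,
               snooze_from: Optional[datetime],
               snooze_to: Optional[datetime]) -> bool:
    """Return True when `now` falls inside the configured snooze window."""
    if snooze_from is None or snooze_to is None:
        return False  # No complete window configured, so alerts are not paused.
    # Assumption: the window includes its start and excludes its end.
    return snooze_from <= now < snooze_to
```

For example, a window from Saturday 00:00 to Monday 00:00 would pause alerts for the whole weekend under this sketch.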
Add Workflow Steps
Once you have set up 24/7 Service Guard in an Environment, you can use the 24/7 Service Guard setup to quickly configure the Verify Service step in any Workflow that uses the Environment.
For example, the following Canary Deployment Workflow uses an Environment with 24/7 Service Guard set up. In Phase 1 of the Workflow, in Verify Service, you can add a Verification Provider.
- Under Verify Service, click Add Verification.
- In the Add Command dialog, under Verifications, select a Verification Provider that is also used in the 24/7 Service Guard of the Environment used by this Workflow. For example, AppDynamics.
The AppDynamics dialog appears.
- At the top of the dialog, click Populate from Service Verification, and then click the name of the 24/7 Service Guard configuration you want to use.
The dialog is automatically configured with the same settings as the 24/7 Service Guard configuration you selected.