Harness Troubleshooting

This topic contains general troubleshooting information for error messages and other issues that can arise. For each type of deployment, such as Kubernetes, Helm, or ECS, you can find specific troubleshooting steps in its deployment guide.

If you cannot find a resolution, please contact Harness Support.

Diagnose Issues

This section provides the general troubleshooting steps to use when you cannot determine the cause of an error or failure. Before diagnosing issues, ensure you are familiar with how the Harness Delegate and Harness Manager work together.

  1. Troubleshoot the Delegate validation URL. When a task is ready to be assigned, the Harness Manager first validates its list of Delegates to see which Delegate should be assigned the task. It validates each Delegate using the URL in the task, such as an API call or SSH command. See How Does Harness Manager Pick Delegates?.
    1. Locate the URL by looking at the Delegate log.
    2. See if that URL is reachable from the Delegate host or from another host. Log into the host running the Delegate and check whether the URL can be reached via cURL or ping (see the sketch after this list).
  2. Ensure that the Delegate is not blacklisted. If a Delegate fails to perform a task, that Delegate is blacklisted for that task and is not retried until the TTL of 5 minutes expires. This is true even if there is only one Delegate, and even if the Delegate is Tagged for that task, such as with a Shell Script command in a Workflow.
  3. Use all Delegate Tags. If you are using Delegate Tags for a task (via a Shell Script command) or for Cloud Provider credentials, ensure that the command or provider lists all of the Tags used by the Delegate. If the command or provider lists only a subset of the Delegate's Tags, Tag-based selection will not work.
  4. Review Delegate Scoping. Ensure the Delegate is not scoped out of performing the task. See Delegate Scope and Best Practices and Notes.
  5. Delegate Restarting. Check whether the Delegate was restarting (for example, because the socket connection was failing) and offline at the time of the task.
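
For example, to reproduce the validation check manually, you can test the validation URL from the Delegate host. The following is a minimal sketch; the URL is a placeholder that you would replace with the one found in your Delegate log:

# On the Delegate host, test the validation URL found in the Delegate log (placeholder URL).
curl -v --connect-timeout 10 https://kubernetes.example.com/api
# If cURL is unavailable or blocked, a basic reachability check:
ping -c 4 kubernetes.example.com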

Use Execution ID to View Delegate Activity

You can diagnose many Delegate issues using the Execution ID, Task ID, and Delegate information from the task assignment and operation.

A deployment has an Execution ID. Every Execution ID includes one or more tasks, identified by Task IDs. Using the Task ID, you can determine which Delegate was assigned the task and begin diagnosing why that Delegate could not perform the task.

To determine what Delegate was used to perform a task, do the following:

  1. Obtain the Execution ID from the deployment URL.
    1. When you deploy a Workflow or Pipeline, you are taken to the Deployments page. On the Deployments page, look at the Location (URL) field in your browser and locate the Execution ID following executions. If you are deploying a Pipeline, the ID follows pipeline-execution. For example, in the following string, rwXc5SXjQ86e3AscnxRedg is the Execution ID: executions/rwXc5SXjQ86e3AscnxRedg/details
  2. Open the Delegate log, located in the same folder as the Delegate and named delegate.log or delegate.<date>.log. You can open the log in a text editor or a log viewing program such as LogDNA. See Delegate Logs.
  3. Search for the Execution ID in the log. For example, in LogDNA, put brackets around the Execution ID and search, like this: [NxO0JgbPSH-HL4TcN0zvaw] 

You will see log events for that Execution ID, indicated by executionId. For example, here is a log entry for Execution ID NxO0JgbPSH-HL4TcN0zvaw:

Jun 17 10:31:56 manager-79989476db-cz8k5 manager-Harness-Prod-KUBERNETES INFO [1.0.33801]
[deploymentEventQueue-handler-z4Ijo0_yT7G8iTsbzGG8Gg]
INFO software.wings.service.impl.instance.InstanceHelper - Handled deployment event for
executionId [NxO0JgbPSH-HL4TcN0zvaw],
infraMappingId [qR9_ifE0S9eLwt9zNa87Rw] of appId [fQ-liKwmS32DivKaF9Ggqg] successfully
  4. Find the Task ID. Task IDs are logged in many places in the log. We will use a task assignment example.

    Once you have located the Execution ID records in the log, look at the records containing the uuid field. For example:
Jun 17 10:31:57 manager-79989476db-cz8k5 manager-Harness-Prod-KUBERNETES 
INFO [1.0.33801] [pool-12-thread-4749]
INFO [NxO0JgbPSH-HL4TcN0zvaw] software.wings.service.impl.DelegateServiceImpl -
Queueing async task: uuid: lRlt-zfPTj-vPMaA_mlF4w,
accountId: xxxxxxxxxxxx, type: K8S_COMMAND_TASK, correlationId: null

The value following uuid is the Task ID.

  5. Look for the Queueing and Task submitted values using the Task ID. For example:
Queueing async task: uuid: lRlt-zfPTj-vPMaA_mlF4w
...
Task submitted:...

Also, you will likely see the number of Delegates available to execute the task:

Jun 17 14:16:17 manager-79989476db-cz8k5 manager-Harness-Prod-KUBERNETES INFO [1.0.33801]
[pool-12-thread-5063]
INFO [GrYOo5AgSBGHgp_h82ojig] software.wings.service.impl.DelegateServiceImpl -
8 delegates [yGpGxVhMSKaM_cn_2Bh4BQ, j_VbQsK2SDKx8URKFTKRZg, PoiAobskTXKinaqj-thIGQ, K0HJXUjgTn2v3kT09sbI3Q, _7K-s6zYQ26lRCss2t3ivw, 9krY9UruTVimTfBwFoKmnQ, btBGYgtCRjWIuZ1aC42Kog, BJquSItxStqmN9q59Mau5Q] eligible to execute task JIRA

Search using the Task ID to see what Delegate was used to execute the task:

Jun 17 15:23:39 manager-79989476db-vxlmq manager-Harness-Prod-KUBERNETES 
INFO [1.0.33801] [dw-88297 - PUT /api/delegates/TG01x2cRRJafGfgybNbYrg/tasks/gtnEdyxLSESVg1WYdItQtg/acquire?accountId=XEAgZ-j4RvirUgGObdd8-g]
INFO software.wings.service.impl.DelegateServiceImpl -
Task gtnEdyxLSESVg1WYdItQtg assigned to delegate TG01x2cRRJafGfgybNbYrg

Follow the log to see each step taken by the Delegate assigned that task.
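
If you prefer the command line to a log viewer, you can run the same searches with grep. This is a minimal sketch using the example IDs from above; your Execution ID, Task ID, and log file name will differ:

# Find all records for an Execution ID in the Delegate log (example ID).
grep "NxO0JgbPSH-HL4TcN0zvaw" delegate.log
# Once you have the Task ID (the value after "uuid:"), follow that task through the log.
grep "lRlt-zfPTj-vPMaA_mlF4w" delegate.log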

Login Issues

The following issues can occur when logging into Harness.

Logged Out Automatically

You are logged out of your Harness Manager session automatically, forcing you to log back in.

If you log out of Harness Manager in one browser tab, Harness might log you out of all tabs.

Typically, the solution is to clear local storage.

Troubleshooting Steps
  1. Log out of the Harness Manager from all Chrome tabs. (Harness only supports the Chrome desktop browser.)
  2. Clear Chrome Local Storage for app.harness.io in chrome://settings/siteData.
  3. Open a new tab and log into the Harness Manager.

You should not be logged out anymore.

Notes
  • Chrome Session Storage is used by the Harness Manager. If you close all the tabs running the Harness Manager and then open a new tab running Harness Manager, you will likely need to log in again.
  • A Chrome session will time out after 5 minutes, but a session timeout can also happen if the tab running Harness Manager is idle for 24 hours. However, as long as the tab is not closed, Harness Manager keeps polling to check whether the token needs to be refreshed. For example, if you have kept the tab open for 3 days, you might still be logged in, as long as the workstation has not been turned off or entered sleep mode, which would prevent the refresh.

Delegate Setup

Most often, Delegate errors are the result of Delegate setup issues. Ensure you are familiar with how the Delegate and Harness Manager work together.

Another common issue is that the SSH key used by the Delegate to deploy to a target host is incorrect. This can happen if the SSH key in Harness Secrets Management was set up incorrectly, if it is not the correct key for the target host, or if the target host is not set up to allow SSH connections.

Delegate Connection Failures To Harness Manager

If the Delegate cannot connect to the Harness Manager, try the following:

  1. Use ping on the Delegate host to test if response times for app.harness.io or another URL are reasonable and consistent.
  2. Use traceroute on app.harness.io to check the network route.
  3. Use nslookup to confirm that DNS resolution is working for app.harness.io.
  4. Connect using the IP address for app.harness.io (get the IP address using nslookup), for example: http://35.23.123.321/#/login.
  5. Flush the client's DNS cache:
    1. Windows: ipconfig /flushdns
    2. Mac/Linux: sudo killall -HUP mDNSResponder;sudo killall mDNSResponderHelper;sudo dscacheutil -flushcache
  6. Check for local network issues, such as proxy errors or NAT license limits.
  7. For some cloud platforms, like AWS EC2, ensure that security groups allow outbound traffic on HTTPS 443.
  8. Try a different workstation or a smartphone to confirm the connection issue is not local to a single host.
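
The first several checks can be run directly from the Delegate host. The following is a minimal sketch of those commands; adjust the hostname if you connect to a different Harness URL:

# Basic connectivity checks from the Delegate host.
ping -c 4 app.harness.io          # are response times reasonable and consistent?
traceroute app.harness.io         # does the network route look correct?
nslookup app.harness.io           # does DNS resolution work?
curl -v https://app.harness.io    # is outbound HTTPS (port 443) allowed?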

Common Errors and Alerts

This section lists common error and alert messages you might receive.

No Delegates Could Reach The Resource

This error means that no Delegate could meet the URL criteria required for validation. When a task is ready to be assigned, the Harness Manager first validates its list of Delegates to see which Delegate should be assigned the task. It validates each Delegate using the URL in the task, such as an API call or SSH command. See How Does Harness Manager Pick Delegates?.

Harness SecretStore Is Not Able to Encrypt/Decrypt

Error message:

Secret manager Harness SecretStore of type KMS is not able to encrypt/decrypt. Please check your setup

This error results when Harness Secret Manager (named Harness SecretStore) is not able to encrypt or decrypt keys stored in AWS KMS. The error is usually transitory and is caused by a network connectivity issue or brief service outage.

Check Harness Site Status and AWS Status (search for AWS Key Management Service).

Editing a Notification Group Based Rule Is Not Supported Anymore

When trying to edit the Notification Strategy section of a Workflow, you might see this error message:

Editing a Notification Group based rule is not supported anymore, please delete and create a new one

Your Workflow might have some unsupported Notification Strategy setting that was not migrated to the current method for notifications.

To fix this, remove the existing Notification Strategy in one of the following ways:

  • Click the X next to the strategy in the Workflow's Notification Strategy section, and then create a new strategy.
  • Click the Configure As Code button (</>) in the Workflow, delete instances of notificationGroups, and then create a new strategy. The current method uses userGroupIds:
...
notificationRules:
  - conditions:
      - FAILED
    executionScope: WORKFLOW
    notificationGroupAsExpression: false
    userGroupAsExpression: false
    userGroupIds:
      - GXFtTUS2Q9Gmo1r4MhqS4g
...

Trigger Rejected

If you use a Webhook Trigger to execute a Workflow or Pipeline deployment, but the artifact name in the cURL command is different from the name of the artifact, you might receive this error:

Trigger Rejected. Reason: Artifacts Are Missing for Service Name(S)

This error can happen if an incorrect name for an artifact build version is placed in the cURL command. For example, the artifact naming convention might include a prefix that was not added to the Trigger cURL command. Here is a cURL example showing a buildNumber of v1.0.4-RC8:

curl -X POST -H 'content-type: application/json' \
--url https://app.harness.io/gateway/api/webhooks/TBsIRx. . . \
-d '{"application":"tavXGH . . z7POg","artifacts":[
{"service":"app","buildNumber":"v1.0.4-RC8"}]}'

If the artifacts available for the Harness Service have a prefix or a different naming convention, such as myApp/v1.0.4-RC8, then the cURL command will not work.

Always ensure that the Webhook cURL command has the correct artifact name.
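
For example, if the artifacts for the Service are published as myApp/v1.0.4-RC8, the buildNumber in the cURL command must include that prefix. This is a sketch based on the example above; the webhook URL and application ID are truncated placeholders:

curl -X POST -H 'content-type: application/json' \
--url https://app.harness.io/gateway/api/webhooks/TBsIRx. . . \
-d '{"application":"tavXGH . . z7POg","artifacts":[
{"service":"app","buildNumber":"myApp/v1.0.4-RC8"}]}'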

Exception in WinRM Session

When you deploy to a Windows environment, Harness makes a WinRM connection. Before it can make the connection, the Harness Delegate must be able to resolve any domain names for the target virtual network. If the target virtual network for a deployment does not have DNS enabled, you might see the following error:

Exception in WinrmSession. . . Buffer already closed for writing

Ensure that the virtual network allows DNS name resolution so that Harness can resolve names before running the WinRM connection. For more information, see Set Up WinRM on Instances and Network and Add IIS Deployment Environment in AWS or Azure.
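
As a quick check, you can confirm from the Delegate host that the target hostname resolves before Harness attempts the WinRM connection. This is a minimal sketch; the hostname is a placeholder for your target Windows instance, and the port assumes the common WinRM HTTPS listener:

# From the Delegate host, confirm the target Windows host resolves.
nslookup winhost.example.internal
# Confirm the WinRM port is reachable (5985 for HTTP, 5986 for HTTPS).
nc -vz winhost.example.internal 5986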

Could not Fetch Container Metadata

If the Harness Delegate cannot fetch container details during a deployment step, you might see the error:

Could not fetch container metadata. Verification steps using containerId may not work

In the case of verification steps, the error message might be:

No analysis was done because no data was available during this time

For example, the ECS container agent provides an API operation for gathering details about the container instance where the agent is running and the associated tasks running on that instance. Harness uses the cURL command from within the container instance to query the Amazon ECS container agent on port 51678 and return container instance metadata or task information.

In the ECS case, ensure that the security group(s) used by AWS EC2 instances permits inbound traffic over TCP port 51678. For more information, see Amazon ECS Container Agent Introspection from AWS.

Somewhere in the deployment logs you will see the API URL the Delegate was using:

Could not connect to url http://10.16.4.97:51678/v1/tasks: connect timed out

As the Delegate could not reach the URL, you need to ensure that the URL is reachable from the Delegate by opening ports, checking security groups, etc.
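
To reproduce what the Delegate is doing, you can query the ECS container agent introspection API yourself. This is a minimal sketch; the IP address is the example from the log above and will differ in your environment:

# From the Delegate, query the ECS container agent introspection API on port 51678.
curl --connect-timeout 10 http://10.16.4.97:51678/v1/metadata   # container instance metadata
curl --connect-timeout 10 http://10.16.4.97:51678/v1/tasks      # tasks running on the instance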

This error can also occur during verification steps. In this case, Harness needs container metadata for verification, but because the Delegate cannot fetch the metadata, Harness tries fetching verification data by IP address. If there is no IP address host data in the Verification Provider, Harness will not receive information for the deployed nodes.

Naming Conventions

Some naming conventions in repositories and other artifact sources, or in target infrastructures, cannot be used by Harness. For example, if a Harness Trigger Webhook uses a Push notification from a Git repo branch that contains a dot in its name, the Trigger is unlikely to work.

Character support in Harness Environment and Service Infrastructure entity names is restricted to alphanumeric characters, underscores, and hyphens. The restriction is due to compatibility issues with Harness backend components, database keys, and the YAML flow, where Harness creates files with entity names on file systems.
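
If you want to check a name before creating the entity, a simple pattern test covers the supported character set. This is a sketch of the rule described above, not an official Harness validator:

# Check that an entity name uses only alphanumeric characters, underscores, and hyphens.
echo "my-service_01" | grep -Eq '^[A-Za-z0-9_-]+$' && echo "name OK" || echo "unsupported characters"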

Secrets

The following issues can occur when using Harness secrets.

Secrets Values Hidden In Log Output

If a secret's unencrypted value shares some content with the value of another Harness variable, Harness will hide the secret's value in any logs. Harness replaces the secret's conflicting value with the secret's name in any log displays. This is for security only; the actual values of the secrets and variables are still substituted correctly.

Configure as Code and Git Sync

The following issues can occur when using Harness Configure as Code and Git Sync.

Git Push to Harness Fails

If your Harness Application is synced two-way with your Git repo, a Git push to Harness might not work unless all of the required Application settings are fully configured in your Git YAML files before you push your Application up to Harness.

For example, if you have defined a Service Infrastructure in your Git YAML files, but its required fields are incomplete, the push to Harness will likely fail.

This is no different from trying to submit incomplete settings in the Harness Manager.

In many cases, it is best to first use Harness Manager to configure your Application, ensuring all required settings are configured, and then sync that with your repo. Unless you remove any required settings in the Git files, the Application will sync with Harness successfully.

Need to Reset History on Synced Git Repository

In some cases where you sync your Git repo with Harness, you might need to reset the Git repo history because of an error, such as accidentally adding a secret's value in the repo. For example, Git has a Rewriting History option.

To repair this scenario, do the following:

  1. Remove the Harness Webhook configured on your Git repo account (GitHub, Bitbucket, and so on). This step is critical to ensure the Application history deletion does not propagate to Harness the next time it is synced.
  2. Delete all of your Harness Applications on the Git account.
  3. Re-sync each Harness Application using the Configuration As Code sync functionality. See Configuration as Code.
  4. Confirm that all of your Applications are visible in your Git repo.
  5. Notify Harness to confirm we don't see any issues/errors in your account.
  6. Enable the Webhook in Git repo and test syncing.

Deployments

The following issues can occur with manual or Trigger-based Deployments.

Trigger Policy Limits

Harness collects new artifacts from your repository on an interval basis. By default, if more than one version of the same artifact is collected during the same polling interval, only the latest version is used to initiate a Trigger set to On New Artifact.

If you prefer to have a deployment triggered for every single version of an artifact, Harness can implement a per-artifact deployment.

Here is an example from a log showing two -pr-#### versions for an artifact collected at the same time, which resulted in two deployments starting:

May 24 10:45:07 manager-XXXXXXXXXX-jhjwb manager-Harness-Prod-KUBERNETES
INFO [1.0.32101] [notifyQueue-handler-XXXXXXX--XXXXXXX]
INFO software.wings.delegatetasks.buildsource.BuildSourceCallback -
[[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-branch-foo-bar-plugin-2-x,
523ccXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX32a7-pr-2462, dd32432955979f5745ba606e50320f2a25f4bff4,
dd324XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXbff4-branch-sku-lookup,
dd324XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXbff4-pr-2718]] new artifacts collected for artifactStreamId XXXXXXXXXXXXXXX

The corresponding Trigger looks for all -pr versions and starts a deployment using the -pr-[0-9]+$ expression.
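
To see which collected build numbers the Trigger expression would match, you can test the regular expression locally. This is a sketch; the build numbers are shortened examples from the log above:

# Test which build numbers match the Trigger's -pr-[0-9]+$ expression.
printf '%s\n' "523cc...32a7-pr-2462" "dd324...bff4-branch-sku-lookup" "dd324...bff4-pr-2718" \
  | grep -E -- '-pr-[0-9]+$'
# Only the -pr-#### versions are printed.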

Deployment Rate Limits

Harness applies an hourly and daily deployment limit to each account to prevent configuration errors or external triggers from initiating too many undesired deployments. If you are notified that you have reached a limit, it is possible that undesired deployments are occurring. Please determine if a Trigger or other mechanism is initiating undesired deployments. If you continue to experience issues, contact Harness Support.

The daily limit is 100 deployments every 24 hours. The hourly limit is 40 deployments and is designed to detect any atypical upsurge of deployments.

Verifications

The following issues can occur with verifications.

Search Keyword Too Vague

If a 24/7 Service Guard or deployment verification is not reporting any data, the search keywords for the verification provider might be too broad.

For example, when you set up the 24/7 Service Guard or deployment verification step, you might have level:ERROR in the Search Keywords setting.

If the Search Keywords setting in the verification provider settings is too broad, then when you click the Guide From Example button for the Host Name Field, Guide From Example can return field names for application records other than your target application.

To fix this, click Guide From Example several times. This gives you different field lists for the different applications until you find the correct field for the host name. Refreshing the browser can also pick up a new sample.

Helm

The following issues can occur with Helm deployments.

Unable to get an Update from the Chart Repository

If Harness cannot get an update from a chart repo you have set up for your Helm Service, during deployment you might see the error:

Unable to get an update from the "XYZ" chart repository ... read: connection reset by peer

To fix this, find the Delegate that the Helm update ran on, then SSH to the Delegate host and run the Helm commands manually. This will confirm whether you are having an issue with your Harness setup or a general connectivity issue.
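
For example, you can run the same repo operations the Delegate runs. This is a minimal sketch, assuming Helm is installed on the Delegate host; the repository name and URL are placeholders for your chart repository:

# On the Delegate host, reproduce the chart repo update manually.
helm repo list                                    # confirm the repo is configured
helm repo add my-repo https://charts.example.com  # add it if missing (placeholder URL)
helm repo update                                  # the operation that failed during deployment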

Submit a Ticket

  1. Click the Help button in the bottom-right of the Harness Manager.
  2. Click Submit a Ticket or Send Screenshot.
  3. Fill out the ticket or screenshot forms and click Submit ticket or Send Feedback.

Harness Support will contact you as soon as possible.

