Harness Troubleshooting

This topic contains general troubleshooting information for error messages and other issues that can arise. For each type of deployment, such as Kubernetes, Helm, or ECS, you can find specific troubleshooting steps in its deployment guide.

If you cannot find a resolution, please contact Harness Support or visit the Harness Community Forum.

Diagnose Issues

This section provides the general troubleshooting steps to use when you cannot determine the cause of an error or failure.

  1. Troubleshoot the Delegate validation URL. When a task is ready to be assigned, the Harness Manager first validates its list of Delegates to see which Delegate should be assigned the task. It validates each Delegate using the URL in the task, such as an API call or SSH command. See How Does Harness Manager Pick Delegates?.
    1. Locate the URL by looking at the Delegate log.
    2. See if that URL is reachable from the Delegate host or from another host. Log into the host running the Delegate and see if the URL can be reached via cURL or ping.
  2. Ensure that the Delegate is not blacklisted. If a Delegate fails to perform a task, that Delegate is blacklisted for that task and is not tried again until the TTL of 5 minutes expires. This applies even if there is only one Delegate and even if the Delegate is Tagged for that task, such as with a Shell Script command in a Workflow.
  3. Use all Delegate Tags. If you are using Delegate Tags for a task (via a Shell Script command) or for Cloud Provider credentials, ensure that the command or provider includes all of the Tags used by the Delegate. If the command or provider includes only some of the Delegate's Tags, the tagging will not work.
  4. Review Delegate Scoping. Ensure the Delegate is not scoped out of performing the task. See Delegate Scope and Best Practices and Notes.
  5. Delegate Restarting. Check to see if the Delegate is restarting (for example, because the socket connection is failing) and was offline at the time of the task.

Use Execution ID to View Delegate Activity

You can diagnose a lot of Delegate issues using the Execution ID, Task ID and Delegate information from the task assignment and operation.

A deployment has an Execution ID. Every Execution ID includes one or more tasks, identified by Task IDs. Using the Task ID, you can determine which Delegate was assigned the task and begin diagnosing why that Delegate could not perform the task.

To determine what Delegate was used to perform a task, do the following:

  1. Obtain the Execution ID from the deployment URL.
    1. When you deploy a Workflow or Pipeline, you are taken to the Deployments page. On the Deployments page, look in the Location field in your browser and locate the Execution ID following executions. If you are deploying a Pipeline, the ID follows pipeline-execution. For example, in the following string, rwXc5SXjQ86e3AscnxRedg is the Execution ID: executions/rwXc5SXjQ86e3AscnxRedg/details
  2. Open the Delegate log, located in the same folder as the Delegate and named delegate.log or delegate.<date>.log. You can open the log in a text editor or a log viewing program such as LogDNA. See Delegate Logs.
  3. Search for the Execution ID in the log. For example, in LogDNA, put brackets around the Execution ID and search, like this: [NxO0JgbPSH-HL4TcN0zvaw] 
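
    If you are viewing the raw delegate.log file rather than a log viewer, a simple grep works as well. This sketch uses the example Execution ID from this topic:

    grep "NxO0JgbPSH-HL4TcN0zvaw" delegate*.log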

You will see log events for that Execution ID, indicated by executionId. For example, here is a log entry for Execution ID NxO0JgbPSH-HL4TcN0zvaw:

Jun 17 10:31:56 manager-79989476db-cz8k5 manager-Harness-Prod-KUBERNETES INFO [1.0.33801]
[deploymentEventQueue-handler-z4Ijo0_yT7G8iTsbzGG8Gg]
INFO software.wings.service.impl.instance.InstanceHelper - Handled deployment event for
executionId [NxO0JgbPSH-HL4TcN0zvaw],
infraMappingId [qR9_ifE0S9eLwt9zNa87Rw] of appId [fQ-liKwmS32DivKaF9Ggqg] successfully
  4. Find the Task ID. Task IDs are logged in many places in the log. We will use a task assignment example.

    Once you have located the Execution ID records in the log, look at the records containing the uuid field. For example:
Jun 17 10:31:57 manager-79989476db-cz8k5 manager-Harness-Prod-KUBERNETES 
INFO [1.0.33801] [pool-12-thread-4749]
INFO [NxO0JgbPSH-HL4TcN0zvaw] software.wings.service.impl.DelegateServiceImpl -
Queueing async task: uuid: lRlt-zfPTj-vPMaA_mlF4w,
accountId: xxxxxxxxxxxx, type: K8S_COMMAND_TASK, correlationId: null

The value following uuid is the Task ID.

  5. Look for the Queueing and Task submitted values using the Task ID. For example:
Queueing async task: uuid: lRlt-zfPTj-vPMaA_mlF4w
...
Task submitted:...

Also, you will likely see the number of Delegates available to execute the task:

Jun 17 14:16:17 manager-79989476db-cz8k5 manager-Harness-Prod-KUBERNETES INFO [1.0.33801] [pool-12-thread-5063] INFO  [GrYOo5AgSBGHgp_h82ojig] software.wings.service.impl.DelegateServiceImpl - 8 delegates [yGpGxVhMSKaM_cn_2Bh4BQ, j_VbQsK2SDKx8URKFTKRZg, PoiAobskTXKinaqj-thIGQ, K0HJXUjgTn2v3kT09sbI3Q, _7K-s6zYQ26lRCss2t3ivw, 9krY9UruTVimTfBwFoKmnQ, btBGYgtCRjWIuZ1aC42Kog, BJquSItxStqmN9q59Mau5Q] eligible to execute task JIRA

Search using the Task ID to see what Delegate was used to execute the task:

Jun 17 15:23:39 manager-79989476db-vxlmq manager-Harness-Prod-KUBERNETES 
INFO [1.0.33801] [dw-88297 - PUT /api/delegates/TG01x2cRRJafGfgybNbYrg/tasks/gtnEdyxLSESVg1WYdItQtg/acquire?accountId=XEAgZ-j4RvirUgGObdd8-g]
INFO software.wings.service.impl.DelegateServiceImpl -
Task gtnEdyxLSESVg1WYdItQtg assigned to delegate TG01x2cRRJafGfgybNbYrg

Follow the log to see each step taken by the Delegate assigned that task.

Login Issues

The following issues can occur when logging into Harness.

Logged Out Automatically

You are logged out of your Harness Manager session automatically, forcing you to log back in.

If you log out of Harness Manager in one browser tab, Harness might log you out of all tabs.

Typically, the solution is to clear local storage.

Troubleshooting Steps
  1. Log out of the Harness Manager from all Chrome tabs. (Harness only supports the Chrome desktop browser.)
  2. Clear Chrome Local Storage for app.harness.io in chrome://settings/siteData.
  3. Open a new tab and log into the Harness Manager.

You should not be logged out anymore.

Notes
  • Chrome Session Storage is used by the Harness Manager. If you close all the tabs running the Harness Manager and then open a new tab running Harness Manager, you will likely need to log in again.
  • A Chrome session will time out after 5 minutes, but a session timeout can also happen if the tab running Harness Manager is idle for 24 hours. However, as long as the tab is not closed, Harness Manager keeps polling to check whether the token needs to be refreshed. For example, if you have kept the tab open for 3 days, you might still be logged in, as long as the workstation has not been turned off or put to sleep, which would prevent the refresh.

Delegate Issues

The Harness Delegate runs as a service in your target deployment environment, on a host, a pod, a container, or as a task. It makes outbound HTTPS connections over port 443 and uses the credentials you provide in Harness connections such as Cloud Providers and Artifact Servers to run remote SSH and API calls.

Most Delegate issues arise from network connectivity, where the Delegate is unable to connect to a Cloud Provider, Artifact Server, and so on, because of network issues such as port changes or proxy settings.

Some issues arise from invalid credentials due to expiry or access issues resulting from missing policies or cross project requirements in a cloud vendor.

The simplest way to detect if an issue is caused by Delegate connectivity is to run a cURL command on the Delegate host/pod and see if it works. If it does, the next step is to look at the credentials.
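
For example, a quick check from the Delegate host against an Artifact Server might look like this sketch, where the registry URL is a placeholder for your own endpoint:

$ curl -sS -o /dev/null -w "%{http_code}\n" https://registry.example.com/v2/

If an HTTP status code comes back, basic connectivity is working and the next thing to check is the credentials configured in Harness.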

The following sections provide solutions to common Delegate issues.

Delegate Setup

Most often, Delegate errors are the result of Delegate setup issues. Ensure you are familiar with how the Delegate and Harness Manager work together.

Another common issue is that the SSH key used by the Delegate to deploy to a target host is incorrect. This can happen if the SSH key in Harness Secrets Management was set up incorrectly, if it is not the correct key for the target host, or if the target host is not set up to allow SSH connections.

Delegate Connection Failures To Harness Manager

If the Delegate cannot connect to the Harness Manager, try the following (a combined sketch of the first few checks appears after this list):

  1. Use ping on the Delegate host to test if response times for app.harness.io or another URL are reasonable and consistent.
  2. Use traceroute on app.harness.io to check the network route.
  3. Use nslookup to confirm that DNS resolution is working for app.harness.io.
  4. Connect using the IP address for app.harness.io (get the IP address using nslookup), for example: http://35.23.123.321/#/login.
  5. Flush the client's DNS cache
    1. Windows: ipconfig /flushdns
    2. Mac/Linux: sudo killall -HUP mDNSResponder;sudo killall mDNSResponderHelper;sudo dscacheutil -flushcache
  6. Check for local network issues, such as proxy errors or NAT license limits.
  7. For some cloud platforms, like AWS EC2, ensure that security groups allow outbound traffic on HTTPS 443.
  8. Try a different workstation or a smartphone to confirm the connection issue is not local to a single host.
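
Here is a combined sketch of the first few checks, run from the Delegate host:

$ ping -c 4 app.harness.io           # response times should be reasonable and consistent
$ traceroute app.harness.io          # check the network route
$ nslookup app.harness.io            # confirm DNS resolution
$ curl -sSI https://app.harness.io   # confirm HTTPS over port 443 is reachable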

Delegate Successes Followed By Failures

If you have incorrectly used the same Kubernetes Delegate YAML file for multiple Delegates, you will see Delegate successes followed by failures in the Delegate logs. This sequence is the result of one Delegate succeeding in its operation and the same operation failing with the second Delegate.

To avoid Delegate conflicts, always use a new Kubernetes Delegate YAML download, with a unique name, for each Delegate you install. For Kubernetes Delegates, you can increase the number of replicas run from a single Delegate download YAML file (change the replicas setting in the file, as shown below), but to run multiple separate Delegates, use a new Delegate download from Harness for each Delegate.
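
For example, increasing the replicas setting in a single Delegate's harness-delegate.yaml might look like this sketch (the Delegate name is illustrative, and other fields in the file are omitted):

# harness-delegate.yaml (excerpt)
kind: StatefulSet
metadata:
  name: my-delegate
spec:
  replicas: 2   # scale the same Delegate; do not copy this file for a second Delegate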

No Delegates Could Reach The Resource

This error means that no Delegate could meet the URL criteria for validation. For more information, see How Does Harness Manager Pick Delegates?.

WARNING: ulimit -n is too low (1024)

In Linux, you can change the maximum number of open files allowed. You modify this number using the ulimit command, which controls the resources available to the shell and to the processes it starts.

The Harness Shell Script Delegate requires a ulimit of at least 10000. By default, most Linux virtual machines have a ulimit of 1024.

To increase the ulimit, do the following:

  1. Open an SSH session into your Linux virtual machine.
  2. Open the limits configuration file as a root user:

    $ sudo nano /etc/security/limits.conf
  3. Add the following settings:

    * soft nofile 10000
    * hard nofile 10000
    root soft nofile 10000
    root hard nofile 10000
    You can also set the limit for a user named fred like this:

    fred soft nofile 10000
    fred hard nofile 10000
  4. Save the limits configuration file (Ctrl+x).
  5. Log out and back into the SSH session.
  6. View the ulimit:

    $ ulimit -n
    10000
  7. You may now navigate to the Shell Script Delegate folder and run the Delegate without encountering the ulimit error.

If you are simply testing a Delegate on your local Mac, use the following commands to display and raise the ulimit:

$ launchctl limit maxfiles

$ sudo launchctl limit maxfiles 65536 200000

Google Cloud Platform: Cluster has unschedulable pods

If you do not have enough space available in your Kubernetes cluster, you might receive a Cluster has unschedulable pods error.

Cause

If your cluster does not have Autoscaling enabled or does not have enough space, it cannot run the Delegate.

Solution

Add more space (see Delegate Requirements above), or turn on Autoscaling, wait for the cluster to restart, reconnect to the cluster, and then rerun the command:

$ kubectl apply -f harness-delegate.yaml

For more information, see Autoscaling Deployments from Google.
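
If you manage the cluster from the command line, enabling node autoscaling might look like the following sketch, where the cluster name, zone, and node pool are placeholders:

$ gcloud container clusters update my-cluster \
    --enable-autoscaling --min-nodes 1 --max-nodes 4 \
    --zone us-central1-a --node-pool default-pool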

Deleting a Kubernetes Delegate

If you need to delete a Harness Delegate from your Kubernetes cluster, delete the StatefulSet for the Delegate. Once created, the StatefulSet ensures that the desired number of pods are running and available at all times. If you delete the pod without deleting the StatefulSet, the pod is recreated.

For example, if you have the Delegate pod name mydelegate-vutpmk-0, you can delete the StatefulSet with the following command:

$ kubectl delete statefulset -n harness-delegate mydelegate-vutpmk

Note that the -0 suffix in the pod name is removed for the StatefulSet name.
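
If you are not sure of the StatefulSet name, list the StatefulSets in the Delegate namespace first:

$ kubectl get statefulsets -n harness-delegate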

Need to Use Long Polling for Delegate Connection to Harness Manager

By default, the Harness Delegate connects to the Harness Manager over a TLS-backed WebSocket connection, sometimes called a Secure WebSocket connection, using the wss:// scheme (RFC 6455). Some network intermediaries, such as transparent proxy servers and firewalls that are unaware of WebSocket, might drop the WebSocket connection. To avoid this uncommon error, you can instruct the Delegate to use long polling.

To set up the Delegate to use long polling, you use the Delegate YAML file.

For a Kubernetes Delegate, you can set the POLL_FOR_TASKS setting to true in the harness-delegate.yaml file:

...
env:
...
- name: POLL_FOR_TASKS
  value: "true"
...

For the Shell Script Delegate, edit the pollForTasks setting to true in the config-delegate.yml file:

pollForTasks: true

For the Docker Delegate, set the POLL_FOR_TASKS environment variable to true in the Delegate's Docker run command:

-e POLL_FOR_TASKS=true \

For the ECS Delegate, edit the POLL_FOR_TASKS setting in the ecs-task-spec.json file:

{
  "name": "POLL_FOR_TASKS",
  "value": "true"
},

Common Errors and Alerts

This section lists common error and alert messages you might receive.

No Delegates Could Reach The Resource

This error means that no Delegate could meet the URL criteria for validation. When a task is ready to be assigned, the Harness Manager first validates its list of Delegates to see which Delegate should be assigned the task. It validates each Delegate using the URL in the task, such as an API call or SSH command. See How Does Harness Manager Pick Delegates?.

Harness SecretStore Is Not Able to Encrypt/Decrypt

Error message:

Secret manager Harness SecretStore of type KMS is not able to encrypt/decrypt. Please check your setup

This error results when Harness Secret Manager (named Harness SecretStore) is not able to encrypt or decrypt keys stored in AWS KMS. The error is usually transitory and is caused by a network connectivity issue or brief service outage.

Check Harness Site Status and AWS Status (search for AWS Key Management Service).

Editing a Notification Group Based Rule Is Not Supported Anymore

When trying to edit the Notification Strategy section of a Workflow, you might see this error message:

Editing a Notification Group based rule is not supported anymore, please delete and create a new one

Your Workflow might have some unsupported Notification Strategy setting that was not migrated to the current method for notifications.

To fix this, remove the existing Notification Strategy in one of the following ways:

  • Click the X next to the strategy in the Workflow's Notification Strategy section, and then create a new strategy.
  • Click the Configure As Code button (</>) in the Workflow, delete instances of notificationGroups, and then create a new strategy. The current method uses userGroupIds:
...
notificationRules:
- conditions:
  - FAILED
  executionScope: WORKFLOW
  notificationGroupAsExpression: false
  userGroupAsExpression: false
  userGroupIds:
  - GXFtTUS2Q9Gmo1r4MhqS4g
...

Trigger Rejected

If you use a Webhook Trigger to execute a Workflow or Pipeline deployment, but the artifact name in the cURL command does not match the actual artifact name, you might receive this error:

Trigger Rejected. Reason: Artifacts Are Missing for Service Name(S)

This error can happen if an incorrect artifact build version name is placed in the cURL command; for example, if the artifact naming convention includes a prefix that was not added to the Trigger cURL command. Here is a cURL example showing a buildNumber of v1.0.4-RC8:

curl -X POST -H 'content-type: application/json' 
--url https://app.harness.io/gateway/api/webhooks/TBsIRx. . .
-d '{"application":"tavXGH . . z7POg","artifacts":[
{"service":"app","buildNumber":"v1.0.4-RC8"}]}’

If the artifacts available for the Harness Service have a prefix or a different naming convention, such as myApp/v1.0.4-RC8, then the cURL command will not work.

Always ensure that the Webhook cURL command has the correct artifact name.

Exception in WinRM Session

When you deploy to a Windows environment, Harness makes a WinRM connection. Before it can make the connection, the Harness Delegate must be able to resolve any domain names for the target virtual network. If the target virtual network for a deployment does not have DNS enabled, you might see the following error:

Exception in WinrmSession. . . Buffer already closed for writing

Ensure that the virtual network allows DNS name resolution so that Harness can resolve names before running the WinRM connection. For more information, see Set Up WinRM on Instances and Network and Add IIS Deployment Environment in AWS or Azure.

Could not Fetch Container Metadata

If the Harness Delegate cannot fetch container details during a deployment step, you might see the error:

Could not fetch container metadata. Verification steps using containerId may not work

In the case of verification steps, the error message might be:

No analysis was done because no data was available during this time

For example, the ECS container agent provides an API operation for gathering details about the container instance where the agent is running and the associated tasks running on that instance. Harness uses the cURL command from within the container instance to query the Amazon ECS container agent on port 51678 and return container instance metadata or task information.

In the ECS case, ensure that the security group(s) used by AWS EC2 instances permits inbound traffic over TCP port 51678. For more information, see Amazon ECS Container Agent Introspection from AWS.

Somewhere in the deployment logs you will see the API URL the Delegate was using:

Could not connect to url http://10.16.4.97:51678/v1/tasks: connect timed out

As the Delegate could not reach the URL, you need to ensure that the URL is reachable from the Delegate by opening ports, checking security groups, etc.
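
To confirm this from the Delegate side, you can run the same request manually from the Delegate host. The IP address below is the one from the example log; substitute your container instance's address:

$ curl http://10.16.4.97:51678/v1/tasks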

This error can also occur during verification steps. In this case, Harness needs container metadata for verification, but because the Delegate cannot fetch the metadata, Harness tries fetching verification data by IP address. If there is no IP address host data in the Verification Provider, Harness will not receive information for the deployed nodes.

Naming Conventions

Some naming conventions in repositories and other artifact sources, or in target infrastructures, cannot be used by Harness. For example, if a Harness Trigger Webhook uses a Push notification from a Git repo branch that contains a dot in its name, the Trigger is unlikely to work.

Character support in Harness Environment and Service Infrastructure/Infrastructure Definition entity names is restricted to alphanumeric characters, underscores, and hyphens. The restriction is due to compatibility issues with Harness backend components, database keys, and the YAML flow in which Harness creates files with entity names on file systems.

Secrets

The following issues can occur when using Harness secrets.

Secret Values Hidden In Log Output

If a secret's unencrypted value shares some content with the value of another Harness variable, Harness will hide the secret's value in any logs. Harness replaces the secret's conflicting value with the secret's name in any log displays. This is for security only; the actual values of the secrets and variables are still substituted correctly.

AWS KMS 403

The Harness Delegate runs in your target deployment environment and needs access to the default Harness AWS KMS for secrets management. If it does not have access, the following error can occur:

Service: AWSKMS; Status Code: 403

Ensure that the Delegate can reach the Harness KMS URL by logging into the Delegate host(s) and entering the following cURL command:

curl https://kms.us-east-1.amazonaws.com

Next, ensure that your proxies are not blocking the URL or port 443.

If this does not fix your error, and you are not using the default Harness KMS secret store, the AWS KMS access key provided in Harness for your own KMS store is likely invalid.

Configure as Code and Git Sync

The following issues can occur when using Harness Configure as Code and Git Sync.

Git Push to Harness Fails

If your Harness Application is synched two-way with your Git repo, the Git push to Harness might not work unless all of the required Application settings are fully-configured in your Git YAML files before pushing your Application up to Harness.

For example, if you have defined a Service Infrastructure/Infrastructure Definition in your Git YAML files, but its required fields are incomplete, the push to Harness will likely fail.

This is no different than trying to submit incomplete settings in the Harness Manager.

In many cases, it is best to first use Harness Manager to configure your Application, ensuring all required settings are configured, and then sync that with your repo. Unless you remove any required settings in the Git files, the Application will sync with Harness successfully.

Need to Reset History on Synced Git Repository

In some cases where you sync your Git repo with Harness, you might need to reset the Git repo history because of an error, such as accidentally adding a secret's value in the repo. For example, Git has a Rewriting History option.

To repair this scenario, do the following:

  1. Remove the Harness Webhook configured on your Git repo account (Github, Bitbucket, etc). This step is critical to ensure the Application history deletion does not propagate to Harness the next time it is synched.
  2. Delete all of your Harness Applications on the Git account.
  3. Re-sync each Harness Application using the Configuration As Code sync functionality. See Configuration as Code.
  4. Confirm that all of your Applications are visible in your Git repo.
  5. Notify Harness to confirm we don't see any issues/errors in your account.
  6. Enable the Webhook in Git repo and test syncing.

Triggers

This section covers error messages you might see when creating, updating, deleting, or executing a Trigger. It includes authorization/permissions steps to resolve the errors.

About Triggers and Authorizations

A Trigger involves multiple settings, including Service, Environment, and Workflow specifications. Harness examines these components as you set up a Trigger. You might be authorized for one component selected in a Trigger, such as a Service, but not another, such as an Environment. In these cases, an error message will alert you to missing authorizations.

To determine if you are authorized to create Triggers for a particular Environment or other components, review:

  • All the permissions of your Harness User Group.
  • The Usage Scope of the Cloud Provider, and of any other Harness connectors you have set up.

For further details, see Managing Users and Groups (RBAC) and Connectors Overview.

User does not have "Deployment: execute" permission

Error messages of the form User does not have "Deployment: execute" permission indicate that your user group's Application Permissions > Action settings do not include execute in the scope of the specified Application and/or Environment. To resolve this, see Application Permissions.

User not authorized

The following error message indicates that a non-Administrator has tried to submit a Trigger whose Workflow Variables: Environment field is configured with a variable, rather than with a static Environment name:

User not authorized: Only members of the Account Administrator user group can create or update  Triggers with parameterized variables

Submitting a Pipeline Trigger that includes such a Workflow will generate the same error.

One resolution is to set the Environment field to a static value. But if the Environment setting must be dynamic, a member of the Account Administrator user group will need to configure and submit the Trigger.

Deployments

The following issues can occur with manual or Trigger-based Deployments.

Trigger Policy Limits

Harness collects new artifacts from your repository on an interval basis. By default, if more than one version of the same artifact is collected during the same polling interval, only the latest version is used to initiate a Trigger set for On New Artifact.

If you prefer to have a deployment triggered for every single version of an artifact, Harness can implement a per artifact deployment.

Here is an example from a log showing two -pr-#### versions of an artifact collected at the same time, which resulted in two deployments starting:

May 24 10:45:07 manager-XXXXXXXXXX-jhjwb manager-Harness-Prod-KUBERNETES
INFO [1.0.32101] [notifyQueue-handler-XXXXXXX--XXXXXXX]
INFO software.wings.delegatetasks.buildsource.BuildSourceCallback -
[[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-branch-foo-bar-plugin-2-x,
523ccXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX32a7-pr-2462, dd32432955979f5745ba606e50320f2a25f4bff4,
dd324XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXbff4-branch-sku-lookup,
dd324XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXbff4-pr-2718]] new artifacts collected for artifactStreamId XXXXXXXXXXXXXXX

The corresponding Trigger looks for all -pr versions and starts a deployment using the -pr-[0-9]+$ expression.

Deployment Rate Limits

Harness applies an hourly and daily deployment limit to each account to prevent configuration errors or external triggers from initiating too many undesired deployments. If you are notified that you have reached a limit, it is possible that undesired deployments are occurring. Please determine if a Trigger or other mechanism is initiating undesired deployments. If you continue to experience issues, contact Harness Support.

The daily limit is 100 deployments every 24 hours. The hourly limit is 40 deployments and is designed to detect any atypical upsurge of deployments.

Verifications

The following issues can occur with verifications.

License Type Does Not Support Running This Verification

Harness Continuous Verification is not fully supported for Harness Essentials and Community Editions. If you are running one of those editions, you might receive the following error:

Your license type does not support running this verification

For Harness Essentials and Community, Continuous Verification on metrics is supported for Prometheus, CloudWatch, and Stackdriver only.

You can upgrade to Harness Professional or use Prometheus, CloudWatch, and Stackdriver.

Search Keyword Too Vague

If a 24/7 Service Guard or deployment verification is not reporting any data, the search keywords for the verification provider might be too broad.

For example, when you set up the 24/7 Service Guard or deployment verification step, you might have level:ERROR in the Search Keywords setting.

If the Search Keywords setting in the verification provider settings is too broad, then when you click the Guide From Example button for the Host Name Field, Guide From Example can return field names for application records other than your target application.

To fix this, click Guide From Example several times. This gives you different field lists for the different applications until you find the correct field for the host name. Refreshing the browser can also pick up a new sample.

AWS AMI

The following errors might occur when setting up and deploying AMIs in Harness. For deployment steps, see AWS AMI Deployments.

Auto Scaling Group Not Showing Up

When you configure a Service Infrastructure/Infrastructure Definition, the Service Infrastructure/Infrastructure Definition dialog's Auto Scale Group drop-down will initially be empty. This is expected behavior. Simply allow a few seconds for the drop-down to populate.

Couldn't Find Reference AutoScalingGroup

If a Workflow's Setup AutoScaling Group step fails with a message of the following form, this indicates that at least one Service Infrastructure/Infrastructure Definition in the Workflow's Environment is configured with an ASG that is not currently available on AWS:

Couldn't find reference AutoScalingGroup: [ECS__QA__Application_AMI_QA__245] in region: [us-east-1]

To correct this:

  1. In Harness Manager, navigate to your Application's Environments details page.
  2. Open each Service Infrastructure/Infrastructure Definition used by the Workflow that failed, then open the Service Infrastructure/Infrastructure Definition's configuration section. Ensure that the Auto Scaling Groups field points to an ASG to which you currently have access in the AWS Console.
  3. If this does not allow your deployment to proceed, you might also need to toggle the Host Name Convention field's entry between the publicDnsName and privateDnsName primitives. (This depends on whether the Launch Configuration that created your ASG template was configured to create a public DNS name.) For details, see AWS' IP Addressing in a VPC topic.
Harness Manager will prevent you from simply removing a misconfigured Service Infrastructure/Infrastructure Definition, if it's referenced by any of your Application's Workflows. So in some cases, you might find it easiest to create a new Service Infrastructure/Infrastructure Definition, reconfigure your Workflow to use that Service Infrastructure/Infrastructure Definition, and then delete the broken Service Infrastructure/Infrastructure Definition.

Valid Blue/Green Deployment Failed and Rolled Back in Harness

This can occur when Harness' steady state timeout setting is too restrictive, compared to the time AWS requires to swap your Target Groups' routes.

To resolve the rollbacks: In your Blue/Green Workflow's Step 1 (Setup AutoScaling Group), try raising the Auto Scaling Steady State Timeout (mins) setting to at least match the switchover interval you observe in the AWS Console.

AWS ECS

The following errors might occur when setting up and deploying ECS in Harness.

For information on ECS troubleshooting, see Amazon ECS Troubleshooting from AWS.

Rate Exceeded

A common issue with AWS deployments is exceeding an AWS rate limit for some AWS component, such as ECS clusters per region or maximum number of scaling policies per Auto Scaling Groups.

For steps to increase any AWS limits, see AWS Service Limits from AWS.

New ARN and Resource ID Format Must be Enabled

Harness uses tags for Blue/Green deployment, but ECS requires that the new ARN and resource ID format be enabled to add tags to the ECS service.

If you have not opted into the new ECS ARN and resource ID format before you attempt Blue/Green deployment, you might receive the following error:

InvalidParameterException: The new ARN and resource ID format must be enabled to add tags to the service. Opt in to the new format and try again.

To solve this issue, opt into the new format and try again. For more information, see Migrating your Amazon ECS deployment to the new ARN and resource ID format from AWS.

Unable to Place a Task Because no Container Instance met all of its Requirements

The Upgrade Containers step might show the following message:

(service service-name) was unable to place a task because no container instance met all of its requirements.

Review the CPU requirements in both the task size and container definition parameters of the task definition.

See Service Event Messages from AWS.

Cannot Pull Container Image

You might see Docker errors indicating that when creating a task, the container image specified could not be retrieved.

See Cannot Pull Container Image Error from AWS.

Invalid CPU or Memory Value Specified

See the required settings in Invalid CPU or Memory Value Specified from AWS.

ClientException: Fargate requires that 'cpu' be defined at the task level

Ensure that you add the CPU and Memory settings in the Harness Service Container Specification section—for example:

"cpu" : "1",

"memory" : "512"

ClientException: The 'memory' setting for container is greater than for the task

In the Harness Service Container Specification JSON, there are two settings for memory. The memory setting for the container must not be greater than the memory setting for the task:

{
  "containerDefinitions" : [ {
    "name" : "${CONTAINER_NAME}",
    "image" : "${DOCKER_IMAGE_NAME}",
    "memory" : 512,
    ...
  } ],
  "executionRoleArn" : "${EXECUTION_ROLE}",
  ...
  "cpu" : "1",
  "memory" : "512",
  "networkMode" : "awsvpc"
}

Could Not Reach http://<IP Address>:<Port>/v1/tasks to Fetch Container Metadata

The ECS container agent provides an API operation for gathering details about the container instance on which the agent is running and the associated tasks running on that instance. Harness uses the cURL command from within the container instance to query the Amazon ECS container agent on port 51678 and return container instance metadata or task information.

Ensure that the security group(s) used by AWS EC2 instances permits inbound traffic over TCP port 51678.

For more information, see Amazon ECS Container Agent Introspection from AWS.

AmazonElasticLoadBalancingException: Rate exceeded

You might receive this error as a result of AWS Load Balancer rate limiting. For more information, see Limits for Your Application Load Balancers and Limits for Your Classic Load Balancer from AWS.

AWS Lambda

The following troubleshooting steps should help address common Lambda issues.

User is not authorized to perform: lambda:GetFunction

When you deploy your Workflow you might receive this error:

Exception: User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/iamRole_forDelegate/i-XXXXXXXXXXXX 
is not authorized to perform: lambda:GetFunction on resource:
arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:ExampleApp-aws-lambda-Lambda-test
(Service: AWSLambda; Status Code: 403; Error Code: AccessDeniedException;
Request ID: 1e93ab96-985f-11e9-92b1-f7629978142c) while deploying function: ExampleApp-aws-lambda-Lambda-test

This error occurs because the IAM role attached to your EC2 or ECS Delegate host does not have the AWSLambdaRole (arn:aws:iam::aws:policy/service-role/AWSLambdaRole) policy attached. The policy contains the lambda:InvokeFunction permission needed:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Attach the AWSLambdaRole (arn:aws:iam::aws:policy/service-role/AWSLambdaRole) policy to the IAM role used by your Delegate host(s).
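
If you manage IAM from the AWS CLI, attaching the policy might look like the following sketch. The role name iamRole_forDelegate is taken from the example error above; substitute the role attached to your Delegate host:

$ aws iam attach-role-policy \
    --role-name iamRole_forDelegate \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaRole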

For more information, see Identity-based IAM Policies for AWS Lambda from AWS.

Exception: The runtime parameter of nodejs6.10 is no longer supported

If you choose Node.js version 6.10 as the runtime for your Lambda function, you might receive this error.

AWS Lambda no longer supports Node.js version 6.10. Use a newer version.

Azure

The following troubleshooting steps should help address common Azure issues.

Failed to pull image

Kubernetes might fail to pull the Docker image set up in your Service:

Event  : Pod   harness-example-deployment-6b8794c59-2z99v   Error: ErrImagePull   Failed
Event : Pod harness-example-deployment-6b8794c59-2z99v Failed to pull image
"harnessexample.azurecr.io/todolist-sample:latest": rpc error: code = Unknown desc = Error response from daemon:
Get https://harnessexample.azurecr.io/v2/todolist-sample/manifests/latest: unauthorized: authentication required Failed

This is caused by the createImagePullSecret setting being set to false in the values.yaml file in Service Manifests.

To fix this, set the createImagePullSecret setting to true, as described in Modify ImagePullSecret:

createImagePullSecret: true

Helm

The following troubleshooting information should help you diagnose common Helm problems:

Failed to Find the Previous Helm Release Version

Make sure that the Helm client and Tiller are installed. Do the following:

  • Verify that Helm is installed.
  • Check if the Git connector being used in the Workflow and the Delegate can connect to the Git repo. Check in the Delegate logs for Git connectivity issues.

Helm Install/Upgrade Failed

Likely, there is an incompatible Helm client or Tiller. The Helm client version needs to be less than or equal to the Tiller version.
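
With Helm 2, the helm version command prints both the client and the Tiller (server) versions, so you can compare them from the Delegate host (output abbreviated):

$ helm version
Client: &version.Version{SemVer:"v2.9.1", ...}
Server: &version.Version{SemVer:"v2.8.2", ...}

In this example output, the client (v2.9.1) is newer than Tiller (v2.8.2), which is the incompatible case.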

To fix this, upgrade Tiller:

helm init --upgrade

First Helm Deployment Goes to Upgrade Path

In some cases, the first Helm deployment goes to the upgrade path even though the Helm version is working fine.

This is the result of a Helm issue: https://github.com/helm/helm/issues/4169.

The issue happens with Helm client versions 2.8.2 through 2.9.1. To fix this, upgrade the Helm client to a version later than 2.9.1.

Tiller and Helm in Different Namespaces

A Helm install/upgrade can fail because Tiller is deployed in a namespace other than kube-system.

To fix this, pass the --tiller-namespace <NAMESPACE> command flag in the Workflow's Helm Deploy step.

Unable to get an Update from the Chart Repository

If Harness cannot get an update from a chart repo you have set up for your Helm Service, during deployment you might see the error:

Unable to get an update from the "XYZ" chart repository ... read: connection reset by peer

To fix this, find the Delegate that the Helm update ran on, then SSH to the Delegate host and run the Helm commands manually (see the sketch below). This will confirm whether you are having an issue with your Harness setup or a general connectivity issue.
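
For example, if the chart repository from the error is named XYZ, rerunning the update manually on the Delegate host might look like this sketch:

$ helm repo list     # confirm the XYZ repo and its URL are configured
$ helm repo update   # a network problem reproduces the connection error here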

IIS (.NET)

The following problems can occur when deploying your IIS website, application, or virtual directory.

Error: No delegates could reach the resource

You receive this error when deploying your workflow.

Solutions
  • Ensure your artifact can be deployed via WinRM onto a Windows instance. It's possible to select the wrong artifact in Service.
  • Ensure you have access to the deployment environment, such as VPC, subnet, etc.
  • Ensure your WinRM Connection can connect to your instances, and that your instances have the correct ports open. See Set Up WinRM on Your Instances above.

Port Conflicts

Do not target the same port as another website. In the Service, in Create Website, ensure that $SitePort=80 points to a port that isn't in use. For example, you could change the port to 8080 to avoid the error.

You can keep the same port and use host header names to host multiple IIS sites on it. For more information, see How To Use Host Header Names to Configure Multiple Web Sites in Internet Information Services 5.0. This article is about an old version of IIS, but the same concepts apply.

Kubernetes

The following problems can occur when developing and deploying to Kubernetes.

Invalid Value LabelSelector

If you are deploying different Harness Workflows to the same cluster during testing or experimentation, you might encounter a Selector error such as this:

The Deployment "harness-example-deployment" is invalid: spec.selector:
Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"harness-example"},
MatchExpressions:[]v1.LabelSelectorRequirement{}}: field is immutable

This error means that the cluster already contains a Deployment with the same name that uses a different pod selector.

Delete or rename the Deployment. Let's look at deleting the Deployment. First, get a list of the Deployments:

kubectl get all
...

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.83.240.1 <none> 443/TCP 18d

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/harness-example-deployment 1 1 1 1 4d
...

And then delete the Deployment:

kubectl delete deploy/harness-example-deployment svc/kubernetes

deployment.extensions "harness-example-deployment" deleted

service "kubernetes" deleted

Rerun the Harness deployment and the error should not occur.

ELK Elasticsearch

The following are resolutions to common configuration problems.

Workflow Step Test Error

When you click TEST in the ELK workflow dialog's Expression for Host Name popover, you should get provider information.

The following error message can occur when testing the ELK verification step in your workflow:

ELK_CONFIGURATION_ERROR: Error while saving ELK configuration. No node with name ${hostName} found reporting to ELK

Cause

The expression in the Expression for Host/Container name field is incorrect. Typically, this occurs when the wrong hostName label is selected to create the expression in the Expression for Host/Container name field.

Solution

Follow the steps in Verify with ELK again to select the correct expression. Ensure that the name label selected is under the host section of the JSON.

SocketTimeoutException

When you add an ELK verification provider and click SUBMIT, you might see a SocketTimeoutException error.

Cause

The Harness delegate does not have a valid connection to the ELK server.

Solution

On the same server or instance where the Harness delegate is running, run one of the following cURL commands to verify whether the delegate can connect to the ELK server.

If you do not have a username and password for the ELK server:

curl -i -X POST url/*/_search?size=1 -H 'Content-Type: application/json' -d '{"size":1,"query":{"match_all":{}},"sort":{"@timestamp":"desc"}}'

If you have a username and password, use this command:

curl -i -X POST url/*/_search?size=1 -H 'Content-Type: application/json' -H 'Authorization: <Basic: Base64 encoded username:password>' -d '{"size":1,"query":{"match_all":{}},"sort":{"@timestamp":"desc"}}'

If you have token-based authentication, use this command:

curl -i -X POST url/*/_search?size=1 -H 'Content-Type: application/json' -H 'tokenKey: tokenValue' -d '{"size":1,"query":{"match_all":{}},"sort":{"@timestamp":"desc"}}'

If the cURL command cannot connect, it will fail.

If the cURL command can connect, it will return an HTTP 200 response, along with the JSON.

If the cURL command is successful, but you still see the SocketTimeoutException error in the ELK dialog, contact Harness Support (support@harness.io).

It is possible that the response from the ELK server is simply taking too long.

New Relic

The following are resolutions to common configuration problems.

Workflow Step Test Error

When you click TEST in the New Relic workflow dialog's Expression for Host Name popover, you should get provider information.

The following error message can occur when testing the New Relic verification step in your workflow:

NEWRELIC_CONFIGURATION_ERROR: Error while saving New Relic configuration. No node with name ${hostName} found reporting to new relic

Cause

The expression in the Expression for Host/Container name field is incorrect. Typically, this occurs when the wrong hostName label is selected to create the expression in the Expression for Host/Container name field.

Solution

Follow the steps in Guide From Example again to select the correct expression. Ensure that the hostName label selected is under the host section of the YAML.

Harness Secret Managers

If the Harness Delegate(s) cannot authenticate with a Secret Manager, you might see an error message such as this:

Was not able to login Vault using the AppRole auth method. 
Please check your credentials and try again

For most authentication issues, try to connect to the Harness Secret Manager from the host running your Harness Delegate(s). This is done simply by using a cURL command and the same login credentials you provided when you set up the Harness Secret Manager.

For example, here is a cURL command for HashiCorp Vault:

curl -X POST -d '{"role_id":"<APPROLE_ID>", "secret_id":"<SECRET_ID>"}' https://<HOST>:<PORT>/v1/auth/approle/login

If the Delegate fails to connect, it is likely because of the credentials or a networking issue.

LDAP SSO

The following errors might occur during the set up or use of LDAP SSO.

Connection Query Error

When setting up the LDAP Provider and attempting the Connection Query, the following message appears.

Invalid request: No delegates could reach the resource.

Cause

This can occur if the Harness delegate is unable to connect to the LDAP server.

Solution
  • Ensure the delegate is running. In Harness, click Setup, click Harness Delegates, and then verify that the delegate is running. For more information, see Delegate Installation.
  • Ensure that the delegate can connect to the LDAP server from its network location; the ldapsearch sketch after this list can help confirm both connectivity and the bind credentials. If the delegate is running in a VPC, ensure that it has outbound HTTPS and LDAP connectivity over ports 443 and 389.
  • Ensure the password in the Connection Query is correct.
  • Try to connect with SSL disabled. You might not have SSL configured on your LDAP server.
  • The delegate attempts to resolve the hostname of the LDAP server using DNS. Ensure that the LDAP server hostname can be resolved with nslookup, and that you can ping its IP address.
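
If the ldapsearch client is available on the Delegate host, a quick manual query can confirm connectivity and the bind credentials. This is only a sketch; the host, bind DN, and base DN are placeholders for your own values:

$ ldapsearch -H ldap://ldap.mycompany.com:389 \
    -D "cn=admin,dc=mycompany,dc=com" -W \
    -b "dc=mycompany,dc=com" "(objectClass=person)"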

User Query Error

When attempting the User Queries in the LDAP Provider dialog, the following message appears.

Please check configuration. Server returned zero record for the configuration.

Cause

The User Query is unable to return users from the LDAP provider because its search settings do not match users in the LDAP directory.

Solution
  • If the Connection Query is working, the failure of the User Query is likely because the Base DN does not have users in it. Try CN=Users,DC=mycompany,DC=com or DC=mycompany,DC=com.
  • It is possible that the Search Filter does not return users. Try (objectClass=person) and (objectClass=user).

Cannot Locate a Group or Group Members

When you attempt to link a Harness User Group to an LDAP Provider, Harness uses the settings of the LDAP Provider you configured to locate the groups and group members in your LDAP directory.

In some cases, you might be unable to locate a group, or members of a group.

Cause
  • LDAP Provider settings are incorrect, and cannot locate the group and users you want according to its Connection, User, and Group Query settings.
Solution
  • Locate the group using your LDAP directory tools, such as Active Directory Users and Groups. Confirm the LDAP Base DN for the group and the members of the group. Use that Base DN in the User Query section of the LDAP Provider in Harness.

Submit a Ticket

  1. Click the Help button in the bottom-right corner of the Harness Manager.
  2. Click Submit a Ticket or Send Screenshot.
  3. Fill out the ticket or screenshot forms and click Submit ticket or Send Feedback.

Harness Support will contact you as soon as possible.

