Tuesday, January 3, 2023

Azure, AKS, Terraform, proxy


Happy New Year to everybody!

It may be unexpected, but I am starting the year with some professional content rather than my hobbies.

When you deploy an Azure Kubernetes Service (AKS) cluster with Terraform and the cluster has a proxy configuration, every terraform apply forces the cluster to be redeployed, which is not what you expect.

The reason is that Azure adds a few addresses to the no_proxy list in addition to the ones you set, and changing the no_proxy list forces recreation. The list Terraform expects will therefore always differ from the list on the AKS resource itself.

First, let's check what those addresses are:

The values above can be known before the cluster is created, except for the last one. That one is generated during the deployment of the cluster, so if you have a private cluster, adding all of the items to your no_proxy list beforehand will not help.
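
To see the list as Azure reports it for your own cluster, one option is to expose the attribute as an output. This is only a sketch: the output name effective_no_proxy is made up for illustration, and it assumes the cluster resource is called aks and has an http_proxy_config block.

```hcl
# Hypothetical output name; assumes azurerm_kubernetes_cluster.aks
# is declared with an http_proxy_config block.
output "effective_no_proxy" {
  description = "no_proxy list as read back from the AKS resource"
  value       = azurerm_kubernetes_cluster.aks.http_proxy_config[0].no_proxy
}
```

After a refresh, comparing this output with your own no_proxy input should show which entries were added on the Azure side.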

What you can do:

Add lifecycle management to the cluster. It will look like this:

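resource "azurerm_kubernetes_cluster" "aks" {
  lifecycle {
    # http_proxy_config is a nested block, so it is addressed by index
    ignore_changes = [http_proxy_config[0].no_proxy]
  }
  # ... (rest of the cluster configuration)
}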

This works, but it is only halfway to the solution.

When you reapply your plan, it will not recreate the AKS cluster. But what if you deliberately change the no_proxy parameters? In that case the change is still ignored and the AKS cluster is not recreated, which is not the expected behavior.

Let's assume you have the user-provided no_proxy parameters in a no_proxy variable:

variable "no_proxy" {
  type    = list(string)
  default = []
}

Lifecycle management can trigger recreation of the resource with the replace_triggered_by argument. The problem is that the variable above cannot be the source of the trigger, because replace_triggered_by only accepts references to resources. A resource, however, can.

Here comes a dirty trick. How do you convert a variable list into a resource?

HashiCorp has a Terraform provider named tfcoremock (https://registry.terraform.io/providers/hashicorp/tfcoremock/latest/docs). I'll use it here.

First add it to the providers list:

provider "tfcoremock" {
  use_only_state     = true
}
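
For the provider block above to resolve, the provider also has to be declared in required_providers. A minimal sketch, assuming no version pinning is needed in your setup; the required_version line reflects that replace_triggered_by needs Terraform 1.2 or newer:

```hcl
terraform {
  # replace_triggered_by is available from Terraform 1.2
  required_version = ">= 1.2.0"

  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
    tfcoremock = {
      source = "hashicorp/tfcoremock"
    }
  }
}
```

Add version constraints that match what you already use elsewhere in your configuration.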

Now we can store the list above in the state:

resource "tfcoremock_simple_resource" "user-noproxy" {
  # one instance per entry; count must be a number, not the list itself
  count  = length(var.no_proxy)
  string = var.no_proxy[count.index]
}

Now, we can reference it from the replace_triggered_by:

resource "azurerm_kubernetes_cluster" "aks" {
  lifecycle {
    replace_triggered_by = [tfcoremock_simple_resource.user-noproxy]
    ignore_changes       = [http_proxy_config[0].no_proxy]
  }
  http_proxy_config {
    http_proxy  = var.http_proxy
    https_proxy = var.http_proxy
    no_proxy    = var.no_proxy
  }
  # ... (rest of the cluster configuration)
}

This is almost perfect, but not quite.

Changing any element in the no_proxy variable list triggers the replacement, but adding or removing an element does not: replace_triggered_by reacts to a planned update or replacement of an existing instance, not to instances being created or destroyed.

One last step: bring the length of the list into play as well:

resource "tfcoremock_simple_resource" "user-noproxy" {
  count = length(var.no_proxy)
  string = var.no_proxy[count.index]
}

resource "tfcoremock_simple_resource" "user-noproxy-count" {
  number = length(var.no_proxy)
}

resource "azurerm_kubernetes_cluster" "aks" {
  lifecycle {
    replace_triggered_by = [
      tfcoremock_simple_resource.user-noproxy,
      tfcoremock_simple_resource.user-noproxy-count,
    ]
    ignore_changes = [http_proxy_config[0].no_proxy]
  }
  http_proxy_config {
    http_proxy  = var.http_proxy
    https_proxy = var.http_proxy
    no_proxy    = var.no_proxy
  }
  # ... (rest of the cluster configuration)
}

Now the AKS cluster is replaced on any change to the no_proxy variable list, but is left intact otherwise.
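
As a side note, on Terraform 1.4 or newer the built-in terraform_data resource can play the same role without an extra provider. This is a different technique from the tfcoremock approach described in this post, shown only as a sketch; the resource name user_noproxy is made up for illustration.

```hcl
# Alternative sketch, assuming Terraform >= 1.4.
# terraform_data stores the whole list in the state, so changing,
# adding or removing an entry plans an update of this resource.
resource "terraform_data" "user_noproxy" {
  input = var.no_proxy
}

resource "azurerm_kubernetes_cluster" "aks" {
  lifecycle {
    replace_triggered_by = [terraform_data.user_noproxy]
    ignore_changes       = [http_proxy_config[0].no_proxy]
  }
  http_proxy_config {
    http_proxy  = var.http_proxy
    https_proxy = var.http_proxy
    no_proxy    = var.no_proxy
  }
  # ... (rest of the cluster configuration)
}
```

Because the list is stored as a single value, a separate resource for the length should not be needed in this variant. Reordering the entries would also count as a change.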
