Integration with vCloud Director failing after NSX-T upgrade to 4.1.2.0 (certificate expired)

Issue Clarification: After upgrading NSX-T from 3.1.3 to 4.1.2.0, the certificates of various internal services were observed to be expired.

Issue Verification:
After the upgrade from 3.1.3 to 4.1.2.0, the certificates of various internal services were observed to be expired.

Root Cause Identification:
We confirmed that the issue is related to the following KB article:
NSX alarms indicating certificates have expired or are expiring (94898)

Root Cause Justification:
There are two main factors that can contribute to this behaviour:

1) NSX Managers have many certificates for internal services.
In NSX 3.2.1, Cluster Boot Manager (CBM) service certificates were incorrectly given a validity period of 825 days instead of 100 years.
This was corrected to 100 years in NSX 3.2.3.
However, any environment originally installed on NSX 3.2.1 will have its internal CBM Corfu certificates expire after 825 days, regardless of whether it has been upgraded to the fixed version.

2) On NSX-T 3.2.x, internal server certificates could expire without triggering an alarm. There was no functional impact.
Starting with NSX 4.1.0.2, NSX alarms monitor the validity of internal certificates and trigger for expired or soon-to-expire certificates. Note that on NSX 4.1.x there is currently no functional impact when an internal certificate expires; however, the alarms will continue to trigger.
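
To get a feel for the 825-day validity period, the short sketch below computes when such a certificate would expire for a given issue date. It is only date arithmetic: the function names and the install date are hypothetical and chosen for illustration, and it does not read anything from an NSX Manager.

```python
from datetime import date, timedelta

# Validity period stated above for CBM certificates created on NSX 3.2.1.
CBM_VALIDITY_DAYS = 825


def cbm_cert_expiry(issued_on: date, validity_days: int = CBM_VALIDITY_DAYS) -> date:
    """Return the date on which a certificate issued on `issued_on` expires."""
    return issued_on + timedelta(days=validity_days)


def days_until_expiry(issued_on: date, today: date) -> int:
    """Days left until expiry; the value is negative once the certificate has expired."""
    return (cbm_cert_expiry(issued_on) - today).days


# Hypothetical install date, used only as an example.
installed = date(2022, 1, 1)
print("Expires on:", cbm_cert_expiry(installed))
print("Days remaining on 2024-06-01:", days_until_expiry(installed, today=date(2024, 6, 1)))
```

A negative result means the certificate has already expired, which is the state in which NSX 4.1.0.2 and later start raising alarms.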

Solution Recommendation:
The issue is related to the following KB article:
NSX alarms indicating certificates have expired or are expiring (94898)

Scripted Resolution
VMware has developed a script that replaces all internal self-signed certificates with new certificates that have a validity period of 100 years.
The script is compatible with NSX version 4.1.0 and above.
The script does not replace API and cluster certificates.
An NSX backup must be taken before running the script. Also ensure the passphrase is known.

This is a Python 3 script that should be run from a client machine with the paramiko and cryptography Python packages installed.
Depending on the system, a package may be installed with a command such as:
#sudo pip3 install cryptography
The script cannot be run directly on the NSX Manager because it does not have the required Python modules. Installing them on the NSX Manager is not supported.
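
Before running the script, it can help to confirm that the client machine meets the prerequisites above. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical and it is not part of the VMware script.

```python
import importlib.util
import sys

# Packages the text above says must be present on the client machine.
REQUIRED_PACKAGES = ("paramiko", "cryptography")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be imported on this machine."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """The script is a Python 3 script, so require major version 3 or later."""
    return version_info[0] >= 3


if not python_is_supported():
    print("Python 3 is required to run the script.")

missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Run this on the client machine, not on the NSX Manager.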

1) Download the attached script replace_certs.py.
2) To execute the script, run the following command and follow the prompts:
#python3 replace_certs.py
3) Enter the NSX Manager cluster IP and admin credentials at the relevant prompts.
4) In some environments it may be necessary to increase the timeout value used by the script so that it can complete successfully.
long_wait_time defaults to 60 but can be increased to 180 (or higher), after which the script should be re-run.
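
The internals of replace_certs.py are not shown in this post, so the following is only a generic illustration of why a larger wait time can help in a slow environment: a polling loop gives up once its deadline passes. Only long_wait_time and the values 60 and 180 come from the text above. The unit (seconds), the names wait_until, check and interval, and the simulated clock are assumptions made for this sketch.

```python
import time


def wait_until(check, long_wait_time=60, interval=5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `check()` until it returns True or `long_wait_time` seconds pass.

    Returns True if the condition was met in time, False on timeout.
    """
    deadline = clock() + long_wait_time
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock so the example runs instantly instead of really waiting."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(long_wait_time, ready_after=120):
    """Pretend a service becomes ready `ready_after` simulated seconds from the start."""
    fake = FakeClock()
    return wait_until(lambda: fake.time() >= ready_after,
                      long_wait_time=long_wait_time,
                      clock=fake.time, sleep=fake.sleep)


print("wait of 60: ", simulate(60))
print("wait of 180:", simulate(180))
```

In this simulation the service needs 120 simulated seconds, so a wait of 60 times out while a wait of 180 succeeds.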

If the script does not work in your environment, you can follow the manual procedure instead.
Alternatively, open an SR to report the failure scenario.

Solution Justification:
The issue is related to the following KB article:
NSX alarms indicating certificates have expired or are expiring (94898)


Welcome to Mohamed Fouad blog
