Issue Clarification:
After the upgrade from 3.1.3 to 4.1.2.0, certificates for various internal services were observed to be expired.
Issue Verification:
We verified that, after the upgrade from 3.1.3 to 4.1.2.0, certificates for various internal services are expired.
Root Cause Identification:
We confirmed that the issue is related to the following KB article:
NSX alarms indicating certificates have expired or are expiring (94898)
Root Cause Justification:
There are two main factors that can contribute to this behaviour:
1) NSX Managers have many certificates for internal services.
In NSX 3.2.1, Cluster Boot Manager (CBM) service certificates were incorrectly given a validity period of 825 days instead of 100 years.
This was corrected to 100 years in NSX 3.2.3.
However, in any environment originally installed on NSX 3.2.1, the internal CBM Corfu certificates expire after 825 days, regardless of whether the environment has been upgraded to the fixed version.
2) On NSX-T 3.2.x, internal server certificates could expire without any alarm being triggered. There was no functional impact.
Starting from NSX 4.1.0.2, NSX alarms monitor the validity of internal certificates and trigger for certificates that have expired or are about to expire. Note that on NSX 4.1.x there is currently no functional impact when an internal certificate expires; however, the alarms will continue to trigger.
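To make the two validity periods concrete, the minimal sketch below computes the expiry date for a certificate under each period. The issue date and the "today" date are made-up values for illustration only, and 100 years is approximated as 36,500 days; none of this is taken from NSX itself.

```python
from datetime import date, timedelta

# Validity periods described in the root cause above.
SHORT_VALIDITY_DAYS = 825        # incorrect CBM validity in NSX 3.2.1
LONG_VALIDITY_DAYS = 100 * 365   # "100 years", approximated (ignores leap days)


def expiry_date(issued: date, validity_days: int) -> date:
    """Return the date on which a certificate issued on `issued` expires."""
    return issued + timedelta(days=validity_days)


def days_remaining(issued: date, validity_days: int, today: date) -> int:
    """Return the days left before expiry; a negative value means expired."""
    return (expiry_date(issued, validity_days) - today).days


# Hypothetical dates, for illustration only.
issued = date(2022, 6, 1)
today = date(2024, 10, 1)

print("825-day certificate expires on:", expiry_date(issued, SHORT_VALIDITY_DAYS))
print("Days remaining (825 days):", days_remaining(issued, SHORT_VALIDITY_DAYS, today))
print("Days remaining (100 years):", days_remaining(issued, LONG_VALIDITY_DAYS, today))
```

A certificate issued with the 825-day period passes its expiry date a little over two years after installation, which is why an environment can show the alarm long after it was upgraded to a fixed version.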
Solution Recommendation:
The issue is related to the following KB article:
NSX alarms indicating certificates have expired or are expiring (94898)
Scripted Resolution
VMware has developed a script that replaces all internal self-signed certificates with new certificates that have a validity period of 100 years.
The script is compatible with NSX version 4.1.0 and above.
The script does not replace API and cluster certificates.
An NSX backup must be taken before running the script. Also ensure that the backup passphrase is known.
This is a Python 3 script that should be run from a client machine with the paramiko and cryptography Python packages installed.
Depending on the system, these may be installed with a command such as #sudo pip3 install cryptography
The script cannot be run directly on the NSX Manager because it does not have the required Python module. Installing it on the NSX Manager is not supported.
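Before running the script, it can help to confirm that the client machine meets the prerequisites listed above. The following is a minimal sketch of such a check using only the Python standard library; the helper name missing_packages is hypothetical and is not part of replace_certs.py.

```python
import importlib.util
import sys

# Packages the text above lists as prerequisites on the client machine.
REQUIRED_PACKAGES = ("paramiko", "cryptography")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names in `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the interpreter version and any missing prerequisite.
print("Python major version:", sys.version_info[0])
missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Run this with the same python3 interpreter that will be used for replace_certs.py, since packages installed for a different interpreter will not be found.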
1) Download the attached script replace_certs.py.
2) To execute the script, run the following command and follow the prompts:
#python3 replace_certs.py
3) You will need to input the NSX Manager cluster IP and admin credentials at the relevant prompts.
4) In some environments it may be necessary to increase the timeout value used by the script so that it can complete successfully.
long_wait_time defaults to 60; it can be increased to 180 (or higher), after which the script should be re-run.
If the script does not work in your environment, you can follow the manual procedure instead.
Alternatively, open an SR to report the failure scenario.
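The internals of replace_certs.py are not shown here, so how it uses long_wait_time is an assumption. As a general illustration of why a larger wait value helps in a slow environment, the sketch below polls a condition until it succeeds or a timeout elapses. The function wait_until, the FakeClock class, and the abstract time units are all hypothetical and exist only for this example.

```python
import time


def wait_until(condition, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout` time units elapse.

    Returns True if the condition was met in time, otherwise False.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock so the example runs instantly (illustration only)."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, amount):
        self.now += amount


def try_with_timeout(timeout, ready_at=120):
    """Simulate an operation that becomes ready after `ready_at` time units."""
    fake = FakeClock()
    return wait_until(lambda: fake.time() >= ready_at, timeout,
                      interval=5, clock=fake.time, sleep=fake.sleep)


print("timeout=60 :", try_with_timeout(60))
print("timeout=180:", try_with_timeout(180))
```

In this simulation the operation needs 120 time units, so a timeout of 60 gives up too early while 180 allows it to finish. This mirrors the reason for raising the value and re-running the script.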
Solution Justification:
The issue is related to the following KB article:
NSX alarms indicating certificates have expired or are expiring (94898)