Tips on Upgrading Fortigate in HA Cluster

Upgrade - what actually happens

When upgrading a Fortigate HA Cluster the following happens:

  1. Admin uploads new FortiOS image via GUI to the Active member.

  2. Active Fortigate verifies validity of the image (tampered/broken image will be rejected).

  3. Active member asks admin whether to Back up configuration and upgrade or just Upgrade. In either case, the current configuration is kept on the hard disk, in the same partition as the current FortiOS image.

  4. On confirmation, the Active member saves the new image to the secondary partition of its hard disk, then pushes the image to the Standby member. The Standby member is upgraded first: the new image is written to its secondary partition, that partition is made the active one, and the member reboots.

  5. Once the Standby member has upgraded and rebooted successfully, the Active member fails over, making the upgraded Standby member the Active one.

  6. The formerly Active, now Standby, member upgrades itself and, after a successful upgrade and reboot, rejoins the cluster.

Tips on HA upgrades

  • Always follow the Upgrade Path.

  • Review Release Notes.

  • Back up the configuration with a local super_admin level account. Remote users from Radius/TACACS/LDAP/whatever, even with super_admin rights, do not see the whole configuration - local system admins and some other parts are missing. You cannot fully revert with such a backup.

  • (Optional - I don’t do it, but some say it is worth it) Reboot the members one by one to make sure there aren’t any unexpected boot errors unrelated to the upgrade.

  • Look at the crash logs (diagnose debug crashlog read) for any issues the firewall may be experiencing, and in general look around to make sure the Fortigate is doing fine before the upgrade (e.g. it is a bad idea to start an upgrade when the CPU is at 100%).

  • Have physical/console access to all members.

  • Have a plan for rollback.

  • Have the necessary firmware at hand. It may look easier to just click "Upgrade from Fortiguard", but I’ve seen many cases where it times out, wasting our time.

  • Make backups of the configuration - both clear text and encrypted (only the encrypted backup contains VPN certificates).

  • After upgrading, check the startup error log (get sys startup-error-log) for errors in converting the configuration, and fix the config if necessary.

  • Do not stress about the time it takes - the Fortigate is busy validating the image, converting the configuration to the new version, rebooting each member, and doing a full sync after the reboot. The larger the model/configuration, the more time it takes. E.g. a Fortigate 1500D HA Cluster (A/P) with 2 members takes about 15 minutes to upgrade completely. It takes 20+ minutes to do the same for a 3000D.

  • Do not stress about the failover time. If everything goes smoothly, the failover mostly causes 4-5 seconds of downtime. The downtime experienced by end clients may be longer depending on the topology - e.g. if there are BGP peerings, they will be reset and will return to Established only as fast as the configured BGP timers allow. So, it may take 30 seconds or more for BGP routes to be back online.

  • Try to upgrade to the nearest available version only, according to the upgrade path - rolling back is easier that way. See the discussion on downgrading the cluster below.
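
The health checks from the tips above can be run as one pre-upgrade pass on the CLI. This is a minimal sketch, not an official checklist: diagnose debug crashlog read and dia sys flash list come from this post, while get system status, get system performance status, get system ha status and diagnose sys ha checksum cluster are my additions, and their output differs between FortiOS versions. The lines starting with # are annotations only.

```
# Cluster membership and sync state of each member
get system ha status
# Configuration checksums per member; they should match before you start
diagnose sys ha checksum cluster
# Current FortiOS version and build
get system status
# CPU and memory load - do not start an upgrade on a box at 100% CPU
get system performance status
# Recent crashes
diagnose debug crashlog read
# Which image sits in which partition, and which partition is active
diagnose sys flash list
```

If the checksums differ or the crash log is filling up, it is probably better to sort that out first and upgrade later.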

About rollback/downgrade

There is NO automated rollback in Fortigate. On each upgrade, Fortigate keeps the current version and its configuration in the secondary partition. So, if say you upgrade from 7.2.4 to 7.2.5, Fortigate will keep 7.2.4 and its configuration. This allows us to roll back to the previous version of FortiOS and configuration. The rollback in this situation is easy (for standalone Fortigate) - just make the secondary partition an active one and reboot.

Example: I upgraded this Fortigate 100E from 5.6.11 to 6.0.6. The upgrade went OK, but the APs managed by this Fortigate started to have issues. I set the secondary partition with the saved 5.6.11 version as active, rebooted, and everything reverted successfully. To see the partitions and the active image, use dia sys flash list. To revert, we set the secondary partition as active and reboot:

execute set-next-reboot secondary

exe reboot

Here is the output of dia sys flash list after reverting back:

dia sys flash list

The above really works, except in an HA Cluster.

  • The commands above are NOT synchronized to the passive member(s), and thus the passive member(s) have no idea that we are reverting to the previous version.

  • If you do run the commands on the active member, it will work, but after the reboot the active member will come up with the older FortiOS and configuration, and the passive member may just bail out of the cluster. So you may end up with 2 machines, each thinking it is the active member, resulting in split brain.

  • To prevent the above, we may use the same procedure, but we HAVE to run the execute set-next-reboot secondary command on each member (active and passive) AND reboot them simultaneously.
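
Put together, the cluster rollback described above might look like the sequence below. It is a sketch under assumptions: it presumes that dia sys flash list confirms the previous image is still in the secondary partition on every member, and that you reach each passive member over its console (or via execute ha manage from the active member, whose arguments differ between FortiOS versions). The # lines are annotations, not commands.

```
# On the active member: confirm the old image is there, then mark it for the next boot
diagnose sys flash list
execute set-next-reboot secondary

# On each passive member (console session): same two commands
diagnose sys flash list
execute set-next-reboot secondary

# Only when every member is marked: reboot all of them as close to simultaneously as you can
execute reboot
```

Keeping the reboot as a separate last step means no member comes up on the older version while the others are still running the new one.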

If, on the other hand, you are jumping more than 1 version up, then rolling back becomes even more problematic. Configuration versions may be incompatible, e.g. having upgraded to 7.2.4 from 6.0.17, you cannot just upload FortiOS 6.0.17 and keep the configuration from 7.2.4. Fortinet suggests a harsh but universal procedure for downgrade: dismantle the cluster with execute ha disconnect FGxxxxxxxx <interface to connect to after disconnect> <ip address/mask>, downgrade each member as a standalone Fortigate, then build the cluster back. Here are the details. Also here.
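
As an illustration of the disconnect step, the command might be filled in as follows. The interface port5 and the address 10.10.10.1 with mask 255.255.255.0 are hypothetical values picked for the example, and FGxxxxxxxx stands for the real serial number of the member being detached. I am assuming here that the address and the mask are entered as two separate arguments, so check the syntax against the CLI reference for your FortiOS version.

```
# Run on the active member; FGxxxxxxxx is the serial number of the member to detach
execute ha disconnect FGxxxxxxxx port5 10.10.10.1 255.255.255.0
```

After the disconnect, the detached member should be reachable for management on that interface and address, which is where you continue with its standalone downgrade.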

Troubleshooting tips

  • Reboot the passive member if it doesn’t sync/upgrade.

  • If one member upgraded successfully but the other did not, run the upgrade procedure once again from the active member.

  • Give the members some time to sync after the upgrade; it may take 5 to 15 minutes.

  • As a last resort, having backups of the configuration for all versions: disconnect one member from the cluster, upgrade the remaining one any way you know, factory reset the second Fortigate, upgrade it to the same version of FortiOS as the first one, and build the cluster again - the new member will get the configuration via full sync.

  • Try to understand what is going on. Here is the HA debug part of my cheat sheet: HA Cluster Debug
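
A quick way to tell whether the members have finished upgrading and syncing is to compare versions and configuration checksums. This is a minimal sketch: get sys startup-error-log is from this post, while get system status, get system ha status and diagnose sys ha checksum cluster are my additions, and their output differs between FortiOS versions. The # lines are annotations only.

```
# FortiOS version and build this member is running now
get system status
# Cluster membership and each member's sync state
get system ha status
# Configuration checksums per member; they should match once the sync is done
diagnose sys ha checksum cluster
# Configuration lines that failed to convert during the upgrade
get sys startup-error-log
```

If the checksums still differ after the 5 to 15 minutes mentioned above, that is the point to start with the reboot or re-upgrade steps from this list.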
