I cut my teeth on Port Network (PN) outages when I joined Avaya’s Tier-3 backbone support back in 2006. I was assigned to support the S8700 series of duplexed Communication Manager (CM) servers just as CM 3 was being released. Back then, the timers were very tight, and a large percentage of my trouble tickets involved explaining to customers why an IPSI (Internet Protocol Server Interface) had reset, which in turn caused a port network outage.

Avaya uses several different heartbeat mechanisms so that devices know when they have lost connectivity. In the case of port networks, meaning any IPSI-controlled cabinet (such as a G650), the heartbeat is variously known as a Sanity Checkslot, Socket Sanity, or IPSI Sanity. This TCP heartbeat is sent to every IPSI every second by the active CM-main (and, in duplex CM, also by the standby CM). So, if you had CM-duplex and duplicated IPSIs in each of the maximum of 64 port networks, 2 CM × 2 IPSI × 64 PN = 256 heartbeats would fly through the network each second.
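To make that arithmetic easy to rerun for a smaller system, here is a minimal sketch. The function name and parameters are my own for illustration and are not part of any Avaya tool; it simply assumes one heartbeat per second from each CM server to each IPSI, as described above.

```python
def heartbeats_per_second(cm_servers: int, ipsis_per_pn: int, port_networks: int) -> int:
    """Heartbeats on the wire each second.

    Assumes every CM server sends one heartbeat per second to every IPSI.
    """
    return cm_servers * ipsis_per_pn * port_networks


# Duplex CM, duplicated IPSIs, maximum of 64 port networks.
print(heartbeats_per_second(2, 2, 64))  # 256

# Simplex CM with a single IPSI per port network.
print(heartbeats_per_second(1, 1, 64))  # 64
```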

Originally, the IPSI would react after only three consecutive heartbeats went missing. Starting in CM 3.13, the timer was administrable by an Avaya engineer, and in CM 5.0 it became administrable by customers on the CM change system-parameters ipserver-interface form. The IPSI Socket Sanity Timeout now defaults to 15 seconds (values: 3 to 15 seconds). Any data received from CM counts in place of a missing heartbeat.
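The timeout logic can be pictured roughly as follows. This is only a sketch of the idea, not the IPSI firmware: the class and method names are hypothetical, and I am assuming the IPSI simply tracks the time of the last heartbeat or CM data it received and compares the gap against the administered timeout.

```python
from dataclasses import dataclass

MIN_TIMEOUT_S = 3   # lowest administrable value, per the form
MAX_TIMEOUT_S = 15  # highest administrable value, and the default


@dataclass
class SanityMonitor:
    """Hypothetical model of the IPSI Socket Sanity Timeout."""

    timeout_s: int = MAX_TIMEOUT_S
    last_seen_s: float = 0.0

    def __post_init__(self) -> None:
        if not MIN_TIMEOUT_S <= self.timeout_s <= MAX_TIMEOUT_S:
            raise ValueError(
                f"timeout must be {MIN_TIMEOUT_S} to {MAX_TIMEOUT_S} seconds"
            )

    def on_traffic(self, now_s: float) -> None:
        """A heartbeat or any data from CM refreshes the timer."""
        self.last_seen_s = now_s

    def is_timed_out(self, now_s: float) -> bool:
        """True once the silence has lasted the full timeout."""
        return now_s - self.last_seen_s >= self.timeout_s


monitor = SanityMonitor(timeout_s=3)
monitor.on_traffic(now_s=10.0)
print(monitor.is_timed_out(now_s=12.0))  # False
print(monitor.is_timed_out(now_s=13.0))  # True
```

With the old three-second behavior, a brief network hiccup was enough to trip the timer; at the 15-second default the same hiccup passes unnoticed.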

Frequently, the cause of missing heartbeats is a mismatch in which the IPSI is locked at 100 Mbps/full duplex while the Ethernet switch is set to auto-negotiate (resulting in a half-duplex connection), or vice versa. Failing to enable quality of service (QoS) to give priority to IPSI traffic, or failing to segregate IPSI traffic onto separate physical or virtual LANs, also frequently causes problems.
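The duplex mismatch is worth spelling out because it surprises people. A port set to auto-negotiate that gets no negotiation reply from a locked peer can still detect the link speed, but it falls back to half duplex. The sketch below models that rule; the function names and the "auto"/"full"/"half" mode strings are my own shorthand, and it assumes both ports support full duplex.

```python
def effective_duplex(local_mode: str, peer_mode: str) -> str:
    """Duplex the local port ends up using.

    Modes are "auto" (auto-negotiate) or a locked "full" / "half".
    """
    if local_mode != "auto":
        return local_mode  # a locked port uses whatever it was locked to
    if peer_mode == "auto":
        return "full"      # both sides negotiate the best common mode
    return "half"          # no negotiation partner: fall back to half duplex


def has_duplex_mismatch(ipsi_mode: str, switch_mode: str) -> bool:
    """True when the two ends of the link disagree on duplex."""
    return effective_duplex(ipsi_mode, switch_mode) != effective_duplex(
        switch_mode, ipsi_mode
    )


print(has_duplex_mismatch("full", "auto"))  # True: the switch falls back to half
print(has_duplex_mismatch("auto", "auto"))  # False
print(has_duplex_mismatch("full", "full"))  # False
```

The takeaway is to configure both ends the same way: either lock both or let both auto-negotiate.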

Upon detecting the outage, the IPSI assumes it is sick and reacts by performing a warm reset. During the warm reset, stable calls using resources within the PN stay up. However, new calls cannot be initiated, and established calls cannot transition to another state (e.g., hold), for the obvious reason that there is no connection to CM to manage such transactions. The IPSI’s warm reset generally takes only a few seconds.

If the IPSI still doesn’t get heartbeats or data from CM, then after a default of 60 seconds (values: 60 to 120 seconds) it escalates to a cold reset, and all calls using resources within the PN are dropped. The PN cold reset delay timer can be modified on the change system-parameters port-networks form.

Next, based on the No Service Time Out Interval, the IPSI waits for a default of 5 minutes (values: 2 to 15 minutes). During that time, while the IPSI is waiting for communication from CM-main, the resources within that PN are unavailable. Note that if even one heartbeat gets through, perhaps on a flapping WAN circuit, the timer resets and the countdown starts again from the beginning. If the No Service Time Out timer expires, the IPSI attempts to register to a CM-Survivable Core (SC), formerly known as an Enterprise Survivable Server (ESS).
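Putting the three timers together, the escalation can be sketched as a simple timeline. This is an illustration under stated assumptions rather than actual IPSI behavior: I am assuming the timers run back to back (sanity timeout, then cold reset delay, then No Service Time Out) and that the outage is measured from the last heartbeat or CM data received. The function and state names are hypothetical.

```python
def ipsi_state(
    outage_s: float,
    sanity_timeout_s: int = 15,      # values: 3 to 15 seconds
    cold_reset_delay_s: int = 60,    # values: 60 to 120 seconds
    no_service_timeout_min: int = 5  # values: 2 to 15 minutes
) -> str:
    """State of the IPSI after outage_s seconds without heartbeats or CM data."""
    if not 3 <= sanity_timeout_s <= 15:
        raise ValueError("sanity timeout must be 3 to 15 seconds")
    if not 60 <= cold_reset_delay_s <= 120:
        raise ValueError("cold reset delay must be 60 to 120 seconds")
    if not 2 <= no_service_timeout_min <= 15:
        raise ValueError("no service time out must be 2 to 15 minutes")

    # Assumption: each timer starts when the previous one expires.
    warm_at = sanity_timeout_s
    cold_at = warm_at + cold_reset_delay_s
    failover_at = cold_at + no_service_timeout_min * 60

    if outage_s < warm_at:
        return "in_service"
    if outage_s < cold_at:
        return "warm_reset"          # stable calls stay up, no new calls
    if outage_s < failover_at:
        return "cold_reset_waiting"  # calls dropped, waiting for CM-main
    return "register_to_survivable_core"


for seconds in (10, 15, 75, 375):
    print(seconds, ipsi_state(seconds))
# 10 in_service
# 15 warm_reset
# 75 cold_reset_waiting
# 375 register_to_survivable_core
```

Because a single heartbeat restarts the countdown, a flapping WAN circuit keeps resetting outage_s toward zero, which is why a PN can sit in the waiting state far longer than the administered five minutes.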

Courtesy: https://www.avaya.com/blogs/archives/2016/01/understanding-avaya-internet-protocol-server-interface-resets.html