Sunday 29 October 2017

OSPF Fast Convergence

http://blog.ine.com/2010/06/02/ospf-fast-convergenc/

Convergence = Failure_Detection_Time + Event_Propagation_Time + SPF_Run_Time + RIB_FIB_Update_Time
The formula reflects the fact that convergence time for a link-state protocol is sum of the following components:


  • Time to detect the network failure, e.g. interface down condition.
  • Time to propagate the event, i.e. flood the LSA across the topology.
  • Time to perform SPF calculations on all routers upon reception of the new information.
  • Time to update the forwarding tables for all routers in the area.

Part I: Fast Failure Detection


BFD

Part II: Event Propagation


1) LSA generation delay

timers throttle lsa initial hold max_wait

initial - The router delays LSA generation the initial amount if milliseconds and sets the next interval to hold milliseconds

hold -  Thus, all events occurring after the initial delay will be accumulated and processed after the hold time expires. This means the next router LSA will be generated no earlier than hold milliseconds. At the same time, the next hold-time would be doubled, i.e. set to 2*hold

max_waitThe hold-time grows exponentially as 2^t*hold until it reaches the max_wait value. After this, every event received during current hold-time window would result in the next interval being equal to the constant max_wait

2)  LSA reception delay. This delay is a sum of the ingress queueing delay and LSA arrival delay. 

3)  Processing Delay is the amount of time it takes the router to put the LSA on the outgoing flood lists.

 it is required that LSAs are always flooded prior to SPF run. ISIS process in Cisco IOS supports the command fast-flood

 Interface flooding pacing is the OSPF feature that mandates a minimum interval between flooding consecutive LSAs out of an interface. This timer runs per interface and only triggers when there is an LSA needed to be sent out right after the previous LSA. The process-level command to control this interval is timers pacing flood (ms) with the default value of 55ms. 

Part III: SPF Calculations

 SPF calculation times and RIB manipulation time.

So how would one pick up optimal SPF throttling values? As mentioned before, the initial delay should be kept as short as possible to allow for instant reaction to a change but long enough not to trigger the SPF before the LSA is flooded out

Part IV: RIB/FIB Update

After completing SPF computation, OSPF performs sequential RIB update to reflect the changed topology. 
 There are two major ways to minimize the update delay: advertise less prefixes and sequence FIB updates so that important paths are updated before any other.

Sample Fast Convergence Profile:


Failure Detection Delay: about 5-10ms worst case to detect/report loss of network pulses.
Maximum SPF runtime: 32ms, doubling for safety makes it 64ms
Maximum RIB update: 10ms, doubling for safety makes it 20ms
OSPF interface flood pacing timer: 5ms (does not apply to the initial LSA flooded)
LSA Generation Initial Delay: 10ms (enough to detect multiple link failures resulting from SRLG failure)
SPF Initial Delay: 10ms (enough to hold SPF to allow two consecutive LSAs to be flooded)
Network geographical size: 100 miles (signal propagation is negligible)
Network physical media: 1 Gbps links (serialization delay is negligible)
Estimated network convergence time in response to initial event: 32*2 + 10*2 + 10 + 10 = 40+64 = 100ms. This estimation does not precisely account for FIB update time, but we assume it would be approximately the same as RIB update. We need to make sure out maximum backoff timers exceed this convergence timer to ensure processing is delay above the convergence interval in the worst case scenario.
LSA Generation Hold Time: 100ms (approximately the convergence time)
LSA Generation Maximum Time: 1s (way above the 100ms)
OSPF Arrival Time: 50ms (way below the LSA Generation hold time)
SPF Hold Time: 100ms
SPF Maximum Hold Time: 1s ( Maximum SPF runtime is 32ms, meaning we skip 30 SPF runtimes in the worst condition. This results in SPF consuming no more than 3% of CPU time under worst-case scenario).
Now estimate the worst-case convergence time: LSA_Maximum_Delay (1s) + SPF_Maximum_Delay (1s) + RIB_Update (

Saturday 28 October 2017