Friday 22 July 2016

Cisco BGP CDETS BUG

CSCuj18151 -- To be done

Title:

SRP Switchover Due to BGP Failure

Summary:

The BGP peer of the ASR5K, i.e. the CRS, is sending a BGP update announcing a bhp VPNv4 prefix with an invalid RD value, and hence the issue.

The CRS is generating the malformed BGP update packet, and that is causing the ASR5K to reset the BGP connection.

Sample ASR5K messages:
EHAout-MPLS09   13
2013-Jul-02+09:58:42.508 [bgp 85000 error] [16/0/11615 <bgp:13> bgp_decode.c:5696] [software internal system] 69.82.10.67-Outgoing [DECODE] NLRI: VPN-IPv4, Invalid RD Type(6560)
2013-Jul-02+09:58:42.508 [bgp 85000 error] [16/0/11615 <bgp:13> bgp_decode.c:589] [software internal system] 69.82.10.67-Outgoing [DECODE] Update: Bad Withdrawn-Routes-Len (55949) + 4 > msg_size (528)
2013-Jul-02+09:58:42.508 [snmp 22002 info] [16/0/11615 <bgp:13> trap_api.c:10512] [software internal system syslog] Internal trap notification 119 (BGPPeerSessionDown) vpn EHAout-MPLS09 ipaddr 69.82.10.67
EHAout-MPLS11   15
2013-Jul-02+09:58:42.508 [bgp 85000 error] [16/0/11847 <bgp:15> bgp_decode.c:5696] [software internal system] 69.82.10.83-Outgoing [DECODE] NLRI: VPN-IPv4, Invalid RD Type(6560)
2013-Jul-02+09:58:42.508 [bgp 85000 error] [16/0/11847 <bgp:15> bgp_decode.c:589] [software internal system] 69.82.10.83-Outgoing [DECODE] Update: Bad Withdrawn-Routes-Len (55949) + 4 > msg_size (528)
2013-Jul-02+09:58:42.508 [snmp 22002 info] [16/0/11847 <bgp:15> trap_api.c:10512] [software internal system syslog] Internal trap notification 119 (BGPPeerSessionDown) vpn EHAout-MPLS11 ipaddr 69.82.10.83



Opening for tracking purposes per Vijay

Analysis:

RFC 4364: according to this section, an implementation has to accept RD types other than 0, 1 and 2, because IANA may allocate additional values:
16. IANA Considerations


The Internet Assigned Numbers Authority (IANA) has created a new
registry for the "Route Distinguisher Type Field" (see Section 4.2).
This is a two-byte field. Types 0, 1, and 2 are defined by this
document. Additional Route Distinguisher Type Field values with a
high-order bit of 0 may be allocated by IANA on a "First Come, First
Served" basis [IANA]. Values with a high-order bit of 1 may be
allocated by IANA based on "IETF consensus" [IANA].
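To make the registry rule concrete, here is a minimal Python sketch that classifies an RD Type Field value the way RFC 4364 section 16 partitions it. The function name and the returned labels are hypothetical, not taken from any Cisco code. Note that the type 6560 seen in the ASR5K logs has a high-order bit of 0, so it falls in a range IANA may allocate rather than being an impossible value.

```python
def classify_rd_type(rd_type: int) -> str:
    """Classify a two-byte RD Type Field value as partitioned by RFC 4364 section 16."""
    if not 0 <= rd_type <= 0xFFFF:
        raise ValueError("the RD Type Field is two bytes wide")
    if rd_type in (0, 1, 2):
        return "defined"                  # types defined by RFC 4364 itself
    if (rd_type & 0x8000) == 0:
        return "first-come-first-served"  # high-order bit 0
    return "ietf-consensus"               # high-order bit 1


for value in (0, 2, 6560, 0x8001):
    print(value, classify_rd_type(value))
```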

Without the fix:
BGP RFC 4271 
BGP Error Handling.

This section describes actions to be taken when errors are detected
while processing BGP messages.

When any of the conditions described here are detected, a
NOTIFICATION message, with the indicated Error Code, Error Subcode,
and Data fields, is sent, and the BGP connection is closed (unless it
is explicitly stated that no NOTIFICATION message is to be sent and
the BGP connection is not to be closed). If no Error Subcode is
specified, then a zero MUST be used.


6.3. UPDATE Message Error Handling

All errors detected while processing the UPDATE message MUST be
indicated by sending the NOTIFICATION message with the Error Code
UPDATE Message Error. The error subcode elaborates on the specific
nature of the error.

The NLRI field in the UPDATE message is checked for syntactic
validity. If the field is syntactically incorrect, then the Error
Subcode MUST be set to Invalid Network Field.
If a prefix in the NLRI field is semantically incorrect (e.g., an
unexpected multicast IP address), an error SHOULD be logged locally,
and the prefix SHOULD be ignored.

The unknown RD type was treated as a syntactic error, so the connections were closed.

R-Comments:

rolled up to v150.main branch.
ChangeSet@1.704.1.1, 2013-07-16 11:53:59-04:00, vkatamre@bxb-mitg-dev03.cisco.com
bgp_route.c, bgp_decode.c:
CSCuh88210 SRP Switchover Due to BGP Failure
when an invalid RD type is received, don't treat it as an error and close the connection; rather, ignore that prefix and continue with the
rest of the packet.
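The behaviour described in the commit can be sketched as follows. This is a hypothetical Python illustration, not the ASR5K bgp_decode.c code: it assumes exactly one 3-byte MPLS label per VPNv4 prefix (real NLRI may carry a label stack) and all names are made up. The key point is that the length byte delimits each prefix, so the parser can advance past an entry before looking at its RD type. An unknown RD type then becomes a semantic problem (log it and ignore the prefix, as RFC 4271 suggests for semantically incorrect prefixes), while a length that cannot be framed remains a syntactic error that resets the session.

```python
from __future__ import annotations

import logging
from ipaddress import IPv4Address

KNOWN_RD_TYPES = {0, 1, 2}   # RD types defined in RFC 4364
LABEL_AND_RD_BITS = 24 + 64  # assumed: one 3-byte label plus the 8-byte RD


class NlriSyntaxError(Exception):
    """The NLRI field cannot be delimited; the session has to be reset."""


def parse_vpnv4_nlri(data: bytes) -> list[tuple[int, str]]:
    """Return (rd_type, prefix) pairs, skipping prefixes with an unknown RD type."""
    routes = []
    offset = 0
    while offset < len(data):
        bit_len = data[offset]
        nbytes = (bit_len + 7) // 8
        # Syntactic checks: room for a label, an RD and an IPv4 prefix,
        # and the entry must not run past the end of the field.
        if not LABEL_AND_RD_BITS <= bit_len <= LABEL_AND_RD_BITS + 32:
            raise NlriSyntaxError(f"bad NLRI length {bit_len} at offset {offset}")
        if offset + 1 + nbytes > len(data):
            raise NlriSyntaxError(f"NLRI at offset {offset} overruns the field")
        entry = data[offset + 1 : offset + 1 + nbytes]
        offset += 1 + nbytes  # framing is known, so skipping this entry is safe

        rd_type = int.from_bytes(entry[3:5], "big")  # bytes right after the label
        if rd_type not in KNOWN_RD_TYPES:
            # Semantic problem only: log it and ignore this prefix.
            logging.warning("ignoring VPNv4 prefix with unknown RD type %d", rd_type)
            continue

        prefix_len = bit_len - LABEL_AND_RD_BITS
        addr = IPv4Address(entry[11:].ljust(4, b"\x00"))
        routes.append((rd_type, f"{addr}/{prefix_len}"))
    return routes


# 10.1.1.0/24 with RD type 0, then a prefix carrying RD type 6560 as in the logs.
good = bytes([112]) + b"\x00\x01\x01" + b"\x00\x00" + bytes([0, 100, 0, 0, 0, 1]) + bytes([10, 1, 1])
unknown = bytes([112]) + b"\x00\x01\x01" + (6560).to_bytes(2, "big") + bytes(6) + bytes([10, 2, 2])
print(parse_vpnv4_nlri(good + unknown + good))  # [(0, '10.1.1.0/24'), (0, '10.1.1.0/24')]
```

The unknown-RD prefix is dropped with a warning while the two valid prefixes around it are still installed, which is the "ignore that prefix and continue with the rest of the packet" behaviour.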

Monday 11 July 2016

TCL


Interpreting OSPF Database

OSPF is probably the most common IGP in both enterprise and service provider networks. It’s a pretty manageable protocol and easy to understand, although sometimes we may get confused by the different announcement types and the situations where they apply. So let’s see if we can shed some light on all this.
This is the scenario we’ll use for our tests:
Every OSPF device in the network announces the state of each of its own links. And these announcements reach every OSPF device in the same area. Every time an OSPF router gets an advertisement from another OSPF router, it floods this advertisement to its neighbors (this is a little bit different in multi-access networks). So at the end, every OSPF device in the area has the same topology information.
This info is kept in the OSPF database. That’s why, when a new neighbor is discovered and some parameters have been checked, the database exchange process starts: it is essential that all the OSPF devices in the same area have the same information.
The first thing that can confuse us about the OSPF database is that it does not contain prefixes but LSAs, and each LSA carries one or more prefixes from a router's advertisements. So, when you run the command “show ip ospf database“, don’t expect to see all the prefixes that the router can reach at first glance.
Let’s start with the different OSPF LSA types we can find:
  • LSA type 1: called Router Links LSA
  • LSA type 2: called Network Links LSA
  • LSA type 3: called Network Summary LSA
  • LSA type 4: called ASBR LSA
  • LSA type 5: called External LSA
  • LSA type 7: called NSSA External LSA
There are some other LSA types, but they come into play when traffic-engineering and other advanced features are present in the network.
For this post, we will focus on LSA types 1 and 2. We will discuss types 3, 4, 5 and 7 in future posts.

LSA Type-1

This is the basic LSA: every device participating in OSPF sends one, announcing in it the state of each of its OSPF links. When the command show ip ospf database is used, every line under the statement Router Link States is an LSA type 1 belonging to a different router.
Every LSA type 1 is identified by the router-id of the router it belongs to.
R3#show ip ospf database

            OSPF Router with ID (3.3.3.3) (Process ID 1)
  Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         1341        0x80000005 0x0049DB     5
2.2.2.2         2.2.2.2         1355        0x80000005 0x006D6C     5
3.3.3.3         3.3.3.3         1340        0x80000005 0x005BBD     4

  Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.10.23.2      2.2.2.2         1443        0x80000001 0x00F8F2
From the output, we can see that there are 3 devices running OSPF in Area 0, and no more, because we got 3 different LSAs type 1: each router generates one LSA type 1 per area.
We also see information about the age of each LSA, its sequence number and checksum, as well as the number of links each LSA carries, in the sixth column.
So far we know there are 3 routers with router-ids 1.1.1.1, 2.2.2.2 and 3.3.3.3. The first one announces 5 links, the second one 5 links too, and the third one 4 links.
If we check these LSAs type 1, we will get more info about each router. Something very important to keep in mind is that every router in the same area has the same database, so we can learn all the OSPF info of a router without connecting to that remote router: just run the command show ip ospf database router from any other router in the area (check this post where Daniel decodes what network type a Core OSPF Router is using without running any show run command on that router).
Let’s check on R3 what R2’s LSA looks like:
R3#show ip ospf database router 2.2.2.2
            OSPF Router with ID (3.3.3.3) (Process ID 1)
  Router Link States (Area 0)
  LS age: 1385
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 2.2.2.2
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000005
  Checksum: 0x6D6C
  Length: 84
  Number of Links: 5

    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 2.2.2.2
     (Link Data) Network Mask: 255.255.255.255
      Number of MTID metrics: 0
       TOS 0 Metrics: 1

    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 20.20.20.2
     (Link Data) Network Mask: 255.255.255.255
      Number of MTID metrics: 0
       TOS 0 Metrics: 1

    Link connected to: a Transit Network
     (Link ID) Designated Router address: 10.10.23.2
     (Link Data) Router Interface address: 10.10.23.2
      Number of MTID metrics: 0
       TOS 0 Metrics: 10

    Link connected to: another Router (point-to-point)
     (Link ID) Neighboring Router ID: 1.1.1.1
     (Link Data) Router Interface address: 10.10.12.2
      Number of MTID metrics: 0
       TOS 0 Metrics: 10

    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 10.10.12.0
     (Link Data) Network Mask: 255.255.255.0
      Number of MTID metrics: 0
       TOS 0 Metrics: 10
OSPF makes differentiation between Transit Networks and Stub Networks. A transit network is a network where multiple routers could exist, therefore where multiple neighborships could be built (such as broadcast networks). However, a stub network is a network where no router or at most one other router can exist (such as Loopbacks or point-to-point networks).
From the above output we see that R2 announces 5 links. In fact it has only 4 links, but one of them is counted twice: first as a Stub Network, and second as “another Router (point-to-point)”. From this info we can tell that this link is a point-to-point link with network/mask 10.10.12.0/24, connected to another router whose router-id is 1.1.1.1. The IP that R2 uses on this link is 10.10.12.2, and its OSPF cost is 10 (from the line saying Metrics). A bunch of nice info!!!
All this information is very valuable because it gives a picture of the network: how the routers are connected and the cost of each link.
R3 has to build the SPF tree and add its own cost to reach R2 in order to know the cost to reach R2’s networks.
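As a rough illustration of that calculation, here is a plain Dijkstra sketch in Python over a tiny graph built from R2’s LSA above. Assumptions: R3’s own cost towards the transit network is taken as 10 (R3’s LSA is not shown), R2’s point-to-point neighbor R1 is left out, and stub networks are modelled as leaf vertices. Real OSPF (RFC 2328 section 16.1) adds stub networks in a separate stage after building the tree, but the resulting costs are the same here. As in OSPF, the edge from the transit network vertex to each attached router costs 0.

```python
import heapq

# Hypothetical graph: R3's side of the transit link (cost 10 assumed),
# the network vertex for 10.10.23.0/24, and the links from R2's LSA.
GRAPH = {
    "R3": {"net 10.10.23.0/24": 10},
    "net 10.10.23.0/24": {"R2": 0, "R3": 0},  # network -> attached router costs 0
    "R2": {
        "net 10.10.23.0/24": 10,
        "stub 2.2.2.2/32": 1,
        "stub 20.20.20.2/32": 1,
        "stub 10.10.12.0/24": 10,
    },
}


def spf(graph, root):
    """Return the lowest cost from root to every reachable vertex (Dijkstra)."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


for vertex, cost in sorted(spf(GRAPH, "R3").items(), key=lambda kv: kv[1]):
    print(f"{cost:>3}  {vertex}")
```

With these assumptions R3 reaches R2 at cost 10, R2’s loopbacks at cost 11 and 10.10.12.0/24 at cost 20: R3’s cost to R2 plus the cost R2 announces for each link.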
We also see two stub networks not connected to any other router. These networks could be Loopback interfaces or point-to-point interfaces with no other router on the link. In our scenario, these networks are the Loopbacks of R2. Looking at the info, we see the IP and mask of each Loopback interface, and the cost that R2 has to reach them.
There is a link remaining, and it appears as transit network. This transit network deserves a second look:
R3#sh ip ospf database router 2.2.2.2
            OSPF Router with ID (3.3.3.3) (Process ID 1)
  Router Link States (Area 0)
  ...
    Link connected to: a Transit Network
     (Link ID) Designated Router address: 10.10.23.2
     (Link Data) Router Interface address: 10.10.23.2
      Number of MTID metrics: 0
       TOS 0 Metrics: 10
  ...
From this output we learn several things. First, it’s a multi-access network, since it appears as a transit network. Indeed, there is already a DR, whose IP address is 10.10.23.2, and the IP address that R2 is using on this link is also 10.10.23.2, so it’s R2 itself that is acting as the DR. We can also see the cost that R2 uses to reach this network.
Every time we see a transit network, we should also see an LSA Type 2 related to that network.

LSA Type-2

This LSA is present every time there is an OSPF multi-access network (transit network). In this kind of network, OSPF behaves in a particular way. Instead of every device forming a full adjacency with every other device, a Designated Router (DR) is elected, as well as a Backup DR (BDR), and the rest of the routers act as DROthers. Every DROther router forms adjacencies with the DR and BDR of the transit network, and sends its LSA type 1 announcements only to them (using multicast address 224.0.0.6). The DR then floods these LSAs type 1 to all its neighbors, both inside and outside the transit network. It also builds a new LSA type 2 with information about the transit network. Only the DR generates this LSA Type 2 and floods it to all its neighbors, including the ones in the transit network, listing itself as the advertising router of this LSA. That’s why every time there is a Transit Network, there is an LSA Type 2 related to that network.
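The election itself can be sketched as below. This is a simplified Python model, not the full procedure of RFC 2328 section 9.4: it assumes a fresh election with no incumbent DR, picks the highest priority and then the highest router-id, and excludes routers with priority 0. The real election is non-preemptive, which could explain why R2 (2.2.2.2) is the DR in the outputs above even though R3 has a higher router-id.

```python
from ipaddress import IPv4Address


def elect_dr_bdr(routers):
    """Simplified fresh DR/BDR election on one multi-access segment.

    routers: list of (router_id, priority) tuples.
    Returns (dr, bdr, drothers). Priority 0 means the router can never be DR/BDR.
    """
    eligible = [(rid, prio) for rid, prio in routers if prio > 0]
    # Highest priority wins; ties are broken by the numerically highest router-id.
    ranked = sorted(eligible, key=lambda r: (r[1], IPv4Address(r[0])), reverse=True)
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    drothers = [rid for rid, _ in routers if rid not in (dr, bdr)]
    return dr, bdr, drothers


print(elect_dr_bdr([("1.1.1.1", 1), ("2.2.2.2", 1), ("3.3.3.3", 1)]))
print(elect_dr_bdr([("1.1.1.1", 1), ("2.2.2.2", 5), ("3.3.3.3", 0)]))
```

The second call shows a priority of 5 beating a higher router-id, and a priority of 0 keeping 3.3.3.3 out of the election entirely.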
In the OSPF database, the LSA Types 2 are under the line “Net Link States”
R3#show ip ospf database 

            OSPF Router with ID (3.3.3.3) (Process ID 1)
  Router Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         1341        0x80000005 0x0049DB 5
2.2.2.2         2.2.2.2         1355        0x80000005 0x006D6C 5
3.3.3.3         3.3.3.3         1340        0x80000005 0x005BBD 4

  Net Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum
10.10.23.2      2.2.2.2         1443        0x80000001 0x00F8F2
Be aware that, for the same transit network, there will be as many LSAs type 1 announcing it as there are routers in that transit network, but only one LSA Type 2 for that network, generated by the DR.
Let’s have a deeper look at this LSA type 2. To check it, we need to use the command show ip ospf database network followed by the IP address of the link, not the router-id. That IP address is the one the DR has on that link, in our case 10.10.23.2:
R3# show ip ospf database network 10.10.23.2
            OSPF Router with ID (3.3.3.3) (Process ID 1)
  Net Link States (Area 0)

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 711
  Options: (No TOS-capability, DC)
  LS Type: Network Links
  Link State ID: 10.10.23.2 (address of Designated Router)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000001
  Checksum: 0xF8F2
  Length: 32
  Network Mask: /24
 Attached Router: 2.2.2.2
 Attached Router: 3.3.3.3
What do we have here? Well, first of all, we see that the advertising router is 2.2.2.2, so that’s the router in charge of gathering all the info of the transit network and building the LSA type 2. We also see that there are 2 routers attached to this transit network: 2.2.2.2 and 3.3.3.3.
But we are missing something, aren’t we? There is no cost associated with this network. That’s because the cost to reach it is announced by each router in its own LSA type 1.
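The fields just discussed can be pulled out of the CLI text with a few regular expressions. This is a hypothetical Python helper, written only against the IOS output format shown above (the sample is a trimmed excerpt of it); other platforms or software versions may format the output differently.

```python
import re
from ipaddress import ip_interface

SAMPLE = """\
  LS Type: Network Links
  Link State ID: 10.10.23.2 (address of Designated Router)
  Advertising Router: 2.2.2.2
  Network Mask: /24
 Attached Router: 2.2.2.2
 Attached Router: 3.3.3.3
"""


def parse_network_lsa(text: str) -> dict:
    """Extract the key fields of one Type 2 LSA from 'show ip ospf database network' text."""
    dr_address = re.search(r"Link State ID:\s*(\S+)", text).group(1)
    dr_router_id = re.search(r"Advertising Router:\s*(\S+)", text).group(1)
    prefix_len = int(re.search(r"Network Mask:\s*/(\d+)", text).group(1))
    return {
        "network": str(ip_interface(f"{dr_address}/{prefix_len}").network),
        "dr_address": dr_address,      # Link State ID = DR's interface address
        "dr_router_id": dr_router_id,  # only the DR originates the Type 2 LSA
        "attached_routers": re.findall(r"Attached Router:\s*(\S+)", text),
        # There is no cost field: costs come from each router's own Type 1 LSA.
    }


print(parse_network_lsa(SAMPLE))
```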

Conclusion

As we have seen, each OSPF router in the network generates an LSA Type 1 with information about all its links. These LSAs are flooded throughout the area, and they give information about IP prefixes, costs and the network type (Stub/Transit) of each link. For multi-access networks (Transit Networks), the LSA Type 2 tells us which routers are attached to that network. This LSA Type 2 is generated only by the Designated Router on that transit network.
In future posts we will focus on other LSAs such as LSA Type 3, Type 4, Type 5 and Type 7.

The network which we will focus on for this post is the following:
OSPF Database Topology
One of the advantages of OSPF is the possibility of segmenting the network into areas. In fact, it’s necessary to do it in large networks, so that the OSPF Database is smaller and more stable. We will see why later. Let’s have a look at the OSPF Database of R1 and try to get connectivity info from the LSAs type 3:
R1#sh ip ospf database
            OSPF Router with ID (1.1.1.1) (Process ID 1)
  Router Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         951         0x80000005 0x0049DB 5
2.2.2.2         2.2.2.2         767         0x80000002 0x007665 5
3.3.3.3         3.3.3.3         883         0x80000006 0x005CBA 4

  Net Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum
10.10.23.3      3.3.3.3         912         0x80000001 0x00F8F2

  Summary Net Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum
4.4.4.4         2.2.2.2         757         0x80000001 0x000317
5.5.5.5         3.3.3.3         840         0x80000001 0x00B65B
10.10.24.0      2.2.2.2         757         0x80000001 0x00AD51
10.10.35.0      3.3.3.3         873         0x80000001 0x0016D9
So R1 is getting four different LSAs type 3, two of them announced by the router whose ID is 2.2.2.2 (info extracted from the second column), and the other two announced by the router whose ID is 3.3.3.3. These advertising routers (ADV Router) are the ABRs of Area 0.
Every router belonging to two or more areas is an ABR (Area Border Router), and it’s in charge of summarizing the prefixes of one area and announcing them into the other area as LSAs type 3. In every LSA type 3, the ABR puts itself as the advertising router, so the other routers in the area reach the prefix through it.
Let’s have a deeper look at the LSA type 3. We can type the command show ip ospf database summary adv-router followed by the ID of the ABR to get info about all the LSAs type 3 announced by that router, or we can use the command show ip ospf database summary followed by the prefix we want more info about:
R1#sh ip ospf database summary 4.4.4.4

            OSPF Router with ID (1.1.1.1) (Process ID 1)

  Summary Net Link States (Area 0)

  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 829
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 4.4.4.4 (summary Network Number)
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000001
  Checksum: 0x317
  Length: 28
  Network Mask: /32
 MTID: 0  Metric: 11 
What have we got here? Well, this LSA gives us several key pieces of information:
  • First, we know the info in this LSA is valid and a candidate to be added to the RIB, since the routing bit has been set on the LSA
  • The IP of the link and its mask is 4.4.4.4/32
  • In order to reach this link, we must go through the ABR 2.2.2.2
  • The metric for 2.2.2.2 to reach the link 4.4.4.4 is 11
This last bit is very interesting. The metric for that prefix is the metric from R2’s point of view, not the metric that the owner of the route has, as we would find in R4’s LSA type 1. And this behavior, one router telling the rest of the network its own point of view about a route, reminds me of something… yeah, it reminds me of the distance vector protocols! It’s funny, but OSPF behaves like a distance vector protocol when it comes to LSAs type 3. Do you see the difference? In the LSA type 1, each router informs about its links and their costs, so everyone in the area gets this info directly from the source. With LSA type 3, however, the ABR informs about its own point of view: it summarizes the original area, hides the topology of that area, and just gives the prefix and the cost it has to reach that prefix.
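That distance-vector step can be sketched in a few lines of Python. R1 never sees R4’s LSA type 1; it adds its own intra-area cost to the advertising ABR to the metric carried in the LSA type 3, as RFC 2328 section 16.2 describes for inter-area routes. The cost of 10 from R1 to each ABR is an assumption (R1’s side of those links is not shown), and the second advertisement of 4.4.4.4/32 is hypothetical, added only to show the comparison.

```python
def best_inter_area_routes(cost_to_abr, summaries):
    """Pick the best path for each prefix learned through Type 3 LSAs.

    cost_to_abr: this router's intra-area SPF cost to each ABR (router-id -> cost).
    summaries:   (prefix, advertising ABR, metric carried in the Type 3 LSA).
    """
    best = {}
    for prefix, abr, metric in summaries:
        if abr not in cost_to_abr:
            continue  # ABR not reachable inside the area: the summary cannot be used
        total = cost_to_abr[abr] + metric  # my cost to the ABR + the ABR's own cost
        if prefix not in best or total < best[prefix][0]:
            best[prefix] = (total, abr)
    return best


cost_to_abr = {"2.2.2.2": 10, "3.3.3.3": 10}  # assumed intra-area costs from R1
summaries = [
    ("4.4.4.4/32", "2.2.2.2", 11),  # the LSA type 3 decoded above
    ("4.4.4.4/32", "3.3.3.3", 30),  # hypothetical, not in the outputs above
]
print(best_inter_area_routes(cost_to_abr, summaries))  # {'4.4.4.4/32': (21, '2.2.2.2')}
```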
This summary, as I mentioned at the beginning, makes the OSPF Database smaller and more stable. Smaller because an LSA type 3 carries less info than an LSA type 1 (check Part I of this series to learn more about LSA type 1 and LSA type 2). And more stable because LSAs type 3 are processed in a less CPU-intensive stage of the SPF calculation: LSAs type 1 and type 2 are used in the first, more CPU-intensive stage, which builds the topology of the network, and the rest of the LSAs are then attached to the nodes of that topology in a second, less CPU-intensive stage.
Let’s see how R5 perceives the world from area 5:
R5#sh ip ospf database

            OSPF Router with ID (5.5.5.5) (Process ID 1)

  Router Link States (Area 5)

Link ID         ADV Router      Age         Seq#       Checksum Link count
3.3.3.3         3.3.3.3         598         0x80000004 0x00224D 2
5.5.5.5         5.5.5.5         547         0x80000004 0x0081BF 3

  Summary Net Link States (Area 5)

Link ID         ADV Router      Age         Seq#       Checksum
1.1.1.1         3.3.3.3         598         0x80000003 0x006BB4
2.2.2.2         3.3.3.3         598         0x80000004 0x003BDF
3.3.3.3         3.3.3.3         598         0x80000003 0x00AA77
4.4.4.4         3.3.3.3         598         0x80000004 0x0043C5
10.10.12.0      3.3.3.3         598         0x80000003 0x007486
10.10.13.0      3.3.3.3         598         0x80000003 0x0005FE
10.10.23.0      3.3.3.3         598         0x80000003 0x009663
10.10.24.0      3.3.3.3         598         0x80000004 0x00EDFF
20.20.20.2      3.3.3.3         598         0x80000004 0x00B034
As we see, R5’s Database has only 2 LSAs type 1: one from R3, announcing its link in area 5, and another one from R5, with all the info about its links in area 5. The rest of the routing information comes from the LSAs type 3 that the ABR (in our scenario, R3) has injected into area 5. So R3 takes all the info in Area 0 and makes summary link advertisements to inject into Area 5. It uses not only the LSAs type 1 and type 2 present in Area 0, but also the LSAs type 3 present in Area 0.
There is something interesting to mention here, and it’s that when the ABR injects the LSA type 3 into the nonbackbone area, it takes into account all LSA type 1, type 2 and type 3 present in the backbone area. But only LSA type 1 and type 2 (intra-area routes) present in the nonbackbone area are injected back into the backbone as LSA type 3. LSA type 3 already present in a nonbackbone area are not announced back to the backbone area.
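The rule above can be written as a small filter. This is a simplified Python sketch of what an ABR announces as LSAs type 3 into a given area; it ignores ranges, filters and the other checks of RFC 2328 section 12.4.3. The route list models R3 (areas 0 and 5) with prefixes from the outputs above; the masks are assumptions, since the database summary does not show them.

```python
BACKBONE = 0


def type3_prefixes_for(routes, target_area):
    """Prefixes an ABR would announce as Type 3 LSAs into target_area (simplified).

    routes: (prefix, area the route was learned in, "intra" or "inter").
    """
    announced = []
    for prefix, area, path_type in routes:
        if area == target_area:
            continue  # a route is never summarized back into its own area
        if path_type == "intra":
            announced.append(prefix)  # intra-area routes go to every other attached area
        elif area == BACKBONE and target_area != BACKBONE:
            announced.append(prefix)  # backbone inter-area routes go to non-backbone areas only
    return announced


r3_routes = [
    ("1.1.1.1/32", 0, "intra"),
    ("4.4.4.4/32", 0, "inter"),  # learned from R2's LSA type 3 in Area 0
    ("5.5.5.5/32", 5, "intra"),
    ("10.10.35.0/24", 5, "intra"),
]
print(type3_prefixes_for(r3_routes, 5))  # into Area 5
print(type3_prefixes_for(r3_routes, 0))  # into Area 0
```

The results match the databases above: R5 receives 1.1.1.1 and 4.4.4.4 from 3.3.3.3, while R1 receives only 5.5.5.5 and 10.10.35.0 from 3.3.3.3.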
We could make some calculations based on the number of LSAs of each type in each area:
R1#sh ip ospf database

            OSPF Router with ID (1.1.1.1) (Process ID 1)

  Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         951         0x80000005 0x0049DB 5
2.2.2.2         2.2.2.2         767         0x80000002 0x007665 5
3.3.3.3         3.3.3.3         883         0x80000006 0x005CBA 4

  Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.10.23.3      3.3.3.3         912         0x80000001 0x00F8F2

  Summary Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
4.4.4.4         2.2.2.2         757         0x80000001 0x000317
5.5.5.5         3.3.3.3         840         0x80000001 0x00B65B
10.10.24.0      2.2.2.2         757         0x80000001 0x00AD51
10.10.35.0      3.3.3.3         873         0x80000001 0x0016D9
R5#sh ip ospf database

            OSPF Router with ID (5.5.5.5) (Process ID 1)

  Router Link States (Area 5)

Link ID         ADV Router      Age         Seq#       Checksum Link count
3.3.3.3         3.3.3.3         598         0x80000004 0x00224D 2
5.5.5.5         5.5.5.5         547         0x80000004 0x0081BF 3

  Summary Net Link States (Area 5)

Link ID         ADV Router      Age         Seq#       Checksum
1.1.1.1         3.3.3.3         598         0x80000003 0x006BB4
2.2.2.2         3.3.3.3         598         0x80000004 0x003BDF
3.3.3.3         3.3.3.3         598         0x80000003 0x00AA77
4.4.4.4         3.3.3.3         598         0x80000004 0x0043C5
10.10.12.0      3.3.3.3         598         0x80000003 0x007486
10.10.13.0      3.3.3.3         598         0x80000003 0x0005FE
10.10.23.0      3.3.3.3         598         0x80000003 0x009663
10.10.24.0      3.3.3.3         598         0x80000004 0x00EDFF
20.20.20.2      3.3.3.3         598         0x80000004 0x00B034
R4#sh ip ospf database

            OSPF Router with ID (4.4.4.4) (Process ID 1)

  Router Link States (Area 4)

Link ID         ADV Router      Age         Seq#       Checksum Link count
2.2.2.2         2.2.2.2         1018        0x80000002 0x00CAC9 2
4.4.4.4         4.4.4.4         1015        0x80000003 0x00C0A8 3

  Summary Net Link States (Area 4)

Link ID         ADV Router      Age         Seq#       Checksum
1.1.1.1         2.2.2.2         1037        0x80000001 0x008D98
2.2.2.2         2.2.2.2         1047        0x80000001 0x00FA31
3.3.3.3         2.2.2.2         987         0x80000002 0x002FED
5.5.5.5         2.2.2.2         987         0x80000002 0x0037D3
10.10.12.0      2.2.2.2         1037        0x80000001 0x0032D8
10.10.13.0      2.2.2.2         1037        0x80000001 0x008B74
10.10.23.0      2.2.2.2         1037        0x80000001 0x00B847
10.10.35.0      2.2.2.2         987         0x80000002 0x009652
20.20.20.2      2.2.2.2         1047        0x80000001 0x007085
Counting the LSAs per area:
Area   | LSA type 1 and LSA type 2                                                 | LSA type 3
Area 0 | 3 type 1 and 1 type 2 (the link between R2 and R3 is a broadcast network) | 4
Area 4 | 2 type 1 (one from each router in the area)                               | 9
Area 5 | 2 type 1 (one from each router in the area)                               | 9
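Counts like these can be taken straight from the show output. Below is a minimal Python sketch that assumes the IOS layout shown above (a heading such as “Summary Net Link States (Area 0)” followed by one line per LSA). The regular expressions were written for this sample only, and other sections (ASBR summary, external) are not recognised.

```python
import re
from collections import Counter

SECTION_TO_TYPE = {"Router": 1, "Net": 2, "Summary Net": 3}
SECTION = re.compile(r"^\s*(Router|Net|Summary Net) Link States \(Area (\d+)\)")
ENTRY = re.compile(r"^\d+(?:\.\d+){3}\s+\d+(?:\.\d+){3}\s")  # Link ID, ADV Router, ...


def count_lsas(output: str) -> Counter:
    """Count LSAs per (area, LSA type) in 'show ip ospf database' output."""
    counts = Counter()
    current = None
    for line in output.splitlines():
        heading = SECTION.match(line)
        if heading:
            current = (int(heading.group(2)), SECTION_TO_TYPE[heading.group(1)])
        elif current and ENTRY.match(line):
            counts[current] += 1
    return counts


R1_OUTPUT = """\
  Router Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         951         0x80000005 0x0049DB 5
2.2.2.2         2.2.2.2         767         0x80000002 0x007665 5
3.3.3.3         3.3.3.3         883         0x80000006 0x005CBA 4
  Net Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum
10.10.23.3      3.3.3.3         912         0x80000001 0x00F8F2
  Summary Net Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum
4.4.4.4         2.2.2.2         757         0x80000001 0x000317
5.5.5.5         3.3.3.3         840         0x80000001 0x00B65B
10.10.24.0      2.2.2.2         757         0x80000001 0x00AD51
10.10.35.0      3.3.3.3         873         0x80000001 0x0016D9
"""

print(count_lsas(R1_OUTPUT))
```

For R1 this yields 3 LSAs type 1, 1 LSA type 2 and 4 LSAs type 3 in Area 0, the first row of the table.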
When designing an OSPF network, we have to keep in mind simplicity, scalability and stability. Depending on how many routers we have in the network and on its topology, we have to decide how many areas to build and how many routers to put in each area. Of course, one of the main goals is a very stable backbone, and that goal helps us choose where to put the area borders.
We can mention some other things about LSA type 3. For example, if a router learns the same prefix via an LSA type 1 and via an LSA type 3, it will always prefer the one learned via the LSA type 1. This is because an LSA type 1 means the prefix belongs to the router’s own area, whereas an LSA type 3 is a summary from another area. Imagine there are 2 ABRs in area 5: the first one will inject into Area 0 an LSA type 3 for a prefix from Area 5. The second ABR will have one LSA type 1 for that prefix from Area 5 and one LSA type 3 for the same prefix from Area 0, and it will prefer the path inside Area 5 to reach the prefix.
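The preference can be expressed as a two-level comparison: path type first, cost only as a tie-breaker within the same type (RFC 2328 section 11 orders path types as intra-area, inter-area, then external). A minimal Python sketch for the second ABR in the example; both costs are hypothetical, chosen so that the inter-area path is cheaper and still loses.

```python
# Lower rank is preferred; cost is only compared between paths of the same type.
PATH_TYPE_RANK = {"intra-area": 0, "inter-area": 1}


def best_path(candidates):
    """candidates: (path_type, cost, description) tuples for the same prefix."""
    return min(candidates, key=lambda c: (PATH_TYPE_RANK[c[0]], c[1]))


candidates = [
    ("inter-area", 5, "LSA type 3 learned in Area 0"),
    ("intra-area", 30, "LSA type 1 inside Area 5"),
]
print(best_path(candidates))  # ('intra-area', 30, 'LSA type 1 inside Area 5')
```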
And there is another interesting thing about LSA type 3. We know OSPF doesn’t allow filtering LSA type 1 and type 2, because all routers in the same area must have the same LSA type 1 and LSA type 2 info; otherwise the result of the SPF calculation would not be accurate. But filtering LSA type 3 is allowed, and it can be done at ABRs. Moreover, not only filtering but also aggregation is allowed. For that purpose we can use the following commands at ABRs:
  • area x range: to summarize several prefixes into one aggregated prefix. We can choose to advertise the aggregate (advertise) or not to advertise it (not-advertise), which means the prefixes are filtered out
  • area x filter-list prefix: to filter out a specific prefix from one area to another area
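The effect of area x range on the LSAs type 3 an ABR sends can be modelled as below. This is a simplified Python sketch: the range 10.10.0.0/16 and the /32 and /24 masks for the Area 5 prefixes are hypothetical, the metric of the summary is not modelled, and ip_network.subnet_of needs Python 3.7 or later.

```python
from ipaddress import ip_network


def apply_area_range(prefixes, area_range, advertise=True):
    """Simplified model of 'area x range' on an ABR.

    Component prefixes inside the range are not announced individually; the range
    itself is announced (advertise) or nothing is announced for it (not-advertise).
    """
    summary = ip_network(area_range)
    covered = [p for p in prefixes if ip_network(p).subnet_of(summary)]
    announced = [p for p in prefixes if p not in covered]
    if covered and advertise:
        announced.append(str(summary))  # only announced if at least one component exists
    return announced


area5 = ["5.5.5.5/32", "10.10.35.0/24"]
print(apply_area_range(area5, "10.10.0.0/16"))                   # ['5.5.5.5/32', '10.10.0.0/16']
print(apply_area_range(area5, "10.10.0.0/16", advertise=False))  # ['5.5.5.5/32']
```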

BGP Confederation

One of the things to take into account when working with iBGP is the need for a full-mesh BGP topology. As we already know, BGP has the rule of not announcing to iBGP neighbors what has been learned from another iBGP peer, and that’s why we need to build a full-mesh BGP topology.
As we have mentioned in other posts, there are two alternatives to the full-mesh BGP topology:
  • Route-Reflectors
  • Confederations
In my last post I already talked about Route-Reflectors. Today I want to focus on Confederations.

BGP Confederations

With confederations we partition the whole AS into sub-ASes, without needing to build a full-mesh topology between all the BGP routers in the network. Each sub-AS (member AS of the confederation) has a full-mesh BGP topology between its own routers. Between sub-ASes, the behaviour is more like eBGP sessions with some differences (in fact, it’s called Confederation BGP, cBGP). Each sub-AS has a different number, usually a private one (from 64512 to 65534).
The topology we will focus on is as follows:
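A quick back-of-the-envelope comparison in Python shows why this helps. The numbers are illustrative, not the topology of this post: a hypothetical AS of 10 routers, split into two sub-ASes of 5 routers joined by a single cBGP session.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed to fully mesh n routers: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def confederation_sessions(sub_as_sizes, cbgp_sessions: int) -> int:
    """Full mesh inside each sub-AS plus the cBGP sessions between sub-ASes."""
    return sum(full_mesh_sessions(k) for k in sub_as_sizes) + cbgp_sessions


print(full_mesh_sessions(10))             # 45
print(confederation_sessions([5, 5], 1))  # 21
```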
BGP Confederations Topology
Two things are necessary to build a confederation:
  • The private sub-ASNs that will be used in the confederations (64990 and 64991 in our topology)
  • The ASN that the whole confederations system will assume towards true eBGP sessions (2222 in our topology)
Routers in confederations can have 3 types of BGP sessions:
  • Regular iBGP sessions: with routers inside the same sub-AS. All these routers having iBGP sessions must have the same ASN to identify the BGP process (64990 and 64991 in our topology). Furthermore, it’s necessary to build a full-mesh iBGP topology inside each sub-AS. All the rules for iBGP sessions apply here.
  • cBGP sessions: the BGP session built between sub-AS is called cBGP (confederation BGP), and its characteristics are a mix between iBGP and eBGP sessions. The topology between sub-AS doesn’t need to be full-mesh.
  • Regular eBGP sessions: the whole confederations system behaves as a unique ASN towards eBGP sessions. All the rules for eBGP sessions apply here.
Every router inside the confederation must know the ASN of its own sub-AS, the ASNs of its peer sub-ASes inside the confederation, and the ASN that the whole confederation presents towards eBGP peers.
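With those three pieces of information a router can tell which of the three session types a neighbor is. A small Python sketch, assuming the inputs map to the IOS commands named in the docstring (the function itself is hypothetical):

```python
def session_type(local_sub_as, peer_as, confed_peers):
    """Classify a BGP session on a confederation member router (sketch).

    local_sub_as: ASN in `router bgp` (e.g. 64990)
    peer_as:      ASN in `neighbor ... remote-as`
    confed_peers: ASNs listed in `bgp confederation peers`
    """
    if peer_as == local_sub_as:
        return "iBGP"  # same sub-AS: full mesh (or RR) and iBGP rules
    if peer_as in confed_peers:
        return "cBGP"  # another sub-AS of the same confederation
    return "eBGP"      # true external peer; it only sees the confederation identifier

# Sessions configured on R4 and R2 in the topology above
print(session_type(64990, 64990, {64991}))  # iBGP (R4 -> R2, R3)
print(session_type(64990, 64991, {64991}))  # cBGP (R4 -> R5)
print(session_type(64990, 1111, {64991}))   # eBGP (R2 -> R1)
```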
The configuration of each router in the example is as follows:
Configuration on R1:
 router bgp 1111
 network 100.100.100.0 mask 255.255.255.0 route-map LOOPBACK
 neighbor 20.20.12.2 remote-as 2222 
Configuration on R2:
 router bgp 64990
 bgp confederation identifier 2222 
 bgp confederation peers 64991 
 neighbor 3.3.3.3 remote-as 64990
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 3.3.3.3 next-hop-self
 neighbor 4.4.4.4 remote-as 64990
 neighbor 4.4.4.4 update-source Loopback0
 neighbor 4.4.4.4 next-hop-self
 neighbor 20.20.12.1 remote-as 1111
Configuration on R3:
 router bgp 64990
 bgp confederation identifier 2222 
 bgp confederation peers 64991 
 neighbor 2.2.2.2 remote-as 64990
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 4.4.4.4 remote-as 64990
 neighbor 4.4.4.4 update-source Loopback0
Configuration on R4:
 router bgp 64990
 bgp confederation identifier 2222 
 bgp confederation peers 64991 
 neighbor 2.2.2.2 remote-as 64990
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 64990
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 5.5.5.5 remote-as 64991
 neighbor 5.5.5.5 update-source Loopback0
 neighbor 5.5.5.5 ebgp-multihop 2
Configuration on R5:
 router bgp 64991
 bgp confederation identifier 2222
 bgp confederation peers 64990 
 neighbor 4.4.4.4 remote-as 64990
 neighbor 4.4.4.4 update-source Loopback0
 neighbor 4.4.4.4 ebgp-multihop 2
 neighbor 6.6.6.6 remote-as 64991
 neighbor 6.6.6.6 update-source Loopback0
Configuration on R6:
 router bgp 64991
 bgp confederation identifier 2222
 bgp confederation peers 64990
 network 166.166.166.0 mask 255.255.255.0
 neighbor 5.5.5.5 remote-as 64991
 neighbor 5.5.5.5 update-source Loopback0
We must pay attention to the BGP session between R4 and R5. They belong to different sub-ASes, so the session they build is a cBGP session. This means some eBGP rules apply here, and one of them is the default eBGP TTL of 1. To build a BGP session between routers in different sub-ASes, we need to either peer with the interface IPs or use the command ebgp-multihop 2, as R4 and R5 do above because they peer between loopbacks.
Now, let's have a look at the AS_Path attribute inside the confederation. Let's check how the Loopback 100.100.100.1/24 from R1 is seen by R6:
R6#show ip bgp
BGP table version is 5, local router ID is 6.6.6.6
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i100.100.100.0/24 2.2.2.2                  0    100      0 (64990) 1111 i  
*> 166.166.166.0/24 0.0.0.0                  0         32768 i
R6#
R6#show ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 5
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  (64990) 1111
    2.2.2.2 (metric 31) from 5.5.5.5 (5.5.5.5)
      Origin IGP, metric 0, localpref 100, valid, confed-internal, best
You see that? The AS_Path is now (64990) 1111. Inside the confederation, the AS_Path grows following the eBGP rule: every sub-AS the announcement passes through is prepended to the AS_Path attribute, shown in parentheses (these are AS_CONFED_SEQUENCE segments).
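A Python sketch of how the path above is built, assuming a simplified AS_PATH stored as a list of hypothetical `(asn, is_confed)` pairs (leftmost = most recent) instead of real path segments:

```python
def prepend_on_cbgp(as_path, local_sub_as):
    """Prepend the sender's sub-AS as a confederation entry (sent over cBGP)."""
    return [(local_sub_as, True)] + as_path

def show_path(as_path):
    """Render the path like `show ip bgp`: confederation ASNs in parentheses."""
    out, confed = [], []
    for asn, is_confed in as_path:
        if is_confed:
            confed.append(str(asn))
            continue
        if confed:
            out.append("(" + " ".join(confed) + ")")
            confed = []
        out.append(str(asn))
    if confed:
        out.append("(" + " ".join(confed) + ")")
    return " ".join(out)

# R1 -> R2 is eBGP: R1 prepends AS 1111
path_at_r2 = [(1111, False)]
# R2 -> R4 is iBGP inside 64990: the AS_Path is unchanged
path_at_r4 = path_at_r2
# R4 -> R5 is cBGP: R4 prepends its sub-AS 64990
path_at_r5 = prepend_on_cbgp(path_at_r4, 64990)
# R5 -> R6 is iBGP inside 64991: unchanged again
print(show_path(path_at_r5))  # (64990) 1111
```

The same functions reproduce R2's view of 166.166.166.0/24 further down: `show_path(prepend_on_cbgp([], 64991))` gives `(64991)`.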
Well, nice. Let’s see how the Loopback 166.166.166.66/24 from R6 is seen by R2:
R2#sh ip bgp
BGP table version is 3, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.100.0/24 20.20.12.1               0             0 1111 i
*>i166.166.166.0/24 6.6.6.6                  0    100      0 (64991) i
R2#
R2#sh ip bgp 166.166.166.0/24
BGP routing table entry for 166.166.166.0/24, version 3
Paths: (1 available, best #1, table default)
  Advertised to update-groups:
     8
  (64991)
    6.6.6.6 (metric 31) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, confed-internal, best
And now, by R1:
R1#sh ip bgp
BGP table version is 8, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.100.0/24 0.0.0.0                  0         32768 i
*> 166.166.166.0/24 20.20.12.2                             0 2222 i
R1#
R1#sh ip bgp 166.166.166.0/24
BGP routing table entry for 166.166.166.0/24, version 8
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  2222
    20.20.12.2 from 20.20.12.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, best
As we can see, R1 receives the prefix from ASN 2222 only. The whole internal topology of the confederation is hidden from R1.
In the following table we can find how the different attributes of BGP are seen when working with confederations:
Feature | Seen in the confederation
Peering | There is no need for full-mesh peering between sub-autonomous systems. Within each sub-AS, full-mesh peering is required (or route-reflectors are used).
Communications between peers | iBGP is used within each sub-AS. cBGP is used between sub-autonomous systems; it is similar to eBGP, with these differences: the AS_Path attribute is enhanced, and next-hop handling changes.
Additions to the BGP attributes | The AS_Path attribute is enhanced with the sub-AS IDs. This enhancement is not advertised to external autonomous systems.
Handling of the next-hop attribute | Although communication between sub-autonomous systems is eBGP-like, the next-hop attribute is preserved and passed on, unlike with eBGP.
Handling of the Local Preference attribute | Although communication between sub-autonomous systems is eBGP-like, the Local Preference attribute is preserved and passed on, unlike with eBGP.
Handling of the MED attribute | Although communication between sub-autonomous systems is eBGP-like, the MED attribute is preserved and passed on, unlike with eBGP.
Readvertising a learned prefix | Because the protocol between sub-autonomous systems is like eBGP, prefixes learned from another sub-AS can be readvertised to other sub-autonomous systems if they are selected as best.
Communications with non-member BGP peers | If a member of the confederation peers with a BGP router in another AS, the sub-AS numbers are suppressed from the AS_Path attribute and only the confederation number is passed in it.
Use of the multihop parameter | Might be needed. cBGP is a derivative of eBGP, so if the peer IP address is not the neighbor's interface IP address, the multihop setting will be needed.
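The attribute rows of the table can be condensed into one Python sketch that follows the 166.166.166.0/24 prefix from R6 to R1. It is a simplified model under stated assumptions: the `Route` record and `advertise` function are hypothetical, the AS_Path is a tuple of `(asn, is_confed)` pairs, and MED is dropped towards the true eBGP peer because R1's output above shows no metric for that prefix:

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    local_pref: Optional[int]
    med: Optional[int]
    as_path: Tuple[Tuple[int, bool], ...]  # (asn, is_confed), leftmost = newest

def advertise(route, session, local_sub_as, confed_id, my_address):
    """Attributes a confederation member sends to a peer (simplified sketch)."""
    if session == "cBGP":
        # Next hop, LOCAL_PREF and MED are kept; only the sub-AS is prepended
        return replace(route, as_path=((local_sub_as, True),) + route.as_path)
    if session == "eBGP":
        # Sub-AS entries are stripped and the confederation ID is prepended;
        # the next hop becomes this router, LOCAL_PREF and MED are not sent
        public = tuple(seg for seg in route.as_path if not seg[1])
        return replace(route, next_hop=my_address, local_pref=None, med=None,
                       as_path=((confed_id, False),) + public)
    raise ValueError(f"unexpected session type: {session}")

r6_route = Route("166.166.166.0/24", "6.6.6.6", 100, 0, ())
at_r4 = advertise(r6_route, "cBGP", 64991, 2222, "5.5.5.5")  # R5 -> R4
# R4 -> R2 is plain iBGP, so R2 holds at_r4 unchanged
at_r1 = advertise(at_r4, "eBGP", 64990, 2222, "20.20.12.2")  # R2 -> R1
print(at_r4.next_hop, at_r4.as_path)  # 6.6.6.6 ((64991, True),)
print(at_r1.next_hop, at_r1.as_path)  # 20.20.12.2 ((2222, False),)
```

This matches the outputs above: R2 keeps next hop 6.6.6.6 with path (64991), while R1 only sees 20.20.12.2 and AS 2222.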

Things to remember

Some things useful to remember when facing confederations:
  • Confederations are another alternative to the iBGP full-mesh topology.
  • Inside each sub-AS, all the routers must be in a full mesh (or in a route-reflector topology), and the iBGP rules apply.
  • Between sub-ASes, the default eBGP TTL of 1 applies, so either raise it with ebgp-multihop 2 or build the session with the IP of the directly connected interface.
  • The next-hop attribute is not changed, and is passed on to the next sub-AS peer.
  • The AS_Path attribute is modified inside the confederation with the sub-AS numbers. When announcing to an external AS, the sub-AS numbers are suppressed and only the ASN of the whole confederation is added.

BGP Route Reflector

One of the peculiarities of BGP is its rule of not announcing to the iBGP neighbors what has been learned from another iBGP peer.
BGP is a path-vector protocol. It's very similar to a distance-vector protocol, but it can manage thousands of prefixes very efficiently, and it has some interesting features, such as communities (see Susana's post Influencing BGP path selection with the extcommunity cost attribute).
In order to avoid routing loops, BGP uses the AS_PATH attribute. Every time a router announces a prefix to an external BGP peer, it prepends its Autonomous System Number (ASN) to the AS_PATH. So the AS_PATH attribute shows the ASNs that traffic will pass through before reaching the destination. When a router receives a prefix announcement with its own ASN in the AS_PATH, it rejects that announcement.
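A minimal Python sketch of this eBGP loop check, with the AS_PATH modeled as a plain list of ASNs (leftmost = most recent) and hypothetical private ASNs 65001 to 65003:

```python
def announce_ebgp(as_path, my_asn):
    """Prepend my ASN before sending the prefix to an external peer."""
    return [my_asn] + as_path

def receive_ebgp(as_path, my_asn):
    """Accept an eBGP announcement only if my ASN is not already in the AS_PATH."""
    return my_asn not in as_path

# AS 65001 originates a prefix, AS 65002 passes it on towards AS 65003
path = announce_ebgp(announce_ebgp([], 65001), 65002)
print(path)                       # [65002, 65001]
print(receive_ebgp(path, 65003))  # True: no loop
print(receive_ebgp(path, 65001))  # False: the announcement came back to its origin
```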
That works well with eBGP sessions. But how can BGP detect possible routing loops in iBGP announcements? The rule of “a route learned from an iBGP neighbor cannot be advertised to another iBGP peer” applies here.
That’s why we need to build a full-mesh topology when configuring iBGP on a network. This means we have to configure n*(n-1)/2 BGP sessions in the network, where n stands for the number of BGP routers. In a small network this could be feasible, but in a big network, configuration and operation of such a full mesh network become unbearable.
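The growth is easy to check numerically. This Python sketch compares the full mesh with a design where one of the n routers is a route-reflector and all the others are its clients (a single RR is an assumption made only for the comparison):

```python
def full_mesh_sessions(n):
    """iBGP sessions needed for a full mesh of n routers: n*(n-1)/2."""
    return n * (n - 1) // 2

def single_rr_sessions(n):
    """Sessions when one of the n routers is an RR and the others are its clients."""
    return n - 1

for n in (5, 10, 100):
    print(n, full_mesh_sessions(n), single_rr_sessions(n))
# 5 10 4
# 10 45 9
# 100 4950 99
```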
Full Mesh BGP
There are 2 alternative topologies for this situation:
  • Route-Reflectors
  • Confederations
Let’s go deeper into Route-Reflectors today.

Route-Reflectors

Route-Reflectors (RRs) break the rule mentioned above. A router acting as an RR announces to other iBGP peers whatever it has learned from its iBGP clients. A full mesh is therefore no longer needed: it is enough for each router to have an iBGP session with the RR, thereby becoming one of its clients:
Route-Reflector Topology
The steps to configure an RR are as follows:
RR#sh run | b router bgp
router bgp 1
 no synchronization
 bgp router-id 1.1.1.1
 neighbor 2.2.2.2 remote-as 1
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 2.2.2.2 route-reflector-client
 neighbor 2.2.2.2 send-community both
The command neighbor 2.2.2.2 route-reflector-client is the one that makes this router act as a route-reflector. As we can see, route reflection is configured per neighbor, so a router could act as an RR for some routers and as a normal peer for others.
The command neighbor 2.2.2.2 send-community both is necessary if we want to transmit BGP communities with the prefixes.
The RR’s behavior is summarized in 3 points:
  • Any prefix learned from an eBGP peer is announced to every iBGP peer, whether or not it is an RR client.
  • Any prefix learned from an iBGP nonclient peer is announced to every eBGP peer and to iBGP client peers.
  • Any prefix learned from an iBGP client peer is announced to every eBGP peer and to every iBGP peer (client and non-client peers)
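The three rules fit in a small Python function. The peer kinds "ebgp", "client" and "nonclient" are labels chosen for this sketch, and it only covers the best path the RR has selected:

```python
def rr_advertises(learned_from, send_to):
    """Does a route-reflector send a best path learned from `learned_from` to `send_to`?

    Peer kinds (labels for this sketch): "ebgp", "client" (iBGP RR client),
    "nonclient" (ordinary iBGP peer).
    """
    if learned_from == "ebgp":
        return True                           # eBGP routes go to every peer
    if learned_from == "nonclient":
        return send_to in ("ebgp", "client")  # never to other non-clients
    if learned_from == "client":
        return True                           # reflected to clients and non-clients
    raise ValueError(f"unknown peer kind: {learned_from}")

kinds = ("ebgp", "nonclient", "client")
for src in kinds:
    print(src, "->", [dst for dst in kinds if rr_advertises(src, dst)])
```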
There is something interesting about the third point: a prefix learned from an iBGP client is reflected by the route-reflector to every iBGP peer, including the one that originated the route.
There are also some interesting topologies where two routers with an iBGP session between them serve as RR to a third router.
To avoid routing loops in these two scenarios, route-reflector topologies need a new technique: attaching some attributes to the reflected prefixes.
Whenever an iBGP prefix is reflected, the route reflector appends two optional, non-transitive attributes to the BGP prefix:
  • Originator ID: the router ID of the iBGP peer from which the prefix was received, that is, the router that originated it inside the AS. It is set only the first time the prefix is reflected; later reflections leave it unchanged.
  • Cluster-list: every RR is assigned a Cluster-ID. When the prefix is reflected, the Cluster ID of the RR is added to the Cluster-list.
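A Python sketch of what a reflection does to these two attributes. The `ReflectedRoute` record is hypothetical, and the second RR with cluster ID 9.9.9.9 is invented only to show a second reflection:

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class ReflectedRoute:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()

def reflect(route, received_from_router_id, my_cluster_id):
    """Attributes a route-reflector attaches when it reflects an iBGP route."""
    # ORIGINATOR_ID is set only by the first RR; later RRs keep it
    originator = route.originator_id or received_from_router_id
    # Each RR prepends its own cluster ID to the CLUSTER_LIST
    return replace(route, originator_id=originator,
                   cluster_list=(my_cluster_id,) + route.cluster_list)

first = reflect(ReflectedRoute("33.33.33.0/24"), "3.3.3.3", "1.1.1.1")
print(first)
# ReflectedRoute(prefix='33.33.33.0/24', originator_id='3.3.3.3', cluster_list=('1.1.1.1',))
second = reflect(first, "1.1.1.1", "9.9.9.9")  # hypothetical second RR
print(second.originator_id, second.cluster_list)  # 3.3.3.3 ('9.9.9.9', '1.1.1.1')
```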
In the following output we can see in R2 the prefix 33.33.33.0/24 reflected from the RR. This prefix belongs to router 3.3.3.3:
R2#sh ip bgp 33.33.33.0
BGP routing table entry for 33.33.33.0/24, version 9
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Local
    3.3.3.3 (metric 21) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 3.3.3.3, Cluster list: 1.1.1.1
The RR sets 3.3.3.3 as Originator (because that is the router that announced the prefix) and adds 1.1.1.1 to the Cluster list (the RR's router ID, which is used as the cluster ID when none is configured).
With these two new attributes, routers can detect if there is a routing loop in the announced prefix:
  • When a client peer sees its router-id in the Originator ID of a prefix, it rejects the prefix.
  • When an RR gets a prefix, it checks the cluster-list: if it finds its cluster ID in the list, it rejects the prefix.
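Both checks can be expressed as one Python function. This is a simplified sketch with a hypothetical signature; `my_cluster_id` is `None` for a router that is not a route-reflector:

```python
def accept_reflected(my_router_id, my_cluster_id, originator_id, cluster_list):
    """Loop checks applied to a received reflected route (simplified sketch)."""
    if originator_id == my_router_id:
        return False  # the route came back to the router that originated it
    if my_cluster_id is not None and my_cluster_id in cluster_list:
        return False  # the route already passed through this RR's cluster
    return True

# The 33.33.33.0/24 route from the output above
print(accept_reflected("2.2.2.2", None, "3.3.3.3", ("1.1.1.1",)))       # True at R2
print(accept_reflected("3.3.3.3", None, "3.3.3.3", ("1.1.1.1",)))       # False at R3
print(accept_reflected("1.1.1.1", "1.1.1.1", "3.3.3.3", ("1.1.1.1",)))  # False at the RR
```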
Although these mechanisms ensure loop-free selection, RFC4456 added new route selection rules that improve the convergence and reduce the amount of BGP announcements propagated across the AS. The most interesting one is:
  • Prefixes with shorter cluster-list attribute are preferred
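A Python sketch of where this rule sits, assuming all earlier best-path steps (weight, local preference, AS_PATH length, and so on) have already tied. Following RFC 4456, the ORIGINATOR_ID replaces the peer's router ID in the router-ID step, then the shorter cluster list wins, then the lowest peer address. The `Candidate` record and the second RR at 1.1.1.2 are hypothetical:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Candidate:
    peer_address: str    # address of the iBGP peer the path was heard from
    peer_router_id: str  # BGP identifier of that peer
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()

def final_tie_break(candidates):
    """Last tie-break steps, assuming every earlier best-path step tied."""
    def key(c):
        rid = c.originator_id or c.peer_router_id  # ORIGINATOR_ID stands in for the router ID
        return (int(ipaddress.IPv4Address(rid)),
                len(c.cluster_list),               # shorter CLUSTER_LIST preferred
                int(ipaddress.IPv4Address(c.peer_address)))
    return min(candidates, key=key)

# An RR hears 33.33.33.0/24 directly from client 3.3.3.3 and also reflected
# by another RR (hypothetical address 1.1.1.2, cluster ID 1.1.1.2)
direct = Candidate("3.3.3.3", "3.3.3.3")
reflected = Candidate("1.1.1.2", "1.1.1.2", "3.3.3.3", ("1.1.1.2",))
print(final_tie_break([reflected, direct]).peer_address)  # 3.3.3.3
```

Without the cluster-list step, the lower peer address 1.1.1.2 would have won; the shorter cluster list keeps the direct path.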
Before RFC 4456, it was necessary to configure the same Cluster-ID on both RRs when two RRs were set up. Otherwise, some routing loops could take place. Although this topology offered redundancy if one RR failed, it could lead to partial connectivity in some cases. Just consider what could happen if a client loses its connection to one of the RRs in the cluster: that RR would get the prefix from the other RR, and it would reject it because its cluster ID is already in the cluster list:
Route-Reflector Topology with partial connectivity
With the new rules in RFC 4456, this is no longer necessary. Now you may set up two route-reflectors with different cluster IDs without any routing loop taking place. Because each RR prefers prefixes with a shorter cluster list, the two RRs can have different cluster IDs.
Something we have to take into consideration when working with RRs is that every RR runs the BGP best-path selection when it receives two different paths for the same prefix, and announces only its choice to its clients. That's why some companies designed their networks with a hierarchical route-reflector topology. But that's up to you, network designers ;)

Conclusions

In order to configure a router as route-reflector, we do it on a per-neighbor basis, and the command to use is:
router bgp 1
 neighbor 2.2.2.2 route-reflector-client
Then, the RR adds two attributes to the reflected prefixes to avoid routing loops: Originator ID (the router ID of the router that originated the prefix) and Cluster-List (a list of the Cluster-IDs the prefix has passed through).