Monday, 11 July 2016

BGP Route Reflector

One of the peculiarities of BGP is its rule of not announcing to the iBGP neighbors what has been learned from another iBGP peer.
BGP is a path-vector protocol. It’s very similar to a distance-vector protocol, but with the capability of managing thousands of prefixes in a very effective way, and with some interesting features, such as communities (check Susana’s post, Influencing BGP path selection with the extcommunity cost attribute).
In order to avoid routing loops, BGP uses the AS_PATH attribute. Every time a router announces a prefix to an external BGP peer, it prepends its Autonomous System Number (ASN) to the AS_PATH. So the AS_PATH attribute lists the ASNs that the announcement has crossed, which are the same ASNs that traffic will pass through before reaching the destination. When a router receives a prefix announcement with its own ASN in the AS_PATH, it rejects that announcement.
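The AS_PATH check can be sketched in a few lines of Python. This is only an illustration of the rule, not router code: the function names and the private ASNs (65001 to 65004) are made up for the example.

```python
def advertise_ebgp(as_path, local_asn):
    """Return the AS_PATH sent to an external peer: the local ASN is prepended."""
    return [local_asn] + list(as_path)


def accept_ebgp(as_path, local_asn):
    """Reject an announcement whose AS_PATH already contains our own ASN."""
    return local_asn not in as_path


# A prefix originated in AS 65001 is announced 65001 -> 65002 -> 65003.
path = advertise_ebgp([], 65001)
path = advertise_ebgp(path, 65002)
path = advertise_ebgp(path, 65003)

print(path)                      # AS_PATH as seen by the next eBGP peer
print(accept_ebgp(path, 65004))  # a new AS accepts the announcement
print(accept_ebgp(path, 65001))  # the originating AS sees itself and rejects it
```

Inside a single AS the ASN never changes, so this check has nothing to work with on iBGP sessions.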
That works well with eBGP sessions. But how can BGP detect possible routing loops in iBGP announcements? The rule of “a route learned from an iBGP neighbor cannot be advertised to another iBGP peer” applies here.
That’s why we need to build a full-mesh topology when configuring iBGP on a network. This means we have to configure n*(n-1)/2 BGP sessions in the network, where n stands for the number of BGP routers. In a small network this could be feasible, but in a big network, the configuration and operation of such a full mesh become unmanageable.
Full Mesh BGP
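To get a feeling for how fast the full mesh grows, here is a small Python sketch of the n*(n-1)/2 formula. The comparison with a single route reflector (one session per client) is a simplified assumption that ignores redundancy; the function names are hypothetical.

```python
def full_mesh_sessions(n):
    """iBGP sessions needed for a full mesh of n routers: n*(n-1)/2."""
    return n * (n - 1) // 2


def single_rr_sessions(n):
    """Sessions needed when one of the n routers is an RR and the rest are its clients."""
    return n - 1


for n in (4, 10, 50, 100):
    print(n, full_mesh_sessions(n), single_rr_sessions(n))
```

With 100 routers the full mesh needs 4950 sessions, while a single RR needs 99.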
There are two alternatives to the full mesh:
  • Route-Reflectors
  • Confederations
Let’s go deeper into Route-Reflectors today.

Route-Reflectors

Route-Reflectors (RR) break the mentioned rule. The routers acting as RRs will announce to other iBGP peers whatever they have learned from their iBGP clients. So a full-mesh topology is no longer needed. It’s enough for each router to have an iBGP session with the RR, thus becoming a client of the RR:
Route-Reflector Topology
The steps to configure an RR are as follows:
RR#sh run | b router bgp
router bgp 1
 no synchronization
 bgp router-id 1.1.1.1
 neighbor 2.2.2.2 remote-as 1
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 2.2.2.2 route-reflector-client
 neighbor 2.2.2.2 send-community both
The command “neighbor 2.2.2.2 route-reflector-client” is the one that makes this router act as a route-reflector. As we can see, the route-reflector role is configured per neighbor, so we could have a router acting as RR for some routers, and as a normal peer for some others.
The command “neighbor 2.2.2.2 send-community both” is necessary if we want to send BGP communities (standard and extended) along with the prefixes.
The RR’s behavior is summarized in 3 points:
  • Any prefix learned from an eBGP peer is announced to every iBGP peer, regardless of whether it is a client or not.
  • Any prefix learned from an iBGP non-client peer is announced to every eBGP peer and to iBGP client peers.
  • Any prefix learned from an iBGP client peer is announced to every eBGP peer and to every iBGP peer (client and non-client peers).
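The three points above can be written down as a small lookup. This Python sketch only models the iBGP side (which kinds of iBGP peers receive the route), and the constant and function names are invented for the illustration.

```python
EBGP, CLIENT, NONCLIENT = "ebgp", "client", "non-client"


def rr_ibgp_targets(source_type):
    """Kinds of iBGP peers to which an RR announces a route, by type of source peer."""
    if source_type == EBGP:
        return {CLIENT, NONCLIENT}   # point 1: every iBGP peer
    if source_type == NONCLIENT:
        return {CLIENT}              # point 2: only clients
    if source_type == CLIENT:
        return {CLIENT, NONCLIENT}   # point 3: every iBGP peer
    raise ValueError(f"unknown peer type: {source_type}")


def plain_ibgp_targets(source_type):
    """Same question for a normal iBGP speaker: iBGP routes are never passed on."""
    return {CLIENT, NONCLIENT} if source_type == EBGP else set()


for src in (EBGP, NONCLIENT, CLIENT):
    print(src, sorted(rr_ibgp_targets(src)), sorted(plain_ibgp_targets(src)))
```

The only case where the RR behaves like a normal iBGP speaker is between two non-clients: a route learned from a non-client is never sent to another non-client.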
There is something interesting about the third point: a prefix learned from an iBGP client is reflected by the route-reflector to every iBGP peer, which can include the client that originated the route.
There are also some interesting topologies where two routers with an iBGP session between them serve as RR to a third router.
In order to avoid routing loops in these two scenarios, route-reflector topologies need an additional mechanism: the RR attaches some extra attributes to the reflected prefixes.
Whenever an iBGP prefix is reflected, the route reflector appends two optional, non-transitive attributes to the BGP prefix:
  • Originator ID: it’s the router ID of the iBGP peer that originated the prefix inside the AS. When the prefix is reflected for the first time, the router ID of the peer it was received from is copied into the Originator ID attribute. An RR that receives a prefix already carrying an Originator ID leaves it unchanged.
  • Cluster-list: every RR is assigned a Cluster-ID. When the prefix is reflected, the Cluster ID of the RR is prepended to the Cluster-list.
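As a rough sketch of what happens to these two attributes at each reflection, the following Python models a route as a small record. The Route class, the reflect function and the second RR with Cluster ID 9.9.9.9 are hypothetical and only there to illustrate the behavior.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    originator_id: Optional[str] = None   # absent until the first reflection
    cluster_list: Tuple[str, ...] = ()    # empty until the first reflection


def reflect(route, sender_router_id, cluster_id):
    """Return the copy of `route` that an RR with `cluster_id` sends on."""
    # The Originator ID is set only once, on the first reflection.
    originator = route.originator_id or sender_router_id
    # The RR's own Cluster ID goes at the front of the Cluster-list.
    return replace(route, originator_id=originator,
                   cluster_list=(cluster_id,) + route.cluster_list)


original = Route("33.33.33.0/24")
first = reflect(original, sender_router_id="3.3.3.3", cluster_id="1.1.1.1")
second = reflect(first, sender_router_id="1.1.1.1", cluster_id="9.9.9.9")

print(first)
print(second)
```

After the second reflection the Originator ID is still 3.3.3.3, and the Cluster-list shows both RRs with the most recent one first.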
In the following output we can see in R2 the prefix 33.33.33.0/24 reflected from the RR. This prefix belongs to router 3.3.3.3:
R2#sh ip bgp 33.33.33.0
BGP routing table entry for 33.33.33.0/24, version 9
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Local
    3.3.3.3 (metric 21) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 3.3.3.3, Cluster list: 1.1.1.1
The RR adds the IP 3.3.3.3 as Originator (because that is the router ID of the router that announces the prefix) and the IP 1.1.1.1 to the Cluster-list (that’s the RR’s router ID, which is used as its Cluster ID by default).
With these two new attributes, routers can detect if there is a routing loop in the announced prefix:
  • When a client peer sees its router-id in the Originator ID of a prefix, it rejects the prefix.
  • When an RR gets a prefix, it checks the cluster-list: if it finds its cluster ID in the list, it rejects the prefix.
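Both checks fit in one short function. The Python below is a minimal sketch using the router IDs from the output above; the function name and its parameters are assumptions made for the example, and a router that is not an RR is modeled with cluster_id=None.

```python
def accept_reflected(router_id, cluster_id, originator_id, cluster_list):
    """Apply the two loop checks to a reflected prefix.

    cluster_id is None for a router that is not acting as RR.
    """
    if originator_id == router_id:
        return False   # our own prefix came back to us
    if cluster_id is not None and cluster_id in cluster_list:
        return False   # the prefix already went through our cluster
    return True


# Prefix 33.33.33.0/24 as reflected by the RR: Originator 3.3.3.3, Cluster list 1.1.1.1
originator, clusters = "3.3.3.3", ["1.1.1.1"]

print(accept_reflected("2.2.2.2", None, originator, clusters))       # R2 accepts
print(accept_reflected("3.3.3.3", None, originator, clusters))       # R3 rejects
print(accept_reflected("1.1.1.1", "1.1.1.1", originator, clusters))  # the RR rejects
```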
Although these mechanisms ensure loop-free route selection, RFC 4456 added new route selection rules that improve convergence and reduce the amount of BGP announcements propagated across the AS. The most interesting one is:
  • Prefixes with shorter cluster-list attribute are preferred
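As an illustration of this rule, the Python below picks between two copies of the same prefix by Cluster-list length. It assumes that every earlier step of the BGP decision process has ended in a tie, and the dictionary fields and router IDs are made up for the example.

```python
def prefer_shorter_cluster_list(paths):
    """Pick the path with the shortest Cluster-list (earlier BGP steps assumed tied)."""
    return min(paths, key=lambda p: len(p["cluster_list"]))


paths = [
    # Copy that went through two RRs before reaching us
    {"from": "2.2.2.2", "cluster_list": ["2.2.2.2", "1.1.1.1"]},
    # Copy reflected only once
    {"from": "1.1.1.1", "cluster_list": ["1.1.1.1"]},
]

best = prefer_shorter_cluster_list(paths)
print(best["from"])
```

The copy that was reflected fewer times wins, so a path that has bounced between RRs loses against the more direct one.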
Before RFC 4456, it was necessary to configure the same Cluster-ID on both RRs when two RRs were set up. Otherwise, some routing loops could take place. Although this topology offered redundancy if one RR failed, it could lead to partial connectivity in some cases. Just check what could happen if a client loses its connection to one of the RRs in the cluster: this RR would get the prefix from the other RR, and it would reject it because its own cluster-id is already in the cluster-list:
Route-Reflector Topology with partial connectivity
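The partial connectivity case can be reproduced with the Cluster-list check alone. In this Python sketch the two RRs and their Cluster IDs (10.0.0.1 and 10.0.0.2) are hypothetical; the client has lost its session to the second RR, so that RR can only learn the client’s prefix through the first RR.

```python
def reflect_cluster_list(own_cluster_id, cluster_list):
    """Cluster-list after reflection: the RR prepends its own Cluster ID."""
    return [own_cluster_id] + list(cluster_list)


def rr_accepts(own_cluster_id, cluster_list):
    """An RR rejects any prefix whose Cluster-list already holds its own Cluster ID."""
    return own_cluster_id not in cluster_list


# RR1 reflects the client's prefix towards RR2.
sent_by_rr1 = reflect_cluster_list("10.0.0.1", [])

# Same Cluster ID on both RRs: RR2 drops the only copy it can get.
print(rr_accepts("10.0.0.1", sent_by_rr1))

# Different Cluster IDs: RR2 keeps the copy and still knows the prefix.
print(rr_accepts("10.0.0.2", sent_by_rr1))
```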
With the new rules in RFC 4456, this is no longer necessary. Now you may set up two route-reflectors with different cluster-ids without any routing loop taking place, because each RR will prefer prefixes with a shorter cluster-list.
Something we have to take into consideration when working with RRs is that every RR makes its own BGP decision when it receives two different paths for the same prefix, and it announces only its choice to its clients. That’s why some companies design their networks with a hierarchical route-reflector topology. But that’s up to you, network designers ;)

Conclusions

In order to configure a router as route-reflector, we do it on a per-neighbor basis, and the command to use is:
router bgp 1
 neighbor 2.2.2.2 route-reflector-client
Then, the RR adds two attributes to the reflected prefixes to avoid routing loops. These attributes are Originator ID (the router ID of the router that originated the prefix) and Cluster-List (a list with the Cluster-IDs that the prefix has passed through).
