BGP Path Selection

  • Best Path Selection:
    • By default, BGP chooses a single best path.
    • Best path installed in RIB/FIB
    • Advertised to other BGP peers.
    • Decision process
      • RFC 4271
  • Prerequisites:
    • Next hop must be reachable via the routing table.
      • Prevents route recursion failure.
    • Synchronization rule must be met or disabled.
    • AS-Path must not contain local-AS
      • Standard EBGP Loop prevention
      • Can be relaxed with ‘neighbor allowas-in’
    • First ASN in the AS-Path must be the EBGP neighbor’s ASN
      • ‘bgp enforce-first-as’ command.
  • Path Selection Order
    • Weight
      • Cisco proprietary
      • Locally significant
      • Higher value is preferred
    • Local Preference
      • Higher value is preferred
      • Not advertised to EBGP peers.
      • Carried through confederation EBGP.
    • Locally Originated
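      • Prefers paths this router originated (network, redistribute, or aggregate-address).
      • Locally originated paths also get a weight of 32768 by default.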
    • AS-Path
      • Smaller length is preferred
    • Origin
      • IGP over EGP over Incomplete
        • IGP origination is from network statement.
        • Incomplete is from redistribution.
          • Network statement preferred over redistribution.
    • MED
      • Smaller value preferred.
      • By default, only compared between paths from the same neighboring AS.
        • Typically used to compare the same route received from the same provider over multiple links.
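The prerequisites above act as a filter that runs before any attribute is compared. Below is a minimal Python sketch of that filter, assuming synchronization is disabled; the Path fields, ASNs, and reachable next-hop set are hypothetical, and this is a study model rather than Cisco’s implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Path:
    prefix: str
    next_hop: str
    as_path: List[int] = field(default_factory=list)
    from_ebgp: bool = False
    neighbor_as: Optional[int] = None   # ASN of the EBGP neighbor, if any


def is_valid(path: Path, local_as: int, reachable: Set[str],
             allowas_in: int = 0, enforce_first_as: bool = True) -> bool:
    """Prerequisite check before best-path comparison (synchronization assumed off)."""
    # Next hop must resolve through the routing table.
    if path.next_hop not in reachable:
        return False
    # EBGP loop prevention: the local ASN may appear at most `allowas_in`
    # times; 0 models the default (allowas-in not configured).
    if path.as_path.count(local_as) > allowas_in:
        return False
    # enforce-first-as: the leftmost ASN must be the EBGP neighbor's ASN.
    if (enforce_first_as and path.from_ebgp and path.as_path
            and path.as_path[0] != path.neighbor_as):
        return False
    return True


# Hypothetical path from an EBGP neighbor in AS 810 (plain ASNs for simplicity).
p = Path("192.168.1.0/24", "10.8.8.1", [810, 200], from_ebgp=True, neighbor_as=810)
print(is_valid(p, local_as=100, reachable={"10.8.8.1"}))  # True
```

With allowas_in=0 a path containing the local ASN is dropped; passing allowas_in=1 models ‘neighbor allowas-in 1’ accepting one occurrence.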

Tie Breakers:

  • EBGP over iBGP
    • If learned from EBGP, it’s not your prefix.
    • EBGP always preferred.
  • IGP Metric to Next-Hop
    • Can use multi-path if all equal after this step.
      • Hidden command allows multipath when AS_PATHs differ but have the same length:
      • ‘bgp bestpath as-path multipath-relax’
  • Additional Tie Breakers:
    • Oldest path (EBGP paths only)
    • Lowest router ID (originator ID for reflected routes)
    • Shortest cluster list
    • Lowest Neighbor Address
  • Exceptions:
    • ‘bgp bestpath as-path ignore’
      • Skips the AS-Path length comparison.
    • ‘bgp always-compare-med’
      • Compares MED between paths from different neighboring autonomous systems.
    • ‘bgp bestpath med missing-as-worst’
      • Assign 4,294,967,294 to NULL MED
    • ‘bgp deterministic-med’
      • Groups paths by neighboring AS and compares MED within each group, regardless of the order in which paths were received.
    • IGP Metric
      • ‘bgp bestpath igp-metric ignore’
        • IOS-XE 3.4S
    • Router-ID
      • ‘bgp bestpath compare-routerid’
        • Skips the oldest-path step and compares router IDs for identical EBGP paths.
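The comparison order from the two lists above can be modeled as a pairwise comparison between two candidate paths. This is a simplified Python sketch, not Cisco’s code: the Path fields are hypothetical, ‘arrival’ stands in for path age, and the exception commands listed above are not modeled.

```python
from dataclasses import dataclass
from functools import reduce
from ipaddress import IPv4Address
from typing import Any, Callable, List, Optional

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # IGP < EGP < Incomplete


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    local: bool = False          # locally originated
    as_path_len: int = 0
    origin: str = "i"
    med: int = 0
    neighbor_as: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    arrival: int = 0             # lower = received earlier (older)
    router_id: str = "0.0.0.0"
    cluster_len: int = 0
    neighbor: str = "0.0.0.0"


Key = Callable[[Path], Any]


def _first(a: Path, b: Path, keys: List[Key]) -> Optional[Path]:
    """Walk the keys in order; the lower key value wins. None means a tie."""
    for key in keys:
        ka, kb = key(a), key(b)
        if ka != kb:
            return a if ka < kb else b
    return None


def better(a: Path, b: Path) -> Path:
    winner = _first(a, b, [
        lambda p: -p.weight,              # higher weight
        lambda p: -p.local_pref,          # higher local preference
        lambda p: not p.local,            # locally originated
        lambda p: p.as_path_len,          # shorter AS_PATH
        lambda p: ORIGIN_RANK[p.origin],  # IGP < EGP < Incomplete
    ])
    # MED is only compared when both paths come from the same neighboring AS.
    if winner is None and a.neighbor_as == b.neighbor_as:
        winner = _first(a, b, [lambda p: p.med])
    if winner is None:
        winner = _first(a, b, [
            lambda p: not p.ebgp,         # EBGP over iBGP
            lambda p: p.igp_metric,       # lowest IGP metric to next hop
        ])
    # Oldest path is only used when both paths are EBGP.
    if winner is None and a.ebgp and b.ebgp:
        winner = _first(a, b, [lambda p: p.arrival])
    if winner is None:
        winner = _first(a, b, [
            lambda p: IPv4Address(p.router_id),  # lowest router ID
            lambda p: p.cluster_len,             # shortest cluster list
            lambda p: IPv4Address(p.neighbor),   # lowest neighbor address
        ])
    return winner if winner is not None else a


def best_path(paths: List[Path]) -> Path:
    """Compare candidates pairwise in the order they were received."""
    return reduce(better, paths)
```

Because the comparison is pairwise in arrival order and MED is skipped between paths from different neighboring ASes, the winner can depend on the order in which paths were received; ‘bgp deterministic-med’ exists to remove that order dependence.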

Manipulating Best Path Selection

  • Outbound routing policy affects inbound traffic
  • Inbound routing policy affects outbound traffic
  • Longest-prefix match overrides all BGP path attributes
    • Affects both directions.
  • Attributes for influencing outbound path selection:
    • Weight and Local Pref
      • Set inbound
      • Affects outbound traffic
  • Attributes for influencing inbound path selection:
    • AS-Path and MED
      • Set outbound
      • Affects inbound traffic
  • Multipath Load Balancing
    • Multipath load balancing across external links with unequal bandwidth (DMZ link bandwidth)
      • Enabled for IPv4, IPv6, VPNv4, VRF AF
      • For iBGP, eBGP, eiBGP
    • Still only one best path advertised to peers.
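With unequal-bandwidth multipath, traffic is shared in proportion to each exit link’s bandwidth rather than evenly. The arithmetic is sketched below in Python; the link names and bandwidth values are hypothetical, and actual forwarding typically approximates these ratios rather than hitting them exactly.

```python
from typing import Dict


def bandwidth_shares(link_bw_kbps: Dict[str, int]) -> Dict[str, float]:
    """Fraction of traffic per exit link, proportional to its bandwidth."""
    total = sum(link_bw_kbps.values())
    if total == 0:
        raise ValueError("at least one link needs non-zero bandwidth")
    return {link: bw / total for link, bw in link_bw_kbps.items()}


# Hypothetical 1 Gbps and 100 Mbps uplinks.
shares = bandwidth_shares({"uplink-A": 1_000_000, "uplink-B": 100_000})
for link, share in shares.items():
    print(f"{link}: {share:.1%}")   # uplink-A: 90.9%, uplink-B: 9.1%
```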

BGP Conditional Route Injection

  • Originates more specific subnets from an aggregate for traffic engineering.
    • Relies on longest prefix match.

Configuration:

  • Inject Map
    • More specific subnet to advertise.
    • ‘set ip address prefix-list <list>’
  • Exist Map
    • Aggregate to be originated from.
    • ‘match ip address prefix-list <list>’
    • ‘match ip route-source prefix-list <list>’

In the image below, R2, R3, and R5 are all in ASN 100, and R8 and R10 are in ASN 810.1. R5, R8, and R10 are running EBGP between the two ASNs.

R5 is going to summarize ASN 100 transit links 10.30.2.0 and 10.30.3.0 into ASN 810.1.

Both R8 and R10 now show 10.30.0.0/16 for the transit links within ASN 100. To advertise a more specific route that falls within the summary address, conditional route injection is needed; this is a valid traffic-engineering method. Currently, if R10 tries to reach 10.30.2.0/30, it takes the EBGP path directly to R5, because BGP best-path selection prefers an EBGP-learned path over an iBGP-learned one (EBGP routes also carry a lower AD, 20 vs. 200). Injecting a more specific route into BGP on R8 steers R10’s traffic to R8 first, because the longest prefix match always wins.
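The longest-prefix-match effect described above can be shown with a short Python sketch using the standard ipaddress module. R10’s table here is a hypothetical two-entry view after the /30 has been injected on R8; it is not output from the lab.

```python
import ipaddress
from typing import Dict, Optional


def longest_match(dest: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the most specific prefix in `routes` that contains `dest`."""
    addr = ipaddress.ip_address(dest)
    matches = [p for p in routes if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)


# Hypothetical view of R10's table after injection on R8.
r10_routes = {
    "10.30.0.0/16": "EBGP path to R5 (aggregate)",
    "10.30.2.0/30": "iBGP path via R8 (injected)",
}

for dest in ("10.30.2.1", "10.30.3.1"):
    prefix = longest_match(dest, r10_routes)
    print(dest, "->", prefix, "->", r10_routes[prefix])
```

Traffic to 10.30.2.1 follows the injected /30 via R8, while 10.30.3.1 still follows the /16 aggregate directly to R5.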

Configuration:

  • Create Prefix Lists for Aggregate/Summary, Route neighbor/source, and more specific route.

Prefix-list AGGREGATE is the summarized route from R5.

Prefix-list ROUTE-SOURCE is the neighbor advertising the route inbound to ASN 810.1.

Prefix-list Transit_2 is the more specific prefix we want to advertise into BGP/down to R10.

  • Create Route-Maps for ‘INJECT_MAP’ and ‘EXIST_MAP’.

‘INJECT_MAP’ defines the more specific route, ‘Transit_2’, to be sent into BGP domain.

‘EXIST_MAP’ specifies what already exists, i.e., the existing aggregate (10.30.0.0/16) and the neighbor advertising the aggregate via EBGP.

  • BGP ASN statement

‘bgp inject-map INJECT_MAP exist-map EXIST_MAP’ ties everything above together in the global BGP process. In a lab, clearing the BGP neighbors might be necessary, and the change can take a while to take effect.
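The condition that the exist-map and inject-map express together can be sketched as below. This is a simplified Python model under stated assumptions: the BGP table is reduced to a prefix-to-advertising-neighbor mapping, 10.8.8.2 as the route source is a hypothetical value, attributes (including copy-attributes) are ignored, and injected prefixes are assumed to have to fall inside the aggregate.

```python
import ipaddress
from typing import Dict, List


def injected_prefixes(bgp_table: Dict[str, str], aggregate: str,
                      route_source: str, specifics: List[str]) -> List[str]:
    """Prefixes the inject-map would originate, given the exist-map conditions."""
    # Exist-map: the aggregate must be in the BGP table, learned from route_source.
    if bgp_table.get(aggregate) != route_source:
        return []
    agg = ipaddress.ip_network(aggregate)
    # Inject-map: only more specifics that fall inside the aggregate.
    return [p for p in specifics if ipaddress.ip_network(p).subnet_of(agg)]


table = {"10.30.0.0/16": "10.8.8.2"}   # hypothetical: aggregate learned from R5
print(injected_prefixes(table, "10.30.0.0/16", "10.8.8.2", ["10.30.2.0/30"]))
```

If the aggregate disappears or is learned from a different neighbor, nothing is injected, which is what keeps the more specific route tied to the real state of the aggregate.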

BGP-NLRI

  • Network Layer Reachability Information.
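    • In the CCIE context, the prefixes exchanged between BGP peers.
  • Uses UPDATE messages to exchange NLRI; withdrawals are carried in the Withdrawn Routes field of UPDATE messages (there is no separate WITHDRAW message).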
  • BGP NLRI Origination.
    • ‘network’ statement.
      • Requires exact match in routing table.
    • ‘redistribute’ statement.
      • Won’t include OSPF External by default.
    • ‘aggregate-address’ statement.
      • Requires at least one component subnet in the BGP table.
    • ‘bgp inject-map’
      • Opposite of aggregation command.

Network Statement:

  • Originates prefixes with ORIGIN of IGP (i)
  • Requires exact match in RIB.
    • Does not need to be connected, can be learned via IGP.
  • Assumes classful if mask keyword not used.
  • Sets weight to 32768.
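The exact-match rule and the classful default can be sketched in Python with the ipaddress module. This is a hypothetical model, not IOS code: the RIB is a set of prefix strings, and the returned dictionary only records the ORIGIN and weight values listed above.

```python
import ipaddress
from typing import Optional, Set


def classful_network(addr: str) -> ipaddress.IPv4Network:
    """Network implied when no mask keyword is given."""
    first = int(addr.split(".")[0])
    if first < 128:
        length = 8        # class A
    elif first < 192:
        length = 16       # class B
    elif first < 224:
        length = 24       # class C
    else:
        raise ValueError("class D/E addresses are not valid here")
    return ipaddress.ip_network(f"{addr}/{length}", strict=False)


def network_statement(network: str, rib: Set[str],
                      mask: Optional[str] = None) -> Optional[dict]:
    """Model of 'network <network> [mask <mask>]': exact RIB match required."""
    prefix = (ipaddress.ip_network(f"{network}/{mask}") if mask
              else classful_network(network))
    if str(prefix) not in rib:
        return None       # no exact match in the RIB, nothing is originated
    return {"prefix": str(prefix), "origin": "i", "weight": 32768}
```

For example, with 10.30.1.0/30 in the RIB, ‘network 10.30.0.0 mask 255.255.0.0’ originates nothing, because a covering /16 is not an exact match.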

BGP Redistribute Statement:

  • Originates prefixes with ORIGIN INCOMPLETE (?).
  • Originates classful summary if auto-summary enabled.
  • Auto copies IGP metric to BGP MED.
  • Won’t include OSPF External by default.
    • ‘redistribute ospf <process> match internal external’
      • Required to redistribute OSPF external.
  • Sets weight to 32768.
  • ‘bgp redistribute internal’
    • By default only external BGP routes are redistributed into IGP with redistribution.
    • Command allows internal BGP routes to be redistributed into IGP.
    • Can result in routing loop.
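The default OSPF-into-BGP behavior above can be modeled as a filter on OSPF route types. This Python sketch is hypothetical: route types use ‘show ip route’ style labels, NSSA external routes are not modeled, and the metric is copied to MED as described above.

```python
from typing import Dict, List, Tuple

INTERNAL = {"O", "O IA"}          # intra-area and inter-area
EXTERNAL = {"O E1", "O E2"}       # NSSA N1/N2 not modeled


def redistribute_ospf(ospf_routes: Dict[str, Tuple[str, int]],
                      match_external: bool = False) -> List[dict]:
    """Model of 'redistribute ospf <id> [match internal external]'."""
    allowed = INTERNAL | (EXTERNAL if match_external else set())
    return [
        {"prefix": prefix, "origin": "?", "med": metric, "weight": 32768}
        for prefix, (route_type, metric) in ospf_routes.items()
        if route_type in allowed
    ]
```

With the default call, an ‘O E2’ route is left out; passing match_external=True mirrors adding ‘match internal external’.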

BGP Conditional Advertisement:

  • ‘neighbor <address> advertise-map <map1> {exist-map | non-exist-map} <map2>’
  • Advertises prefix matched in advertise-map
    • If prefix matched in non-exist map does not exist.
    • If prefix matched in exist map does exist.
  • Advertise map
    • Prefix to be conditionally advertised.
    • match ip address prefix-list <list>
  • Exist / non-exist map
    • Prefix being tracked in the BGP table.
    • match ip address prefix-list <list>
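The advertise decision reduces to a presence check of the tracked prefix in the BGP table. In this hypothetical Python sketch, 203.0.113.0/24 stands in for a prefix learned from the primary provider.

```python
from typing import Set


def advertise(bgp_table: Set[str], tracked_prefix: str, mode: str) -> bool:
    """Should the advertise-map prefix be sent? mode is 'exist' or 'non-exist'."""
    present = tracked_prefix in bgp_table
    if mode == "exist":
        return present
    if mode == "non-exist":
        return not present
    raise ValueError(f"unknown mode: {mode}")


# Backup-provider pattern: advertise only while the primary's prefix is missing.
primary_up = {"203.0.113.0/24", "10.30.0.0/16"}
primary_down = {"10.30.0.0/16"}
print(advertise(primary_up, "203.0.113.0/24", "non-exist"))    # False
print(advertise(primary_down, "203.0.113.0/24", "non-exist"))  # True
```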

The image below shows six routers all running BGP. R1, R2, R3 and R5 are all in ASN 100, while R8 and R10 are in ASN 810.1. R5 and R8 are EBGP peers.

Transit /30 subnets are set up for each link in ASN 100: 10.30.1.0, 10.30.2.0, and 10.30.3.0. These transit subnets are advertised in the local IGP and in BGP to ASN 810.1, as seen in R8’s routing table.

If we wanted to summarize these three subnets into one advertisement, the ‘aggregate-address’ command can be used under the BGP process on R5.

This results in one route showing under R8’s route table.

Similar to EIGRP, summarization in BGP can be performed nearly anywhere. The ‘summary-only’ keyword makes the summary the only route advertised; the more specific prefixes are suppressed.
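The effect of ‘aggregate-address 10.30.0.0 255.255.0.0 summary-only’ on what R5 advertises can be sketched as below. This Python model is hypothetical: the BGP table is a list of prefixes, and attribute handling (AS_SET, atomic-aggregate) is ignored.

```python
import ipaddress
from typing import Iterable, List


def advertised(bgp_table: Iterable[str], summary: str,
               summary_only: bool = False) -> List[str]:
    """Prefixes sent to peers after 'aggregate-address <summary> [summary-only]'."""
    table = set(bgp_table)
    agg = ipaddress.ip_network(summary)
    components = {p for p in table
                  if ipaddress.ip_network(p) != agg
                  and ipaddress.ip_network(p).subnet_of(agg)}
    if not components:
        return sorted(table)      # no component route, so no aggregate
    out = table | {summary}
    if summary_only:
        out -= components         # suppress the more specifics
    return sorted(out)


r5_table = ["10.30.1.0/30", "10.30.2.0/30", "10.30.3.0/30", "192.168.10.0/24"]
print(advertised(r5_table, "10.30.0.0/16", summary_only=True))
```

Without summary_only, the /16 is added alongside the three /30s; with it, R8 sees only the /16 plus unrelated prefixes.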

BGP NLRI

  • Network Layer Reachability Information
    • Network Statement
      • MUST BE EXACT MATCH!
    • Redistribute Statement
      • Won’t include OSPF E1/2 by default.
    • Aggregate Address Statement
      • Requires one subnet in BGP Table first.
    • BGP inject-map
      • Opposite of aggregation.

BGP Network Statement:

  • Originates prefixes with ORIGIN of IGP (i)
  • Requires exact match in the routing table
    • can be learned via IGP.
  • No mask keyword assumes classful.
  • Sets weight to 32768

BGP Redistribute Statement:

  • Originates prefixes with ORIGIN INCOMPLETE (?)
  • Originates classful summary if auto-summary is enabled.
  • Auto copies IGP metric to BGP MED.
  • Won’t include OSPF External by default.
    • redistribute ospf <id> match internal external
  • Sets weight to 32768.

BGP Redistribute Internal:

  • By default, only external BGP routes are redistributed into IGP with redistribution.
  • ‘bgp redistribute internal’ allows internal BGP routes to be redistributed into IGP.
  • Can result in a routing loop if not careful.

BGP Conditional Advertisement:

  • Advertises prefix matched in advertise-map.
    • If prefix matched in non-exist map does not exist
    • If prefix matched in exist-map does exist.
  • Typically used to track failure of a transit link.
    • Advertise to backup provider, only if primary provider is down.

BGP Conditional Route Injection:

  • Originates subnets from aggregate for purpose of longest match traffic engineering.
  • ‘bgp inject-map <inject-map> exist-map <exist-map> [copy-attributes]’
    • Inject map
      • Subnet to be advertised
      • set ip address prefix-list <list>
    • Exist Map
      • Aggregate to be originated from
      • match ip address prefix-list <list>
      • match ip route-source prefix-list <list>

BGP Next-Hop Processing

  • BGP needs an IGP to perform route recursion.
  • A router does not modify the next hop when advertising to iBGP neighbors.
    • Can be modified with ‘neighbor <address> next-hop-self’
      • Update source IP is used as next hop.
    • Next hop can be modified using a route map as well.
      • Applied inbound (updates being received) or outbound.
      • Configuration:
        • neighbor <address> route-map <name> in | out
        • Uses the ‘set ip next-hop’ clause
  • Next-hop-self on edge router
    • Pros:
      • Same next hop on all outbound updates to iBGP peers.
      • Peers share the same dynamic update group.
      • No need to include external links in the IGP.
    • Cons:
      • Hinders fast convergence after an external uplink failure.
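The default next-hop rules above, and the effect of next-hop-self, can be summarized in a small Python function. This is a sketch under simplifying assumptions: 192.168.8.1 as R8’s loopback/update-source is hypothetical, and third-party next hops, ‘next-hop-unchanged’, and route-map overrides are not modeled.

```python
def outbound_next_hop(received_next_hop: str, peer_type: str,
                      update_source: str, next_hop_self: bool = False) -> str:
    """Next hop a router places in an UPDATE it sends to a peer."""
    if peer_type == "ebgp":
        return update_source            # EBGP: rewrite to own update-source address
    if peer_type == "ibgp":
        # iBGP: keep the received next hop unless next-hop-self is configured.
        return update_source if next_hop_self else received_next_hop
    raise ValueError(f"unknown peer type: {peer_type}")


# R8 relaying a route learned from R5 (next hop 10.8.8.2) to iBGP peer R10.
print(outbound_next_hop("10.8.8.2", "ibgp", "192.168.8.1"))                      # 10.8.8.2
print(outbound_next_hop("10.8.8.2", "ibgp", "192.168.8.1", next_hop_self=True))  # 192.168.8.1
```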

Changing ‘next hop self’ with route-map:

R8 and R10 are iBGP peers in AS 810.1, and R5 is an EBGP peer. R10 receives BGP routes from R5 via R8, but the next hops stay as is: 10.8.8.2, R5’s address on the transit link. If R10 does not know how to reach that transit link, it will never install routes from R5 into its local RIB. Next-hop-self can be achieved in several ways; this example uses a route map.

R8 advertises itself as the next hop through the route-map method. Again, there is no functional difference between this and the normal ‘next-hop-self’ under the BGP process. Also, the next hop will be whatever address the ‘update-source’ is set to.

BGP Confederations

  • Migration to confederations is difficult because the BGP process has to be deleted and re-created under the sub-AS.
  • Configuration:
    • ‘router bgp <sub-as>’
    • ‘bgp confederation identifier <main-as>’
    • ‘bgp confederation peers <sub-as1> <sub-as2> ...’
      • List only the sub-ASes this router peers with.

Example configuration:

In the example above, the confederation is entirely within AS 100, and the sub-AS peer is 65005.

R1 runs EBGP (within the confederation) to R5 and iBGP to R6. The confederation configuration is very similar to regular BGP, with two extra commands: the identifier and the peers. The configuration on R3 and R2 is the same except for their sub-AS numbers. R3 is below:

192.168.10.1 is L0 of R5

After this configuration on R1, R2, R3, and R5, there is still no BGP adjacency between the different sub-ASes, i.e., from R5 to R1, R3, and R2. This is because they are now technically running EBGP between each other. EBGP peers must be directly connected by default, whereas iBGP uses a default TTL of 255. One fix is ‘neighbor <address> disable-connected-check’.

After the above command is entered in both R1 and R5, adjacency comes up.

Another fix is EBGP multihop. On R3 we’ll run the command below, and on R5 for the other side.

If no TTL value is given, ‘ebgp-multihop’ defaults to 255; the command above sets it to 2.

After entering the command ‘neighbor 192.168.10.1 ebgp-multihop 2’ on both sides, adjacency comes up very quickly. The same thing will be done on R2 to R5.

Now on R5 after neighbor adjacencies are up, ‘show ip bgp’ will show the sub-AS numbers in the path to reach different destinations.

If we look at the ‘normal’ EBGP peering to R8 however, the path to these destinations all appear to be coming from AS 100.
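The difference between what R5 and R8 display can be sketched as a transformation of the AS_PATH at the confederation boundary. In this hypothetical Python model, sub-AS 65001 and external AS 200 are illustrative values; AS_CONFED_SET segments are not modeled, and the confederation ID is simply prepended.

```python
from typing import List, Tuple

Segment = Tuple[str, List[int]]   # ("confed", [...]) or ("seq", [...])


def show_path(path: List[Segment]) -> str:
    """Render like 'show ip bgp': confederation segments in parentheses."""
    parts = []
    for seg_type, asns in path:
        text = " ".join(str(a) for a in asns)
        parts.append(f"({text})" if seg_type == "confed" else text)
    return " ".join(parts)


def export_outside(path: List[Segment], confed_id: int) -> List[Segment]:
    """Strip confederation segments and prepend the confederation ID."""
    kept = [seg for seg in path if seg[0] != "confed"]
    return [("seq", [confed_id])] + kept


inside = [("confed", [65001]), ("seq", [200])]
print(show_path(inside))                       # (65001) 200
print(show_path(export_outside(inside, 100)))  # 100 200
```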

iBGP Route Reflector

  • Eliminates need for full mesh
    • Peering is only needed to RR
    • Similar to OSPF DR
      • Minimizes prefix replication.
      • One update to RR, RR sends to clients.
      • No modification of attributes when reflecting routes.
  • Loop Prevention:
    • Performed through Cluster ID.
      • An RR discards routes whose CLUSTER_LIST already contains its own cluster ID.
    • RR sets the ORIGINATOR_ID attribute to the router ID of the route’s originator (e.g., the RR client).
      • A router that receives a route carrying its own router ID as ORIGINATOR_ID discards it.
  • RR Peering
    • EBGP Peers
    • iBGP Client Peers
    • iBGP Non-Client Peers
  • RR Updates
    • Processes differently depending on type of neighbor/peer.
    • EBGP
      • Passes to EBGP peers, clients, & non-clients.
    • Client learned routes
      • Pass to EBGP Peers, clients, & non-clients.
    • Non-Client learned routes
      • Pass to EBGP Peers and clients.
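The reflection rules and the two loop-prevention checks above can be sketched in Python. Peer names, peer types, and router/cluster IDs below are hypothetical, and the sketch assumes the route being evaluated is already the RR’s best path.

```python
from typing import Dict, List, Optional


def reflect_to(learned_from: str, peers: Dict[str, str]) -> List[str]:
    """Peers an RR sends a best path to; peer types are ebgp/client/nonclient."""
    if peers[learned_from] == "nonclient":
        allowed = {"ebgp", "client"}
    else:                                   # learned from EBGP or from a client
        allowed = {"ebgp", "client", "nonclient"}
    # Simplification: never send the route back to the peer it came from.
    return sorted(p for p, t in peers.items() if t in allowed and p != learned_from)


def accept(route: dict, router_id: str, cluster_id: Optional[str] = None) -> bool:
    """Loop checks on a reflected route (cluster_id is None on a non-RR)."""
    if route.get("originator_id") == router_id:
        return False                        # this router originated it
    if cluster_id is not None and cluster_id in route.get("cluster_list", []):
        return False                        # already passed through this cluster
    return True


peers = {"R8": "ebgp", "R1": "client", "R2": "client", "R6": "nonclient", "R7": "nonclient"}
print(reflect_to("R6", peers))   # ['R1', 'R2', 'R8']
```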

Large Design

  • Should not use single RR
  • RR Cluster allows for redundancy and hierarchy
  • RRs in same cluster use same cluster-id.
  • Performed via address family
    • no fate sharing
      • ex. IPv4 RR vs. IPv6 RR
  • Inter-cluster peering
    • Can be client or non-client peerings
      • Depends on design
  • Cluster-ID
    • Based on router-id
      • By default, each RR forms its own separate cluster.

Virtual Route Reflectors:

  • RR does not need to be in data-path
    • Traffic does not have to be forwarded through the RR; it may exist only to distribute NLRI.
  • No need to install routes in RIB/FIB if not in data path.
  • Selective RIB
    • Prevents NLRI from being installed in RIB and FIB.
    • ‘table-map <route-map> filter’
    • Can scale to millions of VPNv4 routes.
    • Reflection is occurring, no routing table install.

BGP Confederation:

  • Minimizes iBGP full mesh because it splits autonomous system into multiple autonomous systems.
    • Full mesh remains within sub-AS.
    • Sub-AS to Sub-AS acts like eBGP.
  • Confederation unknown to devices outside of confederation.
  • Typically uses private AS
  • Can use different IGPs within each Sub-AS.
  • Next-hop, Local-pref, and MED are kept across Sub-AS EBGP peerings.
  • Confederation Loop Prevention
    • AS_CONFED_SEQUENCE
    • AS_CONFED_SET
    • Never leaves confederation

RR vs. Confederation:

  • Both accomplish same thing.
  • Migration paths are different.
    • Migrating to a confederation is difficult.
      • A greenfield confederation is much easier.
    • Migrating to RR is easy: add the new peers and remove the old ones.

Basic Route Reflector Config:

Below is a set of routers all running as iBGP peers to R5 in AS 100.

Each router has a loopback IP associated with their hostname.

R5 = 5.5.5.5/24
R1 = 1.1.1.1/24
R2 = 2.2.2.2/24
R3 = 3.3.3.3/24
R4 = 4.4.4.4/24
R6 = 6.6.6.6/24

R3 is connected to R4 with the subnet 172.34.0.0/24, which is advertised into BGP, and R5 is receiving that route (ignore the RIB failure: OSPF is running in the background, and its AD of 110 is preferred over iBGP’s 200).

All of the other iBGP peers will not receive this route because of the default iBGP loop-prevention rule. To allow R5 to pass these routes on to additional iBGP peers, it needs to function as a route reflector.

After adding the following command, ‘neighbor <neighbor ip> route-reflector-client’, the routers neighboring with R5 are now receiving the advertised route from R3.

R2 receiving route advertised from R3:

iBGP Full Mesh

  • iBGP loop prevention
    • filters routes by default
      • Routes learned via iBGP cannot be passed on to additional iBGP neighbors.
      • Requires full mesh or confederation.
  • Full mesh advantage:
    • All BGP peers learn all paths.
    • All BGP peers know closest egress.
    • Hot Potato Routing.
  • Full mesh disadvantage:
    • Does not scale, too many peerings.
  • Partial mesh
    • Full mesh, route reflectors, and confederations can interoperate.
    • advantageous for scaling
      • can use RR and pockets of full mesh.
  • Route Reflector
    • RFC 4456
    • RR summarizes routing information and only reflects best path.
    • The RR selects its best path using its own IGP metric to the next hop, which may differ from a client’s view.
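The scaling argument is simple arithmetic: a full mesh of n iBGP speakers needs n(n-1)/2 sessions, while clients of a route reflector need one session per RR. The Python sketch below assumes every client peers with every RR and that the RRs are fully meshed with each other; the router counts are illustrative.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed for a full mesh of n routers."""
    return n * (n - 1) // 2


def rr_sessions(clients: int, reflectors: int = 1) -> int:
    """Each client peers with every RR; the RRs are fully meshed with each other."""
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (6, 20, 100):
    print(f"{n} routers: full mesh {full_mesh_sessions(n)}, "
          f"1 RR {rr_sessions(n - 1)}, 2 RRs {rr_sessions(n - 2, reflectors=2)}")
```

For the six-router route reflector example above (R5 as RR with five clients), that is 15 sessions for a full mesh versus 5 with a single RR.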

Peering iBGP vs. EBGP

iBGP:

  • Packets default to TTL 255
    • Neighbors do not have to be connected as long as IGP reachability exists.
  • Peers typically peer via Loopbacks
    • ‘neighbor x.x.x.x update-source l0’
    • Allows rerouting around failed paths via IGP.
    • Required for things like MPLS L3VPN.
  • Loop Prevention
    • iBGP learned routes cannot be advertised on to another iBGP neighbor.
    • BGP requires the following:
      • Full Mesh
      • Route Reflectors
      • Confederation
  • Next Hop Processing
    • Outbound iBGP updates do not modify next-hop regardless of iBGP type.
    • Modifying next hop:
      • ‘neighbor next-hop-self’
      • Route map
        • action – set next-hop
      • IOS 15.1(1)SY – next-hop-self ALL

EBGP:

  • Packets default to TTL 1
    • Can be modified if needed.
    • ‘neighbor ebgp-multihop <ttl>’
    • ‘neighbor ttl-security hops <ttl>’
  • Single hop peers must be directly connected by default.
    • Can be modified if directly connected neighbors peer via Loopbacks.
    • neighbor disable-connected-check.
  • Loop Prevention
    • AS-Path
      • Local ASN is prepended to outbound updates.
      • Inbound updates containing local ASN are discarded.
      • Can be modified with the following:
        • ‘neighbor allowas-in’
        • ‘neighbor as-override’
  • Next Hop Processing
    • Outbound EBGP updates set the next hop to the local update-source address by default.
    • Modification
      • Route map
        • action – set next-hop
      • ‘neighbor next-hop-unchanged’

General:

  • BGP next hop controls IGP route recursion.
    • BGP knows the next hop but not the outgoing interface.
      • BGP is not a routing protocol by itself.
    • IGP must be able to perform recursion otherwise route cannot be used.
    • Result of failed recursion means route does not get installed into RIB.
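Recursion can be sketched as a longest-match lookup of the BGP next hop in the IGP routing table. This Python sketch is hypothetical: the RIB maps prefixes to outgoing interface names, a single lookup is assumed to be enough, and the addresses loosely mirror the R10 example later in these notes.

```python
import ipaddress
from typing import Dict, Optional


def resolve_next_hop(next_hop: str, igp_rib: Dict[str, str]) -> Optional[str]:
    """Return the outgoing interface for a BGP next hop via longest match, or None."""
    addr = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(p) for p in igp_rib
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None        # recursion failed: the BGP route is not installed
    best = max(matches, key=lambda net: net.prefixlen)
    return igp_rib[str(best)]


# Hypothetical R10 RIB: OSPF knows R8's loopback but not the 10.8.8.0/30 transit.
rib = {"192.168.8.0/24": "Gi0/1"}
print(resolve_next_hop("10.8.8.2", rib))   # None
```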

BGP Peering

Route advertisements will differ depending on whether the advertisement is occurring over iBGP or EBGP. Some of the important differences have to do with next hops and iBGP full mesh.

Below are three routers running BGP. R8 and R10 are running iBGP in ASN 810.1, while R8 and R5 are running EBGP, with R5 in ASN 100. R8 and R10 run OSPF for internal loopback connectivity. R5 is advertising its loopback network 192.168.10.0/24 to R8, and R10 is advertising a loopback network of 192.168.1.0/24.

R5:

R10:

When R5 tries pinging R10’s advertised loopback, it will not succeed, even though R5 has a valid route via BGP in its routing table.

On R10, ‘show ip bgp’ shows no ‘>’ (best path) marker next to R5’s route, which means it was not selected as best and was not added to the routing table.

Drilling into this even further, BGP shows that the next hop (10.8.8.2, R5) is not accessible in R10’s routing table.

This shows how routes are advertised to iBGP neighbors: R8 passes the EBGP-learned route one hop further into its autonomous system but does not change the next hop. This is a problem for iBGP neighbors that have no route to the transit link into the other AS.

One solution is BGP ‘next-hop-self’. On R8, this is configured under the ‘router bgp’ process.

The command is entered per neighbor (‘neighbor <address> next-hop-self’). This alone will not change routes that were already advertised; a route refresh needs to occur.

‘clear ip bgp * out’ performs an outbound soft refresh of the BGP table without taking the adjacency down.

Now on R10 we’ll see the route added to the routing table and the next hop advertised as R8’s internal Loopback address that R10 is peering with.

Below is an image with another BGP neighbor in the mix, R1. R1 is going to be in AS 100 and an iBGP peer with R5.

R1 and R5 have reachability via OSPF and they’re both peering over their Loopback0 IP addresses.

R1 L0 – 100.100.100.1/24
R5 L0 – 192.168.10.1/24

After the peers are up here’s what a ‘show ip bgp’ looks like on R1:

All looks good except for prefix 192.168.1.0, a subnet back on R10 in AS 810.1. The reason is again the next hop, 10.8.8.1, which is not reachable from R1. One fix is to configure next-hop-self on R5 toward R1; the other option is to inject the transit route into the OSPF domain so the IGP underneath BGP provides reachability.

On R5 we can see that 10.8.8.0 is a directly connected network, and all we need to do is add that interface into OSPF, or redistribute connected.

On R5 the 10.8.8.0 subnet is on GigabitEthernet0/3, and below is a route-map matching on that interface.

The route-map will then get added into the OSPF process with redistribution. This will allow only what’s in the route-map to get redistributed.

Now on R1 the 10.8.8.0 network is visible via OSPF, and the BGP table shows R10’s loopback subnet has made it to the RIB.

  • Main Takeaway:
    • By default, when an EBGP router advertises a route to a remote AS, the next hop seen in the remote AS is the advertising router’s address.
      • If this route is advertised further via iBGP, the next hop does not change; it remains the IP of the remote AS router.
        • Fixes are advertising the transit path into an IGP, or configuring next-hop-self on the AS edge router toward its iBGP peers.