- Nexus vPC Terminology
- vPC Peers
- The two switches joined in a vPC domain to complete the vPC architecture.
- Peer Link
- The link between the two peers that syncs state.
- Creates the single logical control plane in regard to port channels.
- Forwards BPDUs and LACP packets from secondary peer to primary peer.
- Syncs IGMP and MAC tables between the two peers.
- Transport between peers for FHRP traffic, orphaned ports, and multicast.
- Most important links in architecture.
- Peer keepalive
- Operates at Layer 3.
- Works behind the scenes of the peer link: when the peer link fails, the keepalive determines whether the peer has gone completely down or only the peer link has failed.
- Not used for any data syncing.
- Can use management interfaces or OOB.
- Member Port
- Port that is part of a vPC on peer switch.
- Orphan Port
- Port connected to a device that is either not part of a port channel, or part of a port channel that has failed.
- Single port connection.
TCP Selective Acknowledgement (SACK) is a TCP option that works as an alternative segment recovery/retransmit to the normal cumulative acknowledgements and 3 duplicate ACKs rule. If both client/server support the option then one will see SACK Permitted in a packet capture during the initial three way handshake.
When a segment is lost during a TCP connection the sender will continue transmitting data in line with the allowed sender window. When SACK is enabled the receiver sends its normal cumulative ACK for the data received in order and appends the SACK option, telling the sender which blocks of data have been received beyond the missing segment. Each block is described by its Left Edge (the first sequence number of the block) and its Right Edge (the sequence number immediately after the last byte of the block). This tells the TCP sender that the missing data lies between the cumulative ACK number and the left edge of the SACK block. The SACK option is more efficient over the wire because the sender does not retransmit segments that have already reached the receiver. Without SACK, a sender that knows only the cumulative ACK may end up retransmitting segments beyond the missing one that the receiver already holds.
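As a rough illustration of how a sender could work out what to retransmit, the sketch below takes the cumulative ACK number and a list of SACK blocks and returns the gaps. The function name `missing_ranges` and the byte values are made up for the example; real TCP stacks track this in their own scoreboard structures.

```python
def missing_ranges(cumulative_ack, sack_blocks):
    """Return the byte ranges the sender still needs to retransmit.

    cumulative_ack: next byte the receiver expects (the normal ACK number).
    sack_blocks: list of (left_edge, right_edge) pairs, where right_edge is
                 the sequence number just past the last byte of the block.
    """
    gaps = []
    expected = cumulative_ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            # Everything between what we expected and this block is missing.
            gaps.append((expected, left))
        expected = max(expected, right)
    return gaps


# 1000-byte segments; the segment starting at byte 2000 was lost,
# but the two segments after it arrived.
print(missing_ranges(2000, [(3000, 5000)]))  # [(2000, 3000)]
```

Only the range 2000 to 2999 needs to be resent; the data in the SACK block is left alone.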
Fast Retransmit/Fast Recovery:
When a TCP connection has been established and TCP segments are being sent/received, the two involved nodes keep track of how much data has been sent through Sequence and Acknowledgment numbers. If the receiver finds an out of order segment has arrived (wrong sequence number from sender) then it will immediately send a duplicate ACK back to the sender. This duplicate ACK occurs to let the sender know about the out of order segment.
The sender will not know immediately whether the duplicate ACK is due to reordering of the segments/stream or to traffic loss over the connection. The sender in this scenario waits until it receives 3 duplicate ACKs from the receiver before deeming there is actual segment loss. Once this happens the sender sets the Slow Start Threshold (ssthresh) to half the current send window, the Congestion Window (CWND), and then retransmits the missing segment. When the segment has been retransmitted the sender sets CWND to the current ssthresh + 3 maximum segment sizes (MSS). The addition of 3 MSS inflates the window to account for the three segments that have already reached the receiver's buffer. From here on every additional duplicate ACK received by the sender adds 1 MSS to the window. The sender will also transmit a new segment if the CWND allows it.
When the next ACK arrives to the sender that is for actual new data and not duplicate, the sender sets the CWND to the current ssthresh. Per previous notes, if CWND and ssthresh are the same then the TCP connection is in the congestion avoidance phase again. In the Congestion Avoidance phase the CWND is increased by 1 segment only if ALL segments in the window have been acknowledged by the receiver. The Congestion Avoidance phase runs to find out the capacity of the connection.
- 3 duplicate ACKs are received and Fast Retransmit is kicked off.
- Set ssthresh to half the current send window and retransmit the missing segment.
- Sender sets Congestion Window to the value of ssthresh plus 3 MSS for segments already in the receiver's buffer.
- Add 1 MSS for each additional duplicate ACK, and transmit a new segment if allowed by the Congestion Window.
- Once an ACK for new data arrives at the sender, set Congestion Window to the same value as ssthresh and continue in the Congestion Avoidance phase.
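The steps above can be traced with a few lines of Python. This is only a sketch of the window arithmetic, not a TCP implementation: `MSS`, the function name, and the starting window are assumed values, and the floor of 2 MSS on ssthresh comes from RFC 5681 rather than from these notes.

```python
MSS = 1460  # assumed segment size in bytes


def fast_recovery(cwnd, extra_dup_acks=0):
    """Trace cwnd through the listed steps as (event, cwnd, ssthresh) tuples."""
    trace = []

    # Third duplicate ACK: halve the window into ssthresh, retransmit.
    ssthresh = max(cwnd // 2, 2 * MSS)
    trace.append(("3 dup ACKs, retransmit", cwnd, ssthresh))

    # Inflate by 3 MSS for the segments already in the receiver's buffer.
    cwnd = ssthresh + 3 * MSS
    trace.append(("inflate", cwnd, ssthresh))

    # Each further duplicate ACK adds one more MSS.
    for _ in range(extra_dup_acks):
        cwnd += MSS
        trace.append(("dup ACK", cwnd, ssthresh))

    # ACK for new data: deflate back to ssthresh, resume congestion avoidance.
    cwnd = ssthresh
    trace.append(("new ACK, deflate", cwnd, ssthresh))
    return trace


for event, cwnd, ssthresh in fast_recovery(20 * MSS, extra_dup_acks=2):
    print(f"{event:24} cwnd={cwnd // MSS:>2} MSS  ssthresh={ssthresh // MSS} MSS")
```

Starting from a 20 MSS window, the trace goes 20, 13, 14, 15 and ends at 10 MSS, which is where Congestion Avoidance picks up.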
TCP Congestion Avoidance works alongside Slow Start to make sure a TCP connection is not being overrun with too much data. After a connection is established the Slow Start mechanism begins and the sender window increases exponentially. Each acknowledgement the sender receives grows the window by one segment, so the sending window doubles every round trip.
- SENDER –> 1 Segment
- RECEIVER –> 1 ACK
- SENDER –> 2 Segments
- RECEIVER –> 2 ACKs
- SENDER –> 4 Segments
- RECEIVER –> 4 ACKs
- SENDER –> 8 Segments
- RECEIVER –> 8 ACKs
A value created when establishing a TCP connection is the Slow Start Threshold (ssthresh). The ssthresh is used by the sender to know whether the TCP connection is in the Slow Start phase or the Congestion Avoidance phase. If the ssthresh is greater than the Congestion Window (CWND) then the TCP connection is in the Slow Start phase. The Slow Start phase continues exponential growth (double the number of segments each round trip) unless there is data loss on the connection. If the ssthresh is smaller than the CWND, then the TCP connection is in the Congestion Avoidance phase. In the Congestion Avoidance phase the CWND is increased by 1 segment only once ALL segments in the window have been acknowledged by the receiver, which is roughly once per round trip. The idea is to slow the rate of increase so the connection stabilizes and uses the proper amount of available bandwidth.
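To make the two growth phases concrete, here is a small sketch that tracks CWND in whole segments per round trip with no loss. The function name and the numbers are illustrative assumptions, and capping the doubling at ssthresh is a simplification.

```python
def cwnd_per_round_trip(initial_cwnd, ssthresh, rounds):
    """Return cwnd (in segments) at the start of each round trip, assuming no loss."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow Start: one extra segment per ACK doubles the window per round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion Avoidance: one extra segment per round trip.
            cwnd += 1
    return history


print(cwnd_per_round_trip(1, 16, 8))  # [1, 2, 4, 8, 16, 17, 18, 19]
```

The window doubles until it reaches the threshold of 16 segments and then creeps up by one segment per round trip.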
The sender discovers loss/congestion over the connection either by receiving 3 duplicate ACKs or through a retransmission timeout (no ACK received before the timer expires). On a retransmission timeout, CWND is decreased to 1 MSS and the Slow Start process begins again. On 3 duplicate ACKs, implementations that support Fast Retransmit/Fast Recovery halve the window instead of restarting from 1 MSS.
- There are different versions of TCP that are more beneficial for LFNs, Long Fat Networks (high bandwidth combined with high latency).
- The initial ssthresh varies by implementation. Some set it arbitrarily high (effectively infinite), meaning the CWND keeps growing in Slow Start until loss occurs or the receiver window becomes the limit.
TCP throughput has a lot to do with how much unacknowledged data can be sent by a sender before pausing transmission and waiting on ACKs from the receiver. In a TCP connection the two factors in how much un-ACK'd data can be sent are the sender Congestion Window (CWND) and the Receiver Window (RWND). The CWND is calculated from the connection's agreed-upon Maximum Segment Size (MSS), and the RWND is advertised by the receiving node, first during the three-way handshake and then in every segment it sends. The initial CWND is the MSS multiplied by a small factor defined in RFC 5681. The sender's actual window for sending data is always the lower of CWND and RWND.
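A back-of-the-envelope way to see why the window matters: at most one window of data can be in flight per round trip, so throughput is bounded by window / RTT. The sketch below applies that with hypothetical values; the function name and inputs are assumptions for illustration.

```python
def max_throughput_bps(cwnd_bytes, rwnd_bytes, rtt_ms):
    """Upper bound on throughput in bits per second for one TCP connection."""
    # The usable window is the lower of the two windows.
    window = min(cwnd_bytes, rwnd_bytes)
    # One window of data per round trip, converted to bits per second.
    return window * 8 * 1000 / rtt_ms


# CWND has grown to 1 MB, but the receiver only advertises 65,535 bytes.
print(max_throughput_bps(1_000_000, 65_535, 50))  # 10485600.0 (about 10.5 Mbit/s)
```

With a 50 ms round trip the receiver window, not the link speed, caps this connection at roughly 10.5 Mbit/s.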
The TCP Slow Start algorithm was introduced to combat substantial loss on the internet: early implementations had the sender transmit the receiver's entire window immediately after the three-way handshake, which does not work on low-bandwidth networks. Slow Start works by initially sending only the sender's initial window (CWND), and then adding one MSS to the CWND for each ACK received from the receiver. If the initial window was 2 MSS (2 * 1460 bytes with the common 1500-byte MTU) and the sender receives one ACK for each of the two segments, then the next amount of data sent will be 4 MSS/segments. Once the four ACKs are received the amount of transmitted MSS/segments is 8, then 16.
This exponential increase in CWND continues until the sender stops receiving ACKs for every segment sent, or the CWND hits a specific threshold. Once data starts to get lost or retransmissions begin, TCP starts to use an algorithm called ‘Congestion Avoidance Algorithm.’
Systems running TCP software use the 6-bit field labeled CODE BITS for determining what the TCP segment is used for. CODE BITS will tell the receiver how to interpret all the other header fields.
FIN: Sender has reached the end of its Byte stream – Used to end a TCP session. Sender issues FIN and receiver issues FIN ACK.
SYN: Step 1 in the three-way handshake and initialization of a TCP connection. ‘Synchronize Sequence Numbers’
- Sequence Number – Each client in a TCP session uses Sequence Number value to keep track of how much data has been sent over the connection.
- Acknowledgement Number – Each client in a TCP session uses Acknowledgement Number to keep track of how much data has been received/acknowledged.
- Each of the two numbers above will be different for each client. Example – Client sending HTTP GET to server. The server Sequence number will continuously go up, but client Sequence number will stay relatively steady at around the same number. However, the Acknowledgement number for the client will continuously go up, but for the server it will stay relatively steady at around the same number throughout the TCP connection.
RST: TCP Reset. Used when a segment arrives on a TCP connection that should not be there. A host receiving SYN from a client on port that is not actively open will respond with an ACK/RST. Aborts connection in response to error.
PSH: Segment requests a push. The PSH flag from a higher layer application tells the sending TCP stack to immediately send data, do not fill up buffer with maximum segment size. The PSH flag also tells the receiver in a TCP connection that the segment needs to be sent to application layer immediately instead of waiting in queue. Used in HTTP and often streaming applications. Someone would not want to wait for enough keystrokes to be pressed (buffer to be filled) in an SSH session. A user wants to see each keystroke immediately.
ACK: Acknowledgement field is valid. When set, the acknowledgement number carried in the segment is meaningful. A TCP acknowledgment specifies the sequence number of the next octet that the receiver expects to see.
URG: Urgent field is valid. Under normal circumstances TCP data is sent and queued, processed in order of being received. With the urgent field set, the urgent segment is processed immediately instead of waiting in queue behind segments sent previously. Bypasses the receiver's FIFO rule. An example would be killing a remote session. While the sender is waiting to receive an ACK and the user kills a session, the sender can immediately send an RST without ever receiving the ACK from the receiver that would allow it to send more data. Once the RST reaches the receiver it is immediately sent to the application layer for processing (terminating the remote session).
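The six code bits described above occupy the low six bits of the flags byte in the TCP header, with FIN as the lowest bit. A minimal decoder might look like this; `decode_flags` is a name chosen for the example.

```python
# Bit values of the six classic TCP code bits within the flags byte.
TCP_FLAGS = [
    ("URG", 0x20),
    ("ACK", 0x10),
    ("PSH", 0x08),
    ("RST", 0x04),
    ("SYN", 0x02),
    ("FIN", 0x01),
]


def decode_flags(flags_byte):
    """Return the names of the code bits set in a TCP flags byte."""
    return [name for name, bit in TCP_FLAGS if flags_byte & bit]


print(decode_flags(0x02))  # ['SYN']         first step of the handshake
print(decode_flags(0x12))  # ['ACK', 'SYN']  second step (SYN/ACK)
print(decode_flags(0x18))  # ['ACK', 'PSH']  typical data segment
print(decode_flags(0x14))  # ['ACK', 'RST']  reply to a SYN on a closed port
```

These hex values are the same ones a packet capture tool shows in the flags field.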
eBGP vs. iBGP:
- eBGP is peering and advertising routes between two different BGP Autonomous Systems. An eBGP peering often connects and shares routes between two different entities.
- eBGP route advertisements received from one peer will be automatically advertised to other existing BGP peers – eBGP or iBGP.
- Routes received from an iBGP peer will not be advertised to other iBGP peers. They are still advertised to eBGP peers.
- eBGP by default assumes a peer is directly connected, i.e. a TTL of 1. This is not the case for iBGP. To peer with an eBGP neighbor that is more than one hop away, the operator can enable eBGP multi-hop.
- eBGP administrative distance is 20.
- iBGP administrative distance is 200.
- When a route learned via eBGP is advertised to an iBGP peer, the next hop is not updated by default. If the iBGP peer has no route to that next-hop address, it cannot reach the new destination, and the route will not be added to the routing table.
BGP Route Reflector:
- Because iBGP-received routes are not advertised to other iBGP peers, running iBGP on numerous routers within the same Autonomous System requires a full mesh of peerings, with every router peering directly with every other router. Using a route reflector allows an operator to set up a 'hub and spoke' model of iBGP sessions. One iBGP peer becomes the Route Reflector (RR) server and the other iBGP routers become RR clients. When one router advertises a route to the server, that server 'reflects' the route down to all the other iBGP nodes, or 'spokes'. This drastically reduces the number of direct peerings that need to be configured.
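The savings are easy to quantify: a full mesh of n routers needs n(n-1)/2 sessions, while a single route reflector needs one session per client. The sketch below assumes one reflector with every other router as its client; the function names are made up for the example.

```python
def full_mesh_sessions(routers):
    """iBGP sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


def single_rr_sessions(routers):
    """iBGP sessions needed with one route reflector and all others as clients."""
    return routers - 1


for n in (5, 10, 50):
    print(n, full_mesh_sessions(n), single_rr_sessions(n))
# 5 10 4
# 10 45 9
# 50 1225 49
```

In practice operators usually deploy at least two reflectors for redundancy, which roughly doubles the second number but still grows linearly.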
BGP Path Attributes:
Well-Known, Mandatory – RFC Compliant. Must be supported.
Well-Known, Discretionary – RFC Compliant. Does not have to be propagated.
- Local Preference
- Atomic Aggregate
Optional, Transitive – Not required – will transit to other BGP speakers.
Optional, Non-Transitive – Not required – Doesn’t have to pass along to other speakers.
- Originator ID
BGP Path Selection Metrics:
- Weight – Cisco Only
- Local Preference – Higher number wins
- Self-originated – Prefers paths originated locally
- AS Path – shortest AS Path
- Origin – IGP preferred over EGP, EGP preferred over Incomplete
- MED – Lowest value wins
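The ordered list above behaves like a tuple comparison: each attribute is only consulted when all earlier ones tie. The sketch below models that with a sort key. It is a simplification under stated assumptions: the `Path` class and its field names are invented for the example, the default Local Preference of 100 is the common Cisco default, MED is compared across all paths (routers normally compare it only between paths from the same neighboring AS), and later tie-breakers are left out.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0


def selection_key(p):
    """Smaller key wins; 'higher is better' attributes are negated."""
    return (
        -p.weight,                 # Weight: higher wins
        -p.local_pref,             # Local Preference: higher wins
        not p.locally_originated,  # self-originated paths first
        len(p.as_path),            # shortest AS path
        ORIGIN_RANK[p.origin],     # IGP < EGP < Incomplete
        p.med,                     # lowest MED
    )


def best_path(paths):
    return min(paths, key=selection_key)


a = Path("via-isp-a", local_pref=200, as_path=(65001, 65002, 65003))
b = Path("via-isp-b", as_path=(65010,))
print(best_path([a, b]).name)  # via-isp-a
```

Path `a` wins despite its longer AS path because Local Preference is evaluated before AS path length.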
LSA Type 1:
- Router LSA
- Sent between routers in the same area, does not leave the area. Sends interface and adjacent neighbor information to all routers in this area.
LSA Type 2:
- Network LSA – generated by the Designated Router
- Flooded within its area on a broadcast network to advertise which routers are participating in OSPF on the segment and what the segment connections look like.
LSA Type 3:
- Summary LSA
- Generated by the ABR. The ABR advertises summary prefixes across area boundaries with the Type 3 LSA. The information originates from type 1 LSAs and is re-advertised as type 3 at the area border.
LSA Type 4:
- ASBR Summary LSA
- Used to advertise to other areas in the Autonomous System that an ASBR is present. The ASBR identifies itself in its type 1 LSA, and the ABR then originates a type 4 LSA when injecting that information into other areas.
LSA Type 5:
- AS External LSA (generated by the ASBR)
- Used for route redistribution from either static or alternate routing protocols not part of the OSPF AS. The redistributed routes show up in internal AS route tables as E1 or E2.
LSA Type 7:
- NSSA External LSA
- Used in not-so-stubby areas (NSSA) to carry external routes. The NSSA does not allow type 5 LSAs, so the ASBR originates type 7 inside the area and the ABR translates it to type 5 and pushes it throughout the rest of the network.
Why use more than one area?
- OSPF Autonomous Systems that get very large are typically very resource intensive. The AS can be segmented into areas to limit the number of routes propagated across the entire OSPF domain. This means it's less resource intensive for each router that participates in the AS.
- NOTE – Now sometimes used for security or organization. Resources in common network devices now have enough horsepower where areas are not needed quite as much.
- A standard area uses LSA types 1, 2, 3, 4, and 5. All routers know the entire shortest path tree. They receive and pass along to other standard areas summary routes (LSA 3), ASBR notifications (LSA 4), and external/redistributed routes (LSA 5).
- A stub area receives LSA types 1, 2, and 3. The routers in the stub area do not receive external routes (LSA 5) or the ASBR notification (LSA 4). The stub routers see the normal intra-area types 1 and 2 plus inter-area summaries (LSA 3), and the ABR injects a default route as a type 3 summary LSA. This allows the route tables in the stub area to have only a default route for all external networks.
Totally Stubby Area:
- Totally stubby areas receive only a default route injected from the ABR, with no other inter-area or external routes. The only LSA types inside a totally stubby area are 1 and 2, plus the single type 3 LSA that carries the ABR's default route.
Not So Stubby Area:
- An NSSA behaves like a stub area, and can be combined with totally stubby behavior, but allows an ASBR inside the area. The ASBR receives the external route and advertises it into the NSSA via a type 7 LSA, which is flooded to the ABR. The ABR then translates the type 7 to type 5, which gets propagated to other areas in the network.
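The area types can be summarized as a lookup of which LSA types are allowed inside each one. This is a study aid rather than router logic: the dictionary keys, the function name, and the `is_default_route` flag are invented for the example. The one special case modeled is the totally stubby area, where the ABR's default route is assumed to arrive as a single type 3 LSA.

```python
# LSA types that circulate inside each area type.
AREA_LSA_TYPES = {
    "standard": {1, 2, 3, 4, 5},
    "stub": {1, 2, 3},
    "totally_stubby": {1, 2},
    "nssa": {1, 2, 3, 7},
}


def lsa_allowed(area_type, lsa_type, is_default_route=False):
    """Return True if an LSA of this type may appear in the given area type."""
    if area_type == "totally_stubby" and lsa_type == 3:
        # Only the ABR's default route is let in as a type 3 summary.
        return is_default_route
    return lsa_type in AREA_LSA_TYPES[area_type]


print(lsa_allowed("standard", 5))  # True
print(lsa_allowed("stub", 5))      # False, externals are replaced by a default route
print(lsa_allowed("nssa", 7))      # True
print(lsa_allowed("nssa", 5))      # False, the ABR translates type 7 to type 5 on the way out
```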