The HTMLizing of this document is NOT finished. I haven't gone back through it to re-check it for accuracy.
This is intended for the first-time multi-homing small ISP. Feel free to give this to any of your customers, and send me comments and updates to bgp@netaxs.com if you think something can be illustrated or explained more clearly.
There'll be a book out soon, greatly expanded, with diagrams, etc... Have fun,
Avi Freedman Net Access
This document is Copyright Avi Freedman, 1997. Distribution of the original or modified versions for profit is prohibited, but please feel free to give it away.
It's obviously critical that any box inside your network know how to get (directly or indirectly) to any other box inside your network. Before you invite people to send data to your network, you've got to have a running and happy network to take the data.
If you simply point a default route at one or more providers, you don't have any external routing in your network. But if you do want to "peer" with someone, or to "multi-home" to multiple providers and have a little more control over where your data goes on the Internet, you will be taking at least some external routes into your network (and will do so with BGP).
But it is much more useful to tell people outside your network (upstream providers or "peers") about what routes (or portions of the IP address space) you "know how to get to" inside your network. The primary purpose of BGP4 (as we're studying it here) is to advertise routes to other networks ("Autonomous Systems").
An AS, or Autonomous System, is a way of referring to "someone's network". That network could be yours; a friend's; MCI's; Sprintlink's; or anyone's. Normally an AS has one or more people responsible for it (a point of contact, typically called a NOC, or Network Operations Center); one or more "border routers" (where routers in that AS peer and exchange routes with other ASs); and a simple or complicated internal routing scheme so that every router in that AS knows how to get to every other router and destination within that AS.
When you "advertise" routes to other entities (ASs), one way of thinking of those route "advertisements" is as "promises" to carry data to the IP space represented in the route being advertised. For example, if you advertise 192.204.4.0/24 (the "Class C" starting at 192.204.4.0 and ending at 192.204.4.255), you promise that if someone sends you data destined for any address in 192.204.4.0/24, you know how to carry that data to its ultimate destination. The cardinal sin of BGP routing is advertising routes that you don't know how to get to. This is called "black-holing" someone - because if you advertise, or promise to carry data to, some part of the IP space that is owned by someone else, and that advertisement is more specific than the one made by the owner of that IP space, all of the data on the Internet destined for the black-holed IP space will flow to your border router. Needless to say, this makes that address space "disconnected from the 'net" for the provider that owns the space, and makes many people unhappy. The second most heinous sin of BGP routing is not having strict enough filters on the routes you advertise (more on this later). Anyway, the bottom line: Test your configs and watch out for typos. Think everything that you do through in terms of how it could screw up.
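Black-holing works because routers choose among overlapping routes by specificity: the longest matching prefix wins, no matter who announced it. Here is a minimal Python sketch of that longest-match lookup. It assumes a hypothetical table in which the owner announces 192.204.4.0/24 and someone else wrongly announces the more specific 192.204.4.0/25; the labels are made up for illustration.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: prefix -> where matching traffic is sent.
ROUTES = {
    ipaddress.ip_network("192.204.4.0/24"): "owner of 192.204.4.0/24",
    ipaddress.ip_network("192.204.4.0/25"): "your border router (bogus /25)",
}

def lookup(addr: str) -> Optional[str]:
    """Return where a packet for addr goes: the longest matching prefix wins."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ROUTES if ip in net]
    if not matches:
        return None  # no covering route at all: the address is unreachable
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(lookup("192.204.4.10"))   # inside the bogus /25, so it goes to you
print(lookup("192.204.4.200"))  # outside the /25, so the owner's /24 is used
print(lookup("10.10.20.1"))     # no route covers it
```

Half of the owner's space now flows to your border router, which is exactly the failure described above. The last lookup also shows the corollary discussed below: with no covering route, nothing can reach the address.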
Also, one terminology note: Classless routes are sometimes called "prefixes". When someone talks about a prefix they're talking about a route with a particular starting point and a particular specificity (length). So 207.8.96.0/24 and 207.8.96.0/20 are not the same prefix (route). We'll mostly use "route" in this document.
Take a look at Figure 1. We'll explain more of the details below, but note the "Home Dialup User". He's connected to AOL, which is served by ANS (AOL actually owns ANS). We're using 10.10.20.0/24 as an example.
The 10.10.x.x IP addresses are often used in examples because they're "reserved" space. Most networks will "filter" the RFC 1918 reserved space (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16), so people use them in examples because they don't get you into too much trouble if you accidentally try to use them (sort of like the film industry's yyy-555-xxxx phone number convention).
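A provider's filter on the RFC 1918 blocks boils down to a containment test on each announced prefix. The Python sketch below only illustrates that test (the function name is hypothetical, and real filters live in the router's own configuration language). It flags any prefix that falls entirely inside one of the three reserved blocks, and it needs Python 3.7 or later for subnet_of.

```python
import ipaddress

# The three RFC 1918 blocks listed in the text.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918_route(prefix: str) -> bool:
    """True if the announced prefix lies entirely inside RFC 1918 space."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(block) for block in RFC1918_BLOCKS)

for route in ["10.10.20.0/24", "172.31.0.0/16", "172.32.0.0/16", "192.204.4.0/24"]:
    print(route, "filtered" if is_rfc1918_route(route) else "accepted")
```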
In this example, the reason that an AOL dialup user can send a packet to 10.10.20.1 (for example) is that the ISP (AS 64512) advertised that route to the two upstream providers (AS 4969 and AS 701), who in turn advertised that route to AS 690 (ANS, which provides IP service for AOL).
Every IP address that you can get to on the Internet is reachable because someone, somewhere, has advertised a route that covers it. The corollary to this is that if there is not a generally-advertised route to cover an IP address, no one on the Internet will be able to reach it.
I recommend using Cisco routers (for many reasons). In particular, the Cisco implementation of BGP is relatively easy to use, get examples for, and debug - and there's a huge community of routing engineers that's familiar with the Cisco implementation and algorithms (there's much that isn't specified in the RFCs and is left up to the vendor to decide). Cisco's online documentation (UniverCD) isn't the best (it lacks a large number of case studies) but is a very good learning tool.
PC-compatibles using gated are either the second- or third-largest community of BGP-speaking computers. You can build cheap PC routers that route Ethernet and t1 and have more than enough CPU and memory to handle all the routes you'd need for quite some time - but you've then got hardware that's not really as tested or reliable as a Cisco or Bay router. Trust me on this - the cost savings usually aren't worth doing it this way. (Apologies to Riscom and ET, the leading vendors of T1 cards-for-PCs.)
Bay routers are the second-largest community of BGP-speaking boxes - but we're talking about a very small percentage of the number of BGP-speaking Ciscos out there. Bay is cheaper than Cisco; pretty responsive to customers (though Cisco is as well); and almost all configuration is done through a GUI (windowing) interface that drives most routing engineers nuts. Bay claims they're working on a command-line interface (BCC, or "Blatant Cisco Clone"), but in the meantime most are throwing money at Cisco. (It's much easier to debug BGP or other routing problems from a telnet session or over the phone than it is to guide someone through a GUI to examine or reconfigure a router.) On the other hand, the Bays do have a better architecture and are finally showing themselves to be more or less as stable as Ciscos. What I've seen of BCC looks quite promising, and I promise to retract in print my slam of Bay when their command-line interface looks featureful, fast, and solid.
We're going to talk about Cisco routers in these documents (and in this document in particular).
BGP-speaking routers exchange routes with other BGP-speaking routers via peering sessions. At a technical level, this is what it means to "peer with someone". A snippet of a Cisco "BGP clause" is:
router bgp 64512
 neighbor 207.106.127.122 remote-as 4969
 (omitted lines)
 neighbor 137.39.10.46 remote-as 701
 (omitted lines)
The "clause" starts out by saying "router bgp 64512". This means "What follows is a list of commands that describe how to speak BGP on behalf of ASN 64512". 64512 is also a "reserved" number - it's in the block of ASNs reserved for private use (64512-65535), out of a total range of 1-65535.
In order to bring up a "peering session", all you need to do is have that one line. In this example, 137.39.10.46 is the remote IP address of a UUNET router (UUNET is ASN 701). Remote, that is, with respect to the customer's router. 207.106.127.122 is the remote IP address of a Net Access router (Net Access is ASN 4969). See Fig 1 for a diagram of the network layout used in this example.
In practice, however, you almost always use more than that one line to tell BGP how to exchange routes with that "neighbor" via that "peering session". A typical "neighbor clause" is:
router bgp 64512
 (omitted lines)
 neighbor 207.106.127.122 remote-as 4969
 neighbor 207.106.127.122 next-hop-self
 neighbor 207.106.127.122 send-community
 neighbor 207.106.127.122 route-map prepend-once out
 neighbor 207.106.127.122 filter-list 2 in
 (omitted lines)
Every time a neighbor session comes up, each router will evaluate every BGP route it has by running it through any filters you specify in the "neighbor" clause. Any routes that "pass" the filter are sent to the remote end.
While the session is up, "BGP Updates" will be sent from one router to the other each time one of the routers knows about a new BGP route or needs to "withdraw" a previous announcement ("promise").
The "sho ip bgp summ" command will show you a list of all peering sessions:
brain.netaxs.com#sho ip bgp summ
BGP table version is 1159873, main routing table version 1159873
44796 network entries (98292/144814 paths) using 9596344 bytes of memory
16308 BGP path attribute entries using 2075736 bytes of memory
12967 BGP route-map cache entries using 207472 bytes of memory
16200 BGP filter-list cache entries using 259200 bytes of memory
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State
205.160.5.1     4  6313       0       0        0    0    0 never    Active
207.106.90.1    4 64514 1145670  237369  1159873    0    0 4d03h
207.106.91.5    4 64515    6078    5960  1159869    0    0 4d03h
207.106.92.16   4 64512    6128    6782  1159870    0    0 4d03h
207.106.92.17   4 64512    5962    6894  1159870    0    0 10:08:46
206.245.159.17  4  4231  161072  276660  1159870    0    0 2d05h
207.44.7.25     4  3564    6109  310292  1159867    0    0 22:40:50
207.106.33.3    4 64513  164708  724571  1159866    0    0 3d23h
207.106.33.4    4  3564    6086  274182  1159853    0    0 4d03h
207.106.127.6   4  6078    5793  310011  1159869    0    0 2d03h
This is a session summary from one of Net Access's core routers. The 6451X ASes are BGP sessions to other Net Access routers (using confederations, which we'll talk about in a future document) - those ASNs are not shown to the world.
Most of it is pretty self-explanatory; briefly:
More on all of this below.
The major difference between eBGP and iBGP is that eBGP tries like crazy to advertise every BGP route it knows to everyone - you have to put "filters" in place to stop it from doing so. iBGP is actually pretty difficult to get working because it tries like crazy not to redistribute routes - in fact, all iBGP-speakers inside your network have to peer with all other iBGP "speakers" in order to make it work. This is called a "routing mesh" and, as you can imagine, is quite a mess. If you have 20 routers, each router has to peer with every other router. The solution to this is "BGP confederations", also a topic for a future document.
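The arithmetic behind "quite a mess" is simple: in a full iBGP mesh of n routers, each router needs n - 1 sessions, and the network as a whole needs n(n - 1)/2 of them. A small Python sketch of that count (the function name is just for illustration):

```python
from typing import Tuple

def full_mesh(routers: int) -> Tuple[int, int]:
    """Return (sessions per router, total sessions) for a full iBGP mesh."""
    per_router = routers - 1
    total = routers * (routers - 1) // 2  # each session is shared by two routers
    return per_router, total

for n in (5, 10, 20):
    per_router, total = full_mesh(n)
    print(f"{n} routers: {per_router} sessions each, {total} in total")
```

For the 20-router case above, that's 19 sessions per router and 190 sessions overall, and adding a 21st router means configuring 20 new sessions.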
Also, iBGP has major drawbacks as an IGP. The main one is the necessity to "peer up" every set of routers in your network (or in one POP if you're using confederations). Protocols like OSPF and IS-IS just "find" each other over serial and Ethernet interfaces (they're "broadcast" protocols). This can be a pain (you don't want to accidentally merge your IGP with a customer's or peer's) but turning off broadcasting on certain ports is easier than turning on peering sessions between a new router and every other router on your network. Also, iBGP doesn't do as good a job at "convergence" (closing the gap and re-routing around failed network segments) as OSPF and IS-IS.
And if you have one upstream provider, it's almost guaranteed that you are using sub-allocations (CIDR delegations, to be precise) of their larger IP blocks ("aggregates"). In this case your provider is not going to advertise your more "specific" routes because:
The AS-PATH is useful for a number of reasons:
See Fig 2 for a sample list of routes from an actual BGP routing table - and further explanation. Notice, though, the >'s to the left of some of the routes. The ">" marks the route that the router currently thinks is "best" when there are multiple choices.
A SNIPPET OF A BGP ROUTING TABLE
COMING SOON TO A TUTORIAL NEAR YOU.
ip as-path access-list NNN permit regexp
or:
ip as-path access-list NNN deny regexp
where NNN is the list number (which also serves as its name for as-path access-lists), and regexp is very similar to a Unix "regular expression". (See Fig 3 for a summary of regexp characters, and the O'Reilly and Associates Regexp book for more information about regular expressions.)
Fig 3: Regexp characters
NNN        match the characters NNN (where each digit of NNN is from 0-9)
.          match any single character (so ".*" matches anything, including an empty string)
^          match the beginning of a string
$          match the end of a string
_          match any of {space, beginning of a string, or end of a string} (Cisco also treats commas, braces, and parens this way)
_NNN_      match the "word" or "distinct number" NNN. Thus, the regexp "_1_" will match the string "3561 1 64000" but not "3561". (The problem is that if you don't anchor NNN with "_"s on either side, you might match something you don't really want to.)
(regexp)   enclosing another regexp in parens groups it, so that an operator such as * applies to the whole regexp rather than just its last character
*          the * operator means that the previous character or parenthesized group can be matched 0, 1, 2, or any number of times. Thus "701*" means "70" followed by any number of 1s, while (regexp)* matches the regexp inside the parens 0 or any number of times. To be safe, use * only after "." or a parenthesized group.
[char1char2char3]  match any one of char1, char2, char3, etc. Each charN expression can be an actual number or other symbol, or a range (e.g. 0-9, a-z).
\          if you want to match any of the special symbols, you can escape them by putting a \ in front of them. The only special symbols you'll want to escape when matching against AS-PATHs are the parens, which pop up in AS-PATHs when you use BGP confederations.
Important note: On Ciscos, regexps are matched against the AS-PATH as if the whole thing is a string, not a sequence of numbers. Thus, as you'll see below, you need to enclose ASNs within underscores to be sure of matching only the ASN you're looking for.
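To see why the underscores matter, here is a rough Python emulation of Cisco-style AS-PATH matching. It is an approximation, not Cisco's matcher: it assumes "_" can be rewritten as "start of string, end of string, or a space, comma, brace, or paren", and that the pattern is searched for anywhere in the AS-PATH string unless it is anchored with ^ or $. The names are hypothetical.

```python
import re

# Approximation of Cisco's "_": start, end, or a space/comma/brace/paren.
UNDERSCORE = r"(?:^|$|[ ,{}()])"

def matches(cisco_regexp: str, as_path: str) -> bool:
    """Search the AS-PATH string for the (translated) Cisco regexp."""
    python_regexp = cisco_regexp.replace("_", UNDERSCORE)
    return re.search(python_regexp, as_path) is not None

print(matches("1", "701"))             # "1" appears inside "701"
print(matches("_1_", "701"))           # 1 is not a distinct number here
print(matches("_1_", "3561 1 64000"))  # distinct number 1 is present
print(matches("_1_", "3561"))          # only a trailing digit, not a number
```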
How do access-lists work? When used as a filter, each route is passed through the access-list. Each rule is listed in the order it will be applied. Once a route has been matched by any rule, the decision on whether to pass the route through the filter or to drop it (and thus not let it pass) is made immediately, and no further rules are processed.
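The first-match rule is easy to model. In this Python sketch (an illustration only, with plain Python regexps standing in for Cisco's and hypothetical names throughout), a list is a sequence of (action, regexp) pairs. The first rule whose regexp is found in the AS-PATH decides, and a route that matches nothing falls through to the implicit "deny .*" mentioned later in this document.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # ("permit" or "deny", regexp)

def evaluate(rules: List[Rule], as_path: str) -> str:
    """Return the action of the first rule whose regexp matches the AS-PATH."""
    for action, regexp in rules:
        if re.search(regexp, as_path):
            return action  # decision is final; later rules are never consulted
    return "deny"          # implicit "deny .*" at the end of every list

# Permit only routes with an empty AS-PATH, then deny everything else.
rules = [("permit", r"^$"), ("deny", r".*")]
print(evaluate(rules, ""))          # empty AS-PATH: permitted
print(evaluate(rules, "701 1239"))  # learned from another AS: denied
```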
Example 1:
ip as-path access-list 1 permit .*
ip as-path access-list 1 deny .*
This is a good one to have around; it permits every route to flow through the filter. The "deny .*" is completely extraneous to the filter - every route has already passed through the first line and the second line is never actually used.
Example 2:
ip as-path access-list 2 deny .*
This is also a handy one to have around; you might well want to always remember the number of this "deny everything" access-list - the opposite of the "permit everything" list above.
Example 3:
ip as-path access-list 3 permit ^$
ip as-path access-list 3 deny .*
This access-list is the third of the triad of ever-handy ones: It permits only routes that originate within your AS (because of network statements or "redistribute" statements in "router bgp" clauses somewhere within your network), since only those routes have an empty AS-PATH.
If you have these three as-path access-lists installed and remember their numbers you'll save yourself a lot of time you'd otherwise spend searching online or through config files to find where you put your "send everything"; "send nothing"; or "send only my routes" filter.
Remember: BGP between different ASNs (eBGP) will, by default, cause a router to redistribute every BGP route that the router knows about. This could lead to VERY BAD THINGS happening. (If you redistributed all of Sprintlink's routes into UUNET, a portion of UUNET could start sending all of its Sprintlink traffic through your t1 and you'd hurt a reasonable chunk of the Internet. Both Sprintlink and UUNET do things to prevent you from doing this, but you should always be paranoid when dealing with BGP.)
Again, the "deny .*" rule is useless here, except as a safety precaution, since the router would insert that rule anyway (remember, there's an implicit "deny .*" at the end of every Cisco filter list).
A quick note: For those playing with BGP confederations on your own (a topic we'll talk about in a future document), note that your "permit internal routes only" filter might have to look somewhat different ("permit ^$" will no longer be enough) - something like: "ip as-path access-list 30 permit ^(\([0-9 ]*\))*$". Or you'll be using BGP communities instead of AS-PATH filtering to control which routes you redistribute. Everyone else please ignore this paragraph, unless you want to try to parse the regexp above as an exercise.
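If you do take up that exercise, one way to check your reading is to try the pattern on a few sample AS-PATHs. This Python sketch assumes that Python's re treats this particular pattern the way Cisco's matcher does (likely for these simple constructs, but still an assumption), and that confederation segments appear in parentheses as described above. The sample paths are made up.

```python
import re

# The confederation-aware "internal routes only" regexp from the text.
CONFED_INTERNAL = re.compile(r"^(\([0-9 ]*\))*$")

samples = ["", "(64513)", "(64513 64514)", "(64513) 701", "701"]
for path in samples:
    verdict = "internal" if CONFED_INTERNAL.search(path) else "external"
    print(repr(path), verdict)
```

Only paths made up entirely of parenthesized confederation segments (or no segments at all) match. As soon as a real ASN such as 701 appears outside the parentheses, the route counts as external.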
For Examples 4 and 5, please consult Fig 4 for a list of common ASNs you'll see when examining routes. To find out who "owns" an ASN (funny concept - owning a 16-bit integer), issue a WHOIS query on "ASN NNN", where NNN is the ASN. Note: You may actually need to put quotes around the "ASN NNN", especially if you're doing the whois query from a command line.
-----------------------------------------------------------------------------
Fig 4: Common ASNs
3561   MCI
1239   Sprintlink (Sprintlink also uses other ASNs, but 1239 will always appear somewhere in the AS-PATH when looking at Sprintlink routes from some other provider)
701    UUNET
174    PSI
1673   ANS (the old ANS ASN, 690, should be retired by now)
1      BBN
4200   AGIS (the old Net99 ASN, 3830, should be retired by now)
4969   Net Access (which will appear in the examples)
There are hundreds of ASNs in use in the Internet, and thousands of ASNs in use in internal networks all over the world. If you want to take a look at live ASN info, check out http://www.merit.edu/ipma/routing_table or telnet to route-server.cerf.net, a Cisco that cerf.net loads with multiple full BGP routing tables.
-----------------------------------------------------------------------------
Example 4:
ip as-path access-list 20 permit _1_
ip as-path access-list 20 permit _701_
ip as-path access-list 20 permit _174_
ip as-path access-list 20 permit _1673_
ip as-path access-list 20 permit _4200_
ip as-path access-list 20 deny .*
The _NNN_ notation means "match NNN as a distinct word". This means that NNN must have whitespace on either side of it (or must be the first or last word - or both - in the AS-PATH).
"_1_" would match "1"; "3561 1 6000"; and "3561 1" - but not "701". (ASN 1 is used by BBN, which has a bit of history in the Internet...)
So - this as-path access list permits, in order, BBN, UUNET, PSI, ANS, and AGIS routes, and denies all other routes. If you had a Cisco 2501, you might want to do this to accept some routes from one of your providers in an attempt to load-balance traffic a certain way (perhaps you've noticed that provider B gets better BBN connectivity than provider A).
Example 5:
ip as-path access-list 21 deny _3561_
ip as-path access-list 21 deny _1239_
ip as-path access-list 21 permit .*
This filter denies any MCI or Sprintlink route, and permits all other routes. As of 4/97, this should yield about 45,000 routes.
This will fill up a 2501 with absolutely all of the routes it can take and still function well. It used to be that all routes on the 'net fit in a 2501 with 16MB of memory, and that the 2501 could still function. Then the routes would still fit, but the 2501 didn't have enough CPU. Now, all of the routes on the 'net except for MCI's, Sprintlink's, or both will fit in a 2501 and still let it function at at least a single t1's worth of throughput.
So, as a security blanket, appending an explicit "deny .*" to a list ensures that you can't accidentally change that list's behavior later by adding rules to the end of it.
Let's say you had:
ip as-path access-list 3 permit ^$
And then you configured (perhaps as a typo, perhaps as a brain-o):
ip as-path access-list 3 permit _1239_
You would alter the functionality of an existing filter list and potentially start redistributing Sprintlink routes to your peers and/or upstream providers.
But if you had:
ip as-path access-list 3 permit ^$
ip as-path access-list 3 deny .*
Then adding a third rule of:
ip as-path access-list 3 permit _1239_
would have no effect, since every route would be either permitted or denied by the time the router had finished evaluating the second rule (the "deny .*"), and the third rule would never be looked at.
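The same scenario can be played out in a few lines of Python. This sketch models the first-match behavior described earlier (first matching rule wins, implicit deny at the end), substitutes a plain Python regexp for Cisco's "_1239_", and uses a made-up AS-PATH for a route learned from Sprintlink.

```python
import re

def evaluate(rules, as_path):
    """First matching rule decides; anything unmatched hits the implicit deny."""
    for action, regexp in rules:
        if re.search(regexp, as_path):
            return action
    return "deny"

sprint_path = "1239 3561"                     # hypothetical Sprintlink route
typo = ("permit", r"(?:^| )1239(?: |$)")      # stands in for "permit _1239_"

without_blanket = [("permit", r"^$"), typo]
with_blanket = [("permit", r"^$"), ("deny", r".*"), typo]

print(evaluate(without_blanket, sprint_path))  # the typo leaks Sprintlink routes
print(evaluate(with_blanket, sprint_path))     # "deny .*" decides first
```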
So, to modify an existing access list, either:
There are, however, rules for how a Cisco will select the "best BGP" route when there are multiple BGP route possibilities of the same specificity.
It goes (basically):
For "competing" BGP routes, the most likely way the router's going to pick the best route (if you aren't playing games with weights) is by looking at the AS-PATH lengths.
"BGP selects only one path as the best path. When the path is selected, BGP puts the selected path in its routing table and propagates the path to its neighbors. BGP uses the following criteria, in the order presented, to select a path for a destination: 1. If the path specifies a next hop that is inaccessible, drop the update. 2. Prefer the path with the largest weight. 3. If the weights are the same, prefer the path with the largest local preference. 4. If the local preferences are the same, prefer the path that was originated by BGP running on this router. 5. If no route was originated, prefer the route that has the shortest AS_path. 6. If all paths have the same AS_path length, prefer the path with the lowest origin type (where IGP is lower than EGP, and EGP is lower than Incomplete). 7. If the origin codes are the same, prefer the path with the lowest MED attribute. 8. If the paths have the same MED, prefer the external path over the internal path. 9. If the paths are still the same, prefer the path through the closest IGP neighbor. 10. Prefer the path with the lowest IP address, as specified by the BGP router ID."In addition to the "core" data about a route (where in the IP space it starts; how long it is (the "specificity"); and what the next hop is, there is other data embedded in BGP routes, most of which are either used for route selection or for additional debugging information for humans.
Fig 8: BGP attributes
For more info, see:
RFC 2042: Registering New BGP Attribute Types
RFC 1997: BGP Communities Attribute
RFC 1773: Experience with the BGP-4 Protocol
RFC 1771: A Border Gateway Protocol 4 (BGP-4)
To get an RFC, go to: http://www.internic.net/rfc/rfcXXXX.txt
BGP ATTRIBUTE TYPES
Value  Code              Possible Values
-----  ----------------  -----------------------------------------------
1      ORIGIN            0 (IGP); 1 (EGP); 2 (Incomplete)
                         This attribute specifies the origin of a route. Straightforward, except that "Incomplete" means that the route got into BGP by redistribution from an IGP.
2      AS_PATH           0-N 2-byte values
                         A list of the ASNs of all ASs the route has traversed.
3      NEXT_HOP          IP address
                         The most critical attribute; where to send data destined for this route.
4      MULTI_EXIT_DISC   0 to 2^32-1
                         A weight; designed to be sent to a neighboring AS, but not passed on beyond it.
5      LOCAL_PREF        0 to 2^32-1
                         A weight; not designed to go outside of an ASN.
6      ATOMIC_AGGREGATE  TRUE/FALSE (if present, true; otherwise, false)
                         Present if this route was not the most specific one known by the advertiser. Dangerous stuff.
7      AGGREGATOR        {ASN, IP address} pair
                         Data to indicate who formed the route if the route is an aggregate of smaller routes.
8      COMMUNITY         0-N 4-byte values ("communities")
                         To be covered in a future document.
9      ORIGINATOR_ID     Used for BGP Route Reflection
                         To be covered in a future document.
10     CLUSTER_LIST      Used for BGP Route Reflection
                         To be covered in a future document.
Briefly:
(Rule 2)
(Rule 3)
(Rules 2-3,5)
(Rule 6)
(Rules 7-8)
(Rule 9)
(Rule 10)
For further reading, see for more details.
We'll be talking about using these metrics in the near future. If you want to experiment in the meantime, that document shows you how to set these metrics. Please experiment first on test or lab networks! If you've got proper filters in place, experimenting with these things won't affect the outside world - but it could make your customers very unhappy...
Another very big caution: BGP weights and local_prefs are very powerful. Realize that if you advertise routes for a customer that you hear via BGP, you could wind up preferring an external route for that customer if you set the BGP weight or local_pref too high (or at all) for external routes. If you prefer an external route for that customer, you're not going to advertise them to your transit providers any more, and the customer won't like that at all.
Routers which route IP packets have to have an "IP routing table". In that table are one or more routes of a particular {starting point, length, metric}. This IP routing table gets filled with routes heard from various sources - or configured statically (in the router's configuration store). BGP routes migrate into the IP routing table only if:
Here's a brief outline of the "order of preference" for filling the IP routing table. The exact order can be found in the Cisco documentation.
One note, though: Since static routes are really considered an "IGP" routing mechanism, there are ways to get other IGP-learned routes (say, via OSPF) to be preferred over static routes, but again - if you don't play with weights, this shouldn't be a worry.
But look at what happens when you withdraw that assertion. Your provider(s) must then also withdraw that assertion. And then their provider(s) and peer(s) must do the same. All in all, thousands of routers around the world now have to look at that route and decide if they have a next-best path in their BGP (or other routing) table, and insert it as the current best path in their IP routing table. This consumes many CPU-seconds on routers that are sometimes very busy.
In fact, it was consuming so much CPU time a few years ago that Sean Doran of Sprintlink said "this must stop", and a few people came up with an idea (which Cisco implemented in record time) to "damp" (or "dampen") "route flaps". You'll hear people say both "damp" and "dampen"; there's no real consensus about which is the correct term.
What this means in practice today is that if your routes flap more than one or two complete up-down-up cycles, you will be dampened by many providers for at least an hour or so. So even if you're only "single-homed", you will be dampened if your provider withdraws your routes every time your t1 flips up and down a few times because some Bell guy tripped over a wire.
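Route-flap dampening keeps a per-route penalty: each flap adds to it, the penalty decays exponentially over time, the route is suppressed once the penalty crosses a suppress limit, and it is announced again once the penalty decays below a reuse limit. The Python sketch below assumes Cisco's usual default values (1000 per flap, suppress at 2000, reuse at 750, 15-minute half-life), but each provider sets its own. It also models the decay as continuous, which real implementations only approximate, and ignores the cap on how long a route can stay suppressed.

```python
import math

# Assumed parameters (Cisco's usual defaults; each provider may differ).
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_MIN = 15.0

def penalty_at(flap_times, now):
    """Total penalty at time `now` (minutes), each flap decaying from when it happened."""
    return sum(PENALTY_PER_FLAP * 0.5 ** ((now - t) / HALF_LIFE_MIN)
               for t in flap_times if t <= now)

def minutes_until_reuse(penalty):
    """Minutes until a penalty decays to REUSE_LIMIT (0 if it already has)."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)

# Hypothetical t1 that flaps three times, one minute apart.
p = penalty_at([0, 1, 2], now=2)
print(f"penalty {p:.0f}, suppressed: {p > SUPPRESS_LIMIT}")
print(f"readvertised after about {minutes_until_reuse(p):.0f} more minutes")
```

With these assumed numbers, two quick flaps stay just under the suppress limit and a third pushes the route over it; providers using different parameters will suppress sooner or for longer.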
So do not ask your upstream provider to announce you unless it makes a real difference, that is, unless the benefit of being multiply-announced outweighs the possible negative effects of being dampened due to instability in either your network or your provider's.
What we're talking about in this document is BGP and transit - getting global transit from upstream providers as opposed to peering, which is just mutual sharing of customer routes.
Basically, if you have any address space "inside" of your provider's larger "netblock" or "aggregate", you won't be advertised to the outside world specifically - your provider will just advertise their larger block. If you have any other networks (an old Class C; customers with address space; etc...) your provider will just statically announce those routes to the world and statically route them inside their network to your leased-line/router interface(s).
With BGP, your provider gives you all of the routes they have (the easy part), and listens to your route announcements and then redistributes some or all of those to their peers and customers. This is the hard part (for them - just worry about understanding and configuring your end for now). The net difference is "just" that they may start advertising a more specific route (no mean task in a complicated network designed, as most networks are, to prevent the accidental "leaking" of more specific routes) or that the routes that they normally advertise for you under just their ASN will now have your ASN attached as well.
And if you're single-homed, you don't really need "full routes" so that you can "run defaultless". Since every packet destined for the Internet (as opposed to your internal network) is going to go out the same router interface, it doesn't matter whether it's sent via one default route or via a search through a list of 45,000 or more routes heard via BGP.
The only really valid reason is that you want to be able to have more control in advertising your routes. Of course, you'll still have to answer the flap argument even if you have your own provider-independent address space (if you're singly-connected to the 'net, why bother all of the routers in the world by telling them whether or not you're currently reachable?), and the routing-table space argument if you're in your provider's IP space or "aggregate announcement" (why pollute the routing tables with a few extra routes by announcing your routes more specifically?).
You're on your own for the answers to these questions. If you think you have a good case, either talk to your current or potential provider, or perhaps send a question off to the inet-access list and see if anyone can help.
If you do want to configure BGP and are single-homed, follow the instructions on how to announce your networks (routes), and either filter all incoming routes - or accept them if you feel you really want to.
So the most important thing about being multi-homed is the ability to have your routes advertised to your providers - and by them to their providers and peers (i.e. to "the rest of the Internet"). Doing this basic level of route advertisement is not hard. You just have to do it in a paranoid way.
If you screw up BGP routing you may get slapped down pretty hard. Screwups with BGP route advertisements can be felt all over the Internet. To repeat: Screwups with BGP route advertisements can be felt all over the Internet. If your provider is smart, they will also implement "filters" to prevent you from screwing them and the Internet up. But don't count on it.
If you were to announce a route that was more specific than, say, the otherwise-best route for Yahoo's web servers, you would black-hole Yahoo for a period of time. Needless to say, they would not be very happy with you. The solution is to do good filtering on your end - and for your provider to also do excellent filtering wherever possible.
Before you start playing with BGP, you might really want to wait and read the "Configuring a Cisco Router" document (also coming out in the next few months). If you do go ahead and are implementing BGP for the first time, get a friend or another provider to review your proposed configs for you before implementing them. And for a summary of BGP-related Cisco commands, see the BGP Cisco Commands sidebar.
We'll talk a bit about how you load-balance incoming and outgoing traffic to and from your network. Incoming traffic is controlled by how you announce your routes to the world (packets will flow into your network because someone out there heard and is using a route announcement). Outgoing traffic is controlled by the routes that you allow to flow into your border router(s) - and is thus much easier to control and tune.
There are many other ways, some of which we'll talk about in future documents. The way we at Net Access do it is by redistributing from our IGP (IS-IS), through a filter list, into BGP. While we do run BGP inside our network, it's strictly to pass external route announcements through the various parts of our network - no internal routes are ever passed from one of our routers to another one of our routers with BGP. But when we first started speaking BGP, we set our routers up the way described below.
You'll always set "next-hop-self" on all peering sessions. See the sidebar on next-hop-self for an explanation.
The safest way to announce your routes with BGP is to configure everything statically. You can think of the process described below as turning networks into route announcements.
To do this:
For example, let's say you're routing the following networks (also called "netblocks" sometimes):
170.100.0.0/16 (a /16 has a netmask of 255.255.0.0)
192.204.44.0/24 (a /24 has a netmask of 255.255.255.0)
206.8.128.0/17 (a /17 has a netmask of 255.255.128.0)
207.126.0.0/18 (a /18 has a netmask of 255.255.192.0)
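If the prefix-length/netmask pairs above are unfamiliar, here's a quick way to check them. This is a sketch in Python using the standard ipaddress module (our choice for illustration, not anything a router runs), and the helper name cisco_mask_pair is hypothetical.

```python
import ipaddress

# The example netblocks from the list above.
NETBLOCKS = ["170.100.0.0/16", "192.204.44.0/24", "206.8.128.0/17", "207.126.0.0/18"]

def cisco_mask_pair(prefix):
    """Turn "a.b.c.d/len" into the "address netmask" pair Cisco config lines use.
    ip_network() raises ValueError if host bits are set (e.g. "206.8.129.0/17")."""
    net = ipaddress.ip_network(prefix)
    return f"{net.network_address} {net.netmask}"

for block in NETBLOCKS:
    size = ipaddress.ip_network(block).num_addresses
    print(f"{block:<16} -> {cisco_mask_pair(block)}  ({size} addresses)")
```

The "address netmask" strings it prints are exactly what goes into the "ip route" and "network ... mask ..." lines that follow.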
You'd first configure your router with:
int Loopback0
descr Loopback interface for routes to be nailed to.
ip route 170.100.0.0 255.255.0.0 Loopback0 10
ip route 192.204.44.0 255.255.255.0 Loopback0 10
ip route 206.8.128.0 255.255.128.0 Loopback0 10
ip route 207.126.0.0 255.255.192.0 Loopback0 10
Then:
ip as-path access-list 1 permit .*
ip as-path access-list 2 deny .*
ip as-path access-list 3 permit ^$
ip as-path access-list 3 deny .*
router bgp 64512
network 170.100.0.0 mask 255.255.0.0
network 192.204.44.0 mask 255.255.255.0
network 206.8.128.0 mask 255.255.128.0
network 207.126.0.0 mask 255.255.192.0
neighbor <peer-IP> remote-as <peer-ASN>
neighbor <peer-IP> next-hop-self
neighbor <peer-IP> filter-list 3 out
neighbor <peer-IP> filter-list 2 in
Explanation:
This method "statically nails down" the route announcements made with the "network" statements. To nail them down, there must be: (1) underlying static routes with the same netmask as each route being advertised with a network statement; and (2) those underlying static routes must not go away. The purpose of the Loopback0 routes is to ensure that even if an existing primary route matching the netmask of the route being announced goes away (and often there is no such primary route at all), the Loopback0 route will kick in and keep the BGP route advertisement stable. Each Loopback0 route has a weight (Cisco calls it an administrative distance) of 10, which makes it only a "backup" to any route without a weight at the end. (Loopback0 routes always stay installed because there's no physical interface to go down and cause the route to be withdrawn: the Loopback0 interface is always up, so the routes pointed at it are always installed.)
This example uses a "deny everything" incoming filter, so it will only announce routes - it won't accept any. If you want to accept all incoming routes, replace the "filter-list 2 in" with "filter-list 1 in". Actually, you could just not specify an "inbound as-path filter" - and the effect would be the same - but it's better by far to be explicit about these things.
To add more peers, just create another similar neighbor statement. Ciscos give you 30 seconds to finish typing the neighbor statement before they start trying to establish the session. It is critical that you get those "neighbor somebody filter-list xxx .." statements in there by then. The best way by far to do it is to either cut and paste or tftp in a complete neighbor statement to the router.
Here's an example of a completely filled-in bgp clause, based on the example above (note that 64512 is a fictitious AS number, taken from the range reserved for private use).
router bgp 64512
network 170.100.0.0 mask 255.255.0.0
network 192.204.44.0 mask 255.255.255.0
network 206.8.128.0 mask 255.255.128.0
network 207.126.0.0 mask 255.255.192.0
neighbor 207.106.127.45 remote-as 4969
neighbor 207.106.127.45 next-hop-self
neighbor 207.106.127.45 filter-list 3 out
neighbor 207.106.127.45 filter-list 2 in
neighbor 137.10.10.121 remote-as 701
neighbor 137.10.10.121 next-hop-self
neighbor 137.10.10.121 filter-list 3 out
neighbor 137.10.10.121 filter-list 2 in
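The "nailing down" rule from the explanation above can be modeled roughly in Python. This sketch assumes, as the text describes, that a "network ... mask ..." statement is announced only while the routing table holds a route with exactly that prefix and mask, and that the lowest weight (administrative distance) wins among routes whose interface is up. The route table, the Serial2/Serial3 interfaces, and the function names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StaticRoute:
    prefix: str     # written in canonical form, e.g. "206.8.128.0/17"
    interface: str  # e.g. "Serial2" (hypothetical) or "Loopback0"
    weight: int     # the 4th field of "ip route"; lower wins

def installed_routes(routes, interface_up):
    """For each prefix, keep the lowest-weight static route whose interface is up."""
    best = {}
    for r in routes:
        if not interface_up.get(r.interface, False):
            continue  # a route is withdrawn when its interface goes down
        if r.prefix not in best or r.weight < best[r.prefix].weight:
            best[r.prefix] = r
    return best

def announced(network_statements, routes, interface_up):
    """A 'network' statement is announced only if a route with exactly that
    prefix and mask is installed (modeled here as an exact string match)."""
    table = installed_routes(routes, interface_up)
    return [p for p in network_statements if p in table]

# Hypothetical setup: a primary route out Serial2 plus the Loopback0 backup.
ROUTES = [
    StaticRoute("206.8.128.0/17", "Serial2", 0),
    StaticRoute("206.8.128.0/17", "Loopback0", 10),
    StaticRoute("192.204.44.0/24", "Serial3", 0),  # no Loopback0 backup
]
```

With Serial2 and Serial3 down, only 206.8.128.0/17 is still announced, because its Loopback0 route stays installed; 192.204.44.0/24, which has no Loopback0 backup in this made-up table, drops out of BGP.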
Let's say you are using 207.106.96.0/20. Your provider (let's call him oldprovider) has 207.106.0.0/16, so oldprovider announces only 207.106.0.0/16 to the world. There is no advertisement for 207.106.96.0/20 in this case: any packet destined to 207.106.96.0/20 will be picked up by the less specific (more general) route 207.106.0.0/16.
Now you want to multi-home. So you buy a T1 from newprovider. You set up BGP with both oldprovider and newprovider. Suddenly, the world sees two routes for you:
207.106.0.0/16, advertised by oldprovider; and 207.106.96.0/20, advertised by newprovider.
Remember, the most specific route always wins, so newprovider will wind up carrying almost all, if not all, of your incoming traffic! In fact, certain parts of oldprovider's network may actually prefer newprovider's T1 to get to you!
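Here's the "most specific route wins" rule (longest-prefix match) as a small Python sketch using the two announcements above. The function is ours, for illustration only; real routers do this lookup with specialized data structures rather than a linear scan.

```python
import ipaddress

# The two announcements the world sees once you multi-home (from the text).
ROUTES = [
    ("207.106.0.0/16", "oldprovider"),
    ("207.106.96.0/20", "newprovider"),
]

def best_route(destination, routes):
    """Longest-prefix match: of the routes covering destination, pick the most specific."""
    addr = ipaddress.ip_address(destination)
    covering = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    return max(covering, key=lambda r: ipaddress.ip_network(r[0]).prefixlen, default=None)
```

Any address inside 207.106.96.0/20 (207.106.96.0 through 207.106.111.255) resolves to newprovider, no matter how much "closer" oldprovider's /16 might otherwise look.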
The problem is that most large-ish providers use something called "aggregate-address statements" - and they certainly have some sort of filter to keep the more specific routes floating around inside of their networks from being advertised to the world. Remember, the world only wants to hear about 207.106.0.0/16 if the little, more specific routes inside of 207.106.0.0 are not multi-homed.
So what does oldprovider have to do? Blow holes in their "filter". One way or another, it's going to take modifications in oldprovider's 'border' routers to make incoming load-balancing work properly for you - and oldprovider may not want to do this. Basically, everywhere that oldprovider peers with anyone else (and this is usually at least 5-10 places), they have to modify their aggregation statements or other filters to "allow" your more specific route announcement to pass through.
This is why it's important to choose a primary provider based on how cooperative they'll be when you want to multi-home.
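To picture what oldprovider has to change, here's a simplified Python model of the border filtering described above: announce the aggregate, suppress more-specific routes inside it, and let through only the ones listed as exceptions (the "holes"). This is a sketch of the policy, not of how Cisco's aggregate-address command is implemented; the function and parameter names are hypothetical, and 207.106.32.0/19 is a made-up second customer.

```python
import ipaddress

def outbound_announcements(internal_routes, aggregate, exceptions=()):
    """Return what one border router announces to the world.

    internal_routes: prefixes known inside the provider's network.
    aggregate: the block the provider announces as a single route.
    exceptions: more-specifics allowed through (multi-homed customers).
    """
    agg = ipaddress.ip_network(aggregate)
    allowed = {ipaddress.ip_network(p) for p in exceptions}
    out = [agg]
    for p in internal_routes:
        net = ipaddress.ip_network(p)
        if net == agg:
            continue  # already announced as the aggregate itself
        if net.subnet_of(agg) and net not in allowed:
            continue  # suppressed: covered by the aggregate
        out.append(net)
    return [str(n) for n in out]
```

The exceptions list has to be updated on every border router where oldprovider peers; miss one and your /20 is still invisible through that exchange.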
There are a couple of reasons. First, each provider obviously knows best the way to get to its own customers. That is, if you're multi-homed to Sprintlink and UUNET, you always want to send data to Sprintlink customers out your Sprintlink T1 and data to UUNET customers out your UUNET T1. Second, though AS-PATH length is a pretty poor selection tool, it's what we've got right now, and it does bear some relation to how "close" a given provider is to some other provider.
So filling your router with routes from all of your upstream providers means that, for routes of the same specificity, AS-PATH length will decide which one actually gets used. See Fig 7 for examples and explanation.
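A minimal Python sketch of that selection, assuming no weights or local_prefs are set so that AS-PATH length alone decides between equally specific routes. The prefixes (documentation ranges) and the customer ASNs 65001 and 65002 are hypothetical; the function name is ours.

```python
from collections import defaultdict

def best_paths(routes):
    """routes: iterable of (prefix, provider, as_path) for equally specific prefixes.
    Returns {prefix: provider}, choosing the shortest AS-PATH per prefix.
    Ties go to the first route seen (a real router has further tie-breakers)."""
    by_prefix = defaultdict(list)
    for prefix, provider, as_path in routes:
        by_prefix[prefix].append((len(as_path), provider))
    return {prefix: min(cands, key=lambda c: c[0])[1] for prefix, cands in by_prefix.items()}

# Hypothetical destinations: 65001 is a Sprintlink customer, 65002 a UUNET customer.
ROUTES = [
    ("198.51.100.0/24", "Sprintlink", [1239, 65001]),
    ("198.51.100.0/24", "UUNET", [701, 1239, 65001]),
    ("203.0.113.0/24", "Sprintlink", [1239, 701, 65002]),
    ("203.0.113.0/24", "UUNET", [701, 65002]),
]
```

Here the Sprintlink customer is reached out the Sprintlink T1 and the UUNET customer out the UUNET T1, which is the behavior described above.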
The minimum set of "less than full" routes you'll want to take is customer routes from each provider (from each provider, get only the routes for them and their customers). This is a problem if your providers include Sprintlink and MCI, however, since Sprintlink and MCI customer routes together are such a large percentage of "full routes" that you can't really put Sprintlink and MCI routes in Cisco 2501s or 4000s either. You should, however, be able to put Sprintlink and any other few sets of customer routes or MCI and any other few sets in even a 2501 or 4000.
The problem is getting just customer routes (also called "peering routes"). You can tell your providers to only send you customer routes - and most providers that do a significant amount of BGP can do this pretty easily - but if any one of your providers screws up (changes a filter list slowly, for example) then they may blast more than enough routes at you to "melt your router". Unfortunately, when many brands of routers (Ciscos included) run out of memory, they don't just shut down BGP routing - or crash and restart. Ciscos, in particular, do not handle running out of memory gracefully at all, and will gleefully consume so much memory with routing data that basic command functionality gets trashed and someone needs to physically power cycle the router.
(Ciscos use ! at the beginning of a line to denote a comment line.)
! Filter everything but Sprintlink (ASN 1239) from Sprintlink
ip as-path access-list 40 deny _3561_
ip as-path access-list 40 deny _701_
ip as-path access-list 40 deny _1673_
ip as-path access-list 40 deny _174_
ip as-path access-list 40 deny _1_
ip as-path access-list 40 deny _4200_
ip as-path access-list 40 permit .*
! Filter everything but UUNET (ASN 701) from UUNET
ip as-path access-list 41 deny _3561_
ip as-path access-list 41 deny _1239_
ip as-path access-list 41 deny _1673_
ip as-path access-list 41 deny _174_
ip as-path access-list 41 deny _1_
ip as-path access-list 41 deny _4200_
ip as-path access-list 41 permit .*
! Filter the major providers from Net Access
ip as-path access-list 42 deny _3561_
ip as-path access-list 42 deny _1239_
ip as-path access-list 42 deny _701_
ip as-path access-list 42 deny _1673_
ip as-path access-list 42 deny _174_
ip as-path access-list 42 deny _1_
ip as-path access-list 42 deny _4200_
ip as-path access-list 42 permit .*
router bgp 64512
neighbor <Sprintlink-IP> remote-as 1239
neighbor <Sprintlink-IP> next-hop-self
neighbor <Sprintlink-IP> filter-list 3 out
neighbor <Sprintlink-IP> filter-list 40 in
neighbor <UUNET-IP> remote-as 701
neighbor <UUNET-IP> next-hop-self
neighbor <UUNET-IP> filter-list 3 out
neighbor <UUNET-IP> filter-list 41 in
neighbor <NetAccess-IP> remote-as 4969
neighbor <NetAccess-IP> next-hop-self
neighbor <NetAccess-IP> filter-list 3 out
neighbor <NetAccess-IP> filter-list 42 in
That will ensure that even if Sprintlink, UUNET, or Net Access screws up and sends you all of the routes it knows about, you'll still take their customer routes but won't take the vast majority of other routes from them. The seven providers filtered above (Sprintlink, MCI, UUNET, ANS, PSI, BBN, and AGIS) account for the vast majority of routes: 80-85% or more of the routes out there.
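Cisco checks each as-path access-list entry in order against the AS-PATH, written as a space-separated string, and the first matching entry decides. Here's a rough Python model of that, assuming our translation of the "_" token is close enough for these simple patterns (it is not a full emulation of Cisco's regular-expression engine). The function names are ours, and the example paths are hypothetical.

```python
import re

def cisco_regex(pattern):
    """Translate Cisco's "_" token, which matches the start or end of the path,
    a space, a comma, braces, or parentheses, into a Python regex."""
    return re.compile(pattern.replace("_", r"(?:^|$|[ ,{}()])"))

def as_path_list_permits(entries, as_path):
    """entries: ordered (action, pattern) pairs. The first matching entry decides;
    if nothing matches, the route is denied (implicit deny at the end)."""
    path = " ".join(str(asn) for asn in as_path)
    for action, pattern in entries:
        if cisco_regex(pattern).search(path):
            return action == "permit"
    return False

# ip as-path access-list 40 from the config above: everything but Sprintlink.
LIST_40 = [("deny", f"_{asn}_") for asn in (3561, 701, 1673, 174, 1, 4200)]
LIST_40.append(("permit", ".*"))
# ip as-path access-list 3: only routes we originate (empty AS-PATH).
LIST_3 = [("permit", "^$"), ("deny", ".*")]
```

The "_" delimiters are what keep _701_ from matching an AS such as 7018, and _1_ from matching the leading 1 in 1239; a plain "701" pattern would match both.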
Note: If you're a Sprintlink customer, you'll probably be peering with AS
179x, or at least some ASN other than 1239. Sprintlink uses ASNs for each major
POP (as do many other providers), but unlike other providers, these ASNs are
visible to the outside world. Any non-Sprintlink customer route (any route
from the outside world), though, will still have ASN 1239 (Sprintlink's
"peering" ASN) in the AS-PATH. The bottom line is that
instead of
AS-PATH padding is probably the most widely-used BGP tuning method, and we'll
go into it in more detail next month.
Basically, if you make sure not to set weights or local_prefs, AS-PATH length
is going to decide which of multiple BGP routes of the same specificity will be
preferred. So if you want to make one path preferred, or another one not
preferred, you can "pad" the AS-PATH with extra ASNs (normally extra copies of
your own ASN) to make one path look longer than another. This is done with
route-maps, which we'll talk more about next month.
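As a rough sketch of the padding idea (not of route-map syntax), here's a Python model of a remote network hearing your route through two providers, with ASNs 701 and 1239 standing in for them. Everything here is simplified and hypothetical, and it assumes that nobody along the way sets weights or local_prefs, so the shorter AS-PATH wins.

```python
MY_ASN = 64512  # the fictitious ASN used in the examples above

def announced_path(my_asn, padding=0):
    """AS-PATH as our route leaves our border router: our ASN plus 'padding' extra copies."""
    return [my_asn] * (1 + padding)

def via(provider_asn, as_path):
    """Each AS that passes the route along prepends its own ASN."""
    return [provider_asn] + as_path

def preferred(paths):
    """Name of the shortest path; first one wins ties."""
    return min(paths, key=lambda name: len(paths[name]))
```

Unpadded, both paths are two ASNs long and something other than AS-PATH length has to break the tie. Padding the announcement toward 701 with two extra copies of 64512 makes that path four ASNs long, so a network comparing the two picks the path via 1239.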
Thanks to Alexis Rosen at Panix (alexis@panix.com), who sent me some
last-minute suggestions for clarification and pointed out an ugly factual error.
Thanks to John Hawkinson (jhawk@panix.com) of BBN, who told me about something
new called BGP in 1993 at a Science Fiction convention in the DC area. Thanks to
Dave Siegel (dsiegel@rtd.net) who's shared his BGP experience with others since
1995. And thanks to Alec Peterson (ahp@hilander.com) for reviewing this document
- and who explored some of the more advanced BGP features (oh, the joy of
route-maps) using my network when I didn't have the time.
Ciscos keep the originating address of a route intact in the next-hop field
when they pass it from eBGP peer to eBGP peer. (And ditto for iBGP, but we're
talking about eBGP here). It turns out that this behavior is sometimes useful in
large networks where there's an IGP running to tell every router which way to
send a packet that says it came from 192.41.177.x (some other provider's
MAE-East router); 192.157.69.x (some other provider's Pennsauken router); etc...
But this is really subtle and can screw you up big-time. In the best case
you'll piss someone off (if you forget to set "next-hop-self" in an
exchange-point peering environment). In the worst case you'll cause routing loops
for yourself (examples of this will be given when we talk more about IGPs).
Setting next-hop-self causes a Cisco to override the originating address of a
route and stamp instead its own address as the "next-hop" part of the route.
Remember that the critical parts of a route are: What the base IP address is;
how big the route is (the specificity or netmask); and what destination
(next-hop) to use to send data to the IP space represented by the route.
We'll use an exchange point environment to illustrate next-hop-self. Refer to
the figure (XXX) below. When AS 4969 advertises 250.20.0.0/16 to AS 64500, AS
4969 sets next-hop-self, so the next-hop is 192.41.177.87 (AS 4969's mae-east IP
address).
Now, AS 64500 advertises it to AS 64600 (see the top diagram) without
next-hop-self. When AS 64600 processes the route and installs it into the IP
routing table, the next-hop used will be 192.41.177.87.
But AS 64600 doesn't peer with AS 4969 - yet it's going to send data to a
route advertised by AS 4969 - right to AS 4969's router. People generally
do not like this. In this case, AS 4969 might discover this "behavior" by
running a few careful probes of other routers at mae-east. AS 4969 would then
look to see how it hears AS 64600's routes (that is, which peer announces AS 64600
to AS 4969) and see if that peer is the culprit. If AS 4969 really wants to, it can find out who the
culprit is by passing a bogus route or two to each peer in turn, and see when AS
64600's router starts using the bogus route.
The solution is for 64500 to use next-hop-self as well (see the bottom
diagram). In this case, the route as heard by 64600 has 192.41.177.NNN (AS
64500's mae-east IP address) in the next-hop field - though the AS-PATH and
certain other fields still show that AS 4969 is the origin of the route. So when
AS 64600 wants to send data to AS 4969 based on this route it'll "bounce the
traffic off of" AS 64500's router. Some people don't even like this (since it's
a form of providing service to downstream customers over the "shared medium" of
the exchange-point switches), but it's not going to be as strenuously objected
to as not using next-hop-self.
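The next-hop behavior in the example above, sketched in Python. This is a simplified model: it tracks only the AS-PATH and next-hop fields, and it assumes (as described above) that a route passed between exchange-point peers keeps its next-hop unless next-hop-self is set. The Route type and advertise function are hypothetical, and "192.41.177.NNN" is kept as the placeholder from the text.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # leftmost = most recent AS; rightmost = origin
    next_hop: str

def advertise(route, my_asn, my_ip, next_hop_self):
    """Pass a route to an eBGP peer at the exchange point: always prepend our ASN,
    but only replace the next-hop with our own address if next_hop_self is set."""
    return replace(
        route,
        as_path=(my_asn,) + route.as_path,
        next_hop=my_ip if next_hop_self else route.next_hop,
    )

# What AS 64500 hears from AS 4969 (which did set next-hop-self).
heard_by_64500 = Route("250.20.0.0/16", (4969,), "192.41.177.87")
```

Without next-hop-self, the copy AS 64600 receives still points at 192.41.177.87 (AS 4969's router); with it, the next-hop becomes AS 64500's address while the AS-PATH still ends in 4969, the origin.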
Any packet not destined to the inside of the ISP's network will then hit the
"wildcard", or "default" route, and be sent out the router interface towards the
provider(s).
There are a few ways you can do this.
Outgoing Data Flow: Option 1
Option 1 is to default to one provider and install a "backup default" to your
other provider. On a Cisco, this is done with:
If you do it this way, the route with the lower weight (through Serial0) will be used
whenever Serial0 is up. If Serial0 goes down for some reason (actually, if the "line
protocol" on Serial0 goes down), the route will be invalidated and will go away,
so the Cisco will look for the next-best route, which will be the route through
Serial1. Even though it has a higher (less preferred) weight, it's the only valid route
left to consider, so it'll "win".
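A tiny Python model of Option 1's failover, using its two default routes (Serial0 with no weight given, so 0; Serial1 with a weight of 10). The function name is ours, and "line protocol up" is reduced to a simple True/False per interface.

```python
def active_default(defaults, line_protocol_up):
    """defaults: (interface, weight) pairs for 0.0.0.0/0.
    The lowest weight among routes whose line protocol is up wins;
    returns None if every interface is down."""
    usable = [d for d in defaults if line_protocol_up.get(d[0], False)]
    return min(usable, key=lambda d: d[1], default=None)

DEFAULTS = [("Serial0", 0), ("Serial1", 10)]  # the two "ip route 0.0.0.0 ..." lines
```

With both lines up, the Serial0 route is used; drop Serial0's line protocol and the Serial1 route is the only one left, so it takes over.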
Outgoing Data Flow: Option 2
Option 2 is to default equally to both providers. However, there's a catch.
If you just do:
The fix is easy, however:
And actually, many Ciscos come pre-configured with "ip route-cache" set on
all of the interfaces - but even so, it doesn't hurt to be explicit.
If you do this, the Cisco will keep a cache of all destinations you're
sending packets to, and will "lock in" each destination to one specific
interface. In general, this method leads to decent load-balancing (in the 40/60
to 50/50 split range). The worst case in this scenario is not IP degradation,
but poor use of your additional bandwidth (which can, of course, lead to IP
degradation if you need your second outgoing pipe because your first has a
tendency to get full). Anyway, this kind of load-balancing works pretty well and
is what people use when they can't accept "full BGP routes" from multiple
providers.
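A sketch of why per-destination caching keeps packets in order: once a destination is in the cache, every packet to it uses the same interface. How the Cisco actually picks an interface for a new cache entry, and when entries age out, isn't modeled here; handing new destinations out in turn is our simplifying assumption, and the RouteCache class is hypothetical.

```python
from itertools import cycle

class RouteCache:
    """Per-destination load sharing over two equal default routes (simplified)."""

    def __init__(self, interfaces):
        self._next = cycle(interfaces)  # assumption: new destinations take turns
        self.cache = {}

    def interface_for(self, destination):
        # First packet to a destination picks an interface; later ones reuse it.
        if destination not in self.cache:
            self.cache[destination] = next(self._next)
        return self.cache[destination]
```

For example, packets to 192.0.2.1, 192.0.2.2, 192.0.2.1, 192.0.2.1, and 192.0.2.3 go out Serial0, Serial1, Serial0, Serial0, and Serial0: every packet to 192.0.2.1 takes the same path, so they can't overtake each other.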
AS-PATH PADDING
Some people just aren't content to leave things the way
nature intended them. Bored routing engineers are very dangerous. If you don't
give them work to do they'll either sit and read news or Cisco documentation -
or start optimizing ("tuning") routing.
QUESTIONS AND COMMENTS
I expect that this document will generate a lot
of questions. Please do not send them to freedman@netaxs.com. Please use either
the inet-access list, which I and many of my routing-geek friends patrol
regularly, or bgp@netaxs.com. Thanks.
THANKS TO
In no particular order:
Sidebar on next-hop-self
If you've followed the "peering and transit"
discussions, you may have heard of the "next-hop-self issue". Here's the
problem.
Sidebar on Outgoing Data Flow Control Without BGP
Without BGP, your
only way to send data out (and the way 90% or more of the ISPs out there run
their networks) is to default route into your provider(s).
ip route 0.0.0.0 0.0.0.0 Serial0
ip route 0.0.0.0 0.0.0.0 Serial1 10
This says: "The default route (0.0.0.0/0, or 0.0.0.0, netmask 0.0.0.0)
goes out Serial0 with a weight of 0" (if you don't put a 4th field in an "ip
route" statement on a Cisco, it'll assume a weight of 0), and "another default
route is out Serial1, with a weight of 10".
ip route 0.0.0.0 0.0.0.0 Serial0
ip route 0.0.0.0 0.0.0.0 Serial1
You will almost certainly not be happy with the result! Unless "ip
route-cache" is set on the interfaces in question, the Cisco will simply
"round-robin" outgoing packets, sending packet N out Serial0 and packet N+1 out
Serial1. Why is this bad? Well, if you are sending data to site X, and site X is
on Provider A's network (and let's say that Provider A is at the other end of
Serial0), data sent to site X out Serial0 may arrive in 10ms. Data sent to site
X out Serial1 may arrive in 30-100ms. This means that packets 1 and 3 could
arrive before packet 2 in a pathologically worst-case scenario. Or even
packets 1, 3, 5, and 7 could arrive before packet 2 does. This kind of
out-of-order (or even worse, packet-lossy) performance spells doom for IP
traffic.
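A back-of-the-envelope simulation of the round-robin case above. It assumes one packet is sent every millisecond, with Serial0 as a 10ms path and Serial1 as a 50ms path (numbers picked from the ranges in the text). The function is ours, for illustration only.

```python
def arrival_order(num_packets, latencies_ms, gap_ms=1.0):
    """Send packets 1..num_packets round-robin over links with the given one-way
    latencies, one packet every gap_ms, and return packet numbers in arrival order."""
    arrivals = []
    for n in range(1, num_packets + 1):
        link = (n - 1) % len(latencies_ms)  # packet 1 -> Serial0, packet 2 -> Serial1, ...
        sent_at = (n - 1) * gap_ms
        arrivals.append((sent_at + latencies_ms[link], n))
    return [n for _, n in sorted(arrivals)]
```

With eight packets over [10, 50] it returns [1, 3, 5, 7, 2, 4, 6, 8], the pathological ordering described above; with two equally fast links the order is preserved.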
int Serial0
ip route-cache
int Serial1
ip route-cache
Note, though, that if you are using any Cisco bigger than a 2500 series,
the "ip route-cache" command might be "ip route-cache cbus" or "ip route-cache
optimum" or some other command.
TO BE DONE
aggregate-address
transit
bgp and peering
bgp: the provider's side: filtering
as-path padding
sync