Hi,
My site's bandwidth usage has been growing dramatically as the site expands, both in the number of stories (nodes) and in the number of taxonomy categories.
Suppose I have 1,000 stories and 400 categories, with 60% of the stories falling into two or more categories. I have menu links pointing to every category, and 30% of the stories contain a category menu linking to other related stories (populated by a custom module).
A visiting spider therefore loads the same story under multiple different URLs. The site may contain only 1,000 unique pages, but a search engine sees more like 1000 x 10 = 10,000 story links. If I add a separate taxonomy with the categories commercial / non-commercial, that count could double to 20,000 linked pages, and so on.
When a spider hits the site, it loads the 1,000 stories under all of those categories and subcategories, consuming about 540 MB per visit. Some spiders are brain-dead and pull 1.87 GB per visit!
I am being forced to move to a high-bandwidth hosting company, but there must be another solution, since that only delays the crisis until my site reaches 50,000 stories. Searching the forums for "bandwidth" turns up plenty of posts about high bandwidth usage, but little that really addresses this issue. I cannot see how a robots.txt file would help, other than by excluding a spider entirely (as I have done with the 1.87 GB culprit).
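For clarity, this is the kind of robots.txt entry I mean by excluding a spider entirely (the user-agent name here is only a placeholder for the actual bot):

    # Block the worst offender from the whole site (bot name is a placeholder)
    User-agent: BadBot
    Disallow: /

    # All other crawlers may still crawl everything
    User-agent: *
    Disallow:

That stops one offender completely, but it does nothing about well-behaved spiders crawling the same stories ten times over through every category path.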