AlphaOne Technology Support Forums
November 21, 2008, 01:09:01 AM

Author Topic: googled to death  (Read 5075 times)
songdove
Tribforce Tribble, I mean Tribbie
Full Member
googled to death
« on: August 20, 2005, 07:21:22 PM »

How can I slow google down?  Here are the bot stats for July and August:

August
Googlebot
hits:   34595   
bandwidth: 1.12 GB   
date/time: 19 Aug 2005 - 22:50

July
Googlebot   
Hits: 1407   
bandwidth: 38.47 MB   
date/time: 31 Jul 2005 - 23:58

As you can see, they didn't crawl the site even half as much as they have this month.  I don't know what tipped them off to us, but I have been regularly getting bandwidth notices, some coming at strange hours of the night, like 3am!!!  My reseller has already upped our site bandwidth allowance for the month, and I've gotten 4 more bandwidth notices since he did that.  At this rate google is going to shut down the site before the end of the month.
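
To put the jump in perspective, here is a minimal sketch that works out the growth from the two sets of figures above. It assumes the stats tool reports binary units (1 GB = 1024 MB); the variable and function names are only illustrative.

```python
# Googlebot figures copied from the stats quoted above.
MB_PER_GB = 1024  # assumption: the stats tool uses binary units

july = {"hits": 1407, "bandwidth_mb": 38.47}
august = {"hits": 34595, "bandwidth_mb": 1.12 * MB_PER_GB}  # through 19 Aug only


def growth(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


hits_ratio = growth(july["hits"], august["hits"])
bandwidth_ratio = growth(july["bandwidth_mb"], august["bandwidth_mb"])

print(f"Hits:      {hits_ratio:.1f}x July's total")
print(f"Bandwidth: {bandwidth_ratio:.1f}x July's total")
```

Either way the units are counted, that is roughly 25 times the hits and about 30 times the bandwidth, with a third of August still to go.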

How do I slow it down? I like that google sees us and is crawling the site, but our bandwidth is being killed by the attention.

We sacrifice all that we are and all that we love for the greater good, the One above. Visit me at http://tribforcehq.com, http://tribkids.com, http://teshuvatrumpet.org, http://sswat.uni.cc, http://planetlogos.now.nu
TJ
Tech Team
Hero Member
Re: googled to death
« Reply #1 on: August 20, 2005, 08:06:48 PM »

I would use the robots.txt file to tell them what directories they can access and limit their access to only some directories.  I think Google is doing something new because my site has been hit hard too this month.

TJ
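
A minimal sketch of the robots.txt approach suggested above, checked with Python's standard `urllib.robotparser`. The directory names and the `example.com` host are placeholders, not the real layout of any site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: these directories are placeholders chosen
# for illustration, not the ones actually used on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /downloads/
Disallow: /modules/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "http://example.com" + path)


for path in ("/index.php", "/images/banner.jpg", "/downloads/big.zip"):
    print(path, "->", "crawl" if allowed(path) else "skip")
```

Checking the rules this way before uploading the file can catch a typo that would otherwise block the whole site or block nothing at all.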
songdove
Tribforce Tribble, I mean Tribbie
Full Member
Re: googled to death
« Reply #2 on: August 20, 2005, 08:30:35 PM »

Thanks for the tip.  I've edited the robots file, adding a good number of directories to it.  According to another webmaster forum I hang out at, this sort of thing was happening in the spring thanks to a new bot/adword system known as FastClick.  This time, however, all my bot log says is that it was Googlebot, so I don't know what is making Google go so crawl crazy.
Brad
SysAdmin
Tech Team
Hero Member
Re: googled to death
« Reply #3 on: August 21, 2005, 12:47:57 PM »

Googlebot, or something posing as googlebot, is hitting several sites hard.  But others not much harder than before.  It seems to be getting stuck when a site has a php app running as its main site.  Or so it seems.
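
One way to tell the real Googlebot from something posing as it is a reverse DNS lookup followed by a forward lookup, a check Google has since documented. The sketch below is only an illustration: the function names are made up, and the demo uses stub resolvers with an invented hostname so that it runs without network access.

```python
import socket
from typing import Callable, List


def dns_reverse(ip: str) -> str:
    """Look up the hostname registered for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def dns_forward(host: str) -> List[str]:
    """Look up the IPv4 addresses registered for a hostname."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = dns_reverse,
    forward: Callable[[str], List[str]] = dns_forward,
) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm it resolves back."""
    try:
        host = reverse(ip).lower().rstrip(".")
    except (OSError, KeyError):
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in forward(host)
    except (OSError, KeyError):
        return False


# Offline demo with stub resolvers. The hostnames are invented for
# illustration; real lookups would use the socket-based defaults.
fake_names = {"192.0.2.10": "crawler.googlebot.com", "192.0.2.99": "bot.example.net"}
fake_addrs = {"crawler.googlebot.com": ["192.0.2.10"], "bot.example.net": ["192.0.2.99"]}

for ip in fake_names:
    ok = is_real_googlebot(ip, fake_names.__getitem__, fake_addrs.__getitem__)
    print(ip, "->", "genuine" if ok else "impostor")
```

The user-agent string alone proves nothing, since any client can send it; the round trip through DNS is what makes the check hard to fake.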

Googlebot hits AlphaOne about 3 times a day on a normal month and gobbles about 80-90 MB.  This month it has hit us about 4 times a day and already taken 90 MB.  But MSN is still ahead in number of times it visits with 8 times a day and about 200 MB.  Inktomi spiders us over 40 times a day, (I think they have this forum and our webmaster resources heavily spidered), but takes less bandwidth than either of the other two.

We have two OSCommerce sites getting slammed really badly, plus 2 PostNuke sites and 1 UBB site.  Everything else seems slightly higher, but not enough to be alarmed about.  I spent a lot of time checking into this today because one of the sites, a UBB site, was already really close to maxing out what is allowable on a shared server and this brought them so far over the edge we had to tell them they had to move to a semi-dedicated server immediately.  Between Google dancing all over them and their increased normal traffic they were keeping backups from running automatically and we had to run them during the day.

I have seen others talk about this as well in other forums, and someone said CNN announced Google was doing a big update dance.  This is a good thing for anyone in Google's sandbox. 

thanks

Brad

songdove
Tribforce Tribble, I mean Tribbie
Full Member
Re: googled to death
« Reply #4 on: August 21, 2005, 03:08:29 PM »

Good on one hand, bad on the other.  I'm still getting notices that my site is hitting or nearing the 80% bandwidth allotment for the month, and my W4C reseller already upped the bandwidth for me just recently.  Trib is running Postnuke as well, and I've disallowed a good number of the postnuke directories in the robots.txt file now in an attempt to cut down on what google can spider.

If this is just merely an update, then hopefully the traffic will die off before we end up causing problems on the server we're on.
Brad
SysAdmin
Tech Team
Hero Member
Re: googled to death
« Reply #5 on: August 25, 2005, 12:02:06 PM »

Google is murdering a few sites on the server.  One site has had only 7 visitors this month and has used 6 GB of bandwidth with Google alone.

No one seems to have an answer.  But it is not just our servers getting hit so bad.

songdove
Tribforce Tribble, I mean Tribbie
Full Member
Re: googled to death
« Reply #6 on: October 22, 2005, 04:11:52 PM »

Well, took them a month to catch up to the upgrades I did, but the googlebot is back.

AwStats for one visit today:

Googlebot
Hits:4770   
Bandwidth: 218.72 MB      
Date/Time: 22 Oct 2005 - 03:40

More than July but less than August.

I've re-uploaded the robots.txt file, where I've disallowed a number of folders that hold tons of images or documents, or that trigger a lot of database calls.

I found this quote on another forum while searching for information about the unknown bot identified as 'crawl':

Quote
nuthin
Fast-Webcrawler (AllTheWeb) is just as bad on some sites i monitor, but ahh well.. if you dont want to be indexed into that particular search engine, you can always as akashik mentioned place it in your robots.txt

but i don't know many websites that dont want to have their website indexed into google and other major search engines.

if you think googles hitting your website too "hard" they mention;

"If you find that we are placing too high a load on your site, please let us know by sending us e-mail at googlebot@google.com"
http://www.webhostingtalk.com/archive/thread/194438-1.html

I'm getting hit a fair bit by the unknown crawler too.

Unknown robot (identified by 'crawl')   
Hits: 2841   
Bandwidth: 103.00 MB      
date/time: 21 Oct 2005 - 09:03

Any ideas on slowing these down would be awesome.  It's great to be crawled, but horrible not to be.

I sent the following email to the address for googlebot above:

Quote
Hello,

I am the administrator for http://tribforcehq.com and its subdomains.  Googlebot is taking far too much time on the site and is starting to force my webhost to send me notices that my hosting bandwidth is about to exceed its limit again.  I would appreciate it if your bot could be configured to spend no more than 1 meg a day on spidering tribforcehq.com or any of its subdomains.  Based on AwStats, this is what your bot used on its last visit that I looked at for today:

Googlebot            4770   218.72 MB      22 Oct 2005 - 03:40

I need that bandwidth for my users.  It's nice to be indexed on a search engine.  It's NOT nice to have my bandwidth eaten up to get there.

If you do not wish to slow down the usage of the bot, let me know and we'll set up a billing system so that Google pays for the excess bandwidth each month.  I will need the billing address to send it to, and will expect that payment be on time to cover expenses.

Sincerely,
Marilynn Dawson
Administrator
http://tribforcehq.com
AlphaWolf
AOT Administrator
Administrator
Hero Member
Re: googled to death
« Reply #7 on: October 23, 2005, 08:51:11 AM »

something crazy is going on with Google.

Webs 4 Christ gets spidered at least 6-8 times a day by Google.  2 weeks ago we submitted a Google site map using their tool.  3 days ago, they finally ran it...and REMOVED W4C TOTALLY from google!  We are completely GONE.  And yet google still spiders 6-8 times a day

AlphaOne Tech Webmaster Resources
http://www.alphaone-tech.com/resources/
songdove
Tribforce Tribble, I mean Tribbie
Full Member
Re: googled to death
« Reply #8 on: October 26, 2005, 04:37:44 PM »

Ok, here is what I got back from google's help department:

Thank you for your reply. From the log snippet you provided, we can see that your
site is using session IDs. As you've observed, session IDs can cause problems for
our robots. Please disable session IDs for Googlebot so that our robots may crawl
your site more efficiently.
 
Regards,
The Google Team
 
Original Message Follows:
------------------------
From: "M Dawson"
 
Hello,
 
The forgoogle.html file has been made and now sits at http://tribforcehq.com/forgoogl.html.  There is nothing on this page as you did not specify that it needs anything.
 
The most recent visit by the googlebot is shown as the following:
 
Host: 66.249.66.2
 
/kids/modules.php?op=modload&name=PNphpBB2&file=viewtopic&p=151946
  Http Code: 200 Date: Oct 24 18:58:21 Http Version: HTTP/1.1
Size in Bytes: 60799
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/teens/modules.php?name=UpDownload&req=NewDownloadsDate&selectdate=1129584652
  Http Code: 200 Date: Oct 24 18:59:05 Http Version: HTTP/1.1
Size in Bytes: 28548
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/leftbehind/index.php?name=PNphpBB2&file=posting&mode=quote&p=151944
  Http Code: 200 Date: Oct 24 18:59:31 Http Version: HTTP/1.1
Size in Bytes: 79030
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/fbc/index.php?name=PNphpBB2&file=viewforum&f=49
  Http Code: 200 Date: Oct 24 19:00:23 Http Version: HTTP/1.1
Size in Bytes: 144364
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/fbc/index.php?name=PNphpBB2&file=viewtopic&p=177574
  Http Code: 200 Date: Oct 24 19:01:22 Http Version: HTTP/1.1
Size in Bytes: 114167
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/games/index.php?name=PNphpBB2&file=viewtopic&t=10652
  Http Code: 200 Date: Oct 24 19:01:50 Http Version: HTTP/1.1
Size in Bytes: 52463
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/college/index.php?name=PNphpBB2&file=profile&mode=viewprofile&u=7
  Http Code: 302 Date: Oct 24 19:02:56 Http Version: HTTP/1.1
Size in Bytes: 5
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/college/index.php?name=PNphpBB2&file=login&redirect=profile&mode=viewprofile&u=7
  Http Code: 200 Date: Oct 24 19:02:57 Http Version: HTTP/1.1
Size in Bytes: 32411
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/newhope/index.php?name=News&file=article&sid=54
  Http Code: 200 Date: Oct 24 19:03:49 Http Version: HTTP/1.1
Size in Bytes: 33465
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&t=10676&POSTNUKESID=2c239aebedba97ed55c843251ef6348b
  Http Code: 404 Date: Oct 24 19:04:24 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&t=6916&start=0&POSTNUKESID=7ea46ed0f8ad74b232915cd039becfe5
  Http Code: 404 Date: Oct 24 19:05:24 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&t=10553&POSTNUKESID=166f3128513559720a745eb33172f2c9
  Http Code: 404 Date: Oct 24 19:06:15 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/games/index.php?name=PNphpBB2&file=profile&mode=viewprofile&u=38
  Http Code: 302 Date: Oct 24 19:06:44 Http Version: HTTP/1.1
Size in Bytes: 5
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/games/index.php?name=PNphpBB2&file=login&redirect=profile&mode=viewprofile&u=38
  Http Code: 200 Date: Oct 24 19:06:45 Http Version: HTTP/1.1
Size in Bytes: 32083
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&p=149999&POSTNUKESID=2c239aebedba97ed55c843251ef6348b
  Http Code: 404 Date: Oct 24 19:07:50 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&p=150366&POSTNUKESID=4e5d78dc1ed715af67c819f58ef52b8f
  Http Code: 404 Date: Oct 24 19:08:43 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/fiction/index.php?name=Web_Links&req=viewlink&cid=3
  Http Code: 200 Date: Oct 24 19:09:24 Http Version: HTTP/1.1
Size in Bytes: 29330
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&p=153064&POSTNUKESID=6a195a657fad9df6645792ca1485588e
  Http Code: 404 Date: Oct 24 19:09:52 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&t=6973&POSTNUKESID=0d8e2934b760ff045968a4b1003ee393
  Http Code: 404 Date: Oct 24 19:10:55 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&p=149839&POSTNUKESID=2c239aebedba97ed55c843251ef6348b
  Http Code: 404 Date: Oct 24 19:11:19 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&p=151722&POSTNUKESID=35a1c62cb845dff6642f54f4245a5895
  Http Code: 404 Date: Oct 24 19:11:36 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&t=4870&start=0&sid=6b1a5089eb52586b261daeeea4afeb62
  Http Code: 404 Date: Oct 24 19:11:43 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&t=10567&sid=da440a9bae6e2c6db30cfd49df851859
  Http Code: 404 Date: Oct 24 19:12:30 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&t=6024&start=0&sid=6b1a5089eb52586b261daeeea4afeb62
  Http Code: 404 Date: Oct 24 19:13:23 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/home/index.php?name=PNphpBB2&file=viewtopic&p=149052&sid=1f10b723f1bb46e0c4d0fc1520f48115
  Http Code: 404 Date: Oct 24 19:13:50 Http Version: HTTP/1.1
Size in Bytes: -
  Referer: -
  Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

********************

Question now is, how do I turn off session ID's just for google?
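
One possible answer, sketched in Python rather than in PostNuke's own PHP: decide from the User-Agent header whether a request should get a session ID at all. The function name and the token list are assumptions for illustration, not PostNuke settings, and the list is far from complete.

```python
# Assumption: a short, illustrative list of crawler tokens. Real sites
# would need a longer list and should expect it to go stale.
CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp")


def wants_session_id(user_agent: str) -> bool:
    """Return False when the User-Agent looks like a search crawler."""
    ua = (user_agent or "").lower()
    return not any(token in ua for token in CRAWLER_TOKENS)


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/1.0.7"

print(wants_session_id(googlebot_ua))  # crawler: skip the session ID
print(wants_session_id(browser_ua))    # ordinary visitor: issue one
```

The same test would have to be wired into wherever the application appends the session ID to its links, which depends on the software in use.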

I found in postnuke that I can turn off session ID's for unregistered users.  I've done that.  Now to see if this cuts down on the bandwidth usage or not.
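
To see why session IDs inflate the crawl, the sketch below strips them from a few URLs taken from the log above, so that links to the same topic collapse into one address. It assumes that `POSTNUKESID` and `sid` carry a session ID only when the value is a 32-character hex string, which is why the article link with `sid=54` is left alone.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameter names hold session IDs only when the
# value looks like a 32-character hex digest.
SESSION_PARAMS = {"POSTNUKESID", "sid"}
HEX32 = re.compile(r"^[0-9a-f]{32}$")


def strip_session_id(url: str) -> str:
    """Return `url` with session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not (key in SESSION_PARAMS and HEX32.match(value))
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


urls = [
    "/home/index.php?name=PNphpBB2&file=viewtopic&t=10676&POSTNUKESID=2c239aebedba97ed55c843251ef6348b",
    "/home/index.php?name=PNphpBB2&file=viewtopic&t=10567&sid=da440a9bae6e2c6db30cfd49df851859",
    "/newhope/index.php?name=News&file=article&sid=54",
]

for url in urls:
    print(strip_session_id(url))
```

Every new session ID makes an already crawled page look like a fresh URL, so a crawler can keep fetching the same content under different addresses.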
« Last Edit: October 26, 2005, 05:19:22 PM by songdove »
