Summary of Search Engines That Frequently Crawl Sites | Spider Names of the Major Search Engines

2009/02/09 11:16, SEO Knowledge Base, original to this site
Summary of search engines that frequently crawl sites

Googlebot-Image/1.0
Mediapartners-Google
Mediapartners-Google/2.1
msnbot/1.1 (+http://search.msn.com/msnbot.htm)
Sosospider+(+http://help.soso.com/webspider.htm)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)
Gaisbot/3.0+(robot06@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
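What does this mean? Try a rough calculation:

Suppose a site has 50,000 pages and 10 different search engines come to crawl it. If they all finish within a single day, that day takes 500,000 extra requests that generate no revenue and may well drag down the server. If the crawl is spread over a week, the server may cope, but a good deal of bandwidth is still wasted. Perhaps that volume still sounds acceptable. Now imagine ten similar sites on the same host. With so many resources spent on search engines, what does the site get in return?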
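The arithmetic above can be reproduced in a few lines of Python. This is only a rough sketch: it assumes every crawler fetches every page exactly once and that the requests are spread evenly over the period, which real crawlers will not do. The function names are made up for illustration.

```python
# Back-of-the-envelope crawl load, using the figures from the text above.
PAGES_PER_SITE = 50_000
CRAWLERS = 10
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_requests(pages: int, crawlers: int, sites: int = 1) -> int:
    """Total requests if every crawler fetches every page once (assumption)."""
    return pages * crawlers * sites


def requests_per_second(total_requests: int, days: float) -> float:
    """Average request rate if the crawl is spread evenly over `days`."""
    return total_requests / (days * SECONDS_PER_DAY)


one_site = crawl_requests(PAGES_PER_SITE, CRAWLERS)
ten_sites = crawl_requests(PAGES_PER_SITE, CRAWLERS, sites=10)

print(one_site)                                    # 500000
print(round(requests_per_second(one_site, 1), 2))  # 5.79 (crawled in one day)
print(round(requests_per_second(one_site, 7), 2))  # 0.83 (crawled in one week)
print(ten_sites)                                   # 5000000 (ten sites on one host)
```

Even the one-week case is close to one extra request every second, around the clock, on top of normal visitor traffic.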

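You might say: if search engines are not allowed to crawl, how will the site ever be found? I agree. But there are a great many search engines in the world, and letting every one of them crawl is clearly not the ideal situation for a site.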

I suggest the following:

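Weed out the weak, keep the strong
A site needs channels for exposure, and search engines are one it cannot afford to miss. But exposure through the best ones is enough, for example Google and Yahoo. Minor search engines can be let in later, once they have earned a reputation.
Go where the audience is
If a search engine has a strong regional focus, such as the well-known search engines in mainland China, and its users overlap with the site's target audience, then it is worth opening the site to it. The same rule applies here: weed out the weak and keep the strong.
Guard every layer
Immature search engine robots do not follow the robots.txt protocol at all; once they pick a site, they crawl it relentlessly. So review the site's traffic logs from time to time, and use robots.txt as the first layer of control over the search engines that show up there. Then add a second layer in the web server or application configuration to shut out the search engines you do not want to deal with, saving resources to serve more customers.
Broaden the channels
The more promotion channels a site has, the better. Search engines are indispensable, but they are not the only channel. A site should consider which channels suit its positioning and its service supply chain, and use currently popular means of distribution to promote itself widely, for example RSS feeds, bookmarking sites, sharing over MSN, sharing by email, and so on.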
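For the log review mentioned under "Guard every layer", a small script can tally requests per User-Agent. The sketch below assumes the server writes the common "combined" access log format, where the User-Agent is the last quoted field; the sample lines and the function name are hypothetical, and the pattern would need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumption: "combined" log format, User-Agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Return a Counter mapping each User-Agent string to its request count."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; in practice, iterate over an open log file.
sample = [
    '192.0.2.1 - - [09/Feb/2009:11:16:00 +0800] "GET / HTTP/1.1" 200 512 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '192.0.2.1 - - [09/Feb/2009:11:16:01 +0800] "GET /a HTTP/1.1" 200 734 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '192.0.2.7 - - [09/Feb/2009:11:16:02 +0800] "GET / HTTP/1.1" 200 512 "-" '
    '"msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
]

# Print the busiest agents first.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

Agents near the top of the tally that bring no visitors in return are the first candidates for a robots.txt entry.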

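Spider names of the major search engines
This article lists the better-known spiders worldwide that you may need to configure in robots.txt. If there is a directory you do not want search engines to index, you can set it up by following the examples below.
All of this is, of course, configured through robots.txt. If you know how to use 登陆奇兵 together with your own program, this material can also bring you a very large amount of traffic from overseas IPs!
The names of the better-known search engine spiders are: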
Google's spider: Googlebot
Baidu's spider: Baiduspider
Yahoo's spider: Yahoo! Slurp
MSN's spider: msnbot
AltaVista's spider: Scooter
Lycos's spider: Lycos_Spider_(T-Rex)
AlltheWeb's spider: FAST-WebCrawler/
Inktomi's spider: Slurp
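The names above can be used to recognise a crawler from its User-Agent header. The following is a minimal sketch that does a case-insensitive substring match; the dictionary and function name are illustrative. A User-Agent string can be forged by anyone, so treat the result as a hint rather than proof of identity.

```python
# Name tokens taken from the list above, mapped to the engine they belong to.
# "slurp" covers both the Yahoo and the Inktomi entries.
SPIDER_TOKENS = {
    "googlebot": "Google",
    "baiduspider": "Baidu",
    "slurp": "Yahoo / Inktomi",
    "msnbot": "MSN",
    "scooter": "AltaVista",
    "lycos_spider_(t-rex)": "Lycos",
    "fast-webcrawler": "AlltheWeb",
}


def identify_spider(user_agent: str):
    """Return the engine name for a known spider, or None if unrecognised."""
    ua = user_agent.lower()
    for token, engine in SPIDER_TOKENS.items():
        if token in ua:
            return engine
    return None


print(identify_spider("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # Google
print(identify_spider("msnbot/1.1 (+http://search.msn.com/msnbot.htm)"))  # MSN
print(identify_spider("Mozilla/4.0 (compatible; MSIE 6.0)"))  # None
```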
If you need a reference, use the format and the sample entries that follow:
User-agent: (spider name)
Disallow: (file or directory path)
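Before publishing a robots.txt, the rules can be checked offline with Python's standard urllib.robotparser module. The sketch below is only an illustration: the WebStripper entry follows the format above, while the "/private/" directory rule and the test paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A tiny rule set in the User-agent / Disallow format described above.
# The "/private/" rule is a hypothetical example of hiding one directory.
RULES = """\
User-agent: WebStripper
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# WebStripper is shut out of the whole site.
print(parser.can_fetch("WebStripper", "/index.html"))    # False
# Other spiders may crawl everything except the hidden directory.
print(parser.can_fetch("Googlebot", "/index.html"))      # True
print(parser.can_fetch("Googlebot", "/private/a.html"))  # False
```

This only shows what a well-behaved spider would conclude from the file. A spider that ignores robots.txt is not affected by it.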
User-agent: Black Hole
Disallow: /
User-agent: Titan
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: NetMechanic
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Wget
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: mozilla/4
Disallow: /
User-agent: mozilla/5
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 9
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: ia_archiver/1.6
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: WebZip/4.0
Disallow: /

User-agent: WebSauger
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: RMA
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: asterias
Disallow: /
User-agent: turingos
Disallow: /
User-agent: spanner
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Crescent Internet ToolPak HTTPOLE Control v.1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: moget
Disallow: /
User-agent: hloader
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Openfind data gathere
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Cegbfeieh
Disallow: /
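
Because some robots ignore robots.txt entirely, the same names can also be enforced in the application as a second layer. The following is a minimal sketch of a WSGI middleware that answers 403 to listed agents. The class name, helper name, and the short token tuple are illustrative; a real deployment would load the full list and might block at the web server instead.

```python
# A few names from the list above, lower-cased for case-insensitive matching.
BLOCKED_TOKENS = ("webstripper", "webzip", "emailsiphon", "teleport")


def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent contains any blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_TOKENS)


class BlockBadBots:
    """WSGI middleware (hypothetical name) that rejects blocked User-Agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application is then a one-line change, for example `application = BlockBadBots(application)`.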