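Title: A summary of the search engines that frequently crawl sites | Spider names of the major search engines
Source: 北京SEO_北京SEO培训 - 【元创SEO】
Date: Mon, 09 Feb 2009 11:16:27 +0000
Author:
URL: http://www.yuan-chuang.cc/read.php/146.htm
Content:

A summary of the search engines that frequently crawl sites

Googlebot-Image/1.0
Mediapartners-Google
Mediapartners-Google/2.1
msnbot/1.1 (+http://search.msn.com/msnbot.htm)
Sosospider+(+http://help.soso.com/webspider.htm)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)
Gaisbot/3.0+(robot06@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)

What does this mean? Try a quick calculation:

Suppose a site has 50,000 pages and 10 different search engines come to fetch them. If they all finish within a single day, that day takes in 500,000 requests (50,000 pages × 10 engines) that produce no revenue and may drag down the host. If the crawl is spread over a week instead, the host may cope, but a good deal of bandwidth is still wasted. That volume may still seem acceptable. But what if one host carries ten similar sites? With that many resources spent on search engines, what does the site get in return?

You might say: if search engines are not allowed to crawl, how will the site ever be found? I agree. But there are many search engines worldwide, and having every one of them crawl the site is clearly not the ideal situation.

I suggest the following measures:

Cull the weak, keep the strong
A website needs channels for exposure, and search engines are one it cannot afford to miss. But exposure through the best ones is enough, for example Google and Yahoo. Minor search engines can be let in later, once they have built a reputation.

Go where the audience is
If a search engine is strongly regional, such as the well-known search engines in mainland China, and its users overlap with the site's target audience, then it is worth opening the site to it. The same culling still applies.

Guard at every layer
Immature search engine robots do not follow the robots.txt protocol at all; once they pick a site, they fetch it relentlessly. So review the site's traffic logs from time to time, and use robots.txt as the first layer of control over the search engines that appear in them. Then add a second layer in the web server or application configuration to shut out the search engines you do not want to deal with, saving resources to serve more customers.

Widen the channels
The more promotion channels a site has, the better. Search engines are indispensable, but they are not the only channel. A site should consider which channels suit its positioning and service supply chain, and promote itself through currently popular means such as RSS feeds, bookmarking sites, MSN sharing, email sharing, and so on.

Spider names of the major search engines

This article records the better-known search spiders worldwide that a robots.txt list may need to cover. To keep a directory from being indexed by search engines, follow the settings below; they must, of course, be made in robots.txt. If you know how to combine the content of this article with 登陆奇兵 and a program, it can bring you a very large amount of overseas IP traffic!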
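The periodic review of traffic logs recommended earlier could be sketched roughly as follows. This is a minimal illustration, not a finished tool: it assumes an access log in the common "combined" layout where the User-Agent is the last double-quoted field, and the names `CRAWLER_TOKENS`, `classify`, `tally_crawlers` and the sample log lines are all made up for this example. The tokens themselves are taken from the User-Agent strings listed at the top of the article.

```python
from collections import Counter

# Lowercase substrings drawn from the User-Agent strings listed in the article.
# "googlebot-image" comes before "googlebot" so the more specific token wins.
CRAWLER_TOKENS = [
    "googlebot-image",
    "googlebot",
    "mediapartners-google",
    "msnbot",
    "sosospider",
    "baiduspider",
    "yahoo! slurp",
    "sogou web spider",
    "gaisbot",
    "yanga",
]


def classify(user_agent):
    """Return the first crawler token contained in user_agent, or None."""
    ua = user_agent.lower()
    for token in CRAWLER_TOKENS:
        if token in ua:
            return token
    return None


def tally_crawlers(log_lines):
    """Count requests per crawler.

    Assumes the User-Agent is the last double-quoted field on each line.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rsplit('"', 2)
        if len(parts) < 3:
            continue  # no quoted field on this line, skip it
        token = classify(parts[1])
        if token is not None:
            counts[token] += 1
    return counts


# Hypothetical log lines, invented purely for illustration.
SAMPLE_LOG = [
    '192.0.2.10 - - [09/Feb/2009:11:16:27 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [09/Feb/2009:11:16:28 +0000] "GET /a.htm HTTP/1.1" 200 812 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '192.0.2.12 - - [09/Feb/2009:11:16:29 +0000] "GET /logo.gif HTTP/1.1" 200 99 "-" '
    '"Googlebot-Image/1.0"',
    '192.0.2.13 - - [09/Feb/2009:11:16:30 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for name, hits in tally_crawlers(SAMPLE_LOG).most_common():
    print(name, hits)
```

A tally like this shows which crawlers account for the most requests, which is the information needed to decide which ones to restrict in robots.txt and which to block at the server.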
The spider names of the better-known search engines are:

Google: Googlebot
Baidu: baiduspider
Yahoo: Yahoo Slurp
MSN: Msnbot
Altavista: Scooter
Lycos: Lycos_Spider_(T-Rex)
Alltheweb: FAST-WebCrawler/
INKTOMI: Slurp

If you need a reference, you can follow the one in this article:

User-agent (user-agent setting): (spider name)
Disallow (deny): (file name)

User-agent: Black Hole
Disallow: /

User-agent: Titan
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Wget
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: mozilla/4
Disallow: /

User-agent: mozilla/5
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 9
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: Wget
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTPOLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: WebmasterWorldForumBot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Cegbfeieh
Disallow: /
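One way to check what a rule list like the one above means for a given crawler is to feed it to Python's standard `urllib.robotparser` module. The sketch below is only an illustration under stated assumptions: `RULES` is a short excerpt of the list rather than the whole file, and the helper name `is_allowed` is invented for this example. As the article points out, badly behaved robots ignore robots.txt altogether, so this only shows what a compliant crawler would conclude.

```python
from urllib.robotparser import RobotFileParser

# A short excerpt of the rules listed above (not the full list).
RULES = """\
User-agent: Wget
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: EmailSiphon
Disallow: /
"""


def is_allowed(rules_text, user_agent, url="/index.html"):
    """Return True if a compliant crawler named user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Wget/1.5.3", "WebZip", "Googlebot"):
    verdict = "allowed" if is_allowed(RULES, agent) else "blocked"
    print(agent, verdict)
```

Crawlers with no matching `User-agent` record, and no `*` record to fall back on, are treated as allowed, which is why a blocklist of this kind leaves the major search engines untouched.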