<![CDATA[Beijing SEO_Beijing SEO Training

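<![CDATA[Beijing SEO_Beijing SEO Training - 【元创SEO】]]> http://www.yuan-chuang.cc/index.php zh-cn http://www.yuan-chuang.cc/read.php/.htm <![CDATA[Summary of search engines that frequently crawl sites; spider names of the major search engines]]> <> Mon, 09 Feb 2009 03:16:27 +0000 http://www.yuan-chuang.cc/read.php/.htm Summary of search engines that frequently crawl sites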

Googlebot-Image/1.0
Mediapartners-Google
Mediapartners-Google/2.1
msnbot/1.1 (+http://search.msn.com/msnbot.htm)
Sosospider+(+http://help.soso.com/webspider.htm)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)
Gaisbot/3.0+(robot06@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
What does this mean? Try the arithmetic:

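Suppose a site has 50,000 pages and 10 different search engines come to fetch them. If they all crawl within a single day, that day brings 500,000 requests that produce no revenue and may drag down the server. If the crawl is spread over a week, the server may cope, but a good deal of bandwidth is still wasted. That volume may still seem acceptable. But what if one server hosts ten similar sites? After spending so many resources on search engines, what does the site get in return?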
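The arithmetic above can be written out as a small sketch. The function name `crawl_requests` is made up for illustration, and the sketch assumes every engine fetches every page exactly once, which real crawlers will not do precisely.

```python
def crawl_requests(pages: int, engines: int) -> int:
    """Total page fetches if every engine crawls every page once."""
    return pages * engines


pages, engines = 50_000, 10
total = crawl_requests(pages, engines)

# All crawls squeezed into one day.
print(total)  # 500000

# The same crawl spread evenly over a week, per day.
per_day_over_week = total / 7
print(round(per_day_over_week))

# Ten similar sites sharing one server (assumed to be the same size).
ten_sites_total = crawl_requests(pages, engines) * 10
print(ten_sites_total)
```

Changing `pages` or `engines` shows how quickly crawler load grows with the number of engines admitted.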

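Some will ask: if search engines are not allowed to crawl, how will the site ever be found? I agree with that point. But there are a great many search engines in the world, and letting every one of them crawl is clearly not the ideal state for a site.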

The following measures are recommended:

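Weed out the weak, keep the strong
A website needs channels for exposure, and search engines are one it cannot pass up. Exposure through the best of them is enough, for example Google and Yahoo. Minor search engines can be admitted later, once they have earned a reputation.
Go where the audience is
If a search engine is strongly regional, for example the well-known search engines in mainland China, and its users overlap with the site's target audience, then it is worth opening the site to that engine. The rule of weeding out the weak still applies.
Guard at every layer
Immature search engine robots do not follow the robots.txt protocol at all; once they pick a site, they crawl it relentlessly. So review the site's traffic logs from time to time and use robots.txt as the first layer of control over the search engines that appear there. Then add a second layer in the web server or application configuration to shut out the search engines you do not want to deal with, saving resources to serve more customers.
Broaden your channels
The more promotion channels a site has, the better. Search engines are indispensable, but they are not the only channel. A site should consider which channels suit its positioning and service supply chain, and promote itself through the currently popular means of distribution, such as RSS feeds, bookmarking sites, MSN sharing, and email sharing.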
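The second layer of control described under "Guard at every layer" could look roughly like the sketch below, which rejects requests by User-Agent in application code. The names `BLOCKED_TOKENS`, `is_blocked`, and `respond_status` are hypothetical, and the choice of which crawlers to block is only an example, not a recommendation from the article.

```python
# Hypothetical list of crawler tokens the site owner has decided to refuse.
# The two tokens are taken from the User-Agent strings listed earlier,
# purely as an illustration.
BLOCKED_TOKENS = ("Yanga", "Gaisbot")


def is_blocked(user_agent: str, blocked_tokens=BLOCKED_TOKENS) -> bool:
    """Return True if the User-Agent contains any blocked token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocked_tokens)


def respond_status(user_agent: str) -> int:
    """Pick an HTTP status: 403 for blocked crawlers, 200 otherwise."""
    return 403 if is_blocked(user_agent) else 200


print(respond_status("Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)"))
print(respond_status("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

A User-Agent header can be forged, so a check like this only stops crawlers that identify themselves honestly; the same rule could instead be placed in the web server configuration.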

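Spider names of the major search engines
This article lists the better-known search spiders worldwide that you may need to write rules for in robots.txt. To keep a directory from being indexed by search engines, follow the examples below.
These rules have to be set in robots.txt, of course. If you know how to combine the contents of this article with 登陆奇兵 and a program of your own, it can bring you a very large amount of traffic from overseas IPs!
The names of the better-known search engine spiders are:
Google's spider: Googlebot
Baidu's spider: Baiduspider
Yahoo's spider: Yahoo Slurp
MSN's spider: msnbot
Altavista's spider: Scooter
Lycos's spider: Lycos_Spider_(T-Rex)
Alltheweb's spider: FAST-WebCrawler/
INKTOMI's spider: Slurp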
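As a rough illustration of how these names can be used when reviewing traffic logs, the sketch below maps a raw User-Agent string to one of the engines listed above and tallies the hits. `SPIDER_TOKENS`, `identify_spider`, and `tally_spiders` are hypothetical names, and simple case-insensitive substring matching is an assumption that may misclassify unusual strings.

```python
from collections import Counter
from typing import Iterable, Optional

# Token -> engine, taken from the list above. "Slurp" covers both
# Yahoo and Inktomi because both names in the list contain it.
SPIDER_TOKENS = {
    "Googlebot": "Google",
    "Baiduspider": "Baidu",
    "Slurp": "Yahoo/Inktomi",
    "msnbot": "MSN",
    "Scooter": "Altavista",
    "Lycos_Spider_(T-Rex)": "Lycos",
    "FAST-WebCrawler": "Alltheweb",
}


def identify_spider(user_agent: str) -> Optional[str]:
    """Return the engine whose token appears in the User-Agent, or None."""
    ua = user_agent.lower()
    for token, engine in SPIDER_TOKENS.items():
        if token.lower() in ua:
            return engine
    return None


def tally_spiders(user_agents: Iterable[str]) -> Counter:
    """Count requests per engine; unrecognised agents are grouped as 'other'."""
    return Counter(identify_spider(ua) or "other" for ua in user_agents)


sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
    "msnbot/1.1 (+http://search.msn.com/msnbot.htm)",
    "Googlebot-Image/1.0",
]
print(tally_spiders(sample))
```

In practice the User-Agent strings would come from the server's access log rather than a hard-coded list.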
For reference, the format is as follows:
User-agent: (spider name)
Disallow: (file name)
User-agent: Black Hole
Disallow: /
User-agent: Titan
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: NetMechanic
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Wget
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: mozilla/4
Disallow: /
User-agent: mozilla/5
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 9
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: ia_archiver/1.6
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: WebZip/4.0
Disallow: /

User-agent: WebSauger
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: RMA
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: asterias
Disallow: /
User-agent: turingos
Disallow: /
User-agent: spanner
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Crescent Internet ToolPak HTTPOLE Control v.1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: moget
Disallow: /
User-agent: hloader
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Openfind data gathere
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Cegbfeieh
Disallow: /
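Rules in the format above can be checked before they are deployed. The sketch below uses Python's standard `urllib.robotparser` on a small rule set; the host `www.example.com` and the `/private/` directory are placeholders made up for the example, and the result only tells you what a crawler that honors robots.txt would do.

```python
from urllib.robotparser import RobotFileParser

# A short rule set in the same format as the list above.
# The "/private/" directory is a hypothetical example.
ROBOTS_TXT = """\
User-agent: WebZip
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# WebZip is refused everywhere.
print(parser.can_fetch("WebZip", "http://www.example.com/index.htm"))

# Other crawlers fall under the "*" group: only /private/ is off limits.
print(parser.can_fetch("Googlebot", "http://www.example.com/index.htm"))
print(parser.can_fetch("Googlebot", "http://www.example.com/private/a.htm"))
```

Crawlers that ignore robots.txt are unaffected by these rules, which is why a server-side check is still needed as a second layer.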
]]> http://www.yuan-chuang.cc/read.php/.htm#blogcomment <![CDATA[[Comments] Summary of search engines that frequently crawl sites; spider names of the major search engines]]> <user@domain.com> Thu, 01 Jan 1970 00:00:00 +0000 http://www.yuan-chuang.cc/read.php/.htm#blogcomment