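Controlling WordPress Visibility to Search Engines

A site uses the Robots protocol to tell search engines which pages may be crawled and which may not. These rules are expressed in robots.txt.

WordPress does not ship with a physical robots.txt file. If /robots.txt is requested from the site root and no such file exists on the server, WordPress generates one on the fly, and its content can be switched between blocking and allowing crawlers from the admin panel.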

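Method 1: Change the dynamic rule in the admin panel (Dashboard -> Settings), using the search engine visibility option.

When search engines are blocked, /robots.txt returns: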

User-agent: *
Disallow: /

When search engines are allowed, /robots.txt returns the following (in this example the site URL has the path /w):

User-agent: *
Disallow: /w/wp-admin/
Allow: /w/wp-admin/admin-ajax.php
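
To confirm which of the two states a site is in, the response from /robots.txt can be checked programmatically. The following is a minimal sketch, not a full robots.txt parser: the function name `robots_blocks_everything` is hypothetical, and it assumes each group starts with a single `User-agent` line, as in the two outputs above.

```php
<?php
// Hypothetical helper: returns true when the "User-agent: *" group
// contains a bare "Disallow: /" rule, i.e. the whole site is blocked.
function robots_blocks_everything(string $robotsTxt): bool
{
    $inWildcardGroup = false;
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Drop comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            // Simplification: only a single User-agent line per group is handled.
            $inWildcardGroup = ($value === '*');
        } elseif ($inWildcardGroup && $field === 'disallow' && $value === '/') {
            return true;
        }
    }
    return false;
}

$blocked = "User-agent: *\nDisallow: /\n";
$allowed = "User-agent: *\nDisallow: /w/wp-admin/\nAllow: /w/wp-admin/admin-ajax.php\n";

var_dump(robots_blocks_everything($blocked)); // bool(true)
var_dump(robots_blocks_everything($allowed)); // bool(false)
```

In practice the string would come from fetching the site's own /robots.txt, for example with `file_get_contents()` on the full URL.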

Method 2: Modify the generation rule

In the wp-includes directory, open functions.php. It contains the definition of the default robots.txt rules:

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    /**
     * Fires when displaying the robots.txt file.
     *
     * @since 2.1.0
     */
    do_action( 'do_robotstxt' );

    $output = "User-agent: *\n";
    $public = get_option( 'blog_public' );
    if ( '0' == $public ) {
        $output .= "Disallow: /\n";
    } else {
        $site_url = parse_url( site_url() );
        $path = ( !empty( $site_url['path'] ) ) ? $site_url['path'] : '';
        $output .= "Disallow: $path/wp-admin/\n";
        $output .= "Allow: $path/wp-admin/admin-ajax.php\n";
    }

    /**
     * Filter the robots.txt output.
     *
     * @since 3.0.0
     *
     * @param string $output Robots.txt output.
     * @param bool   $public Whether the site is considered "public".
     */
    echo apply_filters( 'robots_txt', $output, $public );
}
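
Because `do_robots()` passes its output through the `robots_txt` filter, the rules can probably be changed without editing the core file, which is overwritten whenever WordPress is updated. The following is a minimal sketch of that approach, assuming it is placed in a theme's functions.php or a small plugin. The callback name `my_extra_robots_rules` and the `/private/` path are hypothetical.

```php
<?php
// Hypothetical callback name; the two parameters mirror the
// apply_filters( 'robots_txt', $output, $public ) call in do_robots().
function my_extra_robots_rules($output, $public)
{
    // Leave the "Disallow: /" output alone when the site is hidden.
    if ('0' == $public) {
        return $output;
    }
    // Example rule only: the /private/ path is an assumption.
    return $output . "Disallow: /private/\n";
}

// Register only when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_filter')) {
    // Priority 10, and 2 accepted arguments so $public is passed too.
    add_filter('robots_txt', 'my_extra_robots_rules', 10, 2);
}
```

With this in place, a public site would serve the default rules followed by the extra `Disallow` line, while a hidden site would keep its single `Disallow: /` rule.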

Method 3: Create robots.txt by hand and upload it to the site root

robots.txt file format and usage:

http://zhanzhang.baidu.com/robots/index

Online generator:

http://tool.chinaz.com/robots/
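
As an alternative to an online generator, the file can be produced with a few lines of PHP. This is a minimal sketch: the helper name `build_robots_txt` and the example paths are assumptions, and the rules shown simply mirror the WordPress defaults. Once a physical robots.txt exists in the site root, the dynamically generated one is no longer served.

```php
<?php
// Hypothetical helper: builds the text of a robots.txt file for all crawlers.
function build_robots_txt(array $disallow, array $allow = []): string
{
    $lines = ['User-agent: *'];
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    foreach ($allow as $path) {
        $lines[] = 'Allow: ' . $path;
    }
    return implode("\n", $lines) . "\n";
}

$content = build_robots_txt(['/wp-admin/'], ['/wp-admin/admin-ajax.php']);
echo $content;

// To save it for upload, uncomment (the target path is an assumption):
// file_put_contents(__DIR__ . '/robots.txt', $content);
```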

Method 4: Prevent a single page from being indexed

Add the following to the head of that page:

<meta name="robots" content="noindex,nofollow">

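This tells all search engines not to index this page and not to follow the links on it.

noindex: tells crawlers not to index this page.

nofollow: tells crawlers not to follow the links on this page.

nosnippet: tells crawlers not to show a description snippet in search results.

noarchive: tells search engines not to show a cached copy.

noodp: tells search engines not to use the title and description from the Open Directory.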
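
In WordPress, one way to add this tag to a single page without editing templates is to hook into `wp_head`. The following is a minimal sketch, assuming it lives in a theme's functions.php or a plugin. The function names and the page slug `private-page` are hypothetical, and recent WordPress versions may also print their own robots meta tag.

```php
<?php
// Hypothetical helper: builds the robots meta tag from a list of directives.
function robots_meta_tag(array $directives): string
{
    $content = htmlspecialchars(implode(',', $directives), ENT_QUOTES);
    return '<meta name="robots" content="' . $content . '">' . "\n";
}

// Hypothetical callback: prints the tag only on the page whose slug
// is "private-page" (the slug is an assumption for illustration).
function my_noindex_single_page()
{
    if (is_page('private-page')) {
        echo robots_meta_tag(['noindex', 'nofollow']);
    }
}

// Register only when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_action')) {
    add_action('wp_head', 'my_noindex_single_page');
}
```

Other directives from the list above, such as `noarchive`, can be passed in the same array.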

Posted 2016-10-11 11:51 by tinyphp
