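I previously tried to deploy a ROOT cluster on CentOS 7. Neither the source build nor the official binary package included the crucial xproofd executable, so PoD could not run. I had to try another OS instead, and chose Ubuntu 14.04.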

Preparing for deployment

Changing the apt sources

  Edit /etc/apt/sources.list and switch to the 163 mirror in China. Downloads are faster and more stable from there.

# vim /etc/apt/sources.list
deb http://mirrors.163.com/ubuntu/ trusty main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-backports main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-security main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-updates main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-backports main restricted universe multiverse
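  The ten lines above follow one pattern. There is a `deb` and a `deb-src` entry for the release codename and for each of its -security, -updates, -proposed and -backports suites. Below is a minimal sketch that generates these lines, in case you only want to change the mirror URL or the codename. `gen_sources` is a hypothetical helper name:

```shell
# gen_sources MIRROR CODENAME
# Print deb and deb-src lines for CODENAME and its standard suites,
# in the same order as the sources.list shown above.
gen_sources() {
    local mirror="$1" codename="$2"
    local type suffix
    for type in deb deb-src; do
        for suffix in "" -security -updates -proposed -backports; do
            printf '%s %s %s%s main restricted universe multiverse\n' \
                "$type" "$mirror" "$codename" "$suffix"
        done
    done
}
```

  As root, back up the old file first (for example `cp /etc/apt/sources.list /etc/apt/sources.list.bak`). Then run `gen_sources http://mirrors.163.com/ubuntu/ trusty > /etc/apt/sources.list`.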

  Then run apt-get update to refresh the package index.

Installing gcc and g++

  If gcc and g++ are already installed, skip this step.

# apt-get install gcc
# apt-get install g++

Installing CMake

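  The cmake installed directly with apt-get caused problems when I built the ROOT components, so I recommend building it from source. I used version 2.8.8. The official download page is https://cmake.org/files/, where you can pick a version that suits you.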

- Extract: tar xvf cmake-2.8.8.tar.gz
- Enter the extracted directory: cd cmake-2.8.8
- ./bootstrap
- make
- make install
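  After `make install`, check that the shell now finds the freshly built CMake and not an older apt version. The bootstrap build installs under /usr/local by default, and `hash -r` clears bash's cached command paths. Below is a small sketch of the version check. It assumes GNU `sort -V`, which coreutils on Ubuntu 14.04 provides. `version_ge` and `cmake_version` are hypothetical helper names:

```shell
# version_ge A B: succeed if version A >= version B (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# cmake_version: read `cmake --version` output on stdin, print the version number.
cmake_version() {
    awk 'NR == 1 && $1 == "cmake" && $2 == "version" { print $3 }'
}

# Example:
#   hash -r
#   v="$(cmake --version | cmake_version)"
#   version_ge "$v" 2.8.8 && echo "cmake $v is new enough"
```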

Installing the zlib library

  You can download zlib from GitHub (https://github.com/madler/zlib). I used version 1.2.3, available at https://github.com/madler/zlib/archive/v1.2.3.zip
- Extract: unzip zlib-1.2.3.zip
- Enter the extracted directory: cd zlib-1.2.3
- ./configure
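- Before running make, edit the Makefile. Otherwise errors occur later when other components link against the library. Find the line CFLAGS=-O3 -DUSE_MMAP and append -fPIC so that it reads CFLAGS=-O3 -DUSE_MMAP -fPIC. (-fPIC produces position-independent code, which a static library needs if it is linked into a shared library.)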
- make
- make install
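  The Makefile edit can also be scripted. Below is a minimal sketch. It assumes GNU sed (for `-i`) and a generated Makefile with an uncommented line starting with `CFLAGS=`. `add_fpic` is a hypothetical helper, and running it twice does no harm:

```shell
# add_fpic MAKEFILE: append -fPIC to every uncommented CFLAGS= line that lacks it.
# Run after ./configure and before make.
add_fpic() {
    local makefile="$1"
    [ -f "$makefile" ] || { echo "no such file: $makefile" >&2; return 1; }
    sed -i '/^CFLAGS=/{/-fPIC/!s/$/ -fPIC/;}' "$makefile"
}

# Example, inside zlib-1.2.3:
#   ./configure && add_fpic Makefile && make && make install
```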

Other libraries

apt-get install procmail

Deploying the ROOT cluster

Installing ROOT

  Binary installation: https://root.cern.ch/content/release-60606. Choose the package built for your OS, extract it, and move it to /opt:

# tar zxvf root_v6.06.06.Linux-ubuntu14-x86_64-gcc4.8.tar.gz
# mv root /opt

  Next, write the ROOT settings into a startup file. Here I append the following lines to the end of /etc/profile.d/root.sh:

export ROOTSYS=/opt/root
export PATH=$PATH:$ROOTSYS/bin
source $ROOTSYS/bin/thisroot.sh
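  This guide appends lines to /etc/profile.d/root.sh three times: here, and later for XRootD and PoD. When you append by hand, it is easy to add the same line twice. Below is a sketch of an idempotent helper, `append_once` (a hypothetical name), which adds a line only if that exact line is not already in the file:

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line already exists.
append_once() {
    local file="$1" line="$2"
    touch "$file" || return 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (as root; single quotes keep $PATH and $ROOTSYS unexpanded in the file):
#   append_once /etc/profile.d/root.sh 'export ROOTSYS=/opt/root'
#   append_once /etc/profile.d/root.sh 'export PATH=$PATH:$ROOTSYS/bin'
#   append_once /etc/profile.d/root.sh 'source $ROOTSYS/bin/thisroot.sh'
```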

  Run source /etc/profile.d/root.sh to apply the settings. Then run root -b to check that ROOT starts:

# root -b
root: error while loading shared libraries: libXpm.so.4: cannot open shared object file: No such file or directory

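  The libXpm library is missing, so install it with apt-get install libxpm4. apt-get may report that the package cannot be found. That depends on the local package index: sync with the remote mirror first (apt-get update), then install the package. This time the installation succeeds.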

# apt-get install libxpm4
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  libxpm4
0 upgraded, 1 newly installed, 0 to remove and 5 not upgraded.
Need to get 37.0 kB of archives.
……
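  You don't have to fix missing libraries one error at a time. `ldd` lists every shared library that the dynamic loader cannot resolve. Below is a small sketch. `missing_libs` is a hypothetical helper that filters `ldd` output read from stdin:

```shell
# missing_libs: read `ldd` output on stdin, print the names of unresolved libraries.
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example:
#   ldd "$(command -v root)" | missing_libs
#   for f in "$ROOTSYS"/lib/*.so; do ldd "$f"; done | missing_libs | sort -u
```

  If the apt-file package is installed and its index is up to date (`apt-file update`), `apt-file search libXpm.so.4` shows which package provides a given library.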

  Running root -b again produces another error:

# root -b
ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
Invoking:
    echo | LC_ALL=C c++  -pipe -m64 -Wall -W -Woverloaded-virtual -fsigned-char -fPIC -pthread -std=c++11 -Wno-deprecated-declarations -Wno-comment -Wno-unused-parameter -Wno-maybe-uninitialized -Wno-unused-but-set-variable -Wno-missing-field-initializers  -fPIC -fvisibility-inlines-hidden -std=c++11 -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wcast-qual -fno-strict-aliasing -pedantic -Wno-long-long -Wall -W -Wno-unused-parameter -Wwrite-strings -Wno-unused-local-typedefs -O2 -DNDEBUG -xc++ -E -v - 2>&1 >/dev/null | awk '/^#include </,/^End of search/{if (!/^#include </ && !/^End of search/){ print }}' | grep -E "(c|g)\+\+"
results in
results in
with exit code 256
input_line_1:1:10: fatal error: 'new' file not found
#include <new>

  缺少C++的new包,这个报错极有可能是未安装c++引起的,因为ROOT及其它组件都是使用C++编写的。因此需要安装gcc和gcc-c++。

# apt-get install gcc
……
# apt-get install g++
……
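  The step that fails is the probe shown in the error message. ROOT asks the C++ compiler for its system include directories and expects to find C++ standard library paths among them. Below is a sketch that runs a simplified version of the same probe, so you can check the compiler before starting ROOT again. It keeps only the flags that matter for the include search (`-xc++ -E -v`). `cxx_include_dirs` and `has_cxx_stdlib` are hypothetical helper names:

```shell
# cxx_include_dirs: read `c++ -xc++ -E -v -` diagnostics on stdin and print the
# directories between "#include <...> search starts here:" and "End of search list."
# (the same awk range that ROOT's probe uses).
cxx_include_dirs() {
    awk '/^#include </,/^End of search/ { if (!/^#include </ && !/^End of search/) print }' |
        sed 's/^ *//'
}

# has_cxx_stdlib: succeed if the compiler reports a C++ standard library include dir.
has_cxx_stdlib() {
    echo | LC_ALL=C c++ -xc++ -E -v - 2>&1 >/dev/null | cxx_include_dirs | grep -qE '(c|g)\+\+'
}

# Example:
#   has_cxx_stdlib && echo "C++ headers found" || echo "install g++ first"
```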

  Run root -b again. This time it succeeds without errors.

Installing XRootD

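  There are two ways to install XRootD: with the script shipped in the ROOT source package, or by downloading the source from the official website and building it yourself.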

Installing XRootD with the script in the ROOT source package

  Go to the ROOT source directory and run:

./build/unix/installXrootd.sh -v 3.0.0 /opt

Building XRootD from source:

  Extract the source archive, then create a build directory and build from there:

# mkdir build; cd build
# cmake /root/xrootd-3.3.0 -DCMAKE_INSTALL_PREFIX=/opt/xrootd
# make
# make install
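  The XRootD build is a standard out-of-source CMake build, and PoD is built the same way later on. Below is a sketch of a reusable helper for bash. `cmake_build`, `run` and the `DRY_RUN` variable are hypothetical. With `DRY_RUN=1`, the function only prints the commands it would run:

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# cmake_build SRC_DIR BUILD_DIR [extra cmake args...]
# Configure SRC_DIR inside BUILD_DIR, then build and install.
# The body runs in a subshell, so the cd does not affect the caller.
cmake_build() (
    src="$1"; build="$2"; shift 2
    run mkdir -p "$build" &&
    run cd "$build" &&
    run cmake "$src" "$@" &&
    run make &&
    run make install
)

# Example, matching the XRootD steps above:
#   cmake_build /root/xrootd-3.3.0 build -DCMAKE_INSTALL_PREFIX=/opt/xrootd
```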

  If everything succeeds, add the corresponding setting to the startup file. It can also go at the end of /etc/profile.d/root.sh:

source $ROOTSYS/bin/setxrd.sh /opt/xrootd/

Installing PoD

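  Download the source from the official website (http://pod.gsi.de). I used the 3.16 source release: pod.gsi.de/releases/pod/3.16/PoD-3.16-Source.tar.gz. If the link is dead, you will have to find it yourself. Extract the source archive and enter the source directory: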

Running cmake

mkdir build
cd build
cmake -C ../BuildSetup.cmake ..

  cmake reports that the Boost libraries are missing, so install Boost:

apt-get install libboost-dev

  After installing it, rerun the cmake command above. It still fails, this time reporting that the following libraries are missing:

  The following Boost libraries could not be found:

          boost_thread
          boost_program_options
          boost_filesystem
          boost_system
          boost_unit_test_framework

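  A small tip: running apt-get install followed by the library name does not work for these, because the package names do not exactly match the library names. Use apt-cache search to look up the package name first, then install it. For example, for boost_thread: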

# apt-cache search boost | grep thread
libboost-thread-dev - portable C++ multi-threading (default version)
libboost-thread1.46-dev - portable C++ multi-threading
libboost-thread1.46.1 - portable C++ multi-threading
libboost-thread1.48-dev - portable C++ multi-threading
libboost-thread1.48.0 - portable C++ multi-threading

  From this output, boost_thread is provided by libboost-thread-dev, so I can simply run apt-get install libboost-thread-dev. The full set of commands:

apt-get install libboost-thread-dev
apt-get install libboost-program-options-dev
apt-get install libboost-filesystem-dev
apt-get install libboost-system-dev
apt-get install libboost-test-dev
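  The mapping used above is regular enough to script. Strip the `boost_` prefix, turn underscores into hyphens, and wrap the result as `libboost-<name>-dev`. The one exception is `unit_test_framework`, which ships in `libboost-test-dev`. Below is a sketch that covers exactly the five libraries from the cmake error. `boost_pkg` is a hypothetical helper. For any other Boost component, confirm the name with apt-cache search:

```shell
# boost_pkg LIB: map a Boost library name from the CMake error (e.g. boost_thread)
# to the Ubuntu development package that provides it.
boost_pkg() {
    local name="${1#boost_}"
    case "$name" in
        unit_test_framework) echo libboost-test-dev ;;
        *) echo "libboost-$(printf '%s' "$name" | tr '_' '-')-dev" ;;
    esac
}

# Example (as root):
#   for lib in boost_thread boost_program_options boost_filesystem \
#              boost_system boost_unit_test_framework; do
#       apt-get install -y "$(boost_pkg "$lib")"
#   done
```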

  Run cmake -C ../BuildSetup.cmake .. again. This time it succeeds.

Running make

  Running make produces another error:

/usr/include/boost/thread/xtime.hpp:23:5: error: expected identifier before numeric constant
     TIME_UTC=1

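  This is a known problem in Boost versions before 1.50. The enumerator TIME_UTC clashes with the TIME_UTC macro that newer glibc versions define, so the name is replaced by a number and the declaration breaks. The fix is simple: open /usr/include/boost/thread/xtime.hpp and rename TIME_UTC on lines 23 and 71 to TIME_UTC_, so that the names no longer collide.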
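  The rename can be done with a single sed command. Below is a sketch that assumes GNU sed (for `-i` and the `\b` word boundary). It renames every whole-word TIME_UTC in the file. In my header these were the two lines mentioned above. It keeps a one-time `.bak` copy, and a second run changes nothing because TIME_UTC_ no longer matches. `fix_time_utc` is a hypothetical helper name. If other headers still reference TIME_UTC, the compiler will point to them:

```shell
# fix_time_utc HEADER: rename every whole-word TIME_UTC in HEADER to TIME_UTC_.
# A one-time backup of the original is kept as HEADER.bak. Requires GNU sed.
fix_time_utc() {
    local header="$1"
    [ -f "$header" ] || { echo "no such file: $header" >&2; return 1; }
    [ -e "$header.bak" ] || cp "$header" "$header.bak" || return 1
    sed -i 's/\bTIME_UTC\b/TIME_UTC_/g' "$header"
}

# Example (as root):
#   fix_time_utc /usr/include/boost/thread/xtime.hpp
```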
  Run make again, and there is another error:

/root/PoD-3.16-Source/app/MiscCommon/proof_status_file/ProofStatusFile.h:88:13: error: 'uint16_t' does not name a type
             uint16_t xpdPort() const

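  The compiler does not recognize the type uint16_t, and including the right header fixes it. Add #include <stdint.h> at line 19 of /root/PoD-3.16-Source/app/MiscCommon/proof_status_file/ProofStatusFile.h. The exact position may differ between PoD versions. Anyone with some C or C++ experience will easily find a suitable spot, for example next to the existing #include lines.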
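  If you would rather not count lines, you can put the include just before the first existing `#include`, which is usually a safe spot in a header. Below is a sketch that assumes the file already has at least one line starting with `#include`. `add_stdint_include` is a hypothetical helper, and it skips files that already include <stdint.h>:

```shell
# add_stdint_include FILE: insert "#include <stdint.h>" before the first #include
# line, unless FILE already includes it.
add_stdint_include() {
    local file="$1" tmp
    grep -q '^#include <stdint.h>' "$file" && return 0
    grep -q '^#include' "$file" || { echo "no #include line in $file" >&2; return 1; }
    tmp="$(mktemp)" || return 1
    awk '!done && /^#include/ { print "#include <stdint.h>"; done = 1 } { print }' "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # cat keeps the original file's permissions
    rm -f "$tmp"
}

# Example:
#   add_stdint_include /root/PoD-3.16-Source/app/MiscCommon/proof_status_file/ProofStatusFile.h
```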
  Run make once more, and this time it completes cleanly.

Running make install

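  This command runs without errors. Unless configured otherwise, PoD is installed in a PoD directory under the user's home directory. Since I installed it as root, it went into /root/PoD.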

Final step of the PoD installation

  Write the PoD settings into the startup file. Again, they can go at the end of /etc/profile.d/root.sh:

source /root/PoD/3.16/PoD_env.sh

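  Run source /etc/profile.d/root.sh to apply the settings, then run pod-server start. On the first run, it downloads the required wn_bins directory into /root/PoD/3.16/bin/. If the server has no Internet access, you can repeat all of the steps above in a virtual machine and download wn_bins there. The wn_bins directory is the same regardless of the OS, so you can simply copy it over.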

Forming the ROOT cluster

  Run pod-server start and wait for it to download the wn_bins directory. If there are no errors, the output looks like this:

# pod-server start
Starting PoD server...
updating xproofd configuration file...
starting xproofd...
starting PoD agent...
preparing PoD worker package...
selecting pre-compiled bins to be added to worker package...
PoD worker package: /root/.PoD/wrk/PoDWorker.sh
------------------------
XPROOFD [1809] port: 21001
PoD agent [1848] port: 22002
PROOF connection string: root@mac00000102030a.hostname.com:21001
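  The last line of this output is the string a ROOT session uses to connect to the cluster. For example, inside ROOT you would call `TProof::Open("root@mac00000102030a.hostname.com:21001")`. Below is a small sketch that extracts this string from saved pod-server output for reuse in scripts. `proof_conn` is a hypothetical helper that reads the output on stdin:

```shell
# proof_conn: read `pod-server start` output on stdin, print the PROOF connection string.
proof_conn() {
    sed -n 's/^PROOF connection string: *//p'
}

# Example:
#   pod-server start | tee pod-server.log | proof_conn
```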

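  Repeat all of the steps above on two servers to build a small cluster with one server and one client. ROOT servers can communicate in several ways. Here we use the simplest and most direct one, SSH. First, set up SSH trust between the two servers so that they can log in to each other without a password. For instructions, see http://chenlb.iteye.com/blog/211809.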
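  With OpenSSH, key-based login is usually set up with `ssh-keygen` followed by `ssh-copy-id`. Below is a sketch of those steps for one direction. Since the trust should go both ways, run it on each machine and point it at the other. `setup_ssh_trust` and `run` are hypothetical helpers, and the key path defaults to ~/.ssh/id_rsa. `DRY_RUN=1` only prints the commands:

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# setup_ssh_trust USER@HOST [KEY]: create a passphrase-less RSA key if none exists,
# install its public half on USER@HOST, then check that login needs no password.
setup_ssh_trust() {
    local target="$1" key="${2:-$HOME/.ssh/id_rsa}"
    if [ ! -f "$key" ]; then
        run mkdir -p -m 700 "$(dirname "$key")" || return 1
        run ssh-keygen -t rsa -N '' -f "$key" || return 1
    fi
    run ssh-copy-id -i "$key.pub" "$target" || return 1
    # BatchMode makes ssh fail instead of prompting for a password.
    run ssh -o BatchMode=yes "$target" true
}

# Example, on the server:
#   setup_ssh_trust root@109.105.115.249
```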
  Next, choose server A as the server and server B as the client (worker). On the server, create /root/pod_ssh.cfg with the following content:

@bash_begin@
	. /etc/profile.d/root.sh
@bash_end@

r1, root@109.105.115.249,,/tmp/test, 2

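  The first three lines are a script that runs after PoD has logged in to a client over SSH. Here it just loads the ROOT configuration on each client to set up the environment variables. Line 5 describes how to reach the client. There is one such line per client, and because we have only one client, there is only one line. Its comma-separated fields are: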

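- Field 1: unique client identifier; must not be repeated
- Field 2: user name@IP address or hostname
- Field 3: SSH options; may be empty
- Field 4: working directory on the client
- Field 5: desired number of workers on the client; may be empty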
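  Each worker line is just five comma-separated fields, so a quick check before submitting can catch a missing comma or a repeated ID. Below is a minimal sketch. It assumes the layout shown above: an inline script between @bash_begin@ and @bash_end@, then one line per worker, with blank lines allowed. `check_pod_ssh_cfg` is a hypothetical helper and is not part of PoD:

```shell
# check_pod_ssh_cfg FILE: verify that every worker line has 5 comma-separated
# fields and a unique, non-empty ID. Prints problems; returns non-zero on error.
check_pod_ssh_cfg() {
    awk -F',' '
        /^@bash_begin@/ { inscript = 1; next }
        /^@bash_end@/   { inscript = 0; next }
        inscript || /^[ \t]*$/ { next }
        {
            if (NF != 5) {
                printf "line %d: expected 5 fields, found %d\n", NR, NF
                bad = 1
                next
            }
            id = $1
            gsub(/[ \t]/, "", id)
            if (id == "" || (id in seen)) {
                printf "line %d: missing or duplicate ID %s\n", NR, id
                bad = 1
            }
            seen[id] = 1
        }
        END { exit bad }
    ' "$1"
}

# Example:
#   check_pod_ssh_cfg /root/pod_ssh.cfg && echo "config looks fine"
```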
  Then, on the server, run pod-ssh -c /root/pod_ssh.cfg submit --debug to build the cluster. Output like the following means the server side succeeded:
# pod-ssh -c /root/pod_ssh.cfg submit --debug
**	[Mon, 29 Aug 2016 10:40:18 +0800]	preparing PoD worker package...
**	[Mon, 29 Aug 2016 10:40:18 +0800]	selecting pre-compiled bins to be added to worker package...
**	[Mon, 29 Aug 2016 10:40:18 +0800]	PoD worker package: /root/.PoD/wrk/PoDWorker.sh
**	[Mon, 29 Aug 2016 10:40:18 +0800]	pod-ssh config contains an inline shell script. It will be injected it into wrk. package
**	[Mon, 29 Aug 2016 10:40:18 +0800]	preparing PoD worker package...
**	[Mon, 29 Aug 2016 10:40:18 +0800]	inline shell script is found and will be added to the package...
**	[Mon, 29 Aug 2016 10:40:18 +0800]	selecting pre-compiled bins to be added to worker package...
**	[Mon, 29 Aug 2016 10:40:18 +0800]	PoD worker package: /root/.PoD/wrk/PoDWorker.sh
**	[Mon, 29 Aug 2016 10:40:18 +0800]	There are 5 threads in the tread-pool.
**	[Mon, 29 Aug 2016 10:40:18 +0800]	Number of PoD workers: 1
**	[Mon, 29 Aug 2016 10:40:18 +0800]	Number of PROOF workers: 2
**	[Mon, 29 Aug 2016 10:40:18 +0800]	Workers list:
**	[Mon, 29 Aug 2016 10:40:18 +0800]	[r1] with 2 workers at root@109.105.115.249:/tmp/test/r1
r1	[Mon, 29 Aug 2016 10:40:18 +0800]	pod-ssh-submit-worker is started for root@109.105.115.249 (dir: /tmp/test/r1, nworkers: 2, sshopt: )
**	[Mon, 29 Aug 2016 10:40:19 +0800]	
*******************
Successfully processed tasks: 1
Failed tasks: 0
*******************

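  Now log in to the client and go to the client working directory set in /root/pod_ssh.cfg. As the submit log above shows, PoD uses a subdirectory named after the client ID, here /tmp/test/r1: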

# ls
libboost_filesystem-mt.so.5       libpod_protocol.so       PoD.cfg            PoDWorker.sh                              proof.conf       user_worker_env.sh  xpd.log
libboost_program_options-mt.so.5  libproof_status_file.so  pod-user-defaults  pod-wrk-bin-3.16-Darwin-universal.tar.gz  server_info.cfg  version
libboost_system-mt.so.5           libSSHTunnel.so          PoDWorker.lock     pod-wrk-bin-3.16-Linux-amd64.tar.gz       ssh-tunnel       xpd.cf
libboost_thread-mt.so.5           pod-agent                PoDWorker.pid      pod-wrk-bin-3.16-Linux-x86.tar.gz         ssh_worker.log   xpd.cf.bup

  These are mostly libraries, configuration files, and logs. For now, the file to check is the log ssh_worker.log. If it ends like this, everything has succeeded:

***     [Mon, 29 Aug 2016 10:44:48 +0800]       Attempt to start pod-agent (1 out of 3)
***     [Mon, 29 Aug 2016 10:44:48 +0800]       Attempt to start and detect xproofd (1 out of 10)
***     [Mon, 29 Aug 2016 10:44:48 +0800]       trying to use XPROOF port: 21002
***     [Mon, 29 Aug 2016 10:44:48 +0800]       starting xproofd...
***     [Mon, 29 Aug 2016 10:44:48 +0800]       xproofd is running. pid=[2794] port=[21002]
***     [Mon, 29 Aug 2016 10:44:48 +0800]       starting pod-agent...
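  With several clients, checking each ssh_worker.log by eye gets tedious. Below is a sketch that looks for the "xproofd is running" line shown above and prints the xproofd port. `xproofd_port` is a hypothetical helper, and it relies only on the log format shown here, which may differ between PoD versions:

```shell
# xproofd_port LOGFILE: print the port from the last "xproofd is running" line,
# or fail if xproofd was never reported as running.
xproofd_port() {
    local port
    port="$(sed -n 's/.*xproofd is running\. pid=\[[0-9]*] port=\[\([0-9]*\)].*/\1/p' "$1" | tail -n1)"
    [ -n "$port" ] || return 1
    printf '%s\n' "$port"
}

# Example, on the client:
#   xproofd_port /tmp/test/r1/ssh_worker.log || echo "xproofd not running yet"
```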