Using trec_eval: the C command-line version and the Python version (pytrec_eval), including offline installation
C version
- Download the zip package from https://github.com/usnistgov/trec_eval/tree/master
- Installation: run the following in the unpacked directory
make
- This produces a trec_eval binary in the directory
- Usage
./trec_eval -m all_trec dev_qrel_detail.trec all_rank_eval.trec
# evaluate MRR / MRR@10
./trec_eval -m recip_rank qrel.trec pre.trec
./trec_eval -c -M 10 -m recip_rank qrel.trec pre.trec
# evaluate recall@50
./trec_eval -m recall.50 qrel.trec pre.trec
# evaluate MAP
./trec_eval -m map qrel.trec pre.trec
# evaluate NDCG (all default cutoffs, or a single cutoff)
./trec_eval -m ndcg_cut qrel.trec pre.trec
./trec_eval -m ndcg_cut.50 qrel.trec pre.trec
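For reference, the two input files are in the standard TREC formats; an illustrative fragment with made-up query/doc ids:
qrel.trec (qrels format: qid iter doc_id relevance)
q1 0 d42 1
q1 0 d77 0
pre.trec (trec_results format: qid Q0 doc_id rank score run_name)
q1 Q0 d42 1 12.73 Dense
q1 Q0 d77 2 11.05 Dense
The conversion scripts further below write exactly these two formats.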
- Option reference
./trec_eval -h
trec_eval [-h] [-q] [-m measure[.params] [-c] [-n] [-l <num>]
[-D debug_level] [-N <num>] [-M <num>] [-R rel_format] [-T results_format]
rel_info_file results_file
-h
--help:
-h: Print full help message and exit. Full help message will include
descriptions for any measures designated by a '-m' parameter, and
input file format descriptions for any rel_info_format given by '-R'
and any top results_format given by '-T.'
Thus to see all info about preference measures use
trec_eval -h -m all_prefs -R prefs -T trec_results
Prints the help message. The full help also includes descriptions of any measures named with '-m', the format of the ground-truth (rel_info) file given by '-R', and the format of the results (predictions) file given by '-T'.
-q
--query_eval_wanted:
-q: In addition to summary evaluation, give evaluation for each query/topic
In addition to the values aggregated over the whole test set, also print each measure's value for every individual query.
-m
--measure measure_name[.measure_params]:
Adds a measure to the list of measures that will be computed and printed.
-m measure: Add 'measure' to the lists of measures to calculate and print.
If 'measure' contains a '.', then the name of the measure is everything
preceeding the period, and everything to the right of the period is
assumed to be a list of parameters for the measure, separated by ','.
There can be multiple occurrences of the -m flag.
'measure' can also be a nickname for a set of measures. Current
nicknames include
'official': the main measures often used by TREC
'all_trec': all measures calculated with the standard TREC
results and rel_info format files.
'set': subset of all_trec that calculates unranked values.
'prefs': Measures not in all_trec that calculate preference measures.
-c
Averages over all queries in the relevance judgements instead of only the queries that appear in both the judgements and the results file. Missing queries contribute a value of 0 to every measure (which may or may not be sensible for a particular measure, but is sensible for the standard TREC measures).
Off by default.
--complete_rel_info_wanted:
-c: Average over the complete set of queries in the relevance judgements
instead of the queries in the intersection of relevance judgements
and results. Missing queries will contribute a value of 0 to all
evaluation measures (which may or may not be reasonable for a
particular evaluation measure, but is reasonable for standard TREC
measures.) Default is off.
-l
--level_for_rel num:
-l<num>: Num indicates the minimum relevance judgement value needed for
a document to be called relevant. Used if rel_info_file contains
relevance judged on a multi-relevance scale. Default is 1.
-n
--nosummary:
-n: No summary evaluation will be printed
-D
--Debug_level num:
-D <num>: Debug level. 1 and 2 used for measures, 3 and 4 for merging
rel_info and results, 5 and 6 for input. Currently, num can be of the
form <num>.<qid> and only qid will be evaluated with debug info printed.
Default is 0.
-N
--Number_docs_in_coll num:
-N <num>: Number of docs in collection Default is MAX_LONG
-M
--Max_retrieved_per_topic num:
-M <num>: Max number of docs per topic to use in evaluation (discard rest).
Default is MAX_LONG.
-J
--Judged_docs_only:
-J: Calculate all values only over the judged (either relevant or
nonrelevant) documents. All unjudged documents are removed from the
retrieved set before any calculations (possibly leaving an empty set).
DO NOT USE, unless you really know what you're doing - very easy to get
reasonable looking numbers in a file that you will later forget were
calculated with the -J flag.
-R
--Rel_info_format format:
-R format: The rel_info file is assumed to be in format 'format'. Current
values for 'format' include 'qrels', 'prefs', 'qrels_prefs'. Note not
all measures can be calculated with all formats.
-T
--Results_format format:
-T format: the top results_file is assumed to be in format 'format'. Current
values for 'format' include 'trec_results'. Note not all measures can be
calculated with all formats.
-Z
--Zscore Zmean_file:
-Z Zmean_file: Instead of printing the raw score for each measure, print
a Z score instead. The score printed will be the deviation from the mean
of the raw score, expressed in standard deviations, where the mean and
standard deviation for each measure and query are found in Zmean_file.
If mean is not in Zmeanfile for a measure and query, -1000000 is printed.
Zmean_file format is ascii lines of form
qid measure_name mean std_dev
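Several of these options can be combined in one call; for example (reusing the example file names from above), per-query output plus MRR@10 and NDCG@10 averaged over every judged query:
./trec_eval -q -c -M 10 -m recip_rank -m ndcg_cut.10 qrel.trec pre.trec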
Reference scripts for converting the input files:
- qrels file
import argparse
import jsonlines

parser = argparse.ArgumentParser()
parser.add_argument('--qrel_file')
args = parser.parse_args()

# Output in trec_eval's qrels format: "qid iter doc_id relevance".
out_path = args.qrel_file.rsplit('.txt', 1)[0] + '.trec'
with open(args.qrel_file, "r", encoding="utf8") as f, \
        open(out_path, 'w', encoding='utf-8') as f_out:
    for item in jsonlines.Reader(f):
        for doc_id in set(item['new_neg_list']):
            f_out.write("{} {} {} {}\n".format(item['q_id'], '0', doc_id, 0))
        for doc_id in set(item['new_pos_list']):
            if doc_id in item['new_neg_list']:  # a doc labeled both + and - is treated as neg
                continue
            f_out.write("{} {} {} {}\n".format(item['q_id'], '0', doc_id, 1))
- Script to convert the model's result (score) file to trec_results format
import argparse
from collections import defaultdict

parser = argparse.ArgumentParser()
parser.add_argument('--score_file')
args = parser.parse_args()

# Input: one "qid doc_id score" triple per line.
with open(args.score_file) as f:
    lines = f.readlines()
all_scores = defaultdict(list)
for line in lines:
    if len(line.strip()) == 0:
        continue
    qid, did, score = line.strip().split()
    all_scores[qid].append((did, float(score)))

# Output in trec_results format: "qid Q0 doc_id rank score run_name".
out_path = args.score_file.rsplit('.txt', 1)[0] + '.trec'
with open(out_path, 'w', encoding="utf8") as f:
    for qid in all_scores:
        score_list = sorted(all_scores[qid], key=lambda x: x[1], reverse=True)
        for rank, (did, score) in enumerate(score_list):
            f.write("{} {} {} {} {} {}\n".format(qid, 'Q0', did, rank + 1, score, 'Dense'))
Python version (pytrec_eval)
https://github.com/cvangysel/pytrec_eval
- Offline installation (the server has no network access)
- As with offline installation of any Python package on Linux: find and download pytrec_eval-0.5.tar.gz, copy it to the server, and unpack it with
tar -zxvf pytrec_eval-0.5.tar.gz
- Download the trec_eval archive that REMOTE_TREC_EVAL_URI in pytrec_eval-0.5/setup.py points to, or change that URI to a URL reachable from inside the company network, then install:
cd pytrec_eval-0.5
python setup.py install
- Usage: see pytrec_eval-0.5/examples/simple.py
import pytrec_eval
pytrec_eval.supported_measures
"""
Supported evaluation measures:
{'num_rel_ret', 'ndcg_cut', 'set_map', 'Rndcg', 'G', 'gm_map', 'recall', '11pt_avg', 'map', 'num_q', 'gm_bpref', 'Rprec_mult', 'P', 'success', 'num_nonrel_judged_ret', 'set_relative_P', 'set_recall', 'recip_rank', 'num_rel', 'iprec_at_recall', 'ndcg_rel', 'infAP', 'relative_P', 'ndcg', 'runid', 'num_ret', 'set_P', 'set_F', 'bpref', 'binG', 'utility', 'map_cut', 'relstring', 'Rprec'}
"""
If no reachable URL is available at all, download the archive that REMOTE_TREC_EVAL_URI points to onto the server and read it from a local path instead. A modified pytrec_eval-0.5/setup.py that does this:
"""Sets up pytrec_eval."""
from setuptools import setup, Extension
import os
import sys
import tempfile
REMOTE_TREC_EVAL_PATH = '/home/gomall/work/downloads/trec_eval-9.0.8.tar.gz'
REMOTE_TREC_EVAL_TLD_NAME = 'trec_eval-9.0.8'
LOCAL_TREC_EVAL_DIR = os.path.realpath(
os.path.join(__file__, '..', 'trec_eval'))
TREC_EVAL_SRC = []
with tempfile.TemporaryDirectory() as tmp_dir:
if os.path.isfile(os.path.join(LOCAL_TREC_EVAL_DIR, 'trec_eval.h')):
# Use local version.
trec_eval_dir = LOCAL_TREC_EVAL_DIR
else: # Fetch remote version.
print('Fetching trec_eval from {}.'.format(REMOTE_TREC_EVAL_PATH))
import io
import urllib.request
with open(REMOTE_TREC_EVAL_PATH, 'rb') as f:
mmap_f = io.BytesIO(f.read())
if REMOTE_TREC_EVAL_PATH.endswith('.zip'):
import zipfile
trec_eval_archive = zipfile.ZipFile(mmap_f)
elif REMOTE_TREC_EVAL_PATH.endswith('.tar.gz'):
import tarfile
trec_eval_archive = tarfile.open(fileobj=mmap_f)
trec_eval_archive.extractall(tmp_dir)
trec_eval_dir = os.path.join(tmp_dir, REMOTE_TREC_EVAL_TLD_NAME)
for filename in os.listdir(trec_eval_dir):
if filename.endswith('.c') and not filename == "trec_eval.c":
TREC_EVAL_SRC.append(os.path.join(trec_eval_dir, filename))
#include the windows/ subdirectory on windows machines
if sys.platform == 'win32':
for filename in os.listdir(os.path.join(trec_eval_dir, "windows")):
if filename.endswith('.c') and not filename == "trec_eval.c":
TREC_EVAL_SRC.append(os.path.join(trec_eval_dir, "windows", filename))
pytrec_eval_ext = Extension(
'pytrec_eval_ext',
sources=['src/pytrec_eval.cpp'] + TREC_EVAL_SRC,
#windows doesnt need libm
libraries=[] if sys.platform == 'win32' else ['m', 'stdc++'],
include_dirs=[trec_eval_dir, os.path.join(trec_eval_dir, "windows")] if sys.platform == 'win32' else [trec_eval_dir],
undef_macros=['NDEBUG'],
extra_compile_args=['-g', '-Wall', '-O3'],
define_macros=[('VERSIONID', '\"pytrec_eval\"'),
('_GLIBCXX_USE_CXX11_ABI', '0'),
('P_NEEDS_GNU_CXX_NAMESPACE', '1')])
setup(name='pytrec_eval',
version='0.5',
description='Provides Python bindings for popular '
'Information Retrieval measures implemented '
'within trec_eval.',
author='Christophe Van Gysel',
author_email='cvangysel@uva.nl',
ext_modules=[pytrec_eval_ext],
packages=['pytrec_eval'],
package_dir={'pytrec_eval': 'py'},
python_requires='>=3',
url='https://github.com/cvangysel/pytrec_eval',
download_url='https://github.com/cvangysel/pytrec_eval/tarball/0.5',
keywords=[
'trec_eval',
'information retrieval',
'evaluation',
'ranking',
],
classifiers=[
'Development Status :: 3 - Alpha',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python',
'Programming Language :: C++',
'Intended Audience :: Science/Research',
'Operating System :: POSIX :: Linux',
'Topic :: Text Processing :: General',
])
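After placing the tar.gz at the path above and running python setup.py install, a quick sanity check that the extension was built:
python -c "import pytrec_eval; print(len(pytrec_eval.supported_measures))"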