
Using trec_eval: the C++ (command-line) version, and the Python version (pytrec_eval) with offline installation

C++ version

  1. Download the zip package from https://github.com/usnistgov/trec_eval/tree/master
  2. Installation: run make in the unpacked directory
  3. This produces a trec_eval executable in that directory
  4. Run it as follows (the expected formats of the qrel and result files are illustrated in the input conversion section below)
./trec_eval -m all_trec dev_qrel_detail.trec all_rank_eval.trec
# Evaluate MRR / MRR@10
./trec_eval -m recip_rank qrel.trec pre.trec
./trec_eval -c -M 10 -m recip_rank qrel.trec pre.trec

# Evaluate recall@50
./trec_eval -m recall.50 qrel.trec pre.trec

# Evaluate MAP
./trec_eval -m map qrel.trec pre.trec

# Evaluate NDCG
./trec_eval -m ndcg_cut qrel.trec pre.trec
./trec_eval -m ndcg_cut.50 qrel.trec pre.trec
  5. Parameter reference: ./trec_eval -h
trec_eval [-h] [-q] [-m measure[.params]] [-c] [-n] [-l <num>]
   [-D debug_level] [-N <num>] [-M <num>] [-R rel_format] [-T results_format]
   rel_info_file  results_file

-h

--help:
 -h: Print full help message and exit. Full help message will include
     descriptions for any measures designated by a '-m' parameter, and
     input file format descriptions for any rel_info_format given by '-R'
     and any top results_format given by '-T.'
     Thus to see all info about preference measures use
          trec_eval -h -m all_prefs -R prefs -T trec_results
Prints the help message. The full help message includes descriptions of any measures specified with '-m', the format of the ground-truth (rel_info) file given by '-R', and the format of the prediction (results) file given by '-T'.

-q

--query_eval_wanted:
 -q: In addition to summary evaluation, give evaluation for each query/topic
In addition to the summary evaluation over the whole query set, print each measure's value for every individual query/topic.

-m

--measure measure_name[.measure_params]:
Adds a measure to the list of measures to calculate and print.
 -m measure: Add 'measure' to the lists of measures to calculate and print.
    If 'measure' contains a '.', then the name of the measure is everything
    preceeding the period, and everything to the right of the period is
    assumed to be a list of parameters for the measure, separated by ','. 
    There can be multiple occurrences of the -m flag.
    'measure' can also be a nickname for a set of measures. Current 
    nicknames include 
       'official': the main measures often used by TREC
       'all_trec': all measures calculated with the standard TREC
                   results and rel_info format files.
       'set': subset of all_trec that calculates unranked values.
       'prefs': Measures not in all_trec that calculate preference measures.

-c

Average over all queries in the relevance judgements, rather than only the queries that appear in both the relevance judgements and the results file.
Missing queries contribute a value of 0 to every evaluation measure (which may or may not be reasonable for a particular measure, but is reasonable for the standard TREC measures).
Off by default.
--complete_rel_info_wanted:
 -c: Average over the complete set of queries in the relevance judgements  
     instead of the queries in the intersection of relevance judgements 
     and results.  Missing queries will contribute a value of 0 to all 
     evaluation measures (which may or may not be reasonable for a  
     particular evaluation measure, but is reasonable for standard TREC 
     measures.) Default is off.

-l

--level_for_rel num:
 -l<num>: Num indicates the minimum relevance judgement value needed for 
      a document to be called relevant. Used if rel_info_file contains 
      relevance judged on a multi-relevance scale.  Default is 1.
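For example, if the qrels use graded judgements 0/1/2, adding -l 2 makes only documents judged 2 count as relevant when the measures are computed.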

-n

--nosummary:
 -n: No summary evaluation will be printed

-D

--Debug_level num:
 -D <num>: Debug level.  1 and 2 used for measures, 3 and 4 for merging
     rel_info and results, 5 and 6 for input.  Currently, num can be of the
     form <num>.<qid> and only qid will be evaluated with debug info printed.
     Default is 0.

-N

--Number_docs_in_coll num:
 -N <num>: Number of docs in collection Default is MAX_LONG

-M

--Max_retrieved_per_topic num:
 -M <num>: Max number of docs per topic to use in evaluation (discard rest). 
      Default is MAX_LONG.

-J

--Judged_docs_only:
 -J: Calculate all values only over the judged (either relevant or  
     nonrelevant) documents.  All unjudged documents are removed from the 
     retrieved set before any calculations (possibly leaving an empty set). 
     DO NOT USE, unless you really know what you're doing - very easy to get 
     reasonable looking numbers in a file that you will later forget were 
     calculated  with the -J flag.

-R

--Rel_info_format format:
 -R format: The rel_info file is assumed to be in format 'format'.  Current
    values for 'format' include 'qrels', 'prefs', 'qrels_prefs'.  Note not
    all measures can be calculated with all formats.

-T

--Results_format format:
 -T format: the top results_file is assumed to be in format 'format'. Current
    values for 'format' include 'trec_results'. Note not all measures can be
    calculated with all formats.

-Z

--Zscore Zmean_file:
 -Z Zmean_file: Instead of printing the raw score for each measure, print
    a Z score instead. The score printed will be the deviation from the mean
    of the raw score, expressed in standard deviations, where the mean and
    standard deviation for each measure and query are found in Zmean_file.
    If mean is not in Zmeanfile for a measure and query, -1000000 is printed.
    Zmean_file format is ascii lines of form 
       qid  measure_name  mean  std_dev
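For example, a line such as 301 map 0.2345 0.0789 gives the mean and standard deviation of map for query 301 (the numbers here are made up).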

Reference scripts for converting inputs:

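Both input files are plain whitespace-separated text. Each qrels (ground-truth) line has the form query_id iteration doc_id relevance, and each results (run) line has the form query_id Q0 doc_id rank score run_tag. For example (query and document IDs made up):

qrel.trec:
q1 0 d3 1
q1 0 d7 0

pre.trec:
q1 Q0 d3 1 12.74 Dense
q1 Q0 d7 2 10.05 Dense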
  1. qrels file
import argparse
import jsonlines

parser = argparse.ArgumentParser()
parser.add_argument('--qrel_file')
args = parser.parse_args()

# str.strip('.txt') removes characters, not the suffix, so build the output name explicitly
out_file = (args.qrel_file[:-len('.txt')] if args.qrel_file.endswith('.txt') else args.qrel_file) + '.trec'

with open(args.qrel_file, 'r', encoding='utf8') as f, \
        open(out_file, 'w', encoding='utf-8') as f_out:
    for item in jsonlines.Reader(f):
        for doc_id in set(item['new_neg_list']):
            f_out.write("{} {} {} {}\n".format(item['q_id'], '0', doc_id, 0))
        for doc_id in set(item['new_pos_list']):
            if doc_id in item['new_neg_list']:  # if a doc is labeled both positive and negative, treat it as negative
                continue
            f_out.write("{} {} {} {}\n".format(item['q_id'], '0', doc_id, 1))
  2. Script to convert the model's result file to TREC run format
import argparse
from collections import defaultdict

parser = argparse.ArgumentParser()
parser.add_argument('--score_file')
args = parser.parse_args()

with open(args.score_file) as f:
    lines = f.readlines()

# group (doc_id, score) pairs by query id
all_scores = defaultdict(list)

for line in lines:
    if len(line.strip()) == 0:
        continue
    qid, did, score = line.strip().split()
    score = float(score)
    all_scores[qid].append((did, score))

# str.strip('.txt') removes characters, not the suffix, so build the output name explicitly
out_file = (args.score_file[:-len('.txt')] if args.score_file.endswith('.txt') else args.score_file) + '.trec'

with open(out_file, 'w', encoding="utf8") as f:
    for qid in all_scores:
        # sort each query's documents by score, descending, and write TREC run lines
        score_list = sorted(all_scores[qid], key=lambda x: x[1], reverse=True)
        for rank, (did, score) in enumerate(score_list):
            f.write("{} {} {} {} {} {}\n".format(qid, 'Q0', did, rank + 1, score, 'Dense'))
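Assuming the two scripts above are saved as, say, qrel_to_trec.py and score_to_trec.py (the filenames are arbitrary), and that the score file contains one qid doc_id score triple per line, a typical run would be: python qrel_to_trec.py --qrel_file dev_qrel.txt, then python score_to_trec.py --score_file dev_scores.txt, and finally ./trec_eval -c -M 10 -m recip_rank dev_qrel.trec dev_scores.trec on the two generated files.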

Python version

https://github.com/cvangysel/pytrec_eval

  1. Offline installation (for a server with no network access)
  • Following the usual procedure for offline installation of Python modules on Linux, download pytrec_eval-0.5.tar.gz and copy it to the server
  • tar -zxvf pytrec_eval-0.5.tar.gz
  • Download the trec_eval archive referenced by REMOTE_TREC_EVAL_URI in pytrec_eval-0.5/setup.py, host it internally, and change REMOTE_TREC_EVAL_URI to a URL reachable from the company network
  • cd pytrec_eval-0.5
  • python setup.py install
  2. Usage: see pytrec_eval-0.5/examples/simple.py
import pytrec_eval
pytrec_eval.supported_measures
"""
支持的评价指标
{'num_rel_ret', 'ndcg_cut', 'set_map', 'Rndcg', 'G', 'gm_map', 'recall', '11pt_avg', 'map', 'num_q', 'gm_bpref', 'Rprec_mult', 'P', 'success', 'num_nonrel_judged_ret', 'set_relative_P', 'set_recall', 'recip_rank', 'num_rel', 'iprec_at_recall', 'ndcg_rel', 'infAP', 'relative_P', 'ndcg', 'runid', 'num_ret', 'set_P', 'set_F', 'bpref', 'binG', 'utility', 'map_cut', 'relstring', 'Rprec'
"""

If no reachable URL is available, download the trec_eval archive referenced by REMOTE_TREC_EVAL_URI to the server and point setup.py at the local file instead; a modified pytrec_eval-0.5/setup.py looks like this:

"""Sets up pytrec_eval."""

from setuptools import setup, Extension
import os
import sys
import tempfile

REMOTE_TREC_EVAL_PATH = '/home/gomall/work/downloads/trec_eval-9.0.8.tar.gz'

REMOTE_TREC_EVAL_TLD_NAME = 'trec_eval-9.0.8'

LOCAL_TREC_EVAL_DIR = os.path.realpath(
    os.path.join(__file__, '..', 'trec_eval'))

TREC_EVAL_SRC = []

with tempfile.TemporaryDirectory() as tmp_dir:
    if os.path.isfile(os.path.join(LOCAL_TREC_EVAL_DIR, 'trec_eval.h')):
        # Use local version.
        trec_eval_dir = LOCAL_TREC_EVAL_DIR
    else:  # Fetch remote version.
        print('Fetching trec_eval from {}.'.format(REMOTE_TREC_EVAL_PATH))

        import io
        import urllib.request
        
        with open(REMOTE_TREC_EVAL_PATH, 'rb') as f:
            mmap_f = io.BytesIO(f.read())

        if REMOTE_TREC_EVAL_PATH.endswith('.zip'):
            import zipfile

            trec_eval_archive = zipfile.ZipFile(mmap_f)
        elif REMOTE_TREC_EVAL_PATH.endswith('.tar.gz'):
            import tarfile

            trec_eval_archive = tarfile.open(fileobj=mmap_f)

        trec_eval_archive.extractall(tmp_dir)

        trec_eval_dir = os.path.join(tmp_dir, REMOTE_TREC_EVAL_TLD_NAME)

    for filename in os.listdir(trec_eval_dir):
        if filename.endswith('.c') and not filename == "trec_eval.c":
            TREC_EVAL_SRC.append(os.path.join(trec_eval_dir, filename))
    #include the windows/ subdirectory on windows machines
    if sys.platform == 'win32':
        for filename in os.listdir(os.path.join(trec_eval_dir, "windows")):
            if filename.endswith('.c') and not filename == "trec_eval.c":
                TREC_EVAL_SRC.append(os.path.join(trec_eval_dir, "windows", filename))

    pytrec_eval_ext = Extension(
        'pytrec_eval_ext',
        sources=['src/pytrec_eval.cpp'] + TREC_EVAL_SRC,
        #windows doesnt need libm
        libraries=[] if sys.platform == 'win32' else ['m', 'stdc++'],
        include_dirs=[trec_eval_dir, os.path.join(trec_eval_dir, "windows")] if sys.platform == 'win32' else [trec_eval_dir],
        undef_macros=['NDEBUG'],
        extra_compile_args=['-g', '-Wall', '-O3'],
        define_macros=[('VERSIONID', '\"pytrec_eval\"'),
                       ('_GLIBCXX_USE_CXX11_ABI', '0'),
                       ('P_NEEDS_GNU_CXX_NAMESPACE', '1')])

    setup(name='pytrec_eval',
          version='0.5',
          description='Provides Python bindings for popular '
                      'Information Retrieval measures implemented '
                      'within trec_eval.',
          author='Christophe Van Gysel',
          author_email='cvangysel@uva.nl',
          ext_modules=[pytrec_eval_ext],
          packages=['pytrec_eval'],
          package_dir={'pytrec_eval': 'py'},
          python_requires='>=3',
          url='https://github.com/cvangysel/pytrec_eval',
          download_url='https://github.com/cvangysel/pytrec_eval/tarball/0.5',
          keywords=[
              'trec_eval',
              'information retrieval',
              'evaluation',
              'ranking',
          ],
          classifiers=[
              'Development Status :: 3 - Alpha',
              'License :: OSI Approved :: MIT License',
              'Programming Language :: Python',
              'Programming Language :: C++',
              'Intended Audience :: Science/Research',
              'Operating System :: POSIX :: Linux',
              'Topic :: Text Processing :: General',
          ])
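With REMOTE_TREC_EVAL_PATH pointing at the local trec_eval-9.0.8.tar.gz, python setup.py install builds the extension entirely from the local archive, so no network access is needed during installation.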