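Testlib Generator Usage Notes

Testlib is a library for preparing algorithm-contest problems. This article covers only one of its components: how to write data generators.

Testlib helps with four kinds of programs:

Generator: produces the test data.
Validator: checks that generated data meets the problem's constraints, such as value ranges and input format.
Interactor: talks to the contestant's program in interactive problems.
Checker: the special judge.

Download the Testlib project and copy out the testlib.h file.

The library consists of the single file testlib.h; to use it, add #include "testlib.h" at the top of your program. The rest of these notes record common ways to use the generator part.

Starting with a simple example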

// clang-format off
#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[]) { // argc and argv carry the command-line arguments
  registerGen(argc, argv, 1);
  int n = atoi(argv[1]);  // read the first command-line argument
  cout << rnd.next(1, n) << " ";
  cout << rnd.next(1, n) << endl;
}

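This program prints two integers in the range \([1,n]\); n is passed on the command line when the program is run.

The usual tools, rand() or C++11's mt19937 with uniform_int_distribution, can produce different output depending on the operating system, the compiler, or the time of the run (with the very common srand(time(0)) this is obvious), which makes generated data non-reproducible. Testlib's random functions guarantee that the same invocation produces the same values, regardless of the platform or the compiler.

Once you use Testlib you can no longer call the standard library's srand() and rand(); doing so causes a compile error. Likewise, do not use std::random_shuffle(); use Testlib's shuffle() instead, which also takes a pair of iterators. It shuffles with rnd, so its result is just as reproducible.

Before anything else, call registerGen(argc, argv, 1) to initialize Testlib (the 1 is the generator version and normally stays unchanged). After that the rnd object is available for generating random values.

The random seed is a hash of the command-line arguments. For a generator g.cpp, g 100 (Unix-like systems) and g.exe "100" (Windows) produce the same output, while g 100 0 produces different output.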
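
To make the seeding idea concrete, here is a minimal sketch of a generator whose output depends only on its arguments. It does not reproduce Testlib's actual hash or random generator: seedFromArgs and ToyRng are hypothetical names, the hash is a toy polynomial hash, and the generator is a plain 48-bit linear congruential generator with fixed constants.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy polynomial hash over all arguments (NOT Testlib's real algorithm).
// A separator is mixed in after each argument, so the argument list
// {"100"} hashes differently from {"100", "0"}.
std::uint64_t seedFromArgs(const std::vector<std::string>& args) {
    std::uint64_t h = 7;
    for (const std::string& arg : args) {
        for (unsigned char c : arg) h = h * 31 + c;
        h = h * 31 + 31;  // argument separator
    }
    return h;
}

// Minimal 48-bit LCG with fixed constants: the same seed gives the same
// sequence on every platform and compiler (illustrative only).
struct ToyRng {
    std::uint64_t state;
    explicit ToyRng(std::uint64_t seed) : state(seed & ((1ULL << 48) - 1)) {}

    // Returns an integer in [0, n); n must be positive.
    // The modulo makes it slightly biased, which is acceptable for a sketch.
    std::uint32_t next(std::uint32_t n) {
        state = (state * 0x5DEECE66DULL + 0xBULL) & ((1ULL << 48) - 1);
        return static_cast<std::uint32_t>(state >> 16) % n;
    }
};
```

Two ToyRng objects built from seedFromArgs({"100"}) yield identical sequences, while seedFromArgs({"100", "0"}) gives a different seed. This mirrors the g 100 versus g 100 0 behaviour described above.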

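Summary of rnd usage:

rnd.next(4): a uniformly random integer in [0,4)
rnd.next(4, 100): a uniformly random integer in [4,100]
rnd.next(4.0): a uniformly random floating-point number in [0, 4.0)
rnd.next("one | two | three"): one of one, two, three, each with equal probability
rnd.next("[1-9][0-9]{99}"): a 100-character string of digits with no leading zero
rnd.wnext(4, t): a skewed random integer in [0,4), explained in the next paragraph

wnext() generates values from a non-uniform distribution (the expectation is shifted). For \(t\gt 0\) it behaves like taking the maximum of \(t+1\) calls to next(): rnd.wnext(3, 1) is distributed like max({rnd.next(3), rnd.next(3)}), and rnd.wnext(4, 2) like max({rnd.next(4), rnd.next(4), rnd.next(4)}). For \(t\lt 0\) it is the minimum of \(|t|+1\) calls, and for \(t=0\) it is the same as next().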

A formal definition of rnd.wnext(i,t):

\[wnext(i,t) = \begin{cases} next(i) & t = 0\\ max(next(i), wnext(i, t-1)) & t > 0\\ min(next(i), wnext(i, t+1)) & t < 0 \\ \end{cases} \]
As the official examples show, a form with two range arguments is also supported:

\[wnext(l,r,t) = \begin{cases} next(l,r) & t=0\\ max(next(l,r), wnext(l,r,t-1)) & t > 0 \\ min(next(l,r), wnext(l,r,t+1)) & t < 0 \end{cases} \]
rnd.any(container): returns a uniformly chosen element of a container with random-access iterators (such as std::vector or std::string)
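
The recursive definition of wnext above can be transcribed almost literally. The sketch below is not Testlib's implementation; wnextSketch is a hypothetical helper that takes any callable returning a value in [0, n) as its source of randomness, so the number of underlying draws is easy to check.

```cpp
#include <algorithm>
#include <functional>

// Direct transcription of the recursive definition (illustrative, not Testlib's code).
// `next` is any callable that returns an integer in [0, n).
int wnextSketch(const std::function<int(int)>& next, int n, int t) {
    int result = next(n);                  // the t = 0 case: a single draw
    for (int i = 0; i < t; ++i)            // t > 0: keep the maximum of t more draws
        result = std::max(result, next(n));
    for (int i = 0; i > t; --i)            // t < 0: keep the minimum of |t| more draws
        result = std::min(result, next(n));
    return result;
}
```

With this definition wnextSketch(next, 4, 2) consumes three draws, matching the rnd.wnext(4, 2) example above, which is a maximum of three values.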

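New feature: parsing command-line arguments

A common way to read a command-line argument as an integer is int a = atoi(argv[3]), but this can go wrong:

If there is no third argument, the access is unsafe.
The third argument may not be a valid 32-bit signed integer.

With Testlib you can write int a = opt<int>(3) instead.

Other types work the same way: long long b = opt<long long>(2); bool f = opt<bool>(2); string s = opt(4);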
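
For comparison, this is roughly what has to be checked by hand when atoi is replaced with something safe. It is a minimal sketch, not the way Testlib implements opt<int>(); parseIntStrict and intArgument are hypothetical helpers.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Parses the whole string as a 32-bit int. Returns false (leaving `out`
// untouched) if the text is empty, not a number, has trailing characters,
// or does not fit into an int.
bool parseIntStrict(const std::string& text, int& out) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text.c_str(), &end, 10);
    if (errno == ERANGE) return false;                    // does not even fit into long
    if (*end != '\0') return false;                       // no digits at all, or trailing junk
    if (value < INT_MIN || value > INT_MAX) return false; // fits long but not int
    out = static_cast<int>(value);
    return true;
}

// Reads argument number `index` (1-based, as in argv) as an int.
// Returns false if the argument is missing or invalid.
bool intArgument(int argc, char* argv[], int index, int& out) {
    if (index < 1 || index >= argc) return false;         // the atoi(argv[3]) pitfall
    return parseIntStrict(argv[index], out);
}
```

Both failure modes listed above are covered: a missing argument is rejected by the index check, and an out-of-range or malformed value by parseIntStrict.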

If a generator takes many arguments, a command line such as g 10 20000 a true becomes more readable when written as g -n10 -m20000 -t=a -increment.

In that case the generator can read the parameters by name:

int n = opt<int>("n");
long long m = opt<long long>("m");
string t = opt("t");
bool increment = opt<bool>("increment");

Named parameters can be written in the following forms:

-key=value, --key=value
-key value, --key value (here value must not start with -)
--k12345 or -k12345, when the key k is a single letter followed by a number
-prop or --prop, which turns on a bool property

g1 -n1
g2 --len=4 --s=oops
g3 -inc -shuffle -n=5
g4 --length 5 --total 21 -ord
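
To show how the four forms can be told apart, here is a toy parser that turns such arguments into a key/value map. It only illustrates the rules listed above and is not Testlib's parser; parseNamedArgs and allDigits are hypothetical names, and bool flags are stored as the string "true" by assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// True if s[from..] is non-empty and consists of digits only.
static bool allDigits(const std::string& s, std::size_t from) {
    if (from >= s.size()) return false;
    for (std::size_t i = from; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

// Toy parser for the named-parameter forms (not Testlib's real parser).
std::map<std::string, std::string> parseNamedArgs(const std::vector<std::string>& args) {
    std::map<std::string, std::string> result;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg.size() < 2 || arg[0] != '-') continue;        // positional argument: skipped here
        std::string body = arg.substr(arg[1] == '-' ? 2 : 1); // strip "-" or "--"
        if (body.empty()) continue;

        std::size_t eq = body.find('=');
        if (eq != std::string::npos) {                        // -key=value
            result[body.substr(0, eq)] = body.substr(eq + 1);
        } else if (std::isalpha(static_cast<unsigned char>(body[0])) && allDigits(body, 1)) {
            result[body.substr(0, 1)] = body.substr(1);       // -k12345
        } else if (i + 1 < args.size() && (args[i + 1].empty() || args[i + 1][0] != '-')) {
            result[body] = args[i + 1];                       // -key value
            ++i;                                              // the value has been consumed
        } else {
            result[body] = "true";                            // -prop
        }
    }
    return result;
}
```

In this sketch the arguments of g3 -inc -shuffle -n=5 parse to inc=true, shuffle=true, n=5. A bool flag directly followed by a positional argument would be read as a key/value pair here, which is one reason to treat this as an illustration only.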

Some examples

The examples below all come from the official ones: testlib/generators at master · MikeMirzayanov/testlib (github.com)

1. Generating a random integer

// igen.cpp
#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);
    
    cout << rnd.next(1, 1000000) << endl;

    return 0;
}

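Running this program several times produces the same result every time. To get different results, pass different arguments: igen.exe 1 and igen.exe 3, for instance, produce different output, because Testlib derives the random seed from the arguments. The same trick applies to all the examples that follow.

2. Generating a skewed random integer in a given range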

//iwgen.cpp
#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);

    cout << rnd.wnext(1, 1000000, opt<int>(1)) << endl;

    return 0;
}

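The argument controls whether the generated number tends to be large or small: iwgen.exe 1000 produces numbers skewed toward the upper end of the range, and iwgen.exe -1000 toward the lower end.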
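
To get a feel for how strong the skew is, take the idealised definition in which wnext(1, N, t) with \(t \ge 0\) is the maximum of \(t+1\) independent uniform draws from \(\{1,\dots,N\}\). This is a back-of-the-envelope estimate under that assumption; Testlib's implementation may compute the value differently for large \(|t|\).

\[ P\bigl(wnext(1,N,t) \le x\bigr) = \left(\frac{x}{N}\right)^{t+1}, \qquad x \in \{1,\dots,N\} \]
\[ \left(\frac{x}{N}\right)^{1001} = \frac{1}{2} \;\Longrightarrow\; x = N \cdot 2^{-1/1001} \approx 0.9993\,N \]

So for \(N = 10^6\) and \(t = 1000\), roughly half of the outputs would lie above about 999300.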

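3. Generating multiple test files

The following is quoted from the article "Testlib: the most powerful helper library for problem setting".

There are two ways to generate several tests in one go:

Write a batch script.
Use Testlib's built-in startTest(test_index) function.

The first way is very simple: pick the arguments and redirect the output to the desired file.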

gen 1 1 > 1.in
gen 2 1 > 2.in
gen 1 10 > 3.in

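For the second way, call startTest(test_index) before generating each test; output is then redirected to a file named test_index.

//multigen.cpp: writes 10 tests; test i holds a pair of numbers in [1, i*i], so at most 100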
#include "testlib.h"
#include <iostream>

using namespace std;

void writeTest(int test)
{
    startTest(test);
    
    cout << rnd.next(1, test * test) 
        << " " << rnd.next(1, test * test) << endl;
}

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);

    for (int i = 1; i <= 10; i++)
        writeTest(i);
    
    return 0;
}

4. Generating strings

//sgen.cpp: generates a random token
#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);

    cout << rnd.next("[a-zA-Z0-9]{1,1000}") << endl;

    return 0;
}

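Here [a-zA-Z0-9] specifies the characters the string may contain, and {1,1000} specifies the range of its length. For the pattern syntax see: Testlib minimal regular expressions - OI Wiki (oi-wiki.org)

In the next example, an argument controls the range of the generated string's length.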

//swgen.cpp
#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);

    int length = rnd.wnext(1, 1000, opt<int>(1));
    cout << rnd.next("[a-zA-Z0-9]{1,%d}", length) << endl;

    return 0;
}

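With a flexible use of arguments you can also construct a string of a prescribed shape.

The following program prints a string whose structure is given by its arguments: the first argument is the number of blocks, followed by a repeat count and a piece of text for each block.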

Examples:
    gs 1 4 ab => abababab 
    gs 2 5 a 1 b => aaaaab
    gs 3 1 a 5 b 1 a => abbbbba

// gs.cpp
#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[]) {
    registerGen(argc, argv, 1);
    string t;
    int n = opt<int>(1);
    for (int i = 2; i <= 1 + 2 * n; i += 2) {
        int k = opt<int>(i);
        string s = opt<string>(i + 1);
        for (int j = 0; j < k; j++)
            t += s;
    }
    println(t);
}
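
The block logic of gs.cpp can be checked without command-line arguments by pulling it into a function. This is a sketch for experimenting; buildString is a hypothetical helper and not part of the official example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Appends each piece `count` times, in order: the same loop as in gs.cpp,
// with the (count, text) pairs passed in directly instead of read via opt().
std::string buildString(const std::vector<std::pair<int, std::string> >& blocks) {
    std::string result;
    for (const auto& block : blocks)
        for (int j = 0; j < block.first; ++j)
            result += block.second;
    return result;
}
```

For instance, the blocks (1, "a"), (5, "b"), (1, "a") correspond to the command gs 3 1 a 5 b 1 a from the examples above.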

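5. Generating a tree

Below is the main code for generating an unrooted tree. It takes two arguments: the number of vertices and an elongation parameter. For example, \(n=10,t=1000\) is likely to produce a chain, while \(n=10,t=-1000\) is likely to produce a star.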

#include "testlib.h"
#include <bits/stdc++.h>

#define forn(i, n) for (int i = 0; i < int(n); i++)

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);

    int n = opt<int>(1);
    int t = opt<int>(2);

    vector<int> p(n);
    forn(i, n)
        if (i > 0)
            p[i] = rnd.wnext(i, t); 

    printf("%d\n", n);
    vector<int> perm(n); 
    forn(i, n)
        perm[i] = i;
    shuffle(perm.begin() + 1, perm.end());
    vector<pair<int,int> > edges;

    for (int i = 1; i < n; i++)
        if (rnd.next(2))
            edges.push_back(make_pair(perm[i], perm[p[i]]));
        else
            edges.push_back(make_pair(perm[p[i]], perm[i]));

    shuffle(edges.begin(), edges.end());

    for (int i = 0; i + 1 < n; i++)
        printf("%d %d\n", edges[i].first + 1, edges[i].second + 1);

    return 0;
}
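
Why the sign of t decides the shape: in the code above, vertex i picks its parent p[i] from [0, i). A large positive t pushes p[i] toward i - 1, and a large negative t pushes it toward 0. The sketch below builds those two extreme parent arrays and measures the height of the resulting tree; treeHeight, chainParents and starParents are hypothetical helpers, not part of Testlib.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Height (in edges) of a tree given as a parent array with parent[i] < i for i >= 1.
// Vertex 0 is the root; parent[0] is ignored.
int treeHeight(const std::vector<int>& parent) {
    std::vector<int> depth(parent.size(), 0);
    int height = 0;
    for (std::size_t i = 1; i < parent.size(); ++i) {
        depth[i] = depth[parent[i]] + 1;  // parent[i] < i, so its depth is already known
        height = std::max(height, depth[i]);
    }
    return height;
}

// Limit of very large positive t: every vertex attaches to the previous one (a chain).
std::vector<int> chainParents(int n) {
    std::vector<int> p(n, 0);
    for (int i = 1; i < n; ++i) p[i] = i - 1;
    return p;
}

// Limit of very large negative t: every vertex attaches to vertex 0 (a star).
std::vector<int> starParents(int n) {
    return std::vector<int>(n, 0);
}
```

For n = 10 the chain has height 9 and the star has height 1; values of t between the extremes give shapes in between.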

To generate a rooted tree instead, printing the parent of each vertex, you can write:

#include "testlib.h"

#include <bits/stdc++.h>

#define forn(i, n) for (int i = 0; i < int(n); i++)

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);

    int n = opt<int>(1);
    int t = opt<int>(2);

    vector<int> p(n);
    forn(i, n)
        if (i > 0)
            p[i] = rnd.wnext(i, t);

    printf("%d\n", n);
    vector<int> perm(n);
    forn(i, n)
        perm[i] = i;
    shuffle(perm.begin() + 1, perm.end());

    vector<int> pp(n);
    for (int i = 1; i < n; i++)
        pp[perm[i]] = perm[p[i]];

    for (int i = 1; i < n; i++)
    {
        printf("%d", pp[i] + 1); // print the parent of each vertex 2..n
        if (i + 1 < n)
            printf(" ");
    }
    printf("\n");

    return 0;
}
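
The relabelling step pp[perm[i]] = perm[p[i]] keeps vertex 0 as the root, because the shuffle starts at perm.begin() + 1 and so perm[0] stays 0. A small check like the following can confirm that a parent array still describes a rooted tree after relabelling; relabelParents and isRootedTree are hypothetical helpers written for this note.

```cpp
#include <cstddef>
#include <vector>

// Applies a vertex permutation to a parent array: vertex i becomes perm[i].
// Assumes perm[0] == 0, so the root keeps its label; the root's own entry stays 0.
std::vector<int> relabelParents(const std::vector<int>& p, const std::vector<int>& perm) {
    std::vector<int> pp(p.size(), 0);
    for (std::size_t i = 1; i < p.size(); ++i)
        pp[perm[i]] = perm[p[i]];
    return pp;
}

// True if following parent links from every vertex reaches root 0
// within n steps, i.e. there are no cycles and no out-of-range parents.
bool isRootedTree(const std::vector<int>& parent) {
    const std::size_t n = parent.size();
    for (std::size_t v = 1; v < n; ++v) {
        std::size_t cur = v, steps = 0;
        while (cur != 0) {
            if (parent[cur] < 0 || static_cast<std::size_t>(parent[cur]) >= n) return false;
            cur = static_cast<std::size_t>(parent[cur]);
            if (++steps > n) return false;  // walked too long: there must be a cycle
        }
    }
    return true;
}
```

The parent array that the generator prints is pp without its first entry, shifted to 1-based labels.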

6. Generating a bipartite graph

#include "testlib.h"

#include <bits/stdc++.h>

#define forn(i, n) for (int i = 0; i < int(n); i++)

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);

    int n = opt<int>(1);
    int m = opt<int>(2);
    size_t k = opt<int>(3);

    int t = rnd.next(-2, 2);

    set<pair<int,int> > edges;

    while (edges.size() < k)
    {
        int a = rnd.wnext(n, t);
        int b = rnd.wnext(m, t);
        edges.insert(make_pair(a, b));
    }

    vector<pair<int,int> > e(edges.begin(), edges.end());
    shuffle(e.begin(), e.end());

    vector<int> pa(n);
    for (int i = 0; i < n; i++)
        pa[i] = i + 1;
    shuffle(pa.begin(), pa.end());

    vector<int> pb(m);
    for (int i = 0; i < m; i++)
        pb[i] = i + 1;
    shuffle(pb.begin(), pb.end());

    println(n, m, e.size());
    forn(i, e.size())
        println(pa[e[i].first], pb[e[i].second]);

    return 0;
}
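
One thing to watch in the loop above: it ends only once k distinct edges exist, and a bipartite graph with parts of size n and m has at most n*m of them, so a larger k would never terminate. A guard like the following could run before the loop; edgeCountFeasible is a hypothetical helper, and how to report the error is left open.

```cpp
// True if a simple bipartite graph with parts of size n and m can have k distinct edges.
// The product is computed in long long so it cannot overflow for int-sized n and m.
bool edgeCountFeasible(int n, int m, long long k) {
    if (n < 0 || m < 0 || k < 0) return false;
    return k <= static_cast<long long>(n) * m;
}
```

Even for feasible values, the rejection loop is likely to slow down as k approaches n*m, since most random pairs are then already in the set.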

Posted 2021-07-20 20:52 by kpole
