二分搜索 - Binary Search
二分搜索是一种在有序数组中寻找目标值的经典方法,也就是说使用前提是『有序数组』。非常简单的题中『有序』特征非常明显,但更多时候可能需要我们自己去构造『有序数组』。下面我们从最基本的二分搜索开始逐步深入。
一、lower/upper bound
定义 lower bound 为在给定升序数组中大于等于目标值的最小索引,upper bound 则为小于等于目标值的最大索引,下面上代码和测试用例。
import java.util.*; public class Main { public static void main(String[] args) { int[] nums = new int[]{1,2,2,3,4,6,6,6,13,18}; System.out.println(lowerBound(nums, 6)); // 5 System.out.println(upperBound(nums, 6)); // 7 System.out.println(lowerBound(nums, 7)); // 8 System.out.println(upperBound(nums, 7)); // 7 } /* * nums[index] >= target, min(index) */ public static int lowerBound(int[] nums, int target) { if (nums == null || nums.length == 0) return -1; int lb = -1, ub = nums.length; while (lb + 1 < ub) { int mid = lb + (ub - lb) / 2; if (nums[mid] < target) { lb = mid; } else { ub = mid; } } return lb + 1; } /* * nums[index] <= target, max(index) */ public static int upperBound(int[] nums, int target) { if (nums == null || nums.length == 0) return -1; int lb = -1, ub = nums.length; while (lb + 1 < ub) { int mid = lb + (ub - lb) / 2; if (nums[mid] > target) { ub = mid; } else { lb = mid; } } return ub - 1; } }
源码分析
以lowerBound
的实现为例,以上二分搜索的模板有几个非常优雅的实现:
while
循环中lb + 1 < ub
, 而不是等号,因为取等号可能会引起死循环。初始化lb < ub
时,最后循环退出时一定有lb + 1 == ub
.mid = lb + (ub - lb) / 2
, 可有效防止两数相加后溢出。lb
和ub
的初始化,初始化为数组的两端以外,这种初始化方式比起0
和nums.length - 1
有不少优点,详述如下。
如果遇到有问插入索引的位置时,可以分三种典型情况:
- 目标值在数组范围之内,最后返回值一定是
lb + 1
- 目标值比数组最小值还小,此时
lb
一直为-1
, 故最后返回lb + 1
也没错,也可以将-1
理解为数组前一个更小的值 - 目标值大于等于数组最后一个值,由于循环退出条件为
lb + 1 == ub
, 那么循环退出时一定有lb = A.length - 1
, 应该返回lb + 1
综上所述,返回lb + 1
是非常优雅的实现。其实以上三种情况都可以统一为一种方式来理解,即索引-1
对应于数组前方一个非常小的数,索引ub
即对应数组后方一个非常大的数,那么要插入的数就一定在lb
和ub
之间了。
有时复杂的边界条件处理可以通过『补项』这种优雅的方式巧妙处理。
关于lb 和 ub 的初始化,由于mid = lb + (ub - lb) / 2
, 且有lb + 1 < ub
,故 mid 还是有可能为ub - 1
或者lb + 1
的,在需要访问mid + 1
或者mid - 1
处索引的元素时可能会越界。这时候就需要将初始化方式改为lb = 0, ub = A.length - 1
了,最后再加一个关于lb, ub
处索引元素的判断即可。如 Search for a Range 和 Find Peak Element. 尤其是 Find Peak Element 中 lb 和 ub 的初始值如果初始化为-1和数组长度会带来一些麻烦。
二、最优解
除了在有序数组中寻找目标值这种非常直接的二分搜索外,我们还可以利用二分搜索求最优解(最大值/最小值),通常这种题中只是隐含了『有序数组』,需要我们自己构造。
用数学语言来描述就是『求满足某条件 C(x) 的最小/大的 x』,以求最小值为例,对于任意满足条件的 x, 如果所有的 x≤x′≤UB 对于 C(x′) 都为真(其中 UB
可能为无穷大,也可能为满足条件的最大的解,如果不满足此条件就不能保证二分搜索的正确性),那么我们就能使用二分搜索进行求解,其中初始化时下界lb
初始化为不满足条件的值LB
, 上界初始化为满足条件的上界UB
. 随后在while
循环内部每次取中,满足条件就取ub = mid
, 否则lb = mid
, 那么最后ub
就是要求的最小值。求最大值时类似,只不过处理的是lb
.
以 POJ No.1064 为例。
Cable master Time Limit: 1000MS Memory Limit: 10000K Total Submissions: 66707 Accepted: 13737 Description Inhabitants of the Wonderland have decided to hold a regional programming contest. The Judging Committee has volunteered and has promised to organize the most honest contest ever. It was decided to connect computers for the contestants using a "star" topology - i.e. connect them all to a single central hub. To organize a truly honest contest, the Head of the Judging Committee has decreed to place all contestants evenly around the hub on an equal distance from it. To buy network cables, the Judging Committee has contacted a local network solutions provider with a request to sell for them a specified number of cables with equal lengths. The Judging Committee wants the cables to be as long as possible to sit contestants as far from each other as possible. The Cable Master of the company was assigned to the task. He knows the length of each cable in the stock up to a centimeter,and he can cut them with a centimeter precision being told the length of the pieces he must cut. However, this time, the length is not known and the Cable Master is completely puzzled. You are to help the Cable Master, by writing a program that will determine the maximal possible length of a cable piece that can be cut from the cables in the stock, to get the specified number of pieces. Input The first line of the input file contains two integer numb ers N and K, separated by a space. N (1 = N = 10000) is the number of cables in the stock, and K (1 = K = 10000) is the number of requested pieces. The first line is followed by N lines with one number per line, that specify the length of each cable in the stock in meters. All cables are at least 1 meter and at most 100 kilometers in length. All lengths in the input file are written with a centimeter precision, with exactly two digits after a decimal point. Output Write to the output file the maximal length (in meters) of the pieces that Cable Master may cut from the cables in the stock to get the requested number of pieces. The number must be written with a centimeter precision, with exactly two digits after a decimal point. If it is not possible to cut the requested number of pieces each one being at least one centimeter long, then the output file must contain the single number "0.00" (without quotes). Sample Input 4 11 8.02 7.43 4.57 5.39 Sample Output 2.00
Problem
有 N 条绳子,它们的长度分别为 Li. 如果从它们中切割出 K 条长度相同的绳子的话,这 K 条绳子每条最长能有多长?答案保留到小数点后两位。
输入
N = 4, L = {8.02, 7.43, 4.57, 5.39}, K = 11
输出
2.00
题解
这道题看似是一个最优化问题,我们来尝试下使用模板二的思想求解,
**令 C(x) 为『可以得到 K 条长度为 x 的绳子』。
**根据题意,我们可以将上述条件进一步细化为:C(x)=∑i(floor(Li/x))≥KC(x) =i∑(floor(Li/x))≥K
我们现在来分析下可行解的上下界。由于答案保留小数点后两位,显然绳子长度一定大于0,大于0的小数点后保留两位的最小值为0.01
, 显然如果问题最后有解,0.01
一定是可行解中最小的,且这个解可以分割出的绳子条数是最多的。一般在 OJ 上不同变量都是会给出范围限制,那么我们将上界初始化为最大范围 + 0.01
, 它一定在可行解之外(也可以遍历一遍数组取数组最大值,但其实二分后复杂度相差不大)。使用二分搜索后最后返回lb
即可。
import java.io.*; import java.util.*; public class Main { public static void main(String[] args) { Scanner in = new Scanner(System.in); int n = in.nextInt(); int k = in.nextInt(); double[] nums = new double[n]; for (int i = 0; i < n; i++) { nums[i] = in.nextDouble(); } System.out.printf("%.2f\n", Math.floor(solve(nums, k) * 100) / 100); } public static double solve(double[] nums, int K) { double lb = 0.00, ub = 10e5 + 0.01; // while (lb + 0.001 < ub) { for (int i = 0; i < 100; i++) { double mid = lb + (ub - lb) / 2; if (C(nums, mid, K)) { lb = mid; } else { ub = mid; } } return lb; } public static boolean C(double[] nums, double seg, int k) { int count = 0; for (double num : nums) { count += Math.floor(num / seg); } return count >= k; } }
源码分析
方法C
只做一件事,给定数组nums
, 判断是否能切割出K
条长度均为seg
的绳子。while
循环中使用lb + 0.001 < ub
, 不能使用0.01
, 因为计算mid
时有均值的计算,对于double
型数值否则会有较大误差。
三、二分搜索的 while
结束条件判定
对于整型我们通常使用lb + 1 < ub
, 但对于double
型数据来说会有些精度上的丢失,使得结束条件不是那么好确定。像上题中采用的方法是题目中使用的精度除10。但有时候这种精度可能还是不够,如果结束条件lb + EPS < ub
中使用的 EPS 过小时 double 型数据精度有可能不够从而导致死循环的产生!这时候我们将while
循环体替换为for (int i = 0; i < 100; i++)
, 100 次循环后可以达到 10−3010^{-30}10−30 精度范围,一般都没问题。