面试题, floor(sqrt(x))的几种方法
如题,x是32位unsigned integer.
有Quake公式,(n+1)**2 - n**2 = 2n + 1, 即1+3=4, 4+5=9, 9+7=16 ... [说错了:Babbage difference and Quake's Fast Inverse Square Root - 博客园]
Here are the numbers from original Pentium, but these change dramatically in the later chips. mul 10 fmul 1 div 41 fdiv 39 Yes, floating multiply is much faster than an integer multiply. [link]
查表法: 65536的平方就溢出了。把0~65535这6万多个数的平方放在数组里。数组是有序的,可二分查找——找不到时low和high是啥情况呢?判断某个x是不是平方数可用STL的set(放水版)。
今天看了下Icarus Verilog的sample里的sqrt.vl, 不太懂,改了个python版,特意用了**和*而不是<<, +而不是|
x = 101 acc = 0; acc2 = 0 for bitl in range(15, -1, -1): bit = 2 ** bitl; bit2 = 2 ** (bitl * 2) guess = acc + bit guess2 = acc2 + bit2 + acc * (2 ** bitl) * 2 if guess2 <= x: acc = guess; acc2 = guess2 print(acc)
又想了想好像彻底懂了。它里面的提到的binary search是二进制查找,不是二分查找。用十进制也可以: 6789的平方根是90吗?90*90>6789, 所以十位不可能是9,因为91, 92...更大。十位是8吗?肯定是, 因为80*80都比6789小了,79, 68, 57之流更没戏。倒过来从1到9,10*10<6789不说明十位一定是1,因为19*19使出洪荒之力也不行。二进制每位除了1就是0,把一个5x10的循环变成16x1的循环。acc2 + bit2... 写成x**2 + 2xy + y**2更好懂。注释里本来写得就是acc2 + 2 * (acc<<bitl) + bit,bit应为bit2. bit2是bit的2次方,咱也不能说这个名字起得不好啊。:-)
Verilog代码如下:
/* * Copyright (c) 1999 Stephen Williams (steve@icarus.com) * * This source code is free software; you can redistribute it * and/or modify it in source code form under the terms of the GNU * General Public License as published by the Free Software * Foundation; either version 2 of the License, or (at your option) * any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA * * $Id: sqrt.vl,v 1.4 2004/10/04 01:10:56 steve Exp $" */ /* * This example shows that Icarus Verilog can run non-trivial * programs, too. This uses a variety of Verilog language features * to implement the module of a square-root device. The program * uses IEEE1364-1995 language features and should work correctly * on any Verilog compiler. * * Run the file with Icarus Verilog under UNIX using the command: * * % iverilog -osqrt sqrt.v * % ./sqrt */ /* * This module approximates the square root of an unsigned 32bit * number. The algorithm works by doing a bit-wise binary search. * Starting from the most significant bit, the accumulated value * tries to put a 1 in the bit position. If that makes the square * to big for the input, the bit is left zero, otherwise it is set * in the result. This continues for each bit, decreasing in * significance, until all the bits are calculated or all the * remaining bits are zero. * * Since the result is an integer, this function really calculates * value of the expression: * * x = floor(sqrt(y)) * * where sqrt(y) is the exact square root of y and floor(N) is the * largest integer <= N. * * For 32bit numbers, this will never run more then 16 iterations, * which amounts to 16 clocks. */ module sqrt32(clk, rdy, reset, x, .y(acc)); input clk; output rdy; input reset; input [31:0] x; output [15:0] acc; // acc holds the accumulated result, and acc2 is the accumulated // square of the accumulated result. reg [15:0] acc; reg [31:0] acc2; // Keep track of which bit I'm working on. reg [4:0] bitl; wire [15:0] bit = 1 << bitl; wire [31:0] bit2 = 1 << (bitl << 1); // The output is ready when the bitl counter underflows. wire rdy = bitl[4]; // guess holds the potential next values for acc, and guess2 holds // the square of that guess. The guess2 calculation is a little bit // subtle. The idea is that: // // guess2 = (acc + bit) * (acc + bit) // = (acc * acc) + 2*acc*bit + bit*bit // = acc2 + 2*acc*bit + bit2 // = acc2 + 2 * (acc<<bitl) + bit // // This works out using shifts because bit and bit2 are known to // have only a single bit in them. wire [15:0] guess = acc | bit; wire [31:0] guess2 = acc2 + bit2 + ((acc << bitl) << 1); task clear; begin acc = 0; acc2 = 0; bitl = 15; end endtask initial clear; always @(reset or posedge clk) if (reset) clear; else begin if (guess2 <= x) begin acc <= guess; acc2 <= guess2; end bitl <= bitl - 1; end endmodule module main; reg clk, reset; reg [31:0] value; wire [15:0] result; wire rdy; sqrt32 root(.clk(clk), .rdy(rdy), .reset(reset), .x(value), .y(result)); always #1 clk = ~clk; always @(posedge rdy) begin $display("sqrt(%d) --> %d", value, result); $finish; end initial begin clk = 0; reset = 1; $monitor($time,,"a=%b a2=%b g=%x g2=%x c=%d r=%d", root.acc, root.acc2, root.guess, root.guess2, clk, reset); #1 value = 1666666663; reset = 0; end endmodule /* main */
我把"printf"改了下,#100 value...和#5 clk = ~clk都改成了#1,好像没改坏。原始版上电后过100个timescale再说,每5个timescale 1个clock。想起来以前进机房要换拖鞋,“最烦你们这帮调中断的,pia-pia地把电脑都搞坏了!”
很遗憾上面那个变不成电路,合成不了,Logic synthesis失败?下面这个可以但我看不懂:
/* * Copyright (c) 2002 Stephen Williams (steve@icarus.com) * * This source code is free software; you can redistribute it * and/or modify it in source code form under the terms of the GNU * General Public License as published by the Free Software * Foundation; either version 2 of the License, or (at your option) * any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA * * $Id: sqrt-virtex.v,v 1.4.2.2 2005/02/23 18:37:52 steve Exp $" */ /* * This module is a synthesizeable square-root function. It is also a * detailed example of how to target Xilinx Virtex parts using * Icarus Verilog. In fact, for no particular reason other than to * be excessively specific, I will step through the process of * generating a design for a Spartan-II XC2S15-VQ100, and also how to * generate a generic library part for larger Virtex designs. * * In addition to Icarus Verilog, you will need implementation * software from Xilinx. As of this writing, this example was tested * with Foundation 4.2i, but it should work the same with ISE and * Webpack software. * * This example source contains all the Verilog needed to do * everything described below. We use conditional compilation to * select the bits of Verilog that are needed to perform each specific * task. * * SIMULATE THE DESIGN * * This source file includes a simulation test bench. To compile the * program to include this test bench, use the command line: * * iverilog -DSIMULATE=1 -oa.out sqrt-virtex.v * * This generates the file "a.out" that can then be executed with the * command: * * vvp a.out * * This causes the simulation to run a long set of example sqrt * calculations. Each result is checked by the test bench to assure * that the result is valid. When it is done, the program prints * "PASSED" and finishes the simulation. * * When you take a close look at the "main" module below, you will see * that it uses Verilog constructs that are not synthesizeable. This * is fine, as we will never try to synthesize it. * * LIBRARY PARTS * * One can use the sqrt32 module to generate an EDIF file suitable for * use as a library part. This part can be imported to the Xilinx * schematic editor, then placed like any other pre-existing * macro. One can also pass the generated EDIF as a precompiled macro * that other designers may use as they see fit. * * To make an EDIF file from the sqrt32 module, execute the command: * * iverilog -osqrt32.edf -tfpga -parch=virtex sqrt-virtex.v * * The -parch=virtex tells the code generator to generate code for the * virtex architecture family (we don't yet care what specific part) * and the -osqrt32.edf places the output into the file * sqrt32.edf. * * Without any preprocessor directives, the only module is the sqrt32 * module, so sqrt32 is compiled as the root. The ports of the module * are automatically made into ports of the sqrt32.edf netlist, and * the contents of the sqrt32 module are connected approprately. * * COMPLETE CHIP DESIGNS * * To make a complete chip design, there are other bits that need to * be accounted for. Signals must be assigned to pins, and some * special devices may need to be created. We also want to write into * the EDIF file complete part information so that the implementation * tools know how to route the complete design. The command to compile * for our target part is: * * iverilog -ochip.edf -tfpga \ * -parch=virtex -ppart=XC2S15-VQ100 \ * -DMAKE_CHIP=1 sqrt-virtex.v * * This command uses the "chip" module as the root. This module in * turn has ports that are destined to be the pins of the completed * part. The -ppart= option gives complete part information, that is * in turn written into the EDIF file. This saves us the drudgery of * repeating that part number for later commands. * * The next steps involve Xilinx software, and to talk to Xilinx * software, the netlist must be in the form of an "ngd" file, a * binary netlist format. The command: * * ngdbuild chip.edf chip.ngd * * does the trick. The input to ngdbuild is the chip.edf file created * by Icarus Verilog, and the output is the chip.ngd file that the * implementation tools may read. From this point, it is best to refer * to Xilinx documentation for the software you are using, but the * quick summary is: * * map -o map.ncd chip.ngd * par -w map.ncd chip.ncd * * The result of this sequence of commands is the chip.ncd file that * is ready to be viewed by FPGA Edit, or converted to a bit stream, * or whatever. * * POST MAP SIMULATION * * Warm fuzzies are good, and retesting your design after the part * is mapped by the Xilinx backend tools is a cheap source of fuzzies. * The command to make a Verilog file out of the mapped design is: * * ngd2ver chip.ngd chip_root.v * * This command creates from the chip.ngd the file "chip_root.v" that * contains Verilog code that simulates the mapped design. This output * Verilog has the single root module "chip_root", which came from the * name of the root module when we were making the EDIF file in the * first place. The module has ports named just like the ports of the * chip_root module below. * * The generated Verilog uses the library in the directory * $(XILINX)/verilog/src/simprims. This directory comes with the ISE * WebPACK installation that you are using. Icarus Verilog is able to * simulate using that library. * * To compile a post-map simulation of the chip_root.v, use the * command: * * iverilog -DSIMULATE -DPOST_MAP -ob.out \ * -y $(XILINX)/verilog/src/simprims \ * sqrt-virtex.v chip_root.v \ * $(XILINX)/verilog/src/glbl.v * * This command line generates b.out from the source files * sqrt-virtex.v and chip_root.v (the latter from ngd2ver) * and the "-y <path>" flag specifies the library directory that will * be needed. The glbl.v source file is also included to provide the * GSR and related signals. * * The POST_MAP compiler directive causes the GSR manipulations * included in the test bench to be compiled in, to simulate the chip * startup. Other then that, the test bench runs the post-map design * the same way the pre-synthesis design works. * * Run this design with the command: * * vvp b.out * * And there you go. */ `ifndef POST_MAP /* * This module approximates the square root of an unsigned 32bit * number. The algorithm works by doing a bit-wise binary search. * Starting from the most significant bit, the accumulated value * tries to put a 1 in the bit position. If that makes the square * too big for the input, the bit is left zero, otherwise it is set * in the result. This continues for each bit, decreasing in * significance, until all the bits are calculated or all the * remaining bits are zero. * * Since the result is an integer, this function really calculates * value of the expression: * * x = floor(sqrt(y)) * * where sqrt(y) is the exact square root of y and floor(N) is the * largest integer <= N. * * For 32bit numbers, this will never run more then 16 iterations, * which amounts to 16 clocks. */ module sqrt32(clk, rdy, reset, x, .y(acc)); input clk; output rdy; input reset; input [31:0] x; output [15:0] acc; // acc holds the accumulated result, and acc2 is the accumulated // square of the accumulated result. reg [15:0] acc; reg [31:0] acc2; // Keep track of which bit I'm working on. reg [4:0] bitl; wire [15:0] bit = 1 << bitl; wire [31:0] bit2 = 1 << (bitl << 1); // The output is ready when the bitl counter underflows. wire rdy = bitl[4]; // guess holds the potential next values for acc, and guess2 holds // the square of that guess. The guess2 calculation is a little bit // subtle. The idea is that: // // guess2 = (acc + bit) * (acc + bit) // = (acc * acc) + 2*acc*bit + bit*bit // = acc2 + 2*acc*bit + bit2 // = acc2 + 2 * (acc<<bitl) + bit // // This works out using shifts because bit and bit2 are known to // have only a single bit in them. wire [15:0] guess = acc | bit; wire [31:0] guess2 = acc2 + bit2 + ((acc << bitl) << 1); (* ivl_synthesis_on *) always @(posedge clk or posedge reset) if (reset) begin acc = 0; acc2 = 0; bitl = 15; end else begin if (guess2 <= x) begin acc <= guess; acc2 <= guess2; end bitl <= bitl - 5'd1; end endmodule // sqrt32 `endif // `ifndef POST_MAP `ifdef SIMULATE /* * This module is a test bench for the sqrt32 module. It runs some * test input values through the sqrt32 module, and checks that the * output is valid. If an invalid output is generated, print and * error message and stop immediately. If all the tested values pass, * then print PASSED after the test is complete. */ module main; reg [31:0] x; reg clk, reset; wire [15:0] y; wire rdy; `ifdef POST_MAP chip_root dut(.clk(clk), .reset(reset), .rdy(rdy), .x(x), .y(y)); `else sqrt32 dut(.clk(clk), .reset(reset), .rdy(rdy), .x(x), .y(y)); `endif (* ivl_synthesis_off *) always #5 clk = !clk; task reset_dut; begin reset = 1; @(posedge clk) ; #1 reset = 0; @(negedge clk) ; end endtask // reset_dut task crank_dut; begin while (rdy == 0) begin @(posedge clk) /* wait */; end end endtask // crank_dut `ifdef POST_MAP reg GSR; assign glbl.GSR = GSR; `endif integer idx; (* ivl_synthesis_off *) initial begin reset = 0; clk = 0; /* If doing a post-map simulation, when we need to wiggle The GSR bit to simulate chip power-up. */ `ifdef POST_MAP GSR = 1; #100 GSR = 0; `endif #100 x = 1; reset_dut; crank_dut; $display("x=%d, y=%d", x, y); x = 3; reset_dut; crank_dut; $display("x=%d, y=%d", x, y); x = 4; reset_dut; crank_dut; $display("x=%d, y=%d", x, y); for (idx = 0 ; idx < 200 ; idx = idx + 1) begin x = $random; reset_dut; crank_dut; $display("x=%d, y=%d", x, y); if (x < (y * y)) begin $display("ERROR: y is too big"); $finish; end if (x > ((y + 1)*(y + 1))) begin $display("ERROR: y is too small"); $finish; end end $display("PASSED"); $finish; end endmodule // main `endif `ifdef MAKE_CHIP /* * This module represents the chip packaging that we intend to * generate. We bind pins here, and route the clock to the global * clock buffer. */ module chip_root(clk, rdy, reset, x, y); input clk; output rdy; input reset; input [31:0] x; output [15:0] y; wire clk_int; (* cellref="BUFG:O,I" *) buf gbuf (clk_int, clk); sqrt32 dut(.clk(clk_int), .reset(reset), .rdy(rdy), .x(x), .y(y)); /* Assign the clk to GCLK0, which is on pin P39. */ $attribute(clk, "PAD", "P39"); // We don't care where the remaining pins go, so set the pin number // to 0. This tells the implementation tools that we want a PAD, // but we don't care which. Also note the use of a comma (,) // separated list to assign pins to the bits of a vector. $attribute(rdy, "PAD", "0"); $attribute(reset, "PAD", "0"); $attribute(x, "PAD", "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"); $attribute(y, "PAD", "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"); endmodule // chip_root `endif
轻飘飘一句floor(sqrt()),有没有想过背后的tears与欢笑?:-) 这还没到浮点呢。