PHP: extract n characters starting at a given position from a UTF-8 file
ucut.php:
#!/usr/bin/php
<?php
// Usage: ./ucut.php <pos> <len>
// Copies <len> characters, starting at zero-based character index <pos>,
// from INPUT_FILE to OUTPUT_FILE. The input is expected to be valid UTF-8.
define('INPUT_FILE', 't.txt');
define('OUTPUT_FILE', 'a.txt');

$pos = max(intval($argv[1]), 0);
$len = max(intval($argv[2]), 0);

$file_size = filesize(INPUT_FILE);
// A file never holds more characters than bytes, so there is nothing to cut.
if ($pos >= $file_size) exit;

$fp = fopen(INPUT_FILE, 'rb');
$point = 0; // index of the current character (not a byte offset)
$string = '';

while (ftell($fp) < $file_size) {
    if ($point >= $pos + $len) break;

    $byte = fread($fp, 1);
    // Dereferencing the returned array directly requires PHP >= 5.4.
    $char = unpack('C', $byte)[1];

    if ($char <= 0x7f) { // single byte (ASCII)
        if ($point >= $pos) $string .= $byte;
        $point += 1;
    } elseif ($char >= 0xc0 && $char <= 0xdf) { // lead byte of a two-byte sequence
        if ($point >= $pos) {
            $string .= $byte . fread($fp, 1);
        } else {
            fseek($fp, 1, SEEK_CUR);
        }
        $point += 1;
    } elseif ($char >= 0xe0 && $char <= 0xef) { // lead byte of a three-byte sequence
        if ($point >= $pos) {
            $string .= $byte . fread($fp, 2);
        } else {
            fseek($fp, 2, SEEK_CUR);
        }
        $point += 1;
    } elseif ($char >= 0xf0 && $char <= 0xf7) { // lead byte of a four-byte sequence
        if ($point >= $pos) {
            $string .= $byte . fread($fp, 3);
        } else {
            fseek($fp, 3, SEEK_CUR);
        }
        $point += 1;
    }
    // Any other byte (a stray continuation byte or an invalid lead byte)
    // is skipped without being counted as a character.
}

fclose($fp);
file_put_contents(OUTPUT_FILE, $string);
?>
Contents of the test file t.txt:
dei小五5维在fe测试修字d集合啊
Test command:
./ucut.php 7 2
Command to inspect the result:
hexdump -C t.txt && hexdump -C a.txt
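For the test command above, positions are zero-based character indexes, so a.txt should contain the two characters 在f, which hexdump shows as the bytes e5 9c a8 66. One way to cross-check that result is to compare it with mb_substr. The sketch below is only an illustration under stated assumptions: it assumes the mbstring extension is loaded, PHP 7 or later (for the scalar type declarations), and an input small enough to hold in memory, which the streaming script does not require. The helper name utf8_cut is hypothetical and not part of ucut.php.

```php
<?php
// Cross-check sketch, not a replacement for ucut.php.
// Assumptions: mbstring is available, PHP >= 7, input fits in memory.
// utf8_cut() is a hypothetical helper introduced for this example.
function utf8_cut(string $text, int $pos, int $len): string
{
    // Clamp negative arguments to 0, mirroring the max(intval(...), 0)
    // handling in ucut.php.
    return mb_substr($text, max($pos, 0), max($len, 0), 'UTF-8');
}

$sample = 'dei小五5维在fe测试修字d集合啊'; // same content as t.txt
$cut = utf8_cut($sample, 7, 2);

// Print the characters and their bytes in hex for comparison with hexdump.
echo $cut, ' ', bin2hex($cut), PHP_EOL; // prints: 在f e59ca866
```

To compare against a real file instead of the inline sample, pass file_get_contents('t.txt') as the first argument.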