Study Notes: Randomized Algorithms

2022.01.06 - 2022.01.18 Key points

1. The Java data structure ArrayDeque

The class description from the Javadoc:

Resizable-array implementation of the Deque interface. Array deques have no capacity restrictions; they grow as necessary to support usage. They are not thread-safe; in the absence of external synchronization, they do not support concurrent access by multiple threads. Null elements are prohibited. This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.
Most ArrayDeque operations run in amortized constant time. Exceptions include remove, removeFirstOccurrence, removeLastOccurrence, contains, iterator.remove(), and the bulk operations, all of which run in linear time.

The iterators returned by this class's iterator method are fail-fast: If the deque is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will generally throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
  • Implements Deque (which itself extends Queue)
  • No capacity restrictions
  • Not thread-safe
  • Likely faster than Stack when used as a stack, and faster than LinkedList when used as a queue

Some properties of Deque

Each operation comes in two forms:

  • Throws an exception on failure: add(), remove(), element()
  • Returns a special value (null or false) on failure: offer(), poll(), peek()
  <table BORDER CELLPADDING=3 CELLSPACING=1>
  <caption>Summary of Deque methods</caption>
   <tr>
     <td></td>
     <td ALIGN=CENTER COLSPAN = 2> <b>First Element (Head)</b></td>
     <td ALIGN=CENTER COLSPAN = 2> <b>Last Element (Tail)</b></td>
   </tr>
   <tr>
     <td></td>
     <td ALIGN=CENTER><em>Throws exception</em></td>
     <td ALIGN=CENTER><em>Special value</em></td>
     <td ALIGN=CENTER><em>Throws exception</em></td>
     <td ALIGN=CENTER><em>Special value</em></td>
   </tr>
   <tr>
     <td><b>Insert</b></td>
     <td>addFirst(e)</td>
     <td>offerFirst(e)</td>
     <td>addLast(e)</td>
     <td>offerLast(e)</td>
   </tr>
   <tr>
     <td><b>Remove</b></td>
     <td>removeFirst()</td>
     <td>pollFirst()</td>
     <td>removeLast()</td>
     <td>pollLast()</td>
   </tr>
   <tr>
     <td><b>Examine</b></td>
     <td>getFirst()</td>
     <td>peekFirst()</td>
     <td>getLast()</td>
     <td>peekLast()</td>
   </tr>
  </table>

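2. Functional interfaces

Functional interface:

  • An interface with exactly one abstract method; it may still declare any number of non-abstract (default or static) methods
  • A functional interface can be written as a lambda expression (the lambda supplies the implementation of the single abstract method)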

Common functional interfaces:

Java 1.8 (java.util.function)

  • Consumer, Predicate, Supplier, Function

Older interfaces that also qualify as functional interfaces

  • Runnable, Comparator, Callable, InvocationHandler

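3. Automatic test detection in PyCharm

When a file defines functions whose names contain the keyword "test", running it as main in PyCharm automatically switches to the nosetests test-runner mode instead of running it as a plain script.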

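Python's logging module: logging

Log event levels:

  • DEBUG: detailed information, of interest only when diagnosing problems
  • INFO: confirmation that the program is working as expected
  • WARNING: something unexpected has happened or is about to happen
  • ERROR: because of a more serious problem, some function of the program cannot be used
  • CRITICAL: a severe error, indicating that the program can no longer continue running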
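
As a minimal sketch of how these levels act as a threshold, the snippet below logs one message per level and reports which ones pass. The logger name "demo" and the helper emitted_levels are illustrative and not part of the original notes.

```python
import io
import logging


def emitted_levels(threshold):
    """Log one message per level; return the level names that passed the threshold."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))

    logger = logging.getLogger("demo")  # hypothetical logger name
    logger.handlers.clear()             # start from a clean logger on every call
    logger.addHandler(handler)
    logger.propagate = False            # keep the root logger out of this demo
    logger.setLevel(threshold)

    logger.debug("detailed diagnostic information")
    logger.info("things are working as expected")
    logger.warning("something unexpected happened")
    logger.error("some function could not be performed")
    logger.critical("the program may be unable to continue")

    return stream.getvalue().split()


if __name__ == "__main__":
    print(emitted_levels(logging.WARNING))  # ['WARNING', 'ERROR', 'CRITICAL']
```

With the threshold at WARNING, the DEBUG and INFO messages are dropped before they reach the handler.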

4. Bloom filter

  • Implement several internal hash function objects
  • Pass a seed into each hash function
  • Override the hash computation
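import java.util.BitSet;

public class MyBloomFilter {
    // Initial number of bits: 2 << 24 = 2^25 (about 33.5 million bits)
    private static final int DEFAULT_SIZE = 2 << 24;
    // One seed per hash function, so three seeds give three hash functions
    private static final int[] SEEDS = new int[]{3, 13, 46};
    // Bit vector; an instance field so that separate filters do not share bits
    private final BitSet bits = new BitSet(DEFAULT_SIZE);
    // hash function array
    private final SimpleHash[] func = new SimpleHash[SEEDS.length];

    public MyBloomFilter() {
        for (int i = 0; i < SEEDS.length; i++) {
            func[i] = new SimpleHash(DEFAULT_SIZE, SEEDS[i]);
        }
    }

    // Set the bit selected by every hash function
    public void add(Object value) {
        for (SimpleHash f : func) {
            bits.set(f.hash(value), true);
        }
    }

    // false means "definitely never added"; true means "possibly added"
    public boolean contains(Object value) {
        for (SimpleHash f : func) {
            if (!bits.get(f.hash(value))) {
                return false;
            }
        }
        return true;
    }

    public static class SimpleHash {
        private final int cap;
        private final int seed;

        public SimpleHash(int cap, int seed) {
            this.cap = cap;
            this.seed = seed;
        }

        // generate the hash value for object
        public int hash(Object value) {
            if (value == null) {
                return 0;
            }
            int h = value.hashCode();
            // Mix the high 16 bits into the low bits, then mask with seed * (cap - 1).
            // The parentheses matter: & binds tighter than ^ in Java.
            // The result can exceed cap; BitSet grows as needed.
            return Math.abs(seed * (cap - 1) & (h ^ (h >>> 16)));
        }
    }
}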

5. Duplicate URL filtering and counting over big data

Approaches:

  • Bit vector (bitset)
  • Hash functions
  • Splitting the data into multiple files
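
One way hashing and file splitting can work together, sketched under assumptions: hash each URL to pick a bucket, so identical URLs always land in the same bucket, then remove duplicates bucket by bucket. The function names and the bucket count are illustrative, and in-memory lists stand in for the separate files a real job would write to disk.

```python
import hashlib


def bucket_of(url, num_buckets):
    """Stable hash of the URL, reduced to a bucket index."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def dedupe_by_partition(urls, num_buckets=4):
    """Partition URLs by hash, then remove duplicates one bucket at a time."""
    # Assumption: each list stands in for one file on disk.
    buckets = [[] for _ in range(num_buckets)]
    for url in urls:
        buckets[bucket_of(url, num_buckets)].append(url)

    unique = []
    for bucket in buckets:
        seen = set()  # the working set is limited to a single bucket
        for url in bucket:
            if url not in seen:
                seen.add(url)
                unique.append(url)
    return unique


if __name__ == "__main__":
    sample = ["a.com/1", "b.com", "a.com/1", "c.com", "b.com"]
    print(sorted(dedupe_by_partition(sample)))
```

Because duplicates can never end up in different buckets, each bucket can be processed independently, and the per-bucket set can be swapped for a bit vector or Bloom filter when even one bucket is too large.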

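6. Python imports and directory structure

How the Python interpreter searches for module files (sys.path)

1. The current directory (the directory of the script being run)

2. The PYTHONPATH environment variable

3. The default Python installation directories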

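# Add the current directory to the module search path
# (PYTHONPATH, not PATH: PATH is only used to look up executables)
export PYTHONPATH="$PYTHONPATH:$PWD"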
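
The search order can also be inspected and extended from inside Python. The following is a small sketch; the helper name add_to_search_path is made up for illustration.

```python
import os
import sys


def add_to_search_path(directory):
    """Append a directory to sys.path unless it is already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.append(directory)


if __name__ == "__main__":
    for entry in sys.path:  # entries are tried in this order
        print(entry)
    add_to_search_path(os.getcwd())
```

Changes made this way last only for the running process, while PYTHONPATH affects every interpreter started from that shell.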

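What is the difference between import and from ... import?

  • After a plain import, names in the module must be used with the package.module prefix

  • from module import * brings in the names of that module that do not start with an underscore "_" (unless the module defines __all__)

  • Importing a package does not import the modules inside it; only the package's __init__.py is executed

Common forms of the import statement

  • import package[.module] [as alias]
  • from package import module [as alias]
  • from package.module import member [as alias]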
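
The three forms can be tried side by side on a package from the standard library. The sketch below uses json and its decoder module purely as a convenient example; the aliases are arbitrary.

```python
# Form 1: import package.module as alias
import json.decoder as jd

# Form 2: from package import module as alias
from json import decoder as dec

# Form 3: from package.module import member as alias
from json.decoder import JSONDecoder as Decoder

# All three names lead to the same module and the same class object.
assert jd is dec
assert jd.JSONDecoder is Decoder

print(Decoder().decode('{"k": 1}'))
```

The forms differ only in which name is bound in the importing file: the module under an alias in the first two, and the member itself in the third.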

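7. Randomized algorithms

  • Reservoir sampling
  • Shuffling

Randomized algorithms:

A randomized algorithm uses a random function whose return value directly or indirectly affects the algorithm's control flow or its result.
One or more steps of the algorithm are left to chance: at some point during execution it makes a random decision.
Put differently, at least one of its decisions depends on a random event.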

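Reservoir sampling

Abstracted from a practical problem: random sampling from a large data stream.

The data cannot all be loaded into memory. Pick k items at random from a stream of unknown size, such that every item is selected with equal probability.

Sampling one item: replace the current choice with the i-th item with probability 1/i, where i is the 1-based position of the item in the stream. Item i then remains the final choice with probability (1/i) × (i/(i+1)) × ... × ((n-1)/n) = 1/n.

Sampling k items: keep the first k items, then let the i-th item (i > k) replace a uniformly chosen slot of the reservoir with probability k/i.

LeetCode 382 (the single-item case, on a linked list):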

import java.util.Random;

// ListNode is the singly linked list node type provided by LeetCode.
class Solution {
    ListNode head;
    Random random;

    public Solution(ListNode head) {
        this.head = head;
        random = new Random();
    }

    public int getRandom() {
        int i = 1, ans = 0;
        for (ListNode node = head; node != null; node = node.next) {
            if (random.nextInt(i) == 0) { // chosen with probability 1/i (replaces the current answer)
                ans = node.val;
            }
            ++i;
        }
        return ans;
    }
}
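
For the k-item case, a minimal Python sketch of reservoir sampling could look like the following. The function name reservoir_sample and the optional rng parameter are illustrative choices, not taken from the notes.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)   # the first k items fill the reservoir
        else:
            j = rng.randrange(i)     # uniform integer in [0, i)
            if j < k:                # true with probability k/i
                reservoir[j] = item  # overwrite a uniformly chosen slot
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(iter(range(1000)), 5))
```

The stream is consumed in a single pass and only k items are held at any time, which is why the approach suits data that does not fit in memory. If the stream has fewer than k items, all of them are returned.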

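Shuffling

// Classic Fisher-Yates shuffle; suit is assumed to be an int[] here
static void shuffle(int[] suit, java.util.Random random) {
    for (int i = suit.length - 1; i > 0; i--) {
        int j = random.nextInt(i + 1); // uniform index in [0, i], including i itself
        int tmp = suit[j];             // swap suit[j] and suit[i]
        suit[j] = suit[i];
        suit[i] = tmp;
    }
}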

The same shuffle in Python, iterating forward from index 0:

import copy
import random

class Solution:
    def __init__(self, nums):
        self.nums = nums

    def reset(self):
        return self.nums

    def shuffle(self):
        array = copy.copy(self.nums)
        for i in range(len(array)):
            random_num = random.randint(i,len(array)-1)
            array[i], array[random_num] = array[random_num], array[i]
        return array
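
A quick empirical check of the equal-probability property, as a sketch: shuffle a three-element list many times and count how often each of the 6 orderings appears. The helper names, the trial count, and the fixed seed are arbitrary choices made for this illustration.

```python
import random
from collections import Counter


def fisher_yates(items, rng):
    """Return a shuffled copy; position i swaps with a uniform index in [i, n-1]."""
    array = list(items)
    for i in range(len(array)):
        j = rng.randint(i, len(array) - 1)  # both ends inclusive
        array[i], array[j] = array[j], array[i]
    return array


def permutation_counts(items, trials, seed=0):
    """Count how often each ordering appears over many shuffles."""
    rng = random.Random(seed)
    return Counter(tuple(fisher_yates(items, rng)) for _ in range(trials))


if __name__ == "__main__":
    for perm, count in sorted(permutation_counts([1, 2, 3], 6000).items()):
        print(perm, count)
```

Each ordering should appear roughly 1000 times out of 6000. Drawing j from the whole range [0, n-1] at every step instead would break this, because the number of equally likely swap sequences would then no longer be a multiple of the number of orderings.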

posted @ 2022-01-23 21:11  dengshuo7412