Flink开发_Flink中的函数接口

org.apache.flink.api.common.functions
org.apache.flink.api.java.functions
org.apache.flink.streaming.api.functions

  group
  reduce
  combine

1.转换Transformation

public interface FilterFunction<T> extends Function, Serializable {
    boolean filter(T var1) throws Exception;
}

public interface MapFunction<T, O> extends Function, Serializable {
    O map(T var1) throws Exception;
}
public interface FlatMapFunction<T, O> extends Function, Serializable {
    void flatMap(T var1, Collector<O> var2) throws Exception;
}

2.聚合

 Flink在reduce之前大量使用了Combine操作。
 Combine可以理解为是在map端的reduce的操作,对单个map任务的输出结果数据进行合并的操作
 combine是对一个map的
 map函数操作所产生的键值对会作为combine函数的输入,
 经combine函数处理后再送到reduce函数进行处理,
   减少了写入磁盘的数据量,同时也减少了网络中键值对的传输量
---------------------------------------------------------------
 public interface CombineFunction<IN, OUT> extends Function, Serializable {
 OUT combine(Iterable<IN> var1) throws Exception;
 }

public interface ReduceFunction<T> extends Function, Serializable {
    T reduce(T var1, T var2) throws Exception;
}

public interface AggregateFunction<IN, ACC, OUT> extends Function, Serializable {
    ACC createAccumulator();
    ACC add(IN var1, ACC var2);
	ACC merge(ACC var1, ACC var2);
    OUT getResult(ACC var1);   
}
----------------------------------------------------------------------

3.关联

--Join操作DataStream时只能用在window中,和cogroup操作一样
------关联---------这种 JoinFunction 和 FlatJoinFunction 与 MapFunction 和 FlatMapFunction 的关系类似------------
public interface JoinFunction<IN1, IN2, OUT> extends Function, Serializable {
    OUT join(IN1 var1, IN2 var2) throws Exception;
}

public interface FlatJoinFunction<IN1, IN2, OUT> extends Function, Serializable {
    void join(IN1 var1, IN2 var2, Collector<OUT> var3) throws Exception;
}

public interface CrossFunction<IN1, IN2, OUT> extends Function, Serializable {
OUT cross(IN1 var1, IN2 var2) throws Exception;
}
-- CoGroup  两个数据流/集合按照key进行group,然后将相同key的数据进行处理,
--- 但是它和join操作稍有区别,它在一个流/数据集中没有找到与另一个匹配的数据还是会输出。
--  如果是 DataStream中  则需要在window中使用
--- 如果在Dataser中 
public interface CoGroupFunction<IN1, IN2, O> extends Function, Serializable {
    void coGroup(Iterable<IN1> var1, Iterable<IN2> var2, Collector<O> var3) throws Exception;
}

4.分组

-- 分组reduce,即 GroupReduce  GroupReduceFunction
-- 这与Reduce的区别在于用户定义的函数会立即获得整个组。在组的所有元素上使用Iterable调用该函数,
并且可以返回任意数量的结果元素
public interface GroupReduceFunction<T, O> extends Function, Serializable {
    void reduce(Iterable<T> var1, Collector<O> var2) throws Exception;
}

-- 使group-reduce函数可组合,它必须实现GroupCombineFunction接口
-- GroupCombine转换是可组合GroupReduceFunction中组合步骤的通用形式。
它在某种意义上被概括为允许将输入类型I组合到任意输出类型O.
-- 相反,GroupReduce中的组合步骤仅允许从输入类型I到输出类型I的组合。
这是因为reduce步骤中, GroupReduceFunction 期望输入类型为I.
public interface GroupCombineFunction<IN, OUT> extends Function, Serializable {
    void combine(Iterable<IN> var1, Collector<OUT> var2) throws Exception;
}

5.分区

-----------------------------------------------
public interface Partitioner<K> extends Serializable, Function {
    int partition(K var1, int var2);
}
 
public interface MapPartitionFunction<T, O> extends Function, Serializable {
    void mapPartition(Iterable<T> var1, Collector<O> var2) throws Exception;
}

6.其他

其他
 org.apache.flink.streaming.api.functions.co;
  public interface CoMapFunction<IN1, IN2, OUT> extends Function, Serializable {
    OUT map1(IN1 var1) throws Exception;
    OUT map2(IN2 var1) throws Exception;
	}
   public interface CoFlatMapFunction<IN1, IN2, OUT> extends Function, Serializable {
       void flatMap1(IN1 var1, Collector<OUT> var2) throws Exception;
       void flatMap2(IN2 var1, Collector<OUT> var2) throws Exception;
	}

sink:org.apache.flink.streaming.api.functions.sink;
   public interface SinkFunction<IN> extends Function, Serializable {
source:
    public interface SourceFunction<T> extends Function, Serializable {
	public interface ParallelSourceFunction<OUT> extends SourceFunction<OUT> {}

org.apache.flink.streaming.api.functions.windowing
   public interface AllWindowFunction<IN, OUT, W extends Window> extends Function, Serializable {
   void apply(W var1, Iterable<IN> var2, Collector<OUT> var3) throws Exception;
   }
org.apache.flink.streaming.api.functions;
  public interface TimestampAssigner<T> extends Function {
 org.apache.flink.api.java.functions
  public interface KeySelector<IN, KEY> extends Function, Serializable

function

   1.Implementing an interface
   2.Anonymous classes
   3.Java 8 Lambdas

rich functions

  RichFunction是接口,然后AbstractRichFunction实现了 RichFunction的抽象类,其余对应的函数继承抽象类
    public interface RichFunction extends Function 
	public abstract class AbstractRichFunction implements RichFunction, Serializable 
  All transformations that require a user-defined function can instead take as argument a rich function
  @Public
   public abstract class RichMapFunction<IN, OUT> extends AbstractRichFunction implements MapFunction<IN, OUT> {
     private static final long serialVersionUID = 1L;
     public RichMapFunction() {}
     public abstract OUT map(IN var1) throws Exception;
    }
   1.class MyMapFunction extends RichMapFunction and pass the function as usual to a map transformation
   2.Rich functions can also be defined as an anonymous class:
 特点:有四个方法
   five methods: open, close, setRuntimeContext, and getRuntimeContext getIterationRuntimeContext
   These are useful for 
       parameterizing the function (see Passing Parameters to Functions), 
       creating and finalizing local state, 
       accessing broadcast variables (see Broadcast Variables),
       and for accessing runtime information such as accumulators and counters (see Accumulators and Counters), 
       and information on iterations (see Iterations).

Spark SQL UDAF与Flink的 aggreagte

  Spark SQL的UDAF  UserDefinedAggregateFunction
  inputSchema  bufferSchema  dataType
  initialize  update  merge  evaluate

  Flink:
      AggregateFunction
	 <IN, ACC, OUT>
    createAccumulator, add,merge;getResult;   
   }

Spark functions

org.apache.spark.api.java.function 
 public interface Function<T1, R> extends Serializable { R call(T1 var1) throws Exception;}
 public interface Function2<T1, T2, R> extends Serializable { R call(T1 v1, T2 v2) throws Exception;}
 public interface PairFunction<T, K, V> extends Serializable { Tuple2<K, V> call(T t) throws Exception;}

 public interface FilterFunction<T> extends Serializable {boolean call(T value) throws Exception;}
 public interface ReduceFunction<T> extends Serializable { T call(T v1, T v2) throws Exception;}

 public interface FlatMapFunction<T, R> extends Serializable {Iterator<R> call(T t) throws Exception;}
posted @ 2020-12-02 19:08  辰令  阅读(285)  评论(0编辑  收藏  举报