Spark tuning: data structures (reducing memory)
Here is what the official documentation says:
The first way to reduce memory consumption is to avoid the Java features that add overhead, such as pointer-based data structures and wrapper objects. There are several ways to do this:
1. Design your data structures to prefer arrays of objects, and primitive types, instead of the standard Java or Scala collection classes (e.g. HashMap). The fastutil library provides convenient collection classes for primitive types that are compatible with the Java standard library.
2. Avoid nested structures with a lot of small objects and pointers when possible.
3. Consider using numeric IDs or enumeration objects instead of strings for keys.
4. If you have less than 32 GB of RAM, set the JVM flag -XX:+UseCompressedOops to make pointers be four bytes instead of eight. You can add these options in spark-env.sh.
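In short, prefer primitive types and arrays over complex types such as HashMap or LinkedList. Those collections wrap every entry in its own object and box primitive values, so they take up more storage space for the same data.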
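To make the difference concrete, here is a minimal sketch that counts occurrences two ways: with a boxed HashMap and with a plain int[]. The class and method names are made up for illustration, and the array version assumes the ids are dense and fall in [0, numBuckets); it is not taken from the Spark docs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PrimitiveCounts {
    // Boxed version: every key and value is an Integer object,
    // and HashMap allocates a separate node object per entry.
    static Map<Integer, Integer> countWithMap(int[] bucketIds) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int id : bucketIds) {
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    // Primitive version: one contiguous int[] indexed by bucket id.
    // Assumption: ids are dense and lie in [0, numBuckets).
    static int[] countWithArray(int[] bucketIds, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int id : bucketIds) {
            counts[id]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] ids = {0, 2, 2, 1, 2};
        System.out.println(countWithMap(ids));
        System.out.println(Arrays.toString(countWithArray(ids, 3)));
    }
}
```

Both methods produce the same counts. The array version stores only four bytes per bucket, while the map version pays for an entry object plus boxed key and value on top of that. When the ids are sparse, a primitive-keyed map from a library such as fastutil is the usual middle ground.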
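Next, avoid nested structures built from many small objects and pointers where possible. This one took me a while to understand: every Java object carries its own header (about 16 bytes), and every reference to it costs another pointer, so a structure made of many tiny nested objects can spend more memory on overhead than on the data itself.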
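One way to picture this is to flatten a list of small objects into a single primitive array. The sketch below is only an illustration with hypothetical names (FlatPoints, Point, flatten, sumX); the interleaved [x0, y0, x1, y1, ...] layout is my own choice, not something the Spark docs prescribe.

```java
import java.util.Arrays;
import java.util.List;

class FlatPoints {
    // Nested layout: each Point is its own heap object with a header,
    // and the list holds one reference per point.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Flat layout: coordinates interleaved in one array as [x0, y0, x1, y1, ...].
    static double[] flatten(List<Point> points) {
        double[] flat = new double[points.size() * 2];
        int i = 0;
        for (Point p : points) {
            flat[i++] = p.x;
            flat[i++] = p.y;
        }
        return flat;
    }

    // Reads only the x coordinates, which sit at the even indices.
    static double sumX(double[] flat) {
        double sum = 0.0;
        for (int i = 0; i < flat.length; i += 2) {
            sum += flat[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(new Point(1.0, 2.0), new Point(3.0, 4.0));
        double[] flat = flatten(points);
        System.out.println(Arrays.toString(flat));
        System.out.println(sumX(flat));
    }
}
```

The flat array has a single object header for the whole data set, instead of one header and one reference per point.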
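Also, prefer numeric IDs or enum types over strings as keys, because a string carries object and character-array overhead on top of its characters and so takes up more memory.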
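As a small sketch of the enum approach, the example below parses raw string codes once and then uses enum constants as map keys. The Country enum and the countByCountry method are hypothetical names for illustration, and the code assumes every input string matches one of the enum constants (Enum.valueOf throws otherwise).

```java
import java.util.EnumMap;
import java.util.Map;

class EnumKeys {
    // Each constant is a single shared object, no matter how many records refer to it.
    enum Country { CN, US, DE }

    // Assumption: every raw code is a valid Country name.
    static Map<Country, Long> countByCountry(String[] rawCodes) {
        // EnumMap keeps its values in an array indexed by the enum ordinal,
        // so no per-key string or hash node is stored.
        Map<Country, Long> counts = new EnumMap<>(Country.class);
        for (String code : rawCodes) {
            counts.merge(Country.valueOf(code), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] raw = {"CN", "US", "CN"};
        System.out.println(countByCountry(raw));
    }
}
```

The same idea applies to numeric IDs: map each distinct string to an int once, then carry the int through the job instead of the string.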
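If a node has less than 32 GB of memory, set the JVM flag -XX:+UseCompressedOops so that object pointers take 4 bytes instead of 8. At first I did not see why the limit had to be 32 GB. It comes from how compressed pointers work: a 32-bit reference is scaled by the JVM's default 8-byte object alignment, so it can address at most 2^32 × 8 bytes = 32 GB of heap. Recent 64-bit HotSpot JVMs already enable this flag by default for heaps below that size.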
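The 32 GB figure can be checked with a little arithmetic, and on a HotSpot JVM you can also ask the running process whether the flag is on. The sketch below assumes a HotSpot-based JDK (it uses com.sun.management.HotSpotDiagnosticMXBean, which other JVMs may not provide) and the default 8-byte object alignment; the class and method names are just for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class CompressedOopsCheck {
    // A 32-bit compressed reference, scaled by the default 8-byte object
    // alignment, can address 2^32 * 8 bytes of heap.
    static long maxCompressedHeapBytes() {
        return (1L << 32) * 8L;
    }

    // Returns the current value of UseCompressedOops ("true" or "false"),
    // or "unknown" if this JVM does not expose the option.
    static String compressedOopsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (RuntimeException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        long gib = maxCompressedHeapBytes() / (1024L * 1024L * 1024L);
        System.out.println("Addressable heap with compressed oops: " + gib + " GiB");
        System.out.println("UseCompressedOops = " + compressedOopsSetting());
        System.out.println("Max heap of this JVM (bytes): " + Runtime.getRuntime().maxMemory());
    }
}
```

If the max heap is set above that limit, the JVM falls back to 8-byte pointers, which is why heaps slightly larger than 32 GB can hold fewer objects than heaps slightly smaller.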