JVM Topic 2: JVM Memory Structure

Java Memory Allocation

JVM Memory Structure

The JVM is an abstract computing machine that enables a computer to run a Java program. The term JVM covers three notions:
specification (a document that describes how a JVM must work; implementations are provided by Sun (now Oracle) and other companies),
implementation (shipped as part of the JRE, the Java Runtime Environment) and
instance (created each time the java command is run to execute a Java class).

The JVM loads the code, verifies the code, executes the code, manages memory (this includes allocating memory from the Operating System (OS), managing Java allocation including heap compaction and removal of garbage objects) and finally provides the runtime environment.

JVM memory is divided into multiple parts: Heap Memory, Non-Heap Memory and Other.

Heap memory

Heap memory is the runtime data area from which memory for all Java class instances and arrays is allocated. The heap is created when the JVM starts up and may grow or shrink while the application runs. The initial heap size can be set with the -Xms VM option and the maximum heap size with the -Xmx option. The heap can be of fixed or variable size depending on the garbage collection strategy. Very old JVMs defaulted to a 64 MB maximum heap; current HotSpot releases derive the default from physical memory instead (typically about one quarter of it).
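
To see what heap limits a running JVM actually ended up with, the standard Runtime API can be queried. The following is a minimal sketch; the class name HeapSizeProbe and the helper toMb are made up for illustration, and the printed numbers depend on the machine and on any -Xms/-Xmx options passed.

```java
class HeapSizeProbe {
    /** Converts a byte count to whole mebibytes. */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will try to use (roughly -Xmx).
        System.out.println("max   = " + toMb(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms).
        System.out.println("total = " + toMb(rt.totalMemory()) + " MB");
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free  = " + toMb(rt.freeMemory()) + " MB");
    }
}
```

Running it once without options and once with, say, -Xms256m -Xmx256m shows how the two options pin the initial and maximum sizes.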

Non-Heap memory

The JVM has memory other than the heap, referred to as non-heap memory. It is created at JVM startup and stores per-class structures such as the runtime constant pool, field and method data, and the code for methods and constructors, as well as interned strings (up to Java 6). On JVMs that have a permanent generation (Java 7 and earlier), the default maximum size of this area is 64 MB, which can be changed with the -XX:MaxPermSize VM option.
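
The heap and non-heap totals described above can be read at run time through the standard java.lang.management API. This is only a sketch: the class name NonHeapProbe is hypothetical, and what exactly is counted as non-heap depends on the JVM implementation and version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class NonHeapProbe {
    /** Returns the JVM's current non-heap usage as reported by the MemoryMXBean. */
    static MemoryUsage nonHeap() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getNonHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = nonHeap();
        System.out.println("heap used     = " + heap.getUsed() + " bytes");
        System.out.println("non-heap used = " + nonHeap.getUsed() + " bytes");
        // getMax() returns -1 when no maximum is defined for the area.
        System.out.println("non-heap max  = " + nonHeap.getMax());
    }
}
```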

Other memory

JVM uses this space to store the JVM code itself, JVM internal structures, loaded profiler agent code and data, etc.

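JVM memory regions

Heap memory

This area stores the Java object instances and arrays created at run time.

Non-heap memory

This area stores class structures, constants, the data and code of methods and constructors, the runtime stacks, and (in older versions) strings. PermGen belongs to non-heap memory. Since Java 7 the string pool lives in the heap, and since Java 8 class and method definitions and the runtime constant pool live in Metaspace. Metaspace is not part of the JVM-managed heap; it is allocated from native memory.

Other memory

This area mainly stores the JVM's own code and its internal structures and data.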

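Memory that a JVM with a 2G heap may actually occupy

= heap + (thread count * thread stack size) + permanent generation + compiled code + off-heap memory
= 2G + (1000 * 1M) + 256M + 48~240M + (about 2G)
= 5.xG
  • Heap: stores Java objects. The initial size defaults to 1/64 of physical memory; controlled by -Xms, -Xmx, -Xmn and related options
  • Thread stack: stores local variables (primitives and references) and other frame data. Since JDK 5 each thread stack defaults to 1M; before that it was 256K. Adjust it to what the application's threads need. With the same physical memory, lowering this value allows more threads to be created. The operating system still limits the number of threads per process, so threads cannot be created without bound; a rule-of-thumb limit is around 3000 to 5000
  • Permanent generation: stores class definitions and the constant pool. Up to JDK 7 it is sized with PermSize and MaxPermSize; from JDK 8 on with MetaspaceSize and MaxMetaspaceSize
  • Compiled code (code cache): the default differs between JDK 7 and JDK 8 when tiered compilation is enabled, ranging from 48M to 240M
  • Off-heap memory: used by Netty off-heap buffers and similar; the default maximum is about the size of the heap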

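In other words, a JVM with a 2G heap needs about 4G of memory set aside (heap plus off-heap), and an instance with 1000 threads may occupy up to about 5.5G. With 3000 threads, the same formula (heap, off-heap, thread stacks, PermGen/Metaspace and code cache) gives roughly 7.4G.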
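
The estimate can be reproduced with a few lines of arithmetic. The sketch below simply re-computes the formula from this section in MB; the class and method names are hypothetical, and the inputs are the rough figures quoted above rather than measured values.

```java
class FootprintEstimate {
    /**
     * Sums heap + threads * stack + class metadata + code cache + off-heap.
     * All sizes are in MB.
     */
    static long estimateMb(long heapMb, int threads, long stackMb,
                           long classMetaMb, long codeCacheMb, long offHeapMb) {
        return heapMb + (long) threads * stackMb + classMetaMb + codeCacheMb + offHeapMb;
    }

    public static void main(String[] args) {
        // 2G heap, 1000 threads with 1M stacks, 256M class metadata, ~2G off-heap.
        long low = estimateMb(2048, 1000, 1, 256, 48, 2048);   // small code cache
        long high = estimateMb(2048, 1000, 1, 256, 240, 2048); // large code cache
        System.out.println(low + " to " + high + " MB"); // prints "5400 to 5592 MB"
    }
}
```

Changing the thread count shows how quickly thread stacks come to dominate the total once the heap is fixed.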

By default, a Java 8 client JVM sets the initial heap size (-Xms) to about 1/64 of physical memory and the maximum heap size (-Xmx) to about 1/4 of physical memory. On large machines, however, the "1/4 of RAM" rule of thumb does not hold: on a 4-socket server with 64 GB per socket (256 GB RAM), -Xmx defaults to 32 GB. That 32 GB figure may be related to the limit of compressed oops (CompressedOops), which lies at around the same point.

If you want to use 32-bit references, your heap is limited to 32 GB. Note: you can still use direct memory and memory-mapped files well above 32 GB even if you use 32-bit references in your heap.

However, if you are willing to use 64-bit references, the heap size is likely to be limited by your OS, just as it is with a 32-bit JVM, e.g. on 32-bit Windows the limit is 1.2 to 1.5 GB.

Note: you will want your JVM heap to fit into main memory, ideally inside one NUMA region. That is about 1 TB on the bigger machines. If your JVM spans NUMA regions, memory access, and GC in particular, will take much longer. If your JVM heap starts swapping, a GC might take hours, or the machine may even become unusable as it thrashes the swap drive.


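Constant pool, string pool, and wrapper object pools in Java 8

Constant pools come in two kinds: the static constant pool and the runtime constant pool.

The static constant pool lives in the .class file; the runtime constant pool lives in the method area. In JDK 1.8, HotSpot implements the method area with Metaspace instead of the permanent generation.
Since JDK 1.7 the string pool has been moved to the heap.
String str = new String("Hello world") creates 2 objects (assuming the literal is not already in the pool): one resides in the string pool and one is allocated on the Java heap. str points to the instance on the heap.
String.intern() can add strings to the string pool at run time.
Some wrapper classes are pooled: objects for values within -128~127 can be reused.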
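
The wrapper pooling mentioned in the last point can be observed with Integer.valueOf. This is a small sketch with a hypothetical class name; the -128~127 range is guaranteed for Integer, while the result for values outside it depends on the JVM and its settings, so no output is claimed for that case.

```java
class WrapperCacheDemo {
    /** Returns true if two boxings of the same value yield the same object. */
    static boolean sameInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // true: served from the cache
        System.out.println(sameInstance(-128)); // true: served from the cache
        // Outside the cached range the result is usually false, but it is not guaranteed.
        System.out.println(sameInstance(128));
        // equals() compares values and is the safe way to compare wrappers.
        System.out.println(Integer.valueOf(128).equals(Integer.valueOf(128))); // true
    }
}
```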

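In JDK 1.6 and earlier, the string pool lives in the Perm area (Permanent Generation). If a string is not yet in the pool, a new instance is created in the Perm area.
In JDK 1.7 and later, the string pool lives in the Java heap. If a string is not yet in the pool, a new instance is created on the heap.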
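
The difference between a pooled literal and a string created with new can be checked with reference comparisons. A minimal sketch follows; the class StringPoolDemo and the helper isPooled are made-up names for illustration.

```java
class StringPoolDemo {
    /** Returns true if s is itself the canonical instance held by the string pool. */
    static boolean isPooled(String s) {
        return s == s.intern();
    }

    public static void main(String[] args) {
        String literal = "Hello world";            // resolved from the string pool
        String onHeap = new String("Hello world"); // a separate object on the heap
        System.out.println(isPooled(literal));          // true
        System.out.println(isPooled(onHeap));           // false
        System.out.println(literal == onHeap.intern()); // true: intern() returns the pooled one
        System.out.println(literal.equals(onHeap));     // true: same characters
    }
}
```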

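How String.intern() works in outline: Java calls the intern() method of StringTable, which is implemented in C++. StringTable's intern() is much like Java's HashMap, except that it does not resize automatically; its default size is 1009 buckets (on older JDKs).

The string pool is in fact a hash table. HashMap and Hashtable in Java work on largely the same principles, and treating the string pool as a hash table lets us apply what we know from data structures. For example, HashMap and Hashtable resolve collisions with open hashing (separate chaining). The string pool actually stores references, which point to the string instances.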
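
The idea of a fixed-size table with separate chaining that stores references can be sketched in a few lines of Java. This is only an illustration of the data structure described above, not HotSpot's actual StringTable code; TinyInternTable and its members are hypothetical names.

```java
class TinyInternTable {
    /** One link in a bucket's chain; it holds a reference to the string, not a copy. */
    private static final class Node {
        final String value;
        final Node next;

        Node(String value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets; // fixed size: this sketch never resizes

    TinyInternTable(int bucketCount) {
        buckets = new Node[bucketCount];
    }

    /** Returns the canonical instance for s, adding s itself if no equal string is present. */
    String intern(String s) {
        int index = Math.floorMod(s.hashCode(), buckets.length);
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.value.equals(s)) {
                return n.value; // already present: hand back the stored reference
            }
        }
        buckets[index] = new Node(s, buckets[index]); // prepend to the chain
        return s;
    }

    public static void main(String[] args) {
        TinyInternTable table = new TinyInternTable(1009);
        String first = new String("abc");
        String second = new String("abc");
        System.out.println(table.intern(first) == first);  // true: first becomes canonical
        System.out.println(table.intern(second) == first); // true: the equal string maps to it
    }
}
```

Because the bucket count never changes, chains get longer as more strings are added, which is why a table that is too small slows lookups down.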


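What does the Java heap look like?

What does each region do?

By default, the JVM grows or shrinks the heap after each garbage collection to keep the proportion of free space within a suitable range. For server applications, the following principles apply to heap sizing:

  1. Give the JVM as much memory as possible; the default size is far from enough
  2. Set -Xms and -Xmx to the same value so that the JVM does not have to make heap sizing decisions
  3. Increase memory as you increase the number of processor cores, since allocation can be parallelized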

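The heap is structured as follows

  • One old generation: size it so that the program runs normally plus 10~20% headroom, and give the rest to the young generation
  • One young generation
    • One Eden space: a typical initial ratio of Eden to the two survivor spaces is 6:1:1. -XX:SurvivorRatio adjusts the size of the survivor spaces, but the ratio changes as young GCs (YGC) run
    • Two survivor spaces: these are used by YGC. At any given time one survivor space is empty. During a YGC, the objects that must be kept are copied into the empty survivor space and the previously used survivor space is cleared. At each collection the JVM chooses a threshold, the number of collections an object must survive before it is moved to the old generation. The threshold is chosen so that the survivor space can be kept 50% free. The -XX:+PrintTenuringDistribution option prints this threshold and the ages of the objects in the young generation, which is particularly useful for understanding the lifetime distribution of an application's objects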
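
To make the 6:1:1 ratio concrete, the sizes of Eden and the two survivor spaces can be worked out from the young generation size and the survivor ratio. The sketch below is plain arithmetic under the assumption that the ratio means Eden : survivor = ratio : 1; the class and method names are hypothetical, and a real JVM may round or adapt these sizes.

```java
import java.util.Arrays;

class YoungGenSplit {
    /**
     * Splits a young generation of youngMb into {eden, survivor, survivor},
     * assuming eden is survivorRatio times the size of one survivor space.
     */
    static long[] split(long youngMb, int survivorRatio) {
        long survivor = youngMb / (survivorRatio + 2); // eden counts as 'ratio' parts, plus 2 survivors
        long eden = youngMb - 2 * survivor;
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        // An 800 MB young generation with a ratio of 6 gives 600 MB Eden and 2 x 100 MB survivors.
        System.out.println(Arrays.toString(split(800, 6))); // prints [600, 100, 100]
    }
}
```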

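In most cases new objects are allocated in the young generation, which consists of the Eden space and two survivor spaces of equal size. The survivor spaces are mainly used for copying objects during a Minor GC.

Inside the Eden space, the JVM sets aside a small private TLAB (Thread Local Allocation Buffer) for each thread to make allocation more efficient. Allocating in the shared part of the heap requires synchronization between threads, whereas allocating inside a thread's own TLAB does not, so the JVM tries to allocate objects in the TLAB whenever it can.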

What is the permanent generation (Perm Gen space) in the heap?

Permanent Generation or “Perm Gen” contains the application metadata required by the JVM to describe the classes and methods used in the application. Perm Gen is populated by JVM at runtime based on the classes used by the application. Perm Gen also contains Java SE library classes and methods. Perm Gen objects are garbage collected in a full garbage collection.

With Java 8 there is no Perm Gen, which means no more “java.lang.OutOfMemoryError: PermGen space” problems. Unlike Perm Gen, which resides in the Java heap, Metaspace is not part of the heap. Most class metadata is now allocated out of native memory. By default Metaspace grows automatically (up to what the underlying OS provides), while Perm Gen always has a fixed maximum size. Two new flags set the size of the Metaspace: “-XX:MetaspaceSize” and “-XX:MaxMetaspaceSize”. The idea behind Metaspace is that the lifetime of classes and their metadata matches the lifetime of their classloaders. That is, as long as the classloader is alive, the metadata remains alive in the Metaspace and can’t be freed.

http://www.openkb.info/2014/07/garbage-collection-in-permgen.html

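PermGen stores class definitions and the constant pool. Up to JDK 7 it is sized with PermSize and MaxPermSize, and PermGen is part of the heap.
In Java SE 6 Update 3 or earlier, it is not collected by default, although this can be configured.
After Java SE 6 Update 3, PermGen is collected by default during a Full GC.
Since Java 8, PermGen has been replaced by Metaspace (sized with MetaspaceSize and MaxMetaspaceSize), which is no longer part of the heap but is still garbage collected.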
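
Which pools a particular JVM classifies as heap or non-heap can be listed with the standard MemoryPoolMXBean API. This is a sketch with a hypothetical class name; the pool names it prints (for example a pool called "Metaspace" on Java 8+ HotSpot) are implementation-specific and should not be relied on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolLister {
    /** Collects the names of all memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```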

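In JDK 1.6, permanent generation = string pool + method area, or equivalently permanent generation = method area (which includes the string pool). The permanent generation therefore held object instances, which does not match the virtual machine specification.

The method area, one of the runtime data areas in the JVM specification, is customarily called the permanent generation in the HotSpot VM. The Permanent Generation holds class information, constants, static variables and similar data. When the application loads many classes, or reflects on many classes and calls many methods, the Permanent Generation may fill up, and a Full GC is then executed even when CMS GC is not configured. If the Full GC still cannot reclaim enough space, the JVM throws the following error: java.lang.OutOfMemoryError: PermGen space. To avoid Full GCs caused by a full Perm Gen, either enlarge the Perm Gen space or switch to CMS GC.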

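Can memory leaks occur in Java? Briefly explain.

  1. References to objects are not released after use, or large objects are held in static fields
  2. Large strings are interned
  3. Resources such as streams and connections are not closed after use
  4. Objects that do not implement hashCode and equals correctly are added to a HashSet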
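
For point 3, the usual remedy is the try-with-resources statement, which closes the resource even when an exception is thrown. The following is a small self-contained sketch; LineCounter is a hypothetical name and a StringReader stands in for a real file or network stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineCounter {
    /** Counts the lines in source; the reader is closed when the try block exits. */
    static int countLines(Reader source) {
        try (BufferedReader br = new BufferedReader(source)) {
            int lines = 0;
            while (br.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            // Wrapped so that callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines(new StringReader("a\nb\nc"))); // prints 3
    }
}
```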

You cannot really "leak memory" in Java unless you:

  • intern strings
  • generate classes
  • leak memory in the native code called by jni
  • keep references to things that you do not want in some forgotten or obscure place.

I take it that you are interested in the last case. The common scenarios are:

  • listeners, especially done with inner classes
  • caches.

A nice example would be to:

  • build a Swing gui that launches a potentially unlimited number of modal windows;
  • have the modal window do something like this during its initialization:

StaticGuiHelper.getMainApplicationFrame().getOneOfTheButtons().addActionListener(new ActionListener(){
  public void actionPerformed(ActionEvent e){
     // do nothing...
  }
});

The registered action does nothing, but it will cause the modal window to linger in memory forever, even after closing. This is a leak: the listeners are never unregistered, and each anonymous inner class object holds an (invisible) reference to its outer object. What's more, any object referenced from the modal windows has a chance of leaking too.
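
The same pattern, and its fix, can be shown without Swing. In the sketch below EventSource plays the role of the long-lived button and Dialog the role of the modal window; both are hypothetical classes written for this illustration. The listener is a method reference bound to the dialog, so the source keeps the dialog reachable until the listener is removed.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerLeakDemo {
    /** Hypothetical long-lived component that keeps strong references to its listeners. */
    static final class EventSource {
        private final List<Runnable> listeners = new ArrayList<>();

        void addListener(Runnable l) { listeners.add(l); }
        void removeListener(Runnable l) { listeners.remove(l); }
        int listenerCount() { return listeners.size(); }
    }

    /** Hypothetical short-lived window that registers a listener when it is created. */
    static final class Dialog implements AutoCloseable {
        private final EventSource source;
        // Bound to this dialog, so the listener keeps the dialog reachable.
        private final Runnable listener = this::onEvent;

        Dialog(EventSource source) {
            this.source = source;
            source.addListener(listener);
        }

        private void onEvent() {
            // do nothing...
        }

        @Override
        public void close() {
            source.removeListener(listener); // the fix: unregister on close
        }
    }

    /** Opens 'count' dialogs and returns how many listeners the source still holds. */
    static int openDialogs(EventSource source, int count, boolean unregister) {
        for (int i = 0; i < count; i++) {
            Dialog d = new Dialog(source);
            if (unregister) {
                d.close();
            }
        }
        return source.listenerCount();
    }

    public static void main(String[] args) {
        System.out.println(openDialogs(new EventSource(), 100, false)); // prints 100
        System.out.println(openDialogs(new EventSource(), 100, true));  // prints 0
    }
}
```

Every listener left in the source pins one dialog in memory, which is exactly how the modal windows in the Swing example accumulate.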

Another answer:

  • Static Field Holding Onto the Object Reference
    The first scenario that might cause a Java memory leak is referencing a heavy object with a static field.
private Random random = new Random();
public static final ArrayList<Double> list = new ArrayList<Double>(1000000);
@Test
public void givenStaticField_whenLotsOfOperations_thenMemoryLeak() throws InterruptedException {
    for (int i = 0; i < 1000000; i++) {
        list.add(random.nextDouble());
    }
    
    System.gc();
    Thread.sleep(10000); // to allow GC do its job
}
  • Calling String.intern() on Long String
    The second group of scenarios that frequently causes memory leaks involves String operations – specifically the String.intern() API.
@Test
public void givenLengthString_whenIntern_thenOutOfMemory()
  throws IOException, InterruptedException {
    Thread.sleep(15000);
    
    String str 
      = new Scanner(new File("src/test/resources/large.txt"), "UTF-8")
      .useDelimiter("\\A").next();
    str.intern();
    
    System.gc(); 
    Thread.sleep(15000);
}
  • Unclosed Streams
    Forgetting to close a stream is a very common scenario, and certainly one that most developers can relate to. The problem was partially removed in Java 7, when the try-with-resources statement introduced the ability to automatically close all types of streams.
    Why partially? Because the try-with-resources syntax is optional:
@Test(expected = OutOfMemoryError.class)
public void givenURL_whenUnclosedStream_thenOutOfMemory()
  throws IOException, URISyntaxException {
    String str = "";
    URLConnection conn 
      = new URL("http://norvig.com/big.txt").openConnection();
    BufferedReader br = new BufferedReader(
      new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
    
    String line;
    // Read each line exactly once; the stream is never closed and str keeps growing.
    while ((line = br.readLine()) != null) {
        str += line;
    }
    //
}
  • Unclosed Connections
    This scenario is quite similar to the previous one, with the primary difference of dealing with unclosed connections (e.g. to a database, to an FTP server, etc.). Again, improper implementation can do a lot of harm, leading to memory problems.
@Test(expected = OutOfMemoryError.class)
public void givenConnection_whenUnclosed_thenOutOfMemory()
  throws IOException, URISyntaxException {
    
    URL url = new URL("ftp://speedtest.tele2.net");
    URLConnection urlc = url.openConnection();
    InputStream is = urlc.getInputStream();
    String str = "";
    
    //
}
  • Adding Objects With no hashCode() and equals() Into a HashSet
    A simple but very common example that can lead to a memory leak is to use a HashSet with objects that are missing their hashCode() or equals() implementations.
    Specifically, when we start adding duplicate objects into a Set – this will only ever grow, instead of ignoring duplicates as it should. We also won’t be able to remove these objects, once added.
public class Key {
    public String key;
    
    public Key(String key) {
        this.key = key; // key is an instance field, so it must be assigned through this
    }
}

Now, let’s see the scenario:

@Test(expected = OutOfMemoryError.class)
public void givenMap_whenNoEqualsNoHashCodeMethods_thenOutOfMemory()
  throws IOException, URISyntaxException {
    Map<Object, Object> map = System.getProperties();
    while (true) {
        map.put(new Key("key"), "value");
    }
}
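
For contrast, once the key class implements equals() and hashCode(), repeated inserts of equal keys no longer grow the collection. This is a minimal sketch; KeyDedupDemo and distinctAfter are hypothetical names, and the Key class here is a corrected variant of the one above rather than the original author's code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class KeyDedupDemo {
    /** A key whose equality is based on its string value. */
    static final class Key {
        final String key;

        Key(String key) {
            this.key = key;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            return Objects.equals(key, ((Key) o).key);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(key); // must agree with equals()
        }
    }

    /** Adds 'inserts' equal keys to a set and returns the resulting size. */
    static int distinctAfter(int inserts) {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < inserts; i++) {
            set.add(new Key("key"));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctAfter(1000)); // prints 1
    }
}
```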

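Java Stack structure

The Java stack is made up of stack frames, one per method invocation. A frame is pushed when a method is called, and popped and discarded when the method returns.

The main job of the Java stack is to store method parameters, local variables and intermediate results, and to supply data that other parts of the JVM need. As noted earlier, the Java stack is private to each thread. This guarantees thread safety, so programmers do not need to worry about synchronizing access to the stack: only the owning thread can access its own local variable area.

A stack frame has three parts: the local variable area, the operand stack, and the frame data area

  • Local variable area
    The local variable area is an array organized in word-sized slots. Here byte, short and char values are converted to int for storage; long and double take two slots, and every other type takes one. In particular, boolean values are converted to int or byte at compile time, and boolean arrays are handled as byte arrays. The local variable area also holds object references, including class references, interface references and array references.
    The local variable area contains the method parameters and the local variables. In addition, an instance method has an implicit first local variable, this, which refers to the object the method was invoked on. For objects, the local variable area only ever holds references into the heap.

  • Operand stack
    The operand stack is also an array organized in word-sized slots but, as the name says, it supports only push and pop. During a computation the operands are popped off the stack, and the result is pushed back once the computation completes.

  • Frame data area
    The frame data area is mainly responsible for:

    • Holding a pointer to the class's constant pool, for use in resolution.
    • Supporting normal method return: restoring the caller's stack frame, setting the PC register to the instruction that follows the call in the caller, and pushing the return value onto the caller's operand stack.
    • Recording the exception table. When an exception occurs, control is handed to the matching catch clause; if no matching catch clause is found, the caller's stack frame is restored and the exception is rethrown.

The sizes of the local variable area and the operand stack are fixed for each method at compile time. When a method is called, the JVM looks up the class's type information in the method area, reads the sizes of that method's local variable area and operand stack, allocates the stack frame accordingly, and pushes it onto the Java stack.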
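
That each invocation adds one frame can be observed from Java itself by counting the entries of the current thread's stack trace. The sketch below uses hypothetical names, and it assumes the JVM reports every frame (the API allows a JVM to omit frames in some circumstances, although plain recursion like this is normally reported in full).

```java
class FrameDepthDemo {
    /** Recurses n levels deep, then reports how many frames are on this thread's stack. */
    static int framesAtDepth(int n) {
        if (n == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return framesAtDepth(n - 1);
    }

    /** Number of frames added by n nested calls, relative to calling with n = 0. */
    static int extraFrames(int n) {
        return framesAtDepth(n) - framesAtDepth(0);
    }

    public static void main(String[] args) {
        System.out.println(extraFrames(5)); // prints 5: one frame per nested call
    }
}
```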

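Deep copy and shallow copy

Besides the primitive types, Java has reference types, i.e. instances of classes. Assignment with = copies the value for a primitive type, but for an object it copies only the reference: the original object's reference is passed along, and both variables still point to the same object.

Shallow copy and deep copy are distinguished on this basis.
If copying an object copies only its primitive fields and merely passes along the references of its reference-type fields, without actually creating new objects, the copy is shallow. Conversely, if copying a reference-type field creates a new object and copies its member variables, the copy is deep.

If an object contains only primitive fields, the default clone() yields a deep copy of it; if it also contains reference-type fields, the default clone() is a shallow copy.

There are two common ways to make a deep copy:

  • Serialize the object and deserialize it again to obtain a new object; the only cost is that we have to write the serialization rules ourselves.
  • Override clone() so that it also calls clone() on the object's reference-type fields (and on the fields below them), ensuring that the objects inside the object are deep-copied too.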
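
The difference is easiest to see with a class that has one primitive field and one array field. The sketch below is an illustration under assumed names (CopyDemo, Scores); it exposes the two behaviours as separate methods, shallowCopy() and deepCopy(), instead of overriding clone(), so that both can be compared side by side.

```java
class CopyDemo {
    static final class Scores implements Cloneable {
        int id;       // primitive field: always copied by value
        int[] values; // reference field: shared unless cloned explicitly

        Scores(int id, int[] values) {
            this.id = id;
            this.values = values;
        }

        /** Field-by-field copy via Object.clone(); the array is shared with the original. */
        Scores shallowCopy() {
            try {
                return (Scores) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Scores is Cloneable
            }
        }

        /** Shallow copy first, then clone the referenced array as well. */
        Scores deepCopy() {
            Scores copy = shallowCopy();
            copy.values = values.clone();
            return copy;
        }
    }

    public static void main(String[] args) {
        Scores original = new Scores(1, new int[] {10, 20});
        Scores shallow = original.shallowCopy();
        Scores deep = original.deepCopy();

        original.values[0] = 99; // mutate the original's array

        System.out.println(shallow.values[0]); // prints 99: same array as the original
        System.out.println(deep.values[0]);    // prints 10: independent array
    }
}
```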

Posted on 2022-01-15 21:15 by Milton