Handling rare (non-BMP) characters
1. Problem
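A string returned by an API contained a rare character, but the MySQL table uses the utf8 character set, i.e. utf8mb3, which stores at most 3 bytes per character. The rare character is encoded in UTF-8 as the 4-byte sequence F0 A8 BA 93 (code point U+28E93, outside the Basic Multilingual Plane), so the insert failed with: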
### Cause: java.sql.SQLException: Incorrect string value: '\xF0\xA8\xBA\x93\xE7\x94...' for column 'response_json' at row 1 ; uncategorized SQLException; SQL state [HY000]; error code [1366]; Incorrect string value: '\xF0\xA8\xBA\x93\xE7\x94...' for column 'response_json' at row 1; nested exception is java.sql.SQLException: Incorrect string value: '\xF0\xA8\xBA\x93\xE7\x94...' for column 'response_json' at row 1]
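To see why utf8mb3 rejects this value, the short Java sketch below prints the UTF-8 bytes of U+28E93, the code point that the bytes F0 A8 BA 93 in the error decode to. The class and method names are made up for illustration. Any code point above U+FFFF needs four bytes in UTF-8, one more than utf8mb3 allows.

```java
import java.nio.charset.StandardCharsets;

class Utf8ByteDemo {
    // Returns the UTF-8 encoding of a code point as uppercase hex, e.g. "F0 A8 BA 93".
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // utf8mb3 only covers the BMP (at most 3 bytes per character), so any
    // supplementary code point (above U+FFFF, 4 bytes in UTF-8) is rejected.
    static boolean fitsInUtf8mb3(int codePoint) {
        return !Character.isSupplementaryCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int rare = 0x28E93; // decoded from the bytes F0 A8 BA 93 in the error message
        System.out.println(utf8Hex(rare));       // F0 A8 BA 93
        System.out.println(fitsInUtf8mb3(rare)); // false
        // In Java the same character occupies two chars (a UTF-16 surrogate pair)
        System.out.println(new String(Character.toChars(rare)).length()); // 2
    }
}
```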
2. Solution
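There are two options: change the column's character set to utf8mb4, or convert the rare characters in the string into Unicode escape sequences. Changing the character set would affect downstream business logic, so the second option was chosen.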
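If the escaping should only be applied (or logged) when it is actually needed, a check like the following can run first. It is a minimal sketch with a hypothetical class name, relying only on the JDK methods String.codePoints() (Java 8+) and Character.isSupplementaryCodePoint.

```java
class NonBmpCheck {
    // True if the string contains at least one supplementary (non-BMP) character,
    // i.e. one that needs 4 bytes in UTF-8 and cannot be stored in a utf8mb3 column.
    static boolean containsNonBmp(String s) {
        return s != null && s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String plain = "\u4e2d\u6587"; // two BMP CJK characters
        String rare = "x" + new String(Character.toChars(0x28E93)) + "y";
        System.out.println(containsNonBmp(plain)); // false
        System.out.println(containsNonBmp(rare));  // true
    }
}
```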
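References: A Study of the UTF8 Character Set in MySQL 5.7 (MySQL 5.7 版本的 UTF8 字符集调研); Java Unicode Encoding (Java Unicode编码); Differences Between MySQL utf8, utf8mb3 and utf8mb4, and Filtering utf8mb4 (Mysql utf8 utf8mb3 utf8mb4 的区别与utf8mb4的过滤).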
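The conversion code is below. It walks the string code point by code point. A non-BMP character takes two Java chars (a UTF-16 surrogate pair), and each of them is written out as a \u escape of its hex value; all other characters are copied unchanged. For example, U+28E93 becomes the text \ud863\ude93, which is plain ASCII and therefore fits in a utf8mb3 column.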
private String covertNonBmpString(String input) {
    if (input == null) {
        return null;
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < input.length(); ) {
        // 2 for a non-BMP character (surrogate pair), otherwise 1
        int count = Character.charCount(input.codePointAt(i));
        if (count > 1) {
            // Escape each surrogate, e.g. U+28E93 -> d863, de93
            for (int j = 0; j < count; j++) {
                String hex = Integer.toHexString(input.charAt(i + j));
                sb.append("\\u").append(hex);
            }
        } else {
            // BMP characters fit in utf8mb3 and are kept as-is
            sb.append(input.charAt(i));
        }
        i += count;
    }
    return sb.toString();
}
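Values stored this way have to be turned back into the original characters when read. The sketch below is a hypothetical inverse (restoreNonBmpString is not part of the original code). It only decodes a pair of escapes that together form a valid high + low surrogate, so other backslash-u sequences already present in the text, such as a JSON escape for a quote, are left untouched.

```java
class NonBmpRestore {
    // Hypothetical inverse of covertNonBmpString: decodes only escape pairs that
    // form a valid UTF-16 surrogate pair; everything else is copied unchanged.
    static String restoreNonBmpString(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int high = parseEscape(input, i);
            int low = parseEscape(input, i + 6);
            if (high >= 0 && low >= 0
                    && Character.isHighSurrogate((char) high)
                    && Character.isLowSurrogate((char) low)) {
                sb.append((char) high).append((char) low);
                i += 12; // skip both 6-character escapes
            } else {
                sb.append(input.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    // Value of a backslash-u escape with 4 hex digits starting at pos, or -1 if none.
    private static int parseEscape(String s, int pos) {
        if (pos + 6 > s.length() || s.charAt(pos) != '\\' || s.charAt(pos + 1) != 'u') {
            return -1;
        }
        int value = 0;
        for (int k = pos + 2; k < pos + 6; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) {
                return -1;
            }
            value = value * 16 + d;
        }
        return value;
    }

    public static void main(String[] args) {
        // What covertNonBmpString produces for "a" + U+28E93 + "b"
        String stored = "a\\ud863\\ude93b";
        String restored = restoreNonBmpString(stored);
        System.out.println(restored.codePointAt(1) == 0x28E93); // true
    }
}
```

Since response_json holds JSON, a standard JSON parser may make this step unnecessary: JSON (RFC 8259) writes a character outside the BMP as an escaped UTF-16 surrogate pair, and such a character can only appear inside a JSON string, so the stored \ud863\ude93 is already a valid JSON escape that the parser decodes back to U+28E93. For non-JSON text, note that the scheme cannot tell a converted character apart from an identical escape that was already present in the input.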