Golang Basics - Runes
rune and string
The rune type in Go is an alias for int32. Given this underlying int32 type, the rune type holds a signed 32-bit integer value. However, unlike a plain int32, the integer value stored in a rune represents a single Unicode code point (one character).
myRune := '¿'
fmt.Printf("myRune type: %T\n", myRune)
// Output: myRune type: int32
fmt.Printf("myRune value: %v\n", myRune)
// Output: myRune value: 191
fmt.Printf("myRune Unicode character: %c\n", myRune)
// Output: myRune Unicode character: ¿
fmt.Printf("myRune Unicode code point: %U\n", myRune)
// Output: myRune Unicode code point: U+00BF
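// A rune can also be created from its numeric code point.
myRune := rune(0xbf) // hexadecimal form of U+00BF
myRune = 191         // the same code point in decimal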
fmt.Printf("myRune Unicode character: %c\n", myRune)
// Output: myRune Unicode character: ¿
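A string is an immutable sequence of bytes. A string may contain arbitrary data, including bytes with the value 0, but it usually holds human-readable text. Text strings are conventionally interpreted as UTF-8 encoded sequences of Unicode code points (runes).
The built-in len function returns the number of bytes in a string (not the number of runes), and the index operation s[i] returns the value of the i-th byte, where i must satisfy 0 ≤ i < len(s). Attempting to access a byte outside this range causes a panic.
The i-th byte is not necessarily the i-th character of the string, because the UTF-8 encoding of a non-ASCII character requires two or more bytes.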
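The difference between byte indexing and character indexing can be sketched as follows. The helper byteAndRuneAt is a hypothetical name introduced only for this illustration, and the sample string is an assumption chosen because its second character needs two bytes.

```go
package main

import "fmt"

// byteAndRuneAt returns the byte at byte index i and the rune at rune index i.
// Hypothetical helper, written only for this illustration.
func byteAndRuneAt(s string, i int) (byte, rune) {
	return s[i], []rune(s)[i]
}

func main() {
	s := "h\u00e9llo" // "héllo": U+00E9 is encoded as the two bytes 0xC3 0xA9

	fmt.Println(len(s)) // 6 bytes, although there are only 5 characters

	b, r := byteAndRuneAt(s, 1)
	fmt.Printf("s[1] = 0x%X\n", b)       // 0xC3: only the first byte of the character
	fmt.Printf("[]rune(s)[1] = %c\n", r) // the whole second character
}
```

Converting to []rune copies the string, so for long inputs a range loop is usually the cheaper way to walk characters.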
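- A byte is 8 bits
- An int32 is 4 bytes
- In UTF-8 the byte length of each character varies (1 to 4 bytes)
- range over a string iterates over its characters (runes), not its bytes
- Single quotes denote a rune literal (a single character; its default type is rune, i.e. int32, and it can also be assigned to a byte when the value fits); double quotes denote a string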
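As a small sketch of the variable-length point in the list above, utf8.RuneLen from the standard library reports how many bytes a rune occupies once encoded. The helper byteLens and the sample characters are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLens returns the UTF-8 encoded length of every rune in s.
// Hypothetical helper, written only for this illustration.
func byteLens(s string) []int {
	var lens []int
	for _, r := range s { // range yields runes, not bytes
		lens = append(lens, utf8.RuneLen(r))
	}
	return lens
}

func main() {
	// One ASCII letter, one Latin-1 letter, one CJK character, one emoji.
	s := "a\u00e9\u4e16\U0001F50D"
	for _, r := range s {
		fmt.Printf("%U takes %d byte(s)\n", r, utf8.RuneLen(r))
	}
	fmt.Println(byteLens(s)) // [1 2 3 4]
}
```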
myString := "❗hello"
for index, char := range myString {
	fmt.Printf("Index: %d\tCharacter: %c\t\tCode Point: %U\n", index, char, char)
}
// Output:
// Index: 0 Character: ❗ Code Point: U+2757
// Index: 3 Character: h Code Point: U+0068
// Index: 4 Character: e Code Point: U+0065
// Index: 5 Character: l Code Point: U+006C
// Index: 6 Character: l Code Point: U+006C
// Index: 7 Character: o Code Point: U+006F
Common functions
UTF-8 decoding
import "unicode/utf8"
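// s is the string to decode. Each step decodes one rune and advances i
// by the number of bytes that rune occupies.
for i := 0; i < len(s); {
	r, size := utf8.DecodeRuneInString(s[i:])
	fmt.Printf("%d\t%c\n", i, r)
	i += size
}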
Byte length and character (rune) length of a string
import "unicode/utf8"
myString := "❗hello"
stringLength := len(myString)
numberOfRunes := utf8.RuneCountInString(myString)
fmt.Printf("myString - Length: %d - Runes: %d\n", stringLength, numberOfRunes)
// Output: myString - Length: 8 - Runes: 6
Type conversion
myRuneSlice := []rune{'e', 'x', 'e', 'r', 'c', 'i', 's', 'm'}
myString := string(myRuneSlice)
fmt.Println(myString)
// Output: exercism
myString := "exercism"
myRuneSlice := []rune(myString)
fmt.Println(myRuneSlice)
// Output: [101 120 101 114 99 105 115 109]
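- A string is a compact, immutable sequence of bytes
- "Compact" means each character takes only the bytes it needs (1 byte for an English letter, 3 bytes for a Chinese character, and so on), packed contiguously
- rune is an alias for int32, so every character is widened to a uniform 4 bytes
- A rune is a single character; for several characters use a rune slice, which is comparable to an array of int32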
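One place where the fixed-width rune slice pays off is editing text character by character. The sketch below reverses a string through a []rune conversion; reverseRunes is a hypothetical helper, and it works on code points only, so combining sequences (for example a letter followed by a separate accent mark) would not stay together.

```go
package main

import "fmt"

// reverseRunes returns s with its code points in reverse order.
// Hypothetical helper, written only for this illustration.
func reverseRunes(s string) string {
	runes := []rune(s) // one int32 per character, so indexing is per character
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return string(runes) // re-encode as compact UTF-8
}

func main() {
	s := "你好, Go"
	fmt.Println(len(s), len([]rune(s))) // 10 6
	fmt.Println(reverseRunes(s))        // oG ,好你
}
```

Reversing the bytes of s directly would split the 3-byte characters and produce invalid UTF-8, which is why the conversion is needed here.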
Exercise
package logs

import (
	"strings"
	"unicode/utf8"
)

// Application identifies the application emitting the given log.
// The first marker character found in the log decides the result.
func Application(log string) string {
	for _, c := range log {
		switch c {
		case '❗':
			return "recommendation"
		case '🔍':
			return "search"
		case '☀':
			return "weather"
		}
	}
	return "default"
}

// Replace replaces all occurrences of oldRune with newRune, returning the
// modified log to the caller.
func Replace(log string, oldRune, newRune rune) string {
	// strings.Builder avoids reallocating the result on every appended rune.
	var b strings.Builder
	for _, c := range log {
		if c == oldRune {
			b.WriteRune(newRune)
		} else {
			b.WriteRune(c)
		}
	}
	return b.String()
}

// WithinLimit determines whether or not the number of characters in log is
// within the limit.
func WithinLimit(log string, limit int) bool {
	return utf8.RuneCountInString(log) <= limit
}
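As an alternative to the explicit loop in Replace, the standard library's strings.Map applies a function to every rune of a string. The following is only a sketch: replaceRune is a hypothetical name, and the sample log line is made up for the demonstration.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceRune swaps every oldRune in log for newRune using strings.Map.
// Hypothetical alternative, written only for this illustration.
func replaceRune(log string, oldRune, newRune rune) string {
	return strings.Map(func(r rune) rune {
		if r == oldRune {
			return newRune
		}
		return r
	}, log)
}

func main() {
	fmt.Println(replaceRune("\u2757 recommended product", '\u2757', '?'))
	// prints: ? recommended product
}
```

Both versions visit the string rune by rune, so multi-byte characters are replaced as a whole rather than byte by byte.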