String vs &str in Rust

Soon after starting your Rust journey, you most likely ran into a situation where you tried to work with strings (or at least thought you were), and the compiler refused to compile your code because something that looks like a string isn’t actually a String.

For example, let’s take a look at this very simple function greet(name: String), which takes something of type String and prints it to the screen using the println!() macro:

fn main() {
  let my_name = "Pascal";
  greet(my_name);
}

fn greet(name: String) {
  println!("Hello, {}!", name);
}

Compiling this code will result in a compile error that looks something like this:

error[E0308]: mismatched types
 --> src/main.rs:3:11
  |
3 |     greet(my_name);
  |           ^^^^^^^
  |           |
  |           expected struct `std::string::String`, found `&str`
  |           help: try using a conversion method: `my_name.to_string()`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0308`.

You can see this behaviour in action by compiling the snippet yourself, for example in the Rust Playground: just hit the “Run” button and look at the compiler output.

Luckily, Rust’s compiler is very good at telling us what the problem is. Clearly, we’re dealing with two different types here: std::string::String, or String for short, and &str. While greet() expects a String, what we’re actually passing to the function is something of type &str. The compiler even provides a hint on how to fix it. Changing line 2 to let my_name = "Pascal".to_string(); fixes the issue, and so does calling greet(my_name.to_string()) on line 3, which is what the hint suggests.
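
As a minimal sketch of that fix, the program below shows both places where the conversion can happen. The greeting() helper is not part of the original snippet; it is a hypothetical addition so that the result can be checked without capturing what println!() writes.

```rust
// Hypothetical helper (not in the original snippet): builds the greeting so
// that the result can be checked without capturing stdout.
fn greeting(name: String) -> String {
    format!("Hello, {}!", name)
}

// Same signature as the original greet(): it takes ownership of a String.
fn greet(name: String) {
    println!("{}", greeting(name));
}

fn main() {
    // Option 1: convert the literal when binding it.
    let my_name = "Pascal".to_string();
    greet(my_name);

    // Option 2: keep the &str and convert at the call site.
    let other_name = "Precht";
    greet(other_name.to_string());

    // String::from(..) performs the same &str -> String conversion.
    greet(String::from(other_name));
}
```

Either way, a new String is allocated and the text is copied into it, which is why Rust asks us to write the conversion explicitly.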

What’s going on here? What is a &str? And why do we have to perform an explicit conversion using to_string()?

Understanding the String type

To answer these questions, it’s beneficial to have a good understanding of how Rust stores data in memory. If you haven’t read our article on Taking a closer look at Ownership in Rust yet, I highly recommend checking it out first.

Let’s take the example from above and look at how my_name is stored in memory, assuming that it’s of type String (i.e. we’ve used .to_string() as the compiler suggested):

                     buffer
                   /   capacity
                 /   /  length
               /   /   /
            +–––+–––+–––+
stack frame │ • │ 8 │ 6 │ <- my_name: String
            +–│–+–––+–––+
              │
            [–│–––––––– capacity –––––––––––]
              │
            +–V–+–––+–––+–––+–––+–––+–––+–––+
       heap │ P │ a │ s │ c │ a │ l │   │   │
            +–––+–––+–––+–––+–––+–––+–––+–––+

            [––––––– length ––––––––]

Rust stores the String object for my_name on the stack. The object consists of a pointer to a heap-allocated buffer that holds the actual data, the buffer’s capacity, and the length of the data currently stored. As a result, the String object itself always has a fixed size of three machine words.
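
The three-word layout can be checked with a small sketch like the one below. The string_handle_words() helper is a hypothetical name introduced for illustration, and String::with_capacity(8) is used only to get a buffer with spare room, as in the diagram; the capacity that .to_string() picks is an implementation detail.

```rust
use std::mem::size_of;

// Hypothetical helper: size of the String handle, measured in machine words.
fn string_handle_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

fn main() {
    // with_capacity() is used only to get spare room like in the diagram;
    // to_string() on its own may allocate a tighter buffer.
    let mut my_name = String::with_capacity(8);
    my_name.push_str("Pascal");

    // Pointer + capacity + length = three words on the stack.
    assert_eq!(string_handle_words(), 3);

    // len() counts the bytes in use, capacity() the size of the heap buffer.
    assert_eq!(my_name.len(), 6);
    assert!(my_name.capacity() >= 8);

    println!("len = {}, capacity = {}", my_name.len(), my_name.capacity());
}
```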

One of the things that makes a String a String is its ability to resize its buffer if needed. For example, we could use its .push_str() method to append more text, which potentially causes the underlying buffer to increase in size (notice that my_name needs to be mutable to make this work):

let mut my_name = "Pascal".to_string();
my_name.push_str(" Precht");
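
To watch the buffer grow, we can compare the capacity before and after appending. This is only a sketch: append_and_measure() is a hypothetical helper, and the exact capacities depend on the allocator strategy of the standard library, so the code only asserts what is guaranteed.

```rust
// Hypothetical helper: appends `extra` and reports
// (capacity before, capacity after).
fn append_and_measure(text: &mut String, extra: &str) -> (usize, usize) {
    let before = text.capacity();
    text.push_str(extra);
    (before, text.capacity())
}

fn main() {
    let mut my_name = "Pascal".to_string();
    let (before, after) = append_and_measure(&mut my_name, " Precht");

    assert_eq!(my_name, "Pascal Precht");
    assert_eq!(my_name.len(), 13);

    // The buffer must be able to hold all 13 bytes and never shrinks on push.
    assert!(after >= my_name.len());
    assert!(after >= before);

    println!("capacity went from {} to {}", before, after);
}
```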

In fact, if you’re familiar with Rust’s Vec<T> type, you already know what a String is, because it’s essentially the same in behaviour and characteristics. The difference is that a String guarantees it only holds well-formed UTF-8 text.
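
The relationship to Vec<u8> and the UTF-8 guarantee can be sketched as follows. bytes_to_string() is a hypothetical wrapper around the standard String::from_utf8() function, added here only to make the two outcomes easy to compare.

```rust
// Hypothetical wrapper: Some(String) for valid UTF-8, None otherwise.
fn bytes_to_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

fn main() {
    // A String can be built from a Vec<u8>, but only if the bytes are valid UTF-8.
    let bytes = vec![b'P', b'a', b's', b'c', b'a', b'l'];
    let name = bytes_to_string(bytes).expect("bytes are valid UTF-8");
    assert_eq!(name, "Pascal");

    // 0xFF never appears in well-formed UTF-8, so this conversion is rejected.
    assert_eq!(bytes_to_string(vec![0xff]), None);

    // Going the other way always works: a String can be turned back into its bytes.
    let raw: Vec<u8> = name.into_bytes();
    assert_eq!(raw.len(), 6);
}
```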

Understanding string slices

String slices (or str) are what we work with when we either reference a range of UTF-8 text that is “owned” by someone else, or when we create them using string literals.

If we are only interested in the last name stored in my_name, we can get a reference to that part of the string like this:

let mut my_name = "Pascal".to_string();
my_name.push_str(" Precht");

let last_name = &my_name[7..];

By specifying the range from byte offset 7 (the six bytes of “Pascal” plus one for the space) until the end of the buffer (“..”), last_name is now a string slice referencing text owned by my_name. It borrows it. Here’s what it looks like in memory:

            my_name: String   last_name: &str
            [––––––––––––]    [–––––––]
            +–––+––––+––––+–––+–––+–––+
stack frame │ • │ 16 │ 13 │   │ • │ 6 │
            +–│–+––––+––––+–––+–│–+–––+
              │                 │
              │                 +–––––––––+
              │                           │
              │                           │
              │                         [–│––––––– str –––––––––]
            +–V–+–––+–––+–––+–––+–––+–––+–V–+–––+–––+–––+–––+–––+–––+–––+–––+
       heap │ P │ a │ s │ c │ a │ l │   │ P │ r │ e │ c │ h │ t │   │   │   │
            +–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+–––+

Notice that last_name does not store capacity information on the stack. This is because it’s just a reference to a slice of a String that manages its own capacity. The string slice itself, str, is what’s considered “unsized”: its length isn’t known at compile time. For that reason, string slices are in practice almost always handled behind a reference, so the type you will see is &str rather than str.
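
Here is a small sketch that checks these properties of a slice. The last_name() function is a hypothetical helper that finds the space instead of hard-coding the offset, and the accented text is only there to show that ranges are byte offsets, not character positions.

```rust
use std::mem::size_of;

// Hypothetical helper: returns everything after the first space, or the
// whole text if there is no space. The result borrows from `full_name`.
fn last_name(full_name: &str) -> &str {
    match full_name.find(' ') {
        Some(space) => &full_name[space + 1..],
        None => full_name,
    }
}

fn main() {
    let mut my_name = "Pascal".to_string();
    my_name.push_str(" Precht");

    let last = &my_name[7..];
    assert_eq!(last, "Precht");
    assert_eq!(last, last_name(&my_name));

    // A &str is a pointer plus a length: two words, no capacity.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(last.len(), 6);

    // The slice points into my_name's buffer, 7 bytes past its start.
    assert_eq!(last.as_ptr(), my_name.as_ptr().wrapping_add(7));

    // Ranges must fall on character boundaries. get() returns None
    // instead of panicking when they do not.
    let accented = "\u{e9}t\u{e9}"; // each U+00E9 takes two bytes in UTF-8
    assert_eq!(accented.get(0..1), None);
    assert_eq!(accented.get(0..2), Some("\u{e9}"));
}
```

Indexing with [7..] panics if the offset is out of bounds or falls inside a multi-byte character, so .get() is the safer choice when the offset comes from user input.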

Okay, this explains the difference between String, &String, str and &str, but we haven’t actually created such a reference in our original example, have we?

Understanding string literals

As mentioned earlier, there are two cases in which we work with string slices: we either create a reference to a substring, or we use string literals.

A string literal is created by surrounding text with double quotes, just like we did earlier:

let my_name = "Pascal Precht"; // This is a `&str` not a `String`

The next question is, if a &str is a slice reference to a String owned by someone else, who is the owner of that value given that the text is created in place?

It turns out that string literals are a bit special. They are string slices that refer to “preallocated text” that is stored in read-only memory as part of the executable. In other words, it’s memory that ships with our program and doesn’t rely on buffers allocated in the heap.

That said, there’s still an entry on the stack that points to that preallocated memory when the program is executed:

            my_name: &str
            [–––––––––––]
            +–––+–––+
stack frame │ • │ 6 │
            +–│–+–––+
              │
              +––+
                 │
 preallocated  +–V–+–––+–––+–––+–––+–––+
 read-only     │ P │ a │ s │ c │ a │ l │
 memory        +–––+–––+–––+–––+–––+–––+
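
Because that memory lives as long as the program does, a string literal has the type &'static str. The sketch below illustrates this; default_name() is a hypothetical function introduced only for the example.

```rust
// A literal can be returned from a function without any owner on the heap,
// because its bytes are part of the executable for the whole program run.
fn default_name() -> &'static str {
    "Pascal Precht"
}

fn main() {
    let my_name: &'static str = default_name();
    assert_eq!(my_name.len(), 13);

    // A heap allocation only happens once we ask for an owned, growable copy.
    let mut owned: String = my_name.to_owned();
    owned.push('!');
    assert_eq!(owned, "Pascal Precht!");

    // The literal itself is unchanged.
    assert_eq!(my_name, "Pascal Precht");
}
```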

With a better understanding of the difference between String and &str, there’s probably another question that comes up.

Which one should be used?

Obviously, this depends on a number of factors, but generally it’s safe to say that if the API we’re building doesn’t need to own or mutate the text it’s working with, it should take a &str instead of a String. An improved version of the original greet() function would therefore look like this:

fn greet(name: &str) {
  println!("Hello, {}!", name);
}
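
For contrast, here is a sketch of the opposite case, where the API does need to own the text. The Person type and its methods are hypothetical and not part of the original example; they only illustrate when taking a String is the reasonable choice.

```rust
// Hypothetical type for illustration: it keeps the name around, so it needs
// to own the text and therefore stores a String.
struct Person {
    name: String,
}

impl Person {
    // Taking a String makes the ownership transfer visible to the caller.
    fn new(name: String) -> Person {
        Person { name }
    }

    // Read-only access hands out a borrowed &str again.
    fn name(&self) -> &str {
        &self.name
    }

    // This method only reads the text, so working with &str is enough.
    fn greeting(&self) -> String {
        format!("Hello, {}!", self.name())
    }
}

fn main() {
    let person = Person::new("Pascal".to_string());
    println!("{}", person.greeting());
}
```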

Wait, but what if the caller of this API only has a String? No problem at all. Rust has a powerful feature called deref coercion, which automatically converts a reference to a String (a &String, created with the borrow operator &) into a &str wherever a &str is expected. A borrowed String can therefore be passed to the function directly. This will be covered in more detail in another article.

Our greet() function therefore will work with the following code:

fn main() {
  let first_name = "Pascal";
  let last_name = "Precht".to_string();

  greet(first_name);
  greet(&last_name); // `last_name` is passed by reference
}

fn greet(name: &str) {
  println!("Hello, {}!", name);
}

Run this snippet yourself to see it in action!
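
To make the coercion less magical, the sketch below compares it with two explicit ways of getting a &str out of a String. The greeting() helper is a hypothetical variant of greet() that returns its text so the results can be compared.

```rust
// Hypothetical helper: returns the greeting so the results can be compared.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let last_name = "Precht".to_string();

    // Deref coercion: &String is converted to &str automatically.
    let coerced = greeting(&last_name);

    // The same conversion written out explicitly.
    let explicit = greeting(last_name.as_str());
    let sliced = greeting(&last_name[..]);

    assert_eq!(coerced, explicit);
    assert_eq!(coerced, sliced);

    // Borrowing does not move the String, so it is still usable afterwards.
    assert_eq!(last_name, "Precht");
}
```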

That’s it! I hope this article was useful. There’s an interesting discussion on Reddit about this content as well! Let me know what you think or what you would like to learn about next on twitter or sign up for the Rust For JavaScript Developers mailing list!

posted @ 2023-09-25 10:45  ImreW