Peter's Adventures in Ruby: Creating Ruby strings in C

This article is part of a multi-part series called “Peter’s Adventures in Ruby”.

Introduction 

Creating a string is probably one of the easiest things you can do in Ruby. It looks like this:

my_string = "Hello world!"

But when you’re developing MRI itself or writing a C extension, you are given many ways to create a string. So which one do you choose? Just use the one called rb_str_new? Pick at random? What’s the worst that can happen, right? It turns out that the one you choose affects the performance and, most importantly, the correctness of your program. At the end, I’ll also point to a real story about the problems that arise when the wrong function is used to create a string.

In fact, there are a total of 24 ways (in Ruby 2.7) to create a string using the C API (and there are many, many more ways inside MRI). I will talk about the three most common ways to create strings through the Ruby C API. Many of the others are variations of these three and are self-explanatory (e.g. creating a string with a specific encoding).

Ways to create strings in Ruby’s C API 

rb_str_new 

VALUE rb_str_new(const char *ptr, long len);

This one is pretty straightforward. It takes a pointer ptr to an array of characters and the length len of the string and returns the VALUE pointer to the created Ruby string object. Note that the created object points to a copy of the character array, so you can change the contents of ptr afterward without affecting the Ruby string.

Example:

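char *c_str = malloc(13);                 /* 12 characters plus the trailing NUL */
strcpy(c_str, "Hello world!");
VALUE my_string = rb_str_new(c_str, 12);  /* copies 12 bytes out of c_str */
free(c_str);                              /* safe: my_string has its own copy */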

rb_str_buf_new 

VALUE rb_str_buf_new(long capa);

This one is also pretty straightforward. It creates an empty string (length 0) whose buffer has room for capa bytes. If you know the size, or approximate size, of the string you’re going to build ahead of time, setting capa to that size avoids having to grow the buffer as you append. Of course, if you set capa larger than what you need, you’ll be wasting memory.

Example:

VALUE my_string = rb_str_buf_new(12);
rb_str_cat_cstr(my_string, "Hello world!");
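
To see why a good capa matters, here is a minimal plain-C sketch of a growable buffer. It is only an analogy: toy_buf, toy_buf_new, toy_buf_cat and toy_count_reallocs are hypothetical names invented for this illustration, and the doubling growth policy is an assumption, not a description of what MRI actually does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; NOT MRI's string implementation. */
typedef struct {
    char  *data;
    size_t len;      /* bytes in use */
    size_t capa;     /* bytes allocated */
    int    reallocs; /* how many times the buffer had to grow */
} toy_buf;

/* Start empty with room for `capa` bytes, in the spirit of rb_str_buf_new(capa).
   Allocation failures are not handled, to keep the sketch short. */
static toy_buf toy_buf_new(size_t capa) {
    size_t initial = capa ? capa : 1;
    toy_buf b = { malloc(initial), 0, initial, 0 };
    return b;
}

/* Append n bytes, doubling the allocation whenever it is too small
   (the doubling policy is an assumption made for this sketch). */
static void toy_buf_cat(toy_buf *b, const char *src, size_t n) {
    if (b->len + n > b->capa) {
        while (b->len + n > b->capa) b->capa *= 2;
        b->data = realloc(b->data, b->capa);
        b->reallocs++;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

/* Append "Hello world!" (12 bytes) `times` times and report how often
   the buffer had to be regrown. */
static int toy_count_reallocs(size_t initial_capa, int times) {
    toy_buf b = toy_buf_new(initial_capa);
    for (int i = 0; i < times; i++) toy_buf_cat(&b, "Hello world!", 12);
    int n = b.reallocs;
    free(b.data);
    return n;
}
```

In this toy model, toy_count_reallocs(120, 10) never regrows the buffer because all 120 bytes fit in the initial allocation, while toy_count_reallocs(1, 10) has to regrow it several times along the way.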

rb_str_new_static 

VALUE rb_str_new_static(const char *ptr, long len);

This looks awfully similar to rb_str_new, doesn’t it? It actually works quite differently! This function requires you to pass either a C string literal or a malloc'd region that is NEVER freed (or at least not freed until the string has been garbage collected). It creates the string without allocating extra memory for the contents, meaning the created string object points directly at the character array you passed in. If that memory is freed or overwritten while the Ruby string is still alive, the string’s contents change or become invalid.

Example:

VALUE my_string = rb_str_new_static("Hello world!", 12);
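
The difference between the copying behaviour of rb_str_new and the borrowing behaviour of rb_str_new_static can be modelled in plain C. This is a sketch of the ownership idea only: toy_str and its helper functions are hypothetical names made up for this illustration and do not reflect MRI's real string layout.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string; NOT MRI's RString layout. */
typedef struct {
    const char *ptr;
    long        len;
    int         owns_ptr; /* 1 if ptr is a private copy that must be freed */
} toy_str;

/* Copying constructor, analogous in spirit to rb_str_new.
   Allocation failures are not handled, to keep the sketch short. */
static toy_str toy_str_new(const char *ptr, long len) {
    char *copy = malloc((size_t)len + 1);
    memcpy(copy, ptr, (size_t)len);
    copy[len] = '\0';
    toy_str s = { copy, len, 1 };
    return s;
}

/* Borrowing constructor, analogous in spirit to rb_str_new_static:
   no allocation, the string simply points at the caller's memory. */
static toy_str toy_str_new_static(const char *ptr, long len) {
    toy_str s = { ptr, len, 0 };
    return s;
}

static void toy_str_free(toy_str *s) {
    if (s->owns_ptr) free((void *)s->ptr);
    s->ptr = NULL;
}

/* Returns 1 if the copied string kept its original contents after the
   source buffer was modified while the borrowed string saw the change. */
static int toy_demo(void) {
    char buf[] = "Hello world!";
    toy_str copied   = toy_str_new(buf, 12);
    toy_str borrowed = toy_str_new_static(buf, 12);
    buf[0] = 'J'; /* mutate the source after creating both strings */
    int ok = copied.ptr[0] == 'H' && borrowed.ptr[0] == 'J';
    toy_str_free(&copied);
    toy_str_free(&borrowed);
    return ok;
}
```

The borrowed string is cheaper to create, but it is only as valid as the memory it points at. That is why the real function is meant for string literals and other memory that outlives the Ruby string.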

So, what happens if you use the wrong one? 

See my article on The Ruby inplace bug.
