Peter's Adventures in Ruby: Creating Ruby strings in C
This is an article in a multi-part series called “Peter’s Adventures in Ruby”.
Introduction
Creating a string in Ruby is probably one of the easiest things you can do in the language; you can create one just like this:
my_string = "Hello world!"
But when you’re developing MRI itself or writing a C extension, you are given many ways to create a string. So which one do you choose? Just use the one called rb_str_new? Pick one at random? What’s the worst that can happen, right? It turns out that the one you choose affects the performance and, most importantly, the correctness of your program. At the end, I’ll also link to a real story about the problems that happen when the wrong way to create a string is used.
In fact, there are a total of 24 ways (in Ruby 2.7) to create a string using the C API (and there are many, many more ways inside MRI). I will talk about the three most common ways to create strings through the Ruby C API. Many of the others are variations of these three and are self-explanatory (e.g. creating a string with a specific encoding).
Ways to create strings in Ruby’s C API
rb_str_new
VALUE rb_str_new(const char *ptr, long len);
This one is pretty straightforward. It takes a pointer ptr to an array of characters and the length len of the string, and returns a VALUE referencing the newly created Ruby string object. Note that the new string holds its own copy of the characters, so you can change (or free) the contents of ptr afterward without affecting the Ruby string.
Example:
char *c_str = malloc(13);
strcpy(c_str, "Hello world!");
VALUE my_string = rb_str_new(c_str, 12);
free(c_str);
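To make the copying contract concrete without pulling in the Ruby headers, here is a minimal plain-C model. The struct `toy_str` and the functions `toy_str_new` and `toy_str_free` are hypothetical stand-ins invented for illustration, not part of Ruby's API and not how MRI lays out its string objects; they only mimic the behaviour described above: the bytes are copied into a buffer the string owns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string object; NOT Ruby's RString layout. */
typedef struct {
    char *ptr;  /* owned buffer, always NUL-terminated */
    long len;   /* number of bytes, excluding the terminator */
} toy_str;

/* Mimics the contract of rb_str_new: copy len bytes from ptr into a
 * freshly allocated buffer that the new string owns. */
toy_str toy_str_new(const char *ptr, long len)
{
    toy_str s;
    assert(len >= 0);
    s.ptr = malloc((size_t)len + 1);
    if (s.ptr == NULL) {
        abort(); /* this sketch simply gives up on allocation failure */
    }
    if (len > 0) {
        memcpy(s.ptr, ptr, (size_t)len);
    }
    s.ptr[len] = '\0';
    s.len = len;
    return s;
}

/* Releases the owned buffer (for real Ruby strings, the GC does this). */
void toy_str_free(toy_str *s)
{
    free(s->ptr);
    s->ptr = NULL;
    s->len = 0;
}
```

Because the model owns a separate copy, changing or freeing the caller's buffer after `toy_str_new` returns leaves the string intact, which mirrors why the `free(c_str)` call in the rb_str_new example is safe.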
rb_str_buf_new
VALUE rb_str_buf_new(long capa);
This one is also pretty straightforward. It just creates an empty string whose buffer has room for capa bytes. If you know ahead of time the size (or approximate size) of the string you’re going to build, it is efficient to set capa to that size, because the string can then grow to that length without its buffer having to be reallocated. Of course, if you set capa larger than what you need, you’ll be wasting memory.
Example:
VALUE my_string = rb_str_buf_new(12); /* empty string with room for 12 bytes */
rb_str_cat_cstr(my_string, "Hello world!");
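Why does a well-chosen capa help? When a string outgrows its buffer, the buffer has to be reallocated, and the existing bytes may need to be copied. The sketch below is a hypothetical model, not MRI's actual growth policy: `toy_buf`, `toy_buf_new` and `toy_buf_cat` are made-up names, and the doubling strategy is an assumption chosen for simplicity. It counts how many reallocations it takes to append the same text with and without an adequate initial capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; the doubling policy is an assumption,
 * not a description of how MRI grows strings. */
typedef struct {
    char *ptr;
    long len;      /* bytes in use, excluding the NUL terminator */
    long capa;     /* usable bytes, excluding the NUL terminator */
    int reallocs;  /* how many times the buffer had to grow */
} toy_buf;

/* Modeled on rb_str_buf_new: an empty string with room for capa bytes. */
toy_buf toy_buf_new(long capa)
{
    toy_buf b = {0};
    assert(capa >= 0);
    b.ptr = malloc((size_t)capa + 1);
    if (b.ptr == NULL) abort();
    b.ptr[0] = '\0';
    b.capa = capa;
    return b;
}

/* Modeled on rb_str_cat_cstr: append a C string, growing only when needed. */
void toy_buf_cat(toy_buf *b, const char *cstr)
{
    long add = (long)strlen(cstr);
    if (b->len + add > b->capa) {
        long new_capa = b->capa > 0 ? b->capa : 1;
        while (new_capa < b->len + add) {
            new_capa *= 2; /* assumed growth policy: keep doubling */
        }
        char *p = realloc(b->ptr, (size_t)new_capa + 1);
        if (p == NULL) abort();
        b->ptr = p;
        b->capa = new_capa;
        b->reallocs++;
    }
    memcpy(b->ptr + b->len, cstr, (size_t)add + 1); /* copies the NUL too */
    b->len += add;
}

/* Releases the buffer and resets the bookkeeping fields. */
void toy_buf_free(toy_buf *b)
{
    free(b->ptr);
    b->ptr = NULL;
    b->len = b->capa = 0;
}
```

In this model, building "Hello world!" from the pieces "Hello", " " and "world!" triggers two reallocations when starting from a capacity of 0, and none when starting from a capacity of 12. Oversizing capa has the opposite cost: the unused bytes simply stay allocated.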
rb_str_new_static
VALUE rb_str_new_static(const char *ptr, long len);
This looks awfully similar to rb_str_new, doesn’t it? It actually works quite differently! This function requires you to pass a C string literal or a malloc’d region that is NEVER freed (or at least not freed until this string has been garbage collected). It creates the string without allocating memory for the characters: the created string object points directly at the character array passed in ptr.
Example:
VALUE my_string = rb_str_new_static("Hello world!", 12);
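The difference from rb_str_new is ownership: nothing is copied, so the string and the caller share the same bytes. The hypothetical `toy_static_str` and `toy_str_new_static` below (again invented for illustration, not Ruby's API) model that by storing the caller's pointer directly. The example also shows the hazard in a safe form: if the caller later modifies that memory, the "string" changes along with it, and if the caller freed it, the string would be left pointing at freed memory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical non-owning string: it borrows ptr instead of copying it,
 * mimicking the contract described for rb_str_new_static. */
typedef struct {
    const char *ptr;  /* borrowed; the memory must outlive this struct */
    long len;
} toy_static_str;

toy_static_str toy_str_new_static(const char *ptr, long len)
{
    toy_static_str s;
    assert(ptr != NULL && len >= 0);
    s.ptr = ptr;  /* no allocation and no copy: just remember the pointer */
    s.len = len;
    return s;
}

/* Returns 1 if the borrowed bytes equal the given C string, else 0. */
int toy_static_str_eq(toy_static_str s, const char *cstr)
{
    return (long)strlen(cstr) == s.len &&
           memcmp(s.ptr, cstr, (size_t)s.len) == 0;
}
```

A string literal lives for the whole run of the program, so borrowing it is always safe. Any other buffer is only safe to borrow for as long as it stays alive and unchanged, which is exactly the constraint rb_str_new_static places on its caller.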
So, what happens if you use the wrong one?
See my article on The Ruby inplace bug.