Peter's Adventures in Ruby: The Ruby inplace bug
This is an article in a multi-part series called “Peter’s Adventures in Ruby”.
It is strongly recommended that you read my article on the ways to create Ruby strings in C extensions if you’re not familiar with Ruby’s C string API.
Here’s a story of string corruption in MRI when the wrong function is used to create the Ruby string. But before that, let me explain a few Ruby features that you may not know.
Ruby features that you may not know
Editing a file in place
Consider the following script:
while gets
  puts $_.gsub(/perl/, "ruby")
end
And then we run:
$ echo "I like perl, it is my favourite language." > temp.txt
$ ruby -i script.rb temp.txt
$ cat temp.txt
I like ruby, it is my favourite language.
As you can see, with the -i flag we can read the text file line-by-line through gets, and whatever we write to standard output (with puts) replaces the file’s contents.
Backup files
Let’s run the script above again, but instead of passing -i to Ruby, let’s pass -i.bak:
$ echo "I like perl, it is my favourite language." > temp.txt
$ ruby -i.bak script.rb temp.txt
$ cat temp.txt
I like ruby, it is my favourite language.
$ cat temp.txt.bak
I like perl, it is my favourite language.
All this does is create a backup file (with the extension .bak in our case) containing the original contents before the file is modified.
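The naming rule is plain concatenation: the suffix given to -i is appended to the original file name. Here’s a minimal C sketch of that rule. backup_path is a hypothetical helper of mine, not MRI’s code, and it assumes the suffix is used verbatim.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<path><suffix>" into out.
 * Returns 0 on success, -1 if out is too small to hold the result. */
int backup_path(const char *path, const char *suffix, char *out, size_t out_size)
{
    assert(path != NULL && suffix != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s%s", path, suffix);
    if (n < 0 || (size_t)n >= out_size) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

For example, backup_path("temp.txt", ".bak", buf, sizeof buf) fills buf with temp.txt.bak. The same concatenation means that if the suffix were ever garbage such as o106, the backup would be named temp.txto106.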
Run the Ruby script line-by-line
Did you know we can do the above, but with a single line of Ruby code? Consider the following Ruby script:
$_.gsub!(/perl/, "ruby")
And then we run it as follows (notice we are running the script with an extra -p flag):
$ echo "I like perl, it is my favourite language." > temp.txt
$ ruby -pi.bak script.rb temp.txt
$ cat temp.txt
I like ruby, it is my favourite language.
$ cat temp.txt.bak
I like perl, it is my favourite language.
The -p flag conveniently wraps your code in an implicit while gets(); ...; print $_; end loop.
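To make the implicit loop concrete, here’s a rough C analogy of what -p adds around $_.gsub!(/perl/, "ruby"): read a line, run the body on it, then print the (possibly modified) line. This is only an illustration under my own assumptions; the names process_lines and replace_perl_with_ruby are hypothetical, and it is not how MRI implements -p.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The "body": replace every "perl" with "ruby" in place.
 * Both words are four bytes, so the line never changes length. */
static void replace_perl_with_ruby(char *line)
{
    char *p = line;
    while ((p = strstr(p, "perl")) != NULL) {
        memcpy(p, "ruby", 4);
        p += 4;
    }
}

/* Analogue of `while gets; <body>; print $_; end`. */
void process_lines(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    /* Assumption: lines longer than the buffer are read in chunks, and a
     * "perl" split across two chunks would be missed by this sketch. */
    char line[4096];
    while (fgets(line, sizeof line, in) != NULL) {
        replace_perl_with_ruby(line);
        fputs(line, out);
    }
}
```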
BEGIN blocks
One question you might ask yourself is: if I use the -p flag, how do I do global setup? Ruby’s got a feature for that! Enter BEGIN blocks.
Note: Ruby’s keywords are case sensitive, so BEGIN is not the same as begin.
So now, if we do:
BEGIN {
  puts "It is starting!"
}
$_.gsub!(/perl/, "ruby")
And then we can run it as follows:
$ echo "I like perl, it is my favourite language." > temp.txt
$ ruby -pi.bak script.rb temp.txt
It is starting!
$ cat temp.txt
I like ruby, it is my favourite language.
$ cat temp.txt.bak
I like perl, it is my favourite language.
Notice we get the It is starting! output in the terminal, and not in the file: the BEGIN block runs before Ruby opens temp.txt for in-place editing.
Shebang
We want to minimize the number of characters we have to type, so we can omit the flags from the command line and instead include them in the file using a shebang line.
#!/usr/bin/ruby -pi.bak
BEGIN {
  puts "It is starting!"
}
$_.gsub!(/perl/, "ruby")
And then we can run it as follows:
$ echo "I like perl, it is my favourite language." > temp.txt
$ ruby script.rb temp.txt
It is starting!
$ cat temp.txt
I like ruby, it is my favourite language.
$ cat temp.txt.bak
I like perl, it is my favourite language.
In fact, Ruby doesn’t actually check that the binary in the shebang is valid, just that it ends in ruby. So the following shebang would have worked too:
#!/some/invalid/dir/ruby -pi.bak
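Here’s a simplified C sketch of the check as described above: take the first word after #!, accept it if it ends in ruby (without checking that the path exists), and treat the rest of the line as flags. shebang_flags is a hypothetical name, the input is assumed to have its trailing newline stripped, and this is not MRI’s actual parsing code. Because it only inspects the first word, a shebang like #!/usr/bin/env ruby is out of scope for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: if `line` is a shebang whose interpreter path ends
 * in "ruby", return a pointer to the flags after it (possibly ""),
 * otherwise return NULL. The path itself is never checked for existence. */
const char *shebang_flags(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "#!", 2) != 0) {
        return NULL;                      /* not a shebang line */
    }
    const char *start = line + 2;
    while (*start == ' ' || *start == '\t') {
        start++;                          /* tolerate "#! /usr/bin/ruby" */
    }
    const char *end = start;
    while (*end != '\0' && !isspace((unsigned char)*end)) {
        end++;                            /* find the end of the interpreter path */
    }
    size_t len = (size_t)(end - start);
    if (len < 4 || strncmp(end - 4, "ruby", 4) != 0) {
        return NULL;                      /* interpreter does not end in "ruby" */
    }
    while (*end == ' ' || *end == '\t') {
        end++;                            /* skip spaces before the flags */
    }
    return end;
}
```

With this sketch, both #!/usr/bin/ruby -pi.bak and #!/some/invalid/dir/ruby -pi.bak yield the flags -pi.bak.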
The Ruby inplace bug
Ok, we finally have everything we need to reproduce the bug. Consider the following script:
#!/usr/bin/ruby -pi.bak
BEGIN {
  GC.start(full_mark: true)
  arr = []
  1000000.times do |x|
    arr << "fooo#{x}"
  end
}
$_.gsub!(/perl/, "ruby")
So what do we expect it to do? Well, this is pretty much the same script as the one in the Shebang section above (apart from doing some seemingly useless work in the BEGIN block).
So let’s run this script:
$ echo "I like perl, it is my favourite language." > temp.txt
$ ruby script.rb temp.txt
$ cat temp.txt
I like ruby, it is my favourite language.
$ cat temp.txt.bak
cat: temp.txt.bak: No such file or directory
$ ls
script.rb
temp.txt
temp.txto106
Wait, what!?!? Where is our backup file temp.txt.bak? And what is temp.txto106? Let’s inspect these files.
$ cat temp.txt
I like ruby, it is my favourite language.
$ cat temp.txto106
I like perl, it is my favourite language.
It seems like temp.txto106 contains the original file (the one that should have been the backup file). So what’s going on? You might have guessed it: the wrong C function was used to create a Ruby string!
So, what’s wrong?
When you pass the flag -i.bak to Ruby through the shebang, Ruby reads the shebang line from the script into a Ruby string and parses the flags out of it¹. It then sets the extension of the backup file by calling ruby_set_inplace_mode. This function is really simple; it’s defined like this:
void
ruby_set_inplace_mode(const char *suffix)
{
    ARGF.inplace = !suffix ? Qfalse : !*suffix ? Qnil : rb_fstring_cstr(suffix);
}
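The nested ternary packs three cases into one line. Spelled out in plain C, with a hypothetical enum standing in for Qfalse, Qnil and the suffix string, it looks roughly like this: a NULL suffix turns in-place mode off, an empty suffix (a bare -i) edits in place without a backup, and anything else is kept as the backup suffix. The type and function names are mine, for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three values ARGF.inplace can take. */
enum inplace_kind {
    INPLACE_OFF,         /* Qfalse: not editing in place       */
    INPLACE_NO_BACKUP,   /* Qnil: -i given with no suffix      */
    INPLACE_WITH_BACKUP  /* string: -i<suffix>, e.g. -i.bak    */
};

struct inplace_mode {
    enum inplace_kind kind;
    const char *suffix;  /* only set for INPLACE_WITH_BACKUP */
};

/* Same decision as the nested ternary in ruby_set_inplace_mode. */
struct inplace_mode classify_suffix(const char *suffix)
{
    struct inplace_mode mode = { INPLACE_OFF, NULL };
    if (suffix == NULL) {
        mode.kind = INPLACE_OFF;
    } else if (*suffix == '\0') {
        mode.kind = INPLACE_NO_BACKUP;
    } else {
        mode.kind = INPLACE_WITH_BACKUP;
        mode.suffix = suffix;  /* stores the caller's pointer, not a copy */
    }
    return mode;
}
```

Note that the last case keeps the caller’s pointer rather than copying the bytes, which mirrors what rb_fstring_cstr does here.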
The problem here is the use of rb_fstring_cstr, which (similar to rb_str_new_static) expects a pointer to a region of memory that is not free’d before this Ruby string is swept, because it sets the Ruby string’s pointer directly to the string passed in instead of copying it.
To explain using diagrams (don’t we all love diagrams?): we have a Ruby string of the flags in the shebang stored in argv. We then call ruby_set_inplace_mode and set ARGF.inplace to a Ruby string that points directly at the .bak part of argv’s memory.

But then we no longer need argv, while ARGF.inplace is still pointing to the memory that argv was referencing.

So then we call GC.start, which guarantees that argv is swept, and then we create a large number of fooo#{x} strings. This makes it very likely that the memory that held the string -pi.bak is overwritten, leaving ARGF.inplace pointing to some other value. So when Ruby later creates the backup file, the extension it uses is o106 (or some other gibberish).
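The same failure can be modelled in a few lines of plain C. This is a toy, not MRI code: a static buffer plays the role of argv’s memory, a struct holding a bare pointer plays the role of ARGF.inplace, and “sweeping” is simulated by overwriting the buffer rather than freeing it (a real use-after-free would be undefined behaviour). The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the memory that held the shebang flags ("argv"). */
static char flags_buf[16];

/* A "string" that borrows its bytes instead of owning a copy. */
struct borrowed_str {
    const char *ptr;
};

/* Simulates argv being swept and its memory reused by a new string:
 * the old bytes are overwritten in place. */
static void reuse_memory(const char *new_contents)
{
    strncpy(flags_buf, new_contents, sizeof flags_buf - 1);
    flags_buf[sizeof flags_buf - 1] = '\0';
}

/* Returns what the borrowed suffix reads after the memory is reused. */
const char *suffix_after_reuse(const char *reused_with)
{
    strcpy(flags_buf, "-pi.bak");
    struct borrowed_str inplace = { flags_buf + 3 };  /* points at ".bak" */
    assert(strcmp(inplace.ptr, ".bak") == 0);

    reuse_memory(reused_with);  /* argv is gone; its bytes get recycled */
    return inplace.ptr;         /* still points at offset 3 of the buffer */
}
```

With the buffer reused for "fooo106", the borrowed pointer at offset 3 reads o106. That match comes from how I chose the toy’s offsets; it illustrates the mechanism and says nothing definitive about MRI’s actual memory layout.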

The fix
The fix changes only one line and a few characters!² Instead of calling rb_fstring_cstr to create the string, we use rb_str_new, which allocates memory for the string and copies over its contents. Once we do that, we no longer need to ensure that the original string is not free’d.
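The difference with rb_str_new is ownership: the new string gets its own copy of the bytes. Here’s a minimal C sketch of that idea, again a toy with hypothetical names (owned_str, owned_str_init) rather than MRI’s implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A "string" that owns a heap copy of its bytes. */
struct owned_str {
    char *ptr;
    size_t len;
};

/* Copy `len` bytes of `src` into freshly allocated memory.
 * Returns 0 on success, -1 if allocation fails. */
int owned_str_init(struct owned_str *s, const char *src, size_t len)
{
    assert(s != NULL && src != NULL);
    s->ptr = malloc(len + 1);
    if (s->ptr == NULL) {
        return -1;
    }
    memcpy(s->ptr, src, len);
    s->ptr[len] = '\0';
    s->len = len;
    return 0;
}

/* Release the copy; the original source buffer is never touched. */
void owned_str_free(struct owned_str *s)
{
    free(s->ptr);
    s->ptr = NULL;
    s->len = 0;
}
```

Because the string owns its copy, overwriting or freeing the original buffer afterwards leaves the suffix intact. The cost is one small allocation and copy, which is negligible for a few bytes of command-line flags.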
Acknowledgements
I would like to acknowledge Matt Valentine-House as the co-discoverer of the bug and the co-author of the fix.
¹ The string is created in load_file_internal and parsed in proc_options.
https://blog.peterzhu.ca/ruby-inplace-bug/