Writing Efficient String Functions in C#
The .NET Framework provides a set of powerful string functions. These building blocks can be used to write more complex algorithms for handling string data. However, developers who want to write fast and efficient string functions must be careful about how they use those building blocks.
To write efficient string handling functions, it is important to understand the characteristics of string objects in C#.
String Characteristics
First and foremost, it is important to know that strings in .NET are class objects. There is no difference between the types System.String and string: string is simply the C# alias for System.String, and both are class objects. Unlike value types, class objects are always stored on the heap (rather than on the stack). This is an important fact because it means that creating a string object allocates heap memory and can eventually trigger garbage collection, which is costly in terms of performance. In terms of string functions, this means we want to avoid creating new strings as much as possible.
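One rough way to observe the link between string creation and garbage collection is to count generation-0 collections around a loop that creates many short-lived strings. The sketch below is only an illustration: the class and method names are hypothetical, and the number it returns depends on the runtime, the machine and the iteration count, so no particular value should be expected.

```csharp
using System;

public static class AllocationDemo
{
    // Hypothetical helper: returns how many generation-0 garbage collections
    // ran while 'iterations' short-lived strings were being created.
    public static int CountGen0Collections(int iterations)
    {
        int before = GC.CollectionCount(0);

        string text = string.Empty;
        for (int i = 0; i < iterations; i++)
        {
            text = "item " + i; // each pass allocates a new string on the heap
        }

        GC.KeepAlive(text); // keep the last string referenced until this point
        return GC.CollectionCount(0) - before;
    }
}
```

Calling AllocationDemo.CountGen0Collections(1000000) will typically report at least a few collections, while a very small iteration count may report none.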
However, that is easier said than done. Another important thing about strings in .NET is that they are immutable. This means a string object cannot be modified once it has been created. To edit a string, you instead have to create a new string that contains the modification.
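Immutability is easy to see in a small experiment. The following sketch (the class and method names are made up for illustration) shows that an operation that appears to change a string actually returns a new one, and that a second reference to the original still sees the old text.

```csharp
using System;

public static class ImmutabilityDemo
{
    // ToUpperInvariant cannot edit 'original' in place; it returns a string
    // that holds the modified text, and 'original' keeps its old value.
    public static string Describe(string original)
    {
        string upper = original.ToUpperInvariant();
        return original + " -> " + upper;
    }

    // Appending with += builds a new string and re-points the variable at it.
    public static bool AppendLeavesOriginalIntact()
    {
        string text = "hello";
        string alias = text;   // second reference to the same string object
        text += " world";      // new string; the object 'alias' refers to is untouched
        return alias == "hello" && !ReferenceEquals(alias, text);
    }
}
```

Here ImmutabilityDemo.Describe("hello world") returns "hello world -> HELLO WORLD", and AppendLeavesOriginalIntact() returns true.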
Working with Characters
The solution is to work with characters instead of strings as much as possible. The char type in C# is a value type, which means a local char variable does not need a heap allocation of its own (it is typically stored on the stack). Furthermore, since a string is a sequence of characters, converting between chars and strings is very simple.
To convert a string to a char array, use the .NET method ToCharArray(), which returns a new array containing a copy of the string's characters:
string myStr = "hello world";
char[] myStrChars = myStr.ToCharArray();
To convert a char array back to a string, simply pass the array to the string constructor:
char[] myChars = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };
string myStr = new string(myChars);
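Putting the two conversions together gives the basic pattern for editing text without creating intermediate strings. The sketch below uses a hypothetical helper, ReplaceFirstChar, purely to illustrate the round trip: the array is an independent copy, so changing it does not affect the original string.

```csharp
using System;

public static class CharArrayRoundTrip
{
    // Hypothetical helper: replaces the first character of 'input'
    // by editing a char array instead of building intermediate strings.
    public static string ReplaceFirstChar(string input, char replacement)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        char[] chars = input.ToCharArray(); // independent copy of the characters
        chars[0] = replacement;             // edits the copy only; 'input' is unchanged
        return new string(chars);           // one new string for the finished result
    }
}
```

For example, CharArrayRoundTrip.ReplaceFirstChar("hello world", 'j') returns "jello world", while the string passed in still reads "hello world".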
Writing efficient string functions thus boils down to working with char arrays. However, you might remember that arrays are also stored on the heap. Thus, there isn't much difference in performance between working with a string and working with a character array if we end up handling arrays in the same way as strings, for example by allocating a new array for every small change.
Yet this does not mean that working with arrays is not faster. For one thing, we can make use of dynamic arrays such as List<char> (or ArrayList in .NET Framework 1.1) to make our array management as efficient as possible.
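A dynamic array is most useful when the length of the result is not known in advance. As a minimal sketch, assuming a hypothetical RemoveSpaces function that is not part of the original example, a List<char> can collect the characters to keep and be converted to a string once at the end:

```csharp
using System;
using System.Collections.Generic;

public static class ListOfCharDemo
{
    // Hypothetical example: strips every space character from the input.
    public static string RemoveSpaces(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        // Pre-sizing the list to the input length means its internal
        // buffer never has to grow while characters are added.
        List<char> kept = new List<char>(input.Length);
        foreach (char c in input)
        {
            if (c != ' ')
                kept.Add(c);
        }

        // ToArray copies the kept characters once; the string constructor
        // then creates the single result string.
        return new string(kept.ToArray());
    }
}
```

With this sketch, ListOfCharDemo.RemoveSpaces("hello world") returns "helloworld". No intermediate strings are created inside the loop; the only allocations are the list's buffer, the array copy and the final string.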
Example Function
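Let's write a very simple string function and compare the difference between using strings and char arrays. The function will capitalize all the vowels in a string (working with the English alphabet) and make all other characters lowercase. To keep the example short, only lowercase vowels are recognized, so an uppercase vowel in the input is treated like any other character and made lowercase.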
Using just strings:
public string CapitalizeVowels(string input)
{
    if (string.IsNullOrEmpty(input)) //since a string is a class object, it could be null
        return string.Empty;
    else
    {
        string output = string.Empty;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == 'a' || input[i] == 'e' ||
                input[i] == 'i' || input[i] == 'o' ||
                input[i] == 'u')
                output += input[i].ToString().ToUpper(); //Vowel
            else
                output += input[i].ToString().ToLower(); //Not vowel
        }
        return output;
    }
}
Using character arrays:
public string CapitalizeVowels(string input)
{
    if (string.IsNullOrEmpty(input)) //since a string is a class object, it could be null
        return string.Empty;
    else
    {
        char[] charArray = input.ToCharArray();
        for (int i = 0; i < charArray.Length; i++)
        {
            if (charArray[i] == 'a' || charArray[i] == 'e' ||
                charArray[i] == 'i' || charArray[i] == 'o' ||
                charArray[i] == 'u')
                charArray[i] = char.ToUpper(charArray[i]); //Vowel
            else
                charArray[i] = char.ToLower(charArray[i]); //Not vowel
        }
        return new string(charArray);
    }
}
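Both versions above only test for lowercase vowels, so an uppercase vowel in the input ends up lowercase. If uppercase vowels should also come out capitalized, one possible variant of the array-based function is sketched here. The class name and the method name CapitalizeVowelsAnyCase are hypothetical, and the invariant-culture conversions are a choice made for this sketch rather than something the original functions use.

```csharp
using System;

public static class VowelCapitalizer
{
    // Variant sketch: capitalizes vowels regardless of their original case
    // and lowercases every other character.
    public static string CapitalizeVowelsAnyCase(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        char[] chars = input.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            // Lowercase first so that 'E' and 'e' are tested the same way.
            char lower = char.ToLowerInvariant(chars[i]);
            bool isVowel = lower == 'a' || lower == 'e' || lower == 'i' ||
                           lower == 'o' || lower == 'u';
            chars[i] = isVowel ? char.ToUpperInvariant(lower) : lower;
        }
        return new string(chars);
    }
}
```

With this variant, VowelCapitalizer.CapitalizeVowelsAnyCase("HELLO World") returns "hEllO wOrld". It still performs all of its work on a single char array, so the allocation pattern is the same as in the array-based function above.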
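Both functions produce exactly the same results given the same input data. We can perform some basic benchmarks to compare the performance of each function. For example, the string-based function took an average of 2,181 ms to process the string “hello world” 1,000,000 times, while the array-based function took only 448 ms (measured on my computer). The gap comes from allocations: the string-based version builds a new, longer string each time it appends a character with +=, on top of any temporary strings produced by ToString(), ToUpper() and ToLower(), whereas the array-based version allocates only the char array and the final string.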
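The exact method used to take these measurements is not shown here. One common way to run such a comparison is with the Stopwatch class from System.Diagnostics, as in the following sketch. The class name, the TimeMilliseconds helper and the warm-up call are assumptions made for illustration, and results will vary between machines and runtime versions.

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Hypothetical helper: runs 'function' on 'input' the given number
    // of times and returns the elapsed time in milliseconds.
    public static long TimeMilliseconds(Func<string, string> function, string input, int iterations)
    {
        function(input); // warm-up call so that JIT compilation is not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            function(input);
        }
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }
}
```

Each version of the function can be passed in as a delegate, for example Benchmark.TimeMilliseconds(s => CapitalizeVowels(s), "hello world", 1000000). Repeating the measurement several times and averaging the results helps to smooth out noise from garbage collection and other processes.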
Conclusion
As with anything, working with character arrays to write efficient string functions in C# must be done with care. The code can quickly become less readable, and with more complex string algorithms it can become very difficult to maintain. However, since the transition between working with strings and working with character arrays is easy, a combination of both can reach an advantageous middle ground.