On string efficiency
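I recently started looking into string performance and got a rough picture of the usual ways to improve it. Now consider a problem I ran into before: given a very long string, count how often each letter occurs in it.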
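My rough approach:
1 One way or another the string has to be traversed once; ideally a single pass finishes the count
2 During the loop, every character involves a comparison
For example, when the loop reaches the character f, there has to be a place that stores the current count of f, and that count has to be updated, which first means finding the right slot in that storage
As I see it, the gains are mainly to be had in step 2. The ordinary way is to compare each character against up to 26 candidates (assuming mostly lowercase letters) and then update the frequency of the one that matches.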
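The cost described in step 2 can be made concrete with a sketch of the ordinary version. This is only an illustration of the "compare 26 times" idea, not code from the original test; the class and method names are made up for the example, and it assumes only lowercase a-z is counted.

```csharp
using System;

static class NaiveCount
{
    // Hypothetical baseline: for every input character, scan a 26-entry
    // table of letters until the matching slot is found.
    public static int[] Count(string text)
    {
        char[] letters = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
        int[] counts = new int[letters.Length];
        foreach (char c in text)
        {
            for (int i = 0; i < letters.Length; i++)
            {
                if (letters[i] == c)
                {
                    counts[i]++;
                    break; // slot found, stop scanning
                }
            }
        }
        return counts;
    }
}
```

In the worst case this does 26 comparisons per character, which is the work a direct lookup is meant to remove.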
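So far the best I have come up with is a hash table, which amounts to locating the counter directly and updating it. The performance was as follows:
A string of roughly 5.43 million characters took 375 milliseconds. The test code is below: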
// Requires: using System; using System.Collections; using System.IO;
public static void Test()
{
DateTime a=DateTime.Now;
// "HH" is the 24-hour clock; "hh" would print the 12-hour value
Console.WriteLine(string.Concat("Start: ", a.ToString("yyyy-MM-dd HH:mm:ss")));
int lenght;
string strFilePath="c:\\11.txt";
string str;
// Dispose the reader as soon as the file has been read
using (StreamReader reader = new StreamReader(strFilePath))
{
str = reader.ReadToEnd();
}
// Repeat the file contents to make the test string longer
str = string.Concat(str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str, str);
lenght=str.Length;
System.Collections.Hashtable ha = new System.Collections.Hashtable();
for (int k = 0; k < 26;k++)
ha.Add(97+k, 0);
for (int k = 0; k < str.Length; k++)
{
// Use the character code itself as the key; no string round-trip is needed
int key = str[k];
if (ha.ContainsKey(key))
ha[key] = (int)ha[key] + 1; // the stored value is a boxed int, so unbox instead of parsing
}
// Find the letter with the highest count, starting from 'a' (code 97)
int max = (int)ha[97];
string cs = "a";
for (int k = 1; k < 26; k++)
{
int count = (int)ha[97 + k];
if (count > max)
{
max = count;
cs = ((char)(97 + k)).ToString();
}
}
DateTime b=DateTime.Now;
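Console.WriteLine(string.Concat("End: ", b.ToString("yyyy-MM-dd HH:mm:ss")));
// TotalMilliseconds is the whole elapsed time; Milliseconds is only the 0-999 component
Console.WriteLine(string.Concat("In a string of total length ", lenght.ToString(), ", the most frequent character is ", cs, " with ", max.ToString(), " occurrences; elapsed ", b.Subtract(a).TotalMilliseconds.ToString("F0"), " ms"));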
Console.ReadLine();
}
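The same hash-table idea can also be written with the generic `Dictionary<char, int>`, which avoids boxing the keys and counts and the casts that come with it. This is a minimal sketch, assuming only lowercase a-z should be counted; the class and method names are invented for the example, and it has not been timed against the version above.

```csharp
using System.Collections.Generic;

static class DictionaryCount
{
    // Hypothetical helper: one pass over the text, one hash lookup per letter.
    public static Dictionary<char, int> Count(string text)
    {
        Dictionary<char, int> counts = new Dictionary<char, int>();
        foreach (char c in text)
        {
            if (c < 'a' || c > 'z')
                continue; // ignore everything except lowercase letters

            int n;
            counts.TryGetValue(c, out n); // n stays 0 when the key is absent
            counts[c] = n + 1;
        }
        return counts;
    }
}
```

Letters that never occur are simply absent from the result, so a caller looking for the maximum can iterate over the dictionary entries directly.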
There are probably better algorithms still.
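One candidate, offered as a sketch rather than a measured result: since the keys are just the 26 lowercase letters, a plain `int[26]` indexed by `c - 'a'` locates the counter directly, with no hashing at all. The names below are made up for the example, the sample text is a stand-in for the file contents, and `Stopwatch` is used for timing instead of subtracting two `DateTime` values.

```csharp
using System;
using System.Diagnostics;

static class ArrayCount
{
    // Hypothetical helper: counts[0] is 'a', counts[25] is 'z'.
    public static int[] Count(string text)
    {
        int[] counts = new int[26];
        foreach (char c in text)
        {
            int index = c - 'a';
            if (index >= 0 && index < 26)
                counts[index]++; // direct slot, no search and no hash
        }
        return counts;
    }

    public static void Main()
    {
        // Stand-in input; the original test reads its text from a file.
        string text = "the quick brown fox jumps over the lazy dog";

        Stopwatch sw = Stopwatch.StartNew();
        int[] counts = Count(text);
        sw.Stop();

        int best = 0;
        for (int i = 1; i < 26; i++)
        {
            if (counts[i] > counts[best])
                best = i;
        }
        Console.WriteLine("Most frequent: {0} ({1} times), {2} ms",
            (char)('a' + best), counts[best], sw.ElapsedMilliseconds);
    }
}
```

Whether this actually beats the hash table on the 5.43-million-character input would need to be measured on the same machine and file.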