



Implement a console application to tally the frequency of words under a directory (2 modes).


For all text files under a directory, searched recursively (file extensions: “txt”, “cpp”, “h”, “cs”), calculate the frequency of each word and output the result to a text file. Write the code in C++ or C# using the .NET Framework; the target environment is 32-bit Windows 7 or Windows Vista.
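A minimal sketch of the recursive file scan, assuming standard C++17 `std::filesystem` rather than the .NET APIs the assignment targets. The helper names `HasCountedExtension` and `CollectTextFiles` are hypothetical, and matching extensions case-insensitively (so “README.TXT” counts) is an assumption, not something the spec states.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True if the file extension is one of .txt, .cpp, .h, .cs.
// Assumption: extensions are compared case-insensitively.
bool HasCountedExtension(const fs::path& file)
{
    std::string ext = file.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext == ".txt" || ext == ".cpp" || ext == ".h" || ext == ".cs";
}

// Recursively collects every regular file under `root` with a counted extension.
std::vector<fs::path> CollectTextFiles(const fs::path& root)
{
    std::vector<fs::path> files;
    // skip_permission_denied keeps an unreadable sub-directory from aborting the scan.
    for (const auto& entry : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied))
    {
        if (entry.is_regular_file() && HasCountedExtension(entry.path()))
            files.push_back(entry.path());
    }
    return files;
}
```

A directory tree that contains only empty sub-directories simply yields an empty list, which is one of the edge cases worth covering in the test cases.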


Run a performance analysis tool on your code, find the performance bottlenecks, and improve them.


Enable Code Quality Analysis for your code and eliminate all warnings.

Write 10 simple test cases to make sure your program handles these cases correctly (e.g. a good test case could be: one of the sub-directories is empty).



  • Submit your source code and exe to the TA. The TA will run it in his testing environment and check for
    • correctness (an incorrect program will get 0 points)
    • performance
    • your blog (see the blog requirement below)



  • A word: a string that begins with at least 3 English letters, optionally followed by alphanumeric characters. Words are separated by delimiters. If a string contains non-alphanumeric characters, it’s not a word. Words are case-insensitive, i.e. “file”, “FILE” and “File” are considered the same word.

“hao123” is a word,  and “123hao” is NOT a word.
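A minimal sketch of word extraction under the rules in this section: every non-alphanumeric character acts as a delimiter, a token counts only if its first three characters are letters, and words are lower-cased so that “File” and “FILE” share one entry. The function names are hypothetical; character classes are written out explicitly so they stay ASCII-only regardless of locale, and treating non-ASCII bytes as delimiters follows from the delimiter definition rather than anything the spec says about encodings.

```cpp
#include <string>
#include <unordered_map>

// ASCII-only classes matching the spec: letters A-Z/a-z, digits 0-9.
bool IsLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
bool IsDigit(char c) { return c >= '0' && c <= '9'; }
char ToLower(char c) { return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c; }

// Adds every word found in `text` to `counts` (keys are lower-cased).
void CountWords(const std::string& text,
                std::unordered_map<std::string, long long>& counts)
{
    std::string token;
    // Called at each delimiter: keep the token only if it starts with 3 letters.
    auto flush = [&]() {
        if (token.size() >= 3 && IsLetter(token[0]) && IsLetter(token[1]) && IsLetter(token[2]))
            ++counts[token];
        token.clear();
    };
    for (char c : text)
    {
        if (IsLetter(c) || IsDigit(c))
            token.push_back(ToLower(c));
        else
            flush();  // any non-alphanumeric character is a delimiter
    }
    flush();  // the text may end in the middle of a token
}
```

With this rule “hao123” is counted while “123hao” and two-letter tokens such as “ab” are dropped. Reading each file into one `std::string` and running this single pass over it is a reasonable baseline to profile before optimizing further.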


  • Alphabetic letters: A-Z, a-z.
  • Alphanumeric characters: A-Z, a-z, 0-9.
  • Delimiter: space or any non-alphanumeric character (e.g. , . < > | \ ( ) [ ] { } ! @ # $ % ^ & * _ + = -).
  • Output text file: filename is <your email name>.txt
    • Each line has this format

<word>: number


         Where “number” is the number of times this word appears in the scan. The output should be sorted with the most frequent word first. If 2 words have the same frequency, list them in alphabetical order.
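A minimal sketch of the required ordering and line format: descending count, ties broken alphabetically, one `<word>: number` line per word. `FormatRanking` is a hypothetical name; it assumes words were already lower-cased, so plain byte comparison gives alphabetical order (digits sort before letters).

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Builds the ranking text: most frequent first, ties in alphabetical order.
std::string FormatRanking(const std::unordered_map<std::string, long long>& counts)
{
    std::vector<std::pair<std::string, long long>> rows(counts.begin(), counts.end());
    std::sort(rows.begin(), rows.end(),
              [](const auto& a, const auto& b) {
                  if (a.second != b.second)
                      return a.second > b.second;  // higher count first
                  return a.first < b.first;        // same count: alphabetical
              });
    std::string result;
    for (const auto& [word, count] : rows)
        result += word + ": " + std::to_string(count) + "\n";
    return result;
}
```

For counts {gamma: 5, alpha: 2, beta: 2} this produces “gamma: 5”, then “alpha: 2”, then “beta: 2”; the returned string can then be written to the output file in one call.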



1) Simple mode. Output simple word frequency.

Myapp.exe <directory-name>

This will output a <your-name>.txt file in the current directory; the text file contains the word ranking list.

2) Extended mode.

This applies only to some special cases of words. If 2 words differ only in their ending numbers, we consider them the same word. For example, “win”, “win95” and “win7” are ONE WORD; “Office” and “Office15” are the same; “iPhone4” and “Iphone5” are the same word. “win” and “win32a” are DIFFERENT words, as the difference is more than just ending numbers. “21century” and “century” are DIFFERENT words too.
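One way to implement this rule, sketched here as an assumption rather than a required design, is to count by a grouping key that strips only the trailing digits of an already lower-cased word. `ExtendedKey` is a hypothetical name. The spec does not say which spelling to print for a merged group (e.g. “win” or “win95”), so this sketch only computes the key.

```cpp
#include <string>

// Extended-mode grouping key: remove trailing digits only.
// Assumes `word` is lower-cased and valid (starts with 3 letters),
// so the key is never empty.
std::string ExtendedKey(const std::string& word)
{
    std::string::size_type end = word.size();
    while (end > 0 && word[end - 1] >= '0' && word[end - 1] <= '9')
        --end;
    return word.substr(0, end);
}
```

Because only a digit suffix is removed, “win95” and “win7” both map to “win”, while “win32a” keeps its own key since its digits are not at the end.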


When running with the “-e” command-line parameter,

Myapp.exe -e <directory-name>


The app will output a <your-name>.txt file in the current directory; the text file contains the word ranking list, but the frequencies are calculated according to the extended-mode definition.
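A minimal sketch of distinguishing the two invocation forms, `Myapp.exe <directory-name>` and `Myapp.exe -e <directory-name>`. The `Options` struct and `ParseArgs` function are hypothetical; rejecting any other argument shape (returning `std::nullopt`) is an assumed policy, since the spec does not say how invalid arguments should be handled.

```cpp
#include <optional>
#include <string>

// Hypothetical holder for the parsed command line.
struct Options
{
    bool extended = false;   // true when "-e" was given
    std::string directory;   // directory to scan
};

// Accepts "Myapp.exe <dir>" or "Myapp.exe -e <dir>"; anything else is rejected.
std::optional<Options> ParseArgs(int argc, const char* const argv[])
{
    Options opts;
    if (argc == 2)
    {
        opts.directory = argv[1];
    }
    else if (argc == 3 && std::string(argv[1]) == "-e")
    {
        opts.extended = true;
        opts.directory = argv[2];
    }
    else
    {
        return std::nullopt;
    }
    return opts;
}
```

`main(int argc, char* argv[])` can pass its arguments straight through, since `char**` converts implicitly to `const char* const*`.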

