The format for a URI is defined in RFC 3986. See section 3.3 (Path) for details.
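As a rough illustration, here is a minimal Python sketch that percent-encodes a value for use as a single path segment with the standard library's `urllib.parse.quote`. The helper name `encode_path_segment` is hypothetical. Passing `safe=""` is a conservative assumption: it also escapes `/` and the sub-delimiters that RFC 3986 would otherwise permit in a segment.

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment: str) -> str:
    """Hypothetical helper: percent-encode text for use as ONE path segment."""
    # safe="" means "/" is escaped too, so the value cannot split into two segments.
    # Unreserved characters (A-Z a-z 0-9 - . _ ~) are left as-is (Python 3.7+);
    # everything else is UTF-8 encoded and written as %XX escapes.
    return quote(segment, safe="")


encoded = encode_path_segment("a b/c?d")
print(encoded)           # a%20b%2Fc%3Fd
print(unquote(encoded))  # a b/c?d
```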
-------------------
You are better off keeping only certain characters (a whitelist) than removing specific characters (a blacklist).
You can technically allow any character, as long as you percent-encode it properly. But, to answer in the spirit of the question, you should only allow these characters:
- Lowercase letters (convert uppercase to lowercase)
- Numbers, 0 through 9
- A dash - or underscore _ (when converting strings to base64, I usually replace / with _ and + with -)
- Tilde ~
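The whitelist above could be sketched in Python roughly like this. It assumes disallowed characters should be replaced with a dash rather than dropped. The `whitelist_slug` name and its `replacement` default are illustrative assumptions, not part of the answer.

```python
import re

# Anything NOT in the whitelist: lowercase a-z, digits, underscore, tilde, dash.
# The "-" is placed last in the class so it is treated as a literal dash.
_DISALLOWED = re.compile(r"[^a-z0-9_~-]")


def whitelist_slug(text: str, replacement: str = "-") -> str:
    """Hypothetical helper: lowercase, then replace every non-whitelisted character."""
    return _DISALLOWED.sub(replacement, text.lower())


print(whitelist_slug("Hello World & More+Stuff"))  # hello-world---more-stuff
```

Passing `replacement=""` drops the offending characters instead of substituting them.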
Everything else potentially has a special meaning. For example, you may think you can use +, but in a query string it is decoded as a space. & is dangerous too: it separates query parameters, and it can cause trouble with some rewrite rules.
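A small Python sketch of both pitfalls, using only the standard library. `parse_qs` decodes `+` as a space and splits on `&`. This is also why URL-safe base64 (`base64.urlsafe_b64encode`) substitutes `-` and `_` for `+` and `/`. The input bytes are chosen arbitrarily so that the standard base64 output contains `+` and `/`.

```python
import base64
from urllib.parse import parse_qs

raw = bytes([0xFB, 0xFF])  # arbitrary bytes whose standard base64 contains "+" and "/"

std = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(std)       # +/8=
print(url_safe)  # -_8=

# In a query string, "+" is decoded as a space, so the standard token is corrupted.
print(repr(parse_qs("token=" + std)["token"][0]))       # ' /8='
print(repr(parse_qs("token=" + url_safe)["token"][0]))  # '-_8='

# "&" splits parameters: "Jerry" becomes its own (blank) field and is dropped.
print(parse_qs("name=Tom&Jerry"))  # {'name': ['Tom']}
```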
As with the other comments, check out the standards and specifications for complete details.