
 

This chapter specifies the lexical structure of the Java programming language.

Programs are written in Unicode (§3.1), but lexical translations are provided (§3.2) so that Unicode escapes (§3.3) can be used to include any Unicode character using only ASCII characters. Line terminators are defined (§3.4) to support the different conventions of existing host systems while maintaining consistent line numbers.

The Unicode characters resulting from the lexical translations are reduced to a sequence of input elements (§3.5), which are white space (§3.6), comments (§3.7), and tokens. The tokens are the identifiers (§3.8), keywords (§3.9), literals (§3.10), separators (§3.11), and operators (§3.12) of the syntactic grammar.

 

3.1 Unicode 

Programs are written using the Unicode character set. Information about this character set and its associated character encodings may be found at http://www.unicode.org/.

 

The Java SE platform tracks the Unicode specification as it evolves. The precise version of Unicode used by a given release is specified in the documentation of the class Character.

   Versions of the Java programming language prior to 1.1 used Unicode version 1.1.5. Upgrades to newer versions of the Unicode Standard occurred in JDK 1.1 (to Unicode 2.0), JDK 1.1.7 (to Unicode 2.1), Java SE 1.4 (to Unicode 3.0), and Java SE 5.0 (to Unicode 4.0).

 

The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.

 

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.

   Some APIs of the Java SE platform, primarily in the Character class, use 32-bit integers to represent code points as individual entities. The Java SE platform provides methods to convert between 16-bit and 32-bit representations.
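The relationship between code points and UTF-16 code units can be observed directly. The following is a minimal sketch, not part of the specification text; the class name CodePointDemo and the helper toUtf16 are made up for illustration, while Character.toChars, Character.toCodePoint and String.codePointCount are standard methods.

```java
final class CodePointDemo {
    /** Returns the UTF-16 code units that encode the given code point. */
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1D482; // MATHEMATICAL BOLD ITALIC SMALL A, a supplementary character
        char[] units = toUtf16(codePoint);
        String s = new String(units);

        // A supplementary character occupies two 16-bit code units (a surrogate pair).
        System.out.println(units.length + " code units: "
                + Integer.toHexString(units[0]) + " " + Integer.toHexString(units[1]));
        // length() counts code units; codePointCount() counts code points.
        System.out.println(s.length() + " vs " + s.codePointCount(0, s.length()));
        // Converting back from the 16-bit pair to the 32-bit representation.
        System.out.println(Character.toCodePoint(units[0], units[1]) == codePoint);
    }
}
```

For a character in the range U+0000 to U+FFFF, toUtf16 returns a single code unit whose value equals the code point.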

 

This specification uses the terms code point and UTF-16 code unit where the representation is relevant, and the generic term character where the representation is irrelevant to the discussion.

Except for comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a program are formed only from ASCII characters (or Unicode escapes (§3.3) which result in ASCII characters).

     ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode UTF-16 encoding are the ASCII characters.

 

3.2 Lexical Translations

A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:

1. A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the UTF-16 code unit whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.

2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (§3.4).

3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3).

The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would.

     Thus, the input characters a--b are tokenized (§3.5) as a, --, b, which is not part of any grammatically correct program, even though the tokenization a, -, -, b could be part of a grammatically correct program.
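The longest-match rule can be seen in a variant that does compile. The sketch below is an illustration added here, not an example from the specification; the class and method names are hypothetical. With three minus signs and no white space, the tokenizer produces a, --, -, b; with white space between the minus signs it produces a, -, -, b.

```java
final class LongestMatchDemo {
    /** Tokenized as a, --, -, b: the post-decrement of a, minus b. */
    static int[] decrementThenSubtract(int a, int b) {
        int result = a---b;
        return new int[] { result, a };
    }

    /** White space separates the tokens, giving a, -, -, b: a minus the negation of b. */
    static int subtractNegation(int a, int b) {
        return a - -b;
    }

    public static void main(String[] args) {
        int[] r = decrementThenSubtract(5, 2);
        System.out.println(r[0] + " " + r[1]);      // the difference, then the decremented a
        System.out.println(subtractNegation(5, 2)); // the sum, because b is negated first
    }
}
```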

 

3.3 Unicode Escapes

A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.

 

UnicodeInputCharacter:
    UnicodeEscape
    RawInputCharacter

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
    u
    UnicodeMarker u

RawInputCharacter:
    any Unicode character

HexDigit: one of
    0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

The \, u, and hexadecimal digits here are all ASCII characters.

 

In addition to the processing implied by the grammar, for each raw input character that is a backslash \, input processing must consider how many other \ characters contiguously precede it, separating it from a non-\ character or the start of the input stream. If this number is even, then the \ is eligible to begin a Unicode escape; if the number is odd, then the \ is not eligible to begin a Unicode escape.

  For example, the raw input "\\u2297=\u2297" results in the eleven characters " \ \ u 2 2 9 7 = ⊗ " (\u2297 is the Unicode encoding of the character ⊗).

 

If an eligible \ is not followed by u, then it is treated as a RawInputCharacter and remains part of the escaped Unicode stream.

 

If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.

The character produced by a Unicode escape does not participate in further Unicode escapes.

 For example, the raw input \u005cu005a results in the six characters \ u 0 0 5 a, because 005c is the Unicode value for \. It does not result in the character Z, which is Unicode character 005a, because the \ that resulted from the \u005c is not interpreted as the start of a further Unicode escape.
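The rules above (backslash eligibility, one or more u, exactly four hexadecimal digits, no re-scanning of the result) can be sketched as a small translator. This is an illustrative sketch only, not how any particular compiler implements step 1; the class UnicodeEscapeTranslator and its methods are hypothetical, and a malformed escape is reported with an IllegalArgumentException where a compiler would report a compile-time error. Note that comments in Java source are themselves subject to Unicode escape processing, so the comments below spell out "backslash" rather than writing the escape.

```java
final class UnicodeEscapeTranslator {
    /** Returns the value of an ASCII hexadecimal digit, or -1 if c is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Applies translation step 1 to the raw input. */
    static String translate(String raw) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean eligible = c == '\\' && backslashes % 2 == 0;
            if (!(eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u')) {
                // Not the start of an escape: pass the character through unchanged.
                out.append(c);
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                i++;
                continue;
            }
            int j = i + 1;
            while (j < raw.length() && raw.charAt(j) == 'u') j++; // one or more u
            if (j + 4 > raw.length()) {
                throw new IllegalArgumentException("incomplete Unicode escape at index " + i);
            }
            int value = 0;
            for (int k = j; k < j + 4; k++) {
                int digit = hexValue(raw.charAt(k));
                if (digit < 0) {
                    throw new IllegalArgumentException("bad hex digit at index " + k);
                }
                value = value * 16 + digit;
            }
            out.append((char) value); // the produced character never starts a further escape
            backslashes = 0;
            i = j + 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal denotes the 11 raw characters: backslash, u005c, u005a.
        System.out.println(translate("\\u005cu005a"));
    }
}
```

The counter tracks raw backslashes only, which is why the backslash produced by an escape does not make the following u eligible.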

 

The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.

This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.
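The round trip described above can be sketched as a pair of functions. This is a hedged illustration rather than a reference implementation: the class AsciiTransform and its method names are invented here, and fromAscii assumes its input is well-formed output of toAscii.

```java
final class AsciiTransform {
    /** True if the backslash at index i begins a Unicode escape. */
    private static boolean startsEscape(String s, int i, int backslashesBefore) {
        return s.charAt(i) == '\\' && backslashesBefore % 2 == 0
                && i + 1 < s.length() && s.charAt(i + 1) == 'u';
    }

    /** Unicode source to ASCII: existing escapes gain a u, non-ASCII becomes a single-u escape. */
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\\') {
                out.append(c);
                if (startsEscape(source, i, backslashes)) out.append('u');
                backslashes++;
            } else {
                backslashes = 0;
                if (c < 128) out.append(c);
                else out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    /** ASCII form back to Unicode source; assumes well-formed output of toAscii. */
    static String fromAscii(String ascii) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0;
        int i = 0;
        while (i < ascii.length()) {
            char c = ascii.charAt(i);
            if (!startsEscape(ascii, i, backslashes)) {
                out.append(c);
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                i++;
                continue;
            }
            int j = i + 1;
            while (j < ascii.length() && ascii.charAt(j) == 'u') j++;
            if (j - i > 2) {
                out.append('\\').append(ascii, i + 2, j + 4); // several u: drop one
            } else {
                out.append((char) Integer.parseInt(ascii.substring(j, j + 4), 16)); // single u
            }
            backslashes = 0;
            i = j + 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "char c = '\\u0041'; // " + (char) 0x03A9;
        String ascii = toAscii(source);
        System.out.println(ascii);
        System.out.println(fromAscii(ascii).equals(source));
    }
}
```

Because toAscii works on UTF-16 code units, a supplementary character becomes two consecutive single-u escapes, in line with §3.3.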

 

     A Java compiler should use the \uxxxx notation as an output format to display Unicode characters when a suitable font is not available.

 

  3.4 Line Terminators

 

A Java compiler next divides the sequence of Unicode input characters into lines by recognizing line terminators.

LineTerminator:
    the ASCII LF character, also known as "newline"
    the ASCII CR character, also known as "return"
    the ASCII CR character followed by the ASCII LF character

InputCharacter:
    UnicodeInputCharacter but not CR or LF

Lines are terminated by the ASCII characters CR, or LF, or CR LF. The two characters CR immediately followed by LF are counted as one line terminator, not two.

A line terminator specifies the termination of the // form of a comment (§3.7).

    The lines defined by line terminators may determine the line numbers produced by a Java compiler.

The result is a sequence of line terminators and input characters, which are the terminal symbols for the third step in the tokenization process.
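The counting rule for CR, LF and CR LF can be sketched as follows. This is a small illustration under the assumption that the input is already the result of Unicode escape processing; the class LineCounter and its method are hypothetical names.

```java
final class LineCounter {
    /** Counts line terminators, treating CR immediately followed by LF as one terminator. */
    static int countLineTerminators(String input) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                count++;
                // Consume the LF of a CR LF pair so it is not counted a second time.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else if (c == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLineTerminators("a\nb\rc\r\nd"));
    }
}
```

Note that LF followed by CR is two terminators, since only the order CR LF forms a single one.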

 

3.5 Input Elements and Tokens

The input characters and line terminators that result from escape processing (§3.3) and then input line recognition (§3.4) are reduced to a sequence of input elements.

Input:
    InputElements_opt Sub_opt

InputElements:
    InputElement
    InputElements InputElement

InputElement:
    WhiteSpace
    Comment
    Token

Token:
    Identifier
    Keyword
    Literal
    Separator
    Operator

Sub:
    the ASCII SUB character, also known as "control-Z"

 

Those input elements that are not white space (§3.6) or comments (§3.7) are tokens.

The tokens are the terminal symbols of the syntactic grammar (§2.3).

White space (§3.6) and comments (§3.7) can serve to separate tokens that, if adjacent, might be tokenized in another manner. For example, the ASCII characters - and = in the input can form the operator token -= (§3.12) only if there is no intervening white space or comment.

As a special concession for compatibility with certain operating systems, the ASCII SUB character (\u001a, or control-Z) is ignored if it is the last character in the escaped input stream.

Consider two tokens x and y in the resulting input stream. If x precedes y, then we say that x is to the left of y and that y is to the right of x.

 For example, in this simple piece of code:

 class Empty {
 }

We say that the } token is to the right of the { token, even though it appears, in this two-dimensional representation, downward and to the left of the { token. This convention about the use of the words left and right allows us to speak, for example, of the right-hand operand of a binary operator or of the left-hand side of an assignment.

 

3.6 White Space

 

White space is defined as the ASCII space character, horizontal tab character, form feed character, and line terminator characters (§3.4).

WhiteSpace:
    the ASCII SP character, also known as "space"
    the ASCII HT character, also known as "horizontal tab"
    the ASCII FF character, also known as "form feed"
    LineTerminator

 

3.7  Comments

There are two kinds of comments.

1> /* text */

A traditional comment: all the text from the ASCII characters /* to the ASCII characters */ is ignored (as in C and C++).

2> // text

An end-of-line comment: all the text from the ASCII characters // to the end of the line is ignored (as in C++).

 

Comment:
    TraditionalComment
    EndOfLineComment

TraditionalComment:
    /* CommentTail

EndOfLineComment:
    // CharactersInLine_opt

CommentTail:
    * CommentTailStar
    NotStar CommentTail

CommentTailStar:
    /
    * CommentTailStar
    NotStarNotSlash CommentTail

NotStar:
    InputCharacter but not *
    LineTerminator

NotStarNotSlash:
    InputCharacter but not * or /
    LineTerminator

CharactersInLine:
    InputCharacter
    CharactersInLine InputCharacter

 

These productions imply all of the following properties:

a) Comments do not nest.

b) /* and */ have no special meaning in comments that begin with //.

c) // has no special meaning in comments that begin with /* or /**.

 As a result, the text:

   /* this comment /* // /** ends here: */

is a single complete comment.

The lexical grammar implies that comments do not occur within character literals (§3.10.4) or string literals (§3.10.5).
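Both properties can be checked with a compilable fragment. The class below is a made-up illustration: the first method contains the single comment from the text above followed by live code, and the second returns a string literal whose contents look like comments but are not.

```java
final class CommentDemo {
    static int afterComment() {
        /* this comment /* // /** ends here: */
        return 1; // reached, because the comment above ended at its first closing delimiter
    }

    static String notAComment() {
        // Inside a string literal, comment delimiters are ordinary characters.
        return "/* not a comment */ // still part of the string";
    }

    public static void main(String[] args) {
        System.out.println(afterComment());
        System.out.println(notAComment());
    }
}
```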

 

3.8 Identifiers

 

An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.

Identifier:
    IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral

IdentifierChars:
    JavaLetter
    IdentifierChars JavaLetterOrDigit

JavaLetter:
    any Unicode character that is a Java letter (see below)

JavaLetterOrDigit:
    any Unicode character that is a Java letter-or-digit (see below)

 

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.

A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.

  The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems.

The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.
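The IdentifierChars production maps directly onto the two Character methods named above. The following sketch is an illustration only; the class and method names are hypothetical, and it deliberately checks just the character rule, so a keyword such as class is still reported as well-formed IdentifierChars.

```java
final class IdentifierChars {
    /** True if s is a Java letter followed by zero or more Java letters-or-digits. */
    static boolean isIdentifierChars(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point, not by char, so supplementary characters are handled.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifierChars("_tmp$1"));
        System.out.println(isIdentifierChars("9lives"));
    }
}
```

A full Identifier check would additionally reject the keywords, true, false, and null.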

 

An identifier cannot have the same spelling (Unicode character sequence) as a keyword (§3.9), boolean literal (§3.10.3), or the null literal (§3.10.7), or a compile-time error occurs.

Two identifiers are the same only if they are identical, that is, have the same Unicode character for each letter or digit. Identifiers that have the same external appearance may yet be different.

 For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (Α, \u0391), CYRILLIC SMALL LETTER A (а, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (𝒂, \ud835\udc82) are all different.

Unicode composite characters are different from their canonical equivalent decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) is different from a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) in identifiers.

 

3.9 Keywords

   50 character sequences, formed from ASCII letters, are reserved for use as keywords and cannot be used as identifiers (§3.8).

   Keyword: one of
       abstract    continue    for           new          switch
       assert      default     if            package      synchronized
       boolean     do          goto          private      this
       break       double      implements    protected    throw
       byte        else        import        public       throws
       case        enum        instanceof    return       transient
       catch       extends     int           short        try
       char        final       interface     static       void
       class       finally     long          strictfp     volatile
       const       float       native        super        while

   The keywords const and goto are reserved, even though they are not currently used. This may allow a Java compiler to produce better error messages if these C++ keywords incorrectly appear in programs.

   While true and false might appear to be keywords, they are technically Boolean literals (§3.10.3). Similarly, while null might appear to be a keyword, it is technically the null literal (§3.10.7).

 

3.10 Literals

A literal is the source code representation of a value of a primitive type (§4.2), the String type (§4.3.3), or the null type (§4.1).

Literal:
    IntegerLiteral
    FloatingPointLiteral
    BooleanLiteral
    CharacterLiteral
    StringLiteral
    NullLiteral

 

3.10.1 Integer Literals

 

An integer literal may be expressed in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2).

IntegerLiteral:
    DecimalIntegerLiteral
    HexIntegerLiteral
    OctalIntegerLiteral
    BinaryIntegerLiteral

DecimalIntegerLiteral:
    DecimalNumeral IntegerTypeSuffix_opt

HexIntegerLiteral:
    HexNumeral IntegerTypeSuffix_opt

OctalIntegerLiteral:
    OctalNumeral IntegerTypeSuffix_opt

BinaryIntegerLiteral:
    BinaryNumeral IntegerTypeSuffix_opt

IntegerTypeSuffix: one of
    l L

An integer literal is of type long if it is suffixed with an ASCII letter L or l (ell); otherwise it is of type int (§4.2.1).

   The suffix L is preferred, because the letter l (ell) is often hard to distinguish from the digit 1 (one).

 

Underscores are allowed as separators between digits that denote the integer.

In a hexadecimal or binary literal, the integer is only denoted by the digits after the 0x or 0b characters and before any type suffix. Therefore, underscores may not appear immediately after 0x or 0b, or after the last digit in the numeral.

In a decimal or octal literal, the integer is denoted by all the digits in the literal before any type suffix. Therefore, underscores may not appear before the first digit or after the last digit in the numeral. Underscores may appear after the initial 0 in an octal numeral (since 0 is a digit that denotes part of the integer) and after the initial non-zero digit in a non-zero decimal literal.
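A few legal placements of underscores are collected below as a sketch; the class and constant names are invented for illustration. Forms such as 1000_, 0x_ff or 0b_1 are rejected by the rules above and are therefore mentioned only in this paragraph, not in the code.

```java
final class UnderscoreDemo {
    static final int MILLION = 1_000_000;          // decimal, grouped by thousands
    static final int HEX_MAX = 0x7fff_ffff;        // hexadecimal, grouped by 16 bits
    static final int OCTAL = 0_17;                 // underscore after the initial 0 of an octal numeral
    static final int BINARY = 0b1010_0101;         // binary, grouped by nibble
    static final int DOUBLE_UNDERSCORE = 1__0;     // several consecutive underscores are allowed
    static final long BIG = 9_223_372_036_854_775_807L;

    public static void main(String[] args) {
        System.out.println(MILLION + " " + OCTAL + " " + BINARY + " " + DOUBLE_UNDERSCORE);
    }
}
```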

  A decimal numeral is either the single ASCII digit 0, representing the integer zero, or consists of an ASCII digit from 1 to 9 optionally followed by one or more ASCII digits from 0 to 9 interspersed with underscores, representing a positive integer.

  DecimalNumeral:
    0
    NonZeroDigit Digits_opt
    NonZeroDigit Underscores Digits

  Digits:
    Digit
    Digit DigitsAndUnderscores_opt Digit

  Digit:
    0
    NonZeroDigit

  NonZeroDigit: one of
    1 2 3 4 5 6 7 8 9

  DigitsAndUnderscores:
    DigitOrUnderscore
    DigitsAndUnderscores DigitOrUnderscore

  DigitOrUnderscore:
    Digit
    _

  Underscores:
    _
    Underscores _

 

  A hexadecimal numeral consists of the leading ASCII characters 0x or 0X followed by one or more ASCII hexadecimal digits interspersed with underscores, and can represent a positive, zero, or negative integer.

  Hexadecimal digits with values 10 through 15 are represented by the ASCII letters a through f or A through F, respectively; each letter used as a hexadecimal digit may be uppercase or lowercase.

  HexNumeral:
    0x HexDigits
    0X HexDigits

  HexDigits:
    HexDigit
    HexDigit HexDigitsAndUnderscores_opt HexDigit

  HexDigit: one of
    0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

  HexDigitsAndUnderscores:
    HexDigitOrUnderscore
    HexDigitsAndUnderscores HexDigitOrUnderscore

  HexDigitOrUnderscore:
    HexDigit
    _

  The HexDigit production above comes from §3.3.

 

  An octal numeral consists of an ASCII digit 0 followed by one or more of the ASCII digits 0 through 7 interspersed with underscores, and can represent a positive, zero, or negative integer.

  OctalNumeral:
    0 OctalDigits
    0 Underscores OctalDigits

  OctalDigits:
    OctalDigit
    OctalDigit OctalDigitsAndUnderscores_opt OctalDigit

  OctalDigit: one of
    0 1 2 3 4 5 6 7

  OctalDigitsAndUnderscores:
    OctalDigitOrUnderscore
    OctalDigitsAndUnderscores OctalDigitOrUnderscore

  OctalDigitOrUnderscore:
    OctalDigit
    _

  Note that octal numerals always consist of two or more digits; 0 is always considered to be a decimal numeral - not that it matters much in practice, for the numerals 0, 00, and 0x0 all represent exactly the same integer value.

  

  A binary numeral consists of the leading ASCII characters 0b or 0B followed by one or more of the ASCII digits 0 or 1 interspersed with underscores, and can represent a positive, zero, or negative integer.

  BinaryNumeral:
    0b BinaryDigits
    0B BinaryDigits

  BinaryDigits:
    BinaryDigit
    BinaryDigit BinaryDigitsAndUnderscores_opt BinaryDigit

  BinaryDigit: one of
    0 1

  BinaryDigitsAndUnderscores:
    BinaryDigitOrUnderscore
    BinaryDigitsAndUnderscores BinaryDigitOrUnderscore

  BinaryDigitOrUnderscore:
    BinaryDigit
    _

 

 

  The largest decimal literal of type int is 2147483648 (2^31).

  All decimal literals from 0 to 2147483647 may appear anywhere an int literal may appear.

  It is a compile-time error if a decimal literal of type int is larger than 2147483648 (2^31), or if the decimal literal 2147483648 appears anywhere other than as the operand of the unary minus operator (§15.15.4).

  The largest positive hexadecimal, octal, and binary literals of type int - each of which represents the decimal value 2147483647 (2^31-1) - are respectively:

  0x7fff_ffff,
  0177_7777_7777, and
  0b0111_1111_1111_1111_1111_1111_1111_1111

  The most negative hexadecimal, octal, and binary literals of type int - each of which represents the decimal value -2147483648 (-2^31) - are respectively:

  0x8000_0000,
  0200_0000_0000, and
  0b1000_0000_0000_0000_0000_0000_0000_0000

  The following hexadecimal, octal, and binary literals represent the decimal value -1:

  0xffff_ffff,
  0377_7777_7777, and
  0b1111_1111_1111_1111_1111_1111_1111_1111

  It is a compile-time error if a hexadecimal, octal, or binary int literal does not fit in 32 bits.
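The boundary values listed above can be checked against the constants of the class Integer. This is a small sketch with invented names; it only restates the literals from the text as compile-time constants.

```java
final class IntLiteralLimits {
    static final int MAX_HEX = 0x7fff_ffff;
    static final int MAX_OCT = 0177_7777_7777;
    static final int MIN_HEX = 0x8000_0000;       // the sign bit is set, so the value is negative
    static final int MIN_OCT = 0200_0000_0000;
    static final int MINUS_ONE_HEX = 0xffff_ffff; // all 32 bits set
    static final int MINUS_ONE_BIN = 0b1111_1111_1111_1111_1111_1111_1111_1111;
    // 2147483648 is legal here only because it is the operand of unary minus.
    static final int MIN_DEC = -2147483648;

    public static void main(String[] args) {
        System.out.println(MAX_HEX == Integer.MAX_VALUE);
        System.out.println(MIN_HEX == Integer.MIN_VALUE);
        System.out.println(MINUS_ONE_HEX);
    }
}
```

Hexadecimal, octal, and binary literals denote bit patterns, which is why they can yield negative values without a minus sign, whereas a decimal literal on its own is never negative.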

  The largest decimal literal of type long is 9223372036854775808L (2^63).

  All decimal literals from 0L to 9223372036854775807L may appear anywhere a long literal may appear.

  It is a compile-time error if a decimal literal of type long is larger than 9223372036854775808L (2^63), or if the decimal literal 9223372036854775808L appears anywhere other than as the operand of the unary minus operator (§15.15.4).

  The largest positive hexadecimal, octal, and binary literals of type long - each of which represents the decimal value 9223372036854775807L (2^63-1) - are respectively:

  0x7fff_ffff_ffff_ffffL,
  07_7777_7777_7777_7777_7777L, and
  0b0111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111L

  The most negative hexadecimal, octal, and binary literals of type long - each of which represents the decimal value -9223372036854775808L (-2^63) - are respectively:

  0x8000_0000_0000_0000L,
  010_0000_0000_0000_0000_0000L, and
  0b1000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000L

  The following hexadecimal, octal, and binary literals represent the decimal value -1L:

  0xffff_ffff_ffff_ffffL,
  017_7777_7777_7777_7777_7777L, and
  0b1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111L

  It is a compile-time error if a hexadecimal, octal, or binary long literal does not fit in 64 bits.
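The type suffix matters for more than range: the same digits denote different values as an int literal and as a long literal. The sketch below uses hypothetical names and contrasts 0xffff_ffff with 0xffff_ffffL alongside the long boundary literals from the text.

```java
final class LongLiteralLimits {
    static final long MAX_HEX = 0x7fff_ffff_ffff_ffffL;
    static final long MAX_OCT = 07_7777_7777_7777_7777_7777L;
    static final long MIN_HEX = 0x8000_0000_0000_0000L;
    static final long MINUS_ONE_OCT = 017_7777_7777_7777_7777_7777L;
    static final long MIN_DEC = -9223372036854775808L;

    // An int literal with all 32 bits set is -1; widening to long keeps it -1.
    static final long WIDENED_INT = 0xffff_ffff;
    // With the L suffix the same digits are a 64-bit literal with the upper 32 bits clear.
    static final long LONG_LITERAL = 0xffff_ffffL;

    public static void main(String[] args) {
        System.out.println(WIDENED_INT + " vs " + LONG_LITERAL);
    }
}
```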

  Examples of int literals:

    0   2   0372   0xDada_Cafe   1996   0x00_FF__00_FF

  Examples of long literals:

    0l   0777L   0x100000000L   2_147_483_648L   0xC0B0L

 

3.10.2 Floating-Point Literals

  A floating-point literal has the following parts: a whole-number part, a decimal or hexadecimal point (represented by an ASCII period character), a fraction part, an exponent, and a type suffix.

  A floating-point literal may be expressed in decimal (base 10) or hexadecimal (base 16).

  For decimal floating-point literals, at least one digit (in either the whole number or the fraction part) and either a decimal point, an exponent, or a float type suffix are required. All other parts are optional. The exponent, if present, is indicated by the ASCII letter e or E followed by an optionally signed integer.

  For hexadecimal floating-point literals, at least one digit is required (in either the whole number or the fraction part), and the exponent is mandatory, and the float type suffix is optional. The exponent is indicated by the ASCII letter p or P followed by an optionally signed integer.

  Underscores are allowed as separators between digits that denote the whole-number part, and between digits that denote the fraction part, and between digits that denote the exponent.
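The forms described above can be tried out as constants. This is an illustrative sketch with invented names. In a hexadecimal floating-point literal the significand is written in base 16 and the exponent after p is a power of two written in decimal, so 0x1.8p1 is 1.5 times 2.

```java
final class FloatLiteralForms {
    static final double EXPONENT_ONLY = 1e1;     // digits plus exponent, no decimal point
    static final float SUFFIX_ONLY = 2.f;        // whole-number part, point, float suffix
    static final double FRACTION_ONLY = .5;      // no whole-number part
    static final double HEX_WHOLE = 0x1p3;       // 1 times 2 to the power 3
    static final double HEX_FRACTION = 0x1.8p1;  // 1.5 times 2 to the power 1
    static final double HEX_NEGATIVE_EXP = 0xAp-2; // 10 divided by 4
    static final double GROUPED = 1_000.5;       // underscore between digits of the whole-number part

    public static void main(String[] args) {
        System.out.println(HEX_WHOLE + " " + HEX_FRACTION + " " + HEX_NEGATIVE_EXP);
    }
}
```

All values chosen here are exactly representable in binary floating point, so they can be compared with == without rounding concerns.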

  FloatingPointLiteral:
    DecimalFloatingPointLiteral
    HexadecimalFloatingPointLiteral

  DecimalFloatingPointLiteral:
    Digits . Digits_opt ExponentPart_opt FloatTypeSuffix_opt
    . Digits ExponentPart_opt FloatTypeSuffix_opt
    Digits ExponentPart FloatTypeSuffix_opt
    Digits ExponentPart_opt FloatTypeSuffix

  ExponentPart:
    ExponentIndicator SignedInteger

  ExponentIndicator: one of
    e E

  SignedInteger:
    Sign_opt Digits

  Sign: one of
    + -

  FloatTypeSuffix: one of
    f F d D

  HexadecimalFloatingPointLiteral:
    HexSignificand BinaryExponent FloatTypeSuffix_opt

  HexSignificand:
    HexNumeral
    HexNumeral .
    0x HexDigits_opt . HexDigits
    0X HexDigits_opt . HexDigits

  BinaryExponent:
    BinaryExponentIndicator SignedInteger

  BinaryExponentIndicator: one of
    p P

  A floating-point literal is of type float if it is suffixed with an ASCII letter F or f; otherwise its type is double and it can optionally be suffixed with an ASCII letter D or d.

  The elements of the types float and double are those values that can be represented using the IEEE 754 32-bit single-precision and 64-bit double-precision binary floating-point formats, respectively.

  The largest positive finite literal of type float is 3.4028235e38f.

  The smallest positive finite non-zero literal of type float is 1.40e-45f.

  The largest positive finite literal of type double is 1.7976931348623157e308.

  The smallest positive finite non-zero literal of type double is 4.9e-324.

  It is a compile-time error if a non-zero floating-point literal is too large, so that on rounded conversion to its internal representation, it becomes an IEEE 754 infinity.

  A program can represent infinities without producing a compile-time error by using constant expressions such as 1f/0f or -1d/0d or by using the predefined constants POSITIVE_INFINITY and NEGATIVE_INFINITY of the classes Float and Double.

  It is a compile-time error if a non-zero floating-point literal is too small, so that, on rounded conversion to its internal representation, it becomes a zero.

  A compile-time error does not occur if a non-zero floating-point literal has a small value that, on rounded conversion to its internal representation, becomes a non-zero denormalized number.

  Predefined constants representing Not-a-Number values are defined in the classes Float and Double as Float.NaN and Double.NaN.
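The limits and special values above line up with constants in Float and Double. The sketch below, with made-up names, shows the extreme literals from the text and the constant expressions that yield infinities.

```java
final class FloatEdgeCases {
    static final float LARGEST_FLOAT = 3.4028235e38f;
    static final float SMALLEST_FLOAT = 1.40e-45f;   // rounds to the smallest denormalized float
    static final double LARGEST_DOUBLE = 1.7976931348623157e308;
    static final double SMALLEST_DOUBLE = 4.9e-324;

    // Infinities come from constant expressions, not from literals.
    static final float POSITIVE_INFINITY = 1f / 0f;
    static final double NEGATIVE_INFINITY = -1d / 0d;

    public static void main(String[] args) {
        System.out.println(LARGEST_FLOAT == Float.MAX_VALUE);
        System.out.println(SMALLEST_FLOAT == Float.MIN_VALUE);
        System.out.println(POSITIVE_INFINITY + " " + NEGATIVE_INFINITY);
        // NaN is not equal to anything, itself included, so test it with isNaN.
        System.out.println(Float.isNaN(Float.NaN));
    }
}
```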

  Examples of float literals:

    1e1f   2.f   .3f   0f   3.14f   6.022137e+23f

  Examples of double literals:

    1e1   2.   .3   0.0   3.14   1e-9d   1e137

 

3.10.3 Boolean Literals

  The boolean type has two values, represented by the boolean literals true and false, formed from ASCII letters.

  BooleanLiteral: one of
    true false

  A boolean literal is always of type boolean.

  

3.10.4 Character Literals

  A character literal is expressed as a character or an escape sequence (§3.10.6), enclosed in ASCII single quotes. (The single-quote, or apostrophe, character is \u0027.)

  CharacterLiteral:
    ' SingleCharacter '
    ' EscapeSequence '

  SingleCharacter:
    InputCharacter but not ' or \

  See §3.10.6 for the definition of EscapeSequence.

  Character literals can only represent UTF-16 code units (§3.1), i.e., they are limited to values from \u0000 to \uffff. Supplementary characters must be represented either as a surrogate pair within a char sequence, or as an integer, depending on the API they are used with.

  A character literal is always of type char.

  It is a compile-time error for the character following the SingleCharacter or EscapeSequence to be other than a '.

  It is a compile-time error for a line terminator (§3.4) to appear after the opening ' and before the closing '.

  As specified in §3.4, the characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator.

  The following are examples of char literals:

  'a'
  '%'
  '\t'
  '\\'
  '\''
  '\u03a9'
  '\uFFFF'
  '\177'
  'Ω'

  Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (§3.3) and the linefeed becomes a LineTerminator in step 2 (§3.4), and so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n' (§3.10.6). Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'.
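The recommended spellings can be checked against their numeric values. The class below is an invented illustration; its comments avoid writing the problematic Unicode escapes, because escape processing applies inside comments as well.

```java
final class CharLiteralDemo {
    static final char LINE_FEED = '\n';        // use this, not the Unicode escape for U+000A
    static final char CARRIAGE_RETURN = '\r';  // use this, not the Unicode escape for U+000D
    static final char OMEGA = '\u03a9';        // a Unicode escape is fine for other characters
    static final char DELETE = '\177';         // octal escape, value 127

    public static void main(String[] args) {
        System.out.println((int) LINE_FEED);
        System.out.println((int) CARRIAGE_RETURN);
        System.out.println(Integer.toHexString(OMEGA));
        System.out.println((int) DELETE);
    }
}
```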

  In C and C++, a character literal may contain representations of more than one character, but the value of such a character literal is implementation-defined. In the Java programming language, a character literal always represents exactly one character.

  

3.10.5 String Literals

  A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.

  StringLiteral:
    " StringCharacters_opt "

  StringCharacters:
    StringCharacter
    StringCharacters StringCharacter

  StringCharacter:
    InputCharacter but not " or \
    EscapeSequence

  See §3.10.6 for the definition of EscapeSequence.

  A string literal is always of type String (§4.3.3).

  It is a compile-time error for a line terminator to appear after the opening " and before the closing matching ".

  As specified in §3.4, the characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator.

  A long string literal can always be broken up into shorter pieces and written as a (possibly parenthesized) expression using the string concatenation operator + (§15.18.1).

 

  The following are examples of string literals:

  ""                    // the empty string
  "\""                  // a string containing " alone
  "This is a string"    // a string containing 16 characters
  "This is a " +        // actually a string-valued constant expression,
      "two-line string" // formed from two string literals

  A string literal is a reference to an instance of class String (§4.3.1, §4.3.3).

  Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.

Example 3.10.5-1. String Literals

  The program consisting of the compilation unit (§7.3):

  package testPackage;

  class Test {
      public static void main(String[] args) {
          String hello = "Hello", lo = "lo";
          // Each comparison tests reference identity, not string contents.
          System.out.print((hello == "Hello") + " ");
          System.out.print((Other.hello == hello) + " ");
          System.out.print((other.Other.hello == hello) + " ");
          System.out.print((hello == ("Hel" + "lo")) + " ");
          System.out.print((hello == ("Hel" + lo)) + " ");
          System.out.println(hello == ("Hel" + lo).intern());
      }
  }

  class Other { static String hello = "Hello"; }

and the compilation unit:

  package other;

  public class Other { public static String hello = "Hello"; }

produces the output:

  true true true true false true

This example illustrates six points:

  1> Literal strings within the same class (§8) in the same package (§7) represent references to the same String object (§4.3.1).

  2> Literal strings within different classes in the same package represent references to the same String object.

  3> Literal strings within different classes in different packages likewise represent references to the same String object.

  4> Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.

  5> Strings computed by concatenation at run time are newly created and therefore distinct.

  6> The result of explicitly interning a computed string is the same string as any pre-existing literal string with the same contents.

  

3.10.6 Escape Sequences for Character and String Literals

  The character and string escape sequences allow for the representation of some nongraphic characters as well as the single quote, double quote, and backslash characters in character literals (§3.10.4) and string literals (§3.10.5).

  EscapeSequence:
    \b             /* \u0008: backspace BS */
    \t             /* \u0009: horizontal tab HT */
    \n             /* \u000a: linefeed LF */
    \f             /* \u000c: form feed FF */
    \r             /* \u000d: carriage return CR */
    \"             /* \u0022: double quote " */
    \'             /* \u0027: single quote ' */
    \\             /* \u005c: backslash \ */
    OctalEscape    /* \u0000 to \u00ff: from octal value */

  OctalEscape:
    \ OctalDigit
    \ OctalDigit OctalDigit
    \ ZeroToThree OctalDigit OctalDigit

  OctalDigit: one of
    0 1 2 3 4 5 6 7

  ZeroToThree: one of
    0 1 2 3

  It is a compile-time error if the character following a backslash in an escape is not an ASCII b, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7. The Unicode escape \u is processed earlier (§3.3).

  Octal escapes are provided for compatibility with C, but can express only Unicode values \u0000 through \u00FF, so Unicode escapes are usually preferred.
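The OctalEscape production limits three-digit escapes to a first digit of 0 through 3, which determines where an octal escape ends. The sketch below uses invented names to show the named escapes together with two strings in which a trailing digit is not part of the escape.

```java
final class EscapeDemo {
    // Tab, double quote, single quote, backslash: four characters in total.
    static final String NAMED = "\t\"\'\\";

    static final char OCTAL_A = '\101';    // octal 101 is 65, the letter A
    static final char OCTAL_MAX = '\377';  // octal 377 is 255, the largest octal escape

    // Three digits are consumed because the first digit is in the range 0 to 3:
    // octal 123 is the letter S, followed by the ordinary character 4.
    static final String THREE_DIGITS = "\1234";

    // Only two digits are consumed because the first digit is 4:
    // octal 47 is the apostrophe, followed by the ordinary character 7.
    static final String TWO_DIGITS = "\477";

    public static void main(String[] args) {
        System.out.println(OCTAL_A + " " + THREE_DIGITS + " " + TWO_DIGITS);
    }
}
```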

  

3.10.7 The Null Literal

  The null type has one value, the null reference, represented by the null literal null, which is formed from ASCII characters.

  NullLiteral:
    null

  A null literal is always of the null type.

  

3.11 Separators

  Nine ASCII characters are the separators (punctuators).

  Separator: one of
    (    )    {    }    [    ]    ;    ,    .

 

3.12 Operators

 

  37 tokens are the operators, formed from ASCII characters.

  Operator: one of

    =  >  <  !  ~  ?  :

    ==  <=  >=  !=  &&  ||  ++  --

    +  -  *  /  &  |  ^  %  <<  >>  >>>

    +=  -=  *=  /=  &=  |=  ^=  %=  <<=  >>=  >>>=
Posted on 2011-11-18 20:09 by 评评