INTERNATIONAL STANDARD ISO/IEC 14882:2003(E)
Programming languages – C++

1 General [intro]
1.1 Scope [intro.scope]
1 This International Standard specifies requirements for implementations of the C++ programming language.
The first such requirement is that they implement the language, and so this International Standard also
defines C++. Other requirements and relaxations of the first requirement appear at various places within this
International Standard.
2 C++ is a general purpose programming language based on the C programming language as described in
ISO/IEC 9899:1990 Programming languages – C (1.2). In addition to the facilities provided by C, C++
provides additional data types, classes, templates, exceptions, namespaces, inline functions, operator overloading,
function name overloading, references, free store management operators, and additional library
facilities.

1.2 Normative references [intro.refs]
1 The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO/IEC 2382 (all parts), Information technology – Vocabulary
ISO/IEC 9899:1999, Programming languages – C
ISO/IEC 10646-1:2000, Information technology – Universal Multiple-Octet Coded Character Set
(UCS) – Part 1: Architecture and Basic Multilingual Plane
2 The library described in clause 7 of ISO/IEC 9899:1990 and clause 7 of ISO/IEC 9899/Amd.1:1995 is hereinafter
called the Standard C Library.1)
1.3 Terms and definitions [intro.defs]
1 For the purposes of this document, the definitions given in ISO/IEC 2382 and the following apply.
17.1 defines additional terms that are used only in clauses 17 through 27.
2 Terms that are used only in a small portion of this International Standard are defined where they are used
and italicized where they are defined.
1.3.1 argument [defns.argument]
an expression in the comma-separated list bounded by the parentheses in a function call expression, a
sequence of preprocessing tokens in the comma-separated list bounded by the parentheses in a function-like
macro invocation, the operand of throw, or an expression, type-id or template-name in the comma-separated
list bounded by the angle brackets in a template instantiation. Also known as an actual argument
or actual parameter.
__________________
1) With the qualifications noted in clauses 17 through 27, and in C.2, the Standard C library is a subset of the Standard C++ library.
Licensed to C & Systems Programmers Association/Kaan ASLAN
ANSI Store order #X116767 Downloaded: 1/25/2004 5:51:06 PM ET
Single user license only. Copying and networking prohibited.
1.3.2 diagnostic message [defns.diagnostic]
a message belonging to an implementation-defined subset of the implementation’s output messages.
1.3.3 dynamic type [defns.dynamic.type]
the type of the most derived object (1.8) to which the lvalue denoted by an lvalue expression refers. [Example:
if a pointer (8.3.1) p whose static type is “pointer to class B” is pointing to an object of class D, derived
from B (clause 10), the dynamic type of the expression *p is “D.” References (8.3.2) are treated similarly. ]
The dynamic type of an rvalue expression is its static type.
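A minimal, non-normative sketch of the distinction between static and dynamic type follows. The classes B and D mirror the example above; the virtual destructor (added so that typeid examines the object rather than the static type) and the helper names dynamic_type_is_D and check_dynamic_type are assumptions introduced only for illustration.

```cpp
#include <cassert>
#include <typeinfo>

// B and D mirror the classes named in the example. The virtual destructor is
// an assumption: it makes B polymorphic, so typeid(*p) inspects the object.
struct B { virtual ~B() {} };
struct D : B {};

// Hypothetical helper: reports whether the dynamic type of *p is D.
bool dynamic_type_is_D(B* p) {
    return typeid(*p) == typeid(D);
}

// Hypothetical self-check exercising the helper.
void check_dynamic_type() {
    D d;
    B b;
    B* p = &d;                       // static type of p: "pointer to B"
    assert(dynamic_type_is_D(p));    // dynamic type of *p is D
    assert(!dynamic_type_is_D(&b));  // here the dynamic type is B itself
}
```

In both calls the static type of the expression *p is B; only the dynamic type differs.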
1.3.4 ill-formed program [defns.ill.formed]
input to a C++ implementation that is not a well-formed program (1.3.14).
1.3.5 implementation-defined behavior [defns.impl.defined]
behavior, for a well-formed program construct and correct data, that depends on the implementation and
that each implementation shall document.
1.3.6 implementation limits [defns.impl.limits]
restrictions imposed upon programs by the implementation.
1.3.7 locale-specific behavior [defns.locale.specific]
behavior that depends on local conventions of nationality, culture, and language that each implementation
shall document.
1.3.8 multibyte character [defns.multibyte]
a sequence of one or more bytes representing a member of the extended character set of either the source or
the execution environment. The extended character set is a superset of the basic character set (2.2).
1.3.9 parameter [defns.parameter]
an object or reference declared as part of a function declaration or definition, or in the catch clause of an
exception handler, that acquires a value on entry to the function or handler; an identifier from the comma-separated
list bounded by the parentheses immediately following the macro name in a function-like macro
definition; or a template-parameter. Parameters are also known as formal arguments or formal parameters.
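The difference between a parameter (1.3.9) and an argument (1.3.1) can be sketched as follows. This is a non-normative illustration; the names twice, ADD, pick and parameter_demo are hypothetical and do not come from this International Standard.

```cpp
#include <cassert>

// 'x' is a parameter of the hypothetical function twice().
int twice(int x) { return 2 * x; }

// 'a' and 'b' are parameters of the hypothetical function-like macro ADD.
#define ADD(a, b) ((a) + (b))

// 'T' is a template-parameter of the hypothetical function template pick().
template <class T>
T pick(T first, T /*second*/) { return first; }

int parameter_demo() {
    // The expression 3, the token sequences 1 and 2, the type-id long, and
    // the expressions 4L and 5L are all arguments.
    int result = twice(3) + ADD(1, 2) + static_cast<int>(pick<long>(4L, 5L));
    assert(result == 6 + 3 + 4);
    return result;
}
```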
1.3.10 signature [defns.signature]
the information about a function that participates in overload resolution (13.3): the types of its parameters
and, if the function is a class member, the cv-qualifiers (if any) on the function itself and the class in which
the member function is declared.2) The signature of a function template specialization includes the types of
its template arguments (14.5.5.1).
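As a non-normative sketch of what a signature covers, the hypothetical overloads below differ in a parameter type or in the cv-qualifier on a member function. The names kind, Probe and signature_demo are assumptions made for this illustration.

```cpp
#include <cassert>

// Two hypothetical overloads that differ in a parameter type, and therefore
// in signature.
int kind(int)    { return 1; }
int kind(double) { return 2; }

// For member functions, the cv-qualifiers on the function itself take part.
struct Probe {
    int which()       { return 3; }
    int which() const { return 4; }
};

// A further declaration such as 'double kind(int);' would be ill-formed here:
// it differs from kind(int) only in its return type, which is not part of
// the signature.

int signature_demo() {
    Probe p;
    const Probe cp = Probe();
    int result = kind(0) * 1000 + kind(0.0) * 100 + p.which() * 10 + cp.which();
    assert(result == 1234);
    return result;
}
```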
1.3.11 static type [defns.static.type]
the type of an expression (3.9), which type results from analysis of the program without considering execution
semantics. The static type of an expression depends only on the form of the program in which the
expression appears, and does not change while the program is executing.
1.3.12 undefined behavior [defns.undefined]
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this
International Standard imposes no requirements. Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined
behavior ranges from ignoring the situation completely with unpredictable results, to behaving during
translation or program execution in a documented manner characteristic of the environment (with or without
the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a
diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are
__________________
2) Function signatures do not include return type, because that does not participate in overload resolution.
required to be diagnosed. ]
1.3.13 unspecified behavior [defns.unspecified]
behavior, for a well-formed program construct and correct data, that depends on the implementation. The
implementation is not required to document which behavior occurs. [Note: usually, the range of possible
behaviors is delineated by this International Standard. ]
1.3.14 well-formed program [defns.well.formed]
a C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition
Rule (3.2).
1.4 Implementation compliance [intro.compliance]
1 The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard
except for those rules containing an explicit notation that “no diagnostic is required” or which are described
as resulting in “undefined behavior.”
2 Although this International Standard states only requirements on C++ implementations, those requirements
are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution
of programs. Such requirements have the following meaning:
— If a program contains no violations of the rules in this International Standard, a conforming implementation
shall, within its resource limits, accept and correctly execute3) that program.
— If a program contains a violation of any diagnosable rule, a conforming implementation shall issue at
least one diagnostic message, except that
— If a program contains a violation of a rule for which no diagnostic is required, this International Standard
places no requirement on implementations with respect to that program.
3 For classes and class templates, the library clauses specify partial definitions. Private members (clause 11)
are not specified, but each implementation shall supply them to complete the definitions according to the
description in the library clauses.
4 For functions, function templates, objects, and values, the library clauses specify declarations. Implementations
shall supply definitions consistent with the descriptions in the library clauses.
5 The names defined in the library have namespace scope (7.3). A C++ translation unit (2.1) obtains access to
these names by including the appropriate standard library header (16.2).
6 The templates, classes, functions, and objects in the library have external linkage (3.5). The implementation
provides definitions for standard library entities, as necessary, while combining translation units to
form a complete C++ program (2.1).
7 Two kinds of implementations are defined: hosted and freestanding. For a hosted implementation, this
International Standard defines the set of available libraries. A freestanding implementation is one in which
execution may take place without the benefit of an operating system, and has an implementation-defined set
of libraries that includes certain language-support libraries (17.4.1.3).
8 A conforming implementation may have extensions (including additional library functions), provided they
do not alter the behavior of any well-formed program. Implementations are required to diagnose programs
that use such extensions that are ill-formed according to this International Standard. Having done so, however,
they can compile and execute such programs.
__________________
3) “Correct execution” can include undefined behavior, depending on the data being processed; see 1.3 and 1.9.
1.5 Structure of this International Standard [intro.structure]
1 Clauses 2 through 16 describe the C++ programming language. That description includes detailed syntactic
specifications in a form described in 1.6. For convenience, Annex A repeats all such syntactic specifications.
2 Clauses 17 through 27 (the library clauses) describe the Standard C++ library, which provides definitions
for the following kinds of entities: macros (16.3), values (clause 3), types (8.1, 8.3), templates (clause 14),
classes (clause 9), functions (8.3.5), and objects (clause 7).
3 Annex B recommends lower bounds on the capacity of conforming implementations.
4 Annex C summarizes the evolution of C++ since its first published description, and explains in detail the
differences between C++ and C. Certain features of C++ exist solely for compatibility purposes; Annex D
describes those features.
5 Finally, Annex E says what characters are valid in universal-character names in C++ identifiers (2.10).
6 Throughout this International Standard, each example is introduced by “[Example:” and terminated by “]”.
Each note is introduced by “[Note:” and terminated by “]”. Examples and notes may be nested.
1.6 Syntax notation [syntax]
1 In the syntax notation used in this International Standard, syntactic categories are indicated by italic type,
and literal words and characters in constant width type. Alternatives are listed on separate lines
except in a few cases where a long set of alternatives is presented on one line, marked by the phrase “one
of.” An optional terminal or nonterminal symbol is indicated by the subscript “opt,” so
{ expression_opt }
indicates an optional expression enclosed in braces.
2 Names for syntactic categories have generally been chosen according to the following rules:
X-name is a use of an identifier in a context that determines its meaning (e.g. class-name, typedef-name).
X-id is an identifier with no context-dependent meaning (e.g. qualified-id).
X-seq is one or more X’s without intervening delimiters (e.g. declaration-seq is a sequence of declarations).
X-list is one or more X’s separated by intervening commas (e.g. expression-list is a sequence of expressions
separated by commas).
1.7 The C++ memory model [intro.memory]
1 The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain
any member of the basic execution character set and is composed of a contiguous sequence of bits, the
number of which is implementation-defined. The least significant bit is called the low-order bit; the most
significant bit is called the high-order bit. The memory available to a C++ program consists of one or more
sequences of contiguous bytes. Every byte has a unique address.
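The implementation-defined width of a byte can be inspected through the CHAR_BIT macro of the header <climits>. The sketch below is non-normative. The helper names bits_per_byte, bits_in and check_byte_model are hypothetical, and the lower bound of 8 comes from the limits of the Standard C library rather than from the paragraph above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Number of bits in a byte on this implementation (implementation-defined).
int bits_per_byte() { return CHAR_BIT; }

// Hypothetical helper: number of bits in the storage occupied by a T.
template <class T>
std::size_t bits_in() { return sizeof(T) * CHAR_BIT; }

// Hypothetical self-check of the properties relied on here.
void check_byte_model() {
    assert(sizeof(char) == 1);     // sizeof counts bytes; a char is one byte
    assert(bits_per_byte() >= 8);  // minimum required of CHAR_BIT
    assert(bits_in<char>() == static_cast<std::size_t>(CHAR_BIT));
}
```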
2 [Note: the representation of types is described in 3.9. ]
1.8 The C++ object model [intro.object]
1 The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is a
region of storage. [Note: A function is not an object, regardless of whether or not it occupies storage in the
way that objects do. ] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the
implementation (12.2) when needed. The properties of an object are determined when the object is created.
An object can have a name (clause 3). An object has a storage duration (3.7) which influences its lifetime
(3.8). An object has a type (3.9). The term object type refers to the type with which the object is created.
Some objects are polymorphic (10.3); the implementation generates information associated with each such
object that makes it possible to determine that object’s type during program execution. For other objects,
the interpretation of the values found therein is determined by the type of the expressions (clause 5) used to
access them.
2 Objects can contain other objects, called sub-objects. A sub-object can be a member sub-object (9.2), a
base class sub-object (clause 10), or an array element. An object that is not a sub-object of any other object
is called a complete object.
3 For every object x, there is some object called the complete object of x, determined as follows:
— If x is a complete object, then x is the complete object of x.
— Otherwise, the complete object of x is the complete object of the (unique) object that contains x.
4 If a complete object, a data member (9.2), or an array element is of class type, its type is considered the
most derived class, to distinguish it from the class type of any base class subobject; an object of a most
derived class type is called a most derived object.
5 Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more
bytes of storage. Base class sub-objects may have zero size. An object of POD4) type (3.9) shall occupy
contiguous bytes of storage.
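The size rules for most derived objects and base class sub-objects can be observed with sizeof. The following is a non-normative sketch; the types Empty and Holder and the helper functions are hypothetical, and whether an empty base class actually occupies no storage is left to the implementation.

```cpp
#include <cassert>
#include <cstddef>

struct Empty {};                       // no data members
struct Holder : Empty { int value; };  // Empty appears as a base class sub-object

// A most derived object of type Empty still occupies at least one byte.
std::size_t empty_size() { return sizeof(Empty); }

// Whether the Empty base sub-object takes no extra room inside Holder is up
// to the implementation; this only reports what the current one does.
bool base_takes_no_room() { return sizeof(Holder) == sizeof(int); }

// Hypothetical self-check of the guaranteed properties only.
void check_object_sizes() {
    assert(empty_size() >= 1);
    assert(sizeof(Holder) >= sizeof(int));
}
```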
6 [Note: C++ provides a variety of built-in types and several ways of composing new types from existing
types (3.9). ]
1.9 Program execution [intro.execution]
1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract
machine. This International Standard places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming
implementations are required to emulate (only) the observable behavior of the abstract machine as
explained below.5)
2 Certain aspects and operations of the abstract machine are described in this International Standard as
implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract
machine. Each implementation shall include documentation describing its characteristics and behavior in
these respects. Such documentation shall define the instance of the abstract machine that corresponds to
that implementation (referred to as the “corresponding instance” below).
3 Certain other aspects and operations of the abstract machine are described in this International Standard as
unspecified (for example, order of evaluation of arguments to a function). Where possible, this International
Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the
abstract machine. An instance of the abstract machine can thus have more than one possible execution
sequence for a given program and a given input.
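The unspecified order of evaluation of function arguments can be made visible by recording side effects. This is a non-normative sketch; eval_trace, note, add and argument_order are hypothetical names, and either recorded order corresponds to an allowable execution sequence.

```cpp
#include <cassert>
#include <string>

// Hypothetical trace of the order in which the arguments were evaluated.
std::string eval_trace;

int note(char tag, int value) {
    eval_trace += tag;   // side effect recording that this argument was evaluated
    return value;
}

int add(int x, int y) { return x + y; }

// Returns "ab" or "ba"; both are allowable execution sequences.
std::string argument_order() {
    eval_trace.clear();
    int sum = add(note('a', 1), note('b', 2));
    assert(sum == 3);    // the computed value is the same either way
    return eval_trace;
}
```

A program whose observable behavior depends on which of the two orders occurs is relying on unspecified behavior.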
4 Certain other operations are described in this International Standard as undefined (for example, the effect of
dereferencing the null pointer). [Note: this International Standard imposes no requirements on the behavior
of programs that contain undefined behavior. ]
5 A conforming implementation executing a well-formed program shall produce the same observable behavior
as one of the possible execution sequences of the corresponding instance of the abstract machine with
the same program and the same input. However, if any such execution sequence contains an undefined
operation, this International Standard places no requirement on the implementation executing that program
__________________
4) The acronym POD stands for “plain old data.”
5) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International
Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior
of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used
and that no side effects affecting the observable behavior of the program are produced.
with that input (not even with regard to operations preceding the first undefined operation).
6 The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and
calls to library I/O functions.6)
7 Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O
function, or calling a function that does any of those operations are all side effects, which are changes in the
state of the execution environment. Evaluation of an expression might produce side effects. At certain
specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place.7)
8 Once the execution of a function begins, no expressions from the calling function are evaluated until execution
of the called function has completed.8)
9 When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with
type other than volatile sig_atomic_t are unspecified, and the value of any object not of
volatile sig_atomic_t that is modified by the handler becomes undefined.
10 An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its
block. Such an object exists and retains its last-stored value during the execution of the block and while the
block is suspended (by a call of a function or receipt of a signal).
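The association of a separate instance with each entry into a block can be sketched with a recursive function. This is a non-normative illustration; sum_down is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical recursive function: every entry into the block creates a fresh
// instance of 'mine', which retains its value while the block is suspended
// by the nested call.
int sum_down(int n) {
    int mine = n;                              // this activation's own object
    int rest = (n > 0) ? sum_down(n - 1) : 0;  // block suspended during the call
    assert(mine == n);                         // untouched by deeper activations
    return mine + rest;
}
```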
11 The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that previous evaluations are complete and
subsequent evaluations have not yet occurred.
— At program termination, all data written into files shall be identical to one of the possible results that
execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place in such a fashion that prompting
messages actually appear prior to a program waiting for input. What constitutes an interactive device is
implementation-defined.
[Note: more stringent correspondences between abstract and actual semantics may be defined by each
implementation. ]
12 A full-expression is an expression that is not a subexpression of another expression. If a language construct
is defined to produce an implicit call of a function, a use of the language construct is considered to be an
expression for the purposes of this definition.
13 [Note: certain contexts in C++ cause the evaluation of a full-expression that results from a syntactic construct
other than expression (5.18). For example, in 8.5 one syntax for initializer is
( expression-list )
but the resulting construct is a function call upon a constructor function with expression-list as an argument
list; such a function call is a full-expression. For example, in 8.5, another syntax for initializer is
= initializer-clause
but again the resulting construct might be a function call upon a constructor function with one assignment-expression
as an argument; again, the function call is a full-expression. ]
__________________
6) An implementation can offer additional library I/O functions as an extension. Implementations that do so should treat calls to those
functions as “observable behavior” as well.
7) Note that some aspects of sequencing in the abstract machine are unspecified; the preceding restriction upon side effects applies to
that particular execution sequence in which the actual code is generated. Also note that when a call to a library I/O function returns,
the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) may not have completed
yet.
8) In other words, function executions do not interleave with each other.
14 [Note: the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions
(8.3.6) are considered to be created in the expression that calls the function, not the expression that
defines the default argument. ]
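The treatment of default argument expressions can be sketched as follows: the expression is evaluated as part of each call that omits the argument. This is a non-normative illustration; next_id, tag and default_argument_demo are hypothetical names.

```cpp
#include <cassert>

// Hypothetical counter: each call returns the next integer.
int next_id() {
    static int last = 0;
    return ++last;
}

// The default argument expression next_id() is evaluated at each call that
// omits the argument.
int tag(int id = next_id()) { return id; }

bool default_argument_demo() {
    int first  = tag();    // next_id() is evaluated as part of this expression
    int second = tag();    // and again here
    int given  = tag(40);  // argument supplied, so next_id() is not called
    int third  = tag();
    assert(given == 40);
    return second == first + 1 && third == second + 1;
}
```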
15 [Note: operators can be regrouped according to the usual mathematical rules only where the operators really
are associative or commutative.9) For example, in the following fragment
int a, b;
/*...*/
a = a + 32760 + b + 5;
the expression statement behaves exactly the same as
a = (((a + 32760) + b) + 5);
due to the associativity and precedence of these operators. Thus, the result of the sum (a + 32760) is
next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in
which overflows produce an exception and in which the range of values representable by an int is
[–32768,+32767], the implementation cannot rewrite this expression as
a = ((a + b) + 32765);
since if the values for a and b were, respectively, –32754 and –15, the sum a + b would produce an
exception while the original expression would not; nor can the expression be rewritten either as
a = ((a + 32765) + b);
or
a = (a + (b + 32765));
since the values for a and b might have been, respectively, 4 and –8 or –17 and 12. However on a machine
in which overflows do not produce an exception and in which the results of overflows are reversible, the
above expression statement can be rewritten by the implementation in any of the above ways because the
same result will occur. ]
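The arithmetic in the note above can be checked mechanically. The non-normative sketch below models the 16-bit int of the note with long values and reports, for each grouping, whether every intermediate sum stays within [–32768,+32767]. The function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical model of the int in the note: does v fit in [-32768, +32767]?
bool fits(long v) { return v >= -32768L && v <= 32767L; }

// Each function reports whether every intermediate sum of one grouping fits.
// long is at least 32 bits wide, so the sums themselves cannot overflow here.
bool as_written(long a, long b) {          // ((a + 32760) + b) + 5
    long s1 = a + 32760L, s2 = s1 + b, s3 = s2 + 5L;
    return fits(s1) && fits(s2) && fits(s3);
}
bool a_plus_b_first(long a, long b) {      // (a + b) + 32765
    long s1 = a + b, s2 = s1 + 32765L;
    return fits(s1) && fits(s2);
}
bool a_plus_const_first(long a, long b) {  // (a + 32765) + b
    long s1 = a + 32765L, s2 = s1 + b;
    return fits(s1) && fits(s2);
}
bool b_plus_const_first(long a, long b) {  // a + (b + 32765)
    long s1 = b + 32765L, s2 = a + s1;
    return fits(s1) && fits(s2);
}

// Hypothetical self-check using the three pairs of values from the note.
void check_regrouping() {
    assert(as_written(-32754L, -15L) && !a_plus_b_first(-32754L, -15L));
    assert(as_written(4L, -8L)       && !a_plus_const_first(4L, -8L));
    assert(as_written(-17L, 12L)     && !b_plus_const_first(-17L, 12L));
}
```

For each pair of values the grouping as written stays in range while one of the regrouped forms produces an out-of-range intermediate sum.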
16 There is a sequence point at the completion of evaluation of each full-expression10).
17 When calling a function (whether or not the function is inline), there is a sequence point after the evaluation
of all function arguments (if any) which takes place before execution of any expressions or statements in
the function body. There is also a sequence point after the copying of a returned value and before the execution
of any expressions outside the function11). Several contexts in C++ cause evaluation of a function
call, even though no corresponding function call syntax appears in the translation unit. [Example: evaluation
of a new expression invokes one or more allocation and constructor functions; see 5.3.4. For another
example, invocation of a conversion function (12.3.2) can arise in contexts in which no function call syntax
appears. ] The sequence points at function-entry and function-exit (as described above) are features of the
function calls as evaluated, whatever the syntax of the expression that calls the function might be.
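A function call that arises without function call syntax can be sketched with a conversion function. This is a non-normative illustration; the class Level, its members, and conversion_demo are hypothetical.

```cpp
#include <cassert>

// Hypothetical class whose conversion function counts its own invocations.
struct Level {
    int value;
    mutable int conversions;
    explicit Level(int v) : value(v), conversions(0) {}
    operator int() const { ++conversions; return value; }
};

int conversion_demo() {
    Level level(7);
    int copy = level;       // no call syntax, yet operator int() is called
    int sum  = level + 1;   // called again to supply an operand of built-in +
    assert(copy == 7 && sum == 8);
    return level.conversions;
}
```

Each of the two implicit calls is a function call as evaluated, with the sequence points at function-entry and function-exit described above.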
18 In the evaluation of each of the expressions
a && b
a || b
a ? b : c
a , b
using the built-in meaning of the operators in these expressions (5.14, 5.15, 5.16, 5.18), there is a sequence
__________________
9) Overloaded operators are never assumed to be associative or commutative.
10) As specified in 12.2, after the "end-of-full-expression" sequence point, a sequence of zero or more invocations of destructor functions
for temporary objects takes place, usually in reverse order of the construction of each temporary object.
11) The sequence point at the function return is not explicitly specified in ISO C, and can be considered redundant with sequence
points at full-expressions, but the extra clarity is important in C++. In C++, there are more ways in which a called function can terminate
its execution, such as the throw of an exception.
point after the evaluation of the first expression12).
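The sequence point after the first operand of the built-in &&, ?: and comma operators can be sketched as follows. This is a non-normative illustration; positive and sequence_demo are hypothetical names, and the skipping of the right operand of && when the left is false is specified in 5.14 rather than in this paragraph.

```cpp
#include <cassert>

// Hypothetical helper: the left operand of && is completely evaluated before
// the right one, and the right one is skipped when the left is false, so *p
// is read only for a non-null p.
bool positive(const int* p) { return p != 0 && *p > 0; }

int sequence_demo() {
    int i = 0;
    // The side effect of i++ is complete before the right operand reads i.
    bool both = (i++ == 0) && (i == 1);
    // Comma: the increment is complete before i is read as the result.
    int after = (i++, i);              // i is now 2
    // Conditional: the first operand's side effect precedes the chosen branch.
    int picked = (i++ == 2) ? i : -1;  // i is now 3
    assert(both && after == 2 && picked == 3);
    return (both ? 100 : 0) + after * 10 + picked;
}
```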
1.10 Acknowledgments [intro.ack]
1 The C++ programming language as described in this International Standard is based on the language as
described in Chapter R (Reference Manual) of Stroustrup: The C++ Programming Language (second edition,
Addison-Wesley Publishing Company, ISBN 0–201–53992–6, copyright 1991 AT&T). That, in
turn, is based on the C programming language as described in Appendix A of Kernighan and Ritchie: The C
Programming Language (Prentice-Hall, 1978, ISBN 0–13–110163–3, copyright 1978 AT&T).
2 Portions of the library clauses of this International Standard are based on work by P.J. Plauger, which was
published as The Draft Standard C++ Library (Prentice-Hall, ISBN 0–13–117003–1, copyright 1995 P.J.
Plauger).
3 All rights in these originals are reserved.
__________________
12) The operators indicated in this paragraph are the built-in operators, as described in clause 5. When one of these operators is overloaded
(clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation,
and the operands form an argument list, without an implied sequence point between them.
2 Lexical conventions [lex]
1 The text of the program is kept in units called source files in this International Standard. A source file
together with all the headers (17.4.1.2) and source files included (16.2) via the preprocessing directive
#include, less any source lines skipped by any of the conditional inclusion (16.1) preprocessing directives,
is called a translation unit. [Note: a C++ program need not all be translated at the same time. ]
2 [Note: previously translated translation units and instantiation units can be preserved individually or in
libraries. The separate translation units of a program communicate (3.5) by (for example) calls to functions
whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or
manipulation of data files. Translation units can be separately translated and then later linked to produce an
executable program (3.5). ]
2.1 Phases of translation [lex.phases]
1 The precedence among the syntax rules of translation is specified by the following phases.13)
1 Physical source file characters are mapped, in an implementation-defined manner, to the basic source
character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph
sequences (2.3) are replaced by corresponding single-character internal representations. Any source file
character not in the basic source character set (2.2) is replaced by the universal-character-name that designates
that character. (An implementation may use any internal encoding, so long as an actual
extended character encountered in the source file, and the same extended character expressed in the
source file as a universal-character-name (i.e. using the \uXXXX notation), are handled equivalently.)
2 Each instance of a new-line character and an immediately preceding backslash character is deleted,
splicing physical source lines to form logical source lines. If, as a result, a character sequence that
matches the syntax of a universal-character-name is produced, the behavior is undefined. If a source
file that is not empty does not end in a new-line character, or ends in a new-line character immediately
preceded by a backslash character, the behavior is undefined.
3 The source file is decomposed into preprocessing tokens (2.4) and sequences of white-space characters
(including comments). A source file shall not end in a partial preprocessing token or partial comment14).
Each comment is replaced by one space character. New-line characters are retained. Whether
each nonempty sequence of white-space characters other than new-line is retained or replaced by one
space character is implementation-defined. The process of dividing a source file’s characters into preprocessing
tokens is context-dependent. [Example: see the handling of < within a #include preprocessing
directive. ]
4 Preprocessing directives are executed and macro invocations are expanded. If a character sequence that
matches the syntax of a universal-character-name is produced by token concatenation (16.3.3), the
behavior is undefined. A #include preprocessing directive causes the named header or source file to
be processed from phase 1 through phase 4, recursively.
5 Each source character set member, escape sequence, or universal-character-name in character literals
and string literals is converted to a member of the execution character set (2.13.2, 2.13.4).
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
7 White-space characters separating tokens are no longer significant. Each preprocessing token is
__________________
13) Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.
14) A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating
sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a
source file ending with an unclosed /* comment.
converted into a token (2.6). The resulting tokens are syntactically and semantically analyzed and
translated. [Note: Source files, translation units and translated translation units need not necessarily be
stored as files, nor need there be any one-to-one correspondence between these entities and any external
representation. The description is conceptual only, and does not specify any particular implementation.
]
8 Translated translation units and instantiation units are combined as follows: [Note: some or all of these
may be supplied from a library. ] Each translated translation unit is examined to produce a list of
required instantiations. [Note: this may include instantiations which have been explicitly requested
(14.7.2). ] The definitions of the required templates are located. It is implementation-defined whether
the source of the translation units containing these definitions is required to be available. [Note: an
implementation could encode sufficient information into the translated translation unit so as to ensure
the source is not required here. ] All the required instantiations are performed to produce instantiation
units. [Note: these are similar to translated translation units, but contain no references to uninstantiated
templates and no template definitions. ] The program is ill-formed if any instantiation fails.
9 All external object and function references are resolved. Library components are linked to satisfy external
references to functions and objects not defined in the current translation. All such translator output
is collected into a program image which contains information needed for execution in its execution
environment.
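Two of the phases above have effects that can be observed directly in a program. The following is a minimal sketch, not part of this International Standard; the names separated_by_comment, greeting and check_phases are hypothetical. It relies on phase 3 replacing a comment by one space character and on phase 6 concatenating adjacent ordinary string literals.

```cpp
#include <cassert>
#include <cstring>

// Phase 3: the comment becomes one space, so "int" and "value" remain two
// separate tokens instead of merging into the identifier "intvalue".
inline int separated_by_comment() {
    int/* replaced by one space character */value = 7;
    return value;
}

// Phase 6: adjacent ordinary string literals are concatenated.
inline const char* greeting() {
    return "phase "   // first literal
           "six";     // second literal, joined to the first
}

// Hypothetical self-check for the two observations above.
inline void check_phases() {
    assert(separated_by_comment() == 7);
    assert(std::strcmp(greeting(), "phase six") == 0);
}
```

Calling check_phases() from any function is enough to exercise both cases.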
2.2 Character sets [lex.charset]
1 The basic source character set consists of 96 characters: the space character, the control characters representing
horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:15)
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
2 The universal-character-name construct provides a way to name other characters.
hex-quad:
hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
universal-character-name:
\u hex-quad
\U hex-quad hex-quad
The character designated by the universal-character-name \UNNNNNNNN is that character whose character
short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name
\uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal
value for a universal character name is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal
character name designates a character in the basic source character set, then the program is ill-formed.
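The two spellings of a universal-character-name can be compared in a short program. This is a sketch under the assumption that the implementation can encode U+00E9 in its execution wide-character set; the names short_form, long_form and check_ucn are hypothetical.

```cpp
#include <cassert>
#include <cwchar>

// U+00E9 written with the four-digit and the eight-digit form.
static const wchar_t short_form[] = L"\u00E9";
static const wchar_t long_form[]  = L"\U000000E9";

// Hypothetical self-check: both forms designate the same character.
inline void check_ucn() {
    // One character plus the terminating null wide character.
    assert(sizeof(short_form) / sizeof(short_form[0]) == 2);
    assert(std::wcscmp(short_form, long_form) == 0);
}
```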
3 The basic execution character set and the basic execution wide-character set shall each contain all the
members of the basic source character set, plus control characters representing alert, backspace, and carriage
return, plus a null character (respectively, null wide character), whose representation has all zero bits.
For each basic execution character set, the values of the members shall be non-negative and distinct from
one another. In both the source and execution basic character sets, the value of each character after 0 in the
above list of decimal digits shall be one greater than the value of the previous. The execution character set
and the execution wide-character set are supersets of the basic execution character set and the basic
__________________
15) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646
which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set
(described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic
source characters are represented in source files.
execution wide-character set, respectively. The values of the members of the execution character sets are
implementation-defined, and any additional members are locale-specific.
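Because the decimal digits are required to have consecutive values, a digit character can be converted to its numeric value by subtraction. A minimal sketch follows; digit_value and check_digits are hypothetical names. No such guarantee is made for letters, so the same technique is not portable for them.

```cpp
#include <cassert>

// Hypothetical helper: numeric value of a decimal digit character,
// or -1 if c is not a decimal digit.
inline int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Hypothetical self-check.
inline void check_digits() {
    assert(digit_value('0') == 0);
    assert(digit_value('7') == 7);
    assert(digit_value('x') == -1);
    assert('9' - '0' == 9);   // follows from the consecutive-value rule
}
```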
2.3 Trigraph sequences [lex.trigraph]
1 Before any other processing takes place, each occurrence of one of the following sequences of three characters
(“trigraph sequences”) is replaced by the single character indicated in Table 1.
Table 1—trigraph sequences
trigraph  replacement    trigraph  replacement    trigraph  replacement
  ??=         #            ??(         [            ??<         {
  ??/         \            ??)         ]            ??>         }
  ??'         ^            ??!         |            ??-         ~
2 [Example:
??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)
becomes
#define arraycheck(a,b) a[b] || b[a]
end example]
3 No other trigraph sequence exists. Each ? that does not begin one of the trigraphs listed above is not
changed.
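The replacement rule can be modelled at run time. The sketch below is not how any particular implementation works; trigraph_replacement, replace_trigraphs and check_trigraphs are hypothetical helpers. The test string is written with the \? escape so that the example itself contains no trigraphs, and it assumes nothing about whether the compiler used to build it performs trigraph replacement.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replacement for two question marks followed by c,
// or 0 when that sequence is not one of the nine trigraphs of Table 1.
inline char trigraph_replacement(char c) {
    switch (c) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return 0;
    }
}

// Scans left to right; a question mark that does not begin a trigraph
// is copied unchanged.
inline std::string replace_trigraphs(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (i + 2 < in.size() && in[i] == '?' && in[i + 1] == '?') {
            const char r = trigraph_replacement(in[i + 2]);
            if (r != 0) {
                out += r;
                i += 2;   // skip the rest of the trigraph
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Hypothetical self-check using the example from this subclause.
inline void check_trigraphs() {
    const std::string before =
        "?\?=define arraycheck(a,b) a?\?(b?\?) ?\?!?\?! b?\?(a?\?)";
    assert(replace_trigraphs(before) ==
           "#define arraycheck(a,b) a[b] || b[a]");
}
```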
2.4 Preprocessing tokens [lex.pptoken]
preprocessing-token:
header-name
identifier
pp-number
character-literal
string-literal
preprocessing-op-or-punc
each non-white-space character that cannot be one of the above
1 Each preprocessing token that is converted to a token (2.6) shall have the lexical form of a keyword, an
identifier, a literal, an operator, or a punctuator.
2 A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.
The categories of preprocessing token are: header names, identifiers, preprocessing numbers, character
literals, string literals, preprocessing-op-or-punc, and single non-white-space characters that do not lexically
match the other preprocessing token categories. If a ' or a " character matches the last category, the
behavior is undefined. Preprocessing tokens can be separated by white space; this consists of comments
(2.7), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As
described in clause 16, in certain circumstances during translation phase 4, white space (or the absence
thereof) serves as more than preprocessing token separation. White space can appear within a preprocessing
token only as part of a header name or between the quotation characters in a character literal or string
literal.
3 If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing
token is the longest sequence of characters that could constitute a preprocessing token, even if that would
cause further lexical analysis to fail.
4 [Example: The program fragment 1Ex is parsed as a preprocessing number token (one that is not a valid
floating or integer literal token), even though a parse as the pair of preprocessing tokens 1 and Ex might
produce a valid expression (for example, if Ex were a macro defined as +1). Similarly, the program fragment
1E1 is parsed as a preprocessing number (one that is a valid floating literal token), whether or not E is
a macro name. ]
5 [Example: The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y are of built-in
types, violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a
correct expression. ]
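The longest-match rule also applies where the result is well-formed. A minimal sketch (check_maximal_munch is a hypothetical name): x+++y is divided into x ++ + y, so x is incremented and y is not.

```cpp
#include <cassert>

// Hypothetical self-check for the longest-match rule.
inline void check_maximal_munch() {
    int x = 1, y = 2;
    // Tokenized as x ++ + y, that is (x++) + y, never as x + (++y).
    int r = x+++y;
    assert(r == 3);   // old value of x plus y
    assert(x == 2);   // x was incremented
    assert(y == 2);   // y is untouched
}
```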
2.5 Alternative tokens [lex.digraph]
1 Alternative token representations are provided for some operators and punctuators16).
2 In all respects of the language, each alternative token behaves the same, respectively, as its primary token,
except for its spelling17). The set of alternative tokens is defined in Table 2.
Table 2—alternative tokens
alternative  primary    alternative  primary    alternative  primary
   <%          {           and         &&          and_eq      &=
   %>          }           bitor       |           or_eq       |=
   <:          [           or          ||          xor_eq      ^=
   :>          ]           xor         ^           not         !
   %:          #           compl       ~           not_eq      !=
   %:%:        ##          bitand      &
2.6 Tokens [lex.token]
token:
identifier
keyword
literal
operator
punctuator
1 There are five kinds of tokens: identifiers, keywords, literals,18) operators, and other separators. Blanks,
horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “white space”), as described
below, are ignored except as they serve to separate tokens. [Note: Some white space is required to separate
otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic
characters. ]
2.7 Comments [lex.comment]
1 The characters /* start a comment, which terminates with the characters */. These comments do not nest.
The characters // start a comment, which terminates with the next new-line character. If there is a formfeed
or a vertical-tab character in such a comment, only white-space characters shall appear between it and
the new-line that terminates the comment; no diagnostic is required. [Note: The comment characters //,
/*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly,
the comment characters // and /* have no special meaning within a /* comment. ]
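The note above can be illustrated as follows. This is a minimal sketch; comment_demo and check_comments are hypothetical names. Some compilers may issue a warning for comment characters inside a comment, but the code is well-formed.

```cpp
#include <cassert>

inline int comment_demo() {
    int n = 1;
    // The sequence /* has no special meaning here, so no block comment opens.
    n += 1;   // still code, and therefore executed
    /* Likewise // is ordinary text here; the comment ends at the first */ n += 10;
    return n;
}

// Hypothetical self-check: both additions above are executed.
inline void check_comments() {
    assert(comment_demo() == 12);
}
```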
__________________
16) These include “digraphs” and additional reserved words. The term “digraph” (token consisting of two characters) is not perfectly
descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters.
Nonetheless, those alternative tokens that aren’t lexical keywords are colloquially known as “digraphs”.
17) Thus the “stringized” values (16.3.2) of [ and <: will be different, maintaining the source spelling, but the tokens can otherwise be
freely interchanged.
18) Literals include strings and character and numeric literals.
2.8 Header names [lex.header]
header-name:
<h-char-sequence>
"q-char-sequence"
h-char-sequence:
h-char
h-char-sequence h-char
h-char:
any member of the source character set except
new-line and >
q-char-sequence:
q-char
q-char-sequence q-char
q-char:
any member of the source character set except
new-line and "
1 Header name preprocessing tokens shall only appear within a #include preprocessing directive (16.2).
The sequences in both forms of header-names are mapped in an implementation-defined manner to headers
or to external source file names as specified in 16.2.
2 If either of the characters ' or \, or either of the character sequences /* or // appears in a q-char-sequence
or a h-char-sequence, or the character " appears in a h-char-sequence, the behavior is undefined.19)
2.9 Preprocessing numbers [lex.ppnumber]
pp-number:
digit
. digit
pp-number digit
pp-number nondigit
pp-number e sign
pp-number E sign
pp-number .
1 Preprocessing number tokens lexically include all integral literal tokens (2.13.1) and all floating literal
tokens (2.13.3).
2 A preprocessing number does not have a type or a value; it acquires both after a successful conversion (as
part of translation phase 7, 2.1) to an integral literal token or a floating literal token.
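Since a preprocessing number is only a lexical form until phase 7, it can be observed with the # operator before any conversion takes place. The sketch below is illustrative only; the macros Ex, STR_IMPL and STR and the function check_pp_number are hypothetical. It shows that 1Ex is a single preprocessing number, so a macro named Ex is not expanded inside it, and that 1.2.3 matches the pp-number grammar although it is not a valid literal.

```cpp
#include <cassert>
#include <cstring>

#define Ex +1                 // hypothetical macro
#define STR_IMPL(x) #x
#define STR(x) STR_IMPL(x)    // expands its argument, then stringizes it

// Hypothetical self-check.
inline void check_pp_number() {
    // 1Ex is one pp-number; the identifier Ex never appears as a token.
    assert(std::strcmp(STR(1Ex), "1Ex") == 0);
    // Separated by white space, 1 and Ex are two tokens and Ex is expanded.
    assert(std::strcmp(STR(1 Ex), "1 +1") == 0);
    // A pp-number need not be convertible to an integer or floating literal.
    assert(std::strcmp(STR(1.2.3), "1.2.3") == 0);
}
```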
2.10 Identifiers [lex.name]
identifier:
nondigit
identifier nondigit
identifier digit
__________________
19) Thus, sequences of characters that resemble escape sequences cause undefined behavior.
nondigit: one of
universal-character-name
_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
digit: one of
0 1 2 3 4 5 6 7 8 9
1 An identifier is an arbitrarily long sequence of letters and digits. Each universal-character-name in an identifier
shall designate a character whose encoding in ISO 10646 falls into one of the ranges specified in
Annex E. Upper- and lower-case letters are different. All characters are significant.20)
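A minimal sketch of the case rule follows; the variable names and check_identifier_case are hypothetical. Identifiers that differ only in the case of their letters name different entities.

```cpp
#include <cassert>

// Hypothetical self-check: three distinct identifiers in one scope.
inline void check_identifier_case() {
    int total = 1;
    int Total = 2;   // a different identifier from total
    int TOTAL = 3;   // and different again
    assert(total + Total + TOTAL == 6);
    assert(&total != &Total);   // distinct objects
}
```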
2 In addition, some identifiers are reserved for use by C++ implementations and standard libraries (17.4.3.1.2)
and shall not be used otherwise; no diagnostic is required.
2.11 Keywords [lex.key]
1 The identifiers shown in Table 3 are reserved for use as keywords (that is, they are unconditionally treated
as keywords in phase 7):
Table 3—keywords
asm         do            if                return       typedef
auto        double        inline            short        typeid
bool        dynamic_cast  int               signed       typename
break       else          long              sizeof       union
case        enum          mutable           static       unsigned
catch       explicit      namespace         static_cast  using
char        export        new               struct       virtual
class       extern        operator          switch       void
const       false         private           template     volatile
const_cast  float         protected         this         wchar_t
continue    for           public            throw        while
default     friend        register          true
delete      goto          reinterpret_cast  try
2 Furthermore, the alternative representations shown in Table 4 for certain operators and punctuators (2.5) are
reserved and shall not be used otherwise:
Table 4—alternative representations
and     and_eq  bitand  bitor   compl   not
not_eq  or      or_eq   xor     xor_eq
__________________
20) On systems in which linkers cannot accept extended characters, an encoding of the universal-character-name may be used in forming
valid external identifiers. For example, some otherwise unused character or sequence of characters may be used to encode the \u
in a universal-character-name. Extended characters may produce a long external identifier, but C++ does not place a translation limit on
significant characters for external identifiers. In C++, upper- and lower-case letters are considered different for all identifiers, including
external identifiers.
2.12 Operators and punctuators [lex.operators]
1 The lexical representation of C++ programs includes a number of preprocessing tokens which are used in
the syntax of the preprocessor or are converted into tokens for operators and punctuators:
preprocessing-op-or-punc: one of
{ } [ ] # ## ( )
<: :> <% %> %: %:%: ; : ...
new delete ? :: . .*
+ - * / % ^ & | ~
! = < > += -= *= /= %=
^= &= |= << >> >>= <<= == !=
<= >= && || ++ -- , ->* ->
and and_eq bitand bitor compl not not_eq
or or_eq xor xor_eq
Each preprocessing-op-or-punc is converted to a single token in translation phase 7 (2.1).
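The alternative spellings in the list above (for example <:, :>, <%, %>, and, not_eq) yield the same tokens as their primary counterparts. The sketch below writes one function twice; nonzero_at_primary and nonzero_at_alternative are hypothetical names, and a conforming implementation is assumed.

```cpp
#include <cassert>

// Written with primary tokens.
inline bool nonzero_at_primary(const int (&a)[3], int i) {
    return i >= 0 && i < 3 && a[i] != 0;
}

// The same function written with alternative tokens:
// <: :> for [ ], <% %> for { }, and for &&, not_eq for !=.
inline bool nonzero_at_alternative(const int (&a)<:3:>, int i) <%
    return i >= 0 and i < 3 and a<:i:> not_eq 0;
%>
```

Both functions are expected to give identical results for every argument, since only the spelling of the tokens differs.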
2.13 Literals [lex.literal]
1 There are several kinds of literals.21)
literal:
integer-literal
character-literal
floating-literal
string-literal
boolean-literal
2.13.1 Integer literals [lex.icon]
integer-literal:
decimal-literal integer-suffix(opt)
octal-literal integer-suffix(opt)
hexadecimal-literal integer-suffix(opt)
decimal-literal:
nonzero-digit
decimal-literal digit
octal-literal:
0
octal-literal octal-digit
hexadecimal-literal:
0x hexadecimal-digit
0X hexadecimal-digit
hexadecimal-literal hexadecimal-digit
nonzero-digit: one of
1 2 3 4 5 6 7 8 9
octal-digit: one of
0 1 2 3 4 5 6 7
__________________
21) The term “literal” generally designates, in this International Standard, those tokens that are called “constants” in ISO C.
hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
integer-suffix:
unsigned-suffix long-suffix(opt)
long-suffix unsigned-suffix(opt)
unsigned-suffix: one of
u U
long-suffix: one of
l L
1 An integer literal is a sequence of digits that has no period or exponent part. An integer literal may have a
prefix that specifies its base and a suffix that specifies its type. The lexically first digit of the sequence of
digits is the most significant. A decimal integer literal (base ten) begins with a digit other than 0 and consists
of a sequence of decimal digits. An octal integer literal (base eight) begins with the digit 0 and consists
of a sequence of octal digits.22) A hexadecimal integer literal (base sixteen) begins with 0x or 0X and
consists of a sequence of hexadecimal digits, which include the decimal digits and the letters a through f
and A through F with decimal values ten through fifteen. [Example: the number twelve can be written 12,
014, or 0XC. ]
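The example can be checked mechanically. A minimal sketch (check_bases is a hypothetical name):

```cpp
#include <cassert>

// Hypothetical self-check: one value, three bases.
inline void check_bases() {
    assert(12 == 014);    // octal: 1*8 + 4
    assert(12 == 0XC);    // hexadecimal: C is twelve
    assert(010 == 8);     // a leading 0 selects base eight
    assert(0x1F == 31);   // 1*16 + 15
}
```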
2 The type of an integer literal depends on its form, value, and suffix. If it is decimal and has no suffix, it has
the first of these types in which its value can be represented: int, long int; if the value cannot be represented
as a long int, the behavior is undefined. If it is octal or hexadecimal and has no suffix, it has the
first of these types in which its value can be represented: int, unsigned int, long int, unsigned
long int. If it is suffixed by u or U, its type is the first of these types in which its value can be represented:
unsigned int, unsigned long int. If it is suffixed by l or L, its type is the first of these
types in which its value can be represented: long int, unsigned long int. If it is suffixed by ul,
lu, uL, Lu, Ul, lU, UL, or LU, its type is unsigned long int.
3 A program is ill-formed if one of its translation units contains an integer literal that cannot be represented
by any of the allowed types.
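Overload resolution offers a simple way to observe the type selected by a suffix. The sketch below uses only small values, for which the rules above give the same answer on every implementation; integer_kind is a hypothetical helper, not a library facility.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helpers: one overload per candidate type named above.
inline const char* integer_kind(int)           { return "int"; }
inline const char* integer_kind(unsigned int)  { return "unsigned int"; }
inline const char* integer_kind(long)          { return "long int"; }
inline const char* integer_kind(unsigned long) { return "unsigned long int"; }

// Hypothetical self-check.
inline void check_integer_kinds() {
    assert(std::strcmp(integer_kind(12),   "int") == 0);
    assert(std::strcmp(integer_kind(014),  "int") == 0);
    assert(std::strcmp(integer_kind(12u),  "unsigned int") == 0);
    assert(std::strcmp(integer_kind(12L),  "long int") == 0);
    assert(std::strcmp(integer_kind(12UL), "unsigned long int") == 0);
    assert(std::strcmp(integer_kind(12lu), "unsigned long int") == 0);
}
```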
2.13.2 Character literals [lex.ccon]
character-literal:
'c-char-sequence'
L'c-char-sequence'
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except
the single-quote ', backslash \, or new-line character
escape-sequence
universal-character-name
__________________
22) The digits 8 and 9 are not octal digits.
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence: one of
\' \" \? \\
\a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
1 A character literal is one or more characters enclosed in single quotes, as in 'x', optionally preceded by
the letter L, as in L'x'. A character literal that does not begin with L is an ordinary character literal, also
referred to as a narrow-character literal. An ordinary character literal that contains a single c-char has type
char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter
literal has type int and implementation-defined value.
2 A character literal that begins with the letter L, such as L'x', is a wide-character literal. A wide-character
literal has type wchar_t.23) A wide-character literal containing a single c-char has value
equal to the numerical value of the encoding of the c-char in the execution wide-character set. The value of
a wide-character literal containing multiple c-chars is implementation-defined.
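The types of ordinary and wide-character literals can be observed through overload resolution. A minimal sketch; char_kind and check_char_kinds are hypothetical names. Multicharacter literals are left out because their value is implementation-defined.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helpers: one overload per type discussed above.
inline const char* char_kind(char)    { return "char"; }
inline const char* char_kind(wchar_t) { return "wchar_t"; }
inline const char* char_kind(int)     { return "int"; }

// Hypothetical self-check.
inline void check_char_kinds() {
    assert(std::strcmp(char_kind('x'),  "char") == 0);
    assert(std::strcmp(char_kind(L'x'), "wchar_t") == 0);
    assert(sizeof('x') == 1);                  // type char, not int
    assert(sizeof(L'x') == sizeof(wchar_t));
}
```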
3 Certain nongraphic characters, the single quote ', the double quote ", the question mark ?, and the backslash
\, can be represented according to Table 5.
Table 5—escape sequences
new-line          NL (LF)   \n
horizontal tab    HT        \t
vertical tab      VT        \v
backspace         BS        \b
carriage return   CR        \r
form feed         FF        \f
alert             BEL       \a
backslash         \         \\
question mark     ?         \?
single quote      '         \'
double quote      "         \"
octal number      ooo       \ooo
hex number        hhh       \xhhh
The double quote " and the question mark ? can be represented as themselves or by the escape sequences
\" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape
sequences \' and \\ respectively. If the character following a backslash is not one of those specified, the
behavior is undefined. An escape sequence specifies a single character.
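A few of these rules can be checked without assuming any particular character encoding. A minimal sketch (check_simple_escapes is a hypothetical name):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical self-check for the simple escape sequences.
inline void check_simple_escapes() {
    assert('\?' == '?');                // either spelling of the question mark
    assert('"' == '\"');                // either spelling of the double quote
    assert(std::strlen("a\tb") == 3);   // \t specifies a single character
    assert(std::strlen("\\n") == 2);    // a backslash followed by the letter n
    assert(std::strlen("\'") == 1);     // an escaped single quote
}
```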
__________________
23) They are intended for character sets where a character does not fit into a single byte.
4 The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify
the value of the desired character. The escape \xhhh consists of the backslash followed by x followed
by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no
limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated
by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a
character literal is implementation-defined if it falls outside of the implementation-defined range defined
for char (for ordinary literals) or wchar_t (for wide literals).
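The termination rules matter in practice: an octal escape stops after three digits, while a hexadecimal escape absorbs every following hexadecimal digit. A minimal sketch (check_numeric_escapes is a hypothetical name) that splits a string literal so that a letter is not read as part of a hexadecimal escape:

```cpp
#include <cassert>

// Hypothetical self-check for octal and hexadecimal escapes.
inline void check_numeric_escapes() {
    assert('\101' == 65);               // octal 101 is 64 + 1
    assert('\101' == '\x41');           // the same value in hexadecimal
    assert('\0' == 0);
    // \001 uses three octal digits; the fourth digit is a separate character.
    assert(sizeof("\0011") == 3);       // \001, '1', terminating null
    // Written as one literal, "\x41BC" would be a single escape 0x41BC.
    assert(sizeof("\x41" "BC") == 4);   // \x41, 'B', 'C', terminating null
}
```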
5 A universal-character-name is translated to the encoding, in the execution character set, of the character
named. If there is no such encoding, the universal-character-name is translated to an implementation-defined
encoding. [Note: in translation phase 1, a universal-character-name is introduced whenever an
actual extended character is encountered in the source text. Therefore, all extended characters are described
in terms of universal-character-names. However, the actual compiler implementation may use its own
native character set, so long as the same results are obtained. ]
2.13.3 Floating literals [lex.fcon]
floating-literal:
fractional-constant exponent-part(opt) floating-suffix(opt)
digit-sequence exponent-part floating-suffix(opt)
fractional-constant:
digit-sequence(opt) . digit-sequence
digit-sequence .
exponent-part:
e sign(opt) digit-sequence
E sign(opt) digit-sequence
sign: one of
+ -
digit-sequence:
digit
digit-sequence digit
floating-suffix: one of
f l F L
1 A floating literal consists of an integer part, a decimal point, a fraction part, an e or E, an optionally signed
integer exponent, and an optional type suffix. The integer and fraction parts both consist of a sequence of
decimal (base ten) digits. Either the integer part or the fraction part (not both) can be omitted; either the
decimal point or the letter e (or E) and the exponent (not both) can be omitted. The integer part, the
optional decimal point and the optional fraction part form the significant part of the floating literal. The
exponent, if present, indicates the power of 10 by which the significant part is to be scaled. If the scaled
value is in the range of representable values for its type, the result is the scaled value if representable, else
the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined
manner. The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and
F specify float, the suffixes l and L specify long double. If the scaled value is not in the range of
representable values for its type, the program is ill-formed.
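The forms and suffixes described above can be exercised as follows. This is a minimal sketch; floating_kind and check_floating_literals are hypothetical names. Only values that are exactly representable are compared, so the results do not depend on the implementation-defined choice of the nearest value.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helpers: one overload per floating type.
inline const char* floating_kind(float)       { return "float"; }
inline const char* floating_kind(double)      { return "double"; }
inline const char* floating_kind(long double) { return "long double"; }

// Hypothetical self-check.
inline void check_floating_literals() {
    assert(std::strcmp(floating_kind(1.0),  "double") == 0);
    assert(std::strcmp(floating_kind(1.0f), "float") == 0);
    assert(std::strcmp(floating_kind(1.0L), "long double") == 0);
    assert(.5 == 0.5);        // integer part omitted
    assert(1. == 1.0);        // fraction part omitted
    assert(1e3 == 1000.0);    // decimal point omitted, exponent present
    assert(25e-1 == 2.5);     // negative exponent scales by a power of 10
}
```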

