ISO 10303-21:2016(E)

6 Tokens

6.1 Token types

In the exchange structure, a token is a special token, a keyword, a simple data type encoding, or an IETF encoding.

6.2 Special tokens

The special token "ISO-10303-21;" shall be used to open an exchange structure, and the special token "END-ISO-10303-21;" shall be used to close an exchange structure.

The special token "HEADER;" shall be used to open the optional header section of an exchange structure, and the special token "ENDSEC;" shall be used to close the header section of an exchange structure.

The special token "ANCHOR;" shall be used to open the optional anchor section of an exchange structure, and the special token "ENDSEC;" shall be used to close the anchor section of an exchange structure.

The special token "REFERENCE;" shall be used to open the optional reference section of an exchange structure, and the special token "ENDSEC;" shall be used to close the reference section of an exchange structure.

The special token "DATA" shall be used to open each optional data section of an exchange structure, and the special token "ENDSEC;" shall be used to close each data section.

The special token "SIGNATURE" shall be used to open each optional signature section of an exchange structure, and the special token "ENDSEC;" shall be used to close each signature section.

The special token dollar sign ("$") is used to represent an object whose value is not provided in the exchange structure.

The special token asterisk ("*") is used to represent an object whose value is not provided in the exchange structure but can be derived from other values according to rules given in the EXPRESS schema (see 12.2.6).

The special tokens semicolon (";"), parentheses ("(", ")"), comma (",") and solidus ("/") are used to punctuate the exchange structure.

6.3 Keywords

Keywords are sequences of graphic characters indicating an entity or a defined type in the exchange structure. Keywords shall consist of capital letters, digits, low lines, and possibly an exclamation mark "!". The exclamation mark shall occur at most once, and only as the first character in a keyword.

Keywords may be schema-defined keywords or user-defined keywords. Keywords that do not begin with the exclamation mark are schema-defined keywords. Keywords that begin with the exclamation mark are user-defined keywords. A schema-defined keyword is the identifier for a named type (an entity data type or a defined type) in the EXPRESS schema governing the exchange structure. The meaning of a user-defined keyword is a matter of agreement between the partners using the exchange structure.
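
The keyword rules above can be checked with a short pattern match. This Python sketch is illustrative only: the function name `classify_keyword` is hypothetical, and requiring a capital letter or low line as the first character after the optional exclamation mark is an assumption to be checked against the keyword productions of Table 2. It checks form only, not whether a schema-defined keyword actually names a type in the governing schema.

```python
import re

# Optional "!" (at most once, first character only), then a capital letter or
# low line, then capital letters, digits or low lines. The rule for the first
# character after "!" is an assumption; Table 2 is authoritative.
KEYWORD_RE = re.compile(r"(!?)([A-Z_][A-Z0-9_]*)")


def classify_keyword(token: str) -> str:
    """Return "user-defined" or "schema-defined" for a well-formed keyword."""
    match = KEYWORD_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a keyword: {token!r}")
    return "user-defined" if match.group(1) == "!" else "schema-defined"


print(classify_keyword("CARTESIAN_POINT"))  # schema-defined
print(classify_keyword("!VENDOR_NOTE"))     # user-defined
```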

6.4 Simple data type encodings

Six simple data type encodings are used in exchange structures: integer, real, string, occurrence name, enumeration and binary.

6.4.1 Integer

An integer shall be encoded as a sequence of one or more digits, as prescribed in Table 2, optionally preceded by a plus sign "+" or a minus sign "-". Integers shall be expressed in base 10. If no sign is associated with the integer, the integer shall be assumed to be positive.

EXAMPLE

Valid integer expressions    Meaning
16                           Positive 16
+12                          Positive 12
-349                         Negative 349
012                          Positive 12
00                           Zero

Invalid integer expressions  Problem
26 54                        Contains spaces
32.0                         Contains full stop
+ 12                         Contains space between plus sign and digits
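
A minimal Python sketch of the integer rule, assuming a token has already been isolated by a lexer; `parse_integer` is a hypothetical name. The pattern uses `[0-9]` rather than `\d` so that non-ASCII digits are not accepted.

```python
import re

# Optional sign, then one or more ASCII digits, and nothing else (6.4.1).
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def parse_integer(token: str) -> int:
    """Return the value of an integer token, or raise ValueError."""
    if INTEGER_RE.fullmatch(token) is None:
        raise ValueError(f"invalid integer: {token!r}")
    return int(token, 10)  # int() accepts the sign and leading zeros


for token in ["16", "+12", "-349", "012", "00", "26 54", "32.0", "+ 12"]:
    try:
        print(f"{token!r}: {parse_integer(token)}")
    except ValueError as err:
        print(err)
```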

6.4.2 Real

A real shall be encoded as prescribed in Table 2. The encoding shall consist of a decimal mantissa optionally followed by a decimal exponent. The decimal mantissa consists of an optional plus sign "+" or minus sign "-", followed by a sequence of one or more digits, followed by a full stop ".", followed by a sequence of zero or more digits. A decimal exponent consists of the latin capital letter E optionally followed by a plus sign "+" or minus sign "-", followed by one or more digits.

NOTE 1      No attempt is made to convey the concept of precision in this part of ISO 10303. Where a precise meaning is necessary, the sender and receiver of the exchange structure should agree on one. Where a precise meaning is required as part of the description of an entity data type, this meaning should be included in the entity data type definition in the EXPRESS schema.

NOTE 2      Under certain conditions, transfer of clear text files via electronic mail attachment has been observed to corrupt the full stop in a real value. See A.2.2 for recommendations.

EXAMPLE

Valid real expressions    Meaning
+0.0E0                    0.0
-0.0E-0                   0.0, as above example
1.5                       1.5
-32.178E+02               -3217.8
0.25E8                    25 million
0.E25                     0.
2.                        2.
5.0                       5.0

Invalid real expressions  Problem
1.2E3.                    Decimal point not allowed in exponent
1E05                      Decimal point required in mantissa
1,000.00                  Comma not allowed
3.E                       Digit(s) required in exponent
.5                        At least one digit must precede the decimal point
1                         Decimal point required in mantissa
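
The real encoding can be checked in the same way. In this sketch (hypothetical name `parse_real`), the pattern enforces the mandatory full stop in the mantissa and the capital E; Python's `float` is used only for the conversion, since on its own it would also accept forms such as `1E05`, `.5` and `1` that this clause rejects. Converting to a binary double says nothing about precision (see NOTE 1).

```python
import re

# Mantissa: optional sign, one or more digits, full stop, zero or more digits.
# Exponent (optional): capital E, optional sign, one or more digits.
REAL_RE = re.compile(r"[+-]?[0-9]+\.[0-9]*(?:E[+-]?[0-9]+)?")


def parse_real(token: str) -> float:
    """Return the value of a real token, or raise ValueError."""
    if REAL_RE.fullmatch(token) is None:
        raise ValueError(f"invalid real: {token!r}")
    return float(token)


for token in ["+0.0E0", "-32.178E+02", "0.25E8", "0.E25", "2.", "1E05", ".5", "3.E"]:
    try:
        print(f"{token!r}: {parse_real(token)}")
    except ValueError as err:
        print(err)
```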

6.4.3 String

6.4.3.1 String structure

A string shall be encoded as an apostrophe "'", followed by zero or more characters from the basic alphabet, and ended by an apostrophe "'". The null string (string of length zero) shall be encoded by two consecutive apostrophes "''". Within a string, a single apostrophe shall be encoded as two consecutive apostrophes. Within a string, a single reverse solidus "\" shall be encoded as two reverse solidi "\\".
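
The escaping rules in the paragraph above amount to doubling two characters and adding the delimiters. The sketch below assumes, for illustration, that the basic alphabet is U+0020 to U+007E (Table 1 is authoritative) and rejects anything else instead of applying the control directives of 6.4.3.2 to 6.4.3.4; `encode_string` is a hypothetical name.

```python
def encode_string(text: str) -> str:
    """Encode text as a string in an exchange structure (basic alphabet only)."""
    for ch in text:
        if not " " <= ch <= "~":  # assumed basic alphabet: U+0020 to U+007E
            raise ValueError(f"{ch!r} needs a control directive or UTF-8")
    # Each reverse solidus and each apostrophe is written twice.
    body = text.replace("\\", "\\\\").replace("'", "''")
    return "'" + body + "'"


print(encode_string("Don't"))  # 'Don''t'
```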

As specified in 5.2, the octet representation of the characters at code points U+0080 to U+10FFFF is given by UTF-8. These characters may be encoded as hexadecimal digits (see HEX in Table 2) using control directives defined in 6.4.3.3 when compatibility with previous editions of ISO 10303-21 is desired.

Characters not in the basic alphabet shall be encoded using the control directives defined in 6.4.3.2, 6.4.3.3 and 6.4.3.4. The WSN of control directives for encoding strings is given in Table 4.

NOTE       Under certain conditions, transfer of clear text files via electronic mail attachment has been observed to corrupt a full stop in a string value. See A.2.2 for recommendations.

Table 4 — String control directives
CONTROL_DIRECTIVE = PAGE | ALPHABET | EXTENDED2 
                  | EXTENDED4 | ARBITRARY .

PAGE = REVERSE_SOLIDUS "S" REVERSE_SOLIDUS LATIN_CODEPOINT .

ALPHABET = REVERSE_SOLIDUS "P" UPPER REVERSE_SOLIDUS .

EXTENDED2 = REVERSE_SOLIDUS "X2" REVERSE_SOLIDUS 
            HEX_TWO { HEX_TWO } END_EXTENDED .

EXTENDED4 = REVERSE_SOLIDUS "X4" REVERSE_SOLIDUS
            HEX_FOUR { HEX_FOUR } END_EXTENDED .

END_EXTENDED = REVERSE_SOLIDUS "X0" REVERSE_SOLIDUS .

ARBITRARY = REVERSE_SOLIDUS "X" REVERSE_SOLIDUS HEX_ONE .

HEX_ONE = HEX HEX .

HEX_TWO = HEX_ONE HEX_ONE .

HEX_FOUR = HEX_TWO HEX_TWO .

6.4.3.2 Encoding ISO/IEC 8859 characters within a string

In ISO/IEC 8859, G(x/y) is the notation for the character in "column" x "row" y, i.e., code value (16 · x) + y, in the code table. Each part of ISO/IEC 8859 is identical to the ISO/IEC 10646 code points U+0000 to U+007F in positions G(00/00) through G(07/15). The various parts of ISO/IEC 8859 differ in the symbols of the extended character set — positions G(10/00) through G(15/14). To include characters from the extended character set in a string requires the use of control directives.

NOTE      The control directives described in this section are retained for compatibility with previous editions of ISO 10303-21. It is recommended that all ISO/IEC 8859 characters be converted to corresponding ISO/IEC 10646 values.

The PAGE control directive — reverse solidus latin capital letter S reverse solidus ("\S\") followed by a LATIN_CODEPOINT character (see Table 1) — is used within a string to allow a character in the basic alphabet to represent the character in the corresponding position in the ISO/IEC 8859 extended alphabet. The PAGE control directive shall be interpreted in the string as the single character G((x+8)/y), where G(x/y) is the basic alphabet character following the "\S\". That is, if the basic alphabet character has code value v, it shall be interpreted as the character with code value v + 128.

The control directive reverse solidus latin capital letter P UPPER reverse solidus shall indicate that, for this string only, the subsequent reverse solidus latin capital letter S reverse solidus control directives shall be interpreted as referring to the extended alphabet defined in the part of ISO/IEC 8859 indicated by the value of UPPER. The capital letter shall be one of the following letters: "A", "B", "C", "D", "E", "F", "G", "H", "I". In this context, the latin capital letter A identifies ISO/IEC 8859-1, the latin capital letter B identifies ISO/IEC 8859-2, and so on through the latin capital letter I, which identifies ISO/IEC 8859-9. If this control directive does not appear within a string, the value "A" shall be assumed for all PAGE control directives; i.e., the extended alphabet shall be that specified in ISO/IEC 8859-1.

EXAMPLE

String as stored     Effective contents   Comments
'CAT'                CAT
'Don''t'             Don't
''''                 '
''                                        string of length zero
'\S\Drger'           Ärger
'h\S\ttel'           hôtel
'\PE\\S\*\S\U\S\b'   Њет                  Cyrillic, 'Nyet'
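
The PAGE and ALPHABET rules can be sketched as a left-to-right scan of the string body (the text between the outer apostrophes). The scan tries the directives before treating a doubled reverse solidus as an escape, because in the last example the doubled reverse solidus after "\PE" is the end of one directive followed by the start of the next. The sketch relies on Python's `iso8859_1` to `iso8859_9` codecs, ignores the directives of 6.4.3.3 and 6.4.3.4, and `decode_page_directives` is a hypothetical name.

```python
import re

# \P<letter>\ selects a part of ISO/IEC 8859: A -> 8859-1, B -> 8859-2, ... I -> 8859-9.
ALPHABET_RE = re.compile(r"\\P([A-I])\\")
# \S\ followed by one basic-alphabet character.
PAGE_RE = re.compile(r"\\S\\(.)")


def decode_page_directives(body: str) -> str:
    """Decode ALPHABET and PAGE directives and doubled ' and \\ in a string body."""
    part = 1  # ISO/IEC 8859-1 unless an ALPHABET directive appears
    out = []
    i = 0
    while i < len(body):
        m = ALPHABET_RE.match(body, i)
        if m:
            part = ord(m.group(1)) - ord("A") + 1
            i = m.end()
            continue
        m = PAGE_RE.match(body, i)
        if m:
            code = ord(m.group(1))
            if not 0x20 <= code <= 0x7E:
                raise ValueError("\\S\\ must be followed by a basic-alphabet character")
            # G(x/y) -> G((x+8)/y), i.e. code value v -> v + 128
            out.append(bytes([code + 128]).decode(f"iso8859_{part}"))
            i = m.end()
            continue
        if body.startswith("\\\\", i) or body.startswith("''", i):
            out.append(body[i])  # a doubled character stands for one character
            i += 2
            continue
        out.append(body[i])
        i += 1
    return "".join(out)


print(decode_page_directives(r"\PE\\S\*\S\U\S\b"))  # Њет
```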

6.4.3.3 Encoding ISO/IEC 10646 characters within a string

This part of ISO 10303 specifies control directives that allow encoding of ISO/IEC 10646 characters as a sequence of hexadecimal characters. These control directives may be used in place of UTF-8 encoded characters when compatibility with previous editions of the exchange structure encoding is desired.

The control directive reverse solidus latin capital letter X digit two reverse solidus "\X2\" shall be followed by multiples of four hexadecimal characters. Each multiple of four hexadecimal characters shall be interpreted as a 16-bit number giving an integer position within the UCS codespace.

The control directive reverse solidus latin capital letter X digit four reverse solidus "\X4\" shall be followed by multiples of eight hexadecimal characters. Each multiple of eight hexadecimal characters shall be interpreted as a 32-bit number giving an integer position within the UCS codespace.

The control directive reverse solidus latin capital letter X digit zero reverse solidus "\X0\" shall be used to indicate the end of the "\X2\" or "\X4\" hexadecimal character sequence.

NOTE     The use of eight hexadecimal characters in the "\X4\" encoding predates the restriction of the UCS codespace to a maximum value of 10FFFF. The first two characters in each eight-character group will therefore always be the digit zero.

EXAMPLE

String as stored            Code point            Character
'\X2\03C0\X0\'              U+03C0                greek small letter pi (π)
'\X2\03B103B203B3\X0\'      U+03B1 U+03B2 U+03B3  greek small letters alpha, beta, gamma (αβγ)
'\X4\0001F638\X0\'          U+1F638               grinning cat face with smiling eyes (an emoticon, 😸)
'\X4\0001F6380001F596\X0\'  U+1F638 U+1F596       grinning cat face with smiling eyes, raised hand with part between middle and ring fingers (two emoticons, 😸 🖖)
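
The "\X2\" and "\X4\" rules can be sketched as a pair of Python functions (hypothetical names `encode_extended` and `decode_extended`). The encoder uses "\X2\" for code points up to U+FFFF and "\X4\" above that; that split, and the restriction to upper-case hexadecimal digits (assumed to match HEX in Table 2), are assumptions for illustration. The decoder rejects a run whose length is not a multiple of the group size.

```python
import re

# \X2\ or \X4\, then upper-case hexadecimal digits, then \X0\.
EXTENDED_RE = re.compile(r"\\X([24])\\([0-9A-F]*)\\X0\\")


def encode_extended(text: str) -> str:
    """Encode every character of text with \\X2\\ (up to U+FFFF) or \\X4\\."""
    parts = []
    i = 0
    while i < len(text):
        wide = ord(text[i]) > 0xFFFF
        j = i
        while j < len(text) and (ord(text[j]) > 0xFFFF) == wide:
            j += 1  # extend the run of characters that need the same directive
        directive, width = ("\\X4\\", 8) if wide else ("\\X2\\", 4)
        digits = "".join(f"{ord(ch):0{width}X}" for ch in text[i:j])
        parts.append(directive + digits + "\\X0\\")
        i = j
    return "".join(parts)


def decode_extended(body: str) -> str:
    """Replace each \\X2\\...\\X0\\ or \\X4\\...\\X0\\ run with its characters."""
    def expand(m):
        width = 4 if m.group(1) == "2" else 8
        digits = m.group(2)
        if not digits or len(digits) % width:
            raise ValueError(f"hex run of length {len(digits)} is not a multiple of {width}")
        return "".join(chr(int(digits[k:k + width], 16))
                       for k in range(0, len(digits), width))
    return EXTENDED_RE.sub(expand, body)


print(encode_extended("αβγ"))  # \X2\03B103B203B3\X0\
```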

6.4.3.4 Encoding U+0000 to U+00FF in a string

The control directive reverse solidus latin capital letter X reverse solidus "\X\" followed by two hexadecimal characters shall encode a UCS code point in the range U+0000 to U+00FF. The two hexadecimal characters shall be interpreted as an 8-bit number giving the integer position within the UCS codespace.

This control directive shall be used for UCS code points U+0000 to U+001F and code point U+007F. This control directive may be used in place of UTF-8 encoded code points U+0080 to U+00FF when compatibility with earlier editions of the exchange structure encoding is desired.

NOTE      The characters defined by ISO/IEC 10646 and ISO/IEC 8859-1 are identical within this range.

EXAMPLE

String as stored          Effective contents   Comments
'see \X\A7 4.1'           see § 4.1            Contains section sign.
'line one\X\0Aline two'   line one             Contains line feed control character.
                          line two
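
A sketch of the "\X\" directive in Python (hypothetical names `encode_arbitrary` and `decode_arbitrary`): the encoder applies "\X\" only to the code points for which this clause requires it (U+0000 to U+001F and U+007F) and leaves every other character unchanged.

```python
import re

# \X\ followed by exactly two upper-case hexadecimal digits.
ARBITRARY_RE = re.compile(r"\\X\\([0-9A-F]{2})")


def encode_arbitrary(text: str) -> str:
    """Apply \\X\\ to the code points that require it (U+0000-U+001F, U+007F)."""
    return "".join(
        f"\\X\\{ord(ch):02X}" if ord(ch) < 0x20 or ord(ch) == 0x7F else ch
        for ch in text
    )


def decode_arbitrary(body: str) -> str:
    """Replace each \\X\\hh directive with the code point U+00hh."""
    return ARBITRARY_RE.sub(lambda m: chr(int(m.group(1), 16)), body)


print(encode_arbitrary("line one\nline two"))  # line one\X\0Aline two
```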

6.4.3.5 Maximum string length

The maximum length of a string as stored in an exchange structure is 32769 octets, including the beginning and ending apostrophes. If embedded quotation marks, reverse solidi, apostrophes, print control directives (see clause 12) or characters encoded according to 6.4.3.2, 6.4.3.3, or 6.4.3.4 are included in the string as stored, the maximum length of the effective contents of the string will be less than 32767 graphic characters. The effective contents is the sequence of graphic characters after these encoding conventions have been resolved.

6.4.4 Occurrence names

An occurrence name shall be a constant instance name, a constant value name, an entity instance name or a value instance name.

NOTE 1      This edition of this part of ISO 10303 allows constant values, constant entities, value instances and entity instances to be named and referenced in an exchange structure. Previous editions only allowed entity instances to be named and referenced (see 4.3).

6.4.4.1 Constant instance names

A constant instance name shall be encoded as a number sign, "#", followed by an UPPER character, followed by a sequence of UPPER or DIGIT characters.

Constant instance names are references to entity instances defined in the EXPRESS schema. If there are multiple EXPRESS schemas defined in the file_schema of the exchange structure then the constant instance name shall reference an entity instance defined in the first schema (see clause 8.2.4).

The WSN for constant instance names is given in Table 2 in the CONSTANT_INSTANCE_NAME production.

EXAMPLE

Valid name expressions    Meaning
#FARADAY                  Reference to constant named FARADAY in the EXPRESS schema
#INCH                     Reference to constant named INCH in the EXPRESS schema

Invalid name expressions  Problem
#23                       Name begins with a digit
#INCHES                   INCHES is not defined in the EXPRESS schema
#PI                       PI is defined as a value in the EXPRESS schema
#Inch                     All letters must be normalized to upper case

Constant instance names may be used in RHS_OCCURRENCE productions only (see Table 2).

6.4.4.2 Constant value names

A constant value name shall be encoded as an at sign, "@", followed by an UPPER character, followed by a sequence of UPPER or DIGIT characters.

Constant value names are references to values defined in the EXPRESS schema. If there are multiple EXPRESS schemas defined in the file_schema of the exchange structure then the constant value name shall reference a value defined in the first schema (see clause 8.2.4).

The WSN for constant value names is given in Table 2 in the CONSTANT_VALUE_NAME production.

EXAMPLE

Valid name expressions    Meaning
@PI                       Reference to the value of PI as defined in the EXPRESS schema
@E                        Reference to the value of E as defined in the EXPRESS schema

Invalid name expressions  Problem
@23                       Name begins with a digit
@INCH                     INCH is defined as an ENTITY instance in the EXPRESS schema
@Pie                      All letters must be normalized to upper case

Constant value names may be used in RHS_OCCURRENCE productions only (see Table 2).
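
The name rules of 6.4.4.1 and 6.4.4.2 can be checked together, since the two differ only in the leading character and in the kind of schema constant they reference. The sets below stand in for the CONSTANT declarations of a hypothetical schema, `resolve_constant` is a hypothetical name, and treating UPPER as A to Z plus the low line is an assumption to check against Table 1.

```python
import re

# Hypothetical schema constants, standing in for CONSTANT declarations.
ENTITY_CONSTANTS = {"FARADAY", "INCH"}  # referenced with "#"
VALUE_CONSTANTS = {"PI", "E"}           # referenced with "@"

# Sigil, an UPPER character, then UPPER or DIGIT characters.
# UPPER is assumed here to be A-Z plus "_".
CONSTANT_NAME_RE = re.compile(r"([#@])([A-Z_][A-Z_0-9]*)")


def resolve_constant(token: str) -> str:
    """Return the schema constant named by token, or raise ValueError."""
    match = CONSTANT_NAME_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a constant name: {token!r}")
    sigil, name = match.groups()
    known = ENTITY_CONSTANTS if sigil == "#" else VALUE_CONSTANTS
    if name not in known:
        kind = "entity instance" if sigil == "#" else "value"
        raise ValueError(f"{name} is not a constant {kind} in the schema")
    return name


print(resolve_constant("#FARADAY"), resolve_constant("@PI"))  # FARADAY PI
```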

6.4.4.3 Entity instance names

An entity instance name shall be encoded as a number sign, "#", followed by a sequence of DIGIT characters. At least one character shall not be "0". Leading zeros are not significant. An entity instance name shall not use the same integer as a value instance name.

NOTE 1      The integer spaces for ENTITY_INSTANCE_NAME and VALUE_INSTANCE_NAME are not permitted to overlap because both types may be referenced using a URI, for example "<abc.stp#123>" (see 10.2.7).

NOTE 2      Leading zeros in entity instance names are ignored so "#001" is the same identifier as "#1".

The WSN for entity instance names is given in Table 2 in the ENTITY_INSTANCE_NAME production.

EXAMPLE

Valid name expressions    Meaning
#12                       Names or refers to entity with identifier 12
#023                      Names or refers to entity with identifier 23

Invalid name expressions  Problem
#Faraday                  Contains non-numeric character
#439A6                    Contains non-numeric character
#+23                      Contains '+' sign
#00.1                     Contains decimal point
74                        Does not begin with a number sign

Entity instance names are used as references to entity instances. Both forward and backward references are permitted. An entity instance name may be defined in the reference section (see clause 10) or a data section (clause 11). Entity instance names may be used in LHS_OCCURRENCE and RHS_OCCURRENCE productions (see Table 2).

6.4.4.4 Value instance names

A value instance name shall be encoded as an at sign, "@", followed by a sequence of DIGIT characters. At least one character shall not be "0". Leading zeros are not significant. A value instance name shall not use the same integer as an entity instance name.

NOTE      This edition of this part of ISO 10303 allows instance names to be assigned to values so that values can be defined in external files. See annex K for examples.

The WSN for value instance names is given in Table 2 in the VALUE_INSTANCE_NAME production.

EXAMPLE

Valid name expressions    Meaning
@12                       Names or refers to value with identifier 12
@023                      Names or refers to value with identifier 23

Value instance names are used as references to values. A value instance name shall be defined only in the reference section (see clause 10). Value instance names may be used in LHS_OCCURRENCE and RHS_OCCURRENCE productions (see Table 2).
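
Entity and value instance names can be parsed the same way. The sketch below (hypothetical names `parse_instance_name` and `shared_integers`) discards leading zeros before comparing, so that "#001" and "@1" are detected as sharing the integer 1, which 6.4.4.3 and 6.4.4.4 forbid.

```python
import re

# "#" or "@" followed by decimal digits only (no sign, no full stop).
INSTANCE_NAME_RE = re.compile(r"([#@])([0-9]+)")


def parse_instance_name(token: str) -> tuple:
    """Return ("entity" | "value", number) for an entity or value instance name."""
    match = INSTANCE_NAME_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not an instance name: {token!r}")
    sigil, digits = match.groups()
    number = int(digits)  # leading zeros are not significant: #001 is #1
    if number == 0:
        raise ValueError("at least one digit must be non-zero")
    return ("entity" if sigil == "#" else "value", number)


def shared_integers(tokens) -> set:
    """Integers used both as entity and as value instance names (should be empty)."""
    entities, values = set(), set()
    for token in tokens:
        kind, number = parse_instance_name(token)
        (entities if kind == "entity" else values).add(number)
    return entities & values


print(shared_integers(["#001", "#2", "@1"]))  # {1}
```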

6.4.5 Enumeration values

An enumeration value shall be encoded as a sequence of latin capital letters or digits that begins with a latin capital letter and is delimited by full stops. The meaning of a given enumeration value is determined by the enumeration type declarations in the EXPRESS schema and their associated definitions.

NOTE      Under certain conditions, transfer of clear text files via electronic mail attachment has been observed to corrupt the full stop at the start or end of an enumeration value. See A.2.2 for recommendations.

EXAMPLE

Valid enumeration expressions    Meaning
.STEEL.                          Indicates a value of STEEL

Invalid enumeration expressions  Problem
.RED                             Missing ending full stop
.123.                            Does not start with an alphabetic character
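
A sketch of the enumeration rule in Python; `parse_enumeration` is a hypothetical name, and accepting the low line after the first letter is an assumption to be checked against the ENUMERATION production of Table 2.

```python
import re

# Full stop, a capital letter, then capital letters or digits, then full stop.
# Allowing "_" after the first letter is an assumption (see Table 2).
ENUMERATION_RE = re.compile(r"\.([A-Z][A-Z0-9_]*)\.")


def parse_enumeration(token: str) -> str:
    """Return the enumeration item named by token, without the full stops."""
    match = ENUMERATION_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not an enumeration value: {token!r}")
    return match.group(1)


print(parse_enumeration(".STEEL."))  # STEEL
```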

6.4.6 Binary

A binary is a sequence of bits (0 or 1). A binary shall be encoded as determined by the following procedure.

NOTE      This is a binary to hexadecimal conversion.

EXAMPLE

Binary value        Representation
'null' or 'empty'   "0"
0                   "30"
1                   "31"
111011              "23B"
100100101010        "092A"
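
Every row of the example table is consistent with the following reading of the procedure, which this Python sketch assumes: the bit sequence is padded on the left with zero bits to a multiple of four, the first hexadecimal digit gives the number of padding bits (0 to 3), and the remaining digits are the padded bits in hexadecimal. The names `encode_binary` and `decode_binary` are hypothetical.

```python
import re

# Quotation mark, padding count 0-3, hexadecimal digits, quotation mark.
BINARY_RE = re.compile(r'"([0-3])([0-9A-F]*)"')


def encode_binary(bits: str) -> str:
    """Encode a bit string such as "111011" as a quoted hex representation."""
    if any(b not in "01" for b in bits):
        raise ValueError("a binary is a sequence of 0 and 1 bits")
    if not bits:
        return '"0"'
    pad = -len(bits) % 4  # zero bits added on the left
    padded = "0" * pad + bits
    digits = "".join(f"{int(padded[k:k + 4], 2):X}" for k in range(0, len(padded), 4))
    return f'"{pad}{digits}"'


def decode_binary(text: str) -> str:
    """Inverse of encode_binary: drop the padding bits and return the bit string."""
    match = BINARY_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a binary: {text!r}")
    pad, digits = int(match.group(1)), match.group(2)
    bits = "".join(f"{int(h, 16):04b}" for h in digits)
    return bits[pad:]


for bits in ["", "0", "1", "111011", "100100101010"]:
    print(repr(bits), encode_binary(bits))
```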

6.5 Anchor, reference and signature section encodings

The following encodings are used in the anchor, reference and signature sections.

6.5.1 Resource

A resource shall be encoded as a URI preceded by a less-than sign, "<" and followed by a greater-than sign, ">".

The WSN for resources is given in Table 2 in the RESOURCE production.

NOTE 1      In the anchor section, the resource is on the right of the equals sign ("=") and the anchor name is on the left (see 6.5.4).

EXAMPLE 1

Valid expression in the anchor section  Meaning
<picture> = <a.jpeg>;                   Sets anchor "picture" to the resource <a.jpeg>
<BOM> = <b.xml#123>;                    Sets the anchor "BOM" to the resource <b.xml#123>

NOTE 2     A resource in the reference section must resolve to an entity instance or a value instance. See clause 10 for the resolution process.

EXAMPLE 2

Valid expression in the reference section  Meaning
#10 = <a#b>;                               Sets entity instance 10 to the entity identified by the resource <a#b>
@20 = <c#d>;                               Sets value instance 20 to the value identified by the resource <c#d>
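
Entries of the form shown in EXAMPLE 2 can be split into their parts with a single pattern. This is a sketch only: `parse_reference` is a hypothetical name, the tolerance for surrounding whitespace is an assumption, and resolving the URI (clause 10) is out of scope. Because a URI cannot contain a raw "<" or ">", the resource ends at the first ">".

```python
import re

# "#n" or "@n", "=", a resource "<...>", then ";".
REFERENCE_RE = re.compile(r"\s*([#@])([0-9]+)\s*=\s*<([^<>]*)>\s*;\s*")


def parse_reference(line: str) -> tuple:
    """Split a reference-section entry into (kind, number, uri)."""
    match = REFERENCE_RE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a reference entry: {line!r}")
    sigil, digits, uri = match.groups()
    return ("entity" if sigil == "#" else "value", int(digits), uri)


print(parse_reference("#10 = <a#b>;"))  # ('entity', 10, 'a#b')
```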

6.5.2 Uniform Resource Identifier (URI)

A UNIVERSAL_RESOURCE_IDENTIFIER token of Table 2 shall meet the requirements defined by the IETF (see 3.1.7.1).

EXAMPLE

External Reference                                          Example Usage
<http://www.giant.com/examples/part.stpnc#first_workpiece>  Reference to a workpiece in a STEP-NC file stored at the given world wide web address
<building.ifc#first_floor>                                  Reference to a floor in an IFC building on the current server
<file:///c:/users/jt_files/assembly.jt#first_shape>         Reference to a shape in a JT file

6.5.3 URI Fragment identifier

A URI_FRAGMENT_IDENTIFIER token of Table 2 is the name following the number sign, "#", in a Uniform Resource Identifier.

EXAMPLE

Uniform Resource Identifier                                                  Fragment Identifier                   Example Usage
<http://www.tool_vendor.com/mill.stp#tool_tip>                               tool_tip                              Fragment identifier for a point at the tip of a cutting tool
<#first_floor>                                                               first_floor                           Fragment identifier for a floor in the current exchange structure
<http://www.plumber.com/structure.ifc#3F2504E0-4F89-11D3-9A0C-0305E82C3301>  3F2504E0-4F89-11D3-9A0C-0305E82C3301  Fragment identifier defined by a UUID (see annex G)
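
Python's standard `urllib.parse.urldefrag` already separates a fragment identifier from the rest of a URI, so a resource can be split as follows. `split_resource` is a hypothetical name, and the sketch does not check the URI against the IETF requirements.

```python
from urllib.parse import urldefrag


def split_resource(resource: str) -> tuple:
    """Split "<uri>" into (uri without fragment, fragment identifier)."""
    if len(resource) < 2 or resource[0] != "<" or resource[-1] != ">":
        raise ValueError(f"not a resource: {resource!r}")
    result = urldefrag(resource[1:-1])
    return result.url, result.fragment


print(split_resource("<#first_floor>"))  # ('', 'first_floor')
```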

6.5.4 Anchor name

An anchor name shall be encoded as a URI Fragment identifier preceded by a less-than sign, "<" and followed by a greater-than sign, ">". At least one character in a URI Fragment identifier that references an anchor name shall not be a digit.

NOTE 1      URI Fragment identifiers defined as digits are assumed to be references to occurrence names in exchange structures defined by previous editions of ISO 10303-21. See 10.2.7.

An anchor name that meets the requirements of annex G is a Universally Unique Identifier (UUID).

NOTE 2      Anchors defined by a UUID can be found without a URI because they are universally unique. See 10.2.2.

The WSN for anchor names is given in Table 2 in the ANCHOR_NAME production. Anchor names are used to define identifiers that can be externally referenced (see clause 9).

EXAMPLE

Valid expression in the anchor section             Meaning
<a> = 3.142;                                       Sets anchor "a" to 3.142
<b> = @10;                                         Sets anchor "b" to value @10
<c> = #20;                                         Sets anchor "c" to entity #20
<ad3f1724-19cf-4d19-94ef-eed90b7b4dde> = 2.71828;  Sets anchor with the UUID "ad3f1724-19cf-4d19-94ef-eed90b7b4dde" to 2.71828
<2f0cb220-355d-11e5-a2cb-0800200c9a66> = @30;      Sets anchor with the UUID "2f0cb220-355d-11e5-a2cb-0800200c9a66" to value @30
<3f553e90-355d-11e5-a2cb-0800200c9a66> = #40;      Sets anchor with the UUID "3f553e90-355d-11e5-a2cb-0800200c9a66" to entity #40
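
The anchor-name rules (at least one character that is not a digit, and UUID status for names meeting annex G) can be sketched as below. The pattern assumes that annex G uses the 8-4-4-4-12 hexadecimal text form shown in the examples; `classify_anchor_name` is a hypothetical name.

```python
import re

# Assumed UUID text form: 8-4-4-4-12 hexadecimal digits separated by hyphens.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def classify_anchor_name(token: str) -> str:
    """Return "uuid" or "name" for an anchor name such as "<a>"."""
    if len(token) < 3 or token[0] != "<" or token[-1] != ">":
        raise ValueError(f"not an anchor name: {token!r}")
    fragment = token[1:-1]
    if all(ch in "0123456789" for ch in fragment):
        # all-digit fragments are read as references to occurrence names (NOTE 1)
        raise ValueError("an anchor name needs at least one non-digit character")
    return "uuid" if UUID_RE.fullmatch(fragment) else "name"


print(classify_anchor_name("<ad3f1724-19cf-4d19-94ef-eed90b7b4dde>"))  # uuid
```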

6.5.5 Tag name

A tag name shall be encoded as a sequence of UPPER, LOWER and DIGIT characters. The first character shall be an UPPER or LOWER character.

The WSN for tag names is given in Table 2 in the TAG_NAME production. Tag names associate additional information with anchors. This information is not part of the information model (see 9.2.8).

NOTE       Tag names are allowed in this edition of this part of ISO 10303 so that programmers can create data structures to optimize traversals when an information model is distributed across many exchange structures linked by anchors and references.

EXAMPLE

Valid expression in the anchor section           Meaning
<plate_edge> = #20 {preparation:<WELD_DC.XML>};  Associates edge at #20 with file WELD_DC.XML using the tag name "preparation"
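
Tags follow the anchor item in braces. This sketch (hypothetical name `extract_tags`) pulls resource-valued tags out of one anchor-section entry. Tags whose value is not a resource are not recognised, and allowing the low line in tag names is an assumption to check against the TAG_NAME production.

```python
import re

# "{", a tag name (letter first, then letters or digits), ":", a resource, "}".
TAG_RE = re.compile(r"\{\s*([A-Za-z][A-Za-z0-9_]*)\s*:\s*<([^<>]*)>\s*\}")


def extract_tags(anchor_entry: str) -> list:
    """Return (tag name, uri) pairs found in one anchor-section entry."""
    return TAG_RE.findall(anchor_entry)


print(extract_tags("<plate_edge> = #20 {preparation:<WELD_DC.XML>};"))
```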

6.5.6 Base64

A BASE64 token of Table 2 is data encoded to meet the requirements of the IETF (see 3.1.7.5). Base64 is used to encode signatures and message digests.

EXAMPLE

Base64 encoding of a message digest
873b48e9dd16ec9c7a8423faba7e75a7a9d19ea07abce2808d94b3176ee8bd60
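
Although every hexadecimal digit is also a Base64 character, the 64-character value above reads like the hexadecimal form of a 32-octet digest (such as SHA-256) rather than its Base64 form. Assuming that reading, the sketch below shows how the same octets would be Base64-encoded with Python's standard `base64` module; `to_base64` and `is_base64` are hypothetical helper names.

```python
import base64
import binascii

# The example value above, read here as a hexadecimal digest (an assumption).
hex_digest = "873b48e9dd16ec9c7a8423faba7e75a7a9d19ea07abce2808d94b3176ee8bd60"
digest = bytes.fromhex(hex_digest)  # 32 octets


def to_base64(data: bytes) -> str:
    """Base64-encode data using the standard alphabet with padding."""
    return base64.b64encode(data).decode("ascii")


def is_base64(text: str) -> bool:
    """True if text uses only the Base64 alphabet with correct padding."""
    try:
        base64.b64decode(text, validate=True)
    except binascii.Error:
        return False
    return True


encoded = to_base64(digest)
print(len(digest), encoded)
```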

© ISO 2016 — All rights reserved