ISO 10303-21:2016(E)

5 Formal definitions

5.1 Formal notation

Wirth Syntax Notation (WSN) is used in this part of ISO 10303 to specify the syntax of the exchange structure in a formal notation. WSN is described in annex B.

5.2 Basic alphabet definition

The alphabet of the exchange structure is defined as the code points U+0020 to U+007E and U+0080 to U+10FFFF of ISO/IEC 10646. The alphabet shall be represented by octets in the exchange structure using the UTF-8 encoding scheme defined by ISO/IEC 10646. Table 1 divides the basic alphabet into subsets.

The UTF-8 encoding scheme results in a single octet with a hexadecimal value from 20 to 7E for each LATIN_CODEPOINT character, and a sequence of octets with hexadecimal values from 80 to F4 for each HIGH_CODEPOINT character. Octets with values outside of these ranges shall be ignored when processing the exchange structure.
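The octet filtering rule above can be illustrated with a short Python sketch. It assumes the whole exchange structure is already in memory as `bytes`. The function name `clean_octets` is hypothetical, and the sketch relies on Python's own UTF-8 decoder to reject malformed multi-octet sequences.

```python
def clean_octets(data: bytes) -> str:
    """Keep octets 20-7E and 80-F4 (hexadecimal), drop all others, decode UTF-8."""
    kept = bytes(b for b in data if 0x20 <= b <= 0x7E or 0x80 <= b <= 0xF4)
    # Raises UnicodeDecodeError if the remaining octets are not valid UTF-8.
    return kept.decode("utf-8")
```

Because line feeds, carriage returns and tabs fall outside both ranges, they disappear at this stage, before any tokens are recognized.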

NOTE     The set of LATIN_CODEPOINT characters is equivalent to the basic alphabet in the first and second editions of ISO 10303-21. The UTF-8 representation of code points U+0020 to U+007E is identical to the ISO/IEC 8859-1 encoding of the characters G(02/00) to G(07/14) that defined the basic alphabet in earlier editions. Use of HIGH_CODEPOINT characters within the exchange structure can be avoided when compatibility with previous editions of ISO 10303-21 is desired.

Table 1 — WSN defining subsets of the basic alphabet
SPACE    = " " .

DIGIT    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
         | "8" | "9" .

LOWER    = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h"
         | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p"
         | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
         | "y" | "z" .

UPPER    = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H"
         | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P"
         | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X"
         | "Y" | "Z" | "_" .

SPECIAL  = "!" | """" | "*" | "$" | "%" | "&" | "." | "#"
         | "+" | ","  | "-" | "(" | ")" | "?" | "/" | ":"
         | ";" | "<"  | "=" | ">" | "@" | "[" | "]" | "{"
         | "|" | "}"  | "^" | "`" | "~" .

REVERSE_SOLIDUS  = "\" .

APOSTROPHE = "'" .

LATIN_CODEPOINT = SPACE | DIGIT | LOWER | UPPER | SPECIAL
          | REVERSE_SOLIDUS | APOSTROPHE .

HIGH_CODEPOINT = (U+0080 to U+10FFFF, see 5.2)
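As an illustration only, the subsets of Table 1 can be expressed as a character classifier in Python. The function name `subset` is hypothetical. Note that the low line "_" is classified under UPPER and that the double quote belongs to SPECIAL, exactly as Table 1 lists them.

```python
# Table 1 subsets; "_" is part of UPPER, and the double quote is SPECIAL.
SPECIAL = set('!"*$%&.#+,-()?/:;<=>@[]{|}^`~')


def subset(ch: str) -> str:
    """Return the name of the Table 1 subset that contains character ch."""
    cp = ord(ch)
    if ch == " ":
        return "SPACE"
    if "0" <= ch <= "9":
        return "DIGIT"
    if "a" <= ch <= "z":
        return "LOWER"
    if "A" <= ch <= "Z" or ch == "_":
        return "UPPER"
    if ch in SPECIAL:
        return "SPECIAL"
    if ch == "\\":
        return "REVERSE_SOLIDUS"
    if ch == "'":
        return "APOSTROPHE"
    if 0x80 <= cp <= 0x10FFFF:
        return "HIGH_CODEPOINT"
    raise ValueError(f"U+{cp:04X} is not in the basic alphabet")
```

Every code point from U+0020 to U+007E falls into exactly one LATIN_CODEPOINT subset, while control characters such as tab raise an error because they are not part of the alphabet.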

5.3 Exchange structure

The exchange structure shall be a sequential file using a clear text encoding. The exchange structure shall contain a header section and four optional sections: the anchor section, the reference section, one or more data sections, and one or more signature sections. The role of each section is described below, in the order in which the sections appear in the exchange structure.

The exchange structure is defined by the WSN in Table 3.

NOTE      The header section is at the start because it defines context information for the rest of the exchange structure. The anchor and reference sections appear next because they define how the structure is linked to other files; placing them near the beginning allows search systems to find these dependencies without reading the entire structure. The signature sections are at the end so that new signatures can be added without disturbing the text validated by earlier signatures.
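The ordering rule can be checked mechanically. The following Python sketch assumes the section keywords have already been extracted into a list; the function `sections_in_order` and that input format are hypothetical. It encodes the order of the EXCHANGE_FILE production in Table 3 as a regular expression.

```python
import re
from typing import List

# One letter per section keyword, in the order fixed by EXCHANGE_FILE:
# header, optional anchor, optional reference, any number of data sections,
# the END-ISO-10303-21; terminator, then any number of signature sections.
_CODES = {"HEADER": "H", "ANCHOR": "A", "REFERENCE": "R",
          "DATA": "D", "END-ISO-10303-21": "E", "SIGNATURE": "S"}
_ORDER = re.compile(r"HA?R?D*ES*")


def sections_in_order(names: List[str]) -> bool:
    """Return True if the list of section keywords follows the
    EXCHANGE_FILE production; unknown keywords make the result False."""
    if any(name not in _CODES for name in names):
        return False
    codes = "".join(_CODES[name] for name in names)
    return _ORDER.fullmatch(codes) is not None
```

Mapping each section to a single letter lets the optional brackets and repetition braces of the WSN carry over directly into `?` and `*`.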

The exchange structure is a stream of octets that are encodings of the graphic characters of the basic alphabet. The graphic characters are collected into recognizable sequences called tokens. Tokens may be separated by token separators. The exchange structure can be considered as a sequence of tokens and token separators.

The exchange structure may be compressed and stored in an archive using the organization described in annex A.4.

5.4 Definition of tokens

The tokens used in the exchange structure are defined by the WSN in Table 2. The tokens UNIVERSAL_RESOURCE_IDENTIFIER, URI_FRAGMENT_IDENTIFIER and BASE64 are defined in 6.5.

Table 2 — WSN of token definitions
KEYWORD           = USER_DEFINED_KEYWORD | STANDARD_KEYWORD .

USER_DEFINED_KEYWORD = "!" UPPER { UPPER | DIGIT } .

STANDARD_KEYWORD  = UPPER { UPPER | DIGIT } .

SIGN              = "+" | "-" .

INTEGER           = [ SIGN ] DIGIT { DIGIT } .

REAL              = [ SIGN ] DIGIT { DIGIT } "." { DIGIT }
                    [ "E" [ SIGN ] DIGIT { DIGIT } ] .

STRING            = "'" { SPECIAL | DIGIT | SPACE | LOWER | UPPER | 
                    HIGH_CODEPOINT |
                    APOSTROPHE APOSTROPHE | 
                    REVERSE_SOLIDUS REVERSE_SOLIDUS | 
                    CONTROL_DIRECTIVE } "'" .

ENTITY_INSTANCE_NAME      = "#" ( DIGIT ) { DIGIT } .

VALUE_INSTANCE_NAME       = "@" ( DIGIT ) { DIGIT } .

CONSTANT_ENTITY_NAME      = "#" ( UPPER ) { UPPER | DIGIT } .

CONSTANT_VALUE_NAME       = "@" ( UPPER ) { UPPER | DIGIT } .

LHS_OCCURRENCE_NAME       = ( ENTITY_INSTANCE_NAME | VALUE_INSTANCE_NAME ) . 

RHS_OCCURRENCE_NAME       = ( ENTITY_INSTANCE_NAME | VALUE_INSTANCE_NAME |
                              CONSTANT_ENTITY_NAME | CONSTANT_VALUE_NAME) . 

ANCHOR_NAME       = "<" URI_FRAGMENT_IDENTIFIER ">" .

TAG_NAME          = ( UPPER | LOWER) { UPPER | LOWER | DIGIT } .

RESOURCE          = "<" UNIVERSAL_RESOURCE_IDENTIFIER ">" .

ENUMERATION       = "." UPPER { UPPER | DIGIT } "." .

HEX               = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                    "8" | "9" | "A" | "B" | "C" | "D" | "E" | "F" .

BINARY            = """" ( "0" | "1" | "2" | "3" ) { HEX } """" .

SIGNATURE_CONTENT = BASE64 .
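A possible transcription of several Table 2 productions into Python regular expressions is sketched below. It classifies a single token that has already been isolated. CONTROL_DIRECTIVE inside STRING, TAG_NAME, and the tokens that depend on 6.5 (ANCHOR_NAME, RESOURCE, SIGNATURE_CONTENT) are deliberately left out, and the names `TOKEN_PATTERNS` and `token_kind` are hypothetical.

```python
import re
from typing import Optional

UPPER = r"[A-Z_]"   # UPPER in Table 1 includes the low line
DIGIT = r"[0-9]"

# (token name, pattern) pairs transcribed from Table 2; STRING here omits
# CONTROL_DIRECTIVE and accepts only doubled apostrophes and reverse solidi.
TOKEN_PATTERNS = [
    ("USER_DEFINED_KEYWORD", rf"!{UPPER}(?:{UPPER}|{DIGIT})*"),
    ("STANDARD_KEYWORD", rf"{UPPER}(?:{UPPER}|{DIGIT})*"),
    ("REAL", rf"[+-]?{DIGIT}+\.{DIGIT}*(?:E[+-]?{DIGIT}+)?"),
    ("INTEGER", rf"[+-]?{DIGIT}+"),
    ("ENTITY_INSTANCE_NAME", rf"#{DIGIT}+"),
    ("VALUE_INSTANCE_NAME", rf"@{DIGIT}+"),
    ("CONSTANT_ENTITY_NAME", rf"#{UPPER}(?:{UPPER}|{DIGIT})*"),
    ("CONSTANT_VALUE_NAME", rf"@{UPPER}(?:{UPPER}|{DIGIT})*"),
    ("ENUMERATION", rf"\.{UPPER}(?:{UPPER}|{DIGIT})*\."),
    ("BINARY", r'"[0-3][0-9A-F]*"'),
    ("STRING", r"'(?:[^'\\]|''|\\\\)*'"),
]
_COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_PATTERNS]


def token_kind(text: str) -> Optional[str]:
    """Return the name of the token that matches all of text, or None."""
    for name, rx in _COMPILED:
        if rx.fullmatch(text):
            return name
    return None
```

Because each token kind starts with a distinct character class (sign or digit, "!", "#", "@", ".", a double quote, an apostrophe, or UPPER), and REAL alone requires a full stop, at most one pattern can match a complete token. Note that the exponent marker must be an uppercase "E".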

5.5 WSN of the exchange structure

The syntax of the exchange structure is specified in Table 3. Table 3 references the tokens defined in Table 2. The relationship between the syntax and the EXPRESS schema is specified in clause 12.

Table 3 — WSN of the exchange structure
EXCHANGE_FILE      = "ISO-10303-21;"
                     HEADER_SECTION [ ANCHOR_SECTION ] 
                     [ REFERENCE_SECTION ] { DATA_SECTION } 
                     "END-ISO-10303-21;" { SIGNATURE_SECTION }.

HEADER_SECTION     = "HEADER;" 
                     HEADER_ENTITY HEADER_ENTITY HEADER_ENTITY
                     [HEADER_ENTITY_LIST]
                     "ENDSEC;" .
HEADER_ENTITY_LIST = HEADER_ENTITY { HEADER_ENTITY } .
HEADER_ENTITY      = KEYWORD  "(" [ PARAMETER_LIST ] ")" ";" .

PARAMETER_LIST     = PARAMETER { "," PARAMETER } .
PARAMETER          = TYPED_PARAMETER  |
                     UNTYPED_PARAMETER | OMITTED_PARAMETER  .
TYPED_PARAMETER    = KEYWORD "(" PARAMETER ")" .
UNTYPED_PARAMETER  = "$" | INTEGER | REAL | STRING | RHS_OCCURRENCE_NAME
                     | ENUMERATION | BINARY | LIST .
OMITTED_PARAMETER  = "*" .
LIST               = "(" [ PARAMETER { "," PARAMETER } ] ")" .

ANCHOR_SECTION     = "ANCHOR;" ANCHOR_LIST "ENDSEC;" .
ANCHOR_LIST        = { ANCHOR } .
ANCHOR             = ANCHOR_NAME "=" ANCHOR_ITEM { ANCHOR_TAG } ";" .
ANCHOR_ITEM        = "$" | INTEGER | REAL | STRING | ENUMERATION | BINARY
                     | RHS_OCCURRENCE_NAME | RESOURCE | ANCHOR_ITEM_LIST .
ANCHOR_ITEM_LIST   = "(" [ ANCHOR_ITEM { "," ANCHOR_ITEM } ] ")" .
ANCHOR_TAG         = "{" TAG_NAME ":" ANCHOR_ITEM "}" .

REFERENCE_SECTION  = "REFERENCE;" REFERENCE_LIST "ENDSEC;" .
REFERENCE_LIST     = { REFERENCE } .
REFERENCE          = LHS_OCCURRENCE_NAME "=" RESOURCE ";" .

DATA_SECTION       = "DATA" [ "(" PARAMETER_LIST ")" ] ";" 
                     ENTITY_INSTANCE_LIST "ENDSEC;" .
ENTITY_INSTANCE_LIST = { ENTITY_INSTANCE } .
ENTITY_INSTANCE    = SIMPLE_ENTITY_INSTANCE | COMPLEX_ENTITY_INSTANCE .
SIMPLE_ENTITY_INSTANCE  = ENTITY_INSTANCE_NAME "=" SIMPLE_RECORD ";" .
COMPLEX_ENTITY_INSTANCE = ENTITY_INSTANCE_NAME "=" SUBSUPER_RECORD ";" .
SIMPLE_RECORD      = KEYWORD "(" [ PARAMETER_LIST ] ")" .
SUBSUPER_RECORD    = "(" SIMPLE_RECORD_LIST ")" .
SIMPLE_RECORD_LIST = SIMPLE_RECORD { SIMPLE_RECORD } .

SIGNATURE_SECTION  = "SIGNATURE" SIGNATURE_CONTENT "ENDSEC;".
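To show how the PARAMETER, TYPED_PARAMETER, OMITTED_PARAMETER and LIST productions of Table 3 fit together, here is a recursive-descent sketch in Python that parses the parenthesized parameter list of a SIMPLE_RECORD. It is not a complete exchange-structure parser. All class and function names are hypothetical, parameter values are returned as raw token text, and STRING control directives are not interpreted.

```python
import re
from typing import Any, List, Optional, Tuple

# Tokens that can occur in a PARAMETER_LIST. Simplified: inside STRING only
# doubled apostrophes and doubled reverse solidi are recognized.
_TOKEN = re.compile(r"""
    (?P<STRING>'(?:[^'\\]|''|\\\\)*')
  | (?P<REAL>[+-]?[0-9]+\.[0-9]*(?:E[+-]?[0-9]+)?)
  | (?P<INTEGER>[+-]?[0-9]+)
  | (?P<NAME>[#@](?:[0-9]+|[A-Z_][A-Z_0-9]*))
  | (?P<ENUMERATION>\.[A-Z_][A-Z_0-9]*\.)
  | (?P<BINARY>"[0-3][0-9A-F]*")
  | (?P<KEYWORD>!?[A-Z_][A-Z_0-9]*)
  | (?P<PUNCT>[(),$*])
  | (?P<SPACE>\ +)
""", re.VERBOSE)

_SIMPLE = {"STRING", "REAL", "INTEGER", "NAME", "ENUMERATION", "BINARY"}


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, text) pairs, dropping SPACE separators."""
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SPACE":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class ParameterParser:
    """Recursive descent over the PARAMETER productions of Table 3."""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self) -> Tuple[str, str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", "")

    def take(self, expected: Optional[str] = None) -> Tuple[str, str]:
        tok = self.peek()
        if expected is not None and tok[1] != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok[1]!r}")
        self.i += 1
        return tok

    def parameter(self) -> Any:
        kind, text = self.peek()
        if kind == "KEYWORD":                      # TYPED_PARAMETER
            self.take()
            self.take("(")
            inner = self.parameter()
            self.take(")")
            return (text, inner)
        if text == "(":                            # LIST
            return self.list_()
        if kind in _SIMPLE or text in ("$", "*"):  # UNTYPED / OMITTED
            self.take()
            return text
        raise SyntaxError(f"unexpected token {text!r}")

    def list_(self) -> List[Any]:
        """'(' [ PARAMETER { ',' PARAMETER } ] ')'"""
        self.take("(")
        items: List[Any] = []
        if self.peek()[1] != ")":
            items.append(self.parameter())
            while self.peek()[1] == ",":
                self.take()
                items.append(self.parameter())
        self.take(")")
        return items


def parse_parameter_list(text: str) -> List[Any]:
    """Parse the parenthesized part of a SIMPLE_RECORD, such as (1, $, .T.)."""
    parser = ParameterParser(text)
    result = parser.list_()
    if parser.i != len(parser.tokens):
        raise SyntaxError("trailing tokens after parameter list")
    return result
```

Returning raw token text keeps the sketch short; a complete reader would map each token to an EXPRESS value following clause 12.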

5.6 Token separators

A token separator is an element that separates two tokens. Token separators are space, the explicit print control directives, and comments. A token separator may appear between the terminals or non-terminals of the productions of Table 3. Any number of token separators may appear wherever one token separator may appear. A token separator shall not appear within tokens except that explicit print control directives may also appear within binaries and within strings. Print control directives are defined in clause 13.

NOTE      Space is the only whitespace character that separates tokens. Line-delimiters such as line feed or carriage return and other control characters such as form feed or character tabulation (tab) may appear in the exchange structure but are required by 5.2 to be ignored when processing the exchange structure. Consequently, line breaks may appear anywhere within the structure, including within tokens.

A comment shall be encoded as a solidus asterisk "/*" followed by any number of characters from the basic alphabet, and terminated by an asterisk solidus "*/". Any occurrence of solidus asterisk following the first occurrence shall not be significant, i.e. comments cannot be nested. All graphic characters appearing inside a comment shall not be significant to the exchange structure and are only intended to be read by humans.
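The following Python sketch removes comments while leaving STRING contents untouched. It assumes ignored octets have already been dropped, and it replaces each comment with a single space so that the comment still acts as a token separator. It also assumes that the `\S\` string control directive is followed by exactly one character, which may itself be an apostrophe, and skips that character so it is not mistaken for the end of the string. The name `strip_comments` is hypothetical.

```python
def strip_comments(text: str) -> str:
    """Replace each comment outside a STRING by a single space.

    Comments do not nest: the first "*/" after "/*" ends the comment.
    """
    out, i, in_string = [], 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            if text.startswith("\\\\", i):
                step = 2                # escaped reverse solidus
            elif text.startswith("\\S\\", i):
                step = 4                # \S\ plus the character it modifies
            else:
                step = 1
                if ch == "'":
                    in_string = False   # a doubled '' re-enters on the next pass
            out.append(text[i:i + step])
            i += step
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end < 0:
                raise SyntaxError("unterminated comment")
            out.append(" ")
            i = end + 2
        else:
            if ch == "'":
                in_string = True
            out.append(ch)
            i += 1
    return "".join(out)
```

Searching for the first "*/" after the opening "/*", rather than counting nesting depth, is what makes an inner "/*" insignificant.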

© ISO 2016 — All rights reserved