1.1. Terminology¶

The keywords “MUST”, “MUST NOT”, “SHALL”, “SHALL NOT”, and “MAY” in this document are to be interpreted as described in RFC 2119 [1].

The key phrase “information item”, as well as any specific type of information item such as an “element information item”, are to be interpreted as described in XML Information Set [2].

CellML infoset: An XML information set containing a hierarchy of information items conforming to the rules described in this document. In this specification, such infosets are assumed to be CellML 2.0 infosets.

CellML model: A mathematical model represented by a hierarchy of one or more CellML infosets, according to the rules described in this document. In this specification, the topmost CellML infoset in a hierarchy is referred to as the top-level CellML infoset.

Namespace: An XML namespace, as defined in Namespaces in XML 1.1 [3].

CellML namespace: The CellML 2.0 namespace.

CellML 2.0 namespace: The namespace http://www.cellml.org/cellml/2.0#.

MathML namespace: The namespace http://www.w3.org/1998/Math/MathML.

CellML information item: Any information item in the CellML namespace.

Basic Latin alphabetical character: A Unicode [4] character in the range U+0041 to U+005A or in the range U+0061 to U+007A.

Digit: A Unicode character in the range U+0030 to U+0039.

Basic Latin alphanumerical character: A Unicode character which is either a Basic Latin alphabetical character or a digit.

Basic Latin underscore: The Unicode character U+005F.

Basic Latin plus: The Unicode character U+002B.

Basic Latin minus: The Unicode character U+002D.

Basic Latin full stop: The Unicode character U+002E.

Whitespace character: Any one of the Unicode characters U+0009, U+000A, U+000D, or U+0020.

Namespaces

Namespaces are a way of making sure that names and definitions are interpreted within the right frame of reference. In CellML, two namespaces are used. These are the CellML namespace itself (which helps to define the units elements as distinct from the XML default units system) and the MathML namespace used to interpret the math elements. For more information on namespaces in general, please refer to the W3 schools namespace page, or for the MathML namespace please refer to the W3 MathML namespace page.

Unicode, Basic Latin alphabetical characters, and digits

The Unicode project is an attempt to codify all of the symbols - alphabets, writings, even emojis - of all the languages of the world so that they can be interchangeable and interpreted by computers. Since computers understand numbers rather than symbols, each character in each language is given a unique numerical code. The codes themselves are arranged into blocks representing sets of characters, and the Basic Latin block is one of these. It contains, amongst other things, the upper and lowercase alphabet without decoration:

abcdefghijklmnopqrstuvwxyz (symbols between U+0061 and U+007A) ABCDEFGHIJKLMNOPQRSTUVWXYZ (symbols between U+0041 and U+005A)

These characters are referred to in CellML as the “Basic Latin alphabetical characters”.

The “digits” are the Unicode characters:

0123456789 (symbols between U+0030 and U+0039)

Together with the Basic Latin alphabetical characters, they form the “Basic Latin alphanumerical characters”.

In addition, CellML recognises four special characters:

+ the plus sign (U+002B),
- the minus sign (U+002D),
. the full stop or decimal point (U+002E),
_ the underscore (U+005F),

and the following whitespace characters:

the space (U+0020),
the tab (U+0009),
the carriage return (U+000D), and
the line feed (U+000A).

These symbols and character sets will be used throughout the CellML definition to define allowed content of attributes and elements.