/**


Generalities
Implementation conventions
- Tokenizing
  - Recognized tokens
  - Important notes regarding lexical analysis
- Parsing
  - Recognized PS constructs
  - Important notes regarding syntax analysis
The raw BNF's
XML serialization annotations


Generalities

This is a Jacc grammar for the so-called Presentation Syntax (PS) of the RIF Basic Logic Dialect (BLD) developed as a result of the activities of the RIF Working Group.

This HTML file is the root of a hyperlinked documentation allowing one to explore the Jacc grammar for BLD by navigating through its elements: rules, terminal symbols, and non-terminal symbols. The comments accompanying some rules come from the original documents. It also contains the pure Yacc rules (i.e., without semantic actions) and the XML serialization mappings. This documentation is generated by Jacc from the Jacc grammar specified in file BLD.grm (i.e., with the command "jacc -doc BLD"). Along with this Jacc grammar file, there are other supporting source files.

This Jacc grammar is a transcription of the EBNF for the canonical syntax of the RIF BLD. This syntax is canonical in that this EBNF defines the kernel constructs used for the BLD-to-XML transformation rules. In addition to the canonical BLD PS language, it has been proposed to allow a simpler syntax for writing RIF use cases. This simpler syntax, the so-called Abridged PS, extends the canonical syntax by allowing various shorthands for RIF constants and for common expressions such as arithmetic. This additional syntax is not canonical PS in that it is just syntactic sugar that is desugared into the canonical form.

Implementation conventions

The Jacc grammar specification given here is a literal transcription of the BNF rules given in the above references, adapted to the needs of the Jacc format. There are two sets of grammar rules:

- the rules for the Basic Logic Rule language (BLR); and,
- the rules for the Basic Logic Condition (BLC) language.

Tokenizing

(N.B.: See the source file of the tokenizer, Tokenizer.java.)

Recognized tokens

The terminal symbols are:

*Token*	*Value*
`OPENPAR`	`'('`
`CLOSEPAR`	`')'`
`OPENBRA`	`'['`
`CLOSEBRA`	`']'`
`OPENMETA`	`'(*'`
`CLOSEMETA`	`'*)'`
`DOCUMENT`	`'Document'`
`BASE`	`'Base'`
`PREFIX`	`'Prefix'`
`IMPORT`	`'Import'`
`GROUP`	`'Group'`
`EXTERNAL`	`'External'`
`AND`	`'And'`
`OR`	`'Or'`
`EXISTS`	`'Exists'`
`FORALL`	`'Forall'`
`IF`	`':-'`
`ARROW`	`'->'`
`LEXSPACE`	`'^^'`
`EQUAL`	`'='`
`MEMBER`	`'#'`
`SUBCLASS`	`'##'`
`COLON`	`':'`
`NUMBER`	(possibly signed) integer, decimal, or floating-point
`VARIABLE`	maximal-length sequence of word characters starting with a `'?'`
`LOCALNAME`	maximal-length sequence of word characters starting with a `'_'`
`STRING`	a double-quoted string containing any character (using `'\\'` to escape `'"'`)
`IDENTIFIER`	maximal-length sequence of word characters starting with a letter
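
Several literals in the table share a first character (`'('` and `'(*'`, `'#'` and `'##'`, `':'` and `':-'`), so a lexer has to prefer the longest one. The following is a minimal sketch of that longest-match lookup for the punctuation rows only. The class `PunctuationSketch` and its method `match` are hypothetical names introduced for illustration; they are not taken from Tokenizer.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the project's Tokenizer.java: it only covers
// the punctuation rows of the token table.
class PunctuationSketch {
    // Insertion order matters: two-character literals are listed first so
    // that '##' wins over '#', '(*' over '(' and ':-' over ':'.
    static final Map<String, String> LITERALS = new LinkedHashMap<>();
    static {
        LITERALS.put("(*", "OPENMETA");
        LITERALS.put("*)", "CLOSEMETA");
        LITERALS.put(":-", "IF");
        LITERALS.put("->", "ARROW");
        LITERALS.put("^^", "LEXSPACE");
        LITERALS.put("##", "SUBCLASS");
        LITERALS.put("(", "OPENPAR");
        LITERALS.put(")", "CLOSEPAR");
        LITERALS.put("[", "OPENBRA");
        LITERALS.put("]", "CLOSEBRA");
        LITERALS.put("=", "EQUAL");
        LITERALS.put("#", "MEMBER");
        LITERALS.put(":", "COLON");
    }

    // Returns the token name of the longest literal starting at pos,
    // or null when no punctuation literal starts there.
    static String match(String input, int pos) {
        for (Map.Entry<String, String> e : LITERALS.entrySet()) {
            if (input.startsWith(e.getKey(), pos)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

With this ordering, `PunctuationSketch.match("##", 0)` yields `SUBCLASS` while `PunctuationSketch.match("#x", 0)` yields `MEMBER`. Keyword tokens such as `'Document'` or `'Forall'` are left out here; one option is to recognize them as word sequences first and then look them up in a similar table.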


Important Notes


Important notes regarding lexical analysis


- NUMBER is a token representing numbers.
- VARIABLE is a token recognized by its leading '?', but the token returned by the lexer drops this leading '?'. This means that '?' is not a separate punctuation mark, contrary to what the BLC language's EBNF shows.
- LOCALNAME is a token recognized by its leading '_', but the token returned by the lexer drops this leading '_'. This means that '_' is not a separate punctuation mark, contrary to what the DTB language's EBNF for shorthands shows.
- Using STRING dispenses with the spurious "..."^^ notation, making '^^' an infix operator. In other words, the initial and final double quotes are part of the token itself and need not appear at the grammar level.
- IDENTIFIER is any maximal sequence of non-separator, non-punctuation Unicode characters that does not start with a '?'.
- Note that the colon character (':') is tokenized as punctuation. Indeed, a SymSpace is parsed as a pair of IDENTIFIERs separated by a colon. This means that ':' is a separate punctuation mark, unlike what is shown by the EBNF.

The above conventions were reached after a careful analysis of the various notions of what constitutes a constant in RIF BLD.
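
The lexical conventions for VARIABLE, LOCALNAME, and STRING can be illustrated with a few regular expressions. This is a minimal sketch and not the code of Tokenizer.java: the class `LexConventionsSketch`, the method `lex`, and the `"KIND:value"` return format are assumptions made for the example, and "word characters" is approximated by Java's `\w` class. It is also an assumption that the STRING value is the text between the quotes, with escape sequences left unprocessed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the conventions listed in the notes on
// lexical analysis; names and return format are assumptions.
class LexConventionsSketch {
    // Group 1 captures the token value without its leading marker or quotes.
    static final Pattern VARIABLE = Pattern.compile("\\?(\\w+)");
    static final Pattern LOCALNAME = Pattern.compile("_(\\w+)");
    // A double-quoted string in which a backslash escapes the next character.
    static final Pattern STRING = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns "KIND:value" for the token starting at pos,
    // or null when none of the three patterns applies there.
    static String lex(String input, int pos) {
        String[] kinds = {"VARIABLE", "LOCALNAME", "STRING"};
        Pattern[] patterns = {VARIABLE, LOCALNAME, STRING};
        for (int i = 0; i < kinds.length; i++) {
            Matcher m = patterns[i].matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt()) {
                return kinds[i] + ":" + m.group(1);
            }
        }
        return null;
    }
}
```

Because the marker is consumed inside the pattern, the grammar never sees '?' or '_' as tokens of their own, and a typed constant reaches the parser as a STRING followed by LEXSPACE and a SymSpace.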

In the RIF specification of the EBNF of the Rule Language, it is specified that:

 * IRIMETA        ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'
 * Frame          ::= TERM '[' (TERM '->' TERM)* ']'
 * TERM           ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')')
 * Const          ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
 * SYMSPACE       ::= ANGLEBRACKIRI | CURIE
 *

where CONSTSHORT, ANGLEBRACKIRI, and CURIE are defined (in the DTB shorthand notation for RIF constants) by:

 * CURIE         ::= PNAME_LN | PNAME_NS
 * CONSTSHORT    ::= ANGLEBRACKIRI         // shorthand for "..."^^rif:iri
 *                 | CURIE                 // shorthand for "..."^^rif:iri
 *                 | '"' UNICODESTRING '"' // shorthand for "..."^^xs:string
 *                 | NumericLiteral        // shorthand for "..."^^xs:integer,xs:decimal,xs:double
 *                 | '_' LocalName         // shorthand for "..."^^rif:local
 *

where:

 * ANGLEBRACKIRI ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
 * PNAME_LN      ::= PNAME_NS PN_LOCAL
 * PNAME_NS      ::= PN_PREFIX? ':'
 * PN_LOCAL      ::= (PN_CHARS_U | [0-9]) ((PN_CHARS|'.')* PN_CHARS)?
 * PN_PREFIX     ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
 * PN_CHARS_U    ::= PN_CHARS_BASE | '_'
 * PN_CHARS      ::= PN_CHARS_U
 *                 | '-'
 *                 | [0-9]
 *                 | #x00B7
 *                 | [#x0300-#x036F]
 *                 | [#x203F-#x2040]
 * PN_CHARS_BASE ::= [A-Z]
 *                 | [a-z]
 *                 | [#x00C0-#x00D6]
 *                 | [#x00D8-#x00F6]
 *                 | [#x00F8-#x02FF]
 *                 | [#x0370-#x037D]
 *                 | [#x037F-#x1FFF]
 *                 | [#x200C-#x200D]
 *                 | [#x2070-#x218F]
 *                 | [#x2C00-#x2FEF]
 *                 | [#x3001-#xD7FF]
 *                 | [#xF900-#xFDCF]
 *                 | [#xFDF0-#xFFFD]
 *                 | [#x10000-#xEFFFF]
 *
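
The comments attached to the CONSTSHORT alternatives describe how each shorthand maps to a canonical `"..."^^type` constant. A rough sketch of that desugaring is given below. The class `ConstShortSketch` and its methods are hypothetical, and the regular expressions used to tell integers, decimals, and doubles apart are simplifying assumptions, since the lexical form of NumericLiteral is not reproduced here. The CURIE case is deliberately left out because it needs the prefix declarations.

```java
// Hypothetical desugaring of CONSTSHORT forms into canonical constants;
// the numeric patterns are simplified assumptions.
class ConstShortSketch {
    static String quote(String s) {
        return "\"" + s + "\"";
    }

    static String desugar(String s) {
        if (s.length() >= 2 && s.startsWith("<") && s.endsWith(">")) {
            // ANGLEBRACKIRI: drop the angle brackets.
            return quote(s.substring(1, s.length() - 1)) + "^^rif:iri";
        }
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            // Plain double-quoted string: already quoted.
            return s + "^^xs:string";
        }
        if (s.startsWith("_")) {
            // '_' LocalName: drop the underscore.
            return quote(s.substring(1)) + "^^rif:local";
        }
        if (s.matches("[+-]?\\d+")) {
            return quote(s) + "^^xs:integer";
        }
        if (s.matches("[+-]?\\d*\\.\\d+")) {
            return quote(s) + "^^xs:decimal";
        }
        if (s.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)[eE][+-]?\\d+")) {
            return quote(s) + "^^xs:double";
        }
        // CURIEs are not handled in this sketch.
        throw new IllegalArgumentException("not a handled shorthand: " + s);
    }
}
```

For instance, `ConstShortSketch.desugar("_x")` produces `"x"^^rif:local`, and `ConstShortSketch.desugar("42")` produces `"42"^^xs:integer`.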

Tokenizing the PS grammar is complicated by the fact that the IRIs given as arguments to the pragmas Prefix and Base, which declare shorthands for IRIs, are not enclosed in double quotes. Handling them as written would require parsing IRIs, which is beyond our prototype's goal and unnecessary in this case. The canonical PS does not have this problem: all such IRIs there are double-quoted strings, which greatly simplifies the tokenizing. It is just as simple to require the same for the Prefix and Base pragmas, which is what our prototype does.
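
Once the IRI argument of a Prefix pragma is an ordinary double-quoted string, recording it and expanding a prefixed name (a pair of IDENTIFIERs separated by a colon) takes no IRI parsing at all. The sketch below illustrates this under stated assumptions: the class `PrefixTableSketch`, its methods, and the expansion by plain concatenation are hypothetical and are not taken from the prototype's sources.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical prefix table; the IRI is treated as an opaque string value.
class PrefixTableSketch {
    private final Map<String, String> prefixes = new HashMap<>();

    // Records a Prefix pragma whose IRI arrived as a STRING token value.
    void declare(String name, String iri) {
        prefixes.put(name, iri);
    }

    // Expands IDENTIFIER ':' IDENTIFIER by concatenating the declared IRI
    // and the local part (an assumption about how shorthands are resolved).
    String expand(String prefix, String local) {
        String iri = prefixes.get(prefix);
        if (iri == null) {
            throw new IllegalArgumentException("undeclared prefix: " + prefix);
        }
        return iri + local;
    }
}
```

A usage example with made-up names: after `declare("ex", "http://example.org/")`, the call `expand("ex", "item")` returns `http://example.org/item`.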

Parsing


Recognized PS constructs


Important notes regarding syntax analysis


The raw BNF's

The two grammars for the BLD (Condition and Rule) languages, expressed in Yacc form, are given below.

BLD Rule Language


The original EBNF is accessible in the specification of the BLD Rule Language. It is reproduced here for convenience:

 * Document  ::= IRIMETA? 'Document' '(' Base? Prefix* Import* Group? ')'
 * Base      ::= 'Base' '(' IRI ')'
 * Prefix    ::= 'Prefix' '(' Name IRI ')'
 * Import    ::= IRIMETA? 'Import' '(' IRICONST PROFILE? ')'
 * Group     ::= IRIMETA? 'Group' '(' (RULE | Group)* ')'
 * RULE      ::= (IRIMETA? 'Forall' Var+ '(' CLAUSE ')') | CLAUSE
 * CLAUSE    ::= Implies | ATOMIC
 * Implies   ::= IRIMETA? (ATOMIC | 'And' '(' ATOMIC* ')') ':-' FORMULA
 * PROFILE   ::= TERM
 *

The Jacc rules corresponding to this EBNF are given in BLR.grm.

BLD Condition Language

The original EBNF is accessible in the specification of the BLD Condition Language. It is reproduced here for convenience:

 * FORMULA        ::= ATOMIC
 *                  | IRIMETA? 'And' '(' FORMULA* ')'
 *                  | IRIMETA? 'Or' '(' FORMULA* ')'
 *                  | IRIMETA? 'Exists' Var+ '(' FORMULA ')'
 *                  | IRIMETA? 'External' '(' Atom | Frame ')'
 * ATOMIC         ::= IRIMETA? (Atom | Equal | Member | Subclass | Frame)
 * Atom           ::= UNITERM
 * UNITERM        ::= Const '(' (TERM* | (Name '->' TERM)*) ')'
 * Equal          ::= TERM '=' TERM
 * Member         ::= TERM '#' TERM
 * Subclass       ::= TERM '##' TERM
 * Frame          ::= TERM '[' (TERM '->' TERM)* ']'
 * TERM           ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')')
 * Expr           ::= UNITERM
 * Const          ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
 * Name           ::= UNICODESTRING
 * Var            ::= '?' UNICODESTRING
 * SYMSPACE       ::= ANGLEBRACKIRI | CURIE
 *
 * IRIMETA        ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'
 *

The Jacc rules corresponding to this EBNF are given in BLC.grm.

Additional ad hoc rules

Some Jacc rules corresponding to temporary ad hoc implementation decisions for the sake of prototyping are given in AdHoc.grm.

XML serialization annotations

This version of the BLD grammar is annotated for simple XML serialization as per the scheme specified in the current BLD document. Each XML serialization annotation generates an HTML documentation file accessible by navigating through the grammar (e.g., that of the rule for Group). The effects of such annotations are summarized in the table of XML serialization mappings.

Essentially, the format of a Jacc grammar is that of a Yacc grammar. As in Yacc, Jacc rules may be annotated with semantic actions in the form of Java code involving the rule's RHS constituents (denoted by $1, $2, ..., $n, the so-called pseudo-variables, where the index n in $n refers to the position of the constituent in the RHS). Such actions appear between curly braces ('{' and '}') wherever a symbol may appear in a rule's RHS.

Jacc also allows an additional form of annotation in the RHS of a rule to indicate the XML serialization pattern of the abstract syntax tree (AST) node corresponding to a derivation with this rule. This XML serialization meta-annotation comes between square brackets ('[' and ']') and is of the form described in a simple XML serialization annotation language.

For example, the annotated rule:

 * QUANTIF
 *    : 'Exists' Var_plus '(' CONDIT ')'
 *    [
 *        nsprefix   : hrl
 *        localname  : quantifier
 *        attributes : {kind="existential"}
 *        children   : (2,4)
 *    ]
 *    ;
 *

means that an AST node for this rule will be serialized thus:

 *  <hrl:quantifier kind="existential">
 *    (XML serialization of Var_plus)
 *    (XML serialization of CONDIT)
 *  </hrl:quantifier>
 *

Rules without an XML serialization annotation follow a default behavior: the serialization is the concatenation of those of the RHS constituents, eliminating punctuation tokens; i.e., empty nodes and literal tokens, namely tokens that do not carry a value. (See the Jacc XML annotation manual for more details.)
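
The annotated and default behaviors can be mimicked with a small amount of code. The following is only a sketch of the idea and not Jacc's actual implementation: the class `XmlAnnotationSketch` and its methods are hypothetical, the output is compact rather than indented, attribute values are not escaped, and valueless tokens are modeled as empty strings.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the two serialization behaviors; not Jacc's code.
class XmlAnnotationSketch {
    // rhs holds the serialization of each RHS constituent in order, with
    // the empty string for tokens that carry no value. children lists the
    // 1-based RHS positions named by the annotation, e.g. (2,4).
    static String annotated(String nsprefix, String localname,
                            Map<String, String> attributes,
                            int[] children, List<String> rhs) {
        String tag = nsprefix + ":" + localname;
        StringBuilder out = new StringBuilder("<" + tag);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            out.append(' ').append(a.getKey())
               .append("=\"").append(a.getValue()).append('"');
        }
        out.append('>');
        for (int position : children) {
            out.append(rhs.get(position - 1));
        }
        return out.append("</").append(tag).append('>').toString();
    }

    // Default behavior: concatenate the constituents' serializations;
    // valueless tokens contribute nothing because they are empty strings.
    static String byDefault(List<String> rhs) {
        return String.join("", rhs);
    }
}
```

For the QUANTIF rule, the five RHS constituents are 'Exists', Var_plus, '(', CONDIT, and ')', so positions 2 and 4 select exactly the two constituents that carry content, and the default behavior over the same list would produce their serializations without the enclosing element.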

For example, see the two test files examples/Test1.bld and examples/Test2.bld. Running the command examples/bld on them produces the XML trees shown in examples/Test1.xml and examples/Test2.xml.
*/