Main Documentation for grammar BLD.grm


Description


Generalities

This is a Jacc grammar for the so-called Presentation Syntax (PS) of the RIF Basic Logic Dialect (BLD) developed as a result of the activities of the RIF Working Group. This HTML file is the root of a hyperlinked documentation that lets one explore the Jacc grammar for BLD by navigating through its elements: rules, terminal symbols, and non-terminal symbols. The comments accompanying some rules come from the original documents. The documentation also contains the pure Yacc rules (i.e., without semantic actions) and the XML serialization mappings. It is generated by Jacc from the Jacc grammar specified in file BLD.grm (i.e., with the command "jacc -doc BLD"). Along with this Jacc grammar file, there are other supporting source files.

This Jacc grammar is a transcription of the EBNF for the canonical syntax of the RIF BLD. This syntax is canonical in that this EBNF defines the kernel constructs used for the BLD-to-XML transformation rules. In addition to the canonical BLD PS language, it has been proposed to allow a simpler syntax for writing RIF use cases. This simpler syntax extends the canonical syntax by allowing various shorthands for RIF constants and for common expressions such as arithmetic, etc. - the so-called Abridged PS. This additional syntax is not canonical PS in that it is just syntactic sugar that is desugared into the canonical form.

Implementation conventions

The Jacc grammar specification given here is a literal transcription of the BNF rules given in the above references, adapted to the needs of the Jacc format. There are two sets of grammar rules:
  1. the rules for the Basic Logic Condition (BLC) language; and,
  2. the rules for the Basic Logic Rule language (BLR).

Tokenizing

(N.B.: See the source file of the tokenizer Tokenizer.java.)

Recognized tokens

The terminal symbols are:

Token       Value
STRING      double-quoted string
VARIABLE    max-length sequence of non-special chars starting with a '?'
IDENTIFIER  max-length sequence of non-special chars
OR          'Or'
AND         'And'
FORALL      'Forall'
EXISTS      'Exists'
GROUP       'Group'
EXTERNAL    'External'
IF          ':-'
ARROW       '->'
LEXSPACE    '^^'
EQUAL       '='
MEMBER      '#'
SUBCLASS    '##'
COLON       ':'
OPENPAR     '('
CLOSEPAR    ')'
OPENBRA     '['
CLOSEBRA    ']'
OPENMETA    '(*'
CLOSEMETA   '*)'

Important Notes

Important notes regarding lexical analysis

  • VARIABLE is a token recognized by its leading '?', but the token value returned by the lexer omits this leading '?'. This means that '?' is not a separate punctuation mark, unlike what the EBNF shows.

  • Using STRING dispenses with the spurious "..."^^ notation, making '^^' an infix operator. In other words, the initial and final double quotes are part of the token itself and need not appear at the grammar level.

  • IDENTIFIER is any maximal sequence of non-separator, non-punctuation Unicode characters that does not start with a '?'.

  • Note that the colon character (':') is tokenized as punctuation. Indeed, a SymSpace is parsed as a pair of IDENTIFIERs separated by a colon. This means that ':' is a separate punctuation mark, unlike what the EBNF shows.

The above conventions have been reached after a careful analysis of the various notions of what constitutes a constant in RIF BLD.
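
The lexical conventions above can be illustrated with a minimal tokenizer sketch in Python. This is not the code of Tokenizer.java. The regular expressions, the approximation of "non-special chars" by Python's \w class, and the names TOKEN_SPEC and tokenize are assumptions made for illustration only.

```python
import re

# Reserved words are first read as IDENTIFIERs, then reclassified.
KEYWORDS = {"Or": "OR", "And": "AND", "Forall": "FORALL",
            "Exists": "EXISTS", "Group": "GROUP", "External": "EXTERNAL"}

# Order matters: longer punctuation is tried before its own prefixes
# ('(*' before '(', ':-' before ':', '##' before '#').
TOKEN_SPEC = [
    ("STRING",     r'"(?:[^"\\]|\\.)*"'),
    ("VARIABLE",   r"\?\w+"),
    ("OPENMETA",   r"\(\*"),
    ("CLOSEMETA",  r"\*\)"),
    ("IF",         r":-"),
    ("ARROW",      r"->"),
    ("LEXSPACE",   r"\^\^"),
    ("SUBCLASS",   r"##"),
    ("MEMBER",     r"#"),
    ("EQUAL",      r"="),
    ("COLON",      r":"),
    ("OPENPAR",    r"\("),
    ("CLOSEPAR",   r"\)"),
    ("OPENBRA",    r"\["),
    ("CLOSEBRA",   r"\]"),
    ("IDENTIFIER", r"\w+"),   # assumption: \w stands in for "non-special chars"
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, value) pairs for a PS fragment."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "STRING":
            value = value[1:-1]      # the quotes belong to the token, not its value
        elif kind == "VARIABLE":
            value = value[1:]        # the leading '?' is dropped from the value
        elif kind == "IDENTIFIER" and value in KEYWORDS:
            kind = KEYWORDS[value]
        tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    for token in tokenize('?x = "abc"^^xs:string'):
        print(token)
```

With these rules, a constant such as "abc"^^xs:string yields five tokens (STRING, LEXSPACE, IDENTIFIER, COLON, IDENTIFIER), which is the shape the Const and SymSpace grammar rules expect.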

In the RIF specification of the EBNF for the BLD Rule Language, it is specified that:

  IRIMETA        ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'
  Frame          ::= TERM '[' (TERM '->' TERM)* ']'
  TERM           ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')')
  Const          ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
  SYMSPACE       ::= ANGLEBRACKIRI | CURIE
  
where CONSTSHORT, ANGLEBRACKIRI, and CURIE are defined (in the DTB shorthand notation for RIF constants) by:
   ANGLEBRACKIRI ::= IRI_REF
   CURIE         ::= PNAME_LN | PNAME_NS
   CONSTSHORT    ::= ANGLEBRACKIRI         // shortcut for "..."^^rif:iri
                   | CURIE                 // shortcut for "..."^^rif:iri
                   | '"' UNICODESTRING '"' // shortcut for "..."^^xs:string
                   | NumericLiteral        // shortcut for "..."^^xs:integer,xs:decimal,xs:double
                   | '_' LocalName         // shortcut for "..."^^rif:local
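
As a rough illustration of how the CONSTSHORT shortcuts listed above relate to the canonical "..."^^symspace form, here is a small Python sketch. The function names desugar_const and to_canonical, the prefixes argument, and the simplified numeric patterns are assumptions for illustration. They are not part of the grammar or of the prototype.

```python
import re


def desugar_const(short, prefixes=None):
    """Map a shorthand constant to a (lexical form, symbol space) pair.

    `prefixes` is a hypothetical dict from CURIE prefix to IRI, standing in
    for whatever the Prefix pragmas of a document declare.
    """
    prefixes = prefixes or {}
    if short.startswith("<") and short.endswith(">"):
        return (short[1:-1], "rif:iri")                # ANGLEBRACKIRI
    if len(short) >= 2 and short.startswith('"') and short.endswith('"'):
        return (short[1:-1], "xs:string")              # plain quoted string
    if short.startswith("_") and len(short) > 1:
        return (short[1:], "rif:local")                # '_' LocalName
    # Simplified numeric literal patterns (an approximation).
    if re.fullmatch(r"[+-]?\d+", short):
        return (short, "xs:integer")
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+)", short):
        return (short, "xs:decimal")
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+", short):
        return (short, "xs:double")
    if ":" in short:                                   # CURIE
        prefix, local = short.split(":", 1)
        if prefix not in prefixes:
            raise KeyError(f"undeclared prefix {prefix!r}")
        return (prefixes[prefix] + local, "rif:iri")
    raise ValueError(f"not a recognized shorthand constant: {short!r}")


def to_canonical(short, prefixes=None):
    """Render the desugared constant in the canonical "..."^^space notation."""
    lexical, space = desugar_const(short, prefixes)
    return f'"{lexical}"^^{space}'


if __name__ == "__main__":
    ns = {"ex": "http://example.org/"}
    for s in ["<http://example.org/a>", "ex:foo", '"hello"', "42", "3.14", "1.0e3", "_x"]:
        print(s, "=>", to_canonical(s, ns))
```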
  
Tokenizing the PS is complicated by the fact that the IRIs given as arguments to the Prefix and Base pragmas, which declare shorthands for IRIs, are not enclosed in double quotes. Handling them as written would require parsing IRIs, which is beyond our prototype's goal and unnecessary in this case. The canonical PS does not have this problem: all its IRIs are double-quoted strings, which greatly simplifies the tokenizing. It is just as simple to require double-quoted strings for the Prefix and Base pragmas as well, which is what our prototype does.

Parsing

Recognized PS constructs

Important notes regarding syntax analysis

The raw BNF's

The two grammars for the BLD (Condition and Rule) languages expressed in Yacc form are given below.

BLD Condition Language

The original EBNF is accessible in the specification of the BLD Condition Language. It is reproduced here for convenience:
  FORMULA        ::= IRIMETA? 'And' '(' FORMULA* ')'
                   | IRIMETA? 'Or' '(' FORMULA* ')'
                   | IRIMETA? 'Exists' Var+ '(' FORMULA ')'
                   | ATOMIC
                   | IRIMETA? 'External' '(' Atom | Frame ')'
  ATOMIC         ::= IRIMETA? (Atom | Equal | Member | Subclass | Frame)
  Atom           ::= UNITERM
  UNITERM        ::= Const '(' (TERM* | (Name '->' TERM)*) ')'
  Equal          ::= TERM '=' TERM
  Member         ::= TERM '#' TERM
  Subclass       ::= TERM '##' TERM
  Frame          ::= TERM '[' (TERM '->' TERM)* ']'
  TERM           ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')')
  Expr           ::= UNITERM
  Const          ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
  Name           ::= UNICODESTRING
  Var            ::= '?' UNICODESTRING
  SYMSPACE       ::= ANGLEBRACKIRI | CURIE
  
  IRIMETA        ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'
  

 

  Formula
    : Atomic
    | AND OPENPAR Formulas_opt CLOSEPAR
    | OR OPENPAR Formulas_opt CLOSEPAR
    | EXISTS Vars OPENPAR Formula CLOSEPAR
    | EXTERNAL OPENPAR Atom CLOSEPAR
    ;
  
  Atomic
    : Atom
    | Equal
    | Member
    | Subclass
    | Frame
    ;
  
  Atom
    : UniTerm
    ;
  
  UniTerm
    : Const OPENPAR UniTermBody CLOSEPAR
    ;
  
  Equal
    : Term EQUAL Term
    ;
  
  Member
    : Term MEMBER Term
    ;
  
  Subclass
    : Term SUBCLASS Term
    ;
  
  Frame
    : Term OPENBRA FrameAttributes_opt CLOSEBRA
    ;
  
  Term
    : Const
    | Var
    | Expr
    | EXTERNAL OPENPAR Expr CLOSEPAR
    ;
  
  Expr
    : UniTerm
    ;
  
  Const
    : STRING LEXSPACE SymSpace
    ;
  
  Var
    : VARIABLE
    ;
  
  UniTermBody
    : Terms_opt
    | TermAttributes_opt
    ;
  
  TermAttributes_opt
    : // empty
    | TermAttributes
    ;
  
  TermAttributes
    : TermAttribute
    | TermAttributes TermAttribute
    ;
  
  TermAttribute
    : Const ARROW Term
    ;
  
  FrameAttributes_opt  
    : // empty
    | FrameAttributes
    ;
  
  FrameAttributes
    : FrameAttribute
    | FrameAttributes FrameAttribute
    ;
  
  FrameAttribute
    : Term ARROW Term
    ;
  
  Formulas_opt
    : // empty
    | Formulas
    ;
  
  Formulas
    : Formula
    | Formulas Formula
    ;
  
  Terms_opt
    : // empty
    | Terms_opt Term
    ;
  
  Vars
    : Var
    | Vars Var
    ;
  
  SymSpace
    : IDENTIFIER COLON IDENTIFIER
    ;  
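
To make the shape of the Condition Language rules above more concrete, here is a hand-written recursive-descent recognizer in Python for the same constructs. It is only a sketch: Jacc generates a table-driven parser from the grammar, not code like this. The Parser class, the parse function, the tuple-shaped trees, and the decision to read 'External(...)' with no following operator as a formula are all assumptions for illustration. Input is a list of (kind, value) token pairs using the token names from the table of terminal symbols.

```python
class Parser:
    """Recursive-descent recognizer over (kind, value) token pairs."""

    def __init__(self, tokens):
        self.toks = list(tokens)
        self.i = 0

    def peek(self):
        return self.toks[self.i][0] if self.i < len(self.toks) else None

    def eat(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, got {self.peek()} at token {self.i}")
        self.i += 1
        return self.toks[self.i - 1][1]

    def formula(self):
        k = self.peek()
        if k in ("AND", "OR"):            # AND/OR OPENPAR Formulas_opt CLOSEPAR
            self.eat(k)
            self.eat("OPENPAR")
            items = []
            while self.peek() != "CLOSEPAR":
                items.append(self.formula())
            self.eat("CLOSEPAR")
            return (k.capitalize(), items)
        if k == "EXISTS":                 # EXISTS Vars OPENPAR Formula CLOSEPAR
            self.eat("EXISTS")
            variables = [("Var", self.eat("VARIABLE"))]
            while self.peek() == "VARIABLE":
                variables.append(("Var", self.eat("VARIABLE")))
            self.eat("OPENPAR")
            body = self.formula()
            self.eat("CLOSEPAR")
            return ("Exists", variables, body)
        return self.atomic()

    def atomic(self):
        left = self.term()
        k = self.peek()
        if k in ("EQUAL", "MEMBER", "SUBCLASS"):   # Term op Term
            self.eat(k)
            return (k.capitalize(), left, self.term())
        if k == "OPENBRA":                          # Term [ (Term -> Term)* ]
            self.eat("OPENBRA")
            slots = []
            while self.peek() != "CLOSEBRA":
                key = self.term()
                self.eat("ARROW")
                slots.append((key, self.term()))
            self.eat("CLOSEBRA")
            return ("Frame", left, slots)
        if left[0] == "UniTerm":
            return ("Atom", left)
        if left[0] == "External":
            return left
        raise ValueError("a bare constant or variable is not a formula")

    def term(self):
        k = self.peek()
        if k == "VARIABLE":
            return ("Var", self.eat("VARIABLE"))
        if k == "EXTERNAL":               # EXTERNAL OPENPAR Expr CLOSEPAR
            self.eat("EXTERNAL")
            self.eat("OPENPAR")
            inner = self.term()
            if inner[0] != "UniTerm":
                raise ValueError("External expects a UniTerm")
            self.eat("CLOSEPAR")
            return ("External", inner)
        const = self.const()
        if self.peek() != "OPENPAR":
            return const
        self.eat("OPENPAR")               # Const OPENPAR UniTermBody CLOSEPAR
        args = []
        while self.peek() != "CLOSEPAR":
            arg = self.term()
            if self.peek() == "ARROW":    # TermAttribute: Const ARROW Term
                if arg[0] != "Const":
                    raise ValueError("argument name must be a Const")
                self.eat("ARROW")
                arg = ("Named", arg, self.term())
            args.append(arg)
        self.eat("CLOSEPAR")
        named = [a for a in args if a[0] == "Named"]
        if named and len(named) != len(args):
            raise ValueError("cannot mix positional and named arguments")
        return ("UniTerm", const, args)

    def const(self):                      # STRING LEXSPACE IDENTIFIER COLON IDENTIFIER
        lexical = self.eat("STRING")
        self.eat("LEXSPACE")
        prefix = self.eat("IDENTIFIER")
        self.eat("COLON")
        return ("Const", lexical, prefix + ":" + self.eat("IDENTIFIER"))


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.formula()
    if parser.peek() is not None:
        raise ValueError("trailing tokens after formula")
    return tree
```

For instance, the tokens of ?x = ?y parse to ("Equal", ("Var", "x"), ("Var", "y")), while a lone variable is rejected because the grammar has no rule deriving a Formula from a bare Term.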
  
  

BLD Rule Language

The original EBNF is accessible in the specification of the BLD Rule Language. It is reproduced here for convenience:

  Document  ::= IRIMETA? 'Document' '(' Base? Prefix* Import* Group? ')'
  Base      ::= 'Base' '(' IRI ')'
  Prefix    ::= 'Prefix' '(' Name IRI ')'
  Import    ::= IRIMETA? 'Import' '(' IRICONST PROFILE? ')'
  Group     ::= IRIMETA? 'Group' '(' (RULE | Group)* ')'
  RULE      ::= (IRIMETA? 'Forall' Var+ '(' CLAUSE ')') | CLAUSE
  CLAUSE    ::= Implies | ATOMIC
  Implies   ::= IRIMETA? (ATOMIC | 'And' '(' ATOMIC* ')') ':-' FORMULA
  PROFILE   ::= TERM
  
 
 
  Group
    : GROUP Meta_opt OPENPAR RuleSet_opt CLOSEPAR
    ;
  
  Meta
    : Frame
    ;
  
  Rule
    : Clause
    | FORALL Vars_opt OPENPAR Clause CLOSEPAR
    ;
  
  Clause
    : Atomic
    | Implies
    ;
  
  Implies
    : Atomic IF Formula
    ;
  
  RuleSet_opt
    : // empty
    | RuleSet
    ;
  
  RuleSet
    : RuleOrGroup
    | RuleSet RuleOrGroup
    ;
 
  RuleOrGroup
    : Rule
    | Group
    ;
 
  Meta_opt
    : // empty
    | Meta
    ;
  
  Vars_opt
    : // empty
    | Vars
    ;
  

XML serialization annotations

This version of the BLD grammar is annotated for simple XML serialization as per the scheme specified in the current BLD document. Each XML serialization annotation generates an HTML documentation file accessible by navigating through the grammar (e.g., that of the rule for Group). The effects of such annotations are summarized in the table of XML serialization mappings.

Essentially, the format of a Jacc grammar is that of a Yacc grammar. As in Yacc, Jacc rules may be annotated with semantic actions in the form of Java code involving the rule's RHS constituents (denoted by $1, $2, ..., $n, the so-called pseudo-variables, where the index n in $n refers to the position of the constituent in the RHS). Such actions appear between curly braces ('{' and '}') wherever a symbol may appear in a rule's RHS. Jacc also allows an additional form of annotation in the RHS of a rule to indicate the XML serialization pattern of the abstract syntax tree (AST) node corresponding to a derivation with this rule. This XML serialization meta-annotation comes between square brackets ('[' and ']') and is of the form described in a simple XML serialization annotation language.

For example, the annotated rule:

  QUANTIF
     : 'Exists' Var_plus '(' CONDIT ')'
     [
         nsprefix   : hrl
         localname  : quantifier
         attributes : {kind="existential"}
         children   : (2,4)
     ]
     ;
  
means that an AST node for this rule will be serialized thus:
   <hrl:quantifier kind="existential">
     (XML serialization of Var_plus)
     (XML serialization of CONDIT)
   </hrl:quantifier>
  
Rules without an XML serialization annotation follow a default behavior: the serialization of a node is the concatenation of the serializations of its RHS constituents, omitting punctuation, that is, empty nodes and literal tokens (tokens that do not carry a value). (See the Jacc XML annotation manual for more details.)
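
The annotated and default behaviors described above can be mimicked with a short Python sketch. This is not Jacc's serializer. The node representation (tokens as (kind, value) pairs, rule nodes as dicts with an 'rhs' list and an optional 'xml' annotation), the rendering of valued tokens as escaped text, and the element names hrl:var and hrl:cond in the usage example are assumptions for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def serialize(node):
    """Serialize a toy AST node.

    A token is a (kind, value) tuple; value is None for literal tokens.
    A rule node is a dict with 'rhs' (its RHS constituents, in order) and
    an optional 'xml' annotation dict with keys 'nsprefix', 'localname',
    'attributes' and 'children' (1-based RHS positions).
    """
    if isinstance(node, tuple):
        _kind, value = node
        # Literal tokens carry no value and vanish from the output.
        return "" if value is None else escape(value)

    rhs = node["rhs"]
    annotation = node.get("xml")
    if annotation is None:
        # Default: concatenate the serializations of the RHS constituents.
        return "".join(serialize(child) for child in rhs)

    # Assumption: with no 'children' entry, all RHS constituents are kept.
    positions = annotation.get("children", range(1, len(rhs) + 1))
    tag = f'{annotation["nsprefix"]}:{annotation["localname"]}'
    attrs = "".join(f" {name}={quoteattr(val)}"
                    for name, val in annotation.get("attributes", {}).items())
    body = "".join(serialize(rhs[p - 1]) for p in positions)
    return f"<{tag}{attrs}>{body}</{tag}>"


if __name__ == "__main__":
    var_plus = {"xml": {"nsprefix": "hrl", "localname": "var"},
                "rhs": [("VARIABLE", "x")]}
    condit = {"xml": {"nsprefix": "hrl", "localname": "cond"},
              "rhs": [("IDENTIFIER", "p")]}
    quantif = {
        "xml": {"nsprefix": "hrl", "localname": "quantifier",
                "attributes": {"kind": "existential"}, "children": (2, 4)},
        "rhs": [("Exists", None), var_plus, ("(", None), condit, (")", None)],
    }
    print(serialize(quantif))
```

In this sketch, children (2,4) selects the second and fourth RHS constituents, so the 'Exists' keyword and the parentheses never reach the output.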

For example, see the two test files examples/Test1.bld and examples/Test2.bld. Running the command examples/bld on them produces the XML trees shown in examples/Test1.xml and examples/Test2.xml.


Copyright © 2008 ILOG, Inc.; All Rights Reserved.