The LRSX Unification Tool


Synopsis
The LRSX Unification Tool implements a new algorithm for unification of meta-expression in lambda calculi with recursive and shared bindings (i.e. so-called letrec-expressions).

The underlying meta-language and the unification algorithm are designed to capture usual call-by-need lambda-calculi included their reduction strategies. Besides the core utility to unify expressions of the meta-language, the tool can overlap reduction rules with program transformations on a given calculus description as input, and compute the corresponding critical pairs,

The LRSX Unification Tool is implemented in Haskell and packaged via Cabal and comes with a 3-clause BSD-like LICENSE.

Download and Building
  • The source-distribution is available here: lrsx-0.3.0.4.tar.gz
  • A recent installation of the Glasgow Haskell Compiler, the base libraries, and Cabal-Install is required (see https://www.haskell.org/downloads)
  • To build lrsx unzip the archive, then run cabal, i.e.:

    tar -xzf lrsx-0.3.0.4.tar.gz
    cd lrsx-0.3.0.4/
    cabal configure
    cabal build

    This will create the executable lrsx in dist/build/lrsx/, e.g. type

    dist/build/lrsx/lrsx help

    to see the help-text of the LRSX Unification tool.
  • The executable lrsx can be installed via typing cabal install If this completes successfully you will have the lrsx-binary in ~/.cabal/bin.
Documentation

Using LSRX

The LRSX Unification Tool prints several help-texts by calling lrsx help or more command specific help by typing lrsx command help.

LRSX mainly supports two kinds of input files:

  • Unification equations with constraints:
    For those input files call lrsx unify file, where file contains the input. LRSX will unify the input and print the output to stdout.
  • Program calculus descriptions and overlap commands:
    • lrsx overlap file computes the critical pairs, requested by the commands in file.
Both formats share a common syntax for expressions and constraints as described below.

Unification equations with constraints

The allowed syntax can be represented by the following grammar in ENBF, where nonterminals are in bold font and the side condition that in derivations from EXPRESSION the symbol HOLE does not occur, except for deriviation starting with CONTEXT where the symbol HOLE occurs exactly once.

CONSTRAINED_EQUATIONS ::=EQUATIONS
| EQUATIONS where CONSTRAINTS
EQUATIONS ::=EQUATION
| EQUATION ,EQUATIONS
EQUATION ::=EXPRESSION =?= EXPRESSION
EXPRESSION ::=letrec ENVIRONMENT in EXPRESSION
| letrec { ENVIRONMENT } in EXPRESSION
| SEXPRESSION
SEXPRESSION ::=FUNSYMWITHINFO BINDEXPR1...BINDEXPRN
| ATOM
FUNSYMWITHINFO ::=FUNSYM | FUNSYM[STRICTPOSITIONS]
STRICTPOSITIONS ::=INTEGER,...,INTEGER
BINDEXPR::=ATOM | VARIABLE.BINDEXPR
| *VARIABLE | (*VARIABLE)
ATOM::=var VARIABLE
| ( EXPRESSION )
| METAVARIABLE
| CONTEXTVAR [ EXPRESSION ]
| HOLE
ENVIRONMENT ::=ENVITEM
| ENVITEM ; ENVIRONMENT
ENVITEM ::=METAVARIABLE
| METAVARIABLE [ VARIABLE , EXPRESSION ]
| METAVARIABLE | VARIABLE , VARIABLE |
| VARIABLE = EXPRESSION
VARIABLE ::=CONCRETEVAR | METAVARIABLE
CONCRETEVAR ::=(a|...|z)(a|...|z|A|...|Z|0|...|9)*
FUNSYM ::=CONCRETEVAR
METAVARIABLE ::=(B|D|...|S|U|...|Z)(a|...|z|A|...|Z|0|...|9)*
CONTEXTVAR ::=ACONTEXT | TCONTEXT | CCONTEXT
ACONTEXT ::=A(a|...|z|A|...|Z|0|...|9)*
TCONTEXT ::=T(a|...|z|A|...|Z|0|...|9)*
CCONTEXT ::=C(a|...|z|A|...|Z|0|...|9)*
CONSTRAINTS ::=CONSTRAINT
| CONSTRAINT , CONSTRAINTS
CONSTRAINT ::=METAVAR /= {}
| CONTEXTVAR /= []
| ( EXPRESSION , CONTEXT )
| ( ENVIRONMENT , CONTEXT )
CONTEXT ::=EXPRESSION

An informal explanation of the syntax is

  • A valid input is a sequence of equations where equations are separated by , and after the equations a sequence of constraints may occur which is introduced by the keyword where
  • Equations are written with =?= as equation symbols, and both sides of one equation are LRSX-expressions.
  • LRSX-expressions are meta-expressions which comprise concrete syntax and meta-variables, symbols beginning with upper-case letters are unification variables: unification variables beginning with
    • A are interpreted as application context variables
    • T are interpreted as top context variables
    • C are interpreted as arbitrary context variables
    Other unification variables are
    • expression variables (usually denoted by S)
    • lambda- or let-variable variables (usually denoted by X or Y)
    • letrec-environment variables (usually denoted by E)
    • environment chain variables with special syntax
      • Varname[y,s] represents a (non-empty) application context chain of the form y=A[y1];y2=A[y2];...;yn=A[s] (usually denoted by EE[y,s])
      • Varname|y1,yn| represents a (non-empty) variable-to-variable chain of the form y1 = (var y2);y2=(var y3);...;yn-1=(var yn) (usually denoted by VV|y1,yn|)
    The syntax of expressions includes:
    • the special function symbol var to lift let- and lambda-variables to expressions,
    • other function symbols are recognized by the input, where function symbols must begin with a lower case letter. The algorithm assumes the first argument of a function symbol to be strict, while all other positions are nonstrict. An example is (app S1 S2) which introduces the function symbol app with arity 2 However, strictness can be added by providing a list of strict positions, i.e. app[1,2] defined app to be strict in both arguments.
    • Higher-order expressions are written as x1.x2....xn.s, i.e. lambda abstraction is a function symbol applied to a higher-order expression with on binder, which can be written as \(var.body) but also without parentheses as \var.body.
    • Variable-positions are denoted by the special symbol *, e.g. in app S *X the function symbol app only allows variables as second argument.
    • letrec Environment in body where body is the in-expression and Environment is a meta-letrec-environment: environments may be enclosed by { and }. Parts of the environment are separated by ; and allowed parts are
      • a meta-variable for an environment (usually denoted by E)
      • an environment chain (see above, usually denoted by EE[y,s] or VV|y1,yn|)
      • single bindings, written as var = expr

    Expressions must fulfill the weak distinct variable convention which requires that names of bound variables are different from names of free variables, However, it is allowed that bound (let- or lambda-) variables occur several times, where it is forbidden, that the same let-variable is bound twice in the same letrec-environment.

    The following constraints can be added to the input equations (separated by ,):
    • non-emptiness constraints for contexts written as CtxtName /= [] with the semantics that the context represented by CtxtName must not be the empty context.
    • non-emptiness constraints for environment variables written as EnvName /= {} with the the semantics that the part of the environment represented by EnvName is not an empty environment
    • non-capture constraints written as (expr,ctxt) or [env,ctxt] with the meaning that the free variables of expression expr (environment env, resp.) is not captured by the hole of the context ctxt. The syntax for contexts is the same as for expressions, where additionally the terminal HOLE can (and must be) used to denote the context hole. An example is (var Y,\X.HOLE) which ensures that the variable Y may not be instantiated with the same variable as X is instantiated, since in this case Y is captured by the hole of \X.HOLE.

Program calculus descriptions and overlap commands

The grammar for the syntax of program calculus descriptions and commands extends the above grammar as follows, where the start symbol is RC

RC::=PRULES COMMANDS
PRULES::=RULES | RULES TRANSFORMATIONS RULES
RULES::=RULE | RULES
RULE::={ RULENAME } EXPRESSION ==> EXPRESSION
| { RULENAME } EXPRESSION ==> EXPRESSION where CONSTRAINTS
RULENAME::=SR,RNAME,Number | RNAME,Number
RNAME ::=(a|...|z)(a|...|z|A|...|Z|0...9|_)*
COMMANDS::=COMMAND | COMMAND COMMANDS
COMMAND::=overlap (RULENAME).DIR and (RULENAME).DIR
|overlap (RULENAME).DIR all
DIR::=l | r

The calculus description begins with a sequence of rules followed by a sequence of commands.

The sequence of rules may be partitioned into two sets of rules where the first set are standard reductions and the second set are program transformations. The sets are separated by the keyword TRANSFORMATIONS.

Rules are written as {rulename} left hand side ==> right hand side followed by an optional where-clause to add constraints to the rule. Both sides of the rule are expressions. The rulename should be unique for every rule and it must start with a lower-case letter. If it starts with SR, then it denotes a standard reduction, otherwise it denotes a transformation rule. The rulename also have a number. If no number is given, then the number is always 1.
Commands are of the form overlap (rulename1,Number).dir1 and (rulename2,Number).dir2 where dir is either l or r, to express which side of the rule should be used for overlapping. For computing forking diagrams the left hand side of a transformation has to be overlapped with the left hand sides of standard reductions, while for computing commuting diagrams, the right hand side of a transformation hat to be overlapped with the left hand sides of standard reductions.
As a short form also the command overlap (rulename1,Number).dir1 all is available which computes all overlaps of rulename1.dir1 with all left hand sides of all standard reductions (i.e. the rules which are named with SR,... ).

Examples

The source distribution comes with several examples which can be found in examples.

Examples for Unification

The following table lists the example-input files and the corresponding output generated by the LRSX Unification Tool:

Example Output
example1.inp Output
example2.inp Output
example3.inp Output
example4.inp Output
example5.inp Output
example6.inp Output
example7.inp Output
example8.inp Output
example9.inp Output
example10.inp Output
example11.inp Output
example12.inp Output
example13.inp Output
example14.inp Output

Examples for Computing Overlaps

We provide two examples of program calculi with recursive let:
  • The calculus LNEED is a minimal lambda-calculus with letrec. Its syntax comprises abstractions, applications, variables and letrec.

    The description of the calculus and several program transformations can be found in the file LNEED.inp.

    Computing forking overlaps can be done with the input file lneed_fork.inp which requires LNEED.inp. in the same directory. Running
    lrsx overlap lneed_fork.inp
    produces the critical pairs which can be found in the following text-file .

    Computing commuting overlaps can be done with the input file lneed_commute.inp which requires LNEED.inp. in the same directory. Running
    lrsx overlap lneed_commute.inp
    produces the critical pairs which can be found in the following text-file .
  • The calculus LR extends the calculus LNEED by data constructors, case-expressions and Haskell's seq-operator (LR can be seen as an untyped core language of Haskell) The description of the calculus and several program transformations can be found in the file LR.inp.

    Computing forking overlaps can be done with the input file lr_fork.inp which requires LR.inp. in the same directory. Running
    lrsx overlap LR_fork.inp
    produces the critical pairs which can be found in the following text-file (~389 MB!).

    Computing commuting overlaps can be done with the input file lr_commute.inp which requires lr.inp. in the same directory. Running
    lrsx overlap lr_commute.inp
    produces the critical pairs which can be found in the following text-file (~455 MB!).
Contact
Dr. David Sabel
Computer Science Institute
Computer Science and Mathematics Department
Johann Wolfgang Goethe-University Frankfurt am Main
(c) 2016 PD Dr. David Sabel, Imprint, Last modified: $Author: sabel $, $Date: 2016/07/17 11:03:32 $