Synthesizing an automatic table layout solver with FTL

At the Mozilla/Berkeley meeting on 2/10, an important concern was how to use our FTL synthesizer for difficult features of CSS such as tables, floats, and text. As promised, this document shows how we generated code for the core of automatic table layoutHTML4 Tables. We were able to automatically generate parallel code and, by the same reasoning, see no obstacle for generating incremental code. We did have to slightly modify our runtime library and have been planning a language extension to automate this modification.

I picked two features from the CSS standard to implement. On the left, you can see them computed live by a JavaScript engine we generated, and on the right, by the browser's native layout engine:

Columns as wide as their widest cell and rows as tall as their tallest cell
FTL rendering HTML rendering
a tall cell
that increases
the height
of the row
a really wide cell that increases the width of the column  

Cells that span multiple rows or columns
FTL rendering HTML rendering
span inside a cell, such that the cell spans two columns of the table inline-block inside a cell, such that the cell spans two rows of the table
span inside a singleton cell span inside singleton cell
As expected, we encountered three essential challenges:
  1. Constructing the table's grid data structure: Over a sequence of tree traversals, compute the grid that maps cells to rows and columns
  2. Traversing a dynamic graph: A column's cells are defined by the above grid data structure; computations over them must be scheduled after the grid entries are computed.
  3. Traversing a DAG: For tables, we compute over a DAG rather than a tree because a cell's parents are both its row and column.
We expressed most of the table logic in the specification. For example, to construct the grid data structure, the specification manipulates functional lists rather than just numbers. Likewise, to ensure a column's computations over its cells are scheduled after the grid is constructed, we included this dependency in the specification.

In total, we only edited the specification (see above) and the runtime. Our runtime edits were to use a breadth-first traversal for traversing a table and, to lookup the children of a column, search table rows for cells with the corresponding column number attribute. We did not have to add table-specific code into the synthesizer (the offline scheduling analysis) nor the code generator. Furthermore, we are extending the specification language to better handle the patterns we encountered, which would simplify the specification and eliminate some if not all of the runtime modifications.

The rest of this document overviews our HTML5 implementation, one of our ideas for an FTL extension for an even cleaner specification, and shows our code.

Contents

    Solution: computing the table grid with list function calls

    The placement of a cell into a column is complicated by preceding cells that span multiple rows ("rowspan=n") and columns ("colspan=n"): the cell must be placed in the first column such that an earlier cell in a top-down, left-to-right ordering does not overlap it. For example, consider the table on the left. Even though the blue cell on the bottom right is the third cell of its row in the parse tree, it is not placed in the third column. The reason is that the magenta cell in the first row transitively impacts the placement of the cells after it.

    Unsurprisingly, our specification traverses the cells in top-down, left-to-right order. For each row, it computes what columns its cells are placed in as a function of the list of columns that are still occuppied by preceding cells. The next row is given the columns that are occupied after adding cells on the current row, etc. Our specification of this behavior is interesting in that it is just calls to functional list manipulation methods written in our host language (C++/JavaScript):

    //Specification in an idealized syntax
    class TableBox {
      rows[-1].colAssignment := emptyColumnList(colCount);
      rows[i >= 0].colAssignment := 
        columnsAppendRow(
          rows[i - 1].colAssignment, 
          rows[i].cells, 
          rows[i].rowNum);
    
    //Equivalent specification in FTL's current surface syntax
    class TableBox (shrinkToFitHeightWidth, strokeBox) : Node {
      loop rows {
        rows.colAssignment := 
          fold 
            emptyColumnList(colCount) 
            .. 
            columnsAppendRow(
              rows$-.colAssignment, 
              rows$i.cells, 
              rows$i.rowNum);
    

    The specification on the left uses a slightly cleaner syntax than our FTL prototype (shown on the right), but the important part is the calls to functions emptyColumnList() and columnsAppendRow(). As long as emptyColumnList() and columnsAppendRow() implement a functionally pure interface, attribute grammar techniques for automatic parallelization and incrementalization still apply. For example, instead of a destructive append that mutates a list, we use a functional append. If a script adds a cell to row 3, incremental evaluation could therefore safely reuse rows[2].colAssignment to recompute rows[3].colAssignment without first recomputing rows[-1,0,1].colAssignment.

    Solution: computing over a dynamically generated graph

    A column computes the x coordinates of its cells, but the column's cells are only known after the last columnsAppendRow() call. To ensure the x coordinate computations are scheduled to occur after the grid is computed, we declared this dependency as part of the specification.

    The grid is stored in an attribute, so we simply propagated the grid to all the table nodes as an attribute (cellsready). We were then able to declare the dependency on cellsready to the column computations:

    //Specification in an idealized syntax
    Schedule {
      Col.childs[i].relX <- Col.cellsready
      Col.childs[i].absX <- Col.cellsready
    
    //Equivalent specification in FTL's current surface syntax
    schedule { 
      asserta(assignment(col, self, childs_relx_step, self, cellsready)),
      asserta(assignment(col, self, childs_absx_step, self, cellsready)),
    

    The synthesizer now knows to schedule relX and absX computations only after the grid cellsready is computed.

    Solution: computing over a DAG

    Computing over a table means computing over a DAG, not a tree: a cell has both a row and a column as its parents. This impacted both our runtime and our specification strategy. A noteworthy event is that we did not have to modify the synthesizer nor the code generator.
    1. Runtime strategy: breadth-first traversal of DAG subgraphs.
      <table>
        <tr>      //row
          <td/>   //cell
        </tr>
        <col/>
      </table>
      
      An important invariant of a top down traversal is that a node's parent is visited before the node itself. A valid implementation for trees is depth first. However, consider a depth first traversal of a table's parse tree (see box on the right). It would visit the table, the row, the cell, and then the column. The cell is visited before its column!

      Our modification was simple: we edited the runtime to visit the nodes of a table with a breadth first traversal. We kept the overall document traversal as depth first for performance reasons.

    2. Specification strategy: synthesizing code for nodes with two parents. We had to convince the synthesizer that visiting a cell's parent row and column would set all the attributes needed by the cell (unambiguous) and without conflicting with each other. For example, a column defines the relX/absX attributes of its child cell, and a row, its relY/absY. FTL rightfully rejects such a specification because, if the document is a tree, a cell has only one parent. If that parent is a column, the relY/absY attributes are undefined, and if it is a row, the relX/absX are undefined.

      We chose not to modify the synthesizer, so it still assumes a table is a tree. To pass unambiguity and conflict checks, we extended the specification to tell the synthesizer that external code defines certain attributes:

      class Col {
        phantom {
          childs.relY;
          childs.absY;
      
      class Row {
        phantom {
          childs.relX;
          childs.absX;
      

      The synthesizer can now assume that the external code provides definitions for a column's childs.relY and childs.absY and a row's childs.relX and childs.absX. Unimportant to the synthesizer, the definitions just happen to come from elsewhere in the same specification, such as class Row defining the phantom attributes not set by Column. If we wanted to further verify our specification, we could further specify that the assignments of a row and a column to a cell are disjoint sets that, together, hold the assignments needed for a cell, but this is unnecessary for code generation.

    A modest proposal

    Driven by various experiments in specifying tricky widgets, we have been planning several extensions in order to support a wider class of specifications and optimizations. Relevant to the issues presented in this report, we are examining a mechanism similar to CSS selectors or XPATH expressions in order for a node to more cleanly compute over non-local nodes.

    For example, a column should be able to alias its cells:

    class Col {
      children {
        cells : [ CellI ] = ../rows/childs[.column == self.colNum];
      ...
        cells[i].relX := ...
    

    This specification provides two important pieces of information. First, the synthesizer now knows that any computation over a column's cells depends on first being able to compute the result of the selector, which in turn requires having already computed cell attribute column and column attribute colNum. Second, the localized DAG structure of tables is exposed, so the code generator now knows to use a breadth-first traversal to topologically evaluate any table/rows/columns/cells region of a document.

    Demo: mixing tables with other elements

    The following (live) rendering uses several types of nodes, such as line wrapping boxes. Border color denotes node type:
    • Blue: <WrapBox>
    • Black: <Leaf>
    • Gray: <Cell>
    • Orange: <Cell rowspan="2">
    • Pink: <Cell colspan="2">
    • Green: <Row>
    • Red: <Column>
    • Light gray: <TableBox>
    • Light green: <HBox>
    • Purple: <VBox>

    Most nodes are by default shrink-to-fit and all can have their size overriden by an XML attribute:

    Document source

    The document is similar to standard HTML. Most attributes, such as the x and y position, are solved by the layout engine. We use CSS selectors to set some basic attributes:
    
    

    Other attributes are set as attributes directly in the XML:

    asdf
    

    Widget sources

    The widget is mostly specified in FTL. Note the (optional) use of scheduling constraints at the top and, to allow foreign functions to compute some values, phantom attribute sections for rows and columns.

    Monkey patch

    After the layout engine is loaded at runtime, we run the following monkey patches:
    1. The second time a table is visited, a monkey patched call to dynamically create implicit column cells.
    2. For when a column is visited and its cells are examined, swap the local getChildren function to instead find cells that are children of rows and have the right column number.
    3. Modify the global depth first visit order within tables to use breadth-first for top down visits in order to guarantee all rows and columns are visited before the cells. The reverse is used for bottom up traversals.

    Table ADT

    Basic list/array manipulation functions for the grid. They use mutation etc. internally, but implement a functional (non-destructive) interface.

    Generated layout engine

    The code here is automatically generated from the specification above; we did not modify it at all.

    It is fairly naive for the HTML5 backend. Of note: