Pascal Hof edited this page Dec 8, 2017 · 1 revision

Adding custom parsers

The module Language.Astview.Languages contains a list of all languages (and thus parsers) known to astview. New languages are appended to this list. This page explains how to define one.

First, we introduce the data type for parse errors. Since parsers report different amounts of error information, we distinguish three kinds of errors:

data Error
  = Err -- ^ no specific error information
  | ErrMessage String -- ^ plain error message
  | ErrLocation SrcSpan String -- ^ error message with position information
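To illustrate how the three constructors carry increasing amounts of information, here is a minimal sketch that renders an Error as a one-line message. The helper describeError is hypothetical and not part of astview, and the types are re-declared locally (mirroring the definitions quoted on this page) only so that the snippet stands on its own.

```haskell
-- Local stand-ins for the types from Language.Astview.Language.
data SrcPos = SrcPos { line :: Int, column :: Int } deriving (Eq, Show)
data SrcSpan = SrcSpan { begin :: SrcPos, end :: SrcPos } deriving (Eq, Show)

data Error
  = Err                        -- no specific error information
  | ErrMessage String          -- plain error message
  | ErrLocation SrcSpan String -- error message with position information

-- Hypothetical helper (an assumption, not astview's API):
-- render an Error as a single line, using whatever information is present.
describeError :: Error -> String
describeError Err = "parse error"
describeError (ErrMessage msg) = "parse error: " ++ msg
describeError (ErrLocation (SrcSpan (SrcPos l c) _) msg) =
  "parse error at line " ++ show l ++ ", column " ++ show c ++ ": " ++ msg
```

A parser that knows the faulty position should prefer ErrLocation, since only that constructor gives a consumer like this enough data to point at the source.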

In order to extend astview with your own language you need to know the structure of the data type Language, which we use to represent languages and their parsers.

data Language = Language
  { name :: String
  , syntax :: String
  , exts :: [String]
  , parse :: String -> Either Error Ast
  }

The name is just a string displayed in the GUI, whereas the second attribute syntax is the name of the syntax highlighter that should be associated with the language. If no syntax highlighting is desired, use the empty string [] here. Astview uses the same syntax highlighting as gedit, so you might find the name of your language there.

The attribute exts defines a list of file extensions which should be associated with that language. When opening a file, astview can automatically select a language based on the file extension. For automatic parser selection to be unambiguous, the file extensions of the languages known to astview should not overlap.
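The following sketch shows why overlapping extensions are a problem. It is not astview's actual selection code: the record Lang is a simplified stand-in for Language (without the parse field), and selectLanguage is a hypothetical helper that returns the first language with a matching extension.

```haskell
import Data.List (find, isSuffixOf)

-- Simplified stand-in for astview's Language record (assumption:
-- only the fields needed for selection are kept).
data Lang = Lang { langName :: String, langExts :: [String] }

-- Hypothetical helper: pick the first language that lists an extension
-- which is a suffix of the file name.
selectLanguage :: [Lang] -> FilePath -> Maybe Lang
selectLanguage langs file = find matches langs
  where
    matches lang = any (`isSuffixOf` file) (langExts lang)
```

With this strategy, two languages claiming the same extension would always resolve to whichever comes first in the list, so the second one could never be selected automatically.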

The parse function maps the input string either to an error value or to an abstract syntax tree. The value returned by the underlying parser has to be transformed into our internal representation type Ast (see the documentation of Language.Astview.Language for details on Ast). The module Language.Astview.DataTree offers several type-generic functions for that purpose. The most basic one is the function dataToAstSimpl :: Data t => t -> Ast, which transforms an arbitrary value whose type is an instance of class Data into our internal type Ast by printing the constructors and storing them in a tree. In order to simplify the tree, dataToAstSimpl represents Strings not as a list of Char, but as a single node in the tree.
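The idea behind such a type-generic conversion can be sketched with Data.Data from base. This is an illustration of the technique, not the implementation of dataToAstSimpl: the types Tree and Expr and the function toTree are made up for this example, with Tree standing in for Ast.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ, showConstr, toConstr)
import Data.Typeable (cast)

-- Stand-in for astview's Ast: a plain rose tree of labels.
data Tree = Node String [Tree] deriving (Eq, Show)

-- Toy abstract syntax used only for this illustration.
data Expr = Var String | Lit Int | Add Expr Expr deriving (Data)

-- Label each node with its constructor name and recurse into the children.
-- A String becomes a single leaf instead of a list of Char nodes.
toTree :: Data t => t -> Tree
toTree x = case cast x :: Maybe String of
  Just s  -> Node (show s) []
  Nothing -> Node (showConstr (toConstr x)) (gmapQ toTree x)
```

For example, toTree (Add (Var "x") (Lit 1)) yields a node labelled "Add" with two children labelled "Var" and "Lit"; the string "x" appears as one leaf below "Var".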

Example: Adding Haskell support to astview

In this section we show how to add Haskell support to astview. We use the abstract syntax and parser from the package haskell-src-exts. The name and the syntax highlighter are both the string "Haskell". We associate both classical Haskell files ".hs" and literate Haskell files ".lhs" with this language. The following code applies the parser to the file content and, on success, converts the parsed value into our data type Ast using dataToAstSimpl:

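parsehs :: String -> Either Error Ast
parsehs s =
  -- parse, ParseResult, Module, SrcSpan and SrcLoc come from haskell-src-exts
  case parse s :: ParseResult (Module SrcSpan) of
    -- success: convert the parsed module generically into an Ast
    ParseOk t                    -> Right $ dataToAstSimpl t
    -- failure: report the error message together with its position
    ParseFailed (SrcLoc _ l c) m ->
      Left $ ErrLocation (position l c) m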

If the parse fails, the parser returns information about the incorrect source. We reuse this data to help the user of astview find the faulty source positions.

Putting it all together we can now define a value of type Language in order to support Haskell sources in astview:

haskellexts :: Language
haskellexts = Language "Haskell" "Haskell" [".hs",".lhs"] parsehs

After appending haskellexts to the list languages of known languages in module Language.Astview.Languages and reinstalling, astview can display the abstract syntax tree of Haskell files.

Adding custom parsers with source location support

In order to get astview to work with source locations, a bit more work has to be done. We now assume that the parser builds an abstract syntax tree annotated with source locations. The function dataToAstSimpl does not know which values in the tree are source locations.

Our type for source locations is defined in module Language.Astview.Language:

data SrcPos = SrcPos { line :: Int , column :: Int }
data SrcSpan =  SrcSpan { begin :: SrcPos , end :: SrcPos }

One should use the smart constructor functions span, position and linear to create source locations, since they apply validity checks.

Instead of the function dataToAstSimpl which does not support creation of source locations, we use

dataToAst :: (Data t) => (forall span.Data span => span -> Maybe SrcSpan)
                      -> (forall st . Typeable st => st -> Bool)
                      -> t -> Ast

which gets a source location selector as its first argument. The given function is automatically applied to all nodes of the tree to extract their source locations. The result type is wrapped in Maybe since not every node of a tree has an associated source location. The second argument is a predicate selecting subtrees which should not be displayed. After the subtrees have been annotated with their respective source locations, one often does not want the subtrees representing the source locations themselves to occur in the displayed tree. All subtrees satisfying the predicate handed over to dataToAst are hidden by astview.
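How a selector and a predicate interact during such a traversal can be sketched with Data.Data alone. The code below is an illustration under assumptions, not the implementation of dataToAst: Loc stands in for SrcSpan, Tree for Ast, and toTreeWith, firstLoc and isLoc are hypothetical names.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapQ, showConstr, toConstr)
import Data.Maybe (catMaybes)
import Data.Typeable (Typeable, cast, typeOf)

-- Stand-in location type; astview uses SrcSpan instead.
newtype Loc = Loc Int deriving (Data, Eq, Show)

-- Stand-in for Ast: label, optional location, children.
data Tree = Node String (Maybe Loc) [Tree] deriving (Eq, Show)

-- Toy annotated syntax: the location is always the first field.
data Expr = Lit Loc Int | Add Loc Expr Expr deriving (Data)

-- Annotate every node via the selector and drop children matching the predicate.
toTreeWith :: Data t
           => (forall a. Data a => a -> Maybe Loc)
           -> (forall a. Typeable a => a -> Bool)
           -> t -> Tree
toTreeWith select hide = go
  where
    go :: forall d. Data d => d -> Tree
    go x = Node (showConstr (toConstr x)) (select x) (catMaybes (gmapQ child x))

    child :: forall d. Data d => d -> Maybe Tree
    child y
      | hide y    = Nothing
      | otherwise = Just (go y)

-- Selector: the left-most direct child, if it is a Loc.
firstLoc :: Data a => a -> Maybe Loc
firstLoc x = case (gmapQ (\c -> cast c) x :: [Maybe Loc]) of
  (m : _) -> m
  []      -> Nothing

-- Predicate: hide every value of type Loc.
isLoc :: Typeable a => a -> Bool
isLoc x = typeOf x == typeOf (Loc 0)
```

Calling toTreeWith firstLoc isLoc keeps the location as an annotation on each node while removing the Loc subtrees from the children; passing const False as the predicate keeps them visible as well.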

In most of the cases one wants values of exactly one type to be removed from the tree. The function

dataToAstIgnoreByExample :: (Data t,Typeable t,Typeable b,Data b)
        => (forall a . (Data a,Typeable a)  => a -> Maybe SrcSpan)
        -> b -> t -> Ast

works like dataToAst, but instead of a predicate one passes a value of an arbitrary type b, and all values of type b are removed from the displayed tree.
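One way to turn an example value into a predicate is to compare runtime type representations. The helper hasTypeOf below is a hypothetical sketch of that idea, not astview's code. Since typeOf never evaluates its argument, the example value only needs to have the right type, so even undefined works as an example.

```haskell
import Data.Typeable (Typeable, typeOf)

-- Hypothetical helper: does x have the same type as the example value?
-- Only the type of the example is inspected, never its value.
hasTypeOf :: (Typeable b, Typeable a) => b -> a -> Bool
hasTypeOf example x = typeOf x == typeOf example
```

For instance, hasTypeOf (undefined :: Int) (3 :: Int) is True, while hasTypeOf (undefined :: Int) "three" is False.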

Example: Adding source location support for Haskell

We only need to change the function parsehs from our example above in order to add source location support to Haskell. Since we have to deal with both the source location type from haskell-src-exts and our internal type, we import the Haskell source locations in a qualified manner:

import qualified Language.Haskell.Exts.SrcLoc as HsSrcLoc

First of all we need to define a function which returns the associated source location for an arbitrary node in the abstract syntax. Thanks to the structure of the abstract syntax in haskell-src-exts, this can be done completely type-generically. The source location is always of type SrcSpan and, if it exists, can be found as the left-most subtree of a tree. We use a zipper from the package syz to go to the left-most subtree and extract the source location information:

getSrcLoc :: Data t => t -> Maybe SrcSpan
getSrcLoc t = down' (toZipper t) >>= query (def `extQ` atSpan) where

  def :: a -> Maybe SrcSpan
  def _ = Nothing

  atSpan :: HsSrcLoc.SrcSpan -> Maybe SrcSpan
  atSpan (HsSrcLoc.SrcSpan _ c1 c2 c3 c4) = Just $ span c1 c2 c3 c4

To add the source location support, we now need to give getSrcLoc as an argument to the function dataToAst as a selector for source locations:

parsehs :: String -> Either Error Ast
parsehs s = case parse s :: ParseResult (Module HsSrcLoc.SrcSpan) of
  ParseOk t                             -> Right $ dataToAst getSrcLoc (const False) t
  ParseFailed (HsSrcLoc.SrcLoc _ l c) m -> Left $ ErrLocation (position l c) m

Using this version of parsehs as a parse function causes astview to support jumping between associated positions in source text and abstract syntax tree.

The resulting trees still contain all the source location values as subtrees, although the locations are already stored internally in the Ast. Since source locations are only meta-information about the subtrees and one can jump from subtrees to their respective position in the sources, it is not required to display source locations as nodes in the abstract syntax tree. One can simply remove source locations from the tree by using the function dataToAstIgnoreByExample, which causes all values of type SrcSpan to be discarded from the tree:

parsehs :: String -> Either Error Ast
parsehs s = case parse s :: ParseResult (Module HsSrcLoc.SrcSpan) of
  ParseOk t  -> Right $ dataToAstIgnoreByExample getSrcLoc
                                                 (undefined::HsSrcLoc.SrcSpan)
                                                 t
  ParseFailed (HsSrcLoc.SrcLoc _ l c) m -> Left $ ErrLocation (position l c) m