Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deltaparsing #1197

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion pact.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ flag tests-in-lib
manual: True

library

cpp-options: -DLEGACY_PARSER=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple thoughts

You don't need to set =1 for binary flags, you can just define -DLEGACY_PARSER.
Then CPP if-blocks become like

#ifdef LEGACY_PARSER
...
#else
...
#endif

or

#if someCondition
...
#elifdef LEGACY_PARSER
...
#endif

The other thought is that it could be behind a flag:

flag legacy-parser
  description: ...
  default: True
  manual: True
  
library
  if flag(legacy-parser)
    cpp-options: -DLEGACY_PARSER

Which would make toggling it on/off not require editing the cabal file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is all temporary for now. I tried to define the macro in cabal.project.local but cabal doesn't like that, so I had to put it into the cabal file.

Regarding using 1/0, sometimes I like to be more explicit when dealing with macros (I find it easy to mess up with #define and #undef). Yeah, if that macro would still be in the final version of the code, we would use a standard pattern with (automatic) cabal flag. But, hopefully, that won't be necessary.

-- common to all configurations:
hs-source-dirs: src
default-language: Haskell2010
Expand Down Expand Up @@ -382,6 +382,7 @@ test-suite hspec
RoundTripSpec
PrincipalSpec
SizeOfSpec
Test.Pact.Parse

if !impl(ghcjs)
other-modules:
Expand Down Expand Up @@ -447,6 +448,7 @@ test-suite hspec
, sbv
, servant-client
, temporary >=1.3
, trifecta
, yaml
, process
, posix-pty
30 changes: 19 additions & 11 deletions src/Pact/Parse.hs
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TupleSections #-}

-- |
-- Module : Pact.Compile
Expand Down Expand Up @@ -35,7 +36,6 @@ import Control.DeepSeq (NFData)
import Control.Lens (Wrapped(..))
import Control.Monad
import qualified Data.Aeson as A
import qualified Data.Attoparsec.Text as AP
import qualified Data.ByteString as BS
import Data.Char (digitToInt)
import Data.Decimal
Expand All @@ -55,6 +55,8 @@ import Pact.Types.Pretty (Pretty(..),viaShow)
import Pact.Types.Info
import Pact.Types.Term (ToTerm)

-- -------------------------------------------------------------------------- --
-- Expression Parser


-- | Main parser for Pact expressions.
Expand All @@ -63,7 +65,11 @@ expr = do
delt <- position
let inf = do
end <- position
#if LEGACY_PARSER == 1
let len = bytes end - bytes delt
#else
let len = column end - column delt
#endif
return $! Parsed delt (fromIntegral len)
separator t s = symbol t >> (ESeparator . SeparatorExp s <$> inf)
msum
Expand Down Expand Up @@ -141,7 +147,7 @@ newtype ParsedDecimal = ParsedDecimal Decimal

instance A.FromJSON ParsedDecimal where
parseJSON (A.String s) =
ParsedDecimal <$> case AP.parseOnly (unPactParser number) s of
ParsedDecimal <$> case pactAttoParseOnly (unPactParser number) s of
Right (LDecimal d) -> return d
Right (LInteger i) -> return (fromIntegral i)
_ -> fail $ "Failure parsing decimal string: " ++ show s
Expand All @@ -167,7 +173,7 @@ newtype ParsedInteger = ParsedInteger Integer

instance A.FromJSON ParsedInteger where
parseJSON (A.String s) =
ParsedInteger <$> case AP.parseOnly (unPactParser number) s of
ParsedInteger <$> case pactAttoParseOnly (unPactParser number) s of
Right (LInteger i) -> return i
_ -> fail $ "Failure parsing integer string: " ++ show s
parseJSON (A.Number n) = return $ ParsedInteger (round n)
Expand All @@ -181,16 +187,19 @@ instance A.ToJSON ParsedInteger where

instance Wrapped ParsedInteger

-- -------------------------------------------------------------------------- --
-- Top Level Parsers

-- | "Production" parser: atto, parse multiple exprs.
parseExprs :: Text -> Either String [Exp Parsed]
parseExprs = AP.parseOnly (unPactParser (whiteSpace *> exprs <* TF.eof))
parseExprs = pactAttoParseOnly (unPactParser (whiteSpace *> exprs <* TF.eof))
{-# INLINABLE parseExprs #-}

-- | Legacy version of "production" parser: atto, parse multiple exprs. This
-- parser does not force EOF and thus accepts trailing inputs that are not valid
-- pact code.
legacyParseExprs :: Text -> Either String [Exp Parsed]
legacyParseExprs = AP.parseOnly (unPactParser (whiteSpace *> exprs))
legacyParseExprs = pactAttoParseOnly (unPactParser (whiteSpace *> exprs))
{-# INLINABLE legacyParseExprs #-}

-- | ParsedCode version of 'parseExprs'
Expand All @@ -210,7 +219,6 @@ _parseF p fp = do
let s = unpack $ decodeUtf8 bs
fmap (,s) <$> TF.parseFromFileEx p fp


_parseS :: String -> TF.Result [Exp Parsed]
_parseS = TF.parseString exprsOnly mempty

Expand Down
21 changes: 13 additions & 8 deletions src/Pact/Types/ExpParser.hs
Original file line number Diff line number Diff line change
@@ -1,16 +1,17 @@
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ViewPatterns #-}

-- megaparsec <9.3 backard compatiblity
Expand Down Expand Up @@ -126,8 +127,12 @@ mkTextInfo :: T.Text -> MkInfo
mkTextInfo s d = Info $ Just (Code code, d)
where
code = T.take len $ T.drop offset s
offset = fromIntegral $ TF.bytes d
len = _pLength d
#if LEGACY_PARSER == 1
offset = fromIntegral $ TF.bytes (_pDelta d)
#else
offset = fromIntegral $ TF.column (_pDelta d)
#endif

type ExpParse s a = ReaderT ParseEnv (StateT (ParseState s) (Parsec Void Cursor)) a

Expand Down
3 changes: 1 addition & 2 deletions src/Pact/Types/Info.hs
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,6 @@ instance Arbitrary Parsed where
, Directed <$> genFilename <*> genPositiveInt64 <*> genPositiveInt64 <*> genPositiveInt64 <*> genPositiveInt64 ]
instance NFData Parsed
instance Default Parsed where def = Parsed mempty 0
instance HasBytes Parsed where bytes = bytes . _pDelta
instance Pretty Parsed where pretty = pretty . _pDelta


Expand Down Expand Up @@ -103,7 +102,7 @@ instance Default Info where def = Info Nothing
instance SizeOf Info where
sizeOf _ _ = 0

-- make an Info that refers to the indicated text
-- | Make an Info that refers to the indicated text
mkInfo :: Text -> Info
mkInfo t = Info $ Just (Code t,Parsed delt len)
where len = T.length t
Expand Down
19 changes: 0 additions & 19 deletions src/Pact/Types/Orphans.hs
Original file line number Diff line number Diff line change
@@ -1,10 +1,8 @@
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
-- |
-- Module : Pact.Types.Orphans
Expand All @@ -19,10 +17,7 @@ module Pact.Types.Orphans where
import Data.Serialize
import Data.Decimal
import qualified Data.Aeson as A
import Text.Trifecta.Combinators (DeltaParsing(..))
import Text.Trifecta.Delta
import qualified Data.Attoparsec.Text as AP
import qualified Data.Attoparsec.Internal.Types as APT
import Data.Text (Text)
import Data.Text.Encoding
import Pact.Time.Internal (NominalDiffTime(..), UTCTime(..))
Expand Down Expand Up @@ -53,20 +48,6 @@ instance Serialize A.Value where

instance NFData Delta


-- | Atto DeltaParsing instance provides 'position' only (with no support for
-- hidden chars like Trifecta).
instance DeltaParsing AP.Parser where
line = return mempty
position = attoPos >>= \(APT.Pos p) -> let p' = fromIntegral p in return $ Columns p' p' -- p p
slicedWith f a = (`f` mempty) <$> a
rend = return mempty
restOfLine = return mempty

-- | retrieve pos from Attoparsec.
attoPos :: APT.Parser n APT.Pos
attoPos = APT.Parser $ \t pos more _lose win -> win t pos more pos

instance Default Text where def = ""
instance Serialize Text where
put = put . encodeUtf8
Expand Down
109 changes: 106 additions & 3 deletions src/Pact/Types/Parser.hs
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE Rank2Types #-}
-- |
Expand All @@ -10,21 +13,37 @@
--

module Pact.Types.Parser
(
( -- * Abstract Pact Parser
PactParser(..)
, symbols
, style

-- * Pact Attoparsec Parser
, PactAttoparsec(..)
, pactAttoParseOnly
)
where


import Control.Applicative
import Control.Monad
import Text.Trifecta
import Control.Monad.State

import qualified Data.Attoparsec.Internal.Types as APT
import qualified Data.Attoparsec.Text as AP
import qualified Data.HashSet as HS
import qualified Data.Text as T

import Text.Parser.Token.Highlight
import Text.Parser.Token.Style
import Text.Trifecta
import Text.Trifecta.Delta as TF

-- -------------------------------------------------------------------------- --
-- | Abstract Pact Parser
--
-- On-chain this is use for Attoparsec as parser backend. In the repl trifecta
-- is used.
--
newtype PactParser p a = PactParser { unPactParser :: p a }
deriving (Functor, Applicative, Alternative, Monad, MonadPlus, Parsing, CharParsing, DeltaParsing)

Expand All @@ -45,3 +64,87 @@ style = IdentifierStyle "atom"
(HS.fromList ["true","false"])
Symbol
ReservedIdentifier

-- -------------------------------------------------------------------------- --
-- Pact Attoparsec backend parser

-- | A wrapper around Attoparsec that adds DeltaParsing
--
newtype PactAttoparsec a = PactAttoparsec
{ runPactAttoparsec :: StateT Int AP.Parser a }
deriving (Functor, Applicative, Alternative, Monad, MonadPlus, MonadState Int)
deriving (Parsing, TokenParsing)

pactAttoParseOnly :: PactAttoparsec a -> T.Text -> Either String a
pactAttoParseOnly = AP.parseOnly . flip evalStateT 0 . runPactAttoparsec

-- | Atto DeltaParsing instance provides 'position' only (with no support for
-- hidden chars like Trifecta).
--
instance DeltaParsing PactAttoparsec where
line = return mempty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this:

Suggested change
line = return mempty
line = PactAttoparsec (fail "DeltaParsing.line is not supported for PactAttoparsec")

The error message is more informative, than giving an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming it's actually not used ...


-- Returns @Column NUMBER_OF_CHARS NUMBER_OF_BYTES@
--
-- Notes about legacy behavior:
--
-- For text <2 it used to return for both values the number of 16-bit code
-- units words. With the UTF-16 encoding used by text <2 this is the total
-- number of characters plus number of characters that are encoded using two
-- 16-bit code units. For instance the Ugaritic letter U+1038D would result
-- in a positional length of two (two 16-bit code units). The string
-- "a\0x263A\0x1038D" ("a☺𐎍") would have positional length 4 (3 characters
-- plus one 2 16-bit character).
--
-- In practice the old behavior was close enough to the number of characters
-- that it went mostly unnoticed and didn't cause harm on chain. The code
-- just assumed that it represented the number text characters. Those
-- numbers appear on chain (up to some block height) within info objects and
-- later still in failure messages. It is also relevant for extracting the
-- module text from the pact transaction before storing it in the pact db.
-- The presence of unicode characters can result in modules containing
-- dangling data because there are less characters in the module than what
-- is assumed based on the position information.
--
-- For text >=2 the attoparsic position tracks just bytes and the internal
-- representation of UTF-8. For instance the Ugaritic letter U+1038D results
-- in a byte length of 4. The string "a\0x263A\0x1038D" ("a☺𐎍") has 8 bytes
-- (1 code unit plus 3 code unit plus 4 code units).
--
position = do
#if MIN_VERSION_text(2,0,0)
APT.Pos !bytePos <- parserPos
!charPos <- gets fromIntegral
#elif LEGACY_PARSER == 1
APT.Pos !bytePos <- parserPos
let !charPos = fromIntegral bytePos
#else
APT.Pos !bytePos <- (* 2) <$> parserPos
!charPos <- gets fromIntegral
#endif
return $ TF.Columns charPos (fromIntegral bytePos)
{-# INLINE position #-}

slicedWith f a = (`f` mempty) <$> a
rend = return mempty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to line, I think rend and restOfLine, if not used, should return informative error messages

restOfLine = return mempty

-- | retrieve pos from Attoparsec.
--
-- The first parameter to the parser is the remaining available input, which
-- isn't of any help here.
--
parserPos :: PactAttoparsec APT.Pos
parserPos = PactAttoparsec $ StateT $ \x ->
APT.Parser $ \t !pos more _lose win -> win t pos more (pos, x)

instance CharParsing PactAttoparsec where
satisfy p = PactAttoparsec (satisfy p) <* modify' (+ 1)
{-# INLINE satisfy #-}

string s = PactAttoparsec (string s) <* modify' (+ length s)
{-# INLINE string #-}

text s = PactAttoparsec (text s) <* modify' (+ T.length s)
{-# INLINE text #-}

2 changes: 2 additions & 0 deletions tests/PactTests.hs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import qualified KeysetSpec
import qualified RoundTripSpec
import qualified PrincipalSpec
import qualified SizeOfSpec
import qualified Test.Pact.Parse

#ifndef ghcjs_HOST_OS
import qualified PactTestsSpec
Expand Down Expand Up @@ -40,6 +41,7 @@ main = hspec $ parallel $ do
describe "RoundTripSpec" RoundTripSpec.spec
describe "PrincipalSpec" PrincipalSpec.spec
describe "SizeOfSpec" SizeOfSpec.spec
describe "Test.Pact.Parse" Test.Pact.Parse.spec

#ifndef ghcjs_HOST_OS

Expand Down
Loading