-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deltaparsing #1197
base: master
Are you sure you want to change the base?
Fix deltaparsing #1197
Changes from all commits
bf2d455
7266396
3e93eba
58ca210
c72090b
0d898ec
9a55270
6bd2cea
a609679
a626f89
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -1,3 +1,6 @@ | ||||||
{-# LANGUAGE BangPatterns #-} | ||||||
{-# LANGUAGE CPP #-} | ||||||
{-# LANGUAGE DerivingStrategies #-} | ||||||
{-# LANGUAGE GeneralizedNewtypeDeriving #-} | ||||||
{-# LANGUAGE Rank2Types #-} | ||||||
-- | | ||||||
|
@@ -10,21 +13,37 @@ | |||||
-- | ||||||
|
||||||
module Pact.Types.Parser | ||||||
( | ||||||
( -- * Abstract Pact Parser | ||||||
PactParser(..) | ||||||
, symbols | ||||||
, style | ||||||
|
||||||
-- * Pact Attoparsec Parser | ||||||
, PactAttoparsec(..) | ||||||
, pactAttoParseOnly | ||||||
) | ||||||
where | ||||||
|
||||||
|
||||||
import Control.Applicative | ||||||
import Control.Monad | ||||||
import Text.Trifecta | ||||||
import Control.Monad.State | ||||||
|
||||||
import qualified Data.Attoparsec.Internal.Types as APT | ||||||
import qualified Data.Attoparsec.Text as AP | ||||||
import qualified Data.HashSet as HS | ||||||
import qualified Data.Text as T | ||||||
|
||||||
import Text.Parser.Token.Highlight | ||||||
import Text.Parser.Token.Style | ||||||
import Text.Trifecta | ||||||
import Text.Trifecta.Delta as TF | ||||||
|
||||||
-- -------------------------------------------------------------------------- -- | ||||||
-- | Abstract Pact Parser | ||||||
-- | ||||||
-- On-chain this is use for Attoparsec as parser backend. In the repl trifecta | ||||||
-- is used. | ||||||
-- | ||||||
newtype PactParser p a = PactParser { unPactParser :: p a } | ||||||
deriving (Functor, Applicative, Alternative, Monad, MonadPlus, Parsing, CharParsing, DeltaParsing) | ||||||
|
||||||
|
@@ -45,3 +64,87 @@ style = IdentifierStyle "atom" | |||||
(HS.fromList ["true","false"]) | ||||||
Symbol | ||||||
ReservedIdentifier | ||||||
|
||||||
-- -------------------------------------------------------------------------- -- | ||||||
-- Pact Attoparsec backend parser | ||||||
|
||||||
-- | A wrapper around Attoparsec that adds DeltaParsing | ||||||
-- | ||||||
newtype PactAttoparsec a = PactAttoparsec | ||||||
{ runPactAttoparsec :: StateT Int AP.Parser a } | ||||||
deriving (Functor, Applicative, Alternative, Monad, MonadPlus, MonadState Int) | ||||||
deriving (Parsing, TokenParsing) | ||||||
|
||||||
pactAttoParseOnly :: PactAttoparsec a -> T.Text -> Either String a | ||||||
pactAttoParseOnly = AP.parseOnly . flip evalStateT 0 . runPactAttoparsec | ||||||
|
||||||
-- | Atto DeltaParsing instance provides 'position' only (with no support for | ||||||
-- hidden chars like Trifecta). | ||||||
-- | ||||||
instance DeltaParsing PactAttoparsec where | ||||||
line = return mempty | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about something like this:
Suggested change
The error message is more informative, than giving an empty string. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. assuming it's actually not used ... |
||||||
|
||||||
-- Returns @Column NUMBER_OF_CHARS NUMBER_OF_BYTES@ | ||||||
-- | ||||||
-- Notes about legacy behavior: | ||||||
-- | ||||||
-- For text <2 it used to return for both values the number of 16-bit code | ||||||
-- units words. With the UTF-16 encoding used by text <2 this is the total | ||||||
-- number of characters plus number of characters that are encoded using two | ||||||
-- 16-bit code units. For instance the Ugaritic letter U+1038D would result | ||||||
-- in a positional length of two (two 16-bit code units). The string | ||||||
-- "a\0x263A\0x1038D" ("a☺𐎍") would have positional length 4 (3 characters | ||||||
-- plus one 2 16-bit character). | ||||||
-- | ||||||
-- In practice the old behavior was close enough to the number of characters | ||||||
-- that it went mostly unnoticed and didn't cause harm on chain. The code | ||||||
-- just assumed that it represented the number text characters. Those | ||||||
-- numbers appear on chain (up to some block height) within info objects and | ||||||
-- later still in failure messages. It is also relevant for extracting the | ||||||
-- module text from the pact transaction before storing it in the pact db. | ||||||
-- The presence of unicode characters can result in modules containing | ||||||
-- dangling data because there are less characters in the module than what | ||||||
-- is assumed based on the position information. | ||||||
-- | ||||||
-- For text >=2 the attoparsic position tracks just bytes and the internal | ||||||
-- representation of UTF-8. For instance the Ugaritic letter U+1038D results | ||||||
-- in a byte length of 4. The string "a\0x263A\0x1038D" ("a☺𐎍") has 8 bytes | ||||||
-- (1 code unit plus 3 code unit plus 4 code units). | ||||||
-- | ||||||
position = do | ||||||
#if MIN_VERSION_text(2,0,0) | ||||||
APT.Pos !bytePos <- parserPos | ||||||
!charPos <- gets fromIntegral | ||||||
#elif LEGACY_PARSER == 1 | ||||||
APT.Pos !bytePos <- parserPos | ||||||
let !charPos = fromIntegral bytePos | ||||||
#else | ||||||
APT.Pos !bytePos <- (* 2) <$> parserPos | ||||||
!charPos <- gets fromIntegral | ||||||
#endif | ||||||
return $ TF.Columns charPos (fromIntegral bytePos) | ||||||
{-# INLINE position #-} | ||||||
|
||||||
slicedWith f a = (`f` mempty) <$> a | ||||||
rend = return mempty | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar to |
||||||
restOfLine = return mempty | ||||||
|
||||||
-- | retrieve pos from Attoparsec. | ||||||
-- | ||||||
-- The first parameter to the parser is the remaining available input, which | ||||||
-- isn't of any help here. | ||||||
-- | ||||||
parserPos :: PactAttoparsec APT.Pos | ||||||
parserPos = PactAttoparsec $ StateT $ \x -> | ||||||
APT.Parser $ \t !pos more _lose win -> win t pos more (pos, x) | ||||||
|
||||||
instance CharParsing PactAttoparsec where | ||||||
satisfy p = PactAttoparsec (satisfy p) <* modify' (+ 1) | ||||||
{-# INLINE satisfy #-} | ||||||
|
||||||
string s = PactAttoparsec (string s) <* modify' (+ length s) | ||||||
{-# INLINE string #-} | ||||||
|
||||||
text s = PactAttoparsec (text s) <* modify' (+ T.length s) | ||||||
{-# INLINE text #-} | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple thoughts
You don't need to set
=1
for binary flags, you can just define-DLEGACY_PARSER
.Then CPP if-blocks become like
or
The other thought is that it could be behind a flag:
Which would make toggling it on/off not require editing the cabal file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this is all temporary for now. I tried to define the macro in
cabal.project.local
but cabal doesn't like that, so I had to put it into the cabal file.Regarding using 1/0, sometimes I like to be more explicit when dealing with macros (I find it easy to mess up with
#define
and#undef
). Yeah, if that macro would still be in the final version of the code, we would use a standard pattern with (automatic) cabal flag. But, hopefully, that won't be necessary.