maradydd: (fail)
[personal profile] maradydd
... and I'm just too much of an idiot to see it, so maybe one of you smart people can help me.

I'm writing a parser for a context-sensitive grammar in Haskell (for a language that I didn't design). At the moment I'm using Happy but I'm completely open to using parsec, uuagc, whatever will get the job done. Here's my problem:

All tokens of this language are bytes. One of the nonterminals has, as its right-hand side, a byte (which evaluates to an integer value -- I have an attribute rule for this) followed by N bytes, where N is whatever that first byte evaluated to. The usual attribute-grammar way to handle this seems to look a bit like this (using an inherited attribute, and OBTW this is pseudocode):

StringOfBytes : byte NBytes { $$.value = $2.value; $2.length = $1.value }

NBytes : {- empty -} { $$.value = []; unless $$.length == 0 error }
       | byte NBytes { $$.value = $1.value : $2.value; $2.length = $$.length - 1 }

Unfortunately, in most cases there's more content past the end of the StringOfBytes, and the way this rule is written, NBytes will happily keep shifting tokens (bytes) until it runs out of input and finally reduces. This is obviously bad.

If I had some way to force NBytes to reduce whenever the length of the nested NBytes reaches 0, that would solve it, but I don't see a way to impose a precedence on an attribute rule. But that's not even a very Haskelly way to do it; what I'd really like to do is more like:

StringOfBytes : byte TakeN { $$.value = $2.value; $2.length = $1.value }

TakeN : (take $$.length INPUTSTREAM) { $$ = $1 }

...because, I mean, that's what take is for, lazily grabbing N elements of a list. How to do this, though, is not immediately obvious. Having one of my token types be a partial function that I then somehow bind to the appropriate length value would also be cool, but again I don't see how to do this.

I'm thinking monadic parsing might be a solution, since monadic parsing lets you keep track of state and do various context-dependent things ... but I'm at a loss as to how to actually do this. I've pored through the sample code that ships with Happy, but lack sufficient clue to figure out what to do with it. If anyone can provide some guidance with this, I'd be extremely grateful.

(no subject)

Date: 2009-08-25 09:01 pm (UTC)
From: (Anonymous)
You can use parsec, it has an instance of ByteString.Lazy for its Stream typeclass (http://hackage.haskell.org/packages/archive/parsec/3.0.0/doc/html/Text-Parsec-ByteString-Lazy.html). If you never used parsec before, start reading rwh (http://book.realworldhaskell.org/read/using-parsec.html). You can also use a state monad transformer to achieve a more idiomatic style. Bye!

Profile

maradydd: (Default)
maradydd

September 2010

S M T W T F S
   1234
567891011
12131415 161718
19202122232425
26 27282930  

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags

No cut tags