Friday, April 18, 2014

Finding an interesting interface for simple io in Haskell

(This is going to be a long post, but there were a lot of wrong turns and bumps that I encountered that might be helpful to others, kind of a postmortem.
Comments on reddit
Turns out I ended up getting the Cont monad, have a lot to learn, thanks reddit user rampion!)

I had an annoying bug in my editor side project (link).  If you opened the same file twice then you would get two tabs with the same file.   The save button also saves all tabs so if I accidentally opened the same file twice, one of them could overwrite the other.

Should be an easy fix, unfortunately my data structure wasn't going to help:

data EditorWindow = EditorWindow { mainPane:: VPaned,
                                   _fileTreeStore :: TreeStore DirectoryEntry,
                                   _fileTreeView:: TreeView,
                                   notebook :: Notebook,
                                   _rootPath :: TVar (Maybe FilePath),                                  
                                   nextGuiId :: IORef (Int),
                                   sourceBuffers :: TVar ( IntMap.IntMap (String, SourceBuffer))

I'd have to write something to go over the sourceBuffers and match the strings.  Ugh.  Then I thought that it would be really nice to write queries over the gui state like jQuery.  Would be nice for the file system too.

Seemed like it should be possible, I'd been reading a lot of good things about core.logic, watched Adam Foltzer's Molog presentation, Phil Wadler's LINQ presentation.  Looked at some core.logic presentations, went over the "Essence of LINQ" paper, started looking at pieces of William Byrd's miniKanren dissertation, LogicT paper, and List monad stuff.

After all that I figured that I just needed something that did nested loop type stuff like the List monad or an SQL query.  And it would be really great if I could get it into do notation.  For directory reading I had to get the directory entries then after that figure out if they were files or directories.  It would be nice if it looked like this:

dirContents dirPath = do
    filePath <- selectM (directoryContentsRelation dirPath)
    fileNode <- selectM (directoryType filePath)
    return fileNode

The TLDR is that after much goose chasing the above actually works. Still haven't organized it, it's in here for now RelationalTest.hs

The Goose Chase

Didn't look like I could get it into a monad, but at least this Beyond Monads post showed me that you could chain a custom bind >>>= in a pretty readable way so that wasn't too bad.

Seemed like it would be easier to start with files. So something that "selected" from a file would loop over the lines, then collect the results. Since most of the operations would be appends, I stuck in a DList. So the file would provide a source of data, and it would call a consumer to add results to the DList.

type SeedConsumer b = Data.DList.DList b -> IO (Data.DList.DList b)
type IOAccumulator a b = a -> SeedConsumer b
type IOSource a b = Data.DList.DList b -> IOAccumulator a b-> IO (Data.DList.DList b)

readFileRelation :: IOSource String b
readFileRelation seed consumer = withFile "LICENSE" ReadMode (readFileStep seed consumer)

readFileStep :: Data.DList.DList b -> IOAccumulator String b -> Handle -> IO ( Data.DList.DList b )
readFileStep seed consumer handle = do
   isEof <- hIsEOF handle
   if isEof 
       then return $ seed
       else do 
                 line <- hGetLine handle
                 result <- consumer line seed
                 readFileStep result consumer handle

This looked pretty good, it could do query like things like filter out lines:

sat :: Data.DList.DList a -> Bool -> a -> Data.DList.DList a
sat seed True elem = Data.DList.snoc seed elem
sat seed False elem = seed

tests :: Test
tests = TestList [
        TestLabel "Should Query File" $ TestCase $ do
            fileRelations <- readFileRelation emptySeed (\ line seed  -> return (Data.DList.snoc seed line) ) 
            putStrLn $ showDlist $ fileRelations
            assertEqual "Queried file"  "a" "a"
        ,TestLabel "Should Query File" $ TestCase $ do
            fileRelations <- readFileRelation emptySeed (\ line seed  -> return $ (sat seed (length line <= 10) line) )  
            putStrLn $ showDlist $ fileRelations
            assertEqual "Queried file" "a" "a"

Then I added some convenience functions:

selectPipe :: IOSource a b -> IOAccumulator a b -> SeedConsumer b
selectPipe source accumulator = \seed  -> source seed accumulator

It was kind of annoying to pass through the seed all the time, and I was thinking that most of the time the consumer would not modify the seed, so I added some functions to manage the seed separately. This actually ended up side tracking my thinking a bit but made writing the code a bit easier.

type IOConsumer a b = a -> IO (Data.DList.DList b)

accum ::  IOConsumer a b -> IOAccumulator a b
accum consumer = \ input seed -> do
    result <- consumer input
    return $ Data.DList.append seed result

select :: IOSource a b -> IOConsumer a b -> SeedConsumer b
select source consumer = selectPipe source (accum consumer)

flatDirectoryContents :: DirectoryPath -> IOSource FileNode b
flatDirectoryContents dirPath seed consumer = 
    applySeed seed $ selectPipe (directoryContentsRelation dirPath) $ \ filePath ->
        selectPipe (directoryType filePath) $ consumer

fileTree :: DirectoryPath -> IO( Data.DList.DList FileTree )
fileTree dirPath = 
    applySeed emptySeed $ select (flatDirectoryContents dirPath) $ \fileNode -&gt
        if traversable fileNode
            then do 
                subTrees <- fileTree (fileNodePath fileNode)
                    return $ having True $ (Tree fileNode (Data.DList.toList subTrees)) 
            else return $ having True $ (Leaf fileNode) 

I played around with select and selectPipe a bit more and was starting to think that this was really close to a monad. And I really wanted the nice do syntax. So what were the elements of the monad? It looked like the source was a good thing:

type IOSource a b = Data.DList.DList b -> IOAccumulator a b-> IO (Data.DList.DList b)

But then I didn't know how to write >>=. Maybe I could use free monads. I pulled out operational, added the GADT extension, tried to make IOSource a command of the free monad.

The b parameter ended up being a problem. I tried hiding the b in the GADT but I couldn't get it to typecheck. It ended up with an error where some outside b1 looked like it was exactly b but it wasn't able to match them. I tried tweaking it a bunch more times but it wouldn't work.

Well I'm not the greatest person with types, I've actually taken a category theory course before (not really type systems but kind of similar). I even audited it again to try and get more out of it, but I have no intuition for it and can pretty much only crunch through the definitions manually.

So I went back to square one. I thought it over and over. Then I wondered why I was doing this on a Sunday. Then I wondered why I was spending all this brain time on a side project. Then I went for a walk to the store. Then an idea popped up.

The b type parameter never changed through all of the bind applications. Actually all the types with b never changed. So Data.DList.DList b and IO (Data.DList.DList b) were kind of constants when binding. So I really wanted IOSource to look more like:

type IOSource a b =  IOAccumulator a b-> Data.DList.DList b -> IO (Data.DList.DList b)

Which was actually:

type IOSource a b =  (a -> Data.DList.DList b -> IO (Data.DList.DList b)) -> Data.DList.DList b -> IO (Data.DList.DList b)

And then I really wanted the b to be applied first:

type IOSource b a =  (a -> Data.DList.DList b -> IO (Data.DList.DList b)) -> Data.DList.DList b -> IO (Data.DList.DList b)

Now, (IOSource b) looked like a monad, the b parameters didn't matter since they would be fixed from the monad point of view. Maybe I could write >>= now. I didn't really want to rewrite my IOSource code yet though, so since they were just synonyms I tried to make a type for this new thing that might be a monad.

data RelationMonad b a = RelationMonad { source ::  (a -> Data.DList.DList b -> IO (Data.DList.DList b)) -> Data.DList.DList b -> IO (Data.DList.DList b)  }

Now I had to fill in the instance functions:

instance Monad (RelationMonad b) where
    return x = ?
    relationMonad >>= f = ?

So what was return? Thinking of the file IOSource, return x would be kind of like a one line file. So it would just call the consumer with x.

return x = RelationMonad { source = \ accum -> accum x}

What about >>= ? Well what would f be in this case? It would take in an input from the source then produce another source...

But the >>= would have to produce a new source. It looked like the source was kind of a function with a hole in it, kind of like the "one hole context" idea.

And the hole is an accumulator, something that takes an input and adds it to the seed.

 So >>= would have to take a source1 with a hole, a function from the "output" (lines of the file) of source1 to a source2 then produce a new source3 with a hole with source2's "output" type.

Still pretty fuzzy, what are the types of these things? The b is now fixed so make it z to get it out of the way.

--Not exactly code

relationMonad :: (RelationMonad z) a

f :: a -> (RelationMonad z) b

source3 :: (RelationMonad z) b
source3 = relationMonad >>= f

source3 :: RelationMonad { source ::  (b -> Data.DList.DList z -> IO (Data.DList.DList z)) -> Data.DList.DList z -> IO (Data.DList.DList z)  }

Oh, so now I have to make a new source that takes in a hole thing of type (b -> Data.DList.DList z -> IO (Data.DList.DList z)) using relationMonad and f.

Well if I apply f to the output of relationMonad then I get a source like that, but I still have to run relationMonad.

I guess I have to make a new accumulator of type (a -> Data.DList.DList z -> IO (Data.DList.DList z)) then pass that to relationMonad.

But to make an accumulator from a source I have to fill the hole with another accumulator.

Oh, that's going to be the accumulator that is passed in to the new source3.

--Not exactly code

newSource :: a -> (RelationMonad z) b
newSource = \input -> (source (f input))

newAccumulator :: (a -> Data.DList.DList z -> IO (Data.DList.DList z))
newAccumulator :: (\input -> (source (f input)) accum)

relationMonadWithNewAccumulator ::  Data.DList.DList z -> IO (Data.DList.DList z)
relationMonadWithNewAccumulator = (source relationMonad) (\input -> (source (f input)) accum) 

source3 :: RelationMonad { source ::  \accum -> (source relationMonad) (\input -> (source (f input) accum)   }

Ok, does that work?

instance Monad (RelationMonad b) where
    return x = RelationMonad { source = \ accum -> accum x}
    relationMonad >>= f = RelationMonad { source =  \accum -> (source relationMonad) (\input -> (source (f input)) accum) }

It typechecks and it runs.

runMonad :: RelationMonad (Data.DList.DList b -> IO (Data.DList.DList b)) b -> IO (Data.DList.DList b)
runMonad relationMonad = ((source relationMonad) (\ input seed -> return $ Data.DList.snoc seed input)) Data.DList.empty

selectM :: IOSource a b -> RelationMonad (Data.DList.DList b -> IO (Data.DList.DList b)) a
selectM src = RelationMonad { source = (\accum seed -> src seed accum )}

dirContents dirPath = do
    filePath <- selectM (directoryContentsRelation dirPath)
    fileNode <- selectM (directoryType filePath)
    return fileNode

tests :: Test
tests = TestList [
     TestLabel "Monad"  $ TestCase $ do
            dirRelations <- runMonad $ dirContents (DirectoryPath "." "sandbox")
            putStrLn $ ""
            putStrLn $ showDlist $ dirRelations
            assertEqual "Queried DirMonad" "a" "a"

So this is pretty neat interface for reading files and looping over things. Any function that fits the signature will work, so it could loop over filenames then use those filenames to loop over GUI elements. Also can loop over file names, read the files, then collect the results in a list. Cool, wow almost didn't think it was going to work.

But there's still something kind of weird, we didn't really touch the (Data.DList.DList b -> IO (Data.DList.DList b)) part:

data RelationMonad b a = RelationMonad { source ::  (a -> Data.DList.DList b -> IO (Data.DList.DList b)) -> Data.DList.DList b -> IO (Data.DList.DList b)  }

does this work?

data RelationMonad b a = RelationMonad { source ::  (a -> b) -> b  }

Strangely it does. I don't know what this thing is now. Maybe Binder? It kind of does part of what >>= does in other monads.

EDIT: Turns out this is the Cont monad, guess I need to study more :)

Is it actually a monad? Well I always thought Gabriel Gonzalez's equational reasoning tutorial was pretty neat, so might as well try it out:

data Binder b a = Binder { source ::  (a -> b) -> b  }
 instance Monad (Binder b) where
    return x = Binder { source = \ accum -> accum x}
     relationMonad >>= f = Binder { source =  \accum -> (source relationMonad) (\input -> (source (f input)) accum) }

Monad laws
 Left identity:
 return a >>= f
 f a

 return a >>= f
 ≡ (apply definition of return)
 Binder { source = \ accum -> accum a} >>= f
 ≡ (apply definition of >>=)
 Binder { source =  \accum -> (source Binder { source = \ accum1 -> accum1 a}) (\input -> (source (f input)) accum) }
 ≡ (cancel source)
 Binder { source =  \accum -> ( \ accum1 -> accum1 a) (\input -> (source (f input)) accum) }
 ≡ (apply (\input -> (source (f input)) to \accum1... )
 Binder { source =  \accum -> ((\input -> (source (f input)) a)  accum) }
 ≡ (apply a to \input ...)
 Binder { source =  \accum -> (source (f a)) )  accum) }
 Binder { source =  source (f a) }
 f a

 Right identity:
 m >>= return


 m >>= return
 ≡ (apply definition of >>=)
 Binder { source =  \accum -> (source m) (\input -> (source (return input)) accum) }
 ≡ (apply definition of return)
 Binder { source =  \accum -> (source m) (\input -> (source (Binder { source = \ accum1 -> accum1 input})) accum) }
 ≡ (cancel source)
 Binder { source =  \accum -> (source m) (\input -> (\ accum1 -> accum1 input) accum) }
 ≡ (apply accum to \accum...)
 Binder { source =  \accum -> (source m) (\input ->  accum input ) }
 Binder { source =  \accum -> (source m) accum }
 Binder { source = source m } ≡ m

 (m >>= f) >>= g
 m >>= (\x -> f x >>= g)


 (m >>= f) >>= g
 ≡ (apply definition of >>=)
 Binder { source =  \accum -> (source m) (\input -> (source (f input)) accum) } >>= g
 ≡ (rename) 
Binder { source =  \accum1 -> (source m) (\input1 -> (source (f input1)) accum1) } >>= g
 ≡ (apply definition of >>=)
 Binder { source =  \accum -> (source RelationMonad { source =  \accum1 -> (source m) (\input1 -> (source (f input1)) accum1) }) (\input -> (source (g input)) accum) }
 ≡ (cancel source)
 Binder { source =  \accum -> (\accum1 -> (source m) (\input1 -> (source (f input1)) accum1)) (\input -> (source (g input)) accum) }
 ≡ (apply (\input -> (source (g input)) accum) to \accum1 ...)
 Binder { source =  \accum -> (source m) (\input1 -> (source (f input1)) (\input -> (source (g input)) accum))  }
 ≡ (rename)
 Binder { source =  \accum -> (source m) (\input ->   (source (f input)) (\input1 -> (source (g input1)) accum))  }

 Other side:
 m >>= (\x -> f x >>= g)
 ≡ (apply definition of >>=)
 m >>= ( \x ->  Binder { source =  \accum -> (source (f x)) (\input -> (source (g input)) accum) })
 ≡ (rename)
 m >>= ( \x ->  Binder { source =  \accum1 -> (source (f x)) (\input1 -> (source (g input1)) accum1) }) 
≡ (apply definition of >>=)
 Binder { source =  \accum -> (source m) (\input -> (source (( \x ->  Binder { source =  \accum1 -> (source (f x)) (\input1 -> (source (g input1)) accum1) }) input)) accum) }
 ≡ (apply input to \x ...)
 Binder { source =  \accum -> (source m) (\input -> (source ( Binder { source =  \accum1 -> (source (f input)) (\input1 -> (source (g input1)) accum1) } )) accum) }
 ≡ (cancel source)
 Binder { source =  \accum -> (source m) (\input ->  (\accum1 -> (source (f input)) (\input1 -> (source (g input1)) accum1)) accum) }
 ≡ (apply accum to \accum1)
 Binder { source =  \accum -> (source m) (\input ->   (source (f input)) (\input1 -> (source (g input1)) accum))  }

 (m >>= f) >>= g

Ok, so it's quite possible I made a mistake but it looks like it is a monad. I don't know what this monad really is though since I'm still just thinking of it in terms of sources and accumulators. Anyone have an idea?