The Worst Function in Conduit


This blog post addresses a long-standing FIXME in the conduit-combinators documentation, as well as a question on Twitter. It assumes familiarity with the Conduit streaming data library; if you'd like to read up on it first, please check out the tutorial. The full executable snippet is at the end of this post, but we'll build up intermediate bits along the way. First, the Stack script header, import statement, and some minor helper functions:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

src10 :: Monad m => ConduitM i Int m ()
src10 = yieldMany [1..10]

remaining :: MonadIO m => ConduitM i o m ()
remaining = lengthC >>= \x -> liftIO (putStrLn ("Remaining: " ++ show x))

src10 just provides the numbers 1 through 10 as a source, and remaining tells you how many values are remaining from upstream. Cool.

Now let's pretend that the Conduit libraries completely forgot to provide a drop function, that is, a function that takes an Int and discards that many values from upstream. We could write one ourselves pretty easily:

dropSink :: Monad m => Int -> ConduitM i o m ()
dropSink cnt
  | cnt <= 0 = return ()
  | otherwise = await >> dropSink (cnt - 1)

(Bonus points to readers: this function is inefficient when upstream has fewer than cnt values; optimize it.)
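Here is one possible answer to the bonus question. It is a sketch, not the library's approach. It inspects the result of await and stops as soon as upstream reports Nothing, instead of awaiting cnt times regardless. The name dropSink' is hypothetical, and the script assumes the same lts-8.12 Stack setup used throughout this post.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

-- Hypothetical optimized dropSink: stops as soon as upstream is exhausted
dropSink' :: Monad m => Int -> ConduitM i o m ()
dropSink' cnt
  | cnt <= 0 = return ()
  | otherwise = do
      mx <- await
      case mx of
        -- Upstream has no more values, so there is nothing left to drop
        Nothing -> return ()
        -- Discard this value and keep counting down
        Just _ -> dropSink' (cnt - 1)

main :: IO ()
main = do
  -- Drops 1..5 and leaves 6..10 for sinkList: prints [6,7,8,9,10]
  print (runConduitPure (yieldMany [1..10 :: Int] .| (dropSink' 5 >> sinkList)))
  -- Only 3 values exist, so dropSink' returns after 4 awaits: prints []
  print (runConduitPure (yieldMany [1..3 :: Int] .| (dropSink' 1000000 >> sinkList)))
```

Once upstream has finished, every further await returns Nothing immediately. So the original dropSink still gives the right answer; it just spins through up to cnt useless iterations.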

This function drops a certain number of elements from upstream, so that the next component we monadically bind with can pick up the rest. Let's see how that looks:

goodDropSink :: IO ()
goodDropSink = runConduit
             $ src10
            .| (dropSink 5 >> remaining)

All well and good. But notice two things:

I called this dropSink. Why sink?
I stressed that we had to monadically bind. Why?

Well, there's another formulation of this drop function. Instead of letting the next monadically bound component pick up remaining values, we could pass the remaining values downstream. Fortunately it's really easy to implement this function in terms of dropSink:

dropTrans :: Monad m => Int -> ConduitM i i m ()
dropTrans cnt = dropSink cnt >> mapC id

(For more meaningless bonus points, feel free to implement this without dropSink or, for a greater challenge, implement dropSink in terms of dropTrans.) Anyway, using this function is straightforward:
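For the first bonus challenge, here is one possible sketch of dropTrans that does not reuse dropSink. The name dropTrans' is hypothetical. It counts down with await. Once the count reaches zero, it forwards everything else with awaitForever yield, which plays the same role as the mapC id in the definition above.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

-- Hypothetical transformer-style drop written without dropSink
dropTrans' :: Monad m => Int -> ConduitM i i m ()
dropTrans' cnt
  -- Finished dropping: forward every remaining value unchanged
  | cnt <= 0 = awaitForever yield
  | otherwise = do
      mx <- await
      case mx of
        Nothing -> return ()            -- upstream ran dry before cnt values
        Just _ -> dropTrans' (cnt - 1)  -- discard and keep counting down

main :: IO ()
main =
  -- Values 6..10 flow through to sinkList: prints [6,7,8,9,10]
  print (runConduitPure (yieldMany [1..10 :: Int] .| dropTrans' 5 .| sinkList))
```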

goodDropTrans :: IO ()
goodDropTrans = runConduit
              $ src10
             .| dropTrans 5
             .| remaining

Many would argue that this is more natural. To some extent, it mirrors the behavior of take more closely, since take passes the initial values downstream. On the other hand, dropTrans cannot guarantee that the values will be removed from the stream. If, instead of dropTrans 5 .| remaining, I simply wrote dropTrans 5 .| return (), dropTrans would never get a chance to run. Execution is driven from downstream, and return () never asks for a value. Also, as demonstrated, it's really easy to get the transformer behavior from the sink behavior; the other direction is trickier.
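The following sketch makes the "driven from downstream" point concrete. It assumes the same Stack setup, with the post's dropSink and dropTrans copied in. Each run performs a drop and then collects, with sinkList, whatever values are still available.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

-- Copied from the post
dropSink :: Monad m => Int -> ConduitM i o m ()
dropSink cnt
  | cnt <= 0 = return ()
  | otherwise = await >> dropSink (cnt - 1)

dropTrans :: Monad m => Int -> ConduitM i i m ()
dropTrans cnt = dropSink cnt >> mapC id

main :: IO ()
main = do
  -- Sink style: 1..5 are consumed, so sinkList sees [6,7,8,9,10]
  print (runConduitPure (yieldMany [1..10 :: Int] .| (dropSink 5 >> sinkList)))
  -- Transformer style fused into return (): downstream finishes without ever
  -- asking for input, so dropTrans never runs and sinkList sees [1..10]
  print (runConduitPure (yieldMany [1..10 :: Int] .| ((dropTrans 5 .| return ()) >> sinkList)))
```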

My point here is that we have two legitimate definitions of one function. In my experience, different people expect different behavior from it. In fact, some people (myself included) intuitively expect different behavior depending on the circumstances! This is what earns drop the title of worst function in conduit.

To make it even more clear how bad this is, let's see how you can misuse these functions unintentionally.

badDropSink :: IO ()
badDropSink = runConduit
            $ src10
           .| dropSink 5
           .| remaining

This code looks perfectly reasonable, and if we just replaced dropSink with dropTrans, it would be correct. But instead of reporting, as expected, that 5 values remain, it prints Remaining: 0. The reason: dropSink awaits and discards 5 values from src10 and then finishes, leaving the other 5 unconsumed in src10. But dropSink never yields anything downstream, so remaining receives nothing.
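The following sketch makes this visible. It assumes the same Stack setup and uses a hypothetical helper named observe. The helper counts how many values reach the component fused after dropSink 5, then collects what is still waiting in upstream.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

-- Copied from the post
dropSink :: Monad m => Int -> ConduitM i o m ()
dropSink cnt
  | cnt <= 0 = return ()
  | otherwise = await >> dropSink (cnt - 1)

-- Hypothetical helper: (values seen by the component fused after dropSink 5,
-- values still waiting in upstream afterwards)
observe :: [Int] -> (Int, [Int])
observe xs = runConduitPure $ yieldMany xs .| do
  seenDownstream <- dropSink 5 .| lengthC
  leftUpstream <- sinkList
  return (seenDownstream, leftUpstream)

main :: IO ()
main = print (observe [1..10])  -- prints (0,[6,7,8,9,10])
```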

Because of the type system, it's slightly trickier to misuse dropTrans. Let's first do the naive thing of just assuming it's dropSink:

badDropTrans :: IO ()
badDropTrans = runConduit
             $ src10
            .| (dropTrans 5 >> remaining)

GHC does not like this one bit:

error:
    • Couldn't match type ‘Int’ with ‘Data.Void.Void’
      Expected type: ConduitM () Data.Void.Void IO ()
        Actual type: ConduitM () Int IO ()

The problem is that runConduit expects a pipeline whose final output type is Void. However, dropTrans has output type Int. Since dropTrans 5 >> remaining is a single monadically bound component, remaining must have output type Int as well. This is definitely an argument in favor of dropTrans being the better function: the type system helps us a bit. (It's also an argument in favor of keeping the type signature of runConduit as-is.)

However, it's still possible to accidentally screw things up in bigger pipelines, e.g.:

badDropTrans :: IO ()
badDropTrans = runConduit
             $ src10
            .| (dropTrans 5 >> remaining)
            .| (sinkList >>= liftIO . print)

This code may look a bit contrived, but in real-world Conduit code it's not at all uncommon to nest these components deeply enough that the type error disappears. You may be surprised to hear that the output of this program is:

Remaining: 0
[6,7,8,9,10]

The reason is that sinkList sits downstream of the entire dropTrans 5 >> remaining component and collects everything dropTrans yields. dropTrans itself drains all values from src10, leaving nothing behind for remaining to count.

The Conduit libraries use the dropSink variety of this function. I wish there were a better approach here that felt intuitive to everyone. The closest I can think of is deprecating drop and replacing it with the more explicitly named dropSink and dropTrans, but I'm not sure how I feel about that (feedback welcome, and other ideas certainly welcome).
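This sketch gives a quick check that the library's drop is sink-style. It uses dropC, the drop exported by the Conduit module from conduit-combinators, which as far as I know has the type Int -> ConduitM a o m () (a Consumer). The helper names boundAfterDrop and fusedAfterDrop are hypothetical.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

-- Hypothetical helper: what a monadically bound sink sees after dropC 5
boundAfterDrop :: [Int] -> [Int]
boundAfterDrop xs = runConduitPure (yieldMany xs .| (dropC 5 >> sinkList))

-- Hypothetical helper: what a component fused after dropC 5 with .| sees
fusedAfterDrop :: [Int] -> [Int]
fusedAfterDrop xs = runConduitPure (yieldMany xs .| dropC 5 .| sinkList)

main :: IO ()
main = do
  print (boundAfterDrop [1..10])  -- prints [6,7,8,9,10]
  print (fusedAfterDrop [1..10])  -- prints []
```

The second result is the badDropSink surprise again: dropC never yields, so anything fused after it sees an empty stream.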

Full code

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

dropSink :: Monad m => Int -> ConduitM i o m ()
dropSink cnt
  | cnt <= 0 = return ()
  | otherwise = await >> dropSink (cnt - 1)

dropTrans :: Monad m => Int -> ConduitM i i m ()
dropTrans cnt = dropSink cnt >> mapC id

src10 :: Monad m => ConduitM i Int m ()
src10 = yieldMany [1..10]

remaining :: MonadIO m => ConduitM i o m ()
remaining = lengthC >>= \x -> liftIO (putStrLn ("Remaining: " ++ show x))

goodDropSink :: IO ()
goodDropSink = runConduit
             $ src10
            .| (dropSink 5 >> remaining)

badDropSink :: IO ()
badDropSink = runConduit
            $ src10
           .| dropSink 5
           .| remaining

goodDropTrans :: IO ()
goodDropTrans = runConduit
              $ src10
             .| dropTrans 5
             .| remaining

badDropTrans :: IO ()
badDropTrans = runConduit
             $ src10
            .| (dropTrans 5 >> remaining)
            .| (sinkList >>= liftIO . print)

main :: IO ()
main = do
  putStrLn "Good drop sink"
  goodDropSink
  putStrLn "Bad drop sink"
  badDropSink
  putStrLn "Good drop trans"
  goodDropTrans
  putStrLn "Bad drop trans"
  badDropTrans

Full output

Good drop sink
Remaining: 5
Bad drop sink
Remaining: 0
Good drop trans
Remaining: 5
Bad drop trans
Remaining: 0
[6,7,8,9,10]