Conflicting Module Names


It's the oldest open issue on the Stackage repo, and a topic I've discussed more times than I can remember over the years. Hackage enforces that package names are unique (so that no one else can claim the name conduit, for instance), but does nothing to ensure unique module names (so someone else could write a package named my-conduit with a module named Data.Conduit).

For the record, I think Hackage's position here is not only a good one, but the only logical one it could have taken. I'm not even hinting at wanting to change that. Please don't read this blog post in that way at all.

Usually, conflicting module names do not negatively affect us, at least when working on project code with a proper .cabal file. In my made-up example above, I would explicitly state that I depend on conduit and not list my-conduit, and when my code imports Data.Conduit, Stack+Cabal+GHC can all work together to ensure that the correct module is used.

EDIT: Since I've already written some of the code for stackage-curator to detect this, I generated a list of all conflicting module names to give an idea of what we're looking at.

The problem

(If you're already convinced that conflicting module names are a problem, you may want to skip straight to "the solution." This section is fairly long and detailed.)

Unfortunately, there are still some downsides to having the same module name appear in different packages:

Documentation: Suppose I'm reading a tutorial that includes the line import Control.Monad.Reader. I look at the Stackage doc list by module and discover that more than one package provides that module, including mtl and monads-tf. If I'm not familiar with the Haskell ecosystem, I'm unlikely to know that mtl is far more popular than monads-tf, and I may well choose the latter.
runghc/ghci: We're not always working on project code. Sometimes we're just writing a script. Sometimes we're playing with an idea in GHCi. What if I import System.FilePath.Glob in a GHCi prompt when I have both the filemanip and Glob packages installed?
doctests: Similar to the previous point: even when you run doctests from inside the context of a project, they don't typically know which packages can be used, and conflicting module names can cause the tests to fail. What's especially bad about this is that an unrelated action (like running stack build async-dejafu) can suddenly make your tests start to fail when they previously succeeded.
Custom Setup.hs: Suppose you're writing a cabal package that uses a custom Setup.hs file and imports some additional modules. To pick a concrete example that just happened: the executable-hash package has a Setup.hs file which (indirectly) imports Crypto.Hash.SHA1. There's also an explicit dependency on cryptohash in the .cabal file, from which one might naively infer that we're safe. However, when uuid-1.3.13 moved from cryptonite to a few other packages (including cryptohash-sha1), building executable-hash when uuid was already installed became a build error. And like the previous point, this is essentially a non-deterministic race condition.

Since I was a backup maintainer for executable-hash, I implemented two fixes: adding an explicit package-qualified import (GHC's PackageImports extension) and using the new custom-setup feature in Cabal-1.24. While custom-setup is definitely the way to go, and it's a great addition to Cabal, not everyone is using the newest version of Cabal, Stack is only just now adding support for it, and not all packages will update to use it immediately.
Better tooling: It would be great if tooling could automatically determine which packages to install based on the imports list, to avoid the need for a lot of manual and redundant statements of dependencies. We're considering doing this in the upcoming stack script command. But how will Stack know which Control.Monad.Reader to use?
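To make the package-qualified import from the Custom Setup.hs case concrete, here is a minimal sketch of GHC's PackageImports extension. It is not the actual executable-hash code. In that case the import would presumably name "cryptohash" and Crypto.Hash.SHA1; the sketch uses "base" and Data.List instead, purely so that it runs without any extra packages installed.

```haskell
{-# LANGUAGE PackageImports #-}
module Main (main) where

-- The string literal names the package the module must come from, so GHC
-- does not have to choose between several installed packages that happen
-- to expose a module with the same name.
-- Assumption for illustration: "base" stands in for a package such as
-- "cryptohash", and Data.List for a module such as Crypto.Hash.SHA1.
import "base" Data.List (sort)

main :: IO ()
main = print (sort [3, 1, 2 :: Int])
```

The trade-off is that the package name is now repeated in the source file as well as in the .cabal file, which is part of why custom-setup is the nicer long-term fix.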

The solution

While we know that we can't have fully unique module names without a lot of buy-in from package authors, we can get pretty close, with canonical locations for a module. We've already implemented this to some extent in Stackage to resolve the doctests problem (the third item listed above). We now have the ability to list some packages as hidden in a Stackage snapshot. This means that, after installing the package, the Stackage build system will hide the package, so that its modules won't be available for import. By adding async-dejafu to the hidden list, the warp doctest suite no longer has the ambiguity issue when running.

After dealing with the cryptohash-sha1 fallout earlier this week, I realized that this solution can generalize to solve a large swath of the problems described above. Here's how I see it working:

We introduce a new constraint in the Stackage build process: every module name must be present in only one exposed (that is, non-hidden) package.
When stack build registers a package, it automatically hides it if the snapshot lists it as hidden.
On the stackage.org module list, modules from a hidden package are explicitly marked as hidden (or, if we want to be more extreme, we just hide them entirely).
With the upcoming stack script command, when finding a package for a given imported module, we only pay attention to modules from non-hidden packages.
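The constraint and the module lookup described above can be sketched in a few lines of Haskell. This is not the stackage-curator or Stack implementation: the PackageInfo type, the function names, and the example snapshot are all hypothetical, and marking monads-tf as hidden is only an illustrative choice.

```haskell
module Main (main) where

import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type PackageName = String
type ModuleName  = String

-- | Hypothetical, simplified view of one package in a snapshot.
data PackageInfo = PackageInfo
  { piHidden  :: Bool          -- ^ listed as hidden in the snapshot?
  , piModules :: [ModuleName]  -- ^ modules the package provides
  }

type Snapshot = Map PackageName PackageInfo

-- | For each module name, the exposed (non-hidden) packages providing it.
exposedProviders :: Snapshot -> Map ModuleName [PackageName]
exposedProviders snapshot = Map.fromListWith (flip (++))
  [ (modName, [pkgName])
  | (pkgName, info) <- Map.toList snapshot
  , not (piHidden info)
  , modName <- piModules info
  ]

-- | Modules that violate the proposed constraint, i.e. modules present in
-- more than one exposed package.
conflicts :: Snapshot -> Map ModuleName [PackageName]
conflicts = Map.filter ((> 1) . length) . exposedProviders

-- | Resolve an imported module to its single exposed package, the way a
-- script runner might, ignoring hidden packages entirely.
packageFor :: Snapshot -> ModuleName -> Either String PackageName
packageFor snapshot modName =
  case Map.findWithDefault [] modName (exposedProviders snapshot) of
    [pkg] -> Right pkg
    []    -> Left ("no exposed package provides " ++ modName)
    pkgs  -> Left (modName ++ " is ambiguous: " ++ unwords pkgs)

-- | Example data only; which packages end up hidden is a policy decision.
example :: Snapshot
example = Map.fromList
  [ ("mtl",       PackageInfo False ["Control.Monad.Reader"])
  , ("monads-tf", PackageInfo True  ["Control.Monad.Reader"])
  , ("filemanip", PackageInfo False ["System.FilePath.Glob"])
  , ("Glob",      PackageInfo False ["System.FilePath.Glob"])
  ]

main :: IO ()
main = do
  print (Map.toList (conflicts example))
  print (packageFor example "Control.Monad.Reader")
  print (packageFor example "System.FilePath.Glob")
```

In this example snapshot, Control.Monad.Reader resolves cleanly because only one exposed package provides it, while System.FilePath.Glob is still reported as a conflict until either filemanip or Glob is hidden.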

This doesn't fully solve the problems above. For example, if a user just Googles Control.Monad.Reader, they'll still possibly get confusing documentation. But I think this is a huge step in the right direction.