Go import cycles: three strategies for how to deal with them, and a plea for a fourth
Everybody has their pet issues with their daily driver language. For Gophers, a lot of us complain about the verbosity of error handling. Me, I don't mind that so much.
No, for me, my biggest complaint about the language is import cycles, which Go does not allow. I hate them.
We're using Go to write Dolt, the world's first and only version-controlled SQL database. It's a large and complex code base actively developed by a team of over a dozen, and so we have had to solve the problem of import cycles while working on it many, many times over.
Import cycles are a huge pain to deal with, but at this point I've done it enough times that I have developed reliable strategies for doing so. I have three distinct strategies, which I think of as The Good, The Bad, and The Ugly.
Let's look at a simple import cycle and show how each of these three strategies can solve it.
Experimental setup: let's generate an import cycle
In the simplest case, you can generate an import cycle with just two packages that refer to one another. We have two packages, and two files:
% tree
.
├── a
│ ├── b
│ │ └── b.go
│ └── c
│ └── c.go
├── go.mod
└── main.go
Their contents are very simple:
// b.go
package b
import (
"fmt"
"importcycles/a/c"
)
func VitalWork() {
c.SubTask()
fmt.Println("VitalWork now done")
}
func UtilityFunction() {
fmt.Println("UtilityFunction")
}
// c.go
package c
import (
"fmt"
)
func SubTask() {
fmt.Println("SubTask")
}
// main.go
package main
import "importcycles/a/b"
func main() {
b.VitalWork()
}
This all works fine, and when we run main
, it prints out the expected result.
SubTask
VitalWork now done
All well and good. But now let's try to use the UtilityFunction
defined in package b
from
package c
.
func SubTask() {
b.UtilityFunction()
fmt.Println("SubTask")
}
Now when we run main, we get an error.
package importcycles
imports importcycles/a/b from main.go
imports importcycles/a/c from b.go
imports importcycles/a/b from c.go: import cycle not allowed
Please note: this is a toy example for clarity. In the real world, one rarely enounters cycles that are this simple. In my own experience, a dependency cycle typically involves at least 5 packages in the cycle path, sometimes many more.
Let's look at how we get ourselves out of this mess.
The Good: introduce an interface and push it down
Packages b
and c
need functionality defined by each other. How can we get around this? In most
code bases, most of the time, the actual functionality they need can and should be encapsulated by
interfaces. Let's rewrite our sample code to illustrate this.
// a/b.go
package b
import (
"fmt"
"importcycles/a/c"
)
type B struct {}
func (b B) VitalWork(c c.C) {
c.SubTask(b)
fmt.Println("VitalWork now done")
}
func (b B) UtilityFunction() {
fmt.Println("UtilityFunction")
}
// a/c.go
package c
import (
"fmt"
"importcycles/a/b"
)
type C struct{}
func (c C) SubTask(b b.B) {
b.UtilityFunction()
fmt.Println("SubTask")
}
// main.go
package main
import (
"importcycles/a/b"
"importcycles/a/c"
)
func main() {
b := b.B{}
c := c.C{}
b.VitalWork(c)
}
Writing it this way reveals the cycle more directly: b.VitalWork()
needs a parameter of type
c.C
, and c.SubTask()
needs a parameter of type b.B
.
To resolve the cycle, we follow a two-step process:
- Declare a new interface for the work required at the lowest level. In this case, that means a
type with a method
UtilityFunction()
. Then replace the reference tob.B
with a reference to this new interface. - Move that interface to lowest level of the package structure necessary. Sometimes this can be a
little complicated, but in our example it's easy: it can live in the
c
package.
Now our code looks like this:
// a/b.go
package b
import (
"fmt"
"importcycles/a/c"
)
type B struct {}
func (b B) VitalWork(c c.C) {
c.SubTask(b)
fmt.Println("VitalWork now done")
}
func (b B) UtilityFunction() {
fmt.Println("UtilityFunction")
}
// a/c.go
package c
import (
"fmt"
)
type C struct{}
type UtilityProvider interface {
UtilityFunction()
}
func (c C) SubTask(u UtilityProvider) {
u.UtilityFunction()
fmt.Println("SubTask")
}
// main.go
package main
import (
"importcycles/a/b"
"importcycles/a/c"
)
func main() {
b := b.B{}
c := c.C{}
b.VitalWork(c)
}
In case you don't want to read all that, here's the key part, in a/c.go
:
type UtilityProvider interface {
UtilityFunction()
}
func (c C) SubTask(u UtilityProvider) {
u.UtilityFunction()
fmt.Println("SubTask")
}
We've encapsulated the necessary functionality in the b
package into a new interface called
UtilityProvider
and used it in place of the old b.B
parameter. This eliminates the import from
c
to b
, breaking the cycle.
Now everything works, and our code runs as expected.
UtilityFunction
SubTask
VitalWork now done
I spend a lot of time complaining about Go, so it's worth pausing here to mention that the above flexibility with interfaces, with all the guarantees of static typing, is something I really love about Go. If dealing with dependency cycles is the thing I hate most, the ability to cheaply declare interfaces to solve narrow problems like this is probably the thing I love most.
So why do I call this solution the Good one? A couple reasons:
- It's narrow in scope. I didn't have to restructure all my packages and their relationships to eliminate the cycle, I was able to surgically target just what was necessary.
- It's tractable in large code bases. Interfaces are cheap and easy to declare and use, and the call site doesn't need to know that it's now passing an interface that it happens to implement instead of a struct type. I didn't need to change the call site at all, just the leaf-level function declaration.
In short: I favor this strategy and use it frequently because it's easy to apply and doesn't require doing anything sprawling or gross or dangerous.
The Bad: restructure your packages to eliminate the cycle
This is what Go wants you to do by disallowing import cycles in the first place: restructure your code so there isn't a cycle anymore. Sounds easy right?
In the simplest case, this means moving the functions that cause the cycle into a new package, one
that's accessible from both locations. In our original toy example (without methods on types), that
means moving UtilityFunction
into a new package a
.
// a/a.go
package a
import "fmt"
func UtilityFunction() {
fmt.Println("UtilityFunction")
}
// a/b.go
package b
import (
"fmt"
"importcycles/a/c"
)
func VitalWork() {
c.SubTask()
fmt.Println("VitalWork now done")
}
// a/c.go
package c
import (
"fmt"
"importcycles/a"
)
func SubTask() {
a.UtilityFunction()
fmt.Println("SubTask")
}
// main.go
package main
import "importcycles/a/b"
func main() {
b.VitalWork()
}
Now everything works, and our code runs as expected.
UtilityFunction
SubTask
VitalWork now done
Note that although we moved the function up the directory tree to a
, we could have put it
anywhere, including a new subdirectory under b
, e.g. a/b/d
. Go packages need to be confined to a
directory, but Go's compiler flattens any directory structure, so even though our new package is
still "inside" package b
in terms of the file system, package d
is not considered part of b
,
and thus there is no longer an import cycle.
So why do I call this solution the Bad? Because it's a giant pain in real code bases. This toy example is simple because we're using it to illustrate how these strategies work as clearly as possible. But in the real world, in a large code base, once you start pulling on a thread to begin untangling a cycle in this manner, you could be at it for days. Ask me how I know.
Basically: this strategy requires you stop what you're doing and shave a yak instead. It could actually be way more than one yak, it could be an entire herd, and they could have eczema. Go's default answer to the problem of cycles is "lol just restructure your entire code base bro, what's the big deal?"
Again: it doesn't seem like such a big deal in our toy example, but it's a huge, nearly intractable problem in actual production code bases. Software simply doesn't evolve along right angles in clean lines, it accretes functionality and changes purposes over time such that your originally very thoughtful package structure is eventually guaranteed to become a mockery of its original intent. It's rarely possible to surgically relocate just the offending piece of code -- you have to deal with unexported methods, other dependencies within the package, etc. And much of the time, attempting such a surgical move introduces entirely new cycles, and the process must begin again.
So yes: while this is the "correct" solution, it's so expensive and difficult in practice as to be hard to recommend to anyone but a true die-hard. The last time I tangled with this strategy in the large it took me 64 commits and 24,000 lines across 467 files. That's why it's Bad. It's best attempted as a separate independent refactoring on your own schedule, not on the way to implementing new functionality while a customer waits on your fix.
The Ugly: introduce a function pointer and assign it at a higher level
Now we come to a truly ugly hack that I'm going to put out there because everyone needs a lifeline when production code needs to be fixed fast, although you should understand immediately why I call this solution Ugly and why you shouldn't reach for it as your first recourse. But you might, in anger, and I won't judge you.
Introduce a layer of abstraction in the form of a global function pointer, and call that instead. Then assign that global function pointer in a higher-level package before you run the code.
// a/b.go
package b
import (
"fmt"
"importcycles/a/c"
)
func VitalWork() {
c.SubTask()
fmt.Println("VitalWork now done")
}
func UtilityFunction() {
fmt.Println("UtilityFunction")
}
// a/c.go
package c
import (
"fmt"
)
var UtilityFunction func()
func SubTask() {
UtilityFunction()
fmt.Println("SubTask")
}
// main.go
package main
import (
"importcycles/a/b"
"importcycles/a/c"
)
func main() {
c.UtilityFunction = b.UtilityFunction
b.VitalWork()
}
The magic and the misery happens in the main
function, called out here in case you don't want to
read all that.
func main() {
c.UtilityFunction = b.UtilityFunction
b.VitalWork()
}
You could also do this in an init
block, or however else you tend to configure global data
structures in your application. But what happens when you omit this magic function assignment line?
Well, a panic obviously.
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0x53a915]
goroutine 1 [running]:
importcycles/a/c.SubTask()
C:/Users/zachmu/liquidata/go-workspace/src/github.com/dolthub/dolt/importcycles/a/c/c.go:24 +0x15
importcycles/a/b.VitalWork()
C:/Users/zachmu/liquidata/go-workspace/src/github.com/dolthub/dolt/importcycles/a/b/b.go:23 +0x13
main.main()
C:/Users/zachmu/liquidata/go-workspace/src/github.com/dolthub/dolt/importcycles/main.go:23 +0xf
This will also happen if I'm trying to write unit tests of c.SubTask()
and I don't remember to
assign the function pointer first. Doing this in a test doesn't necessariy re-introduce the cycle,
because Go lets you declare test files in a separate package from the library go files,
e.g. package c_test
, but this limits you to testing exported functions. Basically: this technique
couples the setup work of assigning a function pointer to calling that method, such that it's no
longer self-contained, and the penalty for forgetting to do this is a process-ending panic.
Look, I told you it was Ugly.
An appeal to heaven: please just let me turn off this constraint
Why does Go enforce the rule against import cycles anyway? C has had various workarounds to import cycles since the very beginning.
// course.h:
#ifndef COURSE_H
#define COURSE_H
// forward declaration of type
class Student;
class Course
{
private:
Student* someStudent;
};
#endif
// student.h:
#ifndef STUDENT_H
#define STUDENT_H
#include "course.h"
class Student
{
private:
Course someCourse;
};
#endif
Python disallows circular imports by default because Python is an interpreted language and import
is an executable statement that needs one module to finish being processed before another one tries
to import it. But there are workarounds there too, and there have been since Python 1.
# thing.py
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from connection import ApiConnection
class Thing:
def __init__(self, connection: 'ApiConnection'):
self._conn = connection
Other statically-typed, compiled languages don't blanch at import cycles. C# allows them (which surprises some people). Java allows them too. And in general opinion on the alleged harms of circular dependencies is quite divided.
But Go is an opinionated language. For the same reason that your code won't compile if you declare a variable that isn't used, it won't compile if you have an import cycle. It's for your own good!
Here's the part where I shake my fist at a cloud that won't hear me or respond.
I understand the impulse to keep developers from shooting themselves in the foot. I even share it! But the difference between disallowing unused variables and disallowing import cycles is not a small one -- it's the difference between seconds and days of work. Moreoever, it prevents me from expressing useful concepts and relationships in a natural way. It's not at all unusual for two abstractions to require each other's mutual participation to accomplish some task. It's even less unusual for this to be true, transitively, over a dependency graph that encompasses many dozens or hundreds of packages.
As for the alleged harms of allowing circular dependencies, I've yet to hear one that wasn't a) it's harder to write a compiler (weak excuse but at least real), or b) fundamentally aesthetic in nature. "Bad practice" or "poor design".
My response to these objections are, respectively, a) other people have done it, and b) I'm a big boy and I can make my own decisions. I pay taxes. I get up and put on pants and go to an office where I write a version controlled database. I'm sovereign and I decide the exception. It should be my decision when I pay down my technical debt. I shouldn't be forced to do it on your schedule because you disapprove of the simple solution I reached for to get a feature to my customer. And if we want to talk about concrete harms to my code base: do you see the hacky workarounds I just recommended to people in the first sections of this blog post? That's what people actually do in the real world when they encounter this constraint on a deadline. Is that really better than allowing a cycle?
I know, I know: smarter and more influential people than me have probably carped about this and it's gotten nowhere. But.
But please, Rob Pike, if you're really up there, let me turn off this anti-feature. I hate it, all my homies hate it, it costs far more than it's worth. I'm sorry for the jokes I made about you that one time, please just release an experiment or compiler flag that lets me turn it off. I'll run Plan 9 on my desktop at home, I'll use an editor with mouse chords, just tell me what it will take.
Conclusion
Do you too yearn to be released from senselessly restrictive compiler constraints? Or perhaps you know I'm wrong and want to tell me why, at great length. Or maybe you're curious about Dolt, the world's first and only version-controlled SQL database. Either way, come by our Discord to talk to our engineering team and meet other Dolt users.