Mypy and Doltpy
Dolt
Dolt is a SQL database with Git-style versioning. The goal of Doltpy, in concert with Dolt, is to solve reproducibility and versioning problems for data and machine learning engineers using Python.
Mypy
Mypy was started by Jukka Lehtosalo and has long been developed with the involvement of Guido van Rossum, the creator of the Python language, as a way to check Python source code against the typing PEPs (notably PEP 484). When code is added to the Python standard library, the corresponding type stubs that mypy relies on are updated in lockstep.
So when we fix mypy errors, we are enforcing the rules of the Python type system. This point is subtle but important: mypy reports an error when your code does not do what you have declared it should do. Static checking can't anticipate what input your code will be fed at runtime, but as a developer you can write code that is consistent with its own function and type signatures.
Adding type hints without enforcement is a common anti-pattern. Mypy is installed separately from Python and its typing module, so it is up to the developer to actually validate type hints after adding them. Code with contradictory type annotations can mislead developers and users alike. Mypy is the bridge between type-aesthetics and type-correctness.
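To make the anti-pattern concrete, here is a minimal sketch (the function name double is made up for illustration). Python itself ignores annotations at runtime, so a call that contradicts the hint runs without complaint; only a checker such as mypy would flag it.

```python
def double(value: int) -> int:
    # Hypothetical example: the hint promises an int in and an int out,
    # but Python does not enforce annotations at runtime.
    return value * 2


# This call contradicts the annotation, yet it runs: str * 2 repeats the string.
result = double("ab")
print(result)
```

Running mypy over this file should report the str argument as incompatible with the declared int parameter, which is the enforcement step that the hints alone do not provide.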
Mypy involves three main modules:
- mypy: Parses source code and applies the constraints defined in the typing PEPs.
- typeshed: Type stubs for the standard library and third-party libraries; your code's use of those libraries is checked against them.
- typing: The module that provides type-hint constructs and helps keep hints compatible between Python versions.
All three of these modules come up regularly when using mypy (typing less so if you only support one Python version). One addendum is that you can define custom type stubs in your own code, in the same manner that typeshed provides type stubs for popular pip packages like boto and requests.
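As a rough illustration of what such a custom stub might look like, the sketch below shows the contents of a hypothetical stub file. The module name fetcher, the class Fetcher, and its members are all invented for this example. A stub carries only signatures, with `...` standing in for bodies and default values, and would normally live in a `.pyi` file next to the module it describes.

```python
# Hypothetical contents of fetcher.pyi, a stub for an imaginary fetcher module.
# Only signatures are declared; "..." replaces bodies and default values.
from typing import Optional


class Fetcher:
    timeout: float

    def __init__(self, timeout: float = ...) -> None: ...
    def fetch(self, url: str, retries: Optional[int] = ...) -> bytes: ...
```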
Examples
Typing inconsistency
We use mypy in Doltpy 2.0 to help ensure code quality. Below is an example from Doltpy 1.0 to demonstrate mypy in action:
    def log(self, number: int = None, commit: str = None) -> OrderedDict:
        args = ["log"]
        if number:
            args.extend(["--number", number])
Inside the log function signature, number: int correctly reflects the developer's intent, but the inferred type args: List[str] disallows integers. This means that calling Dolt.log(1) fails with an error, while Dolt.log("1") succeeds.
The intended behavior is clear, and mypy preemptively notices the inconsistency:
> python -m mypy .
example.py:4: error: List item 1 has incompatible type "int"; expected "str"
Fixing the type inconsistency restores the expected behavior:
    def log(self, number: int = None, commit: str = None) -> OrderedDict:
        args = ["log"]
        if number:
            args.extend(["--number", str(number)])
and makes mypy happy:
> python -m mypy .
Success: no issues found in 1 source file
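For readers who want to run the fix in isolation, here is a minimal, self-contained sketch. The helper build_log_args is hypothetical and not part of Doltpy; it only mirrors the argument-building step above, with Optional[int] spelled out and the list annotated explicitly.

```python
from typing import List, Optional


def build_log_args(number: Optional[int] = None) -> List[str]:
    """Build a `log`-style argument list (hypothetical helper, not Doltpy API)."""
    args: List[str] = ["log"]
    if number is not None:
        # Convert explicitly so every element of args is a str.
        args.extend(["--number", str(number)])
    return args


print(build_log_args(1))
print(build_log_args())
```

The sketch checks `number is not None` rather than truthiness, a small design choice so that 0 is not silently dropped; the original snippet uses `if number:`.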
Custom typing stub
As a final example, here are the first few lines of a custom type stub for the doltpy.cli.Dolt class in doltpy:
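    # Imports the excerpt relies on
    import abc
    from typing import Generic, Optional, TypeVar

    _T = TypeVar("_T")

    class DoltT(Generic[_T]):
        _repo_dir: str

        @abc.abstractmethod
        def repo_dir(self):
            ...

        @staticmethod
        @abc.abstractmethod
        def init(repo_dir: Optional[str] = None) -> "Dolt":  # type: ignore
            ...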
After defining class Dolt(DoltT), mypy will enforce our interface the same way mypy enforces standard library and other third-party type stubs. As a plus, code editors like VSCode should also give hints for function signature definitions.
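To see this kind of interface enforcement in a runnable form, here is a small sketch under stated assumptions. RepoT and Repo are hypothetical stand-ins, not Doltpy classes, and the sketch derives from abc.ABC (an assumption made for this example) so that Python also refuses to instantiate the abstract class at runtime.

```python
import abc
from typing import Optional


class RepoT(abc.ABC):
    """Hypothetical interface, loosely modeled on the excerpt above."""

    @abc.abstractmethod
    def repo_dir(self) -> str: ...

    @staticmethod
    @abc.abstractmethod
    def init(repo_dir: Optional[str] = None) -> "RepoT": ...


class Repo(RepoT):
    """Concrete class; mypy checks its methods against RepoT's signatures."""

    def __init__(self, repo_dir: str) -> None:
        self._repo_dir = repo_dir

    def repo_dir(self) -> str:
        return self._repo_dir

    @staticmethod
    def init(repo_dir: Optional[str] = None) -> "Repo":
        # Fall back to the current directory when no path is given.
        return Repo(repo_dir or ".")


repo = Repo.init("/tmp/example")
print(repo.repo_dir())
```

If Repo.repo_dir were changed to return, say, an int, mypy would be expected to report the override as incompatible with the supertype, which is the enforcement the paragraph above describes.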
Summary
In this post I touched on the utility of using type hints with mypy, and the comparative pitfalls of using type hints without it. We used specific examples from Doltpy to highlight the nature of static type-checking, and how we use mypy in production at DoltHub.
Are you interested in learning more about Dolt and Doltpy? Try it out. If you have any questions, come chat with us in our Discord.