Tech


Type coverage for Sydent: evaluation

17.12.2021 00:00 — Tech David Robertson

This is the third in a series of three posts which discuss recent work to improve type annotations in Sydent, the reference Matrix Identity server. Last time we discussed the mechanics of how we added type coverage. Now I want to reflect on how well we did. What information and guarantees did we gain from mypy? How could we track our progress and measure the effect of our work? And lastly, what other tools are out there apart from mypy?

The best parts of --strict

While the primary goal was to improve Sydent's coverage and robustness, to some extent this was an experiment too. How much could we get out of typing and static analysis, if we really invested in thorough annotations? Sydent is a small project, which makes it a good testbed: I decided my goal would be to get it passing mypy under --strict mode. This is a command line option which turns on a number of extra checks (though not everything); it feels similar to passing -Wall -Wextra -Werror to gcc. It's a little extreme, but this seemed a good chance to see how hard it would be. In my view, the most useful options implied by strict mode were as follows.

--check-untyped-defs

By default, mypy will only analyze the implementation of functions that are annotated. On the one hand, without annotations for inputs and the return type, it's going to be hard for mypy to thoroughly check the soundness of your function. On the other, it can still do good work with the type information it has from other sources. Mypy can

  • infer the type of literals, e.g. deducing x: str from x = "hello";
  • look up the return types of standard library calls, via typeshed; and similarly
  • look up the return types of any annotated functions in your code or dependencies.

The information is already available for free: we may as well try to use it to spot problems.
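
As a small illustration (not from Sydent), --check-untyped-defs lets mypy use an inferred literal type to find a bug inside a completely unannotated function:

def broken():
    count = 0             # mypy infers count: int from the literal
    return count.upper()  # error: "int" has no attribute "upper"

By default, the body of this function would be skipped entirely.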

--disallow-untyped-defs and friends

This flag forces you to fully annotate every function. There are less extreme versions available, e.g. --disallow-incomplete-defs; but I think this is a good option to ensure full coverage of your module. It means you can rely on mypy's error output as a to-do list.

One downside to this: sometimes I felt like I was writing obvious boilerplate annotations, e.g.

    def __str__(self) -> str:
        ...

There's one particular example of this that crops up a lot. Mypy has a special exception for a class's __init__ and __init_subclass__ methods. If a return type annotation is missing, it will assume these functions return None instead of Any. (See here for its implementation.) This is normally compatible with --disallow-untyped-defs and --disallow-incomplete-defs, with one exception. If your __init__ function takes no arguments other than self, mypy won't consider it annotated, and you'll need to write -> None explicitly.
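
A minimal illustration of that corner case:

class NoArgs:
    # Takes no arguments besides self, so under --disallow-untyped-defs
    # mypy insists on the explicit "-> None" here.
    def __init__(self) -> None:
        self.count = 0

class WithArgs:
    # Has an annotated argument, so mypy treats this __init__ as annotated
    # and assumes the missing return type is None.
    def __init__(self, name: str):
        self.name = name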

It's also worth mentioning --disallow-untyped-calls, which will cause an error if an annotated function calls an unannotated function. Again, it helps to ensure that mypy has a complete picture of the types in your function's implementation. It also helps to highlight dependencies—if you see errors from this, it might be more practical to annotate the functions and modules it's calling first.
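
A small illustration of the kind of error it produces (invented function names):

def legacy_helper(x):
    return x * 2

def new_code(x: int) -> int:
    # error: Call to untyped function "legacy_helper" in typed context
    return legacy_helper(x)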

--warn-return-any

If I've written a function and annotated it to return an int, mypy will rightly complain if its implementation actually goes on to return a str.

def foo() -> int:
    # error: Incompatible return value type (got "str", expected "int")
    return "hello"

If mypy isn't sure what type I'm returning though, i.e. if I'm returning an expression of type Any, then by default mypy will trust that we've done the right thing.

def i_promise_this_is_an_int():
    return "hello"

def bar() -> int:
    reveal_type(i_promise_this_is_an_int()) # Any
    return i_promise_this_is_an_int()

Enabling --warn-return-any will disable this behaviour; to make this code pass we'll have to prove to mypy that i_promise_this_is_an_int() really does return an int. Sometimes that will be the case, and an extra annotation will provide the necessary proof. At other times (like in this example), investigation will prove that there really is a bug!

--strict-equality

This is a bit like a limited form of gcc's -Wtautological-compare. Mypy will report and reject equality tests between incompatible types. If mypy can spot that an equality is always False, there's a good chance of there being a bug in your program, or else an incorrect annotation.

I'm not sure how general this check is, since users can define their own types with their own rules for equality by overriding __eq__. Perhaps it only applies to built-in types?
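
For illustration, here's a comparison that --strict-equality rejects:

def is_admin(user_id: int) -> bool:
    # error: Non-overlapping equality check
    # (left operand type: "int", right operand type: "str")
    return user_id == "admin"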

Quantifying coverage

It was important to have some way to numerically evaluate our efforts to improve type coverage. It's a fairly abstract piece of work: there's nothing user-visible about it, unless we happen to discover a bug and fix it.

The most obvious metric is the number of total errors reported by mypy. Before the recent sprint, we had roughly 600 errors total.

dmr on titan in sydent on  HEAD (3dde3ad) via 🐍 v3.9.7 (env)
2021-11-08 12:35:37 ✔  $ mypy --strict sydent
Found 635 errors in 59 files (checked 78 source files)

This is a decent way to measure your progress when working on a particular module or package, but it's not perfect, because the errors aren't independent. Fixing one could fix another ten or reveal another twenty—the numeric value can be erratic.

Reports

I found mypy's various reports to be a better approach here. There were three reports I found particularly useful.

--html-report

This produces a main index page showing the "imprecision" of each module. At the bottom of the table is a total imprecision value across the entire project.

HTML report, index page. A table showing each module's precision and number of lines of code.

The precision for each module is broken down line-by-line and colour-coded accordingly, which is useful for getting an intuition for what makes a line imprecise. More on that shortly.

HTML report, module page. Most lines of source code are highlighted green; a minority are highlighted red.

--txt-report

This reproduces the index page from the html report as a plain text file. It's slightly easier to parse—that's how I got the data for the precision line graphs in part one of this series. That was a quick and dirty hack, though; a proper analysis of precision probably ought to read from the json or xml output formats. Here's a truncated sample:

+-----------------------------------+-------------------+----------+
| Module                            | Imprecision       | Lines    |
+-----------------------------------+-------------------+----------+
| sydent                            |   0.00% imprecise |    1 LOC |
| sydent.config                     |   0.00% imprecise |  266 LOC |
| sydent.config._base               |   0.00% imprecise |   31 LOC |
| sydent.config.crypto              |  15.94% imprecise |   69 LOC |
| ...                               |               ... |      ... |
| sydent.validators                 |   0.00% imprecise |   61 LOC |
| sydent.validators.common          |   7.35% imprecise |   68 LOC |
| sydent.validators.emailvalidator  |   1.30% imprecise |  154 LOC |
| sydent.validators.msisdnvalidator |   1.34% imprecise |  149 LOC |
+-----------------------------------+-------------------+----------+
| Total                             |   5.95% imprecise | 9707 LOC |
+-----------------------------------+-------------------+----------+

--any-exprs-report

Selecting this option generates two reports: any-exprs.txt and types-of-anys.txt. The latter is interesting to understand where the Anys come from, but the former is more useful for quantifying the progress of typing. Another sample:

                  Name   Anys   Exprs   Coverage
-------------------------------------------------
                sydent      0       2    100.00%
         sydent.config      0     185    100.00%
   sydent.config._base      0       3    100.00%
  sydent.config.crypto     34      80     57.50%
sydent.config.database      0       8    100.00%
   sydent.config.email      0      86    100.00%
                   ...    ...     ...        ...
-------------------------------------------------
                 Total    544   11366     95.21%

The breakdown in types-of-anys.txt has more gory detail. I found the "Unimported" column particularly interesting: it lets us see how exposed we are to a lack of typing in our dependencies.

                             Name   Unannotated   Explicit   Unimported   Omitted Generics   Error   Special Form   Implementation Artifact
-------------------------------------------------------------------------------------------------------------------------------------------
                           sydent             0          0            0                  0       0              0                         0
                    sydent.config             0          3            0                  0       0              0                         0
              sydent.config._base             0          0            0                  0       0              0                         0
                              ...           ...        ...          ...                ...     ...            ...                       ...
        sydent.util.versionstring             0         80            0                  0       0              0                         0
                sydent.validators             0          4            0                  0       0              0                         0
         sydent.validators.common             0         20            0                  0       0              0                         0
 sydent.validators.emailvalidator             0          8            0                  0       0              0                         0
sydent.validators.msisdnvalidator             0          8            0                  0       0              0                         0
-------------------------------------------------------------------------------------------------------------------------------------------
                            Total             9       1276          273                  0      37              0                        17

The meaning of precision

There are two metrics I chose to focus on:

  • the proportion of "imprecise" lines across the project; I also used the complement, precision = 100% - imprecision, and
  • the proportion of expressions whose type is not Any.

These are plotted in the graph at the top of this writeup. I could see that precision and the proportion of typed expressions were correlated, but I didn't understand how they differed. I couldn't see an explanation in the mypy docs, so I went digging into the mypy source code. My understanding is as follows.

  1. There are five kinds of precision. Full details are visible in the --lineprecision-report.
  2. Two kinds of precision, EMPTY and UNANALYZED, convey no information, because there's nothing to analyze.
  3. A line is marked as precise, imprecise or any based on the expressions it uses.
    • An expression that has type Any leads to precision ANY.
    • I think an expression that involves Any but is not Any counts as imprecise. For instance, Dict[str, Any].
    • Remaining expressions have precision PRECISE.
  4. A line's precision is the worst of all its expressions' precisions.
    • ANY is worse than IMPRECISE, which is worse than PRECISE.

The "imprecision" number reported by mypy counts the number of lines classified as IMPRECISE or ANY.

On balance, my preferred metric is the line-level (im)precision percentage. There wasn't much difference between the two in my experience, but the colour-coded visualisation in the HTML report is a neat feature to have. Maybe in the future there could be a version of the HTML report that colour-codes each expression?

The larger typing ecosystem

There are plenty of articles out there about typing in Python. As well as the mypy blog itself, see Daniele Varrazzo's post on psycopg3 (2020), Dropbox's blog post (2019), Zulip's blog post (2016), Glyph's blog post on Protocols (2020) and a follow-up comparing them to zope.interface (2021) and Nylas's blog (2019). I'm sure there's plenty more out there.

It's worth highlighting the typeshed project, which maintains stubs for the standard library, plus popular third-party libraries. I submitted a PR to add a single type hint—it was a very pleasant experience! Microsoft has an incubator of sorts for stubs too.

Other typecheckers

After the sprint to improve coverage, I spent a short amount of time trying out the alternative type checkers. Mypy isn't the only option—other companies have built and open-sourced their own tools, with different strengths, weaknesses and goals. This is by no means an authoritative, exhaustive survey—just my quick notes.

Pyre (Facebook)

  • I couldn't work out how to configure paths to resolve import errors; in the end, I wasn't able to process much of Sydent's source code.
  • Couldn't get it to process annotations like syd: "Sydent" where "Sydent" is an import guarded by TYPE_CHECKING (see the sketch after this list).
  • No plugin system that I could see. That said, it has a separate mode/program for running "Taint Analysis" to spot security issues.
  • Seemed stricter by default compared to mypy: there was less inference of types.
  • Also has its own strict mode.
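
For context, the TYPE_CHECKING pattern mentioned above looks roughly like this (the module path here is illustrative):

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only imported while type checking, avoiding a circular import at runtime.
    from sydent.sydent import Sydent

def set_up(syd: "Sydent") -> None:
    ...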

Pyright (Microsoft)

  • Didn't seem to recognise getLogger as being imported from logging. Not sure what happened there—maybe something wrong with its bundled version of typeshed?
  • In a few places, Sydent uses urllib.parse.quote but only imports urllib. We must be unintentionally relying on our dependencies to import urllib.parse somewhere! Mypy didn't complain about this; pyright did.
  • Seemed to give a better explanations of why complex types were incompatible. For example:
    /home/dmr/workspace/sydent/sydent/replication/pusher.py
       /home/dmr/workspace/sydent/sydent/replication/pusher.py:77:16 - error: Expression of type "DeferredList" cannot be assigned to return type "Deferred[List[Tuple[bool, None]]]"
         TypeVar "_DeferredResultT@Deferred" is contravariant
           TypeVar "_T@list" is invariant
             Tuple entry 2 is incorrect type
               Type "None" cannot be assigned to type "_DeferredResultT@_DeferredListResultItemT" (reportGeneralTypeIssues)
     /home/dmr/workspace/sydent/sydent/sms/openmarket.py
       /home/dmr/workspace/sydent/sydent/sms/openmarket.py:93:13 - error: Argument of type "dict[_KT@dict, list[bytes]]" cannot be assigned to parameter "rawHeaders" of type "Mapping[AnyStr@__init__, Sequence[AnyStr@__init__]] | None" in function "__init__"
         Type "dict[_KT@dict, list[bytes]]" cannot be assigned to type "Mapping[AnyStr@__init__, Sequence[AnyStr@__init__]] | None"
           TypeVar "_KT@Mapping" is invariant
             Type "_KT@dict" is incompatible with constrained type variable "AnyStr"
           Type cannot be assigned to type "None" (reportGeneralTypeIssues)
    
    This would have been really helpful when interpreting mypy's error reports; I'd love to see something like it in mypy. Here's another example where I tried running against a Synapse file.
    /home/dmr/workspace/synapse/synapse/storage/databases/main/cache.py
    /home/dmr/workspace/synapse/synapse/storage/databases/main/cache.py:103:53 - error: Expression of type "list[tuple[Unknown, Tuple[Unknown, ...]]]" cannot be assigned to declared type "List[Tuple[int, _CacheData]]"
      TypeVar "_T@list" is invariant
        Tuple entry 2 is incorrect type
          Tuple size mismatch; expected 3 but received indeterminate number (reportGeneralTypeIssues)
    
    This is really valuable information. It's worth considering Pyright as an option to get a second opinion!
  • It looks like Pyright's name for Any is Unknown. I think that does a better job of emphasising that Unknown won't be type checked. I'd certainly be more reluctant to type x: Unknown versus x: Any!
  • Pyright is the machinery behind Pylance, which drives VS Code's Python extension. That alone probably makes it worthy of more eyes.
  • Seemed like it was the best-placed alternative typechecker to challenge mypy (the de-facto standard).

Pytype (Google)

  • Google internal? Seems to be maintained by one person semiregularly by "syncing" from Google.
  • Apparently contains a script merge-pyi to annotate a source file given a stub file.
  • No support for TypedDict: as soon as it saw one in Sydent, it stopped all analysis.
  • No Python 3.10 support (according to the README anyway).
  • I think it might use a different kind of typing semantics; its typing FAQ speaks of "descriptive typing" and a more lenient approach.

PyCharm

PyCharm has its own means of typechecking code as you write it. It's definitely caught bugs before, and having the instant feedback as you type is really nice! I have seen it struggle with zope.interface and some uses of Generics though.

Runtime uses of annotations

When annotations were first introduced, they were a generic means to associate Python objects with parts of a program. (It was only later that the community agreed that we really want to use them to annotate types). These annotations are available at runtime in the __annotations__ attribute. There's also a helper function in typing which will help resolve forward references.

>>> from typing import get_type_hints
>>> def foo(x: int) -> str: pass
...
>>> get_type_hints(foo)
{'x': <class 'int'>, 'return': <class 'str'>}

Programs and libraries are free to use these annotations at runtime as they see fit. The most well-known examples are probably dataclasses, attrs with auto_attribs=True and Pydantic. I'd be interested to learn if anyone else is consuming annotations at runtime!
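
For example, dataclasses inspects a class's annotations at runtime to generate boilerplate:

import dataclasses

@dataclasses.dataclass
class Point:
    # dataclasses reads these annotations (via __annotations__) at class
    # creation time to generate __init__, __repr__ and __eq__ for us.
    x: float
    y: float

print(Point.__annotations__)  # {'x': <class 'float'>, 'y': <class 'float'>}
print(Point(1.0, 2.0))        # Point(x=1.0, y=2.0)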

Summary

All in all, in a two-week sprint we were able to get Sydent's mypy coverage from a precision of 83% up to 94%. Our work would have spotted the bytes-versus-strings bug; we understand why the missing await wasn't detected. We fixed other small bugs too as part of the process. As well as fixing bugs, I've hopefully made the source code clearer for future readers (but that one is hard to quantify).

There's room to spin out contributions upstream too. I submitted two PRs to twisted upstream; have started to work on annotations for pynacl in my spare time; and submitted a quick fix to typeshed.

Looking forward, I think we'd get a quick gain from ensuring that our smaller libraries (signedjson, canonicaljson) are annotated. We'll be sticking with mypy for now—the mypy-zope plugin is crucial given our reliance on twisted. We're also working to improve Sygnal and Synapse—though not to the extreme standard of --strict across everything.

I'd say the biggest outstanding hole is our processing of JSON objects. There's too much Dict[str, Any] flying around. The ideal for me would be to define a dataclass or attr.s class C, and be able to deserialise a JSON object to C, including automatic (deep) type checking. Pydantic sounds really close to what we want, but I'm told it will by default gladly interpret the JSON string "42" as the Python integer 42, which isn't what we'd like. More investigation needed there. There are other avenues to explore too, like jsonschema-typed, typedload or attrs-strict.

To end, I'd like to add a few personal thoughts. Having types available in the source code is definitely A Good Thing. But there is a part of me that wonders if it might have been worth writing our projects in a language which incorporates types from day one. There are always trade-offs, of course: runtime performance, build times, iteration speed, ease of onboarding new contributors, ease of deployment, availability of libraries, ability to shoot yourself in the foot... the list goes on.

On a more upbeat note, adding typing is a great way to get familiar with new source code. It involves a mixture of reading, cross-referencing, deduction, analysis, all across a wide variety of files. It'd be a lot easier to type as you write from the get-go, but typing after the fact is still a worthy use of time and effort.


Many thanks for reading! If you've got any corrections, comments or queries, I'm available on Matrix at @dmrobertson:matrix.org.

Type coverage for Sydent: annotation

10.12.2021 00:00 — Tech David Robertson

This is the second in a series of three posts which discuss recent work to improve type annotations in Sydent, the reference Matrix Identity server. Last time we discussed the motivation for doing this work in the first place: the why. Now I want to talk about the how. How did we add annotations to individual files, and across the project as whole? What common idioms did we learn on the way?

The process of improving coverage

From experience adding typing to Synapse, we decided to annotate one module at a time. We configured a list of files in pyproject.toml which we knew passed mypy's checks. We could run mypy in CI to check that new PRs didn't introduce type problems in modules already covered.

From there, the workflow was

  1. Choose a new module to annotate. Add it to the list of files in mypy's configuration.
  2. Run mypy. See how many errors you get.
  3. Choose an error. Fix it. Re-run mypy.
  4. Repeat until no errors remaining.
  5. Submit for review.

There are two parts in there which involve a choice. Being honest, I made those choices unscientifically: I tried to choose the easy tasks to do first. My first target was actually the entire sydent.util subpackage. Probably a bit too large for a first bite! My thinking was that util sounded like something with few dependencies that would have impact across the whole source tree.

Within a given module, I'd try to fix easier errors first: partly for confidence, partly to build up momentum, and partly to get myself familiar with that piece of source code. For example, I'd often start by telling mypy that mylist = [] was actually a List[str] (rather than the generic List[Any] which it would use otherwise).
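
In code, that just means adding an annotation to the assignment (a trivial sketch):

from typing import List

# Without the annotation, mypy falls back to List[Any] (or asks for one
# under stricter settings) and the bad append below goes unnoticed.
mylist: List[str] = []
mylist.append("ok")
mylist.append(5)  # error: Argument 1 to "append" of "list" has incompatible type "int"; expected "str"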

Picking and choosing easy targets works fairly well, but sometimes that means you end up fixing an error that's really a symptom of an earlier one. Other times, fixing one error (e.g. by giving a return type annotation to a function) would solve a series of other errors throughout the file. Watching the total number of errors mypy reports bob up and down was intriguing!

In retrospect, I think it would be smoother to generate some kind of dependency graph for the package. I'm imagining a DAG whose vertices are modules, with an edge A -> B if A imports from B. The sinks of this DAG (i.e. modules which don't depend on any others in the package) are the ideal place to start: you can get something strictly typechecked there without having to annotate a long chain of dependencies across other files. Another strategy would be to see which modules were the least precise according to mypy's reports—but more on those next time.
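
Here's a very rough sketch (not something we actually ran) of how one might find those sink modules with just the standard library. It only follows absolute "from sydent... import" statements, so it under-counts edges, but it's enough to suggest a starting order:

import ast
import pathlib

PACKAGE = "sydent"

modules = set()
edges = set()  # (importer, imported)
for path in pathlib.Path(PACKAGE).rglob("*.py"):
    module = ".".join(path.with_suffix("").parts)
    modules.add(module)
    for node in ast.walk(ast.parse(path.read_text())):
        if (
            isinstance(node, ast.ImportFrom)
            and node.module
            and node.module.startswith(PACKAGE)
        ):
            edges.add((module, node.module))

# Modules that import nothing else from the package: good places to start.
importers = {src for src, _ in edges}
print(sorted(modules - importers))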

You're at the mercy of your dependencies

I think this is my single biggest takeaway from the process of adding annotations to Sydent. I'll admit the phrasing is melodramatic, but I think it rings true.

Improving coverage boils down to giving the typechecker more information about your program. The more information it has, the more it can check—and the more errors it can spot. (Hopefully this doesn't make typing come across like a pyramid scheme.) If your dependencies aren't typed, mypy can't validate you're correctly providing inputs and correctly consuming outputs. You might have a bigger impact on overall typing coverage by annotating a dependency (directly or via stubs). I have a hunch that bugs are more likely in code that uses an external dependency: we're much more familiar with the details of our own source code compared to that of a third party we trust.

It's worth looking to see if your dependencies have a newer version including type annotations. Failing that, they may have a stub package added to typeshed and published on PyPI. I saw examples of both cases when choosing how to configure mypy for Sydent. If some of your dependencies are under your control, consider annotating them—you'll feel the benefits across multiple projects pretty quickly.

Annotations when working with twisted

Twisted is Sydent's biggest dependency, and I certainly felt at its mercy! In particular, it has a few quirks which make it trickier to annotate applications using it. Here's a summary of the work we had to do to get those annotations working.

Partially typed modules and stubs

Early into the process, mypy reported that calling the function twisted.python.log.err was an error. It did so because I was running mypy in --strict mode. We'll talk more about why I did so and what this means next time; for now, it's enough to know that calling a function that isn't fully annotated from within a function that is constitutes an error under strict mode. twisted is partially annotated: many key modules and functions have type annotations, but others don't. I was reluctant to give up on --strict. Instead, I decided to stub the err function myself.

A stub is a cut-down version of a python function, class or module which lives in a .pyi file. All implementation details are removed; only type annotations remain. Stubs are useful when you want to write annotations for code you don't control. The typeshed library is probably the best example: a collection of third party stubs for the standard library, plus some popular third party packages. Microsoft's python-type-stubs is another example. They'd also solve the problem I mentioned at the end of the first part: I could use a stub to teach mypy that IResponse.headers was a Headers object.
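
For instance, a minimal hand-written stub for the err function might look roughly like this (an approximation rather than Twisted's verbatim signature):

# twisted/python/log.pyi (excerpt)
from typing import Optional, Union

from twisted.python.failure import Failure

def err(
    _stuff: Union[None, Exception, Failure] = ...,
    _why: Optional[str] = ...,
    **kw: object,
) -> None: ...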

Writing a stub for log.err was straightforward, thanks mainly to Twisted's thorough documentation. I chose to write it by hand, rather than use stubgen. I'd heard of the latter, but was reluctant to use it for a few reasons.

  • Twisted is a big project with many big files. I didn't want to commit huge stub files for me and my colleagues to maintain.
  • The more our stubs cover, the larger the risk of a stub becoming out-of-date with twisted itself. Upstream twisted is the best place for these annotations.
  • We only need annotations for the bits of twisted that we're using.
  • And, being honest: I wanted the low-level experience of writing stubs, cross-referencing between the source and working how to best capture its type semantics.

I think this decision to write a targeted stub made sense at the time. After all, twisted.python.log.err is just one simple function! But for the project as a whole, I regret not using stubgen to generate module-level stubs. The reason for this is that a .pyi stub file accounts for an entire module: no more and no less. This means that the stubs I was writing for parts of twisted only covered the functions and classes I'd stubbed. Any existing annotations in the twisted source code would be ignored, along with any types that mypy was able to infer for itself.

I think it would have been more efficient to use stubgen to generate stubs and patch them up, rather than writing them. That would have helped avoid a few cases I encountered where stubbing one function would cause additional typechecking failures (because mypy was no longer examining the twisted source for that file). It would also have meant that I could just faithfully stub Twisted as it is; in practice, I would sometimes hesitate to stub to avoid having to cover another module. sydent.http was the most painful part of the source tree for this: that's where we make the most use of Twisted.

defer.inlineCallbacks

This is a decorator which allows us to write code in the style of async/await without actually using that syntax. (Twisted predates asyncio and the async/await syntax, introduced in Python 3.4 and 3.5 respectively. It was originally released in 2002, back when Python had released version 2.2.) We only use it in one place in Sydent nowadays, but I've seen it used across Synapse too. I mention it here because it was a bit fiddly to annotate. Here's its use in Sydent:

    @defer.inlineCallbacks
    def request(
        self,
        method: bytes,
        uri: bytes,
        headers: Optional["Headers"] = None,
        bodyProducer: Optional["IBodyProducer"] = None,
    ) -> Generator["defer.Deferred[Any]", Any, IResponse]:

Here, request is a generator function because its body uses the yield keyword. Yielding allows the function to relinquish control back to twisted's reactor, only for its execution to be resumed asynchronously in the future. The Generator type takes three parameters:

  • a YieldType, Deferred[Any];
  • a SendType, Any; and
  • a ReturnType, IResponse.

Why have I opted to use Any here, when we've seen (and will see) that this limits mypy's ability to run checks? The answer is that we yield two different types within the function. Firstly a routing result:

        routing: _RoutingResult
        routing = yield defer.ensureDeferred(self._route_matrix_uri(parsed_uri))

and later, the IResponse we go on to return:

        res: IResponse
        res = yield agent.request(method, uri, headers, bodyProducer)
In this example:

  • We yield a value y: Deferred[Any].

  • That will later be resolved by twisted to an x: Any value. Twisted will send that value to request, and the execution continues.

  • This repeats, until we eventually return an IResponse.

I could more correctly annotate the YieldType as Deferred[Union[_RoutingResult, IResponse]] so that the SendType was Union[_RoutingResult, IResponse]. But this would mean we end up having some kind of type check at each yield point: a cast, or a type: ignore, or a runtime isinstance check. It didn't feel like it was worth the boilerplate, especially since I could narrow e.g. res: Any to res: IResponse with minimal effort.

This problem doesn't arise with using the async/await syntax:

async def foo() -> int:
    return 1

async def bar() -> None:
    x = await foo()
    reveal_type(foo())
    reveal_type(x)
$ mypy example.py
example.py:6: note: Revealed type is "typing.Coroutine[Any, Any, builtins.int]"
example.py:7: note: Revealed type is "builtins.int*"

Behind the scenes, I think that x = await foo() is really using the same mechanism as the inlineCallbacks approach.

  • An async def function is really a generator function behind the scenes.
  • When we await foo(), we yield the expression foo().
  • Then the machinery running our coroutine c will call c.send(x) to resume execution, where x is the value produced by waiting for foo().

With the await form, mypy knows two things:

  • the value foo() which was yielded should be Awaitable[T], and
  • the value x sent to the coroutine should come from awaiting foo(), and therefore be of type T.

Mypy can't assume or enforce these rules for the yield form, which can yield and send whatever it likes. There's no reason why the send type should be related to the yield type. Here's a toy example:

from typing import Any, Dict, Generator


def generator_function() -> Generator[int, str, Dict[str, Any]]:
  y: int = 10
  x: str = yield y
  print("Coroutine was sent", x)  # -> Coroutine was sent hello
  return {"got": x, "done": True}

coroutine = generator_function()
y = next(coroutine)

try:
  coroutine.send("hello")
except StopIteration as e:
  return_value = e.value
  print(return_value)  # -> {'got': 'hello', 'done': True}

All in all, the handling of inlineCallbacks is a situation specific to working with (older?) twisted code. It's still nice to understand what's going on behind the scenes though!

zope.interface.Interface

Twisted makes use of zope's Interface to define a number of abstract interface classes. Speaking personally, I've not seen it used outside twisted, and I think that means it's not supported by much of the typechecking tooling. For example, I've definitely seen PyCharm struggle to realise that it's okay to pass a Response to a function which expects an IResponse! Here's a more complicated example where PyCharm isn't happy with me widening the type LoggingHostnameEndpoint to IStreamClientEndpoint, even though the former implements the latter.

Screenshot from pycharm showing a false positive warning

Mypy out of the box doesn't play well with a zope Interface (nor does any other typechecker I tried). Fortunately, the excellent mypy-zope plugin helps here: it teaches mypy that any class like Response which is decorated with @implementer(IResponse) can be passed in place of an IResponse.
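
As a minimal illustration of the pattern (not Sydent's actual code):

from zope.interface import Interface, implementer

class IGreeter(Interface):
    def greet(name: str) -> str:
        """Return a greeting for name."""

@implementer(IGreeter)
class Greeter:
    def greet(self, name: str) -> str:
        return "Hello " + name

def welcome(greeter: IGreeter) -> None:
    # Plain mypy rejects welcome(Greeter()) because Greeter doesn't inherit
    # from IGreeter; with the mypy-zope plugin it typechecks.
    print(greeter.greet("world"))

welcome(Greeter())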

Tricks of the trade

At this point I'd like to share a few generic lessons about typing I'd picked up. Nothing ground-breaking here: I think these are all fairly well-known. Hopefully they're the start of a good cheat sheet for annotating—though it pales in comparison to the Mypy cheat sheet.

Annotating decorators

Annotating decorators is fiddly, but it's also vitally important. An unannotated decorator will mask or throw away your decoratee's annotations! The way to write the annotation is best explained by the mypy docs, but briefly: it involves a TypeVar and sometimes a cast too. Here's an example.

from typing import TypeVar, Callable, Any, cast

F = TypeVar("F", bound=Callable[..., Any])

def decorator(func: F) -> F:
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return cast(F, wrapped)

The idea here is

  1. Use Callable[..., Any] to describe a generic function with no particular signature.
  2. Use that as a bound on a type variable F. At each usage of @decorator, mypy will deduce a more specific version of F, e.g. Callable[[int], str].
  3. Within that usage of @decorator, F is fixed to that specific type. We use -> F to express that "we return a function with the same signature as the decoratee".
  4. Unfortunately, we don't have a good way to tell mypy that wrapped also has that signature F. We resort to a cast to force mypy to accept this without proof.

This might change in the future, e.g. when ParamSpec is fully understood by mypy.

Prefer object over Any

Both of these are general types for expressing "I don't know anything about this expression". But only the former will undergo static type checks. We want those type checks to guard against bugs accidentally introduced in the future. For instance, imagine a class with an Any attribute.

from typing import Any
import dataclasses

@dataclasses.dataclass
class C:
    label: Any

Imagine in the future we add a new method which assumes that label is a string:

    def greeting(self) -> str:
        return "My name is " + self.label

Mypy will consider this valid, because no type-checking is done on an Any value. It will complain that it can't prove that greeting returns a str, if --warn-return-any is enabled; but putting that aside, it can't identify the call site as a bug. The bug slips through to runtime.

C(123).greeting()  # TypeError: can only concatenate str (not "int") to str

Replacing the Any with object does allow mypy to spot the problem.

error: Unsupported operand types for + ("str" and "object")  [operator]

# type: ignore and cast sparingly

There's a good chance that mypy knows better than we do, so we should only overrule it if there's no better option. There are two techniques for this. One option is to tell mypy to just silence the error, by appending a # type: ignore comment to the erroneous line. The other is to force it to accept that a certain expression has a given type: that's what cast is for.

I never like using either of these, but sometimes they're the most practical choice. I'd recommend two best practices for their use, however:

  1. Only ignore a specific error code, e.g. # type: ignore[assignment]. Those codes can be displayed by passing --show-error-codes to mypy.
  2. Leave a comment before every # type: ignore[...] and every cast to justify their correctness. I don't think this is a widely-held practice. I picked it up from the Rust world, where it's encouraged as a way to justify unsafe source code.
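
For example, a justified, narrowly-scoped cast might look like this (an illustrative snippet, not Sydent's code):

from typing import Any, Dict, cast

def load_config() -> Dict[str, Any]:
    return {"port": 8090}

config = load_config()
# Safe: our loader above always stores "port" as an int, but the dict's
# value type is too general for mypy to know that.
port = cast(int, config["port"])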

There's good stuff in typing

It's worth a read through the module documentation—plenty of things in there I wish I'd known about sooner. There's lots in the toolkit to choose from. Some bits I use fairly often include:

  • Protocol: lets you formalise duck typing. To use it, define a class that inherits from Protocol. Its methods and attributes are all stubs which describe what you require of objects belonging to this type. It's like an abstract base class or interface, but purely at typecheck time (see the sketch after this list).
  • Generic: define your own generic types. A bit of a slippery slope to type mania!
  • Optional: I use this all the time. It's always good to make your Nones explicit!
  • NewType: I haven't had a chance to use it much. As I understand it, it's a way to define a "strong typedef". For example, we can use it to distinguish lengths from durations, even if they're both represented by a float at runtime.
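
Here's a quick sketch of Protocol in action:

from typing import Protocol

class SupportsClose(Protocol):
    # Anything with a compatible close() method matches; no inheritance needed.
    def close(self) -> None: ...

class Door:
    def close(self) -> None:
        print("creak")

def shut(thing: SupportsClose) -> None:
    thing.close()

shut(Door())  # accepted: Door structurally satisfies SupportsClose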

I want to call out two parts of typing in particular:

overload

overload is a way to provide extra information about a function depending on how it's called. For instance, consider this function which takes a str or bytes as input and returns its uppercase version.

def upper(x: Union[bytes, str]) -> Union[bytes, str]:
    return x.upper()

The problem with this annotation is that we'll have to check at every call site to see if the return value was a bytes or a str object. But we know that uppercasing a str gives us a str, and uppercasing a bytes gives us a bytes. We can use overload to express this.

from typing import overload

@overload
def upper(x: str) -> str: ...

@overload
def upper(x: bytes) -> bytes: ...

def upper(x: Union[bytes, str]) -> Union[bytes, str]:
    return x.upper()

The first two @overload definitions are like stubs: they're purely used for their annotations. The actual runtime implementation is specified at the end, undecorated. (NB: this specific example is better expressed without overloads, by using AnyStr.)

I was really impressed to see how mypy could use this overloading information. Here's an example:

from typing import overload, Literal

@overload
def f(x: int) -> Literal[1]: ...

@overload
def f(x: str) -> Literal[2]: ...

def f(x: object) -> int:
    if isinstance(x, int):
        return 1
    elif isinstance(x, str):
        return 2
    return 3

if f(10) == 2:
    print("potato")

We can see that f(10) == 1 and so the equality is always False: we'll never print the word "potato". Mypy can reason through this too, if we ask it nicely.

$ mypy example.py
Success: no issues found in 1 source file

$ mypy --strict-equality --warn-unreachable example.py
example.py:16: error: Non-overlapping equality check (left operand type: "Literal[1]", right operand type: "Literal[2]")
example.py:17: error: Statement is unreachable
Found 2 errors in 1 file (checked 1 source file)

TypedDict

I mentioned this in last week's post when talking about the missing await bug. Subclassing from TypedDict allows us to define a new type whose values are dictionaries with

  • a fixed set of keys (some mandatory, some not), and
  • a fixed type for each key's value.

PEP 589 can best describe the motivation, but in short: there's a lot of source code out there that passes dictionaries around. TypedDict is a way to gradually add typechecking to that code, without having to refactor it to use e.g. a dataclass or a NamedTuple. Let's have a quick example.

from typing import List, TypedDict

class Person(TypedDict):
    full_name: str
    nicknames: List[str]
    age: int

p: Person = {
    "full_name": "David Matthew Robertson",
    "nicknames": ["dmr"],
    "age": 29,
}

Mypy will detect if you delete or omit a required key, or if you insert a key that's not part of the type.

# error: Key "age" of TypedDict "Person" cannot be deleted
del p["age"]

# error: Missing keys ("full_name", "nicknames", "age") for TypedDict "Person"
p2: Person = {}

# error: TypedDict "Person" has no key "eye_colour"
p["eye_colour"] = "brown"

TypedDict is a useful tool, but there are a few important gotchas to be aware of. In my view, these are all minor and worth putting up with, to benefit from the extra type information that a TypedDict provides. Let's take a look.

TypedDict is incompatible with Dict

This one surprised me at first. Why can't I pass my TypedDict to a function that accepts a generic dictionary?

from typing import Dict, TypedDict

class Foo(TypedDict):
    bar: str

def print_size(d: Dict[object, object]) -> None:
    print(len(d))

f: Foo = {"bar": "baz"}
# error: Argument 1 to "print_size" has incompatible type "Foo"; expected "Dict[object, object]"
print_size(f)

The answer is that it wouldn't be type-safe! For all we know, the function print_size might mutate its argument d. It might add a key, remove a key, or change the type of a value. Each of those would break the contract that TypedDict is supposed to enforce, so mypy is correct to flag this as an error. This GitHub issue has more discussion.

There are a few workarounds for this kind of problem.

  • The function might be able to accept a TypedDict directly.
  • The function could accept a Mapping instead of a Dict. This is type-safe because the Mapping type does not offer mutating methods such as pop, __setitem__ or __delitem__; but it only works if the function does not mutate the dictionary.
  • A more drastic approach would discard the TypedDict information with a cast to Dict[str, object] or similar. If so, I'd strongly recommend a comment explaining why the cast is correct and safe.

Mixing mandatory and optional fields

Secondly, defining a TypedDict with a mixture of optional and required keys is a little fiddly. All keys can be made optional by passing total=False in the class definition. To have some keys optional and others mandatory, we have to make two TypedDicts. Say for instance we wanted to add a potentially-missing favourite_colour field to Person. We can't add an annotation favourite_colour: Optional[str] to the first class body. That's optional in a different sense: it would mean that favourite_colour is a mandatory field which is allowed to be None. Instead, we apply total=False to a subclass:

from typing import List, TypedDict

class _PersonRequired(TypedDict):
    full_name: str
    nicknames: List[str]
    age: int

class Person(_PersonRequired, total=False):
    favourite_colour: str

This means that a valid Person dictionary may have no favourite_colour key. If it is present, it must be a str.

The syntax is unfortunately a little clunky. PEP 655 acknowledges this and proposes an alternative.

TypedDict is typecheck-time only

One final note: a TypedDict does no validation or conversion whatsoever. If mypy can't know the keys and values a dictionary will have at typecheck time, you'll need to validate it by hand at runtime. We see this a lot when deserialising JSON objects into dictionaries. Here's an example.

import json
from typing import TypedDict

class Foo(TypedDict):
    bar: str

# note: Revealed type is "Any"
reveal_type(json.loads("{}"))
# no error. mypy can't do analysis on Any.
f: Foo = json.loads("{}")
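
If you need the guarantee at runtime, you have to check it by hand (continuing the example above; libraries can also do this for you):

def parse_foo(raw: object) -> Foo:
    # TypedDict does no validation itself, so we narrow the type manually.
    if not isinstance(raw, dict) or not isinstance(raw.get("bar"), str):
        raise ValueError("not a valid Foo")
    return {"bar": raw["bar"]}

f2 = parse_foo(json.loads('{"bar": "baz"}'))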

Next time

This covers a lot of the machinery and the day-to-day process of annotating Sydent. The last part of this series will give us a chance to quantify our efforts and reflect on the wider typing ecosystem.


Many thanks for reading! If you've got any corrections, comments or queries, I'm available on Matrix at @dmrobertson:matrix.org.

Type coverage for Sydent: motivation

03.12.2021 00:00 — Tech David Robertson

This is the first of three posts on improving type coverage in Sydent. Join us next Friday for the second part!

Sydent is the reference Matrix Identity server. It provides a lookup service, so that you can find a Matrix user via their email address or phone number (if they've chosen to share it).

We recently worked on improving Sydent's type coverage: the proportion of its source code with explicit annotations denoting the kind of data that each variable, expression and return value is expected to hold. These annotations are optional, but if present, they allow tools like mypy to analyze your programs and spot entire classes of bugs before you ship them! In this instance, we aimed to make Sydent pass mypy --strict, which it now does. Here's what the process looked like:

Coverage as measured by mypy. Precision and the number of typed expressions increase over the latter half of 2021.

Two lines show two different measures of how well-typed the project is. The grey region covers our two-week sprint towards improving coverage; the earliest data point is from just before previous efforts to improve typing earlier in the year.

In a series of posts, I'd like to reflect on this sprint and share what we've learned. In particular, I aim to:

  • explain why we wanted to improve type coverage now;
  • work through examples to see how (if?) mypy could have spotted bugs;
  • describe the annotation process;
  • illustrate common patterns I learned along the way;
  • discuss the checks that mypy provides; and finally
  • reflect on the state of Python's typing ecosystem.

In this first part, we'll concentrate on the first two topics; the second covers the middle two; and the third the last two.

Why do this now?

It took us a long time (too long) to notice that the Sydent instance serving matrix.org was failing to send SMS messages for verification. We suspected that something was going wrong with our API call to OpenMarket. Our first step was to improve logging, so we could start to deduce what was going wrong and why. Whilst trawling through logs, we spotted one problem which meant we weren't actually sending off the API request in the first place. Further investigation revealed a strings-versus-bytes confusion which meant that we would always (incorrectly) interpret the API response as having failed.

All in all, phone number verification was unknowingly broken in the 2.4.0 release, to be fixed in 2.4.6 a month later. How could we do better? Better test coverage is (as ever) one answer. But it struck me that the two bugs we'd encountered might be ripe for automatic detection:

  • we created an Awaitable but didn't await it or use it in any way, and
  • we tried to look up a str key in a dictionary which mapped bytes to bytes.

Could a static analysis tool like mypy detect these? How much work would it take to do so? Are there other bugs and problems we could spot with it? I was curious to answer these questions and learn more about the tools that Python's typing ecosystem provides.

Could typing have spotted these problems?

Let's start with the first question: what can mypy detect?

The missing await

Instead of writing x = await foo(), we simply had x = foo() and didn't then go on to await x. Mypy doesn't have a means to detect this at present. There has been interest in such a feature on mypy's issue tracker, with a number of related threads.

Are there other opportunities to spot the error? Here's the relevant bit of source code from before the fix.

            sid = self.sydent.validators.msisdn.requestToken(
                phone_number_object, clientSecret, sendAttempt, brand
            )
            resp = {
                "success": True,
                "sid": str(sid),
                "msisdn": msisdn,
                "intl_fmt": intl_fmt,
            }

The call to requestToken produces a value of type Awaitable[int]. If we tried to assign that to an expression of type int we'd get an error that mypy can spot.

$ cat example.py
async def foo() -> int:
    return 1

async def bar():
    x = foo()      # no error
    y: int = foo() # error: rhs is Awaitable[int], but lhs expects int

$ mypy --check-untyped-defs example.py
example.py:6: error: Incompatible types in assignment (expression has type "Coroutine[Any, Any, int]", variable has type "int")
Found 1 error in 1 file (checked 1 source file)

Note that we have to specifically ask mypy to typecheck the body of bar by passing --check-untyped-defs; by default, mypy will only typecheck annotated code.

We might also have been able to detect the error by looking at how we used sid. Unfortunately, the only use of sid was in a conversion str(sid), which is a perfectly type-safe call for both sid: int and sid: Awaitable[int]. But let's put that aside for a second—suppose we added "sid": sid directly into the resp dictionary. Could mypy have spotted there was a problem with that?

Unfortunately not. Because resp has no annotation, we have to rely on how it's used to spot any type inconsistencies. There's only one use of resp: as the return value from its enclosing function, render_POST. Mypy's only chance to spot a type problem would be to compare its inferred type for resp to the return type of render_POST. What are those types? We can use reveal_type to see the former is Dict[str, object]. For the latter:

    @jsonwrap
    def render_POST(self, request: Request) -> JsonDict:

The return type is JsonDict, which is an alias for Dict[str, Any]. This is bad news, because Dict[str, object] and Dict[str, Any] are compatible. Digging a level deeper, this is because sid: Any holds true for both sid: int and sid: Awaitable[int]—so there's no error to spot here. The Any type is compatible with any other type whatsoever! Mypy uses Any as a way to defer all type checking to runtime; mypy won't (and can't!) statically analyse the usage of an expression of type Any. Indeed, mypy's reports will tell you how many Anys you're working with, and offer a variety of options to warn or error on usages of Any.

If we were inserting sid directly into a dictionary, we could do better by annotating the dictionary (or the function's return type) as a TypedDict. This is a way to specify a dictionary with a fixed set of keys, each with a fixed type. It comes in really handy for Sydent, Sygnal and Synapse—all of the Matrix APIs exchange JSON dictionaries, so anything we can do to teach mypy about their shape and types is gold dust.
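
Here's a hypothetical sketch (invented names, not Sydent's real code) of how a TypedDict return type would have surfaced the missing await, had sid been stored directly:

from typing import Awaitable, TypedDict

class RequestTokenResponse(TypedDict):
    success: bool
    sid: int

def build_response(pending_sid: Awaitable[int]) -> RequestTokenResponse:
    # mypy rejects this: the "sid" item is declared as int, but the
    # expression has type Awaitable[int] because we never awaited it.
    return {"success": True, "sid": pending_sid}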

In short, there were options for detecting this with some code changes, but no magic wand that would have spotted the error in the code as written.

The strings/bytes confusion

Our error was here:

        headers = dict(resp.headers.getAllRawHeaders())
        request_id = None
        if "X-Request-Id" in headers:
            request_id = headers["X-Request-Id"][0]

In this sample, resp.headers is a twisted.web.http_headers.Headers instance. getAllRawHeaders is documented as returning an iterable of (bytes, Sequence[bytes]) pairs. Even better, mypy can see this because getAllRawHeaders is annotated (many thanks to the twisted authors for this). Mypy should be able to deduce that we build a dictionary headers: Dict[bytes, Sequence[bytes]]. We can check this using reveal_type:

        headers = dict(resp.headers.getAllRawHeaders())
        reveal_type(headers)
$ mypy
sydent/sms/openmarket.py:110: note: Revealed type is "builtins.dict[builtins.bytes*, typing.Sequence*[builtins.bytes]]"

(The * in builtins.bytes* here means mypy has inferred that the dictionary's keys are bytes, rather than being told explicitly that they must be bytes.)

That's all fine and dandy. But why didn't we spot this before if the annotations were all in place in twisted? Let's put aside the fact that, erm, we weren't running mypy in Sydent's CI until the recent sprint, unlike our other projects. Checking out the problematic version, we can run mypy on the file we know to contain the bug.

$ git checkout v2.4.0
$ mypy --strict sydent/sms/openmarket.py
sydent/sms/openmarket.py:82: error: Dict entry 0 has incompatible type "str": "int"; expected "str": "str"  [dict-item]

Huh. Mypy spots something, but not the error we were hoping for. What's going on here? We can ask mypy to show its working with reveal_type again.

        resp = await self.http_cli.post_json_get_nothing(
            API_BASE_URL, send_body, {"headers": req_headers}
        )
        reveal_type(resp)
        headers = dict(resp.headers.getAllRawHeaders())
        reveal_type(resp.headers)
        reveal_type(resp.headers.getAllRawHeaders())
        reveal_type(headers)

This yields:

$ mypy sydent/sms/openmarket.py
sydent/sms/openmarket.py:82: error: Dict entry 0 has incompatible type "str": "int"; expected "str": "str"  [dict-item]
sydent/sms/openmarket.py:102: note: Revealed type is "twisted.web.iweb.IResponse*"
sydent/sms/openmarket.py:104: note: Revealed type is "Any"
sydent/sms/openmarket.py:105: note: Revealed type is "Any"
sydent/sms/openmarket.py:106: note: Revealed type is "builtins.dict[Any, Any]"
Found 1 error in 1 file (checked 1 source file)

Ahh, the Any type. As mentioned above, this represents a value whose type can't be statically determined. We're left to runtime checks to detect the problem. But we won't detect it at runtime, because dictionaries don't enforce any kind of type requirements on their keys and values.

The problem here is that mypy can't see that resp.headers is a twisted Headers object. If we could inform it of this, mypy would spot our bug:

        import twisted.web.http_headers
        raw_headers: twisted.web.http_headers.Headers = resp.headers
        reveal_type(resp)
        headers = dict(raw_headers.getAllRawHeaders())
        reveal_type(raw_headers)
        reveal_type(raw_headers.getAllRawHeaders())
        reveal_type(headers)
$ mypy sydent/sms/openmarket.py
sydent/sms/openmarket.py:82: error: Dict entry 0 has incompatible type "str": "int"; expected "str": "str"  [dict-item]
sydent/sms/openmarket.py:104: note: Revealed type is "twisted.web.iweb.IResponse*"
sydent/sms/openmarket.py:106: note: Revealed type is "twisted.web.http_headers.Headers"
sydent/sms/openmarket.py:107: note: Revealed type is "typing.Iterator[Tuple[builtins.bytes, typing.Sequence[builtins.bytes]]]"
sydent/sms/openmarket.py:108: note: Revealed type is "builtins.dict[builtins.bytes*, typing.Sequence*[builtins.bytes]]"
sydent/sms/openmarket.py:114: error: Invalid index type "str" for "Dict[bytes, Sequence[bytes]]"; expected type "bytes"  [index]
sydent/sms/openmarket.py:114: error: Argument 1 to "split" of "bytes" has incompatible type "str"; expected "Optional[bytes]"  [arg-type]
Found 3 errors in 1 file (checked 1 source file)

There it is, on line 114: Invalid index type "str" for "Dict[bytes, Sequence[bytes]]"; expected type "bytes".

Unfortunately it'd be a pain to annotate our application code to mark every use of IResponse.headers as a Headers object. We'll see a better way to do things in the next post, which discusses the nitty-gritty details of adding annotations file-by-file.


Many thanks for reading! If you've got any corrections, comments or queries, I'm available on Matrix at @dmrobertson:matrix.org.

How the UK's Online Safety Bill threatens Matrix

19.05.2021 15:47 — Tech Denise Almeida
Last update: 19.05.2021 14:48

Last week the UK government published a draft of the proposed Online Safety Bill, after having initially introduced formal proposals for said bill in early 2020. With this post we aim to shed some light on its potential impacts and explain why we think that this bill - despite having great intentions - may actually be setting a dangerous precedent when it comes to our rights to privacy, freedom of expression and self determination.

The proposed bill aims to provide a legal framework to address illegal and harmful content online. This focus on “not illegal, but harmful” content is at the centre of our concerns - it puts responsibility on organisations themselves to arbitrarily decide what might be harmful, without any legal backing. The bill itself does not actually provide a definition of harmful, instead relying on service providers to assess and decide on this. This requirement to identify what is “likely to be harmful” applies to all users, children and adults. Our question here is - would you trust a service provider to decide what might be harmful to you and your children, with zero input from you as a user?

Additionally, the bill incentivises the use of privacy-invasive age verification processes which come with their own set of problems. This complete disregard of people’s right to privacy is a reflection of the privileged perspectives of those in charge of the drafting of this bill, which fails to acknowledge how actually harmful it would be for certain groups of the population to have their real life identity associated with their online identity.

Our view of the world, and of the internet, is largely different from the one presented by this bill. Now, this categorically does not mean we don’t care about online safety (it is quite literally our bread and butter) - we just fundamentally disagree with the approach taken.

Whilst we sympathise with the government’s desire to show action in this space and to do something about children’s safety (everyone’s safety really), we cannot possibly agree with the methods.

Back in October of 2020 we presented our proposed approach to online safety - ironically also in response to a government proposal, albeit about encryption backdoors. In it, we briefly discussed the dangers of absolute determinations of morality from a single cultural perspective:

As uncomfortable as it may be, one man’s terrorist is another man’s freedom fighter, and different jurisdictions have different laws - and it’s not up to the Matrix.org Foundation to play God and adjudicate.

We now find ourselves reading a piece of legislation that essentially demands these determinations from tech companies. The beauty of the human experience lies in its diversity, and when we force technology companies to make calls about what is right or wrong - or what is “likely to have adverse psychological or physical impacts” on children - we end up in a dangerous place of centralising and regulating relative morals. Worst of all, when the consequence of getting it wrong is criminal liability for senior managers, what do we think will happen?

Regardless of how omnipresent it is in our daily lives, technology is still not a solution for human problems. Forcing organisations to be judge and jury of human morals for the sake of “free speech” will, ironically, have severe consequences on free speech, as risk profiles will change for fear of liability.

Forcing a “duty of care” responsibility on organisations which operate online will not only drown small and medium sized companies in administrative tasks and costs, it will further accentuate the existing monopolies by Big Tech. Plainly, Big Tech can afford the regulatory burden - small start-ups can’t. Future creators will have their wings clipped from the offset and we might just miss out on new ideas and projects for fear of legal repercussions. This is a threat to the technology sector, particularly those building on emerging technologies like Matrix. In some ways, it is a threat to democracy and some of the freedoms this bill claims to protect.

These are, quite frankly, steps towards an authoritarian dystopia. If Trust & Safety managers start censoring something as natural as a nipple on the off chance it might cause “adverse psychological impacts” on children, whose freedom of expression are we actually protecting here?

More specifically on the issue of content moderation: the impact assessment provided by the government alongside this bill predicts that the additional costs for companies directly related to the bill will be in the billions, over the course of 10 years. The cost for the government? £400k, in every proposed policy option. Our question is - why are these responsibilities being placed on tech companies, when evidently this is a societal problem?

We are not saying it is up to the government to single-handedly end the existence of Child Sexual Abuse and Exploitation (CSAE) or extremist content online. What we are saying is that it takes more than content filtering, risk assessments and (faulty) age verification processes for it to end. More funding for tech literacy organisations and schools, to give children (and parents) the tools to stay safe, is the first thing that comes to mind. Further investment in law enforcement cyber units and the judicial system, improving tech companies’ routes for abuse reporting and allowing the actual judges to do the judging seems pretty sensible too. What is absolutely egregious is the degradation of the digital rights of the majority, due to the wrongdoings of a few.

Our goal with this post is not to be dramatic or alarmist. However, we want to add our voices to the countless digital rights campaigners, individuals and organisations that have been raising the alarm since the early days of this bill. Just like with coercive control and abuse, the degradation of our rights does not happen all at once. It is a slippery slope that starts with something as (seemingly) innocuous as mandatory content scanning for CSAE content and ends with authoritarian surveillance infrastructure. It is our duty to put a stop to this before it even begins.

Twitter card image credit from Brazil, which feels all too familiar right now.

The Matrix Space Beta!

17.05.2021 17:50 — Tech Matthew Hodgson
Last update: 17.05.2021 17:35

Hi all,

As many know, over the years we've experimented with how to let users locate and curate sets of users and rooms in Matrix. Back in Nov 2017 we added 'groups' (aka 'communities') as a custom mechanism for this - introducing identifiers beginning with a + symbol to represent sets of rooms and users, like +matrix:matrix.org.

However, it rapidly became obvious that Communities had some major shortcomings. They ended up being an extensive and entirely new API surface (designed around letting you dynamically bridge the membership of a group through to a single source of truth like LDAP) - while in practice groups have enormous overlap with rooms: managing membership, inviting by email, access control, power levels, names, topics, avatars, etc. Meanwhile the custom groups API re-invented the wheel for things like pushing updates to the client (causing a whole suite of problems). So clients and servers alike ended up reimplementing large chunks of similar functionality for both rooms and groups.

And so almost before Communities were born, we started thinking about whether it would make more sense to model them as a special type of room, rather than being their own custom primitive. MSC1215 had the first thoughts on this in 2017, and then a formal proposal emerged at MSC1772 in Jan 2019. We started working on this in earnest at the end of 2020, and christened the new way of handling groups of rooms and users as... Spaces!

Spaces work as follows:

  • You can designate specific rooms as 'spaces', which contain other rooms (there's a rough sketch of how this looks at the event level just after this list).
  • You can have a nested hierarchy of spaces.
  • You can rapidly navigate around that hierarchy using the new 'space summary' (aka space-nav) API - MSC2946.
  • Spaces can be shared with other people publicly, or invite-only, or private for your own curation purposes.
  • Rooms can appear in multiple places in the hierarchy.
  • You can have 'secret' spaces where you group your own personal rooms and spaces into an existing hierarchy.
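
Under the hood (per MSC1772) a space is just a room: its m.room.create event declares a room type of m.space, and the hierarchy is expressed with m.space.child state events whose state_key is the child room's ID. Roughly - and this is a simplified sketch rather than the exact payloads Element sends - the events look like this:

    # Simplified sketch of the events behind a space (user, room and server
    # names are made up for illustration).
    space_create_content = {
        "type": "m.space",   # marks the room as a space rather than a chat room
        "creator": "@alice:example.org",
    }

    space_child_event = {
        "type": "m.space.child",
        "state_key": "!childroom:example.org",   # the room being parented
        "content": {"via": ["example.org"]},     # servers that can be used to join it
    }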

Today, we're ridiculously excited to be launching Space support as a beta in matrix-react-sdk and matrix-android-sdk2 (and thus Element Web/Desktop and Element Android) and Synapse 1.34.0 - so head over to your nearest Element, make sure it's connected to the latest Synapse (and that Synapse has Spaces enabled in its config) and find some Space to explore! #community:matrix.org might be a good start :)

The beta today gives us the bare essentials, and we haven't yet finished space-based access controls such as setting powerlevels in rooms based on space membership (MSC2962) or limiting who can join a room based on their space membership (MSC3083) - but these will be coming asap. We also need to figure out how to implement Flair on top of Spaces rather than Communities.

This is also a bit of a turning point in Matrix's architecture: we are now using rooms more and more as a generic way of modelling new features in Matrix. For instance, rooms could be used as a structured way of storing files (MSC3089); Reputation data (MSC2313) is stored in rooms; Threads can be stored in rooms (MSC2836); Extensible Profiles are proposed as rooms too (MSC1769). As such, this pushes us towards ensuring rooms are as lightweight as possible in Matrix - and that things like sync and changing profile scale independently of the number of rooms you're in. Spaces effectively gives us a way of creating a global decentralised filesystem hierarchy on top of Matrix - grouping the existing rooms of all flavours into an epic multiplayer tree of realtime data. It's like USENET had a baby with the Web!

For lots more info from the Element perspective, head over to the Element blog. Finally, the point of the beta is to gather feedback and fix bugs - so please go wild in Element reporting your first impressions and help us make Spaces as awesome as they deserve to be!

Thanks for flying Matrix into Space;

Matthew & the whole Spaces (and Matrix) team.

Introducing the Pinecone overlay network

06.05.2021 00:00 — Tech Neil Alexander

Since the end of 2019, we have spent quite a bit of time thinking about and exploring different technologies whilst building various demos for P2P Matrix. Our mission for P2P Matrix is to evolve Matrix into a hybrid between today's server-oriented network and a pure P2P network - empowering users to have total autonomy and privacy over their data if they want (by storing it in P2P Matrix, by embedding their server into their Matrix client), while also letting users store their data in serverside nodes if they so desire.

The goal is to protect metadata much better (as users no longer have to depend on a server run by someone else to communicate), as well as drive new features such as account portability, multi-homed accounts, low-bandwidth Matrix and smarter federation transports - and provide support for internet-less mesh communication via Matrix which can also interoperate with the wider network. You can read more about it in our Introducing P2P Matrix blog post from last summer, or watch our FOSDEM 2021 talk where we previewed Pinecone. It's important to note that this has been a small but important long-term project for Matrix, and has been progressing entirely outside our business-as-usual work of improving the core protocol and reference implementations.

As the project has progressed, we've built a variety of prototypes using existing libraries (go-libp2p, js-libp2p and Yggdrasil), demonstrating what an early P2P Matrix might feel like if it were running on a mobile device, in the web browser and so on using such an overlay network. Each of these demos has taught us something new, and so in October 2020 we decided to take this knowledge to build an experimental new overlay network of our own.

Pinecone is designed to provide end-to-end encrypted communications between devices, regardless of how they are connected to one another, in a lightweight and self-arranging fashion. The routing protocol is a hybrid, taking inspiration from Yggdrasil by building a global spanning tree, but rather than forwarding all traffic using the spanning tree topology, we use it as a bootstrap routing mechanism for a line/snake topology, ordered by their ed25519 public keys, which we have affectionately named SNEK (Sequentially Networked Edwards Key) routing.

Nodes seek out their closest keyspace neighbours on the network and paths are built between these pairs of nodes, similar to how a Chord DHT functions, populating the routing tables of intermediate nodes in the process. These paths are then used to forward traffic without having to perform up-front searches, allowing for very fast connection setups between overlay nodes. These paths are resilient to network topology changes and handle node mobility considerably better than any other name-independent routing scheme that we have seen — early results are very promising so far. We have also been experimenting with a combination of the μTP (Micro Transport Protocol) and TLS to provide stateful connection setup, congestion control and end-to-end encryption for all federation traffic carried over the Pinecone network.
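
To give a feel for what "closest keyspace neighbours" means, here is a toy illustration (not Pinecone's actual Go implementation, which discovers neighbours dynamically rather than from a global view): treat each node's ed25519 public key as an opaque ordered value, and a node's snake neighbours are simply the nodes with the nearest keys below and above its own.

    from typing import Iterable, Optional, Tuple

    def snake_neighbours(
        my_key: bytes, known_keys: Iterable[bytes]
    ) -> Tuple[Optional[bytes], Optional[bytes]]:
        """Return the keys immediately below and above my_key in keyspace order."""
        keys = [k for k in known_keys if k != my_key]
        below = max((k for k in keys if k < my_key), default=None)
        above = min((k for k in keys if k > my_key), default=None)
        return below, above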

Pinecone simulator showing line/snake logical network topology

If Pinecone works out, our intention is to collaborate with the libp2p and IPFS team to incorporate Pinecone routing into libp2p (if they'll have us!) while incorporating their gossipsub routing to improve Matrix federation... and get the best of both worlds :)

Today we're releasing the source code for our current early implementation of Pinecone — you can get it from GitHub right now! It's very experimental still and not very well optimised yet, but it is the foundation of our latest mobile P2P Matrix demos, which support P2P Matrix over Bluetooth Low Energy mesh networks and multicast DNS discovery within a LAN, and/or by routing through static Pinecone peers on the Internet:

  • Android: https://appdistribution.firebase.dev/i/394600067ea8ba37
  • iOS: https://testflight.apple.com/join/Tgh2MEk6

Building a routing overlay is only the first step in the journey towards P2P Matrix. We will also be looking closely in the coming months at improving the Matrix federation protocol to work well in mixed-connectivity scenarios (rather than the full mesh approach used today) as well as decentralised identities, hybrid deployments with existing homeservers and getting Dendrite (the Matrix homeserver which is embedded into the current P2P demos) more stable and feature-complete.

The long-term plan could look something like this:

Diagram showing possible P2P Matrix stack

Most discussion around P2P Matrix takes place in #p2p:matrix.org, so if you are interested in what's going on, please join us there!

Privacy improvements in Synapse 1.4 and Riot 1.4

27.09.2019 00:00 — Privacy Matthew Hodgson

Hi all,

Back in June we wrote about our plans to tighten up data privacy in Matrix after some areas for improvement were brought to our attention. To quickly recap: the primary concern was that the default config for Riot specifies identity servers and integration managers run by New Vector (the company which the original Matrix team set up to build Riot and fund Matrix dev) - and so folks using a standalone homeserver may end up using external services without realising it. There were some other legitimate issues raised too (e.g. contact information should be obfuscated when checking if your contacts are on Matrix; Riot defaulted to using Google for STUN (firewall detection) if no TURN server had been set up on their server; Synapse defaults to using matrix.org as a key notary server).

We’ve been working away at this fairly solidly over the last few months. Some of the simpler items shipped quickly (e.g. Riot/Web had a stupid bug where it kept incorrectly loading the integration manager; Riot/Android wasn’t clear enough about when contact discovery was happening; Riot/Web wasn’t clear enough about the fact device names are publicly visible; etc) - but other bits have turned out to be incredibly time-consuming to get right.

However, we’re in the process today of releasing Synapse 1.4.0 and Riot/Web 1.4.0 (it’s coincidence the version numbers have lined up!) which resolve the majority of the remaining issues. The main changes are as follows:

  1. Riot no longer automatically uses identity servers by default. Identity servers are only useful when inviting users by email address, or when discovering whether your contacts are on Matrix. Therefore, we now wait until the user tries to perform one of these operations before explaining that they need an identity server to do so, and we prompt them to select one if they want to proceed. This makes it abundantly clear that the user is connecting to an independent service, and why.

  2. Integration Managers and identity servers now have the ability to force users to accept terms of use before using them. This means they can explicitly spell out the data privacy & usage policy of the server as required by GDPR, and it should now be impossible for a user to use these services without realising it. This was particularly fun in the case of identity servers, which previously had no concept of users and so couldn’t track whether users had agreed to their terms & conditions or not… and because homeservers sometimes talk to the identity server on behalf of users rather than the user talking direct, the privacy policy flow gets even hairier. But it’s solved now, and a nice side-effect of this is that users can now explicitly select their Integration Manager in Riot, in case they want to use Dimension or similar rather than the default provided by Modular.

  3. Synapse no longer uses identity servers for verifying registrations or verifying password reset. Originally, Synapse made use of the fact that the Identity Service contains email/msisdn verification logic to handle identity verification in general on behalf of the homeserver. However, in retrospect this was a mistake: why should the entity running your identity server have the right to verify password resets or registration details on your homeserver? So, we have moved this logic into Synapse. This means Synapse 1.4.0 requires new configuration for email/msisdn verification to work - please see the upgrade notes for full details.

  4. Sydent now supports discovering contacts based on hashed identifiers. MSC2134 specifies entirely new IS APIs for discovering contacts using a hash of their identifier rather than directly exposing the raw identifiers being searched for (see the sketch just after this list). This is implemented in Riot/iOS and Riot/Android and should be in the next major release; Riot/Web 1.4.0 has it already.

  5. Synapse now warns in its logs if you are using matrix.org as a default trusted key server, in case you wish to use a different server to help discover other servers’ keys.

  6. Synapse now garbage collects redacted messages after N days (7 days by default). (It doesn’t yet garbage collect attachments referenced from redacted messages; we’re still working on that).

  7. Synapse now deletes account access data (IP addresses and User Agent) after N days (28 days by default) of a device being deleted.

  8. Riot warns before falling back to using STUN (and defaults to turn.matrix.org rather than stun.google.com) for firewall discovery (STUN) when placing VoIP calls, and makes it clear that this is an emergency fallback for misconfigured servers which are missing TURN support. (We originally deleted the fallback entirely, but this broke things for too many people, so we’ve kept it but warn instead).
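
To make point 4 above a little more concrete, here is a rough sketch of the idea behind hashed lookups (based on our reading of MSC2134; the exact request format is defined by the new Identity Service API rather than by this snippet): instead of sending the raw email address or phone number, the client fetches a pepper from the identity server, hashes the identifier together with its medium and the pepper, and looks up the unpadded URL-safe base64 of that hash.

    import hashlib
    from base64 import urlsafe_b64encode

    def hash_3pid(address: str, medium: str, pepper: str) -> str:
        """Sketch of an MSC2134-style lookup value: sha256 over
        "address medium pepper", encoded as unpadded URL-safe base64."""
        digest = hashlib.sha256(f"{address} {medium} {pepper}".encode()).digest()
        return urlsafe_b64encode(digest).rstrip(b"=").decode()

    # e.g. hash_3pid("alice@example.org", "email", pepper_from_identity_server)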

All of this is implemented in Riot/Web 1.4.0 and Synapse 1.4.0. Riot/Web 1.4.0 shipped today (Fri Sept 27th) and we have a release candidate for Synapse 1.4 (1.4.0rc1) today, which will fully ship on Monday.

For full details please go check out the Riot 1.4.0 and Synapse 1.4.0 blog posts.

Riot/Mobile is following fast behind - most of the above has been implemented and everything should land in the next release. RiotX/Android doesn’t really have any changes to make given it hadn’t yet implemented Identity Service or Integration Manager APIs.

This has involved a surprisingly large amount of spec work; no fewer than 9 new Matrix Spec Changes (MSC) have been required as part of the project. In particular, this results in a massive update to the Identity Service API, which will be released very shortly with the new MSCs. You can see the upcoming changes on the unstable branch and compare with the previous 0.2.1 stable release, as well as checking the detailed MSCs themselves.

This said, there is still some work remaining for us to do here. The main things which haven’t made it into this release are:

  • Preferring to get server keys from the source server rather than the notary server by default (https://github.com/matrix-org/synapse/pull/6110). This almost made it in, but we need to test it more first - until then, your specified notary server will see roughly which servers your server is trying to talk to. In future this will be mitigated properly by MSC1228 (removing mxids from events).
  • Configurable data retention periods for rooms. We are tantalisingly close with this - https://github.com/matrix-org/synapse/pull/5815 is an implementation that the French Govt deployment is using; we need to port it into mainline Synapse.
  • Authenticating access to the media repository - for now, we still rely on media IDs being almost impossible to guess to protect the data rather than authenticating the user.
  • Deleting items from the media repository - we still need to hook up deletion APIs.
  • Garbage collecting forgotten rooms. If everyone leaves & forgets a room, we should delete it from the DB.
  • Communicating erasure requests over federation

We’ll continue to work on these as part of our ongoing maintenance backlog.

Separately to the data privacy concerns, we’ve had a separate wave of feedback regarding how we handle GDPR Data Subject Access Requests (DSARs). Particularly: whether DSAR responses should contain solely the info you have directly keyed by the requesting Matrix ID - or if we should provide all the data “visible” to that ID (i.e. the history of the conversations they’ve been part of). We went and got professional legal advice on this one, and the conclusion is that we should keep our responses to DSARs as tightly scoped as possible. We updated Matrix.org’s privacy policy and DSAR tools to reflect the new legal input.

Finally, it’s really worth calling out the amount of effort that went into this project. Huge huge thanks to everyone involved (given it’s cut across pretty much every project & subteam we have working on the core of Matrix) who have soldiered through the backlog. We’ve been tracking progress using our feature-dashboard tool which summarises Github issues based on labels & issue lifecycle, and for better or worse it’s ended up being the biggest project board we’ve ever had. You can see the live data here (warning, it takes tens of seconds to spider Github to gather the data) - or, for posterity and ease of reference, I’ve included the current issue list below. The issues which are completed have “done” after them; the ones still in progress say “in progress”, and ones which haven’t started yet have nothing. We split the project into 3 phases - phases 1 and 2 represent the items needed to fully solve the privacy concerns, phase 3 is right now a mix of "nice to have" polish and some more speculative items. At this point we’ve effectively finished phase 1 on Synapse & Riot/Web, and Riot/Mobile is following close behind. We're continuing to work on phase 2, and we’ll work through phase 3 (where appropriate) as part of our general maintenance backlog.

I hope this gives suitable visibility on how we’re considering privacy; after all, Matrix is useless as an open communication protocol if the openness comes at the expense of user privacy. We’ll give another update once the remaining straggling issues are closed out; and meanwhile, now the bulk of the privacy work is out of the way on Riot/Web, we can finally get back to implementing the UI E2E cross-signing verification and improving first time user experience.

Thanks for your patience and understanding while we’ve sorted this stuff out; and thanks once again for flying Matrix :)

In the absence of comments on the current blog, please feel free to discuss over at HN, or alternatively come ask stuff in our AMA over at /r/privacy (starting ~5pm GMT+1 (UK) on Friday Sept 27th).

The Privacy Project Dashboard Of Doom

Publishing the Backend Roadmap

15.02.2019 00:00 — Tech Neil Johnson

Good people,

2019 is a big year for Matrix; in the next month we will have shipped:

  • Matrix spec 1.0 (including the first stable release of the Server to Server Spec)
  • Synapse 1.0
  • Riot 1.0

This is huge in itself, but is really only the beginning, and now we want to grow the ecosystem as quickly as possible. This means landing a mix of new features, enhancing existing ones, some big performance improvements as well as generally making life easier for our regular users, homeserver admins and community developers.

Today we are sharing the Matrix core team's backend roadmap. The idea is that this will make it easier for anyone to understand where the project is going, what we consider to be important, and why.

To see the roadmap in its full glory, take a look here.

What is a roadmap and why is it valuable?

A roadmap is a set of high level projects that the team intend to work on and a rough sense of the relative priority. It is essential to focus on specific goals, which inevitably means consciously not working on other initiatives.

Our roadmap is not a delivery plan - there are explicitly no dates. The reason for this is that we know that other projects will emerge, developers will be needed to support other urgent initiatives, matrix.org use continues to grow exponentially and will require performance tweaking.

So simply, based on what we know now, this is the order we will work on our projects.

Why are we sharing it?

We already share our day to day todo list, and of course our commit history, but it can be difficult for a casual observer to see the bigger picture from such granular data. The purpose of sharing is that we want anyone from the community to understand where our priorities lie.

We are often asked ‘Why are you not working on X, it is really important' where the answer is often ‘We agree that X is really important, but A, B and C are more important and must come first'.

The point of sharing the roadmap is to make that priority trade off more transparent and consumable.

How did we build it?

The core contributors to Synapse and Dendrite are 6 people, of 5 nationalities spread across 3 locations. After shipping the r0 release of the Server to Server spec last month we took some time to step back and have a think about what to do after Synapse 1.0 lands. This meant getting everyone in one place to talk it through.

We also had Ben (benpa) contribute from a community perspective and took input from speaking to so many of you at FOSDEM.

In the end we filled a wall with post-its, each post-it representing a sizeable project. The position of the post-it was significant: the vertical axis gave a sense of how valuable we thought the task would be, and the horizontal axis a rough guess at how complex we considered it to be.

We found this sort of grid approach to be really helpful in determining relative priority.

After many hours and plenty of blood, sweat and tears we ended up with something we could live with and wrote it up in the shared board.

And this is written in blood right?

Not at all (it's written in board marker). This is simply a way to express our plan of action and we are likely to make changes to it dynamically. However, this means that at any given moment, if someone wants to know what we are working on then the roadmap is the place to go.

But wait I want to know more!

Here is a video of myself and Matthew talking you through the projects.

Interesting, but I have questions ...

Any feedback gratefully received - come and ask questions in #synapse or #dendrite, or feel free to ping me directly at @neilj:matrix.org.

Porting Synapse to Python 3

21.12.2018 00:00 — Tech Neil Johnson

Matrix's reference homeserver, Synapse, is written in Python and uses the Twisted networking framework to power its bitslinging across the Internet. The Python version used has been strictly Python 2.7, the last supported version of Python 2, but as of this week that changes! Since Twisted and our other upstream dependencies now support the newest version of Python, Python 3, we are now able to finish the jump and port Synapse to use it by default. The port has been done in a backwards compatible way, written in a subset of Python that is usable in both Python 2 and Python 3, meaning your existing Synapse installs still work on Python 2, while preparing us for a Python 3 future.

Why port?

Porting Synapse to Python 3 prepares Synapse for a post-Python 2 world, currently scheduled for 2020. After the 1st of January in 2020, Python 2 will no longer be supported by the core Python developers and no bugfixes (even critical security ones) will be issued. As the security of software depends very much on the runtime and libraries it is running on top of, this means that by then all Python 2 software in use should have moved to Python 3 or other runtimes.

The Python 3 port has benefits other than just preparing for the End of Life of Python 2.7. Successive versions of Python 3 have improved the standard library, provided newer and clearer syntax for asynchronous code, added opt-in static typing to reduce bugs, and contained incremental performance and memory management improvements. These features, once Synapse stops supporting Python 2, can then be fully utilised to make Synapse's codebase clearer and more performant. One bonus that we get immediately, though, is Python 3's memory compaction of Unicode strings. Rather than storing strings as UCS-2/UTF-16 or UCS-4/UTF-32, Python 3 stores each string in the smallest possible representation, giving a 50%-75% memory improvement for strings only containing Latin-1 characters, such as nearly all dictionary keys, hashes, IDs, and a large proportion of messages being processed from English speaking countries. Non-English text will also see a memory improvement, as it can commonly be stored in only two bytes per character instead of the four in a UCS-4 “wide” Python 2 build.
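
You can see the effect of the compact representation directly with sys.getsizeof on a Python 3 interpreter (the exact numbers vary between Python versions and builds, so treat these as indicative):

    import sys

    latin1 = "a" * 1000           # Latin-1 only: stored at 1 byte per character
    wide = "\N{SNOWMAN}" * 1000   # non-Latin-1 (BMP): stored at 2 bytes per character

    print(sys.getsizeof(latin1))  # roughly 1000 bytes plus a small fixed header
    print(sys.getsizeof(wide))    # roughly 2000 bytes plus a small fixed header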

Editor's note: If you were wondering how this fits in with Dendrite (the next-gen golang homeserver): our plan is to use Synapse as the reference homeserver for all the current work going on with landing a 1.0 release of the Matrix spec: it makes no sense to try to iterate and converge on 1.0 on both Synapse and Dendrite in parallel. In order to prove that the 1.0 spec is indeed fit for purpose we then also need Synapse to exit beta and hit a 1.0 too, hence the investment to get it there. It's worth noting that over the last year we've been plugging away solidly improving Synapse in general (especially given the increasing number of high-profile deployments out there), so we're committed to getting Synapse to a formal production grade release and supporting it in the long term. Meanwhile, Dendrite development is still progressing - currently acting as a place to experiment with more radical blue-sky architectural changes, especially in low-footprint or even clientside homeservers. We expect it to catch up with Synapse once 1.0 is out the door; and meanwhile Synapse is increasingly benefiting from performance work inspired by Dendrite.

When will the port be released?

The port has been released in a “production ready” form in Synapse 0.34.0, supporting Python 3.5, 3.6, and 3.7. This will work on installations with and without workers.

What's it like in the real world?

Beta testers of the Python 3 port have reported lower memory usage, including lower memory “spikes” and slower memory growth. You can see this demonstrated on matrix.org:

See 10/15, ~20:00 for the Python 3 migration. This is on some of the Synchrotrons on matrix.org.

See ~11/8 for the Python 3 migration. This is on the Synapse master on matrix.org.

We have also noticed some better CPU utilisation:

See 21:30 for the migration of federation reader 1, and 21:55 for the others. The federation reader is a particular pathological case, where the replacement of lists with iterators internally on Python 3 has given us some big boosts.

See 10/15, 4:00. The CPU utilisation has gone down on synchrotron 1 after the Python 3 migration, but not as dramatically as the federation reader. Synchrotron 3 was migrated a few days later.

As some extra data-points, my personal HS consumes about 300MB now at initial start, and grows to approximately 800MB -- under Python 2 the growth would be near-immediate to roughly 1.4GB.

Where to from here?

Python 2 is still a supported platform for running Synapse for the time being. We plan on ending mainstream support on 1st April 2019, whereupon Python 3.5+ will be the only officially supported platform. Additionally, we will give notice ahead of time once we are ready to remove Python 2.7 compatibility from the codebase (which will be no sooner than 1st April). Although slightly inconvenient, we hope that this gives our users and integrators adequate time to migrate, whilst giving us the flexibility to use modern Python features and make Synapse a better piece of software to help power the Matrix community.

How can I try it?

The port is compatible with existing homeservers and configurations, so if you install Synapse inside a Python 3 virtualenv, you can run it from there. Of course, this differs based on your installation method, operating system, and what version of Python 3 you wish to use. Full upgrade notes live here but if you're having problems or want to discuss specific packagings of Synapse please come ask in #synapse:matrix.org.

Thanks

Many thanks go to fellow Synapse developers Erik and Rich for code review, as well as community contributors such as notafile and krombel for laying the foundations many months ago allowing this port to happen. Without them, this wouldn't have happened.

Happy Matrixing,

Amber Brown (hawkowl)

Olm 3.0.0 released!

25.10.2018 00:00 — Tech Hubert Chathi

Olm 3.0.0 has been released, which features several big changes. It can be downloaded from https://git.matrix.org/git/olm/. The npm package for JavaScript can be downloaded from https://matrix.org/packages/npm/olm/olm-3.0.0.tgz

Python

The biggest change is the merge of poljar's improved Python bindings. These bindings should be much easier to use for Python programmers, and are used by Zil0's E2E support in the Matrix Python SDK.

Since the binding API has changed, existing Python code will need to be rewritten in order to work with this release.

poljar has also included comprehensive documentation for the new API.
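
To give a flavour of the new style, here is a rough sketch based on our reading of the new bindings (the bundled documentation is the authoritative reference):

    from olm import Account

    alice = Account()

    # Identity keys are exposed as a plain dict of curve25519/ed25519 keys.
    print(alice.identity_keys)

    # Generate and inspect one-time keys.
    alice.generate_one_time_keys(1)
    print(alice.one_time_keys)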

CMake

mujx contributed support for building olm using CMake. This should allow for easier building on different platforms. Currently the library can be built using either make or CMake. In the future, make support may be removed.

JavaScript

The JavaScript bindings now use WebAssembly by default. WebAssembly is much faster than the previous asm.js build, and is supported by recent versions of the most popular browsers. For compatibility with browsers that do not support WebAssembly, the asm.js version is still provided.

Due to adding support for WebAssembly, the API had to be changed slightly. There is now an init function that must be called before the library can be used. This function will return a promise that will resolve once the library is ready to be used. The matrix-js-sdk has not yet been updated to do this, so users of matrix-js-sdk should continue using olm 2.x until it has been updated.

Key backups

The public key API has been updated to support the proposal for server-side key backups. More details on how to use these functions will be published in the future.