The Backfilling Issue of ActivityPub

Updated: at 11:18 PM

In the Fediverse, the main service, Mastodon, has an issue which was filed as a bug back in 2016. Basically, it doesn’t show users all the replies in a thread; it only shows the replies coming from users it knows about. Also, when you view the profile of a user you don’t follow, that profile, as shown on your instance, will only contain the user’s basic information along with their pinned posts. To see anything more you literally have to open that user’s profile on their instance.

Problem description

A contact of mine asked one of the leading ActivityPub developers/spokespeople the following question: “Is the issue of instances not seeing certain replies, AKA ‘back-filling’, an ActivityPub thing or specifically something Mastodon has implemented? Just curious.”

As this is a topic that has bothered me for quite some time I (unsurprisingly) have opinions, but instead of making a quite (well, REALLY) long reply I figured I might as well type up my thoughts here instead, and then perhaps link to it. That way I can also revise my thoughts, or refer back to them if I wish to.

My response (which started life as a regular reply before I changed my mind):

The question will appear to be asked strangely if we establish a few things first:

This could be seen as a hack, albeit one that really makes the user experience a lot better.

So, since Mastodon does not do lazy loading, the lack of backfilling isn’t “implemented” per se. You can’t implement a not-developed non-function. Also, they have implemented some fixes for the lack of backfilling. What they haven’t done is prioritize fixing a bad user experience by adding extra functionality to work around a much-reported and much-discussed problem.

It could have ended here…but here we go

Another issue here is that this mainly happens, or is noticed, on new and/or small instances. It doesn’t affect people on large instances to the same degree, apart from, you know, the fact that they don’t get to see replies to their posts if those replies are made by people on instances their own instance doesn’t know about. This usually means small instances.

As we all know, small instances are not the priority of Mastodon. On the contrary, the Mastodon leadership appears to prefer that everyone centralises, and if everyone is on the same instance, or at least sticks to a few large ones, this problem almost goes away.

I say almost, as there are still users on unknown (usually small) instances out there who struggle to understand why no one is reacting or replying to their posts and replies - when those replies in fact haven’t even been seen by anyone else in the conversation thread, even though it looks to the replier like everyone should see their contribution to the discussion. In short, they start to feel ignored (not very social behaviour from a social media service, eh?).

Therefore, if you see a great thread, you almost always have to open the original post (often in a new tab) to see all the replies in that thread…at least the ones the original thread knows about. Otherwise you’ll only see a sub-set/selection of the replies in the thread, coming from users on instances you already federate with.

This is what a lack of backfilling means in practice: the scenario above is what happens when backfilling isn’t there.
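To make the scenario concrete, here is a minimal Python sketch. Everything in it (the `Reply` type, `visible_replies`, the example domains) is hypothetical and heavily simplified; it assumes an instance only shows replies from authors it already knows about, and it is not how Mastodon is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    post_id: str
    author: str  # hypothetical handle, e.g. "alice@big.example"


def visible_replies(thread, known_authors):
    """Return only the replies whose author this instance already knows about."""
    return [reply for reply in thread if reply.author in known_authors]


# The full thread as the original poster's instance sees it.
thread = [
    Reply("1", "alice@big.example"),
    Reply("2", "bob@tiny.example"),
    Reply("3", "carol@big.example"),
]

# Authors our (assumed) local instance happens to know about.
known = {"alice@big.example", "carol@big.example"}

shown = visible_replies(thread, known)
missing = [reply.post_id for reply in thread if reply not in shown]
```

In this toy example Bob’s reply never shows up locally, and nothing tells the reader (or Bob) that it is missing.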

Are there drawbacks with backfilling?

A concern some raise about backfilling, often in an attempt to find a viable explanation for this strange lack of functionality (this is not how other social media works), is that backfilling could have a DDoS-like effect on instances by hitting their servers multiple times while fetching the newest and most accurate content.

However, the requests still happen, albeit manually or via lazy-loading hacks, as the users themselves have to open the same posts multiple times on different instances. Also, the DDoS argument falls rather flat for conversation threads where the participants follow each other: specifically (lazy-)loading a handful of missing replies for the few users who don’t follow all of the other participants would be a lot less taxing for the instances involved than serving everything to everyone.

In the services where back-filling works better, they have usually added a couple of tables to the database that keep track of both individual users and other instances, and they store this data and update it (at a frequency unknown to me). This means that all user-relevant data doesn’t have to be fetched in full every time. We’ll get to why this can help out…
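As a rough illustration of the idea, the sketch below caches remote user data in a table and only refetches it when the cached row is too old. The table layout, the `get_user` helper, the `fetch_remote` callback and the 24-hour refresh interval are all assumptions of mine, not taken from any real service; a table for known instances would presumably work the same way.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # assumed refresh interval, not a real default


def open_cache():
    """Create an in-memory database with a (hypothetical) known_users table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE known_users ("
        "actor_url TEXT PRIMARY KEY, display_name TEXT, fetched_at REAL)"
    )
    return db


def get_user(db, actor_url, fetch_remote, now=None):
    """Return the cached display name, going to the remote only when stale."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT display_name, fetched_at FROM known_users WHERE actor_url = ?",
        (actor_url,),
    ).fetchone()
    if row is not None and now - row[1] < MAX_AGE_SECONDS:
        return row[0]  # fresh enough, no network request needed
    name = fetch_remote(actor_url)  # stand-in for a real HTTP fetch
    db.execute(
        "INSERT OR REPLACE INTO known_users VALUES (?, ?, ?)",
        (actor_url, name, now),
    )
    return name
```

With a cache like this, repeated views of the same remote user cost one request per refresh interval rather than one request per view.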

What are other services doing?

In the *oma-set of services (i.e. Pleroma/Akkoma) they have made “extensions” to the Mastodon API, which among other things keep track of conversation threads and posts in those threads, in a slightly different way compared to Mastodon.

They can see “ancestors” and “descendants” in a conversation thread (which in turn can be a sub-thread off the main thread). This is needed to be able to show the timeline as a nested tree structure, without tossing all the replies into one big mess. This is in addition to the already existing “in-reply-to” key from ActivityPub, which I’m guessing the ancestor/descendant functionality is based on.
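Here is a small sketch of how ancestors and descendants could be derived from nothing more than an in-reply-to reference per post. It follows my guess above rather than any actual Pleroma/Akkoma code, and the function names and the `posts` mapping (post ID to parent ID) are made up for the example.

```python
from collections import defaultdict


def build_children(posts):
    """Map each post ID to the IDs of its direct replies."""
    children = defaultdict(list)
    for post_id, parent_id in posts.items():
        if parent_id is not None:
            children[parent_id].append(post_id)
    return children


def ancestors(posts, post_id):
    """Follow in-reply-to upwards; returns the chain from the root down."""
    chain = []
    parent = posts.get(post_id)
    while parent is not None:
        chain.append(parent)
        parent = posts.get(parent)
    return list(reversed(chain))


def descendants(children, post_id):
    """Collect all replies below a post, depth first (tree order)."""
    result = []
    for child in children.get(post_id, []):
        result.append(child)
        result.extend(descendants(children, child))
    return result


# Hypothetical thread: post ID -> ID of the post it replies to.
posts = {"a": None, "b": "a", "c": "b", "d": "a", "e": "c"}
children = build_children(posts)
```

Here `ancestors(posts, "e")` walks up through "c" and "b" to the root "a", while `descendants(children, "a")` returns every reply in nested order, which is what a tree view needs.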

Those things combined also mean that “gaps” in the conversation get highlighted for the *omas. If they run into an ancestor/descendant post ID which they don’t have, they run over to the relevant instance to pick up the post with that ID, along with the user details, unless already stored (see above)…but this is also why my little single-user instance, where I follow 300-400 people, actually keeps track of 360 000 users and thousands of known instances by now.
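The gap-filling behaviour described above could look roughly like the sketch below. It is an assumption-laden illustration, not the actual Akkoma logic: `fill_gaps`, the `fetch_remote` callback, the post dictionaries and the `max_fetches` cap are all hypothetical, with the cap added as a nod to the DDoS concern discussed earlier.

```python
def fill_gaps(local_posts, start_id, fetch_remote, max_fetches=20):
    """Walk up the in-reply-to chain, fetching any post we don't have yet.

    local_posts maps post ID -> {"in_reply_to": parent ID or None} and is
    updated in place. Returns the IDs that had to be fetched.
    """
    fetched = []
    current = start_id
    while current is not None:
        if current not in local_posts:
            if len(fetched) >= max_fetches:
                break  # stop rather than hammer the remote instance
            post = fetch_remote(current)  # stand-in for a real HTTP fetch
            if post is None:
                break  # remote post is gone or unreachable
            local_posts[current] = post
            fetched.append(current)
        current = local_posts[current]["in_reply_to"]
    return fetched


# Hypothetical remote instance that has the whole thread.
remote = {
    "1": {"in_reply_to": None},
    "2": {"in_reply_to": "1"},
    "3": {"in_reply_to": "2"},
}

# Locally we have posts 1 and 3, but the reply in between is missing.
local = {"1": remote["1"], "3": remote["3"]}
fetched_ids = fill_gaps(local, "3", remote.get)
```

Only the single missing post is requested, and a second call for the same thread fetches nothing at all.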

It might sound like a lot of data, but in the grand scheme of things 360k rows in a database table isn’t excessively large, especially not if it contributes to lowered network usage and a much improved user experience. (When it comes to data storage for my single-user instances, my Akkoma instance isn’t even on my radar as a problem yet, compared to the others.)

Do note that I say “works better” in one of the paragraphs above: no service (that I have tested) does backfilling perfectly, as it is still reliant on known information. Make no mistake: we are talking about attempts to fix a bad user experience here, and those attempts aren’t 100% successful. It is very difficult to act on what you don’t know. This is exactly what Mastodon does though. It does nothing. What the other services mentioned above do better (than Mastodon) is to acknowledge that they quite likely have a knowledge gap, and they at least try to fix it.

So…what can you do?

Well, firstly, if you have your account on one of the large Mastodon instances you don’t have to do anything, and also this is probably not a problem (apart from you not seeing all the replies to your posts, but who cares, right?). Yay you!

However, if you are adamant about running a small Mastodon instance (I’m not judging…much…) there are other people out there who have created software that will help in fixing this problem. The catch is you have to be an instance admin yourself, and you have to know your way around running code on your instance. If this sounds like something you’d like to explore, go check out FediFetcher, which is the one I’ve heard the most about (though there are others too).

Thirdly, and as previously mentioned, you can use apps to interface with your instance, and those apps will help you fix this. Mona (iOS/macOS) and Mammoth (iOS) both do this. If you are on Android you might want to check out Fedilab.

Tagging along on the apps track is a browser addon/plugin called Substitoot which aims to show up-to-date information from posts on remote instances. This only works for Mastodon (they say…I haven’t tried it).

Lastly, you can do what I’ve done and not use Mastodon as your Fediverse service. This is obviously no guarantee that you will find/use a service that does backfilling well, but I love how Akkoma handles showing conversation threads and doing background fetches of content, both in replies and of profiles. At times it feels like magic (with a 1-2 second delay) when content dynamically appears right where it should be. I have recently heard GoToSocial does backfilling quite well too. The very nature of how Friendica handles threads, which technically isn’t backfilling but still recursively fetches the parents of any given post, means that conversation threads usually feel rather complete (they also show conversations as a tree structure). Calckey says it does “limited” backfilling too, but I don’t know what state that project is in so I will not be linking to it here.

Further reading on the topic

DISCLAIMER: I am an IT-architect (both on Enterprise/Solution level) in my day job, but I am not a Fediverse developer. There might well be things about this whole problem/challenge where I lack knowledge. I have asked around, but the answers have been few and far between. It could be that I have asked the wrong people. It could be that I haven't asked the right questions. Ironically it can also be that the people I've asked simply don't see my question, due to the issue as described above. The above text is however how I currently understand it. My view of this might change if I get better/more/other information.

I whipped this page together in the time it took me to write it (and I type incredibly fast) and dig out my bookmarks, so I might well have missed something.

Have I missed something? Am I unfair? Do you have opinions?
Feel free to send me a message.