Can the fediverse scale up indefinitely?

Merulox@lemmy.world · edit-2 1 year ago

Can the fediverse scale up indefinitely?

wyzewyz@programming.dev · 1 year ago

I probably misunderstand how the fediverse works, but my worry is that the small instances won’t be able to hold an ever-growing amount of data forever.

Let’s pretend you run a small Lemmy instance (~100 users).

If you federate with a large instance, you (i.e. your instance) will only receive new posts from communities that your users subscribe to, or users that your users follow [1]. These are deduplicated, in the sense that if all 100 of your users subscribe to the same community, you only need to download and store one copy of that community’s posts in your database.

[1] AFAICT. The current implementation of Lemmy seems to handle federation using the activitypub_federation crate. I skimmed the docs of that crate, but they aren’t 100% clear about this.
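To make the deduplication point concrete, here is a minimal Python sketch of a small instance's local store. It is a toy model under stated assumptions, not Lemmy's actual schema or the activitypub_federation crate's API: the class `SmallInstance`, its methods, and the community names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SmallInstance:
    """Toy model of a small instance's local store (hypothetical, not Lemmy's schema)."""
    # community name -> set of local users subscribed to it
    subscriptions: dict[str, set[str]] = field(default_factory=dict)
    # post id -> post body; one entry per post, however many local users follow the community
    posts: dict[str, str] = field(default_factory=dict)

    def subscribe(self, user: str, community: str) -> None:
        self.subscriptions.setdefault(community, set()).add(user)

    def receive(self, community: str, post_id: str, body: str) -> bool:
        """Store an incoming post once; return True only if it was newly stored."""
        if not self.subscriptions.get(community):
            return False  # nobody here follows this community, so nothing is kept
        if post_id in self.posts:
            return False  # already stored, so a repeat delivery adds nothing
        self.posts[post_id] = body
        return True


instance = SmallInstance()
for n in range(100):
    instance.subscribe(f"user{n}", "technology@lemmy.world")

instance.receive("technology@lemmy.world", "post-1", "Hello fediverse")
instance.receive("technology@lemmy.world", "post-1", "Hello fediverse")  # repeat delivery
instance.receive("beans@lemmy.world", "post-2", "Unsubscribed community")

print(len(instance.posts))  # 1
```

With 100 local subscribers to the same community, the store still holds a single copy of the post, and the post from the community nobody follows is never kept.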

the posts I’m posting here today might get lost in time because the instances that annex it will have shut down by then?

You have the same problem with any data you put online anywhere: The people currently keeping your stuff online might delete it anytime they decide it’s not worth the trouble to keep it online.

If it’s important to you that certain information stays online, keep a copy on a disk in your house; check back periodically to be sure it’s still online, and if it’s not, you can always use the copy in your house to put it online again somewhere else. If it’s very important to you, keep multiple copies on multiple disks hosted by multiple companies on different continents.
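The "check back periodically" step could be automated along these lines. This is only a sketch: the helper names `fetch_url` and `missing_online` and the example.org URLs are made up for illustration, and it only checks that a page can still be retrieved, not that its content is unchanged.

```python
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_url(url: str) -> Optional[bytes]:
    """Return the page body, or None if it cannot be retrieved."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return None


def missing_online(urls: Iterable[str],
                   fetch: Callable[[str], Optional[bytes]] = fetch_url) -> list[str]:
    """Return the URLs that can no longer be fetched and may need re-uploading."""
    return [url for url in urls if fetch(url) is None]


# Demo with a stand-in for the network, so the example runs offline.
fake_web = {"https://example.org/my-post": b"still here"}
gone = missing_online(
    ["https://example.org/my-post", "https://example.org/old-post"],
    fetch=fake_web.get,
)
print(gone)  # ['https://example.org/old-post']
```

Anything the check reports as missing is what you would restore from the copy on your own disk.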

50 years from now on

Predicting what will happen in tech in 50 years is a pretty daunting challenge.

50 years ago, in 1973, all the computers on the ARPAnet (the predecessor of the Internet) could be easily listed on a single piece of paper. The home computer was still years from birth. The Intel 8080, Motorola 6800, MOS Technology 6502 and Zilog Z80, which would play key roles in early home computers and gaming consoles, were still one to three years from release.

Merulox@lemmy.world · 1 year ago

All the answers I got were very useful and informative, but this one is definitely the one that catered the most to my worries.

SolidGrue@lemmy.world · edit-2 1 year ago

Mostly serious answer: the current implementation is not going to scale effectively with growth. The software implementation is still rough around the edges, and the ActivityPub protocol probably needs more knobs to handle bulk data synchronization. Within the service, moderation is a serious challenge with many unanswered questions.

Likewise, the back end software implementation is monolithic, meaning it’s one software stack that does everything from sign-in to subscriptions to synchronization and scheduling. Housekeeping and garbage collection probably aren’t that tight, either. This is mostly speculation, based on watching things over the last couple of weeks of growth.

I believe the data store is based on the Postgres RDBMS, which, while robust and scalable, is fussy and needs tuning when turning over large amounts of highly unique data.

None of this is an indictment of the devs! Rather the opposite, because the software IS chugging along while experiencing tremendous growth.

I expect that over time the back end will devolve into microservices that communicate over a highly scalable or stream-based message bus. Larger instances could probably also benefit from static caching and CDN techniques to keep pages loading quickly even while the back end thrashes.

The structure of the ecosystem needs to strike a balance between fewer large instances and many, many small instances. In the first scenario, the scaling limit is in the monolithic stack, which introduces I/O bottlenecks and serialization delays (even if massively threaded). In the latter scenario, message state and synchronous distribution become challenging because a full mesh of federations could scale faster than network state tables have room to support. Some middle tier might be needed, and I have no idea what that might even look like.
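For a rough sense of why the many-small-instances scenario gets hard: if every instance federated directly with every other one, the number of pairwise links would be n(n-1)/2, so it grows roughly with the square of the instance count. A quick Python sketch (the function name `full_mesh_links` is made up for illustration, and real federation is sparser than a full mesh):

```python
def full_mesh_links(instances: int) -> int:
    """Pairwise links if every instance federates with every other one."""
    return instances * (instances - 1) // 2


for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} instances -> {full_mesh_links(n):>12,} links")
```

Ten instances need 45 links, while 10,000 instances would need 49,995,000, which is the kind of growth a middle tier would have to absorb.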

So to answer your question, can it scale indefinitely? Probably not, because we hit scaling limits pretty quickly on a number of dimensions. Nevertheless, smart people are starting to hang out here, and I expect they will take an interest in how it all works. Improvement is inevitable, and I think the early roadblocks will be overcome easily enough.

Edit to add: I’m a systems engineer in my day job but I work adjacent to the applications teams. The preceding commentary is just (un-)educated guesswork on my part.

marsokod@lemmy.world · 1 year ago

The world produces 15 Mt of beans every year. The average shitpost with beans has 700 g of beans in it. This means Lemmy can scale to around 21 billion shitposts/year. We have some margin.

#shitpost
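For anyone who wants to check the bean budget, the arithmetic is easy to reproduce in Python. The two input figures are taken from the comment as given, not from any verified source, and the variable names are just for illustration:

```python
BEANS_PER_YEAR_TONNES = 15_000_000  # 15 Mt, figure as quoted in the comment
BEANS_PER_SHITPOST_GRAMS = 700      # figure as quoted in the comment
GRAMS_PER_TONNE = 1_000_000

# Whole shitposts that the yearly bean supply can cover
shitposts_per_year = BEANS_PER_YEAR_TONNES * GRAMS_PER_TONNE // BEANS_PER_SHITPOST_GRAMS
print(f"{shitposts_per_year:,}")  # 21,428,571,428
```

That is a little over 21 billion bean shitposts per year.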

Bearded_Baguette@lemmy.world · 1 year ago

This math checks out. I ran it through the bean calculator using OpenBeanAI. 32.33% of the simulations show these numbers.

r00ty@kbin.life · 1 year ago

Repeating of course.

WarmSoda@lemm.ee · 1 year ago

Of course

BlameThePeacock@lemmy.ca · 1 year ago

Smaller instances don’t grab everything from every other server. They only grab data from other servers when their users are subscribed to specific communities. I also suspect they don’t grab all historical data automatically (though I don’t know how much they do grab by default).

Right now there’s no migration tool for when instances shut down, but it should be technically possible; someone just needs to implement it.

blueshades@lemmy.world · 1 year ago

The Fediverse needs to encourage different instances. It’s the only way it can work. It has the technical framework to do it, and for it to be transparent to the end user, but I feel like it’s not there yet.

For example, I think users should be strongly encouraged to choose regional instances instead of lemmy.world (I know, I know, ironic coming from me). It should be the default and require the user to go out of their way to select a different instance. It should also be concisely explained that your instance doesn’t matter and that you can see any other federated instance. Yes, this is not always true, but it doesn’t matter to someone just joining. Let them get here first and then they’ll naturally learn about the intricacies. Don’t scare them away at the gates.

Monkey With A Shell@lemmy.socdojo.com · 1 year ago

I guess my answer is, who cares? Don’t treat a social media account as some immortal time capsule of your life. Keep a photo album, write some diary entries, but don’t rely on any form of social media to be the historical record of your existence. If it’s important, keep it somewhere you can ensure its preservation.

I’m pretty sure the world will continue long after we’ve forgotten beans and not pooping for X days.

Merulox@lemmy.world · 1 year ago

I needed to be reminded of this, thanks.

Still, Reddit is probably the biggest and most accessible source of information in the world, written out of passion by people, experts, professors, neckbeards… trolls… uni students, researchers, and I wish Lemmy could also become the archive that Reddit is. But if information has a high likelihood of getting lost with time, why bother? It should then really only be treated as a very temporary social media which is… okay, I guess.

NatureBoyFlickRair@lemmy.fmhy.ml · 1 year ago

Everything is temporary. Nothing is permanent. Embrace it and live in the now.

Flemmy@lemmy.world · 1 year ago

It’s weird to think about, but data has a shelf life. Software needs to grow and be pruned regularly, or it dies.

Social media is both: the data dump is useless without an ecosystem of tools around it, and if the data itself stops interacting with the zeitgeist of the parent society, it basically becomes an old journal. It’s interesting to a very specific group of people, and literally no one else wants to see it (aside from a few gems picked out and cleaned up for public consumption).

At any point we could go back to Reddit’s explosion after the Digg migration. We could pull up posts that mirror exactly what’s happening now. It’d be interesting for sure, and there are days’ worth of then-and-now posts that people could be making… but instead we just have people telling us about their memories of that process.

Why? Because that data is old and stale. You’d have to hunt it down with tools not intended for it, filter out the best of it, fix broken links, and probably put it through a slur filter.

Candelestine@lemmy.world · 1 year ago

No, after a sufficient amount of time has passed, we would run out of useable matter and energy in the universe. This theorized end-state of heat death puts a finite cap on the size of the Fediverse.

Constrained to Earth, it’d probably be fine. Though I do see it splintering eventually, with sub-communities existing independently from the main organism.

BlushedPotatoPlayers@terefere.eu · 1 year ago

But would it work with spherical servers in vacuum?

Miqo@lemmy.world · 1 year ago

Time to invent the Dyson Server!