Could be that I'm just really out of touch, but I need at least one paragraph saying "What is this?" (and ideally also, "Who is the target audience?" - or "Why would I be interested in reading this?")
I'm with you on this one. I had to Google all of this for a few minutes. This is my best interpretation below. I hope others will come in and correct my silly mistakes. (Disclaimer: I'm not a programmer and English is my 2nd language):
"So you know how people use BitTorrent to distribute big-budget Hollywood movies with occasional mediocre storylines? What if you could distribute your own data on a similar P2P network, but view it directly in a web browser (among other HTTP clients)? You suddenly turn your seed box into a "web server". Plus, you can stream videos on it too. So your home computer is not just a "web server", it can act as your own "Netflix" server to publish your home movies. And you could do it in a secure fashion. You would need a few protocols (eg IPFS), concepts (eg Merkle Trees), and some freely available software to make it practical, easy, and secure. These tutorials will show you how to get all of those things working together to publish & stream to the world."
Thanks for the recommendation. Genuinely great talk, and you're right that it inspires a lot of use cases. Speaking as someone who had heard about Merkle trees but never looked into the details of what they were about, this talk was absolutely worth watching. Regarding IPFS: there was a question at the end about the difference between IPFS and what he covered, so the speaker does touch on it briefly there.
Most of the seedbox providers already support a good subset of this.
You can view your files in a browser, and you can stream with Plex. You also have full ssh and ftp access, but those are not really the points being made here.
Just stop reading if those words don't mean anything to you.
I don't think this is in the spirit of your typical Hacker News reader. HN readers are naturally curious, but with the volume of what comes across, it can be really hard to decipher if something would be of interest or value just based on the title or page 1.
For me, it's just not possible to read very deeply on everything that seems interesting, and it would be helpful to get a summary sentence or two for posts that are non-trivial.
How about instead "If those words don't mean anything to you, go read their wikipedia entries before continuing"? Not every article on IPFS needs to have its own separate introduction about what IPFS is.
This looks really interesting and I look forward to reading it but IPFS is not the only player in the decentralized web game. One only needs to start typing "IPFS vs" in Google.
I agree, it seems disingenuous to describe IPFS as the only way to implement a decentralized web. For example, in our group we've explored the idea of essentially a decentralized Merkle tree where the document IDs are resolved via P2P gossip, like in early versions of Gnutella.
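For anyone who has heard of Merkle trees but never looked closely (as mentioned elsewhere in the thread), the core idea is tiny. A toy two-leaf example in shell - just the hashing idea, not the gossip scheme I described:

```
# hash each data chunk (these are the leaves)
h1=$(sha256sum chunk1.bin | cut -d' ' -f1)
h2=$(sha256sum chunk2.bin | cut -d' ' -f1)

# a parent node is a hash over its children's hashes; repeating this
# upward yields one root hash that commits to every chunk below it
printf '%s%s' "$h1" "$h2" | sha256sum | cut -d' ' -f1
```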
I agree. IPFS itself has taken a very tacky Silicon-Valley-startup approach to marketing: "IPFS is the Distributed Web". IPFS is distributed and could be used as a web platform. But it's not "the" anything. For me, it's a really big turnoff. Most people involved in the decentralized web movement are very non-commercial. IPFS always seems to have a "shilling for VC funding" taste.
"The Decentralized Web" is the WWW system Tim Berners-Lee invented and that we're all using right now. It relies on the DNS and HTTP protocols. This new IPFS-based decentralized web is interesting but it's not even a measurable percentage of web traffic today. Far from being "The".
What we need more than anything is to actually realize the beautiful dream TBL had for the WWW. HN is one great example but there should be a million more.
No. The web as it works right now is naturally centralised, for a simple reason: a server's bandwidth costs are proportional to the size of its audience. This is why we need YouTube to begin with, instead of millions of people posting their videos on their personal pages.
There are other factors of course: firewalls, asymmetric bandwidth, security issues, technical ignorance… But the client-server model does play a significant role.
If you want a truly decentralised web, you need to modify a couple things: get rid of HTTP, and use a P2P protocol instead. Generalise IPv6 and get rid of NAT. Give people symmetric bandwidth. Make secure software that is simple to implement, simple to use, and hard to misuse (it's not that hard dammit, just look at qmail).
>Make secure software that is simple to implement, simple to use, and hard to misuse (it's not that hard dammit, just look at qmail).
I made a previous comment about this being a hard problem[1]. If even competent, ultra-technical geeks can get hacked, the problem looks unsolvable for the average homeowner who wants to safely run a p2p server.
There are many recurring security+usability problems that nobody has solved in the general case. Examples: the Heartbleed SSL bug that went undetected for 2 years before being fixed, social engineering, homeowner misconfiguration, p2p server data not being backed up, software updates acting as an attack vector that decreases security instead of hardening it, etc.
If my grandmother asked me, "A friend said I can share cooking recipes if I install a p2p food wiki server, what do you think?" I would immediately say "No! Don't install it. I will find you a _centralized_ web recipe forum to log in to!"
I ran qmail on my home server in the 1990s to personally control my SMTP needs, which should theoretically make me the biggest cheerleader for a "p2p food wiki server". Instead, it informs my position to stop my family members from installing p2p software. Think about why I would do that.
The "symmetric bandwidth" isn't the only underlying problem.
>That problem is never worse with P2P protocols.
Everything has tradeoffs -- including p2p. Otherwise, you have an incomplete picture that doesn't consider pros and cons. P2P is worse for latency, worse for analyzing a unified network landscape to identify and filter out hostile actors, worse for non-techie usability, worse for instantaneous propagation of software updates to fix zero-day exploits, worse for homeowner costs, etc. There isn't a "set & forget" p2p appliance you can build that solves all of that. BitTorrent gets around the negatives of p2p because the value of the data users transfer (pirated Star Wars movies, Adobe software, etc.) outweighs the hassles.
> I made a previous comment about this being a hard problem[1].
That comment doesn't address the same problem. I'm fully aware that current software makes it virtually impossible to conveniently and securely host stuff ourselves right now. I am also fully aware that a general purpose computer can never be idiot proof.
What I was saying is, we should write special purpose software dedicated to a few chosen forms of hosting (mail, web, some P2P stuff for big files…). The software can be simple, secure, and easy to use. It just doesn't exist for the most part.
Also, one problem at a time. Wikis are great, but we're talking about allowing untrusted write access to the system, which is possibly the hardest problem of all. Let's start with single-author publishing and build from there.
> This is why we need YouTube to begin with, instead of millions of people posting their videos on their personal page.
People can post videos to their own hosted pages already. We "need" YouTube because 1) people generally don't want to host anything, and 2) it provides connectedness (search, related videos, subscribers, etc).
P2P makes the first issue worse. The second issue is a problem of data federation, not client/server systems.
> People can post videos to their own hosted pages already.
No they cannot. Here's an example: what if I make a nice tutorial, put it on my web site, and submit it here and to r/programming? If the thing is well done, people will download the video, and the sheer number of requests may be enough to render my site unresponsive, simply because I don't have the bandwidth.
So if I ever make a video, there's a good chance I'll host it on YouTube. Despite my reluctance to feed Alphabet. Even though I already operate a server.
> 1) people generally don't want to host anything
People want to publish stuff; that much is obvious. They want to avoid hassles if possible. And they rarely think about what centralised hosting entails.
That doesn't amount to "don't want to host anything". Granted, hosting stuff yourself is a major hassle these days. It doesn't have to be, though.
You can easily get more bandwidth though; it's not that it is somehow forbidden. You just don't want to pay for it. That is what YouTube gives you: 'unlimited' video hosting bandwidth (and improved discoverability) in return for the opportunity to play their ads before your video. The bandwidth costs also don't magically disappear with IPFS; it is just a convenient way to make other people pay for it.
In order of increasing independence you could choose to:
- Host it on youtube, with the benefits and disadvantages you already mentioned.
- Host it on a public S3 bucket and link to it from an HTML5 video player on your web page. You pay for only the storage and bandwidth used. (See the sketch below.)
- Keep the video on your own server, but pay a CDN to cache it for you. Actual bandwidth use of your server is now very low again.
- Host it on a (big) server in a colocation center. You would typically buy bandwidth in advance though, so you want to be able to roughly predict how much you need.
- Contract with a fiber-laying company to lay a dedicated fiber line from the nearest backbone to your home (alternatively, a large microwave link). Buy several 100Gbit NICs to go with it and negotiate peering agreements with the major networks.
Most people don't want the hassle and cost of the latter options, but they are very possible and very legal. In your example with the programming video, if a ton of people want to see it then somewhere along the chain a lot of bandwidth is going to be used and it is _not_ free. All you do is shift who pays for it.
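For the S3 option, the moving parts are small; something like the following might be all that's needed (bucket and file names are made up, and exact flags may vary with AWS CLI versions):

```
# upload the video and make it publicly readable
aws s3 cp tutorial.mp4 s3://my-video-bucket/tutorial.mp4 --acl public-read

# it is then reachable at a URL along the lines of
#   https://my-video-bucket.s3.amazonaws.com/tutorial.mp4
# which an HTML5 <video> tag on your own page can point at
```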
>> People can post videos to their own hosted pages already.
> No they cannot. Here's an example: what if I make a nice tutorial, put it on my web site, and submit it here and to r/programming? If the thing is well done, people will download the video, and the sheer number of requests may be enough to render my site unresponsive, simply because I don't have the bandwidth.
He was already saying that he didn't think hosting videos on one's own page was the best solution for most audiences; in fact, for nearly every audience today. Why are you arguing anyway?
I'm not sure what you're getting at. The problem is only magnified with the web: there's only one seed. If a site is down, it's down, and you can only hope Google cache or the web archive got a copy.
That is only valid when all nodes have an equal chance of being down. Big websites have entire teams dedicated to keeping their site online, so even though it's centralized, youtube is actually a very stable website. A torrent with one seeding peer (or an IPFS object where only one node has it pinned) can potentially disappear if a single machine has a disk failure.
P2P is not like the cloud. It's someone else's computer, but you don't know or care which exact machine you are connecting to, because they are mostly untrusted.
I just made an analogy, didn't say it was the same thing as the cloud.
My point was that in a P2P architecture you must be a server, everyone else must be a server. And being a server is costly. There must be incentives for someone to be a server, and that's not easy. I actually don't remember a single P2P protocol or framework that has solved that problem yet. Perhaps Bitcoin -- for miners only, not for normal nodes.
It appears BitTorrent solved the problem well enough. There's already an incentive to share, because nodes who don't are eventually ignored by the others (if I recall correctly). As for the operating costs… they don't seem very high unless you're operating a tracker. I mean, it's just a matter of running a BitTorrent client, and not even all the time (unless you want to reliably seed something).
The real problem there is the asymmetry of the bandwidth. If we had as much upload as we had download, we wouldn't have any reason not to shoot for a 1:1 ratio.
The traditional WWW has shown its weaknesses with websites that exist at the boundaries of legality like Sci-Hub and the Pirate Bay, or that probe the edges of free speech like Stormfront or forums that support Islamic terrorism.
Fact is unless you're a Tier 1 ISP yourself, you're going to be beholden to an ISP to host your content. Your domain name will be beholden to a registrar. Your TLS certificate will be beholden to a certificate authority. You're going to be subject to every one of these organizations' terms of service, and history has shown with the examples above that they will exercise that authority to pull the plug on you if it suits them (especially if they're facing a mob of public opinion or the state puts a gun to their head).
Agreed. As cool as IPFS is, I only clicked on it because I thought it was going to explain how to scale my Tim Berners-Lee WWW server in a decentralized way (without relying on some big monolithic company). Has anyone seen a primer like that?
As for IPFS, I’d love to see them succeed, but as long as you have to know about merkle trees, hashing, etc for it, it’s not going to be any more mainstream than things like PGP or IRC.
Finally, they claim IPFS is good for archival. To me, that means it will be as good as putting a few hundred paper books in a few hundred different libraries around the world. With that in mind, how does censorship work with IPFS? If the government wants to rewrite history by deleting my archive, how do the mirrors avoid being tracked down, and told to selectively delete the undesirable blobs?
I think we ought to be distinguishing between "decentralized" and "distributed". The original web as we used to know it was decentralized. IPFS is distributed.
In social media: [Mastodon][0] is decentralized. [Patchwork][1] is distributed.
>""The Decentralized Web" is the WWW system Tim Berners-Lee invented and that we're all using right now."
I believe the context of "The Decentralized Web" here is meant to contrast the increasing centralization of content from FB, Google, Medium, login walls pay-walls etc.
The term "decentralized" is used, because TBL's Web evolved into an effectively very centralized system, at the levels of both logical architecture and data ownership.
Sure your laser printer can serve a webpage. But nobody cares! They don't care to the extent that your printer-page will most likely not even be accessible, given the default settings on your or your ISP's routers.
In practice, most of your - and everyone else's - web traffic goes through the same few services. It doesn't matter that each of those services is made from a thousand servers - logically, they form a single unit, and the web has mostly a star topology. This leads to ridiculously stupid amounts of waste at the endpoints: if you and I are sitting in the same room and both want to watch the same funny cat video, we both download gigabytes off YouTube, even though after you downloaded the video I should be perfectly able to stream it off your machine via LAN. We use CDNs these days, which are literally a top-down, dictatorship-style attempt at forcing some minimum distribution into this centralized system.
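Content addressing is what would make that same-room scenario work: the address is derived from the bytes themselves, so the same video has the same address no matter who added it. A rough illustration with the ipfs CLI (file name is hypothetical; if I recall correctly, --only-hash just computes the identifier without storing anything):

```
# both of us get the identical content identifier (CID) for the identical
# file, so my copy can satisfy your request for that CID over the LAN
ipfs add --only-hash funny-cat.mp4
```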
The web is centralized. Your comparison with writing a story would be more accurate like this: it's like someone writing a story about "modern slavery" that's really about wage slavery. Yes, the social phenomenon of actual slavery has pretty much disappeared, but we've replaced it with something similar with similar issues, only one level of abstraction higher.
So out of the one sentence that I wrote, you decided to ignore the part where I said that decentralization was referring to the centralization of content? Another word for centralization is consolidation.
You might want to look up the term "walled garden":
I didn’t see the word “content” on first reading. I never have encountered Google or FB created content (other than open source code), and I also don’t use their news aggregators (or other services), so (with the exception of YouTube), I don’t think of them as being in the content game at all. Anyway, I’ve been happily ignoring their walled gardens for decades, and using the decentralized web for news, etc.
Certainly, they’re in the infrastructure + surveillance business, and (especially for Google) that’s concerning.
Also, “Decentralized” is a technical term. Google runs a large, decentralized infrastructure, based on web technologies.
“Consolidation” is usually used as a business term, and is orthogonal to centralizing operations. For instance, many companies have outsourced payroll to one centralized company. That does not mean their industries are consolidating.
When people intentionally use words wrong, it is bound to lead to confusion.
How so? I see how "the" DNS is centralized in ICANN's list of root servers, but from a technical standpoint isn't the use of that list just a convention?
By this definition of "centralized", nothing is centralized. For DNS to have any meaning--particularly in the way it is used by HTML--requires there to be a person whose job is to run a server that maps "name a human enjoys using" to "name that a computer can connect to" in a way that is stable over time. To the extent to which DNS "the protocol" (as opposed to DNS "the concept") is incapable of enforcing that everyone uses the same servers (as no protocol can force that; again: with this definition, nothing is centralized), we have been building out cryptographic mitigations into other protocols (such as TLS and X.509) that act as a form of checksum against DNS failing to be used to access the one shared reality. Yes: a user can opt to use a totally different "shadow Internet" by using different DNS root servers and different CA root servers, but that is true of any protocol.
I'm not talking about just anchoring a hierarchy at an alternative root. What I mean is that as far as I know, nothing (DNSSEC notwithstanding) really prevents breaking away from the strict hierarchical model altogether and doing something like a system with web-of-trust or filtered by heuristics. From there it's possible to think about how to build a genuine consensus beyond "ICANN says these are the root servers; who am I to argue?". For TLS (DANE notwithstanding), I very well might misunderstand the situation but I thought it only mattered that the client, server, and CA agree on the server's name, not that they agree on a particular delegation of authority for assigning that name.
How hard to censor is IPFS? It would be quite disappointing if we managed to switch everyone over to IPFS, and in the end, after all that effort (and the compromises we'd have to make compared to simply using centralized content), governments would remain quite effective at censoring IPFS content.
Could ISPs throttle or block IPFS content, for instance? The developers should start by assuming some extremely aggressive environments, like China, and go from there, because honestly, we don't know whether 20 years from now a lot more governments will see China's censorship as a role model to follow. We're already seeing multiple countries in Europe considering it to stop "extremist content", the ISPs in Canada are now banding together to block pirated content, and the US is probably not going to be too far behind after the repeal of the net neutrality rules. And those are just the "democratic" countries. It's obviously much worse for Middle Eastern or African countries already.
So I just hope the IPFS developers will always try to develop the platform from an extreme resilience point of view.
A dev for Ethereum's Swarm, a similar project, wrote the following comparing with IPFS:
"Swarm has a very strong anti-censorship stance. It incentivizes content agnostic collective storage (block propagation/distribution scheme). Implements plausible deniability with implausible accountability through a combination of obfuscation and double masking (not currently done). IPFS believes that wider adoption warrants compromising on censorship by providing tools for blacklisting, source-filtering though using these is entirely voluntary."
From what I gathered by reading the whitepaper, mass censorship is extremely hard to accomplish.
>Could ISPs throttle or block IPFS content, for instance?
They wouldn't know what was IPFS and what was regular encrypted TCP; the only way I can tell IPFS traffic apart in Wireshark is the port number. Once ISPs start blocking random encrypted connections, people will get mad when their games or other applications stop working.
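If you want to see what your own node exposes on the wire, the CLI can show it (a sketch; the default swarm port, 4001, is about the only thing a port-based filter could key on):

```
# addresses and port the node listens on for peer traffic
ipfs config Addresses.Swarm

# number of currently open (encrypted) peer connections
ipfs swarm peers | wc -l
```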
If we are talking about bad actors in the IPFS network, I recommend you read their whitepaper. They lay out a clever karma-based system for incentivizing nodes to help others and punishing nodes that misbehave.
> They wouldn't know what was IPFS and what was regular encrypted TCP
The extreme conclusion of state-sponsored censorship is not a black list or DPI. It is a white list. It is easy to restrict traffic to a small set of addresses that are rigorously monitored for any ability to proxy non-conforming traffic. Everything else would get dropped on the floor.
IPFS and the like sit too high in the OSI layers to combat state sponsored censorship in the long-run.
It makes me wonder what the target market is. If this ever got big, it would just trigger the next step in an arms race it is incapable of sustaining.
I’d think it would support grafting in chunks of the Merkle DAG via USB drives, etc. Otherwise, I don’t understand the claims about disconnected operation and breaking the dependency on the internet backbone.
I don't have any experience with IPFS so I just downloaded the client and ran `ipfs daemon`. It's been using 500-1500Kbps down and 200-500Kbps up for the last 15min or so. It also has 330+ open connections. Is that expected? It seems a bit aggressive for idling. No message on the console indicating what exactly it's doing either.
By default, every node acts as a DHT server. If you have an easily reachable node, this is a fairly typical load. You can opt out of serving DHT requests by passing the --routing=dhtclient parameter to the daemon.
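That is, assuming a stock go-ipfs install:

```
# act as a DHT client only: the node still resolves content, but stops
# answering other nodes' DHT queries, which cuts idle traffic and connections
ipfs daemon --routing=dhtclient
```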
Decentralization is frequently sold as a means against censorship: if we use a decentralized system such as IPFS, we don't need to have a DNS hierarchy to serve content, so it is no longer viable to block a particular domain.
But as long as we have ISPs and a common communications architecture, if we start using a content-addressable system, doesn't that help censorship? As a censor, I jump from having to censor all domains that may serve one particular document (which is difficult, as we can see with the Pirate Bay), to just having to force ISPs to block URLs with the hash of the document that I want to censor.
So we go from having to jump among domains, to having to jump among content hashes, which seems much less practical, doesn't it?
Until we have some sort of mesh network with efficient cache systems, the decentralization topic seems (to me) to be providing answers to the wrong questions.
I think by default IPFS also doesn't even replicate content. So you don't even have to block hash lookups in the whole network but just take down the one host that currently has the only replica.
Pretty sure it used to be that you had to deliberately pin content that you want to share on your node. Otherwise, nodes that accessed it will throw it out of their cache if they don't need it anymore.
Maybe they changed the behavior in the meantime though, but IPFS didn't permanently replicate uploaded content in the past without some deliberate user action.
Yes, you have to pin content to keep it, but you download it and begin serving it automatically every time you load something. You can continue serving it for a while afterwards -- especially if you're not downloading many other things.
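Roughly how that plays out on the command line (the hash is a placeholder):

```
# keep a copy permanently; pinned content survives garbage collection
ipfs pin add QmSomeContentHash

# anything merely cached from browsing is dropped on the next GC run
ipfs repo gc

# list what your node is committed to storing
ipfs pin ls --type=recursive
```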
The canonical way of getting content on IPFS isn't through some HTTP gateway, but by running a node yourself. In that case there's no way for an ISP to block anything.
One thing I've never understood is how IPFS would serve dynamic content. Imagine building amazon.com on IPFS: how would that work? Every single page served would have a different hash, because the content is always changing.
One of them is us (https://github.com/amark/gun). David reached out the other week, and now we're trying to find time to discuss with Juan how best to integrate.
Thanks! GUN is already a CRDT (it is a generalizable CRDT on graphs), which is why IPFS is chatting with us. Our CRDT has an emergent property of composability though, which lets other specialized CRDTs be implemented on top in just a few lines of code (take this example, of a counter CRDT: https://github.com/amark/gun/wiki/snippets-(v0.3.x)#counter ).
Here's a pretty cool talk on the subject as well by Andre Staltz. It's nothing groundbreaking, but just simple facts he puts on the table to show what the current situation is.
Thanks for sharing this talk - it inspired me to study and get into these new developments more; now I'm ready to read in depth the book Decentralized Web Primer.
The biggest advantage of centralization is the ability to delete something from the normie web and have it actually stay deleted.
Granted, nowadays you have things like the internet archive, but I don't see normal people going along with IPFS for dynamic content like social media as long as you cannot delete things.
Yeah, one of the problems with content-addressable systems like IPFS is that they don't support the "right to be forgotten". And I've heard a lot of big supporters of content-addressable systems arguing that's a good thing, because links never break. But I don't like a world where I can't take down my own blog post[0], only unlink it from my home page.
[0]: Yes, I'm aware anyone can host a copy of my deleted page. But it will be served on a different URL, and it takes extra work to do, unlike IPFS or similar.
Not sure, but I didn't get any information about how the data is stored between the nodes. Does each node keep the full state and synchronize when connected to the network (similar to a blockchain), or is the data split between available nodes? Could someone clarify the storage part?
Well, my personal project as a voracious reader and critical thinker does need censorship resistance on behalf of a diversity of authors who wish to publish their thinking.
You can still consider learning about it and supporting it if you dislike the wastefulness/inefficiency of present-day web, or hate how fragile and ephemeral it is.
How would the inefficiency of the modern web be alleviated by replicating the same data on more nodes? That _increases_ wastefulness, not decreases it.
Replication is cheap. When the data travels over the wire, it's being constantly "replicated" at every hop (data doesn't really "travel", it's just being copied over and deleted afterwards). When you send data halfway around the planet and back, you're doing a lot of copying. The IPFS model makes it so that after you do that, other people around you don't have to do the same long round trip to get the same file.