Do you really need a database for this? On a unix system, you should be able to:...

MontyCarloHall · 2025-10-24T12:29:30 1761308970

How would you implement things like version history or shareable URLs to files without a database?

Another issue would be permissions: if I wanted to restrict access to a file to a subset of users, I’d have to make a group for that subset. Linux supports a maximum of 65536 groups, which could quickly be exhausted for a nontrivial number of users.

Wicher · 2025-10-24T13:08:04 1761311284

As for the permissions, using ACLs would work better here. Then you don't need a separate group for every grouping.
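A minimal Python sketch of what that could look like, assuming a Linux filesystem mounted with POSIX ACL support and the standard `setfacl` tool installed. The helper names `setfacl_command` and `share_with` are hypothetical. Each named user gets their own ACL entry, so no group has to be created per sharing set.

```python
import subprocess


def setfacl_command(path: str, users: list[str], perms: str = "r") -> list[str]:
    """Build a setfacl invocation adding one named-user ACL entry per user."""
    for user in users:
        # ',' separates entries and ':' separates fields, so either would corrupt the spec.
        if not user or any(ch in user for ch in ",:"):
            raise ValueError(f"unsafe user name: {user!r}")
    entries = ",".join(f"u:{user}:{perms}" for user in users)
    return ["setfacl", "-m", entries, path]


def share_with(path: str, users: list[str], perms: str = "r") -> None:
    # Needs the setfacl tool and a filesystem mounted with POSIX ACL support.
    subprocess.run(setfacl_command(path, users, perms), check=True)
```

For example, `share_with("/srv/drive/report.pdf", ["alice", "bob"])` would run `setfacl -m u:alice:r,u:bob:r /srv/drive/report.pdf`.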

MontyCarloHall · 2025-10-24T13:40:14 1761313214

TIL about ACLs! I think that would nicely solve the group permission issue.

technothrasher · 2025-10-24T14:31:28 1761316288

The final project for my senior year filesystems class thirty years ago was to implement ACLs on top of a SunOS 4 filesystem. That was a fun project.

johnisgood · 2025-10-24T18:10:44 1761329444

Write up? Code? :D

thebeardisred · 2025-10-24T19:48:05 1761335285

Then let me also introduce you to extended attributes, aka xattrs. That's how the data for SELinux is stored.

westurner · 2025-10-25T03:06:50 1761361610

There is no support for writing multiple xattrs in one transaction.

There is no support for writing multiple xattrs and file contents in one transaction.

Journaled filesystems that immediately flush xattrs to the journal do have atomic writes of single xattrs, so you'd need to stuff all the data into one xattr value and serialize/deserialize it (with e.g. JSON, or potentially Arrow IPC with Feather ~mmap'd from xattrs). (Edit: but getxattr() doesn't support mmap. And xattr storage limits: ext4: 4K, XFS: 64K, Btrfs: 16K.)
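A minimal Python sketch of the "everything in one xattr" approach, assuming Linux, a filesystem that allows `user.*` attributes, and a hypothetical attribute name `user.drive.meta`. All metadata is serialized into a single JSON value so that each update is one `setxattr` call. The 4096-byte cap is an assumption chosen to match the smallest limit quoted above (ext4). Whether a single call is atomic depends on the filesystem, as the comment notes.

```python
import json
import os

XATTR_NAME = "user.drive.meta"  # hypothetical attribute name
MAX_XATTR_BYTES = 4096  # assumed cap, matching the smallest limit quoted above (ext4)


def encode_metadata(meta: dict) -> bytes:
    """Serialize all metadata into one compact JSON blob, refusing oversized values."""
    blob = json.dumps(meta, sort_keys=True, separators=(",", ":")).encode("utf-8")
    if len(blob) > MAX_XATTR_BYTES:
        raise ValueError(f"metadata is {len(blob)} bytes, limit is {MAX_XATTR_BYTES}")
    return blob


def decode_metadata(blob: bytes) -> dict:
    return json.loads(blob.decode("utf-8"))


def write_metadata(path: str, meta: dict) -> None:
    # One setxattr call replaces the whole value; no multi-attribute transaction needed.
    os.setxattr(path, XATTR_NAME, encode_metadata(meta))


def read_metadata(path: str) -> dict:
    try:
        return decode_metadata(os.getxattr(path, XATTR_NAME))
    except OSError:
        # Attribute missing, or xattrs unsupported on this filesystem.
        return {}
```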

Atomicity (database systems) https://en.wikipedia.org/wiki/Atomicity_(database_systems)

skydhash · 2025-10-24T12:42:52 1761309772

Backup files the way Emacs, Vim, ... do it: a consistent scheme for naming the copies. As for shareable URLs, they could be links.

The file system is already a database.
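Emacs's numbered backups, for instance, name successive copies `notes.txt.~1~`, `notes.txt.~2~`, and so on. A minimal Python sketch of that naming scheme used as a crude version history, assuming backups live next to the original file. The helper names are hypothetical.

```python
import re
import shutil
from pathlib import Path


def _backup_numbers(path: Path) -> list[int]:
    """Return the existing backup numbers N for siblings named '<name>.~N~'."""
    pattern = re.compile(re.escape(path.name) + r"\.~(\d+)~$")
    numbers = []
    for sibling in path.parent.iterdir():
        match = pattern.match(sibling.name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def save_backup(path: Path) -> Path:
    """Copy the current file to the next numbered backup and return its path."""
    numbers = _backup_numbers(path)
    next_n = numbers[-1] + 1 if numbers else 1
    backup = path.with_name(f"{path.name}.~{next_n}~")
    shutil.copy2(path, backup)  # copy2 also preserves timestamps
    return backup


def list_versions(path: Path) -> list[Path]:
    """All backups of `path`, oldest first."""
    return [path.with_name(f"{path.name}.~{n}~") for n in _backup_numbers(path)]
```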

edweis · 2025-10-24T13:01:30 1761310890

OK, so this product will be for projects with fewer than 65k users.

For naming, just name the directory the same way on your file system.

Shareable URLs can be a hash of the path with some kind of HMAC to prevent scraping.
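A minimal Python sketch of such a signed share link, assuming a hypothetical server-side secret `SECRET_KEY` and hypothetical `path`/`expires`/`sig` query parameters. A plain hash can't be turned back into a path without a lookup table, so this sketch carries the path in the URL and uses the HMAC as a tamper-proof signature over it. The expiry timestamp is an addition, not part of the comment.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"change-me"  # hypothetical server-side secret


def _signature(path, expires):
    # Newline separator is unambiguous because `expires` is an integer.
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_share_url(base, path, ttl_seconds=7 * 24 * 3600, now=None):
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"path": path, "expires": expires, "sig": _signature(path, expires)})
    return f"{base}?{query}"


def verify_share_url(url, now=None):
    """Return the file path if the signature is valid and unexpired, else None."""
    params = parse_qs(urlsplit(url).query)
    try:
        path = params["path"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return None
    if not hmac.compare_digest(sig, _signature(path, expires)):
        return None  # tampered path or expiry
    if (time.time() if now is None else now) > expires:
        return None  # link has expired
    return path
```

The path remains readable in the URL; encrypting it instead of signing it would hide it.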

Yes, if you move a file, you can leave a symlink at the old path to preserve it.
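A minimal Python sketch of moving a file while leaving a symlink behind, assuming source and destination are on the same filesystem and that whatever serves the files follows symlinks. `move_with_redirect` is a hypothetical name.

```python
import os
from pathlib import Path


def move_with_redirect(src: Path, dst: Path) -> None:
    """Move src to dst and leave a symlink at src so old links still resolve."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    os.replace(src, dst)  # atomic rename when src and dst share a filesystem
    # A relative target keeps the link valid if the whole tree is relocated.
    os.symlink(os.path.relpath(dst, src.parent), src)
```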

conception · 2025-10-24T12:36:58 1761309418

Encode paths by algorithm/encryption?

MontyCarloHall · 2025-10-24T12:46:59 1761310019

This wouldn’t be robust to moving/renaming files. It also would preclude features like having an expiration date for the URL.

conception · 2025-10-25T14:07:19 1761401239

Well, sure, there’s a bevy of features you’re missing out on, but it would work. An object store plus file metadata solves both of those, though it feels like cheating.

edweis · 2025-10-24T13:01:53 1761310913

Use a symlink in that case to keep the redirect.

ajross · 2025-10-24T13:05:45 1761311145

> How would you implement things like version history

Filesystem or LVM snapshots immediately come to mind

> or shareable URLs to files without a database?

Uh... is the path to the file not already a URL? URLs are literally an abstraction of a filesystem hierarchy already.
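If URL paths map straight onto the filesystem, the main thing the server has to get right is refusing paths that escape the shared directory. A minimal Python sketch, assuming a single shared root directory; `url_to_file` is a hypothetical name, and `Path.is_relative_to` needs Python 3.9+.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def url_to_file(root: Path, url: str) -> Path:
    """Map a URL's path component onto a file under `root`, rejecting escapes."""
    relative = unquote(urlsplit(url).path).lstrip("/")
    candidate = (root / relative).resolve()
    # resolve() collapses ".." and follows symlinks; refuse anything outside the root.
    if not candidate.is_relative_to(root.resolve()):
        raise PermissionError(f"{url!r} points outside {root}")
    return candidate
```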

QuantumNomad_ · 2025-10-24T13:52:18 1761313938

> Filesystem or LVM snapshots immediately come to mind

I use ZFS snapshots and like them a lot for many reasons. But I don’t have any way to quickly see individual versions of a file without having to wade through a lot of snapshots where the file is the same because snapshots are at filesystem level (or more specifically in ZFS, at “dataset” level which is somewhat like a partition).

And also, because I snapshot at set intervals, there might be a version of a file that I wanted to go back to but which I don’t have a snapshot of at that exact moment. So I only have history of what the file was a bit earlier or a bit later than some specific moment.

I used to have snapshots trigger automatically every 2 minutes, with snapshot cleanup triggering automatically hourly, daily, weekly and monthly. In that setup there was a fairly high chance that, if I made a mistake while editing a file, I also had a version of it from right before the edit, as long as I discovered the mistake right away.
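The hourly/daily/weekly/monthly cleanup described above is a "keep the newest snapshot per time bucket" retention policy. A minimal Python sketch, assuming snapshots are identified only by their creation time; the limits are arbitrary examples, and actually deleting the pruned snapshots (e.g. with `zfs destroy`) is left out.

```python
from datetime import datetime


def _bucket(ts: datetime, period: str) -> tuple:
    """Key identifying which hour/day/ISO week/month a snapshot falls in."""
    if period == "hourly":
        return (ts.year, ts.month, ts.day, ts.hour)
    if period == "daily":
        return (ts.year, ts.month, ts.day)
    if period == "weekly":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # ISO year and week number
    if period == "monthly":
        return (ts.year, ts.month)
    raise ValueError(f"unknown period: {period}")


def snapshots_to_keep(snapshots: list[datetime], limits: dict[str, int]) -> set[datetime]:
    """Keep the newest snapshot in each of the most recent N buckets per period."""
    keep = set()
    newest_first = sorted(snapshots, reverse=True)
    for period, count in limits.items():
        buckets_seen = set()
        for ts in newest_first:
            key = _bucket(ts, period)
            if key in buckets_seen:
                continue  # a newer snapshot already represents this bucket
            if len(buckets_seen) == count:
                break  # older buckets fall outside this period's window
            buckets_seen.add(key)
            keep.add(ts)
    return keep
```

Anything not in the returned set would be a candidate for deletion.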

These days I snapshot automatically a couple of times per day and clean up every few months with a few keystrokes, mainly because the files I currently store on the servers don’t need such fine-grained snapshots.

Anyway, the point is that even if you snapshot frequently, it’s not going to be particularly ergonomic to find the version you want. So maybe the “Google Drive” UI would also have to check each snapshot to see whether the file was actually modified, and only show the ones where it was. And even then it might not be the greatest experience.
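That per-snapshot check could look roughly like this minimal Python sketch. It assumes a ZFS dataset whose snapshots are reachable under `<mountpoint>/.zfs/snapshot/<name>/`, and it orders copies by the file's mtime as preserved inside each snapshot; `distinct_versions` is a hypothetical name. Consecutive snapshots with identical content collapse into one version, so a revert to older content still shows up as a change.

```python
import hashlib
from pathlib import Path


def _digest(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def distinct_versions(mountpoint: Path, relpath: str) -> list[Path]:
    """Return one snapshot copy per actual change of the file, oldest first."""
    snapdir = mountpoint / ".zfs" / "snapshot"
    copies = [s / relpath for s in snapdir.iterdir() if (s / relpath).is_file()]
    # Order by the file's own mtime, then by snapshot path as a tie-breaker.
    copies.sort(key=lambda p: (p.stat().st_mtime, str(p)))
    versions, last = [], None
    for copy in copies:
        digest = _digest(copy)
        if digest != last:  # skip snapshots where the content did not change
            versions.append(copy)
            last = digest
    return versions
```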

vablings · 2025-10-24T19:25:04 1761333904

If you are on Windows with a Samba share hooked up to ZFS, you can actually use "Previous Versions" in File Explorer for a given folder, and your snapshots will show up :) There are some guides online for setting it up.

ramses0 · 2025-10-24T12:44:49 1761309889

Take a look at Cockpit, because if there were one, that's where it "should" be.

https://cockpit-project.org/applications

--

    With no command line use needed, you can:

    Navigate the entire filesystem,
    Create, delete, and rename files,
    Edit file contents,
    Edit file ownership and permissions,
    Create symbolic links to files and directories,
    Reorganize files through cut, copy, and paste,
    Upload files by dragging and dropping,
    Download files and directories.

motorest · 2025-10-24T12:07:42 1761307662

> Do you really need a database for this?

I have no idea how this project was designed, but a) it's reasonable to expect that disk operations can and should be cached, and b) syncing file shares across multiple nodes can easily involve storing metadata.

In either case, once you realize you need to persist data, you'd be hard pressed to justify not using a database.
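For point a), one cheap form of caching is to keep directory listings in memory and invalidate them when the directory's mtime changes. A minimal Python sketch with a hypothetical `DirListingCache` class. A directory's mtime changes when entries are added, removed, or renamed, not when a file's contents change, and coarse timestamp granularity could hide two changes made within the same tick.

```python
import os


class DirListingCache:
    """Cache directory listings, invalidated when the directory's mtime changes."""

    def __init__(self):
        self._cache = {}  # path -> (mtime_ns, sorted entry names)

    def listdir(self, path: str) -> list[str]:
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is None or cached[0] != mtime:
            # Cache miss, or the directory changed since we cached it: re-read from disk.
            cached = (mtime, sorted(os.listdir(path)))
            self._cache[path] = cached
        return list(cached[1])  # copy so callers can't corrupt the cache
```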

benrutter · 2025-10-24T12:05:32 1761307532

I don't know of one. I've thought about this before, but with Python and fsspec. Having a Google Drive-style interface that can run on local files, or on any filesystem of your choice (SSH, S3, etc.), would be really great.

XorNot · 2025-10-24T12:23:54 1761308634

I'm unironically convinced that a basic Samba share with Active Directory ACLs is actually probably the best possible storage system... but the UI for managing permissions sucks, and most people don't have enough access to set it up the way they want.

Like, broadly, for all the configuration HashiCorp Vault makes you do, you can achieve a much more useful set of permissions with a Samba file share and ACLs (it certainly makes it easy to grant targeted access to specific resources, and with IIS and Kerberos you even have an HTTP API).

nodesocket · 2025-10-24T12:12:44 1761307964

Perhaps they are using MongoDB GridFS instead of storing files on disk.

WesolyKubeczek · 2025-10-24T12:59:27 1761310767

I should point out that the days when a service's tenant (be it a file, an email account, or anything else) automatically meant an OS user account for that user are also decades behind us.

GiorgioG · 2025-10-24T11:27:34 1761305254

You expose Samba shares outside your home network?

edweis · 2025-10-24T11:55:48 1761306948

I do, password-protected of course. It's the only "native" way I found to access server files from my iPhone (via the Files app) without downloading a third-party app.

vlovich123 · 2025-10-24T12:18:04 1761308284

I really hope you lock it down to something like Tailscale so that you have a private area network and your Samba share isn’t open to the entire world.

Samba is a complicated piece of software built around protocols from the 90s. It’s designed around the old idea of physical network security, where it’s isolated on a LAN, and it has a long history of serious, critical security vulnerabilities (e.g. here’s an RCE from this month: https://cybersecuritynews.com/critical-samba-rce-vulnerabili...).

Steltek · 2025-10-24T13:28:01 1761312481

It seems like every network filesystem is irredeemably terrible. SMB and NFS are the stuff of security nightmares, chatty performance issues, and awkward user ID mapping. WebDAV is a joke. SSHFS is slow. You can get really crazy with CephFS or GlusterFS, and for all that complexity, you don't get much farther away from the SMB/NFS issues with those either.

My solution: Share nothing and use rsync.

vlovich123 · 2025-10-24T19:38:18 1761334698

Well, one problem is that the filesystem in general is a terrible abstraction, both in terms of usability and in terms of how poorly it fits the way you design network applications.

I’d say Dropbox et al. are closer to a good design, but their backends are insanely optimized to make it work, and proprietary. There’s an added challenge that everything these days is behind a NAT, so you usually end up needing a central rendezvous server where nodes can find each other.

Since you’re reaching for rsync when what you want is something closer to Dropbox, I’d say look at Syncthing. It’s designed to make personal file sharing secure.

dns_snek · 2025-10-24T12:54:04 1761310444

I think you should figure out how to quit while you're ahead. I wouldn't expose Samba to most of the devices on my LAN, never mind the internet.

operon · 2025-10-24T12:45:44 1761309944

Search for WannaCry. You may rethink your setup.

pas · 2025-10-24T11:29:30 1761305370

... well, it makes sense to be able to do a "join" with the `users` and `documents` collections, use the full expressive range of an aggregation pipeline (and it's easy to add additional indices to MongoDB collections, and have transactions, and even add replication - not easy with a generic filesystem)

put all kinds of versioned metadata on docs without coming up with strange encodings; and even though POSIX (and Node.js) offers a lot of FS-related features, it probably makes sense to keep things reeeeally simple

and it's easy to hack on this even on Windows
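A join between the two collections could look like this minimal sketch of a MongoDB aggregation pipeline using the `$lookup` stage. It assumes a hypothetical schema in which each document stores its owner's `_id` in `ownerId` and carries a hypothetical `trashed` flag. Running it assumes a pymongo `Database` object.

```python
# Hypothetical schema: documents.ownerId references users._id.
pipeline = [
    {"$match": {"trashed": False}},  # hypothetical field
    {
        "$lookup": {
            "from": "users",  # collection to join with
            "localField": "ownerId",  # field in `documents`
            "foreignField": "_id",  # field in `users`
            "as": "owner",  # matching users land in this array field
        }
    },
    {"$unwind": "$owner"},  # one owner per document, so flatten the array
    {"$project": {"name": 1, "owner.email": 1}},
]


def documents_with_owners(db):
    """Run the join; `db` is assumed to be a pymongo Database."""
    return list(db["documents"].aggregate(pipeline))
```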

jedimastert · 2025-10-24T11:42:43 1761306163

An SCP or FTP client maybe?

edweis · 2025-10-24T11:59:39 1761307179

Definitely. Though Samba supports authentication natively; with SCP and SFTP you'll need another admin server to create users.

skvmb · 2025-10-24T17:14:59 1761326099

With SAMBA you just get boring old authentication, but with SCP you need to file a Form-72B with Site Command, ensure all new users pass a Class-3 memetic hazard screening, and then hope that the account doesn't escape containment and start replicating across subnets.

Sure, it's more overhead, but you can't put a price on preventing your NAS from developing sentience.

dangus · 2025-10-24T11:54:11 1761306851

Can you name a single Google Drive clone that doesn’t use a database?

Would love to see your source code for your take on this product.

thekid314 · 2025-10-24T12:26:28 1761308788

The Synology Drive version mirrors the filesystem, though I’m sure it has a database for sharing metadata. Is that what they mean?

dangus · 2025-10-25T13:15:25 1761398125

I would say that basically all these software options use a database for things like preferences and user management.

Using a database isn’t some kind of heavy-handed horrendous thing depending on the implementation (e.g., as long as it leaves your content files alone).

aborsy · 2025-10-24T12:45:37 1761309937

Nextcloud too.

There is a database in most if not all useful cases, but the actual files could also be stored separately.