The root cause here is that managing any kind of storage service is inherently painful. The property of "not losing data" means you're effectively required to always be doing something to keep it healthy.
I believe this is also changing with instances that now allow you to adjust the ratio of throughput on the NIC that's dedicated to EBS vs. general network traffic (with the intention, I'm sure, that people would want more EBS throughput than the default).
I read somewhere that Hacker News should have been named Startup News, and sometimes interactions like the one upthread remind me of that. I'm not saying it's wrong - if you're good at something, don't do it for free and all that - but it's kinda sad that in-depth discussions on public forums are getting harder and harder to find these days.
Normal conversations with topic enthusiasts usually have fun stuff hidden in their profiles, and at times they lead to fun rabbit holes where you endlessly learn and somehow forget that you were initially browsing HN.
Agree about the public discussion part, one of the reasons why I'm here lately.
Also, why can't someone create Startup News, where every article reply is an opportunity to be sold a service? SN would take a cut of transactions. /s
These are people already trying to divert the discussion off-site for their own benefit. Very few would honestly report any resulting transaction for SN to take its cut from.
[yeah, I did see the sarcasm tag, just clarifying to put off would-be entrepreneurs so we aren't inundated by Show HN posts from people vibe-coding the idea over the next few days!]
I saw the follow-up responses complaining about you soliciting, but I've got no problem with you offering to solve a problem and being remunerated for it.
However, my lab is a brokedick operation with barely enough cash reserves to pay staff salaries. We sincerely do not have the budget to buy new software, especially after the NIH funding cuts.
> As a secondary, I wonder if it's possible to actively use a SQLite interface against a database file on S3, assuming a single server/instance is the actual active connection.
You could achieve this today using one of the many adapters that turn S3 into a file system, without needing to wait for any SQLite buy-in.
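For what it's worth, a minimal sketch of what that could look like, assuming the bucket is already mounted at a path like /mnt/s3 with something like mountpoint-s3 or s3fs (the path and table name are made up; write support and locking depend entirely on the adapter, so this opens the database read-only):

    import sqlite3

    # Assumes the bucket is mounted at /mnt/s3 by an S3 filesystem adapter.
    # The database path and table name below are hypothetical.
    db_path = "/mnt/s3/analytics/events.db"

    # mode=ro + immutable=1 keeps SQLite from creating journal/WAL files,
    # which many S3 adapters don't handle well.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro&immutable=1", uri=True)
    for row in conn.execute("SELECT count(*) FROM events"):
        print(row)
    conn.close()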
S3 Mountpoint exposes a POSIX-like file system abstraction for you to use with your file-based applications. Foyer appears to be a library that helps your application coordinate access to S3 (with a cache), for applications that don't need a file interface and whose code you can change.
Storage Gateway is an appliance that you connect multiple instances to; this appears to be a library that you use inside your program to coordinate caching for that process.
These are, effectively, different use cases. You want to use (and pay for) Express One Zone in situations in which you need the same object reused from multiple instances repeatedly, while it looks like this on-disk or in-memory cache is for when you may want the same file repeatedly used from the same instance.
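To make the per-instance case concrete, here's a rough sketch of the on-disk cache pattern being described (this is not Foyer's actual API, which is a Rust library; the bucket, key, and cache directory are made up), where repeated reads of the same object from the same instance come from local disk instead of S3:

    import hashlib
    import os

    import boto3

    # Illustrative per-instance disk cache in front of S3 GETs.
    # CACHE_DIR, bucket, and key names are hypothetical.
    CACHE_DIR = "/var/cache/s3-objects"
    s3 = boto3.client("s3")

    def get_object_cached(bucket: str, key: str) -> bytes:
        os.makedirs(CACHE_DIR, exist_ok=True)
        name = hashlib.sha256(f"{bucket}/{key}".encode()).hexdigest()
        cache_path = os.path.join(CACHE_DIR, name)
        if os.path.exists(cache_path):  # local disk hit
            with open(cache_path, "rb") as f:
                return f.read()
        # Cache miss: fetch from S3, then write through to local disk.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(body)
        os.replace(tmp_path, cache_path)  # atomic rename avoids torn reads
        return body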
Is it the same instance? RisingWave (and similar tools) are designed to run in production on a lot of distributed compute nodes for processing data, serving/streaming queries, and running control planes.
Even a single query will likely run on multiple nodes, with distributed workers gathering and processing data from the storage layer; that is the whole idea behind MapReduce, after all.
Yes, definitely. S3 has a time to first byte of 50-150ms (depending on how lucky you are). If you're serving from memory that goes to ~0, and if you're serving from disk, that goes to 0.2-1ms.
It will depend on your needs though, since some use cases won't want to trade away S3's ability to serve arbitrary amounts of throughput.
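If you want to sanity-check those numbers yourself, a quick sketch (the bucket, key, and local path are made up; results vary a lot by region, object size, and connection warm-up):

    import time

    import boto3

    s3 = boto3.client("s3")

    # Time to first byte from S3 (bucket/key are hypothetical).
    t0 = time.perf_counter()
    body = s3.get_object(Bucket="my-bucket", Key="data/part-0000.parquet")["Body"]
    body.read(1)
    print(f"S3 TTFB:   {(time.perf_counter() - t0) * 1000:.1f} ms")

    # Time to first byte from a local copy on disk (path is hypothetical).
    t0 = time.perf_counter()
    with open("/var/cache/part-0000.parquet", "rb") as f:
        f.read(1)
    print(f"Disk TTFB: {(time.perf_counter() - t0) * 1000:.1f} ms")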
In that case you run the proxy service load-balanced to get the desired throughput, or run a sidecar/process in each compute instance where the data is needed.
You are limited anyway by the network capacity of the instance you are fetching the data from.
Woah buddy, I worked with Andy for years and this is not my experience. Moving a large product like S3 around is really, really difficult, and I've always thought highly of Andy's ability to: (a) predict where he thought the product should go, (b) come up with novel ways of getting there, and (c) trim down the product to get something in the hands of customers.
Also, did you create this account for the express purpose of bashing Andy? That's not cool.
Some quick questions that came up in the last post that I wanted to go ahead and address:
How are you different than existing products like S3 Mountpoint, S3FS, ZeroFS, ObjectiveFS, JuiceFS, and cunoFS?
Archil is designed to be a general-purpose storage system to replace networked block storage like EBS or Hyperdisk with something that scales infinitely, can be shared across multiple instances, and synchronizes to S3. Existing adapters that turn S3 into a file system are either not POSIX-compliant (such as Mountpoint for S3, S3FS, or GoofyFS), do not write data to the S3 bucket in its native format (such as JuiceFS or ObjectiveFS, preventing use of that data directly from S3), or are not designed for a fully-managed, one-click setup (such as cunoFS). We have massive respect for folks who build these tools, and we’re excited that the data market is large enough for all of us to see success by picking different tradeoffs.
What regions can I launch an Archil disk in?
We’re live in 3 regions in AWS (us-east-1, us-west-2, and eu-west-1) and 1 region in GCP (us-central1). Today, we’re also able to deploy directly into on-premises environments and smaller GPU clouds. Reach out if you’re interested in an on-premises deployment (hleath [at] archil.com).
Can I mount Archil from a Kubernetes cluster?
Yes! We have a CSI driver that you can use to get ReadWriteOnce and ReadWriteMany volumes into your Kubernetes cluster.
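For anyone who hasn't consumed a CSI driver before, using one generally comes down to creating a PVC against the driver's storage class. A rough sketch with the Kubernetes Python client (the storage class name "archil" and the PVC name are assumptions; use whatever the driver's install docs specify):

    from kubernetes import client, config

    config.load_kube_config()

    # Hypothetical PVC requesting a shared (ReadWriteMany) volume from a CSI
    # driver's storage class; the class name "archil" is an assumption.
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "shared-data"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "archil",
            "resources": {"requests": {"storage": "100Gi"}},
        },
    }
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc
    )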
What performance benchmarks can you share?
We generally don’t publish specific performance benchmarks, since they are easy to over-index on and often don’t reflect how real-world applications run on a storage system. In general, Archil disks provide ~1ms latency for hot data and can, by default, scale up to 10 GiB/s and tens of thousands of IOPS. Contact me at hleath [at] archil.com if you have needs that exceed these numbers.
What happens if your caching layer goes down before a write is synchronized to S3?
Our caching layer is itself highly durable (~5 9s). This means that once a write is accepted into our layer, no individual component failure (such as an instance or an AZ) would cause us to lose data.
What are you planning next for Archil?
By moving away from NFS and using our new, custom protocol, we have a great foundation for the performance work that we’re looking to accomplish in the next 6 months. In the short-term, we plan to launch: one-click Lustre-like scale-out performance (run hundreds of GiB/s of throughput and millions of IOPS without provisioning), the ability to synchronize data from non-object storage sources (such as HuggingFace), and the ability to use multiple data sources on a single disk.
How can I learn more about how the new protocol works?
We’re planning on publishing a bunch more on the protocol in the coming weeks; stay tuned!