Scarf Gateway is a core service of Scarf (https://scarf.sh) that we've been running in production since 2020. You can think of it as a powerful link shortener that can also sit in front of Docker containers, Python packages, tarballs, or any other artifact you distribute. It also emits logs that support robust analytics you probably aren't getting from your package registry provider today.
It was originally an nginx+Lua service that we migrated to Haskell as the requirements became more complex. Now that the code has settled, we've open-sourced it, so you can self-host Scarf Gateway or get involved with development!
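To make the "link shortener in front of artifacts" idea concrete, here is a minimal sketch of the core loop: look up a public path, redirect to the upstream artifact, and append a log record for later analytics. The route table, URLs, and record fields are hypothetical illustrations, not Scarf Gateway's actual configuration or log format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical route table: public path on the gateway -> upstream artifact URL.
# The real Scarf Gateway configuration format is not shown here.
ROUTES = {
    "/mytool/latest.tar.gz": "https://downloads.example.com/mytool-1.2.0.tar.gz",
    "/mytool/docs.pdf": "https://cdn.example.com/mytool/docs.pdf",
}


@dataclass
class Response:
    status: int
    location: Optional[str] = None


@dataclass
class LogRecord:
    path: str
    status: int
    user_agent: str
    timestamp: str


def handle(path: str, user_agent: str, log: List[LogRecord]) -> Response:
    """Redirect a known path to its upstream (302) or return 404, logging either way."""
    target = ROUTES.get(path)
    resp = Response(302, target) if target else Response(404)
    log.append(
        LogRecord(path, resp.status, user_agent, datetime.now(timezone.utc).isoformat())
    )
    return resp
```

The artifact bytes themselves never pass through this function; the client simply follows the redirect to wherever the file already lives.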
Scarf | Director of Engineering | Remote (US timezones only)
Scarf builds maintainer-friendly tools for sustainable distribution of open-source software. Scarf identifies and connects open source projects with the companies that rely on their software.
If you love open source, love startups, and excel at leading great engineering teams, we'd love to hear from you.
Scarf is a startup that builds maintainer-friendly tools for sustainable distribution of open-source software. Scarf identifies and connects you with the companies that rely on your OSS.
We're looking for talented engineers to join our small remote team. If interested, please send a resume to jobs@scarf.sh
> I like this approach slightly better than what scarfjs was doing. I first ran into that with react query a few years ago. It was a chilling effect for me at least and I was glad it changed.
Glad to hear you like this better; we do too. We built Scarf Gateway in large part because of the response of the react-query community (and a handful of other projects) to scarf-js. Lots of discussion on GitHub and the Reactiflux Discord taught us a lot, mainly that mechanisms that phone home, especially at unexpected times, were particularly unpopular. We also heard more acceptance of the idea that the registry/host platforms, which already have this information, could be sharing it with maintainers.
We want to support maintainers with better data in a way that best suits the OSS community and respects privacy. And so we went back to the drawing board, and Scarf Gateway is the result!
Still, I understand you may have lingering hesitations about Scarf-powered download links. Are there any specific privacy concerns we can mitigate?
Hi HN, a comment to give a little more backstory here:
At Scarf, we aim to give open source developers more visibility into how their software is being used. As people with experience distributing binaries and artifacts hosted on platforms like GitHub Releases and S3, a repeated struggle was not having any visibility into downloads. Which versions of the software were being downloaded the most? On which platforms? Where in the world? Which companies were downloading?
This year we built Scarf Gateway, which acts as a redirect/analytics layer for any container registry. Supporting other kinds of artifacts was a natural extension, and arbitrary file downloads are perhaps the most general extension we could build!
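As a rough illustration of a redirect/analytics layer in front of a container registry, the sketch below parses Docker Registry HTTP API V2 paths (`/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>`), maps the image name to an upstream registry, and extracts the fields an analytics event might record. The upstream host and the mapping are hypothetical, and whether a particular container client follows redirects on every endpoint is outside the scope of this sketch.

```python
import re
from typing import Dict, Optional, Tuple

# Docker Registry HTTP API V2 manifest and blob paths, e.g.
# /v2/myorg/myimage/manifests/1.0 or /v2/myorg/myimage/blobs/sha256:abc...
V2_PATH = re.compile(r"^/v2/(?P<name>.+)/(?P<kind>manifests|blobs)/(?P<ref>[^/]+)$")

# Hypothetical mapping from public image name to the upstream registry host.
UPSTREAMS = {
    "myorg/myimage": "registry.example.com",
}


def resolve(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (upstream URL, analytics fields) for a registry request, or None."""
    m = V2_PATH.match(path)
    if m is None:
        return None
    host = UPSTREAMS.get(m["name"])
    if host is None:
        return None
    # Same API path, different host: the gateway only decides where to send the client.
    url = f"https://{host}{path}"
    event = {"image": m["name"], "kind": m["kind"], "reference": m["ref"]}
    return url, event
```

Recording the `reference` field is what lets a maintainer answer questions like "which tags are pulled most?" without touching the image itself.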
This still needs to be added to our docs. A `dnt=1` query param in a download URL is interpreted as an end-user opt-out. We plan to add more forms of opting out based on user feedback. We want to ensure it's low-friction to opt out of tracking.
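A minimal sketch of how a `dnt=1` query parameter could be honored: parse the download URL's query string and skip recording the event when the parameter is present. The function names and the event shape are hypothetical; only the `dnt=1` convention comes from the comment above.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit


def opted_out_via_query(url: str) -> bool:
    """True if the download URL carries dnt=1, treated as an end-user opt-out."""
    params = parse_qs(urlsplit(url).query)
    return "1" in params.get("dnt", [])


def maybe_record(url: str, events: List[Dict[str, str]]) -> None:
    """Append an analytics event (hypothetical shape) unless the user opted out."""
    if not opted_out_via_query(url):
        events.append({"path": urlsplit(url).path})
```

Under this sketch, a user who appends `?dnt=1` to a download link still gets the file; the only difference is that no event is stored.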
Great suggestion, very appreciated. Global privacy control wasn't on my radar but this looks like what we should do. DNT is considered deprecated, at least according to MDN docs.
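Global Privacy Control is signaled by the `Sec-GPC: 1` request header, while the legacy Do Not Track signal is the `DNT: 1` header. The sketch below shows one possible way a gateway could treat either header as an opt-out; it is an assumption about what honoring GPC might look like, not a description of anything Scarf has shipped.

```python
from typing import Mapping


def opted_out_via_headers(headers: Mapping[str, str]) -> bool:
    """Honor Global Privacy Control (Sec-GPC: 1) and the legacy DNT: 1 header."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    return normalized.get("sec-gpc") == "1" or normalized.get("dnt") == "1"
```

Checking headers has the advantage that the opt-out follows the user's browser setting across every download link, without anyone having to edit URLs.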
I think this is great as long as you respect GDPR. Tracking is not inherently bad. And I had some pain tracking downloads of our OSS project files, thankfully Eclipse Foundation has some tools for gathering anonymous statistics (I think the term "anonymous statistics" will fare better with the HN crowd than "tracking" or "measure"). Added your service to bookmarks for the next time I need such functionality.
However, judging from your homepage, you seem to have an incomplete understanding of GDPR. For example, you don't provide a way for people to opt out on your homepage. This may indicate that you are thinking about GDPR in American "PII" terms instead of in terms of "processing purposes" and "personal data" (not necessarily identifiable, such as a 5-star rating for a taxi driver) as intended by GDPR. You can store my home address without my consent if you need it to deliver a book to me. You may not pass my non-anonymized IP address to anyone except your secops (EU courts have interpreted "legitimate business need" to mean a need to fulfill the user's need, not the company's, such as showing ads).
Further down the thread you also discuss the opt-out mechanisms. Again, this is only legal under GDPR for opting out of the kinds of processing you have a legitimate business need for. Things that require a consent may not be worked around with an opt-out.
Not a lawyer but a person in EU who sent GDPR requests and complaints to company DPOs and regulators. Hope your service grows well!
Fully complying with GDPR is a requirement as we build this out. Our data policies and practices have been thoroughly reviewed by our legal team. If we are doing anything incorrectly with respect to GDPR, it will be promptly addressed.
It turns out that the data we actually store about end-user traffic does not meet the criteria that trigger requirements for explicit consent. Scarf also operates as a data processor with respect to GDPR, rather than a controller.
Ah, shrewd move! For others reading this: your project using Scarf will bear responsibility for GDPR compliance regarding processing purposes as the controller and Scarf is just a processor like AWS (not that I buy it completely but I am sure smart folks at noyb.eu will look at this when time comes).
In short, using Scarf does not provide personally identifiable information about who is downloading your artifacts because we don't have that data ourselves.
The main way this is achieved is by purging any personally identifiable information from our system, mainly the IP address of a download request. Scarf uses the IP to look up metadata like company affiliation, cloud provider, coarse-grained location, etc., to surface that to you. Once that metadata is looked up, the original IP address is discarded. All information stored long term is fully anonymized.
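The "look up, then discard" flow can be sketched as follows: resolve the request IP against a network-to-metadata table, build the stored event from the derived fields only, and never give the event a field for the raw address. The lookup table, organization names, and event fields are hypothetical (the networks are IETF documentation ranges); the comment above does not describe Scarf's actual data sources or schema.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical metadata table keyed by network; a real deployment would query
# an IP-intelligence database instead of a hard-coded dict.
NETWORKS = {
    ipaddress.ip_network("198.51.100.0/24"): {"org": "Example Corp", "country": "US"},
    ipaddress.ip_network("203.0.113.0/24"): {"org": "Example Cloud", "country": "DE"},
}


@dataclass(frozen=True)
class StoredEvent:
    """What gets persisted long term. Deliberately has no IP address field."""
    path: str
    org: Optional[str]
    country: Optional[str]


def enrich_and_discard(path: str, ip: str) -> StoredEvent:
    addr = ipaddress.ip_address(ip)
    # First matching network wins; unknown addresses yield empty metadata.
    meta = next((m for net, m in NETWORKS.items() if addr in net), {})
    # Only derived fields are copied into the event; the raw IP stays a local.
    return StoredEvent(path, meta.get("org"), meta.get("country"))
```

Making the stored type structurally unable to hold an IP is a simple way to keep the guarantee from eroding as the code evolves.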
This is impressive, but it seems like a dark pattern to me, à la tracking pixels in emails. An annoying use case I could see this used for is targeted spam. Say a company selling a software tool publishes a PDF of industry insights and then reaches out to everyone who's downloaded it. Or it publishes an OCI image and then tries to sell everyone who uses it a support package.
Well, Scarf offers free pixel tracking too so you definitely have the correct model for what we do, though sorry to hear you dislike the approach.
Our goal is to help enable OSS developers to financially support their work. Do you think it's still wrong when it's OSS developers trying to sell their services or premium offerings to the companies that already rely on their work?
If so - companies are tracking people all the time at a very granular, personally identifiable level. Why should we hold OSS developers to an even higher standard than what we tolerate from large companies?
> Why should we hold OSS developers to an even higher standard than what we tolerate from large companies?
The problem here is that I DON'T tolerate this from large companies either. I find the pixel tracking thing outrageous and disable images by default in my email client to avoid it.
I understand your argument, I just find it personally strongly disagreeable, and I'm willing to bet poster above did as well.
> The problem here is that I DON'T tolerate this from large companies either. I find the pixel tracking thing outrageous and disable images by default in my email client to avoid it.
If you are already using OSS today and grabbing that software over the internet, you are tolerating it even if you claim otherwise. If you pull something down from GitHub, Microsoft has all the data that we're talking about here.
> I understand your argument, I just find it personally strongly disagreeable
Fair! And I understand yours too. I also think your argument is more idealistic than practical for the current state of the ecosystem, especially considering how many parties already have access to this web traffic data. Maintainers are a very benign additional party to have access to it. Furthermore, it's a concrete way we can all chip in to help OSS maintainers and make their jobs a little bit easier, short of reaching for your credit card (which we should all be doing too).
You're going to be walking a thin (and difficult) line if you're trying to find open source developers who are also interested in introducing involuntary tracking to their software. Open source software, since its inception, has been about creating respectful software for the commons.
> Do you think it's still wrong when it's OSS developers trying to sell their services or premium offerings to the companies that already rely on their work?
No, but I shouldn't have to worry about that as a user. The onus is on corporations to disclose the software that they use in accordance with their respective licenses, the regular user doesn't deserve to suffer for the incompetence of funded organizations.
> Why should we hold OSS developers to an even higher standard than what we tolerate from large companies?
You don't, they do. That's the point of open source licensing in the first place: defining what you're comfortable with other people using your software for. By choosing an Open Source license, you're assuming one of the most difficult and thankless positions in the world of software. That's how it's intended to be though, because that kind of transparency is imperative when we're distributing free software. You wouldn't poison the rations being donated to the homeless, so why are you comfortable poisoning the CDN of my download? This all seems pretty cut and dried to me.
sigh Time to start dropping Scarf URLs in my hosts file...
This argument conflates the licensing of a piece of software with the distribution channel that distributes artifacts of that software. The service being discussed here is purely part of the distribution layer and has no footprint on the artifacts themselves. It's merely a passthrough layer sitting in front of the current stack.
If you are using open source today, you're already hitting servers that have access to all of the same information Scarf sees. Visiting a URL is by definition asking a server on the other side to process your request. That data can be very helpful to all of the great open source maintainers out there, but has historically been difficult or impossible to access. The result will be better informed maintainers, and better OSS for everyone.
Absolutely agree. And that's why we've put so much effort into making sure the system handles all PII as correctly and securely as possible.
End-user privacy does not need to be compromised in order to give OSS maintainers a basic quantitative understanding of how their software is used. This is our best attempt at a solution, and we will keep improving it however we can.
User agent and other headers can be used to provide more differentiation, but you're correct to point out that limitation (assuming you meant IP not ISP).
No, I meant ISP, although IP could work as a special case - if this isn't recording the user's actual IP but just information derived from it (rough location, residential/commercial/datacenter, whatever), I would expect many addresses under that ISP to have the same recorded details. Granted, CGNAT with the same exact public IP would be even more like that, but if you don't record the actual IP then you probably can't deduplicate close "neighbors".
(I should add that your sibling comment says they're using browser headers, which probably reduces this issue a lot)
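The deduplication concern above can be made concrete with a small sketch: if only IP-derived fields are kept, distinct people on the same ISP in the same city collapse to a single key, and adding a request header such as the user agent separates some of them but not all. The field names and sample values are hypothetical, and this is an illustration of the trade-off, not Scarf's actual deduplication logic.

```python
from typing import Iterable, Mapping, Set, Tuple

DedupKey = Tuple[str, ...]


def dedup_key(event: Mapping[str, str], use_headers: bool) -> DedupKey:
    """Build a key from fields that survive after the raw IP is discarded."""
    base = (event["path"], event["isp"], event["city"])
    return base + (event["user_agent"],) if use_headers else base


def unique_downloaders(events: Iterable[Mapping[str, str]], use_headers: bool) -> int:
    keys: Set[DedupKey] = {dedup_key(e, use_headers) for e in events}
    return len(keys)


# Three different people behind the same ISP in the same city (hypothetical data).
EVENTS = [
    {"path": "/pkg.tar.gz", "isp": "ExampleNet", "city": "Springfield", "user_agent": "curl/8.0"},
    {"path": "/pkg.tar.gz", "isp": "ExampleNet", "city": "Springfield", "user_agent": "curl/8.0"},
    {"path": "/pkg.tar.gz", "isp": "ExampleNet", "city": "Springfield", "user_agent": "Wget/1.21"},
]
```

Here `unique_downloaders(EVENTS, False)` counts 1 and `unique_downloaders(EVENTS, True)` counts 2, both below the true 3: headers reduce the undercount, but users with identical clients on the same network remain indistinguishable.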
Scarf | Senior Software Engineer (Backend/General) | Remote (American Timezones) | Full Time | https://about.scarf.sh
For any functional programming (especially Haskell) fans, this one is for you! You'd be working mainly in Haskell, and our entire system is built with Nix.
Scarf helps open-source maintainers understand how their software is being used and transact directly with their commercial users. We are building state-of-the-art package management and distribution tooling that helps OSS developers make data-informed decisions about their projects and get fairly compensated for their valuable work.