Uggghhhh! AI crawling is fast becoming a headache for self-hosted content. Is using a CDN the "lowest effort" solution? Or is there something better/simpler?
Depending on the content and software stack, caching might be a fairly easy option. For instance, WordPress's W3 Total Cache used to be pretty easy to configure and could easily bring a small VPS from 6-10 req/sec to 100-200 req/sec.
There are also solutions for generating static content sites instead of a "dynamic" CMS that stores everything in a DB.
If it's new, I'd say the easiest option is to start with a content hosting system that has built-in caching (assuming that exists for what you're trying to deploy).
Nah, just add a rate limiter (which any public website should have anyway). Alternatively, add some honeypot URLs to robots.txt, then set up fail2ban to ban any IP accessing those URLs, and you'll get rid of 99% of the crawling in half a day.
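Not from the parent, just a minimal Python sketch of that honeypot idea, assuming an nginx-style access log at /var/log/nginx/access.log and a /honeypot/ trap path disallowed in robots.txt (both placeholders); in practice a fail2ban filter/jail does the same thing and also handles retries and unbanning for you:

    #!/usr/bin/env python3
    # Sketch: ban IPs that fetch a robots.txt-disallowed honeypot path.
    # Assumes nginx "combined" log format and root privileges for iptables.
    import re
    import subprocess

    LOG_PATH = "/var/log/nginx/access.log"  # placeholder log location
    HONEYPOT = "/honeypot/"                 # placeholder trap path

    # Combined log format starts with the client IP, then the request line.
    line_re = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

    banned = set()
    with open(LOG_PATH) as log:
        for line in log:
            m = line_re.match(line)
            if not m:
                continue
            ip, path = m.groups()
            if path.startswith(HONEYPOT) and ip not in banned:
                banned.add(ip)
                # Drop all further traffic from this address.
                subprocess.run(
                    ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"],
                    check=False,
                )
    print(f"banned {len(banned)} IPs")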
I gave up after blocking 143,000 unique IPs hitting my personal Forgejo server one day. Rate limiting would have done literally nothing against the traffic patterns I saw.
2 unique IPs or 200,000 shouldn't make a difference: automatically ban the ones that make too many requests and you basically don't have to do anything.
Are people not using fail2ban and similar at all anymore? It used to be standard practice until, I guess, people started using PaaS instead and "running web applications" became a different role from "developing web applications".
It makes a difference if there are 143,000 unique IPs and 286,000 requests. I think that's what the parent post is saying (lots of requests, but also not very many per IP, since there are also lots of IPs).
Even harder with IPv6, considering things like privacy extensions, where the IPs intentionally and automatically rotate.
Yes, this is correct. I’d get at most 2 hits from an IP, spaced minutes apart.
I went as far as blocking every AS that fetched a tripwire URL, but ended up blocking a huge chunk of the Internet, to the point that I asked myself whether it’d be easier to allowlist IPs, which is a horrid way to run a website.
But I did block IPv6 addresses as /48 networks, figuring that was a reasonable prefix length for an individual attacker.
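For what it's worth, here's a tiny sketch of that normalization using only Python's standard library (the addresses below are documentation placeholders): collapse each offending IPv6 address to its covering /48 before banning, so privacy-extension rotation inside the same prefix doesn't dodge the ban.

    import ipaddress

    def ban_scope(addr: str) -> str:
        # Return the network to ban: the exact /32 for IPv4,
        # the covering /48 for IPv6 (one "site" worth of rotating addresses).
        ip = ipaddress.ip_address(addr)
        if ip.version == 6:
            return str(ipaddress.ip_network(f"{addr}/48", strict=False))
        return str(ipaddress.ip_network(addr))  # /32

    print(ban_scope("203.0.113.7"))               # 203.0.113.7/32
    print(ban_scope("2001:db8:abcd:1234::5678"))  # 2001:db8:abcd::/48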
Splitting storage from retrieval is a powerful abstraction. You can then build retrieval indexes on whatever property you desire, amortizing the O(N) indexing cost over many queries.
Concretely, you could search by metadata (timestamp, geotag, which camera device, etc.) or by content (all photos of Joe, or photos with the Eiffel Tower in the background, or boating at dusk...). For the latter, you just need to process your corpus with a vision language model (VLM) and generate embeddings. Btw, this is not outlandish; there are already photo apps with this capability if you search a bit online.
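Here's a rough sketch of what that looks like, with random vectors standing in for a real VLM's image/text encoders (the embed functions and dimension are assumptions, not any particular library's API): index once in O(N), then each query is a cheap matrix-vector product.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 512  # typical CLIP-style embedding size (assumption)

    def embed_image(path: str) -> np.ndarray:
        # Stand-in for a real VLM image encoder.
        return rng.standard_normal(DIM)

    def embed_text(query: str) -> np.ndarray:
        # Stand-in for the matching text encoder.
        return rng.standard_normal(DIM)

    def build_index(photo_paths):
        # One-time O(N) pass over the corpus: a matrix of unit-normalized vectors.
        vecs = np.stack([embed_image(p) for p in photo_paths])
        return photo_paths, vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

    def search(index, query, k=3):
        # Each query is just a matrix-vector product over the index.
        paths, vecs = index
        q = embed_text(query)
        scores = vecs @ (q / np.linalg.norm(q))  # cosine similarity
        top = np.argsort(-scores)[:k]
        return [(paths[i], float(scores[i])) for i in top]

    index = build_index([f"photo_{i}.jpg" for i in range(100)])
    print(search(index, "boating at dusk"))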
Sometimes, the best way to learn about abstruse topics one has a passing curiosity in is to upvote what pops up on HN and hope that some nerd might drop by and comment with a simplified intuitive picture for plebs :-)
These days, some nerds prefer to ask AI to confirm their "precious" intuitions about why schemes might be needed in the first place (to fix the problems with certain basic geometric notions of the old-timers?). They are then so spooked that the AI instantly validates those intuitions, without any relevant citations whatsoever, that they decide not to comment,
but still leave warnings to gung-ho nerds in the form of low-code exercises.
That's a theory, but I think it's more likely that the few people in the world who deeply understand schemes are locked in the basement of a mathematics department somewhere, and not on hacker news =P
> That's a theory, but I think it's more likely that the few people in the world who deeply understand schemes are locked in the basement of a mathematics department somewhere, and not on hacker news =P
I rather think that because of the very low career prospects in research, quite a lot of people who are good in this area left research and took a job in finance or at some Silicon Valley company, and thus might actually at least sometimes have a look at what happens on Hacker News. :-)
I think you overestimate how many people exist in the world with a professional interest in algebraic geometry! The vast majority of mathematicians have no idea how to compute with schemes (and there aren't that many of them to begin with).
Even though I am from a different area of mathematics, I know quite a few people who work(ed) in algebraic geometry (and at the university where I graduated there wasn't even an academic chair for (Grothendieck-style) algebraic geometry).
The number of people I know who would love to learn this material is even many, many magnitudes larger (just to give some arbitrary example: some pretty smart person who studied physics, but (for various reasons) neither had any career prospects in research nor found any fulfilling job, and who just out of boredom decided that he would love to get deeply into Grothendieck-style algebraic geometry).
I guess we hang out in different academic circles. I met a single algebraic geometer in my whole academic career. But people are into very different stuff where I come from, which may have biased me (topology, number theory and category theory for the most part, and a lot of relativity/fluid dynamics on the applied side of the department). Based on rough estimates from papers published on arxiv over the last few years, I (very) conservatively estimate there are ~5000 working algebraic geometers in the world right now.
> The number of people I know who would love to learn this material [...]
I am one of them =) but my point wasn't really about people who want to learn the material (which I assume includes many orders of magnitude more humans); it was about people who already deeply understand it.
It's hard to help GP but I'm gonna try (pls forgive me):
I believe that the masses don't have a deep understanding of Schemes because of enemy action by the sufficiently advanced stupidity (aka loneliness) of the intelligent :)
Their interest is "pro" and they are not a hypothesis
(& I'd NOT bet against that they understand deeper than Sturmfels and his students)
Schemes (like cat theory) have become a sort of religion; it's sad because Grothendieck himself might not have understood them intuitively... and it wouldn't be the first time: Feynman didn't understand Path Integrals, nor Archimedes integration!! BECAUSE they were all loners whose first resort was WRITING LETTERS
Ps: as with Jobs.. I hesitate to call Buzzard a full-time salesman
If you want to hang out in meatspace: do you have a public key?
But that's just culture, and quite easily moldable. Lots of people would also rather gamble or watch smut all day, but we decided that it's not the best way to go about life... so we set up a system (school) that manages their learning process, shepherds them for well over a decade, and then involves them in the economy and in society. Likewise we have cultural mechanisms which try to ensure that people learn essential skills related to nutrition, mobility, relationships, etc.
A lot of this has been eroding in recent years under the banner of convenience, and will likely have pernicious consequences in the coming decades. I posit that letting these insidious patterns broadly drive our approach to computing is similarly dangerous.
I was reading the HN discussion [1] about the Mozilla support fiasco in Japan [2], and was reminded of this Drucker classic about the American -vs- Japanese approaches to decision making.
The way to show you care is by having a meeting of the minds before you shove your changes in their face. The fact that the deployment was done carelessly demonstrates disregard.
I doubt "take them out to dinner" is the right solution in this situation, but any attempt at redressal must understand the above point and acknowledge it publicly.
"Ask for forgiveness rather than permission" is far from universally true, and carries massive cultural baggage. You cannot operate within that framework and expect all humans to cooperate with you.
IMHO the only correct way to measure the effectiveness of decision making is from the quality of executed outcomes. It is somewhat nonsensical to sever decisions from execution, and claim that decisions have been made rapidly if the decision doesn't lend itself to crisp execution. Without that, decisions are merely intentions.
Someone mentioned the latencies for gaming, but I also briefly used a 4K TV as a monitor that had horrible latency even for typing: enough of a delay between hitting a key and the terminal printing the character to throw off my cadence.
Only electronic device I’ve ever returned.
Also they tend to have stronger-than-necessary backlights. It might be possible to calibrate around this issue, but the thing is designed to be viewed from the other side of a room, so you are at the mercy of however low they decided to let the brightness go.
You could probably circumvent this by putting the display into Gaming Mode, which most TVs have. It removes all the extra processing that TVs add to make the image "nicer". These processes add a hell of a lot of latency, which is obviously just fine for watching TV, but horrible for gaming or using it as a PC monitor.
It was a while ago (5 years?), so I can’t say for certain, but I’m pretty sure I was aware of game mode at the time and played with the options enough to convince myself that it wasn’t there.
Game mode is a scam. It breaks display quality on most TVs, and still doesn't respond as fast as a PC monitor with <1ms latencies... it might drop itself to 2 or 3 ms, which is still at least 2x or 3x slower.
You can think "but that's inhumanly fast, you won't notice it", but in reality this is _very_ noticeable in games like Counter-Strike, where hand-eye coordination, speed and pinpoint accuracy are key. If you play such games a lot then you will feel it if the latency goes above 1ms.
Most people lack an understanding of displays, and therefore of what they are quoting, and are in fact quoting the vendor's claimed pixel response time as the input lag.
It's gotta be one of the most commonly mixed-up things I've seen in the last twenty years as an enthusiast.
Well, at least I didn't misunderstand my own lack of understanding :D
The part about feeling the difference in response times is true, though, but I must say the experience is a bit dated ^^ I see that higher-resolution monitors generally have quite slow response times.
<1ms was from CRT times :D which were my main Counter-Strike days. I do find noticeable 'lag' still on TV vs. monitor, though, but I've only tested on HD (1080p) - I own only one 4K monitor, and my own age-induced latency by now far exceeds my display's latency :D
I still use CRTs :) However, more than input lag, for me it's motion, which is driven by panel response times, and I just can't stand even the best modern OLEDs for motion unless it's 240Hz and up.
> I do find noticeable 'lag' still on TV vs. monitor, though
Yeah, you likely will, to be honest. Even average monitors will be single digit, maybe 1-2ms at most. Depending on the TV model, game mode may only get you to low double digits. High-end panels should get you to pretty low single digits though, like 4-5ms.
They measure in a particular way that includes half a frame of unavoidable lag. There are reasons to do it that way, but it's not objectively the "right" way to do it.
Rtings basically gives you a number that represents average lag without screen tearing. If you measure at the top of your screen and/or tolerate tearing then the numbers get significantly smaller, and a lot of screens can in fact beat 1ms.
I'm sure there are reasons with regards to games and stuff, but I don't really use this TV for anything but writing code and Slack and Google Meet. Latency doesn't matter that much for just writing code.
I really don't know why it's not more common. If you get a Samsung TV it even has a dedicated "PC Mode".
Lots of us HAVE tried using a TV as a primary monitor; I did for years.
Then I bought a real display and realized oh my god there's a reason they cost so much more.
"Game mode" has no set meaning or standard, and in lots of cases can make things worse. On my TV, it made the display blurry in a way I never even noticed until I fixed it. It's like it was doing N64 style anti-aliasing. I actually had to use a different mode, and that may have had significant latency that I never realized.
Displays are tricky, because it can be hard to notice how good or bad one is without a comparison, which you can't do in the store because they cheat display modes and display content, and nobody is willing to buy six displays and run tests every time they want to buy a new display.
"PC Mode" or "Gaming mode" or whatever is necessary - I can tell any other mode easily just by moving the mouse, the few frames of lag kill me inside. Fortunately all tvs made in this decade should have one.
Depending on the specific TV, small details like text rendering can be god-awful.
A bunch of TVs don't actually support 4:4:4 chroma (i.e. no subsampling), and at 4:2:2 or 4:2:0 text is bordering on unreadable.
And a bunch of OLEDs have weird sub-pixel layouts that break ClearType. This isn't the end of the world, but you end up needing to tweak the OS text rendering to clean up the result.
I have been using a 43-inch TV as a monitor for the last 10 years, currently an LG.
You get a lot of screen space, and you can sit away from the desk and still use it. Just increase the zoom.
If you play video games, display latency. Most modern TVs offer a way to reduce display latency, but it usually comes at the cost of various features or some impact to visual quality. Gaming monitors offer much better display latencies without compromising their listed capabilities.
Televisions are also more prone to updates that can break things and often have user hostile 'smart' software.
Still, televisions can make a decent monitor and are definitely cheaper per inch.
High latency on TVs makes them bad for games etc., as anything that's sensitive to IO timing can feel a bit off. Even 5ms compared to 1 or 2ms response times is very noticeable in hand-eye coordination across IO -> monitor.
It sort of depends on what you perceive as 'high'. Many TVs have a special low-latency "game" display mode. My LG OLED does, and it's a 2021 model. But OLED in general (in a PC monitor as well) is going to have higher latency than IPS for example, regardless of input delay.
I have a MiSTer Laggy thing to measure TV latency. My bedroom Vizio LCD thing, in Game Mode, is between 18-24ms, a bit more than a frame of latency (assuming 60fps).
I don’t play a lot of fast paced games and I am not good enough at any of them to where a frame of latency would drastically affect my performance in any game, and I don’t think two frames of latency is really noticeable when typing in Vim or something.
> But OLED in general (in a PC monitor as well) is going to have higher latency than IPS for example, regardless of input delay.
I hope you mean lower? An OLED pixel updates roughly instantly while liquid crystals take time to shift, with IPS in particular trading away speed for quality.
In the context of this thread that's a non-issue. Good TVs have been in the ~5ms@120Hz/<10ms@60Hz world for some time now. If you're in the market for a 4K-or-higher display, you won't find much better, even among specialized monitors (as those usually won't be able to drive higher Hz with lower lag with full 4k+ resolution anyway).
IIRC Apple dropped sub-pixel antialiasing in Mojave (I hate these names). It makes no sense when Macs are meant to be used with retina-class displays.
Usually refresh rate and sometimes feature set. And it’s meant to be viewed from further away. I’m sure someone else could elaborate but that’s the gist.
Simple. I don't know emacs that well :)
I didn't even know emacs had a terminal mode until I looked this up; my main experience with emacs was when I was writing Prolog and the IDE was Emacs-based. I didn't find it as nice to use back then, so I never gave it a serious shot.
By comparison nano is everywhere and was super-simple to configure and spruce-up with custom functions, so it just stuck with me.
As for other competitors, when comparing to vim, I find it much simpler to use, and to the surprise of most vim users I speak to, equally powerful (at least for my needs).
> So really, how important is computing the exact gradient using calculus, vs just knowing the general direction to step? Would that be cheaper to calculate than full derivatives?
Yes, absolutely -- a lot of ideas inspired by this have been explored in the field of optimization, and also in machine learning. The very idea of "stochastic" gradient descent using mini-batches is basically a cheap (hardware-compatible) approximation to the gradient at each step.
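To make that concrete, here's a toy least-squares example (my own, not from any particular paper): the mini-batch gradient is a noisy estimate of the full gradient, but it points in roughly the right direction, and stepping along it still drives the loss down.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy least-squares problem: loss(w) = (1/2N) * ||X w - y||^2
    N, d = 10_000, 20
    X = rng.standard_normal((N, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(N)

    def full_gradient(w):
        return X.T @ (X @ w - y) / N            # exact, touches all N rows

    def minibatch_gradient(w, batch=64):
        idx = rng.choice(N, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        return Xb.T @ (Xb @ w - yb) / batch     # cheap, noisy estimate

    w = np.zeros(d)
    g_full, g_mini = full_gradient(w), minibatch_gradient(w)
    cos = g_full @ g_mini / (np.linalg.norm(g_full) * np.linalg.norm(g_mini))
    print(f"cosine(full, minibatch) at w=0: {cos:.2f}")  # well above zero

    # A few hundred steps using only the cheap estimate still converge.
    lr = 0.1
    for _ in range(300):
        w -= lr * minibatch_gradient(w)
    print("||w - w_true|| after SGD:", np.linalg.norm(w - w_true))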
Ben Recht has an interesting survey of how various learning algorithms used in reinforcement learning relate to techniques in optimization (and how they each play with the gradient in different ways): https://people.eecs.berkeley.edu/~brecht/l2c-icml2018/ (there's nothing special about RL... as far as optimization is concerned, the concepts work the same even when all the data is given up front rather than generated on-the-fly based on interactions with the environment)