This may seem like a minor nit, but I think there is a problem with using the term "unreliable" to describe UDP. The more commonly used term, and IMHO better term, is "best-effort" [1]. UDP makes its best effort to deliver the datagrams, but the datagrams may be dropped anyway. But it does not make UDP inherently unreliable.
IMO "best-effort" is euphemistic and confusing to people outside of this space, even (especially?) to native English speakers. (I would probably have described UDP as "reasonable effort" and TCP as "best effort", if not for the existing terminology.)
I recall my computer networking professor describing this as "Best-effort means never having to say you're sorry". In practice, best-effort does not mean you try your hardest to make sure the message gets from A to B, it means that you made an effort. Router in the path was congested? Link flap leading to blackholing on the order of 50ms before fast reroute kicks in? Oh well, we tried.
Meanwhile, TCP's reliable delivery will retry several times and will present an in-order data stream to the application.
Reliable vs unreliable might be bad terminology, but I don't think best-effort is any better.
My experience with unreliable systems is that they're great something like 95% of the time, and they're great for raw throughput, but there are many cases where that last 5% makes a huge difference.
The question is 'who deals with dropped packets?' In TCP, the answer is 'the protocol'. In UDP the answer is 'the next layer of abstraction' (e.g. the app or some library).
You can build a 'reliable' protocol on top of UDP, and still not get TCP.
Eg if you want to transfer a large file that you know up front, then TCP's streaming mechanism doesn't make too much sense. You could use something like UDP to send the whole file from A to B in little chunks once, and at the end B can tell A what (numbered) chunks she's missing.
There's no reason to hold off on sending chunk n+1 of the file, just because chunk n hasn't arrived yet.
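To make that concrete, here is a toy, in-memory sketch of the chunk-and-retransmit idea in Python (simulated random drops, no sockets, and deliberately no congestion control, which the reply below points out is the real catch):

```python
import random

CHUNK_SIZE = 1024  # bytes per datagram payload; arbitrary for this sketch

def split_into_chunks(data: bytes) -> dict[int, bytes]:
    """Number the chunks so the receiver can report which ones are missing."""
    count = (len(data) + CHUNK_SIZE - 1) // CHUNK_SIZE
    return {i: data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] for i in range(count)}

def transfer(data: bytes, drop_rate: float = 0.05) -> bytes:
    chunks = split_into_chunks(data)
    received: dict[int, bytes] = {}
    outstanding = set(chunks)                 # first pass: send everything once
    while outstanding:
        for n in outstanding:
            if random.random() >= drop_rate:  # a dropped datagram simply never arrives
                received[n] = chunks[n]
        # receiver replies with the numbers of the chunks it is still missing
        outstanding = set(chunks) - set(received)
    return b"".join(received[n] for n in sorted(received))

data = bytes(range(256)) * 100
assert transfer(data) == data
```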
> There's no reason to hold off on sending chunk n+1 of the file, just because chunk n hasn't arrived yet.
Congestion control comes to mind -- you don't necessarily know what rate the network supports if you don't have a feedback mechanism to let you know when you're sending too fast. Congestion control is one of those things where sure, you can individually cheat and possibly achieve better performance at everyone else's expense, but if everyone does it, then you'll run into congestive collapse.
> You can build a 'reliable' protocol on top of UDP, and still not get TCP.
I agree -- there are reliable protocols running on top of UDP (e.g. QUIC, SCTP) that do not behave exactly like TCP. You don't need an in-order stream in the described use case of bulk file transfer. You certainly don't need head-of-line blocking.
But there are many details and interactions that you and I wouldn't realize or get right on the first try. I would rather not relearn all of those lessons from the past 50+ years.
> But there are many details and interactions that you and I wouldn't realize or get right on the first try. I would rather not relearn all of those lessons from the past 50+ years.
Oh, the model I had in mind was not that everyone should write their network code from scratch all the time, but rather that everything that's higher level than datagrams should be handled by unprivileged library code instead of privileged kernel level code.
If speed is an issue, modern Linux can do wonders with eBPF and io_uring, I guess? I'm taking my inspiration from the exokernel folks who believed that abstractions have no place in the operating system kernel.
interestingly, the exokernel approach is suited to particular cases. for instance, a single application (regardless of whether it's multiple sessions, diversity of RTT, etc). after all, "get the packets into userspace with as little fuss as possible" would be the right goal there.
the unix model is different: it's basically premised on a minicomputer server which would be running a diverse set of independent services where isolation is desired, and where it makes sense for a privileged entity to provide standardized services. services whose API has been both stable and efficient for more than a couple decades.
I think it's kind of like cloud: outsourcing that makes sense at the lower-scale of hosting alternatives. but once you get to a particular scale, you can and should take everything into your own hands, and can expect to obtain some greater efficiency, agility, autonomy.
> the unix model is different: it's basically premised on a minicomputer server which would be running a diverse set of independent services where isolation is desired, [...]
Exokernels provide isolation. Secure multiplexing is actually the only thing they do.
> and where it makes sense for a privileged entity to provide standardized services. services whose API has been both stable and efficient for more than a couple decades.
Yes, standardisation is great. Libraries can do that standardisation. Why do you need standardisation at the kernel level?
That kind of standardisation is eg what we are doing with libc: memcpy has a stable interface, but how it's implemented depends on the underlying hardware; the kernel does not impose an abstraction.
Is there a popular protocol that uses this scheme today? I've often thought that this would be a superior way to transfer/sync data, and have always wondered why it isn't common.
The closest I can think of is the old FSP protocol, which never really saw wide use. The client would request each individual chunk by offset, and if a chunk got lost, it could re-request it. But that's not quite the same thing.
SACKs have been in TCP for 25-30 years now (Widely adopted as part of New Reno, although RFC 2018 proposed the TCP option and implementation back in 1996).
That said, the typical reason why TCP doesn't send packet N+1 is because its congestion window is full.
There is a related problem known as head-of-line blocking where the application won't receive packet N+1 from the kernel until packet N has been received, as a consequence of TCP delivering that in-order stream of bytes.
Sorry, I didn't literally mean n+1. What I mean is probably better described by 'head of line blocking'.
Basically, when transmitting a file, in principle you could just keep sending chunk n+k, even if the n-th packet hasn't been received or has been dropped. No matter how large k is.
You can take your sweet time fixing the missing packets in the middle, as long as the overall file transfer doesn't get delayed.
The term "best effort delivery" in networking is a weasel term that is no better than "unreliable". It should probably be burned.
"Effort" generally refers to some sort of persistence in the face of difficulty. Dropping a packet upon encountering a resource problem isn't effort, let alone best effort.
The way "best effort" is used in networking is quite at odds with the "best efforts" legal/business term, which denotes something short of a firm commitment, but not outright flaking off.
Separately from the delivery question, the checksums in UDP (and TCP!) also poorly assure integrity when datagrams are delivered. They only somewhat improve on the hardware.
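To make the integrity point concrete, here is a minimal sketch of the RFC 1071 ones'-complement checksum that UDP and TCP carry; because it is just a commutative 16-bit sum, it cannot even detect two 16-bit words being swapped:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071: ones'-complement sum of 16-bit words (odd length padded with a zero byte)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], "big")
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

a = b"\x12\x34\x56\x78"
b = b"\x56\x78\x12\x34"  # same 16-bit words, different order
assert internet_checksum(a) == internet_checksum(b)  # the checksum cannot tell them apart
```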
I mean that in the same rhetorical sense as "standing around leaning on your shovel isn't effort, let alone best effort". Of course we all understand that digging the ditch is what counts as effort.
Anyway, so the question is, if typical IP forwarding is "best effort" ... what is an example of poor effort, and what exhibits it?
I think the issue is that there's a stigma around UDP, largely borne from not using it effectively. I'm not sure this is the right way to address the stigma though.
Ultimately, TCP vs UDP per se is rarely the right question to ask. But that is often the only configuration knob available; or at least, the only way to get away from TCP-based protocols and their overhead is to switch over to raw UDP as though it were an application protocol unto itself.
Such naive use of UDP contributes to the stigma. If you send a piece of data only once, there's a nontrivial chance it won't get to its destination. If you never do any verification that the other side is available, misconfiguration or infrastructure changes can lead to all your packets going to ground and the sender being completely unaware. I've seen this happen many times and of course the only solution (considered or even available in a pinch) is to ditch UDP and use TCP because at least the latter "works". You can say "well it's UDP, what did you expect?" but unfortunately while that may have been meant to spur some deeper thought, it often just leads to the person who hears it writing off UDP entirely.
Robust protocol design takes time and effort regardless of transport protocol chosen, but a lot of developers give it short shrift. Lacking care or deeper understanding, they blame UDP and eschew its use even when somebody comes along who does know how to use it effectively.
Both? I've seen UDP and TCP treated as interchangeable. They're not, even though in many cases the APIs involved make it seem that way. Choosing datagrams and unreliable transport vs. streams and reliable transport affects the rest of the protocol design and behavior, whether the designer wants it to or not and whether the implementer considered the consequences or not.
i agree that "unreliable" isn't a good term for the transport. reliability is a property of the system, not of the transport. and you can make an unreliable system with TCP or a reliable one with UDP
but, "best-effort" implies that it's doing some effort to ensure delivery, when it's really dropping any packet that looks funny or is unlucky enough to hit a full buffer
i like "lossy", but this is definitely one of the two hard problems
In a way, TCP is just about as reliable as UDP, but at a different layer: TCP will either forward your entire stream of data in order and without gaps, or it won’t. UDP does the same, but on the per-datagram level.
Reliability is arguably more of a statement about the availability metrics of your underlying network; it doesn’t seem like a great summary for what TCP does. You can’t make an unreliable lower layer reliable with any protocol magic on top; you can just bundle the unreliability differently (e.g. by trading off an unreliability in delivery for an unreliability in timing).
TCP won't always deliver your packets anyway, but it does have a mechanism for timing out if a party believes the other party did not receive something. UDP just means that if one cares for their data to be received, they must verify it themselves.
That's in my opinion what reliable / unreliable mean in this context.
reliable = it either succeeds or you get an error after some time, unreliable = it may or may not succeed
I concur with the people who think "best effort" is not a good term. But perhaps TCP streams are not reliable enough for TCP to be rightly called a reliable stream protocol. As it turned out, it's not really possible to use TCP without a control channel, message chunking, and similar mechanisms for transmitting arbitrarily large files. If it really offered reliable streams, that would be its primary use case.
the designers of link layers and protocol implementations, the army of operators who test signals, install repeaters, configure routers and hold conferences about how to manage the global routing system would disagree.
best effort implies 'no, we're not going to climb that curve and try to get to 100% reliability, because that would actually be counterproductive from an engineering perspective, but we're going to go to pretty substantial lengths to deliver your packet'
There is minimal effort. If the queue has room the packet gets forwarded. If the situation is any more difficult than that the effort ends and the packet is dropped.
This sounds like the problem is the term "best-effort" (hand wavy, what's the measure of effort? What's "the best" effort?).
In the end, best-effort is just saying "unreliable" in a fussier way.
>But it does not make UDP inherently unreliable.
Isn't that exactly what it does make it?
If that's not it, then what would an actual "inherently unreliable" design for such a protocol be? Calling an RNG to randomly decide whether to send the next packet?
As a fellow Swede: having grown up in northern Sweden, I have both heard of this, and have personally witnessed cheese especially made for this practice in regular stores, but I have never tried it myself.
And in my opinion, this is why `0.1 + 0.2 = 0.30000000000000004` is a bad meme. It cements a very wrong perception about floating point numbers. If we denote a rounding operation as `f64(...)`, this is `f64(f64(0.1) + f64(0.2)) = f64(0.30000000000000004) != f64(0.1 + 0.2)` which can obviously happen. In particular, `f64(0.1) != 0.1` etc. but we happened to choose 0.1 as a representative for `f64(0.1)` for various reasons. Nothing inaccurate, nothing meme-worthy, just implied operations.
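You can make those implied roundings visible from Python; `decimal.Decimal` prints the exact value behind a float literal:

```python
from decimal import Decimal

# Each literal is rounded to the nearest double before any arithmetic happens:
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.2))      # a hair above 0.2
print(Decimal(0.3))      # a hair below 0.3

# The sum of the two rounded inputs lands on a different double than f64(0.3):
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```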
Yes, this is the best way to explain it. Your number literals are “snapping to a grid” that is not base-10 and then we choose the shortest base-10 decimal that snaps to the appropriate grid point when we stringify the number.
The other thing that I would mention is that I see some really gnarly workarounds to try to get around this... Just bump up to integers for a second! People have this mistaken idea that the best way to understand these rounding “errors” is that floating point is just unpredictably noisy for everything, and that's not true.
Floating point has an exact representation of all integers up to 2^53 – 1. If you are dealing with dollars and cents that clients are getting billed or whatever, okay, the best thing to do is to have a decimal library. But if you don't have a decimal library and it's just some in-game currency that you don't want to get these gnarly decimals on, 3/10 will always give 0.3. 4/100 will always give 0.04. Just use the fact that the integer arithmetic is exact: multiply by the base, round to nearest integer, do your math, and then divide out the base in the end: and you'll be good.
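A tiny sketch of that "bump up to integers" recipe: multiply by the base, round to the nearest integer, do exact integer math, and divide out the base once at the end.

```python
# One rounding at the very end, instead of one per intermediate value.
a_tenths = round(0.1 * 10)          # 1
b_tenths = round(0.2 * 10)          # 2
print((a_tenths + b_tenths) / 10)   # 0.3

# versus letting every intermediate result round on its own:
print(0.1 + 0.2)                    # 0.30000000000000004
```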
A reasonable, but not always available, choice is to use integer quantities of the divided quantity. If you're dealing in dollars and cents, express things in cents. If you need tenths of a cent, express things in milliDollars. If you need 1/8th dollars, use those. Have a conversion to pretty values when displayed.
Sometimes you really do need to have a pretty good estimate of pi dollars, but often not.
what i haven't figured out is multicurrency. "cents" is fine for dollars but 1/100 doesn't work for all currencies.
do you use a different denominator for each currency or standardize on 1/1e8 or something?
I'd use a different denominator per currency? You've got to keep track of the currency anyway, so have 1 USDCent, or 1 EURCent or 1 BHDFil (1/1000) or 1 GBPPence (1/100) or the historic 1 GBPFarthing (1/960 ??)
If the wikipedia article on Decimalisation[1] is complete and accurate, only Mauritania and Madagascar still have non-decimal currencies.
If you really needed it to be uniform, you could work in 1/1000th worldwide, as long as you didn't need to keep more decimals for other reasons.
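One way that per-currency bookkeeping might look, as a sketch: the exponent table follows the usual ISO 4217 minor-unit convention, and the function name is made up for illustration.

```python
from decimal import Decimal

# Minor-unit exponents per ISO 4217 (a small sample).
MINOR_UNIT_EXPONENT = {
    "USD": 2,  # cents, 1/100
    "GBP": 2,  # pence, 1/100
    "BHD": 3,  # fils, 1/1000
    "JPY": 0,  # no minor unit
}

def to_minor_units(amount: str, currency: str) -> int:
    """Parse a decimal amount string into an integer count of the currency's minor unit."""
    scaled = Decimal(amount).scaleb(MINOR_UNIT_EXPONENT[currency])
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more precision than {currency} supports")
    return int(scaled)

print(to_minor_units("12.34", "USD"))  # 1234
print(to_minor_units("0.105", "BHD"))  # 105
```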
And it's very bad UX that when you write "0.1" in the code or feed it to the standard string parser at runtime, what you get back is not actually 0.1. It's effectively silent data corruption. If you want "snapping to a grid" for perf reasons, it should be opt-in, not opt-out.
I wonder how much confusion could have been avoided if compilers/interpreters emitted warnings for inexact float literals. It is a bit of a surprising pitfall: you generally expect the value of a literal to be obvious, but with floats it's almost unpredictable. Similarly, functions like strtod/atof could have some flags/return values indicating/preventing inexact conversions. Instead we ended up in this weird situation where values are quietly converted to something that is close to the desired value.
> you generally expect the value of a literal to be obvious, but with floats it's almost unpredictable
Fractions in positional notation are not exact as a rule. There are exceptions, but mostly they are not exact: 1/3, 1/6, 1/7, 1/9 cannot be represented by decimals exactly (or they can, but only with an infinite number of digits in their representation). The exceptions are fractions whose denominator has no prime factors outside those of the base: for decimal that means only 2 and 5, for binary only 2.
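The rule stated above, as a small check you can run: a reduced fraction terminates in base b exactly when its denominator has no prime factors outside b's.

```python
import math
from fractions import Fraction

def terminates(numerator: int, denominator: int, base: int) -> bool:
    """True if numerator/denominator has a finite expansion in the given base."""
    d = Fraction(numerator, denominator).denominator  # reduced denominator
    g = math.gcd(d, base)
    while g > 1:
        while d % g == 0:
            d //= g
        g = math.gcd(d, base)
    return d == 1

print(terminates(1, 3, 10))   # False: 3 shares no factor with 10
print(terminates(1, 10, 10))  # True
print(terminates(1, 10, 2))   # False: the factor 5 is why 0.1 is infinite in binary
print(terminates(1, 8, 2))    # True
```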
I'm not sure it is necessary for the author to also post the link to the tweet that referred to the tumblr post, since the tumblr post is where the actual proof is.
I think the grandparent is under the impression that the alleged Twitter version of the post had a non-anonymous author and that the Github blog instead chose to link to a Tumblr blog, where the blog author claims they learned the proof from an anonymous number theory professor, thus laundering out the authorship.
But I suspect the alleged Twitter version was just a link to the Tumblr, for which it's clearly not necessary to attribute the retweeter (especially in the context of the post being "the proof is actually incomplete, not very good, and less beautiful/more complex than the canonical proof").
The tweet is the "secondary source" whereas the Tumbler post is the "primary source" in this context[1]:
> For example, suppose you are reading an article by Brown (2014) that cites information from an article by Snow (1982) that you would like to include in your essay. For the reference list, you will only make a citation for the secondary source (Brown). You do not put in a citation for the primary source (Snow) in the reference list. For the in-text citation, you identify the primary source (Snow) and then write "as cited in" the secondary source (Brown).
Regardless of one's preferred style guide, it is odd not to credit the author whose work led to one's discovery of the inspiration for one's own work.
> For the reference list, you will only make a citation for the secondary source (Brown). You do not put in a citation for the primary source (Snow) in the reference list. For the in-text citation, you identify the primary source (Snow) and then write "as cited in" the secondary source (Brown).
That's not so you can give credit to Brown for helpfully pointing you toward Snow. It's a requirement that you admit, when you cite Snow, that you never actually read Snow. If you read Brown, find a pointer to Snow, and then read Snow, you don't cite Brown at all.
it's a little more confusing than that, because while he does link to a proof, the proof does not claim to be new; the guy says:
"A favourite proof of mine: first demonstrated to me by my professor in number theory. I think its beauty stems from the fact that it requires no knowledge of mathematics above the definition of what it means for a number to be rational, and can be written almost in one line."
According to the article, they built a total of 1574 planes. This number happens to be equal to 2 * 787. The significance of this would be that 787 is also the model number of one of Boeing's current-generation jetliners, the Boeing 787 Dreamliner. I, as a small-time-wannabe number nerd and small-time-wannabe airplane nerd, reacted positively when I happened to see this connection.
I have no idea if there is any intent behind this specific number of produced planes, but I suspect that there would be people at Boeing who feel the same kind of mild satisfaction as I did when I saw the number of aircraft produced.
After seeing this, I can't be the only one who got curious as to how many emojis there actually are on iOS.
Obviously, a quick google doesn't work right now.
So I did this to try to figure it out: emojipedia.org, the site that supposedly breaks google, has a page that appears to show all the available emojis on iOS [1]. On this page, all except the first 21 emojis are displayed in a way that uses lazy loading of the images. These images are contained in <li class="lazyparent"> elements. Assuming that there are no other <li class="lazyparent"> elements on the page, we should get the number of emojis on iOS simply by counting those elements.
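For what it's worth, the same counting idea can be scripted; here's a sketch using only the Python standard library, assuming you've saved a copy of the page locally (the file name is made up):

```python
from html.parser import HTMLParser

class LazyParentCounter(HTMLParser):
    """Count <li> elements whose class list contains 'lazyparent'."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "lazyparent" in classes:
            self.count += 1

counter = LazyParentCounter()
with open("emojipedia-apple.html", encoding="utf-8") as page:  # hypothetical saved copy of [1]
    counter.feed(page.read())
print(counter.count)
```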
You can also use `querySelectorAll` and pass the parent's selector. I didn't get to post while I was back at my computer, so I can't recall the exact selector, but the number was the same.
Those examples are all relatively low-level. Many high-level languages provide a floating point interface on top of such an underlying integer pseudo-randomness algorithm. This is the reason why much high-level code uses multiplication to get a ranged value.
This post goes into some detail about how the V8 JavaScript engine creates a [0, 1) floating point pseudo random value from its underlying integer pseudo-random algorithm: https://v8.dev/blog/math-random
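The multiplication pattern mentioned above, sketched in Python: scale the [0, 1) float by n and truncate.

```python
import random

def random_below(n: int) -> int:
    """Common high-level idiom: scale a [0, 1) float by n and truncate."""
    return int(random.random() * n)  # slightly biased for very large n; fine as an illustration

print(random_below(6))  # a pseudo-random value in 0..5
```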
> This post goes into some detail about how the V8 JavaScript engine creates a [0, 1) floating point pseudo random value from its underlying integer pseudo-random algorithm: https://v8.dev/blog/math-random
You'd think so but it doesn't really say. It's almost entirely about how they make the integer.
Following the commit link shows that they use the method of filling 1.0 with random mantissa bits and then subtracting 1.0
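A rough Python sketch of that trick (not V8's actual code): keep the exponent bits of 1.0, drop 52 random bits into the mantissa to get a double in [1, 2), then subtract 1.0.

```python
import random
import struct

def bits_to_unit_float(random_bits: int) -> float:
    """Build a double in [1, 2) from 52 random mantissa bits, then shift it down to [0, 1)."""
    mantissa = random_bits & ((1 << 52) - 1)  # keep only the low 52 bits
    one_bits = 0x3FF << 52                    # bit pattern of 1.0: exponent 1023, mantissa all zeros
    packed = struct.pack("<Q", one_bits | mantissa)
    return struct.unpack("<d", packed)[0] - 1.0

print(bits_to_unit_float(random.getrandbits(64)))  # some value in [0, 1)
```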
>> The number of random values it can generate is limited to 2^32 as opposed to the 2^52 numbers between 0 and 1 that double precision floating point can represent.
Not really true. That's how many numbers their algorithm returns, which is almost as many as there are evenly-spaced floats between 0 and 1. But because it starts with a number between 1 and 2, that method actually wastes a bit, and that number really should be 2^53.
But floating point itself has 1/4 of its values between 0 and 1, so for double precision that's roughly 2^62!
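Roughly where that 1/4 and 2^62 come from, counting IEEE 754 binary64 bit patterns:

```python
import math

MANTISSAS_PER_EXPONENT = 2 ** 52
normals_below_one = 1022 * MANTISSAS_PER_EXPONENT  # biased exponents 1..1022 cover [2**-1022, 1)
subnormals = MANTISSAS_PER_EXPONENT - 1            # positive subnormals, all below 1
in_unit_interval = normals_below_one + subnormals

positive_finite = 2046 * MANTISSAS_PER_EXPONENT + subnormals  # biased exponents 1..2046, plus subnormals
all_finite = 2 * positive_finite                              # mirror image for negatives

print(math.log2(in_unit_interval))    # ~62.0
print(in_unit_interval / all_finite)  # ~0.25
```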
The automated translation of that article turned out a little bizarre. This is the automatically translated version:
> The analysis shows that it is a Swedish remote-controlled craft (Seafox) that the Norwegian Armed Forces use to detect, for example, minors.
Not only does this translation fumble the translation of the word "naval mines", which amusingly becomes "minors", but it also somewhat surprisingly changes the nationality of the military power involved.
The correct translation should be:
> The analysis shows that it is a Swedish remote-controlled craft (Seafox) that the Swedish Armed Forces use to detect, for example, naval mines.
[1] https://en.wikipedia.org/wiki/Best-effort_delivery