First, we take some of the safety into our own hands. While this approach won't ...

burntsushi · on March 26, 2024

> you’re just laundering the unsafe pointer arithmetic behind array indexing

Perhaps true in a very narrow sense, but you could say the same thing about all of Rust. "It's just laundering unsafe stuff behind a safe interface." And indeed, the ability to encapsulate unsafe internals inside a safe interface is one of the primary selling points of the language. It is also one of the key characteristics that differentiate it from languages that do not currently have this ability, such as C and C++. Whether you think this is an actual advantage or not is I suppose up to you, but I certainly think it is. And I think your use of the word "just" is papering over a lot of stuff.

For a more concrete code-level comparison with C, I did the leg work to translate a C program to a number of different Rust programs by varying some constraints. One of those Rust programs does indeed use indices instead of pointers. The README talks about the trade offs. See: https://github.com/BurntSushi/rsc-regexp/

Chabsff · on March 26, 2024

> languages that do not currently have this ability

That seems to be a very common position, and one that's super weird to me. C and particularly C++ absolutely have that ability with library support if you know what you are doing.

The only material difference, from my point of view, is that the default behavior of the language is different.

I will fully grant that the path of least resistance being dangerous is a huge issue in C/C++, and one that Rust addresses, but extending that all the way to saying that the language lacks the ability is really excessive.

tialaramex · on March 26, 2024

There's a big cultural problem and there are several big technical problems.

Rust has a safety culture, and C++ does not. In Rust's safety culture it was obvious that std::mem::unintialized (an unsafe function) should be deprecated because it's more dangerous than it appears, it's actually hard to use it correctly. That's why today we have the MaybeUninit type. In C++ it was apparently equally obvious that std::span, a brand new type in C++ 20, should not have a safe index operation.

Technically the safe/ unsafe distinction being at the language level makes it hard to fake. You can say your C++ only uses your safe abstractions, but the language itself doesn't care, so without inspecting every part of it to check you're never more than one slip away from catastrophe.

Most importantly in this context, at the language level Rust is committed to this safety distinction. If you write code where Rust's compiler can't see why it's OK, the compiler rejects your program. C++ requires that a conforming compiler must instead accept programs unless it can show why they're wrong. These are two possible ways to cut the Gordion knot of Rice's Theorem, but they have very different consequences.

burntsushi · on March 26, 2024

You can't encapsulate safety in C or C++. There's no `unsafe` keyword like Rust (or like Modula 3). If you have to say "if you know what you're doing," then you haven't encapsulated anything. There really is a categorical difference here. It's not excessive at all. It's the entire point.

Now if I were to say something like, "Rust's safety means that you can never have UB anywhere ever and CVEs will never happen for anything if you use Rust." Then yes, that's excessive. But to say that Rust can encapsulate `unsafe` and C and C++ cannot? I don't see how that's excessive. It's describing one of the most obvious differences between the programming languages.

You can restrict yourself to particular subsets of C (I'm thinking about MISRA) or C++, but these usually come with even more significant trade offs than Rust. And I'm not aware of any such subset that provides the ability to encapsulate safety in a way that lets folks not using that subset benefit from it in a way that is impossible to misuse (as a matter of an API guarantee).

pjmlp · on March 27, 2024

Only to add some historical context, ESPOL/NEWP for Burroughs B5000 in 1961 were one of the first systems programming languages with unsafe, many others followed upon that.

Burroughs B5000 had an additional feature for executables using unsafe code that we only have nowadays on managed runtimes like Java and CLR, binaries with unsafe code were tainted and required someone with admin access to enable them for execution.

Regarding C and C++, Visual Studio, Clion and clang tidy are the best we have in terms of tooling for the general public supporting the Core Guidelines (including lifetime checks), and they are still relatively basic in what they can actually validate.

burntsushi · on March 27, 2024

AIUI, Modula-3 provided an ability to actually encapsulate unsafety, in that the concept was elevated to the level of interfaces. Did any language prior to Modula-3 have that capability?

I think that's fundamentally different---although related---to just having an `unsafe` keyword. To take something I know well, Go has an `unsafe` package that acts as a sort of unsafe keyword. If we ignore data races, you can say that a Go program can't violate memory safety if there is no use of the `unsafe` package.

The problem though is that you can't really build new `unsafe` abstractions. You can't write a function that might cause UB on some inputs in a way that requires the caller to write `unsafe`. (You can do this by convention of course, e.g., by putting `Unsafe` in the name of the function.)

In Rust, `unsafe` doesn't just give you the ability to, e.g., dereference raw pointers. It also is required in order to call other `unsafe` functions. You get the benefit of composition so that you can build arbitrary abstractions around `unsafe` with the compiler's support.

My understanding is that Modula-3 supported this style of encapsulation (which is what I was talking about in this thread). What languages prior to Modula-3 supported it, or was Modula-3 the first?

pjmlp · on March 27, 2024

Starting with that 1961 example, ESPOL/NEWP.

Since Unisys still sells Burroughs, nowadays ClearPath MCP, you can get the latest NEWP manual here, section 8.

https://public.support.unisys.com/framework/publicterms.aspx...

Followed by Mesa/Cedar (CHECKED, TRUSTED, UNCHECKED), Modula-2 (IMPORT SYSTEM), the languages of Oberon linage (which follow up on the IMPORT SYSTEM approach), Ada (using Unchecked),....

In the languages that use the IMPORT SYSTEM approach, the compiler can mark the module as unsafe, and anything that might depend on it.

Some the Modula-3 folks worked previously on Cedar at Xerox, by the way.

Mesa - http://www.bitsavers.org/pdf/xerox/mesa/5.0_1979/documentati...

Cedar - http://www.bitsavers.org/pdf/xerox/parc/cedar/Cedar_7.0/09_C...

burntsushi · on March 27, 2024

Very interesting. Thank you.

kelnos · on March 27, 2024

> C and particularly C++ absolutely have that ability with library support if you know what you are doing.

I won't speak to C++, as it's a very different language now since the last time I used it. I've been writing C for more than 20 years, and I still make mistakes. And there's nothing keeping me from accidentally doing something unsafe outside my unsafe abstraction, aside from my own perfection at never making mistakes (yeah, right).

Rust requires you to be explicit about the unsafe things you do. And, realistically, even when I'm building a safe interface on top of necessarily-unsafe code, the unsafe portions aren't even that large compared to the entirety of the abstraction. That makes things much easier to audit, and the compiler tells me which sections of code I need to pay more attention to.

To me, this is lacking the ability. "If you know what you are doing" is a laughable constraint. Even people who theoretically do (and I suspect programming ability is a lot like people's self-reported skill at driving a car) still make mistakes sometimes.

cmeacham98 · on March 26, 2024

I think you're missing the point - abusing a vector in this way is not a safe interface and is instead a good way to introduce several memory safety bugs (for example, UAFs) that the normal rust memory model explicitly prevents.

adwn · on March 26, 2024

> abusing a vector in this way is not a safe interface and is instead a good way to introduce several memory safety bugs (for example, UAFs)

No, as long as you don't use the unsafe keyword, an out-of-bounds vector access won't lead to use-after-free or other memory safety bugs.

Arnavion · on March 26, 2024

Out-of-bounds access is not required for the pseudo-UAF we're talking about here. Deleting a node in the middle of a linked list will leave a "hole" in the backing Vec. You cannot shift the next elements down to fill the hole because that will invalidate all handles to them. If the backing Vec holds the nodes directly, as TFA's implementation does, then there is no way to mark the hole as a hole. So any bug where a different node's handle accidentally ends up accessing this hole instead will lead to that code observing a "freed" node.

One workaround is to make the backing Vec hold Option of Node instead so that deleting a node can set the hole to None, in which case the bug I described above has the opportunity to unwrap() and panic instead of silent UAF. Though you'll also need additional tracking for those holes so that you can fill them with new nodes later, at which point you'd be better off using a proper freelist / slab instead of a Vec anyway (as TFA also mentions).

kelnos · on March 27, 2024

Well, we're not talking about "pseudo-UAF", we're talking about actual-UAF and actual-memory-safety.

You use scare quotes around "freed" for a reason: the data has not actually been freed.

The bug you're talking about is a logic error. It could be a bad bug, depending on circumstances, but there's no memory safety issue here.

Arnavion · on March 27, 2024

>You use scare quotes around "freed" for a reason: the data has not actually been freed.

Who said it hasn't? I would assume such a node to have been given to `std::ptr::drop_in_place`. Not doing that would be a leak until the list as a whole was dropped.

cmeacham98 · on March 26, 2024

I don't mean a literal UAF, but a "use array index after free" because you're using indexes (which only have bounds checking) as heap pointers.

Rust's borrow checker doesn't account for when you re-implement parts of memory management as array indexes.

im3w1l · on March 26, 2024

Safe rust is a turing complete language. That implies that it is possible to build an emulator for any other programming language in safe rust. That emulated language and programs running on it can have memory-safety issues, even if the emulator itself cannot be corrupted.

When writing such an emulator the natural way to set up the memory is to use an array, and the natural way to implement pointers is indices into that array. In other words, this pattern is part-ways there to creating an emulator with internal safety issues.

However guaranteeing that any corruption will be contained to the array is certainly a lot better than nothing.

Chabsff · on March 26, 2024

Rust's safety mechanisms don't exist just to prevent bugs from escalating into security issues, they exist to prevent the whole class of bugs related to reference handling from being present in the first place. That's supposed to mean programs that work more consistently. Handwaving techniques that lead to programs panicking as "Well at least it doesn't become a security concern" is missing the forest for the trees.

potatochup · on March 26, 2024

This is generally true, but afaik there are still some (fairly complex) ways to write memory unsafe code in safe rust https://faultlore.com/blah/everyone-poops/

mandarax8 · on March 26, 2024

How is reusing a freed index not a UAF? If I roll my own allocator I can still get UAFs even though the memory accessed is not yet free'd.

kelnos · on March 27, 2024

Because that's not what "UAF" means. Also not what "freed" means.

To have a UAF, there has to be memory that is actually freed, and you have to attempt to access that memory. No memory is freed here (in the OP's implementation). Even if it was, at worst you'd get a panic for trying to access past the end of the Vec.

None of that is a UAF or a memory safety issue. It's just a logic bug.

amichal · on March 26, 2024

Nope but "handle confusion" safety perhpas?

kelnos · on March 27, 2024

No, that's incorrect. While yes, it's true that you can point two different nodes at the same array index, or mismanage your array indices in a variety of ways, that is a logic error. It will indeed make your program behave incorrectly (and depending on what it's doing, that may have security implications), but there is no memory safety issue. No one is using anything after freeing it; if you try to access an index that is past the end of the Vec, it will panic. Panicking, while undesirable, is memory-safe.

You may think that's a difference without distinction, but "memory safety" and "use after free" have specific definitions, and this ain't them.

burntsushi · on March 26, 2024

You later clarified that by "memory safety bugs" you don't actually mean "memory safety bugs," but rather "use array index after free." But that isn't a memory safety bug. (It might be a denial of service bug or a logic bug, but because of bounds checks, it isn't a memory safety bug.) So no, I'm afraid I haven't missed the point at all.

Could you please read the link I shared? There's all sorts of nuance in the README. And there is absolutely no pretending in my comment or in the link I shared that using indices instead of pointers has zero downsides.

cmeacham98 · on March 26, 2024

It is a memory safety bug - by using this "indexes as pointers" methodology it is possible to write code where two different owners simultaneously believe they are the sole owner of an object.

Writing that using normal pointers is impossible in safe rust (barring a compiler bug, which do exist but are rare).

Rusky · on March 26, 2024

You don't get two owners that way. You get something slightly less powerful than owners, which still has all its preconditions satisfied: you can check that it is in bounds, and if so you will find an element of the expected type. You cannot corrupt the underlying allocator's data structures, you cannot resize the array when there are outstanding pointers to its elements, and you cannot violate the type system.

Rust's goal was never to exclude all shared mutability. (Otherwise why support things like locks?) Rather, it excludes only the kinds of shared mutability that open the door to undefined behavior. The point of all these sorts of "workarounds" is that, because there is no longer a single unrestricted kind of shared mutability, you now get to pick which version of restricted mutability you want: reference counting vs bounds checking vs ghostcell's higher-rank lifetimes vs whatever else.

kelnos · on March 27, 2024

> two different owners simultaneously believe they are the sole owner of an object.

Not in the sense of ownership that matters in Rust. The Vec owns the data. A node with an index in it does not own that data. It merely refers to it.

The key here is when you answer the question, "what happens when I screw it up the indices?" And the answer is logic errors or a panic. Neither of those are memory safety issues.

burntsushi · on March 26, 2024

Can you share a program where this results in UB without using `unsafe`?

Please also consider what I was responding to:

> you’re just laundering the unsafe pointer arithmetic behind array indexing

cmeacham98 · on March 26, 2024

The definition of memory safety is not "code that does not result in UB".

burntsushi · on March 27, 2024

So just to be clear here, the progression is:

"memory safety bugs" -> "for example, UAFs" -> "I don't mean a literal UAF" -> "use array index after free" -> 'memory safety is not "code that does not result in UB"'

I mean, you can define "memory safety" to be whatever you want it to be, but the definition everyone else uses (including Rust) is absolutely connected with undefined behavior. More than that, the entire context of this thread assumes that definition. Rust certainly does. And if you are going to use a different definition than everyone else, at least have the courtesy to provide it.

If people used your definition, then it would be wrong to, for example, say that "Java is a memory safe programming language." But that is, as far as I know, widely regarded to be a true statement.

This sort of disagreement is profoundly irritating, because I made it exceptionally clear what I meant from the get-go. All you had to do was respond and say, "oh, it sounds like we are just using different definitions of the term 'memory safety.' if we use your definition, I agree with what you said."

Rusky · on March 26, 2024

It is the definition of memory safety that Rust uses.

It would be easier to discuss whatever non-UB failure modes you have in mind, in the context of Rust, if you used a different term.

kelnos · on March 27, 2024

Maybe not, but it's also not whatever unconventional definition you've come up with here.

akavel · on March 26, 2024

There's a huge difference in the outcomes those can lead to, that's why it's not just a simple "laundering", but a serious gain. On top of that, C vs. Zig vs. Rust give you a different level of that gain still.

The particular difference is noted in the fragment you quoted. What may not be obvious, is why corrupted memory etc. are a much bigger problem than panics. In fact it is usually the _lack_ of panics that is problem in them: they will lead to silent memory corruption. This will lead to unexpected behaviors of your code. Ones you just never even assumed possible - due to basic contracts being broken. Such problems with then silently propagate through your system, poisoning it without you knowing. Whereas a panic is a clear cut alarm/canary behavior: something's wrong, let's crash loudly and make it immediately visible. This will protect against the corruption spreading out. And in a typical system, will restart the panicked binary, allowing the system to resume operation with minimal disruption.

10000truths · on March 26, 2024

The laundering is the point. It's much easier for the user to handle out-of-bounds array accesses and uninitialized array elements, compared to tracking the lifetime and validity of data at arbitrary memory addresses.

agalunar · on March 26, 2024

Using indices is a (slight) improvement over pointers, since in Rust you will benefit from bounds checking (restricting you to the memory allocated for that tree).

(If you use u32 for the indices, you also save some space.)

taeric · on March 26, 2024

It is hilarious how much space you can save by using smaller indexes than a pointer. Especially on modern architectures.

burntsushi · on March 26, 2024

Yes! The regex crate does this and it saves quite a bit of memory.

taeric · on March 26, 2024

It is the kind of low hanging optimization that I used to think was greatly oversold. And, to be fair, for many programs I would still wager it probably is. If you are chasing a ton of pointers, though, it is definitely worth considering how many more you can hold in memory with smaller sizes.

I think I remember a discussion a while back lamenting that we just tacitly accepted wide pointers. If anyone has a link going over that, I'd be delighted to read it again. I'm 90% sure I did not understand it when I first saw it. :D

eclark · on March 26, 2024

I am currently writing a CFR Tree that has nodes that can be self-referential. https://github.com/elliottneilclark/rs-poker/pull/92

The idea is that every game of poker is a walk through a tree. Keeping track of probabilities and outcomes allows for exploration of the correct way to play. Using indices made building the tree easier (though I am still one bug away from it all working).

bbatha · on March 26, 2024

The difference is that the array bounds are checked and the behavior is defined whereas bad unsafe pointer athematic if you’re lucky gives you a segfault and at worst an RCE.

alexchamberlain · on March 26, 2024

Array bounds are essentially unchecked in C, unless you go so far over as to hit a guard page.

empath-nirvana · on March 26, 2024

You can have bugs with that style of coding, but they're not going to be bugs of the "remote code execution" type.

googh · on March 27, 2024

If preventing remote code execution is the goal, then C/C++/Zig can also achieve this using isoheaps (see Fil-C[1] for example). Even without isoheaps, CFI mitigations are able to prevent jumps into code not known at compile time. Why bother with Rust or any other language with borrow checkers then? Just a honest question.

[1] - https://github.com/pizlonator/llvm-project-deluge/blob/delug...

favorited · on March 26, 2024

The arena + handles approach doesn't solve the memory management issue, but it can be beneficial for storing nodes in a list, graph, etc. from a data locality perspective.

kelnos · on March 27, 2024

There's a big difference, though: the array indexing code is still safe from memory errors. I would much prefer a panic over a segfault.