> It's a pretty radical design choice, to force you to be aware that, yeah, since you're creating a new value (out of the two parameters), you will have to allocate. And so it forces you to allocate outside of the Add operation itself.
Forcing you to think about allocation is good for many situations.
Making you do an extra allocation is a bad way to do that. And the only way I can think of to avoid that would give up having the function be generic.
> Making you do an extra allocation is a bad way to do that. And the only way I can think of to avoid that would give up having the function be generic.
Sure, if you want to optimize that, then create your "foo" string with the extra capacity instead of doing `"foo".to_string()`, e.g. `{ let mut foo = String::with_capacity("foo".len() + "bar".len()); foo.push_str("foo"); foo }`. It has nothing to do with the generic `add` function; it can stay generic as it is just fine. Even a function specialized for adding String and &str could not go back in time and change how much capacity the String was created with.
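To spell that out, here's a minimal sketch of the pre-allocation idea (the string contents are just placeholders):

```rust
fn main() {
    // One allocation, sized up front for both pieces.
    let mut foo = String::with_capacity("foo".len() + "bar".len());
    foo.push_str("foo");

    // `String + &str` appends into the existing buffer; with enough
    // capacity already reserved, no further allocation is needed.
    let combined = foo + "bar";
    assert_eq!(combined, "foobar");
}
```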
Edit (1h later): It's also worth noting that:
- This desire of `Add::add(&str, &str) -> String` only seems to come up with people learning the language and bringing expectations from other languages. I've never seen production code that would've benefitted from it.
- As much as the ship has sailed with libstd functions hiding allocations via the global allocator where you may not expect them, the less new cases get added the better. So I'd rather not have `impl Add<&'_ str> for &'_ str { type Output = String; }` added to libstd.
- If you really do have two str literals that you want to concatenate, just use `concat!()`. Not only does it work, but also it works at compile-time and without any allocation (it evaluates to a `&'static str`).
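A tiny illustration of that last point about `concat!()`:

```rust
fn main() {
    // `concat!` joins the literals at compile time; the result is a
    // `&'static str`, so nothing is allocated at runtime.
    const GREETING: &str = concat!("foo", "bar");
    assert_eq!(GREETING, "foobar");
}
```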
> It has nothing to do with the generic `add` function; it can stay generic as it is just fine. Even a function specialized for adding String and &str could not go back in time and change how much capacity the String was created with.
The only reason we're converting a parameter to a String outside the function is because it has that awkward signature.
I was suggesting a specialization for adding &str and &str that requires minimal code at the call site and handles the capacity issue inside the function.
But that means we're no longer just applying + to types that have Add, we've given up all that convenience because it's not compatible with performance. This wouldn't have to be a tradeoff if Add worked differently; having to make this decision is a flaw with Rust. And you could still make it explicit that allocation is happening.
> - This desire of `Add::add(&str, &str) -> String` only seems to come up with people learning the language and bringing expectations from other languages. I've never seen production code that would've benefitted from it.
It's not a big problem that `strA.to_string() + strB` tends to allocate twice, and you wouldn't notice it in production, but it's still a waste of cycles caused by Rust's abstractions. Not even the abstractions, really: they left that implementation out as a reminder to the coder. It's a bit of an icky tradeoff.
What do you propose, that `fn add` goes back in time, taps the allocator on the shoulder as it's allocating what will eventually be the first argument, and asks it for a bit of extra capacity?
This is explained in the official docs: to avoid too many allocations when repeatedly concatenating strings.
> Implements the + operator for concatenating two strings.
> This consumes the String on the left-hand side and re-uses its buffer (growing it if necessary). This is done to avoid allocating a new String and copying the entire contents on every operation, which would lead to O(n^2) running time when building an n-byte string by repeated concatenation.
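A rough way to observe the buffer reuse the docs describe (this assumes the usual amortised growth strategy, so the exact numbers may vary by Rust version and allocator):

```rust
fn main() {
    let mut s = String::new();
    let mut capacity_changes = 0;
    let mut last_cap = s.capacity();

    for _ in 0..1000 {
        s = s + "x"; // `+` consumes `s` and pushes into its existing buffer
        if s.capacity() != last_cap {
            capacity_changes += 1;
            last_cap = s.capacity();
        }
    }

    // The buffer grows geometrically, so the capacity changes only a
    // handful of times rather than once per concatenation.
    println!("len = {}, capacity changes = {}", s.len(), capacity_changes);
}
```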
Sure, but it would have been a legitimate choice to let + do that, and use other functions to handle those use cases where it matters. After all, I expect to get a brand new value from an addition. If
```
a = 2
b = 3
c = a + b
```

I certainly do not expect `a` to be modified. So why would I expect it with strings? It would be no problem to warn the user about repeated concatenation with `+`, as I recall Java with its immutable strings does just that.
> After all, I expect to get a brand new value from an addition. [...] I certainly do not expect `a` to be modified. So why would I expect it with strings?
To be clear, your expectations are not betrayed when doing String + &str -> String in Rust. The String addend is consumed by + and attempting to reuse it will produce an error. So it's not like you notice that the String you used to have has implicitly been modified.
We call this being consumed, "moving". And in Rust actually your integer 'a' was the special case. Every value gets moved when you add it to something else, but this integer 'a' is still available to be used after it was moved.
What happens there is, the integer types implement Copy which is a special Trait that says, "I promise I don't have any meaning beyond my actual pattern of bits in a memory location or a register or whatever". If the type implements Copy then when 'a' gets moved, it is logically still fine to keep 'a' around anyway in case anybody still wanted it, as you can freely make or destroy copies of the bit pattern and this type has no meaning beyond the bit pattern.
You can derive Copy (or implement it) on your own types if the Rust compiler can see why that's a reasonable claim to make for your type; otherwise you can't. So String couldn't be Copy, because it clearly has a reference to a vector (of bytes) inside it that makes it work, and so the bit pattern is just pointing at the memory address of that vector of bytes.
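Here's a minimal sketch of the difference being described: Copy integers stay usable after `+`, a String does not.

```rust
fn main() {
    let a = 2;
    let b = 3;
    let c = a + b;
    // Integers are Copy, so `a` and `b` are still usable after `+`.
    println!("{a} + {b} = {c}");

    let s = String::from("foo");
    let t = s + "bar";   // `s` is moved into `+` and consumed here
    // println!("{s}");  // error[E0382]: borrow of moved value: `s`
    println!("{t}");
}
```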
My expectations are betrayed, but I do appreciate that Rust at least stops me if that expectation matters.
Really, whenever I concatenate something variable-length like a string, I know there's potential allocation under the hood. There has to be. What I don't get is why that allocation has to be hidden as a mutation of `a`, rather than just being explicit that, yeah, `c` is a new thing. I expect functions to return new things rather than mutate things via side effects. Isn't it better to let addition stay a function, rather than make the overload of + slightly more convenient/efficient for building up strings?
"Sure, but it would have been a legitimate choice to let + do that"
Yes, it is a legitimate choice indeed. Many fine languages have made that choice.
But it is also a legitimate choice not to. We need at least some languages that don't do that. We really don't have very many that are this careful about allocation. Most of programming language history since C and C++ has been trying to "fix" this problem, and make allocations easier. Rust has chosen to be one of the languages that stays at the highest level of care about allocations. As a result, it will not just casually allocate memory for strings.
That's what this is; a design choice. If you don't like it, don't use Rust. I generally don't use Rust, because my tasks don't call for this level of control. But if I ever do have a task with that level of control, I know where to go get the tools to solve that problem, and I'm glad that someone made that choice. I'd be in real trouble if nobody chose that.
(Another example of a common pattern in programming language history can be seen happening here, too. C and C++ were this careful about allocations for the most part, but they made it really hard to use at all, and borderline impossible to use safely. Then people incorrectly attributed that difficulty to being careful about allocations, and subsequent programming language design went in the direction of getting away from that level of care. But what we see in Rust is that if you are that careful, and add in a better toolset to deal with it, you can blunt the disadvantages while reaping the advantages. The disadvantages don't entirely go away, but if you reduce the "costs" in the cost/benefit analysis, the benefits become more attainable for engineers, especially when tooling can be developed to also increase the benefits. Haskell doesn't do this with allocation, but it has a similar story for things like how hard strong typing is. IMHO the mid-1990s to 2000s in programming languages was a story of running away from hard problems, but the big story in the 2010s has been running towards hard problems, and I'm liking the results.)
> We really don't have very many that are this careful about allocation.
You're missing the point. I don't think this is being careful about allocation. There is potential allocation either way - when your string (or vec) grows, allocation under the hood may be necessary too.
Do you see my point? Surely doing the allocation inside `a` isn't better than just allocating a new `c` for explicitness' sake. It may be better for performance's sake, but that's not justification enough for making (+) an impure function as I see it. We could always just write it out as `.append(b)` or whatever if that's what we wanted.
You're welcome to dislike it because it's "impure". Actually, depending on how rigidly you define purity, either both implementations are pure (both return the same output for the same input) or both implementations are impure (both modify global state via the implicit global allocator).
But you're mistaken in thinking it's something obvious such that others should agree with you. As I said previously, the reality is that you cannot write a program that lets you differentiate whether String + &str reused the same String or allocated a new one. So your hangup is entirely over the documentation revealing the extra information that it has the efficient implementation, not the inefficient one.
The entire point of passing in a String, according to the article, is to "force you to be aware that you will have to allocate".
But you're saying that an extra hidden allocation doesn't matter and you shouldn't have a "hangup" about it? Trying to follow the same rules as the article seems reasonable to me, not a "hangup".
> The comment you replied to was talking both about the negatives of hidden allocations and about purity
That is incorrect. The comment was requesting that `Add::add(String, &str) -> String` should be impled by creating a new String instead of mutating an existing one. My comment pointed out why this request was meaningless.
That cannot typecheck in Rust, unless you use something really odd like `Into<Cow<'a, A>> where A: Add<B>` as your input types.
And in languages where it does, you need either ubiquitously mutable strings (which everyone’s moved from because it generally screws you up) or refcounting.
I think this is right. I haven't written Rust before.
I copied Add to RAdd because you can't implement traits you don't own on types you don't own. And I didn't bother implementing it for numbers.
There's a trait implementation that takes String and &str (directly copied from Add), and there's a version that takes &str and &str. Both of them return String.
The generic function is just as generic as it always was, but of course you can't do this with actual Add without changing the standard library. You'd have to make the function less generic instead.
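Something along these lines is presumably what's being described; it's only a sketch, and the names `RAdd`, `radd`, and `join` are my own, not taken from the original code:

```rust
// A copy of `Add` under a different name, since neither `Add` nor
// `str`/`String` can be re-implemented by outside code (orphan rule).
trait RAdd<Rhs = Self> {
    type Output;
    fn radd(self, rhs: Rhs) -> Self::Output;
}

// Same behaviour as the libstd impl: consume the String and reuse its buffer.
impl RAdd<&str> for String {
    type Output = String;
    fn radd(mut self, rhs: &str) -> String {
        self.push_str(rhs);
        self
    }
}

// The extra specialisation: &str + &str allocates exactly once, with the
// right capacity, inside the function.
impl RAdd<&str> for &str {
    type Output = String;
    fn radd(self, rhs: &str) -> String {
        let mut out = String::with_capacity(self.len() + rhs.len());
        out.push_str(self);
        out.push_str(rhs);
        out
    }
}

// Still generic over anything that implements RAdd.
fn join<A, B>(a: A, b: B) -> A::Output
where
    A: RAdd<B>,
{
    a.radd(b)
}

fn main() {
    assert_eq!(join("foo", "bar"), "foobar");
    assert_eq!(join(String::from("foo"), "bar"), "foobar");
}
```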
> And in languages where it does, you need either ubiquitously mutable strings (which everyone’s moved from because it generally screws you up) or refcounting.
It's the same String as ever. Owned and mutated by adding.
No, they won't in standard implementations; we are not exactly allocating on every concatenation, since the allocated buffer doubles each time it grows, so it's amortised and not O(n^2). There is a downside for non-trivial example code, though.
If you want a function to do that, you're absolutely at liberty to write one. You could even write a zero-overhead wrapper around the references to allow for overloading. Rust won't let you implement `Add<&str>` for `&str` though; you'd need to use your wrapper.
I wouldn't recommend doing that, though, and not just because of "surprise" allocations: in my experience writing Rust code, it's fairly rare to need to add strings together. I'm much more likely to use a format macro.
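For what it's worth, a hedged sketch of the wrapper idea from the previous paragraph (the `Str` name is made up), plus the `format!` alternative:

```rust
use std::ops::Add;

// Hypothetical zero-overhead wrapper around a &str. We own this type, so
// we're allowed to implement `Add` for it, which the orphan rule forbids
// for plain `&str`.
struct Str<'a>(&'a str);

impl Add<&str> for Str<'_> {
    type Output = String;
    fn add(self, rhs: &str) -> String {
        // Allocate once, with exactly the capacity needed.
        let mut out = String::with_capacity(self.0.len() + rhs.len());
        out.push_str(self.0);
        out.push_str(rhs);
        out
    }
}

fn main() {
    assert_eq!(Str("foo") + "bar", "foobar");

    // In practice, format! is usually the clearer way to build a new String.
    assert_eq!(format!("{}{}", "foo", "bar"), "foobar");
}
```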