
> add("foo".to_string(), "bar")

> It's a pretty radical design choice, to force you to be aware that, yeah, since you're creating a new value (out of the two parameters), you will have to allocate. And so it forces you to allocate outside of the Add operation itself.

Forcing you to think about allocation is good for many situations.

Making you do an extra allocation is a bad way to do that. And the only way I can think of to avoid that would give up having the function be generic.



There is no extra allocation. No matter how you do it, you will need at least one allocation to hold the result of combined strings.


Yes, you need one.

Which would mean using String::with_capacity based on both &str.

Not making a string from one side, then very likely resizing it to add the other. The generic code+call will often make two allocations.
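To make the two approaches concrete, here is a minimal sketch. The function names `concat_exact` and `concat_via_to_string` are made up for illustration. The first sizes the buffer once from both slices; the second is the pattern being criticized, which may need to grow the buffer when the right-hand side is appended.

```rust
/// Concatenate two string slices with a single allocation sized up front.
fn concat_exact(a: &str, b: &str) -> String {
    let mut out = String::with_capacity(a.len() + b.len());
    out.push_str(a);
    out.push_str(b);
    out
}

/// The pattern under discussion: build a String from the left side,
/// then append the right side (which may force the buffer to grow).
fn concat_via_to_string(a: &str, b: &str) -> String {
    a.to_string() + b
}

fn main() {
    let one = concat_exact("foo", "bar");
    let two = concat_via_to_string("foo", "bar");
    // Both produce the same text; they differ only in how the buffer was obtained.
    assert_eq!(one, two);
    // with_capacity guarantees at least the requested capacity.
    assert!(one.capacity() >= 6);
    println!("{} (capacity {})", one, one.capacity());
}
```

Whether the second version really allocates twice depends on how much capacity `to_string()` happens to reserve, which is an implementation detail.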


>Making you do an extra allocation is a bad way to do that. And the only way I can think of to avoid that would give up having the function be generic.

Sure, if you want to optimize that, then create your "foo" string with the extra capacity instead of doing `"foo".to_string()`, e.g. `{ let mut foo = String::with_capacity("foo".len() + "bar".len()); foo.push_str("foo"); foo }`. It has nothing to do with the generic `add` function; it can stay generic as it is just fine. Even a function specialized for adding String and &str could not go back in time and change how much capacity the String was created with.

Edit (1h later): It's also worth noting that:

- This desire of `Add::add(&str, &str) -> String` only seems to come up with people learning the language and bringing expectations from other languages. I've never seen production code that would've benefitted from it.

- As much as the ship has sailed with libstd functions hiding allocations via the global allocator where you may not expect them, the fewer new cases get added the better. So I'd rather not have `impl Add<&'_ str> for &'_ str { type Output = String; }` added to libstd.

- If you really do have two str literals that you want to concatenate, just use `concat!()`. Not only does it work, but also it works at compile-time and without any allocation (it evaluates to a `&'static str`).
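A small sketch of the `concat!` point from the last bullet. The constant name `FOOBAR` and the helper `foobar` are made up for illustration.

```rust
// concat! joins literals at compile time; the result is a &'static str
// baked into the binary, so no allocation happens at run time.
const FOOBAR: &str = concat!("foo", "bar");

fn foobar() -> &'static str {
    FOOBAR
}

fn main() {
    assert_eq!(foobar(), "foobar");
    println!("{}", foobar());
}
```

Note that this only works with literals, not with `&str` values computed at run time.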


> It has nothing to do with the generic `add` function; it can stay generic as it is just fine. Even a function specialized for adding String and &str could not go back in time and change how much capacity the String was created with.

The only reason we're converting a parameter to a String outside the function is because it has that awkward signature.

I was suggesting a specialization for adding &str and &str that requires minimal code at the call site and handles the capacity issue inside the function.

But that means we're no longer just applying + to types that have Add, we've given up all that convenience because it's not compatible with performance. This wouldn't have to be a tradeoff if Add worked differently; having to make this decision is a flaw with Rust. And you could still make it explicit that allocation is happening.


> Edit (1h later): It's also worth noting that:

> - This desire of `Add::add(&str, &str) -> String` only seems to come up with people learning the language and bringing expectations from other languages. I've never seen production code that would've benefitted from it.

It's not a big problem that strA.to_string() + strB tends to allocate twice, and you wouldn't notice it in production, but it's still a waste of cycles caused by Rust's abstractions. Not even the abstractions, really, they left that implementation out as a reminder to the coder. It's a bit of an icky tradeoff.


But why can't the add function do this itself?


What do you propose, that fn add goes back in time, taps the allocator on the shoulder as it's allocating what will eventually be the first argument, and asks it for a bit of extra capacity?


The coder would go back in time to before they added ".to_string()"


No, I'm asking that add("foo", "bar") should automatically allocate a string of length 6.


This is explained in the official docs: to avoid too many allocations when repeatedly concatenating strings.

> Implements the + operator for concatenating two strings.

> This consumes the String on the left-hand side and re-uses its buffer (growing it if necessary). This is done to avoid allocating a new String and copying the entire contents on every operation, which would lead to O(n^2) running time when building an n-byte string by repeated concatenation.
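One way to observe the buffer reuse the docs describe is to compare the heap pointer before and after the addition. This is only a sketch: the helper name `add_reuses_buffer` is made up, and the pointer stays the same only when the left-hand String already has enough spare capacity, so the example reserves it up front.

```rust
/// Returns true if `s + tail` kept the same heap buffer that `s` owned.
fn add_reuses_buffer(s: String, tail: &str) -> bool {
    let before = s.as_ptr();
    let joined = s + tail; // consumes `s`, appends into its buffer
    before == joined.as_ptr()
}

fn main() {
    // Reserve more than "foobar" needs so no growth is required.
    let mut s = String::with_capacity(16);
    s.push_str("foo");
    assert!(add_reuses_buffer(s, "bar"));
    println!("buffer was reused");
}
```

If the capacity were too small, the buffer would be grown and the pointer could change, which is the "growing it if necessary" part of the quote.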


Sure, but it would have been a legitimate choice to let + do that, and use other functions to handle those use cases where it matters. After all, I expect to get a brand new value from an addition. If

a = 2
b = 3
c = a + b

I certainly do not expect "a" to be modified. So why would I expect it with strings? It would be no problem to warn the user about repeated concatenation with +, as I recall Java with its immutable strings does just that.


>After all, I expect to get a brand new value from an addition. [...] I certainly do not expect "a" to be modified. So why would I expect it with strings?

To be clear, your expectations are not betrayed when doing String + &str -> String in Rust. The String addend is consumed by + and attempting to reuse it will produce an error. So it's not like you notice that the String you used to have has implicitly been modified.


We call this being consumed "moving". And in Rust, your integer 'a' was actually the special case. Every value gets moved when you add it to something else, but this integer 'a' is still available to be used after it was moved.

What happens there is, the integer types implement Copy which is a special Trait that says, "I promise I don't have any meaning beyond my actual pattern of bits in a memory location or a register or whatever". If the type implements Copy then when 'a' gets moved, it is logically still fine to keep 'a' around anyway in case anybody still wanted it, as you can freely make or destroy copies of the bit pattern and this type has no meaning beyond the bit pattern.

You can derive Copy (or implement it) on your own types if the Rust compiler can see why that's a reasonable claim to make for your type, otherwise you can't. So String can't be Copy: it holds a vector of bytes, so its bit pattern is just a pointer (plus a length and capacity) into heap memory, and duplicating those bits would leave two owners of the same buffer.
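A short sketch of the difference described above. `Meters` is a hypothetical type invented for the example; it can derive Copy because its only field is itself Copy.

```rust
// Hypothetical type: every field is Copy, so deriving Copy is allowed.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(u32);

fn main() {
    // Integers are Copy: `a` and `b` remain usable after the addition.
    let a = 2;
    let b = 3;
    let c = a + b;
    assert_eq!((a, b, c), (2, 3, 5));

    // A user-defined Copy type behaves the same way.
    let m = Meters(5);
    let n = m;
    assert_eq!(m, n);

    // String is not Copy: `s` is moved into the addition.
    let s = String::from("foo");
    let t = s + "bar";
    // println!("{}", s); // would not compile: `s` was moved
    assert_eq!(t, "foobar");

    // To keep the original, clone it explicitly (a visible allocation).
    let u = String::from("foo");
    let v = u.clone() + "bar";
    assert_eq!((u.as_str(), v.as_str()), ("foo", "foobar"));
}
```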


My expectations are betrayed, but I do appreciate that Rust at least stops me if that expectation matters.

Really, whenever I concatenate something variable-length like a string, I know there's potential allocation under the hood. There has to be. What I don't get is why that allocation has to be hidden as a mutation of a, rather than just be explicit as yeah, c is a new thing. I expect functions to return new things rather than mutate things via side effects. Isn't it better to let addition stay a function, rather than overload of + being slightly more convenient/efficient for building up strings?


"Sure, but it would have been a legitimate choice to let + do that"

Yes, it is a legitimate choice indeed. Many fine languages have made that choice.

But it is also a legitimate choice not to. We need at least some languages that don't do that. We really don't have very many that are this careful about allocation. Most of programming language history since C and C++ has been trying to "fix" this problem, and make allocations easier. Rust has chosen to be one of the languages that stays at the highest level of care about allocations. As a result, it will not just casually allocate memory for strings.

That's what this is; a design choice. If you don't like it, don't use Rust. I generally don't use Rust, because my tasks don't call for this level of control. But if I ever do have a task with that level of control, I know where to go get the tools to solve that problem, and I'm glad that someone made that choice. I'd be in real trouble if nobody chose that.

(Another example of a common pattern in programming language history can be seen happening here, too. C and C++ were this careful about allocations for the most part, but they made it really hard to use at all, and borderline impossible to use safely. Then, people incorrectly attributed that difficulty to being careful about allocations, and subsequent programming language design went in the direction of getting away from that level of care. But what we see in Rust is that if you are that careful, and add in a better toolset to deal with it, you can blunt the disadvantages while reaping the advantages. The disadvantages don't entirely go away, but if you reduce the "costs" in the costs/benefits analysis, the benefits become more attainable for engineers, especially when tooling can be developed to also increase the benefits. Haskell doesn't do this with allocation but it has a similar story for things like how hard strong typing is. IMHO the mid-1990s to the 2000s in programming languages was a story of running away from hard problems, but the big story in the 2010s has been running towards hard problems, and I'm liking the results.)


> We really don't have very many that are this careful about allocation.

You're missing the point. I don't think this is being careful about allocation. There is potential allocation either way - when your string (or vec) grows, allocation under the hood may be necessary too.

Do you see my point? Surely doing the allocation inside "a" isn't better than just allocating a new "c" for explicitness' sake. It may be better for performance's sake, but that's not justification enough for making (+) an impure function as I see it. We could always just write it out as .append(b) or whatever if that's what we wanted.


You're welcome to dislike it because it's "impure". Actually, depending on how rigidly you define purity, either both implementations are pure (both return the same output for the same input) or both implementations are impure (both modify global state via the implicit global allocator).

But you're mistaken in thinking it's something obvious such that others should agree with you. As I said previously, the reality is that you cannot write a program that lets you differentiate whether String + &str reused the same String or allocated a new one. So your hangup is entirely over the documentation revealing the extra information that it has the efficient implementation, not the inefficient one.


The entire point of passing in a String, according to the article, is to "force you to be aware that you will have to allocate".

But you're saying that an extra hidden allocation doesn't matter and you shouldn't have a "hangup" about it? Trying to follow the same rules as the article seems reasonable to me, not a "hangup".


>But you're saying that an extra hidden allocation doesn't matter and you shouldn't have a "hangup" about it?

No. I recommend reading the subthread the comment was made in more carefully.


The comment you replied to was talking both about the negatives of hidden allocations and about purity, so I thought you were addressing both.

If you are only talking about purity then okay I have no comment.


>The comment you replied to was talking both about the negatives of hidden allocations and about purity

That is incorrect. The comment was requesting that `Add::add(String, &str) -> String` should be implemented by creating a new String instead of mutating an existing one. My comment pointed out why this request was meaningless.


What does that explain?

Let's say you did add(add("foo", "bar"), "baz").

The inner add would return a String, and then all subsequent adds would have the exact same optimization you're quoting.

There is no downside, performance-wise.


That cannot typecheck in Rust, unless you use something really odd like `Into<Cow<'a, A>> where A: Add<B>` as your input types.

And in languages where it does, you need either ubiquitously mutable strings (which everyone’s moved from because it generally screws you up) or refcounting.


> That cannot typecheck in Rust, unless you use something really odd like `Into<Cow<'a, A>> where A: Add<B>` as your input types.

I'm not entirely sure what that means, but here I'll just go implement a version:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

I think this is right. I haven't written Rust before.

I copied Add to RAdd because you can't implement traits you don't own on types you don't own. And I didn't bother implementing it for numbers.

There's a trait implementation that takes String and &str (directly copied from Add), and there's a version that takes &str and &str. Both of them return String.

The generic function is just as generic as it always was, but of course you can't do this with actual Add without changing the standard library. You'd have to make the function less generic instead.

> And in languages where it does, you need either ubiquitously mutable strings (which everyone’s moved from because it generally screws you up) or refcounting.

It's the same String as ever. Owned and mutated by adding.
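The playground link above is truncated, so the following is not that code. It is a reconstruction from the description only: a stand-in trait named `RAdd` (as in the comment), one impl for String and &str that mirrors what the standard `Add` impl does, one impl for &str and &str that sizes the buffer up front, and a generic `add` function on top.

```rust
/// Stand-in for std::ops::Add, since Add itself cannot be implemented
/// for &str + &str outside the standard library.
trait RAdd<Rhs> {
    type Output;
    fn radd(self, rhs: Rhs) -> Self::Output;
}

/// String + &str: consume the left side and append into its buffer.
impl<'a> RAdd<&'a str> for String {
    type Output = String;
    fn radd(mut self, rhs: &'a str) -> String {
        self.push_str(rhs);
        self
    }
}

/// &str + &str: allocate once, sized for both halves.
impl<'a, 'b> RAdd<&'b str> for &'a str {
    type Output = String;
    fn radd(self, rhs: &'b str) -> String {
        let mut out = String::with_capacity(self.len() + rhs.len());
        out.push_str(self);
        out.push_str(rhs);
        out
    }
}

/// Generic over anything implementing RAdd, as in the discussion.
fn add<A, B>(a: A, b: B) -> A::Output
where
    A: RAdd<B>,
{
    a.radd(b)
}

fn main() {
    // The inner call yields a String; the outer call appends to it.
    let s = add(add("foo", "bar"), "baz");
    assert_eq!(s, "foobarbaz");
    println!("{}", s);
}
```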


But mutable strings are not an issue when not shared, which can be checked statically in Rust, right?


No, they won't. In standard implementations we are not allocating exactly what is needed each time; the allocated buffer doubles each time, so it's amortised and not O(n^2). There is a downside for non-trivial example code.
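The amortisation claim can be checked roughly by counting how often the capacity changes while appending. This is a sketch with a made-up helper name, `count_reallocations`; the exact growth factor is an implementation detail of the standard library, so only the order of magnitude is meaningful.

```rust
/// Push `pushes` single-byte chars onto an empty String and count how
/// many times the capacity changed (each change implies a reallocation).
fn count_reallocations(pushes: usize) -> usize {
    let mut s = String::new();
    let mut last_capacity = s.capacity();
    let mut reallocations = 0;
    for _ in 0..pushes {
        s.push('x');
        if s.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = s.capacity();
        }
    }
    reallocations
}

fn main() {
    let n = 10_000;
    let reallocs = count_reallocations(n);
    // With geometric growth this is far smaller than the number of pushes.
    assert!(reallocs < 100);
    println!("{} pushes, {} reallocations", n, reallocs);
}
```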


Every add except the first one is using the same code in either version.

The suggested change does not affect String + &str at all.


Ok, maybe I misunderstood, I thought you were saying &str + &str with precomputed capacity in all cases, let me read it again.


If you want a function to do that, you're absolutely at liberty to write one. You could even write a zero-overhead wrapper around the references to allow for overloading. Rust won't let you implement Add<&str> for &str though, you'd need to use your wrapper.
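A minimal sketch of such a wrapper. The name `Cat` is made up for illustration. Because `Cat` is a local type, implementing `Add` for it is allowed, whereas `impl Add<&str> for &str` is rejected since neither the trait nor the type belongs to your crate.

```rust
use std::ops::Add;

/// Hypothetical newtype around a string slice. It holds only the
/// reference, so wrapping adds no run-time cost.
#[derive(Clone, Copy)]
struct Cat<'a>(&'a str);

impl<'a, 'b> Add<&'b str> for Cat<'a> {
    type Output = String;

    fn add(self, rhs: &'b str) -> String {
        // One allocation, sized for both halves.
        let mut out = String::with_capacity(self.0.len() + rhs.len());
        out.push_str(self.0);
        out.push_str(rhs);
        out
    }
}

fn main() {
    let s = Cat("foo") + "bar";
    assert_eq!(s, "foobar");
    // The result is a String, so further + uses the standard String + &str impl.
    let t = Cat("foo") + "bar" + "baz";
    assert_eq!(t, "foobarbaz");
    println!("{} {}", s, t);
}
```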

I wouldn't recommend doing that, though, and not just because of "surprise" allocations: in my experience writing Rust code, it's fairly rare to need to add strings together. I'm much more likely to use a format macro.


Agreed, and in the cases where format! isn't up to the task, String::push_str is both easy and explicit.
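For comparison, a short sketch of both alternatives. The function names `describe` and `join_lines` are invented for the example.

```rust
/// format! builds a new String from a template; the allocation is
/// visible at the call site through the macro name.
fn describe(name: &str, count: usize) -> String {
    format!("{}: {} items", name, count)
}

/// push_str appends into a String you already own, so you decide
/// when and how much to allocate.
fn join_lines(lines: &[&str]) -> String {
    // Reserve room for every line plus one newline each.
    let total: usize = lines.iter().map(|l| l.len() + 1).sum();
    let mut out = String::with_capacity(total);
    for line in lines {
        out.push_str(line);
        out.push('\n');
    }
    out
}

fn main() {
    assert_eq!(describe("cart", 3), "cart: 3 items");
    assert_eq!(join_lines(&["a", "b"]), "a\nb\n");
    println!("{}", describe("cart", 3));
}
```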



