tl;dr: `std.array.join` was quite complicated: 6 implementation functions and a few hundred lines of code. Now it is just 2 functions, ~120 lines shorter, and within 10% of the original's performance (faster in some cases).
The original had 6 implementations: 3 joins with a separator, 3 without. For each, those 3 consisted of:
* A general slow version (uses `std.algorithm.joiner` with `appender`).
* A version for a range of arrays where you know the length (pre-allocates the destination array then uses array ops).
* A version for a range of arrays where you don't know the length (uses `appender`).
There's no need for the second version because you can just use appender and `.reserve` the size upfront. Appender already uses array ops when available in its `put` function.
There's also no need for the first function because using `joiner` is slower than just looping through the sub-ranges and appending them. The third function doesn't even use the fact that it's operating on a range of arrays, so it can easily be merged with the first and second.
The end result is just one function for join with a separator, and one for a join without. They both use `appender`, and have a static branch that will `.reserve` the required size when the information is available.
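A minimal sketch of that shape for the no-separator case, assuming a current Phobos layout (`std.range.primitives`). The name `joinSketch` is hypothetical and this is not the actual Phobos code; it only illustrates one `appender`-based loop with a static branch that reserves space when the total length can be computed.

```d
import std.array : appender;
import std.range.primitives;

// Hypothetical illustration, not the Phobos implementation.
auto joinSketch(RoR)(RoR ror)
if (isInputRange!RoR && isInputRange!(ElementType!RoR))
{
    alias E = ElementEncodingType!(ElementType!RoR);
    auto result = appender!(E[])();

    // Static branch: only compiled in when the outer range can be
    // traversed twice and the sub-ranges report their length.
    static if (isForwardRange!RoR && hasLength!(ElementType!RoR))
    {
        size_t total = 0;
        foreach (r; ror.save)
            total += r.length;
        result.reserve(total);
    }

    // Single code path for every kind of input: append each sub-range.
    foreach (r; ror)
        result.put(r);
    return result.data;
}
```

For example, `joinSketch([[1, 2], [3], [4, 5]])` takes the reserving branch, while a `string[]` argument skips it in this sketch because `hasLength` is false for narrow strings.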
Previously, std.array.join with a separator would check whether it needed to add the separator on each iteration of the loop. With this change, the check is unnecessary because the first element is appended before the loop, then the loop just adds the separator and next element each iteration.
This improves the code significantly in the case of a non-forward RoR (range of ranges) because we don't need to know the length anymore. It's simpler, and will be slightly faster.
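The loop restructuring can be sketched as follows. `joinSepSketch` is a hypothetical name, the separator is assumed to be an array for simplicity, and the `.reserve` branch is left out to keep the focus on the loop; it is an illustration rather than the Phobos code.

```d
import std.array : appender;
import std.range.primitives;

// Hypothetical illustration, not the Phobos implementation.
auto joinSepSketch(RoR)(RoR ror, ElementEncodingType!(ElementType!RoR)[] sep)
if (isInputRange!RoR && isInputRange!(ElementType!RoR))
{
    alias E = ElementEncodingType!(ElementType!RoR);
    auto result = appender!(E[])();

    if (ror.empty)
        return result.data;

    // Append the first element before the loop...
    result.put(ror.front);
    ror.popFront();

    // ...so each iteration unconditionally adds separator + element,
    // with no "is this the first/last element?" check.
    for (; !ror.empty; ror.popFront())
    {
        result.put(sep);
        result.put(ror.front);
    }
    return result.data;
}
```

Because the outer range is only walked once and its length is never queried, the same loop works for a plain input range.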
This reverts commit 07104b5fe6.
As we cannot currently rely on ranges, even those with slicing,
implementing opDollar, this change breaks code because it assumes that
such ranges implement opDollar.
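
To illustrate the kind of breakage, here is a small sketch with a hypothetical range type `Sliceable` that supports slicing and `length` but does not define opDollar. It is not taken from the reverted commit; it only shows that `$` is rejected for such a type while an explicit `length` still works.

```d
// Hypothetical range type: sliceable, has length, but no opDollar.
struct Sliceable
{
    int[] data;
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property size_t length() const { return data.length; }
    Sliceable opSlice(size_t lo, size_t hi) { return Sliceable(data[lo .. hi]); }
}

// Using `$` on this type does not compile, since opDollar is missing.
static assert(!__traits(compiles, Sliceable.init[1 .. $]));

// Spelling out the upper bound with `length` works for the same type.
Sliceable tailOf(Sliceable r)
{
    return r[1 .. r.length];
}
```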