I also added decodeFront which operates on the first code point in the
range (unlike stride and strideBack, it requires a different name, since
the signatures of decode and decodeFront are almost identical - the only
difference being that decode takes the index by ref, and decodeFront
takes it as out).
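
As a rough illustration of that difference in signatures, here is a minimal sketch written against current Phobos rather than the code from this change. The helper names countCodePoints and firstCodePoint are hypothetical. decode reads from, and advances, an index passed by ref, while decodeFront reports the number of code units it consumed through an out parameter.

```d
import std.utf : decode, decodeFront;

/// Hypothetical helper: counts code points by calling decode repeatedly.
/// decode takes the index by ref and advances it past each code point.
size_t countCodePoints(string s)
{
    size_t count = 0;
    size_t index = 0;
    while (index < s.length)
    {
        decode(s, index); // index now points at the next code point
        ++count;
    }
    return count;
}

/// Hypothetical helper: decodes only the first code point.
/// decodeFront writes the number of code units consumed to an out parameter.
dchar firstCodePoint(string s, out size_t numCodeUnits)
{
    return decodeFront(s, numCodeUnits);
}
```

With UTF-8 input, a non-ASCII character such as U+00E9 occupies two code units, so firstCodePoint reports 2 for it while countCodePoints counts it once.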

It's easier to define aliases this way. I created two templates, though,
so that you can still pass both template arguments if you want to, which
also avoids breaking any existing code.

- Use fast-path tests for non-complex Unicode sequences that can be
inlined. These rely on the built-in array bounds check.
- Factor out complex cases into separate functions that do
exception-based validity checks. The char[] and wchar[] versions use
pointers to avoid redundant array bounds checks, so they can only
be @trusted.
- Completely rewrite decode for char[] to use less branching and
unrolled loops. This requires fewer registers and fewer instructions.
The overlong check is now done much more cheaply, on the decoded code point.
- Make the decode functions templates to work around the compiler's very
restricted function inlining.
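
The split between an inlinable fast path and a separate, exception-throwing slow path can be sketched roughly as follows. This is an illustrative sketch only, not the rewritten Phobos code: decodeSketch and decodeSlow are hypothetical names, the slow path uses a plain loop and array indexing rather than the unrolled, pointer-based version described above, and the error messages are made up. It does show the overlong check being done once, on the decoded code point.

```d
import std.utf : UTFException;

/// Hypothetical slow path: handles multi-byte sequences and does all
/// validity checks, throwing UTFException on malformed input.
dchar decodeSlow(const(char)[] str, ref size_t index)
{
    immutable uint lead = str[index];
    size_t len;
    uint cp;

    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else
        throw new UTFException("invalid lead code unit");

    if (str.length - index < len)
        throw new UTFException("truncated sequence");

    foreach (i; 1 .. len)
    {
        immutable uint cu = str[index + i];
        if ((cu & 0xC0) != 0x80)
            throw new UTFException("invalid continuation code unit");
        cp = (cp << 6) | (cu & 0x3F);
    }

    // Overlong check on the decoded code point: each sequence length
    // has a minimum value it is allowed to encode.
    static immutable uint[5] minValue = [0, 0, 0x80, 0x800, 0x10000];
    if (cp < minValue[len])
        throw new UTFException("overlong encoding");
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw new UTFException("invalid code point");

    index += len;
    return cast(dchar) cp;
}

/// Hypothetical fast path: small enough to be inlined. The array index
/// expression supplies the bounds check.
dchar decodeSketch(const(char)[] str, ref size_t index)
{
    immutable char c = str[index];
    if (c < 0x80)
    {
        ++index;
        return c;
    }
    return decodeSlow(str, index);
}
```

Keeping the ASCII test in its own tiny function means the common case costs one comparison at the call site, while the rarely taken validation code stays out of line.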

The pragmas have not been as effective as we might have liked. They only
work with templates, they can't tell you where in your code you need to
make changes, and they seem to have been more annoying to programmers
than helpful, so we're going to discontinue them. We'll leave them in
for anything that has actually been deprecated until the deprecated
attribute has been improved enough to take a message, but "scheduled for
deprecation" notices will now go only in the documentation and
changelog.

It's pretty clear from discussions on the newsgroup that we want to keep
toUTF16z, so it shouldn't be scheduled for deprecation anymore. That
change is part of https://github.com/D-Programming-Language/phobos/pull/279,
but for whatever reason that pull request hasn't been reviewed yet, let
alone merged. This change shouldn't need review, and it should be in
before the next release, so I'm just making it and checking it in. It
simply removes the "scheduled for deprecation" note in the documentation
and its associated pragma.

The main purpose of these changes was to make as much of std.utf as
possible pure (other than toUTFx, which I'll be replacing with toUTF in a
future pull request), but I also ended up doing a fair bit of
documentation cleanup. Almost everything in std.utf is pure now, which
should make it much easier to make string functions pure.
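
To show why this matters, here is a small sketch of a string function that can only be marked pure because the std.utf function it calls is pure. It assumes current Phobos, where decode is inferred as pure, and countNonAscii is a hypothetical name.

```d
import std.utf : decode;

/// Hypothetical string function: counts code points outside ASCII.
/// The explicit `pure` compiles only because decode itself is pure.
size_t countNonAscii(string s) pure
{
    size_t count = 0;
    size_t index = 0;
    while (index < s.length)
    {
        if (decode(s, index) >= 0x80)
            ++count;
    }
    return count;
}
```

If decode were impure, the compiler would reject the pure annotation on countNonAscii, and the same would apply to every function built on top of it.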

I also put @system on the two overloads of toUTFz which do pointer
arithmetic. They're obviously @system anyway, but tagging them explicitly
makes it clearer.