Mirror of https://github.com/dlang/phobos.git (synced 2025-04-29 14:40:30 +03:00)

This makes readText check for a BOM. If a BOM for UTF-8, UTF-16, or UTF-32 is present and it doesn't match the requested string type, a UTFException is thrown. BOMs for other encodings are let through in case the content happens to work with the requested string type and passes UTF validation. This also makes readText check that the buffer length is a multiple of the requested string type's code unit size, throwing a UTFException instead of letting the cast throw an Error.
readText now checks BOMs

$(REF readText, std, file) now checks for a
$(HTTP https://en.wikipedia.org/wiki/Byte_order_mark, BOM). If a BOM is present
and it is for UTF-8, UTF-16, or UTF-32, $(REF readText, std, file) verifies
that it matches the requested string type and the endianness of the machine.
If there is a mismatch, a $(REF UTFException, std, utf) is thrown without
bothering to validate the string.

If there is no BOM, or if the BOM is not for UTF-8, UTF-16, or UTF-32, then the
behavior is what it's always been: UTF validation proceeds as normal, so if the
text isn't valid for the requested string type, a
$(REF UTFException, std, utf) is thrown.

In addition, before the buffer is cast to the requested string type, the
alignment is checked (e.g. 5 bytes don't fit cleanly in an array of $(D wchar)
or $(D dchar)), and a $(REF UTFException, std, utf) is now thrown if the
number of bytes does not align with the requested string type. Previously, the
alignment was not checked before casting, so if there was an alignment mismatch,
the cast would throw an Error, killing the program.
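
The three cases above (BOM mismatch, ordinary validation failure without a BOM, and a byte count that does not fit the requested code unit size) can be sketched roughly as follows. This is only an illustration and assumes a Phobos release that includes this change. The scratch file name and the $(D rejects) helper are made up for the example and are not part of Phobos.

```d
import std.exception : collectException;
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.utf : UTFException;

/// Hypothetical helper (not part of Phobos): true if readText!S(path)
/// throws a UTFException.
bool rejects(S)(string path)
{
    return collectException!UTFException(readText!S(path)) !is null;
}

void main()
{
    // Hypothetical scratch file used only for this illustration.
    string path = buildPath(tempDir(), "readtext_bom_demo.txt");

    // Case 1: UTF-8 BOM (EF BB BF) followed by "hi\n", 6 bytes in total.
    ubyte[] utf8WithBom = [0xEF, 0xBB, 0xBF, 0x68, 0x69, 0x0A];
    write(path, utf8WithBom);
    scope (exit) remove(path);

    assert(!rejects!string(path));  // BOM matches the requested type
    assert(rejects!wstring(path));  // UTF-8 BOM, but UTF-16 requested
    assert(rejects!dstring(path));  // UTF-8 BOM, but UTF-32 requested

    // Case 2: no BOM; a lone continuation byte (0x80) is invalid UTF-8,
    // so ordinary UTF validation rejects it, as it always has.
    ubyte[] invalidUtf8 = [0x68, 0x80];
    write(path, invalidUtf8);
    assert(rejects!string(path));

    // Case 3: no BOM, 5 bytes. Valid as UTF-8, but 5 is not a multiple of
    // wchar.sizeof (2) or dchar.sizeof (4), so a UTFException is thrown
    // rather than an Error from the cast.
    write(path, "hello");
    assert(readText(path) == "hello");
    assert(rejects!wstring(path));
    assert(rejects!dstring(path));
}
```

Because every failure mode now surfaces as a $(REF UTFException, std, utf), calling code can handle malformed input with a single $(D catch) instead of being terminated by an Error.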