Fix issue 15732 - std.function partial does not work with function / delegate references
merged-on-behalf-of: Nathan Sashihara <n8sh@users.noreply.github.com>
Fix issue 16639 - Review std.json wrt this article on JSON edge cases and ambiguities
merged-on-behalf-of: Sebastian Wilzbach <sebi.wilzbach@gmail.com>
It was unable to handle alignments > 2^31 bytes, e.g., returning 2 for
the 64-bit pointer 0x2FA_00000000. Return a size_t instead of a uint.
Note that it's only called in one place, an assertion in a ctor of the
BitmappedBlockImpl mixin, which handles size_t fine.
This fixes sporadic
std.experimental.allocator.building_blocks.bitmapped_block unittest
failures for LDC CI on Win64 (something like 1 out of 100 runs failing).
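
A minimal sketch of the idea, not the actual allocator code: the helper name
alignmentOf is hypothetical, and the sketch assumes the alignment is derived
from the lowest set bit of the address. Shifting a size_t rather than a uint
keeps the result intact once that bit index reaches 32.

```d
import core.bitop : bsf;

// Hypothetical stand-in for the helper discussed above: the largest power
// of two that divides the address. The pointer must not be null, because
// bsf is undefined for 0.
size_t alignmentOf(const(void)* ptr) pure nothrow @nogc
{
    // Shift a size_t, not a uint, so bit indices of 32 and above survive.
    return size_t(1) << bsf(cast(size_t) ptr);
}

void main()
{
    assert(alignmentOf(cast(void*) 0x1000) == 0x1000);

    static if (size_t.sizeof == 8)
    {
        // The lowest set bit of 0x2FA_00000000 is bit 33.
        assert(alignmentOf(cast(void*) 0x2FA_0000_0000) == (size_t(1) << 33));
    }
}
```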
fopen and popen wrap C functions that exist in druntime, and it's not
uncommon for folks to end up accidentally trying to use the private
functions in std.stdio instead of the ones from druntime - which then
tends to result in questions on D.Learn. There's no reason that either
of these private functions needs to be named the same as the C function
that it wraps. It just causes confusion when folks accidentally try to
use them instead of the C functions.
So, this changes them to _fopen and _popen so that there will be no such
conflict, and the error messages when folks try to call the C functions
but do so incorrectly will not mention std.stdio's internals.
This removes unnecessary uses of scope, adds a lot of necessary uses of
scope, and adds basic tests that ensure that each of the functions in
std.datetime.systime compiles when given scope arguments. It also enables
some previously commented-out tests involving immutable SysTimes (IIRC,
they didn't use to compile due to compiler bugs related to
Rebindable!(immutable TimeZone), but they now compile, so they should
be enabled).
The test corpus provided at https://github.com/nst/JSONTestSuite/ revealed some issues with the std.json.parseJSON function. Since addressing some of the issues required parseJSON to reject input it previously accepted, I have added a new JSONOptions.strictParsing flag so callers can opt-in to the stricter parsing.
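A short sketch of how a caller might opt in. The helper name parses is hypothetical and exists only for this illustration. parseJSON, JSONOptions.strictParsing, and JSONException are the std.json names this change relies on. The [True] input mirrors the capitalized-literal case described further down.

```d
import std.json : JSONException, JSONOptions, parseJSON;

// Hypothetical helper: reports whether `text` parses under the given options.
bool parses(string text, JSONOptions options = JSONOptions.none)
{
    try
    {
        cast(void) parseJSON(text, options);
        return true;
    }
    catch (JSONException)
    {
        return false;
    }
}

void main()
{
    // Well-formed JSON is accepted with or without the flag.
    assert(parses(`{"a": [1, 2, 3]}`));
    assert(parses(`{"a": [1, 2, 3]}`, JSONOptions.strictParsing));

    // A capitalized literal is rejected only once the caller opts in.
    assert(parses(`[True]`));
    assert(!parses(`[True]`, JSONOptions.strictParsing));
}
```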
The issues, and how I've addressed them, are listed below (approximately from most severe to least):
Silently dropping ASCII NUL characters from strings:
n_string_unescaped_crtl_char.json
This is the most serious problem I found while fixing the test cases. The current implementation of parseJSON() uses a helper function called peekChar(), which stores the next character to handle in a variable of type Char (an alias of the character type). Unfortunately, it used 0 to indicate that no character had been read yet. As a result, if an ASCII NUL (which has the value 0) is present in the text and is read with peekChar(), it is effectively skipped over. This was happening in both string and whitespace parsing.
I changed peekChar() to use a Nullable!Char as the temporary storage for the next character, to disambiguate the case where there is no pending unconsumed character from the case where there is a pending unconsumed ASCII NUL. In strict mode, parsing JSON with unescaped ASCII NULs in strings throws an exception, while in non-strict mode the JSON is accepted with the NUL included in the string value.
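The difference between "nothing buffered" and "a NUL is buffered" can be sketched with a small lookahead reader. This is an illustration only: LookaheadSketch and its members are hypothetical and are not the code in std.json.

```d
import std.typecons : Nullable;

// Hypothetical one-character lookahead reader. A null `pending` means
// "nothing buffered", which is distinct from a buffered '\0'.
struct LookaheadSketch
{
    string input;
    Nullable!char pending;

    // Returns the next character without consuming it (null at end of input).
    Nullable!char peek()
    {
        if (pending.isNull && input.length)
        {
            pending = input[0];
            input = input[1 .. $];
        }
        return pending;
    }

    // Returns the next character and consumes it.
    Nullable!char pop()
    {
        auto c = peek();
        pending.nullify();
        return c;
    }
}

void main()
{
    auto r = LookaheadSketch("a\0b");
    assert(r.pop().get == 'a');
    assert(r.peek().get == '\0'); // the NUL is buffered, not mistaken for "empty"
    assert(r.pop().get == '\0');
    assert(r.pop().get == 'b');
    assert(r.pop().isNull);       // end of input
}
```

With a plain Char and 0 as the "empty" marker, the second peek() would be indistinguishable from having nothing buffered, so the reader would fetch the following character and the NUL would be lost.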
Failure to accept ASCII DEL (0x7f) unescaped in strings:
y_string_unescaped_char_delete.json
y_string_with_del_character.json
These were the only test cases that std.json rejected that it should have accepted. This issue was addressed by changing the string parsing logic to explicitly check for character values < 0x20 instead of using std.ascii.isControl (which also returned true for 0x7f), with a special exception for ASCII NULs in non-strict mode as mentioned above.
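A small check of the behavior described above, assuming a Phobos that includes this change. It contrasts std.ascii.isControl with the explicit range test; the sample string is made up for the illustration.

```d
import std.ascii : isControl;
import std.json : JSONOptions, parseJSON;

void main()
{
    // std.ascii classifies DEL as a control character...
    assert(isControl('\x7f'));
    // ...but it is not below 0x20, which is the range the parser now checks.
    assert('\x7f' >= 0x20);

    // An unescaped DEL inside a string is accepted, even in strict mode.
    auto v = parseJSON("\"a\x7fz\"", JSONOptions.strictParsing);
    assert(v.str == "a\x7fz");
}
```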
Parsing "true", "false", and "null" tokens case-insensitively:
n_structure_capitalized_True.json
In strict mode those tokens are now parsed case-sensitively.
Accepting control characters other than ' ', '\t', '\r', and '\n' as whitespace:
n_structure_null-byte-outside-string.json
n_structure_whitespace_formfeed.json
In strict mode only the listed characters are accepted as whitespace. Non-strict mode continues to use std.ascii.isWhite, with an additional exception for ASCII NUL for a reason similar to the n_string_unescaped_crtl_char.json case: the skipWhitespace() function used peekChar(), so it didn't handle ASCII NULs consistently. After my changes, non-strict mode is actually more permissive than the previous behavior, but it is at least consistently permissive.
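The gap between the two whitespace definitions can be sketched as follows. isJsonWhitespace is a hypothetical predicate written for this illustration, not a function in std.json.

```d
import std.ascii : isWhite;

// Hypothetical predicate: the four whitespace characters listed above.
bool isJsonWhitespace(dchar c) @safe pure nothrow @nogc
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

void main()
{
    assert(isJsonWhitespace(' ') && isJsonWhitespace('\n'));

    // std.ascii.isWhite also accepts form feed and vertical tab.
    assert(isWhite('\f') && !isJsonWhitespace('\f'));
    assert(isWhite('\v') && !isJsonWhitespace('\v'));

    // NUL is whitespace under neither definition.
    assert(!isWhite('\0') && !isJsonWhitespace('\0'));
}
```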
Silently accepting empty data:
n_structure_no_data.json
In strict mode an exception is now thrown instead of returning an empty value.
Failure to enforce that numbers beginning with 0 cannot have any additional digits in the non-fractional part:
n_number_-01.json
n_number_neg_int_starting_with_zero.json
n_number_with_leading_zero.json
An additional check is now performed in strict mode when the whole part of a number begins with zero to ensure trailing digits are not present.
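The rule itself is easy to state in code. The following is a sketch of the check, not the implementation in std.json: hasValidIntegerPart is a hypothetical helper that inspects only the optional sign and the start of the whole part.

```d
import std.ascii : isDigit;

// Hypothetical helper: true if the whole part of `number` is either a
// single 0 or starts with a non-zero digit.
bool hasValidIntegerPart(string number) @safe pure nothrow @nogc
{
    size_t i = 0;
    if (i < number.length && number[i] == '-')
        ++i;                                    // optional sign
    if (i == number.length || !isDigit(number[i]))
        return false;                           // no digits at all
    if (number[i] == '0')                       // a leading 0 must stand alone
        return i + 1 == number.length || !isDigit(number[i + 1]);
    return true;
}

void main()
{
    assert(hasValidIntegerPart("0"));
    assert(hasValidIntegerPart("-0"));
    assert(hasValidIntegerPart("0.5"));
    assert(hasValidIntegerPart("10"));
    assert(!hasValidIntegerPart("01"));
    assert(!hasValidIntegerPart("-01"));
    assert(!hasValidIntegerPart("-012"));
}
```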
Failure to check for trailing characters after parsing:
n_array_comma_after_close.json
n_array_extra_close.json
n_multidigit_number_then_00.json
n_object_trailing_comment.json
n_object_trailing_comment_open.json
n_object_trailing_comment_slash_open_incomplete.json
n_object_trailing_comment_slash_open.json
n_object_with_trailing_garbage.json
n_string_with_trailing_garbage.json
n_structure_array_trailing_garbage.json
n_structure_array_with_extra_array_close.json
n_structure_close_unopened_array.json
n_structure_double_array.json
n_structure_number_with_trailing_garbage.json
n_structure_object_followed_by_closing_object.json
n_structure_object_with_trailing_garbage.json
n_structure_trailing_#.json
An additional check is now performed in strict mode to ensure any trailing characters after the initial JSON value are only whitespace.
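A brief illustration, assuming a Phobos that includes this change. The input with an extra closing bracket is made up for the example, in the spirit of the test cases listed above.

```d
import std.exception : assertNotThrown, assertThrown;
import std.json : JSONException, JSONOptions, parseJSON;

void main()
{
    // Non-strict mode stops after the first complete value.
    assertNotThrown!JSONException(parseJSON("[1]]"));

    // Strict mode rejects anything but whitespace after that value.
    assertThrown!JSONException(parseJSON("[1]]", JSONOptions.strictParsing));

    // Trailing whitespace is still fine.
    auto v = parseJSON("[1] \n", JSONOptions.strictParsing);
    assert(v[0].integer == 1);
}
```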
In addition to the above issues, parseJSON() throws ConvException for numbers outside the range of double/long/ulong, which was not previously documented. I have updated the ddoc comment to reference that exception.
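For illustration, an integer one past ulong.max should trigger this, assuming integers are still converted the way they were at the time of this change. Callers that need to handle both malformed and out-of-range input would catch ConvException in addition to JSONException.

```d
import std.conv : ConvException;
import std.exception : assertThrown;
import std.json : parseJSON;

void main()
{
    // 18446744073709551616 is 2^64, which does not fit in a ulong.
    assertThrown!ConvException(parseJSON("18446744073709551616"));
}
```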
This adds a special TimeZone type internal to SysTime which _timezone is
directly initialized to so that SysTime.init will work without
segfaulting but will still be uniquely identifiable with the is
operator. Or at least, _timezone is _supposed_ to be directly
initialized to it, but issue# 17740 currently prevents that. So,
_timezone has been temporarily renamed to _timezoneStorage and private
getters and setters named _timezone have been added. The getter then
does a null check and returns InitTimeZone() for SysTime.init to
simulate the member variable having been initialized to InitTimeZone().
Once issue# 17740 has been fixed, these accessors will be unnecessary,
and the code should be updated so that _timezone is properly a variable
again and is directly initialized with InitTimeZone().
The new TimeZone type - InitTimeZone - is internal to SysTime and can
only be obtained by the timezone getter on SysTime.init. It acts the
same as UTC except that it is not special-cased by the to*String
functions and thus will print out its timezone as +00:00 instead of Z,
which is perfectly legitimate per the spec. And as such, if _timezone
were directly initialized with InitTimeZone(), there would be no extra
checks due to this change, and everything would just work. However,
until issue# 17740 is fixed, there will be an extra null check any time
that a function is called on _timezone, because _timezone is currently a
wrapper that does a null check rather than being a member variable
directly like it's supposed to be.
Unlike previous attempts along these lines, this does not make it so
that SysTime.init has NaN behavior such that any operation (other than
assignment) on an uninitialized SysTime would result in SysTime.init,
and the timezone setter property does not set the SysTime to
SysTime.init if it's passed this TimeZone. So, unfortunately, it _is_
possible to have other SysTime values with the special TimeZone, but it
was deemed unnecessarily complex for too little benefit to add the NaN
behavior. And regardless, SysTime.init is still uniquely identifiable
via the is operator. It's just that it can't technically be uniquely
identified by the timezone getter, which was never a supported feature
anyway.
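
A minimal sketch of the resulting behavior, assuming a Phobos that includes
this change. It relies only on what is described above: SysTime.init can be
used without segfaulting, it stays identifiable with the is operator, and its
string form ends in +00:00.

```d
import std.algorithm.searching : endsWith;
import std.datetime.systime : SysTime;

void main()
{
    SysTime st;                     // default-initialized
    assert(st is SysTime.init);     // still uniquely identifiable with `is`

    // Members can be called on SysTime.init without a null dereference.
    assert(st.year == 1);

    // The internal time zone is printed as an offset.
    assert(st.toISOExtString().endsWith("+00:00"));
}
```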