Fix handling of empty words when WRDSF_RETURN_DELIMS or WRDSO_MAXWORDS is in effect

* README: Update.
* wordsplit.3: Document changes.
* wordsplit.at: Test backward compatibility quirk.
* wordsplit.c: Make sure NULL and DELIM nodes are protected from
expansions.
(wordsplit_finish): Ensure the output array produced
with WRDSF_RETURN_DELIMS is consistent with that produced without this
flag.  Provide a new option, WRDSO_RETDELNOTEMPTY, to request the old buggy
behavior.
* wordsplit.h (WRDSO_RETDELNOTEMPTY): New option.
* wsp.c: New tests.
Sergey Poznyakoff 2025-03-15 23:05:25 +02:00
parent 0e1a09c4c7
commit 8f3eb3433e
6 changed files with 138 additions and 55 deletions


@@ -1,5 +1,5 @@
.\" This file is part of wordsplit -*- nroff -*-
-.\" Copyright (C) 2009-2021 Sergey Poznyakoff
+.\" Copyright (C) 2009-2025 Sergey Poznyakoff
.\"
.\" Wordsplit is free software; you can redistribute it and/or modify
.\" it under the terms of the GNU General Public License as published by
@@ -14,7 +14,7 @@
.\" You should have received a copy of the GNU General Public License
.\" along with wordsplit. If not, see <http://www.gnu.org/licenses/>.
.\"
-.TH WORDSPLIT 3 "June 22, 2023" "WORDSPLIT" "Wordsplit User Reference"
+.TH WORDSPLIT 3 "March 15, 2025" "WORDSPLIT" "Wordsplit User Reference"
.SH NAME
wordsplit \- split string into words
.SH SYNOPSIS
@@ -558,6 +558,43 @@ the last word. For example, if the input to the above fragment were
"is"
"the time for all good men"
.EE
+.SH COMPATIBILITY QUIRKS
+If
+.B WRDSF_RETURN_DELIMS
+is set and
+.B WRDSF_SQUEEZE_DELIMS
+is not,
+.B wordsplit
+returns an empty word between each pair of contiguous delimiters.
+Consider, for example, the following fragment:
+.PP
+.EX
+struct wordsplit ws;
+ws.ws_delim = ":";
+wordsplit(str, &ws, WRDSF_DELIM | WRDSF_RETURN_DELIMS);
+.EE
+.PP
+If \fIstr\fR contained \fBroot:x:0:0::/root:/bin/sh\fR, the
+resulting \fBws.ws_wordv\fR array would be:
+.PP
+.EX
+{ "root", ":", "x", ":", "0", ":", "0", ":", "", ":", "/root", ":", "/bin/sh" }
+.EE
+.PP
+Notice the empty word at index 8. Earlier versions of
+.B wordsplit
+(up to v1.1-7-g0e1a09c) behaved differently: contiguous
+delimiters were returned one after another, with no empty words
+between them, like this:
+.PP
+.EX
+{ "root", ":", "x", ":", "0", ":", "0", ":", ":", "/root", ":", "/bin/sh" }
+.EE
+.PP
+To request this behavior, use the
+.B WRDSO_RETDELNOTEMPTY
+option. Its use is discouraged except where backward compatibility
+with earlier versions of \fBwordsplit\fR is required.
.SH WORDSPLIT_T STRUCTURE
The data type \fBwordsplit_t\fR has three members that contain
output data upon return from \fBwordsplit\fR or \fBwordsplit_len\fR,
@@ -1256,7 +1293,7 @@ Backtick command expansion is not supported.
.SH "BUG REPORTS"
Report bugs to <gray@gnu.org>.
.SH COPYRIGHT
-Copyright \(co 2009-2019 Sergey Poznyakoff
+Copyright \(co 2009\(en2025 Sergey Poznyakoff
.br
.na
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>