mirror of
git://git.gnu.org.ua/wordsplit.git
synced 2025-04-26 00:29:54 +03:00
Fix default escape settings.
* wordsplit.c (wordsplit_escape): New global.
(wordsplit_init): Backslash interpretation is disabled if not explicitly configured.
(wsnode_quoteremoval): Unquote unless _WSNF_NOEXPAND is set.
(scan_word): Fix backslash handling if WRDSF_QUOTE flags are set.
* wsp.c: Fix option handling.
* wordsplit.at: Test handling of C-style escapes.
* README: Document changes.
* wordsplit.3: Likewise.
parent 403b1c769f
commit e2f0c64db9
6 changed files with 146 additions and 78 deletions
79
wordsplit.3
@@ -14,7 +14,7 @@
 .\" You should have received a copy of the GNU General Public License
 .\" along with wordsplit. If not, see <http://www.gnu.org/licenses/>.
 .\"
-.TH WORDSPLIT 3 "July 24, 2019" "WORDSPLIT" "Wordsplit User Reference"
+.TH WORDSPLIT 3 "June 22, 2023" "WORDSPLIT" "Wordsplit User Reference"
 .SH NAME
 wordsplit \- split string into words
 .SH SYNOPSIS
@@ -299,16 +299,15 @@ the \fBWRDSF_DQUOTE\fR flag. The macro \fBWRDSF_QUOTE\fR enables both.
 Backslash interpretation translates unquoted
 .I escape sequences
 into corresponding characters. An escape sequence is a backslash followed
-by one or more characters. By default, each sequence \fB\\\fIC\fR
-appearing in unquoted words is replaced with the character \fIC\fR. In
-doubly-quoted strings, two backslash sequences are recognized:
-\fB\\\\\fR translates to a single backslash, and \fB\\\(dq\fR
-translates to a double-quote.
+by one or more characters. By default, that is if no flags are
+supplied, no escape sequences are defined, and each sequence
+\fB\\\fIC\fR is reproduced verbatim.
+.PP
+There are several ways to enable backslash interpretation and to
+define escape sequences. The simplest one is to use the
+\fBWRDSF_CESCAPES\fR flag. This flag defines the C-like escape
+sequences:
-.PP
-Two flags are provided to modify this behavior. If
-.I WRDSF_CESCAPES
-flag is set, the following escape sequences are recognized:
 .sp
 .nf
 .ta 8n 18n 42n
 .ul
@@ -329,19 +328,59 @@ for a two-digit hex number is replaced with ASCII character \fINN\fR.
 The sequence \fB\\0\fINNN\fR, where \fINNN\fR stands for a three-digit
 octal number is replaced with ASCII character whose code is \fINNN\fR.
 .PP
-The \fBWRDSF_ESCAPE\fR flag allows the caller to customize escape
-sequences. If it is set, the \fBws_escape\fR member must be
-initialized. This member provides escape tables for unquoted words
-(\fBws_escape[0]\fR) and quoted strings (\fBws_escape[1]\fR). Each
-table is a string consisting of an even number of characters. In each
-pair of characters, the first one is a character that can appear after
-backslash, and the following one is its translation. For example, the
-above table of C escapes is represented as
-\fB\(dq\\\\\\\\"\\"a\\ab\\bf\\fn\\nr\\rt\\tv\\v\(dq\fR.
+Additionally, outside of quoted strings (if these are enabled by the
+use of \fBWRDSF_DQUOTE\fR flag) backslash character can be used to
+escape horizontal whitespace: horizontal space (ASCII 32) and
+tab (ASCII 9) characters.
 .PP
-It is valid to initialize \fBws_escape\fR elements to zero. In this
+The \fBWRDSF_CESCAPES\fR bit is included in the default flag
+set \fBWRDSF_DEFFLAGS\fR.
+.PP
+The \fBWRDSF_ESCAPE\fR flag provides a more elaborate way of defining
+escape sequences. If it is set, the \fBws_escape\fR member must be
+initialized. This member provides escape tables for unquoted words
+(\fBws_escape[WRDSX_WORD]\fR) and quoted strings
+(\fBws_escape[WRDSX_QUOTE]\fR). Each table is a string consisting of
+an even number of characters. In each pair of characters, the first
+one is a character that can appear after backslash, and the following
+one is its translation. For example, the table of C escapes is
+represented as follows:
+.TP
+\fB\(dq\\\\\\\\"\\"a\\ab\\bf\\fn\\nr\\rt\\tv\\v\(dq\fR
+.PP
+It is valid to initialize \fBws_escape\fR elements to NULL. In this
 case, no backslash translation occurs.
 .PP
+For convenience, the global variable
+.B wordsplit_escape
+defines several most often used escape translation tables:
+.PP
+.EX
+extern char const *wordsplit_escape[];
+.EE
+.PP
+It is indexed by the following constants:
+.TP
+.B WS_ESC_C
+C-style escapes, the definition of which is shown above. This is the
+translation table that is used within quoted strings when
+.B WRDSF_CESCAPES
+is in effect.
+.TP
+.B WS_ESC_C_WS
+The \fBWS_ESC_C\fR table augmented by two entries: for horizontal tab
+character and whitespace. This is the table that is used for unquoted
+words when
+.B WRDSF_CESCAPES
+is in effect.
+.TP
+.B WS_ESC_DQ
+Backslash character escapes double-quote and itself. Useful for
+handling doubly-quoted strings in various Internet protocols.
+.TP
+.B WS_ESC_DQ_WS
+Escape double-quote, backslash, horizontal tab and whitespace characters.
+.PP
 Interpretation of octal and hex escapes is controlled by the following
 bits in \fBws_options\fR:
 .TP
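
The pair-table format described in the new wordsplit.3 text (an even-length string in which each pair holds the character that follows the backslash and then its translation) can be illustrated with a small standalone sketch. This is not the library's implementation and does not call wordsplit. The names `table_lookup`, `unescape`, `esc_c` and `esc_dq` are hypothetical and exist only for this example. Copying a sequence that has no table entry verbatim, backslash included, is an assumption of the sketch. It follows the documented behaviour for the case where no escapes are defined, but has not been checked against the library for sequences missing from a non-empty table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tables written in the pair format from the manual:
   the first character of each pair follows the backslash, the second
   is its translation.  They mirror the descriptions of WS_ESC_C and
   WS_ESC_DQ but are defined locally for this sketch. */
static const char esc_c[] = "\\\\\"\"a\ab\bf\fn\nr\rt\tv\v";
static const char esc_dq[] = "\\\\\"\"";

/* Hypothetical helper: return the translation of C found in TABLE,
   or 0 if TABLE is NULL or has no entry for C. */
static int
table_lookup (const char *table, int c)
{
  if (!table)
    return 0;
  /* The format requires an even number of characters. */
  assert (strlen (table) % 2 == 0);
  for (; table[0]; table += 2)
    if ((unsigned char) table[0] == c)
      return (unsigned char) table[1];
  return 0;
}

/* Hypothetical helper: copy SRC to DST, translating backslash
   sequences according to TABLE.  DST must have room for
   strlen (SRC) + 1 bytes.  Returns DST. */
static char *
unescape (char *dst, const char *src, const char *table)
{
  char *p = dst;

  while (*src)
    {
      if (src[0] == '\\' && src[1])
        {
          int c = table_lookup (table, (unsigned char) src[1]);
          if (c)
            {
              /* Known sequence: emit its translation. */
              *p++ = (char) c;
              src += 2;
              continue;
            }
          /* Assumption of this sketch: a sequence without a table
             entry is copied verbatim, backslash included. */
          *p++ = *src++;
          *p++ = *src++;
          continue;
        }
      *p++ = *src++;
    }
  *p = '\0';
  return dst;
}
```

With these definitions, `unescape (buf, "a\\tb", esc_c)` stores `a`, a tab character and `b` in `buf`, while passing `NULL` as the table leaves the input unchanged.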