Improve the docs

This commit is contained in:
Sergey Poznyakoff 2019-07-09 12:21:26 +03:00
parent 92f0904c20
commit 4cd8bec42c
3 changed files with 535 additions and 261 deletions

298
README Normal file

@ -0,0 +1,298 @@
* Overview
This package provides a set of C functions for splitting a string into
words. The splitting process is highly configurable and allows for
considerable flexibility. The default splitting rules are similar to
those used in Bourne shell. The splitting process includes tilde
expansion, variable expansion, quote removal, command substitution,
and path expansion. Each of these phases can be turned off by the caller.
The following code fragment shows the basic usage:
    /* This variable controls the splitting */
    wordsplit_t ws;
    int rc;
    /* Provide variable definitions */
    ws.ws_env = (const char **) environ;
    /* Provide a function for expanding commands */
    ws.ws_command = runcom;
    /* Split input_string into words */
    rc = wordsplit(input_string, &ws,
                   WRDSF_ENV              /* Use the variables from ws_env */
                   | WRDSF_QUOTE          /* Handle both single and
                                             double quoted strings as words. */
                   | WRDSF_SQUEEZE_DELIMS /* Compress adjacent delimiters */
                   | WRDSF_PATHEXPAND     /* Expand pathnames */
                   | WRDSF_SHOWERR);      /* Show errors */
    if (rc == 0) {
        /* Success.  The resulting words are returned in the NULL-terminated
           array ws.ws_wordv.  Number of words is in ws.ws_wordc */
    }
    /* Reclaim the allocated memory */
    wordsplit_free(&ws);
For a detailed discussion, please see the man page wordsplit.3 included
in the package.
* Description
The package is designed as a drop-in facility for use in larger
programs. It consists of the following files:
wordsplit.h - Interface header.
wordsplit.c - Main source file.
wordsplit.3 - Documentation.
For most uses, you will need only these three. The rest of the files
are for building the autotest-based testsuite:
wsp.c - Auxiliary test program.
wordsplit.at - The source for the testsuite.
* Incorporating wordsplit into your project
The project is designed to be used as a git submodule. First, select
the location DIR for the wordsplit directory within your project. Then
add the submodule:
git submodule add git://git.gnu.org.ua/wordsplit.git DIR
The rest is quite straightforward: you need to add wordsplit.c to your
sources and add both wordsplit.c and wordsplit.h to the distributed files.
There are two methods of doing so: direct incorporation and
incorporation via VPATH. The discussion below will describe both
methods based on the assumption that your project is using GNU
autotools framework. If you are using plain makefiles, these
instructions are easy to convert to such use as well.
** Direct incorporation
Add the subdir-objects option to the invocation of AM_INIT_AUTOMAKE macro in
your configure.ac:
AM_INIT_AUTOMAKE([subdir-objects])
In your Makefile.am, add both wordsplit/wordsplit.c and wordsplit/wordsplit.h
to the sources and -Iwordsplit to the cpp flags. For example:
program_SOURCES = main.c \
wordsplit/wordsplit.c \
wordsplit/wordsplit.h
AM_CPPFLAGS = -I$(srcdir)/wordsplit
You can also put wordsplit.h in the noinst_HEADERS variable, if you like:
program_SOURCES = main.c \
wordsplit/wordsplit.c
noinst_HEADERS = wordsplit/wordsplit.h
AM_CPPFLAGS = -I$(srcdir)/wordsplit
If you are building an installable library and wish to make wordsplit functions
available, install wordsplit.h to $(pkgincludedir), e.g.
lib_LTLIBRARIES = libmy.la
libmy_la_SOURCES = main.c \
wordsplit/wordsplit.c
AM_CPPFLAGS = -I$(srcdir)/wordsplit
pkginclude_HEADERS = wordsplit/wordsplit.h
** Vpath-based incorporation
Modify the VPATH variable in your Makefile.am:
VPATH += $(srcdir)/wordsplit
Notice the use of "+=": it is necessary for the vpath builds to work.
Add wordsplit.o to the name_LIBADD or name_LDADD variable, depending on
the nature of the object being built.
Modify AM_CPPFLAGS as shown in the previous section:
AM_CPPFLAGS = -I$(srcdir)/wordsplit
Add both wordsplit/wordsplit.c and wordsplit/wordsplit.h to the EXTRA_DIST
variable.
An example Makefile.am:
program_SOURCES = main.c
LDADD = wordsplit.o
noinst_HEADERS = wordsplit/wordsplit.h
VPATH += $(srcdir)/wordsplit
EXTRA_DIST = wordsplit/wordsplit.c wordsplit/wordsplit.h
* The testsuite
The package contains two files for building the testsuite: wsp.c,
which is used to build the auxiliary binary wsp, and wordsplit.at,
which is translated by GNU autotest into a testsuite shell script.
The discussion below is for those who wish to include the wordsplit
testsuite in their project. It assumes the following layout of the
hosting project:
lib/
Directory holding the library that incorporates wordsplit.o.
This discussion assumes the library name is libmy.a
lib/wordsplit
Wordsplit sources.
The testsuite will be built in lib.
** Additional files
Three additional files are necessary for the testsuite: atlocal.in,
wordsplit-version.h, and package.m4.
The file atlocal.in is a simple shell script that sets the PATH
environment variable for the testsuite. It contains just one line:
PATH=$srcdir/wordsplit:$PATH
The file wordsplit-version.h provides the version definition for the
test program wsp.c. Use the following script to create it:
version=$(cd wordsplit; git describe)
cat > wordsplit-version.h <<EOF
#define WORDSPLIT_VERSION "$version"
EOF
The file package.m4 contains the package description, which allows the
testsuite to generate an accurate report. To create it, use:
cat > package.m4 <<EOF
m4_define([AT_PACKAGE_NAME], [wordsplit])
m4_define([AT_PACKAGE_TARNAME], [wordsplit])
m4_define([AT_PACKAGE_VERSION], [$version])
m4_define([AT_PACKAGE_STRING], [AT_PACKAGE_NAME AT_PACKAGE_VERSION])
m4_define([AT_PACKAGE_BUGREPORT], [gray@gnu.org])
EOF
Here, $version is the same variable you used for wordsplit-version.h.
After creating the three files, list them in the EXTRA_DIST variable in
lib/Makefile.am to make sure they will be distributed with the tarball.
** configure.ac
Add the following lines to your configure.ac:
AM_MISSING_PROG([AUTOM4TE], [autom4te])
AC_CONFIG_TESTDIR([lib])
AC_CONFIG_FILES([lib/Makefile lib/atlocal])
** lib/Makefile.am
The makefile in lib must be modified to build the auxiliary program
wsp and create the testsuite script. This is done by the following
fragment:
EXTRA_DIST = testsuite wordsplit/wordsplit.at package.m4
DISTCLEANFILES = atconfig
MAINTAINERCLEANFILES = Makefile.in $(TESTSUITE)
TESTSUITE = $(srcdir)/testsuite
M4=m4
AUTOTEST = $(AUTOM4TE) --language=autotest
$(TESTSUITE): wordsplit/wordsplit.at
$(AM_V_GEN)$(AUTOTEST) -I $(srcdir) wordsplit/wordsplit.at \
-o $(TESTSUITE).tmp
$(AM_V_at)mv $(TESTSUITE).tmp $(TESTSUITE)
noinst_PROGRAMS = wsp
wsp_SOURCES = wordsplit/wsp.c wordsplit-version.h
wsp_LDADD = ./libmy.a
atconfig: $(top_builddir)/config.status
cd $(top_builddir) && ./config.status $@
clean-local:
@test ! -f $(TESTSUITE) || $(SHELL) $(TESTSUITE) --clean
check-local: atconfig atlocal $(TESTSUITE)
@$(SHELL) $(TESTSUITE)
* History
The first version of wordsplit appeared in March 2009 as a part of the
Wydawca[1] project. Its main usage there was to assist in
configuration file parsing. The parser subsystem proved to be quite
useful and it soon forked into a separate project - Grecs[2]. This
package has since been used (as a git submodule) in a number of other
projects, such as GNU Dico[3] and Direvent[4], to name a few.
In 2010 the wordsplit sources were incorporated into the GNU
Mailutils[5] package, where they replaced the obsolete argcv module.
Mailutils uses its own configuration package, which meant that using
Grecs was not expedient. Therefore the sources were exported from
Grecs and are kept in sync with the changes in it.
Several other projects, such as GNU Rush[6] and fileserv[7], followed
suit. It was therefore decided that it would be advisable to
have wordsplit as a separate package which could be easily included in
another project without incurring unnecessary overhead.
Currently, work is underway on incorporating it into existing
projects.
* References
[1] Wydawca - an automatic release submission daemon
Home: <http://puszcza.gnu.org.ua/software/wydawca>
Git: <http://git.gnu.org.ua/cgit/wydawca.git>
[2] Grecs - a library for parsing structured configuration files
Home: <https://puszcza.gnu.org.ua/projects/grecs>
Git: <http://git.gnu.org.ua/cgit/grecs.git>
[3] GNU Dico - a dictionary server
Home: <https://puszcza.gnu.org.ua/projects/dico>
Git: <http://git.gnu.org.ua/cgit/dico.git>
[4] GNU Direvent - filesystem event watching daemon
Home: <http://puszcza.gnu.org.ua/software/direvent>
Git: <http://git.gnu.org.ua/cgit/direvent.git>
[5] GNU Mailutils - a general-purpose mail package
Home: <http://mailutils.org>
Git: <http://git.savannah.gnu.org/cgit/mailutils.git>
[6] GNU Rush - a restricted user shell for remote access
Home: <http://puszcza.gnu.org.ua/software/rush>
Git: <http://git.gnu.org.ua/cgit/rush.git>
[7] fileserv - simple http server for serving static files
Home: <https://puszcza.gnu.org.ua/projects/fileserv>
Git: <http://git.gnu.org.ua/cgit/fileserv.git>
[8] vmod-dbrw - Database-driven rewrite rules for Varnish Cache
Home: <http://puszcza.gnu.org.ua/software/vmod-dbrw>
Git: <http://git.gnu.org.ua/cgit/vmod-dbrw.git>
* Bug reporting
Please send bug reports, questions, suggestions and criticism to
<gray@gnu.org>. When sending bug reports, please make sure to provide
the following information:
1. Wordsplit invocation flags.
2. Input string.
3. Produced output.
4. Expected output.
* Copying
Copyright (C) 2009-2019 Sergey Poznyakoff
Permission is granted to anyone to make or distribute verbatim copies
of this document as received, in any medium, provided that the
copyright notice and this permission notice are preserved,
thus giving the recipient permission to redistribute in turn.
Permission is granted to distribute modified versions
of this document, or of portions of it, under the above conditions,
provided also that they carry prominent notices stating who last
changed them.
Local Variables:
mode: outline
paragraph-separate: "[ ]*$"
version-control: never
End:

162
bootstrap

@ -1,162 +0,0 @@
#! /bin/sh
cd $(dirname $0)
version=$(git describe)
function genfiles() {
cat > wordsplit-version.h <<EOF
#define WORDSPLIT_VERSION "$version"
EOF
cat > package.m4 <<EOF
m4_define([AT_PACKAGE_NAME], [wordsplit])
m4_define([AT_PACKAGE_TARNAME], [wordsplit])
m4_define([AT_PACKAGE_VERSION], [$version])
m4_define([AT_PACKAGE_STRING], [AT_PACKAGE_NAME AT_PACKAGE_VERSION])
m4_define([AT_PACKAGE_BUGREPORT], [gray@gnu.org])
EOF
}
function mk_atlocal() {
cat <<\EOF
# @configure_input@ -*- shell-script -*-
# Configurable variable values for wordsplit test suite.
# Copyright (C) 2016-2019 Sergey Poznyakoff
PATH=@abs_builddir@:$srcdir:$PATH
EOF
} > atlocal.in
function mk_testsuite() {
sed -e 's|MODDIR|$moddir|' <<\EOF
# ##################
# Testsuite
# ##################
EXTRA_DIST = testsuite wordsplit.at package.m4
DISTCLEANFILES = atconfig
MAINTAINERCLEANFILES = Makefile.in $(TESTSUITE)
TESTSUITE = $(srcdir)/testsuite
M4=m4
AUTOTEST = $(AUTOM4TE) --language=autotest
$(TESTSUITE): wordsplit.at
$(AM_V_GEN)$(AUTOTEST) -I $(srcdir) wordsplit.at -o $(TESTSUITE).tmp
$(AM_V_at)mv $(TESTSUITE).tmp $(TESTSUITE)
atconfig: $(top_builddir)/config.status
cd $(top_builddir) && ./config.status MODDIR/$@
clean-local:
@test ! -f $(TESTSUITE) || $(SHELL) $(TESTSUITE) --clean
check-local: atconfig atlocal $(TESTSUITE)
@$(SHELL) $(TESTSUITE)
noinst_PROGRAMS = wsp
wsp_SOURCES = wsp.c wordsplit-version.h
EOF
echo "wsp_LDADD = $1"
}
function common_notice() {
cat <<EOF
Add the following to your configure.ac:
AC_CONFIG_TESTDIR($moddir)
AC_CONFIG_FILES([$moddir/Makefile $moddir/atlocal])
EOF
}
function mk_installed() {
(cat <<EOF
lib_LTLIBRARIES = libwordsplit.la
libwordsplit_la_SOURCES = wordsplit.c
include_HEADERS = wordsplit.h
EOF
mk_testsuite ./libwordsplit.la) > Makefile.am
mk_atlocal
common_notice
}
function mk_shared() {
(cat <<EOF
noinst_LTLIBRARIES = libwordsplit.la
libwordsplit_la_SOURCES = wordsplit.c wordsplit.h
EOF
mk_testsuite ./libwordsplit.la) > Makefile.am
mk_atlocal
common_notice
}
function mk_static() {
(cat <<EOF
noinst_LIBRARIES = libwordsplit.a
libwordsplit_a_SOURCES = wordsplit.c wordsplit.h
EOF
mk_testsuite ./libwordsplit.a) > Makefile.am
mk_atlocal
common_notice
}
function mk_embedded() {
(mk_testsuite wordsplit.o
echo "AM_CPPFLAGS = "
)> Makefile.am
mk_atlocal
cat <<EOF
Add the following to the _SOURCES variable of your top-level Makefile.am:
wordsplit/wordsplit.c\\
wordsplit/wordsplit.h
If test framework is enabled, add also the line
SUBDIRS = . wordsplit
and the following lines to your configure.ac:
AC_CONFIG_TESTDIR($moddir)
AC_CONFIG_FILES([$moddir/Makefile $moddir/atlocal])
Replace ellipsis with the leading path components to the embedded wordsplit
sources.
EOF
}
function usage() {
cat <<EOF
usage: $0 MODE MODDIR
MODE is any of:
installed standalone installable library
shared shared convenience library (lt)
static static convenience library
embedded embedded into another library
EOF
}
#
if [ $# -ne 2 ]; then
usage >&2
exit 1
fi
moddir=$2
case $1 in
installed|shared|static|standalone|embedded)
genfiles
mk_$1
;;
clean)
rm -f Makefile.am package.m4 wordsplit-version.h atlocal.in
;;
*)
usage
;;
esac


@ -14,7 +14,7 @@
.\" You should have received a copy of the GNU General Public License
.\" along with wordsplit. If not, see <http://www.gnu.org/licenses/>.
.\"
.TH WORDSPLIT 3 "July 7, 2019" "WORDSPLIT" "Wordsplit User Reference"
.TH WORDSPLIT 3 "July 9, 2019" "WORDSPLIT" "Wordsplit User Reference"
.SH NAME
wordsplit \- split string into words
.SH SYNOPSIS
@ -62,7 +62,10 @@ The function
.B wordsplit_free_words
frees only the memory allocated for elements of
.I ws_wordv
and initializes
after which it resets
.I ws_wordv
to
.B NULL
and
.I ws_wordc
to zero.
.PP
@ -73,15 +76,17 @@ wordsplit_t ws;
int rc;
if (wordsplit(s, &ws, WRDSF_DEFFLAGS)) {
wordsplit_perror(&ws);
return;
}
for (i = 0; i < ws.ws_wordc; i++) {
/* do something with ws.ws_wordv[i] */
for (i = 0; i < ws.ws_wordc; i++) {
/* do something with ws.ws_wordv[i] */
}
}
wordsplit_free(&ws);
.EE
.PP
Notice that \fBwordsplit_free\fR must be called after each invocation
of \fBwordsplit\fR or \fBwordsplit_len\fR, even if it resulted in
an error.
.PP
The function
.B wordsplit_getwords
returns in \fIwordv\fR an array of words, and in \fIwordc\fR the number
@ -135,49 +140,37 @@ wordsplit_free(&ws);
.EE
.SH OPTIONS
The number of flags is limited to 32 (the width of \fBuint32_t\fR data
type) and each bit is occupied by a corresponding flag. However, the
number of features \fBwordsplit\fR provides required still
more. Additional features can be requested by setting a corresponding
\fIoption bit\fR in the \fBws_option\fR field of the \fBstruct
wordsplit\fR argument. To inform wordsplit functions that this field
is initialized the \fBWRDSF_OPTIONS\fR flag must be set.
type). At the time of this writing, each bit is already occupied by a
corresponding flag. However, \fBwordsplit\fR provides more features
than these 32 bits can represent. Additional features can be requested by
setting a corresponding \fIoption bit\fR in the \fBws_options\fR field
of the \fBstruct wordsplit\fR argument. To inform wordsplit functions
that this field is initialized, the \fBWRDSF_OPTIONS\fR flag must be set.
.PP
Option symbolic names begin with \fBWRDSO_\fR. They are discussed in
detail in the subsequent chapters.
.SH EXPANSION
Expansion is performed on the input after it has been split into
words. There are several kinds of expansion, which of them are
performed is controlled by appropriate bits set in the \fIflags\fR
argument. Whatever expansion kinds are enabled, they are always run
in the same order as described in this section.
words. The kinds of expansion to be performed are controlled by the
appropriate bits set in the \fIflags\fR argument. Whatever expansion
kinds are enabled, they are always run in the order described in this
section.
.SS Whitespace trimming
Whitespace trimming removes any leading and trailing whitespace from
the initial word array. It is enabled by the
.B WRDSF_WS
flag. Whitespace trimming is needed only if you redefine
word delimiters (\fIws_delim\fR member) so that they don't contain
whitespace characters (\fB\(dq \\t\\n\(dq\fR).
.SS Tilde expansion
Tilde expansion is enabled if the
.B WRDSF_PATHEXPAND
bit is set. It expands all words that begin with an unquoted tilde
character (`\fB~\fR'). If tilde is followed immediately by a slash,
it is replaced with the home directory of the current user (as
determined by his \fBpasswd\fR entry). A tilde alone is handled the
same way. Otherwise, the characters between the tilde and first slash
character (or end of string, if it doesn't contain any) are treated as
a login name. and are replaced (along with the tilde itself) with the
home directory of that user. If there is no user with such login
name, the word is left unchanged.
flag. Whitespace trimming is enabled automatically if the word
delimiters (\fIws_delim\fR member) contain whitespace characters
(\fB\(dq \\t\\n\(dq\fR), which is the default.
.SS Variable expansion
Variable expansion replaces each occurrence of
.BI $ NAME
or
.BI ${ NAME }
with the value of the variable \fINAME\fR. It is enabled if the
flag \fBWRDSF_NOVAR\fR is not set. The caller is responsible for
supplying the table of available variables. Two mechanisms are
provided: environment array and a callback function.
with the value of the variable \fINAME\fR. It is enabled by default
and can be disabled by setting the \fBWRDSF_NOVAR\fR flag. The caller
is responsible for supplying the table of available variables. Two
mechanisms are provided: environment array and a callback function.
.PP
Environment array is a \fBNULL\fR-terminated array of variables,
stored in the \fIws_env\fR member. The \fBWRDSF_ENV\fR flag must be
@ -204,8 +197,8 @@ function itself shall be defined as
int getvar (char **ret, const char *var, size_t len, void *clos);
.EE
.PP
The function shall look up for the variable identified by the first
\fIlen\fR bytes of the string \fIvar\fR. If such variable is found,
The function shall look up the variable identified by the first
\fIlen\fR bytes of the string \fIvar\fR. If the variable is found,
the function shall store a copy of its value (allocated using
\fBmalloc\fR(3)) in the memory location pointed to by \fBret\fR, and
return \fBWRDSE_OK\fR. If the variable is not found, the function shall
@ -216,7 +209,7 @@ If \fIws_getvar\fR returns
.BR WRDSE_USERERR ,
it must store the pointer to the error description string in
.BR *ret .
In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR) , the
In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
data returned in \fBret\fR must be allocated using
.BR malloc (3).
.PP
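The callback contract described above can be sketched as follows. This is a minimal illustration, not code from the package: the table \fBdemo_vars\fR, the function name \fBdemo_getvar\fR, and the use of \fIclos\fR to carry the table are hypothetical. The numeric values of the \fBWRDSE_\fR codes are placeholders that only let the fragment compile without wordsplit.h; real code should include the header instead. Returning \fBWRDSE_UNDEF\fR for a missing variable and \fBWRDSE_NOSPACE\fR on allocation failure are assumptions of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder values so that the sketch compiles on its own.
   In real code, include wordsplit.h and drop these definitions. */
#ifndef WRDSE_OK
# define WRDSE_OK 0
#endif
#ifndef WRDSE_NOSPACE
# define WRDSE_NOSPACE 2
#endif
#ifndef WRDSE_UNDEF
# define WRDSE_UNDEF 5
#endif

/* Hypothetical variable table, handed to the callback via clos. */
struct demo_var {
  const char *name;
  const char *value;
};

static const struct demo_var demo_vars[] = {
  { "HOME", "/home/user" },
  { "EDITOR", "vi" },
  { NULL, NULL }
};

/* Look up the variable named by the first len bytes of var. */
int
demo_getvar(char **ret, const char *var, size_t len, void *clos)
{
  const struct demo_var *vp;

  assert(ret != NULL && var != NULL && clos != NULL);
  for (vp = clos; vp->name; vp++) {
    /* var is not NUL-terminated at len: compare exactly len bytes. */
    if (strlen(vp->name) == len && memcmp(vp->name, var, len) == 0) {
      /* The value must be returned in malloc'ed memory. */
      char *copy = malloc(strlen(vp->value) + 1);
      if (!copy)
        return WRDSE_NOSPACE;
      strcpy(copy, vp->value);
      *ret = copy;
      return WRDSE_OK;
    }
  }
  return WRDSE_UNDEF;
}
```

To use such a function, one would presumably store it in \fIws_getvar\fR, store the table in \fIws_closure\fR, and set the \fBWRDSF_GETVAR\fR flag. The length comparison matters because \fIvar\fR points into the input string, so the name is followed by arbitrary text.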
@ -225,10 +218,11 @@ If both
and
.I ws_getvar
are used, the variable is first looked up in
.IR ws_env ,
and if not found there, the
.IR ws_env .
If it is not found there, the
.I ws_getvar
function is called.
callback is invoked.
This order is reversed if the \fBWRDSO_GETVARPREF\fR option is set.
.PP
During variable expansion, the forms below cause
.B wordsplit
@ -255,14 +249,61 @@ Otherwise, the value of \fIvariable\fR is substituted.
.BI ${ variable :+ word }
.BR "Use Alternate Value" .
If \fIvariable\fR is null or unset, nothing is substituted, otherwise the
expansion of \fIword\fR is substituted.
expansion of \fIword\fR is substituted.
.PP
Unless the above forms are used, a reference to an undefined variable
expands to empty string. Three flags affect this behavior. If the
\fBWRDSF_UNDEF\fR flag is set, expanding undefined variable triggers
a \fBWRDSE_UNDEF\fR error. If the \fBWRDSF_WARNUNDEF\fR flag is set,
a non-fatal warning is emitted for each undefined variable. Finally,
if the \fBWRDSF_KEEPUNDEF\fR flag is set, references to undefined
variables are left unexpanded.
.PP
If two or three of these flags are set simultaneously, the behavior is
undefined.
.SS Positional argument expansion
\fIPositional arguments\fR are special parameters that can be
referenced in the input string by their ordinal number. The numbering
begins at \fB0\fR. The syntax for referencing positional arguments is
the same as for the variables, except that argument index is used
instead of the variable name. If the index is between 0 and 9, the
\fB$\fIN\fR form is acceptable. Otherwise, the index must be enclosed
in curly braces: \fB${\fIN\fB}\fR.
.PP
During argument expansion, references to positional arguments are
replaced with the corresponding values.
.PP
Argument expansion is requested by the \fBWRDSO_PARAMV\fR option bit.
The NULL-terminated array of arguments shall be supplied in the
.I ws_paramv
member. The
.I ws_paramc
member shall be initialized to the number of elements in
.IR ws_paramv .
.PP
Setting the \fBWRDSO_PARAM_NEGIDX\fR option together with
\fBWRDSO_PARAMV\fR enables negative positional argument references.
A negative reference has the form \fB${-\fIN\fB}\fR. It is expanded
to the value of the argument with index \fB\fIws_paramc\fR \- \fIN\fR.
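The index arithmetic for positional references can be illustrated with a short sketch. The function \fBresolve_param\fR and its \fIallow_neg\fR argument (standing for the \fBWRDSO_PARAM_NEGIDX\fR option) are hypothetical names introduced here. Returning NULL for an out-of-range reference is a choice made for this example only and says nothing about what the library does in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Map a positional reference n onto an element of paramv.
   Non-negative n selects paramv[n]; negative n selects
   paramv[paramc - N], where N is the magnitude of n.
   Returns NULL if the reference is out of range or if a negative
   reference is used while allow_neg is zero. */
const char *
resolve_param(const char **paramv, size_t paramc, long n, int allow_neg)
{
  assert(paramv != NULL);
  if (n < 0) {
    /* Compute the magnitude without overflowing on LONG_MIN. */
    unsigned long m = 0UL - (unsigned long) n;
    if (!allow_neg || m > paramc)
      return NULL;
    return paramv[paramc - m];
  }
  if ((unsigned long) n >= paramc)
    return NULL;
  return paramv[n];
}
```

With three arguments, the reference \fB${-1}\fR maps to index 2 (the last argument) and \fB${-3}\fR maps to index 0.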
.SS Quote removal
Quote removal translates unquoted escape sequences into corresponding bytes.
An escape sequence is a backslash followed by one or more characters. By
default, each sequence \fB\\\fIC\fR appearing in unquoted words is
replaced with the character \fIC\fR. In doubly-quoted strings, two
backslash sequences are recognized: \fB\\\\\fR translates to a single
backslash, and \fB\\\(dq\fR translates to a double-quote.
During quote removal, single or double quotes surrounding a sequence
of characters are removed and the sequence itself is treated as a
single word. Characters within single quotes are treated verbatim.
Characters within double quotes undergo variable expansion and
backslash interpretation (see below).
.PP
Recognition of single quoted strings is enabled by the
\fBWRDSF_SQUOTE\fR flag. Recognition of double quotes is enabled by
the \fBWRDSF_DQUOTE\fR flag. The macro \fBWRDSF_QUOTE\fR enables both.
.SS Backslash interpretation
Backslash interpretation translates unquoted
.I escape sequences
into corresponding characters. An escape sequence is a backslash followed
by one or more characters. By default, each sequence \fB\\\fIC\fR
appearing in unquoted words is replaced with the character \fIC\fR. In
doubly-quoted strings, two backslash sequences are recognized:
\fB\\\\\fR translates to a single backslash, and \fB\\\(dq\fR
translates to a double-quote.
.PP
Two flags are provided to modify this behavior. If
.I WRDSF_CESCAPES
@ -292,16 +333,16 @@ The \fBWRDSF_ESCAPE\fR flag allows the caller to customize escape
sequences. If it is set, the \fBws_escape\fR member must be
initialized. This member provides escape tables for unquoted words
(\fBws_escape[0]\fR) and quoted strings (\fBws_escape[1]\fR). Each
table is a string consisting of even number of charactes. In each
table is a string consisting of an even number of characters. In each
pair of characters, the first one is a character that can appear after
backslash, and the following one is its translation. For example, the
above table of C escapes is represented as
\fB\(dqa\\ab\\bf\\fn\\nr\\rt\\tv\\v\(dq\fR.
\fB\(dq\\\\\\\\"\\"a\\ab\\bf\\fn\\nr\\rt\\tv\\v\(dq\fR.
.PP
It is valid to initialize \fBws_escape\fR elements to zero. In this
case, no backslash translation occurs.
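The pair layout of an escape table can be shown with a small lookup routine. This is a sketch under stated assumptions, not the library's implementation: \fBescape_lookup\fR and \fBc_escapes\fR are hypothetical names, and the table literal is one way to write in C a table that maps backslash and double quote to themselves, followed by the C escapes.

```c
#include <assert.h>
#include <stddef.h>

/* A table in the pair layout: each escape character is followed
   immediately by its translation. */
static const char c_escapes[] = "\\\\\"\"a\ab\bf\fn\nr\rt\tv\v";

/* Find the translation of the character c that followed a backslash.
   Returns 1 and stores the translation in *out if c is listed in the
   table.  Returns 0 if it is not, or if the table is NULL. */
int
escape_lookup(const char *table, int c, char *out)
{
  assert(out != NULL);
  if (!table)
    return 0;
  /* Walk the table two characters at a time. */
  for (; table[0] && table[1]; table += 2) {
    if ((unsigned char) table[0] == (unsigned char) c) {
      *out = table[1];
      return 1;
    }
  }
  return 0;
}
```

Stepping by two is what makes the even length a requirement: a table of odd length would leave the last escape character without a translation.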
.PP
The handling of octal and hex escapes is controlled by the following
Interpretation of octal and hex escapes is controlled by the following
bits in \fBws_options\fR:
.TP
.B WRDSO_BSKEEP_WORD
@ -357,9 +398,9 @@ The substitution function should be defined as follows:
void *\fIclos\fB);\fR
.RE
.PP
First \fIlen\fR bytes of \fIcmd\fR contain the command invocation as
it appeared between
.BR $( and ),
On input, the first \fIlen\fR bytes of \fIcmd\fR contain the command
invocation as it appeared between
.BR $( " and " ),
with all expansions performed.
.PP
The \fIargv\fR parameter contains the command
@ -381,11 +422,27 @@ is returned, a pointer to the error description string must be stored in
When \fBWRDSE_OK\fR or \fBWRDSE_USERERR\fR is returned, the
data stored in \fB*ret\fR must be allocated using
.BR malloc (3).
.SS Pathname expansion
Pathname expansion is performed if the \fBWRDSF_PATHEXPAND\fR flag is
set. Each unquoted word is scanned for characters
.BR * , ? ", and " [ .
If one of these appears, the word is considered a \fIpattern\fR (in
.SS Tilde and pathname expansion
Both expansions are performed if the
.B WRDSF_PATHEXPAND
flag is set.
.PP
.I Tilde expansion
affects any word that begins with an unquoted tilde
character (\fB~\fR). If the tilde is followed immediately by a slash,
it is replaced with the home directory of the current user (as
determined by his \fBpasswd\fR entry). A tilde alone is handled the
same way. Otherwise, the characters between the tilde and first slash
character (or end of string, if it doesn't contain any) are treated as
a login name and are replaced (along with the tilde itself) with the
home directory of that user. If there is no user with such login
name, the word is left unchanged.
.PP
During
.I pathname expansion
each unquoted word is scanned for characters
.BR * ", " ? ", and " [ .
If any of these appears, the word is considered a \fIpattern\fR (in
the sense of
.BR glob (3))
and is replaced with an alphabetically sorted list of file names matching the
@ -429,9 +486,9 @@ the last word. For example, if the input to the above fragment were
The data type \fBwordsplit_t\fR has three members that contain
output data upon return from \fBwordsplit\fR or \fBwordsplit_len\fR,
and a number of members that the caller can initialize on input in
order to customize the function behavior. Each its member has a
corresponding flag bit, which must be set in the \fIflags\fR argument
in order to instruct the \fBwordsplit\fR function to use it.
order to customize the function behavior. For each input member there
is a corresponding flag bit, which must be set in the \fIflags\fR argument
in order to instruct the \fBwordsplit\fR function to use the member.
.SS OUTPUT
.TP
.BI size_t " ws_wordc"
@ -441,17 +498,6 @@ from \fBwordsplit\fR.
.BI "char ** " ws_wordv
Array of resulting words. Accessible upon successful return
from \fBwordsplit\fR.
.TP
.BI "size_t " ws_wordi
Total number of words processed. This field is intended for use with
.B WRDSF_INCREMENTAL
flag. If that flag is not set, the following relation holds:
.BR "ws_wordi == ws_wordc - ws_offs" .
.TP
.BI "int " ws_errno
Error code, if the invocation of \fBwordsplit\fR or
\fBwordsplit_len\fR failed. This is the same value as returned from
the function in that case.
.PP
The caller should not attempt to free or reallocate \fIws_wordv\fR or
any elements thereof, nor to modify \fIws_wordc\fR.
@ -463,6 +509,52 @@ the caller should use
It is more effective than copying the contents of
.I ws_wordv
manually.
.TP
.BI "size_t " ws_wordi
Total number of words processed. This field is intended for use with
.B WRDSF_INCREMENTAL
flag. If that flag is not set, the following relation holds:
.BR "ws_wordi == ws_wordc - ws_offs" .
.TP
.BI "int " ws_errno
Error code, if the invocation of \fBwordsplit\fR or
\fBwordsplit_len\fR failed. This is the same value as returned from
the function in that case.
.TP
.BI "char *" ws_errctx
On error, context in which the error occurred. For
.BR WRDSE_UNDEF ,
it is the name of the undefined variable. For
.B WRDSE_GLOBERR
it is the pattern that caused the error.
.sp
The caller should treat this member as
.BR "const char *" .
.PP
The following members are used if the variable expansion was requested
and the input string contained an
.B Assign Default Values
form (\fB${\fIvariable\fB:=\fIword\fB}\fR).
.TP
.BI "char **" ws_envbuf
Modified environment. It follows the same arrangement as \fIws_env\fR
on input (see the \fBWRDSF_ENV_KV\fR flag). If \fIws_env\fR was NULL (or
\fBWRDSF_ENV\fR was not set), but the \fIws_getvar\fR callback was
used, the \fIws_envbuf\fR array will contain only the modified variables.
.TP
.BI "size_t " ws_envidx
Number of entries in
.IR ws_envbuf .
.PP
If positional parameters were used (see the \fBWRDSO_PARAMV\fR option)
and any of them were modified during processing, the following two
members supply the modified parameter array.
.TP
.BI "char ** " ws_parambuf
Array of positional parameters.
.TP
.BI "size_t " ws_paramidx
Number of positional parameters.
.SS INPUT
.TP
.BI "size_t " ws_offs
@ -569,12 +661,12 @@ one containing variable name, and the next one with its
value.
.TP
.BI "int (*" ws_getvar ") (char **ret, const char *var, size_t len, void *clos)"
Points to the function that will be used during variable expansion to
look up for the value of the environment variable named \fBvar\fR.
Points to the function that will be used during variable expansion for
environment variable lookups.
This function is used if the variable expansion is enabled (i.e. the
.B WRDSF_NOVAR
flag is not set), and the \fBWRDSF_GETVAR\fR flag is set.
.sp
If both
.B WRDSF_ENV
and
@ -583,14 +675,15 @@ are set, the variable is first looked up in the
.I ws_env
array and, if not found there,
.I ws_getvar
is called.
is called. If the \fBWRDSO_GETVARPREF\fR option is set, this order is
reversed.
.sp
The name of the variable is specified by the first \fIlen\fR bytes of
the string \fIvar\fR. The \fIclos\fR parameter supplies the
user-specific data (see below the description of \fIws_closure\fR
member) and the \fBret\fR parameter points to the memory location
where output data is to be stored. On success, the function must
store ther a pointer to the string with the value of the variable and
store there a pointer to the string with the value of the variable and
return 0. On error, it must return one of the error codes described
in the section
.BR "ERROR CODES" .
@ -598,7 +691,7 @@ If \fIws_getvar\fR returns
.BR WRDSE_USERERR ,
it must store the pointer to the error description string in
.BR *ret .
In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR) , the
In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
data returned in \fBret\fR must be allocated using
.BR malloc (3).
.TP
@ -629,7 +722,7 @@ If \fIws_command\fR returns
.BR WRDSE_USERERR ,
it must store the pointer to the error description string in
.BR *ret .
In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR) , the
In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
data returned in \fBret\fR must be allocated using
.BR malloc (3).
@ -639,6 +732,17 @@ command substitution disabled.
The \fIclos\fR parameter supplies user-specific data (see the
description of \fIws_closure\fR member).
.PP
The following two members are consulted if the \fBWRDSO_PARAMV\fR
option is set. They provide an array of positional parameters.
.TP
.BI "char const **" ws_paramv
Positional parameters. These are accessible in the input string using
the notation \fB$\fIN\fR or \fB${\fIN\fB}\fR, where \fIN\fR is the
0-based parameter number.
.TP
.BI "size_t " ws_paramc
Number of positional parameters.
.SH FLAGS
The following macros are defined for use in the \fBflags\fR argument.
.TP
@ -657,7 +761,7 @@ delimiter, replace \fBC\fR escapes appearing in the input string with
the corresponding characters.
.TP
.B WRDSF_APPEND
Append the words found to the array resulting from a previous call to
Append the resulting words to the array left from a previous call to
\fBwordsplit\fR.
.TP
.B WRDSF_DOOFFS
@ -671,7 +775,9 @@ These are not counted in the returned
.IR ws_wordc .
.TP
.B WRDSF_NOCMD
Don't do command substitution.
Don't do command substitution. The \fBWRDSO_NOCMDSPLIT\fR option set
together with this flag prevents splitting command invocations
into separate words (see the \fBOPTIONS\fR section).
.TP
.B WRDSF_REUSE
The parameter \fIws\fR resulted from a previous call to
@ -686,7 +792,9 @@ Print errors using
Consider it an error if an undefined variable is expanded.
.TP
.B WRDSF_NOVAR
Don't do variable expansion.
Don't do variable expansion. The \fBWRDSO_NOVARSPLIT\fR option set
together with this flag prevents variable references from being split
into separate words (see the \fBOPTIONS\fR section).
.TP
.B WRDSF_ENOMEMABRT
Abort on
@ -721,7 +829,8 @@ Return delimiters.
.TP
.B WRDSF_SED_EXPR
Treat
.BR sed (1) expressions as words.
.BR sed (1)
expressions as words.
.TP
.B WRDSF_DELIM
.I ws_delim
@ -792,8 +901,7 @@ See the section
for a detailed discussion.
.TP
.B WRDSF_PATHEXPAND
Perform pathname and tilde expansion. If this flag is set, the
\fIws_options\fR member must also be initialized. See the
Perform pathname and tilde expansion. See the
subsection
.B "Pathname expansion"
for details.
@ -822,32 +930,60 @@ metacharacters.
.PP
.TP
.B WRDSO_BSKEEP_WORD
Quote removal: when an unrecognized escape sequence is encountered in a word,
preserve it on output. If that bit is not set, the backslash is
removed from such sequences.
Backslash interpretation: when an unrecognized escape sequence is
encountered in a word, preserve it on output. If that bit is not set,
the backslash is removed from such sequences.
.TP
.B WRDSO_OESC_WORD
Quote removal: handle octal escapes in words.
Backslash interpretation: handle octal escapes in words.
.TP
.B WRDSO_XESC_WORD
Quote removal: handle hex escapes in words.
Backslash interpretation: handle hex escapes in words.
.TP
.B WRDSO_BSKEEP_QUOTE
Quote removal: when an unrecognized escape sequence is encountered in
a doubly-quoted string, preserve it on output. If that bit is not
set, the backslash is removed from such sequences.
Backslash interpretation: when an unrecognized escape sequence is
encountered in a doubly-quoted string, preserve it on output. If that
bit is not set, the backslash is removed from such sequences.
.TP
.B WRDSO_OESC_QUOTE
Quote removal: handle octal escapes in doubly-quoted strings.
Backslash interpretation: handle octal escapes in doubly-quoted strings.
.TP
.B WRDSO_XESC_QUOTE
Quote removal: handle hex escapes in doubly-quoted strings.
Backslash interpretation: handle hex escapes in doubly-quoted strings.
.TP
.B WRDSO_MAXWORDS
The \fBws_maxwords\fR member is initialized. This is used to control
the number of words returned by a call to \fBwordsplit\fR. For a
detailed discussion, refer to the chapter
.BR "LIMITING THE NUMBER OF WORDS" .
.TP
.B WRDSO_NOVARSPLIT
When \fBWRDSF_NOVAR\fR is set, don't split variable references, even
if they contain whitespace. E.g.
.B ${VAR:-foo bar}
will be treated as a single word.
.TP
.B WRDSO_NOCMDSPLIT
When \fBWRDSF_NOCMD\fR is set, don't split whatever looks like command
invocation, even if it contains whitespace. E.g.
.B $(command arg)
will be treated as a single word.
.TP
.B WRDSO_PARAMV
Positional arguments are supplied in
.I ws_paramv
and
.IR ws_paramc .
See the subsection
.B Positional argument expansion
for a discussion.
.TP
.B WRDSO_PARAM_NEGIDX
Used together with \fBWRDSO_PARAMV\fR, this allows for negative
positional argument references. A negative argument reference has the
form \fB${-\fIN\fB}\fR. It is expanded to the value of the argument
with index \fB\fIws_paramc\fR \- \fIN\fR, i.e. \fIN\fRth if counting
from the end.
.SH "ERROR CODES"
.TP
.BR WRDSE_OK ", " WRDSE_EOF
@ -1015,8 +1151,10 @@ char **shell_parse(char *s)
.EE
.SH AUTHORS
Sergey Poznyakoff
.SH BUGS
Backtick command expansion is not supported.
.SH "BUG REPORTS"
Report bugs to <gray@gnu.org.ua>.
Report bugs to <gray@gnu.org>.
.SH COPYRIGHT
Copyright \(co 2009-2019 Sergey Poznyakoff
.br