diff --git a/src/lib/bstrlib/bstrlib.txt b/src/lib/bstrlib/bstrlib.txt
new file mode 100644
index 0000000..26fc927
--- /dev/null
+++ b/src/lib/bstrlib/bstrlib.txt
@@ -0,0 +1,3547 @@
+Better String library
+---------------------
+
+by Paul Hsieh
+
+The bstring library is an attempt to provide improved string processing
+functionality to the C and C++ languages. At the heart of the bstring library
+(Bstrlib for short) is the management of "bstring"s which are a significant
+improvement over '\0' terminated char buffers.
+
+===============================================================================
+
+Motivation
+----------
+
+The standard C string library has serious problems:
+
+ 1) Its use of '\0' to denote the end of the string means knowing a
+ string's length is O(n) when it could be O(1).
+ 2) It imposes an interpretation for the character value '\0'.
+ 3) gets() always exposes the application to a buffer overflow.
+ 4) strtok() modifies the string it's parsing and thus may not be usable in
+ programs which are re-entrant or multithreaded.
+ 5) fgets has the unusual semantic of ignoring '\0's that occur before
+ '\n's are consumed.
+ 6) There is no memory management, and actions performed such as strcpy,
+ strcat and sprintf are common places for buffer overflows.
+ 7) strncpy() doesn't '\0' terminate the destination in some cases.
+ 8) Passing NULL to C library string functions causes an undefined NULL
+ pointer access.
+ 9) Parameter aliasing (overlapping, or self-referencing parameters)
+ within most C library functions has undefined behavior.
+ 10) Many C library string function calls take integer parameters with
+ restricted legal ranges. Parameters passed outside these ranges are
+ not typically detected and cause undefined behavior.
+
+So the desire is to create an alternative string library that does not suffer
+from the above problems and adds in the following functionality:
+
+ 1) Incorporate string functionality seen from other languages.
+ a) MID$() - from BASIC
+ b) split()/join() - from Python
+ c) string/char x n - from Perl
+ 2) Implement analogs to functions that combine stream IO and char buffers
+ without creating a dependency on stream IO functionality.
+ 3) Implement the basic text editor-style functions insert, delete, find,
+ and replace.
+ 4) Implement reference based sub-string access (as a generalization of
+ pointer arithmetic.)
+ 5) Implement runtime write protection for strings.
+
+There is also a desire to avoid "API-bloat". So functionality that can be
+implemented trivially in other functionality is omitted. So there is no
+left$() or right$() or reverse() or anything like that as part of the core
+functionality.
+
+Explaining Bstrings
+-------------------
+
+A bstring is basically a header which wraps a pointer to a char buffer. Let's
+start with the declaration of a struct tagbstring:
+
+ struct tagbstring {
+ int mlen;
+ int slen;
+ unsigned char * data;
+ };
+
+This definition is considered exposed, not opaque (though it is neither
+necessary nor recommended that low level maintenance of bstrings be performed
+whenever the abstract interfaces are sufficient). The mlen field (usually)
+describes a lower bound for the memory allocated for the data field. The
+slen field describes the exact length for the bstring. The data field is a
+single contiguous buffer of unsigned chars. Note that the existence of a '\0'
+character in the unsigned char buffer pointed to by the data field does not
+necessarily denote the end of the bstring.
+
+To be a well formed modifiable bstring the mlen field must be at least the
+length of the slen field, and slen must be non-negative. Furthermore, the
+data field must point to a valid buffer in which access to the first mlen
+characters has been acquired. So the minimal check for correctness is:
+
+ (slen >= 0 && mlen >= slen && data != NULL)
+
+bstrings returned by bstring functions can be assumed to be either NULL or
+satisfy the above property. (When bstrings are only readable, the mlen >=
+slen restriction is not required; this is discussed later in this section.)
+A bstring itself is just a pointer to a struct tagbstring:
+
+ typedef struct tagbstring * bstring;
+
+Note that use of the prefix "tag" in struct tagbstring is required to work
+around the inconsistency between C and C++'s struct namespace usage. This
+definition is also considered exposed.
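+
+As a minimal sketch of the two correctness conditions quoted in this section,
+the checks can be written as small predicates. The struct is a local copy of
+the declaration above so that the sketch compiles without bstrlib.h; the
+helper names is_wellformed_writable and is_wellformed_readable are
+hypothetical and are not part of the Bstrlib API.
+
```c
#include <stddef.h>

/* Local copy of the exposed declaration, so this compiles without bstrlib.h. */
struct tagbstring {
    int mlen;
    int slen;
    unsigned char * data;
};

/* Hypothetical helper: minimal check for a modifiable bstring header. */
static int is_wellformed_writable (const struct tagbstring * b) {
    return b != NULL && b->slen >= 0 && b->mlen >= b->slen && b->data != NULL;
}

/* Hypothetical helper: relaxed check for read-only use (mlen is ignored). */
static int is_wellformed_readable (const struct tagbstring * b) {
    return b != NULL && b->slen >= 0 && b->data != NULL;
}
```
+
+A header with mlen smaller than slen fails the first predicate but passes the
+second, which matches the relaxed requirement for read-only parameters.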
+
+Bstrlib basically manages bstrings allocated as a header and an associated
+data-buffer. Since the implementation is exposed, they can also be
+constructed manually. Functions which mutate bstrings assume that the header
+and data buffer have been malloced; the bstring library may perform free() or
+realloc() on both the header and data buffer of any bstring parameter.
+Functions which return bstrings create new bstrings. The string memory is
+freed by a bdestroy() call (or using the bstrFree macro).
+
+The following related typedef is also provided:
+
+ typedef const struct tagbstring * const_bstring;
+
+which is also considered exposed. These are directly bstring compatible (no
+casting required) but are just used for parameters which are meant to be
+non-mutable. So in general, bstring parameters which are read as input but
+not meant to be modified will be declared as const_bstring, and bstring
+parameters which may be modified will be declared as bstring. This convention
+is recommended for user written functions as well.
+
+Since bstrings maintain interoperability with C library char-buffer style
+strings, all functions which modify, update or create bstrings also append a
+'\0' character immediately after the last character of the string (that is,
+at data[slen]). This trailing '\0' character is not required for bstrings
+input to the bstring functions; it is provided solely as a convenience for
+interoperability with standard C char-buffer functionality.
+
+Analogs for the ANSI C string library functions have been created when they
+are necessary, but have also been left out when they are not. In particular
+there are no functions analogous to fwrite, or puts just for the purposes of
+bstring. The ->data member of any string is exposed, and therefore can be
+used just as easily as char buffers for C functions which read strings.
+
+For those that wish to hand construct bstrings, the following should be kept
+in mind:
+
+ 1) While bstrlib can accept constructed bstrings without terminating
+ '\0' characters, the rest of the C language string library will not
+ function properly on such non-terminated strings. This is obvious
+ but must be kept in mind.
+ 2) If it is intended that a constructed bstring be written to by the
+ bstring library functions then the data portion should be allocated
+ by the malloc function and the slen and mlen fields should be entered
+ properly. The struct tagbstring header is not reallocated, and only
+ freed by bdestroy.
+ 3) Writing arbitrary '\0' characters at various places in the string
+ will not modify its length as perceived by the bstring library
+ functions. In fact, '\0' is a legitimate non-terminating character
+ for a bstring to contain.
+ 4) For read only parameters, bstring functions do not check the mlen.
+ I.e., the minimal correctness requirements are reduced to:
+
+ (slen >= 0 && data != NULL)
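+
+The points above can be sketched as follows. This is an illustration under
+stated assumptions, not library code: the struct is a local copy of the
+exposed declaration, and hand_construct is a hypothetical helper. The header
+is owned by the caller, and the data portion is obtained from malloc so that
+it would be legal for writable use.
+
```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the exposed declaration, so this compiles without bstrlib.h. */
struct tagbstring {
    int mlen;
    int slen;
    unsigned char * data;
};

/* Hypothetical helper: fill in a caller-owned header over a malloc'ed copy
   of len bytes from blk. Returns 0 on success and -1 on failure. */
static int hand_construct (struct tagbstring * t, const void * blk, int len) {
    if (t == NULL || blk == NULL || len < 0 || len == INT_MAX) return -1;
    t->data = (unsigned char *) malloc ((size_t) len + 1);
    if (t->data == NULL) return -1;
    memcpy (t->data, blk, (size_t) len);
    t->data[len] = '\0';   /* terminator, only for C library interoperability */
    t->slen = len;         /* exact length; embedded '\0's are counted */
    t->mlen = len + 1;     /* lower bound on the allocated size */
    return 0;
}
```
+
+Constructing from the five bytes "ab\0cd" gives slen == 5, while strlen on
+the data member would report 2. In this sketch the caller releases the data
+portion with free().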
+
+Better pointer arithmetic
+-------------------------
+
+One built-in feature of '\0' terminated char * strings, is that it's very easy
+and fast to obtain a reference to the tail of any string using pointer
+arithmetic. Bstrlib does one better by providing a way to get a reference to
+any substring of a bstring (or any other length delimited block of memory.)
+So rather than just having pointer arithmetic, with bstrlib one essentially
+has segment arithmetic. This is achieved using the macro blk2tbstr() which
+builds a reference to a block of memory and the macro bmid2tbstr() which
+builds a reference to a segment of a bstring. Bstrlib also includes
+functions for direct consumption of memory blocks into bstrings, namely
+bcatblk () and blk2bstr ().
+
+One scenario where this can be extremely useful is when a string contains many
+substrings which one would like to pass as read-only reference parameters to
+some string consuming function without the need to allocate entire new
+containers for the string data. More concretely, imagine parsing a command
+line string whose parameters are space delimited. This can only be done for
+tails of the string with '\0' terminated char * strings.
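+
+The command line scenario can be sketched without the library as follows.
+The helpers make_ref and split_refs are hypothetical stand-ins written for
+this illustration; they are not the blk2tbstr() or bmid2tbstr() macros, and
+setting mlen to -1 is simply this sketch's way of producing a header that
+satisfies only the read-only requirement (slen >= 0 && data != NULL).
+
```c
#include <stddef.h>

/* Local copy of the exposed declaration, so this compiles without bstrlib.h. */
struct tagbstring {
    int mlen;
    int slen;
    unsigned char * data;
};

/* Hypothetical helper: make t refer to len bytes at blk without copying. */
static void make_ref (struct tagbstring * t, const void * blk, int len) {
    t->data = (unsigned char *) blk;
    t->slen = len;
    t->mlen = -1;   /* mlen < slen: usable as a read-only parameter only */
}

/* Fill out[] with references to the space delimited words of line (len bytes
   long), storing at most max of them. Returns the number of words stored.
   No word is copied and the input is not modified. */
static int split_refs (const char * line, int len,
                       struct tagbstring * out, int max) {
    int i = 0, n = 0;
    while (i < len && n < max) {
        int start;
        while (i < len && line[i] == ' ') i++;   /* skip delimiters */
        if (i >= len) break;
        start = i;
        while (i < len && line[i] != ' ') i++;   /* scan one word */
        make_ref (&out[n++], line + start, i - start);
    }
    return n;
}
```
+
+Every word, not just the tail of the line, becomes a length delimited
+reference into the original buffer. The references stay valid only as long
+as that buffer does.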
+
+Improved NULL semantics and error handling
+------------------------------------------
+
+Unless otherwise noted, if a NULL pointer is passed as a bstring or any other
+detectably illegal parameter, the called function will return with an error
+indicator (either NULL or BSTR_ERR) rather than simply performing a NULL
+pointer access, or having undefined behavior.
+
+To illustrate the value of this, consider the following example:
+
+ strcpy (p = malloc (13 * sizeof (char)), "Hello,");
+ strcat (p, " World");
+
+This is not correct because malloc may return NULL (due to an out of memory
+condition), and the behaviour of strcpy is undefined if either of its
+parameters is NULL. However:
+
+ bstrcat (p = bfromcstr ("Hello,"), q = bfromcstr (" World"));
+ bdestroy (q);
+
+is well defined, because if either p or q is assigned NULL (indicating a
+failure to allocate memory) both bstrcat and bdestroy will recognize it and
+perform no detrimental action.
+
+Note that it is not necessary to check any of the members of a returned
+bstring for internal correctness (in particular the data member does not need
+to be checked against NULL when the header is non-NULL), since this is
+assured by the bstring library itself.
+
+bStreams
+--------
+
+In addition to the bgets and bread functions, bstrlib can abstract streams
+with a high performance read only stream called a bStream. In general, the
+idea is to open a core stream (with something like fopen) then pass its
+handle as well as a bNread function pointer (like fread) to the bsopen
+function which will return a handle to an open bStream. Then the functions
+bsread, bsreadln or bsreadlns can be called to read portions of the stream.
+Finally, the bsclose function is called to close the bStream -- it will
+return a handle to the original (core) stream. So bStreams, essentially,
+wrap other streams.
+
+The bStreams have two main advantages over the bgets and bread (as well as
+fgets/ungetc) paradigms:
+
+1) Improved functionality via the bunread function which allows a stream to
+ unread characters, giving the bStream stack-like functionality if so
+ desired.
+2) A very high performance bsreadln function. The C library function fgets()
+ (and the bgets function) can typically be written as a loop on top of
+ fgetc(), thus paying all of the overhead costs of calling fgetc on a per
+ character basis. bsreadln will read blocks at a time, thus amortizing the
+ overhead of fread calls over many characters at once.
+
+However, clearly bStreams are suboptimal or unusable for certain kinds of
+streams (stdin) or certain usage patterns (a few spotty, or non-sequential
+reads from a slow stream.) For those situations, using bgets will be more
+appropriate.
+
+The semantics of bStreams allows practical construction of layerable data
+streams. What this means is that by writing a bNread compatible function on
+top of a bStream, one can construct a new bStream on top of it. This can be
+useful for writing multi-pass parsers that don't actually read the entire
+input more than once and don't require the use of intermediate storage.
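+
+As a rough sketch of what a reader callback of this kind might look like,
+the following assumes that a bNread compatible function has the same shape
+as fread, with the stream handle passed as a void *. The type name nread_fn,
+the struct memsrc and both functions are hypothetical names used only for
+this illustration; the source here is a plain memory block rather than a
+bStream.
+
```c
#include <stddef.h>
#include <string.h>

/* Assumed callback shape: fread's signature with a void * stream handle. */
typedef size_t (* nread_fn) (void * buff, size_t elsize, size_t nelem,
                             void * parm);

/* Hypothetical in-memory source that the callback reads from. */
struct memsrc {
    const char * p;   /* next unread byte */
    size_t left;      /* number of bytes remaining */
};

/* fread-like reader over a struct memsrc; returns the elements delivered. */
static size_t mem_nread (void * buff, size_t elsize, size_t nelem,
                         void * parm) {
    struct memsrc * s = (struct memsrc *) parm;
    size_t n;
    if (s == NULL || buff == NULL || elsize == 0) return 0;
    n = s->left / elsize;            /* whole elements still available */
    if (n > nelem) n = nelem;
    memcpy (buff, s->p, n * elsize);
    s->p += n * elsize;
    s->left -= n * elsize;
    return n;                        /* 0 signals end of input, as with fread */
}

/* Drain any source through the callback in small blocks, counting bytes. */
static size_t count_bytes (nread_fn rd, void * parm) {
    char block[4];
    size_t n, total = 0;
    while ((n = rd (block, 1, sizeof block, parm)) > 0) total += n;
    return total;
}
```
+
+Because the consumer sees only the callback and an opaque handle, the same
+consumer works over a file, a memory block, or a reader that itself pulls
+from another stream, which is the layering idea described above.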
+
+Aliasing
+--------
+
+Aliasing occurs when a function is given two parameters which point to data
+structures which overlap in the memory they occupy. While this does not
+disturb read only functions, for many libraries this can make functions that
+write to these memory locations malfunction. This is a common problem of the
+C standard library and especially the string functions in the C standard
+library.
+
+The C standard string library is entirely char by char oriented (as is
+bstring) which makes conforming implementations alias safe for some
+scenarios. However no actual detection of aliasing is typically performed,
+so it is easy to find cases where the aliasing will cause anomalous or
+undesirable behaviour (consider: strcat (p, p).) The C99 standard includes
+the "restrict" pointer modifier which allows the compiler to document and
+assume a no-alias condition on usage. However, only the most trivial cases
+can be caught (if at all) by the compiler at compile time, and thus there is
+no actual enforcement of non-aliasing.
+
+Bstrlib, by contrast, permits aliasing and is completely aliasing safe, in
+the C99 sense of aliasing. That is to say, under the assumption that
+pointers of incompatible types from distinct objects can never alias, bstrlib
+is completely aliasing safe. (In practice this means that the data buffer
+portion of any bstring and header of any bstring are assumed to never alias.)
+With the exception of the reference building macros, the library behaves as
+if all read-only parameters are first copied and replaced by temporary
+non-aliased parameters before any writing to any output bstring is performed
+(though actual copying is extremely rarely ever done.)
+
+Besides being a useful safety feature, bstring searching/comparison
+functions can improve to O(1) execution when aliasing is detected.
+
+Note that aliasing detection and handling code in Bstrlib is generally
+extremely cheap. There is almost never any appreciable performance penalty
+for using aliased parameters.
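+
+To make the "as if copied first" behavior concrete, here is a deliberately
+naive sketch of an append that stays well defined when the source overlaps
+the destination, including the strcat (p, p) style case. It is not Bstrlib's
+implementation: it always builds the result in a fresh buffer, whereas the
+text above notes that the library rarely copies. The struct is a local copy
+of the exposed declaration and append_alias_safe is a hypothetical name.
+
```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the exposed declaration, so this compiles without bstrlib.h. */
struct tagbstring {
    int mlen;
    int slen;
    unsigned char * data;
};

/* Append src to dst. src may be dst itself or a reference into dst's data.
   dst->data must have been obtained from malloc. Returns 0 on success and
   -1 on error, in which case dst is left unchanged. */
static int append_alias_safe (struct tagbstring * dst,
                              const struct tagbstring * src) {
    unsigned char * nbuf;
    int dlen, len;

    if (dst == NULL || src == NULL || dst->data == NULL ||
        src->data == NULL || dst->slen < 0 || src->slen < 0 ||
        dst->mlen < dst->slen) return -1;

    dlen = dst->slen;
    len = src->slen;          /* captured before dst is touched */
    if (len > INT_MAX - 1 - dlen) return -1;

    nbuf = (unsigned char *) malloc ((size_t) dlen + (size_t) len + 1);
    if (nbuf == NULL) return -1;

    memcpy (nbuf, dst->data, (size_t) dlen);
    memcpy (nbuf + dlen, src->data, (size_t) len);  /* old buffer still live */
    nbuf[dlen + len] = '\0';

    free (dst->data);         /* released only after src has been read */
    dst->data = nbuf;
    dst->slen = dlen + len;
    dst->mlen = dlen + len + 1;
    return 0;
}
```
+
+Appending a string holding "abc" to itself yields "abcabc". Note that a
+reference into the old data buffer is left dangling once the call returns.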
+
+Reentrancy
+----------
+
+Nearly every function in Bstrlib is a leaf function, and is completely
+reentrant with the exception of writing to common bstrings. The split
+functions which use a callback mechanism require only that the source string
+not be destroyed by the callback function unless the callback function returns
+with an error status (note that Bstrlib functions which return an error do
+not modify the string in any way.) The string can in fact be modified by the
+callback and the behaviour is deterministic. See the documentation of the
+various split functions for more details.
+
+Undefined scenarios
+-------------------
+
+One of the basic important premises for Bstrlib is not to increase the
+propagation of undefined situations from parameters that are otherwise legal
+in and of themselves. In particular, except for extremely marginal cases, usages
+of bstrings that use the bstring library functions alone cannot lead to any
+undefined action. But due to C/C++ language and library limitations, there
+is no way to define a non-trivial library that is completely without
+undefined operations. All such possible undefined operations are described
+below:
+
+1) bstrings or struct tagbstrings that are not explicitly initialized cannot
+ be passed as a parameter to any bstring function.
+2) The members of the NULL bstring cannot be accessed directly. (Though all
+ APIs and macros detect the NULL bstring.)
+3) A bstring whose data member has not been obtained from a malloc or
+ compatible call and which is write accessible passed as a writable
+ parameter will lead to undefined results. (i.e., do not writeAllow any
+ constructed bstrings unless the data portion has been obtained from the
+ heap.)
+4) If the headers of two strings alias but are not identical (which can only
+ happen via a defective manual construction), then passing them to a
+ bstring function in which one is writable is not defined.
+5) If the mlen member is larger than the actual accessible length of the data
+ member for a writable bstring, or if the slen member is larger than the
+ readable length of the data member for a readable bstring, then the
+ corresponding bstring operations are undefined.
+6) Any bstring definition whose header or accessible data portion has been
+ assigned to inaccessible or otherwise illegal memory clearly cannot be
+ acted upon by the bstring library in any way.
+7) Destroying the source of an incremental split from within the callback
+ and not returning with a negative value (indicating that it should abort)
+ will lead to undefined behaviour. (Though *modifying* or adjusting the
+ state of the source data, even if those modifications fail within the
+ bstrlib API, has well defined behavior.)
+8) Modifying a bstring which is write protected by direct access has
+ undefined behavior.
+
+While this may seem like a long list, with the exception of invalid uses of
+the writeAllow macro, and source destruction during an iterative split
+without an accompanying abort, no usage of the bstring API alone can cause
+any undefined scenario to occur. I.e., the policy of restricting usage of
+bstrings to the bstring API can significantly reduce the risk of runtime
+errors (in practice it should eliminate them) related to string manipulation
+due to undefined action.
+
+C++ wrapper
+-----------
+
+A C++ wrapper has been created to enable bstring functionality for C++ in the
+most natural (for C++ programmers) way possible. The mandate for the C++
+wrapper is different from the base C bstring library. Since the C++ language
+has far more abstracting capabilities, the CBString structure is considered
+fully abstracted -- i.e., hand generated CBStrings are not supported (though
+conversion from a struct tagbstring is allowed) and all detectable errors are
+manifest as thrown exceptions.
+
+- The C++ class definitions are all under the namespace Bstrlib. bstrwrap.h
+ enables this namespace (with a using namespace Bstrlib; directive at the
+ end) unless the macro BSTRLIB_DONT_ASSUME_NAMESPACE has been defined before
+ it is included.
+
+- Erroneous accesses result in an exception being thrown. The exception
+ parameter is of type "struct CBStringException" which is derived from
+ std::exception if STL is used. A verbose description of the error message
+ can be obtained from the what() method.
+
+- CBString is a C++ structure derived from a struct tagbstring. An address
+ of a CBString cast to a bstring must not be passed to bdestroy. The bstring
+ C API has been made C++ safe and can be used directly in a C++ project.
+
+- It includes constructors which can take a char, '\0' terminated char
+ buffer, tagbstring, (char, repeat-value), a length delimited buffer or a
+ CBStringList to initialize it.
+
+- Concatenation is performed with the + and += operators. Comparisons are
+ done with the ==, !=, <, >, <= and >= operators. Note that == and != use
+ the biseq call, while <, >, <= and >= use bstrcmp.
+
+- CBString's can be directly cast to const character buffers.
+
+- CBString's can be directly cast to double, float, int or unsigned int so
+ long as the CBString is a decimal representation of that type (otherwise
+ an exception will be thrown). Converting the other way should be done with
+ the format(a) method(s).
+
+- CBString contains the length, character and [] accessor methods. The
+ character and [] accessors are aliases of each other. If the bounds for
+ the string are exceeded, an exception is thrown. To avoid the overhead for
+ this check, first cast the CBString to a (const char *) and use [] to
+ dereference the array as normal. Note that the character and [] accessor
+ methods allow both reading and writing of individual characters.
+
+- The methods: format, formata, find, reversefind, findcaseless,
+ reversefindcaseless, midstr, insert, insertchrs, replace, findreplace,
+ findreplacecaseless, remove, findchr, nfindchr, alloc, toupper, tolower,
+ gets, read are analogous to the functions that can be found in the C API.
+
+- The caselessEqual and caselessCmp methods are analogous to biseqcaseless
+ and bstricmp functions respectively.
+
+- Note that just like the bformat function, the format and formata methods do
+ not automatically cast CBStrings into char * strings for "%s"-type
+ substitutions:
+
+ CBString w("world");
+ CBString h("Hello");
+ CBString hw;
+
+ /* The casts are necessary */
+ hw.format ("%s, %s", (const char *)h, (const char *)w);
+
+- The methods trunc and repeat have been added instead of using pattern.
+
+- ltrim, rtrim and trim methods have been added. These remove characters
+ from a given character string set (defaulting to the whitespace characters)
+ from either the left, right or both ends of the CBString, respectively.
+
+- The method setsubstr is also analogous in functionality to bsetstr, except
+ that it cannot be passed NULL. Instead the method fill and the fill-style
+ constructor have been supplied to enable this functionality.
+
+- The writeprotect(), writeallow() and iswriteprotected() methods are
+ analogous to the bwriteprotect(), bwriteallow() and biswriteprotected()
+ macros in the C API. Write protection semantics in CBString are stronger
+ than with the C API in that indexed character assignment is checked for
+ write protection. However, unlike with the C API, a write protected
+ CBString can be destroyed by the destructor.
+
+- CBStream is a C++ structure which wraps a struct bStream (it's not derived
+ from it, since destruction is slightly different). It is constructed by
+ passing in a bNread function pointer and a stream parameter cast to void *.
+ This structure includes methods for detecting eof, setting the buffer
+ length, reading the whole stream or reading entries line by line or block
+ by block, an unread function, and a peek function.
+
+- If STL is available, the CBStringList structure is derived from a vector of
+ CBString with various split methods. The split method has been overloaded
+ to accept either a character or CBString as the second parameter (when the
+ split parameter is a CBString any character in that CBString is used as a
+ separator). The splitstr method takes a CBString as a substring separator.
+ Joins can be performed via a CBString constructor which takes a
+ CBStringList as a parameter, or just using the CBString::join() method.
+
+- If there is proper support for std::iostreams, then the >> and << operators
+ and the getline() function have been added (with semantics the same as
+ those for std::string).
+
+Multithreading
+--------------
+
+A mutable bstring is kind of analogous to a small (two entry) linked list
+allocated by malloc, with all aliasing completely under programmer control.
+I.e., manipulation of one bstring will never affect any other distinct
+bstring unless explicitly constructed to do so by the programmer via hand
+construction or via building a reference. Bstrlib also does not use any
+static or global storage, so there are no hidden unremovable race conditions.
+Bstrings are also clearly not inherently thread local. So just like
+char *'s, bstrings can be passed around from thread to thread and shared and
+so on, so long as modifications to a bstring correspond to some kind of
+exclusive access lock as should be expected (or if the bstring is read-only,
+which can be enforced by bstring write protection) for any sort of shared
+object in a multithreaded environment.
+
+Bsafe module
+------------
+
+For convenience, a bsafe module has been included. The idea is that if this
+module is included, inadvertent usage of the most dangerous C functions will
+be overridden and lead to an immediate run time abort. Of course, it should
+be emphasized that usage of this module is completely optional. The
+intention is essentially to provide an option for creating project safety
+rules which can be enforced mechanically rather than socially. This is
+useful for larger, or open development projects where it's more difficult to
+enforce social rules or "coding conventions".
+
+Problems not solved
+-------------------
+
+Bstrlib is written for the C and C++ languages, which have inherent weaknesses
+that cannot be easily solved:
+
+1. Memory leaks: Forgetting to call bdestroy on a bstring that is about to be
+ unreferenced leaks memory, just as forgetting to call free on a heap buffer
+ that is about to be unreferenced does. Bstrlib itself, though, is leak free.
+2. Read before write usage: In C, declaring an auto bstring does not
+ automatically fill it with legal/valid contents. This problem has been
+ somewhat mitigated in C++. (The bstrDeclare and bstrFree macros from
+ bstraux can be used to help mitigate this problem.)
+
+Other problems not addressed:
+
+3. Built-in mutex usage to automatically avoid all bstring internal race
+ conditions in multitasking environments: The problem with trying to
+ implement such things at this low a level is that it is typically more
+ efficient to use locks in higher level primitives. There is also no
+ platform independent way to implement locks or mutexes.
+
+Note that except for spotty support of wide characters, the default C
+standard library does not address any of these problems either.
+
+Configurable compilation options
+--------------------------------
+
+The Better String Library is not an application, it is a library. To compile
+it, you need to compile bstrlib.c to an object file that is linked to your
+application. A Makefile might contain entries such as the following to
+accomplish this:
+
+BSTRDIR = $(CDIR)/bstrlib
+INCLUDES = -I$(BSTRDIR)
+BSTROBJS = $(ODIR)/bstrlib.o
+DEFINES =
+CFLAGS = -O3 -Wall -pedantic -ansi -s $(DEFINES)
+
+application: $(ODIR)/main.o $(BSTROBJS)
+ echo Linking: $@
+ $(CC) $< $(BSTROBJS) -o $@
+
+$(ODIR)/%.o : $(BSTRDIR)/%.c
+ echo Compiling: $<
+ $(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@
+
+$(ODIR)/%.o : %.c
+ echo Compiling: $<
+ $(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@
+
+You can configure bstrlib using the standard macro defines passed to
+the compiler. All configuration options are meant solely for the purpose of
+compiler compatibility. Configuration options are not meant to change the
+semantics or capabilities of the library, except where it is unavoidable.
+
+Since some C++ compilers don't include the Standard Template Library and some
+have the options of disabling exception handling, a number of macros can be
+used to conditionally compile support for each of these:
+
+BSTRLIB_CAN_USE_STL
+
+ - defining this will enable the use of the Standard Template Library.
+ Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro.
+
+BSTRLIB_CANNOT_USE_STL
+
+ - defining this will disable the use of the Standard Template Library.
+ Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro.
+
+BSTRLIB_CAN_USE_IOSTREAM
+
+ - defining this will enable the use of streams from class std. Defining
+ BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro.
+
+BSTRLIB_CANNOT_USE_IOSTREAM
+
+ - defining this will disable the use of streams from class std. Defining
+ BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro.
+
+BSTRLIB_THROWS_EXCEPTIONS
+
+ - defining this will enable the exception handling within bstring.
+ Defining BSTRLIB_THROWS_EXCEPTIONS overrides the
+ BSTRLIB_DOESNT_THROW_EXCEPTIONS macro.
+
+BSTRLIB_DOESNT_THROW_EXCEPTIONS
+
+ - defining this will disable the exception handling within bstring.
+ Defining BSTRLIB_THROWS_EXCEPTIONS overrides the
+ BSTRLIB_DOESNT_THROW_EXCEPTIONS macro.
+
+Note that these macros must be defined consistently throughout all modules
+that use CBStrings including bstrwrap.cpp.
+
+Some older C compilers do not support functions such as vsnprintf. This is
+handled by the following macro variables:
+
+BSTRLIB_NOVSNP
+
+ - defining this indicates that the compiler does not support vsnprintf.
+ This will cause bformat and bformata to not be declared. Note that
+ for some compilers, such as Turbo C, this is set automatically.
+ Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro.
+
+BSTRLIB_VSNP_OK
+
+ - defining this will disable the autodetection of compilers that do not
+ support vsnprintf.
+ Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro.
+
+Semantic compilation options
+----------------------------
+
+Bstrlib comes with very few compilation options for changing the semantics of
+the library. These are described below.
+
+BSTRLIB_DONT_ASSUME_NAMESPACE
+
+ - Defining this before including bstrwrap.h will disable the automatic
+ enabling of the Bstrlib namespace for the C++ declarations.
+
+BSTRLIB_DONT_USE_VIRTUAL_DESTRUCTOR
+
+ - Defining this will make the CBString destructor non-virtual.
+
+BSTRLIB_MEMORY_DEBUG
+
+ - Defining this will cause the bstrlib modules bstrlib.c and bstrwrap.cpp
+ to invoke a #include "memdbg.h". memdbg.h has to be supplied by the user.
+
+Note that these macros must be defined consistently throughout all modules
+that use bstrings or CBStrings including bstrlib.c, bstraux.c and
+bstrwrap.cpp.
+
+Version
+-------
+
+Current release: v1.0.0
+
+The version format v[Major].[Minor].[Update] is used to facilitate
+developers with backward compatibility in the core developer branch of the
+Better String Library. This is also reflected in the macro symbols
+BSTR_VER_MAJOR, BSTR_VER_MINOR and BSTR_VER_UPDATE in the bstrlib.h file.
+Differences in the Major version imply that there has been a change in the
+API, and that a recompile and usage source changes may be necessary.
+Differences in Minor version imply that there has been an expansion of the
+API, that backward compatibility should be preserved and that at most a
+recompile is necessary (unless there is a namespace collision). Differences
+in Update imply that no API change has occurred.
+
+Although ordered, there is no implication of lexical sequencing. In
+particular, the Update number will not reset to 0 as the Major and Minor
+version numbers increment.
+
+So simple bug fixes will usually be reflected in a change in the Update
+number. If new functions are available, the Minor value will increment.
+If any function changes its parameters, or if a function is removed, the
+Major value will increment.
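+
+Read literally, these rules suggest a simple compatibility test, sketched
+below. The function version_compatible is a hypothetical helper, not part of
+bstrlib.h, and the macro values in the fallback block are stand-ins so that
+the sketch compiles on its own; real code would take them from bstrlib.h.
+
```c
/* Stand-in values for illustration only; normally supplied by bstrlib.h. */
#ifndef BSTR_VER_MAJOR
#define BSTR_VER_MAJOR 1
#define BSTR_VER_MINOR 0
#define BSTR_VER_UPDATE 0
#endif

/* Hypothetical helper: nonzero if a library at version (major, minor) can
   serve code written against (need_major, need_minor). A Major difference
   means the API changed; a higher Minor only adds to the API. The Update
   number does not affect the API and is therefore ignored. */
static int version_compatible (int major, int minor,
                               int need_major, int need_minor) {
    return major == need_major && minor >= need_minor;
}
```
+
+For example, code written against v1.0 would accept a v1.2 library but not a
+v2.0 library, and code that needs v1.1 functions would reject v1.0.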
+
+
+===============================================================================
+
+Files
+-----
+
+Core C files (required for C and C++):
+bstrlib.c - C implementation of bstring functions.
+bstrlib.h - C header file for bstring functions.
+
+Core C++ files (required for C++):
+bstrwrap.cpp - C++ implementation of CBString.
+bstrwrap.h - C++ header file for CBString.
+
+Base Unicode support:
+utf8util.c - C implementation of generic utf8 parsing functions.
+utf8util.h - C header file for generic utf8 parsing functions.
+buniutil.c - C implementation of utf8 bstring packing and unpacking functions.
+buniutil.h - C header file for utf8 bstring functions.
+
+Extra utility functions:
+bstraux.c - C example that implements trivial additional functions.
+bstraux.h - C header for bstraux.c
+
+Miscellaneous:
+bstest.c - C unit/regression test for bstrlib.c
+test.cpp - C++ unit/regression test for bstrwrap.cpp
+bsafe.c - C runtime stubs to abort usage of unsafe C functions.
+bsafe.h - C header file for bsafe.c functions.
+
+C modules need only include bstrlib.h and compile/link bstrlib.c to use the
+basic bstring library. C++ projects need to additionally include bstrwrap.h
+and compile/link bstrwrap.cpp. For both, there may be a need to make choices
+about feature configuration as described in the "Configurable compilation
+options" section above.
+
+Other files that are included in this archive are:
+
+license.txt - The BSD license for Bstrlib
+gpl.txt - The GPL version 2
+security.txt - A security statement useful for auditing Bstrlib
+porting.txt - A guide to porting Bstrlib
+bstrlib.txt - This file
+
+===============================================================================
+
+The functions
+-------------
+
+ extern bstring bfromcstr (const char * str);
+
+ Take a standard C library style '\0' terminated char buffer and generate
+ a bstring with the same contents as the char buffer. If an error occurs
+ NULL is returned.
+
+ So for example:
+
+ bstring b = bfromcstr ("Hello");
+ if (!b) {
+ fprintf (stderr, "Out of memory");
+ } else {
+ puts ((char *) b->data);
+ }
+
+ ..........................................................................
+
+ extern bstring bfromcstralloc (int mlen, const char * str);
+
+ Create a bstring which contains the contents of the '\0' terminated
+ char * buffer str. The memory buffer backing the bstring is at least
+ mlen characters in length. The buffer is also at least the size required
+ to hold the string with the '\0' terminator. If an error occurs NULL
+ is returned.
+
+ So for example:
+
+ bstring b = bfromcstralloc (64, someCstr);
+ if (b) b->data[63] = 'x';
+
+ The idea is that this will set the 64th character of the buffer backing
+ b to 'x' if b was successfully created, and otherwise do nothing. This is
+ well defined because a successfully created b will have been allocated
+ with at least 64 characters. Note that the write does not change b->slen.
+
+ ..........................................................................
+
+ extern bstring bfromcstrrangealloc (int minl, int maxl, const char* str);
+
+ Create a bstring which contains the contents of the '\0' terminated
+ char * buffer str. The memory buffer backing the string is at least
+ minl characters in length, but an attempt is made to allocate up to
+ maxl characters. The buffer is also at least the size required to hold
+ the string with the '\0' terminator. If an error occurs NULL is
+ returned.
+
+ So for example:
+
+ bstring b = bfromcstrrangealloc (0, 128, "Hello.");
+ if (b) b->data[5] = '!';
+
+ The idea is that this will set the 6th character of b to '!' if it was
+ allocated, and otherwise do nothing. We know this is well defined so
+ long as b was successfully created, since it will have been allocated
+ with at least 7 (strlen("Hello.") + 1) characters.
+
+ ..........................................................................
+
+ extern bstring blk2bstr (const void * blk, int len);
+
+ Create a bstring whose contents are described by the contiguous buffer
+ pointed to by blk with a length of len bytes. Note that this function
+ creates a copy of the data in blk, rather than simply referencing it.
+ Compare with the blk2tbstr macro. If an error occurs NULL is returned.
+
+ ..........................................................................
+
+ extern char * bstr2cstr (const_bstring s, char z);
+
+ Create a '\0' terminated char buffer which contains the contents of the
+ bstring s, except that any contained '\0' characters are converted to the
+ character in z. This returned value should be freed with bcstrfree(), by
+ the caller. If an error occurs NULL is returned.
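+
+ The substitution of embedded '\0' characters can be sketched in plain C.
+ The helper below is a hypothetical stand-in, not the Bstrlib
+ implementation: the name bytes_to_cstr and its raw (data, slen)
+ parameters are assumptions made for illustration, and the result is
+ released with free() rather than bcstrfree().
+
```c
#include <stdlib.h>

/* Hypothetical helper, not part of Bstrlib: copy slen bytes into a freshly
   allocated '\0' terminated buffer, substituting z for every embedded '\0'.
   Returns NULL on bad arguments or allocation failure. */
static char *bytes_to_cstr(const unsigned char *data, int slen, char z)
{
    char *out;
    int i;

    if (data == NULL || slen < 0) return NULL;
    out = (char *)malloc((size_t)slen + 1);
    if (out == NULL) return NULL;
    for (i = 0; i < slen; i++)
        out[i] = (data[i] == '\0') ? z : (char)data[i];
    out[slen] = '\0';   /* the only '\0' left is the terminator, unless z is '\0' */
    return out;
}
```
+
+ With the three bytes 'a', '\0', 'b' and z set to '?', the sketch yields
+ the C string "a?b", so no content is silently cut off at the embedded
+ '\0'.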
+
+ ..........................................................................
+
+ extern int bcstrfree (char * s);
+
+ Frees a C-string generated by bstr2cstr (). This is normally unnecessary
+ since it just wraps a call to free (), however, if malloc () and free ()
+ have been redefined as macros within the bstrlib module (via macros in
+ the memdbg.h backdoor) with some difference in behaviour from the std
+ library functions, then this allows a correct way of freeing the memory
+ that allows higher level code to be independent from these macro
+ redefinitions.
+
+ ..........................................................................
+
+ extern bstring bstrcpy (const_bstring b1);
+
+ Make a copy of the passed in bstring. The copied bstring is returned if
+ there is no error, otherwise NULL is returned.
+
+ ..........................................................................
+
+ extern int bassign (bstring a, const_bstring b);
+
+ Overwrite the bstring a with the contents of bstring b. Note that the
+ bstring a must be a well defined and writable bstring. If an error
+ occurs BSTR_ERR is returned and a is not overwritten.
+
+ ..........................................................................
+
+ int bassigncstr (bstring a, const char * str);
+
+ Overwrite the string a with the contents of char * string str. Note that
+ the bstring a must be a well defined and writable bstring. If an error
+ occurs BSTR_ERR is returned and a may be partially overwritten.
+
+ ..........................................................................
+
+ int bassignblk (bstring a, const void * s, int len);
+
+ Overwrite the string a with the contents of the block (s, len). Note that
+ the bstring a must be a well defined and writable bstring. If an error
+ occurs BSTR_ERR is returned and a is not overwritten.
+
+ ..........................................................................
+
+ extern int bassignmidstr (bstring a, const_bstring b, int left, int len);
+
+ Overwrite the bstring a with the middle of contents of bstring b
+ starting from position left and running for a length len. left and
+ len are clamped to the ends of b as with the function bmidstr. Note that
+ the bstring a must be a well defined and writable bstring. If an error
+ occurs BSTR_ERR is returned and a is not overwritten.
+
+ ..........................................................................
+
+ extern bstring bmidstr (const_bstring b, int left, int len);
+
+ Create a bstring which is the substring of b starting from position left
+ and running for a length len (clamped by the end of the bstring b.) If
+ there was no error, the value of this constructed bstring is returned
+ otherwise NULL is returned.
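+
+ One plausible reading of the clamping applied to (left, len) is sketched
+ below. The helper clamp_range is hypothetical, and its handling of a
+ negative left (shortening len by the part that falls before the start)
+ is an assumption about the intended semantics rather than something
+ stated above.
+
```c
/* Hypothetical helper, not part of Bstrlib: clamp the range (left, len)
   so that it lies inside a string of slen characters. */
static void clamp_range(int slen, int *left, int *len)
{
    if (*left < 0) {          /* part of the range lies before the start */
        *len += *left;        /* drop the characters that do not exist */
        *left = 0;
    }
    if (*len > slen - *left) *len = slen - *left;   /* trim at the end */
    if (*len < 0) *len = 0;   /* nothing remains: empty substring */
}
```
+
+ For a 5 character string, (left = 3, len = 10) becomes (3, 2), and
+ (left = -2, len = 4) becomes (0, 2) under this reading.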
+
+ ..........................................................................
+
+ extern int bdelete (bstring s1, int pos, int len);
+
+ Removes characters from pos to pos+len-1 and shifts the tail of the
+ bstring starting from pos+len to pos. len must be positive for this call
+ to have any effect. The section of the bstring described by (pos, len)
+ is clamped to the boundaries of the bstring s1. The value BSTR_OK is returned
+ if the operation is successful, otherwise BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int bconcat (bstring b0, const_bstring b1);
+
+ Concatenate the bstring b1 to the end of bstring b0. The value BSTR_OK
+ is returned if the operation is successful, otherwise BSTR_ERR is
+ returned.
+
+ ..........................................................................
+
+ extern int bconchar (bstring b, char c);
+
+ Concatenate the character c to the end of bstring b. The value BSTR_OK
+ is returned if the operation is successful, otherwise BSTR_ERR is
+ returned.
+
+ ..........................................................................
+
+ extern int bcatcstr (bstring b, const char * s);
+
+ Concatenate the char * string s to the end of bstring b. The value
+ BSTR_OK is returned if the operation is successful, otherwise BSTR_ERR is
+ returned.
+
+ ..........................................................................
+
+ extern int bcatblk (bstring b, const void * s, int len);
+
+ Concatenate a fixed length buffer (s, len) to the end of bstring b. The
+ value BSTR_OK is returned if the operation is successful, otherwise
+ BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int biseq (const_bstring b0, const_bstring b1);
+
+ Compare the bstring b0 and b1 for equality. If the bstrings differ, 0
+ is returned, if the bstrings are the same, 1 is returned, if there is an
+ error, -1 is returned. If the lengths of the bstrings are different, this
+ function has O(1) complexity. Contained '\0' characters are not treated
+ as a termination character.
+
+ Note that the semantics of biseq are not completely compatible with
+ bstrcmp because of its different treatment of the '\0' character.
+
+ ..........................................................................
+
+ extern int bisstemeqblk (const_bstring b, const void * blk, int len);
+
+ Compare beginning of bstring b with a block of memory of length len for
+ equality. If the beginning of b differs from the memory block (or if b
+ is too short), 0 is returned, if they are the same, 1 is returned, if
+ there is an error, -1 is returned.
+
+ ..........................................................................
+
+ extern int biseqcaseless (const_bstring b0, const_bstring b1);
+
+ Compare two bstrings for equality without differentiating between case.
+ If the bstrings differ other than in case, 0 is returned, if the bstrings
+ are the same, 1 is returned, if there is an error, -1 is returned. If
+ the lengths of the bstrings are different, this function is O(1). '\0'
+ termination characters are not treated in any special way.
+
+ ..........................................................................
+
+ extern int biseqcaselessblk (const_bstring b, const void * blk, int len);
+
+ Compare content of b and the array of bytes in blk for length len for
+ equality without differentiating between character case. If the content
+ differs other than in case, 0 is returned, if, ignoring case, the content
+ is the same, 1 is returned, if there is an error, -1 is returned. If the
+ lengths of the strings are different, this function is O(1). '\0'
+ termination characters are not treated in any special way.
+
+ ..........................................................................
+
+ extern int bisstemeqcaselessblk (const_bstring b0, const void * blk, int len);
+
+ Compare beginning of bstring b0 with a block of memory of length len
+ without differentiating between case for equality. If the beginning of b0
+ differs from the memory block other than in case (or if b0 is too short),
+ 0 is returned, if they are the same ignoring case, 1 is returned, if
+ there is an error, -1 is returned.
+
+ ..........................................................................
+
+ int biseqblk (const_bstring b, const void * blk, int len)
+
+ Compare the string b with the character block blk of length len. If the
+ content differs, 0 is returned, if the content is the same, 1 is returned,
+ if there is an error, -1 is returned. If the lengths of the strings are
+ different, this function is O(1). '\0' characters are not treated in
+ any special way.
+
+ ..........................................................................
+
+ extern int biseqcstr (const_bstring b, const char *s);
+
+ Compare the bstring b and char * string s. The C string s must be '\0'
+ terminated at exactly the length of the bstring b, and the contents
+ between the two must be identical with the bstring b with no '\0'
+ characters for the two contents to be considered equal. This is
+ equivalent to the condition that their current contents will always be
+ equal when comparing them in the same format after converting one or the
+ other. If they are equal 1 is returned, if they are unequal 0 is
+ returned and if there is a detectable error BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int biseqcstrcaseless (const_bstring b, const char *s);
+
+ Compare the bstring b and char * string s. The C string s must be '\0'
+ terminated at exactly the length of the bstring b, and the contents
+ between the two must be identical except for case with the bstring b with
+ no '\0' characters for the two contents to be considered equal. This is
+ equivalent to the condition that their current contents will always be
+ equal ignoring case when comparing them in the same format after
+ converting one or the other. If they are equal, except for case, 1 is
+ returned, if they are unequal regardless of case 0 is returned and if
+ there is a detectable error BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int bstrcmp (const_bstring b0, const_bstring b1);
+
+ Compare the bstrings b0 and b1 for ordering. If there is an error,
+ SHRT_MIN is returned, otherwise a value less than or greater than zero,
+ indicating that the bstring pointed to by b0 is lexicographically less
+ than or greater than the bstring pointed to by b1 is returned. If the
+ bstring lengths are unequal but the characters up until the length of the
+ shorter are equal then a value less than, or greater than zero,
+ indicating that the bstring pointed to by b0 is shorter or longer than the
+ bstring pointed to by b1 is returned. 0 is returned if and only if the
+ two bstrings are the same. If the lengths of the bstrings are different,
+ this function is O(n). Like its standard C library counterpart, the
+ comparison does not proceed past any '\0' termination characters
+ encountered.
+
+ The seemingly odd error return value merely provides slightly more
+ granularity than the undefined situation given in the C library function
+ strcmp. The function otherwise behaves very much like strcmp().
+
+ Note that the semantics of bstrcmp are not completely compatible with
+ biseq because of its different treatment of the '\0' termination
+ character.
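+
+ The difference in '\0' handling can be illustrated without Bstrlib
+ itself. The sketch below uses a hypothetical length-carrying struct
+ (lenstr) and two hypothetical helpers: lenstr_iseq compares every byte
+ up to the stored length, in the spirit of biseq, while lenstr_cmp_cstyle
+ stops at the first '\0', in the spirit of bstrcmp. These are stand-ins
+ under stated assumptions, not the library's code.
+
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-carrying string; not the Bstrlib type. */
struct lenstr {
    int slen;                   /* number of content bytes */
    const unsigned char *data;  /* content, '\0' terminated after slen bytes */
};

/* Length-aware equality: embedded '\0' bytes are ordinary content. */
static int lenstr_iseq(const struct lenstr *a, const struct lenstr *b)
{
    if (a == NULL || b == NULL || a->data == NULL || b->data == NULL ||
        a->slen < 0 || b->slen < 0) return -1;
    if (a->slen != b->slen) return 0;   /* O(1) rejection on length */
    return memcmp(a->data, b->data, (size_t)a->slen) == 0;
}

/* Terminator-aware ordering: the comparison ends at the first '\0'. */
static int lenstr_cmp_cstyle(const struct lenstr *a, const struct lenstr *b)
{
    return strcmp((const char *)a->data, (const char *)b->data);
}
```
+
+ For the 4 byte contents "ab\0c" and "ab\0d", the length-aware test
+ reports a difference while the terminator-aware comparison returns 0,
+ which is why the two functions cannot be substituted for one another.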
+
+ ..........................................................................
+
+ extern int bstrncmp (const_bstring b0, const_bstring b1, int n);
+
+ Compare the bstrings b0 and b1 for ordering for at most n characters. If
+ there is an error, SHRT_MIN is returned, otherwise a value is returned as
+ if b0 and b1 were first truncated to at most n characters then bstrcmp
+ was called with these new bstrings as parameters. If the lengths of the
+ bstrings are different, this function is O(n). Like its standard C
+ library counterpart, the comparison does not proceed past any '\0'
+ termination characters encountered.
+
+ The seemingly odd error return value merely provides slightly more
+ granularity than the undefined situation given in the C library function
+ strncmp. The function otherwise behaves very much like strncmp().
+
+ ..........................................................................
+
+ extern int bstricmp (const_bstring b0, const_bstring b1);
+
+ Compare two bstrings without differentiating between case. The return
+ value is the difference of the values of the characters where the two
+ bstrings first differ, otherwise 0 is returned indicating that the
+ bstrings are equal. If the lengths are different, then a difference from
+ 0 is given, but if the first extra character is '\0', then it is taken to
+ be the value UCHAR_MAX+1.
+
+ ..........................................................................
+
+ extern int bstrnicmp (const_bstring b0, const_bstring b1, int n);
+
+ Compare two bstrings without differentiating between case for at most n
+ characters. If the position where the two bstrings first differ is
+ before the nth position, the return value is the difference of the values
+ of the characters, otherwise 0 is returned. If the lengths are different
+ and less than n characters, then a difference from 0 is given, but if the
+ first extra character is '\0', then it is taken to be the value
+ UCHAR_MAX+1.
+
+ ..........................................................................
+
+ extern int bdestroy (bstring b);
+
+ Deallocate the bstring passed. Passing NULL in as a parameter will have
+ no effect. Note that both the header and the data portion of the bstring
+ will be freed. No other bstring function which modifies one of its
+ parameters will free or reallocate the header. Because of this, in
+ general, bdestroy cannot be called on any declared struct tagbstring even
+ if it is not write protected. A bstring which is write protected cannot
+ be destroyed via the bdestroy call. Any attempt to do so will result in
+ no action taken, and BSTR_ERR will be returned.
+
+ Note to C++ users: Passing in a CBString cast to a bstring will lead to
+ undefined behavior (free will be called on the header, rather than the
+ CBString destructor.) Instead just use the ordinary C++ language
+ facilities to deallocate a CBString.
+
+ ..........................................................................
+
+ extern int binstr (const_bstring s1, int pos, const_bstring s2);
+
+ Search for the bstring s2 in s1 starting at position pos and looking in a
+ forward (increasing) direction. If it is found then it returns with the
+ first position at or after pos where it is found, otherwise it returns
+ BSTR_ERR.
+ The algorithm used is brute force; O(m*n).
+
+ ..........................................................................
+
+ extern int binstrr (const_bstring s1, int pos, const_bstring s2);
+
+ Search for the bstring s2 in s1 starting at position pos and looking in a
+ backward (decreasing) direction. If it is found then it returns with the
+ first position at or before pos where it is found, otherwise it returns
+ BSTR_ERR.
+ Note that the current position at pos is tested as well -- so to be
+ disjoint from a previous forward search it is recommended that the
+ position be backed up (decremented) by one position. The algorithm used
+ is brute force; O(m*n).
+
+ ..........................................................................
+
+ extern int binstrcaseless (const_bstring s1, int pos, const_bstring s2);
+
+ Search for the bstring s2 in s1 starting at position pos and looking in a
+ forward (increasing) direction but without regard to case. If it is
+ found then it returns with the first position at or after pos where it is
+ found, otherwise it returns BSTR_ERR. The algorithm used is brute force;
+ O(m*n).
+
+ ..........................................................................
+
+ extern int binstrrcaseless (const_bstring s1, int pos, const_bstring s2);
+
+ Search for the bstring s2 in s1 starting at position pos and looking in a
+ backward (decreasing) direction but without regard to case. If it is
+ found then it returns with the first position at or before pos where it is
+ found, otherwise return BSTR_ERR. Note that the current position at pos
+ is tested as well -- so to be disjoint from a previous forward search it
+ is recommended that the position be backed up (decremented) by one
+ position. The algorithm used is brute force; O(m*n).
+
+ ..........................................................................
+
+ extern int binchr (const_bstring b0, int pos, const_bstring b1);
+
+ Search for the first position in b0 starting from pos or after, in which
+ one of the characters in b1 is found. This function has an execution
+ time of O(b0->slen + b1->slen). If such a position does not exist in b0,
+ then BSTR_ERR is returned.
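+
+ A running time of O(b0->slen + b1->slen) is what one would expect from a
+ membership-table approach: mark every character of b1 once, then scan b0
+ once. The sketch below shows that technique in plain C. The helper
+ find_first_of is hypothetical, and whether Bstrlib uses exactly this
+ table layout is an assumption.
+
```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Bstrlib: return the first index >= pos
   in s[0..slen) holding any character from set[0..setlen), or -1. */
static int find_first_of(const unsigned char *s, int slen, int pos,
                         const unsigned char *set, int setlen)
{
    unsigned char member[UCHAR_MAX + 1];
    int i;

    if (s == NULL || set == NULL || pos < 0 || pos > slen || setlen <= 0)
        return -1;
    memset(member, 0, sizeof member);
    for (i = 0; i < setlen; i++) member[set[i]] = 1;   /* O(setlen) */
    for (i = pos; i < slen; i++)                       /* O(slen) */
        if (member[s[i]]) return i;
    return -1;
}
```
+
+ Each character of either input is visited at most once, so the cost does
+ not grow with the product of the two lengths as it would with a nested
+ loop.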
+
+ ..........................................................................
+
+ extern int binchrr (const_bstring b0, int pos, const_bstring b1);
+
+ Search for the last position in b0 no greater than pos, in which one of
+ the characters in b1 is found. This function has an execution time
+ of O(b0->slen + b1->slen). If such a position does not exist in b0,
+ then BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int bninchr (const_bstring b0, int pos, const_bstring b1);
+
+ Search for the first position in b0 starting from pos or after, in which
+ none of the characters in b1 is found and return it. This function has
+ an execution time of O(b0->slen + b1->slen). If such a position does
+ not exist in b0, then BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int bninchrr (const_bstring b0, int pos, const_bstring b1);
+
+ Search for the last position in b0 no greater than pos, in which none of
+ the characters in b1 is found and return it. This function has an
+ execution time of O(b0->slen + b1->slen). If such a position does not
+ exist in b0, then BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int bstrchr (const_bstring b, int c);
+
+ Search for the character c in the bstring b forwards from the start of
+ the bstring. Returns the position of the found character or BSTR_ERR if
+ it is not found.
+
+ NOTE: This has been implemented as a macro on top of bstrchrp ().
+
+ ..........................................................................
+
+ extern int bstrrchr (const_bstring b, int c);
+
+ Search for the character c in the bstring b backwards from the end of the
+ bstring. Returns the position of the found character or BSTR_ERR if it is
+ not found.
+
+ NOTE: This has been implemented as a macro on top of bstrrchrp ().
+
+ ..........................................................................
+
+ extern int bstrchrp (const_bstring b, int c, int pos);
+
+ Search for the character c in b forwards from the position pos
+ (inclusive). Returns the position of the found character or BSTR_ERR if
+ it is not found.
+
+ ..........................................................................
+
+ extern int bstrrchrp (const_bstring b, int c, int pos);
+
+ Search for the character c in b backwards from the position pos in bstring
+ (inclusive). Returns the position of the found character or BSTR_ERR if
+ it is not found.
+
+ ..........................................................................
+
+ extern int bsetstr (bstring b0, int pos, const_bstring b1, unsigned char fill);
+
+ Overwrite the bstring b0 starting at position pos with the bstring b1. If
+ the position pos is past the end of b0, then the character "fill" is
+ appended as necessary to make up the gap between the end of b0 and pos.
+ If b1 is NULL, it behaves as if it were a 0-length bstring. The value
+ BSTR_OK is returned if the operation is successful, otherwise BSTR_ERR is
+ returned.
+
+ ..........................................................................
+
+ extern int binsert (bstring s1, int pos, const_bstring s2, unsigned char fill);
+
+ Inserts the bstring s2 into s1 at position pos. If the position pos is
+ past the end of s1, then the character "fill" is appended as necessary to
+ make up the gap between the end of s1 and pos. The value BSTR_OK is
+ returned if the operation is successful, otherwise BSTR_ERR is returned.
+
+ ..........................................................................
+
+ int binsertblk (bstring b, int pos, const void * blk, int len,
+ unsigned char fill)
+
+ Inserts the block of characters at blk with length len into b at position
+ pos. If the position pos is past the end of b, then the character "fill"
+ is appended as necessary to make up the gap between the end of b and pos.
+ Unlike bsetstr, binsertblk does not allow blk to be NULL.
+
+ ..........................................................................
+
+ extern int binsertch (bstring s1, int pos, int len, unsigned char fill);
+
+ Inserts the character fill repeatedly into s1 at position pos for a
+ length len. If the position pos is past the end of s1, then the
+ character "fill" is appended as necessary to make up the gap between the
+ end of s1 and the position pos + len (exclusive). The value BSTR_OK is
+ returned if the operation is successful, otherwise BSTR_ERR is returned.
+
+ ..........................................................................
+
+ extern int breplace (bstring b1, int pos, int len, const_bstring b2,
+ unsigned char fill);
+
+ Replace a section of a bstring from pos for a length len with the bstring
+ b2. If the position pos is past the end of b1 then the character "fill"
+ is appended as necessary to make up the gap between the end of b1 and
+ pos.
+
+ ..........................................................................
+
+ extern int bfindreplace (bstring b, const_bstring find,
+ const_bstring replace, int position);
+
+ Replace all occurrences of the find substring with a replace bstring
+ after a given position in the bstring b. The find bstring must have a
+ length > 0 otherwise BSTR_ERR is returned. This function does not
+ perform recursive per character replacement; that is to say successive
+ searches resume at the position after the last replace.
+
+ So for example:
+
+ bfindreplace (a0 = bfromcstr("aabaAb"), a1 = bfromcstr("a"),
+ a2 = bfromcstr("aa"), 0);
+
+ Should result in changing a0 to "aaaabaaAb".
+
+ This function performs exactly (b->slen - position) bstring comparisons,
+ and data movement is bounded above by character volume equivalent to size
+ of the output bstring.
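+
+ The non-recursive behaviour (searching resumes after the inserted text,
+ never inside it) can be sketched on ordinary C strings. The helper
+ replace_all below is hypothetical and, unlike bfindreplace, writes to a
+ separate caller-supplied buffer instead of modifying its input in place.
+
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Bstrlib: copy src into out, replacing
   every occurrence of find with repl. Returns 0 on success, -1 on bad
   arguments, an empty find string, or if out is too small. */
static int replace_all(const char *src, const char *find, const char *repl,
                       char *out, size_t outsz)
{
    size_t flen, rlen, n = 0;

    if (!src || !find || !repl || !out || outsz == 0) return -1;
    flen = strlen(find);
    rlen = strlen(repl);
    if (flen == 0) return -1;          /* mirrors the length > 0 requirement */
    while (*src != '\0') {
        if (strncmp(src, find, flen) == 0) {
            if (n + rlen >= outsz) return -1;
            memcpy(out + n, repl, rlen);
            n += rlen;
            src += flen;               /* resume after the match */
        } else {
            if (n + 1 >= outsz) return -1;
            out[n++] = *src++;
        }
    }
    out[n] = '\0';
    return 0;
}
```
+
+ Applied to "aabaAb" with find "a" and replacement "aa", the sketch
+ produces "aaaabaaAb", matching the example above. The freshly inserted
+ "aa" is never rescanned, so the loop terminates even though the
+ replacement contains the search string.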
+
+ ..........................................................................
+
+ extern int bfindreplacecaseless (bstring b, const_bstring find,
+ const_bstring replace, int position);
+
+ Replace all occurrences of the find substring, ignoring case, with a
+ replace bstring after a given position in the bstring b. The find bstring
+ must have a length > 0 otherwise BSTR_ERR is returned. This function
+ does not perform recursive per character replacement; that is to say
+ successive searches resume at the position after the last replace.
+
+ So for example:
+
+ bfindreplacecaseless (a0 = bfromcstr("AAbaAb"), a1 = bfromcstr("a"),
+ a2 = bfromcstr("aa"), 0);
+
+ Should result in changing a0 to "aaaabaaaab".
+
+ This function performs exactly (b->slen - position) bstring comparisons,
+ and data movement is bounded above by character volume equivalent to size
+ of the output bstring.
+
+ ..........................................................................
+
+ extern int balloc (bstring b, int length);
+
+ Increase the allocated memory backing the data buffer for the bstring b
+ to a length of at least length. If the memory backing the bstring b is
+ already large enough, no action is performed. This has no effect on the
+ bstring b that is visible to the bstring API. Usually this function will
+ only be used when a minimum buffer size is required coupled with a direct
+ access to the ->data member of the bstring structure.
+
+ Be warned that like any other bstring function, the bstring must be well
+ defined upon entry to this function. I.e., doing something like:
+
+ b->slen *= 2; /* ?? Most likely incorrect */
+ balloc (b, b->slen);
+
+ is invalid, and should be implemented as:
+
+ int t;
+ if (BSTR_OK == balloc (b, t = (b->slen * 2))) b->slen = t;
+
+ This function will return with BSTR_ERR if b is not detected as a valid
+ bstring or length is not greater than 0, otherwise BSTR_OK is returned.
+
+ ..........................................................................
+
+ extern int ballocmin (bstring b, int length);
+
+ Change the amount of memory backing the bstring b to at least length.
+ This operation will never truncate the bstring data including the
+ extra terminating '\0' and thus will not decrease the length to less than
+ b->slen + 1. Note that repeated use of this function may cause
+ performance problems (realloc may be called on the bstring more than
+ the O(log(INT_MAX)) times). This function will return with BSTR_ERR if b
+ is not detected as a valid bstring or length is not greater than 0,
+ otherwise BSTR_OK is returned.
+
+ So for example:
+
+ if (BSTR_OK == ballocmin (b, 64)) b->data[63] = 'x';
+
+ The idea is that this will set the 64th character of the buffer backing
+ b to 'x' if the ballocmin call succeeded, and otherwise do nothing. We
+ know this is well defined so long as the ballocmin call was successful,
+ since it will ensure that b has been allocated with at least 64
+ characters.
+
+ ..........................................................................
+
+ int btrunc (bstring b, int n);
+
+ Truncate the bstring to at most n characters. This function will return
+ with BSTR_ERR if b is not detected as a valid bstring or n is less than
+ 0, otherwise BSTR_OK is returned.
+
+ ..........................................................................
+
+ extern int bpattern (bstring b, int len);
+
+ Replicate the starting bstring, b, end to end repeatedly until it
+ surpasses len characters, then chop the result to exactly len characters.
+ This function operates in-place. This function will return with BSTR_ERR
+ if b is NULL or of length 0, otherwise BSTR_OK is returned.
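+
+ The replicate-then-chop result is the same as indexing the starting
+ pattern modulo its length. The sketch below shows this in plain C. The
+ helper fill_pattern is hypothetical and, unlike bpattern, writes to a
+ separate caller-supplied buffer of at least len + 1 bytes rather than
+ operating in place.
+
```c
#include <stddef.h>

/* Hypothetical helper, not part of Bstrlib: write exactly len characters
   of the repeated pattern pat[0..plen) into out, then a '\0' terminator.
   out must have room for len + 1 bytes. Returns 0, or -1 on bad input. */
static int fill_pattern(const char *pat, size_t plen, char *out, size_t len)
{
    size_t i;

    if (pat == NULL || out == NULL || plen == 0) return -1;
    for (i = 0; i < len; i++) out[i] = pat[i % plen];
    out[len] = '\0';
    return 0;
}
```
+
+ With the pattern "abc" and len 7 the sketch produces "abcabca". With len
+ 2 it produces "ab", so a len shorter than the pattern simply truncates
+ it.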
+
+ ..........................................................................
+
+ extern int btoupper (bstring b);
+
+ Convert contents of bstring to upper case. This function will return with
+ BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK is returned.
+
+ ..........................................................................
+
+ extern int btolower (bstring b);
+
+ Convert contents of bstring to lower case. This function will return with
+ BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK is returned.
+
+ ..........................................................................
+
+ extern int bltrimws (bstring b);
+
+ Delete whitespace contiguous from the left end of the bstring. This
+ function will return with BSTR_ERR if b is NULL or of length 0, otherwise
+ BSTR_OK is returned.
+
+ ..........................................................................
+
+ extern int brtrimws (bstring b);
+
+ Delete whitespace contiguous from the right end of the bstring. This
+ function will return with BSTR_ERR if b is NULL or of length 0, otherwise
+ BSTR_OK is returned.
+
+ ..........................................................................
+
+ extern int btrimws (bstring b);
+
+ Delete whitespace contiguous from both ends of the bstring. This function
+ will return with BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK
+ is returned.
+
+ ..........................................................................
+
+ extern struct bstrList* bstrListCreate (void);
+
+ Create an empty struct bstrList. The struct bstrList output structure is
+ declared as follows:
+
+ struct bstrList {
+ int qty, mlen;
+ bstring * entry;
+ };
+
+ The entry field is actually an array with qty entries. The mlen field
+ counts the maximum number of bstrings for which there is memory in the
+ entry array.
+
+ The Bstrlib API does *NOT* include a comprehensive set of functions for
+ full management of struct bstrList in an abstracted way. The reason for
+ this is because aliasing semantics of the list are best left to the user
+ of this function, and performance varies wildly depending on the
+ assumptions made. For a complete list-of-bstrings data type it is
+ recommended that the C++ std::vector<CBString> be used, since its
+ semantics and usage are more standard.
+
+ ..........................................................................
+
+ extern int bstrListDestroy (struct bstrList * sl);
+
+ Destroy a struct bstrList structure that was returned by the bsplit
+ function. Note that this will destroy each bstring in the ->entry array
+ as well. See bstrListCreate() above for structure of struct bstrList.
+
+ ..........................................................................
+
+ extern int bstrListAlloc (struct bstrList * sl, int msz);
+
+ Ensure that there is memory for at least msz number of entries for the
+ list.
+
+ ..........................................................................
+
+ extern int bstrListAllocMin (struct bstrList * sl, int msz);
+
+ Try to allocate the minimum amount of memory for the list to include at
+ least msz entries or sl->qty whichever is greater.
+
+ ..........................................................................
+
+ extern struct bstrList * bsplit (bstring str, unsigned char splitChar);
+
+ Create an array of sequential substrings from str divided by the
+ character splitChar. Successive occurrences of the splitChar will be
+ divided by empty bstring entries, following the semantics from the Python
+ programming language. To reclaim the memory from this output structure,
+ bstrListDestroy () should be called. See bstrListCreate() above for
+ structure of struct bstrList.
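+
+ The treatment of successive split characters can be sketched in plain C.
+ The helper split_fields below is hypothetical: it reports each field as
+ an (offset, length) pair instead of building a struct bstrList, and its
+ handling of an empty input (a single empty field, as in Python) is an
+ assumption.
+
```c
#include <stddef.h>

/* Hypothetical helper, not part of Bstrlib: record each field of
   s[0..slen) as an (offset, length) pair. Returns the number of fields,
   or -1 on bad arguments or if more than max fields would be needed. */
static int split_fields(const unsigned char *s, int slen,
                        unsigned char splitChar,
                        int *ofs, int *len, int max)
{
    int p = 0, n = 0;

    if (s == NULL || slen < 0 || ofs == NULL || len == NULL) return -1;
    do {
        int i = p;
        while (i < slen && s[i] != splitChar) i++;   /* end of this field */
        if (n >= max) return -1;
        ofs[n] = p;
        len[n] = i - p;     /* 0 when two separators are adjacent */
        n++;
        p = i + 1;          /* step over the separator */
    } while (p <= slen);
    return n;
}
```
+
+ Splitting "a,,b" on ',' yields three fields: "a", the empty string, and
+ "b". A trailing separator, as in "a,", likewise yields a final empty
+ field.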
+
+ ..........................................................................
+
+ extern struct bstrList * bsplits (bstring str, const_bstring splitStr);
+
+ Create an array of sequential substrings from str divided by any
+ character contained in splitStr. An empty splitStr causes a single entry
+ bstrList containing a copy of str to be returned. See bstrListCreate()
+ above for structure of struct bstrList.
+
+ ..........................................................................
+
+ extern struct bstrList * bsplitstr (bstring str, const_bstring splitStr);
+
+ Create an array of sequential substrings from str divided by the entire
+ substring splitStr. An empty splitStr causes a single entry bstrList
+ containing a copy of str to be returned. See bstrListCreate() above for
+ structure of struct bstrList.
+
+ ..........................................................................
+
+ extern bstring bjoin (const struct bstrList * bl, const_bstring sep);
+
+ Join the entries of a bstrList into one bstring by sequentially
+ concatenating them with the sep bstring in between. If sep is NULL, it
+ is treated as if it were the empty bstring. Note that:
+
+ bjoin (l = bsplit (b, s->data[0]), s);
+
+ should result in a copy of b, if s->slen is 1. If there is an error NULL
+ is returned, otherwise a bstring with the correct result is returned.
+ See bstrListCreate() above for structure of struct bstrList.
+
+ ..........................................................................
+
+ bstring bjoinblk (const struct bstrList * bl, void * blk, int len);
+
+ Join the entries of a bstrList into one bstring by sequentially
+ concatenating them with the content from blk for length len in between.
+ If there is an error NULL is returned, otherwise a bstring with the
+ correct result is returned.
+
+ ..........................................................................
+
+ extern int bsplitcb (const_bstring str, unsigned char splitChar, int pos,
+ int (* cb) (void * parm, int ofs, int len), void * parm);
+
+ Iterate the set of disjoint sequential substrings over str starting at
+ position pos divided by the character splitChar. The parm passed to
+ bsplitcb is passed on to cb. If the function cb returns a value < 0,
+ then further iterating is halted and this value is returned by bsplitcb.
+
+ Note: Non-destructive modification of str from within the cb function
+ while performing this split is well defined. bsplitcb behaves in
+ sequential lock step with calls to cb. I.e., after returning from a cb
+ that returns a non-negative integer, bsplitcb continues from the position
+ 1 character after the last detected split character and it will halt
+ immediately if the length of str falls below this point. However, if the
+ cb function destroys str, then it *must* return with a negative value,
+ otherwise bsplitcb will continue in an undefined manner.
+
+ This function is provided as an incremental alternative to bsplit that is
+ abortable and which does not impose additional memory allocation.
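+
+ As an illustration of the callback contract described above, the
+ following is a minimal sketch of a cb function that records each
+ substring's offset and length and halts the split after a fixed number of
+ fields. The names field_limit and field_limit_cb, and the choice of -2 as
+ the halting value, are assumptions made for this example and are not part
+ of Bstrlib:
+

```c
#include <stddef.h>

/* Hypothetical context handed to bsplitcb through its parm argument. */
struct field_limit {
    int seen;      /* number of fields visited so far */
    int max;       /* halt once this many fields have been visited */
    int last_ofs;  /* offset within str of the most recent field */
    int last_len;  /* length of the most recent field */
};

/* Callback with the shape bsplitcb expects:
   int (* cb) (void * parm, int ofs, int len).
   Returns 0 to continue, or a negative value to halt the iteration. */
static int field_limit_cb (void * parm, int ofs, int len) {
    struct field_limit * fl = (struct field_limit *) parm;

    if (fl == NULL || ofs < 0 || len < 0) return -1;

    fl->seen++;
    fl->last_ofs = ofs;
    fl->last_len = len;

    /* -2 is an arbitrary negative value chosen for this sketch. */
    return (fl->seen >= fl->max) ? -2 : 0;
}
```

+ Assuming a bstring str and an initialized struct field_limit fl, such a
+ callback would be passed as bsplitcb (str, ',', 0, field_limit_cb, &fl);
+ the -2 would then come back as the result of bsplitcb.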
+
+ ..........................................................................
+
+ extern int bsplitscb (const_bstring str, const_bstring splitStr, int pos,
+ int (* cb) (void * parm, int ofs, int len), void * parm);
+
+ Iterate the set of disjoint sequential substrings over str starting at
+ position pos divided by any of the characters in splitStr. An empty
+ splitStr causes the whole str to be iterated once. The parm passed to
+ bsplitscb is passed on to cb. If the function cb returns a value < 0,
+ then further iterating is halted and this value is returned by bsplitscb.
+
+ Note: Non-destructive modification of str from within the cb function
+ while performing this split is well defined. bsplitscb behaves in
+ sequential lock step with calls to cb. I.e., after returning from a cb
+ that returns a non-negative integer, bsplitscb continues from the position
+ 1 character after the last detected split character and it will halt
+ immediately if the length of str falls below this point. However, if the
+ cb function destroys str, then it *must* return with a negative value,
+ otherwise bsplitscb will continue in an undefined manner.
+
+ This function is provided as an incremental alternative to bsplits that
+ is abortable and which does not impose additional memory allocation.
+
+ ..........................................................................
+
+ extern int bsplitstrcb (const_bstring str, const_bstring splitStr, int pos,
+ int (* cb) (void * parm, int ofs, int len), void * parm);
+
+ Iterate the set of disjoint sequential substrings over str starting at
+ position pos divided by the entire substring splitStr. An empty splitStr
+ causes each character of str to be iterated. The parm passed to
+ bsplitstrcb is passed on to cb. If the function cb returns a value < 0,
+ then further iterating is halted and this value is returned by
+ bsplitstrcb.
+
+ Note: Non-destructive modification of str from within the cb function
+ while performing this split is well defined. bsplitstrcb behaves in
+ sequential lock step with calls to cb. I.e., after returning from a cb
+ that returns a non-negative integer, bsplitstrcb continues from the
+ position 1 character after the last detected split character and it will
+ halt immediately if the length of str falls below this point. However, if
+ the cb function destroys str, then it *must* return with a negative value,
+ otherwise bsplitstrcb will continue in an undefined manner.
+
+ This function is provided as an incremental alternative to bsplitstr that
+ is abortable and which does not impose additional memory allocation.
+
+ ..........................................................................
+
+ extern bstring bformat (const char * fmt, ...);
+
+ Takes the same parameters as printf (), but rather than outputting
+ results to stdio, it forms a bstring which contains what would have been
+ output. Note that if there is an early generation of a '\0' character,
+ the bstring will be truncated to this end point.
+
+ Note that %s format tokens correspond to '\0' terminated char * buffers,
+ not bstrings. To print a bstring, first dereference the data element of
+ the bstring:
+
+ /* b1->data needs to be '\0' terminated, so tagbstrings generated
+ by blk2tbstr () might not be suitable. */
+ b0 = bformat ("Hello, %s", b1->data);
+
+ Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been
+ compiled the bformat function is not present.
+
+ ..........................................................................
+
+ extern int bformata (bstring b, const char * fmt, ...);
+
+ In addition to the initial output buffer b, bformata takes the same
+ parameters as printf (), but rather than outputting results to stdio, it
+ appends the results to the initial bstring parameter. Note that if
+ there is an early generation of a '\0' character, the bstring will be
+ truncated to this end point.
+
+ Note that %s format tokens correspond to '\0' terminated char * buffers,
+ not bstrings. To print a bstring, first dereference the data element of
+ the bstring:
+
+ /* b1->data needs to be '\0' terminated, so tagbstrings generated
+ by blk2tbstr () might not be suitable. */
+ bformata (b0 = bfromcstr ("Hello"), ", %s", b1->data);
+
+ Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been
+ compiled the bformata function is not present.
+
+ ..........................................................................
+
+ extern int bassignformat (bstring b, const char * fmt, ...);
+
+ After the first parameter, it takes the same parameters as printf (), but
+ rather than outputting results to stdio, it outputs the results to
+ the bstring parameter b. Note that if there is an early generation of a
+ '\0' character, the bstring will be truncated to this end point.
+
+ Note that %s format tokens correspond to '\0' terminated char * buffers,
+ not bstrings. To print a bstring, first dereference the data element of
+ the bstring:
+
+ /* b1->data needs to be '\0' terminated, so tagbstrings generated
+ by blk2tbstr () might not be suitable. */
+ bassignformat (b0 = bfromcstr ("Hello"), ", %s", b1->data);
+
+ Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been
+ compiled the bassignformat function is not present.
+
+ ..........................................................................
+
+ extern int bvcformata (bstring b, int count, const char * fmt, va_list arglist);
+
+ The bvcformata function formats data under control of the format control
+ string fmt and attempts to append the result to b. The fmt parameter is
+ the same as that of the printf function. The variable argument list is
+ replaced with arglist, which has been initialized by the va_start macro.
+ The size of the output is upper bounded by count. If the required output
+ exceeds count, the string b is not augmented with any contents and a value
+ below BSTR_ERR is returned. If a value below -count is returned then it
+ is recommended that the negative of this value be used as an update to the
+ count in a subsequent pass. On other errors, such as running out of
+ memory, parameter errors or numeric wrap around BSTR_ERR is returned.
+ BSTR_OK is returned when the output is successfully generated and
+ appended to b.
+
+ Note: There is no sanity checking of arglist, and this function is
+ destructive of the contents of b from the b->slen point onward. If there
+ is an early generation of a '\0' character, the bstring will be truncated
+ to this end point.
+
+ Although this function is part of the external API for Bstrlib, the
+ interface and semantics (length limitations, and unusual return codes)
+ are fairly atypical. The real purpose for this function is to provide an
+ engine for the bvformata macro.
+
+ Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been
+ compiled the bvcformata function is not present.
+
+ ..........................................................................
+
+ extern bstring bread (bNread readPtr, void * parm);
+ typedef size_t (* bNread) (void *buff, size_t elsize, size_t nelem,
+ void *parm);
+
+ Read an entire stream into a bstring, verbatim. The readPtr function
+ pointer is compatible with fread semantics, except that it need not obtain
+ the stream data from a file. The intention is that parm would contain
+ the stream data context/state required (similar to the role of the FILE*
+ I/O stream parameter of fread.)
+
+ Abstracting the block read function allows for block devices other than
+ file streams to be read if desired. Note that there is an ANSI
+ compatibility issue if "fread" is used directly; see the ANSI issues
+ section below.
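+
+ To illustrate a block device other than a file stream, the following is
+ a minimal sketch of an fread-compatible reader that serves data from a
+ memory buffer. The names mem_stream and mem_read are hypothetical and
+ exist only for this example; the typedef is repeated from the prototype
+ above so that the sketch stands alone:
+

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the bNread typedef shown above; omit this line when
   bstrlib.h is included, since the header already provides it. */
typedef size_t (* bNread) (void *buff, size_t elsize, size_t nelem,
                           void *parm);

/* Hypothetical in-memory stream state, passed through parm. */
struct mem_stream {
    const unsigned char * data;  /* start of the source buffer */
    size_t len;                  /* total number of bytes in data */
    size_t pos;                  /* number of bytes already consumed */
};

/* fread-like reader: copies up to nelem whole elements of elsize bytes
   into buff and returns the number of elements copied. A return of 0
   signals that no further data is available. */
static size_t mem_read (void * buff, size_t elsize, size_t nelem,
                        void * parm) {
    struct mem_stream * ms = (struct mem_stream *) parm;
    size_t avail, n;

    if (ms == NULL || buff == NULL || elsize == 0) return 0;
    if (ms->pos > ms->len) return 0;

    avail = (ms->len - ms->pos) / elsize;  /* whole elements remaining */
    n = (nelem < avail) ? nelem : avail;
    if (n > 0) {
        memcpy (buff, ms->data + ms->pos, n * elsize);
        ms->pos += n * elsize;
    }
    return n;
}
```

+ Given an initialized struct mem_stream ms, such a reader would be passed
+ as bread (mem_read, &ms). Because mem_read already has the bNread
+ signature, no function pointer cast is needed.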
+
+ ..........................................................................
+
+ extern int breada (bstring b, bNread readPtr, void * parm);
+
+ Read an entire stream and append it to a bstring, verbatim. Behaves
+ like bread, except that it appends its results to the bstring b.
+ BSTR_ERR is returned on error, otherwise 0 is returned.
+
+ ..........................................................................
+
+ extern bstring bgets (bNgetc getcPtr, void * parm, char terminator);
+ typedef int (* bNgetc) (void * parm);
+
+ Read a bstring from a stream. As many bytes as is necessary are read
+ until the terminator is consumed or no more characters are available from
+ the stream. If read from the stream, the terminator character will be
+ appended to the end of the returned bstring. The getcPtr function must
+ have the same semantics as the fgetc C library function (i.e., returning
+ an integer whose value is negative when there are no more characters
+ available, otherwise the value of the next available unsigned character
+ from the stream.) The intention is that parm would contain the stream
+ data context/state required (similar to the role of the FILE* I/O stream
+ parameter of fgets.) If no characters are read, or there is some other
+ detectable error, NULL is returned.
+
+ bgets will never call the getcPtr function more often than necessary to
+ construct its output (including a single call, if required, to determine
+ that the stream contains no more characters.)
+
+ Abstracting the character stream function and terminator character allows
+ for different stream devices and string formats other than '\n'
+ terminated lines in a file if desired (consider \032 terminated email
+ messages, in a UNIX mailbox for example.)
+
+ For files, this function can be used analogously to fgets as follows:
+
+ fp = fopen ( ... );
+ if (fp) b = bgets ((bNgetc) fgetc, fp, '\n');
+
+ (Note that only one terminator character can be used, and that '\0' is
+ not assumed to terminate the stream in addition to the terminator
+ character. This is consistent with the semantics of fgets.)
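+
+ For a stream device other than a file, the getcPtr contract described
+ above can be met by a small function of one's own. The following is a
+ minimal sketch that serves characters from a memory buffer; the names
+ mem_chars and mem_getc are hypothetical, and the typedef is repeated from
+ the prototype above so that the sketch stands alone:
+

```c
#include <stddef.h>

/* Same shape as the bNgetc typedef shown above; omit this line when
   bstrlib.h is included, since the header already provides it. */
typedef int (* bNgetc) (void * parm);

/* Hypothetical in-memory character source, passed through parm. */
struct mem_chars {
    const char * data;  /* start of the source buffer */
    size_t len;         /* total number of characters in data */
    size_t pos;         /* number of characters already consumed */
};

/* fgetc-like reader: returns the next character as an unsigned char
   converted to int, or a negative value once the data is exhausted. */
static int mem_getc (void * parm) {
    struct mem_chars * mc = (struct mem_chars *) parm;

    if (mc == NULL || mc->pos >= mc->len) return -1;

    /* Going through unsigned char keeps bytes >= 0x80 from being
       mistaken for the negative end-of-stream indicator. */
    return (int) (unsigned char) mc->data[mc->pos++];
}
```

+ Given an initialized struct mem_chars mc, a line would then be read with
+ bgets (mem_getc, &mc, '\n'). Because mem_getc already has the bNgetc
+ signature, no function pointer cast is needed.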
+
+ ..........................................................................
+
+ extern int bgetsa (bstring b, bNgetc getcPtr, void * parm, char terminator);
+
+ Read from a stream and concatenate to a bstring. Behaves like bgets,
+ except that it appends its results to the bstring b. The value 1 is
+ returned if no characters are read before a negative result is returned
+ from getcPtr. Otherwise BSTR_ERR is returned on error, and 0 is returned
+ in other normal cases.
+
+ ..........................................................................
+
+ extern int bassigngets (bstring b, bNgetc getcPtr, void * parm, char terminator);
+
+ Read from a stream and assign to a bstring. Behaves like bgets,
+ except that it assigns the results to the bstring b. The value 1 is
+ returned if no characters are read before a negative result is returned
+ from getcPtr. Otherwise BSTR_ERR is returned on error, and 0 is returned
+ in other normal cases.
+
+ ..........................................................................
+
+ extern struct bStream * bsopen (bNread readPtr, void * parm);
+
+ Wrap a given open stream (described by a fread compatible function
+ pointer and stream handle) into an open bStream suitable for the bstring
+ library streaming functions.
+
+ ..........................................................................
+
+ extern void * bsclose (struct bStream * s);
+
+ Close the bStream, and return the handle to the stream that was
+ originally used to open the given stream. If s is NULL or detectably
+ invalid, NULL will be returned.
+
+ ..........................................................................
+
+ extern int bsbufflength (struct bStream * s, int sz);
+
+ Set the length of the buffer used by the bStream. If sz is the macro
+ BSTR_BS_BUFF_LENGTH_GET (which is 0), the length is not set. If s is
+ NULL or sz is negative, the function will return with BSTR_ERR, otherwise
+ this function returns with the previous length.
+
+ ..........................................................................
+
+ extern int bsreadln (bstring r, struct bStream * s, char terminator);
+
+ Read a bstring terminated by the terminator character or the end of the
+ stream from the bStream (s) and return it into the parameter r. The
+ matched terminator, if found, appears at the end of the line read. If
+ the stream has been exhausted of all available data, before any can be
+ read, BSTR_ERR is returned. This function may read additional characters
+ into the stream buffer from the core stream that are not returned, but
+ will be retained for subsequent read operations. When reading from high
+ speed streams, this function can perform significantly faster than bgets.
+
+ ..........................................................................
+
+ extern int bsreadlna (bstring r, struct bStream * s, char terminator);
+
+ Read a bstring terminated by the terminator character or the end of the
+ stream from the bStream (s) and concatenate it to the parameter r. The
+ matched terminator, if found, appears at the end of the line read. If
+ the stream has been exhausted of all available data, before any can be
+ read, BSTR_ERR is returned. This function may read additional characters
+ into the stream buffer from the core stream that are not returned, but
+ will be retained for subsequent read operations. When reading from high
+ speed streams, this function can perform significantly faster than bgets.
+
+ ..........................................................................
+
+ extern int bsreadlns (bstring r, struct bStream * s, bstring terminators);
+
+ Read a bstring terminated by any character in the terminators bstring or
+ the end of the stream from the bStream (s) and return it into the
+ parameter r. This function may read additional characters from the core
+ stream that are not returned, but will be retained for subsequent read
+ operations.
+
+ ..........................................................................
+
+ extern int bsreadlnsa (bstring r, struct bStream * s, bstring terminators);
+
+ Read a bstring terminated by any character in the terminators bstring or
+ the end of the stream from the bStream (s) and concatenate it to the
+ parameter r. If the stream has been exhausted of all available data,
+ before any can be read, BSTR_ERR is returned. This function may read
+ additional characters from the core stream that are not returned, but
+ will be retained for subsequent read operations.
+
+ ..........................................................................
+
+ extern int bsread (bstring r, struct bStream * s, int n);
+
+ Read a bstring of length n (or, if it is fewer, as many bytes as are
+ remaining) from the bStream. This function will read the minimum
+ required number of additional characters from the core stream. When the
+ stream is at the end of the file BSTR_ERR is returned, otherwise BSTR_OK
+ is returned.
+
+ ..........................................................................
+
+ extern int bsreada (bstring r, struct bStream * s, int n);
+
+ Read a bstring of length n (or, if it is fewer, as many bytes as are
+ remaining) from the bStream and concatenate it to the parameter r. This
+ function will read the minimum required number of additional characters
+ from the core stream. When the stream is at the end of the file BSTR_ERR
+ is returned, otherwise BSTR_OK is returned.
+
+ ..........................................................................
+
+ extern int bsunread (struct bStream * s, const_bstring b);
+
+ Insert a bstring into the bStream at the current position. These
+ characters will be read prior to those that actually come from the core
+ stream.
+
+ ..........................................................................
+
+ extern int bspeek (bstring r, const struct bStream * s);
+
+ Return the number of currently buffered characters from the bStream that
+ will be read prior to reads from the core stream, and append it to the
+ parameter r.
+
+ ..........................................................................
+
+ extern int bssplitscb (struct bStream * s, const_bstring splitStr,
+ int (* cb) (void * parm, int ofs, const_bstring entry), void * parm);
+
+ Iterate the set of disjoint sequential substrings over the stream s
+ divided by any character from the bstring splitStr. The parm passed to
+ bssplitscb is passed on to cb. If the function cb returns a value < 0,
+ then further iterating is halted and this return value is returned by
+ bssplitscb.
+
+ Note: At the point of calling the cb function, the bStream pointer is
+ pointed exactly at the position right after having read the split
+ character. The cb function can act on the stream by causing the bStream
+ pointer to move, and bssplitscb will continue by starting the next split
+ at the position of the pointer after the return from cb.
+
+ However, if the cb causes the bStream s to be destroyed then the cb must
+ return with a negative value, otherwise bssplitscb will continue in an
+ undefined manner.
+
+ This function is provided as a way to incrementally parse through a file
+ or other generic stream that in total size may otherwise exceed the
+ practical or desired memory available. As with the other split callback
+ based functions this is abortable and does not impose additional memory
+ allocation.
+
+ ..........................................................................
+
+ extern int bssplitstrcb (struct bStream * s, const_bstring splitStr,
+ int (* cb) (void * parm, int ofs, const_bstring entry), void * parm);
+
+ Iterate the set of disjoint sequential substrings over the stream s
+ divided by the entire substring splitStr. The parm passed to
+ bssplitstrcb is passed on to cb. If the function cb returns a
+ value < 0, then further iterating is halted and this return value is
+ returned by bssplitstrcb.
+
+ Note: At the point of calling the cb function, the bStream pointer is
+ pointed exactly at the position right after having read the split
+ character. The cb function can act on the stream by causing the bStream
+ pointer to move, and bssplitstrcb will continue by starting the next
+ split at the position of the pointer after the return from cb.
+
+ However, if the cb causes the bStream s to be destroyed then the cb must
+ return with a negative value, otherwise bssplitstrcb will continue in an
+ undefined manner.
+
+ This function is provided as a way to incrementally parse through a file
+ or other generic stream that in total size may otherwise exceed the
+ practical or desired memory available. As with the other split callback
+ based functions this is abortable and does not impose additional memory
+ allocation.
+
+ ..........................................................................
+
+ extern int bseof (const struct bStream * s);
+
+ Return the de facto "EOF" (end of file) state of a stream (1 if the
+ bStream is in an EOF state, 0 if not, and BSTR_ERR if stream is closed or
+ detectably erroneous.) When the readPtr callback returns a value <= 0
+ the stream reaches its "EOF" state. Note that bsunread with non-empty
+ content will essentially turn off this state, and the stream will not be
+ in its "EOF" state so long as it is possible to read more data out of it.
+
+ Also note that the semantics of bseof() are slightly different from
+ something like feof(). I.e., reaching the end of the stream does not
+ necessarily guarantee that bseof() will return with a value indicating
+ that this has happened. bseof() will only return indicating that it has
+ reached the "EOF" and an attempt has been made to read past the end of
+ the bStream.
+
+The macros
+----------
+
+ The macros described below are shown in a prototype form indicating their
+ intended usage. Note that the parameters passed to these macros will be
+ referenced multiple times. As with all macros, programmer care is
+ required to guard against unintended side effects.
+
+ int blengthe (const_bstring b, int err);
+
+ Returns the length of the bstring. If the bstring is NULL err is
+ returned.
+
+ ..........................................................................
+
+ int blength (const_bstring b);
+
+ Returns the length of the bstring. If the bstring is NULL, the length
+ returned is 0.
+
+ ..........................................................................
+
+ int bchare (const_bstring b, int p, int c);
+
+ Returns the p'th character of the bstring b. If the position p refers to
+ a position that does not exist in the bstring or the bstring is NULL,
+ then c is returned.
+
+ ..........................................................................
+
+ char bchar (const_bstring b, int p);
+
+ Returns the p'th character of the bstring b. If the position p refers to
+ a position that does not exist in the bstring or the bstring is NULL,
+ then '\0' is returned.
+
+ ..........................................................................
+
+ char * bdatae (bstring b, char * err);
+
+ Returns the char * data portion of the bstring b. If b is NULL, err is
+ returned.
+
+ ..........................................................................
+
+ char * bdata (bstring b);
+
+ Returns the char * data portion of the bstring b. If b is NULL, NULL is
+ returned.
+
+ ..........................................................................
+
+ char * bdataofse (bstring b, int ofs, char * err);
+
+ Returns the char * data portion of the bstring b offset by ofs. If b is
+ NULL, err is returned.
+
+ ..........................................................................
+
+ char * bdataofs (bstring b, int ofs);
+
+ Returns the char * data portion of the bstring b offset by ofs. If b is
+ NULL, NULL is returned.
+
+ ..........................................................................
+
+ struct tagbstring var = bsStatic ("...");
+
+ The bsStatic macro allows for static declarations of literal string
+ constants as struct tagbstring structures. The resulting tagbstring does
+ not need to be freed or destroyed. Note that this macro is only well
+ defined for string literal arguments. For more general string pointers,
+ use the btfromcstr macro.
+
+ The resulting struct tagbstring is permanently write protected. Attempts
+ to write to this struct tagbstring from any bstrlib function will lead to
+ BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct
+ tagbstring has no effect.
+
+ ..........................................................................
+
+ <void * blk, int len> <- bsStaticBlkParms ("...")
+
+ The bsStaticBlkParms macro emits a pair of comma separated parameters
+ corresponding to the block parameters for the block functions in Bstrlib
+ (i.e., blk2bstr, bcatblk, blk2tbstr, bisstemeqblk, bisstemeqcaselessblk.)
+ Note that this macro is only well defined for string literal arguments.
+
+ Examples:
+
+ bstring b = blk2bstr (bsStaticBlkParms ("Fast init. "));
+ bcatblk (b, bsStaticBlkParms ("No frills fast concatenation."));
+
+ These are faster than using bfromcstr() and bcatcstr() respectively
+ because the length of the inline string is known as a compile time
+ constant. Also note that separate struct tagbstring declarations for
+ holding the output of a bsStatic() macro are not required.
+
+ ..........................................................................
+
+ void btfromcstr (struct tagbstring& t, const char * s);
+
+ Fill in the tagbstring t with the '\0' terminated char buffer s. This
+ action is purely reference oriented; no memory management is done. The
+ data member is just assigned s, and slen is assigned the strlen of s.
+ The s parameter is accessed exactly once in this macro.
+
+ The resulting struct tagbstring is initially write protected. Attempts
+ to write to this struct tagbstring in a write protected state from any
+ bstrlib function will lead to BSTR_ERR being returned. Invoke the
+ bwriteallow macro on this struct tagbstring to make it writeable (though
+ this requires that s be obtained from a function compatible with malloc.)
+
+ ..........................................................................
+
+ void btfromblk (struct tagbstring& t, void * s, int len);
+
+ Fill in the tagbstring t with the data buffer s with length len. This
+ action is purely reference oriented; no memory management is done. The
+ data member of t is just assigned s, and slen is assigned len. Note that
+ the buffer is not appended with a '\0' character. The s and len
+ parameters are accessed exactly once each in this macro.
+
+ The resulting struct tagbstring is initially write protected. Attempts
+ to write to this struct tagbstring in a write protected state from any
+ bstrlib function will lead to BSTR_ERR being returned. Invoke the
+ bwriteallow macro on this struct tagbstring to make it writeable (though
+ this requires that s be obtained from a function compatible with malloc.)
+
+ ..........................................................................
+
+ void btfromblkltrimws (struct tagbstring& t, void * s, int len);
+
+ Fill in the tagbstring t with the data buffer s with length len after it
+ has been left trimmed. This action is purely reference oriented; no
+ memory management is done. The data member of t is just assigned to a
+ pointer inside the buffer s. Note that the buffer is not appended with a
+ '\0' character. The s and len parameters are accessed exactly once each
+ in this macro.
+
+ The resulting struct tagbstring is permanently write protected. Attempts
+ to write to this struct tagbstring from any bstrlib function will lead to
+ BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct
+ tagbstring has no effect.
+
+ ..........................................................................
+
+ void btfromblkrtrimws (struct tagbstring& t, void * s, int len);
+
+ Fill in the tagbstring t with the data buffer s with length len after it
+ has been right trimmed. This action is purely reference oriented; no
+ memory management is done. The data member of t is just assigned to a
+ pointer inside the buffer s. Note that the buffer is not appended with a
+ '\0' character. The s and len parameters are accessed exactly once each
+ in this macro.
+
+ The resulting struct tagbstring is permanently write protected. Attempts
+ to write to this struct tagbstring from any bstrlib function will lead to
+ BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct
+ tagbstring has no effect.
+
+ ..........................................................................
+
+ void btfromblktrimws (struct tagbstring& t, void * s, int len);
+
+ Fill in the tagbstring t with the data buffer s with length len after it
+ has been left and right trimmed. This action is purely reference
+ oriented; no memory management is done. The data member of t is just
+ assigned to a pointer inside the buffer s. Note that the buffer is not
+ appended with a '\0' character. The s and len parameters are accessed
+ exactly once each in this macro.
+
+ The resulting struct tagbstring is permanently write protected. Attempts
+ to write to this struct tagbstring from any bstrlib function will lead to
+ BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct
+ tagbstring has no effect.
+
+ ..........................................................................
+
+ void bmid2tbstr (struct tagbstring& t, bstring b, int pos, int len);
+
+ Fill the tagbstring t with the substring from b, starting from position
+ pos with a length len. The segment is clamped by the boundaries of
+ the bstring b. This action is purely reference oriented; no memory
+ management is done. Note that the buffer is not appended with a '\0'
+ character. Note that the t parameter to this macro may be accessed
+ multiple times. Note that the contents of t will become undefined
+ if the contents of b change or are destroyed.
+
+ The resulting struct tagbstring is permanently write protected. Attempts
+ to write to this struct tagbstring in a write protected state from any
+ bstrlib function will lead to BSTR_ERR being returned. Invoking the
+ bwriteallow macro on this struct tagbstring will have no effect.
+
+ ..........................................................................
+
+ bstring bfromStatic("...");
+
+ Allocate a bstring with the contents of a string literal. Returns
+ NULL if an error has occurred (ran out of memory). The string literal
+ parameter is enforced as literal at compile time.
+
+ ..........................................................................
+
+ int bcatStatic (bstring b, "...");
+
+ Append a string literal to bstring b. Returns 0 if successful, or
+ BSTR_ERR if some error has occurred. The string literal parameter is
+ enforced as literal at compile time.
+
+ ..........................................................................
+
+ int binsertStatic (bstring s1, int pos, " ... ", char fill);
+
+ Inserts the string literal into s1 at position pos. If the position pos
+ is past the end of s1, then the character "fill" is appended as necessary
+ to make up the gap between the end of s1 and pos. The value BSTR_OK is
+ returned if the operation is successful, otherwise BSTR_ERR is returned.
+
+ ..........................................................................
+
+ int bassignStatic (bstring b, " ... ");
+
+ Assign the contents of a string literal to the bstring b. The string
+ literal parameter is enforced as literal at compile time.
+
+ ..........................................................................
+
+ int biseqStatic (const_bstring b, " ... ");
+
+ Compare the string b with the string literal. If the content differs, 0
+ is returned, if the content is the same, 1 is returned, if there is an
+ error, -1 is returned. If the lengths of the strings are different, this
+ function is O(1). '\0' characters are not treated in any special way.
+
+ ..........................................................................
+
+ int biseqcaselessStatic (const_bstring b, " ... ");
+
+ Compare content of b and the string literal for equality without
+ differentiating between character case. If the content differs other
+ than in case, 0 is returned, if, ignoring case, the content is the same,
+ 1 is returned, if there is an error, -1 is returned. If the lengths of
+ the strings are different, this function is O(1). '\0' characters are
+ not treated in any special way.
+
+ ..........................................................................
+
+ int bisstemeqStatic (bstring b, " ... ");
+
+ Compare beginning of bstring b with a string literal for equality. If
+ the beginning of b differs from the string literal (or if b is too
+ short), 0 is returned, if the beginning of b matches the string literal,
+ 1 is returned, if there is an error, -1 is returned. The string literal
+ parameter is enforced as literal at compile time.
+
+ ..........................................................................
+
+ int bisstemeqcaselessStatic (bstring b, " ... ");
+
+ Compare beginning of bstring b with a string literal without
+ differentiating between case for equality. If the beginning of b differs
+ from the string literal other than in case (or if b is too short), 0 is
+ returned, if, ignoring case, the beginning of b matches the string
+ literal, 1 is returned, if there is an error, -1 is returned. The string
+ literal parameter is enforced as literal at compile time.
+
+ ..........................................................................
+
+ bstring bjoinStatic (const struct bstrList * bl, " ... ");
+
+ Join the entries of a bstrList into one bstring by sequentially
+ concatenating them with the string literal in between. If there is an
+ error NULL is returned, otherwise a bstring with the correct result is
+ returned. See bstrListCreate() above for structure of struct bstrList.
+
+ ..........................................................................
+
+ void bvformata (int& ret, bstring b, const char * format, lastarg);
+
+ Append the bstring b with printf like formatting with the format control
+ string, and the arguments taken from the ... list of arguments after
+ lastarg passed to the containing function. If the containing function
+ does not have ... parameters or lastarg is not the last named parameter
+ before the ... then the results are undefined. If successful, the
+ results are appended to b and BSTR_OK is assigned to ret. Otherwise
+ BSTR_ERR is assigned to ret.
+
+ Example:
+
+ void dbgerror (FILE * fp, const char * fmt, ...) {
+     int ret;
+     bstring b;
+     bvformata (ret, b = bfromcstr ("DBG: "), fmt, fmt);
+     if (BSTR_OK == ret) fputs ((char *) bdata (b), fp);
+     bdestroy (b);
+ }
+
+ Note that if the BSTRLIB_NOVSNP macro was set when bstrlib had been
+ compiled the bvformata macro will not link properly. If the
+ BSTRLIB_NOVSNP macro has been set, the bvformata macro will not be
+ available.
+
+ ..........................................................................
+
+ void bwriteprotect (struct tagbstring& t);
+
+ Disallow bstring from being written to via the bstrlib API. Attempts to
+ write to the resulting tagbstring from any bstrlib function will lead to
+ BSTR_ERR being returned.
+
+ Note: bstrings which are write protected cannot be destroyed via bdestroy.
+
+ Note to C++ users: Setting a CBString as write protected will not prevent
+ it from being destroyed by the destructor.
+
+ ..........................................................................
+
+ void bwriteallow (struct tagbstring& t);
+
+ Allow bstring to be written to via the bstrlib API. Note that such an
+ action makes the bstring both writable and destroyable. If the bstring is
+ not legitimately writable (as is the case for struct tagbstrings
+ initialized with a bsStatic value), the results of this are undefined.
+
+ Note that invoking the bwriteallow macro may increase the number of
+ reallocs by one more than necessary for every call to bwriteallow
+ interleaved with any bstring API which writes to this bstring.
+
+ ..........................................................................
+
+ int biswriteprotected (struct tagbstring& t);
+
+ Returns 1 if the bstring is write protected, otherwise 0 is returned.
+
+===============================================================================
+
+Unicode functions
+-----------------
+
+ The two modules utf8util.c and buniutil.c implement basic functions for
+ parsing and collecting Unicode data in the UTF8 format. Unicode is
+ described by a sequence of "code points" which are values between 0 and
+ 1114111 inclusive mapped to symbol content corresponding to nearly all
+ the standardized scripts of the world.
+
+ The semantics of Unicode code points are varied and complicated. The
+ base support of the better string library does not attempt to perform
+ any interpretation of these code points. The better string library
+ solely provides support for iterating through unicode code points,
+ appending and extracting code points to and from bstrings, and parsing
+ UTF8 and UTF16 from raw data.
+
+ The types cpUcs4 and cpUcs2 are defined as 4 byte and 2 byte encoding
+ formats corresponding to UCS4 and UCS2 respectively. To test
+ if a raw code point is valid, the macro isLegalUnicodeCodePoint() has
+ been defined. The utf8 iterator is defined by struct utf8Iterator. To
+ test if the iterator has more code points to walk through the macro
+ utf8IteratorNoMore() has been defined.
+
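+ As a minimal sketch of what a code point validity test involves, the
+ function below accepts values in the range given above (0 to 1114111,
+ i.e. 0x10FFFF) and rejects the UTF-16 surrogate range. The name
+ is_unicode_scalar is hypothetical; it is not the library's
+ isLegalUnicodeCodePoint() macro, which may apply further rules.
+
```c
#include <stdint.h>

/* Hypothetical helper, NOT the library's isLegalUnicodeCodePoint() macro.
 * Accepts 0..0x10FFFF (1114111) and rejects the surrogate range
 * 0xD800..0xDFFF, which is reserved for UTF-16 and never a valid
 * stand-alone code point. */
static int is_unicode_scalar(uint32_t cp) {
    if (cp > 0x10FFFFu) return 0;                   /* beyond Unicode range */
    if (cp >= 0xD800u && cp <= 0xDFFFu) return 0;   /* surrogate half */
    return 1;
}
```
+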
+ To use these functions, compile and link utf8util.c and buniutil.c.
+
+ ..........................................................................
+
+ extern void utf8IteratorInit (struct utf8Iterator* iter,
+ unsigned char* data, int slen);
+
+ Initialize a unicode utf8 iterator to traverse an array of utf8 encoded
+ code points pointed to by data, with length slen from the start. The
+ iterator iter is only valid for as long as the array it points to
+ is valid and not modified.
+
+ ..........................................................................
+
+ extern void utf8IteratorUninit (struct utf8Iterator* iter);
+
+ Invalidate utf8 iterator. After calling this, the iterator iter should
+ yield true when passed to the utf8IteratorNoMore() macro.
+
+ ..........................................................................
+
+ extern cpUcs4 utf8IteratorGetNextCodePoint (struct utf8Iterator* iter,
+ cpUcs4 errCh);
+
+ Parse code point the iterator is pointing at and advance the iterator to
+ the next code point. If the iterator was pointing at a valid code point
+ the code point is returned, otherwise, errCh will be returned.
+
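+ The "parse, advance, or substitute errCh" behavior can be illustrated
+ with a self-contained sketch. The function decode_next and its index
+ based interface are assumptions made for illustration; the library works
+ through struct utf8Iterator, and its treatment of malformed input may
+ differ (here one byte is skipped per error).
+
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder, not part of utf8util.c. Parses one UTF-8 encoded
 * code point starting at data[*pos] and advances *pos past it. On
 * malformed input it returns err_ch and advances by a single byte. */
static uint32_t decode_next(const unsigned char *data, size_t len,
                            size_t *pos, uint32_t err_ch) {
    size_t i = *pos, n, k;
    uint32_t cp, min;
    unsigned char c;

    if (i >= len) return err_ch;            /* nothing left to parse */
    c = data[i];
    if (c < 0x80) { *pos = i + 1; return c; }

    /* The lead byte gives the number of continuation bytes (n) and the
     * smallest value that legitimately needs this length (min). */
    if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1Fu; min = 0x80u; }
    else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0Fu; min = 0x800u; }
    else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07u; min = 0x10000u; }
    else { *pos = i + 1; return err_ch; }

    if (n > len - i - 1) { *pos = i + 1; return err_ch; }   /* truncated */
    for (k = 1; k <= n; k++) {
        unsigned char t = data[i + k];
        if ((t & 0xC0) != 0x80) { *pos = i + 1; return err_ch; }
        cp = (cp << 6) | (t & 0x3Fu);
    }
    /* Reject overlong forms, out-of-range values and surrogates. */
    if (cp < min || cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu)) {
        *pos = i + 1;
        return err_ch;
    }
    *pos = i + n + 1;
    return cp;
}
```
+
+ Returning a caller supplied substitute instead of failing lets a loop
+ over the data continue past damaged input without a separate error path.
+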
+ ..........................................................................
+
+ extern cpUcs4 utf8IteratorGetCurrCodePoint (struct utf8Iterator* iter,
+ cpUcs4 errCh);
+
+ Parse code point the iterator is pointing at. If the iterator was
+ pointing at a valid code point the code point is returned, otherwise,
+ errCh will be returned.
+
+ ..........................................................................
+
+ extern int utf8ScanBackwardsForCodePoint (unsigned char* msg, int len,
+ int pos, cpUcs4* out);
+
+ From the position "pos" in the array msg of length len, search for the
+ last position before or at pos from which a valid Unicode code point
+ can be parsed. If such an offset is found it is returned, otherwise
+ a negative value is returned. The code point parsed is put into *out if
+ out is not NULL.
+
+ ..........................................................................
+
+ extern int buIsUTF8Content (const_bstring bu);
+
+ Scan a bstring and determine if it is made entirely of valid unicode
+ code points. If it is, 1 is returned, otherwise 0 is returned.
+
+ ..........................................................................
+
+ extern int buAppendBlkUcs4 (bstring b, const cpUcs4* bu, int len,
+ cpUcs4 errCh);
+
+ Append the code points passed in the UCS4 format (raw numbers) in the
+ array bu of length len. Any unparsable characters are replaced by errCh.
+ If errCh is not a valid Unicode code point, then parsing errors will cause
+ BSTR_ERR to be returned.
+
+ ..........................................................................
+
+ extern int buGetBlkUTF16 (cpUcs2* ucs2, int len, cpUcs4 errCh,
+ const_bstring bu, int pos);
+
+ Convert a string of UTF8 codepoints (bu), skipping the first pos, into a
+ sequence of UTF16 encoded code points. Returns the number of UCS2 16-bit
+ words written to the output. No more than len words are written to the
+ target array ucs2. If any code point in bu is unparsable, it will be
+ translated to errCh.
+
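+ The reason the return value counts 16-bit words rather than code points
+ is that a code point above 0xFFFF occupies two words (a surrogate pair)
+ in UTF16. The sketch below shows that standard encoding step for a single
+ code point. The function encode_utf16 is a hypothetical helper, not part
+ of buniutil.c.
+
```c
#include <stdint.h>

/* Hypothetical helper, not part of buniutil.c. Writes the UTF-16 encoding
 * of cp into out[0..1] and returns the number of 16-bit words used
 * (1 or 2), or 0 if cp cannot be encoded. */
static int encode_utf16(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu)) return 0;
    if (cp < 0x10000u) {
        out[0] = (uint16_t) cp;             /* fits in one word */
        return 1;
    }
    cp -= 0x10000u;                         /* 20 bits remain */
    out[0] = (uint16_t) (0xD800u | (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t) (0xDC00u | (cp & 0x3FFu));  /* low surrogate */
    return 2;
}
```
+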
+ ..........................................................................
+
+ extern int buAppendBlkUTF16 (bstring bu, const cpUcs2* utf16, int len,
+ cpUcs2* bom, cpUcs4 errCh);
+
+ Append an array of UCS2 code points (utf16) to UTF8 codepoints (bu). Any
+ invalid code point is replaced by errCh. If errCh is itself not a
+ valid code point, then this translation will halt upon the first error
+ and return BSTR_ERR. Otherwise BSTR_OK is returned. If a byte order mark
+ has been previously read, it may be passed in as bom, otherwise if *bom is
+ set to 0, it will be filled in with the BOM as read from the first
+ character if it is a BOM.
+
+===============================================================================
+
+The bstest module
+-----------------
+
+The bstest module is just a unit test for the bstrlib module. For correct
+implementations of bstrlib, it should execute with 0 failures being reported.
+This test should be utilized if modifications/customizations to bstrlib have
+been performed. It tests each core bstrlib function with bstrings of every
+mode (read-only, NULL, static and mutable) and ensures that the expected
+semantics are observed (including results that should indicate an error). It
+also tests for aliasing support. Passing bstest is a necessary but not a
+sufficient condition for ensuring the correctness of the bstrlib module.
+
+
+The test module
+---------------
+
+The test module is just a unit test for the bstrwrap module. For correct
+implementations of bstrwrap, it should execute with 0 failures being
+reported. This test should be utilized if modifications/customizations to
+bstrwrap have been performed. It tests each core bstrwrap function with
+CBStrings write protected or not and ensures that the expected semantics are
+observed (including expected exceptions.) Note that exceptions cannot be
+disabled to run this test. Passing test is a necessary but not a sufficient
+condition for ensuring the correctness of the bstrwrap module.
+
+===============================================================================
+
+Using Bstring and CBString as an alternative to the C library
+-------------------------------------------------------------
+
+First let us give a table of C library functions and the alternative bstring
+functions and CBString methods that should be used instead of them.
+
+C-library   Bstring alternative         CBString alternative
+---------   -------------------         --------------------
+gets        bgets                       ::gets
+strcpy      bassign                     = operator
+strncpy     bassignmidstr               ::midstr
+strcat      bconcat                     += operator
+strncat     bconcat + btrunc            += operator + ::trunc
+strtok      bsplit, bsplits             ::split
+sprintf     b(assign)format             ::format
+snprintf    b(assign)format + btrunc    ::format + ::trunc
+vsprintf    bvformata                   bvformata
+
+vsnprintf   bvformata + btrunc          bvformata + btrunc
+vfprintf    bvformata + fputs           use bvformata + fputs
+strcmp      biseq, bstrcmp              comparison operators.
+strncmp     bstrncmp, memcmp            bstrncmp, memcmp
+strlen      ->slen, blength             ::length
+strdup      bstrcpy                     constructor
+strset      bpattern                    ::fill
+strstr      binstr                      ::find
+strpbrk     binchr                      ::findchr
+stricmp     bstricmp                    cast & use bstricmp
+strlwr      btolower                    cast & use btolower
+strupr      btoupper                    cast & use btoupper
+strrev      bReverse (aux module)       cast & use bReverse
+strchr      bstrchr                     cast & use bstrchr
+strspnp     use strspn                  use strspn
+ungetc      bsunread                    bsunread
+
+The top 9 C functions listed here are troublesome in that they impose memory
+management in the calling function. The Bstring and CBString interfaces have
+built-in memory management, so there is far less code with far less potential
+for buffer overrun problems. strtok can only be reliably called as a "leaf"
+calculation, since it (quite bizarrely) maintains hidden internal state. And
+gets is well known to be broken no matter what. The Bstrlib alternatives do
+not suffer from those sorts of problems.
+
+The substitute for strncat can be performed with higher performance by using
+the blk2tbstr macro to create a presized second operand for bconcat.
+
+C-library   Bstring alternative         CBString alternative
+---------   -------------------         --------------------
+strspn      strspn acceptable           strspn acceptable
+strcspn     strcspn acceptable          strcspn acceptable
+strnset     strnset acceptable          strnset acceptable
+printf      printf acceptable           printf acceptable
+puts        puts acceptable             puts acceptable
+fprintf     fprintf acceptable          fprintf acceptable
+fputs       fputs acceptable            fputs acceptable
+memcmp      memcmp acceptable           memcmp acceptable
+
+Remember that Bstring (and CBString) functions will automatically append the
+'\0' character to the character data buffer. So by simply accessing the data
+buffer directly, ordinary C string library functions can be called directly
+on them. Note that bstrcmp is not the same as memcmp in exactly the same way
+that strcmp is not the same as memcmp.
+
+C-library   Bstring alternative         CBString alternative
+---------   -------------------         --------------------
+fread       balloc + fread              ::alloc + fread
+fgets       balloc + fgets              ::alloc + fgets
+
+These are odd ones because of the exact sizing of the buffer required. The
+Bstring and CBString alternatives require that the buffers be forced to
+hold at least the prescribed length, after which fread or fgets can be used
+directly. However, the automatic memory management of Bstring and CBString
+will typically make the use of fgets and fread to read specifically sized
+strings unnecessary.
+
+Implementation Choices
+----------------------
+
+Overhead:
+.........
+
+The bstring library has more overhead versus straight char buffers for most
+functions. This overhead is essentially just the memory management and
+string header allocation. This overhead usually only shows up for small
+string manipulations. The performance loss has to be considered in
+light of the following:
+
+1) What would be the performance loss of trying to write this management
+ code in one's own application?
+2) Since the bstring library source code is given, a sufficiently powerful
+ modern inlining globally optimizing compiler can remove function call
+ overhead.
+
+Since the data type is exposed, a developer can replace any unsatisfactory
+function with their own inline implementation. That, however, is beside the
+main point of what the better string library is meant to provide. Any
+performance lost to overhead has to be weighed against the value of the safe
+abstraction for coupling memory management and string functionality.
+
+Performance of the C interface:
+...............................
+
+The algorithms used have performance advantages versus the analogous C
+library functions. For example:
+
+1. bfromcstr/blk2bstr/bstrcpy versus strcpy/strdup. By using memmove instead
+ of strcpy, the break condition of the copy loop is based on an independent
+ counter (that should be allocated in a register) rather than having to
+ check the results of the load. Modern out-of-order executing CPUs can
+ parallelize the final branch mis-predict penalty with the loading of the
+ source string. Some CPUs will also tend to have better built-in hardware
+ support for counted memory moves than load-compare-store. (This is a
+ minor, but non-zero gain.)
+2. biseq versus strcmp. If the strings are unequal in length, biseq will
+ return in O(1) time. If the strings are aliased, or have aliased data
+ buffers, biseq will return in O(1) time. strcmp will always be O(k),
+ where k is the length of the common prefix or the whole string if they are
+ identical.
+3. ->slen versus strlen. ->slen is obviously always O(1), while strlen is
+ always O(n) where n is the length of the string.
+4. bconcat versus strcat. Both rely on precomputing the length of the
+ destination string argument, which will favor the bstring library. On
+ iterated concatenations the performance difference can be enormous.
+5. bsreadln versus fgets. The bsreadln function reads large blocks at a time
+ from the given stream, then parses out lines from the buffers directly.
+ Some C libraries will implement fgets as a loop over single fgetc calls.
+ Testing indicates that the bsreadln approach can be several times faster
+ for fast stream devices (such as a file that has been entirely cached.)
+6. bsplits/bsplitscb versus strspn. Accelerators for the set of match
+ characters are generated only once.
+7. binstr versus strstr. The binstr implementation unrolls the loops to
+ help reduce loop overhead. This will matter if the target string is
+ long and source string is not found very early in the target string.
+ With strstr, while it is possible to unroll the source contents, it is
+ not possible to do so with the destination contents in a way that is
+ effective because every destination character must be tested against
+ '\0' before proceeding to the next character.
+8. bReverse versus strrev. The C function must find the end of the string
+ first before swapping character pairs.
+9. bstrrchr versus no comparable C function. It's not hard to write some C
+ code to search for a character from the end going backwards. But there
+ is no way to do this without computing the length of the string with
+ strlen.
+
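+Points 2, 3 and 9 all follow from storing the length alongside the data. As
+a rough illustration only, the sketch below uses a hypothetical struct
+counted (a stand-in, not struct tagbstring) with two hypothetical helpers.
+They are not the library's biseq or bstrrchr implementations.
+
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view; a stand-in, not struct tagbstring. */
struct counted {
    const unsigned char *data;
    size_t len;
};

/* Equality test: differing lengths or aliased buffers are decided in
 * O(1), before any character is examined. */
static int counted_eq(struct counted a, struct counted b) {
    if (a.len != b.len) return 0;
    if (a.data == b.data || a.len == 0) return 1;
    return memcmp(a.data, b.data, a.len) == 0;
}

/* Reverse character search: starts from the known end of the data, so no
 * strlen() style pass is needed first. Returns the index or -1. */
static long counted_rchr(struct counted s, unsigned char c) {
    size_t i = s.len;
    while (i > 0) {
        i--;
        if (s.data[i] == c) return (long) i;
    }
    return -1;
}
```
+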
+Practical testing indicates that in general Bstrlib is never significantly
+slower than the C library for common operations, while very often having a
+performance advantage that ranges from significant to massive. Even for
+functions like b(n)inchr versus str(c)spn() (where, in theory, there is no
+advantage for the Bstrlib architecture) the performance of Bstrlib is vastly
+superior to most tested C library implementations.
+
+Some of Bstrlib's extra functionality also leads to inevitable performance
+advantages over typical C solutions. For example, using the blk2tbstr macro,
+one can (in O(1) time) generate an internal substring by reference while not
+disturbing the original string. If disturbing the original string is not an
+option, typically, a comparable char * solution would have to make a copy of
+the substring to provide similar functionality. Another example is reverse
+character set scanning -- the str(c)spn functions only scan in a forward
+direction which can complicate some parsing algorithms.
+
+Where high performance char * based algorithms are available, Bstrlib can
+still leverage them by accessing the ->data field on bstrings. So
+realistically Bstrlib can never be significantly slower than any standard
+'\0' terminated char * based solutions.
+
+Performance of the C++ interface:
+.................................
+
+The C++ interface has been designed with an emphasis on abstraction and safety
+first. However, since it is substantially a wrapper for the C bstring
+functions, for longer strings the performance comments described in the
+"Performance of the C interface" section above still apply. Note that the
+(CBString *) type can be directly cast to a (bstring) type, and passed as
+parameters to the C functions (though a CBString must never be passed to
+bdestroy.)
+
+Probably the most controversial choice is performing full bounds checking on
+the [] operator. This decision was made because 1) the fast alternative of
+not bounds checking is still available by first casting the CBString to a
+(const char *) buffer or to a (struct tagbstring) then dereferencing .data and
+2) because the lack of bounds checking is seen as one of the main weaknesses
+of C/C++ versus other languages. This check being done on every access leads
+to individual character extraction being actually slower than other languages
+in this one respect (other languages' compilers will normally dedicate more
+resources to hoisting or removing bounds checking as necessary) but otherwise
+brings C++ up to the level of other languages in terms of functionality.
+
+It is common for other C++ libraries to leverage the abstractions provided by
+C++ to use reference counting and "copy on write" policies. While these
+techniques can speed up some scenarios, they impose a problem with respect to
+thread safety. bstrings and CBStrings can be properly protected with
+"per-object" mutexes, meaning that two bstrlib calls can be made and execute
+simultaneously, so long as the bstrings and CBStrings are distinct. With a
+reference count and alias before copy on write policy, global mutexes are
+required that prevent multiple calls to the strings library from executing
+simultaneously regardless of whether or not the strings represent the same
+string.
+
+One interesting trade off in CBString is that the default constructor is not
+trivial. I.e., it always prepares a ready to use memory buffer. The purpose
+is to ensure that there is a uniform internal composition for any functioning
+CBString that is compatible with bstrings. It also means that the other
+methods in the class are not forced to perform "late initialization" checks.
+In the end it means that construction of CBStrings is slower than other
+comparable C++ string classes. Initial testing, however, indicates that
+CBString outperforms std::string and MFC's CString, for example, in all other
+operations. So to work around this weakness it is recommended that CBString
+declarations be pushed outside of inner loops.
+
+Practical testing indicates that with the exception of the caveats given
+above (constructors and safe index character manipulations) the C++ API for
+Bstrlib generally outperforms popular standard C++ string classes. Amongst
+the standard libraries and compilers, the quality of concatenation operations
+varies wildly and very little care has gone into search functions. Bstrlib
+dominates those performance benchmarks.
+
+Memory management:
+..................
+
+The bstring functions which write and modify bstrings will automatically
+reallocate the backing memory for the char buffer whenever it is required to
+grow. The algorithm for resizing chosen is to snap up to sizes that are a
+power of two which are sufficient to hold the intended new size. Memory
+reallocation is not performed when the required size of the buffer is
+decreased. This behavior can be relied on, and is necessary to make the
+behavior of balloc deterministic. This trades off additional memory usage
+for a decrease in the frequency of required reallocations:
+
+1. For any bstring whose size never exceeds n, its buffer is not ever
+ reallocated more than log_2(n) times for its lifetime.
+2. For any bstring whose size never exceeds n, its buffer is never more than
+ 2*(n+1) in length. (The extra characters beyond 2*n are to allow for the
+ implicit '\0' which is always added by the bstring modifying functions.)
+
+Decreasing the buffer size when the string decreases in size would violate 1)
+above and in real-world cases lead to pathological heap thrashing. Similarly,
+allocating more tightly than "least power of 2 greater than necessary" would
+lead to a violation of 1) and have the same potential for heap thrashing.
+
+Property 2) needs emphasizing. Although the memory allocated is always a
+power of 2, for a bstring that grows linearly in size, its buffer memory also
+grows linearly, not exponentially. The reason is that the amount of extra
+space increases with each reallocation, which decreases the frequency of
+future reallocations.
+
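+The two properties can be checked with a small simulation. The sketch below
+assumes a sizing rule of "smallest power of two that holds the request, with
+a floor of 8"; the names snap_up and count_reallocs, and the floor value,
+are assumptions for illustration and may not match the exact rule in the
+Bstrlib sources.
+
```c
#include <stddef.h>

/* Hypothetical sizing rule: the smallest power of two that is at least
 * `need`, with an assumed floor of 8. */
static size_t snap_up(size_t need) {
    size_t cap = 8;
    while (cap < need) cap <<= 1;
    return cap;
}

/* Simulates growing a string one character at a time up to final_len and
 * counts how often the buffer would have to be (re)allocated. The final
 * buffer capacity is reported through *cap_out when it is not NULL. */
static unsigned count_reallocs(size_t final_len, size_t *cap_out) {
    size_t cap = 0, len;
    unsigned count = 0;

    for (len = 1; len <= final_len; len++) {
        size_t need = len + 1;          /* room for the implicit '\0' */
        if (need > cap) {
            cap = snap_up(need);
            count++;
        }
    }
    if (cap_out) *cap_out = cap;
    return count;
}
```
+
+Under these assumptions, growing to a length of 1000 one character at a time
+takes 8 allocations (8, 16, ..., 1024 bytes) and ends with a 1024 byte
+buffer, which is within both the log_2(n) and the 2*(n+1) bounds above.
+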
+Obviously, given that bstring writing functions may reallocate the data
+buffer backing the target bstring, one should not attempt to cache the data
+buffer address and use it after such bstring functions have been called.
+This includes making reference struct tagbstrings which alias to a writable
+bstring.
+
+balloc or bfromcstralloc can be used to preallocate the minimum amount of
+space used for a given bstring. This will reduce even further the number of
+times the data portion is reallocated. If the length of the string is never
+more than one less than the memory length then there will be no further
+reallocations.
+
+Note that invoking the bwriteallow macro may increase the number of reallocs
+by one more than necessary for every call to bwriteallow interleaved with any
+bstring API which writes to this bstring.
+
+The library does not use any mechanism for automatic clean up for the C API.
+Thus explicit clean up via calls to bdestroy() are required to avoid memory
+leaks.
+
+Constant and static tagbstrings:
+................................
+
+A struct tagbstring can be write protected from any bstrlib function using
+the bwriteprotect macro. A write protected struct tagbstring can then be
+reset to being writable via the bwriteallow macro. There is, of course, no
+protection from attempts to directly access the bstring members. Modifying a
+bstring which is write protected by direct access has undefined behavior.
+
+static struct tagbstrings can be declared via the bsStatic macro. They are
+considered permanently unwritable. Such struct tagbstrings are declared
+such that attempts to write to them are not well defined. Invoking either
+bwriteallow or bwriteprotect on static struct tagbstrings has no effect.
+
+struct tagbstrings initialized via btfromcstr or blk2tbstr are protected by
+default but can be made writable via the bwriteallow macro. If bwriteallow
+is called on such struct tagbstrings, it is the programmer's responsibility
+to ensure that:
+
+1) the buffer supplied was allocated from the heap.
+2) bdestroy is not called on this tagbstring (unless the header itself has
+ also been allocated from the heap.)
+3) free is called on the buffer to reclaim its memory.
+
+bwriteallow and bwriteprotect can be invoked on ordinary bstrings (they have
+to be dereferenced with the (*) operator to get the levels of indirection
+correct) to remove or apply write protection.
+
+Buffer declaration:
+...................
+
+The memory buffer is actually declared "unsigned char *" instead of "char *".
+The reason for this is to trigger compiler warnings whenever uncasted char
+buffers are assigned to the data portion of a bstring. This will draw more
+diligent programmers into taking a second look at the code where they
+have carelessly left off the typically required cast. (Research from
+AT&T/Lucent indicates that additional programmer eyeballs are one of the most
+effective mechanisms for ferreting out bugs.)
+
+Function pointers:
+..................
+
+The bgets, bread and bStream functions use function pointers to obtain
+strings from data streams. The function pointer declarations have been
+specifically chosen to be compatible with the fgetc and fread functions.
+While this may seem to be a convoluted way of implementing fgets and fread
+style functionality, it has been specifically designed this way to ensure
+that there is no dependency on a single narrowly defined set of device
+interfaces, such as just stream I/O. In the embedded world, it's quite
+possible to have environments where such interfaces may not exist in the
+standard C library form. Furthermore, the generalization that this opens up
+allows for more sophisticated uses for these functions (performing an fgets
+like function on a socket, for example.) By using function pointers, it also
+allows such abstract stream interfaces to be created using the bstring library
+itself while not creating a circular dependency.
+
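+To make the idea concrete, here is a minimal sketch of a line reader driven
+by an fgetc-shaped callback, fed from a memory buffer rather than a FILE.
+The names getc_fn, struct mem_src, mem_getc and read_line are all
+hypothetical; this is not the bgets implementation, which returns a bstring
+and manages its own memory.
+
```c
#include <stddef.h>

/* Hypothetical callback type with the same shape as fgetc when the FILE *
 * argument is passed as an opaque pointer. */
typedef int (*getc_fn)(void *parm);

/* An in-memory source standing in for a socket, device or other stream. */
struct mem_src {
    const char *buf;
    size_t len, pos;
};

/* Returns the next character as a non-negative int, or -1 at end of input
 * (playing the role that EOF plays for fgetc). */
static int mem_getc(void *parm) {
    struct mem_src *m = (struct mem_src *) parm;
    if (m->pos >= m->len) return -1;
    return (unsigned char) m->buf[m->pos++];
}

/* Reads characters up to and including `term` into out, which is always
 * '\0' terminated. Returns the number of characters stored. */
static size_t read_line(getc_fn get, void *parm, char term,
                        char *out, size_t cap) {
    size_t n = 0;
    int c;

    if (cap == 0) return 0;
    while (n + 1 < cap && (c = get(parm)) >= 0) {
        out[n++] = (char) c;
        if (c == (unsigned char) term) break;
    }
    out[n] = '\0';
    return n;
}
```
+
+Because read_line only sees the callback and an opaque pointer, the same
+code serves any source for which a getc-style function can be written.
+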
+Use of int's for sizes:
+.......................
+
+This is just a recognition that 16bit platforms with requirements for strings
+that are larger than 64K and 32bit+ platforms with requirements for strings
+that are larger than 4GB are pretty marginal. The main focus is for 32bit
+platforms, and emerging 64bit platforms with reasonable < 4GB string
+requirements. Using ints allows for negative values which have meaning
+internally to bstrlib.
+
+Semantic consideration:
+.......................
+
+Certain care needs to be taken when copying and aliasing bstrings. A bstring
+is essentially a pointer type which points to a multipart abstract data
+structure. Thus the usage and lifetime of bstrings have semantics that follow
+these considerations. For example:
+
+ bstring a, b;
+ struct tagbstring t;
+
+ a = bfromcstr("Hello"); /* Create new bstring and copy "Hello" into it. */
+ b = a; /* Alias b to the contents of a. */
+ t = *a; /* Create a current instance pseudo-alias of a. */
+ bconcat (a, b); /* Double a and b, t is now undefined. */
+ bdestroy (a); /* Destroy the contents of both a and b. */
+
+Variables of type bstring are really just references that point to real
+bstring objects. The equal operator (=) creates aliases, and the asterisk
+dereference operator (*) creates a kind of alias to the current instance (which
+is generally not useful for any purpose.) Using bstrcpy() is the correct way
+of creating duplicate instances. The ampersand operator (&) is useful for
+creating aliases to struct tagbstrings (remembering that constructed struct
+tagbstrings are not writable by default.)
+
+CBStrings use complete copy semantics for the equal operator (=), and thus do
+not have these sorts of issues.
+
+Debugging:
+..........
+
+Bstrings have a simple, exposed definition and construction, and the library
+itself is open source. So most debugging is going to be fairly
+straightforward. But the memory for bstrings comes from the heap, which can
+often be corrupted indirectly, and it might not be obvious what has happened
+even from direct examination of the contents in a debugger or a core dump.
+There are some tools such as Purify, Insure++ and Electric Fence which can
+help solve such problems; however, another common approach is to directly
+instrument the calls to malloc, realloc, calloc, free, memcpy, memmove and/or
+other calls by overriding them with macro definitions.
+
+Although the user could hack on the Bstrlib sources directly as necessary to
+perform such an instrumentation, Bstrlib comes with a built-in mechanism for
+doing this. Defining the macro BSTRLIB_MEMORY_DEBUG and providing an
+include file named memdbg.h will force the core Bstrlib modules to
+attempt to include this file. In such a file, macros could be defined which
+override Bstrlib's usage of the C standard library.
+
+Rather than calling malloc, realloc, free, memcpy or memmove directly, Bstrlib
+emits the macros bstr__alloc, bstr__realloc, bstr__free, bstr__memcpy and
+bstr__memmove in their place respectively. By default these macros are simply
+assigned to be equivalent to their corresponding C standard library function
+call. However, if they are given earlier macro definitions (via the back
+door include file) they will not be given their default definition. In this
+way Bstrlib's interface to the standard library can be changed but without
+having to directly redefine or link standard library symbols (both of which
+are not strictly ANSI C compliant.)
+
+An example definition might include:
+
+ #define bstr__alloc(sz) X_malloc ((sz), __LINE__, __FILE__)
+
+which might help contextualize heap entries in a debugging environment.
+
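+The override-then-default pattern can be sketched in isolation as follows.
+X_malloc is the name used in the example definition above, but its body
+here, and the counter x_malloc_calls, are assumptions made purely for
+illustration.
+
```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical instrumented allocator. Only the name X_malloc comes from
 * the example above; the logging and counting are assumptions. */
static unsigned long x_malloc_calls = 0;

static void *X_malloc(size_t sz, int line, const char *file) {
    x_malloc_calls++;
    fprintf(stderr, "alloc of %lu bytes at %s:%d\n",
            (unsigned long) sz, file, line);
    return malloc(sz);
}

/* What a memdbg.h style header would supply ahead of the library code. */
#define bstr__alloc(sz) X_malloc ((sz), __LINE__, __FILE__)

/* The default-definition pattern described in the text: it only takes
 * effect when no earlier definition exists, so the override above wins. */
#ifndef bstr__alloc
#define bstr__alloc(sz) malloc (sz)
#endif
```
+
+Any later use of bstr__alloc (sz) in the same translation unit then goes
+through X_malloc without the standard library symbols being redefined.
+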
+The NULL parameter and sanity checking of bstrings is part of the Bstrlib
+API, and thus Bstrlib itself does not present any different modes which would
+correspond to "Debug" or "Release" modes. Bstrlib always contains mechanisms
+which one might think of as debugging features, but retains the performance
+and small memory footprint one would normally associate with release mode
+code.
+
+Integration with Microsoft's Visual Studio debugger:
+....................................................
+
+Microsoft's Visual Studio debugger supports customizable mouse float over
+data type descriptions. This is accomplished by editing the
+AUTOEXP.DAT file to include the following:
+
+ ; new for CBString
+ tagbstring =slen=<slen> mlen=<mlen> <data,st>
+ Bstrlib::CBStringList =count=<size()>
+
+In Visual C++ 6.0 this file is located in the directory:
+
+ C:\Program Files\Microsoft Visual Studio\Common\MSDev98\Bin
+
+and in Visual Studio .NET 2003 it's located here:
+
+ C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Packages\Debugger
+
+This makes debugging with Bstrlib under Visual Studio easier.
+
+Security
+--------
+
+Bstrlib does not come with explicit security features outside of its fairly
+comprehensive error detection, coupled with its strict semantic support.
+That is to say that certain common security problems, such as buffer overrun,
+constant overwrite, arbitrary truncation etc., are far less likely to happen
+inadvertently. Where it does help, Bstrlib maximizes its advantage by
+providing developers a simple adoption path that lets them leave less secure
+string mechanisms behind. The library will not leave developers wanting, so
+they will be less likely to add new code using a less secure string library
+to add functionality that might be missing from Bstrlib.
+
+That said there are a number of security ideas not addressed by Bstrlib:
+
+1. Race condition exploitation (i.e., verifying a string's contents, then
+raising the privilege level and executing it as a shell command as two
+non-atomic steps) is well beyond the scope of what Bstrlib can provide. It
+should be noted that MFC's built-in string mutex actually does not solve this
+problem either -- it just removes immediate data corruption as a possible
+outcome of such exploit attempts (it can be argued that this is worse, since
+it will leave no trace of the exploitation). In general race conditions have
+to be dealt with by careful design and implementation; a string library
+cannot assist with them.
+
+2. Any kind of access control or security attributes to prevent usage in
+dangerous interfaces such as system(). Perl includes a "trust" attribute
+which can be endowed upon strings that are intended to be passed to such
+dangerous interfaces. However, Perl's solution reflects its own limitations
+-- notably that it is not a strongly typed language. In the example code for
+Bstrlib, there is a module called taint.cpp. It demonstrates how to write a
+simple wrapper class for managing "untainted" or trusted strings using the
+type system to prevent questionable mixing of ordinary untrusted strings with
+untainted ones then passing them to dangerous interfaces. In this way the
+security correctness of the code reduces to auditing the direct usages of
+dangerous interfaces or promotions of tainted strings to untainted ones.
+
+3. Encryption of string contents is way beyond the scope of Bstrlib.
+Maintaining encrypted string contents in the futile hopes of thwarting things
+like using system-level debuggers to examine sensitive string data is likely
+to be a wasted effort (imagine a debugger that runs at a higher level than a
+virtual processor where the application runs). For more standard encryption
+usages, since the bstring contents are simply binary blocks of data, this
+should pose no problem for usage with other standard encryption libraries.
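+
+The type-system idea from item 2 can be sketched in plain C as well. The
+following is not the taint.cpp module (which is a C++ wrapper class); every
+name in it (trusted_str, trusted_promote, run_trusted) and the whitelist
+policy are hypothetical and chosen only to show the shape of the technique:
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical distinct type for audited strings.  A plain char * cannot be
   passed where a trusted_str is expected, so the compiler rejects mixing. */
typedef struct { const char * text; } trusted_str;

/* The single promotion point from untrusted to trusted.  Accepts a string
   only if every character is in a deliberately tiny, assumed whitelist.
   Returns 0 on success and -1 on rejection (out is left untouched). */
static int trusted_promote (trusted_str * out, const char * raw) {
    static const char allowed[] = "abcdefghijklmnopqrstuvwxyz0123456789_-.";
    assert (out != NULL);
    if (raw == NULL || raw[0] == '\0') return -1;
    if (strspn (raw, allowed) != strlen (raw)) return -1;
    out->text = raw;
    return 0;
}

/* Stand-in for a dangerous interface such as system().  It accepts only the
   trusted type; here it reports the length instead of executing anything. */
static size_t run_trusted (trusted_str cmd) {
    return strlen (cmd.text);
}
```

+
+With this arrangement an audit only needs to look at the calls to
+trusted_promote and at the body of run_trusted; passing a char * directly to
+run_trusted does not compile.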
+
+Compatibility
+-------------
+
+The Better String Library is known to compile and function correctly with the
+following compilers:
+
+ - Microsoft Visual C++
+ - Watcom C/C++
+ - Intel's C/C++ compiler (Windows)
+ - The GNU C/C++ compiler (cygwin and Linux on PPC64)
+ - Borland C
+ - Turbo C
+
+Setting of configuration options should be unnecessary for these compilers
+(unless exceptions are being disabled or STLport has been added to WATCOM
+C/C++). Bstrlib has been developed with an emphasis on portability. As such
+porting it to other compilers should be straightforward. This package
+includes a porting guide (called porting.txt) which explains what issues may
+exist for porting Bstrlib to different compilers and environments.
+
+ANSI issues
+-----------
+
+1. The function pointer types bNgetc and bNread have prototypes which are very
+similar to, but not exactly the same as fgetc and fread respectively.
+Basically the FILE * parameter is replaced by void *. The purpose of this
+was to allow one to create other functions with fgetc- and fread-like
+semantics without being tied to ANSI C's file streaming mechanism. I.e., one
+could very easily adapt it to sockets, or simply reading a block of memory,
+or procedurally generated strings (for fractal generation, for example.)
+
+The problem is that invoking the functions (bNgetc)fgetc and (bNread)fread is
+not technically legal in ANSI C. The reason is that the compiler is only
+able to coerce the function pointers themselves into the target type; it is
+unable to perform any cast (implicit or otherwise) on the parameters passed
+once invoked. I.e., if internally void * and FILE * need some kind of
+mechanical coercion, the compiler will not properly perform this conversion,
+which leads to undefined behavior.
+
+Apparently a platform from Data General called "Eclipse" and another from
+Tandem called "NonStop" have a different representation for pointers to bytes
+and pointers to words, for example, where coercion via casting is necessary.
+(Actual confirmation of the existence of such machines is hard to come by, so
+it is prudent to be skeptical about this information.) However, this is not
+an issue for any known contemporary platforms. One may conclude that such
+platforms are effectively apocryphal even if they do exist.
+
+To correctly work around this problem to the satisfaction of the ANSI
+limitations, one needs to create wrapper functions for fgetc and/or
+fread with the prototypes of bNgetc and/or bNread respectively which perform
+no action other than to explicitly cast the void * parameter to a
+FILE *, and simply pass the remaining parameters straight through to fgetc
+or fread.
+
+The wrappers themselves are trivial:
+
+ size_t freadWrap (void * buff, size_t esz, size_t eqty, void * parm) {
+     return fread (buff, esz, eqty, (FILE *) parm);
+ }
+
+ int fgetcWrap (void * parm) {
+     return fgetc ((FILE *) parm);
+ }
+
+These have not been supplied in bstrlib or bstraux to prevent unnecessary
+linking with file I/O functions.
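+
+The same void * parameter makes non-file sources straightforward. As an
+illustration, the sketch below reads from a block of memory through a
+function with the shape described for bNgetc above (an int return and a
+single void * parameter). The mem_reader struct and the memGetc and
+countBytes names are hypothetical, and the local getc_fn typedef is only
+assumed to mirror bNgetc rather than being taken from the Bstrlib headers:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/* Assumed to have the same shape as Bstrlib's bNgetc. */
typedef int (* getc_fn) (void * parm);

/* Hypothetical cursor over a block of memory. */
struct mem_reader {
    const unsigned char * data;
    size_t len;
    size_t pos;
};

/* fgetc-like semantics: return the next byte as an int, or EOF when the
   block is exhausted. */
static int memGetc (void * parm) {
    struct mem_reader * r = (struct mem_reader *) parm;
    assert (r != NULL);
    if (r->pos >= r->len) return EOF;
    return r->data[r->pos++];
}

/* Generic consumer that knows nothing about where the bytes come from. */
static size_t countBytes (getc_fn getcPtr, void * parm) {
    size_t n = 0;
    while (getcPtr (parm) != EOF) n++;
    return n;
}
```

+
+No cast of the function pointer is needed here because memGetc already has
+the required prototype, so the undefined behavior discussed above does not
+arise.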
+
+2. vsnprintf is not available on all compilers. Because of this, the bformat
+and bformata functions (and format and formata methods) are not guaranteed to
+work properly. For those compilers that don't have vsnprintf, the
+BSTRLIB_NOVSNP macro should be set before compiling bstrlib, and the format
+functions/method will be disabled.
+
+More recent C standards (C99 and later) require the inclusion of a
+vsnprintf function.
+
+3. The bstrlib function names are not unique in the first 6 characters. This
+is only an issue for older C compiler environments which do not store more
+than 6 characters for function names.
+
+4. The bsafe module defines macros and function names which are part of the
+C library. This simply overrides the definition as expected on all platforms
+tested, however it is not sanctioned by the ANSI standard. This module is
+clearly optional and should be omitted on platforms which disallow its
+undefined semantics.
+
+In practice the real issue is that some compilers in some modes of operation
+can/will inline these standard library functions on a module by module basis
+as they appear in each. The linker will thus have no opportunity to override
+the implementation of these functions for those cases. This can lead to
+inconsistent behaviour of the bsafe module on different platforms and
+compilers.
+
+===============================================================================
+
+Comparison with Microsoft's CString class
+-----------------------------------------
+
+Although developed independently, CBStrings have very similar functionality to
+Microsoft's CString class. However, the bstring library has significant
+advantages over CString:
+
+1. Bstrlib is a C-library as well as a C++ library (using the C++ wrapper).
+
+ - Thus it is compatible with more programming environments and
+ available to a wider population of programmers.
+
+2. The internal structure of a bstring is considered exposed.
+
+ - A single contiguous block of data can be cut into read-only pieces by
+ simply creating headers, without allocating additional memory to create
+ reference copies of each of these sub-strings.
+ - In this way, using bstrings in a totally abstracted way becomes a choice
+ rather than an imposition. Further this choice can be made differently
+ at different layers of applications that use it.
+
+3. Static declaration support precludes the need for constructor
+ invocation.
+
+ - Allows for static declarations of constant strings that have no
+ additional constructor overhead.
+
+4. Bstrlib is not attached to another library.
+
+ - Bstrlib is designed to be easily plugged into any other library
+ collection, without dependencies on other libraries or paradigms (such
+ as "MFC".)
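+
+To illustrate item 2 above, the sketch below cuts one contiguous buffer into
+read-only pieces by filling in headers only. The struct view type is a
+hypothetical stand-in that is merely assumed to resemble a bstring header (a
+length plus a data pointer); it is not the Bstrlib definition, and real code
+would use the library's own header type. The splitViews name is likewise
+invented for this example:
+

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a string header: no ownership, just a window
   onto bytes that live in somebody else's block. */
struct view {
    int slen;                     /* number of bytes in the window */
    const unsigned char * data;   /* points into the shared block  */
};

/* Split block[0..len) on sep, writing up to max headers into out.
   No memory is allocated and no bytes are copied.  Returns the number of
   headers written. */
static int splitViews (const unsigned char * block, int len,
                       unsigned char sep, struct view * out, int max) {
    int i, start = 0, n = 0;
    assert (block != NULL && out != NULL && len >= 0);
    for (i = 0; i <= len && n < max; i++) {
        if (i == len || block[i] == sep) {
            out[n].slen = i - start;
            out[n].data = block + start;
            n++;
            start = i + 1;
        }
    }
    return n;
}
```

+
+Each header refers back into the original block, so the pieces remain valid
+only for as long as that block does and must be treated as read-only.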
+
+The bstring library also comes with a few additional functions that are not
+available in the CString class:
+
+ - bsetstr
+ - bsplit
+ - bread
+ - breplace (this is different from CString::Replace())
+ - Writable indexed characters (for example a[i]='x')
+
+Interestingly, although Microsoft did implement mid$(), left$() and right$()
+functional analogues (these are functions from GWBASIC) they seem to have
+forgotten that mid$() could be also used to write into the middle of a string.
+This functionality exists in Bstrlib with the bsetstr() and breplace()
+functions.
+
+Among the disadvantages of Bstrlib is that there is no special support for
+localization or wide characters. Such things are considered beyond the scope
+of what bstrings are trying to deliver. CString essentially supports the
+older UCS-2 version of Unicode via wchar_t as an application-wide compile
+time switch.
+
+CStrings also use built-in mechanisms for ensuring thread safety under all
+situations. While this makes writing thread safe code that much easier, this
+built-in safety feature has a price -- the inner loops of each CString method
+run in their own critical section (grabbing and releasing a light weight mutex
+on every operation.) The usual way to decrease the impact of a critical
+section performance penalty is to amortize more operations per critical
+section. But since the implementation of CStrings is fixed as a one critical
+section per-operation cost, there is no way to leverage this common
+performance enhancing idea.
+
+The search facilities in Bstrlib are comparable to those in MFC's CString
+class, though it is missing locale specific collation. But because Bstrlib
+is interoperable with C's char buffers, it will allow programmers to write
+their own string searching mechanism (such as Boyer-Moore), or be able to
+choose from a variety of available existing string searching libraries (such
+as those for regular expressions) without difficulty.
+
+Microsoft used a very non-ANSI conforming trick in its implementation to
+allow printf() to use the "%s" specifier to output a CString correctly. This
+can be convenient, but it is inherently not portable. CBString requires an
+explicit cast, while bstring requires the data member to be dereferenced.
+Microsoft's own documentation recommends casting, instead of relying on this
+feature.
+
+Comparison with C++'s std::string
+---------------------------------
+
+This is the C++ language's standard STL based string class.
+
+1. There is no C implementation.
+2. The [] operator is not bounds checked.
+3. Missing a lot of useful functions like printf-like formatting.
+4. Some sub-standard std::string implementations (SGI) are necessarily unsafe
+ to use with multithreading.
+5. Limited by STL's std::iostream which in turn is limited by ifstream which
+ can only take input from files. (Compare to CBStream's API which can take
+ abstracted input.)
+6. Extremely uneven performance across implementations.
+
+Comparison with ISO C TR 24731 proposal
+---------------------------------------
+
+Following the ISO C99 standard, Microsoft has proposed a group of C library
+extensions which are supposedly "safer and more secure". This proposal is
+expected to be adopted by the ISO C standard which follows C99.
+
+The proposal reveals itself to be very similar to Microsoft's "StrSafe"
+library. The functions are basically the same as other standard C library
+string functions except that destination parameters are paired with an
+additional length parameter of type rsize_t. rsize_t is the same as size_t,
+however, the range is checked to make sure it's between 1 and RSIZE_MAX. Like
+Bstrlib, the functions perform a "parameter check". Unlike Bstrlib, when a
+parameter check fails, rather than simply outputting accumulable error
+statuses, they call a user settable global error function handler, and upon
+return of control perform no (additional) detrimental action. The proposal
+covers basic string functions as well as a few non-reentrant functions
+(asctime, ctime, and strtok).
+
+1. Still based solely on char * buffers (and therefore strlen() and strcat()
+ are still O(n), and there are no faster streq() comparison functions.)
+2. No growable string semantics.
+3. Requires manual buffer length synchronization in the source code.
+4. No attempt to enhance functionality of the C library.
+5. Introduces a new error scenario (strings exceeding RSIZE_MAX length).
+
+The hope is that by exposing the buffer length requirements there will be
+fewer buffer overrun errors. However, the error modes are really just
+transformed, rather than removed. The real problem of buffer overflows is
+that they all happen as a result of erroneous programming. So forcing
+programmers to manually deal with buffer limits, will make them more aware of
+the problem but doesn't remove the possibility of erroneous programming. So
+a programmer that erroneously mixes up the rsize_t parameters is no better off
+than a programmer that introduces potential buffer overflows through other
+more typical lapses. So at best this may reduce the rate of erroneous
+programming, rather than making any attempt at removing failure modes.
+
+The error handler can discriminate between types of failures, but does not
+take into account any callsite context. So the problem is that the error is
+going to be manifest in a piece of code, but there is no pointer to that
+code. It would seem that passing in the call site __FILE__, __LINE__ as
+parameters would be very useful, but the API clearly doesn't support such a
+thing (it would increase code bloat even more than the extra length
+parameter does, and would require macro tricks to implement).
+
+The Bstrlib C API takes the position that error handling needs to be done at
+the callsite, and just tries to make it as painless as possible. Furthermore,
+error modes are removed by supporting auto-growing strings and aliasing. For
+capturing errors in more central code fragments, Bstrlib's C++ API uses
+exception handling extensively, which is superior to the leaf-only error
+handler approach.
+
+Comparison with Managed String Library CERT proposal
+----------------------------------------------------
+
+The main webpage for the managed string library:
+http://www.cert.org/secure-coding/managedstring.html
+
+Robert Seacord at CERT has proposed a C string library that he calls the
+"Managed String Library" for C. Like Bstrlib, it introduces a new type
+which is called a managed string. The structure of a managed string
+(string_m) is like a struct tagbstring but missing the length field. This
+internal structure is considered opaque. The length is, like the C standard
+library, always computed on the fly by searching for a terminating NUL on
+every operation that requires it. So it suffers from every performance
+problem that the C standard library suffers from. Interoperating with C
+string APIs (like printf, fopen, or anything else that takes a string
+parameter) requires copying to additionally allocated buffers that have to
+be manually freed -- this makes this library probably slower and more
+cumbersome than any other string library in existence.
+
+The library gives a fully populated error status as the return value of every
+string function. The hope is to be able to diagnose all problems
+specifically from the return code alone. Comparing this to Bstrlib, which
+always returns one consistent error message, might make it seem that Bstrlib
+would be harder to debug; but this is not true. With Bstrlib, if an error
+occurs there is always enough information from just knowing there was an error
+and examining the parameters to deduce exactly what kind of error has
+happened. The managed string library thus gives up nested function calls
+while achieving little benefit, while Bstrlib does not.
+
+One interesting feature that "managed strings" has is the idea of data
+sanitization via character set whitelisting. That is to say, a globally
+definable filter that makes any attempt to put invalid characters into strings
+lead to an error and not modify the string. The author gives the following
+example:
+
+ // create valid char set
+ if (retValue = strcreate_m(&str1, "abc") ) {
+     fprintf(
+         stderr,
+         "Error %d from strcreate_m.\n",
+         retValue
+     );
+ }
+ if (retValue = setcharset(str1)) {
+     fprintf(
+         stderr,
+         "Error %d from setcharset().\n",
+         retValue
+     );
+ }
+ if (retValue = strcreate_m(&str1, "aabbccabc")) {
+     fprintf(
+         stderr,
+         "Error %d from strcreate_m.\n",
+         retValue
+     );
+ }
+ // create string with invalid char set
+ if (retValue = strcreate_m(&str1, "abbccdabc")) {
+     fprintf(
+         stderr,
+         "Error %d from strcreate_m.\n",
+         retValue
+     );
+ }
+
+Which we can compare with a more Bstrlib way of doing things:
+
+ bstring bCreateWithFilter (const char * cstr, const_bstring filter) {
+     bstring b = bfromcstr (cstr);
+     /* bninchr takes a starting position; search from the beginning. */
+     if (BSTR_ERR != bninchr (b, 0, filter) && NULL != b) {
+         fprintf (stderr, "Filter violation.\n");
+         bdestroy (b);
+         b = NULL;
+     }
+     return b;
+ }
+
+ struct tagbstring charFilter = bsStatic ("abc");
+ bstring str1 = bCreateWithFilter ("aabbccabc", &charFilter);
+ bstring str2 = bCreateWithFilter ("aabbccdabc", &charFilter);
+
+The first thing we should notice is that with the Bstrlib approach you can
+have different filters for different strings if necessary. Furthermore,
+selecting a charset filter in the Managed String Library is uni-contextual.
+That is to say, there can only be one such filter active for the entire
+program, which means its usage is not well defined for intermediate library
+usage (a library that uses it will interfere with user code that uses it, and
+vice versa.) It is also likely to be poorly defined in multi-threading
+environments.
+
+There is also a question as to whether the data sanitization filter is checked
+on every operation, or just on creation operations. Since the charset can be
+set arbitrarily at run time, it might be set *after* some managed strings have
+been created. This would seem to imply that all functions should run this
+additional check every time if there is an attempt to enforce this. This
+would make things tremendously slow. On the other hand, if it is assumed that
+only creates and other operations that take char *'s as input need be checked
+because the charset was only supposed to be set once, before any other
+managed string was created, then one can see that it's easy to cover
+Bstrlib with equivalent functionality via a few wrapper calls such as the
+example given above.
+
+And finally we have to question the value of sanitization in the first place.
+For example, for httpd servers, there is generally a requirement that the
+URLs parsed have some form that avoids undesirable translation to local file
+system filenames or resources. The problem is that, because of the way URLs
+can be encoded, a URL must be completely parsed and translated to know if it
+is using certain invalid character combinations. That is to say, merely
+filtering each character one at a time is not necessarily the right way to
+ensure that a string has safe contents.
+
+In the article that describes this proposal, it is claimed that it fairly
+closely approximates the existing C API semantics. On this point we should
+compare this "closeness" with Bstrlib:
+
+                      Bstrlib                 Managed String Library
+                      -------                 ----------------------
+
+Pointer arithmetic    Segment arithmetic      N/A
+
+Use in C Std lib      ->data, or bdata{e}     getstr_m(x,*) ... free(x)
+
+String literals       bsStatic, bsStaticBlk   strcreate_m()
+
+Transparency          Complete                None
+
+It's pretty clear that the semantic mapping from C strings to Bstrlib is fairly
+straightforward, and that in general semantic capabilities are the same or
+superior in Bstrlib. On the other hand the Managed String Library is either
+missing semantics or changes things fairly significantly.
+
+Comparison with Annexia's c2lib library
+---------------------------------------
+
+This library is available at:
+http://www.annexia.org/freeware/c2lib
+
+1. Still based solely on char * buffers (and therefore strlen() and strcat()
+ are still O(n), and there are no faster streq() comparison functions.)
+ Their suggestion that alternatives which wrap the string data type (such as
+ bstring does) impose a difficulty in interoperating with the C language's
+ ordinary C string library is not founded.
+2. Introduction of memory (and vector?) abstractions imposes a learning
+ curve, and some kind of memory usage policy that is outside of the strings
+ themselves (and therefore must be maintained by the developer.)
+3. The API is massive, and filled with all sorts of trivial (pjoin) and
+ controversial (pmatch -- regular expressions are not sufficiently
+ standardized, and there is a very large difference in performance between
+ compiled and non-compiled REs) functions. Bstrlib takes a decidedly
+ minimal approach -- none of the functionality in c2lib is difficult or
+ challenging to implement on top of Bstrlib (except the regex stuff, which
+ is going to be difficult, and controversial no matter what.)
+4. Understanding why c2lib is the way it is pretty much requires a working
+ knowledge of Perl. bstrlib requires only knowledge of the C string library
+ while providing just a very select few worthwhile extras.
+5. It is attached to a lot of cruft like a matrix math library (that doesn't
+ include any functions for getting the determinant, eigenvectors,
+ eigenvalues, the matrix inverse, test for singularity, test for
+ orthogonality, a Gram-Schmidt orthogonalization, LU decomposition ... I
+ mean why bother?)
+
+Convincing a development house to use c2lib is likely quite difficult. It
+introduces too much, while not being part of any kind of standards body. The
+code must therefore be trusted, or maintained by those that use it. While
+bstring offers nothing more on this front, since it's so much smaller, covers
+far less in terms of scope, and will typically improve string performance,
+the barrier to usage should be much smaller.
+
+Comparison with stralloc/qmail
+------------------------------
+
+More information about this library can be found here:
+http://www.canonical.org/~kragen/stralloc.html or here:
+http://cr.yp.to/lib/stralloc.html
+
+1. Library is very very minimal. A little too minimal.
+2. Untargeted source parameters are not declared const.
+3. Slightly different expected emphasis (like _cats function which takes an
+ ordinary C string char buffer as a parameter.) It's clear that the
+ remainder of the C string library is still required to perform more
+ useful string operations.
+
+The struct declaration for their string header is essentially the same as that
+for bstring. But it's clear that this was a quickly written hack whose goals
+are clearly a subset of what Bstrlib supplies. For anyone who is served by
+stralloc, Bstrlib is a complete substitute that just adds more functionality.
+
+stralloc actually uses the interesting policy that a NULL data pointer
+indicates an empty string. In this way, non-static empty strings can be
+declared without construction. This advantage is minimal, since static empty
+bstrings can be declared inline without construction, and if the string needs
+to be written to it should be constructed from an empty string (or its first
+initializer) in any event.
+
+wxString class
+--------------
+
+This is the string class used in the wxWindows project. A description of
+wxString can be found here:
+http://www.wxwindows.org/manuals/2.4.2/wx368.htm#wxstring
+
+This C++ library is similar to CBString. However, it is littered with
+trivial functions (IsAscii, UpperCase, RemoveLast etc.)
+
+1. There is no C implementation.
+2. The memory management strategy is to allocate a bounded fixed amount of
+ additional space on each resize, meaning that it does not have the
+ log_2(n) property that Bstrlib has (it will thrash very easily, cause
+ massive fragmentation in common heap implementations, and can easily be a
+ common source of performance problems).
+3. The library uses a "copy on write" strategy, meaning that it has to deal
+ with multithreading problems.
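+
+The difference described in item 2 can be made concrete by counting how many
+times a buffer has to be resized while it grows. The sketch below is a
+simulation only: the function names, the starting capacities and the 16 byte
+increment used in the note that follows are assumptions chosen for
+illustration and are not taken from either library.
+

```c
#include <assert.h>
#include <stddef.h>

/* Count the resizes needed to reach n bytes when the capacity doubles on
   every resize, starting from a capacity of 1. */
static unsigned long resizesDoubling (size_t n) {
    size_t cap = 1;
    unsigned long count = 0;
    assert (n <= ((size_t) -1) / 2);   /* keeps cap *= 2 from overflowing */
    while (cap < n) {
        cap *= 2;
        count++;
    }
    return count;
}

/* Count the resizes needed to reach n bytes when the capacity grows by a
   fixed increment on every resize, starting from one increment. */
static unsigned long resizesFixed (size_t n, size_t increment) {
    size_t cap = increment;
    unsigned long count = 0;
    assert (increment > 0 && n <= ((size_t) -1) - increment);
    while (cap < n) {
        cap += increment;
        count++;
    }
    return count;
}
```

+
+Growing to one megabyte (1048576 bytes) takes 20 resizes with doubling but
+65535 resizes with a 16 byte increment, and each resize may also have to
+copy the whole buffer.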
+
+Vstr
+----
+
+This is a highly orthogonal C string library with an emphasis on
+networking/realtime programming. It can be found here:
+http://www.and.org/vstr/
+
+1. The convoluted internal structure does not contain a '\0' char * compatible
+ buffer, so interoperability with the C library is a non-starter.
+2. The API and implementation are very large (owing to its orthogonality) and
+ can lead to difficulty in understanding its exact functionality.
+3. An obvious dependency on GNU tools (confusing make configure step)
+4. Uses a reference counting system, meaning that it is not likely to be
+ thread safe.
+
+The implementation has an extreme emphasis on performance for nontrivial
+actions (adds, inserts and deletes are all constant or roughly O(#operations)
+time) following the "zero copy" principle. This trades off performance of
+trivial functions (character access, char buffer access/coercion, alias
+detection) which become significantly slower, as well as incremental
+accumulative costs for its searching/parsing functions. Whether or not Vstr
+wins any particular performance benchmark will depend a lot on the benchmark,
+but it should handily win on some, while losing dreadfully on others.
+
+The learning curve for Vstr is very steep, and it doesn't come with any
+obvious way to build for Windows or other platforms without GNU tools. At
+least one mechanism (the iterator) introduces a new undefined scenario
+(writing to a Vstr while iterating through it.) Vstr has a very large
+footprint, and is very ambitious in its total functionality. Vstr has no C++
+API.
+
+Vstr usage requires context initialization via vstr_init() which must be run
+in a thread-local context. Given the totally reference based architecture
+this means that sharing Vstrings across threads is not well defined, or at
+least not safe from race conditions. This API is clearly geared to the older
+standard of fork() style multitasking in UNIX, and is not safely transportable
+to modern shared memory multithreading available in Linux and Windows. There
+is no portable external solution making the library thread safe (since it
+requires a mutex around each Vstr context -- not each string.)
+
+In the documentation for this library, a big deal is made of its self hosted
+s(n)printf-like function. This is an issue for older compilers that don't
+include vsnprintf(), but also an issue because Vstr has a slow conversion to
+'\0' terminated char * mechanism. That is to say, using "%s" to format data
+that originates from Vstr would be slow without some sort of native function
+to do so. Bstrlib sidesteps the issue by relying on what snprintf-like
+functionality does exist and having a high performance conversion to a char *
+compatible string so that "%s" can be used directly.
+
+Str Library
+-----------
+
+This is a fairly extensive string library that includes full Unicode support
+and is targeted at the goal of outperforming MFC and STL. The architecture,
+similarly to MFC's CStrings, is a copy on write reference counting mechanism.
+
+http://www.utilitycode.com/str/default.aspx
+
+1. Commercial.
+2. C++ only.
+
+This library, like Vstr, uses a ref counting system. There is only so deeply
+I can analyze it, since I don't have a license for it. However, performance
+improvements over MFC and STL don't seem like a sufficient reason to
+move your source base to it. For example, in the future, Microsoft may
+improve the performance of CString.
+
+It should be pointed out that performance testing of Bstrlib has indicated
+that its relative performance advantage versus MFC's CString and STL's
+std::string is at least as high as that for the Str library.
+
+libmib astrings
+---------------
+
+A handful of functional extensions to the C library that add dynamic string
+functionality.
+http://www.mibsoftware.com/libmib/astring/
+
+This package basically references strings through char ** pointers and assumes
+they are pointing to the top of an allocated heap entry (or NULL, in which
+case memory will be newly allocated from the heap.) So it's still up to the
+user to mix and match the older C string functions with these whenever
+pointer arithmetic is used (i.e., there is no leveraging of the type system
+to assert semantic differences between references and base strings as Bstrlib
+does since no new types are introduced.) Unlike Bstrlib, exact string length
+meta data is not stored, thus requiring a strlen() call on *every* string
+writing operation. The library is very small, covering only a handful of C's
+functions.
+
+While this is better than nothing, it is clearly slower than even the
+standard C library, less safe and less functional than Bstrlib.
+
+To explain the advantage of using libmib, their website shows an example of
+how dangerous C code:
+
+ char buf[256];
+ char *pszExtraPath = ";/usr/local/bin";
+
+ strcpy(buf,getenv("PATH")); /* oops! could overrun! */
+ strcat(buf,pszExtraPath); /* Could overrun as well! */
+
+ printf("Checking...%s\n",buf); /* Some printfs overrun too! */
+
+is avoided using libmib:
+
+ char *pasz = 0; /* Must initialize to 0 */
+ char *paszOut = 0;
+ char *pszExtraPath = ";/usr/local/bin";
+
+ if (!astrcpy(&pasz,getenv("PATH"))) /* malloc error */ exit(-1);
+ if (!astrcat(&pasz,pszExtraPath)) /* malloc error */ exit(-1);
+
+ /* Finally, a "limitless" printf! we can use */
+ asprintf(&paszOut,"Checking...%s\n",pasz);fputs(paszOut,stdout);
+
+ astrfree(&pasz); /* Can use free(pasz) also. */
+ astrfree(&paszOut);
+
+However, compare this to Bstrlib:
+
+ bstring b, out;
+
+ bcatcstr (b = bfromcstr (getenv ("PATH")), ";/usr/local/bin");
+ out = bformat ("Checking...%s\n", bdatae (b, "<Out of memory>"));
+ /* if (out && b) */ fputs (bdatae (out, "<Out of memory>"), stdout);
+ bdestroy (b);
+ bdestroy (out);
+
+Besides being shorter, we can see that error handling can be deferred right
+to the very end. Also, unlike the above two versions, if getenv() returns
+with NULL, the Bstrlib version will not exhibit undefined behavior.
+Initialization starts with the relevant content rather than an extra
+autoinitialization step.
+
+libclc
+------
+
+An attempt to add to the standard C library with a number of common useful
+functions, including additional string functions.
+http://libclc.sourceforge.net/
+
+1. Uses standard char * buffers, and adopts C99's usage of "restrict" to pass
+ the responsibility to guard against aliasing to the programmer.
+2. Adds no safety or memory management whatsoever.
+3. Most of the supplied string functions are completely trivial.
+
+The goals of libclc and Bstrlib are clearly quite different.
+
+fireString
+----------
+
+http://firestuff.org/
+
+1. Uses standard char * buffers, and adopts C99's usage of "restrict" to pass
+ the responsibility to guard against aliasing to the programmer.
+2. Mixes char * functions with length-wrapped buffer (estr) functions,
+ doubling the API size, with safety limited to only half of the functions.
+
+Firestring was originally just a wrapper of char * functionality with extra
+length parameters. However, it has been augmented with the inclusion of the
+estr type, which has similar functionality to stralloc. Even so, Firestring
+does not come close to covering the functional scope of Bstrlib.
+
+Safe C String Library
+---------------------
+
+A library written for the purpose of adding safety and power to C's string
+handling capabilities.
+http://www.zork.org/safestr/safestr.html
+
+1. While the safestr_* functions are safe in and of themselves,
+ interoperating with char * strings has dangerously unsafe modes of
+ operation.
+2. The architecture of safestr causes the base pointer to change. Thus,
+ it is not practical/safe to store a safestr in multiple locations if any
+ single instance can be manipulated.
+3. Dependent on an additional error handling library.
+4. Uses reference counting, meaning that it is either not thread safe or
+ slow and not portable.
+
+I think the idea of reallocating (and hence potentially changing) the base
+pointer is a serious design flaw that is fatal to this architecture. True
+safety is obtained by having automatic handling of all common scenarios
+without creating implicit constraints on the user.
+
+Because of its automatic temporary clean up system, it cannot use "const"
+semantics on input arguments. Interesting anomalies such as:
+
+ safestr_t s, t;
+ s = safestr_replace (t = SAFESTR_TEMP ("This is a test"),
+ SAFESTR_TEMP (" "), SAFESTR_TEMP ("."));
+ /* t is now undefined. */
+
+are possible. If one defines a function which takes a safestr_t as a
+parameter, then the function would not know whether or not the safestr_t is
+defined after it passes it to a safestr library function. The author's
+recommended method for working around this problem is to examine the
+attributes of the safestr_t within the function which is to modify any of
+its parameters and play games with its reference count. I think, therefore,
+that the whole SAFESTR_TEMP idea is also fatally broken.
+
+The library implements immutability, optional non-resizability, and a "trust"
+flag. This trust flag is interesting, and suggests that applying any
+arbitrary sequence of safestr_* function calls on any set of trusted strings
+will result in a trusted string. It seems to me, however, that if one wanted
+to implement a trusted string semantic, one might do so by actually creating
+a different *type* and only implement the subset of string functions that are
+deemed safe (user input would be excluded, for example). This, in essence,
+would allow the compiler to enforce trust propagation at compile time rather
+than at run time. Non-resizability is also interesting; however, it seems
+marginal (i.e., to want a string that cannot be resized, yet can be modified,
+and yet where a fixed sized buffer is undesirable).
+
+Libsrt
+------
+
+This is a length-based string library built on a slightly different
+strategy. The string contents are appended directly to the end of the header,
+so strings only require a single allocation. However, whenever a reallocation
+occurs, the header is replicated and the base pointer for the string is
+changed. That means a cached reference to a string is only valid so long as
+the string is not resized after the reference is taken. The internal
+structure maintains a lot of state used to accelerate Unicode manipulation.
+This state is dynamically updated according to usage (so, like Bstrlib, it
+supports both a binary mode and a Unicode mode basically all the time). But
+this makes sustainable usage of the library essentially opaque. This also
+creates a bottleneck for whatever extensions to the library one desires
+(write all extensions on top of the base library, put in a request to the
+author, or dedicate an expert to learn the internals of the library).
+
+SDS
+---
+
+SDS uses a strategy very similar to Libsrt. However, it uses headers of
+dynamically chosen size to decrease the overhead for very small strings. This
+requires an extra switch statement for access to each string attribute. The
+source code appears to use gcc/clang extensions, and thus is not portable.
+
+===============================================================================
+
+Examples
+--------
+
+ Dumping a line-numbered file:
+
+ FILE * fp;
+ int i;
+ struct bstrList * lines;
+ struct tagbstring prefix = bsStatic ("-> ");
+
+ if (NULL != (fp = fopen ("bstrlib.txt", "rb"))) {
+     bstring b = bread ((bNread) fread, fp); /* Read the whole file */
+     fclose (fp);
+     if (NULL != (lines = bsplit (b, '\n'))) {
+         for (i=0; i < lines->qty; i++) {
+             /* Prepend the prefix, then print with a line number */
+             binsert (lines->entry[i], 0, &prefix, '?');
+             printf ("%04d: %s\n", i, bdatae (lines->entry[i], "NULL"));
+         }
+         bstrListDestroy (lines);
+     }
+     bdestroy (b);
+ }
+
+For numerous other examples, see bstraux.c, bstraux.h and the example archive.
+
+===============================================================================
+
+License
+-------
+
+The Better String Library is available under either the BSD license (see the
+accompanying license.txt) or the GNU General Public License version 2 (see
+the accompanying gpl.txt) at the option of the user.
+
+===============================================================================
+
+Acknowledgements
+----------------
+
+The following individuals have made significant contributions to the design
+and testing of the Better String Library:
+
+Bjorn Augestad
+Clint Olsen
+Darryl Bleau
+Fabian Cenedese
+Graham Wideman
+Ignacio Burgueno
+International Business Machines Corporation
+Ira Mica
+John Kortink
+Manuel Woelker
+Marcel van Kervinck
+Michael Hsieh
+Mike Steinert
+Richard A. Smith
+Simon Ekstrom
+Wayne Scott
+Zed A. Shaw
+
+===============================================================================